
| PROVIDER | MODEL | CATEGORY | INPUT | OUTPUT | CONTEXT | NOTES | BLENDED / 1M |
|---|---|---|---|---|---|---|---|
Replicate | Whisper Large v3 Audio | MULTIMODAL | $0.00360 | — | — | Speech transcription | |
OpenAI | text-embedding-3-small Embedding | — | $0.0200 | — | 8K | 1536 dimensions | |
Google | Gemma 2 2B Nano | EFFICIENT | $0.0200 | $0.0200 | 8K | Smallest Gemma | |
Amazon | Titan Embeddings V2 Embedding | — | $0.0200 | — | 8K | 1536 dimensions | |
Google | text-embedding-004 Embedding | — | $0.0250 | — | 2K | 768 dimensions | |
Google | text-embedding-gecko-003 Legacy Embed | — | $0.0250 | — | 3K | Legacy embedding | |
Qwen | Qwen2.5-1.5B-Instruct Nano | EFFICIENT | $0.0300 | $0.0300 | 32K | Tiny Qwen | |
Meta | Llama 3.2 1B Nano | EFFICIENT | $0.0400 | $0.0400 | 128K | Smallest Llama | |
Replicate | SDXL (Rep) Image Gen | MULTIMODAL | $0.0400 | — | — | Image generation | |
Nvidia | Phi-3-Mini-4K (NIM) Nano | EFFICIENT | $0.0400 | $0.0400 | 4K | Tiny NIM | |
Qwen | Qwen2.5-3B-Instruct Micro | EFFICIENT | $0.0500 | $0.0500 | 32K | 3B micro | |
MiniMax | ABAB 5.5c Compact | EFFICIENT | $0.0500 | $0.0500 | 16K | Chat-optimized | |
IBM | Granite 3.1 2B Instruct Nano | EFFICIENT | $0.0300 | $0.1000 | 128K | Tiny Granite | |
ByteDance | Doubao-Lite-128k Fast | EFFICIENT | $0.0400 | $0.0900 | 128K | Fast 128K | |
ByteDance | Doubao-Lite-32k Budget | EFFICIENT | $0.0400 | $0.0900 | 32K | Budget tier | |
Groq | Llama 3.1 8B (Groq) Fast Nano | EFFICIENT | $0.0500 | $0.0800 | 128K | Fastest 8B | |
Meta | Llama 3.2 3B Micro | EFFICIENT | $0.0600 | $0.0600 | 128K | Ultra-compact | |
Mistral | Mistral 7B v0.1 Legacy | EFFICIENT | $0.0600 | $0.0600 | 8K | Original Mistral 7B | |
Groq | Llama 3.2 3B (Groq) Fast Micro | EFFICIENT | $0.0600 | $0.0600 | 128K | Micro on LPU | |
DeepInfra | Gemma 2 9B (DI) Compact | EFFICIENT | $0.0600 | $0.0600 | 8K | Cheapest Gemma | |
Voyage AI | voyage-3-lite Efficient | — | $0.0650 | — | 32K | Speed-optimized | |
Amazon | Nova Micro Micro | EFFICIENT | $0.0350 | $0.1400 | 128K | Text-only micro | |
DeepInfra | Mistral 7B (DI) Nano | EFFICIENT | $0.0700 | $0.0700 | 32K | Cheapest Mistral | |
Lepton AI | Mistral 7B (Lepton) Nano | EFFICIENT | $0.0700 | $0.0700 | 32K | Cheapest at Lepton | |
Microsoft | Phi-2 Legacy | EFFICIENT | $0.0700 | $0.0700 | 2K | Original Phi 2.7B | |
Google | Gemini 1.5 Flash-8B Nano | EFFICIENT | $0.0375 | $0.1500 | 1M | Smallest Gemini | |
Cohere | Command R7B Compact | EFFICIENT | $0.0375 | $0.1500 | 128K | 7B model | |
Microsoft | Phi-4 Mini Tiny | REASONING | $0.0400 | $0.1600 | 128K | 3.8B mini reasoning | |
Mistral | Mistral 7B v0.3 Open | EFFICIENT | $0.0800 | $0.0800 | 32K | Popular open model | |
Amazon | Nova Canvas Image Gen | MULTIMODAL | $0.0800 | — | — | AI image generation | |
Google | Gemma 3 9B Open | EFFICIENT | $0.0900 | $0.0900 | 128K | Gemma 3 compact | |
Google | Gemma 2 9B Open | EFFICIENT | $0.0900 | $0.0900 | 8K | Compact Gemma 2 | |
OpenAI | text-embedding-ada-002 Legacy Embed | — | $0.1000 | — | 8K | Classic embedding | |
Meta | Llama 3.1 8B Compact | EFFICIENT | $0.1000 | $0.1000 | 128K | Compact open | |
Mistral | mistral-embed Embedding | — | $0.1000 | — | 8K | 1024 dimensions | |
Cohere | Embed v3 English Embedding | — | $0.1000 | — | 512 | 1024 dimensions | |
Cohere | Embed v3 Multilingual Embedding | — | $0.1000 | — | 512 | 100+ languages | |
Together AI | Llama 3.1 8B (Together) Compact | EFFICIENT | $0.1000 | $0.1000 | 128K | Budget inference | |
Fireworks AI | Llama 3.1 8B (fw) Nano | EFFICIENT | $0.1000 | $0.1000 | 131K | Cheapest 8B | |
Cerebras | Llama-3.1-8B (Cerebras) Ultra-Fast Nano | EFFICIENT | $0.1000 | $0.1000 | 128K | Fastest 8B ever | |
Upstage | Solar Embedding Embedding | — | $0.1000 | — | 4K | Solar embeddings | |
Qwen | Qwen2.5-7B-Instruct Compact | EFFICIENT | $0.1000 | $0.1000 | 128K | 7B compact | |
Qwen | Qwen2.5-Coder-7B Code Nano | EFFICIENT | $0.1000 | $0.1000 | 128K | Small code model | |
Qwen | Qwen1.5-7B-Chat Legacy Compact | EFFICIENT | $0.1000 | $0.1000 | 32K | Qwen 1.5 7B | |
Stability AI | StableLM 2 1.6B Nano | EFFICIENT | $0.1000 | $0.1000 | 4K | Tiny language model | |
Stability AI | StableCode 3B Code Nano | EFFICIENT | $0.1000 | $0.1000 | 16K | Tiny code model | |
DeepSeek | DeepSeek-R1-Distill-8B Distilled | REASONING | $0.0700 | $0.2000 | 128K | 8B distilled | |
Replicate | Llama 3 8B (Rep) Compact | EFFICIENT | $0.0500 | $0.2500 | 8K | Budget option | |
IBM | Granite 3.1 8B Instruct Compact | EFFICIENT | $0.0500 | $0.2500 | 128K | Enterprise LLM | |
IBM | Granite 3.0 8B Dense Previous Gen | EFFICIENT | $0.0500 | $0.2500 | 4K | Dense architecture | |
Amazon | Nova Lite Efficient | EFFICIENT | $0.0600 | $0.2400 | 300K | Ultra low-cost | |
Voyage AI | voyage-3 Balanced | — | $0.1200 | — | 32K | General purpose | |
Voyage AI | voyage-finance-2 Finance | — | $0.1200 | — | 32K | Finance domain | |
Voyage AI | voyage-law-2 Legal | — | $0.1200 | — | 32K | Legal domain | |
Voyage AI | voyage-multilingual-2 Multilingual | — | $0.1200 | — | 32K | 100+ languages | |
Moonshot AI | Moonshot v1 8k Compact | EFFICIENT | $0.1200 | $0.1200 | 8K | Fast response | |
OpenAI | text-embedding-3-large Embedding | — | $0.1300 | — | 8K | 3072 dimensions | |
Microsoft | Phi-4 Frontier SLM | REASONING | $0.0700 | $0.2800 | 16K | 14B reasoning SLM | |
01.AI | Yi-Lightning Fast | EFFICIENT | $0.1400 | $0.1400 | 16K | Ultra-fast Yi | |
Zhipu AI | GLM-4-Air Efficient | EFFICIENT | $0.1400 | $0.1400 | 128K | Affordable GLM | |
Google | Gemini 2.0 Flash Lite Micro | EFFICIENT | $0.0750 | $0.3000 | 1M | Lowest cost Gemini | |
Google | Gemini 1.5 Flash Previous Gen | EFFICIENT | $0.0750 | $0.3000 | 1M | Previous Flash | |
Mistral | Pixtral 12B Compact Vision | MULTIMODAL | $0.1500 | $0.1500 | 128K | Vision-language 12B | |
Mistral | Mistral Nemo Open | EFFICIENT | $0.1500 | $0.1500 | 128K | 12B open model | |
Anyscale | Mistral 7B (Anyscale) Nano | EFFICIENT | $0.1500 | $0.1500 | 32K | Anyscale serverless | |
Nvidia | Mistral-NeMo-12B (NIM) Efficient | EFFICIENT | $0.1500 | $0.1500 | 128K | NIM deployment | |
01.AI | Yi-6B Nano | EFFICIENT | $0.1500 | $0.1500 | 4K | Original Yi 6B | |
Meta | Llama 3.2 11B Vision Compact Vision | MULTIMODAL | $0.1600 | $0.1600 | 128K | Compact vision | |
Mistral | Mistral Small 3.2 Efficient | EFFICIENT | $0.1000 | $0.3000 | 128K | Cost leader | |
ByteDance | Doubao-Pro-32k Balanced | EFFICIENT | $0.1100 | $0.2800 | 32K | Doubao standard | |
Amazon | Titan Text Lite Lightweight | EFFICIENT | $0.1500 | $0.2000 | 4K | Low-latency Titan | |
DeepSeek | DeepSeek-R1-Distill-14B Distilled | REASONING | $0.1000 | $0.3500 | 128K | 14B distilled | |
Meta | Llama 4 Scout Efficient | EFFICIENT | $0.1000 | $0.3500 | 512K | Long-context MoE | |
Meta | Llama 2 7B Chat Legacy | EFFICIENT | $0.1800 | $0.1800 | 4K | Compact Llama 2 | |
Meta | CodeLlama 7B Instruct Code | EFFICIENT | $0.1800 | $0.1800 | 16K | Compact code | |
Groq | Llama 3.2 11B Vision (Groq) Fast Vision | MULTIMODAL | $0.1800 | $0.1800 | 128K | Vision on LPU | |
Voyage AI | voyage-3-large Flagship | — | $0.1800 | — | 32K | Best retrieval | |
Voyage AI | voyage-code-3 Code | — | $0.1800 | — | 32K | Code retrieval | |
TII / Falcon | Falcon 7B Instruct Nano | EFFICIENT | $0.1800 | $0.1800 | 2K | Original Falcon | |
DeepSeek | DeepSeek-Coder-V2 Code | EFFICIENT | $0.1400 | $0.2800 | 128K | Code specialist | |
OpenAI | GPT-4.1 nano Efficient | EFFICIENT | $0.1000 | $0.4000 | 1M | Cheapest OpenAI model | |
Google | Gemini 2.0 Flash Efficient | EFFICIENT | $0.1000 | $0.4000 | 1M | Next-gen Flash | |
Meta | Llama 3 8B Previous Gen | EFFICIENT | $0.2000 | $0.2000 | 8K | Llama 3 compact | |
Groq | Gemma 2 9B (Groq) Fast Compact | EFFICIENT | $0.2000 | $0.2000 | 8K | Google Gemma fast | |
Fireworks AI | Mixtral 8x7B (fw) Serverless | EFFICIENT | $0.2000 | $0.2000 | 32K | MoE serverless | |
TII / Falcon | Falcon 11B Compact | EFFICIENT | $0.2000 | $0.2000 | 8K | New Falcon 11B | |
Meta | Llama 2 13B Chat Legacy | EFFICIENT | $0.2200 | $0.2200 | 4K | Mid Llama 2 | |
Meta | CodeLlama 13B Instruct Code | EFFICIENT | $0.2200 | $0.2200 | 16K | Mid code model | |
Mistral | Mixtral 8x7B v0.1 Open MoE | EFFICIENT | $0.2400 | $0.2400 | 32K | Classic MoE | |
Groq | Mixtral 8x7B (Groq) Fast MoE | EFFICIENT | $0.2400 | $0.2400 | 32K | MoE on LPU | |
DeepInfra | Mixtral 8x7B (DI) MoE | EFFICIENT | $0.2400 | $0.2400 | 32K | MoE managed | |
Microsoft | Phi-3.5 Mini Compact | EFFICIENT | $0.1300 | $0.5200 | 128K | 3.8B long-context | |
Microsoft | Phi-3 Mini Tiny | EFFICIENT | $0.1300 | $0.5200 | 128K | 3.8B mini | |
Mistral | Codestral Mamba Code Mamba | EFFICIENT | $0.2500 | $0.2500 | 256K | State-space code | |
AI21 Labs | Jamba 1.5 Mini Efficient | EFFICIENT | $0.2000 | $0.4000 | 256K | Compact SSM hybrid | |
Google | Gemma 3 27B Open | FRONTIER | $0.2700 | $0.2700 | 128K | Largest Gemma 3 | |
Google | Gemma 2 27B Open | EFFICIENT | $0.2700 | $0.2700 | 8K | Open weights | |
Meta | Llama 3.3 70B Open | EFFICIENT | $0.2300 | $0.4000 | 128K | Strong open weights | |
DeepInfra | Llama 3.3 70B (DI) Serverless | EFFICIENT | $0.2300 | $0.4000 | 128K | Cost-effective | |
OpenAI | GPT-4o mini Efficient | EFFICIENT | $0.1500 | $0.6000 | 128K | Cost-optimized | |
Google | Gemini 2.5 Flash Balanced | MULTIMODAL | $0.1500 | $0.6000 | 1M | Speed-optimized | |
Cohere | Command R Balanced | EFFICIENT | $0.1500 | $0.6000 | 128K | RAG-optimized | |
Upstage | Solar Mini Efficient | EFFICIENT | $0.1500 | $0.6000 | 32K | Efficient Solar | |
Microsoft | Phi-3 Small Compact | EFFICIENT | $0.1500 | $0.6000 | 128K | 7B small model | |
DeepSeek | DeepSeek-R1-Distill-32B Distilled | REASONING | $0.2000 | $0.5000 | 128K | 32B distilled | |
Qwen | Qwen2-VL-7B Vision | MULTIMODAL | $0.3000 | $0.3000 | 32K | 7B vision-language | |
01.AI | Yi-9B Compact | EFFICIENT | $0.3000 | $0.3000 | 4K | Yi 9B model | |
01.AI | Yi-VL-6B Vision | MULTIMODAL | $0.3000 | $0.3000 | 4K | Yi Vision 6B | |
Stability AI | StableLM 2 12B Open | EFFICIENT | $0.3000 | $0.3000 | 4K | Stability text model | |
Microsoft | Phi-3.5 MoE MoE | EFFICIENT | $0.1600 | $0.6400 | 128K | 16x3.8B mixture | |
Meta | Llama 4 Maverick Frontier | FRONTIER | $0.2000 | $0.6000 | 128K | MoE architecture | |
Mistral | Codestral Code | EFFICIENT | $0.2000 | $0.6000 | 256K | Code specialist | |
Microsoft | Phi-3 Medium Balanced | EFFICIENT | $0.1700 | $0.6800 | 128K | 14B medium | |
MiniMax | ABAB 5.5s Efficient | EFFICIENT | $0.1500 | $0.7500 | 16K | Cost-effective | |
Meta | CodeLlama 34B Instruct Code | EFFICIENT | $0.3500 | $0.3500 | 16K | Strong code model | |
Qwen | Qwen2.5-14B-Instruct Small | EFFICIENT | $0.3500 | $0.3500 | 128K | 14B model | |
Qwen | Qwen2.5-Coder-14B Code | EFFICIENT | $0.3500 | $0.3500 | 128K | 14B code model | |
Qwen | Qwen1.5-14B-Chat Legacy | EFFICIENT | $0.3500 | $0.3500 | 32K | Qwen 1.5 14B | |
xAI | Grok 3 mini Efficient | EFFICIENT | $0.3000 | $0.5000 | 131K | Cost-efficient | |
Meta | Llama 3.1 70B Balanced | EFFICIENT | $0.3500 | $0.4000 | 128K | Balanced open | |
DeepInfra | Llama 3.1 70B (DI) Balanced | EFFICIENT | $0.3500 | $0.4000 | 128K | 70B managed | |
DeepInfra | Qwen2.5-72B (DI) Balanced | FRONTIER | $0.3500 | $0.4000 | 128K | Qwen on DeepInfra | |
Nvidia | Llama-3.1-Nemotron-70B RLHF-Aligned | FRONTIER | $0.3500 | $0.4000 | 128K | RLHF-tuned 70B | |
01.AI | Yi-Large-Turbo Balanced | EFFICIENT | $0.3800 | $0.3800 | 16K | Turbo variant | |
Cohere | Command Light Legacy | EFFICIENT | $0.3000 | $0.6000 | 4K | Older Command | |
Hyperbolic | Llama 3.1 70B (Hyp) Fast | EFFICIENT | $0.4000 | $0.4000 | 128K | 70B fast inference | |
Hyperbolic | Hermes-3-70B (Hyp) Fine-tuned | FRONTIER | $0.4000 | $0.4000 | 128K | Nous Hermes 3 | |
Hyperbolic | Qwen2.5-72B (Hyp) Fast | FRONTIER | $0.4000 | $0.4000 | 128K | Qwen on Hyperbolic | |
IBM | Granite 20B Multilingual Multilingual | EFFICIENT | $0.4000 | $0.4000 | 8K | 10+ languages | |
Moonshot AI | Moonshot v1 32k Balanced | EFFICIENT | $0.4000 | $0.4000 | 32K | Standard context | |
MiniMax | ABAB 6.5t Turbo | EFFICIENT | $0.4500 | $0.4500 | 8K | Turbo variant | |
Tencent | Hunyuan-Standard Balanced | EFFICIENT | $0.3500 | $0.7000 | 256K | Standard model | |
Google | PaLM 2 Bison Legacy | FRONTIER | $0.5000 | $0.5000 | 8K | PaLM 2 text | |
Anyscale | Mixtral 8x7B (Anyscale) MoE | EFFICIENT | $0.5000 | $0.5000 | 32K | MoE on Anyscale | |
Lepton AI | Mixtral 8x7B (Lepton) MoE | EFFICIENT | $0.5000 | $0.5000 | 32K | MoE on Lepton | |
DeepSeek | DeepSeek-R1-Distill-70B Distilled | REASONING | $0.3500 | $0.8800 | 128K | Distilled reasoning | |
Replicate | Mixtral 8x7B (Rep) Serverless | EFFICIENT | $0.3000 | $1.00 | 32K | MoE on Replicate | |
DeepSeek | DeepSeek-V3 Frontier | FRONTIER | $0.2700 | $1.10 | 64K | Top open source | |
DeepSeek | DeepSeek-V3-0324 Frontier | FRONTIER | $0.2700 | $1.10 | 128K | Extended context | |
Anthropic | Claude Haiku 3 Classic | EFFICIENT | $0.2500 | $1.25 | 200K | Fastest Claude 3 | |
AI21 Labs | Jamba Instruct Previous | FRONTIER | $0.5000 | $0.7000 | 256K | First Jamba model | |
IBM | Granite Code 34B Code | EFFICIENT | $0.3500 | $1.05 | 8K | Enterprise code | |
DeepInfra | Phind-CodeLlama-34B (DI) Code | EFFICIENT | $0.6000 | $0.6000 | 16K | Code model | |
DeepInfra | Yi-34B-Chat (DI) Open | FRONTIER | $0.6000 | $0.6000 | 4K | Yi on DeepInfra | |
DeepInfra | WizardLM-2 8x22B (DI) Fine-tuned | FRONTIER | $0.6300 | $0.6300 | 64K | Instruction tuned | |
Meta | Llama 3 70B Previous Gen | EFFICIENT | $0.5900 | $0.7900 | 8K | Llama 3 base | |
Groq | Llama 3.3 70B (Groq) Ultra-Fast | EFFICIENT | $0.5900 | $0.7900 | 128K | Fastest 70B | |
Groq | Llama 3.1 70B (Groq) Fast | EFFICIENT | $0.5900 | $0.7900 | 128K | 70B on LPU | |
Qwen | Qwen2-57B-A14B MoE | FRONTIER | $0.6500 | $0.6500 | 64K | 57B MoE model | |
Cerebras | DeepSeek-R1 (Cerebras) Fast Reasoning | REASONING | $0.5500 | $0.9900 | 64K | R1 on Cerebras | |
Qwen | Qwen2.5-32B-Instruct Balanced | EFFICIENT | $0.7000 | $0.7000 | 128K | 32B compact | |
Qwen | Qwen2.5-Coder-32B Code | EFFICIENT | $0.7000 | $0.7000 | 128K | Best open code model | |
Qwen | Qwen1.5-32B-Chat Legacy | EFFICIENT | $0.7000 | $0.7000 | 32K | Qwen 1.5 32B | |
Zhipu AI | GLM-4-Long Long Context | FRONTIER | $0.7000 | $0.7000 | 1M | 1M context GLM | |
Zhipu AI | GLM-3-Turbo Legacy | EFFICIENT | $0.7000 | $0.7000 | 128K | Previous gen | |
Cerebras | Llama-3.3-70B (Cerebras) Ultra-Fast | EFFICIENT | $0.5900 | $0.9900 | 128K | Fastest 70B inference | |
Baidu | ERNIE 3.5 8K Balanced | EFFICIENT | $0.5500 | $1.10 | 8K | Cost-efficient | |
OpenAI | GPT-4.1 mini Efficient | EFFICIENT | $0.4000 | $1.60 | 1M | Fast & affordable, 1M ctx | |
Groq | Qwen2.5-72B (Groq) Fast | FRONTIER | $0.7900 | $0.7900 | 128K | Qwen on LPU | |
Groq | Qwen2.5-Coder-32B (Groq) Fast Code | EFFICIENT | $0.7900 | $0.7900 | 128K | Code on LPU | |
OpenAI | GPT-3.5 Turbo Legacy | EFFICIENT | $0.5000 | $1.50 | 16K | Affordable classic | |
Google | Gemini 1.0 Pro Legacy | FRONTIER | $0.5000 | $1.50 | 32K | Original Pro | |
Meta | Llama 3.1 405B Large | FRONTIER | $0.8000 | $0.8000 | 128K | Largest open model | |
Amazon | Titan Text Premier Amazon | EFFICIENT | $0.5000 | $1.50 | 32K | Amazon-built | |
Cohere | Aya Expanse 32B Multilingual | FRONTIER | $0.5000 | $1.50 | 128K | 100+ languages | |
Cohere | Aya Expanse 8B Compact ML | EFFICIENT | $0.5000 | $1.50 | 8K | Compact multilingual | |
Together AI | Nous-Hermes-2-Yi-34B Fine-tuned | FRONTIER | $0.8000 | $0.8000 | 4K | Nous Hermes fine-tune | |
Fireworks AI | Phind-CodeLlama-34B (fw) Code | EFFICIENT | $0.8000 | $0.8000 | 16K | Code specialist | |
DeepInfra | Llama 3.1 405B (DI) Serverless | FRONTIER | $0.8000 | $0.8000 | 128K | 405B serverless | |
Lepton AI | Llama 3.1 70B (Lepton) Serverless | EFFICIENT | $0.8000 | $0.8000 | 128K | Lepton inference | |
01.AI | Yi-34B-Chat Open Large | FRONTIER | $0.8000 | $0.8000 | 4K | 34B chat model | |
01.AI | Yi-34B Open Base | FRONTIER | $0.8000 | $0.8000 | 4K | 34B base model | |
ByteDance | Doubao-Pro-256k Long Context | FRONTIER | $0.5600 | $1.40 | 256K | 256K context | |
ByteDance | Doubao-Pro-128k Frontier | FRONTIER | $0.5600 | $1.40 | 128K | 128K context | |
Groq | DeepSeek-R1 (Groq) Fast Reasoning | REASONING | $0.7500 | $0.9900 | 128K | R1 on LPU | |
Reka AI | Reka Edge Micro | EFFICIENT | $0.4000 | $2.00 | 128K | On-device capable | |
Meta | Llama 3.2 90B Vision Vision | MULTIMODAL | $0.8800 | $0.8800 | 128K | Vision + language | |
Together AI | Llama 3.3 70B (Together) Managed | EFFICIENT | $0.8800 | $0.8800 | 128K | Managed inference | |
Meta | Llama 2 70B Chat Legacy | EFFICIENT | $0.9000 | $0.9000 | 4K | Classic Llama 2 | |
Meta | CodeLlama 70B Instruct Code | EFFICIENT | $0.9000 | $0.9000 | 16K | Largest CodeLlama | |
Mistral | Mixtral 8x22B Open MoE | EFFICIENT | $0.9000 | $0.9000 | 64K | Largest open MoE | |
Together AI | Falcon 40B (Together) Open | EFFICIENT | $0.9000 | $0.9000 | 2K | Falcon 40B | |
Fireworks AI | Qwen2.5-72B (fw) Balanced | FRONTIER | $0.9000 | $0.9000 | 32K | Qwen serverless | |
Fireworks AI | Yi-34B (fw) Open | FRONTIER | $0.9000 | $0.9000 | 4K | Yi on Fireworks | |
Qwen | Qwen2-72B-Instruct Previous Gen | FRONTIER | $0.9000 | $0.9000 | 128K | Qwen 2 flagship | |
Qwen | Qwen1.5-72B-Chat Legacy | FRONTIER | $0.9000 | $0.9000 | 32K | Qwen 1.5 72B | |
TII / Falcon | Falcon 40B Instruct Open | EFFICIENT | $0.9000 | $0.9000 | 2K | Instruction tuned | |
Tencent | Hunyuan-Standard-256K Long Context | FRONTIER | $0.7000 | $1.40 | 256K | 256K context | |
Writer | Palmyra X 004 Frontier | FRONTIER | $0.5000 | $2.00 | 128K | Enterprise-grade | |
Cerebras | Llama-3.1-405B (Cerebras) Fast Large | FRONTIER | $0.9900 | $0.9900 | 128K | 405B wafer-scale | |
Perplexity | Sonar Efficient | EFFICIENT | $1.00 | $1.00 | 127K | Grounded search | |
Anyscale | Llama 3.1 70B (Anyscale) Managed | EFFICIENT | $1.00 | $1.00 | 128K | Ray-based inference | |
Anyscale | CodeLlama 70B (Anyscale) Code | EFFICIENT | $1.00 | $1.00 | 16K | Code-tuned Llama | |
DeepSeek | DeepSeek-R1 Reasoning | REASONING | $0.5500 | $2.19 | 64K | Chain-of-thought | |
DeepSeek | DeepSeek-R1-0528 Reasoning | REASONING | $0.5500 | $2.19 | 64K | Latest R1 | |
DeepInfra | DeepSeek-R1 (DI) Reasoning | REASONING | $0.5500 | $2.19 | 64K | R1 managed | |
Qwen | QwQ-32B Reasoning | REASONING | $0.6000 | $2.40 | 131K | Chain-of-thought | |
Databricks | DBRX Instruct Open | FRONTIER | $0.7500 | $2.25 | 32K | 132B open MoE | |
Databricks | DBRX Base Base | FRONTIER | $0.7500 | $2.25 | 32K | DBRX base model | |
Together AI | Qwen2.5-72B (Together) Managed | FRONTIER | $1.20 | $1.20 | 32K | Qwen managed | |
Together AI | DBRX Instruct (Together) Enterprise | FRONTIER | $1.20 | $1.20 | 32K | 132B MoE | |
Together AI | Mixtral 8x22B (Together) Large MoE | FRONTIER | $1.20 | $1.20 | 64K | Largest Mixtral | |
Together AI | WizardLM-2 8x22B Fine-tuned | FRONTIER | $1.20 | $1.20 | 64K | Instruction tuned | |
Qwen | Qwen2.5-72B-Instruct Large | FRONTIER | $1.20 | $1.20 | 128K | 72B open weights | |
Anthropic | Claude Instant 1.2 Legacy | EFFICIENT | $0.8000 | $2.40 | 100K | Legacy fast model | |
MiniMax | MiniMax-Text-01 Frontier | FRONTIER | $0.7000 | $2.80 | 1M | 1M context window | |
Amazon | Nova Pro Frontier | MULTIMODAL | $0.8000 | $3.20 | 300K | Video understanding | |
Nvidia | Llama-3.1-Nemotron-Ultra-253B Ultra | FRONTIER | $1.60 | $1.60 | 128K | 253B Nemotron | |
Writer | Palmyra X 32k Balanced | FRONTIER | $1.00 | $3.00 | 32K | Business writing | |
Writer | Palmyra Med Medical | FRONTIER | $1.00 | $3.00 | 32K | Medical domain | |
Writer | Palmyra Fin Financial | FRONTIER | $1.00 | $3.00 | 32K | Finance domain | |
Databricks | Llama 3.1 70B (DB) Managed | EFFICIENT | $1.00 | $3.00 | 128K | Managed on Databricks | |
OpenAI | GPT-3.5 Turbo Instruct Legacy | EFFICIENT | $1.50 | $2.00 | 4K | Completion-only | |
Anthropic | Claude Haiku 4.5 Efficient | EFFICIENT | $0.8000 | $4.00 | 200K | Fast & affordable | |
Anthropic | Claude Haiku 3.5 Previous Gen | EFFICIENT | $0.8000 | $4.00 | 200K | Previous Haiku | |
Reka AI | Reka Flash 21B Efficient | EFFICIENT | $0.8000 | $4.00 | 128K | Fast multimodal | |
Moonshot AI | Moonshot v1 128k Long Context | FRONTIER | $1.80 | $1.80 | 128K | 128K window | |
Cohere | Rerank v3.5 Reranker | — | $2.00 | — | — | Per search unit | |
Cohere | Rerank v3 Multilingual Reranker | — | $2.00 | — | — | Multilingual rerank | |
Qwen | Qwen2-VL-72B Vision Large | MULTIMODAL | $2.00 | $2.00 | 32K | 72B vision-language | |
Qwen | Qwen1.5-110B-Chat Legacy Large | FRONTIER | $2.00 | $2.00 | 32K | Qwen 1.5 large | |
OpenAI | o3-mini Reasoning | REASONING | $1.10 | $4.40 | 200K | Efficient reasoning | |
OpenAI | o4-mini Reasoning | REASONING | $1.10 | $4.40 | 200K | Latest mini reasoner | |
Writer | Palmyra Vision Vision | MULTIMODAL | $1.50 | $3.50 | 32K | Multimodal Palmyra | |
Perplexity | Sonar Reasoning Reasoning | REASONING | $1.00 | $5.00 | 127K | Reasoning search | |
MiniMax | ABAB 6.5s Balanced | FRONTIER | $1.00 | $5.00 | 8K | Latest ABAB | |
Google | Gemini 1.5 Pro Previous Gen | FRONTIER | $1.25 | $5.00 | 2M | 2M context | |
Upstage | Solar Pro Frontier | FRONTIER | $1.50 | $6.00 | 32K | Korean AI flagship | |
Groq | Llama 3.1 405B (Groq) Fast Large | FRONTIER | $2.99 | $2.99 | 128K | 405B on LPU | |
Fireworks AI | Llama 3.1 405B (fw) Serverless | FRONTIER | $3.00 | $3.00 | 131K | 405B serverless | |
AI21 Labs | Jurassic-2 Mid Legacy | EFFICIENT | $3.00 | $3.00 | 8K | Mid-tier J2 | |
01.AI | Yi-Large Frontier | FRONTIER | $3.00 | $3.00 | 32K | Yi flagship | |
Qwen | Qwen2.5-Max Flagship | FRONTIER | $1.60 | $6.40 | 32K | Alibaba flagship | |
Mistral | Mistral Large 2 Frontier | FRONTIER | $2.00 | $6.00 | 128K | Top European model | |
Mistral | Pixtral Large Vision | MULTIMODAL | $2.00 | $6.00 | 128K | Multimodal frontier | |
Nvidia | Mistral-Large-2 (NIM) Enterprise | FRONTIER | $2.00 | $6.00 | 128K | Mistral via NIM | |
Together AI | Llama 3.1 405B (Together) Managed Large | FRONTIER | $3.50 | $3.50 | 128K | 405B managed | |
01.AI | Yi-VL-34B Vision Large | MULTIMODAL | $3.50 | $3.50 | 4K | Yi Vision 34B | |
OpenAI | GPT-4.1 Frontier | MULTIMODAL | $2.00 | $8.00 | 1M | 1M context, strong coding | |
Perplexity | Sonar Reasoning Pro Reasoning | REASONING | $2.00 | $8.00 | 127K | R1 + web search | |
Perplexity | Sonar Deep Research Research | REASONING | $2.00 | $8.00 | 127K | Agentic research | |
AI21 Labs | Jamba 1.5 Large Frontier | FRONTIER | $2.00 | $8.00 | 256K | SSM + Transformer | |
OpenAI | GPT-5 Frontier | FRONTIER | $1.25 | $10.00 | 400K | Flagship reasoning + vision | |
Google | Gemini 2.5 Pro Frontier | FRONTIER | $1.25 | $10.00 | 1M | 1M ctx, thinking | |
Hyperbolic | Llama 3.1 405B (Hyp) Fast | FRONTIER | $4.00 | $4.00 | 128K | Fast 405B | |
Together AI | DeepSeek-R1 (Together) Reasoning | REASONING | $3.00 | $7.00 | 64K | R1 managed | |
Nvidia | Nemotron-4-340B Large | FRONTIER | $4.20 | $4.20 | 4K | 340B NVIDIA | |
xAI | Grok 2 Previous | FRONTIER | $2.00 | $10.00 | 131K | Previous Grok | |
xAI | Grok 2 Vision Vision | MULTIMODAL | $2.00 | $10.00 | 32K | Vision model | |
Fireworks AI | DeepSeek-R1 (fw) Reasoning | REASONING | $3.00 | $8.00 | 64K | R1 serverless | |
Tencent | Hunyuan-Turbo Frontier | FRONTIER | $3.50 | $7.00 | 256K | Tencent flagship | |
OpenAI | GPT-4o Flagship | MULTIMODAL | $2.50 | $10.00 | 128K | Omni multimodal (retired from ChatGPT Apr 2026) | |
Cohere | Command A Frontier | FRONTIER | $2.50 | $10.00 | 256K | Latest Command | |
Cohere | Command R+ Previous | FRONTIER | $2.50 | $10.00 | 128K | Enterprise RAG | |
Moonshot AI | Kimi k1.5 Reasoning | REASONING | $2.50 | $10.00 | 128K | Long-context reasoning | |
Inflection AI | Pi (Inflection-2.5) Conversational | FRONTIER | $2.50 | $10.00 | 32K | Empathetic AI | |
Google | PaLM 2 Unicorn Legacy | FRONTIER | $5.00 | $5.00 | 8K | Large PaLM 2 | |
Perplexity | Sonar Huge Large | FRONTIER | $5.00 | $5.00 | 127K | 405B based | |
TII / Falcon | Falcon 180B Large Open | FRONTIER | $3.50 | $10.00 | 4K | Largest Falcon | |
Amazon | Nova Premier Top-tier | FRONTIER | $2.50 | $12.50 | 300K | Highest intelligence | |
Tencent | Hunyuan-Vision Vision | MULTIMODAL | $5.50 | $5.50 | 4K | Visual understanding | |
Baidu | ERNIE 4.0 Turbo 8K Frontier | FRONTIER | $3.50 | $10.50 | 8K | Baidu flagship | |
OpenAI | o1-mini Reasoning | REASONING | $3.00 | $12.00 | 128K | Mini o-series | |
Anthropic | Claude Sonnet 4.6 Balanced | MULTIMODAL | $3.00 | $15.00 | 1M | Flagship perf at 1/5 cost | |
Anthropic | Claude Sonnet 4.5 Balanced | MULTIMODAL | $3.00 | $15.00 | 200K | Vision + extended thinking | |
Anthropic | Claude Sonnet 3.7 Previous Gen | REASONING | $3.00 | $15.00 | 200K | Hybrid reasoning | |
Anthropic | Claude Sonnet 3.5 Previous Gen | MULTIMODAL | $3.00 | $15.00 | 200K | Strong at coding | |
xAI | Grok 3 Frontier | FRONTIER | $3.00 | $15.00 | 131K | Real-time X data | |
Perplexity | Sonar Pro Frontier | FRONTIER | $3.00 | $15.00 | 200K | With web search | |
Reka AI | Reka Core Frontier | FRONTIER | $3.00 | $15.00 | 128K | Reka flagship | |
Baidu | ERNIE 4.0 8K Frontier | FRONTIER | $4.20 | $12.60 | 8K | Standard ERNIE 4.0 | |
Zhipu AI | GLM-4-Plus Frontier | FRONTIER | $7.00 | $7.00 | 128K | Zhipu flagship | |
Zhipu AI | GLM-4V Vision | MULTIMODAL | $7.00 | $7.00 | 2K | Vision model | |
xAI | Grok 1.5 Legacy | FRONTIER | $5.00 | $15.00 | 128K | Early Grok | |
Replicate | Llama 3.1 405B (Rep) Serverless | FRONTIER | $9.50 | $9.50 | 128K | Per-token serverless | |
Anthropic | Claude Opus 4.6 Frontier | FRONTIER | $5.00 | $25.00 | 1M | Flagship, 1M context GA | |
xAI | Grok 3 Fast Balanced | MULTIMODAL | $5.00 | $25.00 | 131K | High throughput | |
Google | Gemini 1.0 Ultra Legacy | FRONTIER | $7.50 | $22.50 | 32K | Original Ultra | |
Anthropic | Claude 2.1 Legacy | FRONTIER | $8.00 | $24.00 | 200K | Extended context v2 | |
Anthropic | Claude 2 Legacy | FRONTIER | $8.00 | $24.00 | 100K | Classic Claude 2 | |
AI21 Labs | Jurassic-2 Ultra Legacy | FRONTIER | $15.00 | $15.00 | 8K | Classic AI21 model | |
OpenAI | GPT-4 Turbo Previous Gen | MULTIMODAL | $10.00 | $30.00 | 128K | Vision capable | |
OpenAI | GPT-4 Turbo Preview Previous Gen | FRONTIER | $10.00 | $30.00 | 128K | Preview version | |
OpenAI | o3 Reasoning | REASONING | $10.00 | $40.00 | 200K | Advanced reasoning | |
OpenAI | o1 Reasoning | REASONING | $15.00 | $60.00 | 200K | First o-series | |
OpenAI | o1-preview Reasoning | REASONING | $15.00 | $60.00 | 128K | o1 preview | |
Anthropic | Claude Opus 4.5 Previous | FRONTIER | $15.00 | $75.00 | 200K | State-of-the-art | |
Anthropic | Claude Opus 3 Classic | FRONTIER | $15.00 | $75.00 | 200K | Claude 3 flagship | |
OpenAI | GPT-4 Classic | FRONTIER | $30.00 | $60.00 | 8K | Original GPT-4 | |
OpenAI | GPT-4-32k Classic | FRONTIER | $60.00 | $120.00 | 32K | Extended context | |
OpenAI | GPT-4.5 Near-Frontier | FRONTIER | $75.00 | $150.00 | 128K | Most capable GPT-4 | |
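
The header names a BLENDED / 1M column, but the rows above carry no blended values. The sketch below shows one way such a figure could be derived from the INPUT and OUTPUT cells. The 70/30 input/output weighting is an assumption: the row order appears consistent with it, but the table never states its blend ratio. The helper names `parse_price` and `blended_price` are hypothetical and exist only for this illustration.

```python
from typing import Optional

# Assumed weighting: 70% input tokens, 30% output tokens. The table does not
# state its blend ratio; change INPUT_WEIGHT to match your own traffic mix.
INPUT_WEIGHT = 0.7


def parse_price(cell: str) -> Optional[float]:
    """Convert a price cell such as '$0.1500' to a float; '—' means no price."""
    cell = cell.strip()
    if cell in ("", "—"):
        return None
    return float(cell.lstrip("$"))


def blended_price(input_cell: str, output_cell: str,
                  input_weight: float = INPUT_WEIGHT) -> Optional[float]:
    """Weighted average of the input and output prices, in the cells' own unit."""
    inp = parse_price(input_cell)
    out = parse_price(output_cell)
    if inp is None:
        return None
    if out is None:
        # Input-only rows (embeddings, rerankers) have nothing to blend.
        return inp
    return input_weight * inp + (1.0 - input_weight) * out


# Prices taken from the GPT-4o mini row: $0.1500 input, $0.6000 output.
print(round(blended_price("$0.1500", "$0.6000"), 4))  # prints 0.285
```

A workload that generates far more output than it reads would call for a lower `input_weight`, which can reorder rows whose input and output prices differ sharply.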