
chore(pricing): Update vertex-ai pricing #550

Open
siddharthsambharia-portkey wants to merge 32 commits into main from pricing-update/vertex-ai

siddharthsambharia-portkey commented Mar 17, 2026

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

| Change Type | Count |
|---|---|
| ➕ Models added | 14 |
| 🔄 Models updated (merged) | 17 |

➕ New Models

  • translate-llm
  • imagetext
  • claude-opus-4-5
  • claude-opus-4-1
  • llama-4-scout-17b-16e-instruct-maas
  • llama3-405b-instruct-maas
  • llama3-70b-instruct-maas
  • llama3-8b-instruct-maas
  • llama3_1-70b-instruct-maas
  • llama3_1-8b-instruct-maas
  • llama3_2-90b-vision-instruct-maas
  • llama3_3-70b-instruct-maas
  • mistral-large-instruct-2411-maas
  • mistral-nemo-instruct-2407-maas

🔄 Updated Models

  • gemini-2.0-flash-001
  • gemini-2.0-flash-lite-001
  • gemini-2.5-pro
  • gemini-2.5-flash
  • gemini-2.5-flash-lite
  • gemini-2.5-flash-preview-09-2025
  • gemini-2.5-flash-lite-preview-09-2025
  • gemini-3-flash-preview
  • gemini-3.1-pro-preview
  • gemini-3.1-flash-lite-preview
  • gemini-3-pro-image-preview
  • gemini-3.1-flash-image-preview
  • veo-3.0-fast-generate-preview
  • veo-3.1-fast-generate-001
  • gemini-embedding-001
  • text-embedding-large-exp-03-07
  • multimodalembedding

Model-to-Pricing-Page Mapping

Google – Gemini (Text/Multimodal)

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| gemini-2.0-flash-001 | Google – Gemini 2.0 | API | Standard table: $0.15/$0.60; cache read $0.0375; batch $0.075/$0.30; search 2.5¢; enterprise search 3¢ |
| gemini-2.0-flash-lite-001 | Google – Gemini 2.0 | API | Standard table: $0.075/$0.30; batch $0.0375/$0.15; search/enterprise search same as Flash |
| gemini-2.5-pro | Google – Gemini 2.5 | API | Standard table: $1.25/$10 (≤200K), $2.50/$15 (>200K); used lower tier; cache read $0.3125; batch $0.625/$5; search 3.5¢; enterprise 4.5¢ |
| gemini-2.5-flash | Google – Gemini 2.5 | API | Standard table: $0.30/$2.50; cache read $0.075; batch $0.15/$1.25; search 3.5¢; enterprise 4.5¢ |
| gemini-2.5-flash-lite | Google – Gemini 2.5 | API | Standard table: $0.10/$0.40; cache read $0.025; batch $0.05/$0.20; search 3.5¢; enterprise 4.5¢ |
| gemini-2.5-computer-use-preview-10-2025 | Google – Gemini 2.5 | API | Maps to Gemini 2.5 Pro Computer Use; $1.25/$10 (no cache/batch listed); search 3.5¢; enterprise 4.5¢ |
| gemini-2.5-flash-preview-09-2025 | Google – Gemini 2.5 | API | Maps to gemini-2.5-flash pricing; $0.30/$2.50; cache read $0.075; batch $0.15/$1.25 |
| gemini-2.5-flash-lite-preview-09-2025 | Google – Gemini 2.5 | API | Maps to gemini-2.5-flash-lite pricing; $0.10/$0.40 |
| gemini-3-pro-preview | Google – Gemini 3 | API | Standard table: $2/$12; batch $1/$6; search 1.4¢; enterprise 1.4¢ |
| gemini-3-flash-preview | Google – Gemini 3 | API | Standard table: $0.50/$3; batch $0.25/$1.50; search 1.4¢; enterprise 1.4¢ |
| gemini-3.1-pro-preview | Google – Gemini 3.1 | API | Standard table: $2/$12; batch $1/$6; search 1.4¢; enterprise 1.4¢ |
| gemini-3.1-flash-lite-preview | Google – Gemini 3.1 | API | Standard table: $0.25/$1.50; batch $0.125/$0.75; search 1.4¢; enterprise 1.4¢ |
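
The notes above quote input/output price pairs, and for gemini-2.5-pro only the lower (≤200K) tier was recorded. As a rough illustration of what that choice means for cost estimates, here is a minimal sketch. It assumes the figures are USD per 1M tokens (the rows do not state the unit); `TokenPrice` and `request_cost` are hypothetical names, not part of the pricing schema in this PR.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical container; prices are assumed to be USD per 1M tokens."""
    input_usd: float
    output_usd: float


# Figures copied from the gemini-2.5-pro row above.
PRO_STANDARD = TokenPrice(input_usd=1.25, output_usd=10.0)      # <=200K tier (the one used)
PRO_LONG_CONTEXT = TokenPrice(input_usd=2.50, output_usd=15.0)  # >200K tier (not recorded)
PRO_BATCH = TokenPrice(input_usd=0.625, output_usd=5.0)


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request under the given price."""
    total = input_tokens * price.input_usd + output_tokens * price.output_usd
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # A 300K-token prompt falls in the >200K tier, so the recorded
    # lower-tier price understates its cost.
    print(request_cost(PRO_STANDARD, 300_000, 10_000))
    print(request_cost(PRO_LONG_CONTEXT, 300_000, 10_000))
```

Under these assumptions, any request above 200K input tokens is underestimated when only the lower tier is stored, which may be worth a follow-up if tiered pricing is ever added to the schema.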

Google – Gemini (Image Output)

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| gemini-2.5-flash-image | Google – Gemini 2.5 Image | API | $0.30 input/$2.50 output; image_token $30/1M; batch $0.15/$1.25/$15; search 3.5¢ |
| gemini-3-pro-image-preview | Google – Gemini 3 Pro Image | API | $2 input/$12 output; image_token $120/1M; batch $1/$6/$60; search 1.4¢ |
| gemini-3.1-flash-image-preview | Google – Gemini 3.1 Flash Image | API | $0.50 input/$3 output; image_token $60/1M; batch $0.25/$1.50/$30; search 1.4¢ |

Google – Imagen

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| imagen-3.0-generate-002 | Google – Imagen 3 | API | $0.04/image |
| imagen-4.0-generate-001 | Google – Imagen 4 | API | $0.04/image |
| imagen-4.0-fast-generate-001 | Google – Imagen 4 Fast | API | $0.02/image |
| imagen-4.0-ultra-generate-001 | Google – Imagen 4 Ultra | API | $0.06/image |
| imagen-3.0-capability-001 | Google – Imagen 3 | API | Capability model; uses imagen-3.0-generate pricing: $0.04/image |
| imagen-3.0-capability-002 | Google – Imagen 3 | API | Capability model; uses imagen-3.0-generate pricing: $0.04/image |
| imagetext | Google – Imagen (Visual Captioning/VQA) | API | Visual Captioning / Visual Q&A row: $0.0015/image |

Google – Veo

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| veo-2.0-generate-001 | Google – Veo 2 | API | $0.50/sec (720p video); used $0.50 as video_seconds |
| veo-3.0-generate-001 | Google – Veo 3 | API | Video only 720p/1080p $0.20/sec; video+audio $0.40/sec; used $0.20 as base video_seconds |
| veo-3.0-fast-generate-001 | Google – Veo 3 Fast | API | Video only 720p/1080p $0.10/sec; video+audio $0.15/sec; used $0.10 |
| veo-3.0-generate-preview | Google – Veo 3 | API | Preview variant; same pricing as veo-3.0-generate-001 |
| veo-3.0-fast-generate-preview | Google – Veo 3 Fast | API | Preview variant; same pricing as veo-3.0-fast-generate-001 |
| veo-3.1-generate-001 | Google – Veo 3.1 | API | Video only 720p/1080p $0.20/sec; video+audio $0.40/sec; used $0.20 |
| veo-3.1-fast-generate-001 | Google – Veo 3.1 Fast | API | Video only 720p/1080p $0.10/sec; video+audio $0.15/sec; used $0.10 |
| veo-3.1-generate-preview | Google – Veo 3.1 | API | Preview variant; same pricing as veo-3.1-generate-001 |
| veo-3.1-fast-generate-preview | Google – Veo 3.1 Fast | API | Preview variant; same pricing as veo-3.1-fast-generate-001 |
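
For the Veo rows, only the video-only rate is stored as `video_seconds`. The sketch below illustrates the gap this leaves for clips generated with audio. The per-second rates come from the table above; the dictionary layout and the `clip_cost` helper are hypothetical and only meant for illustration.

```python
# Per-second USD rates copied from the Veo table above.
# The nested-dict layout is an assumption for this sketch, not the PR's schema.
VEO_RATES = {
    "veo-3.0-generate-001": {"video": 0.20, "video_audio": 0.40},
    "veo-3.0-fast-generate-001": {"video": 0.10, "video_audio": 0.15},
    "veo-3.1-generate-001": {"video": 0.20, "video_audio": 0.40},
    "veo-3.1-fast-generate-001": {"video": 0.10, "video_audio": 0.15},
}


def clip_cost(model_id: str, seconds: float, with_audio: bool = False) -> float:
    """Return the USD cost of a clip of the given length."""
    rates = VEO_RATES[model_id]
    rate = rates["video_audio"] if with_audio else rates["video"]
    return rate * seconds


if __name__ == "__main__":
    # An 8-second clip priced with the recorded base rate vs. the audio rate.
    print(clip_cost("veo-3.0-generate-001", 8))
    print(clip_cost("veo-3.0-generate-001", 8, with_audio=True))
```

With these figures, a Veo 3 clip with audio costs twice what the recorded base rate suggests, so consumers of `video_seconds` should treat it as a lower bound.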

Google – Embeddings

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| gemini-embedding-001 | Google – Gemini Embedding | API | $0.15/1K tokens (online); per_thousand_tokens unit |
| gemini-embedding-2-preview | Google – Gemini Embedding 2 | API | $0.20/1M tokens; per_million_tokens unit |
| text-embedding-005 | Google – Text Embedding | API | $0.025/1M chars; per_million_characters unit |
| text-multilingual-embedding-002 | Google – Text Multilingual Embedding | API | $0.025/1M chars; per_million_characters unit |
| textembedding-gecko | Google – Text Embedding (legacy) | API | $0.025/1M chars; per_million_characters unit |
| text-embedding-large-exp-03-07 | Google – Text Embedding Large (exp) | API | $0.15/1K tokens; per_thousand_tokens unit (shares pricing with gemini-embedding-001) |
| multimodalembedding | Google – Multimodal Embedding | API | $0.0002/1M chars text; image $0.0001/image; video plus $0.0020/sec; standard $0.0010/sec; essential $0.0005/sec |
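
The embedding rows mix three billing units (`per_thousand_tokens`, `per_million_tokens`, `per_million_characters`), so a consumer has to divide by the right unit size before multiplying by usage. A minimal sketch of that normalization follows; the unit names are taken from the table above, while `UNIT_SIZE` and `usage_cost` are hypothetical helpers.

```python
# Number of billable items (tokens or characters) covered by one quoted price.
UNIT_SIZE = {
    "per_thousand_tokens": 1_000,
    "per_million_tokens": 1_000_000,
    "per_million_characters": 1_000_000,
}


def usage_cost(price_usd: float, unit: str, quantity: int) -> float:
    """Return the USD cost of `quantity` tokens or characters.

    `quantity` must be counted in the same thing the unit names
    (tokens for *_tokens units, characters for *_characters units).
    """
    if unit not in UNIT_SIZE:
        raise ValueError(f"unknown pricing unit: {unit}")
    return price_usd * quantity / UNIT_SIZE[unit]


if __name__ == "__main__":
    # gemini-embedding-2-preview: $0.20 per 1M tokens, 500K tokens embedded.
    print(usage_cost(0.20, "per_million_tokens", 500_000))
    # text-embedding-005: $0.025 per 1M characters, 2M characters embedded.
    print(usage_cost(0.025, "per_million_characters", 2_000_000))
```

Raising on an unknown unit, rather than defaulting to one, is a deliberate choice here: a silently wrong unit would be off by a factor of 1,000.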

Google – Other

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| translate-llm | Google – Translation | API – price not found | Translation LLM; character-based pricing not in standard token schema; added with price 0 |

Google – Excluded Models

| Model | Reason |
|---|---|
| gemini-live-2.5-flash-native-audio | Matches `*-live-*` pattern (Gemini Live streaming) |
| lyria-002, lyria-3-pro-preview, lyria-3-clip-preview | Matches `lyria-*` (music generation) |
| `model-optimizer-*` | Dynamic routing meta-endpoint |
| virtual-try-on-001 | Product-specific retail model |
| imagegeneration | Legacy, superseded by Imagen 3+ |
| shieldgemma2 | Safety/guard model |
| chirp-2, chirp-3 | Audio transcription, not generative |
| `gemma*`, `codegemma*`, `paligemma*`, `medgemma*`, `txgemma*`, functiongemma, translategemma, embeddinggemma, t5gemma | Non-generative or self-deploy-only Gemma family |
| image-segmentation-001 | Image segmentation (non-generative CV) |
| `weathernext*`, `weather-next-*` | Weather models, not generative AI |
| `earth-ai-imagery*` | Non-generative vision |
| bart-large-cnn | Self-deploy only |
| `bert-base*` | Non-generative NLP |
| `t5-flan*`, `t5-1.1*` | Fine-tuning-only / no inference endpoint |
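
Several exclusions above are expressed as glob-style patterns. One way such rules could be applied is with Python's standard `fnmatch` module, as sketched below. This is an illustration only: the PR does not say how the Pricing Agent matches patterns, and `EXCLUSION_RULES` and `exclusion_reason` are hypothetical names covering just a subset of the table.

```python
from fnmatch import fnmatchcase
from typing import Optional

# A subset of the patterns and reasons from the table above.
EXCLUSION_RULES = [
    ("*-live-*", "Gemini Live streaming"),
    ("lyria-*", "music generation"),
    ("model-optimizer-*", "dynamic routing meta-endpoint"),
    ("weathernext*", "weather model, not generative AI"),
    ("weather-next-*", "weather model, not generative AI"),
    ("bert-base*", "non-generative NLP"),
]


def exclusion_reason(model_id: str) -> Optional[str]:
    """Return the reason a model is excluded, or None if no rule matches."""
    for pattern, reason in EXCLUSION_RULES:
        # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
        if fnmatchcase(model_id, pattern):
            return reason
    return None


if __name__ == "__main__":
    for model_id in ("gemini-live-2.5-flash-native-audio", "lyria-002", "gemini-2.5-pro"):
        print(model_id, "->", exclusion_reason(model_id))
```

Returning the reason rather than a bare boolean keeps the output usable for an exclusion table like the one above.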

Anthropic – Claude

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| claude-opus-4-6 | Anthropic – Claude Opus 4.6 | API | $5 in/$25 out; cache write 5m $6.25; cache read $0.50; batch $2.50/$12.50 |
| claude-opus-4-5 | Anthropic – Claude Opus 4.5 | API | $5 in/$25 out; cache write 5m $6.25; cache read $0.50; batch $2.50/$12.50 |
| claude-sonnet-4-6 | Anthropic – Claude Sonnet 4.6 | API | $3 in/$15 out; cache write 5m $3.75; cache read $0.30; batch $1.50/$7.50 |
| claude-sonnet-4-5@20250929 | Anthropic – Claude Sonnet 4.5 | API | $3 in/$15 out; cache write 5m $3.75; cache read $0.30; batch $1.50/$7.50 |
| claude-haiku-4-5@20251001 | Anthropic – Claude Haiku 4.5 | API | $1 in/$5 out; cache write 5m $1.25; cache read $0.10; batch $0.50/$2.50 |
| claude-opus-4-1 | Anthropic – Claude Opus 4.1 | API | $15 in/$75 out; cache write 5m $18.75; cache read $1.50; batch $7.50/$37.50 |
| claude-opus-4@20250514 | Anthropic – Claude Opus 4 | API | $15 in/$75 out; cache write 5m $18.75; cache read $1.50; batch $7.50/$37.50 |
| claude-sonnet-4@20250514 | Anthropic – Claude Sonnet 4 | API | $3 in/$15 out; cache write 5m $3.75; cache read $0.30; batch $1.50/$7.50 |
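
The Claude figures above follow a regular pattern: in every row the 5-minute cache write is 1.25× the input price, the cache read is 0.1× the input price, and batch prices are half the standard ones. That regularity is an observation from this table, not a documented rule, but it makes a cheap sanity check for reviewers. A sketch, with hypothetical field names:

```python
import math

# Rows copied from the table above; field names are assumptions for this sketch.
CLAUDE_ROWS = {
    "claude-opus-4-6": dict(inp=5, out=25, cache_write_5m=6.25, cache_read=0.50,
                            batch_in=2.50, batch_out=12.50),
    "claude-sonnet-4-6": dict(inp=3, out=15, cache_write_5m=3.75, cache_read=0.30,
                              batch_in=1.50, batch_out=7.50),
    "claude-haiku-4-5@20251001": dict(inp=1, out=5, cache_write_5m=1.25, cache_read=0.10,
                                      batch_in=0.50, batch_out=2.50),
    "claude-opus-4-1": dict(inp=15, out=75, cache_write_5m=18.75, cache_read=1.50,
                            batch_in=7.50, batch_out=37.50),
}


def inconsistent_fields(row: dict) -> list:
    """Return the names of fields that break the observed multiplier pattern."""
    expected = {
        "cache_write_5m": 1.25 * row["inp"],
        "cache_read": 0.1 * row["inp"],
        "batch_in": 0.5 * row["inp"],
        "batch_out": 0.5 * row["out"],
    }
    return [name for name, want in expected.items()
            if not math.isclose(row[name], want, rel_tol=1e-9)]


if __name__ == "__main__":
    for model_id, row in CLAUDE_ROWS.items():
        print(model_id, inconsistent_fields(row) or "ok")
```

A non-empty result would not prove a row is wrong, since a provider can price any model differently; it only flags the row for a second look against the pricing page.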

DeepSeek

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| deepseek-r1-0528-maas | DeepSeek – R1 0528 | API | $1.35 in/$5.40 out; batch $0.675/$2.70 |
| deepseek-v3.1-maas | DeepSeek – V3.1 | API | $0.60 in/$1.70 out; cache read $0.06; batch $0.30/$0.85 |
| deepseek-v3.2-maas | DeepSeek – V3.2 | API | $0.56 in/$1.68 out; cache read $0.056; batch $0.28/$0.84 |
| deepseek-ocr-maas | EXCLUDED | | OCR model; excluded per global rules (ocr in name) |

DeepSeek – Excluded (self-deploy)

| Model | Reason |
|---|---|
| deepseek-r1-0528 | `has_deploy:true`, no `-maas`; self-deploy excluded |
| deepseek-v3.1 | `has_deploy:true`, no `-maas`; self-deploy excluded |
| deepseek-v3.2 | `has_deploy:true`, no `-maas`; self-deploy excluded |
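
The self-deploy rule used here and in the later publisher sections reads as: exclude a model when `has_deploy` is true and its ID lacks the `-maas` suffix. A minimal sketch of that predicate follows. The function name and the tuple layout of the candidates are hypothetical, and the `has_deploy` value shown for the `-maas` entry is an assumption, since the PR only reports the flag for excluded models.

```python
def is_self_deploy_excluded(model_id: str, has_deploy: bool) -> bool:
    """Return True when a model is self-deploy-only under the rule above."""
    return has_deploy and not model_id.endswith("-maas")


if __name__ == "__main__":
    # (model_id, has_deploy); the flag for the -maas entry is assumed.
    candidates = [
        ("deepseek-v3.1", True),
        ("deepseek-v3.1-maas", False),
        ("deepseek-v3.2", True),
    ]
    kept = [m for m, flag in candidates if not is_self_deploy_excluded(m, flag)]
    print(kept)
```

Note that the suffix check alone is not enough: IDs such as mistral-small-2503 have no `-maas` suffix yet are priced in this PR, so the `has_deploy` flag has to be part of the test.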

MiniMax

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| minimax-m2-maas | MiniMax – M2 | API | $0.30 in/$1.20 out; cache read $0.03 |

MiniMax – Excluded (self-deploy)

| Model | Reason |
|---|---|
| minimax-m2 | `has_deploy:true`, no `-maas`; self-deploy excluded |

Moonshot / Kimi

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| kimi-k2-thinking-maas | Moonshot – Kimi K2 Thinking | API | $0.60 in/$2.50 out; cache read $0.06 |

Moonshot – Excluded / Not found

| Model | Reason |
|---|---|
| kimi-k2-maas | API – price not found; added with price 0 if returned |
| kimi-k1.5-maas | API – price not found; added with price 0 if returned |

Note: Only kimi-k2-thinking-maas was confirmed with pricing from the page.


Qwen

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| qwen3-235b-a22b-instruct-2507-maas | Qwen – Qwen3 235B Instruct | API | $0.22 in/$0.88 out; batch $0.11/$0.44 |
| qwen3-coder-480b-a35b-instruct-maas | Qwen – Qwen3 Coder 480B | API | $0.22 in/$1.80 out; cache read $0.022; batch $0.11/$0.90 |
| qwen3-next-80b-a3b-instruct-maas | Qwen – Qwen3 Next 80B Instruct | API | $0.15 in/$1.20 out |
| qwen3-next-80b-a3b-thinking-maas | Qwen – Qwen3 Next 80B Thinking | API | $0.15 in/$1.20 out |

Qwen – Excluded

| Model | Reason |
|---|---|
| qwen-image | Explicit policy exception; excluded from Vertex AI pricing |
| qwen3-235b-a22b-instruct-2507 | `has_deploy:true`, no `-maas`; self-deploy excluded |
| qwen3-coder-480b-a35b-instruct | `has_deploy:true`, no `-maas`; self-deploy excluded |
| Other non-maas Qwen variants | Self-deploy excluded |

ZAI.org / GLM

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| glm-4.7-maas | ZAI – GLM-4.7 | API | $0.60 in/$2.20 out |
| glm-5-maas | ZAI – GLM-5 | API | $1.00 in/$3.20 out; cache read $0.10 (free until Feb 19 2026 per page note) |

ZAI – Excluded

| Model | Reason |
|---|---|
| glm-image | Explicit policy exception; excluded from Vertex AI pricing |
| glm-4.7 | `has_deploy:true`, no `-maas`; self-deploy excluded |
| glm-5 | `has_deploy:true`, no `-maas`; self-deploy excluded |

OpenAI (on Vertex)

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| gpt-oss-120b-maas | OpenAI – GPT OSS 120B | API | $0.09 in/$0.36 out; batch $0.045/$0.18 |

OpenAI – Excluded

| Model | Reason |
|---|---|
| `clip-vit-*` | Non-generative (vision classification/embedding); excluded |
| `whisper-*` | Audio transcription; not generative inference |
| gpt-oss-120b (non-maas) | `has_deploy:true`, no `-maas`; self-deploy excluded |

Meta / Llama

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| llama-3.1-405b-instruct-maas | Meta – Llama 3.1 405B | API | $5.00 in/$16.00 out |
| llama-3.3-70b-instruct-maas | Meta – Llama 3.3 70B | API | $0.72 in/$0.72 out; batch $0.36/$0.36 |
| llama-4-maverick-17b-128e-instruct-maas | Meta – Llama 4 Maverick | API | $0.35 in/$1.15 out; batch $0.175/$0.575 |
| llama-4-scout-17b-16e-instruct-maas | Meta – Llama 4 Scout | API | $0.25 in/$0.70 out; batch $0.125/$0.35 |
| llama3-405b-instruct-maas | Meta – Llama 3 405B (legacy) | API – price not found | Older model ID; no dedicated pricing row found |
| llama3-70b-instruct-maas | Meta – Llama 3 70B (legacy) | API – price not found | Older model ID; no dedicated pricing row found |
| llama3-8b-instruct-maas | Meta – Llama 3 8B (legacy) | API – price not found | Older model ID; no dedicated pricing row found |
| llama3_1-70b-instruct-maas | Meta – Llama 3.1 70B | API – price not found | No dedicated pricing row found |
| llama3_1-8b-instruct-maas | Meta – Llama 3.1 8B | API – price not found | No dedicated pricing row found |
| llama3_2-90b-vision-instruct-maas | Meta – Llama 3.2 90B Vision | API – price not found | No dedicated pricing row found |
| llama3_3-70b-instruct-maas | Meta – Llama 3.3 70B (alt ID) | API – price not found | Likely alias; no matching row found |

Meta – Excluded

| Model | Reason |
|---|---|
| sam3 | Image segmentation (non-generative CV); excluded |

Mistral AI

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| mistral-small-2503 | Mistral – Mistral Small 3.1 | API | $0.10 in/$0.30 out |
| mistral-medium-3 | Mistral – Mistral Medium 3 | API | $0.40 in/$2.00 out |
| codestral-2 | Mistral – Codestral 2 | API | $0.30 in/$0.90 out |
| mistral-large-instruct-2411-maas | Mistral – Mistral Large | API – price not found | No dedicated row found on Vertex pricing page |
| mistral-nemo-instruct-2407-maas | Mistral – Mistral NeMo | API – price not found | No dedicated row found on Vertex pricing page |

Mistral – Excluded

| Model | Reason |
|---|---|
| `mistral-ocr-*` | OCR model; excluded per global rules |
| Non-maas self-deploy variants | `has_deploy:true`, no `-maas` |

AI21

| Model ID | Publisher / Section | Source | Notes |
|---|---|---|---|
| jamba-large-1.6 | AI21 | EXCLUDED | `has_deploy:true`, no `-maas`; self-deploy excluded |

The API returned one AI21 model; it is self-deploy-only and therefore excluded. No AI21 entries were added.


Data Sources


Generated by Pricing Agent on 2026-03-30
