chore(pricing): Update vertex-ai pricing by siddharthsambharia-portkey · Pull Request #550 · Portkey-AI/models

siddharthsambharia-portkey · 2026-03-17T12:15:04Z

🔄 Pricing Update: vertex-ai

📊 Summary (complete_diff mode)

Change Type	Count
➕ Models added	14
🔄 Models updated (merged)	17

➕ New Models

translate-llm
imagetext
claude-opus-4-5
claude-opus-4-1
llama-4-scout-17b-16e-instruct-maas
llama3-405b-instruct-maas
llama3-70b-instruct-maas
llama3-8b-instruct-maas
llama3_1-70b-instruct-maas
llama3_1-8b-instruct-maas
llama3_2-90b-vision-instruct-maas
llama3_3-70b-instruct-maas
mistral-large-instruct-2411-maas
mistral-nemo-instruct-2407-maas

🔄 Updated Models

gemini-2.0-flash-001
gemini-2.0-flash-lite-001
gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.5-flash-preview-09-2025
gemini-2.5-flash-lite-preview-09-2025
gemini-3-flash-preview
gemini-3.1-pro-preview
gemini-3.1-flash-lite-preview
gemini-3-pro-image-preview
gemini-3.1-flash-image-preview
veo-3.0-fast-generate-preview
veo-3.1-fast-generate-001
gemini-embedding-001
text-embedding-large-exp-03-07
multimodalembedding

Model-to-Pricing-Page Mapping

Google – Gemini (Text/Multimodal)

Model ID	Publisher / Section	Source	Notes
`gemini-2.0-flash-001`	Google – Gemini 2.0	API	Standard table: $0.15/$0.60; cache read $0.0375; batch $0.075/$0.30; search 2.5¢; enterprise search 3¢
`gemini-2.0-flash-lite-001`	Google – Gemini 2.0	API	Standard table: $0.075/$0.30; batch $0.0375/$0.15; search/enterprise search same as Flash
`gemini-2.5-pro`	Google – Gemini 2.5	API	Standard table: $1.25/$10 (≤200K), $2.50/$15 (>200K); used lower tier; cache read $0.3125; batch $0.625/$5; search 3.5¢; enterprise 4.5¢
`gemini-2.5-flash`	Google – Gemini 2.5	API	Standard table: $0.30/$2.50; cache read $0.075; batch $0.15/$1.25; search 3.5¢; enterprise 4.5¢
`gemini-2.5-flash-lite`	Google – Gemini 2.5	API	Standard table: $0.10/$0.40; cache read $0.025; batch $0.05/$0.20; search 3.5¢; enterprise 4.5¢
`gemini-2.5-computer-use-preview-10-2025`	Google – Gemini 2.5	API	Maps to Gemini 2.5 Pro Computer Use; $1.25/$10 (no cache/batch listed); search 3.5¢; enterprise 4.5¢
`gemini-2.5-flash-preview-09-2025`	Google – Gemini 2.5	API	Maps to gemini-2.5-flash pricing; $0.30/$2.50; cache read $0.075; batch $0.15/$1.25
`gemini-2.5-flash-lite-preview-09-2025`	Google – Gemini 2.5	API	Maps to gemini-2.5-flash-lite pricing; $0.10/$0.40
`gemini-3-pro-preview`	Google – Gemini 3	API	Standard table: $2/$12; batch $1/$6; search 1.4¢; enterprise 1.4¢
`gemini-3-flash-preview`	Google – Gemini 3	API	Standard table: $0.50/$3; batch $0.25/$1.50; search 1.4¢; enterprise 1.4¢
`gemini-3.1-pro-preview`	Google – Gemini 3.1	API	Standard table: $2/$12; batch $1/$6; search 1.4¢; enterprise 1.4¢
`gemini-3.1-flash-lite-preview`	Google – Gemini 3.1	API	Standard table: $0.25/$1.50; batch $0.125/$0.75; search 1.4¢; enterprise 1.4¢
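
To make the tier note on `gemini-2.5-pro` concrete, here is a minimal sketch of how a per-request cost differs between the two tiers listed above. The function name `request_cost_usd` and the tuple layout are hypothetical and are not the repository's schema; rates are USD per 1M tokens as given in the table, and the sketch assumes the tier is selected by prompt length.

```python
# Hypothetical helper for illustration; not the schema used in this repository.
# Rates are USD per 1M tokens, copied from the gemini-2.5-pro row above.
GEMINI_2_5_PRO_TIERS = [
    # (max prompt tokens for this tier, input rate, output rate)
    (200_000, 1.25, 10.00),
    (None, 2.50, 15.00),  # None = no upper bound (prompts over 200K tokens)
]


def request_cost_usd(input_tokens, output_tokens, tiers=GEMINI_2_5_PRO_TIERS):
    """Pick the first tier whose limit covers the prompt, then price both directions."""
    for limit, in_rate, out_rate in tiers:
        if limit is None or input_tokens <= limit:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    raise ValueError("no tier matched")
```

Since only the lower tier is recorded, a prompt above 200K tokens priced from the stored values would come out lower than the >200K row implies.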

Google – Gemini (Image Output)

Model ID	Publisher / Section	Source	Notes
`gemini-2.5-flash-image`	Google – Gemini 2.5 Image	API	$0.30 input/$2.50 output; image_token $30/1M; batch $0.15/$1.25/$15; search 3.5¢
`gemini-3-pro-image-preview`	Google – Gemini 3 Pro Image	API	$2 input/$12 output; image_token $120/1M; batch $1/$6/$60; search 1.4¢
`gemini-3.1-flash-image-preview`	Google – Gemini 3.1 Flash Image	API	$0.50 input/$3 output; image_token $60/1M; batch $0.25/$1.50/$30; search 1.4¢

Google – Imagen

Model ID	Publisher / Section	Source	Notes
`imagen-3.0-generate-002`	Google – Imagen 3	API	$0.04/image
`imagen-4.0-generate-001`	Google – Imagen 4	API	$0.04/image
`imagen-4.0-fast-generate-001`	Google – Imagen 4 Fast	API	$0.02/image
`imagen-4.0-ultra-generate-001`	Google – Imagen 4 Ultra	API	$0.06/image
`imagen-3.0-capability-001`	Google – Imagen 3	API	Capability model; uses imagen-3.0-generate pricing: $0.04/image
`imagen-3.0-capability-002`	Google – Imagen 3	API	Capability model; uses imagen-3.0-generate pricing: $0.04/image
`imagetext`	Google – Imagen (Visual Captioning/VQA)	API	Visual Captioning / Visual Q&A row: $0.0015/image

Google – Veo

Model ID	Publisher / Section	Source	Notes
`veo-2.0-generate-001`	Google – Veo 2	API	$0.50/sec (720p video); used $0.50 as video_seconds
`veo-3.0-generate-001`	Google – Veo 3	API	Video only 720p/1080p $0.20/sec; video+audio $0.40/sec; used $0.20 as base video_seconds
`veo-3.0-fast-generate-001`	Google – Veo 3 Fast	API	Video only 720p/1080p $0.10/sec; video+audio $0.15/sec; used $0.10
`veo-3.0-generate-preview`	Google – Veo 3	API	Preview variant; same pricing as veo-3.0-generate-001
`veo-3.0-fast-generate-preview`	Google – Veo 3 Fast	API	Preview variant; same pricing as veo-3.0-fast-generate-001
`veo-3.1-generate-001`	Google – Veo 3.1	API	Video only 720p/1080p $0.20/sec; video+audio $0.40/sec; used $0.20
`veo-3.1-fast-generate-001`	Google – Veo 3.1 Fast	API	Video only 720p/1080p $0.10/sec; video+audio $0.15/sec; used $0.10
`veo-3.1-generate-preview`	Google – Veo 3.1	API	Preview variant; same pricing as veo-3.1-generate-001
`veo-3.1-fast-generate-preview`	Google – Veo 3.1 Fast	API	Preview variant; same pricing as veo-3.1-fast-generate-001
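
The Veo rows store only the video-only rate as `video_seconds`. As a rough sketch of what that choice means for a cost estimate, the snippet below prices a clip at both rates. The dictionary layout and the `veo_cost_usd` name are assumptions made for illustration; the rates are the ones quoted in the table above.

```python
# Hypothetical lookup for illustration; rates are USD per second from the table above.
VEO_RATES_USD_PER_SEC = {
    "veo-3.0-generate-001": {"video": 0.20, "video_audio": 0.40},
    "veo-3.0-fast-generate-001": {"video": 0.10, "video_audio": 0.15},
}


def veo_cost_usd(model_id, seconds, with_audio=False):
    """Price a generated clip by duration, optionally at the video+audio rate."""
    rates = VEO_RATES_USD_PER_SEC[model_id]
    return seconds * rates["video_audio" if with_audio else "video"]
```

For an 8-second `veo-3.0-generate-001` clip this gives about $1.60 at the stored video-only rate and about $3.20 with audio, so audio generations would be underestimated if only the base rate is applied.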

Google – Embeddings

Model ID	Publisher / Section	Source	Notes
`gemini-embedding-001`	Google – Gemini Embedding	API	$0.15/1K tokens (online); per_thousand_tokens unit
`gemini-embedding-2-preview`	Google – Gemini Embedding 2	API	$0.20/1M tokens; per_million_tokens unit
`text-embedding-005`	Google – Text Embedding	API	$0.025/1M chars; per_million_characters unit
`text-multilingual-embedding-002`	Google – Text Multilingual Embedding	API	$0.025/1M chars; per_million_characters unit
`textembedding-gecko`	Google – Text Embedding (legacy)	API	$0.025/1M chars; per_million_characters unit
`text-embedding-large-exp-03-07`	Google – Text Embedding Large (exp)	API	$0.15/1K tokens; per_thousand_tokens unit (shares pricing with gemini-embedding-001)
`multimodalembedding`	Google – Multimodal Embedding	API	$0.0002/1M chars text; image $0.0001/image; video plus $0.0020/sec; standard $0.0010/sec; essential $0.0005/sec
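
The embedding rows mix three units (`per_thousand_tokens`, `per_million_tokens`, `per_million_characters`). A small sketch of normalising them to a per-million basis follows; the `UNIT_SCALE_TO_MILLION` table and the function name are hypothetical, and token and character rates are kept separate rather than converted into each other.

```python
# Hypothetical normalisation helper; unit names are the ones quoted in the table above.
UNIT_SCALE_TO_MILLION = {
    "per_thousand_tokens": (1_000, "tokens"),
    "per_million_tokens": (1, "tokens"),
    "per_million_characters": (1, "characters"),
}


def rate_per_million(price_usd, unit):
    """Return (USD per 1M, basis) so rows with different units can be compared."""
    scale, basis = UNIT_SCALE_TO_MILLION[unit]
    return price_usd * scale, basis
```

Normalising makes unit mismatches easier to spot: $0.15 per 1K tokens corresponds to roughly $150 per 1M tokens, far above the $0.20 per 1M tokens listed for `gemini-embedding-2-preview`, so the unit on the `per_thousand_tokens` rows may be worth re-checking against the pricing page.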

Google – Other

Model ID	Publisher / Section	Source	Notes
`translate-llm`	Google – Translation	API – price not found	Translation LLM; character-based pricing not in standard token schema; added with price 0
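
Because `translate-llm` is added with price 0 only as a placeholder, a consumer of the pricing data cannot tell it apart from a genuinely free model by price alone. One possible guard is sketched below; the `PRICE_NOT_FOUND` set and `known_rate` function are hypothetical and would have to be fed from the "API – price not found" rows in this description.

```python
from typing import Optional

# Hypothetical: model IDs whose Source column reads "API – price not found".
PRICE_NOT_FOUND = {"translate-llm"}


def known_rate(model_id: str, stored_rate: float) -> Optional[float]:
    """Return None instead of 0 when the zero is only a placeholder."""
    if stored_rate == 0 and model_id in PRICE_NOT_FOUND:
        return None
    return stored_rate
```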

Google – Excluded Models

Model	Reason
`gemini-live-2.5-flash-native-audio`	Matches `-live-` pattern — Gemini Live streaming
`lyria-002`, `lyria-3-pro-preview`, `lyria-3-clip-preview`	Matches `lyria-*` — music generation
`model-optimizer-*`	Dynamic routing meta-endpoint
`virtual-try-on-001`	Product-specific retail model
`imagegeneration`	Legacy, superseded by Imagen 3+
`shieldgemma2`	Safety/guard model
`chirp-2`, `chirp-3`	Audio transcription, not generative
`gemma`, `codegemma`, `paligemma`, `medgemma`, `txgemma*`, `functiongemma`, `translategemma`, `embeddinggemma`, `t5gemma`	Non-generative or self-deploy-only Gemma family
`image-segmentation-001`	Image segmentation (non-generative CV)
`weathernext`, `weather-next-`	Weather models, not generative AI
`earth-ai-imagery*`	Non-generative vision
`bart-large-cnn`	Self-deploy only
`bert-base*`	Non-generative NLP
`t5-flan`, `t5-1.1`	Fine-tuning-only / no inference endpoint

Anthropic – Claude

Model ID	Publisher / Section	Source	Notes
`claude-opus-4-6`	Anthropic – Claude Opus 4.6	API	$5 in/$25 out; cache write 5m $6.25; cache read $0.50; batch $2.50/$12.50
`claude-opus-4-5`	Anthropic – Claude Opus 4.5	API	$5 in/$25 out; cache write 5m $6.25; cache read $0.50; batch $2.50/$12.50
`claude-sonnet-4-6`	Anthropic – Claude Sonnet 4.6	API	$3 in/$15 out; cache write 5m $3.75; cache read $0.30; batch $1.50/$7.50
`claude-sonnet-4-5@20250929`	Anthropic – Claude Sonnet 4.5	API	$3 in/$15 out; cache write 5m $3.75; cache read $0.30; batch $1.50/$7.50
`claude-haiku-4-5@20251001`	Anthropic – Claude Haiku 4.5	API	$1 in/$5 out; cache write 5m $1.25; cache read $0.10; batch $0.50/$2.50
`claude-opus-4-1`	Anthropic – Claude Opus 4.1	API	$15 in/$75 out; cache write 5m $18.75; cache read $1.50; batch $7.50/$37.50
`claude-opus-4@20250514`	Anthropic – Claude Opus 4	API	$15 in/$75 out; cache write 5m $18.75; cache read $1.50; batch $7.50/$37.50
`claude-sonnet-4@20250514`	Anthropic – Claude Sonnet 4	API	$3 in/$15 out; cache write 5m $3.75; cache read $0.30; batch $1.50/$7.50
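
Every Claude row above follows the same ratios: the 5-minute cache write is 1.25× the input price, cache read is 0.1× the input price, and batch is half of the standard input and output prices. A minimal sketch of a sanity check built on that observation is shown below. The `ratio_problems` helper and the tuple layout are hypothetical, and only a few rows are copied in for brevity.

```python
# Rows copied from the Claude table above (USD per 1M tokens):
# (input, output, cache write 5m, cache read, batch input, batch output)
CLAUDE_ROWS = {
    "claude-opus-4-6": (5.00, 25.00, 6.25, 0.50, 2.50, 12.50),
    "claude-sonnet-4-6": (3.00, 15.00, 3.75, 0.30, 1.50, 7.50),
    "claude-haiku-4-5@20251001": (1.00, 5.00, 1.25, 0.10, 0.50, 2.50),
    "claude-opus-4-1": (15.00, 75.00, 18.75, 1.50, 7.50, 37.50),
}


def ratio_problems(row, tol=1e-9):
    """Return the names of fields that break the ratios observed in the table."""
    price_in, price_out, cache_write, cache_read, batch_in, batch_out = row
    expected = {
        "cache_write_5m": (cache_write, 1.25 * price_in),
        "cache_read": (cache_read, 0.10 * price_in),
        "batch_in": (batch_in, 0.50 * price_in),
        "batch_out": (batch_out, 0.50 * price_out),
    }
    return [name for name, (got, want) in expected.items() if abs(got - want) > tol]
```

A check like this would flag a mistyped cache or batch price in a future update, under the assumption that the ratios stay the same.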

DeepSeek

Model ID	Publisher / Section	Source	Notes
`deepseek-r1-0528-maas`	DeepSeek – R1 0528	API	$1.35 in/$5.40 out; batch $0.675/$2.70
`deepseek-v3.1-maas`	DeepSeek – V3.1	API	$0.60 in/$1.70 out; cache read $0.06; batch $0.30/$0.85
`deepseek-v3.2-maas`	DeepSeek – V3.2	API	$0.56 in/$1.68 out; cache read $0.056; batch $0.28/$0.84
`deepseek-ocr-maas`	—	EXCLUDED	OCR model — excluded per global rules (ocr in name)

DeepSeek – Excluded (self-deploy)

Model	Reason
`deepseek-r1-0528`	has_deploy:true, no -maas — self-deploy excluded
`deepseek-v3.1`	has_deploy:true, no -maas — self-deploy excluded
`deepseek-v3.2`	has_deploy:true, no -maas — self-deploy excluded
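
The exclusion reasons quoted in these tables (an `ocr` substring, the `-live-` and `lyria-*` patterns, and `has_deploy:true` without a `-maas` suffix) can be sketched as a single filter. This is an illustration only: the function name, the rule order, and the way `has_deploy` is passed in are assumptions, not the Pricing Agent's actual implementation.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Glob patterns taken from the "Excluded Models" tables in this description.
EXCLUDE_GLOBS = ["*-live-*", "lyria-*", "model-optimizer-*"]


def exclusion_reason(model_id: str, has_deploy: bool) -> Optional[str]:
    """Return a reason string if the model should be skipped, else None."""
    if "ocr" in model_id:
        return "ocr in name"
    for pattern in EXCLUDE_GLOBS:
        if fnmatchcase(model_id, pattern):
            return f"matches {pattern}"
    if has_deploy and not model_id.endswith("-maas"):
        return "self-deploy (has_deploy:true, no -maas)"
    return None
```

With these rules `deepseek-v3.1` is skipped while `deepseek-v3.1-maas` is kept, which matches the DeepSeek tables above.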

MiniMax

Model ID	Publisher / Section	Source	Notes
`minimax-m2-maas`	MiniMax – M2	API	$0.30 in/$1.20 out; cache read $0.03

MiniMax – Excluded (self-deploy)

Model	Reason
`minimax-m2`	has_deploy:true, no -maas — self-deploy excluded

Moonshot / Kimi

Model ID	Publisher / Section	Source	Notes
`kimi-k2-thinking-maas`	Moonshot – Kimi K2 Thinking	API	$0.60 in/$2.50 out; cache read $0.06

Moonshot – Excluded / Not found

Model	Reason
`kimi-k2-maas`	API – price not found; added with price 0 if returned
`kimi-k1.5-maas`	API – price not found; added with price 0 if returned

Note: Only kimi-k2-thinking-maas was confirmed with pricing from the page.

Qwen

Model ID	Publisher / Section	Source	Notes
`qwen3-235b-a22b-instruct-2507-maas`	Qwen – Qwen3 235B Instruct	API	$0.22 in/$0.88 out; batch $0.11/$0.44
`qwen3-coder-480b-a35b-instruct-maas`	Qwen – Qwen3 Coder 480B	API	$0.22 in/$1.80 out; cache read $0.022; batch $0.11/$0.90
`qwen3-next-80b-a3b-instruct-maas`	Qwen – Qwen3 Next 80B Instruct	API	$0.15 in/$1.20 out
`qwen3-next-80b-a3b-thinking-maas`	Qwen – Qwen3 Next 80B Thinking	API	$0.15 in/$1.20 out

Qwen – Excluded

Model	Reason
`qwen-image`	Explicit policy exception — excluded from Vertex AI pricing
`qwen3-235b-a22b-instruct-2507`	has_deploy:true, no -maas — self-deploy excluded
`qwen3-coder-480b-a35b-instruct`	has_deploy:true, no -maas — self-deploy excluded
Other non-maas Qwen variants	Self-deploy excluded

ZAI.org / GLM

Model ID	Publisher / Section	Source	Notes
`glm-4.7-maas`	ZAI – GLM-4.7	API	$0.60 in/$2.20 out
`glm-5-maas`	ZAI – GLM-5	API	$1.00 in/$3.20 out; cache read $0.10 (free until Feb 19 2026 per page note)

ZAI – Excluded

Model	Reason
`glm-image`	Explicit policy exception — excluded from Vertex AI pricing
`glm-4.7`	has_deploy:true, no -maas — self-deploy excluded
`glm-5`	has_deploy:true, no -maas — self-deploy excluded

OpenAI (on Vertex)

Model ID	Publisher / Section	Source	Notes
`gpt-oss-120b-maas`	OpenAI – GPT OSS 120B	API	$0.09 in/$0.36 out; batch $0.045/$0.18

OpenAI – Excluded

Model	Reason
`clip-vit-*`	Non-generative (vision classification/embedding) — excluded
`whisper-*`	Audio transcription — not generative inference
`gpt-oss-120b` (non-maas)	has_deploy:true, no -maas — self-deploy excluded

Meta / Llama

Model ID	Publisher / Section	Source	Notes
`llama-3.1-405b-instruct-maas`	Meta – Llama 3.1 405B	API	$5.00 in/$16.00 out
`llama-3.3-70b-instruct-maas`	Meta – Llama 3.3 70B	API	$0.72 in/$0.72 out; batch $0.36/$0.36
`llama-4-maverick-17b-128e-instruct-maas`	Meta – Llama 4 Maverick	API	$0.35 in/$1.15 out; batch $0.175/$0.575
`llama-4-scout-17b-16e-instruct-maas`	Meta – Llama 4 Scout	API	$0.25 in/$0.70 out; batch $0.125/$0.35
`llama3-405b-instruct-maas`	Meta – Llama 3 405B (legacy)	API – price not found	Older model ID; no dedicated pricing row found
`llama3-70b-instruct-maas`	Meta – Llama 3 70B (legacy)	API – price not found	Older model ID; no dedicated pricing row found
`llama3-8b-instruct-maas`	Meta – Llama 3 8B (legacy)	API – price not found	Older model ID; no dedicated pricing row found
`llama3_1-70b-instruct-maas`	Meta – Llama 3.1 70B	API – price not found	No dedicated pricing row found
`llama3_1-8b-instruct-maas`	Meta – Llama 3.1 8B	API – price not found	No dedicated pricing row found
`llama3_2-90b-vision-instruct-maas`	Meta – Llama 3.2 90B Vision	API – price not found	No dedicated pricing row found
`llama3_3-70b-instruct-maas`	Meta – Llama 3.3 70B (alt ID)	API – price not found	Likely alias; no matching row found

Meta – Excluded

Model	Reason
`sam3`	Image segmentation (non-generative CV) — excluded

Mistral AI

Model ID	Publisher / Section	Source	Notes
`mistral-small-2503`	Mistral – Mistral Small 3.1	API	$0.10 in/$0.30 out
`mistral-medium-3`	Mistral – Mistral Medium 3	API	$0.40 in/$2.00 out
`codestral-2`	Mistral – Codestral 2	API	$0.30 in/$0.90 out
`mistral-large-instruct-2411-maas`	Mistral – Mistral Large	API – price not found	No dedicated row found on Vertex pricing page
`mistral-nemo-instruct-2407-maas`	Mistral – Mistral NeMo	API – price not found	No dedicated row found on Vertex pricing page

Mistral – Excluded

Model	Reason
`mistral-ocr-*`	OCR model — excluded per global rules
Non-maas self-deploy variants	has_deploy:true, no -maas

AI21

Model ID	Publisher / Section	Source	Notes
`jamba-large-1.6`	AI21	EXCLUDED	has_deploy:true, no -maas — self-deploy excluded

AI21 returned 1 model from API; it is self-deploy-only and excluded. No AI21 entries added.

Data Sources

Vertex AI Generative AI Pricing Page (Global tab): https://cloud.google.com/vertex-ai/generative-ai/pricing#global
get_vertex_models API: all publishers (google, anthropic, openai, meta, ai21, qwen, mistral-ai, mistralai, deepseek-ai, deepseek, moonshotai, minimaxai, zai-org)
Claude on Vertex AI docs: https://platform.claude.com/docs/en/build-with-claude/claude-on-vertex-ai

Generated by Pricing Agent on 2026-03-30

siddharthsambharia-portkey added 30 commits March 17, 2026 17:45

a1a3f5f chore(pricing): Update vertex-ai pricing
53b3f5d chore(pricing): Update vertex-ai pricing
52dbf8e chore(pricing): Update vertex-ai pricing
f19c6a3 chore(pricing): Update vertex-ai pricing
a6e1035 chore(pricing): Update vertex-ai pricing
91c6f2a chore(pricing): Update vertex-ai pricing
d32f719 chore(pricing): Update vertex-ai pricing
6a7c7e8 chore(pricing): Update vertex-ai pricing
916ddaf chore(pricing): Update vertex-ai pricing
fa02c68 chore(pricing): Update vertex-ai pricing
7320d33 chore(pricing): Update vertex-ai pricing
3604db1 chore(pricing): Update vertex-ai pricing
d31b801 chore(pricing): Update vertex-ai pricing
a267566 chore(pricing): Update vertex-ai pricing
04933eb chore(pricing): Update vertex-ai pricing
2dd50e4 chore(pricing): Update vertex-ai pricing
21a3a64 chore(pricing): Update vertex-ai pricing
244cd8b chore(pricing): Update vertex-ai pricing
623bbde chore(pricing): Update vertex-ai pricing
c7e7113 chore(pricing): Update vertex-ai pricing
5cda0eb chore(pricing): Update vertex-ai pricing
3b130f0 chore(pricing): Update vertex-ai pricing
271a047 chore(pricing): Update vertex-ai pricing
8867d9d chore(pricing): Update vertex-ai pricing
bdf8d15 chore(pricing): Update vertex-ai pricing
23b51be chore(pricing): Update vertex-ai pricing
81c0fd3 chore(pricing): Update vertex-ai pricing
ebf58b5 chore(pricing): Update vertex-ai pricing
6745bf8 chore(pricing): Update vertex-ai pricing
62dc55d chore(pricing): Update vertex-ai pricing

siddharthsambharia-portkey added 2 commits March 29, 2026 23:43

6bdcbde chore(pricing): Update vertex-ai pricing
c92c59c chore(pricing): Update vertex-ai pricing

siddharthsambharia-portkey wants to merge 32 commits into main from pricing-update/vertex-ai
