Google's Gemini 3.5 Flash Is Now Generally Available at $1.50 Per Million Input Tokens
By Hector Herrera | May 24, 2026 | News
Google has made Gemini 3.5 Flash generally available, offering frontier-level reasoning at four times the speed of comparable models, priced at $1.50 per million input tokens and $9 per million output tokens. The release puts direct cost-and-speed pressure on OpenAI's o-series fast inference models and marks Google's most aggressive pricing move yet to win enterprise AI workloads that demand real-time response.
For developers building production applications where latency matters — customer support agents, code assistants, document processing pipelines — Gemini 3.5 Flash is now a credible and significantly cheaper option at scale.
The Numbers That Matter
According to LLM Stats tracking of the model release, Gemini 3.5 Flash's pricing breaks down as:
- Input: $1.50 per million tokens
- Output: $9.00 per million tokens
- Speed: Approximately 4x faster than comparable frontier-class models in its tier
- Context window and benchmark performance: Not yet independently verified at publication
Note: This report originates from LLM Stats, a model-tracking aggregator. Google has not separately issued a standalone press release on the GA launch as of publication. Verify pricing via Google AI Studio before building pricing models into your infrastructure plans.
Gemini 3.5 Flash first appeared at Google I/O in May 2026, where it was announced alongside Gemini 2.5 Pro and a suite of agentic features. The GA launch moves it from preview access to full production availability.
Context: The Fast-Inference Race
The fastest-growing segment of the AI model market in 2026 is not the largest models — it is fast, cheap, high-quality inference models that can run at production scale without burning through API budgets. OpenAI's o4-mini and o3 models have dominated this tier since early 2026. Anthropic's Claude Haiku 4.5 is competitive in cost.
Google has historically led on raw infrastructure scale but lagged on enterprise developer mindshare. Gemini 3.5 Flash's $1.50 input price point is designed to change that calculation. At that price, a company processing 10 billion input tokens per month pays $15,000 for input (10,000 million tokens × $1.50), before any output tokens billed at $9 per million. That is competitive with, and in some scenarios below, alternatives in the same capability tier.
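The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes the reported list prices ($1.50 input, $9.00 output per million tokens) apply flat, with no volume discounts or caching. The function name and the 1 billion output-token figure in the second call are hypothetical and used only for illustration.

```python
# Reported list prices in USD per million tokens (as tracked by LLM Stats;
# verify against Google's own pricing page before relying on them).
INPUT_USD_PER_MILLION = 1.50
OUTPUT_USD_PER_MILLION = 9.00


def monthly_cost_usd(input_tokens: int, output_tokens: int = 0) -> float:
    """Estimate monthly spend from token volumes at flat list prices."""
    input_cost = (input_tokens / 1_000_000) * INPUT_USD_PER_MILLION
    output_cost = (output_tokens / 1_000_000) * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # The article's example: 10 billion input tokens, input cost only.
    print(f"${monthly_cost_usd(10_000_000_000):,.2f}")  # $15,000.00

    # Hypothetical: the same workload also producing 1 billion output tokens.
    print(f"${monthly_cost_usd(10_000_000_000, 1_000_000_000):,.2f}")  # $24,000.00
```

Because output tokens cost six times as much as input tokens at these prices, the input-to-output ratio of a workload matters as much as the headline input price.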
Why This Matters
For enterprises: The combination of frontier-quality reasoning and input pricing under $2 per million tokens lowers the economic barrier that has kept fast inference workloads from scaling. Real-time document review, agentic task orchestration, and high-volume classification jobs all become more viable.
For the model market: Every time Google prices aggressively at GA, it compresses margins for the entire industry. OpenAI and Anthropic face pressure to match or differentiate on quality. This is good for buyers and tightens the economics for every lab competing at this tier.
For Google: Winning enterprise AI workloads through Google Cloud is strategically more valuable than headline benchmark wins. GA availability means Google Cloud enterprise sales teams can now quote Gemini 3.5 Flash to customers with service-level agreements rather than preview caveats — a meaningful shift in the sales motion.
What to Watch
Independent benchmarks from the ML research community on Gemini 3.5 Flash's actual performance against o4-mini and Claude Haiku 4.5 will determine whether the speed and pricing claims hold under real workload conditions. Also watch for Google I/O developer adoption numbers — early API call volumes will signal whether the GA launch is driving new usage or consolidating existing Gemini preview users.
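Teams that do not want to wait for third-party benchmarks can time the models on their own prompts. The sketch below is one possible harness, not an official benchmark: `generate` stands in for whatever client call you use (no real SDK is assumed), and `measure_latency` and `speedup` are hypothetical helper names. It measures end-to-end wall-clock time per call, which includes network latency.

```python
import time
from statistics import median
from typing import Callable, Sequence


def measure_latency(
    generate: Callable[[str], str],
    prompts: Sequence[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time each call to `generate` and summarize latency in seconds."""
    if not prompts:
        raise ValueError("prompts must not be empty")
    samples = []
    for prompt in prompts:
        start = clock()
        generate(prompt)  # placeholder for your own model client call
        samples.append(clock() - start)
    return {
        "calls": len(samples),
        "median_s": median(samples),
        "max_s": max(samples),
    }


def speedup(baseline_median_s: float, candidate_median_s: float) -> float:
    """Ratio of baseline to candidate median latency (above 1 means faster)."""
    return baseline_median_s / candidate_median_s
```

Running `measure_latency` once per model on the same prompt set and passing the two medians to `speedup` gives a rough check of the "4x faster" claim for your own workload. Results will vary with prompt length, output length, region, and time of day, so repeat the measurement before drawing conclusions.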