Opinion

Gemini 2.5 Flash comes to the Gemini app as Google seeks to improve “dynamic thinking”

Gemini 2.5 Flash is ready for you to try in the Gemini app, but it's starting as a preview.

2025-04-17 15:04


Google's Gemini AI may have had a slow start, but it has been anything but slow in 2025. Barely a week goes by without another model arriving in the Gemini app or developer tools like AI Studio, and there's a major release coming to the app today. Google has announced that its faster, more efficient Gemini 2.5 Flash model is rolling out widely in preview. At the same time, developers can begin building with 2.5 Flash using the company's newly announced API pricing, which Google says is much lower than competing products.

A gaggle of Gemini

The model dropdown in the Gemini app is a bit convoluted, particularly as we see products like Veo 2 and Personalization popping up there. Google has been releasing so many preview models and new ways of using Gemini that it can be hard to know which option to choose for a given task. In fairness, Google is far from the only major AI player with this problem. Tulsee Doshi is Google's director of product management for Gemini, which means she leads the team building these models. We asked Doshi what version of Gemini she finds herself using, and unsurprisingly, she likes the more powerful option. "Typically right now, I have been using 2.5 Pro," says Doshi. "I use Gemini throughout the day for my work in a few key areas, like creating documents or slides. That's either for internal consumption or actually sharing externally, and I've found 2.5 Pro to be really helpful for the creative writing element."

The new model is smaller than Gemini 2.5 Pro and about the same size as 2.0 Flash, but it should perform better. Doshi calls it a "strong step up from 2.0 Flash." Gemini 2.5 Flash won't add to the app confusion at least. This model will be listed as 2.5 Flash (Experimental) in the app and on the website, replacing the 2.0 Thinking (Experimental) option. The fact that the 2.0 thinking model never even made it out of the experimental stage is a testament to how quickly Google's Gemini team is moving these days. Unlike the 2.0 thinking model, the new 2.5 Flash will debut with support for Google's Canvas feature for working on text or code. Deep research support for this model will come later, according to a Google spokesperson. Gemini 2.5 Pro is still there and still in the experimental phase, leaving 2.0 Flash as the only non-experimental chatbot. That model doesn't include reasoning capabilities, though.

Thinking on, thinking off

Like all of Google's models in the 2.5 branch and beyond, Gemini 2.5 Flash has simulated reasoning built in, which Google calls "thinking." That means the model checks its facts as it goes, resulting in more accurate outputs. However, that also makes models slower and much more expensive. Since not all queries require that level of ongoing analysis, Google has equipped Flash with some tools that can help developers tune it for their use case.

You may remember that Google began courting developers with Gemini 2.5 Flash earlier this month. While the model still isn't completely finished, Google has opted to make it fully available in Vertex AI and AI Studio with variable API pricing. Gemini 2.5 Flash will allow developers to set a token limit for thinking or simply disable thinking altogether. Google has provided pricing per 1 million tokens: input costs $0.15, and output comes in two flavors. Without thinking, output costs $0.60, but enabling thinking raises that to $3.50. The thinking budget option will allow developers to fine-tune the model to do what they want for an amount of money they're willing to pay. According to Doshi, you can actually see the reasoning improvements in benchmarks as you add more token budget.
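To make the pricing trade-off concrete, here is a minimal sketch of a cost estimator built on the per-million-token prices quoted above. The function name `estimate_cost` and its parameters are hypothetical, not part of any Google SDK. It also assumes that thinking tokens are billed as output tokens at the higher rate whenever thinking is enabled, which the article does not spell out, so treat the numbers as rough estimates rather than a billing reference.

```python
# Prices in USD per 1 million tokens, as quoted in the article.
INPUT_PRICE = 0.15
OUTPUT_PRICE_NO_THINKING = 0.60
OUTPUT_PRICE_THINKING = 3.50
TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int, thinking_tokens: int = 0) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: when thinking is enabled (thinking_tokens > 0), both the
    thinking tokens and the visible output tokens are billed at the
    thinking output rate. Passing thinking_tokens=0 models thinking
    being disabled.
    """
    if min(input_tokens, output_tokens, thinking_tokens) < 0:
        raise ValueError("token counts must be non-negative")

    thinking_enabled = thinking_tokens > 0
    output_rate = OUTPUT_PRICE_THINKING if thinking_enabled else OUTPUT_PRICE_NO_THINKING

    # Assumed billing model: thinking tokens count toward billed output.
    billed_output_tokens = output_tokens + thinking_tokens

    total = input_tokens * INPUT_PRICE + billed_output_tokens * output_rate
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Same prompt and answer length, with thinking off and with an
    # 8,000-token thinking budget fully used.
    print(f"thinking off: ${estimate_cost(10_000, 2_000):.4f}")
    print(f"thinking on:  ${estimate_cost(10_000, 2_000, 8_000):.4f}")
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens works out to roughly $0.0027 with thinking disabled and roughly $0.0365 if an 8,000-token thinking budget is fully consumed, which illustrates why a budget cap matters for cost-sensitive workloads.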

Like 2.5 Pro, this model supports Dynamic Thinking, which can automatically adjust the amount of work that goes into generating an output based on the complexity of the input. The new Flash model goes further by allowing developers to control thinking. According to Doshi, Google is launching the model now to guide improvements in these dynamic features. "Part of the reason we're putting the model out in preview is to get feedback from developers on where the model meets their expectations, where it under-thinks or over-thinks, so that we can continue to iterate on [dynamic thinking]," says Doshi.

Don't expect that kind of precise control for consumer Gemini products right now, though. Doshi notes that the main reason you'd want to toggle thinking or set a budget is to control costs and latency, which matters to developers. However, Google is hoping that what it learns from the preview phase will help it understand what users and developers expect from the model. "Creating a simpler Gemini app experience for consumers while still offering flexibility is the goal," Doshi says.

With the rapid cadence of releases, a final release for Gemini 2.5 doesn't seem that far off. Google still doesn't have any specifics to share on that front, but with the new developer options and availability in the Gemini app, Doshi tells us the team hopes to move the 2.5 family to general availability soon.
