Anthropic's newly released Claude Sonnet 3.7 model is impressive. There, I've said it; you can stop reading here if you don't want the details. There is a lot more to be said, but the bottom line is that, just like with its predecessors, Anthropic has done an excellent job of creating a highly capable state-of-the-art model that is particularly well suited to coding and creative writing.
But if you came here for the gory details, stick around!
What did Anthropic release?
The new version of Claude Sonnet, somehow still unimaginatively named version 3.7. Hey, at least that is better than the previous round, when what were effectively versions 3.5 and 3.6 shipped as 3.5 and 3.5 (new); we have to give the company credit for responding to the widespread confusion around its naming, even if we would perhaps have preferred something a little less…geeky? But we are all geeks at this point, so at least the name communicates that this is an improved model, even if it does not exactly roll off the tongue. Then again, none of the other SOTA model providers are known for imaginative or user-friendly naming (looking at you, OpenAI, with o3-mini-high, o3-mini, o1, GPT-4o, GPT-4 and various other tongue twisters).
All complaints about model naming aside (I did try to warn you that you didn't need to read this far down 🙂), here are the details of today's 3.7 release, which incorporates:
Significantly improved capabilities around coding and creative writing, accompanied by some excellent benchmarks which, in many but certainly not all cases, lead the competition.
Integrated reasoning capabilities of the kind pioneered by OpenAI o1 and also seen in DeepSeek R1 and xAI Grok 3. Unlike those offerings, however, reasoning is built right into the model itself, whereas OpenAI and DeepSeek, for example, require a separate model specifically designed for reasoning. Anthropic has delivered a great integrated version, although users still need to explicitly enable reasoning for a particular query. (OpenAI, in contrast, has discussed unifying ‘regular’ and reasoning models in a seamless interface down the line.) Developers can also control the amount of reasoning effort (in tokens) the model spends, which is an improvement on OpenAI’s coarser low / medium / high effort toggles; see the sketch after this list.
Improved agentic performance and additional safety alignment for the Computer Use capability released with the previous version of the model.
A new Claude Code CLI in preview, aimed at developers and enabling many agentic capabilities around interacting with and editing codebases from the command line.
A new knowledge cut-off date of October 2024.
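To make the reasoning-budget point concrete, here is a minimal sketch of what per-request control looks like through Anthropic's Messages API. The parameter shapes follow the extended-thinking interface Anthropic documented for this release, but the model snapshot string, budget values and prompt below are illustrative assumptions rather than canonical:

```python
# Minimal sketch: enabling extended thinking with an explicit token budget.
# Assumes the anthropic Python SDK and an ANTHROPIC_API_KEY in the environment;
# the snapshot name, budget and prompt are illustrative.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,  # must be larger than the thinking budget
    # Reasoning is off by default; enable it per request and cap how many
    # tokens the model may spend thinking before it produces its answer.
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[
        {"role": "user", "content": "Review this function for performance issues: ..."}
    ],
)

# The response interleaves "thinking" blocks (the reasoning trace)
# with the final "text" blocks containing the answer.
for block in response.content:
    if block.type == "thinking":
        print("[reasoning]", block.thinking[:200])
    elif block.type == "text":
        print(block.text)
```

The token-denominated budget is the interesting design choice here: rather than picking one of three fixed effort levels, developers can dial the cost/quality trade-off continuously on a per-request basis.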
So how good is it in practice?
Very good, as we said at the very beginning. The improved capabilities are particularly evident in two areas:
Coding: I am impressed with just how good the model is at assisting developers in building and iterating on code. It was able to quickly identify two performance issues and one logic bug in an agentic application I have been experimenting with, and wrote lengthy explanations of the issues before automatically fixing them. In another test, it used its already strong Artifacts capability to build a very impressive physics simulation for an imaginary game it was asked to create, seamlessly producing 2,000 lines of working (if slightly imperfect) code and a playable prototype in the browser. None of this is new, but the quality is clearly better than before, and some of the frustrating output-length limitations of previous versions seem to have vanished.
Creative Writing: Maybe there is still time for my Hollywood career, as the screen treatment, story beats and sample dialog I asked it to write for a 1950s New York-set detective story were structured correctly, maintained logical consistency, involved interesting pulp characters that would not be out of place in a movie of the era, and had some great screen direction pointers and mood settings. Anyone who would like to finance the filming of this, please contact me directly through the website 🙂 This is a significant improvement over previous versions, which ran out of output tokens and started to lose coherence towards the end of a long writing session.
I can also definitely see more stability around Computer Use and more sophistication in defending against attacks, and the Code CLI is very promising (although clearly not up to production standard yet). The overall personality of the model also remains interesting and less robotic than that of its competitors.
Unfortunately, limitations still persist:
Claude still does not have access to the Internet to perform web searches, putting it at a significant disadvantage to many of its competitors.
The extended thinking capability and the Code CLI are also exclusive to the paid tier - extended thinking especially is a big miss given that OpenAI, Grok and DeepSeek all provide a limited-use version of this capability to their free customers.
And there are definitely areas where OpenAI o3-mini-high and o1 are simply better at reasoning and thinking; even more so when the Deep Research capability is used, something to which Anthropic has no real answer at the moment.
But overall, caveats aside, this is an excellent model, and clearly surpasses its competitors in several real-world use cases. I am impressed.
What does this mean for Anthropic and for the GenAI industry?
I have been thinking about this lately, given the barrage of models released over the last couple of months. And while this model is undoubtedly great, I doubt it will move the needle significantly for Anthropic, for a few reasons:
Name recognition: Anthropic simply does not have the mindshare among the general public and finds it very hard to break through the noise of its better-known competitors, especially OpenAI. This release does not do much to change that; as good as the improvements are, this is not a generational leap in capabilities. Incremental improvement is just not enough to unseat the incumbents, as even flashier competitors like Grok are beginning to find out. Especially when it comes to consumer use, brand perception and general cultural recognition matter a lot.
Emphasis on technical use cases: Anthropic already makes the majority of its revenue through API use rather than through its consumer chatbot, and the coding use case will continue to be a significant driver given the model's demonstrated capabilities. Anthropic clearly recognizes that this use case is heavily utilized by its customers and has optimized the model accordingly, adding new developer-oriented reasoning controls to the base model. But Claude already had a stellar reputation as arguably the best model for coding, which this release only enhances; so while it will certainly keep the existing developer base happy, I do not see it breaking out of that niche, given product limitations around capabilities that are missing or inaccessible to the general public.
Capacity constraints: The company does not seem to have as much compute available as its rivals to serve the model at scale, with frequent complaints from users about stingy usage limits - sometimes even on the paid tier. Nothing released with 3.7 changes that; in fact, the new model seems to be more computationally intensive given its reasoning capabilities. Unless Anthropic’s new data center investments come online and change the overall perception, Claude will probably remain a sophisticated model for niche audiences, with OpenAI et al fighting for both the consumer and the technical market.
As for the rest of the industry, they cannot afford to rest on their laurels. Anthropic’s offering is an incredibly sophisticated, highly tuned general purpose / reasoning hybrid that is particularly suited to a wide array of technical work - and it can easily steal lucrative business-centric workloads from its better-known competitors. Anthropic is also well positioned to capitalize on its partnership with Amazon and its inclusion in the Bedrock GenAI platform - there is a lot there for businesses in the AWS ecosystem to be happy with, and I see no particular reason for those companies to look to alternatives from OpenAI and others. Also, where developers go, the rest of the industry follows, so while Anthropic may not be the brand juggernaut some of its competitors are, quietly being the substrate of many innovative new products (see Cursor as one of many examples) will stand it in good stead. And its clear product strategy - one Claude variation, not seven like OpenAI - is a compelling advertisement for integration and simplicity. I expect increased adoption, but not a breakout to the general public, at least not just yet.
Finally, this is yet another demonstration of how good iterative enhancements to models have become; we are clearly no longer seeing generational leaps (think back to the GPT-3 to 3.5 to 4 journey for truly groundbreaking advances in the science), but the limits of AI intelligence are still being pushed. Anthropic is at the forefront, and this release will ratchet up the pressure, on OpenAI especially, to keep innovating and deliver something groundbreaking again. Is that feasible? We’ll be here to see how it all plays out (unless Hollywood calls, of course 🙂).