Grok 3 and xAI close the gap
100,000 GPUs later, a new frontier model emerges and poses interesting technological and political questions
A new frontier model emerges
Hewing to the tradition of AI labs releasing major news late in the evening, Monday night Eastern US time saw the announcement and demo of xAI’s latest GenAI model, the imaginatively named Grok 3. Trained on xAI's latest supercluster of NVIDIA GPUs in the company’s quickly constructed data center in Memphis, this release is significant for a number of reasons, both technological and political.
Overall, the release is impressive and demonstrates that we have not hit a wall in the pre-training scaling laws, as many in the industry have feared. xAI’s benchmarks (admittedly not independently audited, but that is true of all its competitors as well) show strong results across the board, either leading or very close to leading in many of the relevant categories. And while we may nitpick about how saturated these benchmarks have become, with most of the current frontier models within an inch of each other, the achievement is still impressive. In addition to the main model, two variations have also been announced: a mini model for faster, cheaper inference, and a reasoning model using techniques similar to OpenAI’s o3 and DeepSeek’s R1 to enable additional ‘thinking’ time. And an (admittedly not quite as good) competitor to OpenAI’s Deep Research functionality - courageously called Deep Search - is also part of the wider product offering.
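To ground the scaling-law claim above: the ‘scaling laws’ refer to the empirical power-law relationship between pre-training compute and model loss established by Kaplan et al. (2020) and refined by Hoffmann et al. (2022). A minimal sketch of the canonical form - the symbols are generic fitted constants, not anything xAI has published:

```latex
% Canonical power-law form of the pre-training scaling laws.
% L: cross-entropy loss, C: training compute (FLOPs),
% E: irreducible loss; A and \alpha are empirically fitted constants.
L(C) \approx E + \frac{A}{C^{\alpha}}
```

‘Hitting a wall’ would mean this relationship flattening out at large C; Grok 3’s results suggest that, for now, it has not.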
The usual caveats we have come to expect with all these frontier model releases still apply, of course: the release is extremely limited at the moment, with first access reserved for subscribers to Twitter’s premium product; functionality like voice mode is not yet ready; widespread access will come later through a rumored $30/month subscription, breaking through the current $20 price point established by OpenAI and Anthropic for their frontier models; and the product roadmap is almost as fragmented as OpenAI’s was before its recent correction.
But, caveats aside, there are some real lessons to draw from this release.
On the technology side, this is undeniably impressive. xAI started from scratch and has already caught up with the leading models. That alone is an achievement worth celebrating. But there are two other important points to consider:
Hardware still matters…a lot
First, xAI has proven that compute budget matters: building and scaling the training of a frontier model from scratch on massive clusters of GPUs still works. We have yet to see how this cluster performs during inference under millions of concurrent requests, but the main point remains that compute during training definitely makes a difference.
Secondly, those in the industry claiming that DeepSeek’s emergence threatens NVIDIA have been proven wrong; yes, DeepSeek is a great technological achievement, but xAI was able to surpass it on every metric through ample use of computing power. This should be music to NVIDIA’s ears and should keep orders for next-generation chips robust for the foreseeable future.
Thirdly, xAI has also proven that compute power matters for scaling: for now at least, model capabilities correlate almost directly with the number of GPUs used to train them. Certainly, as DeepSeek has shown, clever software optimizations can help a lot, but in the end compute power remains king.
Finally, and somewhat antithetically to the previous points, hardware can only take you so far. While Grok 3 is impressive, it is also clearly not a generational leap over OpenAI or Anthropic, and at best can be argued to be neck and neck with them. This raises interesting questions about when a clear generational gap will emerge - or whether it ever will. While hardware remains king for now, if 100,000 of the latest GPU chips won’t suffice, what will? The valuations of all GenAI companies clearly depend on this answer, and xAI, despite valiant efforts, has only generated more questions around it.
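To make the ‘100,000 GPUs’ figure concrete, here is a rough back-of-envelope sketch. Every input is an assumption - xAI has not published training details - and the C ≈ 6ND rule of thumb is the standard dense-transformer approximation, not anything specific to Grok 3:

```python
# Back-of-envelope estimate of the training compute available to a
# 100,000-GPU cluster. All inputs are assumptions, not published figures.

GPUS = 100_000          # headline cluster size
FLOPS_PER_GPU = 1.0e15  # ~1 PFLOP/s dense BF16 per H100-class GPU (assumed)
UTILIZATION = 0.40      # model FLOPs utilization; 30-50% is typical (assumed)
DAYS = 90               # hypothetical length of a single training run

seconds = DAYS * 24 * 3600
total_flops = GPUS * FLOPS_PER_GPU * UTILIZATION * seconds
print(f"Total training compute: {total_flops:.2e} FLOPs")

# Standard dense-transformer rule of thumb: C ~ 6 * N * D, trading off
# parameter count N against training tokens D for a fixed compute budget C.
N_PARAMS = 1e12  # hypothetical 1-trillion-parameter model
tokens = total_flops / (6 * N_PARAMS)
print(f"Tokens trainable at {N_PARAMS:.0e} params: {tokens:.2e}")
```

On these assumptions a single run lands on the order of 10^26 FLOPs - which is precisely where the question above bites: if that is not enough for a generational leap, each additional order of magnitude becomes far harder to buy.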
The elephant in the room - politics
Undeniably impressive as xAI’s achievements are with Grok 3, no one who considers the GenAI landscape can escape the - to put it mildly - colorful, polarizing personality of the company’s CEO.
Will this help, hinder, or be a non-factor in the adoption of Grok 3? Without turning this into a political debate, we can surely expect resistance from some parts of the public to using a product from a company headed by such a controversial figure. Given the lack of a clear leader in the GenAI model race, it is not outlandish to predict that some (many?) will continue using models from other companies. And while the CEOs of, for example, OpenAI and Meta are certainly not without their own controversies, none of them is also in charge of a recently created government entity that is drawing both passionate support and fierce resistance to how it operates. Will this limit xAI's addressable market? The early signs of some consumers’ reluctance to further engage with Tesla, another consumer-facing company headed by xAI’s CEO, should be instructive.
What comes next
We are clearly entering a new phase in the race to shape the next generation of GenAI - with many developments still to come. For now - and leaving politics aside for a moment - the Grok 3 release is exciting from a technological perspective. If xAI continues to execute well, streamline its product strategy, and release capabilities widely, it could become a prime competitor to OpenAI and the other major GenAI companies.