Article | WTW Research Network Newsletter

Reshaping the GenAI Landscape: Part 2 – LLM Effectiveness at Scale

By Anas Alfarra, Swetha Garimalla, Carlos Loarte, Crystal McKinney, Sonal Madhok and Omar Samhan | June 20, 2025

DeepSeek offers top-tier LLM performance at low cost, combining advanced reasoning capabilities and challenging big AI’s expensive models with efficient, open-source solutions.

DeepSeek: A Fading Trend or Trendsetter?

On January 20, 2025, DeepSeek, an AI lab backed by the Chinese hedge fund and quantitative firm High-Flyer, released its reasoning model DeepSeek-R1 as open source under an MIT license. In the technical report for the chatbot’s predecessor, model V3, DeepSeek says V3 was trained on only 2,048 Nvidia H800 GPUs over a two-month period, with pre-training, context extension, and post-training costs totaling less than $6 million.[1] By comparison, OpenAI’s ChatGPT utilized over 20,000 Nvidia A100 chips, with each chip costing between $10,000 and $15,000[2], while Meta’s Llama 3 required 24,000 of Nvidia’s H100 chips at a total cost of $72 million.[3] This efficiency, at a fraction of the usual training cost, has reopened a question that has plagued the industry since the widescale rollout of chatbots: how to maintain an innovation edge in advancing artificial general intelligence while minimizing the large carbon footprint of large language models (LLMs).

To help WTW better understand the forces influencing this evolving AI market, the WTW Research Network (WRN) partnered with the University of Pennsylvania’s Wharton School and its Mack Institute’s Collaborative Innovation Program (CIP). Building on our previous work with the CIP and their Executive MBA students, Green Algorithms – AI and Sustainability, the WRN has sought to further examine the LLM competitive landscape including new disruptions and opportunities for optimization and efficiency. Part 1 looks at GenAI’s impact on risk management frameworks while this piece examines DeepSeek’s initial impact on the LLM market. Part 3 rounds out the series, providing a look at The Future of Hardware Computing and examining the implications for the market going forward as the industry moves from the training to the inference phase.

As the AI revolution has picked up pace, Nvidia’s rise has been concomitant with the rollout of chatbots by hyperscaler companies such as Alphabet, Amazon, Meta, and Microsoft. OpenAI’s ChatGPT largely set the trend of widespread use of LLMs. An unavoidable reality of the industry, however, was the necessity of deploying tens of thousands of chips to power these LLMs, whose compute-intensive functions require elaborate data center cooling systems. DeepSeek-V3’s training costs and GPU usage have forced a rethinking of that business model.

LLM Benchmarks: An Intro

Presented as a Mixture-of-Experts (MoE) language model with 671 billion parameters, DeepSeek claims its R1 model performs better than OpenAI’s o1 on key benchmarks such as AIME, MATH-500, and SWE-bench Verified[4], all key metrics designed to evaluate language models’ capabilities in math, physics, reasoning, science, and realistic software engineering scenarios.

To support these claims, DeepSeek asserted that its benchmark results exceeded those of industry competitors. Such evaluations and benchmarks are now a staple of LLM development, used to determine the accuracy and relative standing of chatbots. Evaluations are often broken down into model evaluations versus system evaluations: model evaluations focus on the internal capabilities of the LLM, while system evaluations focus on the LLM’s performance within a larger application.[5]

One of the most widely used and comprehensive collections of benchmarks for LLMs is Hugging Face’s Big Benchmarks Collection, specifically its Open LLM Leaderboard.[6] Common evaluation metrics include precision, recall, exact match, and perplexity. An LLM benchmark consists of sample data, a testing regimen for that data, an evaluation metric, and scores based on the model’s outputs. AI researchers must constantly curate the data to avoid hallucinations while preserving the fidelity of the outputs; this is done through extensive training, re-training, and measurement of precision and recall. Running models trained on billions or trillions of parameters through the same benchmarks allows their designers to compare trustworthiness, accuracy, and recall with other LLMs, providing a standard method of comparison across chatbots. The sketch below illustrates two of these metrics.
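As a rough illustration, the following Python sketch computes two of the metrics named above, exact match and perplexity, on invented toy data; the question-answer pairs and token probabilities are hypothetical, not drawn from any real benchmark:

```python
import math

# Toy evaluation sketch: exact match over question-answer pairs, and
# perplexity from per-token probabilities. All data here is invented.

predictions = ["Paris", "4", "H2O"]   # hypothetical model outputs
references  = ["Paris", "4", "CO2"]   # hypothetical gold answers

# Exact match: fraction of predictions identical to the reference answer.
exact_match = sum(p == r for p, r in zip(predictions, references)) / len(references)
print(f"Exact match: {exact_match:.2f}")   # 2 of 3 correct -> 0.67

# Perplexity: exponential of the average negative log-likelihood the model
# assigned to each correct next token. Lower is better.
token_probs = [0.4, 0.25, 0.9, 0.6]        # hypothetical per-token probabilities
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
print(f"Perplexity: {math.exp(nll):.2f}")
```

Real leaderboards apply the same logic at scale, across thousands of curated examples per task.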

AI Distillation, Reinforcement Learning, and Mixture of Experts: What are they?

DeepSeek’s impressive reasoning abilities at a fraction of the cost threw into doubt the prevailing notion that tech companies must commit massive capital expenditures to power language models for training and inference in their quest to advance artificial general intelligence.

In its technical report, DeepSeek attributed the model’s impressive performance to a number of factors, key amongst them reinforcement learning and a Mixture-of-Experts architecture. While some, such as Elon Musk and Dario Amodei of Anthropic, have cast doubt on aspects of DeepSeek’s cost structure and abilities, many believe that DeepSeek was able to achieve its remarkable benchmarks through “knowledge distillation.” This technique transfers the learnings of a larger pre-trained model (the teacher model) to a smaller model (the student model), enabling the student model to achieve efficient results at a smaller scale.[7]

An analogy, as Ali Ghodsi, CEO of data management company Databricks, puts it, is conducting a two-hour interview with Albert Einstein and coming away with his level of knowledge in physics.[8] While distillation has been used for years, DeepSeek’s advances have opened the door for start-ups, developers, and companies without large budgets to utilize distillation and access the capabilities of existing large language and foundation models such as OpenAI’s ChatGPT, Meta’s Llama, Alibaba’s Qwen, and others. This low cost of adaptation and refinement stands in stark contrast to hyperscalers such as Microsoft, which used GPT-4 to distill its small language models, Phi, after investing $14 billion in OpenAI.[9] A minimal sketch of the technique follows.
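For intuition only, here is a hypothetical distillation loop in the classic teacher-student style; the tiny linear “teacher” and “student”, the random data, and the hyperparameters are stand-ins for illustration, not DeepSeek’s actual recipe:

```python
import torch
import torch.nn.functional as F

# Minimal knowledge-distillation sketch (hypothetical models and data):
# the student is trained to match the teacher's softened output
# distribution rather than hard labels.

teacher = torch.nn.Linear(16, 8)   # stand-in for a large pre-trained model
student = torch.nn.Linear(16, 8)   # smaller model being distilled
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's distribution

for step in range(100):
    x = torch.randn(32, 16)                  # dummy input batch
    with torch.no_grad():
        teacher_logits = teacher(x)          # teacher's predictions (frozen)
    student_logits = student(x)

    # KL divergence between softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key design choice is the temperature-softened teacher distribution, which carries richer signal about relative likelihoods than hard labels alone.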

DeepSeek’s use of reinforcement learning (RL) highlights an emerging avenue for building LLMs with advanced reasoning capabilities. DeepSeek improved its reasoning through chain-of-thought (CoT) prompting, which encourages the model to break problems down into smaller, step-by-step reasoning, particularly for subjects known to have definitive solutions. RL is a type of machine learning that learns by trial and error – in other words, reinforcement learning replicates the learning process that humans undertake to achieve their goals. Unlike supervised learning (prediction tasks) and unsupervised learning (uncovering patterns in unlabeled data), RL doesn’t explicitly tell a model what it should output. Instead, the model starts out behaving randomly and discovers desired behaviors by earning rewards for its actions, as the toy example below illustrates.
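The following toy example, a multi-armed bandit with invented reward probabilities, is far simpler than the RL used to train an LLM, but it shows the same core loop: random behavior gradually converging on whatever earns the most reward.

```python
import random

# Toy reinforcement-learning sketch (invented setup, not DeepSeek's
# pipeline): the agent starts by acting randomly and gradually learns
# to prefer the action with the highest average reward.

true_rewards = {"A": 0.2, "B": 0.8, "C": 0.5}   # hidden reward probabilities
estimates = {a: 0.0 for a in true_rewards}      # agent's reward estimates
counts = {a: 0 for a in true_rewards}
epsilon = 0.1                                   # exploration rate

for step in range(5000):
    if random.random() < epsilon:               # explore: act randomly
        action = random.choice(list(true_rewards))
    else:                                       # exploit: best estimate so far
        action = max(estimates, key=estimates.get)
    reward = 1.0 if random.random() < true_rewards[action] else 0.0
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # the estimate for "B" should approach 0.8
```

In LLM training the “actions” are generated responses and the reward might be, for example, a correctness check on a math answer, but the principle is the same.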

Another method DeepSeek employed is the Mixture-of-Experts (MoE) approach, whereby an AI model is divided into separate sub-networks (or “experts”), each specializing in a subset of the input data, that jointly perform a task. MoE architectures enable large-scale models, even those comprising many billions of parameters, to greatly reduce computation costs during pre-training and achieve faster performance at inference time.[10] This is attained via a gating network that delegates each input to the most relevant experts, allowing a more efficient use of the neural network. Because only a fraction of the experts are activated for any given input, an MoE model increases capacity relative to a comparable dense base model while using only its subnetwork of experts per input.[11] The architecture combines higher computational efficiency, reduced energy consumption, improved performance, more seamless scalability, and lower training and operating costs: it lessens compute requirements while simultaneously retaining model quality. A simplified sketch of such a layer appears below.
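As a simplified sketch of the routing idea (the layer sizes, top-2 routing, and gating here are illustrative, nowhere near the scale or sophistication of DeepSeek’s production architecture), consider:

```python
import torch
import torch.nn.functional as F

# Minimal Mixture-of-Experts layer sketch: a gating network routes each
# token to its top-2 experts, so only a fraction of the parameters run
# per input. Sizes and routing are illustrative only.

class TinyMoE(torch.nn.Module):
    def __init__(self, dim=32, n_experts=8, top_k=2):
        super().__init__()
        self.experts = torch.nn.ModuleList(
            torch.nn.Linear(dim, dim) for _ in range(n_experts)
        )
        self.gate = torch.nn.Linear(dim, n_experts)  # router
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, dim)
        scores = self.gate(x)                   # router score per expert
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):  # run each expert only on its tokens
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * self.experts[e](x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(4, 32)
print(moe(tokens).shape)  # torch.Size([4, 32])
```

The efficiency gain comes from the sparsity: with top-2 routing over eight experts, each token touches only a quarter of the expert parameters, while total model capacity grows with the number of experts.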

Conclusion

The release of DeepSeek-V3 and its successor R1 caused immediate ripples in the tech world due to the models’ open-source licensing, cost-effectiveness, and lighter computational requirements. DeepSeek accomplished this by shifting the prevailing paradigm from sheer hardware deployment to software-driven resource optimization. Due to the US government’s export control regime covering high-end AI chips, Nvidia was unable to export its most sophisticated chips, the A100 and H100, to Chinese customers. To comply with these restrictions, Nvidia developed pared-down versions of each chip, the A800 and H800, respectively, with reduced computing capabilities, which allowed the company to continue selling to Chinese cloud computing firms such as Alibaba and Tencent.[12] DeepSeek’s use of only 2,048 H800 chips was instrumental in the company employing its MoE and Multi-head Latent Attention (MLA) architectures for effective training and efficient inference.

While many of DeepSeek’s claims about server infrastructure and running costs have yet to be independently verified (due to trade and IP secrecy), its computational advances and the demonstrated quality of its outputs have reinforced the expectation that efficiency improvements will continue apace. Since most businesses do not need massive models to run their products, models such as V3 and R1 may prove more suitable: they are less expensive to create and require less data center infrastructure, providing a cost-effective alternative to larger, more expensive models.

References

  1. DeepSeek-V3 Technical Report.
  2. ChatGPT Will Command More Than 30,000 Nvidia GPUs: Report.
  3. Ten Gifts Nvidia Gave Its Investors.
  4. DeepSeek claims its ‘reasoning’ model beats OpenAI’s o1 on certain benchmarks.
  5. Understanding LLM Evaluation and Benchmarks: A Complete Guide.
  6. The Big Benchmarks Collection.
  7. What is knowledge distillation?
  8. The Wall Street Journal.
  9. AI companies race to use ‘distillation’ to produce cheaper models.
  10. What is mixture of experts?
  11. Applying Mixture of Experts in LLM Architectures.
  12. Nvidia tweaks flagship H100 chip for export to China as H800.

Authors


Anas Alfarra
MBA Student, The Wharton School, University of Pennsylvania, USA

Swetha Garimalla
MBA Student, The Wharton School, University of Pennsylvania, USA

Carlos Loarte
MBA Student, The Wharton School, University of Pennsylvania, USA

Crystal McKinney
MBA Student, The Wharton School, University of Pennsylvania, USA

Sonal Madhok
Technology Risks Analyst

Omar Samhan
Technology and People Risks Analyst