AI Research · 1 February 2025

A technical note on DeepSeek

DeepSeek has introduced a generative AI model that reportedly matches or exceeds GPT-4 performance at a fraction of the cost, through innovations in mixture-of-experts architecture and lower-precision computation: smart engineering rather than brute-force compute.

Chinese AI startup DeepSeek has introduced a new generative AI model claiming performance that rivals or surpasses leading US models like GPT-4, but at a fraction of the cost. This challenges the common belief that top-tier AI requires massive computing resources and investment.

Low Cost, High Performance

DeepSeek claims it trained its V3 model for just US$6 million using 2,000 Nvidia H800 GPUs. By comparison, training GPT-4 is estimated to have cost US$80–100 million, and Meta's LLaMA 3 used 16,000 H100 GPUs. Despite the lower cost, benchmarks suggest V3 performs on par with or better than GPT-4 in reasoning tasks.

However, the cited US$6 million likely covers only the compute cost of training, not the full expenditure, which would also include items such as research, staff and hardware. The secret to V3's efficiency lies in model design and training data.
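The headline figure can be sanity-checked with simple arithmetic. The sketch below is only illustrative. Its inputs are assumptions taken from DeepSeek's V3 technical report rather than from this note: roughly 2.788 million H800 GPU-hours, an assumed rental price of US$2 per GPU-hour, and a cluster of 2,048 GPUs. The function names are hypothetical.

```python
def training_compute_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Compute-only cost of a training run, in US dollars."""
    return gpu_hours * usd_per_gpu_hour


def wall_clock_days(gpu_hours: float, num_gpus: int) -> float:
    """Days of wall-clock time if all GPUs run in parallel, fully utilised."""
    return gpu_hours / num_gpus / 24


# Assumed inputs (from DeepSeek's V3 technical report, not from this note).
GPU_HOURS = 2_788_000      # total H800 GPU-hours for the training run
USD_PER_GPU_HOUR = 2.0     # assumed rental price per GPU-hour
NUM_GPUS = 2_048           # assumed cluster size

cost = training_compute_cost(GPU_HOURS, USD_PER_GPU_HOUR)
days = wall_clock_days(GPU_HOURS, NUM_GPUS)
print(f"Compute cost: US${cost / 1e6:.2f} million")
print(f"Wall-clock time: about {days:.0f} days")
```

Under these assumptions the compute bill comes to a little under US$6 million, which is consistent with the claim. The calculation prices GPU time only, which is why it says nothing about the full expenditure.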

Training Data & Reinforcement Learning

DeepSeek has released two models: V3 and the more advanced, reasoning-focused R1. R1 competes with OpenAI's o1 and reportedly outperforms it in reasoning tasks. Rather than relying mainly on traditional supervised fine-tuning, DeepSeek uses reinforcement learning (RL), which reduces dependence on labelled data while enhancing reasoning ability.
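To illustrate why RL can need less labelled data, here is a minimal sketch of a rule-based reward combined with group-relative scoring, in the spirit of the GRPO method DeepSeek describes for R1. The reward checks only the final answer against a known result, so no human-written reasoning is required. The function names, the `Answer:` convention and the sample completions are all hypothetical, and the policy update itself is omitted.

```python
import statistics


def rule_based_reward(completion: str, expected_answer: str) -> float:
    """Return 1.0 if the text after the last 'Answer:' marker matches, else 0.0."""
    marker = "Answer:"
    if marker not in completion:
        return 0.0
    final = completion.rsplit(marker, 1)[1].strip()
    return 1.0 if final == expected_answer else 0.0


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Score each sample relative to its group: (reward - mean) / std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        # All samples scored the same, so none is preferred over another.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Hypothetical completions sampled for one prompt ("What is 17 * 3?").
completions = [
    "17 * 3 = 17 + 17 + 17 = 51. Answer: 51",
    "17 * 3 is about 50. Answer: 50",
    "10 * 3 = 30 and 7 * 3 = 21, so 51. Answer: 51",
    "I am not sure.",
]
rewards = [rule_based_reward(c, "51") for c in completions]
advantages = group_relative_advantages(rewards)
for reward, adv in zip(rewards, advantages):
    print(f"reward={reward:.0f} advantage={adv:+.2f}")
```

Samples that beat the group average receive a positive advantage and would be reinforced. A full trainer would feed these advantages into a policy-gradient update.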

During inference, R1 spends roughly 1–2 minutes writing out its reasoning before it answers, which gives users a clear, logical view of how it reached its output. These deliberation outputs were recorded and used to fine-tune V3, significantly boosting its capabilities. Notably, there are allegations that DeepSeek used model distillation, that is, training R1 by querying OpenAI's o1 at scale and learning from the responses.
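The idea of recording reasoning outputs and reusing them as fine-tuning data can be sketched as a small data-preparation step. This is an illustration only, not DeepSeek's pipeline: the `Trace` record, the correctness filter and the `<think>` wrapper format are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """One recorded model response (hypothetical record layout)."""
    prompt: str
    reasoning: str
    answer: str
    reference: str  # known-correct answer, used only for filtering


def to_sft_examples(traces: list[Trace]) -> list[dict[str, str]]:
    """Keep traces whose answer matches the reference and format them for fine-tuning."""
    examples = []
    for t in traces:
        if t.answer.strip() != t.reference.strip():
            continue  # drop traces that reached a wrong answer
        completion = f"<think>{t.reasoning}</think>\n{t.answer}"
        examples.append({"prompt": t.prompt, "completion": completion})
    return examples


# Hypothetical recorded traces: the second one is wrong and gets filtered out.
traces = [
    Trace("What is 12 + 30?", "12 + 30 = 42.", "42", "42"),
    Trace("What is 9 * 9?", "9 * 9 = 80 + 1 = 82.", "82", "81"),
]
sft_examples = to_sft_examples(traces)
print(len(sft_examples), "example(s) kept for fine-tuning")
```

The same shape of pipeline applies to distillation in general: a stronger model produces the traces, and a second model is fine-tuned on the prompt and completion pairs.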

Key Innovations Driving Efficiency

DeepSeek's cost savings and performance stem from several technical innovations, including its mixture-of-experts architecture and lower-precision computation.
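Mixture-of-experts routing, one of the techniques this note credits for DeepSeek's efficiency, can be illustrated with a short sketch. It shows a generic softmax top-k router, not DeepSeek's exact design: only the `k` highest-scoring experts run for a given input, so most of the model's parameters stay idle on each token. The toy experts, the router scores and all function names are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(x: float, experts, router_scores: list[float], k: int) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    gates = route_top_k(router_scores, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Hypothetical toy experts: each is just a scalar function.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
router_scores = [0.1, 2.0, 2.0, -1.0]  # in a real model, produced by a learned router

gates = route_top_k(router_scores, k=2)
output = moe_forward(3.0, experts, router_scores, k=2)
print("selected experts and weights:", gates)
print("output:", output)
```

With `k=2` and four experts, only half of the experts are evaluated for this input. In a large model the ratio is far smaller, which is where the compute savings come from.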

Overall, DeepSeek showcases how smart engineering and training approaches can deliver top AI performance with significantly reduced resources.