Video Summary

My Honest Thoughts about Deepseek

Matthew Berman

Main takeaways

Deepseek V4 is a powerful open-source LLM with a 1,000,000-token context and a mixture-of-experts design (1.66T total, 49B active).

A cheaper 'flash' variant (284B total, 13B active) targets high-volume, low-cost use cases at pennies per million tokens.

Benchmarks show that V4 slightly trails top closed models (Opus 4.7, GPT‑5.5) but is close enough for most real-world applications.

China’s algorithmic advances plus efficient training have partly offset hardware export controls, narrowing the U.S. edge.

Reported distillation attacks and benchmark comparisons raise IP and security concerns, though DeepSeek's public white paper adds transparency about its methods and the limits of its data use.

Key moments

Questions answered

What is special about Deepseek V4?

Deepseek V4 is an open-source frontier LLM with a 1,000,000-token context window and a mixture-of-experts architecture (1.66T total/49B active) plus a cheaper 'flash' variant for high-volume use.

How does V4 perform compared to closed-source leaders?

Benchmarks show that V4 slightly lags top closed models like Opus 4.7 and GPT‑5.5, but its performance is close enough for most practical use cases while being far cheaper.

Why are export controls not fully preventing competition?

Export controls limit chip access, but algorithmic efficiency, MoE techniques, and clever training methods let labs like Deepseek produce competitive models despite constrained hardware.

What are distillation attacks and why do they matter here?

Distillation attacks query a model at scale to collect question-answer pairs, which are then used to train a new model. U.S. labs allege that foreign entities are running industrial-scale campaigns that can erode proprietary advantages, though DeepSeek's reported activity appears smaller than that of some peers.

What should the U.S. do in response?

Suggested steps include boosting open-source offerings from U.S. labs, improving model efficiency, and driving down costs so enterprises prefer domestic options without sacrificing accessibility.

Deepseek V4: A Game Changer in AI 00:10

"Deepseek just dropped their latest flagship model V4. It's massive, powerful, open-source, and a fraction of the cost."

Deepseek has introduced its latest model, V4, which is described as powerful and open-source and offers capabilities similar to leading AI models at a much lower cost.
This model may significantly shift the balance of power in artificial intelligence due to its efficiency and affordability.
Despite the United States having superior resources and investment in AI, China has managed to develop a competitive model that challenges this dominance.

Context and Parameters of Deepseek V4 02:55

"Deepseek V4 comes in two flavors, pro and flash... it has a million token context length."

Deepseek V4 features impressive specs, including a context length of one million tokens, positioning it at the forefront of AI model capabilities.
The model's architecture includes 1.66 trillion total parameters, of which 49 billion are active for any given token. It uses a mixture-of-experts (MoE) approach that routes each token through only a subset of the network, reducing computation per token.
Additionally, the flash version of Deepseek is designed to be cheaper, faster, and more efficient, boasting 284 billion total parameters with 13 billion active parameters.
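To make the "total vs. active" distinction concrete, here is a minimal Python sketch of generic top-k mixture-of-experts routing: a gate scores every expert, but only the k highest-scoring experts run for a given token. The toy experts, gate scores, and function names (`top_k_route`, `moe_forward`, `active_fraction`) are hypothetical, and this is not DeepSeek's actual routing scheme. The last lines apply the parameter counts quoted above to show what fraction of each model is active per token.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and return (index, weight) pairs.

    Weights are renormalized over the selected experts only; this is one
    common top-k MoE convention, assumed here for illustration.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the outputs of only the k routed experts for one token."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


def active_fraction(active_params, total_params):
    """Share of the model's parameters used for each token."""
    return active_params / total_params


if __name__ == "__main__":
    # Eight toy "experts": expert i simply scales its input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(8)]
    # Hypothetical gate scores for one token; experts 1 and 4 score highest.
    logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]
    print(top_k_route(logits, k=2))  # only 2 of the 8 experts run
    print(moe_forward(1.0, experts, logits, k=2))

    # Parameter counts as reported in the video summary.
    print(f"V4 Pro:   {active_fraction(49e9, 1.66e12):.2%} of parameters active")  # 2.95%
    print(f"V4 Flash: {active_fraction(13e9, 284e9):.2%} of parameters active")  # 4.58%
```

In this framing, the active count is what drives per-token compute, while the total count still has to be held in memory, which is why a 1.66T-parameter MoE model can be served more cheaply than its raw size suggests.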

Performance Metrics and Benchmarks 04:23

"What we're seeing is although it is slightly behind here, it's right up there."

In various benchmarking tests, Deepseek V4 performs comparably to the leading models, indicating that while it may lag behind a bit, it is still a strong contender.
Although slightly less capable than models such as Opus 4.7 and GPT‑5.5, Deepseek V4 stands out for its cost-efficiency and for performing well in most practical use cases.
This efficiency makes it a significant concern for U.S. AI developers, as it underscores the trend of making advanced technology more accessible.

Geopolitical Implications of Export Controls 07:22

"Are export controls actually working? The answer is kind of yes and kind of no."

Export controls put in place by the U.S. to prevent top-tier chips from reaching China are somewhat effective, as China lacks the same computational resources.
However, the effectiveness of these controls is limited due to China's ability to innovate on the algorithmic front, allowing their models to remain competitive despite hardware limitations.
This situation raises concerns about the future trajectory of AI development and competition between the U.S. and China, particularly as China prepares to develop its own AI chips utilizing American technology.

Distillation Attacks: Stealing AI Insights 09:28

"The U.S. has evidence that foreign entities primarily in China are running industrial-scale distillation campaigns to steal American AI."

Recent reports indicate that Chinese AI labs have run distillation attacks against models like Claude and ChatGPT, querying them at scale and extracting question-answer pairs to train their own models.
This kind of intellectual property theft poses a significant threat to American AI companies, as it undermines the competitive advantages of models built by firms like Anthropic and OpenAI.
The U.S. government has acknowledged these industrial-scale campaigns and signaled efforts to counter them in order to protect American advancements in artificial intelligence.
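At its core, the distillation described in this section means treating a stronger "teacher" model as a black box, recording its answers to many queries, and training a "student" on those pairs. The Python sketch below shows that loop with a deliberately tiny numeric stand-in: the teacher is a hypothetical function rather than an LLM, and the student is a least-squares line rather than a neural network. Real LLM distillation works on text and trains a neural student, so treat this as an analogy only; `teacher`, `collect_pairs`, and `fit_student` are illustrative names.

```python
def teacher(x):
    """Hypothetical stand-in for a proprietary model: callable, internals hidden."""
    return 3.0 * x + 2.0


def collect_pairs(model, queries):
    """Query the teacher as a black box and record (question, answer) pairs."""
    return [(q, model(q)) for q in queries]


def fit_student(pairs):
    """Train a linear 'student' y = a*x + b on the teacher's answers (ordinary least squares)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b


if __name__ == "__main__":
    pairs = collect_pairs(teacher, range(10))  # 10 "exchanges" with the teacher
    student = fit_student(pairs)
    # The student reproduces the teacher on an input it never queried.
    print(teacher(100), student(100))
```

Ten exchanges suffice here only because the toy teacher is a straight line; imitating a frontier LLM takes vastly more, which is why the reported number of exchanges is the figure that gets compared.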

American Innovation and DeepSeek 10:22

"The U.S. government acknowledging the theft of American breakthroughs by foreign entities highlights the seriousness of AI innovations being targeted."

The discussion centers on the U.S. government's acknowledgment that foreign entities use various proxies and coordinated techniques to exploit American innovations, a pressing issue that Anthropic had already reported.
Notably, despite the allegations, the Anthropic report indicates that DeepSeek in particular did not extract substantial amounts of data. There is room to argue that its activity might not qualify as theft at all, but rather as querying competitors' models for performance benchmarking.
In the realm of AI development, companies like DeepSeek run benchmark comparisons to evaluate their models against competitors. These benchmarks may appear similar to what is termed a "distillation attack," which is often viewed in a negative light.
DeepSeek's reported distillation activity amounted to only 150,000 exchanges, far fewer than competitors such as Moonshot and MiniMax, which were reported to have made millions. That raises the question of how DeepSeek achieved its quality with so little extracted data.

Open Source vs. Proprietary Models 11:58

"DeepSeek's open-sourced model, coupled with a comprehensive white paper, presents a compelling case against concerns of data theft."

DeepSeek pairs its ability to produce high-quality models with an open-source release. That transparency invites the question of how it reached these results with seemingly little extracted data.
The announcement reveals that DeepSeek is currently facing compute capacity constraints. They have plans to expand their services later in the year, with expectations of drastically reducing prices for their offerings.
In the AI landscape, pricing is becoming a critical determinant for enterprises. DeepSeek's technology might not reach the absolute top performance but is positioned as "nearly state-of-the-art," which is sufficient for many businesses that do not require cutting-edge solutions.
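Because the argument here turns on price rather than peak capability, a quick cost model helps. The Python sketch below compares monthly spend for a hypothetical workload on two price tiers; all prices, token volumes, and names (`PriceTier`, `monthly_cost`) are placeholder assumptions, not DeepSeek's or any other provider's actual rates.

```python
from dataclasses import dataclass


@dataclass
class PriceTier:
    """Per-token pricing for a model tier (all values hypothetical)."""
    name: str
    usd_per_m_input: float   # USD per 1M input tokens
    usd_per_m_output: float  # USD per 1M output tokens


def monthly_cost(tier: PriceTier, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one month of traffic at the tier's per-million-token rates."""
    return (input_tokens / 1_000_000) * tier.usd_per_m_input + (
        output_tokens / 1_000_000
    ) * tier.usd_per_m_output


if __name__ == "__main__":
    # Placeholder tiers: a "nearly state-of-the-art" budget model vs. a premium frontier model.
    budget = PriceTier("budget", usd_per_m_input=0.10, usd_per_m_output=0.20)
    premium = PriceTier("premium", usd_per_m_input=5.00, usd_per_m_output=25.00)

    # Hypothetical enterprise workload: 2B input and 500M output tokens per month.
    inp, out = 2_000_000_000, 500_000_000
    for tier in (budget, premium):
        print(f"{tier.name:8s} ${monthly_cost(tier, inp, out):,.2f}/month")
```

With these made-up numbers the premium tier costs 75 times as much for the same traffic; an enterprise would pay that gap only if the extra capability actually changes outcomes for its use case.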

Economic Implications for U.S. Enterprises 14:15

"The decision to adopt Chinese open-source models over U.S. alternatives raises potential security and economic risks."

The cost-effectiveness of DeepSeek compared to U.S. models like Opus 4.7 and GPT‑5.5 may lead U.S. enterprises to shift toward adopting Chinese open-source models.
This trend comes at a time of immense AI investment in the United States, where failing to earn returns on that investment could have significant economic consequences.
The cultural ramifications are equally significant. Widespread use of Chinese models could shift narrative control away from U.S. tech companies, with risks for how information is disseminated and potential influence over public opinion.

Initiatives Needed in the U.S. 17:04

"The U.S. needs to focus on enhancing open-source options and driving down costs in AI technology to remain competitive."

To combat the rising attractiveness of Chinese AI models, two key initiatives are suggested: a push for open-source advancements within U.S. labs and striving for greater efficiency and reduced pricing.
Current proprietary models from leading frontier labs often lack open-source flexibility, which puts U.S. companies at a disadvantage in a rapidly evolving technology landscape.
Ensuring that U.S. AI remains competitive requires reducing costs significantly, making it feasible for enterprises to adopt these technologies without the burden of high expenses.
