Source: Detecting and preventing distillation attacks — Anthropic, Feb 2026
Anthropic published a report describing “distillation attack” campaigns conducted by three Chinese AI labs — DeepSeek, Moonshot, and MiniMax — to extract capabilities from Claude at industrial scale. The numbers are significant: 16 million exchanges, approximately 24,000 fraudulent accounts, and sophisticated proxy infrastructure.
The technical facts described are plausible and internally consistent. But it’s worth analyzing the narrative frame Anthropic chose, not just the content itself.
Technical disclosure or lobbying document?
The text blends three distinct levels: technical evidence, geopolitical positioning (“CCP”, “authoritarian governments”, “export controls”), and explicit advocacy for regulatory policy. Anthropic cites itself as a supporter of export controls, yet the public has no access to the raw data underlying the attributions.
The facts may well be accurate. But the document's structure also serves to push policymakers in a specific direction. That doesn't invalidate the evidence, though it does demand a critical read.
The “illicit vs. legitimate” distillation distinction is convenient
Anthropic itself distills its own models. OpenAI used web content without explicit authorization to train GPT. The boundary between “fair use,” “legitimate,” and “illicit” in machine learning remains deeply ambiguous — both legally and technically.
The real issue isn't the technique itself (distillation is neutral) but the ToS violation and, separately, the national security implications. Conflating those two planes weakens the argument.
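Distillation itself is a standard, well-documented training technique: a student model is trained to match a teacher's output distribution rather than hard labels. A minimal sketch of the core objective, the classic temperature-softened KL loss, using NumPy and made-up toy logits (none of this reflects any lab's actual pipeline):

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions: the standard
    Hinton-style distillation objective. Toy version for illustration."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A student close to the teacher incurs a lower loss than a distant one.
teacher = np.array([4.0, 1.0, 0.5])
close_student = np.array([3.8, 1.1, 0.4])
far_student = np.array([0.5, 1.0, 4.0])
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```

The same loss powers both a lab compressing its own frontier model and an outsider training on scraped API outputs, which is exactly why the legitimate/illicit line is drawn by the data's provenance, not by the math.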
“Distilled models lose their safeguards”: true, but not specific
The claim that illicitly distilled models lose safety guardrails is legitimate. However, the same happens with any open or semi-open model that gets fine-tuned after release. It isn’t a feature unique to distillation attacks — it’s a structural problem of the AI ecosystem. Using it as a strong argument for selective regulation is partially misleading.
“DeepSeek’s performance depends on Claude”: how true is this narrative?
The report attributes roughly 150,000 exchanges extracted from Claude to DeepSeek. That’s a relatively small number compared to the training corpus of a frontier model. It’s plausible that this was one component of the training data, not the primary explanation for DeepSeek-R1’s capabilities.
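A rough back-of-envelope calculation puts the number in perspective. The corpus size and tokens-per-exchange figures below are assumptions chosen for illustration, not figures from the report:

```python
# All figures except `exchanges` are illustrative assumptions.
exchanges = 150_000          # exchanges attributed to DeepSeek in the report
tokens_per_exchange = 2_000  # assumed: prompt plus a long CoT response
corpus_tokens = 10e12        # assumed: a ~10T-token pretraining corpus

extracted_tokens = exchanges * tokens_per_exchange
fraction = extracted_tokens / corpus_tokens
print(f"{extracted_tokens:.1e} tokens ≈ {fraction:.6%} of the corpus")
# → 3.0e+08 tokens ≈ 0.003000% of the corpus
```

Even if such targeted reasoning data is disproportionately valuable in post-training, by raw volume it is a rounding error against pretraining, which is consistent with reading it as one component rather than the primary driver.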
The implicit narrative — “their progress significantly depends on us” — serves to downplay Chinese lab achievements, which is geopolitically convenient but technically debatable.
The proposed response trends toward centralization
The announced countermeasures — intelligence sharing among major labs, stricter access controls, behavioral fingerprinting — push toward a model where a few dominant players control access to AI knowledge.
The risk of uncontrolled proliferation is real and shouldn’t be minimized. But it’s worth asking: do these measures also raise entry barriers for new actors, including legitimate ones unconnected to authoritarian governments?
What’s solid
With that said, some observations in the article genuinely deserve attention:
- The distillation attack is a real technique, independently documented outside of Anthropic.
- The use of hydra proxies — infrastructure that spins up new endpoints as fast as existing ones are blocked — to circumvent geographic restrictions is a well-known pattern in the sector.
- Systematic extraction of chain-of-thought reasoning data is a particularly sophisticated attack vector: it generates high-quality training data on multi-step reasoning — data that is otherwise hard to obtain.
- Detection via behavioral fingerprinting (anomalous volume, repetitive prompt structure, concentration on specific capabilities) is a sensible and probably effective approach for high-repetition patterns.
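The fingerprinting heuristics in that last point can be sketched as simple per-account scoring. The thresholds, field names, and prefix heuristic below are invented for illustration; the report does not disclose Anthropic's actual detectors:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class AccountActivity:
    """Hypothetical per-account usage summary; all fields are illustrative."""
    requests_per_day: float
    prompts: list = field(default_factory=list)       # raw prompt strings
    topic_labels: list = field(default_factory=list)  # coarse capability per request

def fingerprint_flags(acct, volume_threshold=5_000,
                      template_ratio=0.8, topic_concentration=0.9):
    """Flag the three patterns named in the report: anomalous volume,
    repetitive prompt structure, and concentration on one capability.
    Thresholds are made-up examples, not real operating points."""
    flags = []
    if acct.requests_per_day > volume_threshold:
        flags.append("anomalous_volume")
    if acct.prompts:
        # Repetitive structure: most prompts share an identical prefix,
        # a crude proxy for templated extraction scripts.
        prefixes = Counter(p[:40] for p in acct.prompts)
        if prefixes.most_common(1)[0][1] / len(acct.prompts) >= template_ratio:
            flags.append("templated_prompts")
    if acct.topic_labels:
        # Concentration: a single capability dominates the traffic.
        topics = Counter(acct.topic_labels)
        if topics.most_common(1)[0][1] / len(acct.topic_labels) >= topic_concentration:
            flags.append("capability_concentration")
    return flags
```

An extraction campaign tends to trip all three flags at once, while ordinary heavy users rarely trip more than one, which is why this kind of scoring works best on high-repetition patterns and degrades against low-and-slow campaigns spread across many accounts.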
Takeaway
This article is simultaneously a legitimate technical disclosure and a geopolitical advocacy document. The two aren’t mutually exclusive — but conflating them leads to sloppy conclusions.
As technologists, we should read these communications by separating the layers: what is observed, what is inferred, and what is promoted.
The phenomenon is real. The narrative around it is constructed.