Blog · April 16, 2026

KYC in the LLM Era: Why Frontier AI Labs Need Identity Verification to Survive

Frontier models cost hundreds of millions to train and can be distilled for pennies. KYC on API access is becoming mandatory. Here is why identity verification is the new moat for AI labs.

By Didit

In February 2026, Anthropic published evidence that three Chinese AI labs had collectively run 16 million exchanges with Claude using 24,000 fraudulent accounts. The purpose was not casual experimentation. It was industrial-scale distillation: training cheaper, weaker models on the outputs of the most expensive AI system ever built.

Two months later, Anthropic rolled out passport-and-selfie identity verification on Claude.

That sequence is not a coincidence. It is the defining compliance story of the LLM era. Frontier AI is being dragged, quickly and unavoidably, into the same "know your customer, monitor your customer" discipline that banks, brokers, and crypto exchanges live under. This post explains why, what it looks like in practice, and what every AI company — not just the frontier labs — should be doing about it.

The Economics That Make KYC Inevitable

Training a frontier model today costs between $100 million and $1 billion in compute alone. GPT-4, Claude 3 Opus, Gemini Ultra, and Grok 3 are all estimated to sit in that range. The next generation will cross into the $1 to $10 billion bracket.

Distillation costs roughly 0.1% of that. Give a weaker model a few million high-quality examples from a stronger one, fine-tune for a few weeks, and you have recovered a large fraction of the target model's capability on most benchmarks.

The gap between "train a frontier model" and "distill a frontier model" is three orders of magnitude. That asymmetry is the most important economic fact in AI right now. It explains why every major frontier lab is either already running a KYC program or has one in active development.
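
Taking the figures above at face value, the arithmetic can be sketched as follows. The cost range and the 0.1% ratio come from the text; the function name is illustrative, and the result is only as good as those estimates.

```python
import math

# Figures quoted in the text: training costs $100M to $1B,
# and distillation costs roughly 0.1% (1/1000) of that.
DISTILL_DIVISOR = 1000  # 0.1% expressed as an integer divisor


def distillation_gap(train_cost_usd: int) -> tuple[int, float]:
    """Return (estimated distillation cost, gap in orders of magnitude)."""
    distill_cost = train_cost_usd // DISTILL_DIVISOR
    gap = math.log10(train_cost_usd / distill_cost)
    return distill_cost, gap


for train_cost in (100_000_000, 1_000_000_000):
    cost, gap = distillation_gap(train_cost)
    print(f"train ${train_cost:,} -> distill ~${cost:,} ({gap:.0f} orders of magnitude)")
```

Under these assumptions the gap stays at three orders of magnitude across the whole range, because the distillation cost is modeled as a fixed fraction of the training cost.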

Without KYC, the attack is trivial:

  1. Sign up for as many API accounts as you can automate
  2. Route traffic through residential proxies to defeat IP rate limits
  3. Use fabricated emails, rented phone numbers, and prepaid cards
  4. Pull a few million reasoning traces across coding, math, tool use, and agentic tasks
  5. Train your own model on the dataset
  6. Release it for free or at a fraction of the original's price

The total bill for the attacker is tens of thousands of dollars in API spend. The commercial damage to the lab whose model was distilled is in the billions. This is not a stable system.

What Distillation Actually Looks Like

Anthropic's technical write-up described the attack patterns with unusual clarity. The signatures they detected include:

  • Repetitive prompt templates across hundreds of coordinated accounts, designed to elicit consistent reasoning chains
  • Chain-of-thought elicitation patterns — prompts that force the model to expose its full reasoning, which is then scraped as training data
  • Capability-targeted traffic — entire fleets of accounts focused exclusively on coding, agentic tool use, or mathematical reasoning, depending on the target capability
  • "Hydra cluster" architectures — networks of accounts distributed across APIs and cloud providers to stay under per-endpoint anomaly thresholds
  • Commercial proxy services managing tens of thousands of accounts simultaneously, mixing distillation traffic with legitimate workloads to mask the signal
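
As a rough illustration of the first signature (the same prompt template reused across many accounts), here is a minimal sketch. It is not Anthropic's method: the normalization rules, the `min_accounts` threshold, and the sample events are all assumptions made for the example, and a production classifier would need far more robust matching.

```python
import hashlib
import re
from collections import defaultdict


def template_key(prompt: str) -> str:
    """Reduce a prompt to a coarse template fingerprint."""
    text = prompt.lower()
    text = re.sub(r"\d+", "<num>", text)      # mask numbers that vary per request
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace differences
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def shared_templates(events, min_accounts=3):
    """Return {fingerprint: accounts} for templates seen on many accounts.

    `events` is an iterable of (account_id, prompt) pairs.
    """
    accounts_by_template = defaultdict(set)
    for account_id, prompt in events:
        accounts_by_template[template_key(prompt)].add(account_id)
    return {
        key: accounts
        for key, accounts in accounts_by_template.items()
        if len(accounts) >= min_accounts
    }


# Hypothetical traffic: three accounts share one template, one does not.
events = [
    ("acct-1", "Solve step by step: 12 + 7"),
    ("acct-2", "Solve step by step:  48 + 3"),
    ("acct-3", "solve step by step: 5 + 91"),
    ("acct-4", "What's a good name for a cat?"),
]
flagged = shared_templates(events)
for accounts in flagged.values():
    print(sorted(accounts))
```

Exact hashing only catches near-identical templates. It is used here because it keeps the sketch short, not because it is sufficient against an adaptive attacker.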

The named actors — DeepSeek, Moonshot AI, MiniMax — were responsible for specific operations:

  • MiniMax: 13 million exchanges, focused on agentic coding and tool orchestration
  • Moonshot AI: 3.4 million exchanges, covering agentic reasoning, coding, and computer vision
  • DeepSeek: 150,000 exchanges, extracting reasoning capabilities

Every frontier lab assumes the same attack is being run against them. Most are not yet publishing the numbers.

Why KYC Specifically

There are many possible defenses against distillation. KYC is not the only one, and by itself it is not sufficient. It is, however, the foundational layer that makes every other defense work.

Detection Without Identity Is a Leaky Sieve

You can build excellent behavioral classifiers that detect distillation patterns. Anthropic did. But if the attacker can spin up 1,000 new accounts in an hour, your classifier's value decays fast. Every banned account is replaced before you finish writing the ban rationale.

With verified identity, each banned account imposes a real cost on the attacker — they need a new identity, a new document, a new biometric. At some price point, the attack stops being profitable.
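
The "price point" argument can be made concrete with a small model. The 24,000-account figure is from the reported campaign; every dollar amount below is a hypothetical input chosen only to show the scaling, not a measured cost.

```python
# Figure from the text: the reported campaign used about 24,000 accounts.
ACCOUNTS_USED = 24_000


def identity_bill(accounts: int, cost_per_identity_usd: float) -> float:
    """Total cost of acquiring the identities behind a fleet of accounts."""
    return accounts * cost_per_identity_usd


def break_even_identity_cost(
    attack_value_usd: float, api_spend_usd: float, accounts: int
) -> float:
    """Per-identity cost above which the attack loses money."""
    return (attack_value_usd - api_spend_usd) / accounts


# Hypothetical unit costs: throwaway credentials vs. a verified identity.
cheap = identity_bill(ACCOUNTS_USED, 1.0)
costly = identity_bill(ACCOUNTS_USED, 100.0)

# Hypothetical attack value ($5M) and API spend ($50K).
threshold = break_even_identity_cost(5_000_000, 50_000, ACCOUNTS_USED)

print(f"unverified fleet: ${cheap:,.0f}")
print(f"verified fleet:   ${costly:,.0f}")
print(f"break-even cost per identity: ${threshold:,.2f}")
```

The model assumes every account needs its own identity. If an attacker can reuse one verified identity across many accounts, the deterrent weakens accordingly.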

Legal Recourse Requires a Real Defendant

Anthropic can sue DeepSeek. It cannot sue "account-98234@tempmail.com." Terms-of-service violations are only enforceable if you know who violated them. KYC turns terms of service from a symbolic document into an actionable contract.

Safety Controls Collapse Without Identity

The entire catalog of capability-gated deployments — biosecurity uplift thresholds, export-control workflows, sanctioned-entity blocking, minor protection — depends on knowing, at minimum, the jurisdiction, age, and legal status of the user. You cannot filter who you do not identify.

Regulators Are Arriving

The EU AI Act is in force. The UK AI Security Institute (formerly the AI Safety Institute) has direct testing agreements with frontier labs. The US executive order on AI sets reporting thresholds. The Cyberspace Administration of China already requires identity verification on generative AI. KYC on AI access is moving from best practice to regulatory expectation across every major jurisdiction.

The Emerging Playbook for LLM KYC

The shape of KYC for AI platforms is converging fast. Based on what Anthropic, OpenAI, Google DeepMind, and the larger enterprise AI cloud providers are now doing, the standard program looks like this.

Tier 1: Public Access

Free tier, consumer chat products. Email verification, phone verification, device fingerprinting, CAPTCHAs. No document verification unless risk signals trigger it. The goal is to filter obvious abuse without destroying the signup funnel.

Tier 2: API Access

Paid API customers. Payment method verification as proxy identity (Stripe-level KYC), plus some combination of:

  • Phone verification at signup
  • IP geolocation and jurisdiction screening
  • Organization email domain verification for enterprise
  • ID verification triggered by volume thresholds, capability tier, or anomaly signals

This is where Anthropic's current Claude rollout sits.

Tier 3: Enhanced Due Diligence

Enterprise contracts, bulk inference commitments, access to frontier capabilities (long-context reasoning, agentic tool use, coding at scale). The full KYC stack:

  • Government-issued ID verification with liveness detection
  • Biometric selfie matched to ID photo
  • Sanctions, PEP, and adverse media screening
  • Beneficial ownership for corporate customers
  • Source-of-funds for very large commitments
  • Intended-use attestation with contractual restrictions

Tier 4: High-Risk Capabilities

Anything that crosses the lab's Responsible Scaling Policy or equivalent threshold — biology-uplift models, autonomous agents with real-world write access, dual-use cyber capabilities. Bespoke onboarding with manual review, government customer verification, export-control compliance, periodic re-verification.

Most end users will only ever see Tier 1. Builders will live in Tier 2. Enterprise customers will experience Tier 3. Tier 4 is reserved for a small number of approved entities under direct government oversight.
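
One way to picture the four tiers is as a lookup from access request to required checks. This is a sketch under assumptions: the field names, check identifiers, and selection order are illustrative, and the check lists are abbreviated paraphrases of the tiers above, not any lab's actual policy.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 1
    API = 2
    ENHANCED = 3
    HIGH_RISK = 4


# Abbreviated checks per tier, paraphrased from the tier descriptions.
REQUIRED_CHECKS = {
    Tier.PUBLIC: ["email", "phone", "device_fingerprint", "captcha"],
    Tier.API: ["payment_method", "phone", "ip_jurisdiction_screening"],
    Tier.ENHANCED: [
        "government_id_with_liveness",
        "selfie_match",
        "sanctions_pep_adverse_media",
        "beneficial_ownership",
        "intended_use_attestation",
    ],
    Tier.HIGH_RISK: [
        "manual_review",
        "export_control_compliance",
        "periodic_reverification",
    ],
}


@dataclass
class AccessRequest:
    uses_api: bool = False
    enterprise_contract: bool = False
    crosses_scaling_threshold: bool = False  # e.g. biology-uplift models


def tier_for(request: AccessRequest) -> Tier:
    """Pick the highest tier whose condition the request meets."""
    if request.crosses_scaling_threshold:
        return Tier.HIGH_RISK
    if request.enterprise_contract:
        return Tier.ENHANCED
    if request.uses_api:
        return Tier.API
    return Tier.PUBLIC


request = AccessRequest(uses_api=True, enterprise_contract=True)
tier = tier_for(request)
print(tier.name, REQUIRED_CHECKS[tier])
```

Checking the most restrictive condition first means a request that qualifies for several tiers always lands in the strictest one.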

What Frontier Labs Are Getting Wrong

The early rollouts are learning on the fly, and the mistakes are instructive.

Silent Rollouts Destroy Trust

Anthropic launched identity verification on Claude with a single help center article. No blog post. No advance notice. No published scope. The resulting backlash was predictable and largely avoidable. Users accept KYC when the rationale is clear and the data handling is explicit. They rebel when verification appears overnight with no explanation.

Unclear Triggers Create Paranoia

"Some users, for some features" is a reasonable rollout strategy but a terrible communication strategy. Users assume the worst — that the trigger is political, ideological, or arbitrary. Publish the triggers. "We verify when you exceed X requests/day, when you access Y capability, or when our fraud signals flag Z pattern" is a far better message than opaque rollouts.

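Published triggers are easier to communicate when the same policy object drives both the enforcement logic and the user-facing explanation. A minimal sketch, in which the thresholds, capability names, and signal names are all hypothetical placeholders for the X, Y, and Z above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerPolicy:
    max_requests_per_day: int      # the "X" in the published message
    gated_capabilities: frozenset  # the "Y"
    flagged_patterns: frozenset    # the "Z"


def verification_reasons(policy, requests_today, capability, fraud_flags):
    """Return human-readable reasons for asking this user to verify."""
    reasons = []
    if requests_today > policy.max_requests_per_day:
        reasons.append(f"exceeded {policy.max_requests_per_day} requests/day")
    if capability in policy.gated_capabilities:
        reasons.append(f"requested gated capability: {capability}")
    for flag in sorted(set(fraud_flags) & policy.flagged_patterns):
        reasons.append(f"fraud signal: {flag}")
    return reasons


# Hypothetical policy values, for illustration only.
policy = TriggerPolicy(
    max_requests_per_day=10_000,
    gated_capabilities=frozenset({"agentic_tool_use"}),
    flagged_patterns=frozenset({"shared_prompt_template"}),
)
reasons = verification_reasons(
    policy, 12_500, "agentic_tool_use", ["shared_prompt_template", "new_device"]
)
for reason in reasons:
    print(reason)
```

An empty list means no verification is requested, and a non-empty list can be shown to the user verbatim, so the stated reason never diverges from the one that was enforced.
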
Holding Biometric Data In-House Is a Mistake

Every frontier lab that has built its own identity verification stack will regret it within two years. Biometric custody is a specialized, regulated, audited business. Partner with a dedicated provider (Persona, Onfido, Didit) and stay out of the data custody business. Anthropic got this part right.

Ignoring the Developer Experience

If KYC blocks your API customer for two days while a reviewer looks at a fuzzy document scan, you have lost that customer. The best verification flows complete in under 90 seconds on a mobile device with real-time liveness checks and automated document review. Anything slower is a competitive disadvantage.

What Every AI Product Should Do, Not Just the Frontier Labs

If you are building on top of an LLM API — a chatbot, an agent platform, a coding tool, a content product — you are not exempt from this shift. You are downstream of it.

Three practical recommendations:

1. Assume Your Upstream Provider Will Require More Verification

Anthropic will ask more of its API customers over time. So will OpenAI. If your company cannot pass enhanced due diligence (verified beneficial ownership, intended-use attestations, export-control screening), your API access is at risk. Get your corporate KYC posture clean now, before it is an emergency.

2. Implement Risk-Based KYC on Your Own Users

Your product is probably being abused at the same rate as the frontier labs' products: spam agents, scraping networks, impersonation bots, fraud rings. The right architecture:

  • Low friction at signup — email, phone, device fingerprinting
  • Verification triggered by risk signals — volume, anomaly, suspicious patterns, sensitive features
  • Enhanced verification for paid tiers — document + liveness + sanctions screening
  • Continuous monitoring — behavioral fingerprints, re-verification on anomalies

This is the same risk-based model banks have used for decades, adapted for AI products.
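
The four layers above amount to a step-up decision that is re-evaluated as a user's state changes. A minimal sketch, assuming a single `risk_score` produced by your own monitoring and a hypothetical threshold; the action names are illustrative, not a vendor API:

```python
from dataclasses import dataclass


@dataclass
class UserState:
    email_verified: bool = False
    phone_verified: bool = False
    document_verified: bool = False
    paid_tier: bool = False
    risk_score: float = 0.0  # 0.0 (benign) to 1.0 (abusive), from your own signals


RISK_THRESHOLD = 0.7  # hypothetical cut-off


def next_action(user: UserState) -> str:
    """Decide the next verification step for a user."""
    # Layer 1: low-friction checks at signup.
    if not (user.email_verified and user.phone_verified):
        return "complete_signup_checks"
    # Layers 2 and 4: risk signals trigger verification, or re-verification
    # for users who already passed a document check.
    if user.risk_score >= RISK_THRESHOLD:
        if user.document_verified:
            return "reverify_document_and_liveness"
        return "verify_document_and_liveness"
    # Layer 3: paid tiers require document verification up front.
    if user.paid_tier and not user.document_verified:
        return "verify_document_and_liveness"
    return "allow"


print(next_action(UserState(email_verified=True, phone_verified=True)))
```

Because the function depends only on current state, it can run at signup, on every plan change, and whenever monitoring updates the risk score.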

3. Pick an Identity Provider That Fits AI Workflows

Legacy KYC vendors were built for banks. They are slow, expensive, and optimized for the wrong metric. AI products need:

  • Fast verification — under 90 seconds end-to-end
  • Usage-based pricing — no minimums, no enterprise contracts for experimentation
  • Broad document coverage — 14,000+ document types across 220+ countries (AI products are global from day one)
  • Real liveness detection — because deepfake-driven fraud is already the norm in 2026
  • Clean API — because AI companies ship weekly, not quarterly

This is the gap Didit was built for: core KYC at $0.30 per verification, no contracts, no minimums, 500 free checks per month. It is the shape of identity verification that matches how AI companies actually build and scale.

The Endgame

Five years from now, signing up for an API account with a frontier AI lab will feel like opening a brokerage account. Verified identity. Source-of-funds checks for large commitments. Ongoing monitoring. Suspicious activity reporting. Periodic re-verification. Access tiers mapped to capability tiers.

This will strike some people as dystopian. It is, however, the logical endpoint of two forces: the staggering cost of frontier training, and the staggering capability of what is being trained. When the thing on the other side of the API can meaningfully uplift a bioweapons program, or be distilled into a product that destroys billions in enterprise value, the access layer has to look like regulated financial infrastructure.

The labs that figure out how to do this without breaking developer experience will win. The ones that either refuse to verify (and get distilled into irrelevance) or verify poorly (and lose developers to competitors) will not.

KYC is not the enemy of innovation in AI. Unchecked distillation is. The sooner the industry internalizes this, the better the equilibrium looks for everyone — labs, developers, enterprise customers, and the users who depend on the AI layer continuing to exist.

---

Didit provides identity verification infrastructure built for AI-native products. Document verification, biometric liveness, AML screening, ongoing monitoring — at $0.30 per check, across 220+ countries. Start free.
