Models

China’s open-weights surge: four frontier-class models in twelve days

In April, four Chinese labs — Z.ai, MiniMax, Moonshot, and DeepSeek — released open-weights models that land near the Western frontier on agentic engineering benchmarks, at less than a third of the inference cost of the closed models they are chasing. For builders, the “self-hosted is a downgrade” assumption no longer holds.

The four models, briefly

DeepSeek V4

The strongest all-rounder of the four. Mixture-of-Experts, strong on coding and math reasoning, with a permissive license that allows commercial use. The community has already produced quantized variants that run on a single high-end consumer GPU at usable speed.
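
If you want to try one of those quantized variants locally, the sketch below loads a GGUF file with llama-cpp-python. The file name is hypothetical; use whichever community quantization you actually downloaded. This is an illustrative assumption, not an official recipe.

```python
# Sketch: loading a community GGUF quantization of DeepSeek V4 with
# llama-cpp-python. The file name is hypothetical -- substitute the
# quant you actually downloaded, and check its license first.
from llama_cpp import Llama

llm = Llama(
    model_path="./deepseek-v4-Q4_K_M.gguf",  # hypothetical quantized file
    n_gpu_layers=-1,  # offload every layer to the GPU; lower if VRAM is tight
    n_ctx=32768,      # context window; raise if memory allows
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```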

Z.ai GLM-5.1

The best of the four for tool-use and structured output, in our testing. Notable for unusually clean function calling and a long-context retrieval profile that holds up past 200k tokens. If you are building agents that call APIs, this is the one to evaluate first.
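
To see the function-calling behavior for yourself, the sketch below sends a tool-enabled request through an OpenAI-compatible client pointed at a self-hosted server (vLLM and similar servers expose this interface). The base_url, model name, and get_ticket tool are placeholder assumptions.

```python
# Sketch: exercising GLM-5.1's function calling through an OpenAI-compatible
# endpoint. The endpoint, model name, and tool are illustrative, not official.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "get_ticket",  # hypothetical tool
        "description": "Fetch a support ticket by ID.",
        "parameters": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-5.1",  # whatever name your server registers
    messages=[{"role": "user", "content": "Pull up ticket 4821 for me."}],
    tools=tools,
)

# A model with clean function calling should return a structured tool call
# here rather than free text describing one.
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)
```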

Moonshot Kimi K2.6

The strongest of the four on Chinese-language tasks, and the closest thing to a “Claude-style” conversational model in the open-weights tier. Context window up to a million tokens. Slower than the others on routine queries but unusually patient on complex inputs.

MiniMax M2.7

The video and multimodal specialist. Four of the top five video models by Elo are now Chinese-built; M2.7 is the open-weights entry. Useful for teams who need on-prem video understanding without sending frames to a hosted API.
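
A minimal on-prem pipeline might look like the sketch below: sample frames with OpenCV and pass them to a locally served M2.7, assuming your server accepts the OpenAI-compatible vision message format. The endpoint and model name are placeholders.

```python
# Sketch: on-prem video QA against a locally served M2.7, assuming the
# server speaks the OpenAI-compatible vision format. Endpoint and model
# name are placeholders.
import base64
import cv2
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

def sample_frames(path: str, every_n: int = 30) -> list[str]:
    """Grab every Nth frame and return them as base64-encoded JPEGs."""
    cap = cv2.VideoCapture(path)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % every_n == 0:
            ok2, buf = cv2.imencode(".jpg", frame)
            if ok2:
                frames.append(base64.b64encode(buf.tobytes()).decode())
        i += 1
    cap.release()
    return frames

content = [{"type": "text", "text": "What happens in this clip?"}]
for b64 in sample_frames("clip.mp4")[:8]:  # cap the frame count
    content.append({"type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"}})

resp = client.chat.completions.create(
    model="minimax-m2.7",  # placeholder name
    messages=[{"role": "user", "content": content}],
)
print(resp.choices[0].message.content)
```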

Why this matters for builders outside China

Three concrete shifts:

  1. Cost. Frontier-class inference at less than a third of the price, with quantized variants that run on a single high-end consumer GPU.
  2. Control. With permissive licenses and competitive quality, self-hosting is no longer a downgrade; you can run commercial workloads on your own hardware.
  3. Data boundaries. Workloads that could never leave your network, like on-prem video understanding, can now use near-frontier models without sending anything to a hosted API.

The risks people downplay

License terms vary across the four and have changed at least once each in the last six months; read the actual license file before committing. Supply-chain provenance for weights is harder to verify than for closed models; teams in regulated industries should verify downloaded weights against the checksums and signatures published by official sources. And although these models match the frontier on standard benchmarks, the long tail of unusual prompts is still where the closed models pull ahead.
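
Checksum verification is a few lines of standard-library Python. The sketch below assumes the official release page publishes a SHA-256 digest per weights file; the path and digest shown are placeholders.

```python
# Sketch: verifying a downloaded weights file against a checksum published
# by the official source. The path and expected digest are placeholders.
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

expected = "replace-with-the-published-digest"  # from the official release page
actual = sha256_of("model-00001-of-00009.safetensors")
if actual != expected:
    raise SystemExit(f"hash mismatch: got {actual}")
print("weights verified")
```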

How to evaluate one this week

  1. Pull a representative sample of your real workload: at least 100 prompts, each paired with the answer you wish you had gotten.
  2. Run them through your current model and the open-weights candidate. Score the outputs blind, with model identities hidden.
  3. Look at the disagreements, not the averages. The averages will be close. The interesting question is: when one model wins, why?
  4. If the open-weights model wins or ties on more than 80% of the workload, route those cases to it and keep the frontier model for the rest (a minimal harness for steps 2 and 3 is sketched below).
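
Here is a minimal harness for steps 2 and 3, assuming both models sit behind one OpenAI-compatible endpoint and your workload lives in a JSONL file of prompt/reference pairs. The model names, endpoint, and naive containment scorer are all placeholder assumptions; swap in a real rubric or a human judge before trusting the numbers.

```python
# Sketch of steps 2 and 3: run both models over the sample, score each
# output against the reference, and collect the disagreements.
# Endpoint, model names, and the scoring rule are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
MODELS = ["current-frontier-model", "open-weights-candidate"]  # placeholders

def answer(model: str, prompt: str) -> str:
    r = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return r.choices[0].message.content

def score(output: str, reference: str) -> int:
    # Naive containment check, a stand-in for a real rubric. For human
    # scoring, present outputs in random order with model names hidden.
    return int(reference.lower() in output.lower())

with open("workload.jsonl") as f:  # one {"prompt": ..., "reference": ...} per line
    cases = [json.loads(line) for line in f]

wins = {m: 0 for m in MODELS}
disagreements = []
for case in cases:
    scores = {m: score(answer(m, case["prompt"]), case["reference"]) for m in MODELS}
    for m, s in scores.items():
        wins[m] += s
    if len(set(scores.values())) > 1:  # the models disagree; worth a human look
        disagreements.append((case["prompt"], scores))

for m in MODELS:
    print(f"{m}: {wins[m]}/{len(cases)} scored correct")
print(f"{len(disagreements)} disagreements to review by hand")
```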