Frontier AI Models Now Pass 50% of Humanity's Last Exam — A Benchmark Built to Be Unsolvable
By Hector Herrera | April 15, 2026 | Science
Claude Opus 4.6 and Gemini 3.1 Pro have crossed 50% accuracy on Humanity's Last Exam (HLE), a benchmark specifically designed to be beyond the reach of current AI. They cleared the halfway mark anyway. MIT Technology Review's April 13 analysis documents the milestone with charts drawn from benchmark tracking data across major frontier models.
What Humanity's Last Exam is: HLE is a benchmark containing 3,000 questions across expert domains, including mathematics, physics, chemistry, biology, history, and law. The questions were sourced from PhD-level material and verified to be beyond what any AI could answer at the time of its creation (early 2025). It was designed to last years. It lasted little more than one.
What 50% means: Random guessing on a multiple-choice exam with four or five options scores about 20-25%. Expert human performance on HLE varies by domain but averages around 85-90% for specialists in each field. A score above 50% puts these models in the range between random chance and human expert performance, closer to the expert end than many expected at this point.
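As a rough illustration of where a 50% score falls between those two reference points, the sketch below rescales a raw score so that chance maps to 0 and expert performance maps to 1. The function name `normalized_position` and the baseline values (22.5% for chance, 87.5% for experts) are assumptions taken from the middle of the ranges quoted above, not figures published by the benchmark's authors.

```python
def normalized_position(score: float, chance: float, expert: float) -> float:
    """Return where `score` falls on a 0-1 scale from chance (0) to expert (1)."""
    if expert <= chance:
        raise ValueError("expert baseline must exceed chance baseline")
    return (score - chance) / (expert - chance)


# Assumed midpoints of the ranges quoted in the article (illustrative only).
CHANCE = 0.225  # midpoint of 20-25% random guessing
EXPERT = 0.875  # midpoint of 85-90% specialist accuracy

position = normalized_position(0.50, CHANCE, EXPERT)
print(f"{position:.2f}")  # prints 0.42
```

Under these assumed baselines, a 50% score sits a little over 40% of the way from chance to expert performance. Different baselines would shift that figure, so it is best read as a way to frame the number rather than as a measurement.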
What it does not mean: Raw benchmark scores do not translate directly to real-world capability. HLE questions are answered in controlled conditions with no time pressure, unlimited attempts, and no consequences for error. Deploying a model that can answer 50% of PhD-level chemistry questions in a production context — where errors cause harm, context is ambiguous, and users may not know when the model is wrong — is a separate and harder problem.
The benchmark acceleration is real. The deployment gap — the distance between what models can do in testing and what they reliably do in production — remains significant. Both things are true at the same time.
Hector Herrera is the founder of Hex AI Systems and editor of NexChron.