Name: The Calibration Blindspot: Why AI Models Can't See Their Own Errors
Start: 2026-05-15T15:30:00-0500
End: 2026-05-15T16:00:00-0500

The Calibration Blindspot: Why AI Models Can't See Their Own Errors

Friday May 15, 2026 3:30pm - 4:00pm CDT

Frontier AI models are achieving remarkable accuracy on standard benchmarks, but accuracy alone does not tell us whether a model knows what it does not know. In safety-critical domains like healthcare, finance, and law, a model's ability to signal uncertainty is just as important as its ability to answer correctly.

This session presents the Calibration Blindspot Benchmark (CBB), an original three-task metacognition evaluation suite tested across 8 frontier models from 6 vendors: Anthropic, Google, OpenAI, DeepSeek, QwenLM, and Z.ai. The results revealed a structural blindspot: models maintained 100% confidence on every prediction regardless of correctness and never forecast their own errors (Error Recall = 0.000), yet they detected others' errors near-perfectly (0.972-1.000). Models can see everyone's mistakes except their own. DeepSeek-R1 was the only model to fail entirely, suggesting reasoning-trained models have a distinct metacognitive failure mode.
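To make the two headline quantities concrete, here is a minimal sketch of how such metrics could be computed. The abstract does not define Error Recall or say which calibration measure CBB uses, so both definitions below are assumptions: `error_recall` is taken to mean the fraction of actual errors that were flagged in advance, and `expected_calibration_error` is the common binned ECE. The function names and the toy data are hypothetical and are not taken from the benchmark.

```python
from typing import List


def error_recall(predicted_error: List[bool], actual_error: List[bool]) -> float:
    """Assumed definition: share of actual errors that were flagged beforehand."""
    total_errors = sum(actual_error)
    if total_errors == 0:
        return float("nan")  # recall is undefined when there are no errors
    flagged = sum(1 for p, a in zip(predicted_error, actual_error) if p and a)
    return flagged / total_errors


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """Binned ECE: weighted mean gap between average confidence and accuracy."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0.0 goes in the first bin.
        idx = [
            i for i, c in enumerate(confidences)
            if (lo < c <= hi) or (b == 0 and c == 0.0)
        ]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - accuracy)
    return ece


# Hypothetical toy run: always 100% confident, never predicts its own error.
confidences = [1.0, 1.0, 1.0, 1.0]
correct = [True, True, True, False]
actual_error = [not c for c in correct]
self_flags = [False, False, False, False]

print(error_recall(self_flags, actual_error))            # 0.0
print(expected_calibration_error(confidences, correct))  # 0.25
```

In this toy case the pattern described above shows up directly: a model that reports full confidence everywhere gets an Error Recall of 0 under the assumed definition, and its calibration error equals its error rate.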

Attendees will learn why confidence calibration matters for production AI deployment, how to evaluate metacognition in LLMs, and what this blindspot means for anyone building AI workflows in healthcare, finance, or legal settings.

Speakers

Emmanuel Chea, MPH

Clinical Data Scientist & Founder, Lexify Health

Clinical data scientist, MPH (U of MN). 1st place, 2026 HeatMap Hackathon (BData/ABA). Founder, Lexify Health - AI platform automating clinical policy-to-SQL. 7+ years healthcare data science. Minneapolis, MN.

Alaska Best Buy HQ, 7700 Knox Ave S, Richfield, MN 55423

4 - More Technical

Data Tech 2026
