Data Tech 2026 has ended

Friday May 15, 2026 3:30pm - 4:00pm CDT
Frontier AI models are achieving remarkable accuracy on standard benchmarks, but accuracy alone does not tell us whether a model knows what it does not know. In safety-critical domains like healthcare, finance, and law, a model's ability to signal uncertainty is just as important as its ability to answer correctly.

This session presents the Calibration Blindspot Benchmark (CBB), an original three-task metacognition evaluation suite tested across 8 frontier models from 6 vendors: Anthropic, Google, OpenAI, DeepSeek, QwenLM, and Z.ai. The results revealed a structural blindspot: models reported 100% confidence on every prediction regardless of correctness and never forecast their own errors (Error Recall = 0.000), yet they detected others' errors near-perfectly (0.972-1.000). In short, models can see everyone's mistakes except their own. DeepSeek-R1 was the only model to fail entirely, suggesting that reasoning-trained models have a distinct metacognitive failure mode.
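
To make the two quantities in the abstract concrete, here is a minimal Python sketch of how such scores might be computed. It is not the CBB implementation: the abstract does not define Error Recall, so the sketch assumes it means the share of a model's actual errors that the model flagged in advance, and it uses the standard expected calibration error (ECE) as a stand-in for a confidence-calibration score. The function names and the toy data are hypothetical.

```python
from typing import Sequence


def error_recall(correct: Sequence[bool], flagged: Sequence[bool]) -> float:
    """Share of actual errors that were flagged in advance as likely errors.

    Assumed definition; the session abstract does not spell it out.
    """
    flags_on_errors = [f for c, f in zip(correct, flagged) if not c]
    if not flags_on_errors:
        return float("nan")  # undefined when the model made no errors
    return sum(flags_on_errors) / len(flags_on_errors)


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Standard ECE: weighted mean gap between confidence and accuracy per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # Confidence 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, is_correct))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Toy data (hypothetical): always 100% confident, one wrong answer, nothing flagged.
toy_correct = [True, True, True, False]
toy_confidences = [1.0, 1.0, 1.0, 1.0]
toy_flagged = [False, False, False, False]

toy_error_recall = error_recall(toy_correct, toy_flagged)
toy_ece = expected_calibration_error(toy_confidences, toy_correct)
```

On this toy data the model never flags its one mistake, so the assumed Error Recall is 0, and the gap between stated confidence (1.0) and accuracy (0.75) shows up directly as the ECE.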

Attendees will learn why confidence calibration matters for production AI deployment, how to evaluate metacognition in LLMs, and what this blindspot means for anyone building AI workflows in healthcare, finance, or legal settings.
Speakers
Emmanuel Chea, MPH

Clinical Data Scientist & Founder, Lexify Health
Clinical data scientist, MPH (U of MN). 1st place, 2026 HeatMap Hackathon (BData/ABA). Founder, Lexify Health - AI platform automating clinical policy-to-SQL. 7+ years healthcare data science. Minneapolis, MN.
(c) Alaska Best Buy HQ, 7700 Knox Ave S, Richfield, MN 55423
