Frontier models are failing one in three production attempts, and getting harder to audit

Taryn Plumb
12:35 pm PT, April 15, 2026
Image credit: CleoP, made with Midjourney

AI agents are now embedded in real enterprise workflows, and they’re still failing roughly one in three attempts on structured benchmarks. That gap between capability and reliability is the defining operational challenge for IT leaders in 2026, according to Stanford HAI’s ninth annual AI Index report.

This uneven, unpredictable performance is what the AI Index calls the “jagged frontier,” a term coined by AI researcher Ethan Mollick to describe the boundary where AI excels and then suddenly fails.

“AI models can win a gold medal at the International Mathematical Olympiad,” Stanford HAI researchers point out, “but still can’t reliably tell time.”

How models advanced in 2025

Enterprise AI adoption has reached 88%…
