Gemini 3 Pro scores 69% trust in blinded testing up from 16% for Gemini 2.5: The case for evaluating AI on real-world trust, not academic benchmarks - CodeGurus

Sean Michael Kerner December 3, 2025 Credit: Image generated by VentureBeat with FLUX-2-ProJust a few short weeks ago, Google debuted its Gemini 3 model, claiming it scored a leadership position in multiple AI benchmarks. But the challenge with vendor-provided benchmarks is that they are just that — vendor-provided. A new vendor-neutral evaluation from Prolific, however, puts Gemini 3 at the top of the leaderboard. This isn’t on a set of academic benchmarks; rather, it’s on a set of real-world attributes that actual users and organizations care about. Prolific was founded by researchers at the University of Oxford. The company delivers high-quality, reliable human data to power rigorous research and ethical AI development. The company’s “HUMAINE benchmark” applies this approach…

Related Articles