Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR
arXiv:2507.15855 (cs) [Submitted on 21 Jul 2025 (v1), last revised 30 Sep 2025 (this version, v4)]
Title: Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline
Authors: Yichen Huang, Lin F. Yang
Abstract: The International Mathematical Olympiad (IMO) is widely regarded as the world championship of high-school mathematics. IMO problems are renowned for their difficulty and novelty, demanding deep insight, creativity, and rigor. Although large language models perform well on many mathematical benchmarks, they often struggle with…