AlphaProof, a system for formal mathematical reasoning trained with reinforcement learning, and AlphaGeometry 2, an improved version of DeepMind's geometry-solving model, were applied together to the most challenging problems in this year's IMO. Between them, they solved 4 of the 6 problems, a performance equivalent to that of an IMO silver medalist.
To ensure the rigor and fairness of this evaluation, DeepMind assembled a review panel of top mathematicians. The panel included Professor Timothy Gowers, an IMO gold medalist and Fields Medal winner, and Dr. Joseph Myers, a two-time IMO gold medalist and chair of the IMO 2024 Problem Selection Committee. The panel graded the AI's solutions against official IMO marking standards.
The competition problems were first translated into a formal mathematical language that the models could process. AlphaProof handled algebra and number theory, producing not just answers but complete formal proofs; it solved three problems in total, two in algebra and one in number theory. Impressively, the number theory problem was the hardest of the competition, solved by only five human contestants. AlphaGeometry 2 solved the geometry problem, while the two combinatorics problems went unsolved by either system.
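DeepMind has described AlphaProof as working on statements formalized in the Lean proof assistant. As a purely illustrative sketch, far simpler than any IMO problem and not an actual competition formalization, a formal statement and proof in Lean 4 look like this:

```lean
-- Illustrative only: a trivial theorem in Lean 4 core, showing the
-- general shape of a formalized statement and its machine-checked proof.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The key property of this format is that the proof term is verified mechanically by Lean's kernel, which is what allows an automated prover's output to be trusted without human re-checking of every step.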
Under the IMO scoring system, each of the six problems is worth 7 points, for a maximum possible score of 42. The DeepMind system scored 28 points, earning full marks on each of the four problems it solved, a score at the top of the silver-medal range. The gold-medal threshold in 2024, however, was 29 points, reached by 58 of the 609 contestants in a fiercely competitive field.
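The scoring arithmetic above can be checked directly. A minimal sketch (variable names are illustrative, not from DeepMind):

```python
# IMO 2024 scoring: 6 problems, each worth 7 points.
POINTS_PER_PROBLEM = 7
NUM_PROBLEMS = 6
max_score = POINTS_PER_PROBLEM * NUM_PROBLEMS  # 42

# The DeepMind system earned full marks on 4 of the 6 problems.
problems_solved = 4
system_score = problems_solved * POINTS_PER_PROBLEM  # 28

# Gold required 29 points in 2024, so 28 falls one point short.
GOLD_THRESHOLD_2024 = 29
missed_gold_by = GOLD_THRESHOLD_2024 - system_score  # 1
```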