AI models tested on finding errors in economics papers
A new piece by Alexis Akira Toda documents experiments testing whether artificial intelligence can refute economic theory. Toda asked several AI models, including Gemini, Refine, Claude, and ChatGPT, to check the correctness of four published economics papers, each containing an error that Toda helped identify or correct. ChatGPT Pro performed best among the models, occasionally constructing counterexamples and corrected proofs, while other models fared worse. However, no model located a true error without substantial human guidance, and data contamination complicates interpretation. Toda argues that a competent human paired with a frontier model can outperform current peer review, but AI cannot yet refute economic theory on its own.
Sources: marginalrevolution.com
