AI models tested on finding errors in economics papers

A new piece by Alexis Akira Toda documents experiments testing whether artificial intelligence can refute economic theory. Toda asked several AI models, including Gemini, Refine, Claude, and ChatGPT, to check the correctness of four published economics papers, each containing an error that Toda helped identify or correct. ChatGPT Pro performed best among the models, occasionally constructing counterexamples and corrected proofs, while other models fared worse. However, no model located a true error without substantial human guidance, and data contamination complicates interpretation. Toda argues that a competent human paired with a frontier model can outperform current peer review, but AI cannot yet refute economic theory on its own.

What’s reported

The experiments involved asking AI models to check four published economics papers, each containing an error the author helped identify or correct.
Models tested included Gemini, Refine, Claude, and ChatGPT.
ChatGPT Pro performed best, occasionally constructing counterexamples and corrected proofs.
No model located a true error without substantial human guidance.
Data contamination complicates interpretation of the results.
The author argues a competent human paired with a frontier model can outperform current peer review.

Key figures

Alexis Akira Toda, author of the piece.

Sources: marginalrevolution.com
