General-purpose LLMs beat specialized clinical AI on medical tests
A new study by Krithik Viswanath and colleagues reports that general-purpose large language models (LLMs) outperformed specialized clinical AI tools on three medical evaluations, with frontier LLMs scoring higher than the clinical tools on every one. On the RCQ benchmark, the clinical AI tools performed at a level comparable to Google Search's auto-enabled AI Overview. The authors stress the need for independent, real-world evaluation of AI tools before they are used in clinical settings. The study was highlighted in a Marginal Revolution blog post, which also cited it as an example of why many Emergent Ventures proposals are rejected quickly.
Sources: marginalrevolution.com
