ChatGPT o3 Emerges as the Top AI Model for Addressing Scientists' Inquiries

The Allen Institute for AI (Ai2) has unveiled SciArena, a platform for assessing how useful AI models are to researchers. It works much like Chatbot Arena but focuses on more complex questions. Participation is limited to researchers with at least two publications, who must complete a one-hour training session before they start testing models.

On SciArena, a researcher poses a question, and the system retrieves relevant academic papers from the Semantic Scholar database and passes them to two randomly selected models. Drawing on those papers and their own knowledge, the models generate detailed answers. The researcher views both responses side by side and votes for the one they consider better. The name of the winning model is revealed only after the vote is cast.

Currently, the top scorer on SciArena is ChatGPT o3, with a rating of 1172 points, followed by Claude Opus 4 (1080), Gemini 2.5 Pro (1063), DeepSeek R1-0528 (1062), and ChatGPT o4-mini (1054). Notably, o3 has maintained its lead across all four major query categories: engineering, healthcare, natural sciences, and humanities and social sciences.
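
To give a feel for how side-by-side votes can turn into point ratings like these, here is a minimal sketch. It assumes a standard Elo-style update, which is common on arena-type leaderboards; the post does not say how SciArena actually computes its scores, so the function names, the K-factor, and the model labels below are illustrative only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A's answer is preferred over B's under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def record_vote(ratings: dict, winner: str, loser: str, k: float = 4.0) -> None:
    """Shift points from loser to winner after one researcher vote.

    k is a hypothetical step size; an upset (low-rated model wins)
    moves more points than an expected result.
    """
    surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * surprise
    ratings[loser] -= k * surprise


# Hypothetical starting point: two models with equal ratings.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
record_vote(ratings, winner="model_a", loser="model_b")

# Under this assumed model, the gap between 1172 and 1080 corresponds
# to the higher-rated answer being preferred in roughly 63% of matchups.
p_top_vs_second = expected_score(1172, 1080)
```

Read this way, the 92-point gap between first and second place is a clear but not overwhelming advantage, and the models ranked third to fifth are close to evenly matched with one another.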

Note that the SciArena ranking is aimed primarily at professional researchers rather than casual science enthusiasts. In everyday use, a model searches for information on its own and may cite unreliable sources. On SciArena that risk is reduced because the system, not the model, selects the literature. The Allen Institute's experience here may also help developers improve search for scientific queries.
