Study Reveals AI Struggles to Read Clocks and Understand Calendars

Researchers at the University of Edinburgh investigated how well seven multimodal large language models (LLMs) answer time-related questions posed with images of clocks and calendars. The findings indicated that these models struggle with such seemingly fundamental tasks.

The ability to interpret and reason about time through visual input is essential for numerous real-world applications, ranging from event planning to autonomous systems, as highlighted by the study’s authors.

Despite advancements in multimodal LLMs, most of the research has primarily concentrated on object detection and text recognition in images, leaving the analysis of time-related concepts underexplored, the researchers continued.

The team evaluated OpenAI’s GPT-4o and o1, Google DeepMind’s Gemini 2.0, Anthropic’s Claude 3.5 Sonnet, Meta’s Llama 3.2-11B-Vision-Instruct, Alibaba’s Qwen2-VL-7B-Instruct, and ModelBest’s MiniCPM-V-2.6.

Researchers provided the models with various images of analog clocks, including dials with Roman numerals, in different colors and without a second hand. They also incorporated calendar images spanning ten years.

The scientists posed a range of questions about dates, such as the day of the week on which New Year’s Day falls or the date of the 153rd day of the year.

Reading analog clocks and comprehending calendars involve complex cognitive processes, including fine-grained visual recognition (the positions of the clock hands and the arrangement of calendar cells) and non-trivial numerical reasoning (such as accounting for leap years), the researchers noted.
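The paper does not publish its evaluation code, but the kind of date arithmetic the models were asked to perform can be sketched in a few lines of Python. The function name below is illustrative, not from the study; the leap-year effect it demonstrates is exactly the complication the researchers mention.

```python
from datetime import date, timedelta

def nth_day_of_year(year: int, n: int) -> date:
    """Return the calendar date of the n-th day of the given year."""
    # Day 1 is January 1st, so offset by n - 1 days from it.
    return date(year, 1, 1) + timedelta(days=n - 1)

# The extra day in a leap-year February shifts every later date by one:
print(nth_day_of_year(2023, 153))  # 2023-06-02 (non-leap year)
print(nth_day_of_year(2024, 153))  # 2024-06-01 (leap year)
```

A model answering from a calendar image has no such library at its disposal: it must recover the same result from pixel positions of calendar cells, which is where the reported errors arise.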

Overall, the AI models correctly read the time on analog clocks in fewer than 25% of cases. They struggled with clocks featuring Roman numerals or stylized hands, as well as with those lacking a second hand. This difficulty may stem from problems detecting the hands and interpreting the angles they form on the clock face, the researchers explained.
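Once the hand angles have been detected, converting them to a time is simple arithmetic; the visual detection step is the hard part. Purely as an illustration (this is not the study's method), a sketch of the angle-to-time conversion, measuring angles in degrees clockwise from the 12 o'clock position:

```python
def angles_to_time(hour_angle: float, minute_angle: float) -> str:
    """Convert detected clock-hand angles to a time string.

    Angles are in degrees, measured clockwise from 12 o'clock.
    """
    minute = round(minute_angle / 6) % 60    # minute hand sweeps 6° per minute
    hour = int(hour_angle // 30) % 12 or 12  # hour hand sweeps 30° per hour
    return f"{hour}:{minute:02d}"

print(angles_to_time(97.5, 90.0))  # 3:15
```

A misjudgment of a few degrees in the hour hand's position, easy to make with stylized hands, flips the answer to a neighboring hour, consistent with the error pattern the researchers describe.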

Gemini 2.0 demonstrated the highest accuracy on clock-reading tasks, while o1 performed best on calendar questions, though even that model made errors about 20% of the time.

The current study underscores a significant gap in AI’s ability to perform basic tasks that humans handle easily, remarked Rohit Saxena, a co-author of the study and a graduate student at the University of Edinburgh’s School of Informatics. He emphasized that addressing such shortcomings is crucial for the successful deployment of AI systems in time-sensitive applications.

The preprint of the paper, “Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs,” was posted on February 7, 2025, on arxiv.org (arXiv:2502.05092).
