M3RD: Multi-Modal Mathematical Reasoning in Documents

Workshop in conjunction with ICDAR 2025, September 16-21, 2025, Wuhan, Hubei, China

Motivation

The AI community has placed significant emphasis on mathematical reasoning as a means of probing the intelligence of large language models (LLMs) and multi-modal large language models (MLLMs), such as OpenAI o1 and DeepSeek-R1. As a common information medium, documents contain text, images, tables, diagrams, charts, mathematical notation, and more. By leveraging these multiple elements, multi-modal mathematical reasoning focuses on enabling machines to solve, interpret, and reason about mathematical problems. It combines image and text analysis, symbolic manipulation, numerical computation, and logical inference to address challenges ranging from basic arithmetic to advanced problem-solving in algebra, calculus, and beyond, making it an area of growing importance in document intelligence.

Recent advances in (multi-modal) large language models have driven progress in multi-modal mathematical reasoning tasks such as chart reasoning [1], table reasoning [2,3], and geometry problem solving [4,5]. Many real-world documents, such as academic papers, technical manuals, and financial reports, involve symbolic mathematics and logical reasoning. Processing them requires algorithms that combine the precision of symbolic methods with the flexibility of modern AI. This workshop aims to bring together researchers from industry, science, and academia to exchange ideas and discuss ongoing research in multi-modal mathematical reasoning in documents.

Schedule

Time         Events
13:50-14:00  Opening Remarks
14:00-14:40  Invited Talk 1, Dr. Pan Lu, Advancing Mathematical Reasoning with Language Models and Agentic Systems
14:40-15:20  Invited Talk 2, Dr. Wenda Li
15:20-16:00  Contributed Talks (Best/Runner-up Paper Talks)
16:00-16:20  Coffee Break
16:20-17:40  Poster or Oral Session

References