UNIST and POSTECH Develop "VEHME," an AI Teacher for Grading Handwritten Math Answers
Grading Accuracy Comparable to GPT-4o and Gemini 2.0 Flash
Accepted as a Paper at EMNLP 2025
Researchers have developed an AI teacher that can carefully grade messy, handwritten math answers and give feedback on them, much as a human would.
On December 17, Professor Tae-Hwan Kim of the UNIST Graduate School of Artificial Intelligence and Professor Sung-An Ko's team at POSTECH announced that they have developed 'VEHME (Vision-Language Model for Evaluating Handwritten Mathematics Expressions),' an AI model capable of grading complex, handwritten math answers.
Research team (from left) Professor Tae-Hwan Kim of UNIST, Professor Sung-An Ko of POSTECH, Researcher Thu Phuong Nguyen of UNIST, Researcher Duc M. Nguyen of POSTECH. Provided by UNIST
Grading free-response math questions is one of the most time-consuming tasks in educational settings, but automating it has been challenging. Mathematical solutions often mix formulas, graphs, and figures, and each student has a different handwriting style and answer layout, so it has been difficult for AI to read the work accurately and detect the errors in it.
VEHME, developed by the research team, can accurately read the position and context of mathematical expressions and identify incorrect solutions, much like a human following the flow of problem-solving.
When VEHME was used to grade a variety of math solutions, ranging from calculus to elementary arithmetic, it demonstrated grading accuracy comparable to that of large models such as GPT-4o and Gemini 2.0 Flash, despite being a lightweight model.
In particular, on difficult cases where answers were heavily rotated or the handwriting was hard to read, VEHME outperformed commercial models by pinpointing the locations of errors more accurately. VEHME has 7 billion parameters, whereas models such as GPT and Gemini are reported to have hundreds of billions.
The research team built VEHME on two techniques it developed in-house: the Expression Visual Prompt Module (EVPM) and a 'dual learning technique,' a two-stage reinforcement learning procedure. EVPM lets VEHME draw virtual boxes around formulas arranged in complex layouts, so that the order of the solution steps is preserved. The two-stage reinforcement learning goes beyond checking whether the final answer is correct: it also enables the model to explain which part of the solution process was wrong and why.
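The idea of drawing boxes around expressions so that solution order survives a messy layout can be illustrated with a small sketch. This is not the team's EVPM implementation, whose internals are not described here. The `Box` type, the `reading_order` function, and the row-grouping rule are all hypothetical, and the sketch assumes the boxes are already given and that a solution reads top to bottom, then left to right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box around one handwritten expression (pixels)."""
    label: str  # transcription of the expression, for illustration only
    x: float    # left edge
    y: float    # top edge
    w: float    # width
    h: float    # height

    @property
    def center_y(self) -> float:
        return self.y + self.h / 2


def reading_order(boxes: list[Box]) -> list[Box]:
    """Order boxes top to bottom by row, then left to right within each row.

    Assumed grouping rule: a box joins the current row when its vertical
    center lies inside the vertical extent of that row's first box.
    """
    rows: list[list[Box]] = []
    for box in sorted(boxes, key=lambda b: b.center_y):
        if rows:
            anchor = rows[-1][0]
            if anchor.y <= box.center_y <= anchor.y + anchor.h:
                rows[-1].append(box)
                continue
        rows.append([box])
    return [b for row in rows for b in sorted(row, key=lambda b: b.x)]


if __name__ == "__main__":
    # Three solution steps given in shuffled order, with slightly uneven baselines.
    steps = [
        Box("x = 2", 15, 60, 50, 20),
        Box("2x = 4", 130, 14, 60, 20),
        Box("2x + 1 = 5", 10, 10, 100, 20),
    ]
    for box in reading_order(steps):
        print(box.label)
```

A fixed geometric rule like this breaks down on heavily rotated pages or overlapping writing, which is the kind of input the article says VEHME is designed to handle.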
Additionally, because high-quality handwritten-solution and feedback data for training were scarce, the team generated synthetic data with a large language model (QwQ-32B) and used it for training.
VEHME is open source, so educational institutions such as schools and private academies can use it for free.
Professor Tae-Hwan Kim stated, "Handwritten math grading is one of the most challenging problems in EdTech AI and a representative application of multimodal AI, which must understand both images and language. VEHME is a model that follows and evaluates complex solution structures step-by-step like a human, and it is significant that it has achieved a level of stability and efficiency suitable for real educational environments."
Professor Kim added, "The independently developed EVPM module can automatically structure complex visual information, making it applicable not only in education but also in various industries such as document recognition, analysis of design blueprints, and digitization of handwritten records, as a multimodal reasoning model."
This research was supported by the National Research Foundation of Korea under the Ministry of Science and ICT, as well as the Institute of Information & Communications Technology Planning & Evaluation. The results have been accepted as a paper at EMNLP (Conference on Empirical Methods in Natural Language Processing), a leading international conference in the field of natural language processing.
This year's EMNLP was held in Suzhou, China, from November 5 to 9.
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.



