KAIST announced on the 23rd that Professor Changik Kim's research team from the Department of Electrical Engineering developed an ultra-high-efficiency video recognition model called ‘VideoMamba’.
VideoMamba is a next-generation video recognition model designed to address the complexity of existing transformer-based models.
Existing transformer-based models rely on a mechanism called self-attention, whose computational cost grows quadratically with the length of the input sequence.
VideoMamba overcomes this drawback: it demonstrates high accuracy while using 8 times less computation and 4 times less memory than existing transformer-based models, and its inference speed is 4 times faster.
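A back-of-envelope comparison makes the scaling difference concrete. The sketch below uses illustrative token and state dimensions (not figures from the research team) to show why self-attention's cost grows quadratically with sequence length while a state-space scan grows linearly:

```python
# Rough multiply-add counts (illustrative, not a benchmark of VideoMamba).

def attention_flops(L, D):
    """One self-attention layer: Q @ K^T and attn @ V each cost ~L * L * D."""
    return 2 * L * L * D

def scan_flops(L, D, N):
    """One state-space scan layer: each of L steps updates a (D, N) state."""
    return L * D * N

# A minute of video easily yields thousands of tokens.
L, D, N = 8192, 64, 16
ratio = attention_flops(L, D) / scan_flops(L, D, N)   # = 2 * L / N
print(ratio)  # → 1024.0
```

The gap widens linearly as the video (and hence `L`) gets longer, which is the core motivation for replacing attention with a linear-time scan.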
[Photo, from left: Professor Changik Kim; Jinyoung Park (integrated MS-PhD program); Heeseon Kim (PhD program); Kangwook Ko (PhD program); Minbeom Kim (PhD program). Provided by KAIST]
In particular, VideoMamba uses a selective state space model (SSM) mechanism, which dynamically adjusts its parameters based on the input to better capture the context of sequence data, enabling efficient processing with linear complexity.
Through this, VideoMamba effectively captures the spatio-temporal information of videos and efficiently processes video data with long-range dependencies.
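The idea of a selective state space scan can be sketched as a recurrence whose transition parameters are computed from each input token. The minimal NumPy sketch below is illustrative only (all names and shapes are assumptions, not VideoMamba's actual implementation); it shows how input-dependent parameters and linear cost fit together:

```python
import numpy as np

def selective_ssm(x, A, W_B, W_C, w_dt):
    """Minimal selective state-space scan (a sketch, not VideoMamba's code).

    x    : (L, D) input sequence of L tokens with D channels
    A    : (D, N) fixed (negative) state-transition parameters
    W_B  : (D, N) projection making the input matrix B input-dependent
    W_C  : (D, N) projection making the output matrix C input-dependent
    w_dt : (D,)   projection for the input-dependent step size

    Because B, C, and the step size are computed from each token, the
    recurrence can selectively keep or forget information; one pass over
    the sequence costs O(L * D * N), i.e. linear in L.
    """
    L, D = x.shape
    h = np.zeros_like(A)          # hidden state: one (N,) vector per channel
    y = np.empty_like(x)
    for t in range(L):
        xt = x[t]
        dt = np.logaddexp(0.0, xt @ w_dt)   # softplus -> positive step size
        B = xt @ W_B                        # (N,) input-dependent input matrix
        C = xt @ W_C                        # (N,) input-dependent output matrix
        h = np.exp(dt * A) * h + dt * np.outer(xt, B)   # discretized update
        y[t] = h @ C
    return y
```

In contrast to self-attention, no token-to-token score matrix is ever formed; context is carried forward in the fixed-size hidden state `h`.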
To maximize the model's efficiency, the research team also introduced an advanced spatio-temporal forward-and-backward selective state space model that enables VideoMamba to analyze 3D spatio-temporal data.
This model effectively integrates unordered spatial information and sequential temporal information to enhance recognition performance.
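The forward-and-backward idea can be illustrated with a toy scan. In the sketch below, a simple causal exponential moving average stands in for the selective scan (the function names, shapes, and fusion-by-averaging are assumptions for illustration, not the team's actual design); it shows how flattened 3-D video tokens can be scanned in both directions so each token sees both past and future context:

```python
import numpy as np

def causal_scan(seq, decay=0.9):
    """Stand-in for a selective scan: a causal exponential moving average
    over a flattened token sequence of shape (L, D)."""
    out = np.empty_like(seq)
    h = np.zeros(seq.shape[1])
    for t in range(seq.shape[0]):
        h = decay * h + (1.0 - decay) * seq[t]
        out[t] = h
    return out

def forward_backward_scan(video_tokens):
    """Fuse a forward and a backward scan over 3-D video tokens.

    video_tokens : (T, H, W, D) grid of frame tokens. Flattening in
    (time, height, width) order turns the grid into one sequence; the
    backward pass gives every token access to "future" context as well,
    so unordered spatial and ordered temporal information both reach
    the fused output.
    """
    T, H, W, D = video_tokens.shape
    seq = video_tokens.reshape(T * H * W, D)
    fwd = causal_scan(seq)                 # scan past -> future
    bwd = causal_scan(seq[::-1])[::-1]     # scan future -> past, re-flip
    return ((fwd + bwd) / 2.0).reshape(T, H, W, D)
```

Averaging the two passes makes the layer symmetric with respect to sequence direction, which is one simple way to integrate bidirectional context without a quadratic attention map.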
The research team validated VideoMamba’s performance on various video recognition benchmarks and expects that VideoMamba will provide efficient and practical solutions in diverse application fields requiring video analysis.
For example, by utilizing VideoMamba, autonomous driving systems can analyze driving footage to accurately understand road conditions and recognize pedestrians and obstacles in real time to prevent accidents.
In the medical field, the research team anticipates that VideoMamba can analyze surgical videos to monitor patient status in real time and respond promptly in emergency situations.
In sports, it can analyze players' movements and tactics during games to improve strategies, and it can detect fatigue or injury risks in real time during training to help prevent them.
Professor Changik Kim stated, “The fast processing speed, low memory usage, and improved video recognition performance compared to existing transformer-based models give VideoMamba the potential to be widely used in various video application fields in the future.”
Meanwhile, this research was conducted with support from the Ministry of Science and ICT and the Institute for Information & Communications Technology Planning & Evaluation. The study involved Jinyoung Park (integrated MS-PhD program), Heeseon Kim (PhD program), and Kangwook Ko (PhD program) from KAIST’s Department of Electrical Engineering as co-first authors, Minbeom Kim (PhD program) as a co-author, and Professor Changik Kim as the corresponding author.
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.

