- Builds 100,000 sets of summary error data for AI training
- Expected to improve AI performance and support AI that detects and corrects errors
The Altovision Consortium announced on the 21st that it has completed building 100,000 sets of AI training data under the “Abstract Summary Factuality Verification Data” project (hereinafter the Abstract Summary Project) run by the National Information Society Agency (hereinafter NIA).
The consortium, led by Altovision Co., Ltd. with Naraknowledge Information Co., Ltd. and Bflysoft Co., Ltd. as participants, has been carrying out the project since July. Bflysoft was responsible for collecting and refining raw data and generating source data; Altovision handled data processing; and Naraknowledge Information took charge of inspection and quality control.
CEO O Ju-yang of Altovision Co., Ltd. stated, “As AI training data becomes more refined, it will contribute to the advancement of AI models, bringing our daily lives and AI technology closer together.”
Most online article summaries and summary services use extractive summarization. It powers the article summary services of portals such as Naver and Daum, as well as those of some media outlets. Although it produces well-formed sentences, it has limitations: awkward transitions between sentences, omission of important content, and duplication of similar content.
Abstractive summarization, by contrast, summarizes content more faithfully and is considered a more advanced method. Whereas extractive summarization selects the sentences that contain an article's key content and presents them unchanged, the abstractive summarization addressed by this consortium has AI restate the article's content in new sentences.
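To make the distinction concrete, the sketch below shows the core idea of extractive summarization: score each sentence and copy the best ones verbatim. It is a generic illustration only, assuming a simple word-frequency score and a naive regex sentence splitter; it is not how Naver, Daum, or the consortium actually build summaries. The function name `extractive_summary` is hypothetical.

```python
import re
from collections import Counter


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences of `text`, in original order.

    Illustrative word-frequency heuristic (an assumption, not a production method).
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Count lowercased word tokens across the whole text.
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"\w+", sentence.lower())
        # Average frequency, so long sentences are not favoured just for length.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Indices of the top-k sentences, then restored to document order.
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    article = ("The consortium built error data. The data covers summary errors. "
               "Lunch was served at noon.")
    print(extractive_summary(article, k=2))
```

Every sentence returned is copied unchanged from the input, which is why extractive output reads fluently sentence by sentence yet can join awkwardly or repeat itself. An abstractive system would instead generate new sentences, which is where the errors catalogued in this project can arise.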
The abstractive summary error data built in this project is expected to contribute significantly to improving the performance of abstractive summarization AI and to developing AI that detects or corrects sentence errors.
In the Abstract Summary Project, AI-generated (machine) summaries and human-written summaries were produced from source texts in three domains (news articles, columns, and legal texts), and the data was built so that AI can learn the errors those summaries contain. Errors were classified into six types, grouped broadly into sentence errors and content errors.
Sentence errors include △Korean spelling and spacing errors △word choice errors △ungrammatical sentences △incomplete or unfinished sentences, while content errors include △errors in keywords or important content △repetition of similar content. To let AI learn these errors, each data set contains the erroneous summary, the location of the error, the error type, and the correction, all stored in JSON format.
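The article lists what each data set holds (the erroneous summary, error location, error type, and correction) but not the actual JSON schema. The sketch below is a hypothetical model of such a record: the field names, the enum values, and the choice of character offsets for the error location are all assumptions for illustration, not the project's published format. It also shows how a stored correction could be applied to recover the fixed summary.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class ErrorType(Enum):
    """The six error types named in the article (identifiers are hypothetical)."""
    # Sentence errors
    SPELLING_SPACING = "spelling_spacing"
    WORD_CHOICE = "word_choice"
    UNGRAMMATICAL = "ungrammatical"
    INCOMPLETE = "incomplete"
    # Content errors
    KEY_CONTENT = "key_content"
    REPETITION = "repetition"

    @property
    def category(self) -> str:
        return "content" if self in {ErrorType.KEY_CONTENT, ErrorType.REPETITION} else "sentence"


@dataclass
class SummaryError:
    start: int             # assumed: character offset where the error begins (inclusive)
    end: int               # assumed: character offset where the error ends (exclusive)
    error_type: ErrorType
    correction: str        # replacement text for summary[start:end]


@dataclass
class ErrorRecord:
    domain: str            # e.g. "article", "column" or "legal"
    summary: str           # the summary containing the error(s)
    errors: list[SummaryError] = field(default_factory=list)

    def validate(self) -> None:
        for e in self.errors:
            if not 0 <= e.start < e.end <= len(self.summary):
                raise ValueError(f"span {e.start}:{e.end} is outside the summary")

    def corrected_summary(self) -> str:
        """Apply corrections right to left so earlier offsets stay valid.

        Assumes error spans do not overlap.
        """
        self.validate()
        text = self.summary
        for e in sorted(self.errors, key=lambda e: e.start, reverse=True):
            text = text[:e.start] + e.correction + text[e.end:]
        return text

    def to_json(self) -> str:
        self.validate()
        return json.dumps(
            {
                "domain": self.domain,
                "summary": self.summary,
                "errors": [
                    {
                        "start": e.start,
                        "end": e.end,
                        "type": e.error_type.value,
                        "category": e.error_type.category,
                        "correction": e.correction,
                    }
                    for e in self.errors
                ],
            },
            ensure_ascii=False,  # keep Korean text readable in the JSON file
            indent=2,
        )
```

For example, a spacing error in `"The consortiumbuilt 100,000 sets."` could be recorded as `SummaryError(4, 19, ErrorType.SPELLING_SPACING, "consortium built")`; `corrected_summary()` then yields `"The consortium built 100,000 sets."`. Storing offsets plus a replacement string, rather than only a corrected copy, keeps both the error location and the fix available as training signals.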
CEO O Ju-yang of Altovision Co., Ltd. said, “If abstractive summary error data is utilized, various new AI models can be created,” adding, “Altovision plans to collaborate with TeddySome Co., Ltd. in 2023 to develop a solution that automatically corrects sentence errors in newspaper articles using the data constructed this year.”
Meanwhile, Altovision Co., Ltd., founded in 2020, is a small and medium-sized enterprise specializing in building AI training data. It has carried out NIA data construction projects, service projects for Gangneung City, and projects for the National IT Industry Promotion Agency (NIPA).
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.

