AI specialist company B2N (CEO Taeil An) announced on the 4th that it has successfully completed the “2023 AI Training Data Construction Project” led by the Ministry of Science and ICT and promoted by the National Information Society Agency (NIA).
The “AI Training Data Construction Project” is a government-led initiative supporting the construction of large-scale, high-quality data at the national level in response to the emergence of ultra-large AI models represented by ChatGPT.
In this project, B2N served as the dedicated quality management company and quality management service provider for AI training data in three consortia: video summary data based on speech recognition, comic webtoon data, and building crack detection images (advanced). B2N also collaborated closely with AI-specialized companies such as Saltlux, PCN, and Timbel to establish a systematic quality management system.
In last year’s AI training data construction project, B2N conducted quality inspections on four types of AI training data totaling 660,000 cases. The breakdown is 630,000 images, 30,000 sub-labeling cases (ultra-large AI corpora, image captions), and 3,000 hours of audio. Drawing on the accuracy and stability of its AI training data quality management technology, B2N managed quality across a range of fields, including Korean language, disaster safety environment, and cultural tourism.
Notably, this project also included quality verification of high-quality corpus data that can be used for language models supporting the latest ultra-large AI technology. A full quality inspection was performed on the constructed corpus, which consists of 1.86 million sentences and 17.44 million tokens (word units).
Additionally, in line with project goals and requirements, B2N carried out overall quality management within the consortium. This included planning and executing quality management to maintain data quality, checking quality management activities at each stage, and providing dedicated support for quality verification by the Telecommunications Technology Association (TTA), thereby enhancing data reliability.
Park Soonhyuk, head of B2N’s AIX Group, stated, “Based on years of accumulated experience and expertise, our capabilities in executing AI training data construction projects have been recognized in many ways. As a result, inquiries from various institutions and companies planning to build multimodal data and generative AI data in the 2024 ultra-large AI data construction project are continuously coming in.”
He added, “This year, to build high-quality ultra-large AI data, we plan to expand quality management regarding content similarity, duplication, and harmfulness of large-scale corpus data. In addition to syntactic accuracy and statistical diversity checks using the existing ‘SDQ for AI,’ we will also support semantic accuracy checks through ‘Laflow,’ an integrated platform for AI training data.”
Meanwhile, B2N has participated in the AI training data construction project for four consecutive years, from 2020 to 2023, in various forms, such as providing its ‘SDQ for AI’ tool and quality verification services for tasks within consortia that lacked quality inspection tools.
© The Asia Business Daily (www.asiae.co.kr). All rights reserved.

