
Alice Group Releases Two Korean AI Educational Datasets on Hugging Face

Lowering Barriers to AI Research and Expanding Utilization

Alice Group announced on January 14 that it has released two Korean-language educational datasets on the global open-source platform Hugging Face.


The two datasets are the "Korean FineWeb-Edu Demo," designed to improve the Korean-language performance of large language models (LLMs) in academic and educational domains, and "Korean-webtext-edu," a dataset of educational Korean web text.


[Image: Demo of the Alice Group Korean FineWeb training dataset released on Hugging Face. Photo: Alice Group]

The "Korean FineWeb-Edu Demo" is a 5% sample of "korean-translated-fineweb-edu-dedup," a dataset of approximately 190 billion tokens translated into Korean from FineWeb-Edu, an English educational web-text corpus. Intended for training Korean LLMs in academic and educational domains, the sample is provided so that users can check the data's characteristics and usability before committing to large-scale training.


The "Korean-webtext-edu" dataset was built by selecting, from a large corpus of Korean web text, only the content that met an educational-value score threshold. The selected content was also evaluated for factuality, contextual consistency, and educational suitability, making the dataset suitable for training Korean AI models.


By making high-quality data for training Korean AI models broadly available to researchers, developers, and companies, Alice Group plans to help invigorate AI research and development both in Korea and abroad.


Soo-In Kim, CRO of Alice Group, stated, "We will continue to contribute to the advancement of Korean AI research and the growth of the industry ecosystem based on our technological capabilities that encompass data, models, and infrastructure."


© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
