Kakao to Prevent Outages with Full-System Redundancy: "Service Stabilization Is Top Priority"

Redundancy from Infrastructure Facilities to Services... Double and Triple Preparedness for Disasters and Accidents
460 Billion Won Investment in Ansan Data Center... Strengthening Dedicated Infrastructure Organization

[Asia Economy Reporters Yuri Choi and Minyoung Cha] Kakao, which suffered a service outage after a data center fire, has decided to make its entire system redundant, from infrastructure facilities to services, to prevent a recurrence. It plans to prepare for unexpected situations with double and triple layers of redundancy, including duplicated fault monitoring systems and replicated data.


Cause of Failure: 'Insufficient Redundancy'... Preventing Recurrence through Full System Redundancy

On the 7th, at its annual developer conference ‘If Kakao Dev 2022,’ Kakao disclosed the causes of the service failure and its measures to prevent a recurrence. The announcement came about two months after the outage caused by the October 15 fire at the SK C&C Pangyo Data Center.


Kakao cited insufficient redundancy of its data operation management tools and a lack of available resources as the causes of the service outage that followed the Pangyo data center fire. At the time, Kakao had server redundancy in place, with the Pangyo data center servers in active mode and servers at another data center in standby mode. However, the operation management tool, which holds the authority to switch the standby servers to active mode, was not itself redundant. As a result, the standby servers could not be activated during the emergency, which worsened the damage. The day before, the Ministry of Science and ICT had also pointed to insufficient redundancy measures as the cause of the failure.
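
The failure mode described above can be illustrated with a minimal sketch. It is not Kakao's actual system; the `Site` class, the `can_fail_over` function, and the site names other than Pangyo are hypothetical, and the only assumption modeled is that a switch to standby requires at least one reachable copy of the management tool.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A data center that is either reachable (up) or not. Hypothetical model."""
    name: str
    up: bool = True


def can_fail_over(tool_sites, standby):
    """Return True if standby servers can be switched to active mode.

    Assumption: the switch needs the standby site to be up AND at least
    one copy of the operation management tool to be reachable.
    """
    return standby.up and any(site.up for site in tool_sites)


pangyo = Site("Pangyo")
standby_dc = Site("standby-dc")  # hypothetical name
third_dc = Site("third-dc")      # hypothetical name

pangyo.up = False  # the fire takes the active site offline

# Management tool hosted only at the failed site: standby cannot be activated.
single_tool = can_fail_over([pangyo], standby_dc)

# Management tool copied to three sites: failover still works.
triplicated_tool = can_fail_over([pangyo, standby_dc, third_dc], standby_dc)

print(single_tool, triplicated_tool)
```

In this sketch, redundant servers alone do not help when the single component that controls the switch sits in the failed site.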


In response, Kakao plans to extend redundancy beyond the data operation management tools to the entire system, from infrastructure facilities such as data centers through data and platforms to applications and services. The operation management tool, identified as a major cause, will be triplicated to prepare for unexpected situations. The monitoring system will be duplicated, and data will be replicated multiple times so that service continues in the event of a failure. Kakao will also invest a total of 460 billion KRW to strengthen disaster prevention measures at the Ansan data center currently under construction, and it will establish emergency response plans and a data center disaster recovery (DR) system.


The infrastructure organization will also be strengthened. Under Woochan Ko, Chief Cloud Officer (CCO) of Kakao Enterprise and co-chair of the recurrence prevention task force of the Kakao Emergency Response Committee, Kakao plans to establish a dedicated IT engineering team and recruit talent.


Namgoong Hoon, former Kakao CEO and co-chair of the Emergency Response Committee’s recurrence prevention task force, emphasized, "Even now, we will reflect and improve, and do our best to prevent such accidents in the future," adding, "Service stabilization is the top priority, and we will always keep in mind that it is a social responsibility."


Most Government Correction Requests Included... Compensation Plan Remains a Major Challenge

Kakao’s recurrence prevention measures include most of the corrective actions requested by the government, but the compensation plan is still far from complete. On the 6th, the Ministry of Science and ICT announced the results of its investigation into the Kakao failure and demanded corrective actions on redundancy. It asked Kakao to prepare higher levels of distribution and redundancy for KakaoTalk’s message sending and receiving and authentication functions. It also urged Kakao to conduct disaster preparedness drills and establish recovery plans for various failure scenarios, and asked for a compensation plan for user damages.


The Kakao compensation negotiation committee has so far held its second meeting and shared damage cases by type. It plans to analyze these cases and set principles for compensation standards and amounts. However, challenges remain, such as the lack of compensation regulations and precedents for free services and the need to verify actual damages. A Kakao official said, "If there are areas to reinforce among the government’s announced corrective demands, we will actively review and reflect them."


The Ministry of Science and ICT also urged SK C&C, the data center operator, and Naver, which also experienced service failures, to establish improvement plans. It asked SK C&C to strengthen management of its battery monitoring system and install a variety of fire detection systems. It also asked the company to install fire suppression equipment suited to lithium-ion battery fires or, if that is not possible, to prepare alternative measures, and demanded that it develop disaster response scenarios, draw up detailed training plans, and report training results. For Naver, the ministry requested a review and improvement of per-service recovery targets and recovery plans for failure scenarios, as well as drills that assume situations such as the total destruction of the main data center.


Companies expressed their willingness to cooperate actively. Representatives from SK C&C and Naver said, "We will refer to the government’s corrective demands and continue to do our best to operate stable services without interruption."


© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
