Chaos Engineers: Preventing IT Network Emergencies
Applying Load to Infrastructure and Systems to Prepare for Worst-Case Scenarios
Years ago, a doomsday scenario known as the 'Millennium Bug' gained popularity. The theory held that on January 1, 2000, the start of the new millennium, servers worldwide would fail to recognize the date, causing the internet to collapse.
The Millennium Bug ended up as just another absurd doomsday panic, but ironically, the internet is now more vulnerable than ever. Because everything is connected to public clouds, even a minor error in a central server can bring down systems worldwide. Big tech companies that rely on internet services have even hired experts to prevent the 'end of the internet.'
Experts Preventing Internet Outages
On July 19, 2024, numerous computer systems worldwide connected to Microsoft's (MS) Azure cloud stopped working at the same time. The cause of this IT crisis was CrowdStrike, a cybersecurity company. A minor error in one of its routine updates left computers across these networks displaying blue screens all at once.
United Airlines service suspended due to Microsoft cloud outage in July [Image source=Reuters Yonhap News]
Today, with the widespread use of public clouds, everything from airport ticket reservations to hospitals and government agencies stores information and provides services through the cloud. The convenience of the cloud is beyond words, but there is also the risk that a single minor bug can instantly paralyze computer networks worldwide.
For this reason, the IT industry, whose revenue depends heavily on internet services, invests heavily in 'chaos engineering.' As the name suggests, these experts anticipate and respond to the 'chaos' that unexpected bugs can cause in computer networks.
Starting from Apple's 'Monkey' in the 1980s
Apple is credited with the earliest form of chaos engineering. Around 1983, as Apple was preparing the Macintosh computer, senior programmer Steve Capps created a program called 'Monkey' that deliberately stressed the machine. Monkey fed the computer a stream of random keyboard and mouse input to provoke errors on purpose, which the team then detected and fixed to stabilize the product. Since then, IT companies have continued to invest in this kind of system stabilization work, which is now called chaos engineering.
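The idea behind Monkey, feeding a program random input until something breaks, can be illustrated with a minimal Python sketch. This is not Apple's code: `BuggyEditor`, `FixedEditor`, and `run_monkey` are hypothetical names invented for this example, and the "application" is a toy text editor with a deliberately planted bug.

```python
import random


class BuggyEditor:
    """Hypothetical application under test: a toy text buffer with a cursor."""

    def __init__(self):
        self.text = []
        self.cursor = 0

    def key(self, ch):
        self.text.insert(self.cursor, ch)
        self.cursor += 1

    def click(self, pos):
        # Clamp the click position into the valid cursor range.
        self.cursor = max(0, min(pos, len(self.text)))

    def backspace(self):
        # Planted bug: no check for the cursor already being at position 0.
        del self.text[self.cursor - 1]
        self.cursor -= 1


class FixedEditor(BuggyEditor):
    def backspace(self):
        if self.cursor > 0:
            del self.text[self.cursor - 1]
            self.cursor -= 1


def run_monkey(app_factory, steps, seed):
    """Send random input events; return a failure report, or None if none occurred."""
    rng = random.Random(seed)  # seeded so that a failing run can be replayed
    app = app_factory()
    log = []
    for step in range(steps):
        action = rng.choice(["key", "click", "backspace"])
        arg = {"key": rng.choice("abc"), "click": rng.randint(-5, 20)}.get(action)
        log.append((action, arg))
        try:
            if action == "backspace":
                app.backspace()
            else:
                getattr(app, action)(arg)
            # Invariant: the cursor must always stay inside the text.
            if not 0 <= app.cursor <= len(app.text):
                raise RuntimeError("cursor out of range")
        except Exception as exc:
            return {"failed_at": step, "error": repr(exc), "log": log}
    return None
```

Calling `run_monkey(BuggyEditor, 200, seed)` over a handful of seeds quickly exposes the planted bug, and the returned `log` records the exact input sequence that triggered it, so the failure can be reproduced and fixed. The same runs against `FixedEditor` report nothing.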
Today, Netflix, an OTT service provider, is one of the companies that takes chaos engineering most seriously. Netflix's high-definition video streaming runs on a cloud-based system, so the company must anticipate and prevent any situation that could cause server failure. In fact, Netflix experienced its worst incident in 2008, when a major database corruption halted its DVD shipments for three days.
Since then, Netflix has hired many chaos engineers, and its chaos team notably developed a tool called 'Chaos Monkey,' which it later released as open source. Chaos Monkey deliberately causes random failures in server infrastructure. Through this kind of large-scale stress testing, engineers periodically gauge how much load and failure corporate systems and infrastructure can withstand, and find and close vulnerabilities in advance.
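The principle of injecting random failures can be sketched in a few lines of Python. This is a simplified simulation under stated assumptions, not Netflix's Chaos Monkey: `Cluster`, `chaos_round`, and `run_experiment` are hypothetical names, the "instances" are entries in an in-memory dictionary, and `heal` stands in for whatever mechanism replaces failed servers.

```python
import random


class Cluster:
    """Hypothetical in-memory model of a service spread across several instances."""

    def __init__(self, size, min_healthy):
        self.instances = {f"i-{n}": True for n in range(size)}
        self.min_healthy = min_healthy  # instances needed to keep serving users

    def healthy(self):
        return [name for name, up in self.instances.items() if up]

    def terminate(self, name):
        self.instances[name] = False

    def heal(self):
        # Stand-in for an auto-scaler that replaces terminated instances.
        for name in self.instances:
            self.instances[name] = True

    def serving(self):
        return len(self.healthy()) >= self.min_healthy


def chaos_round(cluster, rng, kills):
    """Terminate random healthy instances, then check whether the service survived."""
    candidates = cluster.healthy()
    victims = rng.sample(candidates, k=min(kills, len(candidates)))
    for name in victims:
        cluster.terminate(name)
    survived = cluster.serving()
    cluster.heal()
    return victims, survived


def run_experiment(size, min_healthy, kills, rounds, seed=0):
    """Return how many rounds left the service unable to serve."""
    rng = random.Random(seed)
    cluster = Cluster(size, min_healthy)
    return sum(1 for _ in range(rounds) if not chaos_round(cluster, rng, kills)[1])
```

In this toy model, `run_experiment(size=5, min_healthy=3, kills=2, rounds=50)` reports zero outages because two spare instances absorb the failures, while `run_experiment(size=3, min_healthy=3, kills=1, rounds=50)` fails every round. Surfacing that kind of missing redundancy before a real outage does is the point of the exercise.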
Banks whose computer networks handle transactions worth tens of millions of won at a time, hospitals where patients' lives depend on their systems, and public institutions also invest in 'chaos management.' These experts usually go unnoticed, and the value of their work is hard to see, but they work day and night to prevent worst-case scenarios that would cause astronomical damage if they occurred.
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
![People Preventing the 'Internet Apocalypse' [New Jobs]](https://cphoto.asiae.co.kr/listimglink/1/2024101813364464261_1729226203.jpg)

