Attempting to Predict Disease Outbreaks Using 'Flu' Search Volume
The Discontinuation of the Once-Innovative 'Google Flu Trends'
The Lesson: "Correlation Does Not Imply Causation"
I type on the keyboard while staring at the monitor.
'Flu symptoms'
If someone searched for this, there is a high probability that the person has caught the flu.
And if, all of a sudden, the search volume for keywords like 'flu', 'flu symptoms', and 'flu signs' increases in Busan, it can be interpreted as an outbreak of flu in that area.
Google engineers came up with this simple yet ingenious idea. They quickly turned it into a service. This led to the launch of 'Google Flu Trends' in 2008.
At the time, it was a fresh idea. While traditional health authorities were busy collecting and analyzing hospital reports, Google was already predicting flu outbreaks using people's search data.
Google Flu Trends: Predicting Disease Outbreaks with Search Queries
Google Flu Trends (GFT) was a system that analyzed flu-related search queries entered by users to estimate flu activity in near real time.
For example, if there was a sudden increase in searches for keywords like 'cough', 'fever', and 'flu symptoms' in a particular region, the system would predict that a flu outbreak was starting in that area. This allowed detection of flu outbreaks one to two weeks earlier than the traditional reporting systems used by health authorities.
From weekly counts of about 50 million search queries, the 45 flu-related keywords that best matched flu data from the US Centers for Disease Control and Prevention (CDC) were selected and used to train the model.
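The training step described above can be sketched in a few lines. This is a minimal illustration, assuming a single straight-line fit between the log-odds of the flu-related query share and the log-odds of the flu-related doctor-visit share, which is roughly how the model was described publicly. The weekly numbers and the names `fit_line` and `estimate_ili` are invented for this example and are not Google's or the CDC's.

```python
import math

def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Made-up weekly values for illustration only (not real search or CDC data).
# query_fraction: share of all searches that used the selected flu keywords.
# ili_fraction: share of doctor visits reported as flu-like illness.
query_fraction = [0.0010, 0.0014, 0.0021, 0.0030, 0.0042, 0.0035, 0.0022]
ili_fraction = [0.012, 0.016, 0.023, 0.031, 0.044, 0.036, 0.024]

# Fit the line on log-odds of both quantities.
b0, b1 = fit_line([logit(q) for q in query_fraction],
                  [logit(p) for p in ili_fraction])

def estimate_ili(q):
    """Estimate the flu-like-illness share from this week's query share."""
    return inv_logit(b0 + b1 * logit(q))

print(f"slope={b1:.2f}, estimate for q=0.0030: {estimate_ili(0.0030):.3f}")
```

Once fitted, such a model needs only the current week's search share to produce an estimate, which is why it could run ahead of hospital reporting. It also means that anything that moves search volume moves the estimate.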
Google Flu Trends drew public attention immediately after its launch. In its early days it detected flu outbreaks quickly, its predictions closely matched the actual flu outbreak reports from the CDC, and it was cited as a prime example of the power of big data.
"It's Nonsense" - Complaints Begin to Surface
As time passed, complaints began to emerge. Criticisms such as "It's nonsense" and "It exaggerates too much" started to appear. In one season, the system overestimated the scale of the winter flu outbreak by almost a factor of two. In 2013, Google repeatedly retrained and fine-tuned the model, but prediction accuracy did not improve significantly. Eventually, in 2015, Google discontinued the service.
Why did this happen? First, the model overlooked the 'media effect.' The media are always searching for news. If anything even slightly unusual occurs, they pay attention, track it, and report on it. Flu is no exception. If even a minor flu outbreak is detected in a community, the media turn it into news stories, and some overly eager outlets run alarming headlines.
Similar news reports about the flu suddenly increase. As a result, people search for more flu-related news.
Questions like "Did I catch it too?", "What are the symptoms?", and "How can I prevent it?" naturally arise, and people search for answers. However, there is a problem here. This surge in search volume does not necessarily mean that the number of actual flu patients is increasing.
There was certainly a correlation between increased search queries and increased flu cases. However, correlation does not imply causation. Just because more people are searching for flu symptoms does not mean more people are actually catching the flu. The reasons for increased searches can vary, including actual symptom experience, media reports, or influence from people around them. Google Flu Trends failed to account for this amplification effect caused by the media.
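The amplification effect can be made concrete with a toy calculation. The sketch below assumes a model trained in a quiet period, when searches come only from sick people, and then applied in a week when news coverage adds searches from healthy but worried people. All numbers, including the ratio of five searches per case and the variable `media_driven_searches`, are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Hypothetical training period: every flu case produces five searches,
# and nobody else searches for flu terms.
train_cases = [100, 150, 220, 300, 260, 180]
train_searches = [5 * c for c in train_cases]
b0, b1 = fit_line(train_searches, train_cases)

# Hypothetical later week: the same 300 real cases, but heavy news
# coverage adds searches from people who are not sick.
true_cases = 300
media_driven_searches = 1400
observed_searches = 5 * true_cases + media_driven_searches

# The model only sees total search volume, so it cannot tell the
# two kinds of searches apart.
predicted_cases = b0 + b1 * observed_searches  # roughly 580

print(f"true={true_cases}, predicted={predicted_cases:.0f}")
```

The correlation learned in the quiet period was real, but it held only while sick people were the sole source of searches. Once a second cause of searching appears, the same search volume no longer corresponds to the same number of patients.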
Seasonality is another complication. During transitional seasons or winter, when immunity tends to drop, various seasonal illnesses increase along with the flu. The initial symptoms are usually similar: fever, chills, cough, fatigue, and so on. In other words, people search for similar keywords about a variety of illnesses, which makes it difficult to isolate the signals that are unique to the flu.
Lessons from the Failure of Google Flu Trends
A large amount of data is important. At the same time, the ability to properly process, refine, and connect large amounts of data is also crucial. (Photo: Pixabay)
Although the Google Flu Trends service was discontinued, it left behind valuable lessons. It provided several important insights not only for big data but also for the field of AI.
After the failure of Google Flu Trends, in 2015, a research team at Harvard University analyzed the problems of the original model and developed a new model called 'ARGO' (AutoRegression with GOogle search). This model used a more sophisticated approach, taking into account the dynamic nature of people's search behaviors and the seasonality of diseases. As a result, it produced predictions that closely matched the actual figures reported by the CDC. This is a typical example of learning from failure and developing better methods.
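The core idea behind the name, autoregression combined with search data, can be sketched as a regression that uses last week's official figure alongside this week's search volume. This is a deliberately small stand-in and not the Harvard team's implementation: the published model is considerably richer, and here a single lag, a single search series, the invented weekly values, and the helper names `solve`, `least_squares`, and `sse` are all assumptions made for illustration.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def least_squares(rows, ys):
    """Fit ys ~ rows . coef via the normal equations (adequate for a tiny example)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def sse(rows, ys, coef):
    """Sum of squared errors of the fitted model on the given rows."""
    return sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, y in zip(rows, ys))

# Invented weekly values: official flu-like-illness rate and a search index.
# The search index is inflated in the two peak weeks to mimic a news-driven spike.
ili = [1.0, 1.2, 1.5, 2.1, 3.0, 3.8, 3.5, 2.6, 1.9, 1.4, 1.1, 1.0]
search = [5.1, 6.0, 7.4, 10.8, 19.5, 24.0, 17.2, 13.1, 9.4, 7.1, 5.4, 5.0]

weeks = range(1, len(ili))
targets = [ili[t] for t in weeks]

# Search-only model: intercept + this week's search index.
search_rows = [[1.0, search[t]] for t in weeks]
# Autoregressive model: intercept + last week's official rate + search index.
argo_rows = [[1.0, ili[t - 1], search[t]] for t in weeks]

coef_search = least_squares(search_rows, targets)
coef_argo = least_squares(argo_rows, targets)
sse_search = sse(search_rows, targets, coef_search)
sse_argo = sse(argo_rows, targets, coef_argo)

print(f"search-only SSE={sse_search:.3f}, with lagged official data SSE={sse_argo:.3f}")
```

Adding a predictor can never increase the in-sample error of a least-squares fit, so the comparison printed here is not evidence of better forecasting on its own. The point of the sketch is the structure: the lagged official figure anchors the estimate, so a spike in searches alone has less room to pull the prediction away from reality.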
The episode is also a lasting reminder that 'the method of analysis is just as important as the data itself.' No matter how much big data you have, if your analysis method is inadequate, you cannot derive valuable results. Google had an enormous amount of search data, but made mistakes in interpreting and applying it.
Diversifying data sources is also important. Relying on a single data source has its limitations. One reason the ARGO model was able to achieve more accurate predictions was that it utilized a variety of information in addition to search data. It is said that combining electronic health record (EHR) information can further improve prediction accuracy.
The failure of Google Flu Trends demonstrated the limitations of AI and big data, but at the same time, it also showed the potential for further improvement.
It reaffirmed that data and algorithms alone are not enough; it is necessary to accurately understand what the data means and to verify it from various perspectives. When developing AI systems, it is important to remember that it is not just about the technical aspects, but also about considering the social and cultural context in which the data is generated.
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.

