The Most Accurate Free AI Revealed... ChatGPT Ranks Second, Who Is No.1?

WP and US Public and University Librarians Test AI Tools
Google “AI Mode” Ranks First Among Nine Major Platforms
“Many AI Tools Still Fail to Provide Accurate Answers”

An analysis has found that Google’s “AI Mode” gives the most accurate answers among free artificial intelligence (AI) search tools. On the 27th (local time), Yonhap News, citing The Washington Post (WP), reported that “in an AI search tool test conducted by WP together with librarians from public and university libraries in the US, Google’s ‘AI Mode’ delivered the most accurate responses.”

In May, Liz Reid, Vice President in charge of Google Search, gave a speech at the I/O event. Photo by AP News

The test evaluated nine major AI tools, including Google AI Mode, AI Overview, ChatGPT (OpenAI), Claude (Anthropic), Meta AI, Grok (xAI), Perplexity, and Bing Copilot (Microsoft). ChatGPT was tested with both the GPT-5 and GPT-4 Turbo models. Google AI Mode searches the web in depth and answers by synthesizing information from multiple sources, while AI Overview uses AI to summarize search results.

The evaluators posed 30 challenging questions and scored 900 responses generated by the AI tools. Every tool was tested only in its free basic version, as available in July and August, and the questions covered five categories: quizzes, expert resource searches, recent events, inherent bias, and image recognition.

Google AI Mode received the highest score, 60.2 out of 100 points. ChatGPT running GPT-5 came in second with 55.1 points, and Perplexity ranked third with 51.3 points. Elon Musk’s Grok 3 ranked eighth (40.1 points), and Meta AI ranked ninth with the lowest score (33.7 points). Grok’s latest model, Grok 4, was not included in the test because no free version is available.

Google AI Mode provided the most accurate answers in the quiz and recent events categories. Bing Copilot scored highest in expert resource searches, and Perplexity led in image recognition. GPT-4 Turbo delivered the least biased answers. Although GPT-5 improved on overall performance and ranked second, it scored lower than GPT-4 Turbo in some areas.

However, AI tools still struggle to judge whether information is up to date and how reliable its sources are, and they sometimes give incorrect answers with confidence, a phenomenon known as “hallucination.” WP emphasized, “This test deliberately targeted the weaknesses of AI, but it also revealed that AI still fails to answer a significant number of everyday questions accurately,” adding, “Ultimately, the lesson is that instead of trusting AI responses at face value, users must verify sources, check for up-to-date information, and apply critical thinking.”