
[Inside Chodong] CSAT Difficulty: AI Cannot Be the Solution

Using AI to Generate English Passages
Ambiguous Accountability and Security Risks

"Adjusting the difficulty level of the College Scholastic Ability Test (CSAT) is 'the realm of God.'"


Was this remark, perhaps, an attempt to lighten the burden of hitting an "appropriate level of difficulty"? Lee Geunho, acting president of the Korea Institute for Curriculum and Evaluation (KICE), which oversees the creation of CSAT questions, came under heavy fire for giving this answer last month when asked, "Is it really that hard to control the difficulty level?" during a policy briefing to the National Research Council for Economics, Humanities and Social Sciences under the Prime Minister's Office. He presumably meant that adjusting the CSAT's difficulty is an intractable problem, but given that 8 of the 12 former KICE presidents have resigned mid-term over CSAT-related issues, "the realm of God" can only sound like an attempt to sidestep responsibility. Hence the criticism that "blunders that are completely unacceptable must not be dismissed as belonging to the realm of God." What the education authorities must do is not invoke God but analyze the causes of failure and improve the system.


This year's CSAT English section was flawed at every stage, from the vetting of item writers to question creation and review. In an effort to ensure fairness after the "killer question" controversy, the authorities selected item writers and reviewers at random, but this selection failed to sufficiently reflect their track records in item writing and authorship, weakening expertise. Teachers accounted for 33% of the English item writers, lower than the 45% in other sections. In other words, the group that best understands students' actual academic levels was relatively underrepresented. As a result, while 1 question in Korean and 4 questions in mathematics were changed, 19 of the 45 English questions were replaced. With questions still changing right up until the CSAT, it is hard to believe that difficulty checks were carried out properly.


The Ministry of Education has presented follow-up measures such as strengthening the verification of item writers' expertise, increasing the proportion of teachers to 50%, and establishing an "Educational Assessment Item Support Center." Among these, what stands out is the plan to use artificial intelligence (AI) to generate English reading passages. It is expected that, by predicting difficulty based on vast amounts of data, the extreme swings in difficulty between "impossibly hard tests" and "overly easy tests" can be reduced. This could also ease the burden on item writers.


That does not mean, however, that AI is a cure-all. Technology is only a supplementary tool and comes with its own risks. A prime example is "AI's lies," or hallucinations, in which it fabricates non-existent facts in a plausible way. Even if AI produces grammatically polished sentences, there is a possibility that it will generate passages with incorrect facts. One also cannot rule out the possibility of mechanical passages that diverge from the natural flow of human thought. This means the review process may need to become even more rigorous than it is now.


Security is another challenge. Every year, CSAT item writers stay in seclusion for more than a month, completely cut off from the outside world as they create the questions. It must be examined whether such ironclad security can be maintained even when AI is used to generate English passages, and whether building separate servers alone will be sufficient. As technology evolves, so do hacking methods. Unexpected incidents can occur at any time. It is also necessary to check whether there is enough time until the pilot operation of the mock CSAT for the 2028 academic year in June and September.


What test-takers are most concerned about, however, is the frequent policy changes. On college admission online communities, there are many posts asking, "To what extent do they intend to experiment on those born in 2009?" This is because, on top of the shift to the integrated CSAT, yet another variable is being added.


While difficulty is being left to God and solutions are being sought from AI, who bears responsibility is becoming blurred. If a problem arises in a passage generated by AI, who will be held accountable? Will it be KICE, which operates under the Prime Minister's Office, or the Ministry of Education, which bears ultimate responsibility for education in the Republic of Korea? Before invoking God and AI over difficulty levels, it must first be made clear who will remain responsible to the very end. The CSAT is, in the end, an exam designed by people, and people must answer for it. You cannot demand a mid-term resignation from AI, can you?


© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
