AI Search Engines Falter on News Accuracy, Columbia Study Reveals

A groundbreaking study from Columbia Journalism Review’s Tow Center has uncovered troubling flaws in AI-powered search engines, particularly when it comes to news retrieval. Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar tested eight cutting-edge AI search tools and found they incorrectly answered over 60 percent of queries related to news sources – raising profound questions about their reliability as information gatekeepers.
Rigorous Testing Exposes Fundamental Flaws
The researchers employed a straightforward but revelatory methodology. They selected excerpts from published news articles and asked each AI model to identify basic information: the article’s headline, original publisher, publication date, and URL. This process was repeated across eight different generative search platforms, resulting in 1,600 total queries that put these tools’ accuracy to the test.
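The evaluation harness itself isn't published alongside this summary, but the setup can be pictured as a simple scoring loop over excerpt/answer pairs. The Python sketch below is a hypothetical illustration only: the Article fields mirror the four attributes the researchers asked for, while query_platform and the exact-match scoring rule are assumptions, not the Tow Center's actual code.

```python
from dataclasses import dataclass

@dataclass
class Article:
    excerpt: str    # the passage shown to the AI search tool
    headline: str   # ground-truth values the tool is asked to recover
    publisher: str
    date: str
    url: str

def score_response(article: Article, response: dict) -> dict:
    """Compare a platform's claimed headline/publisher/date/URL to the ground truth."""
    return {
        field: response.get(field, "").strip().lower() == getattr(article, field).strip().lower()
        for field in ("headline", "publisher", "date", "url")
    }

def run_evaluation(articles, platforms, query_platform):
    # 1,600 queries works out to 8 platforms x 200 excerpts, per the figures above.
    results = []
    for platform in platforms:
        for article in articles:
            answer = query_platform(platform, article.excerpt)  # assumed helper
            results.append((platform, article.url, score_response(article, answer)))
    return results
```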

With approximately one in four Americans now turning to AI models as alternatives to traditional search engines, according to the Columbia Journalism Review report, these findings raise serious concerns about the information landscape.
Performance Varies Widely, But Problems Persist Across All Platforms
The study revealed striking variations in accuracy among the platforms tested. Perplexity emerged as the relative frontrunner, though it still provided incorrect information for 37 percent of queries. OpenAI’s ChatGPT Search fared significantly worse, incorrectly identifying 67 percent of articles queried. Most alarming was Grok 3, which demonstrated a staggering 94 percent error rate.
AI’s Dangerous Tendency to Fabricate Rather Than Admit Ignorance
Perhaps most concerning was the prevalence of what researchers termed “confabulations” – instances where AI models generated plausible-sounding but entirely fabricated answers rather than admitting knowledge gaps.
This pattern appeared consistently across all tested models. Instead of declining to respond when they lacked reliable information, the systems produced confident-sounding but incorrect or speculative answers that users might easily mistake for fact.
Publisher Controls Bypassed, Original Sources Obscured
The study uncovered evidence suggesting some AI tools ignored Robots Exclusion Protocol settings – the robots.txt files publishers use to control automated access to their content. For instance, Perplexity’s free version correctly identified all ten excerpts from paywalled National Geographic content, despite National Geographic explicitly blocking Perplexity’s web crawlers, as reported by Ars Technica.
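The Robots Exclusion Protocol is nothing more than a plain-text robots.txt file that well-behaved crawlers are expected to consult before fetching a page. As a rough illustration of the publisher-side control the study says was bypassed, the snippet below uses Python’s standard urllib.robotparser to ask whether a given user agent may fetch a URL; the example.com domain and the specific user-agent strings are placeholders for illustration, not a claim about any site’s actual rules.

```python
from urllib import robotparser

# Parse a site's robots.txt (example.com is a placeholder, not a real publisher).
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# A compliant crawler asks before fetching; the study suggests some tools skip this step.
for agent in ("PerplexityBot", "GPTBot", "*"):
    allowed = rp.can_fetch(agent, "https://example.com/premium/article")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```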
Even when these AI search tools did cite sources, they frequently directed users to syndicated versions on platforms like Yahoo News rather than original publisher websites. This practice occurred even when publishers had established formal licensing agreements with AI companies – undermining news organizations’ audience relationships and revenue models.
Dead-End Links Compound the Problem
URL fabrication emerged as another significant issue. More than half of the citations from Google’s Gemini and Grok 3 led to fabricated or broken URLs resulting in error pages. Of 200 citations tested from Grok 3, 154 resulted in broken links – leaving users at frustrating dead ends when attempting to verify information.
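Checking whether a cited URL actually resolves is a mechanical task, which makes the volume of dead links notable. The sketch below is a minimal link check using only the Python standard library; it treats any HTTP error or network failure as a broken citation and is an illustration of the idea, not the researchers’ method, with the citation list as a placeholder.

```python
import urllib.request
import urllib.error

def is_live(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL resolves to a non-error HTTP response."""
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "link-checker/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.URLError, TimeoutError):
        return False

citations = ["https://example.com/some-cited-article"]  # placeholder list
broken = [u for u in citations if not is_live(u)]
print(f"{len(broken)} of {len(citations)} citations failed to resolve")
```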

Publishers Face an Impossible Dilemma
These findings place news publishers in a difficult position. Blocking AI crawlers might prevent any attribution whatsoever, while permitting them allows widespread content reuse without driving valuable traffic back to publishers’ websites. This tension threatens both journalistic sustainability and information integrity.
Industry Insiders Express Concern, But Remain Hopeful
Mark Howard, chief operating officer at Time magazine, expressed concerns to CJR about ensuring transparency and control over how Time’s content appears in AI-generated searches. Despite these issues, Howard sees potential for improvement, stating, “Today is the worst that the product will ever be,” citing significant ongoing investments and engineering efforts.
However, Howard also placed some responsibility on users: “If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them.” This highlights the tension between AI developers’ responsibilities and the need for public digital literacy.
Tech Giants Acknowledge But Don’t Address Findings
OpenAI and Microsoft provided statements to CJR acknowledging receipt of the findings but offered limited specific responses. OpenAI referenced its commitment to supporting publishers by driving traffic through summaries, quotes, clear links, and attribution. Microsoft stated that it honors the Robots Exclusion Protocol and publisher directives.

Paying More Doesn’t Guarantee Better Accuracy
Surprisingly, premium paid versions of these AI search tools sometimes performed worse than their free counterparts. Perplexity Pro ($20/month) and Grok 3’s premium service ($40/month) delivered incorrect responses more confidently than free versions. While premium models correctly answered more prompts overall, their reluctance to decline uncertain responses drove higher error rates.
Building on Previous Research
This report builds on previous findings published by the Tow Center in November 2024, which identified similar accuracy problems in how ChatGPT handled news content. The full detailed report is available on the Columbia Journalism Review’s website.
Charting a Path Forward
Addressing these fundamental accuracy challenges is crucial for the future of AI in news search. Key areas requiring attention include improved training data with better-curated datasets emphasizing factual accuracy; advanced algorithms for better contextual understanding and source verification; robust fact-checking through automated verification systems to prevent misinformation; greater transparency from AI companies about model limitations and error rates; and industry-wide standards to ensure more responsible AI development.
The journey toward reliable AI-powered news search requires collaboration between technology developers, publishers, researchers, and policymakers. By acknowledging current limitations and working together, we can harness AI’s potential to enhance rather than undermine the quality of public information.