OpenAI recently disclosed that it used the popular subreddit r/ChangeMyView as a benchmark for evaluating the persuasive capabilities of its AI reasoning models. The disclosure came in a system card, a document outlining how an AI system works, that accompanied the release of its new “reasoning” model, o3-mini, on Friday.

The subreddit r/ChangeMyView is a forum where millions of Reddit users post their viewpoints and invite others to challenge them with counterarguments. The forum is a goldmine of high-quality, human-generated data, which makes it valuable to tech companies like OpenAI that train AI models. Its use as a persuasion benchmark is consistent with findings that AI agents were as persuasive as humans in terms of overall persuasion outcomes.

OpenAI feeds user posts from r/ChangeMyView into its AI models and prompts them, in a controlled environment, to write replies intended to change the original poster's mind. Human testers then rate these AI-generated responses for persuasiveness, and the ratings are compared against those of human replies to the same posts. Findings on AI persuasion indicate that, while AI was less effective than humans at shaping behavioral intentions, it did not differ significantly from humans in eliciting perceptions, attitudes, or actual behaviors. This suggests AI’s growing proficiency in understanding and responding to human psychology.

The ChatGPT-maker has a content-licensing deal with Reddit that allows it to train on Reddit posts and display them within its products. We don’t know the specifics of OpenAI’s financial arrangement with Reddit, but Google reportedly pays Reddit $60 million annually under a similar agreement. However, OpenAI clarified to TechCrunch that the ChangeMyView-based evaluation is separate from its Reddit deal. It’s unclear exactly how OpenAI accessed the subreddit’s data, and the company also stated it has no plans to release this evaluation publicly.

OpenAI’s use of the ChangeMyView benchmark, which it previously used to evaluate o1 as well, underscores how important human data is for AI model development. It also sheds light on the often murky methods tech companies use to acquire datasets, and it raises questions about the ethics of AI persuasion. As noted by Datafloq, “AI’s ability to personalize messages at scale represents a fundamental shift in how influence operates in the digital age.” The stakes of that shift grow with the industry itself: the artificial intelligence market was valued at USD 233.46 billion in 2024.

Reddit has publicly addressed the issue of AI companies scraping its site without compensation. Reddit CEO Steve Huffman told The Verge last year that Microsoft, Anthropic, and Perplexity refused to negotiate with him, calling it “a real pain in the ass to block these companies.”

OpenAI has faced multiple lawsuits alleging unauthorized scraping of websites, including The New York Times, to enhance ChatGPT and its underlying models. These disputes over data sit alongside a separate concern about what the resulting models can do: AI-generated content can be used to create highly persuasive propaganda, potentially influencing election outcomes and eroding public trust in democratic institutions.

In terms of performance, o3-mini does not show a significant difference compared to o1 or GPT-4o on the ChangeMyView benchmark. However, OpenAI’s latest models are more persuasive than most human participants on the subreddit. “GPT-4o, o3-mini, and o1 all demonstrate strong persuasive argumentation abilities, within the top 80-90th percentile of humans,” OpenAI stated in o3-mini’s system card. “Currently, we do not witness models performing far better than humans, or clear superhuman performance.”
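To make the percentile claim concrete, here is a minimal sketch of how a single model reply could be ranked against human replies to the same post. It is not OpenAI's evaluation code, which the company says it will not release; the function name `percentile_rank`, the 1-10 rating scale, and all scores below are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(model_score: float, human_scores: list[float]) -> float:
    """Return the percentage of human scores the model score beats.

    Ties count as half a win, a common convention for percentile ranks.
    """
    if not human_scores:
        raise ValueError("need at least one human score")
    ordered = sorted(human_scores)
    below = bisect_left(ordered, model_score)            # strictly lower scores
    ties = bisect_right(ordered, model_score) - below    # equal scores
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical persuasiveness ratings (1-10) for ten human replies to one post.
human_scores = [3, 4, 4, 5, 5, 6, 6, 7, 8, 9]
# Hypothetical rating given to the model's reply to the same post.
model_score = 8

print(percentile_rank(model_score, human_scores))  # prints 85.0
```

With these made-up numbers the model reply lands at the 85th percentile, inside the range OpenAI reports. A real evaluation would presumably aggregate over many posts and raters rather than rely on a single comparison.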

OpenAI’s stated objective is not to build highly persuasive AI models but to keep its models from becoming too persuasive. Reasoning models have shown a concerning ability for persuasion and deception, so OpenAI has implemented new evaluations and safeguards. The concern is that a highly persuasive AI could act on its own agenda or that of whoever controls it, posing significant risks. Moreover, AI-powered phishing campaigns can be highly personalized and convincing, using machine learning to exploit individual vulnerabilities.

Despite extensive data scraping and licensing efforts, the ChangeMyView benchmark highlights the ongoing challenges AI developers face in finding high-quality datasets to test their models effectively. With the Asia-Pacific artificial intelligence market alone worth USD 47.44 billion in 2023, the ethical implications of persuasive AI become even more pressing. “The rapid rise of LLMs has created new disruptive possibilities for persuasive communication, posing profound ethical and societal risks, including the spread of misinformation, the magnification of biases, and the invasion of privacy,” states the author of a research paper on arXiv.

OpenAI Uses Reddit to Test AI Persuasion
