OpenAI's Operator Agent Shows Promise and Limitations in Online Grocery Shopping Tests

Artificial intelligence is rapidly evolving beyond simply answering questions to actively performing tasks. A sophisticated new category of AI, known as “agents,” is emerging with capabilities to execute complex operations online and, potentially soon, in the physical world around us.
Introduction: The Rise of AI Agents – More Than Just Chatbots
The ‘Digital Ghost’ Experience: Witnessing AI in Action
Picture your computer completing tasks autonomously, with no human hand touching the keyboard or mouse. This uncanny “digital ghost” experience is becoming an everyday reality through the remarkable evolution of AI agents.
One tech journalist captures the experience with striking clarity: “I’m watching artificial intelligence order my groceries. Armed with my shopping list, it types each item into the search bar of a supermarket website, then uses its cursor to click. Watching what appears to be a digital ghost do this usually mundane task is strangely transfixing.” This isn’t preprogrammed automation following rigid scripts; it’s an AI making dynamic decisions and navigating the complex web environment in real time. The sense of witnessing something revolutionary is palpable, even prompting the author’s skeptical husband to wonder, “Are you sure it’s not just a person in India?”

Beyond Chat: AI Agents Taking Action
The fundamental distinction between traditional chatbots and these new AI agents lies in their capacity for action. Margaret Mitchell, chief ethics scientist at AI company Hugging Face, puts it succinctly: “As soon as something is starting to execute actions outside of the chat window, then it’s gone from being a chatbot to an agent.”
These agents can interact with software platforms and digital services, autonomously executing tasks that previously demanded direct human involvement. OpenAI’s Operator, available to ChatGPT Pro subscribers, lets users automate a range of online activities, from shopping to research. That functionality comes at a premium, however: the ChatGPT Pro subscription required for Operator access currently costs $200 a month.
The industry movement toward these “agentic” capabilities is gaining remarkable momentum: “Similar to OpenAI’s offering, Anthropic introduced ‘computer use’ capabilities to its Claude chatbot towards the end of last year. Perplexity and Google have also released ‘agentic’ features into their AI assistants, with further companies developing agents aimed at specific tasks such as coding or research.” This widespread adoption signals a vibrant, rapidly expanding market for increasingly sophisticated AI agents.
The Hype and Reality of AI Agents
The excitement surrounding AI agents is undeniable, with ambitious initiatives like GitHub’s Project Padawan (named after the Star Wars term for a Jedi apprentice) capturing both public imagination and serious technological investment. While the potential applications seem almost limitless, it remains essential to distinguish between aspirational marketing and current technological capabilities.
Though AI agents are making remarkable progress, most remain in active development. The technology is evolving quickly, accompanied by ongoing conversations about responsible implementation and ethical boundaries. As the source article acknowledges: “It’s early days. Most commercially available agents come with a disclaimer that they’re still experimental – OpenAI describes Operator as a ‘research preview’ – and you can find plenty of examples online of them making amusing mistakes…”

OpenAI’s Operator: A Hands-On Look at AI Grocery Shopping
Testing Operator: From Shopping Lists to Deliveroo
To properly evaluate Operator’s real-world capabilities, extensive practical testing was conducted across various online grocery platforms and food delivery services. The methodology involved providing Operator with typical shopping lists and carefully observing its ability to navigate complex websites, make appropriate product selections, and successfully complete checkout processes.
The author’s firsthand account offers valuable insights: “I type my request and it asks if I have a preferred shop or brand. I tell it to go with whichever is cheapest. A window appears showing a web browser and I see it search ‘UK online grocery delivery’.” Beyond basic grocery shopping, testing expanded to include popular food delivery platforms like Deliveroo, offering a comprehensive view of Operator’s versatility across different e-commerce environments.
Successes and Stumbles: Navigating the Grocery Aisle
Throughout testing, Operator demonstrated impressive capabilities in several key areas. The system successfully implemented price-based filtering: “It starts searching for my requested items and filters the results by price. It selects products and clicks ‘Add to trolley’.”
The agent also showed initiative in its decision-making: “I’m impressed with Operator’s initiative; it doesn’t pepper me with questions, instead making an executive decision when given only a brief item description, such as ‘salmon’ or ‘chicken’.”
However, testing also revealed limitations in Operator’s current implementation. The agent only discovered Ocado’s minimum spend requirement partway through the order: “We hit a snag when Operator realises that Ocado has a minimum spend, so I add more items to the list.” The article also documents instances of product misinterpretation, such as selecting smoked salmon instead of fresh fillets.
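To make the "cheapest option" behaviour and the minimum-spend snag concrete, here is a minimal Python sketch. It is not a description of how Operator works internally: the catalogue, the prices, the 4000 pence minimum spend, and the function names `pick_cheapest` and `build_basket` are all assumptions made up for illustration. It shows how a purely price-driven rule could plausibly pick smoked salmon when the shopper meant fresh fillets, and how a basket total can fall short of a shop's minimum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    name: str
    price_pence: int  # integer pence avoids floating-point rounding


# Hypothetical catalogue; names and prices are invented for illustration.
CATALOGUE = [
    Product("Smoked salmon slices", 350),
    Product("Fresh salmon fillets", 450),
    Product("Chicken breast fillets", 400),
]


def pick_cheapest(query: str, catalogue: list[Product]) -> Optional[Product]:
    """Return the cheapest product whose name contains the query, or None."""
    matches = [p for p in catalogue if query.lower() in p.name.lower()]
    return min(matches, key=lambda p: p.price_pence, default=None)


def build_basket(shopping_list, catalogue, minimum_spend_pence):
    """Pick the cheapest match per item and report any minimum-spend shortfall."""
    picks = (pick_cheapest(item, catalogue) for item in shopping_list)
    basket = [p for p in picks if p is not None]
    total = sum(p.price_pence for p in basket)
    shortfall = max(0, minimum_spend_pence - total)
    return basket, shortfall


# Assumed minimum spend of 4000 pence, purely for the example.
basket, shortfall = build_basket(["salmon", "chicken"], CATALOGUE, 4000)
print([p.name for p in basket], shortfall)
```

With this invented data, the brief description "salmon" resolves to the smoked product because it is the cheapest match, and the basket comes in under the assumed minimum. In the test described above, the tester dealt with the real shortfall by adding more items to the list.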
The Human Touch: When Operator Needs Assistance
While Operator strives for autonomous operation, testing identified several scenarios where human intervention becomes necessary or preferable. During evaluation, situations frequently arose where Operator appropriately requested user input for sensitive data entry like passwords.
In these circumstances, as outlined in OpenAI’s comprehensive documentation, Operator pauses and prompts the user to take over, ensuring appropriate human oversight: “…the agent prompts me to intervene: while users can take over the browser at any point, OpenAI says Operator is designed to request this ‘when inputting sensitive information into the browser, such as login credentials or payment information’.” OpenAI’s explicit commitment not to capture screenshots during user control phases further demonstrates their prioritization of user privacy and data security.
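The handoff described above can be sketched as a simple rule: the agent fills in ordinary fields, while sensitive fields are handed to the user, and no screenshots are taken while the user is in control. The Python below is a hedged illustration only. The field names, the `Controller` enum, and both helper functions are hypothetical and do not reflect OpenAI's actual implementation.

```python
from enum import Enum


class Controller(Enum):
    AGENT = "agent"
    USER = "user"


# Hypothetical field names treated as sensitive in this sketch.
SENSITIVE_FIELDS = {"password", "card_number", "security_code"}


def controller_for(field: str) -> Controller:
    """Hand sensitive fields to the user; let the agent fill in the rest."""
    return Controller.USER if field in SENSITIVE_FIELDS else Controller.AGENT


def may_capture_screenshot(controller: Controller) -> bool:
    """Screenshots are only allowed while the agent is in control."""
    return controller is Controller.AGENT


checkout_fields = ["email", "password", "postcode", "card_number"]
plan = [(field, controller_for(field)) for field in checkout_fields]
for field, controller in plan:
    print(f"{field}: {controller.value}, "
          f"screenshots={may_capture_screenshot(controller)}")
```

In this sketch the decision is made per field from a fixed list. A real agent would need some way to recognise sensitive inputs on pages it has never seen before, which is a much harder problem than this example suggests.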
The Tipping Point: A Reminder of Ethical Considerations
A particularly noteworthy observation during testing concerned Operator’s handling of delivery driver tipping options on food delivery platforms. Without specific instructions to include a gratuity, the AI system completely bypassed the tipping option on Deliveroo.
The author recounts: “I’m mortified, however, when I realise Operator skipped over tipping the delivery rider. I sheepishly take my food and add a generous tip after the fact.” This incident powerfully highlights the critical need for developers to thoroughly consider the broader ethical implications of AI agent behaviors, particularly regarding fair compensation for service workers.
Beyond Groceries: AI Agents Expanding Their Reach
From Manicures to Coding: Exploring Diverse Applications
The potential applications for AI agents extend far beyond online grocery shopping, encompassing an impressively diverse range of tasks. These versatile digital assistants are demonstrating capabilities across domains from scheduling personal appointments to assisting with highly technical coding projects.
The author’s attempt to book a manicure, while ultimately unsuccessful, nevertheless illustrates this expanding potential: “Operator struggles more with this task… By this point, I could have already made my own booking. Operator eventually suggests a suitable appointment, but I abandon the task and chalk it up as a win for Team Human.” This mixed result highlights both the ambitious scope and current limitations of today’s AI agents.
AI Agents in the Workforce: The Future of Software Development
The integration of AI agents into professional workflows is gaining momentum across industries. Sam Altman, OpenAI’s CEO, has suggested that AI agents could “join the workforce” in the very near future, although the source article does not give a specific citation for this prediction.
AI agents are increasingly positioned as valuable collaborative partners capable of automating repetitive, time-consuming tasks. For example, as detailed by RevGen Partners, specialized AI agents can efficiently automate legal research, contract analysis, and complex regulatory compliance tasks. GitHub’s enhanced Copilot tool, now featuring sophisticated agentic capabilities, exemplifies this transformative trend. GitHub’s CEO, Thomas Dohmke, explains: “Instead of you just asking a question and it gives you an answer, you give it a problem and then it iterates on that problem together with the code that it has access to.” The company’s ambitious Project Padawan represents an even more significant advancement toward truly autonomous coding assistance.
The Jarvis Vision: Personalized AI Assistants for Everyone
The concept of highly personalized AI assistants, reminiscent of the fictional Jarvis from the Iron Man film series, is rapidly transitioning from science fiction into achievable reality. This vision aims to provide individuals with sophisticated AI agents capable of managing complex schedules, handling diverse communications, and eventually anticipating personal needs before they’re even expressed.
“Dohmke envisions a future when everyone has their own personal Jarvis, the talking AI in Iron Man. Your agent will learn your habits and become customised to your tastes, making it progressively more useful.” As Shift Asia insightfully explains, AI agents represent not merely task automation but a fundamental enhancement of human capabilities across both professional and personal domains.

The Risks and Rewards of Autonomous AI Agents
The Autonomy Spectrum: From Assistance to Full Control
AI agents exist along a complex spectrum of autonomy, ranging from providing basic assistance to operating with sophisticated independence. At the more limited end, agents function primarily as responsive tools, executing specifically requested tasks with clear human oversight.
At the more advanced end of this spectrum, fully autonomous agents could potentially make significant decisions without direct human intervention or supervision. This progression raises both exciting possibilities for efficiency and serious concerns regarding control. The source article notes that even the most advanced agents currently available remain largely experimental.
Potential Pitfalls: Security, Bias, and Unexpected Consequences
The increasing autonomy granted to AI agents introduces several significant potential risks that demand careful consideration. Security vulnerabilities represent a primary concern, as increasingly powerful agents may inadvertently expose sensitive systems or personal information to exploitation.
Algorithmic bias presents another critical challenge. When AI systems are trained on data containing historical biases, they can perpetuate and even amplify unfair or discriminatory outcomes. Margaret Mitchell, chief ethics scientist at Hugging Face, has co-authored important research specifically warning against fully autonomous agents. Perhaps most concerning is the risk of unexpected behavioral consequences. As AI agents grow increasingly sophisticated, accurately predicting their behavior across diverse scenarios becomes exponentially more difficult.
The Need for Guardrails: Mitigating Risks and Ensuring Responsible Use
To effectively harness the transformative benefits of AI agents while minimizing associated risks, establishing robust guardrails becomes absolutely essential. This includes implementing comprehensive cybersecurity protocols specifically designed for agentic systems.
Addressing algorithmic bias requires careful attention to data collection methods and ongoing evaluation. As the World Economic Forum observes, AI agents are not a “set it and forget it” solution; they require continuous human oversight throughout their operational lifecycle. Mitchell advocates incentivizing protective guardrails and envisions a future where agents are optimized for specific, well-defined tasks.
The Future is Agentic: Reshaping the Internet and Beyond
Agents Interacting with Agents: A New Digital Ecosystem
The future of AI agents extends far beyond isolated human-machine interactions; it encompasses the emergence of a sophisticated digital ecosystem where multiple agents seamlessly collaborate with each other to accomplish complex objectives.
This interconnected network of specialized AI agents could transform tasks that currently require multiple manual steps and extensive coordination. As noted in the source article: “Soon, she says, we’ll see agents interacting with agents – your agent could work with mine to set up a meeting, for example.” This collaborative approach could have significant implications for efficiency and workflow optimization.
The Shifting Landscape of the Internet: From Human-Centric to AI-Friendly
The proliferation of AI agents is poised to fundamentally transform the underlying architecture and design philosophy of the internet. Thomas Dohmke presents a vision in which the traditional homepage gradually diminishes in importance as websites evolve.
This signals a significant shift away from conventional human-centered web design principles toward structures optimized for AI agent interaction and interpretation. The source article reinforces this perspective: “Brands may start competing for AI attention over human eyeballs,” suggesting a fundamental realignment of digital marketing strategies and user experience design.
Beyond the Screen: Embodied AI Agents and the Physical World
The evolution of AI agents extends well beyond digital interfaces into the physical environment around us. Margaret Mitchell offers a striking prediction that AI agents will soon handle tangible household tasks like laundry, dishwashing, and food preparation.
This vision of “embodied AI” represents agents capable of sophisticated interaction with physical objects and environments. This expansion beyond screen-based interaction constitutes a revolutionary advancement in human-computer interaction. The World Economic Forum notes that this integration of AI agents into various aspects of daily life is intrinsically connected to pioneering new forms of human-computer interaction across multiple domains. However, Mitchell’s cautionary addendum, “Just don’t give them access to weapons,” underscores the continued importance of ethical frameworks and safety protocols in this rapidly developing field.
In conclusion, the rise of AI agents represents a profound transformation in our relationship with technology. While the potential benefits for efficiency, accessibility, and innovation are extraordinary, the associated risks of increased autonomy demand careful consideration and proactive management. The future of computing is undoubtedly agentic, but realizing its full potential requires thoughtful planning, robust ethical guidelines, and ongoing human oversight.