ChatGPT is changing the way the world thinks about artificial intelligence. I’m already seeing the impact this large language model (LLM) is having on industries such as healthcare, transportation, and software. Rivals have fumbled their own chatbot releases, and the Google Bard launch was a PR disaster. However, the narrative may change when Google launches Bard to everyone. In this article, I’ll discuss the uphill battle that Google will be fighting.
What is the current state of Large Language Models (LLMs)?
OpenAI introduced GPT-3 in 2020. GPT-3, which contains 175 billion parameters, reportedly requires about 800GB to store in its entirety. Training a model of that size requires significant compute and storage resources.
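As a back-of-the-envelope check on why storage becomes a barrier, the sketch below estimates the space needed for the raw weights alone at a few common numeric precisions. The function name and the bytes-per-parameter table are illustrative assumptions, not OpenAI's actual storage layout, and the estimate covers weights only, so it need not match the 800GB figure quoted above.

```python
# Rough estimate of raw weight storage for a model with a given parameter count.
# Assumption: each parameter is stored as a single number of the given precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_storage_gb(num_params: int, dtype: str = "float32") -> float:
    """Return the size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


GPT3_PARAMS = 175_000_000_000  # parameter count cited in the article

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_storage_gb(GPT3_PARAMS, dtype):.0f} GB")
```

Even at reduced precision, the weights run to hundreds of gigabytes, which is far more than a single consumer machine can hold in memory.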
Smaller companies building on ChatGPT
However, startups are still trying to capitalize on language models despite the large barrier to entry. Y Combinator is seeing lots of applicants looking to build the “ChatGPT for X.” These funding hopefuls plan to train ChatGPT on domain-specific data, enriching the underlying GPT-3 model with more focused expertise. Among them, Baselit is building a data assistant that will allow users to write queries in plain English. Similarly, Forethought uses domain-specific data to provide a support chatbot to users. Chatsonic extends ChatGPT’s 2021 knowledge cut-off by training with recent events. If you’re interested in finding out who won the 2022 World Cup, Chatsonic would be your best bet.
Tech giants developing ChatGPT competitors
Larger tech companies like Microsoft and Quora are working on chatbots of their own. Microsoft released a chatbot enhancement to Bing which enables conversations with the search engine. However, an article by NPR demonstrates the technology isn’t nearly as user-friendly or accurate as ChatGPT. In one instance, the bot called the reporter things such as “ugly, short, overweight.” In another, Bing’s chatbot fell in love with the user. Quora has also introduced Poe, a stand-alone app that lets users ask questions which are then passed to ChatGPT or a model from Anthropic. Another rival in the race to AI superiority is the search giant Google.
Google Ain’t No Slouch
Google hasn’t been sleeping on developing its own large language models. In April 2022, the company published a paper describing the PaLM model, which contains 540 billion parameters, roughly 3 times the size of GPT-3. PaLM aims to generalize across domains better than GPT-3. In 2023, Google AI released PaLM-E, a multimodal version of the Pathways model that uses visual input in addition to textual input to provide answers and solutions. Despite being a force in the field of AI, Google appeared to be caught off guard by the release of ChatGPT at the tail end of 2022.
Google launches Bard: The Rushed Release
Even though Google researchers are no strangers to LLMs, Google requires extensive quality control before releasing a product to the general public, and it is being very conservative about a widespread release of its chatbot. Jeff Dean, the lead of AI for Google, pointed out that AI which has not been thoroughly tested runs the risk of making up facts, ultimately causing more harm than good. The company expressed concern about the large risk of a bad release if quality control wasn’t extensive enough.
However, in February 2023, the cost of not competing in the AI race outweighed the risk of damage to the company’s reputation, and it announced its chatbot, Google Bard. In a demo of Bard, the model stated that the James Webb Space Telescope took the first pictures of a planet outside of our solar system. This turned out to be incorrect: astronomers on Twitter pointed out that another telescope deserved that accolade. Shortly after the error was identified, Google lost $100 billion in market value.
These events clearly show that Google has been feeling the heat from the release of ChatGPT. Shareholders are putting pressure on Google to release a chatbot that keeps the company relevant in the AI discussion.
Google Bard launch to the general public
Because an LLM that goes haywire could damage the company’s reputation, Google is continuing to validate Google Bard. In the meantime, it has implemented a waitlist system, which lets it test with a smaller group of users and fix any problems at a smaller scale.
Was Google Bard trained on ChatGPT?
Since the bungled release of Google Bard, there have been continued concerns over how the model was trained. Because ChatGPT has been open to the public since November 2022, there is speculation that ChatGPT input/output pairs were used to train Google’s language model.
A spokesperson from Google assured The Verge that Google Bard was not trained on data taken from ChatGPT. These assurances, especially with such a large amount of money on the line, should be taken with a grain of salt. If we look at similar races to be the first to release a certain technology, there’s always an attempt to copy the competition. Just take a look at Microsoft’s Zune trying to replicate what the Apple iPod did so flawlessly. Why would the race to develop the best chatbot in the world be any different?
A common approach for large language models is to train on datasets similar to the Google C4 dataset, which is a collection of 15 million website snapshots. The Washington Post recently worked with the Allen Institute for AI to analyze the composition of this collection. The authors found that patents.google.com and wikipedia.org were among the top sources in the Google C4 dataset. Until these companies release their full research papers, we can’t be certain of the models’ architecture or the training data that was used.
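To give a feel for what a composition analysis like this involves, here is a minimal sketch that tallies how many pages in a list of URLs come from each domain. It is not the method used in the Washington Post analysis: the `top_domains` helper and the sample URLs are hypothetical, and a real analysis might rank sites by a different measure than page count.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=3):
    """Count pages per domain and return the n most common (domain, count) pairs."""
    counts = Counter(urlparse(url).netloc.lower() for url in urls)
    return counts.most_common(n)


# Hypothetical sample; a real web-scale dataset would hold millions of entries.
sample_urls = [
    "https://patents.google.com/patent/US0000001",
    "https://patents.google.com/patent/US0000002",
    "https://patents.google.com/patent/US0000003",
    "https://en.wikipedia.org/wiki/Language_model",
    "https://en.wikipedia.org/wiki/Chatbot",
    "https://example.com/blog/post",
]

print(top_domains(sample_urls, n=2))
```

Ranking sources this way shows which sites dominate a training corpus, which is the kind of information outsiders currently lack for Bard.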
The future of Google Bard
Google Bard, despite its issues so far, likely has a bright future. The company recognizes the importance of competing in the race for the best large language model. It also has the financial and knowledge capital to come out on top.
One area in which Google clearly needs to improve is the accuracy of the model output. Unfortunately, the early demo mishaps have thrown the accuracy of its model into question. Language models, including ChatGPT, have provided incorrect answers to questions before. This is concerning because users begin to trust the tool as an all-knowing source of truth. In turn, less fact-checking gets done, and poor research habits can lead to the spread of misinformation. In the future, Google plans to combine Google Search and Bard into a search powerhouse to be named “Magi.” If the company produces a reliable and performant model, we can expect it to eventually find its way into the many other products in the Google suite.
The AI war has just begun
These large companies are competing in a technological race with trillion-dollar stakes. AI companies developing large language models need to balance technological progress with ethical considerations. Ultimately, regulation will be needed to ensure everyone is playing fairly. One thing is for sure: the AI wars are just getting started.