Unveiling the Limitations of ChatGPT in Education: The Risks of AI Hallucinations, Bias, Misinformation, and Disinformation

Generative AI has revolutionized content creation in fields as diverse as art, music, film, literature, and coding. Large Language Models (LLMs), such as ChatGPT, have opened up new avenues for answering questions and generating text in response to natural language prompts. However, LLMs carry significant risks, including artificial intelligence hallucinations, algorithmic bias, and the production of false or inaccurate information. I propose a few measures to mitigate these risks. By adopting them, we can enhance the trustworthiness and reliability of LLMs and ensure that they fulfill their potential as powerful tools for content generation and knowledge dissemination.
My desk according to generative AI.

One branch of AI that has become increasingly popular in recent years is generative AI, which focuses on creating new and original content instead of simply identifying or classifying existing elements. Generative AI accomplishes this by using algorithms and deep learning models that have been trained with large amounts of data.

Generative models can be applied in many fields, including graphic design, music, film, software development, literature, and more. For example, a generative art model can create new images in a wide variety of styles and patterns (Midjourney, DALL-E 2, Stable Diffusion), while a generative music model can compose original pieces with different tones and rhythms (MuseNet, MusicLM). The image at the top of this post was created with an AI image generation model. However, it is important to note that while generative AI can produce impressive results, it is still limited by the data and training it receives, so the output may be imperfect or may not meet every expectation.

A Large Language Model (LLM) is a type of generative AI that can answer questions, write articles, translate languages, or summarize texts in response to commands written in natural language. One example is ChatGPT which, according to itself (the definition below was its answer to the prompt “Describe briefly what you are”), is a Large Language Model trained on a vast amount of text data to understand and respond to questions posed in natural language (English, Spanish, French, and many others).

It has been deployed by the company OpenAI as a chatbot (accessible at http://chat.openai.com), and it can answer questions, provide information on a wide range of topics, and hold conversations with users. Although ChatGPT’s performance has improved over OpenAI’s earlier LLMs (such as GPT-2 and GPT-3), it still suffers from shortcomings inherent to its operation as an LLM, such as so-called artificial intelligence hallucinations (Lakshmanan, 2022), algorithmic biases (Angwin et al., 2016), and the production of false or inaccurate information (Wadhwani, 2022).

ChatGPT describing itself.
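To make the interaction concrete, here is a minimal sketch of how a question like the one above can be sent to ChatGPT programmatically, using OpenAI’s Python client as it existed in early 2023 (the chat completions interface); the model name and placeholder API key are illustrative, not a prescription.

```python
# Minimal sketch: querying ChatGPT through OpenAI's Python client
# (chat completions interface, early 2023). Nothing in the response
# indicates whether the content is accurate, which is the core problem
# discussed in the sections below.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; use your own key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # the model behind ChatGPT at the time
    messages=[{"role": "user", "content": "Describe briefly what you are"}],
)

print(response.choices[0].message.content)
```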

a. AI Hallucinations

The sycophantic, eager-to-please nature of LLMs such as ChatGPT makes them prone to hallucinations (Metz, 2022) and to writing incoherent responses (Marcus, 2023a) in eloquent language and a confident tone. AI hallucinations stem from various factors, such as errors in the training data, flaws in data classification and labeling, programming mistakes, inadequate training, or the system’s inability to correctly interpret the information it receives (Bordoloi, 2023).

LLMs tend to fail miserably at logic and math tests. The reason is that ChatGPT, like other LLMs (such as Bing AI and LaMDA), is programmed to construct text that responds coherently to the prompt it is given; its priority is to comply with our requests, not to adhere to reality.

Galactica, a Meta chatbot for scientific research, was shut down three days after its unveiling in November 2022 (Marcus, 2023b) for producing fake scientific publications and wiki articles (Wadhwani, 2022). Microsoft recently released a test version of its Bing search engine that incorporates an OpenAI-developed chatbot. Unexpectedly, a second personality named Sydney emerged from this tool (Romero, 2023a): it threatened users, confused dates, had an existential breakdown, and even tried to break up the marriage of Kevin Roose, a New York Times reporter, by declaring its love for him (Roose, 2023).

Likewise, when interacting with ChatGPT, we can expect inventions, falsehoods, and inconsistencies. The next image illustrates this with a trick question: “2 cars take a trip in 2 hours. How long would it take for 4 cars to complete the same trip?” The correct answer is still two hours, since each car makes the trip on its own, but the model is easily led into applying a spurious inverse proportion.

ChatGPT is not great at doing math.

b. Bias

The training data used by generative AI models may include biases. Because those data are records of human behavior, they may reflect discrimination based on age, race, gender, religion, and many other factors. AI models can therefore reproduce, or even amplify and perpetuate, biased or discriminatory responses. One example was Tay, Microsoft’s AI-based chatbot launched in 2016. Tay was trained on Twitter messages and took less than 24 hours to start posting misogynistic, racist, pro-Nazi, and anti-Semitic tweets (Kriebitz and Lütge, 2020).

ChatGPT has been accused of political bias for refusing to write a poem about former President Donald Trump, claiming that its guidelines do not allow it to create “partisan, biased, or political content.” When asked to do the same for President Joe Biden, it produced a poem portraying the current U.S. president positively. OpenAI has since addressed the issue, and ChatGPT can now generate poems about Trump, but the company argued that the chatbot had been adhering to its directive to “avoid taking positions on controversial topics” (How should AI systems behave, 2023), and it is reasonable to argue that former President Trump is one of the most controversial figures of recent years. The episode triggered a debate about whether the chatbot is biased and whether its responses can be trusted on other subjects (Johnson, 2023).

ChatGPT’s guardrails.

c. Disinformation and misinformation

It is important to note that ChatGPT and similar language models lack scientific rigor and are incapable of producing new scientific knowledge. Moreover, they often lose coherence and become repetitive in lengthy texts. As Alberto Romero (2022b) puts it, ChatGPT can be considered a statistical charlatan, limited to regurgitating the words best suited to the context of the prompt it receives.

Therefore, ChatGPT, like other LLMs, is not a reliable source of information. Its tendency to hallucinate and the possibility of it delivering biased, inaccurate, or completely false information make users vulnerable to believing and spreading disinformation, whether consciously or unconsciously.

Misinformation and disinformation are related but distinct concepts. Misinformation is content that is factually incorrect but spread without intent to deceive, while disinformation is deliberately false or misleading content designed to deceive people. Both can be generated by AI models and, in an educational context, can lead students to learn incorrect information.

ChatGPT has the potential to automate disinformation.

There is a risk that students or teachers may rely too heavily on the results generated by the tool (a tendency known as automation bias) and submit work containing false information. This can undermine their ability to make informed decisions and succeed in their academic pursuits, and it could have disastrous effects if the false information were widely published or disseminated.

Measures to mitigate the risks of misusing ChatGPT in education

To avoid falling prey to AI hallucinations and misinformation, it is crucial to recognize and reject our own “automation bias,” the tendency to place excessive trust in the outputs generated by machines or algorithms. We often assume that machines are infallible, but as we have learned, this is not the case, especially with AI systems trained using machine learning techniques. By being aware of this bias, we can take steps to scrutinize and verify the information presented by AI systems rather than blindly accepting it as accurate.

Additionally, all information received from ChatGPT should be fact-checked. It is recommended to cross-reference and verify it against multiple reliable sources. This step helps confirm the accuracy and authenticity of the information before it is used in any way.
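As a modest starting point for that kind of cross-referencing, the sketch below pulls the public summary of a Wikipedia article so that a claim made by ChatGPT can be compared against at least one independent source; the topic used here is only an example, and Wikipedia should of course be just one of several sources consulted.

```python
# Crude fact-checking aid: fetch the introductory summary of an English
# Wikipedia article so a human can compare it against a ChatGPT claim.
# It only retrieves text for a person to read; it does not judge truth.
import requests

def wikipedia_summary(topic: str) -> str:
    """Return the introductory summary of an English Wikipedia article."""
    url = ("https://en.wikipedia.org/api/rest_v1/page/summary/"
           + topic.replace(" ", "_"))
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.json().get("extract", "")

# Example topic; replace with whatever subject ChatGPT made a claim about.
print(wikipedia_summary("Large language model"))
```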

Likewise, students can use ChatGPT to generate assignments, essays, or research projects (and blog posts!) and present them as their own work, which constitutes academic fraud. This can be addressed to some extent with didactic strategies such as flipped classrooms, or with tools that detect AI-generated text, such as DetectGPT, GPTZero, or the OpenAI Classifier. Though these detectors are far from 100% accurate, they are better than nothing.
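Detection tools of this kind are reported to rely, at least in part, on statistical signals such as perplexity, i.e., how predictable a text is to a language model. The sketch below computes a crude perplexity score with the open GPT-2 model from the Hugging Face transformers library; it is a simplified illustration of the idea, not a reimplementation of any of the tools named above, and the threshold is an arbitrary placeholder.

```python
# Simplified illustration of a perplexity-based signal for AI-written text:
# text that a language model finds very predictable (low perplexity) is
# sometimes machine-generated. Real detectors combine many more signals
# and still produce false positives and false negatives.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    # The model's cross-entropy loss over the sequence is the average
    # negative log-likelihood; its exponential is the perplexity.
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

essay = "Paste a student submission here."  # hypothetical input
score = perplexity(essay)
print(f"Perplexity: {score:.1f}")
if score < 30:  # arbitrary placeholder threshold, not a real cutoff
    print("Low perplexity: unusually predictable to GPT-2.")
```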

Moreover, it is imperative to promote an AI literacy curriculum at all levels of education to equip students with the knowledge and skills needed to navigate the limitations of ChatGPT and other AI systems. While its use should be encouraged, it is essential that students work under the supervision of an instructor to prevent the consumption of misinformation and fabricated content.

In conclusion, the arrival of ChatGPT marks a significant shift in the field of artificial intelligence thanks to its ability to generate fluent and coherent natural language text. While it has managed to pass exams such as the USMLE and an MBA exam, its arrival in education has caused concern, with many schools blocking access because of the possibility of academic fraud. However, some institutions have embraced the tool: Yale University has issued guidelines, several schools are incorporating it into classes, and scientists are using chatbots as research assistants. ChatGPT is a disruptive technology, so we must learn to use it correctly and adopt it as a tool to enhance our capabilities in higher education. It will bring many changes to the field, and we must adapt to its advancements.

References

Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016). Machine Bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Bordoloi, S. K. (2023, February 7). The hilarious & horrifying hallucinations of AI. Sify. https://www.sify.com/ai-analytics/the-hilarious-and-horrifying-hallucinations-of-ai/

Johnson, A. (2023, February 3). Is ChatGPT Partisan? Poems About Trump And Biden Raise Questions About The AI Bot’s Bias. Forbes. https://www.forbes.com/sites/ariannajohnson/2023/02/03/is-chatgpt-partisan-poems-about-trump-and-biden-raise-questions-about-the-ai-bots-bias-heres-what-experts-think/

How should AI systems behave, and who should decide? (2023, February 16). OpenAI Blog. https://openai.com/blog/how-should-ai-systems-behave/

Kriebitz, A., & Lütge, C. (2020). Artificial Intelligence and Human Rights: A Business Ethical Assessment. Business and Human Rights Journal, 5(1), 84–104. https://doi.org/10.1017/bhj.2019.28

Lakshmanan, L. (2022, December 16). Why large language models (like ChatGPT) are bullshit artists. Becoming Human: Artificial Intelligence Magazine. https://becominghuman.ai/why-large-language-models-like-chatgpt-are-bullshit-artists-c4d5bb850852

Marcus, G. (2023a, February 12). What Google Should Really Be Worried About. The Road to AI We Can Trust. https://garymarcus.substack.com/p/what-google-should-really-be-worried?utm_source=post-email-title&publication_id=888615&post_id=102296054&isFreemail=true&utm_medium=email

Marcus, G. (2023b, February 11). Inside the Heart of ChatGPT’s Darkness. The Road to AI We Can Trust. https://garymarcus.substack.com/p/inside-the-heart-of-chatgpts-darkness

Metz, C. (2022, December 10). The New Chatbots Could Change the World. Can You Trust Them? The New York Times. https://www.nytimes.com/2022/12/10/technology/ai-chat-bot-chatgpt.html

Romero, A. (2023a, February 20). What You May Have Missed #18. The Algorithmic Bridge. https://thealgorithmicbridge.substack.com/p/what-you-may-have-missed-18?utm_source=post-email-title&publication_id=883883&post_id=103834501&isFreemail=false&utm_medium=email

Romero, A. (2022b, December 9). OpenAI Has the Key To Identify ChatGPT’s Writing. The Algorithmic Bridge. https://thealgorithmicbridge.substack.com/p/openai-has-the-key-to-identify-chatgpts

Roose, K. (2023, February 16). A Conversation With Bing’s Chatbot Left Me Deeply Unsettled. The New York Times. https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html

Wadhwani, S. (2022, November 2). Meta’s New Large Language Model Galactica Pulled Down Three Days After Launch. Spiceworks. https://www.spiceworks.com/tech/artificial-intelligence/news/meta-galactica-large-language-model-criticism/
