What is ChatGPT, and where does its information come from?

While artificial intelligence has been a prominent part of science fiction for decades, it has never been more relevant in reality than it is now. More than ever before, we are hearing about the dangers of artificial intelligence, with a particular focus on the entertainment industry, writers, and academia. Perhaps the most significant example is ChatGPT.

There are many AI programs out there, but most operate behind a paywall, which limits the number of people who will actually buy in. ChatGPT, however, is available to practically anyone, which makes it the tool curious minds are most likely to try.

Some particularly compelling statistics show that ChatGPT is having a major influence on the world. According to Forbes, 89% of students surveyed have used ChatGPT to some extent for their homework and schoolwork. OpenAI itself claims that teams at more than 80% of Fortune 500 companies have adopted the program for work purposes.

Thanks to this boom in interest, many editorials have debated whether the system is a good idea, with the major questions focusing on whether AI is taking jobs from humans and whether it is disrupting individuals’ ability to learn properly.

But while those are valid concerns, many people find that they don’t know enough about what ChatGPT is and what it’s capable of to comment on it reasonably. For those who want to join the conversation but don’t know where to start, here are answers to the most common questions about ChatGPT.

What is ChatGPT?

ChatGPT (Chat Generative Pre-trained Transformer) is an artificial intelligence chatbot developed by OpenAI that communicates with users through conversational dialogue. The AI was trained in part on example prompts and answers written by human trainers, which makes it sound more natural than similar projects.

Currently, interested users have free access to the GPT-3.5 version, which is what most people mean when they talk about ChatGPT in everyday use. There is also a premium subscription, ChatGPT Plus, available for $20 per month, which uses the more capable GPT-4 model.

ChatGPT was built with the primary goal of holding realistic conversations with human users, and much of its development has focused on making the generated text flow naturally. Because of this, it is frequently used to produce content and to synthesize research.

Where does ChatGPT get its information?

According to OpenAI, the free version of ChatGPT bases its responses on: “(1) information that is publicly available on the internet, (2) information that we license from third parties, and (3) information that our users or our human trainers provide.” This training data only extends through 2021.

Those who pay for ChatGPT Plus have access to more information, because GPT-4 can browse the web through Bing. This gives users more current information, but that information may not be reliable given the significant amount of misinformation online.

GPT-4 does include citations for its information, but information coming from a real source doesn’t mean that the source is a valid one. OpenAI was selective about the information it trained ChatGPT on, rejecting “hate speech, adult content, sites that primarily aggregate personal information, and spam.” It is unclear whether the same protections apply to the internet-connected version.

How reliable is ChatGPT’s information?

One critical point in understanding ChatGPT is that it is a work in progress. The system was intentionally released before it was perfected so that its creators could gather diverse, useful feedback. While it has the potential to do a lot of good, ChatGPT is a mixed bag when it comes to accuracy.

Just as teachers used to warn students against relying on Wikipedia as a shortcut, users should be wary of ChatGPT, because it can go horribly wrong. According to OpenAI itself, one of the tool’s major limitations is that “ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers.” This is a serious problem that many users don’t seem to be aware of.

Furthermore, ChatGPT has a limited source of information to draw from. The free application was not trained on any information past 2021, and it tends to be less accurate when discussing specialized subjects.

For these reasons, the ChatGPT Privacy Policy includes a warning to its users:

"Services like ChatGPT generate responses by reading a user’s request and, in response, predicting the words most likely to appear next. In some cases, the words most likely to appear next may not be the most factually accurate. For this reason, you should not rely on the factual accuracy of output from our models."
OpenAI

Despite this, ChatGPT can be very successful when it is used to answer direct questions. When put to the test, the AI can pass many graduate-level and professional examinations, with GPT-4 scoring around the 90th percentile on a simulated bar exam.

ChatGPT clearly has its uses, but it should not be trusted as a wholly reliable source of information, nor should it be used to write factual reports without external verification.

Does ChatGPT provide real citations?

Whether or not its information is correct, ChatGPT is most dangerous when it comes to providing citations, because it has a habit of making them up.

As an article from Seattle University explains, ChatGPT “hallucinates” citations, building references that sound correct and fit the prompt but that are either incorrectly sourced or do not exist at all.

This has had an impact on real people. Attorneys Peter LoDuca and Steven Schwartz used ChatGPT to collect precedents for a legal filing, which resulted in “bogus judicial decisions with bogus quotes and bogus internal citations.” The lawyers faced court sanctions as a result, putting their careers in very real jeopardy.

While users can request sources from ChatGPT, any sources it provides should be considered false until they are verified.

Can plagiarism detectors tell if you use ChatGPT?

While many plagiarism checkers claim to be able to tell the difference between human-written and AI-written work, they are generally not accurate enough for most people’s comfort. In fact, OpenAI released its own tool for detecting AI-written material, only to withdraw it about six months later because of its inconsistent results.

When tested by ZDNET, popular plagiarism detectors correctly identified whether content was written by a human or generated by AI only between 40% and 80% of the time. While this might be better than nothing, it is certainly not reliable enough to be used in serious situations.

Cases of academic dishonesty can result in students being expelled, and professional academics could lose their careers. Given that current detectors cannot reliably tell the difference between human-written and AI-generated content, they should not be used to discredit an individual without further evidence.

Can ChatGPT access and steal your ideas?

Although there are many nuances to this question, the short answer is yes: OpenAI has access to all of your prompts and can use them as training material for ChatGPT.

Those who pay for ChatGPT Enterprise have been assured that, “You own and control your business data in ChatGPT Enterprise. We do not train on your business data or conversations, and our models don’t learn from your usage.”

This implies that users who do not pay for the Enterprise version do not get the same security and privacy protections for their work, and that is indeed the case.

According to the Privacy Policy, “When you use our Services, we collect Personal Information that is included in the input, file uploads, or feedback that you provide to our Services (“Content”).” While OpenAI states that this data is not stored long-term, it can still be used to train the systems that produce AI-generated content.

Given that GPT-4 is advertised as being capable of “creative and technical writing tasks, such as composing songs, writing screenplays, or learning a user’s writing style,” this is a very concerning possibility. As has already happened with AI-generated visual art, writers may find their ideas and voice being used without their consent.

While ChatGPT does not save your input and sell it to external companies, it may draw on your ideas when generating content for other users. It’s best to avoid including any real names or personal data in your prompts, as they could accidentally be exposed to others.

However, users can opt out by turning off “Chat history & training” under Data controls in the settings. This may limit the features available, but it is the safest way to protect private information.

Keep an eye on Ask Everest for answers to all the Internet's questions.