ChatGPT essay series | A Computer Scientist's Perspective

While the Italian Data Protection Authority is evaluating the commitments forwarded by OpenAI in response to the well-known measure, Paolo Fantozzi (engineer and research fellow in computer science at LUMSA University in Rome) gives us his point of view on ChatGPT.

The expert presents several problems potentially related to LLMs, such as a misalignment between their scope and our expectations, while so-called Artificial General Intelligence might not be so close. He also points to the concrete risk of running out of quality training data in the medium to long term, and to the sustainability of the electricity that large-scale training by all stakeholders requires.

However, the discussion also offers some solutions on how to approach the tool: attention should be devoted not to the tool itself but, above all, to how it is used. The human factor also remains relevant, as in the case of anti-plagiarism: specialised software is easy to trick, but it is only the first step. Human intelligence makes it possible to evaluate each essay for its coherence, accuracy and solidity of reasoning. Fantozzi also gives us his view on the best regulatory approach currently possible: imperfect regulation flexible enough to adapt, so that we can understand how to use these systems to improve our lives and protect ourselves.

1. How should we define, and above all use, ChatGPT and similar systems?

This type of model is usually called a Large Language Model, and in this specific case we refer to a model trained for the task of text generation. The names of both the type of model and the type of task should already indicate the goal of this approach: these models are devoted to generating text in a certain language (actually more than one), not to providing facts or data. This means they perform best at tasks such as spelling and grammar checking, summarization, extraction of the important parts of a text, extraction of information from a text, extension of a text, etc. Notice that all of these tasks focus on analyzing facts and information provided by external actors (humans, but also databases or services). None of them aims to create information. Asking such a model for facts is like using a knife to drive a screw: you could do it, but you know that something could go wrong.

2. What are some of the potential risks associated with ChatGPT, and how can they be mitigated?

In my opinion, the main risk associated with ChatGPT, and with all the hype surrounding it, is that expectations of this kind of technology are raised too high. Indeed, most of the media and the public are starting to think of these systems as a form of Artificial General Intelligence (AGI), as we call it in computer science. The difference between AGI and the AI systems we use today is the ability to actually understand and solve intellectual tasks. To put it simply: an AGI system would act like the robots in Asimov’s science fiction books, while current state-of-the-art AI systems still have great difficulty driving a car autonomously in traffic. This means that at some point in the near future people will realize that these systems are not so “intelligent”, and I’m afraid they could start to think of them as useless and expensive toys, even though AI systems could help human society in many ways.

3. What is the potential impact of ChatGPT in teaching, both in terms of learning methodology and students' critical thinking and creativity in writing?

I’m convinced that it will not have such a huge impact on teaching, but it could have a big impact on teachers. I remember that in primary school we used to visit the library to search for information about some event, historical character or country. Then suddenly digital encyclopedias came out, and we could print the same information at home with almost zero effort. The effect of this new technology was not the disappearance of homework, just its evolution. Now it is rare to ask a student (of any grade) to compile research containing well-known facts about something; it is much more common to ask them to discuss these facts and their impact or importance. For a few years now there has been software (and ChatGPT is just one example) that can write essays about any well-known topic. From my point of view, this is simply the moment for the homework and exams that teachers give their students to evolve again. Maybe we could start to ask students to discuss the differences between their own points of view and those of historical characters. In any case, it is just a matter of refocusing demands and topics.

4. Is anti-plagiarism software sufficient, or is it easy to trick?

Such tools have always been easy to trick. Most of them are based only on text matching between documents, so they can tell whether some part of your essay is identical to a part of a document in a database. Only very lazy students leave these parts unchanged: it would be enough for any student to rewrite the same sentences in his or her own words, and no software could ever tell whether she or he copied from a source, not least because a human could not tell either. This is why anti-plagiarism software is used only as the very first step; then we evaluate each essay as a whole, judging its coherence, its accuracy and the solidity of its reasoning.
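To make the text-matching idea concrete, here is a minimal sketch in Python. It assumes a very simple scheme (shared three-word sequences between an essay and a single source) and is not the method of any specific anti-plagiarism product; the function names and the sample sentences are invented for illustration.

```python
def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams found in a text."""
    # Strip surrounding punctuation so "emperor." and "emperor" match.
    words = [w.strip(".,;:!?\"'()") for w in text.lower().split()]
    words = [w for w in words if w]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(essay: str, source: str, n: int = 3) -> float:
    """Fraction of the essay's n-grams that also occur in the source."""
    essay_grams = word_ngrams(essay, n)
    if not essay_grams:
        return 0.0
    return len(essay_grams & word_ngrams(source, n)) / len(essay_grams)


# Hypothetical sample texts, used only to illustrate the behaviour.
source = "The Roman Empire fell in 476 when Odoacer deposed the last emperor."
copied = "The Roman Empire fell in 476 when Odoacer deposed the last emperor."
paraphrased = "In 476 Odoacer removed the final ruler, ending Rome's imperial era."

print(overlap_score(copied, source))       # verbatim copy: every n-gram matches
print(overlap_score(paraphrased, source))  # same idea reworded: no n-gram matches
```

The verbatim copy is flagged completely, while the paraphrase shares no three-word sequence with the source and passes unnoticed, which is exactly the weakness described above.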

5. Speaking of a known problem such as 'invented' sources, would it be possible, and would it solve the problem, to make ChatGPT communicate with online libraries containing major academic databases?

The invented sources reported by ChatGPT are just a consequence of the misalignment between its scope and our expectations. We think of the chatbot as an “intelligent being who knows everything”, so when we ask it for the source of its knowledge, we expect an explanation of its reasoning. But it is just completing a sequence of words with something that is plausible at the language level. So it provides citations with plausible names, a plausible title and a plausible journal, but there is no guarantee that they are about the same topic, or even that they exist. Nor can we simply make it communicate with online libraries, because the model itself does not know it is writing a citation.
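The following toy example in Python illustrates the idea of completing a sequence with whatever is most plausible at the language level. It is only a sketch: it uses a tiny word-pair (bigram) counter rather than a neural network, so it is not how ChatGPT works internally, and the author names, journals and years are all fictitious.

```python
from collections import Counter, defaultdict

# Fictitious "training" citations, invented for this illustration.
corpus = [
    "Rossi and Bianchi, Journal of Language Models, 2019",
    "Verdi and Bianchi, Journal of Machine Learning, 2021",
    "Rossi and Neri, Journal of Machine Learning, 2021",
]


def train_bigrams(sentences: list[str]) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the corpus."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def complete(follows: dict[str, Counter], prompt: str, max_words: int = 8) -> str:
    """Extend the prompt by repeatedly appending the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


follows = train_bigrams(corpus)
citation = complete(follows, "Rossi and")
print(citation)
print(citation in corpus)  # False: fluent, but it never appeared in the data
```

Every step picks a likely next word, so the result looks like a citation, yet the combination of authors, journal and year was never in the data. Nothing in the procedure checks whether the reference exists.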

Last month (March 2023) OpenAI introduced a new feature of its services, the so-called plugins. With plugins, the system tries to understand the type of question the user is asking and redirects it to the service best suited to that question. In a sense, plugins improve the results and, at the same time, reduce the influence of the training data on the answer. This mechanism works much like a receptionist who understands what a visitor wants and guides them to the correct office. There is still no plugin available to provide citations, but it is a first step.
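The receptionist analogy can be sketched in a few lines of Python. This is a deliberately naive illustration: the keyword lists, the service functions and the `route` helper are all hypothetical, and the real plugin mechanism is not documented in this interview, so plain keyword matching stands in for whatever the actual system does.

```python
from typing import Callable

Service = Callable[[str], str]


# Hypothetical "offices" a question can be sent to.
def weather_service(question: str) -> str:
    return "weather service: a forecast lookup would happen here"


def calculator_service(question: str) -> str:
    return "calculator service: the arithmetic would be evaluated here"


def language_model(question: str) -> str:
    return "language model: a free-text answer would be generated here"


# Assumed routing table: keywords that suggest a specialised service.
ROUTES: list[tuple[tuple[str, ...], Service]] = [
    (("weather", "forecast", "rain"), weather_service),
    (("sum", "plus", "multiply", "divided"), calculator_service),
]


def route(question: str) -> Service:
    """Pick the service whose keywords appear in the question."""
    words = set(question.lower().replace("?", " ").split())
    for keywords, service in ROUTES:
        if words & set(keywords):
            return service
    return language_model  # default: no specialised service matched


for q in ("Will it rain in Rome tomorrow?",
          "What is 12 plus 30?",
          "Write a short poem about autumn"):
    print(q, "->", route(q)(q))
```

The point of the design is that the factual part of the answer comes from the specialised service, not from what the language model absorbed during training.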

6. Can ChatGPT also be used to improve education, teaching and training, and what regulatory measures should be taken to ensure that it is used effectively in this capacity?

I think that everything in the world could be used to improve education and teaching. It depends only on the teacher and her or his ability to use something as a teaching tool. We could ask the same question about anything, from a book to a video game. It doesn’t matter whether a book contains messages of hate or a video game contains violence; what matters is how the tool is used. The Assassin’s Creed video game saga could be used to teach history and history of art (also by comparing the game with the actual events or artworks), even though the games are not history books. A simple example of using ChatGPT to teach could be as an exercise generator in the broader sense: we could ask students to generate essays about some topic and then correct them or compare them with facts they already know. But, again, this is just an example; I’m sure that good teachers could find better ways to exploit the characteristics of a chatbot in order to enhance students’ learning.

7. How do you see ChatGPT evolving in the coming years, and what new opportunities and challenges will emerge?

Unfortunately, the research work behind ChatGPT (and behind all the recent models from OpenAI) is not publicly available because of OpenAI’s commercial strategy (in the field it is common to see the company referred to as ClosedAI). So we can’t know how this precise family of tools will evolve in the future. But ChatGPT is neither the best model (with respect to standard benchmarks) nor the biggest model (with respect to the number of parameters), and all of the state-of-the-art models are available at least as scientific papers to be studied by researchers. The two leading actors in this field are Google and Meta, and both of them are doing very good research in different directions. One of the latest models presented by Google, PaLM, is (probably) the biggest model available, and it is capable of both solving arithmetic problems and explaining jokes. One of the latest works from Meta, LLaMA, showed that it is possible to achieve similar results with smaller models, just by using better and larger datasets and training the model for longer. So the research is very active, and each month there are new discoveries and opportunities.

Looking at the other side of the coin, all these models require a vast amount of power to be trained, so only a few companies in the world can carry the research in this field forward, and no academic institution alone has enough funds to train a model of this size. Even if we can’t blame the companies that can afford it, this can be seen as a threat to academia and to the freedom of research itself.

At a higher level, the main challenges in the field will concern energy and data availability. According to estimates by Prof. Tom Goldstein of the University of Maryland (source), the training of Google PaLM required about 0.0001% of US annual power, and if the current trend continues, the training of LLMs would use all of the planet's electricity by 2029. Of course, this trend is not sustainable, so different solutions must be explored and tested. But it is not only a matter of energy: all these models need data to be trained, and they need high-quality data. According to a paper written by researchers from many important universities, we could run out of high-quality language data between 2023 and 2027. This means that if we continue to add data to expand training sets, the new data will be of poor quality: social network posts containing some form of slang, highly biased data (with respect to gender, race or anything else), etc. So we need to find a way to use the high-quality data we already have to evolve the models.

8. What advice would you give to businesses and individuals looking to incorporate ChatGPT into their operations or daily lives, and how can they ensure that it is being used ethically and responsibly?

My suggestion about new machine learning-based systems is always the same: first of all, study carefully the goals you want to achieve with the tool. 99% of the time a smaller and cheaper model is enough to achieve the same results. It is important to understand that we already use this type of tool every day: just think about the auto-complete system on your smartphone keyboard. I don’t think you would swap it for ChatGPT only because ChatGPT is more powerful; the result would be the same anyway. So you should start with simple tools and, only if you encounter limitations that slow you down in some way, try a more complex tool, and so on. As Confucius said, “Do not use a cannon to kill a mosquito”.

After that, if you need to use this type of tool, always keep in mind that it is not reasoning and that it does not know everything. So always double-check data and information, and try to use its output as a starting point, not as a final result. It is also possible that double-checking the answer to a prompt takes even more work than writing the text from scratch. So use it only if you are sure it will save you time.

9. Any specific comment on ChatGPT's answer to our question about its risks and regulation? Are there any other considerations you feel are appropriate?

Of course, ChatGPT’s answer is just a summary of long-running discussions about AI-generated content. In my opinion, we can’t wait too long to build “the perfect regulation” for AI systems, because the field changes very quickly. So I think it would be better to have some imperfect regulations (perhaps to cope with copyright infringement) and then evolve them as time passes. I also don’t think that every AI system should be regulated with the same rules, because the applications are very different. For instance, I think it would be important to have strict rules about responsibility in autonomous driving, and something much simpler for text generation models. They are just not the same thing. It is also important that regulators start a dialogue with practitioners and experts in the field of AI, to understand not only how these systems actually work, but also how the field will evolve in the near future. We need to understand how to use these systems to improve our lives, rather than block the technology to keep our lives as they are now.

Paolo Fantozzi

Engineer and research fellow in computer science at LUMSA University in Rome