We are watching in real time as artificial intelligence (AI) drives remarkable changes in the world we live in. The list of areas affected in one way or another by the dizzying development of this discipline is enormous, but these days much of the talk is about copyright.
The conversation revolves around the position of the big tech companies on a possible update of copyright regulations in the United States, one that could set a precedent worldwide and that contemplates paying for the data used to train AI systems. Big Tech is not amused.
A change that could reshape the industry
Before moving on, it is worth recalling how we got here, and we can do so simply, without delving into technical details. The language models that power AI tools like OpenAI’s ChatGPT, Microsoft’s Bing Chat and Google’s Bard have been trained on huge datasets.
The language models behind the tools mentioned above are GPT-3.5, GPT-4 and PaLM. It is thanks to their capabilities that these chatbots can help us draw up a travel itinerary, write poetry or explain nuclear fission with apples. So where does all this information come from? That is where the controversy lies.
In general, though there may be exceptions, the datasets are made up of information gathered from Wikipedia, blogs, news sites, books and code from GitHub-style platforms. And all of this includes copyrighted material. There are also models trained on images and videos available on the web.
As generative AI evolves and becomes more popular, authors of works of all kinds have begun to complain about tech giants using their work without permission, and in some cases to sue them. However, there are also those who argue that the current legislative framework does not account for the reality we are witnessing.
One of the players in this debate is the U.S. Copyright Office, which has committed to addressing the issue, opening the door to a possible remuneration scheme for authors. A comment period is currently underway, during which the office is receiving submissions from the parties involved.
Some of the biggest tech giants on the planet do not look favorably on changes of this kind. According to Business Insider, Meta stated in its filing that “it would be impossible for AI developers to acquire copyright licenses for critical works,” and noted that massive amounts of data are used.
Other firms such as OpenAI, Microsoft and Google have taken a similar stance, asserting that so much data is used that there is no viable way to pay for it all through licensing. TechNet, a group representing these companies, has also said that such a scheme would hinder the development of artificial intelligence.
Google claims that its AI models have relied on an approach known as “knowledge harvesting,” which is allowed under current copyright laws. A change along these lines, according to the search giant, “would impose a crushing liability on AI developers.”