A voted text, but ambiguities remain

The EU is well on its way to meeting its challenge: adopting the first major regulation governing the uses of artificial intelligence (the AI Act). The Regulation on Artificial Intelligence (RIA) was politically agreed on 8 December 2023, then voted by MEPs on Wednesday 13 March, pending a vote by the Council of the EU in May. Its full application, however, will not take effect for another two years, that is, in 2026. In the meantime, regulation could initially rely mainly on codes of good practice.

Over the past year, many people have been able to experience the prowess of tools capable of producing a text, like ChatGPT, or an image, like Midjourney, from a few instructions given in a “prompt”. There was also HeyGen, which lets you create a video avatar by cloning your own voice and adjusting your lip movements to speak in a chosen language. Google has released a generative artificial intelligence, MusicLM, that creates music from a simple melody, ahead of Sora, the revolutionary video generator from OpenAI, the creator of ChatGPT, which will soon reach the market and will be able to create realistic videos from a simple text input.

The public sphere is also full of accounts of the dangers created by the use of these applications. In the United States, two lawyers who had sought help from ChatGPT were caught out after citing case law that simply did not exist. Beyond such fabrications, it is the dangers of disinformation that are worrying, especially in this election year.

The adopted text proposes a framework for artificial intelligence systems based on a risk-based approach. Some AI systems are banned outright; others may only be placed on the market after a compliance check. The idea is to strike a compromise between regulating practices and not stifling innovation. It is on this very point that France sought to soften the text. The balancing act, however, is a delicate one.

A risk-based approach

The creation of a special regime for general-purpose artificial intelligence models (GPAI) is undoubtedly one of the most important additions introduced during the negotiations. These models are trained on very large amounts of data, can perform a wide range of tasks, and can be integrated into a wide variety of systems or applications. They serve as a foundation for other systems. ChatGPT, for instance, is a generative artificial intelligence built on the GPT-4 language model, which is itself a GPAI.

The RIA introduces a further category: GPAI models that may present systemic risks. These are models that exceed a certain computing power or that have at least 10,000 registered business users, thresholds which the European Commission may adjust to take market developments into account. The concept of “systemic risk” covers models likely to have a significant impact on the EU and “actual or reasonably foreseeable negative effects on public health, safety, public security, fundamental rights, or society as a whole”. Such risks can potentially spread at scale, in particular to users of these models who integrate them into their own systems or applications.


This approach based on the concept of systemic risk invites comparison with another flagship text recently adopted by the European Union, the Digital Services Act (DSA). The DSA imposes additional obligations on very large platforms and very large search engines, in particular with regard to the systemic risks associated with their content moderation and recommendation systems. The procedure for designating systemic-risk GPAI models resembles the procedure used in the DSA to designate these major players: either the actors designate themselves, or the European Commission, assisted by a scientific panel, can unilaterally add them to the list of systemic-risk GPAI models. This list will be public.

Beyond these analogies, important differences stand out. In the DSA, the aim is to significantly strengthen oversight of the biggest players compared with less powerful online platforms and search engines. In the RIA, the creation of the new class of GPAI models resulted in a regime that is more lenient for them than the one applying to high-risk AI systems. In other words, the aim is to strike a balance between a framework that is as flexible as possible and as restrictive as necessary.

The special status of GPAI models rests essentially on a series of transparency obligations rather than on a requirement of compliance prior to entry into the internal market, as is the case for high-risk AI systems, an approach that is not without ambivalence.

A “sufficiently detailed summary”

The RIA will require all providers of GPAI models to draw up and keep up to date technical documentation containing a set of precise information. This will make it possible, in particular, to know which datasets were used to train, test and validate the model, as well as where that data came from and how it was collected. This technical documentation must also state the known or estimated energy consumption of the model.

The adopted text does not require this technical documentation to be made public, only that it be forwarded to the regulatory authorities. Only a “sufficiently detailed summary” of the content used to train the model must be made accessible to all, a formulation that is ambiguous at best.

Pessimists will note that, unlike under the DSA, there is in principle no provision for direct access to this data, either for the regulatory authorities or for researchers or independent experts tasked with testing these models. In other words, the transmission of this crucial information will depend first of all on the good faith of the actors. The latter must nonetheless cooperate with the competent national authorities and with the Commission. In the event of non-compliance with its requests, the Commission may fine providers of GPAI models up to 3% of their total worldwide annual turnover for the preceding financial year, or 15 million euros.

Providers of GPAI models must also supply technical documentation to the providers of AI systems who use the model and integrate it into their applications or systems. This information serves a dual purpose: it allows those providers to understand the tasks for which the model is intended, and it makes it possible to hold liable anyone who uses the model outside its intended purpose or who substantially modifies it. In such cases the legal consequences shift: the downstream developer would then be treated as the designer of a new GPAI model, with the specific obligations that entails.

Those who import artificial intelligence designed outside the territory of the 27 member states, or place it on the European market, are also liable. Overall, however, the technical documentation obligation as it stands does not make sufficiently clear to what extent they will be subject to the same constraints.

Is copyright really protected?

The RIA also requires providers of GPAI models to put in place a policy for respecting copyright. Here again, the text is not without ambiguities. The impact of generative AI on copyright is at the center of public debate in both Europe and the United States.

In the EU, the debate has focused on whether to revise the text and data mining exception, which allows artificial intelligence models to use copyright-protected data that is freely accessible online, provided the rights holders have not objected. The RIA does not call this exception into question, but it requires GPAI model providers to implement technologies ensuring that the opposition expressed by authors is respected.

In other words, the text encourages authors to organize themselves to exercise their right to object. For their part, GPAI model providers must be able to certify that these rights have been respected. They must also certify the automatic deletion of data covered by the exception once their model has been trained, tested and validated. One may nevertheless remain skeptical about whether the publication of a “detailed summary” of the data used by GPAI models will, in practice, allow rights holders to verify the possible use of their protected data.

A reasoned “open source” exception

Models that are made publicly available under a free license and that give open access to their technical characteristics (so-called “open source” models) are not required to implement a copyright policy or to publish a detailed summary of the content used to train them. This is the case, for example, of LLaMA, Meta's language model, but not of GPT-4, which is a “proprietary” model whose technical underpinnings are not shared.

This exception echoes the RIA's general benevolence toward research. The RIA does not, however, lock itself into a naive approach to open source. The exemption does not apply to models provided for a fee, to GPAI models presenting systemic risk, or to those that meet only some of the criteria defining an open source model. The question could arise in the future for the French start-up Mistral AI.

A cask of the Danaids

The effectiveness of the RIA's framework for GPAI models, marked as it is by numerous ambiguities, ultimately depends on the governance arrangements it establishes. Particular attention should be paid to the artificial intelligence office newly created within the Commission, which is entrusted with most of the oversight of these actors and with supporting them towards compliance.

Its task will consist in particular of approving a series of documents intended to clarify the RIA's many ambiguities. The undertaking could be compared to the cask of the Danaids, since the rules will have to be constantly adapted to technical and market developments. But in the end, is that not the very purpose of any such regulation: to allow the legal framework to be adjusted day to day, in step with stakeholders and technical progress?


This article is part of the dossier “General artificial intelligence: behind the scenes”, published by Paris Dauphine University – PSL's online scientific media.
