AI & Regulation: what Copyright and CNIL now demand

TL;DR - Key Takeaways at a Glance

This article explores the new regulatory constraints on AI usage, in particular how the European AI Act applies copyright rules to training data, and the stricter CNIL requirements around personal data. It details what companies need to know to stay compliant.

Key Points to Remember

  • The European AI Act imposes strict transparency on providers of general-purpose AI (GPAI) models regarding the training data used.
  • Companies must now precisely document data sources for their AI models and scrupulously respect copyright.
  • The CNIL is stepping up its audits of AI systems, demanding rigorous compliance on personal data protection.
  • Ignoring rules like the AI Act and CNIL guidelines is no longer an option for SMEs using or developing AI solutions.
  • GPAI providers must publish a sufficiently detailed summary of the content used for training, so that rightsholders can check whether their works were included.

The dual pressure reshaping AI

An intellectual property lawyer and a CNIL inspector walk into the office of an AI startup. It’s not the start of a joke — it’s the daily reality of 2024-2025 for any company developing or integrating artificial intelligence.

The AI industry is now navigating two simultaneous regulatory fronts. On one side, copyright questions tied to model training, now framed by the European AI Act. On the other, a CNIL that has stepped up its audits of AI systems and of the data breaches involving them. For SMEs using these tools, or agencies like ours integrating them into daily production, ignoring these constraints is no longer an option.

Here’s what’s actually happening, and what you need to understand.

For years, big tech companies trained their models on billions of data points scraped from the web, without asking too many questions about provenance. News articles, literary works, open-source code, artists’ images — everything was fair game.

The European AI Act, in force since August 2024 and applying in stages, flips that logic. Article 53 now imposes transparency obligations on providers of general-purpose AI (GPAI) models regarding training data. In practical terms: document the sources used, put in place a policy to respect applicable copyright, and publish a summary detailed enough to let rightsholders check whether their works were used.

“Providers of general-purpose AI models shall put in place a policy to comply with Union law on copyright.” — AI Act, Article 53(1)(c)

This isn’t trivial. OpenAI, Google, Mistral AI, Stability AI — they’re all affected. And lawsuits are piling up: the New York Times vs. OpenAI, French authors vs. model publishers, illustrators vs. Midjourney. The question isn’t theoretical anymore.

[Figure: diagram of the tension between AI model training and copyright compliance in Europe]

What this changes for you, in practice. If you use AI tools to generate content — text, images, code — you’re not directly responsible for how the model was trained. But it’s in your interest to pick vendors who document their compliance. A tool built on stolen data is a reputational and potentially legal risk you carry too.

The CNIL faces an explosion of data breaches

The other pressure comes from personal data protection. And here, the numbers speak for themselves.

In 2023, the CNIL received over 4,000 data breach notifications — a record. In 2024, the trend accelerated, with special attention paid to systems involving generative AI. Why? Because these tools raise unprecedented questions under GDPR.

Here are the three main friction points that CNIL audits have surfaced:

Prompt data retention. When you type a request into ChatGPT or a similar tool, that data may be used to improve the model. If your prompt contains information about a client, an employee, a patient — you may have just transferred personal data to a third party without a valid legal basis.

Hallucinations and inaccurate data. An AI model can generate false information about a real person. The GDPR requires personal data to be accurate (Article 5(1)(d)) and gives individuals a right to rectification (Article 16). Who’s responsible when an AI invents something about someone? The legal answer is still fuzzy, but regulators are starting to point at companies deploying these systems without safeguards.

Non-EU transfers. Most large models are hosted in the United States. Every request containing personal data that you send to a US-hosted LLM is potentially a transfer outside the European Union, subject to the strict GDPR rules on international transfers.

[Figure: illustration of a CNIL audit on the data flows of an artificial intelligence system]

The CNIL opened an investigation into ChatGPT back in 2023, joining its Italian counterpart (which had temporarily blocked the tool) and its German one. The message is clear: AI is not a GDPR-free zone.

What this changes in practice for an SME or an agency

Let’s be direct. You probably aren’t building an LLM. But you are using AI tools — and this regulation concerns you all the same.

Here’s what we’re seeing day to day in our agency, and what our clients are increasingly asking us about.

Pick GDPR-compliant AI tools

Not all tools are equal. Microsoft Copilot with enterprise settings, French sovereign solutions like Mistral, or on-premise deployments offer guarantees that the consumer version of ChatGPT doesn’t. Before integrating an AI tool into your workflow, ask three simple questions: where does my data go? How long is it kept? Is it used to train the model?
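One way to make those three questions systematic is to record the answers per tool and flag whatever is missing. The snippet below is a minimal sketch, not a compliance tool: the `AiTool` fields, the tool names and the wording of the flags are all hypothetical, and the answers have to come from each vendor's actual contract or data policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AiTool:
    """Answers to the three questions for one AI tool (hypothetical fields)."""
    name: str
    data_location: str                 # e.g. "EU", "US", "on-premise"
    retention_days: Optional[int]      # None = the vendor does not say
    trains_on_prompts: Optional[bool]  # None = the vendor does not say


def open_questions(tool: AiTool) -> list:
    """Return the points to clarify with the vendor before adopting the tool."""
    issues = []
    if tool.data_location not in ("EU", "on-premise"):
        issues.append("data leaves the EU: check the transfer mechanism")
    if tool.retention_days is None:
        issues.append("retention period not documented")
    if tool.trains_on_prompts is None:
        issues.append("unclear whether prompts are used for training")
    elif tool.trains_on_prompts:
        issues.append("prompts are used for training")
    return issues


# Hypothetical inventory, for illustration only.
tools = [
    AiTool("hypothetical-chat-free", "US", None, True),
    AiTool("hypothetical-onprem-llm", "on-premise", 30, False),
]
for tool in tools:
    print(tool.name, open_questions(tool))
```

A tool that comes back with an empty list is not automatically compliant. It simply has no open question left on these three points.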

Document your AI usage in your processing register

The GDPR requires a register of data processing activities (Article 30). If you use AI to handle client information (drafting emails, analyzing contracts, generating reports), that processing must be listed. It’s an obligation, not an option.
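To give an idea of what such a register line can look like for an AI use case, here is a minimal sketch. The field names loosely follow the items the GDPR expects in a record of processing (purpose, categories of data and of data subjects, recipients, transfers, retention, security measures), but the structure, the names and the example values are our own assumptions, not an official CNIL template.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class RegisterEntry:
    """One line of a processing register; field names are illustrative."""
    activity: str
    purpose: str
    data_categories: list
    data_subjects: list
    recipients: list            # includes the AI vendor acting as processor
    non_eu_transfer: bool
    retention: str
    security_measures: list = field(default_factory=list)


# Hypothetical entry for an AI-assisted email drafting workflow.
entry = RegisterEntry(
    activity="Drafting client emails with an AI assistant",
    purpose="Customer relationship management",
    data_categories=["name", "email address", "order history"],
    data_subjects=["clients"],
    recipients=["hypothetical AI vendor (processor)"],
    non_eu_transfer=True,
    retention="30 days in vendor logs (per contract)",
    security_measures=["no pay slips or health data in prompts"],
)

# Export as JSON so the entry can be stored next to the rest of the register.
print(json.dumps(asdict(entry), indent=2))
```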

Don’t put sensitive data in your prompts

Basic rule, regularly violated. An accountant pasting a pay slip into ChatGPT to “summarize the key elements,” a salesperson pasting a full client history to prep an offer — these habits expose your company. Train your teams. Now.
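Training comes first, but a small technical guard can catch the most obvious slips before a prompt leaves your machine. The sketch below masks email addresses, French-style phone numbers and IBANs with regular expressions. The patterns and the `redact` helper are illustrative assumptions: they cover common formats only and will miss names, addresses and anything unusual.

```python
import re

# Illustrative patterns only: they catch common formats, not every case.
# Order matters: IBANs are masked before phone numbers so that digit groups
# inside an IBAN are not mistaken for a phone number.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(
        r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
    ),
    "PHONE": re.compile(r"(?<!\d)(?:\+33 ?|0)[1-9](?:[ .-]?\d{2}){4}\b"),
}


def redact(prompt: str) -> str:
    """Replace each match with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


sample = (
    "Contact Marie at marie.durand@example.fr or 06 12 34 56 78, "
    "IBAN FR76 3000 6000 0112 3456 7890 189."
)
print(redact(sample))
# Contact Marie at [EMAIL] or [PHONE], IBAN [IBAN].
```

Note that the first name survives the redaction. A filter like this reduces accidental leaks, but it does not replace the rule itself.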

“GDPR compliance doesn’t stop at your website. It extends to every tool you use to process data, including AIs.” — CNIL position, 2024

The AI Act: an accelerating timeline

The AI Act isn’t a distant law. Its rollout follows a precise schedule:

  • February 2025: Ban on AI practices with unacceptable risk (manipulation, social scoring)
  • August 2025: Obligations for GPAI models (transparency, copyright)
  • August 2026: Most remaining provisions start to apply, including the obligations for high-risk systems
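The schedule above can be turned into a simple date check, for example to remind a team which milestones have already been reached. This is a minimal sketch: the article gives months only, and the exact day (the 2nd of each month) is taken from our reading of Article 113 of Regulation (EU) 2024/1689. The labels and the `obligations_in_force` helper are our own.

```python
from datetime import date

# Application dates as we read them in Article 113 of Regulation (EU) 2024/1689.
MILESTONES = [
    (date(2025, 2, 2), "Ban on unacceptable-risk practices"),
    (date(2025, 8, 2), "GPAI obligations (transparency, copyright)"),
    (date(2026, 8, 2), "High-risk system obligations"),
]


def obligations_in_force(on: date) -> list:
    """Return the milestones whose application date has been reached."""
    return [label for start, label in MILESTONES if on >= start]


print(obligations_in_force(date(2025, 9, 1)))
# ['Ban on unacceptable-risk practices', 'GPAI obligations (transparency, copyright)']
```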

The sectors covered by the “high risk” category include healthcare, education, HR, justice, and critical infrastructure. If you operate in one of these sectors and use AI, specific compliance obligations will apply to you from 2026.

[Figure: timeline of the European AI Act rollout from 2024 to 2026]

The sanctions provided by the AI Act are significant: up to €35 million or 7% of global annual turnover, whichever is higher, for the most serious violations. For GPAI providers that fail their obligations, including on copyright: up to €15 million or 3% of turnover. These aren’t symbolic numbers.
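To see what these ceilings mean in euros, keep in mind that the Act sets each one at the higher of the fixed amount and the turnover percentage. The sketch below does that arithmetic. The `fine_ceiling` helper and the turnover figures are hypothetical, and the result is a legal maximum, not a prediction of the fine a regulator would actually impose. As we read Article 99(6), SMEs get the lower of the two amounts instead, which this sketch does not model.

```python
def fine_ceiling(turnover_eur: int, fixed_cap_eur: int, percent: int) -> int:
    """Upper bound of the fine: the higher of the fixed cap and the turnover share."""
    return max(fixed_cap_eur, turnover_eur * percent // 100)


# Most serious violations, company with EUR 1 billion turnover: 7% dominates.
print(fine_ceiling(1_000_000_000, 35_000_000, 7))  # 70000000

# Same tier, EUR 200 million turnover: the fixed EUR 35 million cap dominates.
print(fine_ceiling(200_000_000, 35_000_000, 7))    # 35000000

# GPAI tier, EUR 2 billion turnover: 3% dominates.
print(fine_ceiling(2_000_000_000, 15_000_000, 3))  # 60000000
```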

What we actually do at GDM-Pixel

We use AI heavily — Claude Code for code generation, automated pipelines for content, n8n workflows for our clients. It’s not a marketing posture, it’s our daily production tool.

And precisely because we use it for real, we had to take these questions seriously.

Our approach comes down to three points. First, we never put real client data into our prompts — we work with anonymized or fictional data during testing and development. Second, we documented our AI usage in our GDPR processing register — done, up to date. Third, we pick our tools based on their contractual guarantees on data, not only on their technical performance.

Does it slow down our workflow? Marginally. Does it protect us legally and let us show our SME clients that we take their data seriously? Yes, clearly.

Three points to act on right now

1. Audit your current AI tools. List every AI tool you use. For each one, check its data policy. Drop those that offer no guarantee on the use of your prompts.

2. Update your processing register. If you use AI to handle personal data (clients, employees, prospects), add these processing activities to your GDPR register. The CNIL can ask for it at any time.

3. Train your teams before an incident happens. Most AI-related data breaches don’t come from a technical flaw — they come from an employee who put the wrong info into the wrong tool. An hour of training beats a breach notification to the CNIL.

Compliance isn’t a brake on innovation — it’s a competitive advantage

Here’s what I see on the ground: companies taking AI compliance seriously today are building client trust their competitors don’t have. In a market where AI still generates mistrust among many decision-makers, being able to say “we use AI, and here’s how we protect your data” is a real commercial argument.

The AI Act and the GDPR aren’t going away. Audits will intensify. Sanctions will land — on the big players first, but pressure will trickle down. Better be ready.

Got questions on the compliance of your AI tools, or want to audit your AI usage against GDPR? That’s exactly the kind of diagnosis we run at GDM-Pixel. Get in touch — we’ll give you a ground-level take, without unnecessary legal jargon and without selling you a rebuild if it’s not needed.


Sources: Regulation (EU) 2024/1689 — AI Act, CNIL 2023 Annual Report, CNIL Recommendations on Generative AI

Charles Annoni

Front-End Developer and Trainer

Charles Annoni has been helping companies with their web development since 2008. He is also a trainer in higher education.