Re: on ai generated and code provenance

From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Warner Losh <imp@bsdimp.com>, Paolo Bonzini <pbonzini@redhat.com>,
	qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: on ai generated and code provenance
Date: Mon, 25 May 2026 18:36:05 -0400
Message-ID: <20260525183441-mutt-send-email-mst@kernel.org>
In-Reply-To: <CAJSP0QUmh4RLFUg9-5Uky36Tjhr_cLkrE2VAC8K_MdSY9p0WZw@mail.gmail.com>

On Mon, May 25, 2026 at 03:44:02PM -0400, Stefan Hajnoczi wrote:
> On Mon, May 25, 2026 at 1:17 PM Warner Losh <imp@bsdimp.com> wrote:
> > On Mon, May 25, 2026 at 10:34 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> On 5/24/26 14:42, Michael S. Tsirkin wrote:
> >> >       How contributors could comply with DCO terms (b) or (c) for the output of AI
> >> >       content generators commonly available today is unclear.  The QEMU project is
> >> >       not willing or able to accept the legal risks of non-compliance.
> >> >
> >> > But, since this was written, Red Hat's Richard Fontana and Chris Wright
> >> > published this piece:
> >> > https://www.redhat.com/en/blog/ai-assisted-development-and-open-source-navigating-legal-issues
> >> >
> >> > Saying, in particular
> >> >       We understand this concern, but the DCO has never
> >> >       been interpreted to require that every line of a contribution must be
> >> >       the personal creative expression of the contributor or another human
> >> >       developer.
> >> This is not the objection or the worry; rather the question is, what if
> >> the contribution is a creative expression of someone that could claim
> >> copyright in it.  In fact, looking at the Linux policy...
> >>
> >>    Signed-off-by and Developer Certificate of Origin
> >>    =================================================
> >>
> >>    AI agents MUST NOT add Signed-off-by tags. Only humans can legally
> >>    certify the Developer Certificate of Origin (DCO). The human submitter
> >>    is responsible for:
> >>
> >>    * Reviewing all AI-generated code
> >>    * Ensuring compliance with licensing requirements
> >>    * Adding their own Signed-off-by tag to certify the DCO
> >>    * Taking full responsibility for the contribution
> >>
> >> ... the question is how humans can actually do the second step.  The
> >> piece you posted above says: "with disclosure and human attentiveness –
> >> and oversight – aided where possible by tools that check for code
> >> similarity, AI-assisted contributions can be entirely compatible with
> >> the spirit of the DCO".
> >
> >
> > The code produced by AI agents has no copyright. You can incorporate
> > public domain code into your work and have the absolute right to license
> > it (see all the Disney movies). The notion that LLMs wholesale copy originates
> > from the earliest days of Copilot and turned out to be contrived. No recent
> > evidence shows that plagiarism is a concern. To the extent that I modify
> > public domain code, I have a copyright that I can choose to license
> > however I want (and the SOB says it's compatible).
> 
> There is an active field of research on memorization and the status is
> that LLMs do memorize. A paper from 2026
> (https://arxiv.org/pdf/2601.02671) shows that production models can
> output significant chunks of Harry Potter, although the research
> deliberately extracts training inputs rather than doing so
> accidentally. I am sharing this because I don't think it's correct to
> say that concerns about models outputting copyrighted code are
> outdated.

But the concern is with them doing it *accidentally*, because willful
infringement was always possible. And accidental reproduction does not
seem to be happening.


> I do think that the risk for coding use cases is low as long as LLMs
> are used sensibly. If not, legal cases would have popped up by now.
> 
> The example of ext4 for OpenBSD (https://lwn.net/Articles/1064541/)
> comes to mind as a case where LLMs were used in a risky way and
> maintainers decided to reject the code. Even though the output of AI
> has no copyright, when there is no suitably-licensed information to
> generate the code from, then it is risky to assume AI generated code
> is free from copyright, license, patent, etc effects.
> 
> As long as we keep the usual practices around intellectual property in
> mind when merging code, then I think the risk of copyright issues is
> low and not a blocker for accepting AI generated contributions.
> 
> Stefan
