All of lore.kernel.org
* LLM based rewrites
@ 2026-03-07 20:49 Christian Brauner
  2026-03-09 13:57 ` Steven Rostedt
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Christian Brauner @ 2026-03-07 20:49 UTC (permalink / raw)
  To: tech-board-discuss
  Cc: Christian Brauner, linux-kernel, ksummit-discuss,
	christianvanbrauner

Hey,

I believe it is a rite of passage to at least once cause a shouting
match with a non-technical topic.

It seems increasingly viable to rewrite an entire codebase using an
LLM, and it currently looks like there are at least some examples, as
in [1], where people try to use an LLM-based rewrite as a clean-room
implementation to relicense the project. I think the FOSDEM talk at [2]
is related to this as well.

Maybe this is a "let's worry about it later" situation but I wonder
whether this is something that the LF or TAB is actively following.

I'm not asking for a legal analysis. I'm mostly looking for reassurance
that we as the kernel community and our representatives have an eye on
this. I find this quite worrisome.

Fwiw, I was made aware that there's a tangentially related discussion on
the distribution mailing list at [3].

Thanks!
Christian

Link: https://github.com/chardet/chardet/issues/327 [1]
Link: https://github.com/chardet/chardet/releases/tag/7.0.0 [1]
Link: https://fosdem.org/2026/schedule/event/SUVS7G-lets_end_open_source_together_with_this_one_simple_trick [2]
Link: https://lore.kernel.org/a3f792e918674e208492a077679ae6ffc88ce0c9.camel@gentoo.org [3]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: LLM based rewrites
  2026-03-07 20:49 LLM based rewrites Christian Brauner
@ 2026-03-09 13:57 ` Steven Rostedt
  2026-03-09 15:31   ` H. Peter Anvin
  2026-03-09 16:05 ` Dave Hansen
  2026-03-09 16:16 ` James Bottomley
  2 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2026-03-09 13:57 UTC (permalink / raw)
  To: Christian Brauner
  Cc: tech-board-discuss, linux-kernel, ksummit-discuss,
	christianvanbrauner

On Sat,  7 Mar 2026 21:49:20 +0100
Christian Brauner <brauner@kernel.org> wrote:

> Hey,

Hi Christian,

> 
> I believe it is a rite of passage to at least once cause a shouting
> match with a non-technical topic.
> 
> It seems increasingly viable to rewrite an entire codebase using an
> LLM, and it currently looks like there are at least some examples, as
> in [1], where people try to use an LLM-based rewrite as a clean-room
> implementation to relicense the project. I think the FOSDEM talk at [2]
> is related to this as well.
> 
> Maybe this is a "let's worry about it later" situation but I wonder
> whether this is something that the LF or TAB is actively following.
> 
> I'm not asking for a legal analysis. I'm mostly looking for reassurance
> that we as the kernel community and our representatives have an eye on
> this. I find this quite worrisome.
> 
> Fwiw, I was made aware that there's a tangentially related discussion on
> the distribution mailing list at [3].

Thanks for bringing this up. I'll raise it as a topic for our next
meeting, although there may not be much we can do about it except stay
aware of what is happening.

-- Steve


> 
> Thanks!
> Christian
> 
> Link: https://github.com/chardet/chardet/issues/327 [1]
> Link: https://github.com/chardet/chardet/releases/tag/7.0.0 [1]
> Link: https://fosdem.org/2026/schedule/event/SUVS7G-lets_end_open_source_together_with_this_one_simple_trick [2]
> Link: https://lore.kernel.org/a3f792e918674e208492a077679ae6ffc88ce0c9.camel@gentoo.org [3]



* Re: LLM based rewrites
  2026-03-09 13:57 ` Steven Rostedt
@ 2026-03-09 15:31   ` H. Peter Anvin
  2026-03-09 16:16     ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2026-03-09 15:31 UTC (permalink / raw)
  To: Steven Rostedt, Christian Brauner
  Cc: tech-board-discuss, linux-kernel, ksummit-discuss,
	christianvanbrauner

On March 9, 2026 6:57:05 AM PDT, Steven Rostedt <rostedt@goodmis.org> wrote:
>On Sat,  7 Mar 2026 21:49:20 +0100
>Christian Brauner <brauner@kernel.org> wrote:
>
>> Hey,
>
>Hi Christian,
>
>> 
>> I believe it is a rite of passage to at least once cause a shouting
>> match with a non-technical topic.
>> 
>> It seems increasingly viable to rewrite an entire codebase using an
>> LLM, and it currently looks like there are at least some examples, as
>> in [1], where people try to use an LLM-based rewrite as a clean-room
>> implementation to relicense the project. I think the FOSDEM talk at [2]
>> is related to this as well.
>> 
>> Maybe this is a "let's worry about it later" situation but I wonder
>> whether this is something that the LF or TAB is actively following.
>> 
>> I'm not asking for a legal analysis. I'm mostly looking for reassurance
>> that we as the kernel community and our representatives have an eye on
>> this. I find this quite worrisome.
>> 
>> Fwiw, I was made aware that there's a tangentially related discussion on
>> the distribution mailing list at [3].
>
>Thanks for bringing this up. I'll raise it as a topic for our next
>meeting, although there may not be much we can do about it except stay
>aware of what is happening.
>
>-- Steve
>
>
>> 
>> Thanks!
>> Christian
>> 
>> Link: https://github.com/chardet/chardet/issues/327 [1]
>> Link: https://github.com/chardet/chardet/releases/tag/7.0.0 [1]
>> Link: https://fosdem.org/2026/schedule/event/SUVS7G-lets_end_open_source_together_with_this_one_simple_trick [2]
>> Link: https://lore.kernel.org/a3f792e918674e208492a077679ae6ffc88ce0c9.camel@gentoo.org [3]
>
>

It is somewhat hard to see how that would constitute a "clean-room" rewrite. A clean-room rewrite entails two teams, one (the "clean" room) which must be certified to have never seen the code in question, and all communications between the two teams must be auditable.


* Re: LLM based rewrites
  2026-03-07 20:49 LLM based rewrites Christian Brauner
  2026-03-09 13:57 ` Steven Rostedt
@ 2026-03-09 16:05 ` Dave Hansen
  2026-03-09 16:16 ` James Bottomley
  2 siblings, 0 replies; 16+ messages in thread
From: Dave Hansen @ 2026-03-09 16:05 UTC (permalink / raw)
  To: Christian Brauner, tech-board-discuss
  Cc: linux-kernel, ksummit-discuss, christianvanbrauner

On 3/7/26 12:49, Christian Brauner wrote:
> I'm not asking for a legal analysis. I'm mostly looking for reassurance
> that we as the kernel community and our representatives have an eye on
> this. I find this quite worrisome.

Let's say someone did this for the kernel and released it under a more
permissive license. Let's also ignore whether this is legally or morally
naughty or nice for the moment.

Any way you slice it, they'd start with a gigantic Linux-like code base
and effectively zero people to work on it. It would _effectively_ be a
big kernel fork. Maybe it would be different because it's got a more
permissive license. Linus would be proved wrong after all these years
and contributors would flock to the new codebase, throwing off the
chains of the evil GPLv2 that kept Linux from being successful.

Mainline has survived quite a few kernel forks. There would have to be
something pretty darn compelling about this new fork. Considering that
license taste is about as universal as vi/emacs taste, I doubt the
license itself would be compelling on its own. Maybe the new kernel
would be 100% rust? Maybe it would be more friendly to LLM maintenance?

Either way, Magic 8 Ball says: "Concentrate and ask again". So this does
sound like a great TAB topic! ;)


* Re: LLM based rewrites
  2026-03-07 20:49 LLM based rewrites Christian Brauner
  2026-03-09 13:57 ` Steven Rostedt
  2026-03-09 16:05 ` Dave Hansen
@ 2026-03-09 16:16 ` James Bottomley
  2 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2026-03-09 16:16 UTC (permalink / raw)
  To: Christian Brauner, tech-board-discuss
  Cc: linux-kernel, ksummit-discuss, christianvanbrauner

On Sat, 2026-03-07 at 21:49 +0100, Christian Brauner wrote:
[...]
> Fwiw, I was made aware that there's a tangentially related discussion
> on the distribution mailing list at [3].

The lawyers on the European Legal Network are already debating this
point.  However, I really doubt that feeding the code base into an LLM
and telling it to rewrite it will pass muster, at least under the US
legal tests for originality: it's the same reason why you can't give
an engineering team the code base and say "rewrite it".  You have to
apply the clean-room principles of one team reducing the code base to
an API reference and a separate team reimplementing it as a new code
base to get the required separation from being a derived work.

My point is not the legal one but the technical one: the clean-room
reduction to an API and back to code can be done by independent LLMs,
but likely not at the scale that something as big as the kernel would
require.  So it could work for small code bases but would be
prohibitively costly for huge ones.

Regards,

James



* Re: LLM based rewrites
  2026-03-09 15:31   ` H. Peter Anvin
@ 2026-03-09 16:16     ` Steven Rostedt
  2026-03-09 16:33       ` Jonathan Corbet
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2026-03-09 16:16 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christian Brauner, tech-board-discuss, linux-kernel,
	ksummit-discuss, christianvanbrauner

On Mon, 09 Mar 2026 08:31:03 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> It is somewhat hard to see how that would constitute a "clean-room"
> rewrite. A clean-room rewrite entails two teams, one (the "clean" room)
> which must be certified to have never seen the code in question, and all
> communications between the two teams must be auditable.

I was thinking the same.

-- Steve


* Re: LLM based rewrites
  2026-03-09 16:16     ` Steven Rostedt
@ 2026-03-09 16:33       ` Jonathan Corbet
  2026-03-09 16:55         ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Corbet @ 2026-03-09 16:33 UTC (permalink / raw)
  To: Steven Rostedt, H. Peter Anvin
  Cc: Christian Brauner, tech-board-discuss, linux-kernel,
	ksummit-discuss, christianvanbrauner

Steven Rostedt <rostedt@goodmis.org> writes:

> On Mon, 09 Mar 2026 08:31:03 -0700
> "H. Peter Anvin" <hpa@zytor.com> wrote:
>
>> It is somewhat hard to see how that would constitute a "clean-room"
>> rewrite. A clean-room rewrite entails two teams, one (the "clean" room)
>> which must be certified to have never seen the code in question, and all
>> communications between the two teams must be auditable.
>
> I was thinking the same.

The argumentation that is being made (which I am trying to reproduce but
am *not* advocating) is that "a clean-room rewrite is just one means to
an end" and that, in this specific case, the code being rewritten was
explicitly excluded from the context given to the bot (though that turns
out not to entirely be the case).  In theory, it only had the desired
API and a set of tests available to it.

The fact that every version of chardet was surely in its training data
is not deemed to be relevant.

jon


* Re: LLM based rewrites
  2026-03-09 16:33       ` Jonathan Corbet
@ 2026-03-09 16:55         ` H. Peter Anvin
  2026-03-09 17:09           ` H. Peter Anvin
                             ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: H. Peter Anvin @ 2026-03-09 16:55 UTC (permalink / raw)
  To: Jonathan Corbet, Steven Rostedt
  Cc: Christian Brauner, tech-board-discuss, linux-kernel,
	ksummit-discuss, christianvanbrauner

On March 9, 2026 9:33:12 AM PDT, Jonathan Corbet <corbet@lwn.net> wrote:
>Steven Rostedt <rostedt@goodmis.org> writes:
>
>> On Mon, 09 Mar 2026 08:31:03 -0700
>> "H. Peter Anvin" <hpa@zytor.com> wrote:
>>
>>> It is somewhat hard to see how that would constitute a "clean-room"
>>> rewrite. A clean-room rewrite entails two teams, one (the "clean" room)
>>> which must be certified to have never seen the code in question, and all
>>> communications between the two teams must be auditable.
>>
>> I was thinking the same.
>
>The argumentation that is being made (which I am trying to reproduce but
>am *not* advocating) is that "a clean-room rewrite is just one means to
>an end" and that, in this specific case, the code being rewritten was
>explicitly excluded from the context given to the bot (though that turns
>out not to entirely be the case).  In theory, it only had the desired
>API and a set of tests available to it.
>
>The fact that every version of chardet was surely in its training data
>is not deemed to be relevant.
>
>jon
>

That's a question for the lawyers and the courts, really. But it is most definitely *not* clean room. That being said, clean room is certainly not the only way to rewrite software that can pass legal muster, but it is the gold standard.


* Re: LLM based rewrites
  2026-03-09 16:55         ` H. Peter Anvin
@ 2026-03-09 17:09           ` H. Peter Anvin
  2026-03-09 18:19           ` James Bottomley
  2026-03-10  4:52           ` Theodore Tso
  2 siblings, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2026-03-09 17:09 UTC (permalink / raw)
  To: Jonathan Corbet, Steven Rostedt
  Cc: Christian Brauner, tech-board-discuss, linux-kernel,
	ksummit-discuss, christianvanbrauner

On March 9, 2026 9:55:36 AM PDT, "H. Peter Anvin" <hpa@zytor.com> wrote:
>On March 9, 2026 9:33:12 AM PDT, Jonathan Corbet <corbet@lwn.net> wrote:
>>Steven Rostedt <rostedt@goodmis.org> writes:
>>
>>> On Mon, 09 Mar 2026 08:31:03 -0700
>>> "H. Peter Anvin" <hpa@zytor.com> wrote:
>>>
>>>> It is somewhat hard to see how that would constitute a "clean-room"
>>>> rewrite. A clean-room rewrite entails two teams, one (the "clean" room)
>>>> which must be certified to have never seen the code in question, and all
>>>> communications between the two teams must be auditable.
>>>
>>> I was thinking the same.
>>
>>The argumentation that is being made (which I am trying to reproduce but
>>am *not* advocating) is that "a clean-room rewrite is just one means to
>>an end" and that, in this specific case, the code being rewritten was
>>explicitly excluded from the context given to the bot (though that turns
>>out not to entirely be the case).  In theory, it only had the desired
>>API and a set of tests available to it.
>>
>>The fact that every version of chardet was surely in its training data
>>is not deemed to be relevant.
>>
>>jon
>>
>
>That's a question for the lawyers and the courts, really. But it is most definitely *not* clean room. That being said, clean room is certainly not the only way to rewrite software that can pass legal muster, but it is the gold standard

In the end, though, it comes down to the plain fact that LLMs have pushed copyright law into undefined territory. As Uber showed, the strategy of doing something that is at best legally questionable can sometimes succeed if you can spread broadly enough, quickly enough, that the political process overtakes the legal one.


* Re: LLM based rewrites
  2026-03-09 16:55         ` H. Peter Anvin
  2026-03-09 17:09           ` H. Peter Anvin
@ 2026-03-09 18:19           ` James Bottomley
  2026-03-09 18:34             ` Steven Rostedt
  2026-03-10  4:52           ` Theodore Tso
  2 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2026-03-09 18:19 UTC (permalink / raw)
  To: H. Peter Anvin, Jonathan Corbet, Steven Rostedt
  Cc: Christian Brauner, tech-board-discuss, linux-kernel,
	ksummit-discuss, christianvanbrauner

On Mon, 2026-03-09 at 09:55 -0700, H. Peter Anvin wrote:
> On March 9, 2026 9:33:12 AM PDT, Jonathan Corbet <corbet@lwn.net>
> wrote:
> > Steven Rostedt <rostedt@goodmis.org> writes:
> > 
> > > On Mon, 09 Mar 2026 08:31:03 -0700
> > > "H. Peter Anvin" <hpa@zytor.com> wrote:
> > > 
> > > > It is somewhat hard to see how that would constitute a "clean-
> > > > room"
> > > > rewrite. A clean-room rewrite entails two teams, one (the
> > > > "clean" room)
> > > > which must be certified to have never seen the code in
> > > > question, and all
> > > > communications between the two teams must be auditable.
> > > 
> > > I was thinking the same.
> > 
> > The argumentation that is being made (which I am trying to
> > reproduce but
> > am *not* advocating) is that "a clean-room rewrite is just one
> > means to
> > an end" and that, in this specific case, the code being rewritten
> > was
> > explicitly excluded from the context given to the bot (though that
> > turns
> > out not to entirely be the case).  In theory, it only had the
> > desired
> > API and a set of tests available to it.
> > 
> > The fact that every version of chardet was surely in its training
> > data
> > is not deemed to be relevant.
> > 
> > jon
> > 
> 
> That's a question for the lawyers and the courts, really. But it is
> most definitely *not* clean room. That being said, clean room is
> certainly not the only way to rewrite software that can pass legal
> muster, but it is the gold standard

Agreed.  The specific problem is that the US copyright definition of
derivation presumes that if you've had exposure to the original work,
then anything you produce that's similar is a derivative.  That doesn't
mean you can't produce a non-derivative similar work; it's just that
the burden of proof shifts to you to show that, in creating the
similar work, you didn't include any elements of the original.  This
is a phenomenally difficult thing to prove in court (at least for
humans), which is why clean-room reverse engineering was developed ...
because you can demonstrate the required non-exposure to the original
through the separation of the two teams.

I don't think LLMs will be able to come up with the necessary proof of
separation without essentially recreating the clean room process, which
grows cost prohibitive as the complexity of the work increases.

Regards,

James






* Re: LLM based rewrites
  2026-03-09 18:19           ` James Bottomley
@ 2026-03-09 18:34             ` Steven Rostedt
  2026-03-09 18:38               ` Dr. David Alan Gilbert
  2026-03-09 18:54               ` James Bottomley
  0 siblings, 2 replies; 16+ messages in thread
From: Steven Rostedt @ 2026-03-09 18:34 UTC (permalink / raw)
  To: James Bottomley
  Cc: H. Peter Anvin, Jonathan Corbet, Christian Brauner,
	tech-board-discuss, linux-kernel, ksummit-discuss,
	christianvanbrauner

On Mon, 09 Mar 2026 11:19:42 -0700
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> I don't think LLMs will be able to come up with the necessary proof of
> separation without essentially recreating the clean room process, which
> grows cost prohibitive as the complexity of the work increases.

I can see one AI bot reading the original code and then writing the
prompts to pass to a second AI bot that produces the code.

That is basically how clean rooms work for humans. The question now
is: would courts agree that this is a clean room?
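
The two-bot flow described here can be sketched as a small
orchestration pipeline. This is an illustrative sketch only: the
`spec_llm` and `impl_llm` functions are hypothetical stand-ins for
real model calls, not any actual API, and the audit log reflects the
requirement (noted earlier in the thread) that all communication
between the two rooms be auditable.

```python
# Illustrative sketch of a two-bot "clean room" pipeline with an audit
# trail. The *_llm functions are hypothetical stand-ins for model calls.

def spec_llm(original_code: str) -> str:
    """'Dirty' room: reads the original, emits only a functional spec."""
    # A real system would prompt a model here; we fake a tiny spec.
    return "spec: add(a, b) returns the sum of a and b"

def impl_llm(spec: str) -> str:
    """'Clean' room: sees only the spec, never the original code."""
    assert "def " not in spec, "original source must not leak into the spec"
    return "def add(a, b):\n    return a + b\n"

def clean_room_rewrite(original_code: str) -> tuple:
    audit_log = []  # every message between the two rooms is recorded
    spec = spec_llm(original_code)
    audit_log.append(("spec-team -> impl-team", spec))
    new_code = impl_llm(spec)
    audit_log.append(("impl-team output", new_code))
    return new_code, audit_log

rewrite, log = clean_room_rewrite("def add(x, y):\n    return x + y\n")
print(rewrite)
print(f"{len(log)} auditable messages")
```

Whether any court would accept such a log as the equivalent of
certified team separation is, of course, exactly the open question.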

-- Steve


* Re: LLM based rewrites
  2026-03-09 18:34             ` Steven Rostedt
@ 2026-03-09 18:38               ` Dr. David Alan Gilbert
  2026-03-09 18:54               ` James Bottomley
  1 sibling, 0 replies; 16+ messages in thread
From: Dr. David Alan Gilbert @ 2026-03-09 18:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: James Bottomley, H. Peter Anvin, Jonathan Corbet,
	Christian Brauner, tech-board-discuss, linux-kernel,
	ksummit-discuss, christianvanbrauner

* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Mon, 09 Mar 2026 11:19:42 -0700
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > I don't think LLMs will be able to come up with the necessary proof of
> > separation without essentially recreating the clean room process, which
> > grows cost prohibitive as the complexity of the work increases.
> 
> I can see one AI bot reading the original code and then writing the
> prompts to pass to a second AI bot that produces the code.

That's what Chardet did:
https://github.com/chardet/chardet/issues/327#issuecomment-4005195078

however, there's a fun question of whether the second AI had already seen
the code; a lot of LLMs seem to have been trained on a lot of open
source code already.

Dave

> -- Steve
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


* Re: LLM based rewrites
  2026-03-09 18:34             ` Steven Rostedt
  2026-03-09 18:38               ` Dr. David Alan Gilbert
@ 2026-03-09 18:54               ` James Bottomley
  1 sibling, 0 replies; 16+ messages in thread
From: James Bottomley @ 2026-03-09 18:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Jonathan Corbet, Christian Brauner,
	tech-board-discuss, linux-kernel, ksummit-discuss,
	christianvanbrauner

On Mon, 2026-03-09 at 14:34 -0400, Steven Rostedt wrote:
> On Mon, 09 Mar 2026 11:19:42 -0700
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
> > I don't think LLMs will be able to come up with the necessary proof
> > of separation without essentially recreating the clean room
> > process, which grows cost prohibitive as the complexity of the work
> > increases.
> 
> I can see one AI bot reading the original code and then writing the
> prompts to pass to a second AI bot that produces the code.
> 
> That is basically how clean rooms work for humans. The question now
> is: would courts agree that this is a clean room?

Well, yes, because you can demonstrate clean-room separation, which is
already legally acknowledged as independent invention.

My argument isn't that LLMs can't do this.  It's that doing it is far
more expensive than simply feeding the code into an LLM and asking for
independent reinvention, which is what the guy did to chardet.  In
fact, I contend that reducing something to its API and then
reconstructing it is difficult to scale (and certainly very costly in
LLM tokens), which is why we shouldn't necessarily fear it being done
to the kernel.

Regards,

James



* Re: LLM based rewrites
  2026-03-09 16:55         ` H. Peter Anvin
  2026-03-09 17:09           ` H. Peter Anvin
  2026-03-09 18:19           ` James Bottomley
@ 2026-03-10  4:52           ` Theodore Tso
       [not found]             ` <CAMTJT3_cVaA7aJmDa6j288-qwP3jzvM_R2pdk+XmE+1U=Sovbg@mail.gmail.com>
  2 siblings, 1 reply; 16+ messages in thread
From: Theodore Tso @ 2026-03-10  4:52 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Jonathan Corbet, Steven Rostedt, Christian Brauner,
	tech-board-discuss, linux-kernel, ksummit-discuss,
	christianvanbrauner

> >The fact that every version of chardet was surely in its training data
> >is not deemed to be relevant.
>
> That's a question for the lawyers and the courts, really. But it is
> most definitely *not* clean room. That being said, clean room is
> certainly not the only way to rewrite software that can pass legal
> muster, but it is the gold standard

Well, given that researchers were able to elicit 96% of Harry Potter
and the Sorcerer's Stone from Claude 3.7 Sonnet[1], the question I
have is this: suppose one LLM instance creates a specification by
looking at the code you are trying to clone, and a second LLM
instance, itself trained on that very code, is then fed the
specification.  Regardless of whether this can be considered "clean
room" from a process perspective, there may be enough similarity in
the actual *results* that it could also be a problem.

[1] https://arxiv.org/html/2601.02671v1

Of course, we could imagine using the LLM to incrementally rewrite the
C code elicited from the specification if the results are too close to
the source program: that is, "Hey ChatGPT, please file off the serial
numbers so the source code looks nothing like the GPL code that I'm
trying to rip off."

The thing is, though, this is something that humans could do as well.
It wouldn't surprise me if there are cases of "clean room
implementation" that involved some incremental rewriting, and proving
that a strict clean-room procedure wasn't followed might be quite
difficult.  It's just that with AI, it might be easier to do things at
scale.
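
The "similarity in the actual *results*" question can at least be
probed mechanically. The sketch below compares token n-gram overlap
(a Jaccard measure) between an original and a regenerated snippet; it
is illustrative only, and real provenance analyses use fingerprinting
schemes such as winnowing rather than this naive measure.

```python
# Crude n-gram Jaccard similarity between two code snippets.
# Illustrative only; real analyses use fingerprinting (e.g. winnowing).

import re

def ngrams(source: str, n: int = 4) -> set:
    # Tokenize into words/identifiers and single punctuation characters.
    toks = re.findall(r"\w+|[^\s\w]", source)
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def similarity(a: str, b: str, n: int = 4) -> float:
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga or gb else 0.0

original    = "int add(int a, int b) { return a + b; }"
regenerated = "int add(int x, int y) { return x + y; }"
print(f"similarity: {similarity(original, regenerated):.2f}")
```

Note that a rename pass ("file off the serial numbers") drives exactly
this kind of surface measure down while leaving the structure intact,
which is why structure-aware fingerprinting is needed in practice.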

						- Ted


* Re: LLM based rewrites
       [not found]             ` <CAMTJT3_cVaA7aJmDa6j288-qwP3jzvM_R2pdk+XmE+1U=Sovbg@mail.gmail.com>
@ 2026-03-10 12:47               ` Theodore Tso
  2026-03-10 14:10                 ` Dr. Greg
  0 siblings, 1 reply; 16+ messages in thread
From: Theodore Tso @ 2026-03-10 12:47 UTC (permalink / raw)
  To: EJ Stinson
  Cc: H. Peter Anvin, Jonathan Corbet, Steven Rostedt,
	Christian Brauner, tech-board-discuss, linux-kernel,
	ksummit-discuss, christianvanbrauner

On Mon, Mar 09, 2026 at 10:15:28PM -0700, EJ Stinson wrote:
> Imagine if a rogue AI got access to rewriting the kernel, or was
> exploited; this would lead to near-certain catastrophe. LLMs should not
> rewrite the code: if somehow an AI were to achieve singularity, go
> rogue, or be attacked by an anarchistic/foreign actor, think about the
> amount of code it could sneak in without human suspicion, or just lead
> to human ignorance. I think for the time being, until we know for
> certain, there should be no reason to use LLMs to help rewrite any sort
> of code at scale. Even if we were able to prove it wasn't stolen code,
> the time spent on proving such a fact, and ensuring the security, would
> already take way too long to merit this sort of use.

I think you're misunderstanding the concern that was raised at FOSDEM,
which is that it is now possible for companies to take code licensed
under a license such as the GPL, ask an AI to do a "clean room
rewrite", and then claim the resulting code can be used or relicensed
under a more permissive license, such as Apache or BSD, or used in a
proprietary codebase.

"It's the end of the world as we know it....."

There are a couple of problems with their premise.  The first is that
they demonstrated this on some very simple bits of JavaScript.  It's
not clear whether this would work at *all* on something more
complicated, never mind something like the Linux kernel.

The second is the legal issues: there are multiple dimensions to
whether the resulting code would really be considered free and clear
for relicensing.

And the third is whether it would really result in more secure code
(which was their premise for why some companies might do this, since
the people giving the presentation at FOSDEM were security
researchers).  Given that AI-generated code is generally *more* likely
to have security vulnerabilities than human-written code, this
assumption seems dubious to me.  Also, if the security vulnerability
is inherent in the software architecture, having the first LLM
generate a spec might result in a *spec* which is buggy / vulnerable,
so when the second LLM translates that spec back into C code, not only
might it introduce new security vulnerabilities, the original
vulnerability present in the source implementation might be preserved
as well.

The bottom line is that I rate the FOSDEM talk 10/10 when they talk
about the history of copyright, 9/10 when they talk about the history
of clean-room reimplementation (which has been around for about as
long as humans have), 3/10 when they talk about what's possible in the
present, and 5/10 when they talk about the future: their whole point
was to start a conversation, and they certainly did that.


One thing we need to remember, though, is that we don't have the power
to stop people from doing this.  For that matter, there could be
sweatshops in some third-world country where people have been
reimplementing open source code into proprietary code, and that could
have been happening for years or even decades; if the resulting
rewrite gets used in some proprietary code base, we'd never know about
it.

What AI could potentially do is democratize this, so that any random
person with a few thousand dollars of LLM credits might be able to
attempt it.  And even if today's LLMs aren't really up to the task for
non-trivial programs, that could change over time.


If that happens, though, it's not just Open Source that is going to be
affected.  There are lots of people predicting that graduates with CS
degrees will be left begging in the streets, since whether we're
talking about new proprietary code or new open source code, an AI bot,
perhaps with a senior developer guiding the LLM, will mean that we
won't need all that many (or perhaps *any*) junior programmers.  Is
that hysteria and overblown hyperbole?  Maybe.

The other possibility is that this will be the beginning of something
similar to the replacement of textile artisans who made cloth by hand
in the early 1800's, when mechanized power looms made their jobs
obsolete.  Look up "Luddite" in Wikipedia for more details.  What
happened really *sucked* for the people who made cloth the old way,
but the result was that people could buy a shirt for something
significantly less than a year's worth of wages for the average
laborer.

Will AI do to software engineers what the early industrial revolution
in England did to people like Ned Ludd?  Who knows?  But if it
happens, it isn't going to be just Open Source that is affected.  And
meanwhile, people who design clothes and fabric patterns still have
jobs, even today in the 21st century.

Cheers,

						- Ted


* Re: LLM based rewrites
  2026-03-10 12:47               ` Theodore Tso
@ 2026-03-10 14:10                 ` Dr. Greg
  0 siblings, 0 replies; 16+ messages in thread
From: Dr. Greg @ 2026-03-10 14:10 UTC (permalink / raw)
  To: Theodore Tso
  Cc: EJ Stinson, H. Peter Anvin, Jonathan Corbet, Steven Rostedt,
	Christian Brauner, tech-board-discuss, linux-kernel,
	ksummit-discuss, christianvanbrauner

On Tue, Mar 10, 2026 at 08:47:21AM -0400, Theodore Tso wrote:

Good morning, I hope the week is going well for everyone.

> On Mon, Mar 09, 2026 at 10:15:28PM -0700, EJ Stinson wrote:
> > Imagine if a rogue AI got access to rewriting the kernel, or was
> > exploited; this would lead to near-certain catastrophe. LLMs
> > should not rewrite the code: if somehow an AI were to achieve
> > singularity, go rogue, or be attacked by an anarchistic/foreign
> > actor, think about the amount of code it could sneak in without
> > human suspicion, or just lead to human ignorance. I think for the
> > time being, until we know for certain, there should be no reason
> > to use LLMs to help rewrite any sort of code at scale. Even if we
> > were able to prove it wasn't stolen code, the time spent on
> > proving such a fact, and ensuring the security, would already
> > take way too long to merit this sort of use.

> And the third is whether it would really result in more secure code
> (which was their premise for why some companies might do this, since
> the people giving the presentation at FOSDEM were security
> researchers).  Given that AI-generated code is generally *more* likely
> to have security vulnerabilities than human-written code, this
> assumption seems dubious to me.  Also, if the security vulnerability
> is inherent in the software architecture, having the first LLM
> generate a spec might result in a *spec* which is buggy / vulnerable,
> so when the second LLM translates that spec back into C code, not only
> might it introduce new security vulnerabilities, the original
> vulnerability present in the source implementation might be preserved
> as well.

It would seem that if some enterprising individual, or more likely a
major technology company with sufficient resources, told an LLM to
simply convert the entire kernel to Rust, that would be the end of
kernel security vulnerabilities as we know them, not?

Then, if said enterprising individual or corporation slapped the GPL
on the result and pushed it to GitHub, mankind would be saved as we
know it.

In the spirit of Christian's intention to inspire conversation... :-)

> Cheers,
> 
> 						- Ted

Have a good remainder of the week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
              https://github.com/Quixote-Project

