* LLM based rewrites
@ 2026-03-07 20:49 Christian Brauner
2026-03-09 13:57 ` Steven Rostedt
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Christian Brauner @ 2026-03-07 20:49 UTC (permalink / raw)
To: tech-board-discuss
Cc: Christian Brauner, linux-kernel, ksummit-discuss,
christianvanbrauner
Hey,
I believe it is a rite of passage to at least once cause a shouting
match with a non-technical topic.
It seems increasingly viable to rewrite an entire codebase using an LLM
and it currently looks like there's at least some examples as in [1]
where people try to use an LLM based rewrite as a clean-room
implementation to relicense the project. I think the FOSDEM talk at [2]
is related to this as well.
Maybe this is a "let's worry about it later" situation but I wonder
whether this is something that the LF or TAB is actively following.
I'm not asking for a legal analysis. I'm mostly looking for reassurance
that we as the kernel community and our representatives have an eye on
this. I find this quite worrisome.
Fwiw, I was made aware that there's a tangentially related discussion on
the distribution mailing list at [3].
Thanks!
Christian
Link: https://github.com/chardet/chardet/issues/327 [1]
Link: https://github.com/chardet/chardet/releases/tag/7.0.0 [1]
Link: https://fosdem.org/2026/schedule/event/SUVS7G-lets_end_open_source_together_with_this_one_simple_trick [2]
Link: https://lore.kernel.org/a3f792e918674e208492a077679ae6ffc88ce0c9.camel@gentoo.org [3]
* Re: LLM based rewrites
  2026-03-07 20:49 LLM based rewrites Christian Brauner
@ 2026-03-09 13:57 ` Steven Rostedt
  2026-03-09 15:31   ` H. Peter Anvin
  2026-03-09 16:05 ` Dave Hansen
  2026-03-09 16:16 ` James Bottomley
  2 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2026-03-09 13:57 UTC (permalink / raw)
To: Christian Brauner
Cc: tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

On Sat, 7 Mar 2026 21:49:20 +0100
Christian Brauner <brauner@kernel.org> wrote:

> Hey,

Hi Christian,

> I believe it is a rite of passage to at least once cause a shouting
> match with a non-technical topic.
>
> It seems increasingly viable to rewrite an entire codebase using an LLM
> and it currently looks like there's at least some examples as in [1]
> where people try to use an LLM based rewrite as a clean-room
> implementation to relicense the project. I think the FOSDEM talk at [2]
> is related to this as well.
>
> Maybe this is a "let's worry about it later" situation but I wonder
> whether this is something that the LF or TAB is actively following.
>
> I'm not asking for a legal analysis. I'm mostly looking for reassurance
> that we as the kernel community and our representatives have an eye on
> this. I find this quite worrisome.
>
> Fwiw, I was made aware that there's a tangentially related discussion on
> the distribution mailing list at [3].

Thanks for bringing this up; I'll put it on the agenda for our next
meeting. Although there may not be much we can do about it except be
aware of what is happening.

-- Steve

> Thanks!
> Christian
>
> Link: https://github.com/chardet/chardet/issues/327 [1]
> Link: https://github.com/chardet/chardet/releases/tag/7.0.0 [1]
> Link: https://fosdem.org/2026/schedule/event/SUVS7G-lets_end_open_source_together_with_this_one_simple_trick [2]
> Link: https://lore.kernel.org/a3f792e918674e208492a077679ae6ffc88ce0c9.camel@gentoo.org [3]
* Re: LLM based rewrites
  2026-03-09 13:57 ` Steven Rostedt
@ 2026-03-09 15:31   ` H. Peter Anvin
  2026-03-09 16:16     ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2026-03-09 15:31 UTC (permalink / raw)
To: Steven Rostedt, Christian Brauner
Cc: tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

On March 9, 2026 6:57:05 AM PDT, Steven Rostedt <rostedt@goodmis.org> wrote:
>On Sat, 7 Mar 2026 21:49:20 +0100
>Christian Brauner <brauner@kernel.org> wrote:
>
>> It seems increasingly viable to rewrite an entire codebase using an LLM
>> and it currently looks like there's at least some examples as in [1]
>> where people try to use an LLM based rewrite as a clean-room
>> implementation to relicense the project. I think the FOSDEM talk at [2]
>> is related to this as well.
[...]
>Thanks for bringing this up. I'll bring this up as a topic for our next
>meeting. Although it may not be much we can do about it except be aware of
>what is happening.
>
>-- Steve

It is somewhat hard to see how that would constitute a "clean-room"
rewrite. A clean-room rewrite entails two teams, one (the "clean" room)
which must be certified to have never seen the code in question, and all
communications between the two teams must be auditable.
* Re: LLM based rewrites
  2026-03-09 15:31 ` H. Peter Anvin
@ 2026-03-09 16:16   ` Steven Rostedt
  2026-03-09 16:33     ` Jonathan Corbet
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2026-03-09 16:16 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

On Mon, 09 Mar 2026 08:31:03 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> It is somewhat hard to see how that would constitute a "clean-room"
> rewrite. A clean-room rewrite entails two teams, one (the "clean" room)
> which must be certified to have never seen the code in question, and all
> communications between the two teams must be auditable.

I was thinking the same.

-- Steve
* Re: LLM based rewrites
  2026-03-09 16:16 ` Steven Rostedt
@ 2026-03-09 16:33   ` Jonathan Corbet
  2026-03-09 16:55     ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Corbet @ 2026-03-09 16:33 UTC (permalink / raw)
To: Steven Rostedt, H. Peter Anvin
Cc: Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

Steven Rostedt <rostedt@goodmis.org> writes:

> On Mon, 09 Mar 2026 08:31:03 -0700
> "H. Peter Anvin" <hpa@zytor.com> wrote:
>
>> It is somewhat hard to see how that would constitute a "clean-room"
>> rewrite. A clean-room rewrite entails two teams, one (the "clean" room)
>> which must be certified to have never seen the code in question, and all
>> communications between the two teams must be auditable.
>
> I was thinking the same.

The argumentation that is being made (which I am trying to reproduce but
am *not* advocating) is that "a clean-room rewrite is just one means to
an end" and that, in this specific case, the code being rewritten was
explicitly excluded from the context given to the bot (though that turns
out not to entirely be the case). In theory, it only had the desired
API and a set of tests available to it.

The fact that every version of chardet was surely in its training data
is not deemed to be relevant.

jon
* Re: LLM based rewrites
  2026-03-09 16:33 ` Jonathan Corbet
@ 2026-03-09 16:55   ` H. Peter Anvin
  2026-03-09 17:09     ` H. Peter Anvin
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: H. Peter Anvin @ 2026-03-09 16:55 UTC (permalink / raw)
To: Jonathan Corbet, Steven Rostedt
Cc: Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

On March 9, 2026 9:33:12 AM PDT, Jonathan Corbet <corbet@lwn.net> wrote:
>The argumentation that is being made (which I am trying to reproduce but
>am *not* advocating) is that "a clean-room rewrite is just one means to
>an end" and that, in this specific case, the code being rewritten was
>explicitly excluded from the context given to the bot (though that turns
>out not to entirely be the case). In theory, it only had the desired
>API and a set of tests available to it.
>
>The fact that every version of chardet was surely in its training data
>is not deemed to be relevant.
>
>jon

That's a question for the lawyers and the courts, really. But it is most
definitely *not* clean room. That being said, clean room is certainly not
the only way to rewrite software that can pass legal muster, but it is
the gold standard.
* Re: LLM based rewrites
  2026-03-09 16:55 ` H. Peter Anvin
@ 2026-03-09 17:09   ` H. Peter Anvin
  2026-03-09 18:19 ` James Bottomley
  2026-03-10  4:52 ` Theodore Tso
  2 siblings, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2026-03-09 17:09 UTC (permalink / raw)
To: Jonathan Corbet, Steven Rostedt
Cc: Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

On March 9, 2026 9:55:36 AM PDT, "H. Peter Anvin" <hpa@zytor.com> wrote:
>That's a question for the lawyers and the courts, really. But it is most
>definitely *not* clean room. That being said, clean room is certainly not
>the only way to rewrite software that can pass legal muster, but it is
>the gold standard.

In the end, though, it comes down to the plain fact that LLMs have pushed
copyright law into undefined territory.

As Uber showed, sometimes the strategy of doing something that is at best
legally questionable can be successful if you can spread broadly enough,
quickly enough, that the political process overtakes the legal one.
* Re: LLM based rewrites
  2026-03-09 16:55 ` H. Peter Anvin
  2026-03-09 17:09 ` H. Peter Anvin
@ 2026-03-09 18:19 ` James Bottomley
  2026-03-09 18:34   ` Steven Rostedt
  2026-03-10  4:52 ` Theodore Tso
  2 siblings, 1 reply; 16+ messages in thread
From: James Bottomley @ 2026-03-09 18:19 UTC (permalink / raw)
To: H. Peter Anvin, Jonathan Corbet, Steven Rostedt
Cc: Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

On Mon, 2026-03-09 at 09:55 -0700, H. Peter Anvin wrote:
> That's a question for the lawyers and the courts, really. But it is
> most definitely *not* clean room. That being said, clean room is
> certainly not the only way to rewrite software that can pass legal
> muster, but it is the gold standard

Agreed.

The specific problem is that the US copyright definition of derivation
presumes that if you've had exposure to the original work then anything
you produce that's similar is a derivative. That doesn't mean you can't
produce a non-derivative similar work; it's just that the burden of
proof shifts to you to prove that, in creating the similar work, you
didn't include any elements of the original. This is a phenomenally
difficult thing to prove in court (at least for humans), which is why
clean room reverse engineering was developed: because you can
demonstrate the required non-exposure to the original by the separation
of the two teams.

I don't think LLMs will be able to come up with the necessary proof of
separation without essentially recreating the clean room process, which
grows cost prohibitive as the complexity of the work increases.

Regards,

James
* Re: LLM based rewrites
  2026-03-09 18:19 ` James Bottomley
@ 2026-03-09 18:34   ` Steven Rostedt
  2026-03-09 18:38     ` Dr. David Alan Gilbert
  2026-03-09 18:54     ` James Bottomley
  0 siblings, 2 replies; 16+ messages in thread
From: Steven Rostedt @ 2026-03-09 18:34 UTC (permalink / raw)
To: James Bottomley
Cc: H. Peter Anvin, Jonathan Corbet, Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

On Mon, 09 Mar 2026 11:19:42 -0700
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> I don't think LLMs will be able to come up with the necessary proof of
> separation without essentially recreating the clean room process, which
> grows cost prohibitive as the complexity of the work increases.

I can see one AI bot reading the original code, and then making the
prompts to pass to a second AI bot to produce the code.

That is basically exactly how clean rooms work for humans. Now the
question is, would courts agree that is a clean room?

-- Steve
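The two-bot arrangement described above can be sketched as a toy pipeline. Everything below is invented for illustration: the two stub functions stand in for actual LLM calls, and the trivial `add()` example is hypothetical. The point is only the information flow — the "dirty" stage may read the original, the "clean" stage sees nothing but the spec, and the spec itself is the auditable hand-off between the two.

```python
# Toy sketch of a two-bot "clean room" pipeline (illustration only).
# describe_api() plays the dirty-room role: it may read the original
# code, but only interface-level facts may cross the wall.
# reimplement() plays the clean-room role: it receives ONLY the spec,
# never the original source. The spec is retained as an audit artifact.

def describe_api(original_source: str) -> str:
    """Dirty room: reduce the original code to a functional spec.
    (A real system would call an LLM here; this stub just models
    what is allowed to cross the wall.)"""
    return "add(a, b): returns the arithmetic sum of two integers"

def reimplement(spec: str) -> str:
    """Clean room: produce new code from the spec alone."""
    # Sanity check: the spec must not smuggle original source text.
    assert "def " not in spec, "spec must not contain original source"
    return "def add(a, b):\n    return a + b\n"

original = "def add(x, y):\n    return x + y  # original, GPL-licensed\n"
spec = describe_api(original)   # auditable hand-off between the teams
rewrite = reimplement(spec)     # the clean team never saw `original`
```

Whether a court would accept logged prompts as the equivalent of the certified team separation James describes is exactly the open question.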
* Re: LLM based rewrites
  2026-03-09 18:34 ` Steven Rostedt
@ 2026-03-09 18:38   ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 16+ messages in thread
From: Dr. David Alan Gilbert @ 2026-03-09 18:38 UTC (permalink / raw)
To: Steven Rostedt
Cc: James Bottomley, H. Peter Anvin, Jonathan Corbet, Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Mon, 09 Mar 2026 11:19:42 -0700
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > I don't think LLMs will be able to come up with the necessary proof of
> > separation without essentially recreating the clean room process, which
> > grows cost prohibitive as the complexity of the work increases.
>
> I can see one AI bot reading the original code, and then making the
> prompts to pass to a second AI bot to produce the code.

That's what Chardet did:
  https://github.com/chardet/chardet/issues/327#issuecomment-4005195078

however, there's a fun question of whether the second AI had already
seen the code; a lot of LLMs seem to have been trained on a lot of open
source code already.

Dave

> -- Steve
--
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
 \ _________________________|_____ http://www.treblig.org |_______/
* Re: LLM based rewrites
  2026-03-09 18:34 ` Steven Rostedt
  2026-03-09 18:38 ` Dr. David Alan Gilbert
@ 2026-03-09 18:54 ` James Bottomley
  0 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2026-03-09 18:54 UTC (permalink / raw)
To: Steven Rostedt
Cc: H. Peter Anvin, Jonathan Corbet, Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

On Mon, 2026-03-09 at 14:34 -0400, Steven Rostedt wrote:
> I can see one AI bot reading the original code, and then making the
> prompts to pass to a second AI bot to produce the code.
>
> That is basically exactly how clean rooms work for humans. Now the
> question is, would courts agree that is a clean room?

Well, yes, because you can demonstrate clean room separation, which is
already legally acknowledged as independent invention. My argument isn't
that LLMs can't do this. It's that doing it is way more expensive than
simply feeding the code into an LLM and asking for independent
reinvention, which is what the guy did to chardet.

In fact I contend that reducing something to its API and then
reconstructing it is difficult to scale (and certainly is very costly in
LLM tokens), which is why we shouldn't necessarily fear it will be done
to the kernel.

Regards,

James
* Re: LLM based rewrites
  2026-03-09 16:55 ` H. Peter Anvin
  2026-03-09 17:09 ` H. Peter Anvin
  2026-03-09 18:19 ` James Bottomley
@ 2026-03-10  4:52 ` Theodore Tso
      [not found]   ` <CAMTJT3_cVaA7aJmDa6j288-qwP3jzvM_R2pdk+XmE+1U=Sovbg@mail.gmail.com>
  2 siblings, 1 reply; 16+ messages in thread
From: Theodore Tso @ 2026-03-10 4:52 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Jonathan Corbet, Steven Rostedt, Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

> >The fact that every version of chardet was surely in its training data
> >is not deemed to be relevant.
>
> That's a question for the lawyers and the courts, really. But it is
> most definitely *not* clean room. That being said, clean room is
> certainly not the only way to rewrite software that can pass legal
> muster, but it is the gold standard

Well, given that researchers were able to elicit 96% of Harry Potter and
the Sorcerer's Stone from Claude 3.7 Sonnet[1], the question I have is
this: if you have one LLM instance create a specification from looking
at the code that you are trying to clone, and then you have a second LLM
instance that was trained on the code you are trying to clone, and then
fed the specification --- regardless of whether this can be considered
"clean room" from a process perspective, the other question is whether
there is enough similarity in the actual *results* that it could also be
a problem.

[1] https://arxiv.org/html/2601.02671v1

Of course, we could imagine using the LLM to incrementally rewrite the C
code that was elicited from the specification if the results are too
close to the source program --- that is, "Hey ChatGPT, please file off
the serial number so the source code looks nothing like the GPL code
that I'm trying to rip off."

The thing is, though, this is something that humans could do as well.
It wouldn't surprise me if there are cases of "clean room
implementation" where there might be some incremental rewriting; and
proving that it wasn't a strict clean room procedure might be quite
difficult. It's just that with AI, it might be easier to do things at
scale.

- Ted
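The point above — that similarity of the *results* matters regardless of process — is at least a mechanically checkable question. The snippet below is a deliberately crude sketch (real provenance analysis works at the token or AST level and normalizes identifiers, not raw text); it only illustrates, via Python's standard difflib, that "how close did the rewrite land to the original?" can be measured. The code strings are invented examples.

```python
import difflib

def similarity(a: str, b: str) -> float:
    # Ratio of matching characters between two code bodies, 0.0..1.0.
    # Crude on purpose: serious similarity tools normalize whitespace,
    # identifiers and structure before comparing.
    return difflib.SequenceMatcher(None, a, b).ratio()

original = "def add(x, y):\n    return x + y\n"
verbatim = "def add(x, y):\n    return x + y\n"        # a straight copy
fresh    = "int add(int a, int b) { return a + b; }\n" # independent-looking

copy_score = similarity(original, verbatim)   # identical strings score 1.0
fresh_score = similarity(original, fresh)     # partial overlap scores lower
```

A high score does not prove infringement and a low one does not disprove it, but it is the kind of evidence a court comparing "actual results" would start from.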
[parent not found: <CAMTJT3_cVaA7aJmDa6j288-qwP3jzvM_R2pdk+XmE+1U=Sovbg@mail.gmail.com>]
* Re: LLM based rewrites
  [not found] ` <CAMTJT3_cVaA7aJmDa6j288-qwP3jzvM_R2pdk+XmE+1U=Sovbg@mail.gmail.com>
@ 2026-03-10 12:47   ` Theodore Tso
  2026-03-10 14:10     ` Dr. Greg
  0 siblings, 1 reply; 16+ messages in thread
From: Theodore Tso @ 2026-03-10 12:47 UTC (permalink / raw)
To: EJ Stinson
Cc: H. Peter Anvin, Jonathan Corbet, Steven Rostedt, Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

On Mon, Mar 09, 2026 at 10:15:28PM -0700, EJ Stinson wrote:
> Imagine if a rogue AI got access to rewriting the kernel, or was
> exploited; this would lead to near certain catastrophe. LLM's should
> not rewrite the code, as if somehow an AI were to achieve singularity
> or go rogue / be attacked by an anarchistic/foreign actor, think about
> the amount of code it could sneak in without human suspicion, or just
> lead to human ignorance. I think for the time being, until we know for
> certain, there should be no reason to use LLM's to help rewrite at
> scale any sort of code. Even if we were able to prove it wasn't stolen
> code, the time spent on proving such fact, and ensuring the security,
> would already take way too long to merit this sort of use.

I think you're misunderstanding the concern that was raised at FOSDEM,
which is that it is now possible for companies to take code that might
be licensed under a license such as the GPL, ask an AI to do a "clean
room rewrite", and then that code could be used or relicensed under a
more permissive license, such as Apache or BSD --- or the company might
take that code and use it in a proprietary codebase. "It's the end of
the world as we know it....."

There are a couple of problems with their premise. The first is that
they demonstrated this on some very simple bits of Javascript. It's not
clear whether this would work at *all* on something more complicated,
never mind something like the Linux kernel.

The second is the legal issues; there are multiple dimensions to whether
the resulting code really would be considered free and clear for
relicensing.

And the third is whether it would really result in more secure code
(which was their premise for why some companies might do this, since the
people giving the presentation at FOSDEM were security researchers).
Given that AI generated code is generally *more* likely to have security
vulnerabilities than human written code, this assumption seems dubious
to me. Also, if the security vulnerability is inherent in the software
architecture, having the first LLM generate a spec might result in a
*spec* which is buggy / vulnerable, and so when the second LLM
translates that spec back into C code, not only might it introduce new
security vulnerabilities, the original security vulnerability present in
the source implementation might be preserved.

The bottom line is that I rate the FOSDEM talk as being 10/10 when they
talk about the history of copyright, 9/10 when they talk about the
history of clean room reimplementation (which has been around for about
as long as humans have), 3/10 when they talk about what's possible in
the present, and 5/10 when they talk about the future --- since their
whole point was to start a conversation, and they certainly did that.

One thing we need to remember, though, is that we don't have the power
to stop people from doing this. For that matter, it could be that there
are sweatshops in some third world country where people have been
reimplementing open source code into proprietary code, and that could
have been happening for years or even decades --- if the resulting
rewrite gets used in some proprietary codebase, we'd never know about
it. The only thing AI could potentially do is to democratize this, so
that any random person with a few thousand dollars of LLM credits might
be able to attempt this.

And even if today the LLMs aren't really up to the task for non-trivial
programs, that could change over time. If that happens, though, it's not
just Open Source that is going to be affected. There are lots of people
predicting that people graduating with CS degrees are going to be left
begging in the streets, since whether we're talking about new
proprietary code or new open source code, an AI bot, perhaps with some
help from a senior developer to guide the LLM, will mean that we won't
need all that many (or perhaps *any*) junior programmers.

Is that hysteria and overblown hyperbole? Maybe. The other possibility
is that this will be the beginning of something similar to what happened
to the textile artisans who made cloth by hand in the early 1800's, when
mechanized power looms made their jobs.... obsolete. Look up "Luddite"
on Wikipedia for more details. What happened really *sucked* for the
people who made cloth the old way, but the result was the ability for
people to buy shirts for significantly less than a year's worth of wages
for the average laborer.

Will AI do to Software Engineers what the early industrial revolution in
England did to people like Ned Ludd? Who knows? But if it happens, it
isn't going to be just Open Source that will be affected. And in the
meantime, people who design clothes and fabric patterns still have jobs,
even today in the 21st century.

Cheers,

- Ted
* Re: LLM based rewrites
  2026-03-10 12:47 ` Theodore Tso
@ 2026-03-10 14:10   ` Dr. Greg
  0 siblings, 0 replies; 16+ messages in thread
From: Dr. Greg @ 2026-03-10 14:10 UTC (permalink / raw)
To: Theodore Tso
Cc: EJ Stinson, H. Peter Anvin, Jonathan Corbet, Steven Rostedt, Christian Brauner, tech-board-discuss, linux-kernel, ksummit-discuss, christianvanbrauner

On Tue, Mar 10, 2026 at 08:47:21AM -0400, Theodore Tso wrote:

Good morning, I hope the week is going well for everyone.

> Also, if the security vulnerability is inherent in the software
> architecture, having the first LLM generate a spec might result in a
> *spec* which is buggy / vulnerable, and so when the second LLM
> translates that spec back into C code, not only might it introduce new
> security vulnerabilities, the original security vulnerability present
> in the source implementation might be preserved.

It would seem that if some enterprising individual, or more likely a
major technology company with sufficient resources, told an LLM to
simply convert the entire kernel to Rust, that would be the end of
kernel security vulnerabilities as we know it, not?

Then, if said enterprising individual or corporation slapped the GPL on
the result and pushed it to GitHub, mankind would be saved as we know
it.

In the spirit of Christian's intention to inspire conversation... :-)

> Cheers,
>
> - Ted

Have a good remainder of the week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
https://github.com/Quixote-Project
* Re: LLM based rewrites
  2026-03-07 20:49 LLM based rewrites Christian Brauner
  2026-03-09 13:57 ` Steven Rostedt
@ 2026-03-09 16:05 ` Dave Hansen
  2026-03-09 16:16 ` James Bottomley
  2 siblings, 0 replies; 16+ messages in thread
From: Dave Hansen @ 2026-03-09 16:05 UTC (permalink / raw)
To: Christian Brauner, tech-board-discuss
Cc: linux-kernel, ksummit-discuss, christianvanbrauner

On 3/7/26 12:49, Christian Brauner wrote:
> I'm not asking for a legal analysis. I'm mostly looking for reassurance
> that we as the kernel community and our representatives have an eye on
> this. I find this quite worrisome.

Let's say someone did this for the kernel and released it under a more
permissive license. Let's also ignore whether this is legally or morally
naughty or nice for the moment.

Any way you slice it, they'd start with a gigantic Linux-like code base
and effectively zero people to work on it. It would _effectively_ be a
big kernel fork.

Maybe it would be different because it's got a more permissive license.
Linus would be proved wrong after all these years and contributors would
flock to the new codebase, throwing off the chains of the evil GPLv2
that kept Linux from being successful.

Mainline has survived quite a few kernel forks. There would have to be
something pretty darn compelling about this new fork. Considering that
license taste is about as universal as vi/emacs taste, I doubt the
license itself would be compelling on its own. Maybe the new kernel
would be 100% rust? Maybe it would be more friendly to LLM maintenance?
Either way, Magic 8 Ball says: "Concentrate and ask again".

So this does sound like a great TAB topic! ;)
* Re: LLM based rewrites
  2026-03-07 20:49 LLM based rewrites Christian Brauner
  2026-03-09 13:57 ` Steven Rostedt
  2026-03-09 16:05 ` Dave Hansen
@ 2026-03-09 16:16 ` James Bottomley
  2 siblings, 0 replies; 16+ messages in thread
From: James Bottomley @ 2026-03-09 16:16 UTC (permalink / raw)
To: Christian Brauner, tech-board-discuss
Cc: linux-kernel, ksummit-discuss, christianvanbrauner

On Sat, 2026-03-07 at 21:49 +0100, Christian Brauner wrote:
[...]
> Fwiw, I was made aware that there's a tangentially related discussion
> on the distribution mailing list at [3].

The lawyers on the European Legal Network are already debating this
point. However, I really doubt that feeding the code base into an LLM
and saying "rewrite it" will pass muster, at least under the US legal
tests for originality: it's the same reason why you can't give an
engineering team the code base and say "rewrite it". You have to apply
the clean room principles of one team reducing the code base to an API
reference and a separate team reimplementing it as a new code base to
get the required separation from being a derived work.

My point is not the legal one, but the technical one that the clean room
reduction to an API and back to code can be done by independent LLMs,
but likely not on the scale that something as big as the kernel would
require. So it could work for small code bases but would be
prohibitively costly for huge ones.

Regards,

James
Thread overview: 16+ messages
2026-03-07 20:49 LLM based rewrites Christian Brauner
2026-03-09 13:57 ` Steven Rostedt
2026-03-09 15:31 ` H. Peter Anvin
2026-03-09 16:16 ` Steven Rostedt
2026-03-09 16:33 ` Jonathan Corbet
2026-03-09 16:55 ` H. Peter Anvin
2026-03-09 17:09 ` H. Peter Anvin
2026-03-09 18:19 ` James Bottomley
2026-03-09 18:34 ` Steven Rostedt
2026-03-09 18:38 ` Dr. David Alan Gilbert
2026-03-09 18:54 ` James Bottomley
2026-03-10 4:52 ` Theodore Tso
[not found] ` <CAMTJT3_cVaA7aJmDa6j288-qwP3jzvM_R2pdk+XmE+1U=Sovbg@mail.gmail.com>
2026-03-10 12:47 ` Theodore Tso
2026-03-10 14:10 ` Dr. Greg
2026-03-09 16:05 ` Dave Hansen
2026-03-09 16:16 ` James Bottomley