Re: LLM based rewrites - Theodore Tso

From: "Theodore Tso" <tytso@mit.edu>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jonathan Corbet <corbet@lwn.net>,
	Steven Rostedt <rostedt@goodmis.org>,
	Christian Brauner <brauner@kernel.org>,
	tech-board-discuss@lists.linux.dev, linux-kernel@vger.kernel.org,
	ksummit-discuss@lists.linuxfoundation.org,
	christianvanbrauner@gmail.com
Subject: Re: LLM based rewrites
Date: Tue, 10 Mar 2026 00:52:10 -0400
Message-ID: <20260310045210.GA14867@macsyma-wired.lan>
In-Reply-To: <04B897EF-DEEC-42D0-8E00-888CEEA5318E@zytor.com>

> >The fact that every version of chardet was surely in its training data
> >is not deemed to be relevant.
>
> That's a question for the lawyers and the courts, really. But it is
> most definitely *not* clean room. That being said, clean room is
> certainly not the only way to rewrite software that can pass legal
> muster, but it is the gold standard.

Well, given that researchers were able to elicit 96% of Harry Potter
and the Sorcerer's Stone from Claude 3.7 Sonnet[1], here is the
question I have.  Suppose you have one LLM instance create a
specification by looking at the code that you are trying to clone, and
then you feed that specification to a second LLM instance that was
trained on the code you are trying to clone.  Regardless of whether
this can be considered "clean room" from a process perspective, the
other question is whether there is enough similarity in the actual
*results* that it could also be a problem.

[1] https://arxiv.org/html/2601.02671v1

Of course, we could imagine using the LLM to incrementally rewrite the
C code that was elicited from the specification if the results are too
close to the source program; that is, "Hey ChatGPT, please file
off the serial numbers so the source code looks nothing like the GPL
code that I'm trying to rip off."

The thing is, though, this is something that humans could do as well.
It wouldn't surprise me if there are cases of "clean room
implementation" where there was some incremental rewriting, and
proving that it wasn't a strict clean room procedure might be quite
difficult.  It's just that with AI, it might be easier to do things at
scale.

						- Ted

Thread overview: 16+ messages
2026-03-07 20:49 LLM based rewrites Christian Brauner
2026-03-09 13:57 ` Steven Rostedt
2026-03-09 15:31   ` H. Peter Anvin
2026-03-09 16:16     ` Steven Rostedt
2026-03-09 16:33       ` Jonathan Corbet
2026-03-09 16:55         ` H. Peter Anvin
2026-03-09 17:09           ` H. Peter Anvin
2026-03-09 18:19           ` James Bottomley
2026-03-09 18:34             ` Steven Rostedt
2026-03-09 18:38               ` Dr. David Alan Gilbert
2026-03-09 18:54               ` James Bottomley
2026-03-10  4:52           ` Theodore Tso [this message]
     [not found]             ` <CAMTJT3_cVaA7aJmDa6j288-qwP3jzvM_R2pdk+XmE+1U=Sovbg@mail.gmail.com>
2026-03-10 12:47               ` Theodore Tso
2026-03-10 14:10                 ` Dr. Greg
2026-03-09 16:05 ` Dave Hansen
2026-03-09 16:16 ` James Bottomley
