Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators


From: "Alex Bennée" <alex.bennee@linaro.org>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Alexander Graf" <agraf@csgraf.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Phil Mathieu-Daudé" <philmd@linaro.org>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>, "Kevin Wolf" <kwolf@redhat.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>,
	"Mark Cave-Ayland" <mark.cave-ayland@ilande.co.uk>,
	"Peter Maydell" <peter.maydell@linaro.org>
Subject: Re: [PATCH 2/2] docs: define policy forbidding use of "AI" / LLM code generators
Date: Fri, 24 Nov 2023 10:21:17 +0000
Message-ID: <87plzzcuzm.fsf@draig.linaro.org>
In-Reply-To: <ZWBngLoa3ERuMxGJ@redhat.com> ("Daniel P. Berrangé"'s message of "Fri, 24 Nov 2023 09:06:29 +0000")

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Nov 23, 2023 at 05:39:18PM -0500, Michael S. Tsirkin wrote:
>> On Thu, Nov 23, 2023 at 05:58:45PM +0000, Daniel P. Berrangé wrote:
>> > The license of a code generation tool itself is usually considered
>> > to be not a factor in the license of its output.
>> 
>> Really? I would find it very surprising if a code generation tool that
>> is not a language model and so is not understanding the code it's
>> generating did not include some code snippets going into the output.
>> It is also possible to unintentionally run afoul of GPL's definition of source
>> code which is "the preferred form of the work for making modifications to it". 
>> So even if you have copyright to input, dumping just output and putting
>> GPL on it might or might not be ok.
>
> Consider the C pre-processor. This takes an input .c file, and expands
> all the macros, to spit out a new .c file.
>
> The license of the output .c file is determined by the license of the
> input .c file. The license of the CPP impl (whether OSS or proprietary)
> doesn't have any influence on the license of the output file, it cannot
> magically force the output file to be proprietary any more than it can
> force the output file to be GPL.

LLMs are just a tool like a compiler (albeit with spookier, different
internals). The prompt and the instructions are arguably the more
important part of getting good results from the LLM transformation.
In fact, most of my use of them has been pasting in some existing code
and asking for a review or transformation of it.

However, I totally get that with the various online LLMs you have very
little transparency about what has gone into their training, and therefore
there is a danger of proprietary code being hallucinated out of their
matrices. Conversely, what if I use an LLM like OpenLLaMA:

  https://github.com/openlm-research/open_llama

There I have fairly exhaustive definitions of what went into the training
data, of which the most interesting part is probably the StarCoder
dataset (paper):

  https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view

where there are tools to detect if generated code has been lifted
directly from the dataset or is indeed a transformation.
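
To make the idea of such a check concrete, here is a rough sketch of how
verbatim lifting could be flagged by comparing token n-grams of the
generated code against the training corpus. This is only an illustration
and not the actual StarCoder tooling: the function names, the whitespace
tokenisation and the window size n=6 are all assumptions of mine.

```python
def token_ngrams(code, n):
    """Return the set of n-token windows in `code` (naive whitespace tokens)."""
    tokens = code.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(generated, corpus, n=6):
    """Fraction of n-grams in `generated` that also occur in some corpus file.

    Hypothetical helper: a real tool would index the dataset once rather
    than rescanning it per query, and would normalise the code first.
    """
    gen = token_ngrams(generated, n)
    if not gen:
        return 0.0
    seen = set()
    for doc in corpus:
        seen |= token_ngrams(doc, n)
    return len(gen & seen) / len(gen)


# Toy stand-in for the training data (assumed, one "file" only).
corpus = ["static int add(int a, int b) { return a + b; }"]
copied = "static int add(int a, int b) { return a + b; }"
reworked = "static long sum(long x, long y) { long r = x + y; return r; }"

print(verbatim_overlap(copied, corpus))    # 1.0: every window is in the corpus
print(verbatim_overlap(reworked, corpus))  # 0.0: no 6-token window matches
```

A high score only says the text matches the dataset closely; it says
nothing by itself about the licence of the matched file, which would
still need looking up.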


>
> With regards,
> Daniel

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
