linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
@ 2025-07-24 17:54 linux
  2025-07-24 19:07 ` Konstantin Ryabitsev
  0 siblings, 1 reply; 20+ messages in thread
From: linux @ 2025-07-24 17:54 UTC
  To: corbet, workflows, kees, josh, konstantin
  Cc: linux-doc, linux-kernel, Dr. David Alan Gilbert

From: "Dr. David Alan Gilbert" <linux@treblig.org>

It seems right to require that code which is automatically
generated is disclosed in the commit message.

This is a starting point.  It's purposely agnostic about
whether using any such tools is a good idea or not, and is also
agnostic about trying to draw any hard line about when a tool
should be disclosed like this.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
---
This spun out of a Fediverse discussion; those involved are cc'd.

 Documentation/process/submitting-patches.rst | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/Documentation/process/submitting-patches.rst b/Documentation/process/submitting-patches.rst
index cede4e7b29af..d7c8f47a4632 100644
--- a/Documentation/process/submitting-patches.rst
+++ b/Documentation/process/submitting-patches.rst
@@ -452,6 +452,18 @@ development. SoB chains should reflect the **real** route a patch took
 as it was propagated to the maintainers and ultimately to Linus, with
 the first SoB entry signalling primary authorship of a single author.
 
+Disclosing tool generated code
+------------------------------
+
+When a substantial part of the patch (code or text) has been generated by
+some automated system, such as an AI/LLM, or automated code patcher
+(e.g. Coccinelle) the use shall be disclosed by::
+
+  Generated-by: Example Tool 2.3
+
+Where possible, the input text or prompt should be included in the
+commit message to enable others to learn techniques that work well.
+
 
 When to use Acked-by:, Cc:, and Co-developed-by:
 ------------------------------------------------
-- 
2.50.1



* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-24 17:54 [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag linux
@ 2025-07-24 19:07 ` Konstantin Ryabitsev
  2025-07-24 20:45   ` Kees Cook
  0 siblings, 1 reply; 20+ messages in thread
From: Konstantin Ryabitsev @ 2025-07-24 19:07 UTC
  To: linux; +Cc: corbet, workflows, kees, josh, linux-doc, linux-kernel

On Thu, Jul 24, 2025 at 06:54:39PM +0100, linux@treblig.org wrote:
> From: "Dr. David Alan Gilbert" <linux@treblig.org>
> 
> It seems right to require that code which is automatically
> generated is disclosed in the commit message.

I'm not sure that's the case. There is a lot of automatically generated
content being added to the kernel all the time -- such as auto-formatted code,
documentation, and unit tests generated by non-AI tooling. We've not required
indicating this usage before, so I'm not sure it makes sense to start doing it
now.

Furthermore, merely indicating the tool doesn't really say anything about how
it was used (e.g. what version, what prompt, what context, etc.) If anything,
this information needs to live in the cover letter of the submission. I would
suggest we investigate encouraging contributors to disclose this there, e.g.:

| ---
| This patch series was partially generated with "InsensitiveClod o4 Hokus"
| and then heavily modified to remove the parts where it went completely off
| the deep end.

I am also not opposed to having a more standard cover letter footer that would
allow an easier way to query this information via public-inbox services, e.g.:

| generated-by: insensitive clod o4 hokus
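
As a rough illustration of how such a footer could be queried, here is a
minimal Python sketch. The footer format is only the one shown in the example
above, nothing standardized, and the names FOOTER_RE and find_generated_by are
hypothetical helpers, not part of public-inbox or any kernel tooling:

```python
import re

# Assumed footer shape: "generated-by: <free text>" on its own line,
# matched case-insensitively. A leading "| " quote marker is tolerated.
FOOTER_RE = re.compile(
    r"^[ \t]*(?:\|[ \t]*)?generated-by:[ \t]*(.+?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_generated_by(message):
    """Return every generated-by value found in a message body."""
    return FOOTER_RE.findall(message)


if __name__ == "__main__":
    cover_letter = (
        "This series reworks the frobnicator.\n"
        "\n"
        "| generated-by: insensitive clod o4 hokus\n"
    )
    print(find_generated_by(cover_letter))
```

A real query would run against the archive's own search interface; this only
shows that a fixed footer keyword makes the text trivially machine-findable.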

However, I don't really think this belongs in the commit trailers.

-K


* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-24 19:07 ` Konstantin Ryabitsev
@ 2025-07-24 20:45   ` Kees Cook
  2025-07-24 21:06     ` Laurent Pinchart
  2025-07-24 21:12     ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 20+ messages in thread
From: Kees Cook @ 2025-07-24 20:45 UTC
  To: Konstantin Ryabitsev
  Cc: linux, corbet, workflows, josh, linux-doc, linux-kernel

On Thu, Jul 24, 2025 at 03:07:17PM -0400, Konstantin Ryabitsev wrote:
> On Thu, Jul 24, 2025 at 06:54:39PM +0100, linux@treblig.org wrote:
> > From: "Dr. David Alan Gilbert" <linux@treblig.org>
> > 
> > It seems right to require that code which is automatically
> > generated is disclosed in the commit message.
> 
> I'm not sure that's the case. There is a lot of automatically generated
> content being added to the kernel all the time -- such as auto-formatted code,
> documentation, and unit tests generated by non-AI tooling. We've not required
> indicating this usage before, so I'm not sure it makes sense to start doing it
> now.
> 
> Furthermore, merely indicating the tool doesn't really say anything about how
> it was used (e.g. what version, what prompt, what context, etc.) If anything,
> this information needs to live in the cover letter of the submission. I would
> suggest we investigate encouraging contributors to disclose this there, e.g.:
> 
> | ---
> | This patch series was partially generated with "InsensitiveClod o4 Hokus"
> | and then heavily modified to remove the parts where it went completely off
> | the deep end.
> 
> I am also not opposed to having a more standard cover letter footer that would
> allow an easier way to query this information via public-inbox services, e.g.:
> 
> | generated-by: insensitive clod o4 hokus
> 
> However, I don't really think this belongs in the commit trailers.

I agree; I'm not sure I see a benefit in creating a regularized trailer
for this. What automation/tracking is going to key off of it? It's
a detail of patch creation methodology, so the commentary about how
something was created is best put in the prose areas, like we already
do for Coccinelle or other scripts. It's a bit buried in the Researcher
Guidelines[1], but we have explicitly asked for details about tooling:

  When sending patches produced from research, the commit logs should
  contain at least the following details, so that developers have
  appropriate context for understanding the contribution.
  ...
  Specifically include details about any testing, static or dynamic
  analysis programs, and any other tools or methods used to perform the
  work.

Maybe that needs to be repeated in SubmittingPatches?

-Kees

[1] https://docs.kernel.org/process/researcher-guidelines.html

-- 
Kees Cook


* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-24 20:45   ` Kees Cook
@ 2025-07-24 21:06     ` Laurent Pinchart
  2025-07-24 21:12     ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 20+ messages in thread
From: Laurent Pinchart @ 2025-07-24 21:06 UTC
  To: Kees Cook
  Cc: Konstantin Ryabitsev, linux, corbet, workflows, josh, linux-doc,
	linux-kernel

On Thu, Jul 24, 2025 at 01:45:35PM -0700, Kees Cook wrote:
> On Thu, Jul 24, 2025 at 03:07:17PM -0400, Konstantin Ryabitsev wrote:
> > On Thu, Jul 24, 2025 at 06:54:39PM +0100, linux@treblig.org wrote:
> > > From: "Dr. David Alan Gilbert" <linux@treblig.org>
> > > 
> > > It seems right to require that code which is automatically
> > > generated is disclosed in the commit message.
> > 
> > I'm not sure that's the case. There is a lot of automatically generated
> > content being added to the kernel all the time -- such as auto-formatted code,
> > documentation, and unit tests generated by non-AI tooling. We've not required
> > indicating this usage before, so I'm not sure it makes sense to start doing it
> > now.
> > 
> > Furthermore, merely indicating the tool doesn't really say anything about how
> > it was used (e.g. what version, what prompt, what context, etc.) If anything,
> > this information needs to live in the cover letter of the submission. I would
> > suggest we investigate encouraging contributors to disclose this there, e.g.:
> > 
> > | ---
> > | This patch series was partially generated with "InsensitiveClod o4 Hokus"
> > | and then heavily modified to remove the parts where it went completely off
> > | the deep end.
> > 
> > I am also not opposed to having a more standard cover letter footer that would
> > allow an easier way to query this information via public-inbox services, e.g.:
> > 
> > | generated-by: insensitive clod o4 hokus
> > 
> > However, I don't really think this belongs in the commit trailers.

I think there's often value in having the information in individual
patches instead of (or in addition to) the cover letter though, as it's
common for different patches in a series to be generated differently.
Standardizing on one option or the other may be overkill at this point
though. Especially when it comes to code generated by LLMs, how (and if)
to report that information should be governed by the issues we want to
address, and I don't think there's a consensus on those yet.

One issue that is often mentioned is copyright infringement. We go to
great lengths today to ensure that code is fit for inclusion in the
kernel from a legal point of view with the certificate of origin and the
SoB line. It would seem to make sense to then also report if code was
generated by an LLM per-commit if we want to extend the copyright paper
trail (for whatever purpose it will be used later).

> I agree; I'm not sure I see a benefit in creating a regularized trailer
> for this. What automation/tracking is going to key off of it?

We may find/invent use cases for automation later, in which case we can
revisit usage of a standardized trailer. I however see an important
manual use case for the information already: knowing how a patch was
created helps reviewers. If I'm told a patch was generated by coccinelle
(especially if the semantic patch is included in the commit message
too), I will pay attention to different types of mistakes than for a
manually written patch.

> It's
> a detail of patch creation methodology, so the commentary about how
> something was created is best put in the prose areas, like we already
> do for Coccinelle or other scripts. It's a bit buried in the Researcher
> Guidelines[1], but we have explicitly asked for details about tooling:
> 
>   When sending patches produced from research, the commit logs should
>   contain at least the following details, so that developers have
>   appropriate context for understanding the contribution.
>   ...
>   Specifically include details about any testing, static or dynamic
>   analysis programs, and any other tools or methods used to perform the
>   work.
> 
> Maybe that needs to be repeated in SubmittingPatches?
> 
> -Kees
> 
> [1] https://docs.kernel.org/process/researcher-guidelines.html

-- 
Regards,

Laurent Pinchart


* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-24 20:45   ` Kees Cook
  2025-07-24 21:06     ` Laurent Pinchart
@ 2025-07-24 21:12     ` Dr. David Alan Gilbert
  2025-07-24 21:20       ` Kees Cook
  1 sibling, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2025-07-24 21:12 UTC
  To: Kees Cook
  Cc: Konstantin Ryabitsev, corbet, workflows, josh, linux-doc,
	linux-kernel

* Kees Cook (kees@kernel.org) wrote:
> On Thu, Jul 24, 2025 at 03:07:17PM -0400, Konstantin Ryabitsev wrote:
> > On Thu, Jul 24, 2025 at 06:54:39PM +0100, linux@treblig.org wrote:
> > > From: "Dr. David Alan Gilbert" <linux@treblig.org>
> > > 
> > > It seems right to require that code which is automatically
> > > generated is disclosed in the commit message.
> > 
> > I'm not sure that's the case. There is a lot of automatically generated
> > content being added to the kernel all the time -- such as auto-formatted code,
> > documentation, and unit tests generated by non-AI tooling. We've not required
> > indicating this usage before, so I'm not sure it makes sense to start doing it
> > now.
> > 
> > Furthermore, merely indicating the tool doesn't really say anything about how
> > it was used (e.g. what version, what prompt, what context, etc.) If anything,
> > this information needs to live in the cover letter of the submission. I would
> > suggest we investigate encouraging contributors to disclose this there, e.g.:
> > 
> > | ---
> > | This patch series was partially generated with "InsensitiveClod o4 Hokus"
> > | and then heavily modified to remove the parts where it went completely off
> > | the deep end.
> > 
> > I am also not opposed to having a more standard cover letter footer that would
> > allow an easier way to query this information via public-inbox services, e.g.:
> > 
> > | generated-by: insensitive clod o4 hokus
> > 
> > However, I don't really think this belongs in the commit trailers.
> 
> I agree; I'm not sure I see a benefit in creating a regularized trailer
> for this. What automation/tracking is going to key off of it? It's
> a detail of patch creation methodology,

My logic here is something like:
   a) Some people worry about various issues on AI such as copyright;
so it feels like it should be trackable.
   b) The teams that develop tools that work well deserve credit, so
formalising it seems to make that easier to see; be they AI etc.
   c) There's a general worry about people sending patches without
acknowledging their use of AI, and then not (carefully) checking
the output.  Calling out the need to record it might help get
people to at least acknowledge it.
   d) (a) and (c) are really only about AI, but our previous chat
was wondering if all tools needed it, but calling out anything where
it's code generation seemed to be a reasonable line to me.
   e) If one tool tended to be particularly bad at missing one type
of check, with a tag you could track down what we have from it.
   f) Related to (a), some large open source projects are explicitly
disallowing AI generated contributions; life will get messy for them
if people import kernel code with a compatible license that was
generated by AI.

(I didn't really want to get into the question of whether the use of
AI was good or bad; but people worrying about it isn't unreasonable)

> so the commentary about how
> something was created is best put in the prose areas, like we already
> do for Coccinelle or other scripts. It's a bit buried in the Researcher
> Guidelines[1], but we have explicitly asked for details about tooling:
> 
>   When sending patches produced from research, the commit logs should
>   contain at least the following details, so that developers have
>   appropriate context for understanding the contribution.
>   ...
>   Specifically include details about any testing, static or dynamic
>   analysis programs, and any other tools or methods used to perform the
>   work.
> 
> Maybe that needs to be repeated in SubmittingPatches?

'produced from research' is narrowing things down a bit too much I think
when it's people using the tools as their normal way of working.

Dave

> -Kees
> 
> [1] https://docs.kernel.org/process/researcher-guidelines.html
> 
> -- 
> Kees Cook
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-24 21:12     ` Dr. David Alan Gilbert
@ 2025-07-24 21:20       ` Kees Cook
  2025-07-24 23:45         ` Steven Rostedt
  0 siblings, 1 reply; 20+ messages in thread
From: Kees Cook @ 2025-07-24 21:20 UTC
  To: Dr. David Alan Gilbert
  Cc: Konstantin Ryabitsev, corbet, workflows, josh, linux-doc,
	linux-kernel

On Thu, Jul 24, 2025 at 09:12:30PM +0000, Dr. David Alan Gilbert wrote:
> * Kees Cook (kees@kernel.org) wrote:
> > [...]
> > do for Coccinelle or other scripts. It's a bit buried in the Researcher
> > Guidelines[1], but we have explicitly asked for details about tooling:
> > 
> >   When sending patches produced from research, the commit logs should
> >   contain at least the following details, so that developers have
> >   appropriate context for understanding the contribution.
> >   ...
> >   Specifically include details about any testing, static or dynamic
> >   analysis programs, and any other tools or methods used to perform the
> >   work.
> > 
> > Maybe that needs to be repeated in SubmittingPatches?
> 
> 'produced from research' is narrowing things down a bit too much I think
> when it's people using the tools as their normal way of working.

Right -- as currently written we have the explicit guideline for
"produced from research" and kind of an unwritten rule to detail any
complex tools involved for regular development (e.g. Coccinelle,
syzkaller, etc). We could generalize the existing statement and repeat
it in a better location?

-- 
Kees Cook


* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-24 21:20       ` Kees Cook
@ 2025-07-24 23:45         ` Steven Rostedt
  2025-07-24 23:54           ` Kees Cook
  0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2025-07-24 23:45 UTC
  To: Kees Cook
  Cc: Dr. David Alan Gilbert, Konstantin Ryabitsev, corbet, workflows,
	josh, linux-doc, linux-kernel

On Thu, 24 Jul 2025 14:20:03 -0700
Kees Cook <kees@kernel.org> wrote:

> On Thu, Jul 24, 2025 at 09:12:30PM +0000, Dr. David Alan Gilbert wrote:
> > * Kees Cook (kees@kernel.org) wrote:  
> > > [...]
> > > do for Coccinelle or other scripts. It's a bit buried in the Researcher
> > > Guidelines[1], but we have explicitly asked for details about tooling:
> > > 
> > >   When sending patches produced from research, the commit logs should
> > >   contain at least the following details, so that developers have
> > >   appropriate context for understanding the contribution.
> > >   ...
> > >   Specifically include details about any testing, static or dynamic
> > >   analysis programs, and any other tools or methods used to perform the
> > >   work.
> > > 
> > > Maybe that needs to be repeated in SubmittingPatches?  
> > 
> > 'produced from research' is narrowing things down a bit too much I think
> > when it's people using the tools as their normal way of working.  

So I did bring this up in the last TAB meeting. I brought it up because I
found out from reading an LWN[1] article that I received a patch fully
written by AI without knowing that it was written with AI. If I had known,
I would have examined the patch a little more thoroughly, and would have
discovered a very minor mistake in the patch.

> 
> Right -- as currently written we have the explicit guideline for
> "produced from research" and kind of an unwritten rule to detail any
> complex tools involved for regular development (e.g. Coccinelle,
> syzkaller, etc). We could generalize the existing statement and repeat
> it in a better location?

When a patch is generated by Coccinelle, checkpatch or any other tool, it
should most definitely be mentioned in the change log.

I strongly believe the same goes for AI. Now the argument is where do we
draw the line? If you are using AI that helps write your code, do you need
to disclose it every time?

My thought is to treat AI as another developer. If a developer helps you
like the AI is helping you, would you give that developer credit for that
work? If so, then you should also give credit to the tooling that's helping
you.

I suggested adding a new tag to note any tool that has done non-trivial
work to produce the patch where you give it credit if it has helped you as
much as another developer that you would give credit to.

-- Steve


[1] https://lwn.net/Articles/1026558/


* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-24 23:45         ` Steven Rostedt
@ 2025-07-24 23:54           ` Kees Cook
  2025-07-25  0:55             ` Dr. David Alan Gilbert
  2025-07-25  1:06             ` Sasha Levin
  0 siblings, 2 replies; 20+ messages in thread
From: Kees Cook @ 2025-07-24 23:54 UTC
  To: Steven Rostedt
  Cc: Dr. David Alan Gilbert, Konstantin Ryabitsev, corbet, workflows,
	josh, linux-doc, linux-kernel

On Thu, Jul 24, 2025 at 07:45:56PM -0400, Steven Rostedt wrote:
> My thought is to treat AI as another developer. If a developer helps you
> like the AI is helping you, would you give that developer credit for that
> work? If so, then you should also give credit to the tooling that's helping
> you.
> 
> I suggested adding a new tag to note any tool that has done non-trivial
> work to produce the patch where you give it credit if it has helped you as
> much as another developer that you would give credit to.

We've got tags to choose from already in that case:

Suggested-by: LLM

or

Co-developed-by: LLM <not@human.with.legal.standing>
Signed-off-by: LLM <not@human.with.legal.standing>

The latter seems ... not good, as it implies DCO SoB from a thing that
can't and hasn't acknowledged the DCO.

-- 
Kees Cook


* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-24 23:54           ` Kees Cook
@ 2025-07-25  0:55             ` Dr. David Alan Gilbert
  2025-07-25  1:06             ` Sasha Levin
  1 sibling, 0 replies; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2025-07-25  0:55 UTC
  To: Kees Cook
  Cc: Steven Rostedt, Konstantin Ryabitsev, corbet, workflows, josh,
	linux-doc, linux-kernel

* Kees Cook (kees@kernel.org) wrote:
> On Thu, Jul 24, 2025 at 07:45:56PM -0400, Steven Rostedt wrote:
> > My thought is to treat AI as another developer. If a developer helps you
> > like the AI is helping you, would you give that developer credit for that
> > work? If so, then you should also give credit to the tooling that's helping
> > you.
> > 
> > I suggested adding a new tag to note any tool that has done non-trivial
> > work to produce the patch where you give it credit if it has helped you as
> > much as another developer that you would give credit to.
> 
> We've got tags to choose from already in that case:
> 
> Suggested-by: LLM

For me, 'Suggested-by:' seems fine for where an LLM has
responded to a 'suggest improvements to this function'.

> or
> 
> Co-developed-by: LLM <not@human.with.legal.standing>
> Signed-off-by: LLM <not@human.with.legal.standing>
> 
> The latter seems ... not good, as it implies DCO SoB from a thing that
> can't and hasn't acknowledged the DCO.

Yeah, the Co-developed-by: isn't terrible, but for both that and the
Suggested-by:, is there a standard for how you would refer to the tool?
IMHO it should not have an email address there; otherwise it'll confuse tools
into cc'ing them.

Dave

> 
> -- 
> Kees Cook
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-24 23:54           ` Kees Cook
  2025-07-25  0:55             ` Dr. David Alan Gilbert
@ 2025-07-25  1:06             ` Sasha Levin
  2025-07-25  1:20               ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 20+ messages in thread
From: Sasha Levin @ 2025-07-25  1:06 UTC
  To: Kees Cook
  Cc: Steven Rostedt, Dr. David Alan Gilbert, Konstantin Ryabitsev,
	corbet, workflows, josh, linux-doc, linux-kernel

On Thu, Jul 24, 2025 at 04:54:11PM -0700, Kees Cook wrote:
>On Thu, Jul 24, 2025 at 07:45:56PM -0400, Steven Rostedt wrote:
>> My thought is to treat AI as another developer. If a developer helps you
>> like the AI is helping you, would you give that developer credit for that
>> work? If so, then you should also give credit to the tooling that's helping
>> you.
>>
>> I suggested adding a new tag to note any tool that has done non-trivial
>> work to produce the patch where you give it credit if it has helped you as
>> much as another developer that you would give credit to.
>
>We've got tags to choose from already in that case:
>
>Suggested-by: LLM
>
>or
>
>Co-developed-by: LLM <not@human.with.legal.standing>
>Signed-off-by: LLM <not@human.with.legal.standing>
>
>The latter seems ... not good, as it implies DCO SoB from a thing that
>can't and hasn't acknowledged the DCO.

In my mind, "any tool" would also be something like gcc giving you a
"non-trivial" error (think something like a buffer overflow warning that
could have been a security issue).

In that case, should we encode the entire toolchain used for developing
a patch?

Maybe...

Some sort of semi-standardized shorthand notation of the tooling used to
develop a patch could be interesting not just for plain disclosure, but
also to be able to trace back issues with patches ("oh! the author
didn't see a warning because they use gcc 13 while the warning was added
in gcc 14!").

Signed-off-by: John Doe <jd@example.com> # gcc:14.1;ccache:1.2;sparse:4.7;claude-code:0.5

This way some of it could be automated via git hooks and we can recommend
a relevant string to add with checkpatch.
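
To make the idea concrete, here is a minimal Python sketch of composing and
parsing such a suffix. It assumes only the shape of the example line above
(`tool:version` pairs joined by `;` after a `#`); the function names
format_tooling and parse_tooling are hypothetical, and nothing like this exists
in checkpatch or git today as far as this sketch is concerned:

```python
def format_tooling(tools):
    """Render a {"gcc": "14.1"} style mapping as 'gcc:14.1;...' (order kept)."""
    return ";".join(f"{name}:{version}" for name, version in tools.items())


def parse_tooling(trailer_line):
    """Extract the tool/version mapping from a trailer line's '# ...' suffix."""
    _, sep, suffix = trailer_line.partition("#")
    if not sep:
        return {}  # no '#' suffix on this trailer
    tools = {}
    for item in suffix.strip().split(";"):
        name, colon, version = item.partition(":")
        if colon and name.strip():
            tools[name.strip()] = version.strip()
    return tools


if __name__ == "__main__":
    line = "Signed-off-by: John Doe <jd@example.com> # " + format_tooling(
        {"gcc": "14.1", "ccache": "1.2", "sparse": "4.7", "claude-code": "0.5"}
    )
    print(line)
    print(parse_tooling(line))
```

A git hook could call something like format_tooling with versions it detects
locally; how those versions are detected is left open here.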

-- 
Thanks,
Sasha


* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-25  1:06             ` Sasha Levin
@ 2025-07-25  1:20               ` Dr. David Alan Gilbert
  2025-07-25  1:52                 ` Sasha Levin
  0 siblings, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2025-07-25  1:20 UTC
  To: Sasha Levin
  Cc: Kees Cook, Steven Rostedt, Konstantin Ryabitsev, corbet,
	workflows, josh, linux-doc, linux-kernel

* Sasha Levin (sashal@kernel.org) wrote:
> On Thu, Jul 24, 2025 at 04:54:11PM -0700, Kees Cook wrote:
> > On Thu, Jul 24, 2025 at 07:45:56PM -0400, Steven Rostedt wrote:
> > > My thought is to treat AI as another developer. If a developer helps you
> > > like the AI is helping you, would you give that developer credit for that
> > > work? If so, then you should also give credit to the tooling that's helping
> > > you.
> > > 
> > > I suggested adding a new tag to note any tool that has done non-trivial
> > > work to produce the patch where you give it credit if it has helped you as
> > > much as another developer that you would give credit to.
> > 
> > We've got tags to choose from already in that case:
> > 
> > Suggested-by: LLM
> > 
> > or
> > 
> > Co-developed-by: LLM <not@human.with.legal.standing>
> > Signed-off-by: LLM <not@human.with.legal.standing>
> > 
> > The latter seems ... not good, as it implies DCO SoB from a thing that
> > can't and hasn't acknowledged the DCO.
> 
> In my mind, "any tool" would also be something like gcc giving you a
> "non-trivial" error (think something like a buffer overflow warning that
> could have been a security issue).
> 
> In that case, should we encode the entire toolchain used for developing
> a patch?
> 
> Maybe...
> 
> Some sort of semi-standardized shorthand notation of the tooling used to
> develop a patch could be interesting not just for plain disclosure, but
> also to be able to trace back issues with patches ("oh! the author
> didn't see a warning because they use gcc 13 while the warning was added
> in gcc 14!").
> 
> Signed-off-by: John Doe <jd@example.com> # gcc:14.1;ccache:1.2;sparse:4.7;claude-code:0.5
> 
> This way some of it could be automated via git hooks and we can recommend
> a relevant string to add with checkpatch.

For me there are two separate things:
  a) A tool that found a problem
  b) A tool that wrote a piece of code.

I think the cases you're referring to are all (a), whereas I'm mostly
thinking here about (b).
In the case of (a) it's normally _one_ of those tools that found it,
e.g. I see some:
   Found by gcc -fanalyzer

but we don't have a defined way to refer to them.
I also see a variety from coverity, e.g.
  Addresses-Coverity:  xxxxx
or the use of Link: to refer to a coverity failure
or
  Addresses-Coverity-ID: xxxx ("Description of it")

or a few others.
It would be great to standardise some of that as well.

Dave

> -- 
> Thanks,
> Sasha
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-25  1:20               ` Dr. David Alan Gilbert
@ 2025-07-25  1:52                 ` Sasha Levin
  2025-07-25  2:02                   ` Steven Rostedt
  2025-07-25 11:29                   ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 20+ messages in thread
From: Sasha Levin @ 2025-07-25  1:52 UTC
  To: Dr. David Alan Gilbert
  Cc: Kees Cook, Steven Rostedt, Konstantin Ryabitsev, corbet,
	workflows, josh, linux-doc, linux-kernel

On Fri, Jul 25, 2025 at 01:20:59AM +0000, Dr. David Alan Gilbert wrote:
>* Sasha Levin (sashal@kernel.org) wrote:
>> On Thu, Jul 24, 2025 at 04:54:11PM -0700, Kees Cook wrote:
>> > On Thu, Jul 24, 2025 at 07:45:56PM -0400, Steven Rostedt wrote:
>> > > My thought is to treat AI as another developer. If a developer helps you
>> > > like the AI is helping you, would you give that developer credit for that
>> > > work? If so, then you should also give credit to the tooling that's helping
>> > > you.
>> > >
>> > > I suggested adding a new tag to note any tool that has done non-trivial
>> > > work to produce the patch where you give it credit if it has helped you as
>> > > much as another developer that you would give credit to.
>> >
>> > We've got tags to choose from already in that case:
>> >
>> > Suggested-by: LLM
>> >
>> > or
>> >
>> > Co-developed-by: LLM <not@human.with.legal.standing>
>> > Signed-off-by: LLM <not@human.with.legal.standing>
>> >
>> > The latter seems ... not good, as it implies DCO SoB from a thing that
>> > can't and hasn't acknowledged the DCO.
>>
>> In my mind, "any tool" would also be something like gcc giving you a
>> "non-trivial" error (think something like a buffer overflow warning that
>> could have been a security issue).
>>
>> In that case, should we encode the entire toolchain used for developing
>> a patch?
>>
>> Maybe...
>>
>> Some sort of semi-standardized shorthand notation of the tooling used to
>> develop a patch could be interesting not just for plain disclosure, but
>> also to be able to trace back issues with patches ("oh! the author
>> didn't see a warning because they use gcc 13 while the warning was added
>> in gcc 14!").
>>
>> Signed-off-by: John Doe <jd@example.com> # gcc:14.1;ccache:1.2;sparse:4.7;claude-code:0.5
>>
>> This way some of it could be automated via git hooks and we can recommend
>> a relevant string to add with checkpatch.
>
>For me there are two separate things:
>  a) A tool that found a problem
>  b) A tool that wrote a piece of code.
>
>I think the cases you're referring to are all (a), whereas I'm mostly
>thinking here about (b).
>In the case of (a) it's normally _one_ of those tools that found it,
>e.g. I see some:
>   Found by gcc -fanalyzer

I think that the line between (a) and (b) gets very blurry very fast, so
I'd rather stay out of trying to define it.

Running "cargo clippy" on some code might generate a warning as follows:

warning: variables can be used directly in the `format!` string
  --> dyad/src/kernel/sha_processing.rs:20:13
   |
20 |             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
   = note: `#[warn(clippy::uninlined_format_args)]` on by default
help: change this to
   |
20 -             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
20 +             debug!("git sha {git_sha} could not be validated, attempting a second way...");

As you see, it proposes a fix at the bottom. Should I attribute "cargo
clippy" in my commit message as it wrote some code?

Would your answer change if I run "cargo clippy --fix" which would
automatically apply the fix on its own?

We'll be hitting these issues all over the place if we try and draw a
line... For example, with more advanced autocompletion: where would you
draw the line between completing variable names and writing an entire
function based on a comment I've made?

-- 
Thanks,
Sasha

* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-25  1:52                 ` Sasha Levin
@ 2025-07-25  2:02                   ` Steven Rostedt
  2025-07-25  2:39                     ` Sasha Levin
  2025-07-25 11:29                   ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2025-07-25  2:02 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dr. David Alan Gilbert, Kees Cook, Konstantin Ryabitsev, corbet,
	workflows, josh, linux-doc, linux-kernel

On Thu, 24 Jul 2025 21:52:12 -0400
Sasha Levin <sashal@kernel.org> wrote:

> We'll be hitting these issues all over the place if we try and draw a
> line... For example, with more advanced autocompletion: where would you
> draw the line between completing variable names and writing an entire
> function based on a comment I've made?

It's not much different than the "copyright" issue. How much code do I have
to copy before I start infringing on someone's copyright?

But if you start using tooling to come up with algorithms that you would
not think of on your own, then you definitely should document it.

Heck, I do it now even for algorithms I get from a book. I'll credit Knuth
on stuff all the time. Same should happen if you get something from AI.

It's one thing if it finds a bug or formatting issue, it's something
completely different if it starts coming up with the algorithms for you.

And even if it is trivial, if you had it do most of the work, you most
definitely should disclose it.

-- Steve

* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-25  2:02                   ` Steven Rostedt
@ 2025-07-25  2:39                     ` Sasha Levin
  0 siblings, 0 replies; 20+ messages in thread
From: Sasha Levin @ 2025-07-25  2:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dr. David Alan Gilbert, Kees Cook, Konstantin Ryabitsev, corbet,
	workflows, josh, linux-doc, linux-kernel

On Thu, Jul 24, 2025 at 10:02:41PM -0400, Steven Rostedt wrote:
>On Thu, 24 Jul 2025 21:52:12 -0400
>Sasha Levin <sashal@kernel.org> wrote:
>
>> We'll be hitting these issues all over the place if we try and draw a
>> line... For example, with more advanced autocompletion: where would you
>> draw the line between completing variable names and writing an entire
>> function based on a comment I've made?
>
>It's not much different than the "copyright" issue. How much code do I have
>to copy before I start infringing on someone's copyright?
>
>But if you start using tooling to come up with algorithms that you would
>not think of on your own, then you definitely should document it.
>
>Heck, I do it now even for algorithms I get from a book. I'll credit Knuth
>on stuff all the time. Same should happen if you get something from AI.
>
>It's one thing if it finds a bug or formatting issue, it's something
>completely different if it starts coming up with the algorithms for you.
>
>And even if it is trivial, if you had it do most of the work, you most
>definitely should disclose it.

Steve, I'm advocating for disclosing more, not less :)

I think that if we try to draw a line, we have no way of doing it
without it being vague and blurry (and quickly become outdated as tech
around us keeps moving).

Adding metadata for the relevant toolchain bits (let them be the
compiler I use, the kernel-specific tooling I ran on the patch, or the
LLM that was used to generate code) has benefits beyond just LLM
disclosure.
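
A minimal sketch of how such toolchain metadata might be produced and
read back, assuming the shorthand floated earlier in this thread
(`tool:version` pairs joined by `;` after a `#` on the Signed-off-by
line). The format is only a proposal here, and the helper names
`format_tools` and `parse_tools` are hypothetical, not part of
checkpatch or git:

```python
def format_tools(tools):
    """Render {"gcc": "14.1", ...} as "gcc:14.1;..." (hypothetical format)."""
    return ";".join(f"{name}:{version}" for name, version in tools.items())


def parse_tools(trailer):
    """Extract the tool/version map from a Signed-off-by line.

    Returns an empty dict when the line carries no "# ..." suffix.
    """
    _, sep, suffix = trailer.partition("#")
    if not sep:
        return {}
    tools = {}
    for item in suffix.strip().split(";"):
        name, colon, version = item.partition(":")
        # Skip empty or malformed entries rather than guessing.
        if colon and name.strip():
            tools[name.strip()] = version.strip()
    return tools


line = ("Signed-off-by: John Doe <jd@example.com> "
        "# gcc:14.1;ccache:1.2;sparse:4.7;claude-code:0.5")
tools = parse_tools(line)
```

With something along these lines, the "gcc 13 vs. gcc 14" question
becomes a lookup such as `parse_tools(line).get("gcc")`.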

-- 
Thanks,
Sasha

* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-25  1:52                 ` Sasha Levin
  2025-07-25  2:02                   ` Steven Rostedt
@ 2025-07-25 11:29                   ` Dr. David Alan Gilbert
  2025-07-25 11:37                     ` Laurent Pinchart
  2025-07-25 22:40                     ` Sasha Levin
  1 sibling, 2 replies; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2025-07-25 11:29 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Kees Cook, Steven Rostedt, Konstantin Ryabitsev, corbet,
	workflows, josh, linux-doc, linux-kernel

* Sasha Levin (sashal@kernel.org) wrote:
> On Fri, Jul 25, 2025 at 01:20:59AM +0000, Dr. David Alan Gilbert wrote:
> > * Sasha Levin (sashal@kernel.org) wrote:
> > > On Thu, Jul 24, 2025 at 04:54:11PM -0700, Kees Cook wrote:
> > > > On Thu, Jul 24, 2025 at 07:45:56PM -0400, Steven Rostedt wrote:
> > > > > My thought is to treat AI as another developer. If a developer helps you
> > > > > like the AI is helping you, would you give that developer credit for that
> > > > > work? If so, then you should also give credit to the tooling that's helping
> > > > > you.
> > > > >
> > > > > I suggested adding a new tag to note any tool that has done non-trivial
> > > > > work to produce the patch where you give it credit if it has helped you as
> > > > > much as another developer that you would give credit to.
> > > >
> > > > We've got tags to choose from already in that case:
> > > >
> > > > Suggested-by: LLM
> > > >
> > > > or
> > > >
> > > > Co-developed-by: LLM <not@human.with.legal.standing>
> > > > Signed-off-by: LLM <not@human.with.legal.standing>
> > > >
> > > > The latter seems ... not good, as it implies DCO SoB from a thing that
> > > > can't and hasn't acknowledged the DCO.
> > > 
> > > In my mind, "any tool" would also be something like gcc giving you a
> > > "non-trivial" error (think something like a buffer overflow warning that
> > > could have been a security issue).
> > > 
> > > In that case, should we encode the entire toolchain used for developing
> > > a patch?
> > > 
> > > Maybe...
> > > 
> > > Some sort of semi-standardized shorthand notation of the tooling used to
> > > develop a patch could be interesting not just for plain disclosure, but
> > > also to be able to trace back issues with patches ("oh! the author
> > > didn't see a warning because they use gcc 13 while the warning was added
> > > in gcc 14!").
> > > 
> > > Signed-off-by: John Doe <jd@example.com> # gcc:14.1;ccache:1.2;sparse:4.7;claude-code:0.5
> > > 
> > > This way some of it could be automated via git hooks and we can recommend
> > > a relevant string to add with checkpatch.
> > 
> > For me there are two separate things:
> >  a) A tool that found a problem
> >  b) A tool that wrote a piece of code.
> > 
> > I think the cases you're referring to are all (a), whereas I'm mostly
> > thinking here about (b).
> > In the case of (a) it's normally _one_ of those tools that found it,
> > e.g. I see some:
> >   Found by gcc -fanalyzer
> 
> I think that the line between (a) and (b) gets very blurry very fast, so
> I'd rather stay out of trying to define it.
> 
> Running "cargo clippy" on some code might generate a warning as follows:
> 
> warning: variables can be used directly in the `format!` string
>   --> dyad/src/kernel/sha_processing.rs:20:13
>    |
> 20 |             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
>    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>    |
>    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
>    = note: `#[warn(clippy::uninlined_format_args)]` on by default
> help: change this to
>    |
> 20 -             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
> 20 +             debug!("git sha {git_sha} could not be validated, attempting a second way...");
> 
> As you see, it proposes a fix at the bottom. Should I attribute "cargo
> clippy" in my commit message as it wrote some code?
> 
> Would your answer change if I run "cargo clippy --fix" which would
> automatically apply the fix on its own?
> 
> We'll be hitting these issues all over the place if we try and draw a
> line... For example, with more advanced autocompletion: where would you
> draw the line between completing variable names and writing an entire
> function based on a comment I've made?

Fuzzy isn't it!

There's at least 3 levels as I see it:
  1) Reported-by:
    That's a lot of tools, that generate an error or warning.
  2) Suggested-by:
    That covers your example above (hmm including --fix ????)
  3) Co-authored-by:
    Where a tool wrote code based on your more abstract instructions

(1) & (2) are taking some existing code and finding errors or light
improvements;  I don't think it matters whether the tool is a good
old chunk of C or an LLM that's doing it, but how much it's originating.

(Now I'm leaning more towards Kees's style of using existing tags
if we could define a way to do it cleanly).
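
Roughly, and only as a sketch: the three levels listed above could map
onto tags like this. The enum and function names are made up for
illustration, and the third tag is spelled as in the list above (the
tag documented in submitting-patches.rst is Co-developed-by):

```python
from enum import Enum


class Involvement(Enum):
    REPORTED = 1   # tool flagged an error or warning
    SUGGESTED = 2  # tool proposed a concrete change to existing code
    AUTHORED = 3   # tool wrote code from more abstract instructions


# Tag names follow the three levels listed above.
TAGS = {
    Involvement.REPORTED: "Reported-by",
    Involvement.SUGGESTED: "Suggested-by",
    Involvement.AUTHORED: "Co-authored-by",
}


def disclosure_trailer(tool, level):
    """Build a commit trailer crediting `tool` at the given level."""
    return f"{TAGS[level]}: {tool}"
```

For the clippy case that would give
`disclosure_trailer("cargo clippy", Involvement.SUGGESTED)`; which level
`--fix` belongs to is still the open question.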

Dave

> -- 
> Thanks,
> Sasha
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-25 11:29                   ` Dr. David Alan Gilbert
@ 2025-07-25 11:37                     ` Laurent Pinchart
  2025-07-25 11:49                       ` Dr. David Alan Gilbert
  2025-07-25 22:40                     ` Sasha Levin
  1 sibling, 1 reply; 20+ messages in thread
From: Laurent Pinchart @ 2025-07-25 11:37 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Sasha Levin, Kees Cook, Steven Rostedt, Konstantin Ryabitsev,
	corbet, workflows, josh, linux-doc, linux-kernel

On Fri, Jul 25, 2025 at 11:29:17AM +0000, Dr. David Alan Gilbert wrote:
> * Sasha Levin (sashal@kernel.org) wrote:
> > On Fri, Jul 25, 2025 at 01:20:59AM +0000, Dr. David Alan Gilbert wrote:
> > > * Sasha Levin (sashal@kernel.org) wrote:
> > > > On Thu, Jul 24, 2025 at 04:54:11PM -0700, Kees Cook wrote:
> > > > > On Thu, Jul 24, 2025 at 07:45:56PM -0400, Steven Rostedt wrote:
> > > > > > My thought is to treat AI as another developer. If a developer helps you
> > > > > > like the AI is helping you, would you give that developer credit for that
> > > > > > work? If so, then you should also give credit to the tooling that's helping
> > > > > > you.
> > > > > >
> > > > > > I suggested adding a new tag to note any tool that has done non-trivial
> > > > > > work to produce the patch where you give it credit if it has helped you as
> > > > > > much as another developer that you would give credit to.
> > > > >
> > > > > We've got tags to choose from already in that case:
> > > > >
> > > > > Suggested-by: LLM
> > > > >
> > > > > or
> > > > >
> > > > > Co-developed-by: LLM <not@human.with.legal.standing>
> > > > > Signed-off-by: LLM <not@human.with.legal.standing>
> > > > >
> > > > > The latter seems ... not good, as it implies DCO SoB from a thing that
> > > > > can't and hasn't acknowledged the DCO.
> > > > 
> > > > In my mind, "any tool" would also be something like gcc giving you a
> > > > "non-trivial" error (think something like a buffer overflow warning that
> > > > could have been a security issue).
> > > > 
> > > > In that case, should we encode the entire toolchain used for developing
> > > > a patch?
> > > > 
> > > > Maybe...
> > > > 
> > > > Some sort of semi-standardized shorthand notation of the tooling used to
> > > > develop a patch could be interesting not just for plain disclosure, but
> > > > also to be able to trace back issues with patches ("oh! the author
> > > > didn't see a warning because they use gcc 13 while the warning was added
> > > > in gcc 14!").
> > > > 
> > > > Signed-off-by: John Doe <jd@example.com> # gcc:14.1;ccache:1.2;sparse:4.7;claude-code:0.5
> > > > 
> > > > This way some of it could be automated via git hooks and we can recommend
> > > > a relevant string to add with checkpatch.
> > > 
> > > For me there are two separate things:
> > >  a) A tool that found a problem
> > >  b) A tool that wrote a piece of code.
> > > 
> > > I think the cases you're referring to are all (a), whereas I'm mostly
> > > thinking here about (b).
> > > In the case of (a) it's normally _one_ of those tools that found it,
> > > e.g. I see some:
> > >   Found by gcc -fanalyzer
> > 
> > I think that the line between (a) and (b) gets very blurry very fast, so
> > I'd rather stay out of trying to define it.
> > 
> > Running "cargo clippy" on some code might generate a warning as follows:
> > 
> > warning: variables can be used directly in the `format!` string
> >   --> dyad/src/kernel/sha_processing.rs:20:13
> >    |
> > 20 |             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
> >    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >    |
> >    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
> >    = note: `#[warn(clippy::uninlined_format_args)]` on by default
> > help: change this to
> >    |
> > 20 -             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
> > 20 +             debug!("git sha {git_sha} could not be validated, attempting a second way...");
> > 
> > As you see, it proposes a fix at the bottom. Should I attribute "cargo
> > clippy" in my commit message as it wrote some code?
> > 
> > Would your answer change if I run "cargo clippy --fix" which would
> > automatically apply the fix on its own?
> > 
> > We'll be hitting these issues all over the place if we try and draw a
> > line... For example, with more advanced autocompletion: where would you
> > draw the line between completing variable names and writing an entire
> > function based on a comment I've made?
> 
> Fuzzy isn't it!
> 
> There's at least 3 levels as I see it:
>   1) Reported-by:
>     That's a lot of tools, that generate an error or warning.
>   2) Suggested-by:
>     That covers your example above (hmm including --fix ????)
>   3) Co-authored-by:
>     Where a tool wrote code based on your more abstract instructions
> 
> (1) & (2) are taking some existing code and finding errors or light
> improvements;  I don't think it matters whether the tool is a good
> old chunk of C or an LLM that's doing it, but how much it's originating.

Except from a copyright point of view. The situation is quite clear for
deterministic code generation, it's less so for LLMs.

> (Now I'm leaning more towards Kees's style of using existing tags
> if we could define a way to do it cleanly).

-- 
Regards,

Laurent Pinchart

* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-25 11:37                     ` Laurent Pinchart
@ 2025-07-25 11:49                       ` Dr. David Alan Gilbert
  2025-07-25 17:45                         ` Al Viro
  0 siblings, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2025-07-25 11:49 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sasha Levin, Kees Cook, Steven Rostedt, Konstantin Ryabitsev,
	corbet, workflows, josh, linux-doc, linux-kernel

* Laurent Pinchart (laurent.pinchart@ideasonboard.com) wrote:
> On Fri, Jul 25, 2025 at 11:29:17AM +0000, Dr. David Alan Gilbert wrote:
> > * Sasha Levin (sashal@kernel.org) wrote:
> > > On Fri, Jul 25, 2025 at 01:20:59AM +0000, Dr. David Alan Gilbert wrote:
> > > > * Sasha Levin (sashal@kernel.org) wrote:
> > > > > On Thu, Jul 24, 2025 at 04:54:11PM -0700, Kees Cook wrote:
> > > > > > On Thu, Jul 24, 2025 at 07:45:56PM -0400, Steven Rostedt wrote:
> > > > > > > My thought is to treat AI as another developer. If a developer helps you
> > > > > > > like the AI is helping you, would you give that developer credit for that
> > > > > > > work? If so, then you should also give credit to the tooling that's helping
> > > > > > > you.
> > > > > > >
> > > > > > > I suggested adding a new tag to note any tool that has done non-trivial
> > > > > > > work to produce the patch where you give it credit if it has helped you as
> > > > > > > much as another developer that you would give credit to.
> > > > > >
> > > > > > We've got tags to choose from already in that case:
> > > > > >
> > > > > > Suggested-by: LLM
> > > > > >
> > > > > > or
> > > > > >
> > > > > > Co-developed-by: LLM <not@human.with.legal.standing>
> > > > > > Signed-off-by: LLM <not@human.with.legal.standing>
> > > > > >
> > > > > > The latter seems ... not good, as it implies DCO SoB from a thing that
> > > > > > can't and hasn't acknowledged the DCO.
> > > > > 
> > > > > In my mind, "any tool" would also be something like gcc giving you a
> > > > > "non-trivial" error (think something like a buffer overflow warning that
> > > > > could have been a security issue).
> > > > > 
> > > > > In that case, should we encode the entire toolchain used for developing
> > > > > a patch?
> > > > > 
> > > > > Maybe...
> > > > > 
> > > > > Some sort of semi-standardized shorthand notation of the tooling used to
> > > > > develop a patch could be interesting not just for plain disclosure, but
> > > > > also to be able to trace back issues with patches ("oh! the author
> > > > > didn't see a warning because they use gcc 13 while the warning was added
> > > > > in gcc 14!").
> > > > > 
> > > > > Signed-off-by: John Doe <jd@example.com> # gcc:14.1;ccache:1.2;sparse:4.7;claude-code:0.5
> > > > > 
> > > > > This way some of it could be automated via git hooks and we can recommend
> > > > > a relevant string to add with checkpatch.
> > > > 
> > > > For me there are two separate things:
> > > >  a) A tool that found a problem
> > > >  b) A tool that wrote a piece of code.
> > > > 
> > > > I think the cases you're referring to are all (a), whereas I'm mostly
> > > > thinking here about (b).
> > > > In the case of (a) it's normally _one_ of those tools that found it,
> > > > e.g. I see some:
> > > >   Found by gcc -fanalyzer
> > > 
> > > I think that the line between (a) and (b) gets very blurry very fast, so
> > > I'd rather stay out of trying to define it.
> > > 
> > > Running "cargo clippy" on some code might generate a warning as follows:
> > > 
> > > warning: variables can be used directly in the `format!` string
> > >   --> dyad/src/kernel/sha_processing.rs:20:13
> > >    |
> > > 20 |             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
> > >    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >    |
> > >    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
> > >    = note: `#[warn(clippy::uninlined_format_args)]` on by default
> > > help: change this to
> > >    |
> > > 20 -             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
> > > 20 +             debug!("git sha {git_sha} could not be validated, attempting a second way...");
> > > 
> > > As you see, it proposes a fix at the bottom. Should I attribute "cargo
> > > clippy" in my commit message as it wrote some code?
> > > 
> > > Would your answer change if I run "cargo clippy --fix" which would
> > > automatically apply the fix on its own?
> > > 
> > > We'll be hitting these issues all over the place if we try and draw a
> > > line... For example, with more advanced autocompletion: where would you
> > > draw the line between completing variable names and writing an entire
> > > function based on a comment I've made?
> > 
> > Fuzzy isn't it!
> > 
> > There's at least 3 levels as I see it:
> >   1) Reported-by:
> >     That's a lot of tools, that generate an error or warning.
> >   2) Suggested-by:
> >     That covers your example above (hmm including --fix ????)
> >   3) Co-authored-by:
> >     Where a tool wrote code based on your more abstract instructions
> > 
> > (1) & (2) are taking some existing code and finding errors or light
> > improvements;  I don't think it matters whether the tool is a good
> > old chunk of C or an LLM that's doing it, but how much it's originating.
> 
> Except from a copyright point of view. The situation is quite clear for
> deterministic code generation, it's less so for LLMs.

As long as you'd acknowledged the use of the LLM in all cases, it seems to
me right to say to what degree you use it (i.e. the 1..3 above).
I think even most people worried about copyright issues would worry
less if an LLM had just told you about a problem (1) and you fixed it.
(Although obviously IANAL)

Dave

> > (Now I'm leaning more towards Kees's style of using existing tags
> > if we could define a way to do it cleanly).
> 
> -- 
> Regards,
> 
> Laurent Pinchart
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/

* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-25 11:49                       ` Dr. David Alan Gilbert
@ 2025-07-25 17:45                         ` Al Viro
  0 siblings, 0 replies; 20+ messages in thread
From: Al Viro @ 2025-07-25 17:45 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Laurent Pinchart, Sasha Levin, Kees Cook, Steven Rostedt,
	Konstantin Ryabitsev, corbet, workflows, josh, linux-doc,
	linux-kernel

On Fri, Jul 25, 2025 at 11:49:02AM +0000, Dr. David Alan Gilbert wrote:

> > Except from a copyright point of view. The situation is quite clear for
> > deterministic code generation, it's less so for LLMs.
> 
> As long as you'd acknowledged the use of the LLM in all cases, it seems to
> me right to say to what degree you use it (i.e. the 1..3 above).
> I think even most people worried about copyright issues would worry
> less if an LLM had just told you about a problem (1) and you fixed it.
> (Although obviously IANAL)

s/told you about a problem/told you that <location> has triggered some
heuristics and might or might not be worth looking into/, really...

* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-25 11:29                   ` Dr. David Alan Gilbert
  2025-07-25 11:37                     ` Laurent Pinchart
@ 2025-07-25 22:40                     ` Sasha Levin
  2025-07-25 23:29                       ` Dr. David Alan Gilbert
  1 sibling, 1 reply; 20+ messages in thread
From: Sasha Levin @ 2025-07-25 22:40 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Kees Cook, Steven Rostedt, Konstantin Ryabitsev, corbet,
	workflows, josh, linux-doc, linux-kernel

On Fri, Jul 25, 2025 at 11:29:17AM +0000, Dr. David Alan Gilbert wrote:
>* Sasha Levin (sashal@kernel.org) wrote:
>> On Fri, Jul 25, 2025 at 01:20:59AM +0000, Dr. David Alan Gilbert wrote:
>> > * Sasha Levin (sashal@kernel.org) wrote:
>> > > On Thu, Jul 24, 2025 at 04:54:11PM -0700, Kees Cook wrote:
>> > > > On Thu, Jul 24, 2025 at 07:45:56PM -0400, Steven Rostedt wrote:
>> > > > > My thought is to treat AI as another developer. If a developer helps you
>> > > > > like the AI is helping you, would you give that developer credit for that
>> > > > > work? If so, then you should also give credit to the tooling that's helping
>> > > > > you.
>> > > > >
>> > > > > I suggested adding a new tag to note any tool that has done non-trivial
>> > > > > work to produce the patch where you give it credit if it has helped you as
>> > > > > much as another developer that you would give credit to.
>> > > >
>> > > > We've got tags to choose from already in that case:
>> > > >
>> > > > Suggested-by: LLM
>> > > >
>> > > > or
>> > > >
>> > > > Co-developed-by: LLM <not@human.with.legal.standing>
>> > > > Signed-off-by: LLM <not@human.with.legal.standing>
>> > > >
>> > > > The latter seems ... not good, as it implies DCO SoB from a thing that
>> > > > can't and hasn't acknowledged the DCO.
>> > >
>> > > In my mind, "any tool" would also be something like gcc giving you a
>> > > "non-trivial" error (think something like a buffer overflow warning that
>> > > could have been a security issue).
>> > >
>> > > In that case, should we encode the entire toolchain used for developing
>> > > a patch?
>> > >
>> > > Maybe...
>> > >
>> > > Some sort of semi-standardized shorthand notation of the tooling used to
>> > > develop a patch could be interesting not just for plain disclosure, but
>> > > also to be able to trace back issues with patches ("oh! the author
>> > > didn't see a warning because they use gcc 13 while the warning was added
>> > > in gcc 14!").
>> > >
>> > > Signed-off-by: John Doe <jd@example.com> # gcc:14.1;ccache:1.2;sparse:4.7;claude-code:0.5
>> > >
>> > > This way some of it could be automated via git hooks and we can recommend
>> > > a relevant string to add with checkpatch.
>> >
>> > For me there are two separate things:
>> >  a) A tool that found a problem
>> >  b) A tool that wrote a piece of code.
>> >
>> > I think the cases you're referring to are all (a), whereas I'm mostly
>> > thinking here about (b).
>> > In the case of (a) it's normally _one_ of those tools that found it,
>> > e.g. I see some:
>> >   Found by gcc -fanalyzer
>>
>> I think that the line between (a) and (b) gets very blurry very fast, so
>> I'd rather stay out of trying to define it.
>>
>> Running "cargo clippy" on some code might generate a warning as follows:
>>
>> warning: variables can be used directly in the `format!` string
>>   --> dyad/src/kernel/sha_processing.rs:20:13
>>    |
>> 20 |             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
>>    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>    |
>>    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
>>    = note: `#[warn(clippy::uninlined_format_args)]` on by default
>> help: change this to
>>    |
>> 20 -             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
>> 20 +             debug!("git sha {git_sha} could not be validated, attempting a second way...");
>>
>> As you see, it proposes a fix at the bottom. Should I attribute "cargo
>> clippy" in my commit message as it wrote some code?
>>
>> Would your answer change if I run "cargo clippy --fix" which would
>> automatically apply the fix on its own?
>>
>> We'll be hitting these issues all over the place if we try and draw a
>> line... For example, with more advanced autocompletion: where would you
>> draw the line between completing variable names and writing an entire
>> function based on a comment I've made?
>
>Fuzzy isn't it!
>
>There's at least 3 levels as I see it:
>  1) Reported-by:
>    That's a lot of tools, that generate an error or warning.
>  2) Suggested-by:
>    That covers your example above (hmm including --fix ????)
>  3) Co-authored-by:
>    Where a tool wrote code based on your more abstract instructions
>
>(1) & (2) are taking some existing code and finding errors or light
>improvements;  I don't think it matters whether the tool is a good
>old chunk of C or an LLM that's doing it, but how much it's originating.

So let's say I'm using github copilot, and I go:

	/* Iterate over pointers in KEY_TYPE_extent: */
	#define extent_ptr_next(_e, _ptr) <tab> <tab>

and copilot completes the code with "__bkey_ptr_next(_ptr, extent_entry_last(_e))".

Was my instruction abstract? Was it within the realm of something we
consider a trivial change, or should we attribute the agent? :)

Why tackle any of this to begin with?

-- 
Thanks,
Sasha

* Re: [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag
  2025-07-25 22:40                     ` Sasha Levin
@ 2025-07-25 23:29                       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2025-07-25 23:29 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Kees Cook, Steven Rostedt, Konstantin Ryabitsev, corbet,
	workflows, josh, linux-doc, linux-kernel

* Sasha Levin (sashal@kernel.org) wrote:
> On Fri, Jul 25, 2025 at 11:29:17AM +0000, Dr. David Alan Gilbert wrote:
> > * Sasha Levin (sashal@kernel.org) wrote:
> > > On Fri, Jul 25, 2025 at 01:20:59AM +0000, Dr. David Alan Gilbert wrote:
> > > > * Sasha Levin (sashal@kernel.org) wrote:
> > > > > On Thu, Jul 24, 2025 at 04:54:11PM -0700, Kees Cook wrote:
> > > > > > On Thu, Jul 24, 2025 at 07:45:56PM -0400, Steven Rostedt wrote:
> > > > > > > My thought is to treat AI as another developer. If a developer helps you
> > > > > > > like the AI is helping you, would you give that developer credit for that
> > > > > > > work? If so, then you should also give credit to the tooling that's helping
> > > > > > > you.
> > > > > > >
> > > > > > > I suggested adding a new tag to note any tool that has done non-trivial
> > > > > > > work to produce the patch where you give it credit if it has helped you as
> > > > > > > much as another developer that you would give credit to.
> > > > > >
> > > > > > We've got tags to choose from already in that case:
> > > > > >
> > > > > > Suggested-by: LLM
> > > > > >
> > > > > > or
> > > > > >
> > > > > > Co-developed-by: LLM <not@human.with.legal.standing>
> > > > > > Signed-off-by: LLM <not@human.with.legal.standing>
> > > > > >
> > > > > > The latter seems ... not good, as it implies DCO SoB from a thing that
> > > > > > can't and hasn't acknowledged the DCO.
> > > > >
> > > > > In my mind, "any tool" would also be something like gcc giving you a
> > > > > "non-trivial" error (think something like a buffer overflow warning that
> > > > > could have been a security issue).
> > > > >
> > > > > In that case, should we encode the entire toolchain used for developing
> > > > > a patch?
> > > > >
> > > > > Maybe...
> > > > >
> > > > > Some sort of semi-standardized shorthand notation of the tooling used to
> > > > > develop a patch could be interesting not just for plain disclosure, but
> > > > > also to be able to trace back issues with patches ("oh! the author
> > > > > didn't see a warning because they use gcc 13 while the warning was added
> > > > > in gcc 14!").
> > > > >
> > > > > Signed-off-by: John Doe <jd@example.com> # gcc:14.1;ccache:1.2;sparse:4.7;claude-code:0.5
> > > > >
> > > > > This way some of it could be automated via git hooks and we can recommend
> > > > > a relevant string to add with checkpatch.
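
A minimal sketch of how such a shorthand could be produced and read back,
assuming the "# tool:version;tool:version" suffix in the example above is
taken literally. Nothing here is an agreed format: parse_signoff() and
format_tools() are hypothetical helper names, and a real git hook would
still have to discover the tool versions itself.

```python
import re

# Assumption: the tooling list rides as a "#" comment after the signer,
# exactly as in the example trailer above. This is not an existing
# kernel or git convention.
TRAILER_RE = re.compile(
    r"^Signed-off-by:\s*(?P<who>[^#]+?)\s*(?:#\s*(?P<tools>.*))?$"
)


def format_tools(tools):
    """Render {"gcc": "14.1"} as "gcc:14.1", keeping insertion order."""
    return ";".join(f"{name}:{version}" for name, version in tools.items())


def parse_signoff(line):
    """Split a Signed-off-by line into (signer, {tool: version})."""
    match = TRAILER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a Signed-off-by line: {line!r}")
    tools = {}
    # A trailer without a "#" suffix simply yields an empty tool map.
    for item in filter(None, (match.group("tools") or "").split(";")):
        name, sep, version = item.strip().partition(":")
        if not sep or not name or not version:
            raise ValueError(f"malformed tool entry: {item!r}")
        tools[name] = version
    return match.group("who"), tools


example = ("Signed-off-by: John Doe <jd@example.com> "
           "# gcc:14.1;ccache:1.2;sparse:4.7;claude-code:0.5")
signer, tools = parse_signoff(example)
print(signer, tools)
```

A checkpatch-style check could then compare the parsed versions against
whatever minimum a given warning needs, which is the "gcc 13 vs gcc 14"
case described above.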
> > > >
> > > > For me there are two separate things:
> > > >  a) A tool that found a problem
> > > >  b) A tool that wrote a piece of code.
> > > >
> > > > I think the cases you're referring to are all (a), whereas I'm mostly
> > > > thinking here about (b).
> > > > In the case of (a) it's normally _one_ of those tools that found it,
> > > > e.g. I see some:
> > > >   Found by gcc -fanalyzer
> > > 
> > > I think that the line between (a) and (b) gets very blurry very fast, so
> > > I'd rather stay out of trying to define it.
> > > 
> > > Running "cargo clippy" on some code might generate a warning as follows:
> > > 
> > > warning: variables can be used directly in the `format!` string
> > >   --> dyad/src/kernel/sha_processing.rs:20:13
> > >    |
> > > 20 |             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
> > >    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > >    |
> > >    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
> > >    = note: `#[warn(clippy::uninlined_format_args)]` on by default
> > > help: change this to
> > >    |
> > > 20 -             debug!("git sha {} could not be validated, attempting a second way...", git_sha);
> > > 20 +             debug!("git sha {git_sha} could not be validated, attempting a second way...");
> > > 
> > > As you see, it proposes a fix at the bottom. Should I attribute "cargo
> > > clippy" in my commit message as it wrote some code?
> > > 
> > > Would your answer change if I run "cargo clippy --fix" which would
> > > automatically apply the fix on its own?
> > > 
> > > We'll be hitting these issues all over the place if we try and draw a
> > > line... For example, with more advanced autocompletion: where would you
> > > draw the line between completing variable names and writing an entire
> > > function based on a comment I've made?
> > 
> > Fuzzy isn't it!
> > 
> > There are at least 3 levels as I see it:
> >  1) Reported-by:
> >    That's a lot of tools, that generate an error or warning.
> >  2) Suggested-by:
> >    That covers your example above (hmm including --fix ????)
> >  3) Co-authored-by:
> >    Where a tool wrote code based on your more abstract instructions
> > 
> > (1) & (2) are taking some existing code and finding errors or light
> > improvements;  I don't think it matters whether the tool is a good
> > old chunk of C or an LLM that's doing it, but how much it's originating.
> 
> So let's say I'm using github copilot, and I go:
> 
> 	/* Iterate over pointers in KEY_TYPE_extent: */
> 	#define extent_ptr_next(_e, _ptr) <tab> <tab>
> 
> and copilot completes the code with "__bkey_ptr_next(_ptr, extent_entry_last(_e))".
> 
> Was my instruction abstract? Was it within the realm of something we
> consider a trivial change, or should we attribute the agent? :)

Heck, I don't know either! I mean, there are places & projects that ban even
that level of use, but I'd agree that 'more abstract' doesn't fit there.

> Why tackle any of this to begin with?

It seemed to me appropriate to identify use of AI which some might
object to, or which wouldn't be allowed in their project, or which
might indicate the need to look for different types of errors than
humans normally make.  At the same time it seemed appropriate to
acknowledge things that worked.

Dave

> -- 
> Thanks,
> Sasha
> 
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


Thread overview: 20+ messages
2025-07-24 17:54 [RFC PATCH] docs: submitting-patches: (AI?) Tool disclosure tag linux
2025-07-24 19:07 ` Konstantin Ryabitsev
2025-07-24 20:45   ` Kees Cook
2025-07-24 21:06     ` Laurent Pinchart
2025-07-24 21:12     ` Dr. David Alan Gilbert
2025-07-24 21:20       ` Kees Cook
2025-07-24 23:45         ` Steven Rostedt
2025-07-24 23:54           ` Kees Cook
2025-07-25  0:55             ` Dr. David Alan Gilbert
2025-07-25  1:06             ` Sasha Levin
2025-07-25  1:20               ` Dr. David Alan Gilbert
2025-07-25  1:52                 ` Sasha Levin
2025-07-25  2:02                   ` Steven Rostedt
2025-07-25  2:39                     ` Sasha Levin
2025-07-25 11:29                   ` Dr. David Alan Gilbert
2025-07-25 11:37                     ` Laurent Pinchart
2025-07-25 11:49                       ` Dr. David Alan Gilbert
2025-07-25 17:45                         ` Al Viro
2025-07-25 22:40                     ` Sasha Levin
2025-07-25 23:29                       ` Dr. David Alan Gilbert
