* [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
@ 2025-06-30 20:32 Junio C Hamano
  2025-06-30 21:07 ` brian m. carlson
                   ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Junio C Hamano @ 2025-06-30 20:32 UTC (permalink / raw)
  To: git; +Cc: Git PLC

Following the example set by QEMU folks, let's explicitly forbid use
of genAI tools until the copyright and license situations become
more clear.  Here is what QEMU folks say in their commit to adopt
such a rule:

    The DCO requires contributors to assert they have the right to
    contribute under the designated project license. Given the lack
    of consensus on the licensing of AI code generator output, it is
    not considered credible to assert compliance with the DCO clause
    (b) or (c) where a patch includes such generated code.

and it applies equally well to ours.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/SubmittingPatches | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git c/Documentation/SubmittingPatches w/Documentation/SubmittingPatches
index 958e3cc3d5..63fd10ce39 100644
--- c/Documentation/SubmittingPatches
+++ w/Documentation/SubmittingPatches
@@ -439,6 +439,23 @@ highlighted above.
 Only capitalize the very first letter of the trailer, i.e. favor
 "Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".
 
+
+[[ai]]
+=== Use of AI content generators
+
+This project requires that contributors certify that their
+contributions are made under Developer's Certificate of Origin 1.1,
+which in turn means that contributors must understand the full
+provenance of what they are contributing.  With AI content generators,
+the copyright or license status of their output is ill-defined, without
+any generally accepted legal foundation.
+
+Hence, the project asks that contributors refrain from using AI content
+generators on changes that are submitted to the project.
+Contributions in which use of AI is either known or suspected may not
+be accepted.
+
+
 [[git-tools]]
 === Generate your patch using Git tools out of your commits.
 


* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
  2025-06-30 20:32 [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes Junio C Hamano
@ 2025-06-30 21:07 ` brian m. carlson
  2025-06-30 21:23   ` Collin Funk
  2025-07-01 10:36 ` Christian Couder
  2025-10-01 14:02 ` [PATCH v2] SubmittingPatches: add section about AI Christian Couder
  2 siblings, 1 reply; 34+ messages in thread
From: brian m. carlson @ 2025-06-30 21:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Git PLC


On 2025-06-30 at 20:32:22, Junio C Hamano wrote:
> Following the example set by QEMU folks, let's explicitly forbid use
> of genAI tools until the copyright and license situations become
> more clear.  Here is what QEMU folks say in their commit to adopt
> such a rule:
> 
>     The DCO requires contributors to assert they have the right to
>     contribute under the designated project license. Given the lack
>     of consensus on the licensing of AI code generator output, it is
>     not considered credible to assert compliance with the DCO clause
>     (b) or (c) where a patch includes such generated code.
> 
> and it applies equally well to ours.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/SubmittingPatches | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git c/Documentation/SubmittingPatches w/Documentation/SubmittingPatches
> index 958e3cc3d5..63fd10ce39 100644
> --- c/Documentation/SubmittingPatches
> +++ w/Documentation/SubmittingPatches
> @@ -439,6 +439,23 @@ highlighted above.
>  Only capitalize the very first letter of the trailer, i.e. favor
>  "Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".
>  
> +
> +[[ai]]
> +=== Use of AI content generators
> +
> +This project requires that contributors certify that their
> +contributions are made under Developer's Certificate of Origin 1.1,
> +which in turn means that contributors must understand the full
> +provenance of what they are contributing.  With AI content generators,
> +the copyright or license status of their output is ill-defined, without
> +any generally accepted legal foundation.
> +
> +Hence, the project asks that contributors refrain from using AI content
> +generators on changes that are submitted to the project.
> +Contributions in which use of AI is either known or suspected may not
> +be accepted.

This matches the advice we gave contributors to GSoC and similar
projects, so it's good that we're being consistent here.

I think this seems prudent given the fact that there are 181 signatories
to the Berne Convention and even if the courts rule that the use of
generative AI is acceptable in one country (say, the United States), it
isn't clear that that will mean anything in other countries (such as
Canada).  Considering that there's ongoing litigation and quite a bit of
legal uncertainty, as well as substantial pushback on generative AI from
the open source community, this approach seems like it's in the best
interests of the project at the moment[0].  We can always reconsider in
the future if need be.

I'll note that this was my interpretation of the DCO from the start (and
I have governed my behaviour and contributions accordingly) but it can
be helpful to explicitly document our shared understanding.

One style note: I noticed that there's two blank lines before and after
this block.  Some sections have one blank line between them and some
have two, so I don't think this is a problem, but I thought I might as
well point it out.

[0] I know some large companies feel differently, but considering our
status as a member project of Conservancy (which is a non-profit), our
comparatively limited assets, and the potential negative legal effects
on downstream distributors (many of which are independent people or
non-profits), I would say we find ourselves in a different position from
those companies and would need to make a different decision.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA



* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
  2025-06-30 21:07 ` brian m. carlson
@ 2025-06-30 21:23   ` Collin Funk
  0 siblings, 0 replies; 34+ messages in thread
From: Collin Funk @ 2025-06-30 21:23 UTC (permalink / raw)
  To: brian m. carlson; +Cc: Junio C Hamano, git, Git PLC

Hi all,

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I think this seems prudent given the fact that there are 181 signatories
> to the Berne Convention and even if the courts rule that the use of
> generative AI is acceptable in one country (say, the United States), it
> isn't clear that that will mean anything in other countries (such as
> Canada).  Considering that there's ongoing litigation and quite a bit of
> legal uncertainty, as well as substantial pushback on generative AI from
> the open source community, this approach seems like it's in the best
> interests of the project at the moment[0].  We can always reconsider in
> the future if need be.

I agree. It feels unsafe given the lack of legislation and lack of case
law.

One thing, though:

>> +Hence, the project asks that contributors refrain from using AI content
>> +generators on changes that are submitted to the project.
>> +Contributions in which use of AI is either known or suspected may not
>> +be accepted.

This feels more like a suggestion than a requirement. Shouldn't we
explicitly prohibit it, if we truly are worried about the
copyright-ability of its output?

Collin


* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
  2025-06-30 20:32 [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes Junio C Hamano
  2025-06-30 21:07 ` brian m. carlson
@ 2025-07-01 10:36 ` Christian Couder
  2025-07-01 11:07   ` Christian Couder
  2025-07-01 16:20   ` Junio C Hamano
  2025-10-01 14:02 ` [PATCH v2] SubmittingPatches: add section about AI Christian Couder
  2 siblings, 2 replies; 34+ messages in thread
From: Christian Couder @ 2025-07-01 10:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Git PLC

On Mon, Jun 30, 2025 at 10:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Following the example set by QEMU folks, let's explicitly forbid use
> of genAI tools until the copyright and license situations become
> more clear.  Here is what QEMU folks say in their commit to adopt
> such a rule:
>
>     The DCO requires contributors to assert they have the right to
>     contribute under the designated project license. Given the lack
>     of consensus on the licensing of AI code generator output, it is
>     not considered credible to assert compliance with the DCO clause
>     (b) or (c) where a patch includes such generated code.

Here they forbid licensing any "AI code generator output" with the DCO.

> and it applies equally well to ours.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/SubmittingPatches | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git c/Documentation/SubmittingPatches w/Documentation/SubmittingPatches
> index 958e3cc3d5..63fd10ce39 100644
> --- c/Documentation/SubmittingPatches
> +++ w/Documentation/SubmittingPatches
> @@ -439,6 +439,23 @@ highlighted above.
>  Only capitalize the very first letter of the trailer, i.e. favor
>  "Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".
>
> +
> +[[ai]]
> +=== Use of AI content generators
> +
> +This project requires that contributors certify that their
> +contributions are made under Developer's Certificate of Origin 1.1,
> +which in turn means that contributors must understand the full
> +provenance of what they are contributing.  With AI content generators,
> +the copyright or license status of their output is ill-defined, without
> +any generally accepted legal foundation.

Here we would forbid licensing any "AI content generator" output, not
just AI code generator output. So what we would forbid might be more
general than what QEMU folks forbid. For example they might still
accept a new logo, or even commit messages, made using an AI while we
wouldn't.

> +Hence, the project asks that contributors refrain from using AI content
> +generators on changes that are submitted to the project.

Here it looks like using an AI capable of generating content to just
check code that would be submitted could also be forbidden. I don't
think this is what we want, so I think we might want to reword this.

> +Contributions in which use of AI is either known or suspected may not
> +be accepted.

Here also "use of AI" might forbid checking what we submit using any AI tool.


* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
  2025-07-01 10:36 ` Christian Couder
@ 2025-07-01 11:07   ` Christian Couder
  2025-07-01 17:33     ` Junio C Hamano
  2025-07-01 16:20   ` Junio C Hamano
  1 sibling, 1 reply; 34+ messages in thread
From: Christian Couder @ 2025-07-01 11:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Git PLC

On Tue, Jul 1, 2025 at 12:36 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Mon, Jun 30, 2025 at 10:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Following the example set by QEMU folks, let's explicitly forbid use
> > of genAI tools until the copyright and license situations become
> > more clear.  Here is what QEMU folks say in their commit to adopt
> > such a rule:
> >
> >     The DCO requires contributors to assert they have the right to
> >     contribute under the designated project license. Given the lack
> >     of consensus on the licensing of AI code generator output, it is
> >     not considered credible to assert compliance with the DCO clause
> >     (b) or (c) where a patch includes such generated code.
>
> Here they forbid licensing any "AI code generator output" with the DCO.
>
> > and it applies equally well to ours.

[...]

> > +=== Use of AI content generators
> > +
> > +This project requires that contributors certify that their
> > +contributions are made under Developer's Certificate of Origin 1.1,
> > +which in turn means that contributors must understand the full
> > +provenance of what they are contributing.  With AI content generators,
> > +the copyright or license status of their output is ill-defined, without
> > +any generally accepted legal foundation.
>
> Here we would forbid licensing any "AI content generator" output, not
> just AI code generator output. So what we would forbid might be more
> general than what QEMU folks forbid. For example they might still
> accept a new logo, or even commit messages, made using an AI while we
> wouldn't.

As QEMU is part of the Conservancy, like Git, I wonder if they
consulted a Conservancy lawyer to come up with their wording? If they
did, maybe we could reuse that expertise?


* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
  2025-07-01 10:36 ` Christian Couder
  2025-07-01 11:07   ` Christian Couder
@ 2025-07-01 16:20   ` Junio C Hamano
  2025-07-08 14:23     ` Christian Couder
  1 sibling, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2025-07-01 16:20 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Git PLC

Christian Couder <christian.couder@gmail.com> writes:

>> +
>> +[[ai]]
>> +=== Use of AI content generators
>> +
>> +This project requires that contributors certify that their
>> +contributions are made under Developer's Certificate of Origin 1.1,
>> +which in turn means that contributors must understand the full
>> +provenance of what they are contributing.  With AI content generators,
>> +the copyright or license status of their output is ill-defined, without
>> +any generally accepted legal foundation.
>
> Here we would forbid licensing any "AI content generator" output, not
> just AI code generator output. So what we would forbid might be more
> general than what QEMU folks forbid. For example they might still
> accept a new logo, or even commit messages, made using an AI while we
> wouldn't.

I didn't think about the distinction you are trying to draw when I
wrote the patch, but after thinking about it, I think it is a good
thing to prevent us from adopting new logo graphics in which somebody
may have ownership rights without us knowing.  I would consider the
commit log message an integral part of any "contribution", and read
the word "contribution" used in the [[dco]] section as such; if the
rule covers the commit log message, that is very much appreciated.

>> +Hence, the project asks that contributors refrain from using AI content
>> +generators on changes that are submitted to the project.
>
> Here it looks like using an AI capable of generating content to just
> check code that would be submitted could also be forbidden. I don't
> think this is what we want, so I think we might want to reword this.

Good point.  Asking agents to proofread and suggest improvements is
like asking your friends to do so.  Care to suggest a replacement for
these two sentences (above and below)?

>> +Contributions in which use of AI is either known or suspected may not
>> +be accepted.
>
> Here also "use of AI" might forbid checking what we submit using any AI tool.

Thanks.



* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
  2025-07-01 11:07   ` Christian Couder
@ 2025-07-01 17:33     ` Junio C Hamano
  0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2025-07-01 17:33 UTC (permalink / raw)
  To: Christian Couder; +Cc: git, Git PLC

Christian Couder <christian.couder@gmail.com> writes:

> As QEMU is part of the Conservancy, like Git, I wonder if they
> consulted a Conservancy lawyer to come up with their wording? If they
> did, maybe we could reuse that expertise?

Or grab their wording wholesale, perhaps?

    https://github.com/qemu/qemu/commit/3d40db0efc22520fa6c399cf73960dced423b048

is the commit they added it to their policy.

Thanks.


* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
  2025-07-01 16:20   ` Junio C Hamano
@ 2025-07-08 14:23     ` Christian Couder
  0 siblings, 0 replies; 34+ messages in thread
From: Christian Couder @ 2025-07-08 14:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Git PLC

On Tue, Jul 1, 2025 at 6:20 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:

> > Here we would forbid licensing any "AI content generator" output, not
> > just AI code generator output. So what we would forbid might be more
> > general than what QEMU folks forbid. For example they might still
> > accept a new logo, or even commit messages, made using an AI while we
> > wouldn't.
>
> I didn't think about the distinction you are trying to draw when I
> wrote the patch, but after thinking about it, I think it is a good
> thing to prevent us from adopting new logo graphics in which somebody
> may have ownership rights without us knowing.  I would consider the
> commit log message an integral part of any "contribution", and read
> the word "contribution" used in the [[dco]] section as such; if the
> rule covers the commit log message, that is very much appreciated.

I am not sure about logos, but for the commit message, it seems to me
that such a rule could have drawbacks related to translation or wording.

For example, if someone is not a good English writer, they could write
a commit message in their native language and then ask an AI to
translate it. Or they could write it in their bad English and then ask
an AI to improve the wording. I am not sure we want to forbid all
that.

> >> +Hence, the project asks that contributors refrain from using AI content
> >> +generators on changes that are submitted to the project.
> >
> > Here it looks like using an AI capable of generating content to just
> > check code that would be submitted could also be forbidden. I don't
> > think this is what we want, so I think we might want to reword this.
>
> Good point.  Asking agents to proofread and suggest improvements is
> like asking your friends to do so.  Care to suggest a replacement for
> these two sentences (above and below)?

I could try but I would feel better if we tried to find and ask people
around who have thought about this subject already.

In particular, I think it's difficult to draw the line between a tool
that suggests improvements and a tool that generates content. For
example, if I were a very bad English writer and asked an AI to suggest
improvements to a commit message I wrote, then the AI might actually
rewrite nearly everything and the result could be very similar to what
the AI would have generated in the first place based only on the diff
part of the patch.


* [PATCH v2] SubmittingPatches: add section about AI
  2025-06-30 20:32 [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes Junio C Hamano
  2025-06-30 21:07 ` brian m. carlson
  2025-07-01 10:36 ` Christian Couder
@ 2025-10-01 14:02 ` Christian Couder
  2025-10-01 18:59   ` Chuck Wolber
                     ` (2 more replies)
  2 siblings, 3 replies; 34+ messages in thread
From: Christian Couder @ 2025-10-01 14:02 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC,
	Johannes Schindelin, Patrick Steinhardt, Christian Couder,
	Christian Couder

As more and more developer tools use AI, we are facing two main risks
related to AI generated content:

  - its situation regarding copyright and license is not clear,
    and:

  - more and more low-quality content could be submitted for review to
    the mailing list.

To mitigate both risks, let's add a "Use of Artificial Intelligence"
section to "Documentation/SubmittingPatches" with the goal of
discouraging its blind use to generate content that is submitted to
the project, while still allowing us to benefit from its help in some
innovative, useful and less risky ways.

Helped-by: Rick Sanders <rick@sfconservancy.org>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>

---
This is inspired by the "AI guidelines" section we already have for
mentoring programs (like GSoC or Outreachy) in:

https://git.github.io/General-Application-Information/

which was discussed briefly in a PR
(https://github.com/git/git.github.io/pull/771)
and in a small thread on the mailing list
(https://lore.kernel.org/git/CAP8UFD37_qsTjM97GK2EOWHteqoUKdwxjKS-SU629H2LnbTTtA@mail.gmail.com/).

 Documentation/SubmittingPatches | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 86ca7f6a78..04191e2945 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -446,6 +446,34 @@ highlighted above.
 Only capitalize the very first letter of the trailer, i.e. favor
 "Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".
 
+[[ai]]
+=== Use of Artificial Intelligence (AI)
+
+The Developer's Certificate of Origin requires contributors to certify
+that they know the origin of their contributions to the project and
+that they have the right to submit them under the project's license.
+It's not yet clear that this can be legally satisfied when submitting
+a significant amount of content that has been generated by AI tools.
+
+Another issue with AI generated content is that AIs still often
+hallucinate or just produce bad code, commit messages, documentation
+or output, even when you point out their mistakes.
+
+To avoid these issues, we will reject anything that looks AI
+generated, that sounds overly formal or bloated, that looks like AI
+slop, that looks good on the surface but makes no sense, or that
+senders don’t understand or cannot explain.
+
+We strongly recommend using AI tools carefully and responsibly.
+
+Contributors would often benefit more from AI by using it to guide and
+help them step by step towards producing a solution by themselves
+rather than by asking for a full solution that they would then mostly
+copy-paste. They can also use AI to help with debugging, or with
+checking for obvious mistakes, things that can be improved, things
+that don’t match our style, guidelines or our feedback, before sending
+it to us.
+
 [[git-tools]]
 === Generate your patch using Git tools out of your commits.
 
-- 
2.51.0.195.ge34f015aea.dirty



* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-01 14:02 ` [PATCH v2] SubmittingPatches: add section about AI Christian Couder
@ 2025-10-01 18:59   ` Chuck Wolber
  2025-10-01 23:32     ` brian m. carlson
  2025-10-03 13:33     ` Christian Couder
  2025-10-01 20:59   ` Junio C Hamano
  2025-10-01 21:37   ` brian m. carlson
  2 siblings, 2 replies; 34+ messages in thread
From: Chuck Wolber @ 2025-10-01 18:59 UTC (permalink / raw)
  To: Christian Couder, git
  Cc: Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC,
	Johannes Schindelin, Patrick Steinhardt, Christian Couder

On Wed Oct 1, 2025 at 2:03 PM UTC, Christian Couder wrote:

> To mitigate both risks, let's add a "Use of Artificial Intelligence"
> section to "Documentation/SubmittingPatches" with the goal of
> discouraging its blind use to generate content that is submitted to
> the project, while still allowing us to benefit from its help in some
> innovative, useful and less risky ways.

I love the intent here, but it does not seem like that came through in the
proposed patch.

I think this patch opens the door to some concerning issues, including the
potential for false accusations and inconsistent treatment of human (non-AI)
generated contributions.

Sticking to a message of self-reliance (e.g. responsible AI use) and making
some technical changes to mark AI content might be a better approach.


> +The Developer's Certificate of Origin requires contributors to certify
> +that they know the origin of their contributions to the project and
> +that they have the right to submit them under the project's license.
> +It's not yet clear that this can be legally satisfied when submitting
> +a significant amount of content that has been generated by AI tools.

The legal issues around AI will be resolved in time, but the future will not
stop bringing us a steady stream of things that create legal ambiguity.

Creating one-off sections that cover _multiple_ topics _including_ legal
ambiguity seems like it risks reducing clarity. To get the full picture, this
patch (and patches like it in the future) require me to navigate multiple
sections to understand all of the project's relevant legal concerns.

I also have two specific concerns with the wording:

1. It repeats what is said just a few paragraphs earlier in the document. I
understand _why_ it does this, but moving the essence of this topic up to the
DCO section avoids the repetition and avoids diluting the project's legal
guidance.

2. What am I supposed to do with "It's not yet clear"? This is worse than
telling me nothing. It introduces a vague question with no clear guidance. It
is _true_ that no clear guidance exists, but what are the consequences when it
_does_ exist? The worst case scenario is that we have to go back and
rework/remove AI generated patches. So why not just require something like a
declaration of AI content like the one proposed at declare-ai.org?


> +To avoid these issues, we will reject anything that looks AI
> +generated, that sounds overly formal or bloated, that looks like AI
> +slop, that looks good on the surface but makes no sense, or that
> +senders don’t understand or cannot explain.

That reads like a full stop rejection of all AI generated patch content.

What if AI were to generate a great patch whose technical quality is exemplary
in every way? How is that any different from a great patch of exemplary
technical quality submitted by a person who is unambiguously evil?

But perhaps you intended it to mean a full stop rejection of content that
_looks_ like it was generated by the primitive AI we have _today_? Even going
with the interpretation you likely intended opens up a concerning double
standard.

What if a patch "looks" AI generated, but in reality was wholly geneated by a
human? Does this mean that patches generated by humans that fit the declared
criteria would be treated as if they were AI generated?

What about a non-native speaker who uses AI in an attempt to bridge a language
barrier? By definition they would lack the ability to judge the degree to which
their patch suddenly meets your criteria.

How is any of that fair, and how could you even tell the difference?

And on a personal note, the subjective wording gives me a "walking on
eggshells" feeling. It opens the door for false accusations, and gets us away
from judging things _purely_ on their technical merit.

Would it not be more _consistent_ to continue saying what is already true? That
your patches _must_ be remarkably high quality regardless of how they were
created?

With the addition of a required AI declaration (again, check out declare-ai.org
for an example of what that might look like), I think you cover all of the
necessary bases. And sure, someone could lie. But they can lie about meeting
the DCO as well. The consequences are the same - remove/rework.
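
To make this concrete, here is a purely hypothetical sketch of what such
a declaration could look like as one more trailer next to the sign-off
(my own illustration, not the actual declare-ai.org wording):

    Signed-off-by: Random J Developer <random@developer.example.org>
    AI-assisted-by: ExampleLLM v4 (drafted test cases; reviewed,
        reworked and understood by the sender)

Reviewers would then have an explicit statement to work with instead of
guessing, and a false declaration carries the same weight as a false
sign-off.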


> +We strongly recommend using AI tools carefully and responsibly.

Agreed, but I think you lost me here.

Taking your words at face value, the prior paragraph reads as if the Git
project is declaring an outright ban on _all_ AI generated content (and I am
nearly certain that is _not_ what you intended to say). If so, why bother
continuing on with a PSA (Public Service Announcement)? It reads like a
non-alcoholic drink that has the words, "Drink Responsibly" printed on the side
of the can.


> +Contributors would often benefit more from AI by using it to guide and
> +help them step by step towards producing a solution by themselves
> +rather than by asking for a full solution that they would then mostly
> +copy-paste. They can also use AI to help with debugging, or with
> +checking for obvious mistakes, things that can be improved, things
> +that don’t match our style, guidelines or our feedback, before sending
> +it to us.

I think this is very useful guidance. And although it is timely, I think it
stands a good chance of being timeless, even when AI becomes far more competent
than it is today.

AI is not going away, and we need to find a way to use it productively
_without_ losing our sense of self-reliance. If we fail to develop this ability
when AI is hardly more skilled than an above average intern, full of hubris and
zero real world experience, imagine how unqualified we will be when AI becomes
competent enough to manipulate and mislead us?


Overall, I feel like an addition to the documentation is warranted, but this
version makes me uncomfortable, if not a little unwelcome. Making a technical
change to the required declarations and expanding on the theme of self-reliance
and responsible use feels like a more productive way to address this issue.

Putting my "money where my mouth is", I am more than happy to suggest a
revision to this patch if you would like. I wanted to avoid that right now
because it seemed like a dialog was warranted first.

..Ch:W..



* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-01 14:02 ` [PATCH v2] SubmittingPatches: add section about AI Christian Couder
  2025-10-01 18:59   ` Chuck Wolber
@ 2025-10-01 20:59   ` Junio C Hamano
  2025-10-03  8:51     ` Christian Couder
  2025-10-01 21:37   ` brian m. carlson
  2 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2025-10-01 20:59 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin,
	Patrick Steinhardt, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> As more and more developer tools use AI, we are facing two main risks
> related to AI generated content:
>
>   - its situation regarding copyright and license is not clear,
>     and:
>
>   - more and more low-quality content could be submitted for review to
>     the mailing list.
>
> To mitigate both risks, let's add a "Use of Artificial Intelligence"
> section to "Documentation/SubmittingPatches" with the goal of
> discouraging its blind use to generate content that is submitted to
> the project, while still allowing us to benefit from its help in some
> innovative, useful and less risky ways.
>
> Helped-by: Rick Sanders <rick@sfconservancy.org>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
>
> ---
> This is inspired by the "AI guidelines" section we already have for

A more important thing to mention is that Rick, a lawyer at SFC,
helped us to draft the wording used in this one.

> +[[ai]]
> +=== Use of Artificial Intelligence (AI)
> +
> +The Developer's Certificate of Origin requires contributors to certify
> +that they know the origin of their contributions to the project and
> +that they have the right to submit them under the project's license.
> +It's not yet clear that this can be legally satisfied when submitting
> +a significant amount of content that has been generated by AI tools.
> +
> +Another issue with AI generated content is that AIs still often
> +hallucinate or just produce bad code, commit messages, documentation
> +or output, even when you point out their mistakes.
> +
> +To avoid these issues, we will reject anything that looks AI
> +generated, that sounds overly formal or bloated, that looks like AI
> +slop, that looks good on the surface but makes no sense, or that
> +senders don’t understand or cannot explain.

A milder way to phrase this would be to jump directly to "we reject
what the sender cannot explain when asked about it".  "How does this
work?"  "Why is this a good thing to do?"  "Where did it come from?"
instead of saying "looks AI generated".

It would sidestep the "who decides if it looks AI generated?" question.

> +We strongly recommend using AI tools carefully and responsibly.
> +
> +Contributors would often benefit more from AI by using it to guide and
> +help them step by step towards producing a solution by themselves
> +rather than by asking for a full solution that they would then mostly
> +copy-paste. They can also use AI to help with debugging, or with
> +checking for obvious mistakes, things that can be improved, things
> +that don’t match our style, guidelines or our feedback, before sending
> +it to us.
> +
>  [[git-tools]]
>  === Generate your patch using Git tools out of your commits.


Thanks.


* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-01 14:02 ` [PATCH v2] SubmittingPatches: add section about AI Christian Couder
  2025-10-01 18:59   ` Chuck Wolber
  2025-10-01 20:59   ` Junio C Hamano
@ 2025-10-01 21:37   ` brian m. carlson
  2025-10-03 14:25     ` Christian Couder
  2025-10-03 20:48     ` Elijah Newren
  2 siblings, 2 replies; 34+ messages in thread
From: brian m. carlson @ 2025-10-01 21:37 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC,
	Johannes Schindelin, Patrick Steinhardt, Christian Couder


On 2025-10-01 at 14:02:50, Christian Couder wrote:
> +[[ai]]
> +=== Use of Artificial Intelligence (AI)
> +
> +The Developer's Certificate of Origin requires contributors to certify
> +that they know the origin of their contributions to the project and
> +that they have the right to submit them under the project's license.
> +It's not yet clear that this can be legally satisfied when submitting
> +a significant amount of content that has been generated by AI tools.

Perhaps we'd like to write this:

  It's not yet clear that this can be legally satisfied when submitting
  a significant amount of content that has been generated by AI tools,
  so we cannot accept this content in our project.

If we're going to have a policy, we need to be direct about it and not
let people draw their own conclusions.  Many people don't have English
as a first language and we don't want people trying to language lawyer.

We could say something like this:

  Please do not sign off your work if you’re using an LLM to contribute
  unless you have included copyright and license information for all the
  code used in that LLM.

This allows the possibility that, say, Google trains an LLM entirely on
their own code, such that there is only one copyright holder and they
can license it as they see fit.  I don't think we _need_ to consider
that case if we don't want to allow that (say, for code quality
reasons), but we could if we wanted to.

> +Another issue with AI generated content is that AIs still often
> +hallucinate or just produce bad code, commit messages, documentation
> +or output, even when you point out their mistakes.
> +
> +To avoid these issues, we will reject anything that looks AI
> +generated, that sounds overly formal or bloated, that looks like AI
> +slop, that looks good on the surface but makes no sense, or that
> +senders don’t understand or cannot explain.

I've definitely seen this.  LLMs also typically do not write nice,
logical, bisectable commits, which I personally dislike as a reviewer.

> +We strongly recommend using AI tools carefully and responsibly.

I think this is maybe not definitive enough.  If we don't believe it's
possible to sign-off when code is generated using LLMs, then we should
say definitively, "Contributors may not use AI to write contributions to
Git," or something similarly clear.

Right now, this sounds too ambiguous and it might allow someone to write
substantial code that they think is of good quality using an LLM because
in their view that's careful and responsible, when we don't think that
users can sign off on that and therefore that's not possible.  Telling
people to use tools "carefully and responsibly" is like telling people
to drive "a reasonable and prudent speed" without further qualification
and then being surprised when they go 200 km/hr down the road.

I'd like to see the language be more like our code of conduct in that it
is broad and covers a wide variety of behaviour but also explicitly
states what is and is not acceptable to avoid ambiguity, confusion, or
argument.

> +Contributors would often benefit more from AI by using it to guide and
> +help them step by step towards producing a solution by themselves
> +rather than by asking for a full solution that they would then mostly
> +copy-paste. They can also use AI to help with debugging, or with
> +checking for obvious mistakes, things that can be improved, things
> +that don’t match our style, guidelines or our feedback, before sending
> +it to us.

This kind of use I feel is less objectionable.  I think it might be
acceptable to use an LLM as a guide, a linter, or a first-pass code
review.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA



* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-01 18:59   ` Chuck Wolber
@ 2025-10-01 23:32     ` brian m. carlson
  2025-10-02  2:30       ` Ben Knoble
  2025-10-03 13:33     ` Christian Couder
  1 sibling, 1 reply; 34+ messages in thread
From: brian m. carlson @ 2025-10-01 23:32 UTC (permalink / raw)
  To: Chuck Wolber
  Cc: Christian Couder, git, Junio C Hamano, Taylor Blau, Rick Sanders,
	Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder


On 2025-10-01 at 18:59:31, Chuck Wolber wrote:
> 1. It repeats what is said just a few paragraphs earlier in the document. I
> understand _why_ it does this, but moving the essence of this topic up to the
> DCO section avoids the repetition and avoids diluting the project's legal
> guidance.
> 
> 2. What am I supposed to do with "It's not yet clear"? This is worse than
> telling me nothing. It introduces a vague question with no clear guidance. It
> is _true_ that no clear guidance exists, but what are the consequences when it
> _does_ exist? The worst case scenario is that we have to go back and
> rework/remove AI generated patches. So why not just require something like a
> declaration of AI content like the one proposed at declare-ai.org?

I agree that this is unclear, which is why I suggested we be more
definitive.

Many of the companies that develop LLMs are headquartered in the United
States.  Many of the people that contribute to Git or distribute Git are
not.  For instance, I am located in Canada, which has different
copyright laws (we have the more limited fair dealing like the UK,
instead of the US's fair use) and has moral rights.  It is entirely
possible that the use of an LLM could be legal in one country or
jurisdiction but not another.

By accepting code that is written using LLMs into Git, we expose our
contributors (who implicitly distribute Git code by uploading it to
servers) and distributors (such as Linux distros or their distributors)
to potential liability if the use of a particular LLM or LLMs in general
are found to be illegal in their jurisdiction.  Unlike most of the
companies that develop LLMs, most contributors and distributors of Git
are individuals or non-profits with limited resources.  Even as someone
who works in the tech industry and is paid accordingly, defending a
copyright claim would be extremely expensive and probably financially
devastating for me and I really do not want to take that risk.

That's why simply declaring LLM use is not acceptable: because it
exposes others who have limited resources to legal risk.  Note that
ripping it out afterwards would require rewriting the Git history and
would not solve the problem of all of the people who are distributing or
using older versions (which would have been judged to violate copyright
law) or relieve them of the fact that they would have been exposed to
legal liability for their distribution.

The avoidance of legal problems is why we require sign-off.  If
Developer X signs off a patch that was later judged to violate copyright
law, then they have made a legally binding statement to that effect and
they have effectively accepted the entire legal liability for that[0].  If
we don't believe people can legally make certain types of contributions,
then we should explicitly tell people that they should not make that
legal statement to avoid any ambiguity.

This is very different from situations where companies make a decision
to incorporate LLM-generated code into their own codebases.  They can
hire lawyers to determine whether LLM-generated code is legal in their
given jurisdiction and obtain whatever legal necessities are required to
operate in compliance with the law.  They also usually have substantial
resources to address any problems that come up.  We, on the other hand,
are effectively a global project, must engage in behaviour that is legal
in all or nearly all jurisdictions, and have very limited resources.

> That reads like a full stop rejection of all AI generated patch content.
> 
> What if AI were to generate a great patch whose technical quality is exemplary
> in every way? How is that any different from a great patch of exemplary
> technical quality submitted by a person who is unambiguously evil?

There are a few problems here: one, some AI code (including
documentation or other text) is of poor quality; two, regardless of the
quality, many people submit AI-generated code they do not understand;
and three, AI-generated code is a legal minefield.

A technically great patch solves the first but not the other two.  We
still need people who submit code to be able to explain their changes
and respond to questions about the code.  What decisions were made?  Why
were they made?  What are the tradeoffs and downsides?

> Taking your words at face value, the prior paragraph reads as if the Git
> project is declaring an outright ban on _all_ AI generated content (and I am
> nearly certain that is _not_ what you intended to say). If so, why bother
> continuing on with a PSA (Public Service Announcement)? It reads like a
> non-alcoholic drink that has the words, "Drink Responsibly" printed on the side
> of the can.

I think this is actually what they intended to say, but did so poorly.
I agree clarification would be valuable.

> AI is not going away, and we need to find a way to use it productively
> _without_ losing our sense of self-reliance. If we fail to develop this ability
> when AI is hardly more skilled than an above average intern, full of hubris and
> zero real world experience, imagine how unqualified we will be when AI becomes
> competent enough to manipulate and mislead us?

I think you assume LLMs can have intelligence.  They are glorified
prediction engines, effectively fancy Markov chains.  In some cases,
that can be useful and valuable and we can do interesting things with
them, but they cannot actually have intelligence, creativity or reason.

And LLMs already manipulate and mislead people.  They have been
implicated in goading teenagers to suicide or leading people into
conspiracy theories.  Some LLMs espouse racist, anti-Semitic, or
otherwise hateful views.  That's a good reason to be wary of them and
how they're incorporated to our lives, at least until such a time that
they have appropriate safety measures and regulation in place (if that
ever happens).

[0] I refer you to the common-law doctrine of promissory estoppel.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA



* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-01 23:32     ` brian m. carlson
@ 2025-10-02  2:30       ` Ben Knoble
  0 siblings, 0 replies; 34+ messages in thread
From: Ben Knoble @ 2025-10-02  2:30 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Chuck Wolber, Christian Couder, git, Junio C Hamano, Taylor Blau,
	Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder


> On 1 Oct 2025, at 19:44, brian m. carlson <sandals@crustytoothpaste.net> wrote:
> 
> On 2025-10-01 at 18:59:31, Chuck Wolber wrote:
> 
>> AI is not going away, and we need to find a way to use it productively
>> _without_ losing our sense of self-reliance. If we fail to develop this ability
>> when AI is hardly more skilled than an above average intern, full of hubris and
>> zero real world experience, imagine how unqualified we will be when AI becomes
>> competent enough to manipulate and mislead us?
> 
> I think you assume LLMs can have intelligence.  They are glorified
> prediction engines, effectively fancy Markov chains.  In some cases,
> that can be useful and valuable and we can do interesting things with
> them, but they cannot actually have intelligence, creativity or reason.
> 
> And LLMs already manipulate and mislead people.  They have been
> implicated in goading teenagers to suicide or leading people into
> conspiracy theories.  Some LLMs espouse racist, anti-Semitic, or
> otherwise hateful views.  That's a good reason to be wary of them and
> how they're incorporated to our lives, at least until such a time that
> they have appropriate safety measures and regulation in place (if that
> ever happens).

A tangent, and one I’m happy to continue off-list (I’m happy to
continue publicly too, but this is not the forum): I’d encourage folks
to give the LLMentalist Effect [1] a read. Regardless of where you fall
on “intelligence vs stochastic parrot,” I think you’ll find some
interesting conclusions.

[1]: https://softwarecrisis.dev/letters/llmentalist


* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-01 20:59   ` Junio C Hamano
@ 2025-10-03  8:51     ` Christian Couder
  2025-10-03 16:20       ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Christian Couder @ 2025-10-03  8:51 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin,
	Patrick Steinhardt, Christian Couder

On Wed, Oct 1, 2025 at 10:59 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > As more and more developer tools use AI, we are facing two main risks
> > related to AI generated content:
> >
> >   - its situation regarding copyright and license is not clear,
> >     and:
> >
> >   - more and more low-quality content could be submitted for review to
> >     the mailing list.
> >
> > To mitigate both risks, let's add a "Use of Artificial Intelligence"
> > section to "Documentation/SubmittingPatches" with the goal of
> > discouraging its blind use to generate content that is submitted to
> > the project, while still allowing us to benefit from its help in some
> > innovative, useful and less risky ways.
> >
> > Helped-by: Rick Sanders <rick@sfconservancy.org>
> > Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> >
> > ---
> > This is inspired by the "AI guidelines" section we already have for
>
> A more important thing to mention is that Rick, a lawyer at SFC,
> helped us to draft the wording used in this one.

Yeah, right, I will mention it in a v3 if there is one.

> > +[[ai]]
> > +=== Use of Artificial Intelligence (AI)
> > +
> > +The Developer's Certificate of Origin requires contributors to certify
> > +that they know the origin of their contributions to the project and
> > +that they have the right to submit them under the project's license.
> > +It's not yet clear that this can be legally satisfied when submitting
> > +a significant amount of content that has been generated by AI tools.
> > +
> > +Another issue with AI generated content is that AIs still often
> > +hallucinate or just produce bad code, commit messages, documentation
> > +or output, even when you point out their mistakes.
> > +
> > +To avoid these issues, we will reject anything that looks AI
> > +generated, that sounds overly formal or bloated, that looks like AI
> > +slop, that looks good on the surface but makes no sense, or that
> > +senders don’t understand or cannot explain.
>
> A milder way to phrase this would be to jump directly to "we reject
> what the sender cannot explain when asked about it".  "How does this
> work?"  "Why is this a good thing to do?"  "Where did it come from?"
> instead of saying "looks AI generated".
>
> It would sidestep the "who decides if it looks AI generated?" question.

I don't think the "who decides if it looks AI generated?" question is
very relevant. If someone says that a patch looks mostly AI generated
and gives a good argument supporting this claim, it's the same as if
someone gives any other good argument against the patch. In the end,
the community and you decide if the argument is good enough and if the
patch should be rejected based on that (and other arguments for and
against the patch of course).

For example, let's suppose that in the future someone knows that
ChatGPT7 is very likely to use double dash ("--") and the word
"absolutely" a lot in its sentences, and notices that a contributor
sent a long documentation patch that is full of them. I would say that
it would be a good argument to reject that patch. We could be wrong in
rejecting the patch because of that argument, because maybe the
writer's style happens to be similar to ChatGPT7's style, but I think
we should have the possibility to reject such patches based on the
fact that they definitely look AI generated. Otherwise I don't think
we can seriously claim that we try to uphold the DCO as well as we
can.
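
To make the idea concrete, that kind of heuristic could even be
scripted. This is only a toy sketch with invented patterns, not a real
or reliable AI detector:

    #!/usr/bin/env python3
    # Toy heuristic: count occurrences of phrases a reviewer suspects
    # a given model overuses. The patterns below are made-up examples.
    import re
    import sys

    SUSPECT_PATTERNS = [r"\babsolutely\b", r"--"]

    text = sys.stdin.read()
    hits = sum(len(re.findall(pattern, text, re.IGNORECASE))
               for pattern in SUSPECT_PATTERNS)
    print(f"suspect-pattern hits: {hits}")

A high count would of course only be one argument to weigh against a
patch, not proof of anything.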

So I think we definitely need to say something like "we will reject
anything that looks AI generated" or maybe "we will reject anything
that looks significantly AI generated". In the v3 if there is one, I
will change the wording to the latter.

Thanks.


* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-01 18:59   ` Chuck Wolber
  2025-10-01 23:32     ` brian m. carlson
@ 2025-10-03 13:33     ` Christian Couder
  1 sibling, 0 replies; 34+ messages in thread
From: Christian Couder @ 2025-10-03 13:33 UTC (permalink / raw)
  To: Chuck Wolber
  Cc: git, Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC,
	Johannes Schindelin, Patrick Steinhardt, Christian Couder

On Wed, Oct 1, 2025 at 8:59 PM Chuck Wolber <chuck@wolber.net> wrote:
>
> On Wed Oct 1, 2025 at 2:03 PM UTC, Christian Couder wrote:
>
> > To mitigate both risks, let's add a "Use of Artificial Intelligence"
> > section to "Documentation/SubmittingPatches" with the goal of
> > discouraging its blind use to generate content that is submitted to
> > the project, while still allowing us to benefit from its help in some
> > innovative, useful and less risky ways.
>
> I love the intent here, but it does not seem like that came through in the
> proposed patch.
>
> I think this patch opens the door to some concerning issues, including the
> potential for false accusations and inconsistent treatment of human (non-AI)
> generated contributions.

I don't think the patch changes anything regarding false accusations
and inconsistent treatment of human generated contributions.

> Sticking to a message of self-reliance (e.g. responsible AI use) and making
> some technical changes to mark AI content might be a better approach.

I don't think we want to mark AI content. Managing this would be too
much of a burden, especially given how unclear it is what should or
should not be marked.

> > +The Developer's Certificate of Origin requires contributors to certify
> > +that they know the origin of their contributions to the project and
> > +that they have the right to submit them under the project's license.
> > +It's not yet clear that this can be legally satisfied when submitting
> > +a significant amount of content that has been generated by AI tools.
>
> The legal issues around AI will be resolved in time, but the future will not
> stop bringing us a steady stream of things that create legal ambiguity.
>
> Creating one-off sections that cover _multiple_ topics _including_ legal
> ambiguity seems like it risks reducing clarity. To get the full picture, this
> patch (and patches like it in the future) require me to navigate multiple
> sections to understand all of the project's relevant legal concerns.

I don't think having this section on top of the rest is a big burden
for developers in general. Perhaps you are very concerned about the
legal issues in the project you contribute to, but on the other hand
there weren't a lot of concerns when we added the similar AI
guidelines in https://git.github.io/General-Application-Information/.

> I also have two specific concerns with the wording:
>
> 1. It repeats what is said just a few paragraphs earlier in the document. I
> understand _why_ it does this, but moving the essence of this topic up to the
> DCO section avoids the repetition and avoids diluting the project's legal
> guidance.

Being able to refer people to a single section about AI has some
benefits. If you have a wording that reduces the repetition while
still making the AI section easily understandable on its own, I am
willing to consider it for a v3 version of this patch.

> 2. What am I supposed to do with "It's not yet clear"? This is worse than
> telling me nothing. It introduces a vague question with no clear guidance. It
> is _true_ that no clear guidance exists, but what are the consequences when it
> _does_ exist? The worst case scenario is that we have to go back and
> rework/remove AI generated patches.

When guidance exists, we might have to change our "AI use"
section, but we can deal with that then. It's better to adapt now to
the current situation as well as we can rather than try to anticipate
the future while we can't really know what it will look like.

And if we have done our best to avoid accepting too much AI generated
content now, then hopefully we won't have to go back and rework/remove
many AI generated patches.

> So why not just require something like a
> declaration of AI content like the one proposed at declare-ai.org?

I think this could add a lot of complexity to the process. For example,
people could be using many different AI tools in every contribution,
like:

- for code completion,
- for checking for memory leaks,
- for checking for possible refactorings,
- for commit message translation from their native language to English,
- for email translation from their native language to English,
- for better understanding the feedback they received,
- for helping with the forge they are using (what if it performs
interactive rebases for example),
- etc

They might not know where to stop and might not even know if their
email software (like GMail for example) is already using AI to help
them write messages.

It's also possible to ask different AIs to do the same job, for
example checking for errors in the patches that are about to be sent.
What if some AIs find no improvements and others find some? Should what
every AI found be mentioned?

What if AIs start debating between themselves whether something is an
error or not and cannot come to a conclusion? Should that debate be
kept somehow?

And no, this is not pure speculation. I talked recently to someone
working on an IDE and thinking about saving into Git all the AI
context (including such AI debates) around some contributions to make
sure it's available for other AIs and humans working down the road on
further work based on those contributions.

In short, if we ask people to declare AI use now, those who try to do
the right thing will spend a lot of time figuring things out and will
be burdened for perhaps no good reason, while those who don't care
will benefit the most, as they will not be burdened and will save a
lot of time.

If automated processes are one day easily available to record some AI
context, then I don't think we would be against them, and maybe we can
decide then to ask people to use them. But we are not there yet, we
don't know what they will look like and require, and it's just not our
role to push on this.

> > +To avoid these issues, we will reject anything that looks AI
> > +generated, that sounds overly formal or bloated, that looks like AI
> > +slop, that looks good on the surface but makes no sense, or that
> > +senders don’t understand or cannot explain.
>
> That reads like a full stop rejection of all AI generated patch content.

In a reply to Junio, I suggested changing "we will reject
anything that looks AI generated" to "we will reject anything that
looks significantly AI generated". I am open to tweaking that even
more, but we need to say somehow that submitting a lot of AI generated
content as-is is not welcome. Otherwise we just don't mitigate the
risks we want to mitigate. (See my reply to Junio.)

> What if AI were to generate a great patch whose technical quality is exemplary
> in every way? How is that any different from a great patch of exemplary
> technical quality submitted by a person who is unambiguously evil?

If an AI were to generate a great patch no different from what a human
would generate, then we cannot say that it looks AI generated, and
then the only issue is "Do we trust the person sending the patch?". If
the person has sent a lot of patches that looked AI generated in the
past, we might reject the patch based on that. Otherwise, the issue is
the same as if someone sends us proprietary code. Yeah, we could
accept proprietary code if someone sends it to us and we don't
realize it's proprietary, but then if they signed off on the patch,
they are responsible for that according to the DCO.

> But perhaps you intended it to mean a full stop rejection of content that
> _looks_ like it was generated by the primitive AI we have _today_? Even going
> with the interpretation you likely intended opens up a concerning double
> standard.
>
> What if a patch "looks" AI generated, but in reality was wholly generated by a
> human?

Mistakes happen. We could indeed be wrong to reject the patch based on
that. See my reply to Junio about this.

The thing is that we cannot eat our cake and have it too. If we want
to protect the project from risks related to too much AI generated
content, we need to be able to reject such content based on some
criteria that are unlikely to be perfect.

> Does this mean that patches generated by humans that fit the declared
> criteria would be treated as if they were AI generated?

Patches generated by humans that look like AI generated patches will
probably be treated as if they were AI generated. That's unfortunate,
but hopefully the few people who write patches that look AI generated
will soon learn to make their patches look different from AI generated
ones.

> What about a non-native speaker who uses AI in an attempt to bridge a language
> barrier? By definition they would lack the ability to judge the degree to which
> their patch suddenly meets your criteria.

This is one of the reasons why this v2 is different from the previous
v1. We don't outright reject any use of generative AI in this v2;
instead, we say that the result shouldn't look like a lot of AI
generated content sent as-is. If an AI was used to translate something
that was initially human generated, it will hopefully not sound like
it was fully AI generated.

And yeah mistakes can happen, but hopefully the community and the
maintainer will be able to learn and adapt from them and the process
will be relatively smooth after some time.

> How is any of that fair, and how could you even tell the difference?

It's a judgment call, like when we decide if a patch is technically
good enough to be accepted. In practice I think we will often
recommend rewriting parts that look AI generated, in the same way we
ask people to rewrite bad code or bad commit messages. We might
sometimes not even mention that it seems to us like it was AI
generated.

You might then say that it is not worth having a "Use of AI" section
in our SubmittingPatches document, but we think it's still useful for
different reasons:

- it shows that we are trying to do something about the AI related
risks, especially the legal one,
- it might save us from reviewing AI generated content in the first
place if contributors read our SubmittingPatches document before
working on patches,
- it could give contributors good ideas about how to use AI in acceptable ways,
- it signals to our reviewers that they should speak up against, or
just reject, what looks like a lot of AI generated content,
- it gives reviewers a way to refer contributors to documentation
about the subject.

> And on a personal note, the subjective wording gives me a "walking on
> eggshells" feeling. It opens the door for false accusations, and gets us away
> from judging things _purely_ on their technical merit.

If we see content in some patches that looks copyrighted by a company,
and we are not confident that the company agreed to release it under a
compatible license, we can already reject it on non-technical grounds.
We could even already say something like:

"Your code looks obviously AI generated for such and such a reason. We
are not sure that so much AI generated code is compatible with the DCO
as the AI could have copy-pasted proprietary code it saw during its
training. So we are going to reject it."

So things don't fundamentally change. In this regard, this patch just
clarifies things for contributors and reviewers.

In some ways, the section that this patch adds is not different from
other sections like "Make separate commits for logically separate
changes." Yeah, perhaps many developers are unfortunately not used to
making separate commits for logically separate changes, and they put a
lot of different things into a single commit, and they don't want to
spend time reworking their commits. So they might feel that their
contributions are going to be judged on baseless red tape instead of
the real thing. But anyway, we state our standards clearly, so they
should know in advance how their contributions are going to be judged.

> Would it not be more _consistent_ to continue saying what is already true? That
> your patches _must_ be remarkably high quality regardless of how they were
> created?

The issue is that quality might not be defined in the same way by
everyone. Some aspects of what we consider quality might be considered
otherwise (maybe "useless red tape") by some. So it's better to be as
explicit as we can.

> With the addition of a required AI declaration (again, check out declare-ai.org
> for an example of what that might look like), I think you cover all of the
> necessary bases. And sure, someone could lie. But they can lie about meeting
> the DCO as well. The consequences are the same - remove/rework.
>
> > +We strongly recommend using AI tools carefully and responsibly.
>
> Agreed, but I think you lost me here.
>
> Taking your words at face value, the prior paragraph reads as if the Git
> project is declaring an outright ban on _all_ AI generated content (and I am
> nearly certain that is _not_ what you intended to say).

Yeah, we don't intend to ban _all_ AI generated content. Please
suggest other wordings if some sentences read like that.

What we don't want is a lot of AI generated content that no human was
involved in creating. If a human was involved in creating some
content, then the human has at least some copyright and some
responsibility for it.

> If so, why bother
> continuing on with a PSA (Public Service Announcement)? It reads like a
> non-alcoholic drink that has the words "Drink Responsibly" printed on the side
> of the can.

On prescription and over-the-counter drug packaging there are
sometimes "Boxed Warnings" (or, in Europe, warnings along with a red
warning triangle pictogram) designed to alert people to potential
side effects that could impair their ability to drive or operate heavy
machinery safely. This sentence ("We strongly recommend using AI tools
carefully and responsibly.") is a bit similar. It is intended to make
people who would mechanically read or skim the document pause and
think for a bit. That's a good thing when used sparingly and for good
reason, which I think is the case here.

[...]

> Overall, I feel like an addition to the documentation is warranted, but this
> version makes me uncomfortable if not a little unwelcome. Making a technical
> change to the required declarations and expanding on the theme of self-reliance
> and responsible use feels like a more productive way to address this issue.
>
> Putting my "money where my mouth is", I am more than happy to suggest a
> revision to this patch if you would like. I wanted to avoid that right now
> because it seemed like a dialog was warranted first.

Thanks for the review and for the offer of a revision to this patch.
Rather than a full new version of the patch, though, I would prefer
some suggestions for alternative wordings of some sentences.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-01 21:37   ` brian m. carlson
@ 2025-10-03 14:25     ` Christian Couder
  2025-10-03 20:48     ` Elijah Newren
  1 sibling, 0 replies; 34+ messages in thread
From: Christian Couder @ 2025-10-03 14:25 UTC (permalink / raw)
  To: brian m. carlson, Christian Couder, git, Junio C Hamano,
	Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin,
	Patrick Steinhardt, Christian Couder

On Wed, Oct 1, 2025 at 11:37 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2025-10-01 at 14:02:50, Christian Couder wrote:
> > +[[ai]]
> > +=== Use of Artificial Intelligence (AI)
> > +
> > +The Developer's Certificate of Origin requires contributors to certify
> > +that they know the origin of their contributions to the project and
> > +that they have the right to submit it under the project's license.
> > +It's not yet clear that this can be legally satisfied when submitting
> > +a significant amount of content that has been generated by AI tools.
>
> Perhaps we'd like to write this:
>
>   It's not yet clear that this can be legally satisfied when submitting
>   a significant amount of content that has been generated by AI tools,
>   so we cannot accept this content in our project.
>
> If we're going to have a policy, we need to be direct about it and not
> let people draw their own conclusions.  Many people don't have English
> as a first language and we don't want people trying to language lawyer.

I understand why you want to be direct, but unfortunately (or
fortunately, depending on your point of view) some generated content
is acceptable if it is not too big, if it is specific enough, or if a
human has been involved enough. In a number of cases, like translated
or reworded content, wrapped lines, refactored code, or renamed
variables, a significant amount of generated content is likely
acceptable because a human has already been involved and the content
is specific enough. If we say right away that we cannot accept it, we
might prevent interesting and useful use cases.

> We could say something like this:
>
>   Please do not sign off your work if you’re using an LLM to contribute
>   unless you have included copyright and license information for all the
>   code used in that LLM.

For now I don't think we want or need to be involved in checking or
trying to check what code and/or training data has been/is used in an
LLM, what LLM(s) are used in which AI tools, all the AI tools that a
user might have used, etc. See my reply to Chuck Wolber's review
related to declare-ai.org.

> This allows the possibility that, say, Google trains an LLM entirely on
> their own code, such that there is only one copyright holder and they
> can license it as they see fit.  I don't think we _need_ to consider
> that case if we don't want to allow that (say, for code quality
> reasons), but we could if we wanted to.

I agree it would be nice if some LLMs were trained only on specific
code (or on no existing code at all) so that we could alleviate the
legal issue with them, but for now I don't think they exist. We can
always adapt later if/when they ever appear.

> > +Another issue with AI generated content is that AIs still often
> > +hallucinate or just produce bad code, commit messages, documentation
> > +or output, even when you point out their mistakes.
> > +
> > +To avoid these issues, we will reject anything that looks AI
> > +generated, that sounds overly formal or bloated, that looks like AI
> > +slop, that looks good on the surface but makes no sense, or that
> > +senders don’t understand or cannot explain.
>
> I've definitely seen this.  LLMs also typically do not write nice,
> logical, bisectable commits, which I personally dislike as a reviewer.
>
> > +We strongly recommend using AI tools carefully and responsibly.
>
> I think this is maybe not definitive enough.  If we don't believe it's
> possible to sign-off when code is generated using LLMs, then we should
> say definitively, "Contributors may not use AI to write contributions to
> Git," or something similarly clear.

I think it's far too restrictive for no good reason. See above and see
my discussion about this with Junio on the first version of this patch
he sent last July.

> Right now, this sounds too ambiguous and it might allow someone to write
> substantial code that they think is of good quality using an LLM because
> in their view that's careful and responsible, when we don't think that
> users can sign off on that and therefore that's not possible.  Telling
> people to use tools "carefully and responsibly" is like telling people
> to drive "a reasonable and prudent speed" without further qualification
> and then being surprised when they go 200 km/hr down the road.

The sentence ("We strongly recommend using AI tools carefully and
responsibly.") is designed to make people pause and think a bit when
they are reading mechanically or just skimming the doc. It's not
designed to set a clear limit on what is acceptable and what is not.
And in fact it couldn't do so because there is no such clear limit.

> I'd like to see the language be more like our code of conduct in that it
> is broad and covers a wide variety of behaviour but also explicitly
> states what is and is not acceptable to avoid ambiguity, confusion, or
> argument.

Feel free to make more suggestions. I don't think your goal is easy to
achieve though.

> > +Contributors would often benefit more from AI by using it to guide and
> > +help them step by step towards producing a solution by themselves
> > +rather than by asking for a full solution that they would then mostly
> > +copy-paste. They can also use AI to help with debugging, or with
> > +checking for obvious mistakes, things that can be improved, things
> > +that don’t match our style, guidelines or our feedback, before sending
> > +it to us.
>
> This kind of use I feel is less objectionable.  I think it might be
> acceptable to use an LLM as a guide, a linter, or a first-pass code
> review.

Yeah, it looks like we all agree on that. The issue is that the line
between these acceptable kinds of use and other problematic ones is
fuzzy.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-03  8:51     ` Christian Couder
@ 2025-10-03 16:20       ` Junio C Hamano
  2025-10-03 16:45         ` rsbecker
  2025-10-08  7:22         ` Christian Couder
  0 siblings, 2 replies; 34+ messages in thread
From: Junio C Hamano @ 2025-10-03 16:20 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin,
	Patrick Steinhardt, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

>> A milder way to phrase this would be to jump directly to "we reject
>> what the sender cannot explain when asked about it".  "How does this
>> work?"  "Why is this a good thing to do?"  "Where did it come from?"
>> instead of saying "looks AI generated".
>>
>> It would sidestep the "who decides if it looks AI generated?" question.
>
> I don't think the "who decides if it looks AI generated?" question is
> very relevant. If someone says that a patch looks mostly AI generated
> and gives a good argument supporting this claim, it's the same as if
> someone gives any other good argument against the patch. In the end,
> the community and you decide if the argument is good enough and if the
> patch should be rejected based on that (and other arguments for and
> against the patch of course).

And then who plays the final arbiter?  One can keep insisting that a
patch that looks to me like apparent AI slop is something they wrote
themselves, but you may find it plausible that it was a human
creation.  Then what?

It is very much relevant to avoid such an argument, because the point
is irrelevant.  We are trying to avoid accepting something the
submitter has no right to claim as theirs, and requesting them to
explain where it came from, how it works, etc. would be a better test
than "does it look AI generated, and to everybody?", wouldn't it?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-03 16:20       ` Junio C Hamano
@ 2025-10-03 16:45         ` rsbecker
  2025-10-08  7:22         ` Christian Couder
  1 sibling, 0 replies; 34+ messages in thread
From: rsbecker @ 2025-10-03 16:45 UTC (permalink / raw)
  To: 'Junio C Hamano', 'Christian Couder'
  Cc: git, 'Taylor Blau', 'Rick Sanders',
	'Git at SFC', 'Johannes Schindelin',
	'Patrick Steinhardt', 'Christian Couder'

On October 3, 2025 12:21 PM, Junio C Hamano wrote:
>Christian Couder <christian.couder@gmail.com> writes:
>
>>> A milder way to phrase this would be to jump directly to "we reject
>>> what the sender cannot explain when asked about it".  "How does this
>>> work?"  "Why is this a good thing to do?"  "Where did it come from?"
>>> instead of saying "looks AI generated".
>>>
>>> It would sidestep the "who decides if it looks AI generated?" question.
>>
>> I don't think the "who decides if it looks AI generated?" question is
>> very relevant. If someone says that a patch looks mostly AI generated
>> and gives a good argument supporting this claim, it's the same as if
>> someone gives any other good argument against the patch. In the end,
>> the community and you decide if the argument is good enough and if the
>> patch should be rejected based on that (and other arguments for and
>> against the patch of course).
>
>And then who plays the final arbiter?  One can keep insisting that a
>patch that looks to me like apparent AI slop is something they wrote
>themselves, but you may find it plausible that it was a human
>creation.  Then what?
>
>It is very much relevant to avoid such an argument, because the point
>is irrelevant.  We are trying to avoid accepting something the
>submitter has no right to claim as theirs, and requesting them to
>explain where it came from, how it works, etc. would be a better test
>than "does it look AI generated, and to everybody?", wouldn't it?

Can the cover page from the originator contain statements that:
a) I (whomever it is) have the legal authority to submit the patch
without violating any copyright.
b) The code is original work and does not violate any IP laws where I
(whomever) am located.
c) The code is not generated from AI and/or, despite being AI
generated, I (whomever) have verified that the code works as
anticipated and does not contain AI contents trained from another
code-base or project that might otherwise violate b), and that I
(whomever) accept all responsibility for falsely making this
statement.

This could be changed to an agreement maintained by the Conservancy
prior to accepting any non-trivial contributions, provided the
agreement is referenced in either the cover page or the commit
comments.
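
For example (statement names and wording purely illustrative), such a
cover page entry could look like:

    Provenance-Statement: I have the legal authority to submit this
      patch without violating any copyright.
    Originality-Statement: This is my original work and violates no IP
      laws where I am located.
    AI-Statement: No AI-generated content is included, or I have
      verified any AI-generated content as described above and accept
      all responsibility for this statement.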


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-01 21:37   ` brian m. carlson
  2025-10-03 14:25     ` Christian Couder
@ 2025-10-03 20:48     ` Elijah Newren
  2025-10-03 22:20       ` brian m. carlson
  2025-10-08  7:30       ` Christian Couder
  1 sibling, 2 replies; 34+ messages in thread
From: Elijah Newren @ 2025-10-03 20:48 UTC (permalink / raw)
  To: brian m. carlson, Christian Couder, git, Junio C Hamano,
	Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin,
	Patrick Steinhardt, Christian Couder

On Wed, Oct 1, 2025 at 2:37 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2025-10-01 at 14:02:50, Christian Couder wrote:
> > +[[ai]]
> > +=== Use of Artificial Intelligence (AI)
> > +
> > +The Developer's Certificate of Origin requires contributors to certify
> > +that they know the origin of their contributions to the project and
> > +that they have the right to submit it under the project's license.
> > +It's not yet clear that this can be legally satisfied when submitting
> > +a significant amount of content that has been generated by AI tools.
>
> Perhaps we'd like to write this:
>
>   It's not yet clear that this can be legally satisfied when submitting
>   a significant amount of content that has been generated by AI tools,
>   so we cannot accept this content in our project.
>
> If we're going to have a policy, we need to be direct about it and not
> let people draw their own conclusions.  Many people don't have English
> as a first language and we don't want people trying to language lawyer.
>
> We could say something like this:
>
>   Please do not sign off your work if you’re using an LLM to contribute
>   unless you have included copyright and license information for all the
>   code used in that LLM.

Would this mean that you wanted to ban contributions like d12166d3c8bb
(Merge branch 'en/docfixes', 2023-10-23), available on the list over
at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
?   We don't need to go theoretical, I've already contributed such a
patch series before -- 2 years ago -- and it was merged.  Granted,
that was entirely documentation, and I called out the usage of AI in
the cover letter, and I manually checked every change (discarding many
of them) and split it into commits on my own, could easily explain any
change and why it was good, etc.  And I was upfront about all of it.

If any use of AI is bad, do we need to revert that series?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-03 20:48     ` Elijah Newren
@ 2025-10-03 22:20       ` brian m. carlson
  2025-10-06 17:45         ` Junio C Hamano
                           ` (2 more replies)
  2025-10-08  7:30       ` Christian Couder
  1 sibling, 3 replies; 34+ messages in thread
From: brian m. carlson @ 2025-10-03 22:20 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Christian Couder, git, Junio C Hamano, Taylor Blau, Rick Sanders,
	Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

[-- Attachment #1: Type: text/plain, Size: 4925 bytes --]

On 2025-10-03 at 20:48:40, Elijah Newren wrote:
> Would this mean that you wanted to ban contributions like d12166d3c8bb
> (Merge branch 'en/docfixes', 2023-10-23), available on the list over
> at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
> ?   We don't need to go theoretical, I've already contributed such a
> patch series before -- 2 years ago -- and it was merged.  Granted,
> that was entirely documentation, and I called out the usage of AI in
> the cover letter, and I manually checked every change (discarding many
> of them) and split it into commits on my own, could easily explain any
> change and why it was good, etc.  And I was upfront about all of it.

I think the main problem here is that we don't know the copyright
status of LLM outputs.  It is not uncommon for them to produce output
that reflects their training input and we see evidence of that in, for
instance, the New York Times lawsuit against OpenAI.

As I said, the situation is very unclear legally, with active litigation
in multiple countries, and we have to comply with pretty much every
country's laws in this situation.  Whether something is legal in the
United States, where you're located, is completely irrelevant to whether
it is legal in Canada, where I'm located, or Germany or the UK, where we
have other contributors.  We also have to consider whether it's legal in
all of the countries that Git is distributed in, which includes every
country in which Debian has a mirror[0], even countries under
international sanctions, such as Iran, Russia, and Belarus.

It doesn't matter if the person using AI has indemnification, either,
since that only covers civil matters, and at least in the U.S. and
Canada, knowingly violating copyright is also a criminal offence.

The sign-off process is designed to clearly state that a person has the
ability to contribute code under the license and I don't think, as
things stand, it's possible to make that assertion with code or
documentation generated from an LLM except in very limited
circumstances.  I don't allow LLM-generated code in my personal projects
that require sign-off for that reason, and neither does QEMU[1].  I
don't think I could honestly assert either (a) or (b) in the DCO with
LLM-generated code because it's not clear to me whether "I have the
right to submit it under the…license."

To quote the QEMU policy:

  To satisfy the DCO, the patch contributor has to fully understand the
  copyright and license status of content they are contributing to QEMU. With AI
  content generators, the copyright and license status of the output is
  ill-defined with no generally accepted, settled legal foundation.

  Where the training material is known, it is common for it to include large
  volumes of material under restrictive licensing/copyright terms. Even where
  the training material is all known to be under open source licenses, it is
  likely to be under a variety of terms, not all of which will be compatible
  with QEMU's licensing requirements.

I remember the SCO situation with Linux and how it really created a lot
of uncertainty with Linux because SCO created FUD around Linux licensing
and how that led to the DCO being created.  I am aware of the fact that
many open source contributors are very unhappy that their code has been
used to train LLMs without retaining credits and copyright notices or
honouring the license terms[2].  And I have spent many years working
with non-profits[3], where I have always been taught that we should
avoid even the appearance of impropriety.

It may matter less what the situation actually ends up being legally
(although it could end up being quite bad) and more whether someone can
imply or suggest that Git is not being distributed in compliance with
the license or contains infringing code, which could effectively make it
undistributable because nobody wants to take that risk.  And litigation,
even if Git and its contributors are successful, can be extraordinarily
expensive.

So I think, given the circumstances, yes, the right thing to do is to
ban LLM-generated contributions with a policy very similar or identical
to QEMU's.  If, in the future, the legal situation changes and it
becomes unambiguously legal to use LLMs across the world, then we can
reconsider that policy then.

[0] https://www.debian.org/mirror/list
[1] https://github.com/qemu/qemu/commit/3d40db0efc22520fa6c399cf73960dced423b048
[2] Regardless of the legal concerns, this implicates professional
ethics concerns, such as §1.5 of the ACM Code of Ethics[4].  Ethics
requirements usually go well beyond what the law requires.
[3] Software Freedom Conservancy, which handles legal matters for the
Git project, is a non-profit.
[4] https://www.acm.org/code-of-ethics
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-03 22:20       ` brian m. carlson
@ 2025-10-06 17:45         ` Junio C Hamano
  2025-10-08  4:18           ` Elijah Newren
  2025-10-08  9:28           ` Christian Couder
  2025-10-08  4:18         ` Elijah Newren
  2025-10-08  8:37         ` Christian Couder
  2 siblings, 2 replies; 34+ messages in thread
From: Junio C Hamano @ 2025-10-06 17:45 UTC (permalink / raw)
  To: brian m. carlson
  Cc: Elijah Newren, Christian Couder, git, Taylor Blau, Rick Sanders,
	Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> It may matter less what the situation actually ends up being legally
> (although it could end up being quite bad) and more whether someone can
> imply or suggest that Git is not being distributed in compliance with
> the license or contains infringing code, which could effectively make it
> undistributable because nobody wants to take that risk.  And litigation,
> even if Git and its contributors are successful, can be extraordinarily
> expensive.
>
> So I think, given the circumstances, yes, the right thing to do is to
> ban LLM-generated contributions with a policy very similar or identical
> to QEMU's.  If, in the future, the legal situation changes and it
> becomes unambiguously legal to use LLMs across the world, then we can
> reconsider that policy then.

OK, so here is theirs, minimally adjusted for our use, for further
discussion.  I do not see much difference, at least in spirit, from
what started this thread, but the phrasing is certainly firmer, and I
have no problem with it.



Use of AI content generators
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

TL;DR:

  **Current Git project policy, copied from what QEMU does, is to
  DECLINE any contributions which are believed to include or derive
  from AI generated content. This includes ChatGPT, Claude, Copilot,
  Llama, and similar tools.**

The increasing prevalence of AI-assisted software development results in a
number of difficult legal questions and risks for software projects, including
Git.  Of particular concern is content generated by `Large Language Models
<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).

The Git community requires that contributors certify their patch submissions
are made in accordance with the rules of the `Developer's Certificate of
Origin (DCO) <dco>`.

To satisfy the DCO, the patch contributor has to fully understand the
copyright and license status of content they are contributing to Git. With AI
content generators, the copyright and license status of the output is
ill-defined with no generally accepted, settled legal foundation.

Where the training material is known, it is common for it to include large
volumes of material under restrictive licensing/copyright terms. Even where
the training material is all known to be under open source licenses, it is
likely to be under a variety of terms, not all of which will be compatible
with Git's licensing requirements.

How contributors could comply with DCO terms (b) or (c) for the output of AI
content generators commonly available today is unclear.  The Git project is
not willing or able to accept the legal risks of non-compliance.

The Git project thus requires that contributors refrain from using AI content
generators on patches intended to be submitted to the project, and will
decline any contribution if use of AI is either known or suspected.

This policy does not apply to other uses of AI, such as researching APIs or
algorithms, static analysis, or debugging, provided their output is not to be
included in contributions.

Examples of tools impacted by this policy include GitHub's Copilot,
OpenAI's ChatGPT, Anthropic's Claude, Meta's Code Llama, and
code/content generation agents which are built on top of such tools.

This policy may evolve as AI tools mature and the legal situation is
clarified. In the meantime, requests for exceptions to this policy will
be evaluated by the Git project on a case-by-case basis. To be granted an
exception, a contributor will need to demonstrate clarity of the license and
copyright status for the tool's output in relation to its training model and
code, to the satisfaction of the project maintainers.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-03 22:20       ` brian m. carlson
  2025-10-06 17:45         ` Junio C Hamano
@ 2025-10-08  4:18         ` Elijah Newren
  2025-10-08  8:37         ` Christian Couder
  2 siblings, 0 replies; 34+ messages in thread
From: Elijah Newren @ 2025-10-08  4:18 UTC (permalink / raw)
  To: brian m. carlson, Elijah Newren, Christian Couder, git,
	Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC,
	Johannes Schindelin, Patrick Steinhardt, Christian Couder

On Fri, Oct 3, 2025 at 3:20 PM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2025-10-03 at 20:48:40, Elijah Newren wrote:
> > Would this mean that you wanted to ban contributions like d12166d3c8bb
> > (Merge branch 'en/docfixes', 2023-10-23), available on the list over
> > at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
> > ?   We don't need to go theoretical, I've already contributed such a
> > patch series before -- 2 years ago -- and it was merged.  Granted,
> > that was entirely documentation, and I called out the usage of AI in
> > the cover letter, and I manually checked every change (discarding many
> > of them) and split it into commits on my own, could easily explain any
> > change and why it was good, etc.  And I was upfront about all of it.
>
> I think the main problem here is that we don't know the copyright
> status of LLM outputs.  It is not uncommon for them to produce output
> that reflects their training input and we see evidence of that in, for
> instance, the New York Times lawsuit against OpenAI.
>
> As I said, the situation is very unclear legally, with active litigation
> in multiple countries, and we have to comply with pretty much every
> country's laws in this situation.  Whether something is legal in the
> United States, where you're located, is completely irrelevant to whether
> it is legal in Canada, where I'm located, or Germany or the UK, where we
> have other contributors.  We also have to consider whether it's legal in
> all of the countries that Git is distributed in, which includes every
> country in which Debian has a mirror[0], even countries under
> international sanctions, such as Iran, Russia, and Belarus.
>
> It doesn't matter if the person using AI has indemnification, either,
> since that only covers civil matters, and at least in the U.S. and
> Canada, knowingly violating copyright is also a criminal offence.
>
> The sign-off process is designed to clearly state that a person has the
> ability to contribute code under the license and I don't think, as
> things stand, it's possible to make that assertion with code or
> documentation generated from an LLM except in very limited
> circumstances.  I don't allow LLM-generated code in my personal projects
> that require sign-off for that reason, and neither does QEMU[1].  I
> don't think I could honestly assert either (a) or (b) in the DCO with
> LLM-generated code because it's not clear to me whether "I have the
> right to submit it under the…license."
>
> To quote the QEMU policy:
>
>   To satisfy the DCO, the patch contributor has to fully understand the
>   copyright and license status of content they are contributing to QEMU. With AI
>   content generators, the copyright and license status of the output is
>   ill-defined with no generally accepted, settled legal foundation.
>
>   Where the training material is known, it is common for it to include large
>   volumes of material under restrictive licensing/copyright terms. Even where
>   the training material is all known to be under open source licenses, it is
>   likely to be under a variety of terms, not all of which will be compatible
>   with QEMU's licensing requirements.
>
> I remember the SCO situation with Linux and how it really created a lot
> of uncertainty with Linux because SCO created FUD around Linux licensing
> and how that led to the DCO being created.  I am aware of the fact that
> many open source contributors are very unhappy that their code has been
> used to train LLMs without retaining credits and copyright notices or
> honouring the license terms[2].  And I have spent many years working
> with non-profits[3], where I have always been taught that we should
> avoid even the appearance of impropriety.
>
> It may matter less what the situation actually ends up being legally
> (although it could end up being quite bad) and more whether someone can
> imply or suggest that Git is not being distributed in compliance with
> the license or contains infringing code, which could effectively make it
> undistributable because nobody wants to take that risk.  And litigation,
> even if Git and its contributors are successful, can be extraordinarily
> expensive.
>
> So I think, given the circumstances, yes, the right thing to do is to
> ban LLM-generated contributions with a policy very similar or identical
> to QEMU's.  If, in the future, the legal situation changes and it
> becomes unambiguously legal to use LLMs across the world, then we can
> reconsider that policy then.
>
> [0] https://www.debian.org/mirror/list
> [1] https://github.com/qemu/qemu/commit/3d40db0efc22520fa6c399cf73960dced423b048
> [2] Regardless of the legal concerns, this implicates professional
> ethics concerns, such as §1.5 of the ACM Code of Ethics[4].  Ethics
> requirements usually go well beyond what the law requires.
> [3] Software Freedom Conservancy, which handles legal matters for the
> Git project, is a non-profit.
> [4] https://www.acm.org/code-of-ethics

Thanks for clarifying your position.  To me, your preferred wording
for the position statement doesn't quite match the rationale.  I think
for cases of:

  * fixing typos
  * finding wording tweaks to existing documentation
  * tab completion of e.g. the next three lines in an IDE when limited
to e.g. what most any engineer in the world would write based on the
comment on the line before (or if the AI plugin doesn't quite get the
three lines right, well I already had them in my head and if it gets
close enough, it's easier for me to accept and then edit into what I
already knew I wanted; see the sketch after this list)
  * assisting with wording in writing a commit message as an editor
(or maybe even suggesting some initial wording based on the patch I
already wrote)
  * identifying potential bugs in a patch
  * identifying potential typos in documentation

none of these particular uses causes problems for the rationale you
specify, but at least the first four would be disallowed by the
preferred wording you want, and perhaps even the last two wouldn't be
allowed either (though I don't think AI is very good at the second to
last one, so not a big loss on that particular one yet).  Perhaps, due
to my incomplete understanding of copyright, all of these would
actually be problematic under the rationale you already gave, for
reasons I don't yet know about or just haven't understood, but if not,
I'd rather not disallow these kinds of uses.
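
To make the third bullet concrete, here is a minimal sketch (function
name and body invented purely for illustration) of the kind of
completion I mean, where the comment alone already pins down what
essentially any engineer would write:

    /* Return the larger of two ints. */
    static int max_int(int a, int b)
    {
            return a > b ? a : b;
    }

A plugin completing those lines from the comment is arguably adding
nothing creative of its own, which is why I don't see how the stated
rationale applies to this kind of use.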

The first two from my list have a good example in the form of the
series at d12166d3c8bb (Merge branch 'en/docfixes', 2023-10-23) [or on
the list at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
], which was already merged a few years ago.  So if we adopt wording
that disallows these kinds of changes, then we also need to talk about
whether we grandfather already-merged series or proactively revert
them.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-06 17:45         ` Junio C Hamano
@ 2025-10-08  4:18           ` Elijah Newren
  2025-10-12 15:07             ` Junio C Hamano
  2025-10-08  9:28           ` Christian Couder
  1 sibling, 1 reply; 34+ messages in thread
From: Elijah Newren @ 2025-10-08  4:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Christian Couder, git, Taylor Blau,
	Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

On Mon, Oct 6, 2025 at 10:45 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "brian m. carlson" <sandals@crustytoothpaste.net> writes:
>
> > It may matter less what the situation actually ends up being legally
> > (although it could end up being quite bad) and more whether someone can
> > imply or suggest that Git is not being distributed in compliance with
> > the license or contains infringing code, which could effectively make it
> > undistributable because nobody wants to take that risk.  And litigation,
> > even if Git and its contributors are successful, can be extraordinarily
> > expensive.
> >
> > So I think, given the circumstances, yes, the right thing to do is to
> > ban LLM-generated contributions with a policy very similar or identical
> > to QEMU's.  If, in the future, the legal situation changes and it
> > becomes unambiguously legal to use LLMs across the world, then we can
> > reconsider that policy then.
>
> OK, so here is theirs for further discussion minimally adjusted for
> our use.  I do not see much difference at least in spirit with what
> started this thread, but phrasing is certainly firmer, and I have no
> problem with it.
>
>
>
> Use of AI content generators
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> TL;DR:
>
>   **Current Git project policy, copied from what QEMU does, is to
>   DECLINE any contributions which are believed to include or derive
>   from AI generated content. This includes ChatGPT, Claude, Copilot,
>   Llama, and similar tools.**
>
> The increasing prevalence of AI-assisted software development results in a
> number of difficult legal questions and risks for software projects, including
> Git.  Of particular concern is content generated by `Large Language Models
> <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs).
>
> The Git community requires that contributors certify their patch submissions
> are made in accordance with the rules of the `Developer's Certificate of
> Origin (DCO) <dco>`.
>
> To satisfy the DCO, the patch contributor has to fully understand the
> copyright and license status of content they are contributing to Git. With AI
> content generators, the copyright and license status of the output is
> ill-defined with no generally accepted, settled legal foundation.
>
> Where the training material is known, it is common for it to include large
> volumes of material under restrictive licensing/copyright terms. Even where
> the training material is all known to be under open source licenses, it is
> likely to be under a variety of terms, not all of which will be compatible
> with Git's licensing requirements.
>
> How contributors could comply with DCO terms (b) or (c) for the output of AI
> content generators commonly available today is unclear.  The Git project is
> not willing or able to accept the legal risks of non-compliance.
>
> The Git project thus requires that contributors refrain from using AI content
> generators on patches intended to be submitted to the project, and will
> decline any contribution if use of AI is either known or suspected.
>
> This policy does not apply to other uses of AI, such as researching APIs or
> algorithms, static analysis, or debugging, provided their output is not to be
> included in contributions.
>
> Examples of tools impacted by this policy include GitHub's Copilot,
> OpenAI's ChatGPT, Anthropic's Claude, Meta's Code Llama, and
> code/content generation agents which are built on top of such tools.
>
> This policy may evolve as AI tools mature and the legal situation is
> clarified. In the meantime, requests for exceptions to this policy will
> be evaluated by the Git project on a case-by-case basis. To be granted an
> exception, a contributor will need to demonstrate clarity of the license and
> copyright status for the tool's output in relation to its training model and
> code, to the satisfaction of the project maintainers.

I preferred the version Christian sent, but *if* we end up adopting
some of the QEMU wording, I've got a logistics question:

    Will we grandfather already accepted series, or proactively revert them?

Take, for example, the series merged at d12166d3c8bb (Merge branch
'en/docfixes', 2023-10-23) [or on the list at
https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
] a few years ago.  I don't think that
series has anything remotely questionable from a copyright standpoint,
yet the QEMU-inspired wording would explicitly disallow it as far as I
can tell, and would claim that such kinds of things would never be
accepted in our project, even though people can find and point to the
fact that we already did.  Would that be problematic?

Of course, if we don't adopt the QEMU wording and go with Christian's
version, then we don't need to worry about whether to revert or
explain how it is grandfathered.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-03 16:20       ` Junio C Hamano
  2025-10-03 16:45         ` rsbecker
@ 2025-10-08  7:22         ` Christian Couder
  1 sibling, 0 replies; 34+ messages in thread
From: Christian Couder @ 2025-10-08  7:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin,
	Patrick Steinhardt, Christian Couder

On Fri, Oct 3, 2025 at 6:20 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> >> A milder way to phrase this would be to jump directly to "we reject
> >> what the sender cannot explain when asked about it".  "How does this
> >> work?"  "Why is this a good thing to do?"  "Where did it come from?"
> >> instead of saying "looks AI generated".
> >>
> >> It would sidestep the "who decides if it looks AI generated?" question.
> >
> > I don't think the "who decides if it looks AI generated?" question is
> > very relevant. If someone says that a patch looks mostly AI generated
> > and gives a good argument supporting this claim, it's the same as if
> > someone gives any other good argument against the patch. In the end,
> > the community and you decide if the argument is good enough and if the
> > patch should be rejected based on that (and other arguments for and
> > against the patch of course).
>
> And then who plays the final arbiter?

You, like for any other discussion about a patch when there are
different opinions.

> One can keep insisting that a patch that looks to me like apparent
> AI slop is something they wrote themselves, but you may find it
> plausible that it was a human creation.  Then what?

You decide if the arguments on one side are better than those on the
other side, again like for any other discussion about a patch when
there are different opinions.

Why should the process be different? It could be different if we think
that such behavior is similar to the bad behavior we talk about in our
code of conduct, but I don't think we want to go there and have some
special procedures, right?

> It is very much relevant to avoid such an argument, because the point
> is irrelevant.  We are trying to avoid accepting something the
> submitter has no right to claim as theirs, and requesting them to
> explain where it came from, how it works, etc. would be a better
> test than "does it look AI generated, and to everybody?", wouldn't it?

The sender can ask the AI where it came from, how it works, etc., and
copy-paste the AI's answers. The sender could also prompt the AI or
modify its answers so that they look as human generated as possible.
So just asking those questions might not help much in some cases. In
the end, whatever the answers, we have to be able to decide whether
the suspicious content looks too much like it has been AI generated.

It doesn't mean that asking those questions couldn't help in some
cases. It means that we just don't want to enter into the details of
which questions we can ask and whether we should judge based on the
answers to those questions or something else. For example, our code of
conduct
says that we will take action "in response to any behavior that they
deem inappropriate, threatening, offensive, or harmful." It doesn't
tie us to asking some questions and taking action based on the
answers.

Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-03 20:48     ` Elijah Newren
  2025-10-03 22:20       ` brian m. carlson
@ 2025-10-08  7:30       ` Christian Couder
  1 sibling, 0 replies; 34+ messages in thread
From: Christian Couder @ 2025-10-08  7:30 UTC (permalink / raw)
  To: Elijah Newren
  Cc: brian m. carlson, git, Junio C Hamano, Taylor Blau, Rick Sanders,
	Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

On Fri, Oct 3, 2025 at 10:48 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, Oct 1, 2025 at 2:37 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:

> > We could say something like this:
> >
> >   Please do not sign off your work if you’re using an LLM to contribute
> >   unless you have included copyright and license information for all the
> >   code used in that LLM.
>
> Would this mean that you wanted to ban contributions like d12166d3c8bb
> (Merge branch 'en/docfixes', 2023-10-23), available on the list over
> at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
> ?   We don't need to go theoretical, I've already contributed such a
> patch series before -- 2 years ago -- and it was merged.  Granted,
> that was entirely documentation, and I called out the usage of AI in
> the cover letter, and I manually checked every change (discarding many
> of them) and split it into commits on my own, could easily explain any
> change and why it was good, etc.  And I was upfront about all of it.

This is a good example of why we don't want to outright ban all use of generative AI. Thanks.

> If any use of AI is bad, do we need to revert that series?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-03 22:20       ` brian m. carlson
  2025-10-06 17:45         ` Junio C Hamano
  2025-10-08  4:18         ` Elijah Newren
@ 2025-10-08  8:37         ` Christian Couder
  2025-10-08  9:28           ` Michal Suchánek
  2025-10-09  1:13           ` Collin Funk
  2 siblings, 2 replies; 34+ messages in thread
From: Christian Couder @ 2025-10-08  8:37 UTC (permalink / raw)
  To: brian m. carlson, Elijah Newren, Christian Couder, git,
	Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC,
	Johannes Schindelin, Patrick Steinhardt, Christian Couder

On Sat, Oct 4, 2025 at 12:20 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2025-10-03 at 20:48:40, Elijah Newren wrote:
> > Would this mean that you wanted to ban contributions like d12166d3c8bb
> > (Merge branch 'en/docfixes', 2023-10-23), available on the list over
> > at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
> > ?   We don't need to go theoretical, I've already contributed such a
> > patch series before -- 2 years ago -- and it was merged.  Granted,
> > that was entirely documentation, and I called out the usage of AI in
> > the cover letter, and I manually checked every change (discarding many
> > of them) and split it into commits on my own, could easily explain any
> > change and why it was good, etc.  And I was upfront about all of it.
>
> I think the main problem here is that we don't know the copyright
> status of LLM outputs.

It's very unlikely that whatever is decided about the copyright status
of LLM outputs will fundamentally change copyright law. So, for
example, small changes, changes where a human has been involved a lot,
or changes that are very specific are very likely acceptable.

> It is not uncommon for them to produce output
> that reflects their training input and we see evidence of that in, for
> instance, the New York Times lawsuit against OpenAI.

You might say something very similar about people contributing proprietary code:

"It is not uncommon to have people copy-paste some proprietary code
into an open source project and we see evidence of that in such and
such incidents."

So it's just fine to accept some degree of risk. We have to accept it
anyway. Saying "we will ban everything AI generated" will not make the
risk disappear either.

> As I said, the situation is very unclear legally, with active litigation
> in multiple countries, and we have to comply with pretty much every
> country's laws in this situation.  Whether something is legal in the
> United States, where you're located, is completely irrelevant to whether
> it is legal in Canada, where I'm located, or Germany or the UK, where we
> have other contributors.  We also have to consider whether it's legal in
> all of the countries that Git is distributed in, which includes every
> country in which Debian has a mirror[0], even countries under
> international sanctions, such as Iran, Russia, and Belarus.

I don't quite agree with this. Theoretically, if the official mirrors
are only in a few countries, then only the laws in these few countries
(+ US law, as the Conservancy is US based) might be really legally
relevant for the project. Then it's the responsibility of
distributions or people cloning/downloading the software to check that
it's legal in the countries where they distribute or clone/download it.

In practice, we should pay a bit of attention to make sure we don't
create obvious legal problems for too many people, but if some
countries decide to have laws that are too stupid and ban too many
things, we could decide that we should definitely not pay attention to
those laws.

> It doesn't matter if the person using AI has indemnification, either,
> since that only covers civil matters, and at least in the U.S. and
> Canada, knowingly violating copyright is also a criminal offence.
>
> The sign-off process is designed to clearly state that a person has the
> ability to contribute code under the license and I don't think, as
> things stand, it's possible to make that assertion with code or
> documentation generated from an LLM except in very limited
> circumstances.

I think in practice those "very limited circumstances" can cover a lot
of different things though. Do we really want to enter into a legal
debate over what
https://en.wikipedia.org/wiki/Sc%C3%A8nes_%C3%A0_faire means for
software, for example? Or about allowing or disallowing translation of
documentation or commit messages based on whether the tools used for
translation use an LLM?

I have given a lot of examples of what is very likely acceptable.
Elijah has given a very good concrete example showing why we should
not outright ban AI either. If you think they are not good examples,
please say so clearly. Otherwise I don't think you can keep saying
that they are related to "very limited circumstances".

> I don't allow LLM-generated code in my personal projects
> that require sign-off for that reason, and neither does QEMU[1].  I
> don't think I could honestly assert either (a) or (b) in the DCO with
> LLM-generated code because it's not clear to me whether "I have the
> right to submit it under the…license."
>
> To quote the QEMU policy:
>
>   To satisfy the DCO, the patch contributor has to fully understand the
>   copyright and license status of content they are contributing to QEMU. With AI
>   content generators, the copyright and license status of the output is
>   ill-defined with no generally accepted, settled legal foundation.
>
>   Where the training material is known, it is common for it to include large
>   volumes of material under restrictive licensing/copyright terms. Even where
>   the training material is all known to be under open source licenses, it is
>   likely to be under a variety of terms, not all of which will be compatible
>   with QEMU's licensing requirements.

The QEMU policy was discussed in the previous version already.

> I remember the SCO situation with Linux and how it really created a lot
> of uncertainty with Linux because SCO created FUD around Linux licensing
> and how that led to the DCO being created.  I am aware of the fact that
> many open source contributors are very unhappy that their code has been
> used to train LLMs without retaining credits and copyright notices or
> honouring the license terms[2].

I don't think it's very relevant for your position on this. On the
contrary, if LLMs have been trained mostly with open source code, then
if they produce copyrighted output, that output is more likely to be
compatible with the GPL. It has even been suggested (and discussed in
this thread) that some AIs should be trained only with open source
material (for example MIT licensed material?) so that we could stop
worrying about including it. If that happens, there would be no reason
to outright ban AI generated content, right?

> And I have spent many years working
> with non-profits[3], where I have always been taught that we should
> avoid even the appearance of impropriety.

Adding a section restricting AI use, even if it doesn't go as far as
you would like, is already a first step in the direction you want. If
this gets merged, you can always send patches on top to make it more
restrictive.

> It may matter less what the situation actually ends up being legally
> (although it could end up being quite bad) and more whether someone can
> imply or suggest that Git is not being distributed in compliance with
> the license or contains infringing code, which could effectively make it
> undistributable because nobody wants to take that risk.  And litigation,
> even if Git and its contributors are successful, can be extraordinarily
> expensive.

There are already legal risks anyway (see above).

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-08  8:37         ` Christian Couder
@ 2025-10-08  9:28           ` Michal Suchánek
  2025-10-08  9:35             ` Christian Couder
  2025-10-09  1:13           ` Collin Funk
  1 sibling, 1 reply; 34+ messages in thread
From: Michal Suchánek @ 2025-10-08  9:28 UTC (permalink / raw)
  To: Christian Couder
  Cc: brian m. carlson, Elijah Newren, git, Junio C Hamano, Taylor Blau,
	Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

Hello,

On Wed, Oct 08, 2025 at 10:37:53AM +0200, Christian Couder wrote:
> On Sat, Oct 4, 2025 at 12:20 AM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >

> 
> > I remember the SCO situation with Linux and how it really created a lot
> > of uncertainty with Linux because SCO created FUD around Linux licensing
> > and how that led to the DCO being created.  I am aware of the fact that
> > many open source contributors are very unhappy that their code has been
> > used to train LLMs without retaining credits and copyright notices or
> > honouring the license terms[2].
> 
> I don't think it's very relevant for your position on this. On the
> contrary, if LLMs have been trained mostly with open source code, then
> if they produce copyrighted output, that output is more likely to be
> compatible with the GPL. It has even been suggested (and discussed in
> this thread) that some AIs should be trained only with open source
> material (for example MIT licensed material?) so that we could stop
> worrying about including it. If that happens, there would be no reason
> to outright ban AI generated content, right?

Even the MIT license requires attribution. As most current-day LLMs
fail to provide that, their output is legally dubious even when
trained on fairly permissively licensed code.

Thanks

Michal

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-06 17:45         ` Junio C Hamano
  2025-10-08  4:18           ` Elijah Newren
@ 2025-10-08  9:28           ` Christian Couder
  2025-10-13 18:14             ` Junio C Hamano
  1 sibling, 1 reply; 34+ messages in thread
From: Christian Couder @ 2025-10-08  9:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Elijah Newren, git, Taylor Blau, Rick Sanders,
	Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

On Mon, Oct 6, 2025 at 7:45 PM Junio C Hamano <gitster@pobox.com> wrote:

> OK, so here is theirs for further discussion minimally adjusted for
> our use.  I do not see much difference at least in spirit with what
> started this thread, but phrasing is certainly firmer, and I have no
> problem with it.

I don't think it's a good idea to be too firm. It could prevent people
willing to follow the rules from doing things that are actually
acceptable, while it won't prevent the risks posed by people who don't
follow the rules anyway.

Some of us have given examples of uses that are likely acceptable but
seem to be banned by such firm wording. Do we really want to discuss
again whether translating a commit message using an AI tool is fine or
not?

So I think we should start with something less firm, and then discuss
the pros and cons of firmer wording if some still insist on it.

[...]

> How contributors could comply with DCO terms (b) or (c) for the output of AI
> content generators commonly available today is unclear.  The Git project is
> not willing or able to accept the legal risks of non-compliance.

I think this could be understood as implying that the Git project is
responsible for contributors submitting content they should not
submit. I don't think we should go into this.

[...]

> This policy does not apply to other uses of AI, such as researching APIs or
> algorithms, static analysis, or debugging, provided their output is not to be
> included in contributions.

This is not realistic. If an AI does static analysis for example, it
is likely to suggest a fix for the issues it finds. Hopefully the fix
will be the right one, so it will end up being included in the
contributions.

> Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's

s/includes/include/

> ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content
> generation agents which are built on top of such tools.

I don't think we should list examples like this. It could be
understood as banning such tools outright even though they can help
with static analysis, typo fixing, translation, etc. On the other
hand, some IDEs, for example, might include AI tools without users
being really aware of them.

> This policy may evolve as AI tools mature and the legal situation is
> clarified. In the meanwhile, requests for exceptions to this policy will be
> evaluated by the Git project on a case by case basis.

I don't think we want to go into such processes.

> To be granted an
> exception, a contributor will need to demonstrate clarity of the license and
> copyright status for the tool's output in relation to its training model and
> code, to the satisfaction of the project maintainers.

If there are ever AI tools trained on material such that the legal
risk is reduced, we will likely know about it. And even though the
legal risk would be reduced, the risk of being flooded with bad output
might not be. So I don't think it's worth getting into this.

Thanks.

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-08  9:28           ` Michal Suchánek
@ 2025-10-08  9:35             ` Christian Couder
  0 siblings, 0 replies; 34+ messages in thread
From: Christian Couder @ 2025-10-08  9:35 UTC (permalink / raw)
  To: Michal Suchánek
  Cc: brian m. carlson, Elijah Newren, git, Junio C Hamano, Taylor Blau,
	Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

Hi,

On Wed, Oct 8, 2025 at 11:28 AM Michal Suchánek <msuchanek@suse.de> wrote:

> > I don't think it's very relevant for your position on this. On the
> > contrary, if LLMs have been trained mostly with open source code, then
> > if they produce copyrighted output, that output is more likely to be
> > compatible with the GPL. It has even been suggested (and discussed in
> > this thread) that some AIs should be trained only with open source
> > material (for example MIT licensed material?) so that we could stop
> > worrying about including it. If that happens, there would be no reason
> > to outright ban AI generated content, right?
>
> Even the MIT license requires attribution. As most current-day LLMs
> fail to provide that, their output is legally dubious even when
> trained on fairly permissively licensed code.

Fair enough, but if an AI is ever trained with the specific purpose of
producing code that can be included in MIT-compatible code bases, then
hopefully the people training it will make sure it can help with
properly attributing that code.

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-08  8:37         ` Christian Couder
  2025-10-08  9:28           ` Michal Suchánek
@ 2025-10-09  1:13           ` Collin Funk
  1 sibling, 0 replies; 34+ messages in thread
From: Collin Funk @ 2025-10-09  1:13 UTC (permalink / raw)
  To: Christian Couder
  Cc: brian m. carlson, Elijah Newren, git, Junio C Hamano, Taylor Blau,
	Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> On Sat, Oct 4, 2025 at 12:20 AM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
>>
>> On 2025-10-03 at 20:48:40, Elijah Newren wrote:
>> > Would this mean that you wanted to ban contributions like d12166d3c8bb
>> > (Merge branch 'en/docfixes', 2023-10-23), available on the list over
>> > at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
>> > ?   We don't need to go theoretical, I've already contributed such a
>> > patch series before -- 2 years ago -- and it was merged.  Granted,
>> > that was entirely documentation, and I called out the usage of AI in
>> > the cover letter, and I manually checked every change (discarding many
>> > of them) and split it into commits on my own, could easily explain any
>> > change and why it was good, etc.  And I was upfront about all of it.
>>
>> I think the main problem here is that we don't know the copyright
>> status of LLM outputs.
>
> It's very unlikely that whatever is decided about the copyright status
> of LLM outputs will fundamentally change copyright law. So for example
> small changes, or changes where a human has been involved a lot, or
> changes that are very specific, and so on, are very likely acceptable.

The issue is lack of law, from my understanding. There has been zero
political will in the US for copyright legislation with respect to the
output of AI. Therefore, we are left with case law that is still
ongoing, that is, no precedent.

>> I remember the SCO situation with Linux and how it really created a lot
>> of uncertainty with Linux because SCO created FUD around Linux licensing
>> and how that led to the DCO being created.  I am aware of the fact that
>> many open source contributors are very unhappy that their code has been
>> used to train LLMs without retaining credits and copyright notices or
>> honouring the license terms[2].
>
> I don't think it's very relevant for your position on this. On the
> contrary, if LLMs have been trained mostly with open source code, then
> if they produce copyrighted output, that output is more likely to be
> compatible with the GPL. It has even been suggested (and discussed in
> this thread) that some AIs should be trained only with open source
> material (for example MIT licensed material?) so that we could stop
> worrying about including it. If that happens, there would be no reason
> to outright ban AI generated content, right?

Not all open source code is compatible with other open source code. If
you use the output of a model trained on GPLv3+ code in a GPLv2-only
project, then the creator of the GPLv3+ code could claim that you
violated the license since they are not compatible. Whether they would
win in court or not, I have no clue, but it is probably best to avoid
that situation.

Collin

* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-08  4:18           ` Elijah Newren
@ 2025-10-12 15:07             ` Junio C Hamano
  0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2025-10-12 15:07 UTC (permalink / raw)
  To: Elijah Newren
  Cc: brian m. carlson, Christian Couder, git, Taylor Blau,
	Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

Elijah Newren <newren@gmail.com> writes:

>> ...
>> This policy may evolve as AI tools mature and the legal situation is
>> clarified. In the meanwhile, requests for exceptions to this policy will be
>> evaluated by the Git project on a case by case basis. To be granted an
>> exception, a contributor will need to demonstrate clarity of the license and
>> copyright status for the tool's output in relation to its training model and
>> code, to the satisfaction of the project maintainers.
>
> I preferred the version Christian sent, but *if* we end up adopting
> some of the QEMU wording, I've got a logistics question:
>
>     Will we grandfather already accepted series, or proactively revert them?

Stepping back a bit, can we treat this new guideline element just
like any other guideline in SubmittingPatches and also
CodingGuidelines?

We have certain rules in our SubmittingPatches and CodingGuidelines
to help us not get into trouble in the future.  We require the log
messages to follow a certain style to give them uniformity, as
otherwise it would become harder to dig through the history later to
find the cause of an issue we are having today, and more importantly
to find what the design parameters were back when the change we are
having trouble with was written.  We ask people to follow a certain
style in the code, as it would take more work to understand code if
different styles were mixed together without reason.

But we also frown upon churning the codebase for the sake of
strictly matching the prescribed coding style.  The rules are mostly
there to control newly written things so that they do not put our
codebase into worse shape than it currently is.  When we update a
part of our codebase for some reason other than "there is no
particular reason but we want to fix it to match the guidelines", we
would of course take into account any existing guideline violations
in the touched part.  And we find no need in our other non-AI
guidelines to say "we grandfather badness that already exists, but
we try our best to enforce the guidelines as strictly as possible",
and the reason, I think, is that this is implicitly what everybody
expects.  Should the "we tell you again not to blindly add things of
unknown origin, given the recent proliferation of AI coding
products" rule be treated as special and different?


* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-08  9:28           ` Christian Couder
@ 2025-10-13 18:14             ` Junio C Hamano
  2025-10-23 17:32               ` Junio C Hamano
  0 siblings, 1 reply; 34+ messages in thread
From: Junio C Hamano @ 2025-10-13 18:14 UTC (permalink / raw)
  To: Christian Couder
  Cc: brian m. carlson, Elijah Newren, git, Taylor Blau, Rick Sanders,
	Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> On Mon, Oct 6, 2025 at 7:45 PM Junio C Hamano <gitster@pobox.com> wrote:
>
>> OK, so here is theirs for further discussion minimally adjusted for
>> our use.  I do not see much difference at least in spirit with what
>> started this thread, but phrasing is certainly firmer, and I have no
>> problem with it.
>
> I don't think it's a good idea to be too firm. It could prevent people
> willing to follow the rules from doing things that are actually
> acceptable, while it won't prevent the risks posed by people who don't
> follow the rules anyway.

>> How contributors could comply with DCO terms (b) or (c) for the output of AI
>> content generators commonly available today is unclear.  The Git project is
>> not willing or able to accept the legal risks of non-compliance.
>
> I think this could be understood as implying that the Git project is
> responsible for contributors submitting content they should not
> submit. I don't think we should go into this.

When the project distributes work that it has no right to
distribute, those who claim to be rights holders will try to hold
the project responsible for it.  Whether the court agrees is a
different story.

> [...]
>
>> This policy does not apply to other uses of AI, such as researching APIs or
>> algorithms, static analysis, or debugging, provided their output is not to be
>> included in contributions.
>
> This is not realistic. If an AI does static analysis for example, it
> is likely to suggest a fix for the issues it finds. Hopefully the fix
> will be the right one, so it will end up being included in the
> contributions.
>
>> Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's
>
> s/includes/include/

We are not in the business of typofixing QEMU policy.  Send that
patch in their direction  ;-).

I do not have a strong preference either way.  Even if the wording is
firm, it is really up to each contributor to honor the guideline and
be honest with us.  You may see autocorrection in your editor fix a
typo for you, and more advanced tools may offer to rewrite what you
wrote, whether it is prose or code.  It is very plausible that,
especially for simple fixes, the result may be what the contributor
would have arrived at on their own anyway, and in such a case, the
contributor would not even know how much came from "AI" or a simple
dictionary, or whether that AI learned from things it should not have
seen.

So, I do not think it makes too big a difference in practice whether
we adopt the QEMU text with minimal rewrites, or the version you
posted.  As the one you sent is in line with what we give applicants
of our mentoring programs, and it was read over by our SFC lawyer,
I'd prefer to keep the version I already have in my tree.  Not moving
forward with either, I think, is worse than adopting one of them in
this case.

Thanks.


* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-13 18:14             ` Junio C Hamano
@ 2025-10-23 17:32               ` Junio C Hamano
  0 siblings, 0 replies; 34+ messages in thread
From: Junio C Hamano @ 2025-10-23 17:32 UTC (permalink / raw)
  To: Christian Couder
  Cc: brian m. carlson, Elijah Newren, git, Taylor Blau, Rick Sanders,
	Git at SFC, Johannes Schindelin, Patrick Steinhardt,
	Christian Couder

Junio C Hamano <gitster@pobox.com> writes:

> I do not have a strong preference either way.  Even if the wording is
> firm, it is really up to each contributor to honor the guideline and
> be honest with us.  You may see autocorrection in your editor fix a
> typo for you, and more advanced tools may offer to rewrite what you
> wrote, whether it is prose or code.  It is very plausible that,
> especially for simple fixes, the result may be what the contributor
> would have arrived at on their own anyway, and in such a case, the
> contributor would not even know how much came from "AI" or a simple
> dictionary, or whether that AI learned from things it should not have
> seen.
>
> So, I do not think it makes too big a difference in practice whether
> we adopt the QEMU text with minimal rewrites, or the version you
> posted.  As the one you sent is in line with what we give applicants
> of our mentoring programs, and it was read over by our SFC lawyer,
> I'd prefer to keep the version I already have in my tree.  Not moving
> forward with either, I think, is worse than adopting one of them in
> this case.

Taking time to discuss before deciding on an important issue is one
thing, but waiting for more input and not moving in either direction
is worse than picking one and moving on.  As I said above, I do not
see much material difference between the two in practice.

I guess it is time to make an executive decision to merge it down to
'next'.  We can still tweak the language if we want, but it is more
important to have a written policy for rejecting materials of unknown
origin (whether they came from generative AI or not) than to keep
waiting for a better argument to come from somewhere in the hope of
picking the best possible policy.

As to Elijah's concern about grandfathering, I do not think making
such a declaration has much practical benefit.  If it turns out that
older "contributions" added something we shouldn't have, regardless
of how it was generated (either by generative AI or by a human
contributor typing while unconsciously recalling what they saw
elsewhere), we may need to revert it anyway, so we will deal with it
when it becomes an issue.


Thread overview: 34+ messages
2025-06-30 20:32 [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes Junio C Hamano
2025-06-30 21:07 ` brian m. carlson
2025-06-30 21:23   ` Collin Funk
2025-07-01 10:36 ` Christian Couder
2025-07-01 11:07   ` Christian Couder
2025-07-01 17:33     ` Junio C Hamano
2025-07-01 16:20   ` Junio C Hamano
2025-07-08 14:23     ` Christian Couder
2025-10-01 14:02 ` [PATCH v2] SubmittingPatches: add section about AI Christian Couder
2025-10-01 18:59   ` Chuck Wolber
2025-10-01 23:32     ` brian m. carlson
2025-10-02  2:30       ` Ben Knoble
2025-10-03 13:33     ` Christian Couder
2025-10-01 20:59   ` Junio C Hamano
2025-10-03  8:51     ` Christian Couder
2025-10-03 16:20       ` Junio C Hamano
2025-10-03 16:45         ` rsbecker
2025-10-08  7:22         ` Christian Couder
2025-10-01 21:37   ` brian m. carlson
2025-10-03 14:25     ` Christian Couder
2025-10-03 20:48     ` Elijah Newren
2025-10-03 22:20       ` brian m. carlson
2025-10-06 17:45         ` Junio C Hamano
2025-10-08  4:18           ` Elijah Newren
2025-10-12 15:07             ` Junio C Hamano
2025-10-08  9:28           ` Christian Couder
2025-10-13 18:14             ` Junio C Hamano
2025-10-23 17:32               ` Junio C Hamano
2025-10-08  4:18         ` Elijah Newren
2025-10-08  8:37         ` Christian Couder
2025-10-08  9:28           ` Michal Suchánek
2025-10-08  9:35             ` Christian Couder
2025-10-09  1:13           ` Collin Funk
2025-10-08  7:30       ` Christian Couder
