* [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
From: Junio C Hamano @ 2025-06-30 20:32 UTC
To: git; +Cc: Git PLC

Following the example set by QEMU folks, let's explicitly forbid use
of genAI tools until the copyright and license situations become
more clear. Here is what QEMU folks say in their commit to adopt
such a rule:

    The DCO requires contributors to assert they have the right to
    contribute under the designated project license. Given the lack
    of consensus on the licensing of AI code generator output, it is
    not considered credible to assert compliance with the DCO clause
    (b) or (c) where a patch includes such generated code.

and it applies equally well to ours.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/SubmittingPatches | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git c/Documentation/SubmittingPatches w/Documentation/SubmittingPatches
index 958e3cc3d5..63fd10ce39 100644
--- c/Documentation/SubmittingPatches
+++ w/Documentation/SubmittingPatches
@@ -439,6 +439,23 @@ highlighted above.
 Only capitalize the very first letter of the trailer, i.e. favor
 "Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".

+
+[[ai]]
+=== Use of AI content generators
+
+This project requires that contributors certify that their
+contributions are made under Developer's Certificate of Origin 1.1,
+which in turn means that contributors must understand the full
+provenance of what they are contributing. With AI content generators,
+the copyright or license status of their output is ill-defined, without
+any generally accepted legal foundation.
+
+Hence, the project asks that contributors refrain from using AI content
+generators on changes that are submitted to the project.
+Contributions in which use of AI is either known or suspected may not
+be accepted.
+
+
 [[git-tools]]
 === Generate your patch using Git tools out of your commits.

* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
From: brian m. carlson @ 2025-06-30 21:07 UTC
To: Junio C Hamano; +Cc: git, Git PLC

On 2025-06-30 at 20:32:22, Junio C Hamano wrote:
> Following the example set by QEMU folks, let's explicitly forbid use
> of genAI tools until the copyright and license situations become
> more clear. Here is what QEMU folks say in their commit to adopt
> such a rule:
>
>     The DCO requires contributors to assert they have the right to
>     contribute under the designated project license. Given the lack
>     of consensus on the licensing of AI code generator output, it is
>     not considered credible to assert compliance with the DCO clause
>     (b) or (c) where a patch includes such generated code.
>
> and it applies equally well to ours.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/SubmittingPatches | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git c/Documentation/SubmittingPatches w/Documentation/SubmittingPatches
> index 958e3cc3d5..63fd10ce39 100644
> --- c/Documentation/SubmittingPatches
> +++ w/Documentation/SubmittingPatches
> @@ -439,6 +439,23 @@ highlighted above.
>  Only capitalize the very first letter of the trailer, i.e. favor
>  "Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".
>
> +
> +[[ai]]
> +=== Use of AI content generators
> +
> +This project requires that contributors certify that their
> +contributions are made under Developer's Certificate of Origin 1.1,
> +which in turn means that contributors must understand the full
> +provenance of what they are contributing. With AI content generators,
> +the copyright or license status of their output is ill-defined, without
> +any generally accepted legal foundation.
> +
> +Hence, the project asks that contributors refrain from using AI content
> +generators on changes that are submitted to the project.
> +Contributions in which use of AI is either known or suspected may not
> +be accepted.

This matches the advice we gave contributors to GSoC and similar
projects, so it's good that we're being consistent here.

I think this seems prudent given the fact that there are 181 signatories
to the Berne Convention and even if the courts rule that the use of
generative AI is acceptable in one country (say, the United States), it
isn't clear that that will mean anything in other countries (such as
Canada). Considering that there's ongoing litigation and quite a bit of
legal uncertainty, as well as substantial pushback on generative AI from
the open source community, this approach seems like it's in the best
interests of the project at the moment[0]. We can always reconsider in
the future if need be.

I'll note that this was my interpretation of the DCO from the start (and
I have governed my behaviour and contributions accordingly) but it can
be helpful to explicitly document our shared understanding.

One style note: I noticed that there are two blank lines before and
after this block. Some sections have one blank line between them and
some have two, so I don't think this is a problem, but I thought I might
as well point it out.

[0] I know some large companies feel differently, but considering our
status as a member project of Conservancy (which is a non-profit), our
comparatively limited assets, and the potential negative legal effects
on downstream distributors (many of which are independent people or
non-profits), I would say we find ourselves in a different position from
those companies and would need to make a different decision.

-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
From: Collin Funk @ 2025-06-30 21:23 UTC
To: brian m. carlson; +Cc: Junio C Hamano, git, Git PLC

Hi all,

"brian m. carlson" <sandals@crustytoothpaste.net> writes:

> I think this seems prudent given the fact that there are 181 signatories
> to the Berne Convention and even if the courts rule that the use of
> generative AI is acceptable in one country (say, the United States), it
> isn't clear that that will mean anything in other countries (such as
> Canada). Considering that there's ongoing litigation and quite a bit of
> legal uncertainty, as well as substantial pushback on generative AI from
> the open source community, this approach seems like it's in the best
> interests of the project at the moment[0]. We can always reconsider in
> the future if need be.

I agree. It feels unsafe given the lack of legislation and lack of case
law. One thing, though:

>> +Hence, the project asks that contributors refrain from using AI content
>> +generators on changes that are submitted to the project.
>> +Contributions in which use of AI is either known or suspected may not
>> +be accepted.

This feels more like a suggestion than a requirement. Shouldn't we
explicitly prohibit it, if we truly are worried about the
copyright-ability of its output?

Collin

* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
From: Christian Couder @ 2025-07-01 10:36 UTC
To: Junio C Hamano; +Cc: git, Git PLC

On Mon, Jun 30, 2025 at 10:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Following the example set by QEMU folks, let's explicitly forbid use
> of genAI tools until the copyright and license situations become
> more clear. Here is what QEMU folks say in their commit to adopt
> such a rule:
>
>     The DCO requires contributors to assert they have the right to
>     contribute under the designated project license. Given the lack
>     of consensus on the licensing of AI code generator output, it is
>     not considered credible to assert compliance with the DCO clause
>     (b) or (c) where a patch includes such generated code.

Here they forbid licensing any "AI code generator output" with the DCO.

> and it applies equally well to ours.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/SubmittingPatches | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git c/Documentation/SubmittingPatches w/Documentation/SubmittingPatches
> index 958e3cc3d5..63fd10ce39 100644
> --- c/Documentation/SubmittingPatches
> +++ w/Documentation/SubmittingPatches
> @@ -439,6 +439,23 @@ highlighted above.
>  Only capitalize the very first letter of the trailer, i.e. favor
>  "Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".
>
> +
> +[[ai]]
> +=== Use of AI content generators
> +
> +This project requires that contributors certify that their
> +contributions are made under Developer's Certificate of Origin 1.1,
> +which in turn means that contributors must understand the full
> +provenance of what they are contributing. With AI content generators,
> +the copyright or license status of their output is ill-defined, without
> +any generally accepted legal foundation.

Here we would forbid licensing any "AI content generator" output, not
just AI code generator output. So what we would forbid might be more
general than what QEMU folks forbid. For example they might still
accept a new logo, or even commit messages, made using an AI while we
wouldn't.

> +Hence, the project asks that contributors refrain from using AI content
> +generators on changes that are submitted to the project.

Here it looks like using an AI capable of generating content to just
check code that would be submitted could also be forbidden. I don't
think this is what we want, so I think we might want to reword this.

> +Contributions in which use of AI is either known or suspected may not
> +be accepted.

Here also "use of AI" might forbid checking what we submit using any
AI tool.

* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
From: Christian Couder @ 2025-07-01 11:07 UTC
To: Junio C Hamano; +Cc: git, Git PLC

On Tue, Jul 1, 2025 at 12:36 PM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Mon, Jun 30, 2025 at 10:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > Following the example set by QEMU folks, let's explicitly forbid use
> > of genAI tools until the copyright and license situations become
> > more clear. Here is what QEMU folks say in their commit to adopt
> > such a rule:
> >
> >     The DCO requires contributors to assert they have the right to
> >     contribute under the designated project license. Given the lack
> >     of consensus on the licensing of AI code generator output, it is
> >     not considered credible to assert compliance with the DCO clause
> >     (b) or (c) where a patch includes such generated code.
>
> Here they forbid licensing any "AI code generator output" with the DCO.
>
> > and it applies equally well to ours.

[...]

> > +=== Use of AI content generators
> > +
> > +This project requires that contributors certify that their
> > +contributions are made under Developer's Certificate of Origin 1.1,
> > +which in turn means that contributors must understand the full
> > +provenance of what they are contributing. With AI content generators,
> > +the copyright or license status of their output is ill-defined, without
> > +any generally accepted legal foundation.
>
> Here we would forbid licensing any "AI content generator" output, not
> just AI code generator output. So what we would forbid might be more
> general than what QEMU folks forbid. For example they might still
> accept a new logo, or even commit messages, made using an AI while we
> wouldn't.

As QEMU is part of the Conservancy, like Git, I wonder if they
consulted a Conservancy lawyer to come up with their wording? If they
did, maybe we could reuse that expertise?

* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
From: Junio C Hamano @ 2025-07-01 17:33 UTC
To: Christian Couder; +Cc: git, Git PLC

Christian Couder <christian.couder@gmail.com> writes:

> As QEMU is part of the Conservancy, like Git, I wonder if they
> consulted a Conservancy lawyer to come up with their wording? If they
> did, maybe we could reuse that expertise?

Or grab their wording wholesale, perhaps?

https://github.com/qemu/qemu/commit/3d40db0efc22520fa6c399cf73960dced423b048
is the commit in which they added it to their policy.

Thanks.

* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
From: Junio C Hamano @ 2025-07-01 16:20 UTC
To: Christian Couder; +Cc: git, Git PLC

Christian Couder <christian.couder@gmail.com> writes:

>> +
>> +[[ai]]
>> +=== Use of AI content generators
>> +
>> +This project requires that contributors certify that their
>> +contributions are made under Developer's Certificate of Origin 1.1,
>> +which in turn means that contributors must understand the full
>> +provenance of what they are contributing. With AI content generators,
>> +the copyright or license status of their output is ill-defined, without
>> +any generally accepted legal foundation.
>
> Here we would forbid licensing any "AI content generator" output, not
> just AI code generator output. So what we would forbid might be more
> general than what QEMU folks forbid. For example they might still
> accept a new logo, or even commit messages, made using an AI while we
> wouldn't.

I didn't think about the distinction you are trying to draw when I
wrote the patch, but after thinking about it, I think it is a good
thing to prevent us from adopting new logo graphics to which somebody
may have ownership rights without us knowing. I would consider the
commit log message an integral part of any "contribution", and read
the word "contribution" used in the [[dco]] section as such; if the
rule covers the commit log message, that is very much appreciated.

>> +Hence, the project asks that contributors refrain from using AI content
>> +generators on changes that are submitted to the project.
>
> Here it looks like using an AI capable of generating content to just
> check code that would be submitted could also be forbidden. I don't
> think this is what we want, so I think we might want to reword this.

Good point. Asking agents to proofread and suggest improvements is
like asking your friends to do so. Care to suggest a replacement for
these two sentences (above and below)?

>> +Contributions in which use of AI is either known or suspected may not
>> +be accepted.
>
> Here also "use of AI" might forbid checking what we submit using any
> AI tool.

Thanks.

* Re: [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes
From: Christian Couder @ 2025-07-08 14:23 UTC
To: Junio C Hamano; +Cc: git, Git PLC

On Tue, Jul 1, 2025 at 6:20 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > Here we would forbid licensing any "AI content generator" output, not
> > just AI code generator output. So what we would forbid might be more
> > general than what QEMU folks forbid. For example they might still
> > accept a new logo, or even commit messages, made using an AI while we
> > wouldn't.
>
> I didn't think about the distinction you are trying to draw when I
> wrote the patch, but after thinking about it, I think it is a good
> thing to prevent us from adopting new logo graphics to which somebody
> may have ownership rights without us knowing. I would consider the
> commit log message an integral part of any "contribution", and read
> the word "contribution" used in the [[dco]] section as such; if the
> rule covers the commit log message, that is very much appreciated.

I am not sure about logos, but for the commit message, it seems to me
that it could have drawbacks related to translation or wording. For
example if someone is not a good English writer, they could write a
commit message in their native language and then ask an AI to
translate it. Or they could write it in their bad English and then
ask an AI to improve the wording. I am not sure we want to forbid all
that.

> >> +Hence, the project asks that contributors refrain from using AI content
> >> +generators on changes that are submitted to the project.
> >
> > Here it looks like using an AI capable of generating content to just
> > check code that would be submitted could also be forbidden. I don't
> > think this is what we want, so I think we might want to reword this.
>
> Good point. Asking agents to proofread and suggest improvements is
> like asking your friends to do so. Care to suggest a replacement for
> these two sentences (above and below)?

I could try but I would feel better if we tried to find and ask people
around who have thought about this subject already.

Especially I think it's difficult to draw the line between a tool that
suggests improvements and a tool that generates content. For example
if I were a very bad English writer and asked an AI to suggest
improvements to a commit message I wrote, then the AI might actually
rewrite nearly everything and the result could be very similar to what
the AI would have generated in the first place based only on the diff
part of the patch.

* [PATCH v2] SubmittingPatches: add section about AI
From: Christian Couder @ 2025-10-01 14:02 UTC
To: git
Cc: Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC,
    Johannes Schindelin, Patrick Steinhardt, Christian Couder,
    Christian Couder

As more and more developer tools use AI, we are facing two main risks
related to AI generated content:

  - its situation regarding copyright and license is not clear,
    and:

  - more and more bad quality content could be submitted for review to
    the mailing list.

To mitigate both risks, let's add a "Use of Artificial Intelligence"
section to "Documentation/SubmittingPatches" with the goal of
discouraging its blind use to generate content that is submitted to
the project, while still allowing us to benefit from its help in some
innovative, useful and less risky ways.

Helped-by: Rick Sanders <rick@sfconservancy.org>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
This is inspired by the "AI guidelines" section we already have for
mentoring programs (like GSoC or Outreachy) in:

https://git.github.io/General-Application-Information/

which was discussed briefly in a PR
(https://github.com/git/git.github.io/pull/771) and in a small thread
on the mailing list
(https://lore.kernel.org/git/CAP8UFD37_qsTjM97GK2EOWHteqoUKdwxjKS-SU629H2LnbTTtA@mail.gmail.com/).

 Documentation/SubmittingPatches | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 86ca7f6a78..04191e2945 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -446,6 +446,34 @@ highlighted above.
 Only capitalize the very first letter of the trailer, i.e. favor
 "Signed-off-by" over "Signed-Off-By" and "Acked-by:" over "Acked-By".
 
+[[ai]]
+=== Use of Artificial Intelligence (AI)
+
+The Developer's Certificate of Origin requires contributors to certify
+that they know the origin of their contributions to the project and
+that they have the right to submit it under the project's license.
+It's not yet clear that this can be legally satisfied when submitting
+a significant amount of content that has been generated by AI tools.
+
+Another issue with AI generated content is that AIs still often
+hallucinate or just produce bad code, commit messages, documentation
+or output, even when you point out their mistakes.
+
+To avoid these issues, we will reject anything that looks AI
+generated, that sounds overly formal or bloated, that looks like AI
+slop, that looks good on the surface but makes no sense, or that
+senders don’t understand or cannot explain.
+
+We strongly recommend using AI tools carefully and responsibly.
+
+Contributors would often benefit more from AI by using it to guide and
+help them step by step towards producing a solution by themselves
+rather than by asking for a full solution that they would then mostly
+copy-paste. They can also use AI to help with debugging, or with
+checking for obvious mistakes, things that can be improved, things
+that don’t match our style, guidelines or our feedback, before sending
+it to us.
+
 [[git-tools]]
 === Generate your patch using Git tools out of your commits.

-- 
2.51.0.195.ge34f015aea.dirty

* Re: [PATCH v2] SubmittingPatches: add section about AI
From: Chuck Wolber @ 2025-10-01 18:59 UTC
To: Christian Couder, git
Cc: Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC,
    Johannes Schindelin, Patrick Steinhardt, Christian Couder

On Wed Oct 1, 2025 at 2:03 PM UTC, Christian Couder wrote:
> To mitigate both risks, let's add a "Use of Artificial Intelligence"
> section to "Documentation/SubmittingPatches" with the goal of
> discouraging its blind use to generate content that is submitted to
> the project, while still allowing us to benefit from its help in some
> innovative, useful and less risky ways.

I love the intent here, but it does not seem like that came through in
the proposed patch.

I think this patch opens the door to some concerning issues, including
the potential for false accusations and inconsistent treatment of human
(non-AI) generated contributions.

Sticking to a message of self-reliance (e.g. responsible AI use) and
making some technical changes to mark AI content might be a better
approach.

> +The Developer's Certificate of Origin requires contributors to certify
> +that they know the origin of their contributions to the project and
> +that they have the right to submit it under the project's license.
> +It's not yet clear that this can be legally satisfied when submitting
> +a significant amount of content that has been generated by AI tools.

The legal issues around AI will be resolved in time, but the future
will not stop bringing us a steady stream of things that create legal
ambiguity.

Creating one-off sections that cover _multiple_ topics _including_
legal ambiguity seems like it risks reducing clarity. To get the full
picture, this patch (and patches like it in the future) require me to
navigate multiple sections to understand all of the project's relevant
legal concerns.

I also have two specific concerns with the wording:

1. It repeats what is said just a few paragraphs earlier in the
document. I understand _why_ it does this, but moving the essence of
this topic up to the DCO section avoids the repetition and avoids
diluting the project's legal guidance.

2. What am I supposed to do with "It's not yet clear"? This is worse
than telling me nothing. It introduces a vague question with no clear
guidance. It is _true_ that no clear guidance exists, but what are the
consequences when it _does_ exist? The worst case scenario is that we
have to go back and rework/remove AI generated patches. So why not just
require something like a declaration of AI content like the one
proposed at declare-ai.org?

> +To avoid these issues, we will reject anything that looks AI
> +generated, that sounds overly formal or bloated, that looks like AI
> +slop, that looks good on the surface but makes no sense, or that
> +senders don’t understand or cannot explain.

That reads like a full stop rejection of all AI generated patch
content.

What if AI were to generate a great patch whose technical quality is
exemplary in every way? How is that any different from a great patch of
exemplary technical quality submitted by a person who is unambiguously
evil?

But perhaps you intended it to mean a full stop rejection of content
that _looks_ like it was generated by the primitive AI we have _today_?
Even going with the interpretation you likely intended opens up a
concerning double standard.

What if a patch "looks" AI generated, but in reality was wholly
generated by a human? Does this mean that patches generated by humans
that fit the declared criteria would be treated as if they were AI
generated?

What about a non-native speaker who uses AI in an attempt to bridge a
language barrier? By definition they would lack the ability to judge
the degree to which their patch suddenly meets your criteria.

How is any of that fair, and how could you even tell the difference?

And on a personal note, the subjective wording gives me a "walking on
eggshells" feeling. It opens the door for false accusations, and gets
us away from judging things _purely_ on their technical merit.

Would it not be more _consistent_ to continue saying what is already
true? That your patches _must_ be remarkably high quality regardless of
how they were created?

With the addition of a required AI declaration (again, check out
declare-ai.org for an example of what that might look like), I think
you cover all of the necessary bases. And sure, someone could lie. But
they can lie about meeting the DCO as well. The consequences are the
same - remove/rework.

> +We strongly recommend using AI tools carefully and responsibly.

Agreed, but I think you lost me here.

Taking your words at face value, the prior paragraph reads as if the
Git project is declaring an outright ban on _all_ AI generated content
(and I am nearly certain that is _not_ what you intended to say). If
so, why bother continuing on with a PSA (Public Service Announcement)?
It reads like a non-alcoholic drink that has the words, "Drink
Responsibly" printed on the side of the can.

> +Contributors would often benefit more from AI by using it to guide and
> +help them step by step towards producing a solution by themselves
> +rather than by asking for a full solution that they would then mostly
> +copy-paste. They can also use AI to help with debugging, or with
> +checking for obvious mistakes, things that can be improved, things
> +that don’t match our style, guidelines or our feedback, before sending
> +it to us.

I think this is very useful guidance. And although it is timely, I
think it stands a good chance of being timeless, even when AI becomes
far more competent than it is today.

AI is not going away, and we need to find a way to use it productively
_without_ losing our sense of self-reliance. If we fail to develop this
ability when AI is hardly more skilled than an above average intern,
full of hubris and zero real world experience, imagine how unqualified
we will be when AI becomes competent enough to manipulate and mislead
us?

Overall, I feel like an addition to the documentation is warranted, but
this version makes me uncomfortable if not a little unwelcome. Making a
technical change to the required declarations and expanding on the
theme of self-reliance and responsible use feels like a more productive
way to address this issue.

Putting my "money where my mouth is", I am more than happy to suggest a
revision to this patch if you would like. I wanted to avoid that right
now because it seemed like a dialog was warranted first.

..Ch:W..

* Re: [PATCH v2] SubmittingPatches: add section about AI
From: brian m. carlson @ 2025-10-01 23:32 UTC
To: Chuck Wolber
Cc: Christian Couder, git, Junio C Hamano, Taylor Blau, Rick Sanders,
    Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder

On 2025-10-01 at 18:59:31, Chuck Wolber wrote:
> 1. It repeats what is said just a few paragraphs earlier in the
> document. I understand _why_ it does this, but moving the essence of
> this topic up to the DCO section avoids the repetition and avoids
> diluting the project's legal guidance.
>
> 2. What am I supposed to do with "It's not yet clear"? This is worse
> than telling me nothing. It introduces a vague question with no clear
> guidance. It is _true_ that no clear guidance exists, but what are the
> consequences when it _does_ exist? The worst case scenario is that we
> have to go back and rework/remove AI generated patches. So why not just
> require something like a declaration of AI content like the one
> proposed at declare-ai.org?

I agree that this is unclear, which is why I suggested we be more
definitive.

Many of the companies that develop LLMs are headquartered in the United
States. Many of the people that contribute to Git or distribute Git are
not. For instance, I am located in Canada, which has different
copyright laws (we have the more limited fair dealing like the UK,
instead of the US's fair use) and has moral rights. It is entirely
possible that the use of an LLM could be legal in one country or
jurisdiction but not another.

By accepting code that is written using LLMs into Git, we expose our
contributors (who implicitly distribute Git code by uploading it to
servers) and distributors (such as Linux distros or their distributors)
to potential liability if the use of a particular LLM or LLMs in
general is found to be illegal in their jurisdiction. Unlike most of
the companies that develop LLMs, most contributors and distributors of
Git are individuals or non-profits with limited resources. Even as
someone who works in the tech industry and is paid accordingly,
defending a copyright claim would be extremely expensive and probably
financially devastating for me and I really do not want to take that
risk. That's why simply declaring LLM use is not acceptable: because it
exposes others who have limited resources to legal risk.

Note that ripping it out afterwards would require rewriting the Git
history and would not solve the problem of all of the people who are
distributing or using older versions (which would have been judged to
violate copyright law) or relieve them of the fact that they would have
been exposed to legal liability for their distribution.

The avoidance of legal problems is why we require sign-off. If
Developer X signs off a patch that was later judged to violate
copyright law, then they have made a legally binding statement to that
effect and they have effectively accepted the entire legal liability
for that[0]. If we don't believe people can legally make certain types
of contributions, then we should explicitly tell people that they
should not make that legal statement to avoid any ambiguity.

This is very different from situations where companies make a decision
to incorporate LLM-generated code into their own codebases. They can
hire lawyers to determine whether LLM-generated code is legal in their
given jurisdiction and obtain whatever legal necessities are required
to operate in compliance with the law. They also usually have
substantial resources to address any problems that come up. We, on the
other hand, are effectively a global project, must engage in behaviour
that is legal in all or nearly all jurisdictions, and have very limited
resources.

> That reads like a full stop rejection of all AI generated patch content.
>
> What if AI were to generate a great patch whose technical quality is
> exemplary in every way? How is that any different from a great patch of
> exemplary technical quality submitted by a person who is unambiguously
> evil?

There are a couple of problems here: one, some AI code (including
documentation or other text) is of poor quality; two, regardless of the
quality, many people submit AI-generated code they do not understand;
and three, AI-generated code is a legal minefield. A technically great
patch solves the first but not the other two.

We still need people who submit code to be able to explain their
changes and respond to questions about the code. What decisions were
made? Why were they made? What are the tradeoffs and downsides?

> Taking your words at face value, the prior paragraph reads as if the
> Git project is declaring an outright ban on _all_ AI generated content
> (and I am nearly certain that is _not_ what you intended to say). If
> so, why bother continuing on with a PSA (Public Service Announcement)?
> It reads like a non-alcoholic drink that has the words, "Drink
> Responsibly" printed on the side of the can.

I think this is actually what they intended to say, but did so poorly.
I agree clarification would be valuable.

> AI is not going away, and we need to find a way to use it productively
> _without_ losing our sense of self-reliance. If we fail to develop this
> ability when AI is hardly more skilled than an above average intern,
> full of hubris and zero real world experience, imagine how unqualified
> we will be when AI becomes competent enough to manipulate and mislead
> us?

I think you assume LLMs can have intelligence. They are glorified
prediction engines, effectively fancy Markov chains. In some cases,
that can be useful and valuable and we can do interesting things with
them, but they cannot actually have intelligence, creativity or reason.

And LLMs already manipulate and mislead people. They have been
implicated in goading teenagers to suicide or leading people into
conspiracy theories. Some LLMs espouse racist, anti-Semitic, or
otherwise hateful views. That's a good reason to be wary of them and
how they're incorporated into our lives, at least until such a time
that they have appropriate safety measures and regulation in place (if
that ever happens).

[0] I refer you to the common-law doctrine of promissory estoppel.

-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

* Re: [PATCH v2] SubmittingPatches: add section about AI
From: Ben Knoble @ 2025-10-02 2:30 UTC
To: brian m. carlson
Cc: Chuck Wolber, Christian Couder, git, Junio C Hamano, Taylor Blau,
    Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt,
    Christian Couder

> On Oct 1, 2025, at 19:44, brian m. carlson <sandals@crustytoothpaste.net> wrote:
>
> On 2025-10-01 at 18:59:31, Chuck Wolber wrote:
>
>> AI is not going away, and we need to find a way to use it productively
>> _without_ losing our sense of self-reliance. If we fail to develop this
>> ability when AI is hardly more skilled than an above average intern,
>> full of hubris and zero real world experience, imagine how unqualified
>> we will be when AI becomes competent enough to manipulate and mislead
>> us?
>
> I think you assume LLMs can have intelligence. They are glorified
> prediction engines, effectively fancy Markov chains. In some cases,
> that can be useful and valuable and we can do interesting things with
> them, but they cannot actually have intelligence, creativity or reason.
>
> And LLMs already manipulate and mislead people. They have been
> implicated in goading teenagers to suicide or leading people into
> conspiracy theories. Some LLMs espouse racist, anti-Semitic, or
> otherwise hateful views. That's a good reason to be wary of them and
> how they're incorporated into our lives, at least until such a time
> that they have appropriate safety measures and regulation in place (if
> that ever happens).

A tangent, and one I'm happy to continue but off-list (I'm happy to
continue publicly, but this is not the forum): I'd encourage folks to
give the LLMentalist Effect [1] a read. Regardless of where you fall on
"intelligence vs stochastic parrot," I think you'll find some
interesting conclusions.

[1]: https://softwarecrisis.dev/letters/llmentalist

* Re: [PATCH v2] SubmittingPatches: add section about AI
From: Christian Couder @ 2025-10-03 13:33 UTC
To: Chuck Wolber
Cc: git, Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC,
    Johannes Schindelin, Patrick Steinhardt, Christian Couder

On Wed, Oct 1, 2025 at 8:59 PM Chuck Wolber <chuck@wolber.net> wrote:
>
> On Wed Oct 1, 2025 at 2:03 PM UTC, Christian Couder wrote:
>
> > To mitigate both risks, let's add a "Use of Artificial Intelligence"
> > section to "Documentation/SubmittingPatches" with the goal of
> > discouraging its blind use to generate content that is submitted to
> > the project, while still allowing us to benefit from its help in some
> > innovative, useful and less risky ways.
>
> I love the intent here, but it does not seem like that came through in the
> proposed patch.
>
> I think this patch opens the door to some concerning issues, including the
> potential for false accusations and inconsistent treatment of human (non-AI)
> generated contributions.

I don't think the patch changes anything regarding false accusations
and inconsistent treatment of human generated contributions.

> Sticking to a message of self-reliance (e.g. responsible AI use) and making
> some technical changes to mark AI content might be a better approach.

I don't think we want to mark AI content. It would be too much of a
burden to manage, especially knowing the limits of what should be
marked or not.

> > +The Developer's Certificate of Origin requires contributors to certify
> > +that they know the origin of their contributions to the project and
> > +that they have the right to submit it under the project's license.
> > +It's not yet clear that this can be legally satisfied when submitting
> > +a significant amount of content that has been generated by AI tools.
>
> The legal issues around AI will be resolved in time, but the future will not
> stop bringing us a steady stream of things that create legal ambiguity.
>
> Creating one-off sections that cover _multiple_ topics _including_ legal
> ambiguity seems like it risks reducing clarity. To get the full picture, this
> patch (and patches like it in the future) require me to navigate multiple
> sections to understand all of the project's relevant legal concerns.

I don't think having this section on top of the rest is a big burden
for developers in general. Perhaps you are very concerned about the
legal issues in the project you contribute to, but on the other hand
there weren't a lot of concerns when we added the similar AI
guidelines in https://git.github.io/General-Application-Information/.

> I also have two specific concerns with the wording:
>
> 1. It repeats what is said just a few paragraphs earlier in the document. I
> understand _why_ it does this, but moving the essence of this topic up to the
> DCO section avoids the repetition and avoids diluting the project's legal
> guidance.

Being able to refer people to a single section about AI has some
benefits. If you have a wording that reduces the repetition while
still making the AI section easily understandable on its own, I am
willing to consider it for a v3 version of this patch.

> 2. What am I supposed to do with "It's not yet clear"? This is worse than
> telling me nothing. It introduces a vague question with no clear guidance. It
> is _true_ that no clear guidance exists, but what are the consequences when it
> _does_ exist? The worst case scenario is that we have to go back and
> rework/remove AI generated patches.

When guidance exists, we might have to change our "AI use" section,
but we can deal with that then. It's better to adapt now to the
current situation as well as we can rather than try to anticipate the
future while we can't really know what it will look like. And if we
have done our best to avoid accepting too much AI generated content
now, then hopefully we won't have to go back and rework/remove many
AI generated patches.

> So why not just require something like a
> declaration of AI content like the one proposed at declare-ai.org?

I think this could add a lot of complexity to the process. For example
people could be using many different AI tools in every contribution,
like:

  - for code completion,
  - for checking for memory leaks,
  - for checking for possible refactorings,
  - for commit message translation from their native language to English,
  - for email translation from their native language to English,
  - for better understanding the feedback they received,
  - for helping with the forge they are using (what if it performs
    interactive rebases for example),
  - etc.

They might not know where to stop and might not even know if their
email software (like GMail for example) is already using AI to help
them write messages.

It's also possible to ask different AIs to do the same job, for
example checking for errors in the patches that are about to be sent.
What if some AIs find no improvements and others find some? Should
what every AI found be mentioned? What if AIs start debating between
themselves whether something is an error or not and cannot come to a
conclusion? Should that debate be kept somehow?

And no, this is not pure speculation. I talked recently to someone
working on an IDE and thinking about saving into Git all the AI
context (including such AI debates) around some contributions to make
sure it's available for other AIs and humans working down the road on
further work based on those contributions.

In short if we now ask people to declare, then those who try to do the
right thing will spend a lot of time figuring things out and being
burdened for perhaps no good reason, while those who won't care and
will do the worst on that will have the most benefits, as they will
not be burdened and will save a lot of time.

If automated processes are one day easily available to record some AI
context, then I don't think we would be against them, and maybe we can
decide then to ask people to use them. But we are not there yet, we
don't know what they will look like and require, and it's just not our
role to push on this.

> > +To avoid these issues, we will reject anything that looks AI
> > +generated, that sounds overly formal or bloated, that looks like AI
> > +slop, that looks good on the surface but makes no sense, or that
> > +senders don’t understand or cannot explain.
>
> That reads like a full stop rejection of all AI generated patch content.

In a reply to Junio, I have suggested changing "we will reject
anything that looks AI generated" to "we will reject anything that
looks significantly AI generated". I am open to tweaking that even
more, but we need to say somehow that submitting a lot of AI generated
content as-is is not welcome. Otherwise we just don't mitigate the
risks we want to mitigate. (See my reply to Junio.)

> What if AI were to generate a great patch whose technical quality is exemplary
> in every way? How is that any different from a great patch of exemplary
> technical quality submitted by a person who is unambiguously evil?

If an AI were to generate a great patch no different from what a human
would generate, then we cannot say that it looks AI generated, and
then the only issue is "Do we trust the person sending the patch?". If
the person has sent a lot of patches that looked AI generated in the
past, we might reject the patch based on that.

Otherwise, the issue is the same as if someone sends some proprietary
code. Yeah, we could accept code that is proprietary if someone sends
it to us and we don't realize it's proprietary code, but then if they
signed off the patch, they are responsible for that according to the
DCO.

> But perhaps you intended it to mean a full stop rejection of content that
> _looks_ like it was generated by the primitive AI we have _today_? Even going
> with the interpretation you likely intended opens up a concerning double
> standard.
>
> What if a patch "looks" AI generated, but in reality was wholly generated by a
> human?

Mistakes happen. We could indeed be wrong to reject the patch based on
that. See my reply to Junio about this. The thing is that we cannot
eat our cake and have it too. If we want to protect the project from
risks related to too much AI generated content, we need to be able to
reject such content based on some criteria that are unlikely to be
perfect.

> Does this mean that patches generated by humans that fit the declared
> criteria would be treated as if they were AI generated?

Patches generated by humans that look like AI generated patches will
probably be treated as if they were AI generated. That's unfortunate,
but hopefully the few people who generate patches that look AI
generated will soon learn to make their patches look different from
AI generated ones.

> What about a non-native speaker who uses AI in an attempt to bridge a language
> barrier? By definition they would lack the ability to judge the degree to which
> their patch suddenly meets your criteria.

This is one of the reasons why this v2 is different from the previous
v1. We don't outright reject any use of generative AI in this v2, we
want to say that the result shouldn't look like a lot of AI generated
content sent as-is. If an AI was used to translate something that was
initially human generated, it will hopefully not sound like it was
fully AI generated. And yeah mistakes can happen, but hopefully the
community and the maintainer will be able to learn and adapt from
them and the process will be relatively smooth after some time.

> How is any of that fair, and how could you even tell the difference?

It's a judgment call, like when we decide if a patch is technically
good enough to be accepted. In practice I think we will often
recommend rewriting parts that look AI generated in the same way we
ask to rewrite bad code or bad commit messages. We might sometimes not
even mention that it seems to us like it was AI generated.

You might say that it might then not be worth having a "Use of AI"
section in our SubmittingPatches document, but we think it's still
useful for different reasons like:

  - it shows that we are trying to do something against the AI related
    risks, especially the legal one,
  - it might save us from reviewing AI generated content in the first
    place if contributors read our SubmittingPatches document before
    working on patches,
  - it could give contributors good ideas about how to use AI in
    acceptable ways,
  - it signals to our reviewers that they should speak up against, or
    just reject, what looks like a lot of AI generated content,
  - it gives reviewers the possibility to refer contributors to some
    documentation about the subject.

> And on a personal note, the subjective wording gives me a "walking on
> eggshells" feeling. It opens the door for false accusations, and gets us away
> from judging things _purely_ on their technical merit.

If we see content in some patches that looks copyrighted by a company,
and we are not confident that the company agreed to release it under a
compatible license, we can already reject it on non technical merit.
We could even already say something like: "Your code looks obviously
AI generated for such and such a reason. We are not sure that so much
AI generated code is compatible with the DCO as the AI could have
copy-pasted proprietary code it saw during its training. So we are
going to reject it."

So things don't fundamentally change. In this regard, this patch just
clarifies things for contributors and reviewers.

In some ways, the section that this patch adds is not different from
other sections like for example "Make separate commits for logically
separate changes." Yeah, perhaps many developers are unfortunately not
used to making separate commits for logically separate changes, and
they put a lot of different things into a single commit, and they
don't want to spend time reworking their working commits. So they
might feel that their contributions are going to be judged on baseless
red tape merit instead of the real thing. But anyway we state our
standards clearly, so they should know in advance how their
contributions are going to be judged.

> Would it not be more _consistent_ to continue saying what is already true? That
> your patches _must_ be remarkably high quality regardless of how they were
> created?

The issue is that quality might not be defined in the same way by
everyone. Some aspects of what we consider quality might be considered
otherwise (maybe "useless red tape") by some. So it's better to be
explicit as much as we can.

> With the addition of a required AI declaration (again, check out declare-ai.org
> for an example of what that might look like), I think you cover all of the
> necessary bases. And sure, someone could lie. But they can lie about meeting
> the DCO as well. The consequences are the same - remove/rework.
>
> > +We strongly recommend using AI tools carefully and responsibly.
>
> Agreed, but I think you lost me here.
>
> Taking your words at face value, the prior paragraph reads as if the Git
> project is declaring an outright ban on _all_ AI generated content (and I am
> nearly certain that is _not_ what you intended to say).

Yeah, we don't intend to ban _all_ AI generated content. Please
suggest other wordings if some sentences read like that. What we don't
want is a lot of AI generated content that no human was involved in
creating. If a human was involved in creating some content, then the
human has at least some copyright and some responsibility for it.

> If so, why bother
> continuing on with a PSA (Public Service Announcement)? It reads like a
> non-alcoholic drink that has the words, "Drink Responsibly" printed on the side
> of the can.

On prescription and over-the-counter drug packaging there are
sometimes "Boxed Warnings" (or warnings along with a red warning
triangle pictogram in Europe) designed to alert people to potential
side effects that could impair their ability to drive or operate heavy
machinery safely. This sentence ("We strongly recommend using AI tools
carefully and responsibly.") is a bit similar. It is intended to make
people who would mechanically read or look at the document pause and
think for a bit. It's a good thing when used sparingly and for good
reason, which I think is the case here.

[...]

> Overall, I feel like an addition to the documentation is warranted, but this
> version makes me uncomfortable if not a little unwelcome. Making a technical
> change to the required declarations and expanding on the theme of self-reliance
> and responsible use feels like a more productive way to address this issue.
>
> Putting my "money where my mouth is", I am more than happy to suggest a
> revision to this patch if you would like. I wanted to avoid that right now
> because it seemed like a dialog was warranted first.

Thanks for the review and for the offer of a revision to this patch. I
would prefer not a full new version of the patch though, but rather
some suggestions for alternative wordings of some sentences.

* Re: [PATCH v2] SubmittingPatches: add section about AI
From: Junio C Hamano @ 2025-10-01 20:59 UTC
To: Christian Couder
Cc: git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin,
    Patrick Steinhardt, Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> As more and more developer tools use AI, we are facing two main risks
> related to AI generated content:
>
>   - its situation regarding copyright and license is not clear,
>     and:
>
>   - more and more bad quality content could be submitted for review to
>     the mailing list.
>
> To mitigate both risks, let's add a "Use of Artificial Intelligence"
> section to "Documentation/SubmittingPatches" with the goal of
> discouraging its blind use to generate content that is submitted to
> the project, while still allowing us to benefit from its help in some
> innovative, useful and less risky ways.
>
> Helped-by: Rick Sanders <rick@sfconservancy.org>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
>
> ---
> This is inspired by the "AI guidelines" section we already have for

A more important thing to mention is that Rick, a lawyer at SFC,
helped us to draft the wording used in this one.

> +[[ai]]
> +=== Use of Artificial Intelligence (AI)
> +
> +The Developer's Certificate of Origin requires contributors to certify
> +that they know the origin of their contributions to the project and
> +that they have the right to submit it under the project's license.
> +It's not yet clear that this can be legally satisfied when submitting
> +a significant amount of content that has been generated by AI tools.
> +
> +Another issue with AI generated content is that AIs still often
> +hallucinate or just produce bad code, commit messages, documentation
> +or output, even when you point out their mistakes.
> +
> +To avoid these issues, we will reject anything that looks AI
> +generated, that sounds overly formal or bloated, that looks like AI
> +slop, that looks good on the surface but makes no sense, or that
> +senders don’t understand or cannot explain.

A milder way to phrase this would be to jump directly to "we reject
what the sender cannot explain when asked about it". "How does this
work?" "Why is this a good thing to do?" "Where did it come from?"
instead of saying "looks AI generated".

It would sidestep the "who decides if it looks AI generated?" question.

> +We strongly recommend using AI tools carefully and responsibly.
> +
> +Contributors would often benefit more from AI by using it to guide and
> +help them step by step towards producing a solution by themselves
> +rather than by asking for a full solution that they would then mostly
> +copy-paste. They can also use AI to help with debugging, or with
> +checking for obvious mistakes, things that can be improved, things
> +that don’t match our style, guidelines or our feedback, before sending
> +it to us.
> +
> [[git-tools]]
> === Generate your patch using Git tools out of your commits.

Thanks.
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-01 20:59 ` Junio C Hamano @ 2025-10-03 8:51 ` Christian Couder 2025-10-03 16:20 ` Junio C Hamano 0 siblings, 1 reply; 34+ messages in thread From: Christian Couder @ 2025-10-03 8:51 UTC (permalink / raw) To: Junio C Hamano Cc: git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder On Wed, Oct 1, 2025 at 10:59 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > As more and more developer tools use AI, we are facing two main risks > > related to AI generated content: > > > > - its situation regarding copyright and license is not clear, > > and: > > > > - more and more bad quality content could be submitted for review to > > the mailing list. > > > > To mitigate both risks, let's add an "Use of Artificial Intelligence" > > section to "Documentation/SubmittingPatches" with the goal of > > discouraging its blind use to generate content that is submitted to > > the project, while still allowing us to benefit from its help in some > > innovative, useful and less risky ways. > > > > Helped-by: Rick Sanders <rick@sfconservancy.org> > > Signed-off-by: Christian Couder <chriscool@tuxfamily.org> > > > > --- > > This is inspired by the "AI guidelines" section we already have for > > A more important thing to mention is that Rick is a lawyer at SFC > helped us to draft the wording used in this one. Yeah, right, I will mention it in a v3 if there is one. > > +[[ai]] > > +=== Use of Artificial Intelligence (AI) > > + > > +The Developer's Certificate of Origin requires contributors to certify > > +that they know the origin of their contributions to the project and > > +that they have the right to submit it under the project's license. > > +It's not yet clear that this can be legally satisfied when submitting > > +significant amount of content that has been generated by AI tools. > > + > > +Another issue with AI generated content is that AIs still often > > +hallucinate or just produce bad code, commit messages, documentation > > +or output, even when you point out their mistakes. > > + > > +To avoid these issues, we will reject anything that looks AI > > +generated, that sounds overly formal or bloated, that looks like AI > > +slop, that looks good on the surface but makes no sense, or that > > +senders don’t understand or cannot explain. > > A milder way to phrase this would be to jump directly to "we reject > what the sender cannot explain when asked about it". "How does this > work?" "Why is this a good thing to do?" "Where did it come from?" > instead of saying "looks AI generated". > > It would sidestep the "who decides if it looks AI generated?" question. I don't think the "who decides if it looks AI generated?" question is very relevant. If someone says that a patch looks mostly AI generated and gives a good argument supporting this claim, it's the same as if someone gives any other good argument against the patch. In the end, the community and you decide if the argument is good enough and if the patch should be rejected based on that (and other arguments for and against the patch of course). For example, let's suppose that in the future someone knows that ChatGPT7 is very likely to use double dash ("--") and the word "absolutely" a lot in its sentences, and notices that a contributor sent a long documentation patch that is full of them. I would say that it would be a good argument to reject that patch. 
We could be wrong in rejecting the patch because of that argument, because maybe the writer's style happens to be similar to ChatGPT7's style, but I think we should be able to reject such patches based on the fact that they definitely look AI generated. Otherwise I don't think we can seriously claim that we try to uphold the DCO as well as we can. So I think we definitely need to say something like "we will reject anything that looks AI generated" or maybe "we will reject anything that looks significantly AI generated". In the v3, if there is one, I will change the wording to the latter. Thanks. ^ permalink raw reply [flat|nested] 34+ messages in thread
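To make the stylistic-marker heuristic discussed above concrete, here is a minimal sketch in shell; the marker list is hypothetical and purely illustrative, this is not an official project tool, and no fixed marker list could be a reliable detector:

	#!/bin/sh
	# Count occurrences of a few illustrative stylistic markers in a
	# patch file. A high count is, at most, a prompt for a human
	# reviewer to take a closer look, never proof of AI generation.
	patch=${1:?"usage: scan-markers.sh <patch-file>"}
	for marker in ' -- ' 'absolutely' 'delve'; do
		count=$(grep -i -o -- "$marker" "$patch" | wc -l)
		printf '%5d %s\n' "$count" "$marker"
	done

Such a script could only ever flag candidates for closer human review; as the discussion above makes clear, the final call would still be an ordinary review judgment weighed by the community and the maintainer.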
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-03 8:51 ` Christian Couder @ 2025-10-03 16:20 ` Junio C Hamano 2025-10-03 16:45 ` rsbecker 2025-10-08 7:22 ` Christian Couder 0 siblings, 2 replies; 34+ messages in thread From: Junio C Hamano @ 2025-10-03 16:20 UTC (permalink / raw) To: Christian Couder Cc: git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder Christian Couder <christian.couder@gmail.com> writes: >> A milder way to phrase this would be to jump directly to "we reject >> what the sender cannot explain when asked about it". "How does this >> work?" "Why is this a good thing to do?" "Where did it come from?" >> instead of saying "looks AI generated". >> >> It would sidestep the "who decides if it looks AI generated?" question. > > I don't think the "who decides if it looks AI generated?" question is > very relevant. If someone says that a patch looks mostly AI generated > and gives a good argument supporting this claim, it's the same as if > someone gives any other good argument against the patch. In the end, > the community and you decide if the argument is good enough and if the > patch should be rejected based on that (and other arguments for and > against the patch of course). And then who plays the final arbiter? One can keep insisting that a patch that looks to me like apparent AI slop was what one wrote oneself, but you may find it plausible that it was a human creation. Then what? It is very much relevant to avoid such an argument, because the point is irrelevant. We are trying to avoid accepting something the submitter has no right to claim as theirs, and requesting them to explain where it came from, how it works, etc. would be a better test than "does it look AI generated? to everybody?", wouldn't it? ^ permalink raw reply [flat|nested] 34+ messages in thread
* RE: [PATCH v2] SubmittingPatches: add section about AI 2025-10-03 16:20 ` Junio C Hamano @ 2025-10-03 16:45 ` rsbecker 2025-10-08 7:22 ` Christian Couder 1 sibling, 0 replies; 34+ messages in thread From: rsbecker @ 2025-10-03 16:45 UTC (permalink / raw) To: 'Junio C Hamano', 'Christian Couder' Cc: git, 'Taylor Blau', 'Rick Sanders', 'Git at SFC', 'Johannes Schindelin', 'Patrick Steinhardt', 'Christian Couder' On October 3, 2025 12:21 PM, Junio C Hamano wrote: >Christian Couder <christian.couder@gmail.com> writes: > >>> A milder way to phrase this would be to jump directly to "we reject >>> what the sender cannot explain when asked about it". "How does this >>> work?" "Why is this a good thing to do?" "Where did it come from?" >>> instead of saying "looks AI generated". >>> >>> It would sidestep the "who decides if it looks AI generated?" question. >> >> I don't think the "who decides if it looks AI generated?" question is >> very relevant. If someone says that a patch looks mostly AI generated >> and gives a good argument supporting this claim, it's the same as if >> someone gives any other good argument against the patch. In the end, >> the community and you decide if the argument is good enough and if the >> patch should be rejected based on that (and other arguments for and >> against the patch of course). > >And then who plays the final arbiter? One can keep insisting that a patch that looks >to me like apparent AI slop was what one wrote oneself, but you may find it >plausible that it was a human creation. Then what? > >It is very much relevant to avoid such an argument, because the point is irrelevant. We >are trying to avoid accepting something the submitter has no right to claim as theirs, >and requesting them to explain where it came from, how it works, etc. would be a >better test than "does it look AI generated? to everybody?", wouldn't it? Can the cover page from the originator contain statements that: a) I (whoever it is) have the legal authority to submit the patch without violating any copyright. b) The code is original work and does not violate any IP laws where I (whoever) am located. c) The code is not generated from AI, or, despite being AI generated, I (whoever) have verified that the code works as anticipated and does not contain AI content trained from another code-base or project that might otherwise violate b), and I (whoever) accept all responsibility for falsely making this statement. This could be changed to an agreement maintained by the Conservancy prior to accepting any non-trivial contributions, provided the agreement is referenced in either the cover page or commit comments. ^ permalink raw reply [flat|nested] 34+ messages in thread
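If per-series statements along these lines were adopted, one way to attach them mechanically (a sketch only; the "AI-Disclosure" trailer name is hypothetical and not an agreed convention of this project) would be git's interpret-trailers command, which appends trailers to a message file:

	$ git interpret-trailers --in-place \
		--trailer "AI-Disclosure: none" cover-letter.txt

Whether such a statement belongs in the cover letter, in each commit message, or in an agreement maintained by the Conservancy is exactly the open question raised above.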
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-03 16:20 ` Junio C Hamano 2025-10-03 16:45 ` rsbecker @ 2025-10-08 7:22 ` Christian Couder 1 sibling, 0 replies; 34+ messages in thread From: Christian Couder @ 2025-10-08 7:22 UTC (permalink / raw) To: Junio C Hamano Cc: git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder On Fri, Oct 3, 2025 at 6:20 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > >> A milder way to phrase this would be to jump directly to "we reject > >> what the sender cannot explain when asked about it". "How does this > >> work?" "Why is this a good thing to do?" "Where did it come from?" > >> instead of saying "looks AI generated". > >> > >> It would sidestep the "who decides if it looks AI generated?" question. > > > > I don't think the "who decides if it looks AI generated?" question is > > very relevant. If someone says that a patch looks mostly AI generated > > and gives a good argument supporting this claim, it's the same as if > > someone gives any other good argument against the patch. In the end, > > the community and you decide if the argument is good enough and if the > > patch should be rejected based on that (and other arguments for and > > against the patch of course). > > And then who plays the final arbiter? You, like for any other discussion about a patch when there are different opinions. > One can keep insisting that a > patch that looks to me like apparent AI slop was what one > wrote oneself, but you may find it plausible that it was a human > creation. Then what? You decide if the arguments on one side are better than those on the other side, again like for any other discussion about a patch when there are different opinions. Why should the process be different? It could be different if we think that such behavior is similar to the bad behavior we talk about in our code of conduct, but I don't think we want to go there and have some special procedures, right? > It is very much relevant to avoid such an argument, because the point > is irrelevant. We are trying to avoid accepting something the > submitter has no right to claim as theirs, and requesting them to > explain where it came from, how it works, etc. would be a better > test than "does it look AI generated? to everybody?", wouldn't it? The sender can ask the AI where it came from, how it works, etc., and copy-paste the AI's answers. The sender could also prompt the AI or modify its answers so that they look as human generated as possible. So just asking those questions might not help much in some cases. In the end, whatever the answers to those questions, we have to be able to decide if the suspicious content looks too much like it has been AI generated or not. It doesn't mean that asking those questions couldn't help in some cases. It means that we just don't want to enter into the details of which questions we can ask and if we should judge based on the answers to those questions or something else. For example our code of conduct says that we will take action "in response to any behavior that they deem inappropriate, threatening, offensive, or harmful." It doesn't tie us to asking some questions and taking action based on the answers. Thanks. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-01 14:02 ` [PATCH v2] SubmittingPatches: add section about AI Christian Couder 2025-10-01 18:59 ` Chuck Wolber 2025-10-01 20:59 ` Junio C Hamano @ 2025-10-01 21:37 ` brian m. carlson 2025-10-03 14:25 ` Christian Couder 2025-10-03 20:48 ` Elijah Newren 2 siblings, 2 replies; 34+ messages in thread From: brian m. carlson @ 2025-10-01 21:37 UTC (permalink / raw) To: Christian Couder Cc: git, Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder [-- Attachment #1: Type: text/plain, Size: 3780 bytes --] On 2025-10-01 at 14:02:50, Christian Couder wrote: > +[[ai]] > +=== Use of Artificial Intelligence (AI) > + > +The Developer's Certificate of Origin requires contributors to certify > +that they know the origin of their contributions to the project and > +that they have the right to submit it under the project's license. > +It's not yet clear that this can be legally satisfied when submitting > +significant amount of content that has been generated by AI tools. Perhaps we'd like to write this: It's not yet clear that this can be legally satisfied when submitting a significant amount of content that has been generated by AI tools, so we cannot accept this content in our project. If we're going to have a policy, we need to be direct about it and not let people draw their own conclusions. Many people don't have English as a first language and we don't want people trying to language lawyer. We could say something like this: Please do not sign off your work if you’re using an LLM to contribute unless you have included copyright and license information for all the code used in that LLM. This allows the possibility that, say, Google trains an LLM entirely on their own code, such that there is only one copyright holder and they can license it as they see fit. I don't think we _need_ to consider that case if we don't want to allow that (say, for code quality reasons), but we could if we wanted to. > +Another issue with AI generated content is that AIs still often > +hallucinate or just produce bad code, commit messages, documentation > +or output, even when you point out their mistakes. > + > +To avoid these issues, we will reject anything that looks AI > +generated, that sounds overly formal or bloated, that looks like AI > +slop, that looks good on the surface but makes no sense, or that > +senders don’t understand or cannot explain. I've definitely seen this. LLMs also typically do not write nice, logical, bisectable commits, which I personally dislike as a reviewer. > +We strongly recommend using AI tools carefully and responsibly. I think this is maybe not definitive enough. If we don't believe it's possible to sign-off when code is generated using LLMs, then we should say definitively, "Contributors may not use AI to write contributions to Git," or something similarly clear. Right now, this sounds too ambiguous and it might allow someone to write substantial code that they think is of good quality using an LLM because in their view that's careful and responsible, when we don't think that users can sign off on that and therefore that's not possible. Telling people to use tools "carefully and responsibly" is like telling people to drive at "a reasonable and prudent speed" without further qualification and then being surprised when they go 200 km/hr down the road.
I'd like to see the language be more like our code of conduct in that it is broad and covers a wide variety of behaviour but also explicitly states what is and is not acceptable to avoid ambiguity, confusion, or argument. > +Contributors would often benefit more from AI by using it to guide and > +help them step by step towards producing a solution by themselves > +rather than by asking for a full solution that they would then mostly > +copy-paste. They can also use AI to help with debugging, or with > +checking for obvious mistakes, things that can be improved, things > +that don’t match our style, guidelines or our feedback, before sending > +it to us. This kind of use I feel is less objectionable. I think it might be acceptable to use an LLM as a guide, a linter, or a first-pass code review. -- brian m. carlson (they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-01 21:37 ` brian m. carlson @ 2025-10-03 14:25 ` Christian Couder 2025-10-03 20:48 ` Elijah Newren 1 sibling, 0 replies; 34+ messages in thread From: Christian Couder @ 2025-10-03 14:25 UTC (permalink / raw) To: brian m. carlson, Christian Couder, git, Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder On Wed, Oct 1, 2025 at 11:37 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > On 2025-10-01 at 14:02:50, Christian Couder wrote: > > +[[ai]] > > +=== Use of Artificial Intelligence (AI) > > + > > +The Developer's Certificate of Origin requires contributors to certify > > +that they know the origin of their contributions to the project and > > +that they have the right to submit it under the project's license. > > +It's not yet clear that this can be legally satisfied when submitting > > +significant amount of content that has been generated by AI tools. > > Perhaps we'd like to write this: > > It's not yet clear that this can be legally satisfied when submitting > significant amount of content that has been generated by AI tools, > so we cannot accept this content in our project. > > If we're going to have a policy, we need to be direct about it and not > let people draw their own conclusions. Many people don't have English > as a first language and we don't want people trying to language lawyer. I understand why you want to be direct, but unfortunately (or fortunately depending on your point of view) some generated content is acceptable if it is not too big, if it is specific enough, or if a human has been involved enough. In a number of cases, such as translated or reworded content, line wrapping, refactored code, or renamed variables, it is likely that a significant amount of content is acceptable because a human has already been involved and the content is specific enough. If we say right away that we cannot accept it, we might prevent interesting and useful use cases. > We could say something like this: > > Please do not sign off your work if you’re using an LLM to contribute > unless you have included copyright and license information for all the > code used in that LLM. For now I don't think we want or need to be involved in checking or trying to check what code and/or training data has been/is used in an LLM, what LLM(s) are used in which AI tools, all the AI tools that a user might have used, etc. See my reply to Chuck Wolber's review related to declare-ai.org. > This allows the possibility that, say, Google trains an LLM entirely on > their own code, such that there is only one copyright holder and they > can license it as they see fit. I don't think we _need_ to consider > that case if we don't want to allow that (say, for code quality > reasons), but we could if we wanted to. I agree it would be nice if some LLMs were trained only on specific code (or on no existing code at all) so that we could alleviate the legal issue with them, but for now I don't think they exist. We can always adapt later if/when they ever appear. > > +Another issue with AI generated content is that AIs still often > > +hallucinate or just produce bad code, commit messages, documentation
> > +or output, even when you point out their mistakes. > > + > > +To avoid these issues, we will reject anything that looks AI > > +generated, that sounds overly formal or bloated, that looks like AI > > +slop, that looks good on the surface but makes no sense, or that > > +senders don’t understand or cannot explain. > > I've definitely seen this. LLMs also typically do not write nice, > logical, bisectable commits, which I personally dislike as a reviewer. > > > +We strongly recommend using AI tools carefully and responsibly. > > I think this is maybe not definitive enough. If we don't believe it's > possible to sign-off when code is generated using LLMs, then we should > say definitively, "Contributors may not use AI to write contributions to > Git," or something similarly clear. I think it's far too restrictive for no good reason. See above and see my discussion about this with Junio on the first version of this patch he sent last July. > Right now, this sounds too ambiguous and it might allow someone to write > substantial code that they think is of good quality using an LLM because > in their view that's careful and responsible, when we don't think that > users can sign off on that and therefore that's not possible. Telling > people to use tools "carefully and responsibly" is like telling people > to drive "a reasonable and prudent speed" without further qualification > and then being surprised when they go 200 km/hr down the road. The sentence ("We strongly recommend using AI tools carefully and responsibly.") is designed to make people pause and think a bit when they are reading mechanically or just skimming the doc. It's not designed to set a clear limit on what is acceptable and what is not. And in fact it couldn't do so because there is no such clear limit. > I'd like to see the language be more like our code of conduct in that it > is broad and covers a wide variety of behaviour but also explicitly > states what is and is not acceptable to avoid ambiguity, confusion, or > argument. Feel free to make more suggestions. I don't think your goal is easy to achieve though. > > +Contributors would often benefit more from AI by using it to guide and > > +help them step by step towards producing a solution by themselves > > +rather than by asking for a full solution that they would then mostly > > +copy-paste. They can also use AI to help with debugging, or with > > +checking for obvious mistakes, things that can be improved, things > > +that don’t match our style, guidelines or our feedback, before sending > > +it to us. > > This kind of use I feel is less objectionable. I think it might be > acceptable to use an LLM as a guide, a linter, or a first-pass code > review. Yeah, it looks like we all agree on that. The issue is that the line between these acceptable kinds of use and other problematic ones is fuzzy. Thanks. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-01 21:37 ` brian m. carlson 2025-10-03 14:25 ` Christian Couder @ 2025-10-03 20:48 ` Elijah Newren 2025-10-03 22:20 ` brian m. carlson 2025-10-08 7:30 ` Christian Couder 1 sibling, 2 replies; 34+ messages in thread From: Elijah Newren @ 2025-10-03 20:48 UTC (permalink / raw) To: brian m. carlson, Christian Couder, git, Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder On Wed, Oct 1, 2025 at 2:37 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > On 2025-10-01 at 14:02:50, Christian Couder wrote: > > +[[ai]] > > +=== Use of Artificial Intelligence (AI) > > + > > +The Developer's Certificate of Origin requires contributors to certify > > +that they know the origin of their contributions to the project and > > +that they have the right to submit it under the project's license. > > +It's not yet clear that this can be legally satisfied when submitting > > +significant amount of content that has been generated by AI tools. > > Perhaps we'd like to write this: > > It's not yet clear that this can be legally satisfied when submitting > significant amount of content that has been generated by AI tools, > so we cannot accept this content in our project. > > If we're going to have a policy, we need to be direct about it and not > let people draw their own conclusions. Many people don't have English > as a first language and we don't want people trying to language lawyer. > > We could say something like this: > > Please do not sign off your work if you’re using an LLM to contribute > unless you have included copyright and license information for all the > code used in that LLM. Would this mean that you wanted to ban contributions like d12166d3c8bb (Merge branch 'en/docfixes', 2023-10-23), available on the list over at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/ ? We don't need to go theoretical: I've already contributed such a patch series before -- 2 years ago -- and it was merged. Granted, that was entirely documentation, and I called out the usage of AI in the cover letter, and I manually checked every change (discarding many of them) and split it into commits on my own, and could easily explain any change and why it was good, etc. And I was upfront about all of it. If any use of AI is bad, do we need to revert that series? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-03 20:48 ` Elijah Newren @ 2025-10-03 22:20 ` brian m. carlson 2025-10-06 17:45 ` Junio C Hamano ` (2 more replies) 2025-10-08 7:30 ` Christian Couder 1 sibling, 3 replies; 34+ messages in thread From: brian m. carlson @ 2025-10-03 22:20 UTC (permalink / raw) To: Elijah Newren Cc: Christian Couder, git, Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder [-- Attachment #1: Type: text/plain, Size: 4925 bytes --] On 2025-10-03 at 20:48:40, Elijah Newren wrote: > Would this mean that you wanted to ban contributions like d12166d3c8bb > (Merge branch 'en/docfixes', 2023-10-23), available on the list over > at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/ > ? We don't need to go theoretical, I've already contributed such a > patch series before -- 2 years ago -- and it was merged. Granted, > that was entirely documentation, and I called out the usage of AI in > the cover letter, and I manually checked every change (discarding many > of them) and split it into commits on my own, could easily explain any > change and why it was good, etc. And I was upfront about all of it. I think the main problem here is that we don't know the copyright status of LLM outputs. It is not uncommon for them to produce output that reflects their training input and we see evidence of that in, for instance, the New York Times lawsuit against OpenAI. As I said, the situation is very unclear legally, with active litigation in multiple countries, and we have to comply with pretty much every country's laws in this situation. Whether something is legal in the United States, where you're located, is completely irrelevant to whether it is legal in Canada, where I'm located, or Germany or the UK, where we have other contributors. We also have to consider whether it's legal in all of the countries that Git is distributed in, which includes every country in which Debian has a mirror[0], even countries under international sanctions, such as Iran, Russia, and Belarus. It doesn't matter if the person using AI has indemnification, either, since that only covers civil matters, and at least in the U.S. and Canada, knowingly violating copyright is also a criminal offence. The sign-off process is designed to clearly state that a person has the ability to contribute code under the license and I don't think, as things stand, it's possible to make that assertion with code or documentation generated from an LLM except in very limited circumstances. I don't allow LLM-generated code in my personal projects that require sign-off for that reason, and neither does QEMU[1]. I don't think I could honestly assert either (a) or (b) in the DCO with LLM-generated code because it's not clear to me whether "I have the right to submit it under the…license." To quote the QEMU policy: To satisfy the DCO, the patch contributor has to fully understand the copyright and license status of content they are contributing to QEMU. With AI content generators, the copyright and license status of the output is ill-defined with no generally accepted, settled legal foundation. Where the training material is known, it is common for it to include large volumes of material under restrictive licensing/copyright terms. Even where the training material is all known to be under open source licenses, it is likely to be under a variety of terms, not all of which will be compatible with QEMU's licensing requirements. 
I remember the SCO situation with Linux and how it really created a lot of uncertainty with Linux because SCO created FUD around Linux licensing and how that led to the DCO being created. I am aware of the fact that many open source contributors are very unhappy that their code has been used to train LLMs without retaining credits and copyright notices or honouring the license terms[2]. And I have spent many years working with non-profits[3], where I have always been taught that we should avoid even the appearance of impropriety. It may matter less what the situation actually ends up being legally (although it could end up being quite bad) and more whether someone can imply or suggest that Git is not being distributed in compliance with the license or contains infringing code, which could effectively make it undistributable because nobody wants to take that risk. And litigation, even if Git and its contributors are successful, can be extraordinarily expensive. So I think, given the circumstances, yes, the right thing to do is to ban LLM-generated contributions with a policy very similar or identical to QEMU's. If, in the future, the legal situation changes and it becomes unambiguously legal to use LLMs across the world, then we can reconsider that policy then. [0] https://www.debian.org/mirror/list [1] https://github.com/qemu/qemu/commit/3d40db0efc22520fa6c399cf73960dced423b048 [2] Regardless of the legal concerns, this implicates professional ethics concerns, such as §1.5 of the ACM Code of Ethics[4]. Ethics requirements usually go well beyond what the law requires. [3] Software Freedom Conservancy, which handles legal matters for the Git project, is a non-profit. [4] https://www.acm.org/code-of-ethics -- brian m. carlson (they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-03 22:20 ` brian m. carlson @ 2025-10-06 17:45 ` Junio C Hamano 2025-10-08 4:18 ` Elijah Newren 2025-10-08 9:28 ` Christian Couder 2025-10-08 4:18 ` Elijah Newren 2025-10-08 8:37 ` Christian Couder 2 siblings, 2 replies; 34+ messages in thread From: Junio C Hamano @ 2025-10-06 17:45 UTC (permalink / raw) To: brian m. carlson Cc: Elijah Newren, Christian Couder, git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder "brian m. carlson" <sandals@crustytoothpaste.net> writes: > It may matter less what the situation actually ends up being legally > (although it could end up being quite bad) and more whether someone can > imply or suggest that Git is not being distributed in compliance with > the license or contains infringing code, which could effectively make it > undistributable because nobody wants to take that risk. And litigation, > even if Git and its contributors are successful, can be extraordinarily > expensive. > > So I think, given the circumstances, yes, the right thing to do is to > ban LLM-generated contributions with a policy very similar or identical > to QEMU's. If, in the future, the legal situation changes and it > becomes unambiguously legal to use LLMs across the world, then we can > reconsider that policy then. OK, so here is theirs for further discussion minimally adjusted for our use. I do not see much difference at least in spirit with what started this thread, but phrasing is certainly firmer, and I have no problem with it. Use of AI content generators ~~~~~~~~~~~~~~~~~~~~~~~~~~~ TL;DR: **Current Git project policy, copied from what QEMU does, is to DECLINE any contributions which are believed to include or derive from AI generated content. This includes ChatGPT, Claude, Copilot, Llama and similar tools.** The increasing prevalence of AI-assisted software development results in a number of difficult legal questions and risks for software projects, including Git. Of particular concern is content generated by `Large Language Models <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs). The Git community requires that contributors certify their patch submissions are made in accordance with the rules of the `Developer's Certificate of Origin (DCO) <dco>`. To satisfy the DCO, the patch contributor has to fully understand the copyright and license status of content they are contributing to Git. With AI content generators, the copyright and license status of the output is ill-defined with no generally accepted, settled legal foundation. Where the training material is known, it is common for it to include large volumes of material under restrictive licensing/copyright terms. Even where the training material is all known to be under open source licenses, it is likely to be under a variety of terms, not all of which will be compatible with Git's licensing requirements. How contributors could comply with DCO terms (b) or (c) for the output of AI content generators commonly available today is unclear. The Git project is not willing or able to accept the legal risks of non-compliance. The Git project thus requires that contributors refrain from using AI content generators on patches intended to be submitted to the project, and will decline any contribution if use of AI is either known or suspected.
This policy does not apply to other uses of AI, such as researching APIs or algorithms, static analysis, or debugging, provided their output is not to be included in contributions. Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content generation agents which are built on top of such tools. This policy may evolve as AI tools mature and the legal situation is clarified. In the meanwhile, requests for exceptions to this policy will be evaluated by the Git project on a case by case basis. To be granted an exception, a contributor will need to demonstrate clarity of the license and copyright status for the tool's output in relation to its training model and code, to the satisfaction of the project maintainers. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-06 17:45 ` Junio C Hamano @ 2025-10-08 4:18 ` Elijah Newren 2025-10-12 15:07 ` Junio C Hamano 2025-10-08 9:28 ` Christian Couder 1 sibling, 1 reply; 34+ messages in thread From: Elijah Newren @ 2025-10-08 4:18 UTC (permalink / raw) To: Junio C Hamano Cc: brian m. carlson, Christian Couder, git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder On Mon, Oct 6, 2025 at 10:45 AM Junio C Hamano <gitster@pobox.com> wrote: > > "brian m. carlson" <sandals@crustytoothpaste.net> writes: > > > It may matter less what the situation actually ends up being legally > > (although it could end up being quite bad) and more whether someone can > > imply or suggest that Git is not being distributed in compliance with > > the license or contains infringing code, which could effectively make it > > undistributable because nobody wants to take that risk. And litigation, > > even if Git and its contributors are successful, can be extraordinarily > > expensive. > > > > So I think, given the circumstances, yes, the right thing to do is to > > ban LLM-generated contributions with a policy very similar or identical > > to QEMU's. If, in the future, the legal situation changes and it > > becomes unambiguously legal to use LLMs across the world, then we can > > reconsider that policy then. > > OK, so here is theirs for further discussion minimally adjusted for > our use. I do not see much difference at least in spirit with what > started this thread, but phrasing is certainly firmer, and I have no > problem with it. > > > > Use of AI content generators > ~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > TL;DR: > > **Current Git project policy is copied from what QEMU does. To > DECLINE any contributions which are believed to include or derive > from AI generated content. This includes ChatGPT, Claude, Copilot, > Llama and similar tools.** > > The increasing prevalence of AI-assisted software development results in a > number of difficult legal questions and risks for software projects, including > Git. Of particular concern is content generated by `Large Language Models > <https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs). > > The Git community requires that contributors certify their patch submissions > are made in accordance with the rules of the `Developer's Certificate of > Origin (DCO) <dco>`. > > To satisfy the DCO, the patch contributor has to fully understand the > copyright and license status of content they are contributing to Git. With AI > content generators, the copyright and license status of the output is > ill-defined with no generally accepted, settled legal foundation. > > Where the training material is known, it is common for it to include large > volumes of material under restrictive licensing/copyright terms. Even where > the training material is all known to be under open source licenses, it is > likely to be under a variety of terms, not all of which will be compatible > with Git's licensing requirements. > > How contributors could comply with DCO terms (b) or (c) for the output of AI > content generators commonly available today is unclear. The Git project is > not willing or able to accept the legal risks of non-compliance. > > The Git project thus requires that contributors refrain from using AI content > generators on patches intended to be submitted to the project, and will > decline any contribution if use of AI is either known or suspected. 
> > This policy does not apply to other uses of AI, such as researching APIs or > algorithms, static analysis, or debugging, provided their output is not to be > included in contributions. > > Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's > ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content > generation agents which are built on top of such tools. > > This policy may evolve as AI tools mature and the legal situation is > clarifed. In the meanwhile, requests for exceptions to this policy will be > evaluated by the Git project on a case by case basis. To be granted an > exception, a contributor will need to demonstrate clarity of the license and > copyright status for the tool's output in relation to its training model and > code, to the satisfaction of the project maintainers. I preferred the version Christian sent, but *if* we end up adopting some of the QEMU wording, I've got a logistics question: Will we grandfather already accepted series, or proactively revert them? For example, the series merged at d12166d3c8bb (Merge branch 'en/docfixes', 2023-10-23) [or on the list at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/ ], which was already merged a few years ago. I don't think that series has anything remotely questionable from a copyright standpoint, yet the QEMU-inspired wording would explicitly disallow it as far as I can tell, and would claim that such kinds of things would never be accepted in our project, even though people can find and point to the fact that we already did. Would that be problematic? Of course, if we don't adopt the QEMU wording and go with Christian's version, then we don't need to worry about whether to revert or explain how it is grandfathered. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-08 4:18 ` Elijah Newren @ 2025-10-12 15:07 ` Junio C Hamano 0 siblings, 0 replies; 34+ messages in thread From: Junio C Hamano @ 2025-10-12 15:07 UTC (permalink / raw) To: Elijah Newren Cc: brian m. carlson, Christian Couder, git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder Elijah Newren <newren@gmail.com> writes: >> ... >> This policy may evolve as AI tools mature and the legal situation is >> clarifed. In the meanwhile, requests for exceptions to this policy will be >> evaluated by the Git project on a case by case basis. To be granted an >> exception, a contributor will need to demonstrate clarity of the license and >> copyright status for the tool's output in relation to its training model and >> code, to the satisfaction of the project maintainers. > > I preferred the version Christian sent, but *if* we end up adopting > some of the QEMU wording, I've got a logistics question: > > Will we grandfather already accepted series, or proactively revert them? Stepping back a bit, can we treat this new guideline element just like any other guidelines in SubmittingPatches and also CodingGuidelines? We have certain rules in our SubmittingPatches and CodingGuidelines to help us not get into trouble in the future. We require the log messages to follow a certain style to give them uniformity, as otherwise it would become harder to dig the history later to find the cause of an issue we are having today, and more importantly what the design parameters were back when the change we are having trouble with was written. We ask people to follow a certain style in the code as it would make it more work to understand code if different styles are mixed together without reason. But we also frown upon churning the codebase for the sake of strictly matching the prescribed coding style. The rules are mostly to control newly written things so that they do not leave our codebase in worse shape than it currently is. When we update a part of our codebase for some reason, other than "there is no particular reason but we want to fix them to match guidelines", we would take existing guideline violations the touched part may have into account, of course. And we find no need in our other non-AI guidelines to say "we grandfather badness that already exists, but we try our best to enforce the guidelines as strictly as possible", and the reason, I think, is because that is implicitly what everybody expects. Should the "We tell you again not to blindly add things with unknown origin, given the recent proliferation of AI coding products" rule be any different? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-06 17:45 ` Junio C Hamano 2025-10-08 4:18 ` Elijah Newren @ 2025-10-08 9:28 ` Christian Couder 2025-10-13 18:14 ` Junio C Hamano 1 sibling, 1 reply; 34+ messages in thread From: Christian Couder @ 2025-10-08 9:28 UTC (permalink / raw) To: Junio C Hamano Cc: brian m. carlson, Elijah Newren, git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder On Mon, Oct 6, 2025 at 7:45 PM Junio C Hamano <gitster@pobox.com> wrote: > OK, so here is theirs for further discussion minimally adjusted for > our use. I do not see much difference at least in spirit with what > started this thread, but phrasing is certainly firmer, and I have no > problem with it. I don't think it's a good idea to be too firm. It could prevent people willing to follow the rules from doing things that are actually acceptable, while it won't prevent the risks coming from people who don't follow the rules anyway. Some of us have given examples of some uses that are likely acceptable but seem to be banned by such firm wording. Do we want to discuss again if translating a commit message using an AI tool is fine or not? So I think we should start with something less firm, and then discuss the pros and cons of being firmer if some insist on it. [...] > How contributors could comply with DCO terms (b) or (c) for the output of AI > content generators commonly available today is unclear. The Git project is > not willing or able to accept the legal risks of non-compliance. I think this could be understood to mean that the Git project is responsible for contributors submitting content they should not submit. I don't think we should go into this. [...] > This policy does not apply to other uses of AI, such as researching APIs or > algorithms, static analysis, or debugging, provided their output is not to be > included in contributions. This is not realistic. If an AI does static analysis for example, it is likely to suggest a fix for the issues it finds. Hopefully the fix will be the right one, so it will end up being included in the contributions. > Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's s/includes/include/ > ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content > generation agents which are built on top of such tools. I don't think we should list examples like this. It could be understood to mean that we ban such tools while they can help with static analysis, typo fixing, translation, etc... On the other hand some IDEs, for example, might include AI tools without users being really aware of them. > This policy may evolve as AI tools mature and the legal situation is > clarifed. In the meanwhile, requests for exceptions to this policy will be > evaluated by the Git project on a case by case basis. I don't think we want to go into such processes. > To be granted an > exception, a contributor will need to demonstrate clarity of the license and > copyright status for the tool's output in relation to its training model and > code, to the satisfaction of the project maintainers. If there are ever such AI tools trained on material such that the legal risk is reduced, we will likely know about it. And even though the legal risk will be reduced, the risk of being flooded with bad output might not. So I don't think it's worth getting into this. Thanks. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-08 9:28 ` Christian Couder @ 2025-10-13 18:14 ` Junio C Hamano 2025-10-23 17:32 ` Junio C Hamano 0 siblings, 1 reply; 34+ messages in thread From: Junio C Hamano @ 2025-10-13 18:14 UTC (permalink / raw) To: Christian Couder Cc: brian m. carlson, Elijah Newren, git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder Christian Couder <christian.couder@gmail.com> writes: > On Mon, Oct 6, 2025 at 7:45 PM Junio C Hamano <gitster@pobox.com> wrote: > >> OK, so here is theirs for further discussion minimally adjusted for >> our use. I do not see much difference at least in spirit with what >> started this thread, but phrasing is certainly firmer, and I have no >> problem with it. > > I don't think it's a good idea to be too firm. It could prevent people > willing to follow the rules from doing things that are actually > acceptable, while it won't prevent the risks coming from people who > don't follow the rules anyway. >> How contributors could comply with DCO terms (b) or (c) for the output of AI >> content generators commonly available today is unclear. The Git project is >> not willing or able to accept the legal risks of non-compliance. > > I think this could be understood to mean that the Git project is responsible > for contributors submitting content they should not submit. I don't > think we should go into this. When the project distributes work that it has no right to distribute, those who claim to be right holders would try to hold the project responsible for it. Whether the court agrees is a different story. > [...] > >> This policy does not apply to other uses of AI, such as researching APIs or >> algorithms, static analysis, or debugging, provided their output is not to be >> included in contributions. > > This is not realistic. If an AI does static analysis for example, it > is likely to suggest a fix for the issues it finds. Hopefully the fix > will be the right one, so it will end up being included in the > contributions. > >> Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's > > s/includes/include/ We are not in the business of typofixing QEMU policy. Send that patch in their direction ;-). I do not have a strong preference either way. Even if the wording is firm, it is really up to each contributor to honor the guideline and be honest with us. You may see autocorrection in your editor fix a typo for you, and more advanced tools may offer to rewrite what you wrote, whether it is prose or code. It is very plausible that, especially for simple fixes, the result may be what the contributor would have arrived at on their own anyway, and in such a case, the contributor would not even know how much came from "AI" or a simple dictionary, or whether that AI learned from things it should not have seen. So, I do not think it makes too big a difference in practice whether we adopt the QEMU text with minimum rewrite, or the version you posted. As the one you sent is in line with what we give applicants of our mentoring programs, and it was read over by our SFC lawyer, I'd prefer to keep the version I already have in my tree. Not adopting either one, I think, is worse than adopting either of them in this case. Thanks. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-13 18:14 ` Junio C Hamano @ 2025-10-23 17:32 ` Junio C Hamano 0 siblings, 0 replies; 34+ messages in thread From: Junio C Hamano @ 2025-10-23 17:32 UTC (permalink / raw) To: Christian Couder Cc: brian m. carlson, Elijah Newren, git, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder Junio C Hamano <gitster@pobox.com> writes: > I do not have a strong preference either way. Even if the wording is > firm, it is really up to each contributor to honor the guideline and > be honest with us. You may see autocorrection in your editor fix a > typo for you, and more advanced tools may offer to rewrite what you > wrote, whether it is prose or code. It is very plausible that, > especially for simple fixes, the result may be what the contributor > would have arrived at on their own anyway, and in such a case, the > contributor would not even know how much came from "AI" or a simple > dictionary, or whether that AI learned from things it should not have > seen. > > So, I do not think it makes too big a difference in practice whether > we adopt the QEMU text with minimum rewrite, or the version you posted. > As the one you sent is in line with what we give applicants of our > mentoring programs, and it was read over by our SFC lawyer, I'd > prefer to keep the version I already have in my tree. Not adopting > either one, I think, is worse than adopting either of them in this case. Taking time to discuss before deciding on an important issue is one thing, but waiting for more input to happen and not moving in either direction is worse than picking one and moving on. As I said above, I do not quite see a material difference between the two in practice. I guess it is time to make an executive decision to merge it down to 'next'. We can still tweak the language if we want, but it is more important to have a written policy to reject materials of unknown origin (whether it came from generative AI or not) than not having one while we wait for a better argument to come from somewhere in the hope of picking the best policy. As to Elijah's concern about grandfathering, I do not think it has much practical benefit to make such a declaration. If it turns out that older "contributions" had added something we shouldn't have, regardless of how it was generated (either from generative AI or a human contributor typing while unconsciously recalling what they saw elsewhere), we may need to revert it anyway, so we will deal with it when it becomes an issue. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-03 22:20 ` brian m. carlson 2025-10-06 17:45 ` Junio C Hamano @ 2025-10-08 4:18 ` Elijah Newren 2025-10-08 8:37 ` Christian Couder 2 siblings, 0 replies; 34+ messages in thread From: Elijah Newren @ 2025-10-08 4:18 UTC (permalink / raw) To: brian m. carlson, Elijah Newren, Christian Couder, git, Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder On Fri, Oct 3, 2025 at 3:20 PM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > On 2025-10-03 at 20:48:40, Elijah Newren wrote: > > Would this mean that you wanted to ban contributions like d12166d3c8bb > > (Merge branch 'en/docfixes', 2023-10-23), available on the list over > > at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/ > > ? We don't need to go theoretical, I've already contributed such a > > patch series before -- 2 years ago -- and it was merged. Granted, > > that was entirely documentation, and I called out the usage of AI in > > the cover letter, and I manually checked every change (discarding many > > of them) and split it into commits on my own, could easily explain any > > change and why it was good, etc. And I was upfront about all of it. > > I think the main problem here is that we don't know the copyright > status of LLM outputs. It is not uncommon for them to produce output > that reflects their training input and we see evidence of that in, for > instance, the New York Times lawsuit against OpenAI. > > As I said, the situation is very unclear legally, with active litigation > in multiple countries, and we have to comply with pretty much every > country's laws in this situation. Whether something is legal in the > United States, where you're located, is completely irrelevant to whether > it is legal in Canada, where I'm located, or Germany or the UK, where we > have other contributors. We also have to consider whether it's legal in > all of the countries that Git is distributed in, which includes every > country in which Debian has a mirror[0], even countries under > international sanctions, such as Iran, Russia, and Belarus. > > It doesn't matter if the person using AI has indemnification, either, > since that only covers civil matters, and at least in the U.S. and > Canada, knowingly violating copyright is also a criminal offence. > > The sign-off process is designed to clearly state that a person has the > ability to contribute code under the license and I don't think, as > things stand, it's possible to make that assertion with code or > documentation generated from an LLM except in very limited > circumstances. I don't allow LLM-generated code in my personal projects > that require sign-off for that reason, and neither does QEMU[1]. I > don't think I could honestly assert either (a) or (b) in the DCO with > LLM-generated code because it's not clear to me whether "I have the > right to submit it under the…license." > > To quote the QEMU policy: > > To satisfy the DCO, the patch contributor has to fully understand the > copyright and license status of content they are contributing to QEMU. With AI > content generators, the copyright and license status of the output is > ill-defined with no generally accepted, settled legal foundation. > > Where the training material is known, it is common for it to include large > volumes of material under restrictive licensing/copyright terms. 
> Even where > the training material is all known to be under open source licenses, it is > likely to be under a variety of terms, not all of which will be compatible > with QEMU's licensing requirements. > > I remember the SCO situation with Linux and how it really created a lot > of uncertainty with Linux because SCO created FUD around Linux licensing > and how that led to the DCO being created. I am aware of the fact that > many open source contributors are very unhappy that their code has been > used to train LLMs without retaining credits and copyright notices or > honouring the license terms[2]. And I have spent many years working > with non-profits[3], where I have always been taught that we should > avoid even the appearance of impropriety. > > It may matter less what the situation actually ends up being legally > (although it could end up being quite bad) and more whether someone can > imply or suggest that Git is not being distributed in compliance with > the license or contains infringing code, which could effectively make it > undistributable because nobody wants to take that risk. And litigation, > even if Git and its contributors are successful, can be extraordinarily > expensive. > > So I think, given the circumstances, yes, the right thing to do is to > ban LLM-generated contributions with a policy very similar or identical > to QEMU's. If, in the future, the legal situation changes and it > becomes unambiguously legal to use LLMs across the world, then we can > reconsider that policy then. > > [0] https://www.debian.org/mirror/list > [1] https://github.com/qemu/qemu/commit/3d40db0efc22520fa6c399cf73960dced423b048 > [2] Regardless of the legal concerns, this implicates professional > ethics concerns, such as §1.5 of the ACM Code of Ethics[4]. Ethics > requirements usually go well beyond what the law requires. > [3] Software Freedom Conservancy, which handles legal matters for the > Git project, is a non-profit. > [4] https://www.acm.org/code-of-ethics Thanks for clarifying your position. To me, your preferred wording for the position statement doesn't quite match the rationale. Consider cases such as: * fixing typos * finding wording tweaks to existing documentation * tab completion of e.g. the next three lines in an IDE when limited to e.g. what most any engineer in the world would write based on the comment on the line before (or if the AI plugin doesn't quite get the three lines right, well I already had them in my head and if it gets close enough, it's easier for me to accept and then edit into what I already knew I wanted) * assisting with wording in writing a commit message as an editor (or maybe even suggesting some initial wording based on the patch I already wrote) * identifying potential bugs in a patch * identifying potential typos in documentation I think none of these particular uses causes problems for the rationale you specify, but at least the first four would be disallowed by the preferred wording you want, and perhaps even the last two wouldn't be allowed either (though I don't think AI is very good at the second to last one, so not a big loss on that particular one yet). Perhaps, due to my incomplete understanding of copyright, all of these would actually be problematic under the rationale you already gave, for reasons I don't yet know about or just haven't yet understood; but if not, I'd rather not disallow these kinds of uses.
The first two from my list have a good example in the form of the series at d12166d3c8bb (Merge branch 'en/docfixes', 2023-10-23) [or on the list at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/ ], which was already merged a few years ago. So if we adopt wording that disallows these kinds of changes, then we also need to talk about whether we grandfather already-merged series or proactively revert them. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI 2025-10-03 22:20 ` brian m. carlson 2025-10-06 17:45 ` Junio C Hamano 2025-10-08 4:18 ` Elijah Newren @ 2025-10-08 8:37 ` Christian Couder 2025-10-08 9:28 ` Michal Suchánek 2025-10-09 1:13 ` Collin Funk 2 siblings, 2 replies; 34+ messages in thread From: Christian Couder @ 2025-10-08 8:37 UTC (permalink / raw) To: brian m. carlson, Elijah Newren, Christian Couder, git, Junio C Hamano, Taylor Blau, Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt, Christian Couder On Sat, Oct 4, 2025 at 12:20 AM brian m. carlson <sandals@crustytoothpaste.net> wrote: > > On 2025-10-03 at 20:48:40, Elijah Newren wrote: > > Would this mean that you wanted to ban contributions like d12166d3c8bb > > (Merge branch 'en/docfixes', 2023-10-23), available on the list over > > at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/ > > ? We don't need to go theoretical: I've already contributed such a > > patch series before -- 2 years ago -- and it was merged. Granted, > > that was entirely documentation, and I called out the usage of AI in > > the cover letter, and I manually checked every change (discarding many > > of them) and split it into commits on my own, and could easily explain any > > change and why it was good, etc. And I was upfront about all of it. > > I think the main problem here is that we don't know the copyright > status of LLM outputs. It's very unlikely that whatever is decided about the copyright status of LLM outputs will fundamentally change copyright law. So for example small changes, or changes where a human has been involved a lot, or changes that are very specific, and so on, are very likely acceptable. > It is not uncommon for them to produce output > that reflects their training input and we see evidence of that in, for > instance, the New York Times lawsuit against OpenAI. You might say something very similar about people contributing proprietary code: "It is not uncommon to have people copy-paste some proprietary code into an open source project and we see evidence of that in such and such incidents." So it's just fine to accept some degree of risk. We have to accept it anyway. Saying "we will ban everything AI generated" will not make the risk disappear either. > As I said, the situation is very unclear legally, with active litigation > in multiple countries, and we have to comply with pretty much every > country's laws in this situation. Whether something is legal in the > United States, where you're located, is completely irrelevant to whether > it is legal in Canada, where I'm located, or Germany or the UK, where we > have other contributors. We also have to consider whether it's legal in > all of the countries that Git is distributed in, which includes every > country in which Debian has a mirror[0], even countries under > international sanctions, such as Iran, Russia, and Belarus. I don't quite agree with this. Theoretically, if the official mirrors are only in a few countries, then only the laws in these few countries (+ US law as the Conservancy is US based) might be really legally relevant for the project. Then it's the responsibility of distributions or people cloning/downloading the software to check that it's legal in the countries where they distribute or clone/download it.
In practice we should take some care not to create obvious legal
problems for too many people, but if some countries decide to adopt
unreasonable laws that ban far too much, we could decide not to pay
attention to those laws.

> It doesn't matter if the person using AI has indemnification, either,
> since that only covers civil matters, and at least in the U.S. and
> Canada, knowingly violating copyright is also a criminal offence.
>
> The sign-off process is designed to clearly state that a person has the
> ability to contribute code under the license and I don't think, as
> things stand, it's possible to make that assertion with code or
> documentation generated from an LLM except in very limited
> circumstances.

I think in practice those "very limited circumstances" can cover a lot
of different things, though. Do we really want to enter into a legal
debate over what https://en.wikipedia.org/wiki/Sc%C3%A8nes_%C3%A0_faire
means for software, for example? Or over allowing or disallowing
translation of documentation or commit messages based on whether the
tools used for translation rely on an LLM?

I have given many examples of what is very likely acceptable, and
Elijah has given a very good concrete example of why we should not
outright ban AI either. If you think they are not good examples,
please say so clearly. Otherwise I don't think you can keep saying
that they fall under "very limited circumstances".

> I don't allow LLM-generated code in my personal projects
> that require sign-off for that reason, and neither does QEMU[1]. I
> don't think I could honestly assert either (a) or (b) in the DCO with
> LLM-generated code because it's not clear to me whether "I have the
> right to submit it under the…license."
>
> To quote the QEMU policy:
>
> To satisfy the DCO, the patch contributor has to fully understand the
> copyright and license status of content they are contributing to QEMU. With AI
> content generators, the copyright and license status of the output is
> ill-defined with no generally accepted, settled legal foundation.
>
> Where the training material is known, it is common for it to include large
> volumes of material under restrictive licensing/copyright terms. Even where
> the training material is all known to be under open source licenses, it is
> likely to be under a variety of terms, not all of which will be compatible
> with QEMU's licensing requirements.

The QEMU policy was discussed in the previous version already.

> I remember the SCO situation with Linux and how it really created a lot
> of uncertainty with Linux because SCO created FUD around Linux licensing
> and how that led to the DCO being created. I am aware of the fact that
> many open source contributors are very unhappy that their code has been
> used to train LLMs without retaining credits and copyright notices or
> honouring the license terms[2].

I don't think that is very relevant to your position on this. On the
contrary, if LLMs have been trained mostly with open source code, then
if they produce copyrighted output, that output is more likely to be
compatible with the GPL. It has even been suggested (and discussed in
this thread) that some AIs should be trained only with open source
material (for example MIT licensed material?) so that we could stop
worrying about including it. If that happens, there would be no reason
to outright ban AI generated content, right?

> And I have spent many years working
> with non-profits[3], where I have always been taught that we should
> avoid even the appearance of impropriety.

Adding a section restricting AI use, even if it doesn't go as far as
you would like, is already a first step in the direction you want. If
this gets merged, you can always send patches on top to make it more
restrictive.

> It may matter less what the situation actually ends up being legally
> (although it could end up being quite bad) and more whether someone can
> imply or suggest that Git is not being distributed in compliance with
> the license or contains infringing code, which could effectively make it
> undistributable because nobody wants to take that risk. And litigation,
> even if Git and its contributors are successful, can be extraordinarily
> expensive.

There are already legal risks anyway (see above).

^ permalink raw reply	[flat|nested] 34+ messages in thread
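
(For readers following along: the "sign-off" debated above is
mechanically just a commit-message trailer that Git appends itself.
A minimal sketch using standard Git behavior; the identity and commit
message below are placeholders, not taken from this thread:)

    $ git config user.name "Jane Dev"            # placeholder identity
    $ git config user.email "jane@example.com"
    $ git commit -s -m "docs: fix a typo"
    # -s/--signoff appends a trailer built from user.name/user.email:
    #
    #     Signed-off-by: Jane Dev <jane@example.com>
    #
    # which asserts DCO 1.1 clauses (a)-(d); whether that assertion is
    # credible for LLM-generated changes is what this thread disputes.
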
* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-08  8:37       ` Christian Couder
@ 2025-10-08  9:28       ` Michal Suchánek
  2025-10-08  9:35         ` Christian Couder
  1 sibling, 1 reply; 34+ messages in thread
From: Michal Suchánek @ 2025-10-08 9:28 UTC (permalink / raw)
To: Christian Couder
Cc: brian m. carlson, Elijah Newren, git, Junio C Hamano, Taylor Blau,
    Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt,
    Christian Couder

Hello,

On Wed, Oct 08, 2025 at 10:37:53AM +0200, Christian Couder wrote:
> On Sat, Oct 4, 2025 at 12:20 AM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> >
> > I remember the SCO situation with Linux and how it really created a lot
> > of uncertainty with Linux because SCO created FUD around Linux licensing
> > and how that led to the DCO being created. I am aware of the fact that
> > many open source contributors are very unhappy that their code has been
> > used to train LLMs without retaining credits and copyright notices or
> > honouring the license terms[2].
>
> I don't think that is very relevant to your position on this. On the
> contrary, if LLMs have been trained mostly with open source code, then
> if they produce copyrighted output, that output is more likely to be
> compatible with the GPL. It has even been suggested (and discussed in
> this thread) that some AIs should be trained only with open source
> material (for example MIT licensed material?) so that we could stop
> worrying about including it. If that happens, there would be no reason
> to outright ban AI generated content, right?

Even the MIT license requires attribution. As most current-day LLMs
fail to provide that, their output is legally dubious even when
trained on fairly permissively licensed code.

Thanks

Michal

^ permalink raw reply	[flat|nested] 34+ messages in thread
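
(The attribution requirement Michal refers to is the
notice-preservation clause of the MIT license, quoted verbatim:)

    The above copyright notice and this permission notice shall be
    included in all copies or substantial portions of the Software.
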
* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-08  9:28       ` Michal Suchánek
@ 2025-10-08  9:35       ` Christian Couder
  0 siblings, 0 replies; 34+ messages in thread
From: Christian Couder @ 2025-10-08 9:35 UTC (permalink / raw)
To: Michal Suchánek
Cc: brian m. carlson, Elijah Newren, git, Junio C Hamano, Taylor Blau,
    Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt,
    Christian Couder

Hi,

On Wed, Oct 8, 2025 at 11:28 AM Michal Suchánek <msuchanek@suse.de> wrote:

> > I don't think that is very relevant to your position on this. On the
> > contrary, if LLMs have been trained mostly with open source code, then
> > if they produce copyrighted output, that output is more likely to be
> > compatible with the GPL. It has even been suggested (and discussed in
> > this thread) that some AIs should be trained only with open source
> > material (for example MIT licensed material?) so that we could stop
> > worrying about including it. If that happens, there would be no reason
> > to outright ban AI generated content, right?
>
> Even the MIT license requires attribution. As most current-day LLMs
> fail to provide that, their output is legally dubious even when
> trained on fairly permissively licensed code.

Fair enough, but if an AI is ever trained for the specific purpose of
producing code that can be included in MIT-compatible code bases, then
hopefully the people training it will make sure it can help with
properly attributing that code.

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-08  8:37       ` Christian Couder
  2025-10-08  9:28         ` Michal Suchánek
@ 2025-10-09  1:13       ` Collin Funk
  1 sibling, 0 replies; 34+ messages in thread
From: Collin Funk @ 2025-10-09 1:13 UTC (permalink / raw)
To: Christian Couder
Cc: brian m. carlson, Elijah Newren, git, Junio C Hamano, Taylor Blau,
    Rick Sanders, Git at SFC, Johannes Schindelin, Patrick Steinhardt,
    Christian Couder

Christian Couder <christian.couder@gmail.com> writes:

> On Sat, Oct 4, 2025 at 12:20 AM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
>>
>> On 2025-10-03 at 20:48:40, Elijah Newren wrote:
>> > Would this mean that you wanted to ban contributions like d12166d3c8bb
>> > (Merge branch 'en/docfixes', 2023-10-23), available on the list over
>> > at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
>> > ? We don't need to go theoretical, I've already contributed such a
>> > patch series before -- 2 years ago -- and it was merged. Granted,
>> > that was entirely documentation, and I called out the usage of AI in
>> > the cover letter, and I manually checked every change (discarding many
>> > of them) and split it into commits on my own, could easily explain any
>> > change and why it was good, etc. And I was upfront about all of it.
>>
>> I think the main problem here is that we don't know the copyright
>> status of LLM outputs.
>
> It's very unlikely that whatever is decided about the copyright status
> of LLM outputs will fundamentally change copyright law. So, for
> example, small changes, changes where a human has been heavily
> involved, changes that are very specific, and so on, are very likely
> acceptable.

The issue, as I understand it, is the lack of law. There has been zero
political will in the US for copyright legislation with respect to the
output of AI. We are therefore left with ongoing litigation and no
settled precedent.

>> I remember the SCO situation with Linux and how it really created a lot
>> of uncertainty with Linux because SCO created FUD around Linux licensing
>> and how that led to the DCO being created. I am aware of the fact that
>> many open source contributors are very unhappy that their code has been
>> used to train LLMs without retaining credits and copyright notices or
>> honouring the license terms[2].
>
> I don't think that is very relevant to your position on this. On the
> contrary, if LLMs have been trained mostly with open source code, then
> if they produce copyrighted output, that output is more likely to be
> compatible with the GPL. It has even been suggested (and discussed in
> this thread) that some AIs should be trained only with open source
> material (for example MIT licensed material?) so that we could stop
> worrying about including it. If that happens, there would be no reason
> to outright ban AI generated content, right?

Not all open source code is compatible with other open source code. If
you use the output of a model trained on GPLv3+ code in a GPLv2-only
project, then the creator of the GPLv3+ code could claim that you
violated the license, since the two licenses are not compatible.
Whether they would win in court or not, I have no clue, but it is
probably best to avoid that situation.

Collin

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH v2] SubmittingPatches: add section about AI
  2025-10-03 20:48       ` Elijah Newren
  2025-10-03 22:20         ` brian m. carlson
@ 2025-10-08  7:30       ` Christian Couder
  1 sibling, 0 replies; 34+ messages in thread
From: Christian Couder @ 2025-10-08 7:30 UTC (permalink / raw)
To: Elijah Newren
Cc: brian m. carlson, git, Junio C Hamano, Taylor Blau, Rick Sanders,
    Git at SFC, Johannes Schindelin, Patrick Steinhardt,
    Christian Couder

On Fri, Oct 3, 2025 at 10:48 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, Oct 1, 2025 at 2:37 PM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > We could say something like this:
> >
> > Please do not sign off your work if you’re using an LLM to contribute
> > unless you have included copyright and license information for all the
> > code used in that LLM.
>
> Would this mean that you wanted to ban contributions like d12166d3c8bb
> (Merge branch 'en/docfixes', 2023-10-23), available on the list over
> at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
> ? We don't need to go theoretical, I've already contributed such a
> patch series before -- 2 years ago -- and it was merged. Granted,
> that was entirely documentation, and I called out the usage of AI in
> the cover letter, and I manually checked every change (discarding many
> of them) and split it into commits on my own, could easily explain any
> change and why it was good, etc. And I was upfront about all of it.

This is a good example of why we don't want to ban all use of
generative AI. Thanks.

> If any use of AI is bad, do we need to revert that series?

^ permalink raw reply	[flat|nested] 34+ messages in thread
end of thread, other threads:[~2025-10-23 17:32 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-30 20:32 [RFC/PATCH] SubmittingPatches: forbid use of genAI to generate changes Junio C Hamano
2025-06-30 21:07 ` brian m. carlson
2025-06-30 21:23   ` Collin Funk
2025-07-01 10:36 ` Christian Couder
2025-07-01 11:07   ` Christian Couder
2025-07-01 17:33     ` Junio C Hamano
2025-07-01 16:20   ` Junio C Hamano
2025-07-08 14:23     ` Christian Couder
2025-10-01 14:02 ` [PATCH v2] SubmittingPatches: add section about AI Christian Couder
2025-10-01 18:59   ` Chuck Wolber
2025-10-01 23:32     ` brian m. carlson
2025-10-02  2:30       ` Ben Knoble
2025-10-03 13:33       ` Christian Couder
2025-10-01 20:59   ` Junio C Hamano
2025-10-03  8:51     ` Christian Couder
2025-10-03 16:20       ` Junio C Hamano
2025-10-03 16:45         ` rsbecker
2025-10-08  7:22         ` Christian Couder
2025-10-01 21:37   ` brian m. carlson
2025-10-03 14:25     ` Christian Couder
2025-10-03 20:48       ` Elijah Newren
2025-10-03 22:20         ` brian m. carlson
2025-10-06 17:45           ` Junio C Hamano
2025-10-08  4:18           ` Elijah Newren
2025-10-12 15:07             ` Junio C Hamano
2025-10-08  9:28           ` Christian Couder
2025-10-13 18:14             ` Junio C Hamano
2025-10-23 17:32               ` Junio C Hamano
2025-10-08  4:18         ` Elijah Newren
2025-10-08  8:37         ` Christian Couder
2025-10-08  9:28           ` Michal Suchánek
2025-10-08  9:35             ` Christian Couder
2025-10-09  1:13           ` Collin Funk
2025-10-08  7:30       ` Christian Couder