Git development
 help / color / mirror / Atom feed
* [PATCH v3 2/3] MyFirstContribution: recommend the use of b4
From: Patrick Steinhardt @ 2026-06-08  6:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones,
	SZEDER Gábor, Kristoffer Haugsbakk, Toon Claes
In-Reply-To: <20260608-pks-b4-v3-0-f5e497d10c56@pks.im>

The b4 tool originates from the Linux kernel community and is intended
to help mailing-list based workflows. It automates a lot of the annoying
bookkeeping tasks that contributors typically need to do: tracking the
list of recipients, Message-IDs, range-diffs and the like. In addition
to that, b4 also has many other subcommands that help the maintainer and
reviewers.

The Git project uses the same infrastructure as the kernel, so this tool
is also a very good fit for us. Adapt "MyFirstContribution" to
explicitly recommend its use.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/MyFirstContribution.adoc | 92 ++++++++++++++++++++++++++++++++--
 Documentation/SubmittingPatches        |  6 ++-
 2 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc
index 984b7f5aa8..607876f3d8 100644
--- a/Documentation/MyFirstContribution.adoc
+++ b/Documentation/MyFirstContribution.adoc
@@ -833,7 +833,7 @@ This patchset is part of the MyFirstContribution tutorial and should not
 be merged.
 ----
 
-At this point the tutorial diverges, in order to demonstrate two
+At this point the tutorial diverges, in order to demonstrate three
 different methods of formatting your patchset and getting it reviewed.
 
 The first method to be covered is GitGitGadget, which is useful for those
@@ -845,9 +845,14 @@ more fine-grained control over the emails to be sent. This method requires some
 setup which can change depending on your system and will not be covered in this
 tutorial.
 
+The third method to be covered is `b4`, which builds on top of `git
+format-patch` and `git send-email`. This method is the recommended way to
+submit patches via mail as it automates a lot of the bookkeeping required by
+`git send-email`.
+
 Regardless of which method you choose, your engagement with reviewers will be
-the same; the review process will be covered after the sections on GitGitGadget
-and `git send-email`.
+the same; the review process will be covered after the sections on GitGitGadget,
+`git send-email` and `b4`.
 
 [[howto-ggg]]
 == Sending Patches via GitGitGadget
@@ -1296,6 +1301,87 @@ index 88f126184c..38da593a60 100644
 2.21.0.392.gf8f6787159e-goog
 ----
 
+[[howto-b4]]
+== Sending Patches with `b4`
+
+`b4` is a tool that builds on top of `git format-patch` and `git send-email`.
+It automates much of the bookkeeping involved in sending a patch series to a
+mailing-list-based project.
+
+Refer to the https://b4.docs.kernel.org/[b4 documentation] for a full reference.
+
+[[prep-b4]]
+=== Preparing a Patch Series
+
+`b4` tracks your patch series as a branch. To start tracking the `psuh` branch
+you have been working on, run:
+
+----
+$ b4 prep --enroll master
+----
+
+This enrolls the current branch, using `master` as the base of the topic. `b4`
+manages the cover letter as part of the branch, so you can edit it at any time
+with:
+
+----
+$ b4 prep --edit-cover
+----
+
+The cover letter not only tracks the content of the top-level mail, but also
+the set of recipients. You can add recipients by adding `To:` and `Cc:`
+trailer lines.
+
+[[send-b4]]
+=== Sending the Patches
+
+Before sending the series out for real, you can inspect what `b4` would send by
+passing `--dry-run`:
+
+----
+$ b4 send --dry-run
+----
+
+Once you are happy with the result, send the series with:
+
+----
+$ b4 send
+----
+
+[[v2-b4]]
+=== Sending v2
+
+When you are ready to send a new iteration of your series, refine your
+patches as usual using linkgit:git-rebase[1]. Note that you typically want to
+rebase on top of the cover letter. You can configure an alias to enable easy
+rebases going forward:
+
+---
+$ git config set alias.b4-rebase 'rebase "HEAD^{/--- b4-submit-tracking ---}"'
+$ git b4-rebase -i
+---
+
+Before sending out the new version you should also update the cover letter with
+`b4 prep --edit-cover` to note the relevant changes compared to the previous
+version. You can inspect the changes between the two versions with `b4 prep
+--compare-to=v1`.
+
+Same as with the first version, you can use `b4 send` to send out the second
+version. `b4` automatically bumps the version to `v2`, generates the range-diff
+against the previous iteration, and threads the new series as a reply to the
+cover letter of the first version.
+
+[[configure-b4]]
+=== Configure b4
+
+`b4` can be configured via linkgit:git-config[1]. In addition to that, projects
+can have their own set of defaults in `.b4-config` in the root tree, which also
+uses Git's config format. The user's configuration always takes precedence over
+the per-project defaults.
+
+Refer to the https://b4.docs.kernel.org/en/latest/config.html[b4 config documentation]
+for more information on the available options.
+
 [[now-what]]
 == My Patch Got Emailed - Now What?
 
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index d570184ec8..99427e1ee1 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -573,8 +573,10 @@ your existing e-mail client (often optimized for "multipart/*" MIME
 type e-mails) might render your patches unusable.
 
 NOTE: Here we outline the procedure using `format-patch` and
-`send-email`, but you can instead use GitGitGadget to send in your
-patches (see link:MyFirstContribution.html[MyFirstContribution]).
+`send-email`, but you can instead use GitGitGadget or `b4` to send in
+your patches (see link:MyFirstContribution.html[MyFirstContribution]).
+Contributors are encouraged to use `b4`, which automates much of the
+bookkeeping that is otherwise done by hand.
 
 People on the Git mailing list need to be able to read and
 comment on the changes you are submitting.  It is important for

-- 
2.54.0.1136.gdb2ca164c4.dirty


^ permalink raw reply related

* [PATCH v3 1/3] MyFirstContribution: recommend shallow threading of cover letters
From: Patrick Steinhardt @ 2026-06-08  6:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones,
	SZEDER Gábor, Kristoffer Haugsbakk, Toon Claes
In-Reply-To: <20260608-pks-b4-v3-0-f5e497d10c56@pks.im>

The "MyFirstContribution" document recommends the use of deep threading
of cover letters: every cover letter of subsequent iterations shall be
linked to the cover letter of the preceding version. The result of this
is that eventually, threads with many versions are getting nested so
deep that it becomes hard to follow.

Adapt the recommendation to instead propose shallow threading of cover
letters: instead of linking the cover letter to the previous cover
letter, the user is supposed to always link it to the first cover
letter. This still makes it easy to follow the iterations, but has the
benefit of nesting to a much shallower level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/MyFirstContribution.adoc | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc
index b9fdefce02..984b7f5aa8 100644
--- a/Documentation/MyFirstContribution.adoc
+++ b/Documentation/MyFirstContribution.adoc
@@ -790,7 +790,7 @@ We can note a few things:
   v3", etc. in place of "PATCH". For example, "[PATCH v2 1/3]" would be the first of
   three patches in the second iteration. Each iteration is sent with a new cover
   letter (like "[PATCH v2 0/3]" above), itself a reply to the cover letter of the
-  previous iteration (more on that below).
+  first iteration (more on that below).
 
 NOTE: A single-patch topic is sent with "[PATCH]", "[PATCH v2]", etc. without
 _i_/_n_ numbering (in the above thread overview, no single-patch topic appears,
@@ -1214,7 +1214,7 @@ between your last version and now, if it's something significant. You do not
 need the exact same body in your second cover letter; focus on explaining to
 reviewers the changes you've made that may not be as visible.
 
-You will also need to go and find the Message-ID of your previous cover letter.
+You will also need to go and find the Message-ID of your first cover letter.
 You can either note it when you send the first series, from the output of `git
 send-email`, or you can look it up on the
 https://lore.kernel.org/git[mailing list]. Find your cover letter in the
@@ -1227,8 +1227,8 @@ Message-ID: <foo.12345.author@example.com>
 
 Your Message-ID is `<foo.12345.author@example.com>`. This example will be used
 below as well; make sure to replace it with the correct Message-ID for your
-**previous cover letter** - that is, if you're sending v2, use the Message-ID
-from v1; if you're sending v3, use the Message-ID from v2.
+**first cover letter** - that is, for any subsequent version that you send,
+always use the Message-ID from v1.
 
 While you're looking at the email, you should also note who is CC'd, as it's
 common practice in the mailing list to keep all CCs on a thread. You can add

-- 
2.54.0.1136.gdb2ca164c4.dirty


^ permalink raw reply related

* [PATCH v3 0/3] Documentation: recommend the use of b4
From: Patrick Steinhardt @ 2026-06-08  6:49 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones,
	SZEDER Gábor, Kristoffer Haugsbakk, Toon Claes
In-Reply-To: <20260602-pks-b4-v1-0-a7ae5a49e9cf@pks.im>

Hi,

this small patch series wires up b4 in Git and recommends the use
thereof via "MyFirstContribution", as discussed in [1].

Changes in v3:
  - I wasn't really able to judge consensus one way or the other
    regarding the deep vs shallow nesting of cover letters, so I still
    have the change to shallow nesting of cover letters part of this
    series. If we continue to be split on this one (or if we favor the
    current status quo) I'm happy to drop the first patch and adapt the
    last patch to use deep nesting of cover letters instead.
  - Hopefully fix some confusion by saying "shallow/deep threading of
    cover letters".
  - Fix some more instances where we recommend deep threading of cover
    letters.
  - Link to v2: https://patch.msgid.link/20260603-pks-b4-v2-0-a8aea0aa2c23@pks.im

Changes in v2:
  - Reorder commits so that the b4 docs are added first.
  - Add a section that highlights how to configure b4, and that points
    out that the per-project defaults can be overridden via Git
    configuration.
  - Add a patch to MyFirstContribution that recommends shallow
    threading. I mostly intend this to be a discussion starter so that
    the `.b4-config` file matches our preferred threading style.
  - Fix a typo.
  - Link to v1: https://patch.msgid.link/20260602-pks-b4-v1-0-a7ae5a49e9cf@pks.im

Thanks!

Patrick

[1]: <xmqqik81xpqx.fsf@gitster.g>

---
Patrick Steinhardt (3):
      MyFirstContribution: recommend shallow threading of cover letters
      MyFirstContribution: recommend the use of b4
      b4: introduce configuration for the Git project

 .b4-config                             |   6 ++
 .b4-cover-template                     |  11 ++++
 Documentation/MyFirstContribution.adoc | 100 ++++++++++++++++++++++++++++++---
 Documentation/SubmittingPatches        |   6 +-
 4 files changed, 114 insertions(+), 9 deletions(-)

Range-diff versus v2:

1:  f7784c8f7f ! 1:  4b0c4f9aca Documentation/MyFirstContribution: recommend shallow threading
    @@ Metadata
     Author: Patrick Steinhardt <ps@pks.im>
     
      ## Commit message ##
    -    Documentation/MyFirstContribution: recommend shallow threading
    +    MyFirstContribution: recommend shallow threading of cover letters
     
    -    The "MyFirstContribution" document recommends the use of deep threading:
    -    every cover letter of subsequent iterations shall be linked to the cover
    -    letter of the preceding version. The result of this is that eventually,
    -    threads with many versions are getting nested so deep that it becomes
    -    hard to follow.
    +    The "MyFirstContribution" document recommends the use of deep threading
    +    of cover letters: every cover letter of subsequent iterations shall be
    +    linked to the cover letter of the preceding version. The result of this
    +    is that eventually, threads with many versions are getting nested so
    +    deep that it becomes hard to follow.
     
    -    Adapt the recommendation to instead propose shallow threading: instead
    -    of linking the cover letter to the previous cover letter, the user is
    -    supposed to always link it to the first cover letter. This still makes
    -    it easy to follow the iterations, but has the benefit of nesting to a
    -    much shallower level.
    +    Adapt the recommendation to instead propose shallow threading of cover
    +    letters: instead of linking the cover letter to the previous cover
    +    letter, the user is supposed to always link it to the first cover
    +    letter. This still makes it easy to follow the iterations, but has the
    +    benefit of nesting to a much shallower level.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## Documentation/MyFirstContribution.adoc ##
    +@@ Documentation/MyFirstContribution.adoc: We can note a few things:
    +   v3", etc. in place of "PATCH". For example, "[PATCH v2 1/3]" would be the first of
    +   three patches in the second iteration. Each iteration is sent with a new cover
    +   letter (like "[PATCH v2 0/3]" above), itself a reply to the cover letter of the
    +-  previous iteration (more on that below).
    ++  first iteration (more on that below).
    + 
    + NOTE: A single-patch topic is sent with "[PATCH]", "[PATCH v2]", etc. without
    + _i_/_n_ numbering (in the above thread overview, no single-patch topic appears,
    +@@ Documentation/MyFirstContribution.adoc: between your last version and now, if it's something significant. You do not
    + need the exact same body in your second cover letter; focus on explaining to
    + reviewers the changes you've made that may not be as visible.
    + 
    +-You will also need to go and find the Message-ID of your previous cover letter.
    ++You will also need to go and find the Message-ID of your first cover letter.
    + You can either note it when you send the first series, from the output of `git
    + send-email`, or you can look it up on the
    + https://lore.kernel.org/git[mailing list]. Find your cover letter in the
     @@ Documentation/MyFirstContribution.adoc: Message-ID: <foo.12345.author@example.com>
      
      Your Message-ID is `<foo.12345.author@example.com>`. This example will be used
2:  e8f3caf73a ! 2:  625de75a33 Documentation/MyFirstContribution: recommend the use of b4
    @@ Metadata
     Author: Patrick Steinhardt <ps@pks.im>
     
      ## Commit message ##
    -    Documentation/MyFirstContribution: recommend the use of b4
    +    MyFirstContribution: recommend the use of b4
     
         The b4 tool originates from the Linux kernel community and is intended
         to help mailing-list based workflows. It automates a lot of the annoying
3:  35591c55c8 = 3:  a95973cfb6 b4: introduce configuration for the Git project

---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260602-pks-b4-31cc20d7f84b


^ permalink raw reply

* Re: [PATCH v2 3/3] b4: introduce configuration for the Git project
From: Patrick Steinhardt @ 2026-06-08  6:48 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones
In-Reply-To: <87qzmn20a9.fsf@emacs.iotcl.com>

On Wed, Jun 03, 2026 at 03:58:38PM +0200, Toon Claes wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/.b4-config b/.b4-config
> > new file mode 100644
> > index 0000000000..fd4fb56b6d
> > --- /dev/null
> > +++ b/.b4-config
> > @@ -0,0 +1,6 @@
> > +# Note that these are default values that you can tweak via the typical
> > +# git-config(1) machinery. You thus shouldn't ever have to change this file.
> > +# See also https://b4.docs.kernel.org/en/latest/config.html.
> > +[b4]
> > +send-same-thread = shallow
> 
> Is it worth to note this requires v0.15 or higher?
> 
> That version was released only 2 months ago, I can imagine many distros
> still ship an older version, what happens if a version doesn't support
> this setting yet?

That's fair. In case it's not supported we fall back to the default,
which is to not use threading at all.

Might be another indicator that we should just stick with the current
threading style.

Patrick

^ permalink raw reply

* Re: [PATCH 1/2] b4: introduce configuration for the Git project
From: Patrick Steinhardt @ 2026-06-08  6:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: SZEDER Gábor, Weijie Yuan, Tuomas Ahola, git
In-Reply-To: <xmqqmrxbp0s6.fsf@gitster.g>

On Thu, Jun 04, 2026 at 10:11:37AM +0900, Junio C Hamano wrote:
> Having said that, I've seen a cover letter of iteration N (for any
> value of N > 1) that respondes to the cover letter of the initial
> iteration.  While it seems not to break "br" and the lore archive
> does not seem unhappy about it, I am not sure if tooling used by
> other people are also happy with it.

Yeah, I always use that style myself. I mostly prefer it because the
nesting for long-running patch series with many versions is eventually
getting out of hand and hard to navigate, and I haven't seen any tooling
breaking as a result of that style.

If I'm the only one who thinks that style to be preferable I am happy to
adapt. I'm not really sure yet what the consensus is -- I'll send one
more version that includes the changes, but if we continue to be split
or in favor of the current status quo I'll drop those in the next
version.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH 1/2] b4: introduce configuration for the Git project
From: Patrick Steinhardt @ 2026-06-08  6:48 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Weijie Yuan, Tuomas Ahola, git, Junio C Hamano
In-Reply-To: <aiAK9eLvew+mgWt+@szeder.dev>

On Wed, Jun 03, 2026 at 01:07:33PM +0200, SZEDER Gábor wrote:
> On Wed, Jun 03, 2026 at 08:55:04AM +0200, Patrick Steinhardt wrote:
> > On Wed, Jun 03, 2026 at 10:12:22AM +0800, Weijie Yuan wrote:
> > > On Tue, Jun 02, 2026 at 08:09:55PM +0300, Tuomas Ahola wrote:
> > > > Huh?  Doesn't MyFirstContribution speak *against* shallow threading?
> > > >
> > > > 	        [...]  make sure to replace it with the correct Message-ID for your
> > > > 	**previous cover letter** - that is, if you're sending v2, use the Message-ID
> > > > 	from v1; if you're sending v3, use the Message-ID from v2.
> > > 
> > > I don't get it. Doesn't shallow threading means every following patches
> > > are replying to the cover letter? Replying to the previous one is
> > > --chain-reply-to, if I'm not mistaken.
> > 
> > Shallow threading basically means that all patches are sent as a
> > response to the current cover letter, and the current cover letter is
> > always attached to the cover letter of the _first_ version.
> 
> No, in Git shallow threading means that all patches are sent as a
> respose to the current cover letter, period.  It has nothing to do
> with whether the current cover letter is sent as a reply to the cover
> letter of the first or the previous version.
> 
> > So this quote is definitely at odds with the configuration I have
> > proposed. It's actually quite surprising to me that we recommend deep
> > threading -- I personally find it extremely hard to navigate as the
> > nesting eventually gets way too deep.
> 
> Deep threading means that every mail is a reply to the previous one.
> Again, it has nothing to do with the relation of the current cover
> letter and the previous cover letters.
> 
> Therefore, we do not recommend deep threading.

Oh, you're right of course. I totally forgot that we even had this
style.

Patrick

^ permalink raw reply

* Re: [PATCH v2 2/3] Documentation/MyFirstContribution: recommend the use of b4
From: Patrick Steinhardt @ 2026-06-08  6:48 UTC (permalink / raw)
  To: Toon Claes; +Cc: git, Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones
In-Reply-To: <87mrxa27xq.fsf@emacs.iotcl.com>

On Thu, Jun 04, 2026 at 07:25:37AM +0200, Toon Claes wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> > diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc
> > index 069020196c..fc0b06ae67 100644
> > --- a/Documentation/MyFirstContribution.adoc
> > +++ b/Documentation/MyFirstContribution.adoc
> > @@ -833,7 +833,7 @@ This patchset is part of the MyFirstContribution tutorial and should not
> >  be merged.
> >  ----
> >  
> > -At this point the tutorial diverges, in order to demonstrate two
> > +At this point the tutorial diverges, in order to demonstrate three
> >  different methods of formatting your patchset and getting it reviewed.
> >  
> >  The first method to be covered is GitGitGadget, which is useful for those
> > @@ -845,9 +845,14 @@ more fine-grained control over the emails to be sent. This method requires some
> >  setup which can change depending on your system and will not be covered in this
> >  tutorial.
> >  
> > +The third method to be covered is `b4`, which builds on top of `git
> > +format-patch` and `git send-email`. This method is the recommended way to
> > +submit patches via mail as it automates a lot of the bookkeeping required by
> > +`git send-email`.
> 
> The GitGitGadget method includes Running CI, maybe that's worth
> mentioning the user is responsible themselves to run the whole test
> suite? Or is this outside the scope of this series, since `git
> send-email` doesn't include that too.

I'd say it's out-of-scope for this patch series.

That being said, I have been wondering last week whether we can automate
running CI in some fashion to shorten feedback cycles, bridge the gap
between the mailing list and CI and ultimately help both reviewers and
Junio. Some subsystems in the Linux kernel for example have tooling that
picks up patch series from the mailing list, runs it through CI and then
reports results to the mailing list (for example [1]).

Having something like that might be valuable for us, too.

Patrick

[1]: https://github.com/linux-netdev/nipa

^ permalink raw reply

* Re: [PATCH v2 1/3] Documentation/MyFirstContribution: recommend shallow threading
From: Patrick Steinhardt @ 2026-06-08  6:48 UTC (permalink / raw)
  To: Kristoffer Haugsbakk
  Cc: git, Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones
In-Reply-To: <f1dbb848-2d9b-488a-835b-2d23006b5fa6@app.fastmail.com>

On Wed, Jun 03, 2026 at 10:09:39PM +0200, Kristoffer Haugsbakk wrote:
> On Wed, Jun 3, 2026, at 08:58, Patrick Steinhardt wrote:
> > The "MyFirstContribution" document recommends the use of deep threading:
> > every cover letter of subsequent iterations shall be linked to the cover
> > letter of the preceding version. The result of this is that eventually,
> > threads with many versions are getting nested so deep that it becomes
> > hard to follow.
> >
> > Adapt the recommendation to instead propose shallow threading: instead
> > of linking the cover letter to the previous cover letter, the user is
> > supposed to always link it to the first cover letter. This still makes
> > it easy to follow the iterations, but has the benefit of nesting to a
> > much shallower level.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  Documentation/MyFirstContribution.adoc | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >[snip]
> 
> Only today did I notice that your eleven-version git-history(1) series
> uses this style. (Or: today I noticed that it’a thing)
> 
> https://lore.kernel.org/git/20250819-b4-pks-history-builtin-v1-0-9b77c32688fe@pks.im/
> 
> That would have had a bad rightward drift with the usual reply to
> previous version style.
> 
> I’ve been reading Lore on Safari on mobile and some threads go so deep
> that the replies just become unclickable backticks. *Huh?* Well I can
> use the Next/Previous buttons and maybe there is a way to make it work,
> but I’ve just given up on those right-going subthreads. ;)
> 
> ... and I also don’t see any drawbacks to that threading, using that
> series as an example. It looks just as comprehensible as the usual
> style.

Yeah. It doesn't matter much for patch series that don't require many
iterations. But eventually I feel like it gets out-of-hand to have the
deep nesting.

Patrick

^ permalink raw reply

* Re: [PATCH v2 1/3] Documentation/MyFirstContribution: recommend shallow threading
From: Patrick Steinhardt @ 2026-06-08  6:48 UTC (permalink / raw)
  To: Weijie Yuan; +Cc: git, Junio C Hamano, Tuomas Ahola, Ramsay Jones
In-Reply-To: <aiACDLOtd_0_CCD7@wyuan.org>

On Wed, Jun 03, 2026 at 06:29:32PM +0800, Weijie Yuan wrote:
> I'm afraid there will be some chaos.

I think "chaos" is a bit exaggerated. We already have both styles on the
mailing list, and I think in general people are able to cope with that
just fine. :)

> As mentioned earlier, GitGitGadget now supports deep nesting of
> iterations, if b4 changes while GitGitGadget doesn't, it would be
> inconsistent in the archive. So, negotiation is necessary here.

That's a good point though -- if we change the recommendation, we should
aim to change it consistently. I'll talk with Dscho (maintainer of GGG)
today.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH v2 1/3] Documentation/MyFirstContribution: recommend shallow threading
From: Patrick Steinhardt @ 2026-06-08  6:48 UTC (permalink / raw)
  To: Tuomas Ahola; +Cc: git, Junio C Hamano, Weijie Yuan, Ramsay Jones
In-Reply-To: <20260603100145.7iym5%taahol@utu.fi>

On Wed, Jun 03, 2026 at 01:01:45PM +0300, Tuomas Ahola wrote:
> Patrick Steinhardt <ps@pks.im> wrote:
> 
> > The "MyFirstContribution" document recommends the use of deep threading:
> > every cover letter of subsequent iterations shall be linked to the cover
> > letter of the preceding version. The result of this is that eventually,
> > threads with many versions are getting nested so deep that it becomes
> > hard to follow.
> > 
> > Adapt the recommendation to instead propose shallow threading: instead
> > of linking the cover letter to the previous cover letter, the user is
> > supposed to always link it to the first cover letter. This still makes
> > it easy to follow the iterations, but has the benefit of nesting to a
> > much shallower level.
> > 
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  Documentation/MyFirstContribution.adoc | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc
> > index b9fdefce02..069020196c 100644
> > --- a/Documentation/MyFirstContribution.adoc
> > +++ b/Documentation/MyFirstContribution.adoc
> > @@ -1227,8 +1227,8 @@ Message-ID: <foo.12345.author@example.com>
> >  
> >  Your Message-ID is `<foo.12345.author@example.com>`. This example will be used
> >  below as well; make sure to replace it with the correct Message-ID for your
> > -**previous cover letter** - that is, if you're sending v2, use the Message-ID
> > -from v1; if you're sending v3, use the Message-ID from v2.
> > +**first cover letter** - that is, for any subsequent version that you send,
> > +always use the Message-ID from v1.
> >  
> >  While you're looking at the email, you should also note who is CC'd, as it's
> >  common practice in the mailing list to keep all CCs on a thread. You can add
> > 
> > -- 
> > 2.54.0.1064.gd145956f57.dirty
> 
> If we adapt this change to the guidance, let's fix also other places of the
> document that talk about replying to the previous cover letter.

Good catch, thanks!

Patrick

^ permalink raw reply

* [PATCH] log: improve --follow following renames for non-linear history
From: Miklos Vajna @ 2026-06-08  6:35 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: git
In-Reply-To: <xmqqjysz7r41.fsf@gitster.g>

Have a repo with a subtree merge, do a 'git log --follow prefix/test.c',
the output only contains history in the outer repo, not commits that
were merged via a subtree merge.

What happened is that 'git log --follow' used to store the followed path
only in opt->diffopt.pathspec, so in case the commit history is
non-linear, and multiple parents had renames to the followed path, then
the end result wasn't really defined: the first commit that happened to
be visited in one of the parents updated opt->diffopt.pathspec, and from
that point, only that updated path was visited.

Fix the problem by introducing a commit -> path map
(follow_pathspec_slab) that stores that will be path to follow when
visiting that parent. At the top of log_tree_commit(), if the slab has
an entry for this commit, we replace opt->diffopt.pathspec with it, so
the correct path is followed, even if an unrelated sub-tree changed the
path to be followed to something else. After log_tree_diff() runs, we
record each parent's path in the slab: for a non-merge commit,
try_to_follow_renames() inside diff_tree_oid() has already updated
opt->diffopt.pathspec to the parent's name, so we just record it. For a
merge, log_tree_diff() is a no-op; we run a separate
diff_tree_oid(parent, commit, ...) with follow_renames=1 for each parent
and record the path it finds. As a result, the walk order doesn't
matter, which was exactly the source of problems previously.

This helps with subtree merges (rename happens inside the merge commit),
but also the general case when the rename happens in the history of
parents, not in the merge commit itself.
---

Hi Junio, Jeff,

On Tue, May 19, 2026 at 05:14:54PM +0900, Junio C Hamano <gitster@pobox.com> wrote:
> A minor "tweak" that does not solve this inherent design issue does
> not interest me, so...

Here is a patch that attempts to actually solve the problem you point
out here, by tracking what path should be followed for multiple branches
of the history.

It obsoletes the previous two "tweak" patches in this thread.

Hopefully this one is more interesting. :-)

Thanks,

Miklos

 Documentation/config/log.adoc |   3 +-
 log-tree.c                    | 116 ++++++++++++++++++++++++++++++++++
 log-tree.h                    |   1 +
 revision.c                    |   2 +
 revision.h                    |   4 ++
 t/meson.build                 |   1 +
 t/t4218-log-follow-merge.sh   |  80 +++++++++++++++++++++++
 7 files changed, 205 insertions(+), 2 deletions(-)
 create mode 100755 t/t4218-log-follow-merge.sh

diff --git a/Documentation/config/log.adoc b/Documentation/config/log.adoc
index f20cc25cd7..757a7be196 100644
--- a/Documentation/config/log.adoc
+++ b/Documentation/config/log.adoc
@@ -53,8 +53,7 @@ This is the same as the `--decorate` option of the `git log`.
 `log.follow`::
 	If `true`, `git log` will act as if the `--follow` option was used when
 	a single <path> is given.  This has the same limitations as `--follow`,
-	i.e. it cannot be used to follow multiple files and does not work well
-	on non-linear history.
+	i.e. it cannot be used to follow multiple files.
 
 `log.graphColors`::
 	A list of colors, separated by commas, that can be used to draw
diff --git a/log-tree.c b/log-tree.c
index 7e048701d0..e7f098e571 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -3,6 +3,7 @@
 
 #include "git-compat-util.h"
 #include "commit-reach.h"
+#include "commit-slab.h"
 #include "config.h"
 #include "diff.h"
 #include "diffcore.h"
@@ -1089,6 +1090,96 @@ static int do_remerge_diff(struct rev_info *opt,
 	return !opt->loginfo;
 }
 
+/* Per-commit pathspec storage for --follow across merges */
+define_commit_slab(follow_pathspec_slab, char *);
+
+static const char *pathspec_single_path(const struct pathspec *ps)
+{
+	if (ps->nr != 1)
+		return NULL;
+	return ps->items[0].match;
+}
+
+static void set_pathspec_to_single_path(struct pathspec *ps, const char *path)
+{
+	const char *paths[2] = { path, NULL };
+
+	clear_pathspec(ps);
+	parse_pathspec(ps,
+		       PATHSPEC_ALL_MAGIC & ~PATHSPEC_LITERAL,
+		       PATHSPEC_LITERAL_PATH, "", paths);
+}
+
+static void remember_follow_pathspec(struct rev_info *opt,
+				     struct commit *c, const char *path)
+{
+	char **slot;
+
+	if (!path)
+		return;
+	if (!opt->follow_pathspec_slab) {
+		opt->follow_pathspec_slab = xmalloc(sizeof(*opt->follow_pathspec_slab));
+		init_follow_pathspec_slab(opt->follow_pathspec_slab);
+	}
+	slot = follow_pathspec_slab_at(opt->follow_pathspec_slab, c);
+	if (*slot && !strcmp(*slot, path))
+		return;
+	free(*slot);
+	*slot = xstrdup(path);
+}
+
+static const char *recall_follow_pathspec(struct rev_info *opt,
+					  struct commit *c)
+{
+	char **slot;
+
+	if (!opt->follow_pathspec_slab)
+		return NULL;
+	slot = follow_pathspec_slab_peek(opt->follow_pathspec_slab, c);
+	return slot ? *slot : NULL;
+}
+
+static void free_follow_pathspec_slot(char **slot)
+{
+	FREE_AND_NULL(*slot);
+}
+
+void release_follow_pathspec_slab(struct rev_info *opt)
+{
+	if (!opt->follow_pathspec_slab)
+		return;
+	deep_clear_follow_pathspec_slab(opt->follow_pathspec_slab,
+					free_follow_pathspec_slot);
+	FREE_AND_NULL(opt->follow_pathspec_slab);
+}
+
+/* Compute the followed pathspec that should apply to parent. */
+static void propagate_follow_pathspec_to_parent(struct rev_info *opt,
+						struct commit *commit,
+						struct commit *parent)
+{
+	struct diff_options diff_opts;
+	const char *path;
+
+	parse_commit_or_die(parent);
+	repo_diff_setup(opt->diffopt.repo, &diff_opts);
+	copy_pathspec(&diff_opts.pathspec, &opt->diffopt.pathspec);
+	diff_opts.flags.recursive = 1;
+	diff_opts.flags.follow_renames = 1;
+	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
+	diff_setup_done(&diff_opts);
+	diff_tree_oid(get_commit_tree_oid(parent),
+		      get_commit_tree_oid(commit),
+		      "", &diff_opts);
+
+	path = pathspec_single_path(&diff_opts.pathspec);
+	if (path)
+		remember_follow_pathspec(opt, parent, path);
+
+	diff_queue_clear(&diff_queued_diff);
+	diff_free(&diff_opts);
+}
+
 /*
  * Show the diff of a commit.
  *
@@ -1179,6 +1270,16 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 	opt->loginfo = &log;
 	opt->diffopt.no_free = 1;
 
+	/* Any recorded pathspec for this commit? If so, restore it. */
+	if (opt->diffopt.flags.follow_renames) {
+		const char *stored = recall_follow_pathspec(opt, commit);
+		if (stored) {
+			const char *current = pathspec_single_path(&opt->diffopt.pathspec);
+			if (!current || strcmp(current, stored))
+				set_pathspec_to_single_path(&opt->diffopt.pathspec, stored);
+		}
+	}
+
 	/* NEEDSWORK: no restoring of no_free?  Why? */
 	if (opt->line_level_traverse)
 		return line_log_print(opt, commit);
@@ -1195,6 +1296,21 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit)
 		fprintf(opt->diffopt.file, "\n%s\n", opt->break_bar);
 	if (shown)
 		show_diff_of_diff(opt);
+
+	/* Record what pathspec each parent of this commit should use */
+	if (opt->diffopt.flags.follow_renames) {
+		struct commit_list *parents = get_saved_parents(opt, commit);
+		if (parents && parents->next) {
+			struct commit_list *p;
+			for (p = parents; p; p = p->next)
+				propagate_follow_pathspec_to_parent(opt, commit,
+								    p->item);
+		} else if (parents) {
+			remember_follow_pathspec(opt, parents->item,
+				pathspec_single_path(&opt->diffopt.pathspec));
+		}
+	}
+
 	opt->loginfo = NULL;
 	maybe_flush_or_die(opt->diffopt.file, "stdout");
 	opt->diffopt.no_free = no_free;
diff --git a/log-tree.h b/log-tree.h
index 07924be8bc..e8679b6c4a 100644
--- a/log-tree.h
+++ b/log-tree.h
@@ -26,6 +26,7 @@ struct decoration_options {
 int parse_decorate_color_config(const char *var, const char *slot_name, const char *value);
 int log_tree_diff_flush(struct rev_info *);
 int log_tree_commit(struct rev_info *, struct commit *);
+void release_follow_pathspec_slab(struct rev_info *);
 void show_log(struct rev_info *opt);
 void format_decorations(struct strbuf *sb, const struct commit *commit,
 			enum git_colorbool use_color, const struct decoration_options *opts);
diff --git a/revision.c b/revision.c
index 5693618be4..caa85fb4c6 100644
--- a/revision.c
+++ b/revision.c
@@ -26,6 +26,7 @@
 #include "decorate.h"
 #include "string-list.h"
 #include "line-log.h"
+#include "log-tree.h"
 #include "mailmap.h"
 #include "commit-slab.h"
 #include "cache-tree.h"
@@ -3284,6 +3285,7 @@ void release_revisions(struct rev_info *revs)
 	line_log_free(revs);
 	oidset_clear(&revs->missing_commits);
 	release_revisions_bloom_keyvecs(revs);
+	release_follow_pathspec_slab(revs);
 }
 
 static void add_child(struct rev_info *revs, struct commit *parent, struct commit *child)
diff --git a/revision.h b/revision.h
index c9a11827cc..607113ca74 100644
--- a/revision.h
+++ b/revision.h
@@ -65,6 +65,7 @@ struct repository;
 struct rev_info;
 struct string_list;
 struct saved_parents;
+struct follow_pathspec_slab;
 struct bloom_keyvec;
 struct bloom_filter_settings;
 struct option;
@@ -354,6 +355,9 @@ struct rev_info {
 	/* copies of the parent lists, for --full-diff display */
 	struct saved_parents *saved_parents_slab;
 
+	/* per-commit pathspec for --follow across merges */
+	struct follow_pathspec_slab *follow_pathspec_slab;
+
 	struct commit_list *previous_parents;
 	struct commit_list *ancestry_path_bottoms;
 	const char *break_bar;
diff --git a/t/meson.build b/t/meson.build
index 2af8d01279..cd43e0609a 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -576,6 +576,7 @@ integration_tests = [
   't4215-log-skewed-merges.sh',
   't4216-log-bloom.sh',
   't4217-log-limit.sh',
+  't4218-log-follow-merge.sh',
   't4252-am-options.sh',
   't4253-am-keep-cr-dos.sh',
   't4254-am-corrupt.sh',
diff --git a/t/t4218-log-follow-merge.sh b/t/t4218-log-follow-merge.sh
new file mode 100755
index 0000000000..7a1b6fcb84
--- /dev/null
+++ b/t/t4218-log-follow-merge.sh
@@ -0,0 +1,80 @@
+#!/bin/sh
+
+test_description='Test --follow follows renames across merges'
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+test_expect_success 'setup subtree-merged repository' '
+	git init inner &&
+	echo inner >inner/inner.txt &&
+	git -C inner add inner.txt &&
+	git -C inner commit -m "inner init" &&
+
+	git init outer &&
+	echo outer >outer/outer.txt &&
+	git -C outer add outer.txt &&
+	git -C outer commit -m "outer init" &&
+
+	git -C outer fetch ../inner master &&
+	git -C outer merge -s ours --no-commit --allow-unrelated-histories \
+		FETCH_HEAD &&
+	git -C outer read-tree --prefix=inner/ -u FETCH_HEAD &&
+	git -C outer commit -m "Merge inner repo into inner/ subdirectory"
+'
+
+test_expect_success '--follow finds the pre-merge commit through a subtree merge' '
+	git -C outer log --follow --pretty=tformat:%s inner/inner.txt >actual &&
+	echo "inner init" >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success 'setup merge of two branches that both renamed a file to README' '
+	git init foo &&
+	mkdir foo/foo &&
+	echo "foo readme" >foo/foo/README &&
+	git -C foo add foo/README &&
+	git -C foo commit -m "add foo README" &&
+
+	git -C foo mv foo/README README &&
+	git -C foo commit -m "promote foo README to toplevel" &&
+
+	echo "foo c" >foo/foo.c &&
+	git -C foo add foo.c &&
+	git -C foo commit -m "add foo C impl" &&
+
+	git init bar &&
+	mkdir bar/bar &&
+	echo "bar readme" >bar/bar/README &&
+	git -C bar add bar/README &&
+	git -C bar commit -m "add bar README" &&
+
+	git -C bar mv bar/README README &&
+	git -C bar commit -m "promote bar README to toplevel" &&
+
+	echo "bar c" >bar/bar.c &&
+	git -C bar add bar.c &&
+	git -C bar commit -m "add bar C impl" &&
+
+	git -C foo fetch ../bar master &&
+	git -C foo merge -s ours --no-commit --allow-unrelated-histories \
+		FETCH_HEAD &&
+	git -C foo checkout FETCH_HEAD -- bar.c &&
+	git -C foo commit -m "merge bar into foo"
+'
+
+test_expect_success '--follow follows renames across both sides of a merge' '
+	git -C foo log --follow --pretty=tformat:%s README >actual &&
+	sort actual >actual.sorted &&
+	cat >expect <<-\EOF &&
+	add bar README
+	add foo README
+	promote bar README to toplevel
+	promote foo README to toplevel
+	EOF
+	test_cmp expect actual.sorted
+'
+
+test_done
-- 
2.51.0


^ permalink raw reply related

* Re: [PATCH 01/16] packfile: rename `struct packfile_store` to `odb_source_packed`
From: Patrick Steinhardt @ 2026-06-08  6:23 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git
In-Reply-To: <CAOLa=ZQhGbjFbn_cpOmhYYN0xNjp1K8_Pj2mi34WzS25DG4ZEA@mail.gmail.com>

On Fri, Jun 05, 2026 at 07:25:31AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Not too long ago, we have introduced the packfile store in b7983adb51
> > (packfile: introduce a new `struct packfile_store`, 2025-09-23). This
> > struct is responsible for managing all of our access to packfiles and is
> > used as one of the two sources of objects for the "files" source.
> >
> > Back when I introduced this structure I didn't have the clear vision yet
> > that it will eventually also turn into a proper object database source,
> > and how exactly that infrastructure will look like. Now though it's
> > becoming increasingly clear that it does make sense to treat it just the
> > same as any of our other ODB sources.
> >
> > The consequence is that the naming is now a bit out-of-date: it's just
> > another source and will be turned into a proper `struct odb_source` over
> > the next couple of commits, but it's not named accordingly.
> >
> > Rename the structure to `odb_source_packed` to align it with this goal
> > and to bring it in line with the other sources we already have.
> >
> 
> Looks good, I'm assuming we'll also rename drop some of the
> `packfile_store_*` functions as things get cleaned up in the following
> commits.

In fact not all of them, mostly because I wanted to not make the patch
series even bigger than it already is. I'll clean up stragglers in a
subsequent patch series though.

Patrick

^ permalink raw reply

* Re: [PATCH v2] compat/posix.h: enable UNUSED warning messages for Clang
From: Patrick Steinhardt @ 2026-06-08  6:20 UTC (permalink / raw)
  To: Dominik Loidolt; +Cc: gitster, git, asedeno, asedeno, avarab
In-Reply-To: <aiLxCWp8Bv-KQoLf@four.local>

On Fri, Jun 05, 2026 at 05:53:45PM +0200, Dominik Loidolt wrote:
> On Fri, Jun 05, 2026 at 03:22:49PM +0200, Patrick Steinhardt wrote:
> > I was wondering about that, too. The question that I have is whether
> > there's any particular reason why the check was written that way. So in
> > the best case we'd do some digging into the history to figure out why
> > this looks the way it looks like.
> 
> I think the current bit-shift style introduced by 89c855ed3c (git-compat-util.h:
> implement a different ARRAY_SIZE macro for for safely deriving the size of
> array, 2015-04-30) was inherited from glibc [0].
> 
> I found that NetBSD [1] has long used the more explicit comparison form instead
> of the bit-shift style, and other BSDs seem to do the same. So there is at
> least established precedent for writing the version check that way. :-)
> 
> I see no obvious reason to prefer the bit-shift style today.

Thanks for digging!

I don't really see a reason to keep the bitshift style, either. It could
make a difference if it was ever evaluated at runtime, as we would
evaluate the arguments multiple times with youur version. But all of the
instances we have are evaluated at compile time anwyay, so that doesn't
matter much to us.

I'll leave it up to you whether you want to send another iteration of
this patch series that also adapts the preexisting callsite to use the
new style.

Thanks!

Patrick

^ permalink raw reply

* [PATCH] ref-filter: reuse --contains traversal results
From: Tamir Duberstein @ 2026-06-08  3:33 UTC (permalink / raw)
  To: git
  Cc: Jeff King, Karthik Nayak, Junio C Hamano, Victoria Dye,
	Derrick Stolee, Elijah Newren, Tamir Duberstein

git branch and git for-each-ref call repo_is_descendant_of() for each
candidate selected by --contains or --no-contains. Each call starts a
new graph walk, so refs with shared history repeatedly traverse the same
commits.

ffc4b8012d (tag: speed up --contains calculation, 2011-06-11) introduced
the tag traversal that caches positive and negative answers across
candidates. ee2bd06b0f (ref-filter: implement '--contains' option,
2015-07-07) preserved the branch and tag implementations when ref-filter
learned --contains. 008ed7df930 (tag.c: use the correct algorithm for
the '--contains' option, 2015-10-18) noted that they should be unified.

Use the memoized traversal for every ref-filter contains check and
remove the implementation selector. The cache records answers for one
fixed target list, so document that callers must clear it before
changing the list.

The memoized depth-first walk assumes acyclic ancestry, but replacement
refs can create cycles. Track commits while they are on the walk. If a
cycle is found, discard partial cache entries and use
repo_is_descendant_of() for that candidate.

The branch and for-each-ref path passed repo_is_descendant_of() through
a Boolean interface. In configurations where it returned -1 for missing
ancestry, ref-filter treated the error as "contains". The memoized path
instead fails when ancestry cannot be parsed, as git tag already did.
During review of the 2018 reachability series, making parse failures
fatal was explicitly deferred because that series was intended to
preserve behavior. Unifying the implementations now makes all callers
fail consistently instead of preserving that accidental Boolean
interpretation.

The added p1500 case uses up to 8,192 packed refs along one first-parent
history. It improves from 0.68 to 0.03 seconds.

On a checkout with 62,174 remote-tracking refs, I ran:

    hyperfine --warmup 0 --runs 3 \
        --command-name parent \
        '"$parent" branch -r --contains c78ae85f3ce7e >/dev/null' \
        --command-name this-commit \
        '"$this" branch -r --contains c78ae85f3ce7e >/dev/null'

The results were:

             parent       this commit
  elapsed    104.365 s     467.7 ms
  user        93.702 s     220.2 ms
  system       0.723 s     182.7 ms

The wall-time standard deviations were 11.356 seconds and 133.8
milliseconds, respectively, for a 223x speedup. Both commands produced
output with SHA-256
2466f6e2b72aa16b1a2126eddb81c8a1b2764ee251204ac034c191a925aa896f.

Both revisions were rebuilt with the default -O2 flags using Apple clang
21.0.0 on macOS 26.5. The machine was a MacBook Pro (Mac16,6) with a
16-core Apple M4 Max (12 performance and four efficiency cores) and 128
GB RAM.

Link: https://lore.kernel.org/git/1445163904-24611-1-git-send-email-Karthik.188@gmail.com/
Link: https://lore.kernel.org/git/20180723204112.233274-1-jonathantanmy@google.com/
Link: https://lore.kernel.org/git/24424e55-7fa8-d05b-bc39-e14b4d5abcb6@gmail.com/
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
 builtin/tag.c                  |  1 -
 commit-reach.c                 | 45 +++++++++++++++++++++++++++++++-----------
 commit-reach.h                 | 15 ++++++++++----
 ref-filter.c                   |  6 ++++--
 ref-filter.h                   |  7 +++----
 t/helper/test-reach.c          | 10 ++--------
 t/perf/p1500-graph-walks.sh    | 24 +++++++++++++++++++++-
 t/t6301-for-each-ref-errors.sh | 18 +++++++++++++++++
 t/t6302-for-each-ref-filter.sh | 21 ++++++++++++++++++++
 t/t6600-test-reach.sh          |  6 ++----
 10 files changed, 117 insertions(+), 36 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index d51c2e3349..9f34d948d4 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -71,7 +71,6 @@ static int list_tags(struct ref_filter *filter, struct ref_sorting *sorting,
 
 	if (verify_ref_format(format))
 		die(_("unable to parse format string"));
-	filter->with_commit_tag_algo = 1;
 	filter_and_format_refs(filter, FILTER_REFS_TAGS, sorting, format);
 
 	free(to_free);
diff --git a/commit-reach.c b/commit-reach.c
index 9b3ea46d6f..6e599a3670 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -6,7 +6,6 @@
 #include "decorate.h"
 #include "hex.h"
 #include "prio-queue.h"
-#include "ref-filter.h"
 #include "revision.h"
 #include "tag.h"
 #include "commit-reach.h"
@@ -708,7 +707,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
 
 /*
  * Test whether the candidate is contained in the list.
- * Do not recurse to find out, though, but return -1 if inconclusive.
+ * Do not recurse to find out, though, but return CONTAINS_UNKNOWN if
+ * inconclusive.
  */
 static enum contains_result contains_test(struct commit *candidate,
 					  const struct commit_list *want,
@@ -743,9 +743,9 @@ static void push_to_contains_stack(struct commit *candidate, struct contains_sta
 	contains_stack->contains_stack[contains_stack->nr++].parents = candidate->parents;
 }
 
-static enum contains_result contains_tag_algo(struct commit *candidate,
-					      const struct commit_list *want,
-					      struct contains_cache *cache)
+static enum contains_result contains_algo(struct commit *candidate,
+					  struct commit_list *want,
+					  struct contains_cache *cache)
 {
 	struct contains_stack contains_stack = { 0, 0, NULL };
 	enum contains_result result;
@@ -765,6 +765,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 	if (result != CONTAINS_UNKNOWN)
 		return result;
 
+	*contains_cache_at(cache, candidate) = CONTAINS_IN_PROGRESS;
 	push_to_contains_stack(candidate, &contains_stack);
 	while (contains_stack.nr) {
 		struct contains_stack_entry *entry = &contains_stack.contains_stack[contains_stack.nr - 1];
@@ -776,8 +777,8 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 			contains_stack.nr--;
 		}
 		/*
-		 * If we just popped the stack, parents->item has been marked,
-		 * therefore contains_test will return a meaningful yes/no.
+		 * A parent may have just been popped and marked, or may still
+		 * be active when replacement refs create a cycle.
 		 */
 		else switch (contains_test(parents->item, want, cache, cutoff)) {
 		case CONTAINS_YES:
@@ -787,21 +788,41 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
 		case CONTAINS_NO:
 			entry->parents = parents->next;
 			break;
+		case CONTAINS_IN_PROGRESS:
+			/*
+			 * Partial negative answers are not safe across a cycle.
+			 * Discard them and use the cycle-safe reachability walk.
+			 */
+			goto cycle;
 		case CONTAINS_UNKNOWN:
+			*contains_cache_at(cache, parents->item) =
+				CONTAINS_IN_PROGRESS;
 			push_to_contains_stack(parents->item, &contains_stack);
 			break;
 		}
 	}
 	free(contains_stack.contains_stack);
 	return contains_test(candidate, want, cache, cutoff);
+
+cycle:
+	free(contains_stack.contains_stack);
+	clear_contains_cache(cache);
+	init_contains_cache(cache);
+
+	result = repo_is_descendant_of(the_repository, candidate, want);
+	if (result < 0)
+		exit(128);
+	*contains_cache_at(cache, candidate) =
+		result ? CONTAINS_YES : CONTAINS_NO;
+	return result ? CONTAINS_YES : CONTAINS_NO;
 }
 
-int commit_contains(struct ref_filter *filter, struct commit *commit,
-		    struct commit_list *list, struct contains_cache *cache)
+int commit_contains(struct commit *commit, struct commit_list *list,
+		    struct contains_cache *cache)
 {
-	if (filter->with_commit_tag_algo)
-		return contains_tag_algo(commit, list, cache) == CONTAINS_YES;
-	return repo_is_descendant_of(the_repository, commit, list);
+	if (!list)
+		return 1;
+	return contains_algo(commit, list, cache) == CONTAINS_YES;
 }
 
 int can_all_from_reach_with_flag(struct object_array *from,
diff --git a/commit-reach.h b/commit-reach.h
index 3f3a563d8a..144dc56275 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -5,7 +5,6 @@
 #include "commit-slab.h"
 
 struct commit_list;
-struct ref_filter;
 struct object_id;
 struct object_array;
 
@@ -73,13 +72,21 @@ int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
 enum contains_result {
 	CONTAINS_UNKNOWN = 0,
 	CONTAINS_NO,
-	CONTAINS_YES
+	CONTAINS_YES,
+	CONTAINS_IN_PROGRESS
 };
 
 define_commit_slab(contains_cache, enum contains_result);
 
-int commit_contains(struct ref_filter *filter, struct commit *commit,
-		    struct commit_list *list, struct contains_cache *cache);
+/*
+ * Return whether "commit" is a descendant of any commit in "list". An empty
+ * list matches.
+ *
+ * "cache" records answers for one fixed "list". Clear it before changing the
+ * list.
+ */
+int commit_contains(struct commit *commit, struct commit_list *list,
+		    struct contains_cache *cache);
 
 /*
  * Determine if every commit in 'from' can reach at least one commit
diff --git a/ref-filter.c b/ref-filter.c
index 1da4c0e60d..7788147959 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2991,11 +2991,13 @@ static struct ref_array_item *apply_ref_filter(const struct reference *ref,
 			return NULL;
 		/* We perform the filtering for the '--contains' option... */
 		if (filter->with_commit &&
-		    !commit_contains(filter, commit, filter->with_commit, &filter->internal.contains_cache))
+		    !commit_contains(commit, filter->with_commit,
+				     &filter->internal.contains_cache))
 			return NULL;
 		/* ...or for the `--no-contains' option */
 		if (filter->no_commit &&
-		    commit_contains(filter, commit, filter->no_commit, &filter->internal.no_contains_cache))
+		    commit_contains(commit, filter->no_commit,
+				    &filter->internal.no_contains_cache))
 			return NULL;
 	}
 
diff --git a/ref-filter.h b/ref-filter.h
index 120221b47f..9e14afca9c 100644
--- a/ref-filter.h
+++ b/ref-filter.h
@@ -73,10 +73,9 @@ struct ref_filter {
 	struct commit_list *reachable_from;
 	struct commit_list *unreachable_from;
 
-	unsigned int with_commit_tag_algo : 1,
-		match_as_path : 1,
-		ignore_case : 1,
-		detached : 1;
+	unsigned int match_as_path : 1,
+		     ignore_case : 1,
+		     detached : 1;
 	unsigned int kind,
 		lines;
 	int abbrev,
diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c
index 5d86a96c17..82235f713e 100644
--- a/t/helper/test-reach.c
+++ b/t/helper/test-reach.c
@@ -6,7 +6,6 @@
 #include "gettext.h"
 #include "hex.h"
 #include "object-name.h"
-#include "ref-filter.h"
 #include "setup.h"
 #include "string-list.h"
 #include "tag.h"
@@ -138,16 +137,11 @@ int cmd__reach(int ac, const char **av)
 
 		printf("%s(X,_,_,0,0):%d\n", av[1], can_all_from_reach_with_flag(&X_obj, 2, 4, 0, 0));
 	} else if (!strcmp(av[1], "commit_contains")) {
-		struct ref_filter filter = REF_FILTER_INIT;
 		struct contains_cache cache;
 		init_contains_cache(&cache);
 
-		if (ac > 2 && !strcmp(av[2], "--tag"))
-			filter.with_commit_tag_algo = 1;
-		else
-			filter.with_commit_tag_algo = 0;
-
-		printf("%s(_,A,X,_):%d\n", av[1], commit_contains(&filter, A, X, &cache));
+		printf("%s(_,A,X,_):%d\n", av[1],
+		       commit_contains(A, X, &cache));
 		clear_contains_cache(&cache);
 	} else if (!strcmp(av[1], "get_reachable_subset")) {
 		const int reachable_flag = 1;
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index 5b23ce5db9..ac68fdbacd 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -5,6 +5,8 @@ test_description='Commit walk performance tests'
 
 test_perf_large_repo
 
+contains_ref_limit=8192
+
 test_expect_success 'setup' '
 	git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
 	sort -r allrefs | head -n 50 >refs &&
@@ -32,10 +34,25 @@ test_expect_success 'setup' '
 		echo "X:$line" >>test-tool-tags || return 1
 	done &&
 
+	git rev-list --first-parent --max-count=$contains_ref_limit HEAD >contains-commits &&
+	contains_ref_count=$(wc -l <contains-commits) &&
+	test "$contains_ref_count" -gt 0 &&
+	contains_base=$(tail -n 1 contains-commits) &&
+	export contains_base &&
+	awk "{
+		printf \"update refs/contains-perf/%04d %s\\n\", NR, \$1
+	}" contains-commits |
+		git update-ref --stdin &&
+	git pack-refs --include "refs/contains-perf/*" &&
+
 	commit=$(git commit-tree $(git rev-parse HEAD^{tree})) &&
 	git update-ref refs/heads/disjoint-base $commit &&
 
-	git commit-graph write --reachable
+	git commit-graph write --reachable &&
+
+	git for-each-ref --contains="$contains_base" \
+		refs/contains-perf/ >actual &&
+	test_line_count = $contains_ref_count actual
 '
 
 test_perf 'ahead-behind counts: git for-each-ref' '
@@ -62,6 +79,11 @@ test_perf 'contains: git tag --merged' '
 	xargs git tag --merged=HEAD <tags
 '
 
+test_perf 'contains: git for-each-ref --contains' '
+	git for-each-ref --contains="$contains_base" \
+		refs/contains-perf/ >/dev/null
+'
+
 test_perf 'is-base check: test-tool reach (refs)' '
 	test-tool reach get_branch_base_for_tip <test-tool-refs
 '
diff --git a/t/t6301-for-each-ref-errors.sh b/t/t6301-for-each-ref-errors.sh
index e06feb06e9..169cc70c23 100755
--- a/t/t6301-for-each-ref-errors.sh
+++ b/t/t6301-for-each-ref-errors.sh
@@ -52,6 +52,24 @@ test_expect_success 'Missing objects are reported correctly' '
 	test_must_be_empty brief-err
 '
 
+test_expect_success 'missing ancestors are reported by contains filters' '
+	test_when_finished "git update-ref -d refs/heads/missing-parent" &&
+	{
+		echo "tree $(git rev-parse HEAD^{tree})" &&
+		echo "parent $MISSING" &&
+		git cat-file commit HEAD |
+			sed -n -e "/^author /p" -e "/^committer /p" &&
+		echo &&
+		echo "missing parent"
+	} >commit &&
+	broken=$(git hash-object -t commit -w commit) &&
+	git update-ref refs/heads/missing-parent "$broken" &&
+	test_must_fail git for-each-ref --contains=HEAD \
+		refs/heads/missing-parent >out 2>err &&
+	test_must_be_empty out &&
+	test_grep "unable to parse commit $MISSING" err
+'
+
 test_expect_success 'ahead-behind requires an argument' '
 	test_must_fail git for-each-ref \
 		--format="%(ahead-behind)" 2>err &&
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index 7f060d97bf..423505d1fb 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -177,6 +177,27 @@ test_expect_success 'filtering with --contains and --no-contains' '
 	test_cmp expect actual
 '
 
+test_expect_success 'contains handles cyclic replacement histories' '
+	one=$(git rev-parse one) &&
+	three=$(git rev-parse three) &&
+	test_when_finished "
+		git replace -d $one
+		git replace -d $three
+		git tag -d cycle-a cycle-b
+	" &&
+	git tag cycle-a "$one" &&
+	git tag cycle-b "$three" &&
+	git replace --graft "$one" "$three" two &&
+	git replace --graft "$three" "$one" &&
+	cat >expect <<-\EOF &&
+	refs/tags/cycle-a
+	refs/tags/cycle-b
+	EOF
+	git for-each-ref --format="%(refname)" --contains=two \
+		"refs/tags/cycle-*" >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success '%(color) must fail' '
 	test_must_fail git for-each-ref --format="%(color)%(refname)"
 '
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index b5b314e570..1ecc2571c2 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -286,8 +286,7 @@ test_expect_success 'commit_contains:hit' '
 	X:commit-9-3
 	EOF
 	echo "commit_contains(_,A,X,_):1" >expect &&
-	test_all_modes commit_contains &&
-	test_all_modes commit_contains --tag
+	test_all_modes commit_contains
 '
 
 test_expect_success 'commit_contains:miss' '
@@ -303,8 +302,7 @@ test_expect_success 'commit_contains:miss' '
 	X:commit-9-3
 	EOF
 	echo "commit_contains(_,A,X,_):0" >expect &&
-	test_all_modes commit_contains &&
-	test_all_modes commit_contains --tag
+	test_all_modes commit_contains
 '
 
 test_expect_success 'rev-list: basic topo-order' '

---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260607-ref-filter-memoized-contains-7cb6b3bccad1

Best regards,
--  
Tamir Duberstein <tamird@gmail.com>


^ permalink raw reply related

* [PATCH] describe: limit default ref iteration to tags
From: Tamir Duberstein @ 2026-06-07 20:51 UTC (permalink / raw)
  To: git
  Cc: Shawn O. Pearce, Junio C Hamano, Jeff King, Patrick Steinhardt,
	Tamir Duberstein

Unless --all is given, get_name() rejects every ref outside refs/tags/.
The rejection happens only after the ref backend has enumerated the ref,
so repositories with many other refs spend most of a simple describe
invocation visiting refs which cannot affect its result.

Commit 8a5a1884e9 (Avoid accessing non-tag refs in git-describe unless
--all is requested, 2008-02-24) moved this rejection before object
lookup, but left iteration unscoped. Pass the existing refs/tags/
restriction to the iterator unless --all is given so the backend can
avoid unrelated refs.

On a checkout with 124,357 refs, of which 330 were tags, I ran the
following command with the parent and patched binaries:

    hyperfine --warmup 3 --runs 15 \
        'git describe --always --long --abbrev=40 HEAD'

The results were:

             parent       this commit
  elapsed    196.2 ms      63.3 ms
  user        69.5 ms      48.0 ms
  system     123.0 ms      12.0 ms

The wall-time standard deviations were 13.2 ms and 2.6 ms, respectively,
for a 3.10x speedup.

Both revisions were built with -O3, -mcpu=native, and ThinLTO using
Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
(Mac16,6) with a 16-core Apple M4 Max (12 performance and four
efficiency cores) and 128 GB RAM.

Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
 builtin/describe.c       |  3 +++
 t/perf/p6100-describe.sh | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/builtin/describe.c b/builtin/describe.c
index 1c47d7c0b7..3532c8ff22 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -740,6 +740,9 @@ int cmd_describe(int argc,
 		return ret;
 	}
 
+	if (!all)
+		for_each_ref_opts.prefix = "refs/tags/";
+
 	hashmap_init(&names, commit_name_neq, NULL, 0);
 	refs_for_each_ref_ext(get_main_ref_store(the_repository),
 			      get_name, NULL, &for_each_ref_opts);
diff --git a/t/perf/p6100-describe.sh b/t/perf/p6100-describe.sh
index 069f91ce49..dfcaf59e90 100755
--- a/t/perf/p6100-describe.sh
+++ b/t/perf/p6100-describe.sh
@@ -5,6 +5,12 @@ test_description='performance of git-describe'
 
 test_perf_default_repo
 
+test_lazy_prereq PERF_REFFILES '
+	test "$(git rev-parse --show-ref-format)" = files
+'
+
+ref_count=10000
+
 # clear out old tags and give us a known state
 test_expect_success 'set up tags' '
 	git for-each-ref --format="delete %(refname)" refs/tags >to-delete &&
@@ -27,4 +33,18 @@ test_perf 'describe HEAD with one tag' '
 	git describe --match=new HEAD
 '
 
+test_expect_success PERF_REFFILES 'set up many unrelated refs' '
+	git tag -m tip tip HEAD &&
+	for i in $(test_seq $ref_count)
+	do
+		printf "create refs/heads/describe-perf/%05d HEAD\n" $i ||
+		return 1
+	done >instructions &&
+	git update-ref --stdin <instructions
+'
+
+test_perf 'describe exact tag with many loose refs' --prereq PERF_REFFILES '
+	git describe --exact-match HEAD
+'
+
 test_done

---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260607-describe-tag-ref-scope-7d00ae140a58

Best regards,
--  
Tamir Duberstein <tamird@gmail.com>


^ permalink raw reply related

* Re: [PATCH v3 4/6] diff: add long-running diff process via diff.<driver>.process
From: Michael Montalbo @ 2026-06-07 20:36 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Michael Montalbo via GitGitGadget, git
In-Reply-To: <c7987f11-9181-3975-552c-14e74abb2c97@gmx.de>

On Sun, Jun 7, 2026 at 7:36 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Michael,
>
> I stumbled about this patch when it broke CI in Git for Windows, where we
> do _not_ use `NO_PYTHON`, even though Python is unavailable in the
> build/test CI jobs. The existing tests handle this situation gracefully,
> this here patch does not:
>
> On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote:
>
> > diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> > new file mode 100755
> > index 0000000000..f159cd86d8
> > --- /dev/null
> > +++ b/t/t4080-diff-process.sh
> > @@ -0,0 +1,538 @@
> > +#!/bin/sh
> > +
> > +test_description='diff process via long-running process'
> > +
> > +. ./test-lib.sh
> > +
> > +if test_have_prereq PYTHON
> > +then
> > +     PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
>
> When neither `python3` nor `python` are available (which is the case in
> the minimal Git for Windows SDK used in Git's CI runs), this fails under
> `set -e`. Before even running the first test case. Resulting in an
> unexpected TAP format error.
>
> Now, we could "fix" this by imitating what `lib-p4` does (see
> https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8
> which demonstrates that it is indeed a work-around on Windows):
>
> -- snip --
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> index fdf6da1c341e67..bd22c247ff3856 100755
> --- a/t/t4080-diff-process.sh
> +++ b/t/t4080-diff-process.sh
> @@ -4,9 +4,10 @@ test_description='diff process via long-running process'
>
>  . ./test-lib.sh
>
> -if test_have_prereq PYTHON
> +if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH"
>  then
> -       PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
> +       skip_all='python interpreter not available'
> +       test_done
>  fi
>
>  #
> -- snap --
>
> Of course, this uncovers _another_ problem with the Python script: It uses
> Python3-only `f"..."` format strings, which cannot be handled by the
> Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points.
> So this requires _another follow-up (see also
> https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f):
>
> -- snip --
> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> index bd22c247ff3856..ba14682a9086e4 100755
> --- a/t/t4080-diff-process.sh
> +++ b/t/t4080-diff-process.sh
> @@ -39,7 +39,8 @@ setup_backend () {
>
>         def write_pkt(line):
>             data = (line + "\n").encode()
> -           sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> +           hdr = "{:04x}".format(len(data) + 4).encode()
> +           sys.stdout.buffer.write(hdr + data)
>             sys.stdout.buffer.flush()
>
>         def write_flush():
> @@ -98,7 +99,8 @@ setup_backend () {
>             new = read_content()
>             old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
>             new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> -           log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> +           log("command={} pathname={} old={} new={}".format(
> +               cmd, pathname, old_first, new_first))
>
>             if mode == "error":
>                 write_flush()
> @@ -130,7 +132,7 @@ setup_backend () {
>                 else:
>                     ol = old.count(b"\n")
>                     nl = new.count(b"\n")
> -                   write_pkt(f"hunk 1 {ol} 1 {nl}")
> +                   write_pkt("hunk 1 {} 1 {}".format(ol, nl))
>                 write_flush()
>                 write_pkt("status=success")
>                 write_flush()
> -- snap --
>
> And this is still not enough to make it work with Python2, see
> https://github.com/dscho/git/actions/runs/27091523842/job/79955895737:
>
> -- snip --
> [...]
> + git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c
>   Traceback (most recent call last):
>     File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module>
>       assert read_pkt() == "git-diff-client"
>     File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt
>       hdr = sys.stdin.buffer.read(4)
>   AttributeError: 'file' object has no attribute 'buffer'
> -- snap --
>
> I have experienced similar patterns in my career, where a single decision
> required multiple follow-up fixes _just_ to avoid having to revert that
> decision. This kind of doubling down has never ended well.
>
> Therefore I would like to take a step back, and ask: Is it _really_ a good
> idea to use Python here? Are we certain that we want to _require_ Python
> to run this test and skip it if Python isn't available (as is the case in
> the Windows-related parts of Git's very own CI) even if Python has nothing
> at all to do with the feature that is being tested?
>
> I don't want to be doomed to repeat history, and we can very well learn
> e.g. from prior art in this very project, where the tests for the
> clean/smudge filters (which _also_ want to speak pkt-line over stdio)
> needlessly incurred Perl as a requirement to run the tests. It was
> Matheus's heroic work in 52917a998ef3a (t0021: implementation the
> rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the
> new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided
> that unnecessary prerequisite.
>
> Likewise, there is `test-tool pkt-line` intended for driving the pkt-line
> protocol via simple shell scripts.
>
> So the conscious project direction has been: fold pkt-line test backends
> into `test-tool` and drop the scripting-language prereq. Reintroducing the
> same shape in Python would walk this back.
>
> Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
> 2025-04-29) has been another clear sign that the Git project is actively
> _removing_ scripting-language dependencies from the build and test
> infrastructure, not adding new ones.
>

Now I wonder if the extension / addition of more Perl test infra with my other
series:

https://lore.kernel.org/git/pull.2135.git.1780559158.gitgitgadget@gmail.com/T/

also goes against the project direction w.r.t. removing scripting languages.
Maybe I should also re-evaluate the usage of Perl there. I am leveraging
existing shell parsing logic written in Perl, but maybe the preference for
Perl-based lint rules is a mistake and should be avoided.

> The clear prior art in Git's own tests for what t4080 wants to do, as of
> today, is `t/helper/test-rot13-filter.c`, which could be imitated here
> instead of (re-)introducing a dependency on a scripting language other
> than Unix shell in Git's test suite.
>
> The `PYTHON` prereq exists in exactly five files today, all `git p4`
> related (where Python is an intrinsic prerequisite given that `git-p4.py`
> _is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`,
> `t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`,
> and `t/t9836-git-p4-metadata-encoding-python3.sh`.
>
> After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here
> patch would be the first one, after almost 20 years, to re-introduce
> Python as a dependency outside `git p4`.
>
> And it would also be the first ever to embed a Python script as a heredoc:
>
> > +fi
> > +
> > +#
> > +# A single parametric diff process.
> > +# Usage: diff-process-backend --mode=<mode> [--log=<path>]
> > +#
> > +# Modes:
> > +#   whole-file  - report all lines as changed (default)
> > +#   fixed-hunk  - always report hunk 5 2 5 2
> > +#   bad-hunk    - report out-of-bounds hunk 999 1 999 1
> > +#   bad-sync    - report hunk with mismatched unchanged totals
> > +#   overlap     - report two overlapping hunks
> > +#   no-hunks   - return no hunks (files considered equivalent)
> > +#   error       - return status=error for every request
> > +#   abort       - return status=abort for every request
> > +#   crash       - read one request then exit without responding
> > +#
> > +setup_backend () {
> > +     cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
> > +     import sys, os
> > +
> > +     def read_pkt():
> > +         hdr = sys.stdin.buffer.read(4)
> > +         if len(hdr) < 4: return None
> > +         length = int(hdr, 16)
> > +         if length == 0: return ""
> > +         data = sys.stdin.buffer.read(length - 4)
> > +         return data.decode().rstrip("\n")
> > +
> > +     def write_pkt(line):
> > +         data = (line + "\n").encode()
> > +         sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> > +         sys.stdout.buffer.flush()
> > +
> > +     def write_flush():
> > +         sys.stdout.buffer.write(b"0000")
> > +         sys.stdout.buffer.flush()
> > +
> > +     def read_content():
> > +         chunks = []
> > +         while True:
> > +             hdr = sys.stdin.buffer.read(4)
> > +             if len(hdr) < 4: break
> > +             length = int(hdr, 16)
> > +             if length == 0: break
> > +             chunks.append(sys.stdin.buffer.read(length - 4))
> > +         return b"".join(chunks)
> > +
> > +     mode = "whole-file"
> > +     logfile = None
> > +     for arg in sys.argv[1:]:
> > +         if arg.startswith("--mode="):
> > +             mode = arg[7:]
> > +         elif arg.startswith("--log="):
> > +             logfile = open(arg[6:], "a")
> > +
> > +     def log(msg):
> > +         if logfile:
> > +             logfile.write(msg + "\n")
> > +             logfile.flush()
> > +
> > +     # Handshake
> > +     assert read_pkt() == "git-diff-client"
> > +     assert read_pkt() == "version=1"
> > +     read_pkt()
> > +     write_pkt("git-diff-server")
> > +     write_pkt("version=1")
> > +     write_flush()
> > +     while True:
> > +         p = read_pkt()
> > +         if p == "": break
> > +     write_pkt("capability=hunks")
> > +     write_flush()
> > +
> > +     log("ready")
> > +
> > +     while True:
> > +         cmd = None
> > +         pathname = None
> > +         while True:
> > +             p = read_pkt()
> > +             if p is None: sys.exit(0)
> > +             if p == "": break
> > +             if p.startswith("command="): cmd = p.split("=",1)[1]
> > +             if p.startswith("pathname="): pathname = p.split("=",1)[1]
> > +         if cmd is None: sys.exit(0)
> > +         old = read_content()
> > +         new = read_content()
> > +         old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
> > +         new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> > +         log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> > +
> > +         if mode == "error":
> > +             write_flush()
> > +             write_pkt("status=error")
> > +             write_flush()
> > +             continue
> > +
> > +         if mode == "abort":
> > +             write_flush()
> > +             write_pkt("status=abort")
> > +             write_flush()
> > +             continue
> > +
> > +         if mode == "crash":
> > +             sys.exit(1)
> > +
> > +         if cmd == "hunks":
> > +             if mode == "fixed-hunk":
> > +                 write_pkt("hunk 5 2 5 2")
> > +             elif mode == "bad-hunk":
> > +                 write_pkt("hunk 999 1 999 1")
> > +             elif mode == "bad-sync":
> > +                 write_pkt("hunk 1 2 1 1")
> > +             elif mode == "overlap":
> > +                 write_pkt("hunk 1 5 1 5")
> > +                 write_pkt("hunk 3 2 3 2")
> > +             elif mode == "no-hunks":
> > +                 pass
> > +             else:
> > +                 ol = old.count(b"\n")
> > +                 nl = new.count(b"\n")
> > +                 write_pkt(f"hunk 1 {ol} 1 {nl}")
> > +             write_flush()
> > +             write_pkt("status=success")
> > +             write_flush()
> > +         else:
> > +             write_flush()
> > +             write_pkt("status=error")
> > +             write_flush()
> > +     PYEOF
>
> The existing pattern is to provide larger scripts as fixtures in
> associated `t/tNNNN/` directories, not as heredoc, see e.g.
> `t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in
> heredoc strings makes it virtually impossible to use static code analysis
> or syntax highlighting to fend off banal errors.
>
> Given the complexity of what t4080 tries to test (error, abort, crash,
> bad-sync, no-hunks, multiple files in one session, capability
> negotiation), it would unfortunately be infeasible to use `test-tool
> pkt-line` from a shell script implementing that `diff.*.process` protocol.
>
> So I've spiked a demo how the `test-tool diff-process-backend` could look
> like (letting Opus do the menial typing, so that I can enjoy at least part
> of a sunny Sunday outside), which also passes the CI build and test:
> https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6
>
> That commit is of course not intended to be used as-is; Feel free to pick
> code parts of it and integrate them into your topic branch. Or write your
> own test-tool helper from scratch if that's more your jam.
>
> Ciao,
> Johannes
>
> > +     write_script diff-process-backend <<-SHEOF
> > +     exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
> > +     SHEOF
> > +}
> > +
> > +BACKEND="./diff-process-backend"
> > +
> > +test_expect_success PYTHON 'setup' '
> > +     setup_backend &&
> > +     echo "*.c diff=cdiff" >.gitattributes &&
> > +     git add .gitattributes &&
> > +
> > +     # boundary.c: 10 lines, changes at 5-6 and 9-10.
> > +     # Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
> > +     cat >boundary.c <<-\EOF &&
> > +     line1
> > +     line2
> > +     line3
> > +     line4
> > +     OLD5
> > +     OLD6
> > +     line7
> > +     line8
> > +     OLD9
> > +     OLD10
> > +     EOF
> > +     git add boundary.c &&
> > +
> > +     # worddiff.c: single-line function, value changes 1 -> 999.
> > +     # Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
> > +     cat >worddiff.c <<-\EOF &&
> > +     int value(void) { return 1; }
> > +     EOF
> > +     git add worddiff.c &&
> > +
> > +     # newfile.c: single-line function, value changes 42 -> 99.
> > +     # Used by: new file, --exit-code, multiple drivers.
> > +     cat >newfile.c <<-\EOF &&
> > +     int new_func(void) { return 42; }
> > +     EOF
> > +     git add newfile.c &&
> > +
> > +     # logtest.c: single-line function for log/format-patch tests.
> > +     # Needs two commits so log -1 has a diff.
> > +     cat >logtest.c <<-\EOF &&
> > +     int logfunc(void) { return 1; }
> > +     EOF
> > +     git add logtest.c &&
> > +
> > +     # two.c/one.c: two-file pair for error/abort/startup-failure tests.
> > +     cat >one.c <<-\EOF &&
> > +     int first(void) { return 1; }
> > +     EOF
> > +     cat >two.c <<-\EOF &&
> > +     int second(void) { return 2; }
> > +     EOF
> > +     git add one.c two.c &&
> > +
> > +     git commit -m "initial" &&
> > +
> > +     # Second commit for logtest.c (so log -1 has something to show).
> > +     cat >logtest.c <<-\EOF &&
> > +     int logfunc(void) { return 2; }
> > +     EOF
> > +     git add logtest.c &&
> > +     git commit -m "change logtest.c" &&
> > +
> > +     # Working tree modifications (not committed).
> > +     cat >boundary.c <<-\EOF &&
> > +     line1
> > +     line2
> > +     line3
> > +     line4
> > +     NEW5
> > +     NEW6
> > +     line7
> > +     line8
> > +     NEW9
> > +     NEW10
> > +     EOF
> > +
> > +     cat >worddiff.c <<-\EOF &&
> > +     int value(void) { return 999; }
> > +     EOF
> > +
> > +     cat >newfile.c <<-\EOF &&
> > +     int new_func(void) { return 99; }
> > +     EOF
> > +
> > +     cat >one.c <<-\EOF &&
> > +     int first(void) { return 10; }
> > +     EOF
> > +
> > +     cat >two.c <<-\EOF
> > +     int second(void) { return 20; }
> > +     EOF
> > +'
> > +
> > +#
> > +# Core behavior: the tool controls which lines are marked as changed.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process hunk boundaries affect output' '
> > +     # The file has changes at lines 5-6 and 9-10, but fixed-hunk
> > +     # only reports lines 5-6 as changed.  Lines 9-10 should not
> > +     # appear as changed in the output.
> > +     git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
> > +             diff boundary.c >actual &&
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^-OLD6" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^+NEW6" actual &&
> > +     test_grep ! "^-OLD9" actual &&
> > +     test_grep ! "^-OLD10" actual &&
> > +     test_grep ! "^+NEW9" actual &&
> > +     test_grep ! "^+NEW10" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with new file' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- newfile.c >actual 2>stderr &&
> > +     test_grep "return 99" actual &&
> > +     test_grep "pathname=newfile.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with added file (empty old side)' '
> > +     cat >added.c <<-\EOF &&
> > +     int added(void) { return 1; }
> > +     EOF
> > +     git add added.c &&
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --cached -- added.c >actual 2>stderr &&
> > +     test_grep "added" actual &&
> > +     test_grep "pathname=added.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process skipped for binary files' '
> > +     printf "\\0binary" >binary.c &&
> > +     git add binary.c &&
> > +     git commit -m "add binary" &&
> > +     printf "\\0changed" >binary.c &&
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- binary.c >actual &&
> > +     test_grep "Binary files" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
> > +     echo "not tracked by cdiff" >unmatched.txt &&
> > +     git add unmatched.txt &&
> > +     git commit -m "add unmatched.txt" &&
> > +
> > +     echo "modified" >unmatched.txt &&
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- unmatched.txt >actual &&
> > +     test_grep "modified" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'multiple drivers use separate processes' '
> > +     echo "*.h diff=hdiff" >>.gitattributes &&
> > +     git add .gitattributes &&
> > +
> > +     cat >multi.h <<-\EOF &&
> > +     int header(void) { return 1; }
> > +     EOF
> > +     git add multi.h &&
> > +     git commit -m "add multi.h" &&
> > +
> > +     cat >multi.h <<-\EOF &&
> > +     int header(void) { return 2; }
> > +     EOF
> > +
> > +     rm -f backend-c.log backend-h.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
> > +         -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
> > +             diff -- newfile.c multi.h >actual 2>stderr &&
> > +     test_grep "pathname=newfile.c" backend-c.log &&
> > +     test_grep "pathname=multi.h" backend-h.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works alongside textconv' '
> > +     write_script uppercase-filter <<-\EOF &&
> > +     tr "a-z" "A-Z" <"$1"
> > +     EOF
> > +
> > +     cat >textconv.c <<-\EOF &&
> > +     hello world
> > +     EOF
> > +     git add textconv.c &&
> > +     git commit -m "add textconv.c" &&
> > +
> > +     cat >textconv.c <<-\EOF &&
> > +     goodbye world
> > +     EOF
> > +
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.textconv="./uppercase-filter" \
> > +         -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff -- textconv.c >actual 2>stderr &&
> > +     # The diff process receives textconv-transformed (uppercase) content.
> > +     test_grep "pathname=textconv.c" backend.log &&
> > +     test_grep "old=HELLO WORLD" backend.log &&
> > +     test_grep "new=GOODBYE WORLD" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +#
> > +# Downstream features: word diff, log, equivalent files, exit code.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process with --word-diff' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --word-diff worddiff.c >actual 2>stderr &&
> > +     test_grep "\[-1;-\]" actual &&
> > +     test_grep "{+999;+}" actual &&
> > +     test_grep "pathname=worddiff.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process works with git log -p' '
> > +     # With no-hunks mode, the tool says the files are equivalent,
> > +     # so log -p should show the commit but no diff content.
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
> > +             log -1 -p -- logtest.c >actual 2>stderr &&
> > +     test_grep "change logtest.c" actual &&
> > +     test_grep ! "return 2" actual &&
> > +     test_grep "command=hunks pathname=logtest.c" backend.log &&
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
> > +     cat >nohunks.c <<-\EOF &&
> > +     int zero(void) { return 0; }
> > +     EOF
> > +     git add nohunks.c &&
> > +     git commit -m "add nohunks.c" &&
> > +
> > +     cat >nohunks.c <<-\EOF &&
> > +     int zero(void) { return 999; }
> > +     EOF
> > +
> > +     git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> > +             diff nohunks.c >actual &&
> > +     test_must_be_empty actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
> > +     git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> > +             diff --exit-code nohunks.c
> > +'
> > +
> > +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
> > +     test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
> > +             diff --exit-code newfile.c
> > +'
> > +
> > +#
> > +# Bypass mechanisms: flags and commands that skip the diff process.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --diff-algorithm=patience worddiff.c >actual &&
> > +     test_grep "return 999" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process not used by --stat' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> > +             diff --stat worddiff.c >actual &&
> > +     test_grep "worddiff.c" actual &&
> > +     test_path_is_missing backend.log
> > +'
> > +
> > +#
> > +# Error handling and fallback.
> > +#
> > +
> > +test_expect_success PYTHON 'diff process fallback on tool error status' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     # Fallback produces the full builtin diff (both change regions).
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^-OLD9" actual &&
> > +     test_grep "^+NEW9" actual &&
> > +     # Tool was contacted (it replied with error, not crash).
> > +     test_grep "command=hunks pathname=boundary.c" backend.log &&
> > +     test_grep "diff process.*failed" stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process error keeps tool available for next file' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> > +             diff -- one.c two.c >actual 2>stderr &&
> > +     # Unlike abort, error keeps the tool available: both files
> > +     # are sent to the tool (and both fall back).
> > +     test_grep "pathname=one.c" backend.log &&
> > +     test_grep "pathname=two.c" backend.log &&
> > +     test_grep "return 10" actual &&
> > +     test_grep "return 20" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process abort disables for session' '
> > +     rm -f backend.log &&
> > +     git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
> > +             diff -- one.c two.c >actual &&
> > +     # Both files should still produce diff output via fallback.
> > +     test_grep "return 10" actual &&
> > +     test_grep "return 20" actual &&
> > +     # The tool aborts on the first file and git clears its
> > +     # capability.  The second file never contacts the tool.
> > +     test_grep "pathname=one.c" backend.log &&
> > +     test_grep ! "pathname=two.c" backend.log
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on tool crash' '
> > +     git -c diff.cdiff.process="$BACKEND --mode=crash" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^-OLD9" actual &&
> > +     test_grep "^+NEW9" actual &&
> > +     # Crash is a communication failure, so a warning is emitted.
> > +     test_grep "diff process.*failed" stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process startup failure only warns once' '
> > +     git -c diff.cdiff.process="/nonexistent/tool" \
> > +             diff -- one.c two.c >actual 2>stderr &&
> > +     # Both files produce diff output via fallback.
> > +     test_grep "return 10" actual &&
> > +     test_grep "return 20" actual &&
> > +     # Sentinel prevents repeated warnings: only one, not one per file.
> > +     test_grep "diff process.*failed" stderr >warnings &&
> > +     test_line_count = 1 warnings
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on bad hunks' '
> > +     git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     test_grep "^-OLD5" actual &&
> > +     test_grep "^+NEW5" actual &&
> > +     test_grep "^-OLD9" actual &&
> > +     test_grep "^+NEW9" actual &&
> > +     # Invalid hunks are caught by xdiff validation, not the
> > +     # protocol layer, so no warning is emitted.
> > +     test_must_be_empty stderr
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
> > +     cat >synctest.c <<-\EOF &&
> > +     line1
> > +     line2
> > +     line3
> > +     EOF
> > +     git add synctest.c &&
> > +     git commit -m "add synctest.c" &&
> > +
> > +     cat >synctest.c <<-\EOF &&
> > +     line1
> > +     changed
> > +     line3
> > +     EOF
> > +
> > +     # bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
> > +     # line as changed, leaving 1 unchanged old vs 2 unchanged new.
> > +     # The synchronization invariant fails and git falls back.
> > +     git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
> > +             diff synctest.c >actual 2>stderr &&
> > +     test_grep "changed" actual
> > +'
> > +
> > +test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
> > +     # boundary.c has 10 lines, so both hunks are in bounds
> > +     # but they overlap at lines 3-5, triggering the ordering check.
> > +     git -c diff.cdiff.process="$BACKEND --mode=overlap" \
> > +             diff boundary.c >actual 2>stderr &&
> > +     test_grep "NEW5" actual
> > +'
> > +
> > +test_done
> > diff --git a/userdiff.h b/userdiff.h
> > index 51c26e0d41..a98eabe377 100644
> > --- a/userdiff.h
> > +++ b/userdiff.h
> > @@ -3,6 +3,7 @@
> >
> >  #include "notes-cache.h"
> >
> > +struct diff_subprocess;
> >  struct index_state;
> >  struct repository;
> >
> > @@ -33,6 +34,8 @@ struct userdiff_driver {
> >       int textconv_want_cache;
> >       const char *process;
> >       char *process_owned;
> > +     struct diff_subprocess *diff_subprocess;
> > +     unsigned diff_process_failed : 1;
> >  };
> >  enum userdiff_driver_type {
> >       USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
> > --
> > gitgitgadget
> >
> >
> >

^ permalink raw reply

* [PATCH RFC 2/2] builtin/history: print feedback after successful reword
From: Pablo Sabater @ 2026-06-07 20:07 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Kaartic Sivaraam, Pablo Sabater
In-Reply-To: <20260607-ps-history-reword-v1-0-ba43a3cbb81b@gmail.com>

Unlike `git commit --amend` and `git rebase -i`, `git history reword`
doesn't print anything, this makes it feel empty for a porcelain command
and hard to tell if the command did anything without using other
commands like `git log <commit>` to check if the reword was done.

Print a message on successful rewords so the user has feedback about it.

Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
 builtin/history.c         |  4 ++++
 t/t3451-history-reword.sh | 14 ++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/builtin/history.c b/builtin/history.c
index 51a22a9a1c..0f1ba3b531 100644
--- a/builtin/history.c
+++ b/builtin/history.c
@@ -739,6 +739,10 @@ static int cmd_history_reword(int argc,
 		goto out;
 	}
 
+	fprintf(stderr, _("Successfully reworded commit %s to %s\n"),
+		repo_find_unique_abbrev(repo, &original->object.oid, DEFAULT_ABBREV),
+		repo_find_unique_abbrev(repo, &rewritten->object.oid, DEFAULT_ABBREV));
+
 	ret = 0;
 
 out:
diff --git a/t/t3451-history-reword.sh b/t/t3451-history-reword.sh
index 54ea8a7207..4b22d761e3 100755
--- a/t/t3451-history-reword.sh
+++ b/t/t3451-history-reword.sh
@@ -416,4 +416,18 @@ test_expect_success 'aborts if the commit message is the same' '
 	)
 '
 
+test_expect_success 'prints feedback on successful reword' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit first &&
+
+		reword_with_message HEAD 2>err <<-EOF &&
+		first reworded
+		EOF
+		test_grep "Successfully reworded" err
+	)
+'
+
 test_done

-- 
2.54.0


^ permalink raw reply related

* [PATCH RFC 1/2] builtin/history: abort reword on unchanged message
From: Pablo Sabater @ 2026-06-07 20:07 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Kaartic Sivaraam, Pablo Sabater
In-Reply-To: <20260607-ps-history-reword-v1-0-ba43a3cbb81b@gmail.com>

When using `git history reword` if the new message is the same as the
original it continues anyway creating a new commit with the same
message and updates its descendants, modifying the history after this
'reworded' commit even though there was no actual change.

`git commit --amend` and `git rebase -i` + reword share this behavior,
however `git history reword` is different:
1. Works in-memory without touching the index or the worktree [1], so
   there are no side effects like staged files that could justify
   rewriting the history when the commit message is the same.
2. `git history` by default updates all the branches [2] that contain the
   original commit making it more costly than `git rebase -i` that only
   updates the current branch.

Add a check if the original commit message is the same as the new one
and abort if so.

[1]: https://lore.kernel.org/git/20260113-b4-pks-history-builtin-v11-8-e74ebfa2652d@pks.im/
[2]: https://git-scm.com/docs/git-history#_description

Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
 builtin/history.c         | 10 ++++++++++
 t/t3451-history-reword.sh | 20 ++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/builtin/history.c b/builtin/history.c
index 0fc06fb204..51a22a9a1c 100644
--- a/builtin/history.c
+++ b/builtin/history.c
@@ -135,6 +135,13 @@ static int commit_tree_ext(struct repository *repo,
 					  original_body, action, &commit_message);
 		if (ret < 0)
 			goto out;
+
+		if (!strcmp(original_body, commit_message.buf)) {
+			fprintf(stderr, _("Message unchanged,"
+					  " aborting reword.\n"));
+			ret = 1;
+			goto out;
+		}
 	} else {
 		strbuf_addstr(&commit_message, original_body);
 	}
@@ -718,6 +725,9 @@ static int cmd_history_reword(int argc,
 	if (ret < 0) {
 		ret = error(_("failed writing reworded commit"));
 		goto out;
+	} else if (ret == 1) {
+		ret = 0;
+		goto out;
 	}
 
 	strbuf_addf(&reflog_msg, "reword: updating %s", argv[0]);
diff --git a/t/t3451-history-reword.sh b/t/t3451-history-reword.sh
index de7b357685..54ea8a7207 100755
--- a/t/t3451-history-reword.sh
+++ b/t/t3451-history-reword.sh
@@ -396,4 +396,24 @@ test_expect_success 'retains changes in the worktree and index' '
 	)
 '
 
+test_expect_success 'aborts if the commit message is the same' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit first &&
+		test_commit second &&
+
+		git rev-parse HEAD >oid-before &&
+		write_script fake-editor.sh <<-\EOF &&
+		true
+		EOF
+		test_set_editor "$(pwd)"/fake-editor.sh &&
+		git history reword HEAD 2>err &&
+		git rev-parse HEAD >oid-after &&
+		test_cmp oid-before oid-after &&
+		test_grep "Message unchanged" err
+	)
+'
+
 test_done

-- 
2.54.0


^ permalink raw reply related

* [PATCH RFC 0/2] builtin/history: change git history reword behavior and feedback
From: Pablo Sabater @ 2026-06-07 20:07 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Kaartic Sivaraam, Pablo Sabater

This small series contains two commits that aim to improve
`git history reword`:
1. Abort the reword when the original message and the new message are
   the same to avoid unnecessary history changes.
2. Print feedback after a successful reword so the user knows about it.

`git commit --amend` and `git rebase -i` with reword don't abort if
the commit message is the same as the original and they update as if
it was a new message in favor of changing this behavior for
`git history reword`:
- As noted in the `git history` documentation, the command by
  default updates all branches that contain the original commit [1]
  this makes `git history reword` more expensive than other options
  like `git rebase -i` that only updates the current branch.
- `git history` works in-memory without touching the worktree or index
  [2], because it doesn't use the sequencer and `git history reword`
  doesn't care about the staged files only about the commit message, it
  should have no problems.

About the last fact in favor of 1, I'm not completely sure if it's
because of staged files that's the reason why `git commit --amend` or
`git rebase -i` with reword still updates even if the commit message
is the same one. I'm not very up to sequencer.c to be sure but maybe
there's a historical reason about it that someone knows. Anyways I
believe that given this new command is a good idea to discuss it.

The commit message of 1 mentions staged files as a possible justification
for why --amend and rebase behave this way, but that's just an
assumption that I'll be happy to change if I'm wrong.

[1]: https://git-scm.com/docs/git-history#_description
[2]: https://lore.kernel.org/git/20260113-b4-pks-history-builtin-v11-8-e74ebfa2652d@pks.im/

To: git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>
Cc: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>

Signed-off-by: Pablo Sabater <pabloosabaterr@gmail.com>
---
Pablo Sabater (2):
      builtin/history: abort reword on unchanged message
      builtin/history: print feedback after successful reword

 builtin/history.c         | 14 ++++++++++++++
 t/t3451-history-reword.sh | 34 ++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)
---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260607-ps-history-reword-fcb70eaa4aa9

Best regards,
--  
Pablo Sabater <pabloosabaterr@gmail.com>


^ permalink raw reply

* Re: [PATCH] ls-files: filter pathspec before lstat
From: Tamir Duberstein @ 2026-06-07 18:24 UTC (permalink / raw)
  To: Kristoffer Haugsbakk
  Cc: git, René Scharfe, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <e42fac49-5037-4eac-b4c8-58bc62857ee2@app.fastmail.com>

On Sun, Jun 7, 2026 at 12:17 PM Kristoffer Haugsbakk
<kristofferhaugsbakk@fastmail.com> wrote:
>
> On Sun, Jun 7, 2026, at 18:07, Tamir Duberstein wrote:
> > On Sun, Jun 7, 2026 at 12:02 PM Kristoffer Haugsbakk
> > <kristofferhaugsbakk@fastmail.com> wrote:
> >>[snip]
> >>
> >> I have done the same thing in our company repo, crediting <LLM> for
> >> authoring or co-authoring or helping with a specific thing. Using a
> >> “people” trailer. But the intent was just to show how some LLM was
> >> involved. So I think I am going to switch to the following trailer for
> >> our company repo.
> >>
> >>     LLM: Yes
> >
> > This all sounds reasonable to me. The kernel has started asking for
> > this trailer
> > (https://github.com/torvalds/linux/commit/78d979db6cef557c171d6059cbce06c3db89c7ee)
> > and I saw precedent in Git as recently as last month
> > (https://github.com/git/git/commit/7a094d68a27e321a99c8ab6b700909e503904bd9)
> > so I erred on the side of caution.
> >
> > I am also OK with this trailer being dropped or replaced on apply.
>
> The most important thing to be aware of is “Use of Artificial
> Intelligence (AI)” in `Documentation/SubmittingPatches`. :)

Acknowledge, thanks! Will omit the trailer on future submissions.

^ permalink raw reply

* Re: [PATCH] ls-files: filter pathspec before lstat
From: Kristoffer Haugsbakk @ 2026-06-07 16:17 UTC (permalink / raw)
  To: Tamir Duberstein
  Cc: git, René Scharfe, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <CAJ-ks9nXybntsa5FCJVWSQ2u+hzxaMdrfCdL3D+vmzjO4e21kQ@mail.gmail.com>

On Sun, Jun 7, 2026, at 18:07, Tamir Duberstein wrote:
> On Sun, Jun 7, 2026 at 12:02 PM Kristoffer Haugsbakk
> <kristofferhaugsbakk@fastmail.com> wrote:
>>[snip]
>>
>> I have done the same thing in our company repo, crediting <LLM> for
>> authoring or co-authoring or helping with a specific thing. Using a
>> “people” trailer. But the intent was just to show how some LLM was
>> involved. So I think I am going to switch to the following trailer for
>> our company repo.
>>
>>     LLM: Yes
>
> This all sounds reasonable to me. The kernel has started asking for
> this trailer
> (https://github.com/torvalds/linux/commit/78d979db6cef557c171d6059cbce06c3db89c7ee)
> and I saw precedent in Git as recently as last month
> (https://github.com/git/git/commit/7a094d68a27e321a99c8ab6b700909e503904bd9)
> so I erred on the side of caution.
>
> I am also OK with this trailer being dropped or replaced on apply.

The most important thing to be aware of is “Use of Artificial
Intelligence (AI)” in `Documentation/SubmittingPatches`. :)

Thanks

^ permalink raw reply

* Re: [PATCH] ls-files: filter pathspec before lstat
From: Tamir Duberstein @ 2026-06-07 16:07 UTC (permalink / raw)
  To: Kristoffer Haugsbakk
  Cc: git, René Scharfe, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <8f3bab63-3b37-4492-a39e-95e610a15a07@app.fastmail.com>

On Sun, Jun 7, 2026 at 12:02 PM Kristoffer Haugsbakk
<kristofferhaugsbakk@fastmail.com> wrote:
>
> On Sun, Jun 7, 2026, at 17:40, Tamir Duberstein wrote:
> >[snip]
> > Assisted-by: Codex gpt-5.5
>
> This is more of a Git for Windows trailer. The Git project doesn’t
> document its use.
>
> An aside here but these trailers attributing specific LLMs feels like
> etching “Peter was here” under some table. What benefit for the project
> does knowing that it was this version of Codex or Claude or something?
> A link to the prompt/conversation would provide provenance and show how
> the LLM was used. But three years from now, what information beyond the
> fact that an LLM was involved (any of them) does this offer?
>
> I can understand the benefit for the companies behind these LLMs to have
> these attributions in OSS projects.
>
> I have done the same thing in our company repo, crediting <LLM> for
> authoring or co-authoring or helping with a specific thing. Using a
> “people” trailer. But the intent was just to show how some LLM was
> involved. So I think I am going to switch to the following trailer for
> our company repo.
>
>     LLM: Yes

This all sounds reasonable to me. The kernel has started asking for
this trailer (https://github.com/torvalds/linux/commit/78d979db6cef557c171d6059cbce06c3db89c7ee)
and I saw precedent in Git as recently as last month
(https://github.com/git/git/commit/7a094d68a27e321a99c8ab6b700909e503904bd9)
so I erred on the side of caution.

I am also OK with this trailer being dropped or replaced on apply.

^ permalink raw reply

* Re: [PATCH] ls-files: filter pathspec before lstat
From: Kristoffer Haugsbakk @ 2026-06-07 16:02 UTC (permalink / raw)
  To: Tamir Duberstein, git
  Cc: René Scharfe, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <20260607-ls-files-pathspec-lstat-v1-1-8cf40b730146@gmail.com>

On Sun, Jun 7, 2026, at 17:40, Tamir Duberstein wrote:
>[snip]
> Assisted-by: Codex gpt-5.5

This is more of a Git for Windows trailer. The Git project doesn’t
document its use.

An aside here but these trailers attributing specific LLMs feels like
etching “Peter was here” under some table. What benefit for the project
does knowing that it was this version of Codex or Claude or something?
A link to the prompt/conversation would provide provenance and show how
the LLM was used. But three years from now, what information beyond the
fact that an LLM was involved (any of them) does this offer?

I can understand the benefit for the companies behind these LLMs to have
these attributions in OSS projects.

I have done the same thing in our company repo, crediting <LLM> for
authoring or co-authoring or helping with a specific thing. Using a
“people” trailer. But the intent was just to show how some LLM was
involved. So I think I am going to switch to the following trailer for
our company repo.

    LLM: Yes

> Signed-off-by: Tamir Duberstein <tamird@gmail.com>
> ---
>[snip]

^ permalink raw reply

* [PATCH] ls-files: filter pathspec before lstat
From: Tamir Duberstein @ 2026-06-07 15:40 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Patrick Steinhardt, Junio C Hamano,
	Tamir Duberstein

show_files() checks whether each index entry is deleted or modified
before show_ce() applies the pathspec. prune_index() avoids most of this
work for pathspecs with a common directory prefix, but a top-level name
or leading wildcard leaves every entry to be checked.

Match the pathspec before lstat() for the deleted and modified modes.
Keep the later match in show_ce() so --error-unmatch is satisfied only
by entries that are actually shown.

On a repository with 859,211 index entries, a 19,931,862-byte index, and
25,303,439 packed objects occupying 21.13 GiB, I ran the following
command with the parent and patched binaries:

    hyperfine --warmup 0 --runs 3 \
        'git -c core.fsmonitor=false ls-files --deleted -- README.md'

The results were:

             parent       this commit
  elapsed    60.742 s     1.061 s
  user        1.117 s     0.963 s
  system     10.740 s     0.042 s

Both revisions were built with -O3, -mcpu=native, and ThinLTO using
Apple clang 21.0.0 on macOS 26.5. The machine was a MacBook Pro
(Mac16,6) with a 16-core Apple M4 Max (12 performance and four
efficiency cores) and 128 GB RAM.

Assisted-by: Codex gpt-5.5
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
---
 builtin/ls-files.c                  |  7 +++++++
 t/meson.build                       |  1 +
 t/perf/p3010-ls-files.sh            | 27 +++++++++++++++++++++++++++
 t/t3010-ls-files-killed-modified.sh | 18 ++++++++++++++++++
 4 files changed, 53 insertions(+)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index e1a22b41b9..702c607183 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -450,6 +450,13 @@ static void show_files(struct repository *repo, struct dir_struct *dir)
 			continue;
 		if (ce_skip_worktree(ce))
 			continue;
+		/* Only entries shown by show_ce() satisfy --error-unmatch. */
+		if (pathspec.nr &&
+		    !match_pathspec(repo->index, &pathspec, fullname.buf,
+				    fullname.len, max_prefix_len, NULL,
+				    S_ISDIR(ce->ce_mode) ||
+				    S_ISGITLINK(ce->ce_mode)))
+			continue;
 		stat_err = lstat(fullname.buf, &st);
 		if (stat_err && (errno != ENOENT && errno != ENOTDIR))
 			error_errno("cannot lstat '%s'", fullname.buf);
diff --git a/t/meson.build b/t/meson.build
index 2af8d01279..ee8086e6ef 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -1140,6 +1140,7 @@ benchmarks = [
   'perf/p1500-graph-walks.sh',
   'perf/p1501-rev-parse-oneline.sh',
   'perf/p2000-sparse-operations.sh',
+  'perf/p3010-ls-files.sh',
   'perf/p3400-rebase.sh',
   'perf/p3404-rebase-interactive.sh',
   'perf/p4000-diff-algorithms.sh',
diff --git a/t/perf/p3010-ls-files.sh b/t/perf/p3010-ls-files.sh
new file mode 100755
index 0000000000..bb80768063
--- /dev/null
+++ b/t/perf/p3010-ls-files.sh
@@ -0,0 +1,27 @@
+#!/bin/sh
+
+test_description='Tests ls-files worktree performance'
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+test_checkout_worktree
+
+test_expect_success 'select a zero-prefix pathspec' '
+	tracked_file=$(git ls-files | sed -n 1p) &&
+	test -n "$tracked_file" &&
+	pathspec="?${tracked_file#?}" &&
+	test_export pathspec
+'
+
+test_perf 'ls-files --deleted with pathspec' '
+	git -c core.fsmonitor=false ls-files --deleted \
+		-- "$pathspec" >/dev/null
+'
+
+test_perf 'ls-files --modified with pathspec' '
+	git -c core.fsmonitor=false ls-files --modified \
+		-- "$pathspec" >/dev/null
+'
+
+test_done
diff --git a/t/t3010-ls-files-killed-modified.sh b/t/t3010-ls-files-killed-modified.sh
index 7af4532cd1..6e38e10219 100755
--- a/t/t3010-ls-files-killed-modified.sh
+++ b/t/t3010-ls-files-killed-modified.sh
@@ -124,4 +124,22 @@ test_expect_success 'validate git ls-files -m output.' '
 	test_cmp .expected .output
 '
 
+test_expect_success 'worktree modes honor wildcard pathspecs' '
+	cat >.expected <<-\EOF &&
+	path2/file2
+	path3/file3
+	EOF
+	git ls-files --deleted -- "path?/file?" >.output &&
+	test_cmp .expected .output &&
+
+	cat >.expected <<-\EOF &&
+	path7
+	path8
+	EOF
+	git ls-files --modified --error-unmatch -- "path[78]" >.output &&
+	test_cmp .expected .output &&
+
+	test_must_fail git ls-files --modified --error-unmatch -- path10
+'
+
 test_done

---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260607-ls-files-pathspec-lstat-885125a5d644

Best regards,
--  
Tamir Duberstein <tamird@gmail.com>


^ permalink raw reply related

* Re: [PATCH v3 4/6] diff: add long-running diff process via diff.<driver>.process
From: Johannes Schindelin @ 2026-06-07 14:36 UTC (permalink / raw)
  To: Michael Montalbo via GitGitGadget; +Cc: git, Michael Montalbo, Michael Montalbo
In-Reply-To: <d044fa0ee5c9cda7dfe4f663f34443103521ef43.1780087700.git.gitgitgadget@gmail.com>

Hi Michael,

I stumbled about this patch when it broke CI in Git for Windows, where we
do _not_ use `NO_PYTHON`, even though Python is unavailable in the
build/test CI jobs. The existing tests handle this situation gracefully,
this here patch does not:

On Sun, 7 Jun 2026, Michael Montalbo via GitGitGadget wrote:

> diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
> new file mode 100755
> index 0000000000..f159cd86d8
> --- /dev/null
> +++ b/t/t4080-diff-process.sh
> @@ -0,0 +1,538 @@
> +#!/bin/sh
> +
> +test_description='diff process via long-running process'
> +
> +. ./test-lib.sh
> +
> +if test_have_prereq PYTHON
> +then
> +	PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)

When neither `python3` nor `python` are available (which is the case in
the minimal Git for Windows SDK used in Git's CI runs), this fails under
`set -e`. Before even running the first test case. Resulting in an
unexpected TAP format error.

Now, we could "fix" this by imitating what `lib-p4` does (see
https://github.com/dscho/git/commit/bd0b5570c744f678911a67a62da63f30655f20d8
which demonstrates that it is indeed a work-around on Windows):

-- snip --
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index fdf6da1c341e67..bd22c247ff3856 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -4,9 +4,10 @@ test_description='diff process via long-running process'
 
 . ./test-lib.sh
 
-if test_have_prereq PYTHON
+if ! test_have_prereq PYTHON || ! test -x "$PYTHON_PATH"
 then
-	PYTHON_PATH=$(command -v python3) || PYTHON_PATH=$(command -v python)
+	skip_all='python interpreter not available'
+	test_done
 fi
 
 #
-- snap --

Of course, this uncovers _another_ problem with the Python script: It uses
Python3-only `f"..."` format strings, which cannot be handled by the
Python2 to which the `PYTHON_PATH` variable in `linux-TEST-vars` points.
So this requires _another follow-up (see also
https://github.com/dscho/git/commit/c12a9f4c80e5ce8db0fe370fac46fb45be2b775f):

-- snip --
diff --git a/t/t4080-diff-process.sh b/t/t4080-diff-process.sh
index bd22c247ff3856..ba14682a9086e4 100755
--- a/t/t4080-diff-process.sh
+++ b/t/t4080-diff-process.sh
@@ -39,7 +39,8 @@ setup_backend () {
 
 	def write_pkt(line):
 	    data = (line + "\n").encode()
-	    sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
+	    hdr = "{:04x}".format(len(data) + 4).encode()
+	    sys.stdout.buffer.write(hdr + data)
 	    sys.stdout.buffer.flush()
 
 	def write_flush():
@@ -98,7 +99,8 @@ setup_backend () {
 	    new = read_content()
 	    old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
 	    new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
-	    log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
+	    log("command={} pathname={} old={} new={}".format(
+	        cmd, pathname, old_first, new_first))
 
 	    if mode == "error":
 	        write_flush()
@@ -130,7 +132,7 @@ setup_backend () {
 	        else:
 	            ol = old.count(b"\n")
 	            nl = new.count(b"\n")
-	            write_pkt(f"hunk 1 {ol} 1 {nl}")
+	            write_pkt("hunk 1 {} 1 {}".format(ol, nl))
 	        write_flush()
 	        write_pkt("status=success")
 	        write_flush()
-- snap --

And this is still not enough to make it work with Python2, see
https://github.com/dscho/git/actions/runs/27091523842/job/79955895737:

-- snip --
[...]
+ git -c diff.cdiff.process=./diff-process-backend --mode=fixed-hunk diff boundary.c
  Traceback (most recent call last):
    File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 45, in <module>
      assert read_pkt() == "git-diff-client"
    File "/__w/git/git/t/trash directory.t4080-diff-process/diff-process-backend.py", line 4, in read_pkt
      hdr = sys.stdin.buffer.read(4)
  AttributeError: 'file' object has no attribute 'buffer'
-- snap --

I have experienced similar patterns in my career, where a single decision
required multiple follow-up fixes _just_ to avoid having to revert that
decision. This kind of doubling down has never ended well.

Therefore I would like to take a step back, and ask: Is it _really_ a good
idea to use Python here? Are we certain that we want to _require_ Python
to run this test and skip it if Python isn't available (as is the case in
the Windows-related parts of Git's very own CI) even if Python has nothing
at all to do with the feature that is being tested?

I don't want to be doomed to repeat history, and we can very well learn
e.g. from prior art in this very project, where the tests for the
clean/smudge filters (which _also_ want to speak pkt-line over stdio)
needlessly incurred Perl as a requirement to run the tests. It was
Matheus's heroic work in 52917a998ef3a (t0021: implementation the
rot13-filter.pl script in C, 2022-08-14) and 4d1d843be7a15 (tests: use the
new C rot13-filter helper to avoid PERL prereq, 2022-08-14) that avoided
that unnecessary prerequisite.

Likewise, there is `test-tool pkt-line` intended for driving the pkt-line
protocol via simple shell scripts.

So the conscious project direction has been: fold pkt-line test backends
into `test-tool` and drop the scripting-language prereq. Reintroducing the
same shape in Python would walk this back.

Patrick's careful effort in 27bd8ee311719 (Merge branch 'ps/fewer-perl',
2025-04-29) has been another clear sign that the Git project is actively
_removing_ scripting-language dependencies from the build and test
infrastructure, not adding new ones.

The clear prior art in Git's own tests for what t4080 wants to do, as of
today, is `t/helper/test-rot13-filter.c`, which could be imitated here
instead of (re-)introducing a dependency on a scripting language other
than Unix shell in Git's test suite.

The `PYTHON` prereq exists in exactly five files today, all `git p4`
related (where Python is an intrinsic prerequisite given that `git-p4.py`
_is_ written in Python): `t/lib-git-p4.sh`, `t/t9802-git-p4-filetype.sh`,
`t/t9810-git-p4-rcs.sh`, `t/t9835-git-p4-metadata-encoding-python2.sh`,
and `t/t9836-git-p4-metadata-encoding-python3.sh`.

After 7cdbff14d482 (remove merge-recursive-old, 2006-11-20), this here
patch would be the first one, after almost 20 years, to re-introduce
Python as a dependency outside `git p4`.

And it would also be the first ever to embed a Python script as a heredoc:

> +fi
> +
> +#
> +# A single parametric diff process.
> +# Usage: diff-process-backend --mode=<mode> [--log=<path>]
> +#
> +# Modes:
> +#   whole-file  - report all lines as changed (default)
> +#   fixed-hunk  - always report hunk 5 2 5 2
> +#   bad-hunk    - report out-of-bounds hunk 999 1 999 1
> +#   bad-sync    - report hunk with mismatched unchanged totals
> +#   overlap     - report two overlapping hunks
> +#   no-hunks   - return no hunks (files considered equivalent)
> +#   error       - return status=error for every request
> +#   abort       - return status=abort for every request
> +#   crash       - read one request then exit without responding
> +#
> +setup_backend () {
> +	cat >"$TRASH_DIRECTORY/diff-process-backend.py" <<-\PYEOF
> +	import sys, os
> +
> +	def read_pkt():
> +	    hdr = sys.stdin.buffer.read(4)
> +	    if len(hdr) < 4: return None
> +	    length = int(hdr, 16)
> +	    if length == 0: return ""
> +	    data = sys.stdin.buffer.read(length - 4)
> +	    return data.decode().rstrip("\n")
> +
> +	def write_pkt(line):
> +	    data = (line + "\n").encode()
> +	    sys.stdout.buffer.write(f"{len(data)+4:04x}".encode() + data)
> +	    sys.stdout.buffer.flush()
> +
> +	def write_flush():
> +	    sys.stdout.buffer.write(b"0000")
> +	    sys.stdout.buffer.flush()
> +
> +	def read_content():
> +	    chunks = []
> +	    while True:
> +	        hdr = sys.stdin.buffer.read(4)
> +	        if len(hdr) < 4: break
> +	        length = int(hdr, 16)
> +	        if length == 0: break
> +	        chunks.append(sys.stdin.buffer.read(length - 4))
> +	    return b"".join(chunks)
> +
> +	mode = "whole-file"
> +	logfile = None
> +	for arg in sys.argv[1:]:
> +	    if arg.startswith("--mode="):
> +	        mode = arg[7:]
> +	    elif arg.startswith("--log="):
> +	        logfile = open(arg[6:], "a")
> +
> +	def log(msg):
> +	    if logfile:
> +	        logfile.write(msg + "\n")
> +	        logfile.flush()
> +
> +	# Handshake
> +	assert read_pkt() == "git-diff-client"
> +	assert read_pkt() == "version=1"
> +	read_pkt()
> +	write_pkt("git-diff-server")
> +	write_pkt("version=1")
> +	write_flush()
> +	while True:
> +	    p = read_pkt()
> +	    if p == "": break
> +	write_pkt("capability=hunks")
> +	write_flush()
> +
> +	log("ready")
> +
> +	while True:
> +	    cmd = None
> +	    pathname = None
> +	    while True:
> +	        p = read_pkt()
> +	        if p is None: sys.exit(0)
> +	        if p == "": break
> +	        if p.startswith("command="): cmd = p.split("=",1)[1]
> +	        if p.startswith("pathname="): pathname = p.split("=",1)[1]
> +	    if cmd is None: sys.exit(0)
> +	    old = read_content()
> +	    new = read_content()
> +	    old_first = old.split(b"\n")[0].decode(errors="replace") if old else ""
> +	    new_first = new.split(b"\n")[0].decode(errors="replace") if new else ""
> +	    log(f"command={cmd} pathname={pathname} old={old_first} new={new_first}")
> +
> +	    if mode == "error":
> +	        write_flush()
> +	        write_pkt("status=error")
> +	        write_flush()
> +	        continue
> +
> +	    if mode == "abort":
> +	        write_flush()
> +	        write_pkt("status=abort")
> +	        write_flush()
> +	        continue
> +
> +	    if mode == "crash":
> +	        sys.exit(1)
> +
> +	    if cmd == "hunks":
> +	        if mode == "fixed-hunk":
> +	            write_pkt("hunk 5 2 5 2")
> +	        elif mode == "bad-hunk":
> +	            write_pkt("hunk 999 1 999 1")
> +	        elif mode == "bad-sync":
> +	            write_pkt("hunk 1 2 1 1")
> +	        elif mode == "overlap":
> +	            write_pkt("hunk 1 5 1 5")
> +	            write_pkt("hunk 3 2 3 2")
> +	        elif mode == "no-hunks":
> +	            pass
> +	        else:
> +	            ol = old.count(b"\n")
> +	            nl = new.count(b"\n")
> +	            write_pkt(f"hunk 1 {ol} 1 {nl}")
> +	        write_flush()
> +	        write_pkt("status=success")
> +	        write_flush()
> +	    else:
> +	        write_flush()
> +	        write_pkt("status=error")
> +	        write_flush()
> +	PYEOF

The existing pattern is to provide larger scripts as fixtures in
associated `t/tNNNN/` directories, not as heredoc, see e.g.
`t/t1509/prepare-chroot.sh`. Writing scripts, especially lengthy ones, in
heredoc strings makes it virtually impossible to use static code analysis
or syntax highlighting to fend off banal errors.

Given the complexity of what t4080 tries to test (error, abort, crash,
bad-sync, no-hunks, multiple files in one session, capability
negotiation), it would unfortunately be infeasible to use `test-tool
pkt-line` from a shell script implementing that `diff.*.process` protocol.

So I've spiked a demo how the `test-tool diff-process-backend` could look
like (letting Opus do the menial typing, so that I can enjoy at least part
of a sunny Sunday outside), which also passes the CI build and test:
https://github.com/dscho/git/commit/b6e3c93381b00929476c3a00155f7cf7334a22e6

That commit is of course not intended to be used as-is; Feel free to pick
code parts of it and integrate them into your topic branch. Or write your
own test-tool helper from scratch if that's more your jam.

Ciao,
Johannes

> +	write_script diff-process-backend <<-SHEOF
> +	exec "$PYTHON_PATH" "$TRASH_DIRECTORY/diff-process-backend.py" "\$@"
> +	SHEOF
> +}
> +
> +BACKEND="./diff-process-backend"
> +
> +test_expect_success PYTHON 'setup' '
> +	setup_backend &&
> +	echo "*.c diff=cdiff" >.gitattributes &&
> +	git add .gitattributes &&
> +
> +	# boundary.c: 10 lines, changes at 5-6 and 9-10.
> +	# Used by: hunk boundaries, error fallback, crash, bad hunks, overlap.
> +	cat >boundary.c <<-\EOF &&
> +	line1
> +	line2
> +	line3
> +	line4
> +	OLD5
> +	OLD6
> +	line7
> +	line8
> +	OLD9
> +	OLD10
> +	EOF
> +	git add boundary.c &&
> +
> +	# worddiff.c: single-line function, value changes 1 -> 999.
> +	# Used by: word-diff, --diff-algorithm, --no-ext-diff, --stat.
> +	cat >worddiff.c <<-\EOF &&
> +	int value(void) { return 1; }
> +	EOF
> +	git add worddiff.c &&
> +
> +	# newfile.c: single-line function, value changes 42 -> 99.
> +	# Used by: new file, --exit-code, multiple drivers.
> +	cat >newfile.c <<-\EOF &&
> +	int new_func(void) { return 42; }
> +	EOF
> +	git add newfile.c &&
> +
> +	# logtest.c: single-line function for log/format-patch tests.
> +	# Needs two commits so log -1 has a diff.
> +	cat >logtest.c <<-\EOF &&
> +	int logfunc(void) { return 1; }
> +	EOF
> +	git add logtest.c &&
> +
> +	# two.c/one.c: two-file pair for error/abort/startup-failure tests.
> +	cat >one.c <<-\EOF &&
> +	int first(void) { return 1; }
> +	EOF
> +	cat >two.c <<-\EOF &&
> +	int second(void) { return 2; }
> +	EOF
> +	git add one.c two.c &&
> +
> +	git commit -m "initial" &&
> +
> +	# Second commit for logtest.c (so log -1 has something to show).
> +	cat >logtest.c <<-\EOF &&
> +	int logfunc(void) { return 2; }
> +	EOF
> +	git add logtest.c &&
> +	git commit -m "change logtest.c" &&
> +
> +	# Working tree modifications (not committed).
> +	cat >boundary.c <<-\EOF &&
> +	line1
> +	line2
> +	line3
> +	line4
> +	NEW5
> +	NEW6
> +	line7
> +	line8
> +	NEW9
> +	NEW10
> +	EOF
> +
> +	cat >worddiff.c <<-\EOF &&
> +	int value(void) { return 999; }
> +	EOF
> +
> +	cat >newfile.c <<-\EOF &&
> +	int new_func(void) { return 99; }
> +	EOF
> +
> +	cat >one.c <<-\EOF &&
> +	int first(void) { return 10; }
> +	EOF
> +
> +	cat >two.c <<-\EOF
> +	int second(void) { return 20; }
> +	EOF
> +'
> +
> +#
> +# Core behavior: the tool controls which lines are marked as changed.
> +#
> +
> +test_expect_success PYTHON 'diff process hunk boundaries affect output' '
> +	# The file has changes at lines 5-6 and 9-10, but fixed-hunk
> +	# only reports lines 5-6 as changed.  Lines 9-10 should not
> +	# appear as changed in the output.
> +	git -c diff.cdiff.process="$BACKEND --mode=fixed-hunk" \
> +		diff boundary.c >actual &&
> +	test_grep "^-OLD5" actual &&
> +	test_grep "^-OLD6" actual &&
> +	test_grep "^+NEW5" actual &&
> +	test_grep "^+NEW6" actual &&
> +	test_grep ! "^-OLD9" actual &&
> +	test_grep ! "^-OLD10" actual &&
> +	test_grep ! "^+NEW9" actual &&
> +	test_grep ! "^+NEW10" actual
> +'
> +
> +test_expect_success PYTHON 'diff process works with new file' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff -- newfile.c >actual 2>stderr &&
> +	test_grep "return 99" actual &&
> +	test_grep "pathname=newfile.c" backend.log &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process works with added file (empty old side)' '
> +	cat >added.c <<-\EOF &&
> +	int added(void) { return 1; }
> +	EOF
> +	git add added.c &&
> +
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff --cached -- added.c >actual 2>stderr &&
> +	test_grep "added" actual &&
> +	test_grep "pathname=added.c" backend.log &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process skipped for binary files' '
> +	printf "\\0binary" >binary.c &&
> +	git add binary.c &&
> +	git commit -m "add binary" &&
> +	printf "\\0changed" >binary.c &&
> +
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff -- binary.c >actual &&
> +	test_grep "Binary files" actual &&
> +	test_path_is_missing backend.log
> +'
> +
> +test_expect_success PYTHON 'diff process not consulted for unmatched driver' '
> +	echo "not tracked by cdiff" >unmatched.txt &&
> +	git add unmatched.txt &&
> +	git commit -m "add unmatched.txt" &&
> +
> +	echo "modified" >unmatched.txt &&
> +
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff -- unmatched.txt >actual &&
> +	test_grep "modified" actual &&
> +	test_path_is_missing backend.log
> +'
> +
> +test_expect_success PYTHON 'multiple drivers use separate processes' '
> +	echo "*.h diff=hdiff" >>.gitattributes &&
> +	git add .gitattributes &&
> +
> +	cat >multi.h <<-\EOF &&
> +	int header(void) { return 1; }
> +	EOF
> +	git add multi.h &&
> +	git commit -m "add multi.h" &&
> +
> +	cat >multi.h <<-\EOF &&
> +	int header(void) { return 2; }
> +	EOF
> +
> +	rm -f backend-c.log backend-h.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend-c.log" \
> +	    -c diff.hdiff.process="$BACKEND --log=backend-h.log" \
> +		diff -- newfile.c multi.h >actual 2>stderr &&
> +	test_grep "pathname=newfile.c" backend-c.log &&
> +	test_grep "pathname=multi.h" backend-h.log &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process works alongside textconv' '
> +	write_script uppercase-filter <<-\EOF &&
> +	tr "a-z" "A-Z" <"$1"
> +	EOF
> +
> +	cat >textconv.c <<-\EOF &&
> +	hello world
> +	EOF
> +	git add textconv.c &&
> +	git commit -m "add textconv.c" &&
> +
> +	cat >textconv.c <<-\EOF &&
> +	goodbye world
> +	EOF
> +
> +	rm -f backend.log &&
> +	git -c diff.cdiff.textconv="./uppercase-filter" \
> +	    -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff -- textconv.c >actual 2>stderr &&
> +	# The diff process receives textconv-transformed (uppercase) content.
> +	test_grep "pathname=textconv.c" backend.log &&
> +	test_grep "old=HELLO WORLD" backend.log &&
> +	test_grep "new=GOODBYE WORLD" backend.log &&
> +	test_must_be_empty stderr
> +'
> +
> +#
> +# Downstream features: word diff, log, equivalent files, exit code.
> +#
> +
> +test_expect_success PYTHON 'diff process with --word-diff' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff --word-diff worddiff.c >actual 2>stderr &&
> +	test_grep "\[-1;-\]" actual &&
> +	test_grep "{+999;+}" actual &&
> +	test_grep "pathname=worddiff.c" backend.log &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process works with git log -p' '
> +	# With no-hunks mode, the tool says the files are equivalent,
> +	# so log -p should show the commit but no diff content.
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --mode=no-hunks --log=backend.log" \
> +		log -1 -p -- logtest.c >actual 2>stderr &&
> +	test_grep "change logtest.c" actual &&
> +	test_grep ! "return 2" actual &&
> +	test_grep "command=hunks pathname=logtest.c" backend.log &&
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process no hunks suppresses diff output' '
> +	cat >nohunks.c <<-\EOF &&
> +	int zero(void) { return 0; }
> +	EOF
> +	git add nohunks.c &&
> +	git commit -m "add nohunks.c" &&
> +
> +	cat >nohunks.c <<-\EOF &&
> +	int zero(void) { return 999; }
> +	EOF
> +
> +	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> +		diff nohunks.c >actual &&
> +	test_must_be_empty actual
> +'
> +
> +test_expect_success PYTHON 'diff process no hunks with --exit-code returns success' '
> +	git -c diff.cdiff.process="$BACKEND --mode=no-hunks" \
> +		diff --exit-code nohunks.c
> +'
> +
> +test_expect_success PYTHON 'diff process with --exit-code and hunks returns failure' '
> +	test_expect_code 1 git -c diff.cdiff.process="$BACKEND" \
> +		diff --exit-code newfile.c
> +'
> +
> +#
> +# Bypass mechanisms: flags and commands that skip the diff process.
> +#
> +
> +test_expect_success PYTHON 'diff process bypassed by --diff-algorithm' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff --diff-algorithm=patience worddiff.c >actual &&
> +	test_grep "return 999" actual &&
> +	test_path_is_missing backend.log
> +'
> +
> +test_expect_success PYTHON 'diff process not used by --stat' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --log=backend.log" \
> +		diff --stat worddiff.c >actual &&
> +	test_grep "worddiff.c" actual &&
> +	test_path_is_missing backend.log
> +'
> +
> +#
> +# Error handling and fallback.
> +#
> +
> +test_expect_success PYTHON 'diff process fallback on tool error status' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> +		diff boundary.c >actual 2>stderr &&
> +	# Fallback produces the full builtin diff (both change regions).
> +	test_grep "^-OLD5" actual &&
> +	test_grep "^+NEW5" actual &&
> +	test_grep "^-OLD9" actual &&
> +	test_grep "^+NEW9" actual &&
> +	# Tool was contacted (it replied with error, not crash).
> +	test_grep "command=hunks pathname=boundary.c" backend.log &&
> +	test_grep "diff process.*failed" stderr
> +'
> +
> +test_expect_success PYTHON 'diff process error keeps tool available for next file' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --mode=error --log=backend.log" \
> +		diff -- one.c two.c >actual 2>stderr &&
> +	# Unlike abort, error keeps the tool available: both files
> +	# are sent to the tool (and both fall back).
> +	test_grep "pathname=one.c" backend.log &&
> +	test_grep "pathname=two.c" backend.log &&
> +	test_grep "return 10" actual &&
> +	test_grep "return 20" actual
> +'
> +
> +test_expect_success PYTHON 'diff process abort disables for session' '
> +	rm -f backend.log &&
> +	git -c diff.cdiff.process="$BACKEND --mode=abort --log=backend.log" \
> +		diff -- one.c two.c >actual &&
> +	# Both files should still produce diff output via fallback.
> +	test_grep "return 10" actual &&
> +	test_grep "return 20" actual &&
> +	# The tool aborts on the first file and git clears its
> +	# capability.  The second file never contacts the tool.
> +	test_grep "pathname=one.c" backend.log &&
> +	test_grep ! "pathname=two.c" backend.log
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on tool crash' '
> +	git -c diff.cdiff.process="$BACKEND --mode=crash" \
> +		diff boundary.c >actual 2>stderr &&
> +	test_grep "^-OLD5" actual &&
> +	test_grep "^+NEW5" actual &&
> +	test_grep "^-OLD9" actual &&
> +	test_grep "^+NEW9" actual &&
> +	# Crash is a communication failure, so a warning is emitted.
> +	test_grep "diff process.*failed" stderr
> +'
> +
> +test_expect_success PYTHON 'diff process startup failure only warns once' '
> +	git -c diff.cdiff.process="/nonexistent/tool" \
> +		diff -- one.c two.c >actual 2>stderr &&
> +	# Both files produce diff output via fallback.
> +	test_grep "return 10" actual &&
> +	test_grep "return 20" actual &&
> +	# Sentinel prevents repeated warnings: only one, not one per file.
> +	test_grep "diff process.*failed" stderr >warnings &&
> +	test_line_count = 1 warnings
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on bad hunks' '
> +	git -c diff.cdiff.process="$BACKEND --mode=bad-hunk" \
> +		diff boundary.c >actual 2>stderr &&
> +	test_grep "^-OLD5" actual &&
> +	test_grep "^+NEW5" actual &&
> +	test_grep "^-OLD9" actual &&
> +	test_grep "^+NEW9" actual &&
> +	# Invalid hunks are caught by xdiff validation, not the
> +	# protocol layer, so no warning is emitted.
> +	test_must_be_empty stderr
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on mismatched unchanged totals' '
> +	cat >synctest.c <<-\EOF &&
> +	line1
> +	line2
> +	line3
> +	EOF
> +	git add synctest.c &&
> +	git commit -m "add synctest.c" &&
> +
> +	cat >synctest.c <<-\EOF &&
> +	line1
> +	changed
> +	line3
> +	EOF
> +
> +	# bad-sync reports hunk 1 2 1 1: marks 2 old lines and 1 new
> +	# line as changed, leaving 1 unchanged old vs 2 unchanged new.
> +	# The synchronization invariant fails and git falls back.
> +	git -c diff.cdiff.process="$BACKEND --mode=bad-sync" \
> +		diff synctest.c >actual 2>stderr &&
> +	test_grep "changed" actual
> +'
> +
> +test_expect_success PYTHON 'diff process fallback on overlapping hunks' '
> +	# boundary.c has 10 lines, so both hunks are in bounds
> +	# but they overlap at lines 3-5, triggering the ordering check.
> +	git -c diff.cdiff.process="$BACKEND --mode=overlap" \
> +		diff boundary.c >actual 2>stderr &&
> +	test_grep "NEW5" actual
> +'
> +
> +test_done
> diff --git a/userdiff.h b/userdiff.h
> index 51c26e0d41..a98eabe377 100644
> --- a/userdiff.h
> +++ b/userdiff.h
> @@ -3,6 +3,7 @@
>  
>  #include "notes-cache.h"
>  
> +struct diff_subprocess;
>  struct index_state;
>  struct repository;
>  
> @@ -33,6 +34,8 @@ struct userdiff_driver {
>  	int textconv_want_cache;
>  	const char *process;
>  	char *process_owned;
> +	struct diff_subprocess *diff_subprocess;
> +	unsigned diff_process_failed : 1;
>  };
>  enum userdiff_driver_type {
>  	USERDIFF_DRIVER_TYPE_BUILTIN = 1<<0,
> -- 
> gitgitgadget
> 
> 
> 

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox