Git development

Git development
 help / color / mirror / Atom feed

* [PATCH v2 2/3] Documentation/MyFirstContribution: recommend the use of b4
From: Patrick Steinhardt @ 2026-06-03  6:59 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones
In-Reply-To: <20260603-pks-b4-v2-0-a8aea0aa2c23@pks.im>

The b4 tool originates from the Linux kernel community and is intended
to help mailing-list based workflows. It automates a lot of the annoying
bookkeeping tasks that contributors typically need to do: tracking the
list of recipients, Message-IDs, range-diffs and the like. In addition
to that, b4 also has many other subcommands that help the maintainer and
reviewers.

The Git project uses the same infrastructure as the kernel, so this tool
is also a very good fit for us. Adapt "MyFirstContribution" to
explicitly recommend its use.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/MyFirstContribution.adoc | 92 ++++++++++++++++++++++++++++++++--
 Documentation/SubmittingPatches        |  6 ++-
 2 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc
index 069020196c..fc0b06ae67 100644
--- a/Documentation/MyFirstContribution.adoc
+++ b/Documentation/MyFirstContribution.adoc
@@ -833,7 +833,7 @@ This patchset is part of the MyFirstContribution tutorial and should not
 be merged.
 ----
 
-At this point the tutorial diverges, in order to demonstrate two
+At this point the tutorial diverges, in order to demonstrate three
 different methods of formatting your patchset and getting it reviewed.
 
 The first method to be covered is GitGitGadget, which is useful for those
@@ -845,9 +845,14 @@ more fine-grained control over the emails to be sent. This method requires some
 setup which can change depending on your system and will not be covered in this
 tutorial.
 
+The third method to be covered is `b4`, which builds on top of `git
+format-patch` and `git send-email`. This method is the recommended way to
+submit patches via mail as it automates a lot of the bookkeeping required by
+`git send-email`.
+
 Regardless of which method you choose, your engagement with reviewers will be
-the same; the review process will be covered after the sections on GitGitGadget
-and `git send-email`.
+the same; the review process will be covered after the sections on GitGitGadget,
+`git send-email` and `b4`.
 
 [[howto-ggg]]
 == Sending Patches via GitGitGadget
@@ -1296,6 +1301,87 @@ index 88f126184c..38da593a60 100644
 2.21.0.392.gf8f6787159e-goog
 ----
 
+[[howto-b4]]
+== Sending Patches with `b4`
+
+`b4` is a tool that builds on top of `git format-patch` and `git send-email`.
+It automates much of the bookkeeping involved in sending a patch series to a
+mailing-list-based project.
+
+Refer to the https://b4.docs.kernel.org/[b4 documentation] for a full reference.
+
+[[prep-b4]]
+=== Preparing a Patch Series
+
+`b4` tracks your patch series as a branch. To start tracking the `psuh` branch
+you have been working on, run:
+
+----
+$ b4 prep --enroll master
+----
+
+This enrolls the current branch, using `master` as the base of the topic. `b4`
+manages the cover letter as part of the branch, so you can edit it at any time
+with:
+
+----
+$ b4 prep --edit-cover
+----
+
+The cover letter not only tracks the content of the top-level mail, but also
+the set of recipients. You can add recipients by adding `To:` and `Cc:`
+trailer lines.
+
+[[send-b4]]
+=== Sending the Patches
+
+Before sending the series out for real, you can inspect what `b4` would send by
+passing `--dry-run`:
+
+----
+$ b4 send --dry-run
+----
+
+Once you are happy with the result, send the series with:
+
+----
+$ b4 send
+----
+
+[[v2-b4]]
+=== Sending v2
+
+When you are ready to send a new iteration of your series, refine your
+patches as usual using linkgit:git-rebase[1]. Note that you typically want to
+rebase on top of the cover letter. You can configure an alias to enable easy
+rebases going forward:
+
+---
+$ git config set alias.b4-rebase 'rebase "HEAD^{/--- b4-submit-tracking ---}"'
+$ git b4-rebase -i
+---
+
+Before sending out the new version you should also update the cover letter with
+`b4 prep --edit-cover` to note the relevant changes compared to the previous
+version. You can inspect the changes between the two versions with `b4 prep
+--compare-to=v1`.
+
+Same as with the first version, you can use `b4 send` to send out the second
+version. `b4` automatically bumps the version to `v2`, generates the range-diff
+against the previous iteration, and threads the new series as a reply to the
+cover letter of the first version.
+
+[[configure-b4]]
+=== Configure b4
+
+`b4` can be configured via linkgit:git-config[1]. In addition to that, projects
+can have their own set of defaults in `.b4-config` in the root tree, which also
+uses Git's config format. The user's configuration always takes precedence over
+the per-project defaults.
+
+Refer to the https://b4.docs.kernel.org/en/latest/config.html[b4 config documentation]
+for more information on the available options.
+
 [[now-what]]
 == My Patch Got Emailed - Now What?
 
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index d570184ec8..99427e1ee1 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -573,8 +573,10 @@ your existing e-mail client (often optimized for "multipart/*" MIME
 type e-mails) might render your patches unusable.
 
 NOTE: Here we outline the procedure using `format-patch` and
-`send-email`, but you can instead use GitGitGadget to send in your
-patches (see link:MyFirstContribution.html[MyFirstContribution]).
+`send-email`, but you can instead use GitGitGadget or `b4` to send in
+your patches (see link:MyFirstContribution.html[MyFirstContribution]).
+Contributors are encouraged to use `b4`, which automates much of the
+bookkeeping that is otherwise done by hand.
 
 People on the Git mailing list need to be able to read and
 comment on the changes you are submitting.  It is important for

-- 
2.54.0.1064.gd145956f57.dirty


^ permalink raw reply related

* [PATCH v2 1/3] Documentation/MyFirstContribution: recommend shallow threading
From: Patrick Steinhardt @ 2026-06-03  6:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones
In-Reply-To: <20260603-pks-b4-v2-0-a8aea0aa2c23@pks.im>

The "MyFirstContribution" document recommends the use of deep threading:
every cover letter of subsequent iterations shall be linked to the cover
letter of the preceding version. The result of this is that eventually,
threads with many versions are getting nested so deep that it becomes
hard to follow.

Adapt the recommendation to instead propose shallow threading: instead
of linking the cover letter to the previous cover letter, the user is
supposed to always link it to the first cover letter. This still makes
it easy to follow the iterations, but has the benefit of nesting to a
much shallower level.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/MyFirstContribution.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/MyFirstContribution.adoc b/Documentation/MyFirstContribution.adoc
index b9fdefce02..069020196c 100644
--- a/Documentation/MyFirstContribution.adoc
+++ b/Documentation/MyFirstContribution.adoc
@@ -1227,8 +1227,8 @@ Message-ID: <foo.12345.author@example.com>

 Your Message-ID is `<foo.12345.author@example.com>`. This example will be used
 below as well; make sure to replace it with the correct Message-ID for your
-**previous cover letter** - that is, if you're sending v2, use the Message-ID
-from v1; if you're sending v3, use the Message-ID from v2.
+**first cover letter** - that is, for any subsequent version that you send,
+always use the Message-ID from v1.

 While you're looking at the email, you should also note who is CC'd, as it's
 common practice in the mailing list to keep all CCs on a thread. You can add

-- 
2.54.0.1064.gd145956f57.dirty

^ permalink raw reply related

* [PATCH v2 0/3] Documentation: recommend the use of b4
From: Patrick Steinhardt @ 2026-06-03  6:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Tuomas Ahola, Weijie Yuan, Ramsay Jones
In-Reply-To: <20260602-pks-b4-v1-0-a7ae5a49e9cf@pks.im>

Hi,

this small patch series wires up b4 in Git and recommends the use
thereof via "MyFirstContribution", as discussed in [1].

Changes in v2:
  - Reorder commits so that the b4 docs are added first.
  - Add a section that highlights how to configure b4, and that points
    out that the per-project defaults can be overridden via Git
    configuration.
  - Add a patch to MyFirstContribution that recommends shallow
    threading. I mostly intend this to be a discussion starter so that
    the `.b4-config` file matches our preferred threading style.
  - Fix a typo.
  - Link to v1: https://patch.msgid.link/20260602-pks-b4-v1-0-a7ae5a49e9cf@pks.im

Thanks!

Patrick

[1]: <xmqqik81xpqx.fsf@gitster.g>

---
Patrick Steinhardt (3):
      Documentation/MyFirstContribution: recommend shallow threading
      Documentation/MyFirstContribution: recommend the use of b4
      b4: introduce configuration for the Git project

 .b4-config                             |  6 +++
 .b4-cover-template                     | 11 ++++
 Documentation/MyFirstContribution.adoc | 96 ++++++++++++++++++++++++++++++++--
 Documentation/SubmittingPatches        |  6 ++-
 4 files changed, 112 insertions(+), 7 deletions(-)

Range-diff versus v1:

-:  ---------- > 1:  359ce9ec24 Documentation/MyFirstContribution: recommend shallow threading
2:  55fffeb8f8 ! 2:  ce9aa56846 Documentation/MyFirstContribution: recommend the use of b4
    @@ Documentation/MyFirstContribution.adoc: index 88f126184c..38da593a60 100644
     +version. `b4` automatically bumps the version to `v2`, generates the range-diff
     +against the previous iteration, and threads the new series as a reply to the
     +cover letter of the first version.
    ++
    ++[[configure-b4]]
    ++=== Configure b4
    ++
    ++`b4` can be configured via linkgit:git-config[1]. In addition to that, projects
    ++can have their own set of defaults in `.b4-config` in the root tree, which also
    ++uses Git's config format. The user's configuration always takes precedence over
    ++the per-project defaults.
    ++
    ++Refer to the https://b4.docs.kernel.org/en/latest/config.html[b4 config documentation]
    ++for more information on the available options.
     +
      [[now-what]]
      == My Patch Got Emailed - Now What?
1:  0fe6cf8511 ! 3:  e2bf7b6e46 b4: introduce configuration for the Git project
    @@ Commit message
         b4: introduce configuration for the Git project
     
         We're about to extend our documentation to recommend b4 for sending
    -    patch series ot the mailing list. Prepare for this by introducing a b4
    +    patch series to the mailing list. Prepare for this by introducing a b4
         configuration so that the tool knows to honor our preferences. For now,
         this configuration does two things:
     
    @@ Commit message
         forward, like for example auto-configuration of folks to Cc on certain
         patches. But these two tweaks feel like a good place to start.
     
    +    Note that these values only serve as defaults, and users may want to
    +    tweak those defaults based on their own preference. Luckily, users can
    +    do that without having to touch `.b4-config` at all, as b4 allows them
    +    to override values via Git configuration:
    +
    +        ```
    +        $ git config set b4.prep-cover-template /does/not/exist
    +        $ b4 send --dry-run
    +        ERROR: prep-cover-template says to use x, but it does not exist
    +        ```
    +
    +    So this gives users an easy way to override our defaults without having
    +    to touch ".b4-config", which would dirty the tree.
    +
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## .b4-config (new) ##
     @@
    ++# Note that these are default values that you can tweak via the typical
    ++# git-config(1) machinery. You thus shouldn't ever have to change this file.
    ++# See also https://b4.docs.kernel.org/en/latest/config.html.
     +[b4]
     +send-same-thread = shallow
     +prep-cover-template = ./.b4-cover-template

---
base-commit: 9ac3f193c05c2237e2b14ebaa1149e9fc8a1abe0
change-id: 20260602-pks-b4-31cc20d7f84b


^ permalink raw reply

* Re: [PATCH 1/2] b4: introduce configuration for the Git project
From: Patrick Steinhardt @ 2026-06-03  6:55 UTC (permalink / raw)
  To: Weijie Yuan; +Cc: Tuomas Ahola, git, Junio C Hamano
In-Reply-To: <ah-Nhr2PboWUq6eU@wyuan.org>

On Wed, Jun 03, 2026 at 10:12:22AM +0800, Weijie Yuan wrote:
> On Tue, Jun 02, 2026 at 08:09:55PM +0300, Tuomas Ahola wrote:
> > Huh?  Doesn't MyFirstContribution speak *against* shallow threading?
> >
> > 	        [...]  make sure to replace it with the correct Message-ID for your
> > 	**previous cover letter** - that is, if you're sending v2, use the Message-ID
> > 	from v1; if you're sending v3, use the Message-ID from v2.
> 
> I don't get it. Doesn't shallow threading means every following patches
> are replying to the cover letter? Replying to the previous one is
> --chain-reply-to, if I'm not mistaken.

Shallow threading basically means that all patches are sent as a
response to the current cover letter, and the current cover letter is
always attached to the cover letter of the _first_ version.

So this quote is definitely at odds with the configuration I have
proposed. It's actually quite surprising to me that we recommend deep
threading -- I personally find it extremely hard to navigate as the
nesting eventually gets way too deep.

You know -- I'll include a patch that changes the wording there to also
use shallow nesting, mostly to kick off a discussion and arrive at a
decision there.

Thanks!

Patrick

^ permalink raw reply

* Re: [PATCH 2/2] Documentation/MyFirstContribution: recommend the use of b4
From: Patrick Steinhardt @ 2026-06-03  6:54 UTC (permalink / raw)
  To: Weijie Yuan; +Cc: git, Junio C Hamano
In-Reply-To: <ah8ALHMDVA2Gzz10@wyuan.org>

On Wed, Jun 03, 2026 at 12:09:16AM +0800, Weijie Yuan wrote:
> > +Contributors are encouraged to use `b4`, which automates much of the
> > +bookkeeping that is otherwise done by hand.
> 
> So for statement like this and with my personal experience, I would say
> b4 is a more suitable option for senior contributors, as they already
> know, for example, what Message-ID and range-diffs are. But apparently,
> whose who use forges may not know.

I think it's perfectly suitable for newcomers, too. It automates so many
of the concepts that a contributor has to learn way less about mailing
list specific concepts, which reduces the learning curve.

> Back to the patch, I think regarding b4 as a more advanced contribution
> way for those who had contributed via mailing lists for more than one
> time is a better expression or formulation. Here I mean "b4 prep", other
> usage like "b4 mbox" and "b4 am" are of course more basic, and be
> mentioned as tips when interacting with Git mailing list.
> 
> A bit too wordy, in conclusion: Suggest that new contributors master
> classic git operations first. When they are familiar with those process,
> b4 might be a good option.

Ah, that's what you're hinting at. So you mean to say that folks should
first understand the basics before basically automating all of the parts
for them?

I guess I can see where you're coming from, but I'm not sure I agree
with this a 100%. My main goal is to make it easier for new community
members to contribute to Git, and that means that we should automate all
the hard parts as far as possible. This saves those new contributors
from frustration, and it means that reviewers on the mailing list won't
have to teach every single new contributor about how they should thread
the mails, generate range-diffs and the like.

So in the end, it saves both their and our time, but the learning
opportunity is of course a bit diminished. I'd gladly accept that
tradeoff though.

Thanks for your input!

Patrick

^ permalink raw reply

* Re: [PATCH 1/2] b4: introduce configuration for the Git project
From: Patrick Steinhardt @ 2026-06-03  6:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ramsay Jones, git
In-Reply-To: <871peopbvf.fsf@gitster.g>

On Wed, Jun 03, 2026 at 11:59:48AM +0900, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> 
> > On 02/06/2026 2:32 pm, Junio C Hamano wrote:
> >> Patrick Steinhardt <ps@pks.im> writes:
> >> 
> >>> We're about to extend our documentation to recommend b4 for sending
> >>> patch series ot the mailing list. Prepare for this by introducing a b4
> >>> configuration so that the tool knows to honor our preferences. For now,
> >>> this configuration does two things:
> >>> ...
> >> (hence making the tree dirty).
> >
> > Hmm, for those of us not in the know, perhaps mention the b4 documentation
> > at 'b4.docs.kernel.org' (which includes how to install b4 ... ;) ).
> 
> Thanks for raising an excellent point.

I already refer to the docs in the second commit. Let me maybe reorder
them so that we first show how it's used before tweaking it.

Patrick

^ permalink raw reply

* [PATCH v2 4/4] t: let prove fail when parsing invalid TAP output
From: Patrick Steinhardt @ 2026-06-03  5:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano
In-Reply-To: <20260603-pks-t7527-fix-tap-output-v2-0-cf3af5694e20@pks.im>

To make the result of our tests accessible we use the TAP protocol. This
protocol is parsed by either prove or by Meson. Unfortunately, these two
tools differ when it comes to their strictness when parsing the
protocol:

  - Prove by default happily accepts lines not specified by the
    protocol.

  - Meson will also accept such lines, but prints a big and ugly warning
    message.

We have fixed our test suite in the past to not print invalid TAP lines
anymore via b1dc2e796e (Merge branch 'ps/meson-tap-parse', 2025-06-17).
But as none of our tools perform a strict check it's still possible for
broken tests to sneak back in, like for example in 362f69547f (Merge
branch 'ps/t1006-tap-fix', 2025-07-16). This doesn't hurt at all when
using prove, but it's quite annoying when using Meson due to the
generated warnings.

Unfortunately, there doesn't seem to be a portable way to make all tools
complain about violations of the TAP format. The TAP 14 specification
has added pragmas to the protocol that would allow us to say `pragma
+strict`, and the effect of that would be to treat invalid TAP lines as
a test failure. But the release of TAP 14 is still rather recent, and
Test-Harness for example only gained support for it in version 3.48,
which was released in 2023.

In fact though, this pragma was already introduced as an inofficial
extension of the TAP protocol with Test-Harness 3.10, released in 2008.
So while not all tools understand the pragma, at least prove does for a
long time.

Unconditionally enable the pragma when using prove so that we'll detect
tests that emit broken TAP output right away. This would have detected
the issues fixed in preceding commits:

    $ prove t7527-builtin-fsmonitor.sh
    t7527-builtin-fsmonitor.sh .. All 69 subtests passed
            (less 6 skipped subtests: 63 okay)

    Test Summary Report
    -------------------
    t7527-builtin-fsmonitor.sh (Wstat: 0 Tests: 69 Failed: 0)
      Parse errors: Unknown TAP token: "Initialized empty Git repository in /tmp/git/test_fsmonitor_smoke/.git/"

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/test-lib.sh | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index d1d24c4124..ceefb99bff 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1532,6 +1532,12 @@ then
 	BAIL_OUT 'You need to build test-tool; Run "make t/helper/test-tool" in the source (toplevel) directory'
 fi

+if test -n "$HARNESS_ACTIVE"
+then
+	say "TAP version 13"
+	say "pragma +strict"
+fi
+
 # Are we running this test at all?
 remove_trash=
 this_test=${0##*/}

-- 
2.54.0.1064.gd145956f57.dirty

^ permalink raw reply related

* [PATCH v2 3/4] t/lib-git-p4: silence output when killing p4d and its watchdog
From: Patrick Steinhardt @ 2026-06-03  5:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano
In-Reply-To: <20260603-pks-t7527-fix-tap-output-v2-0-cf3af5694e20@pks.im>

When stopping the p4d watchdog process via "kill -9", the shell may
print a job-control notification like:

  ./test-lib.sh: line 1269: 57960 Killed: 9               while true; do
      if test $nr_tries_left -eq 0; then
          kill -9 $p4d_pid; exit 1;
      fi; sleep 1; nr_tries_left=$(($nr_tries_left - 1));
  done 2> /dev/null 4>&2  (wd: ~)

This message is printed asynchronously by the shell when it reaps the
process. While harmless right now, this will cause breakage once we
enable strict parsing of the TAP protocol in a subsequent commit.

Fix this by using `wait` so that we can synchronously reap the watchdog
process and swallow the diagnostic.

While at it, deduplicate the logic we have in `stop_p4d_and_watchdog ()`
and `stop_and_cleanup_p4d ()`.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/lib-git-p4.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/lib-git-p4.sh b/t/lib-git-p4.sh
index d22e9c684a..9108868187 100644
--- a/t/lib-git-p4.sh
+++ b/t/lib-git-p4.sh
@@ -65,6 +65,7 @@ pidfile="$TRASH_DIRECTORY/p4d.pid"
 
 stop_p4d_and_watchdog () {
 	kill -9 $p4d_pid $watchdog_pid
+	wait $p4d_pid $watchdog_pid 2>/dev/null
 }
 
 # git p4 submit generates a temp file, which will
@@ -174,8 +175,7 @@ retry_until_success () {
 }
 
 stop_and_cleanup_p4d () {
-	kill -9 $p4d_pid $watchdog_pid
-	wait $p4d_pid
+	stop_p4d_and_watchdog
 	rm -rf "$db" "$cli" "$pidfile"
 }
 

-- 
2.54.0.1064.gd145956f57.dirty


^ permalink raw reply related

* [PATCH v2 2/4] t/test-lib: silence EBUSY errors on Windows during test cleanup
From: Patrick Steinhardt @ 2026-06-03  5:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano
In-Reply-To: <20260603-pks-t7527-fix-tap-output-v2-0-cf3af5694e20@pks.im>

When tests have finished we clean up the trash directory via `rm -rf`.
On Windows this can fail with EBUSY in cases where a process still holds
some of the files open, for example when we have spawned a daemonized
process that wasn't properly terminated. We thus retry several times,
but every failure will result in error messages being printed, and that
in turn breaks the TAP output format.

One such case where this is causing issues is in t921x, which contains
tests related to Scalar. Some tests spawn the fsmonitor daemon, and we
never properly terminate it.

The obvious fix would be to ensure that we never leak any processes, but
that gets ugly fast. Instead, let's work around the issue by silencing
error messages printed by the `rm -rf` calls. We already know to print
an error when the retry loop fails, so we don't loose much.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/test-lib.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 4a7357b547..d1d24c4124 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1299,10 +1299,10 @@ test_done () {
 			error "Tests passed but trash directory already removed before test cleanup; aborting"

 			cd "$TRASH_DIRECTORY/.." &&
-			rm -fr "$TRASH_DIRECTORY" || {
+			rm -fr "$TRASH_DIRECTORY" 2>/dev/null || {
 				# try again in a bit
 				sleep 5;
-				rm -fr "$TRASH_DIRECTORY"
+				rm -fr "$TRASH_DIRECTORY" 2>/dev/null
 			} ||
 			error "Tests passed but test cleanup failed; aborting"
 		fi

-- 
2.54.0.1064.gd145956f57.dirty

^ permalink raw reply related

* [PATCH v2 1/4] t7527: fix broken TAP output
From: Patrick Steinhardt @ 2026-06-03  5:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano
In-Reply-To: <20260603-pks-t7527-fix-tap-output-v2-0-cf3af5694e20@pks.im>

Before running the tests in t7527 we first verify whether the fsmonitor
even works, which seems to depend on the actual filesystem that is in
use. The verification executes outside of any prerequisite or test body,
so its stdout/stderr is not being redirected.

The consequence of this is that any command that prints to stdout/stderr
may break the TAP specification by printing invalid lines. And in fact
we already do that, as git-init(1) prints the path to the created Git
repository by default.

Fix this issue by moving the logic into a lazy prerequisite.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t7527-builtin-fsmonitor.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/t/t7527-builtin-fsmonitor.sh b/t/t7527-builtin-fsmonitor.sh
index b63c162f9b..d881e27466 100755
--- a/t/t7527-builtin-fsmonitor.sh
+++ b/t/t7527-builtin-fsmonitor.sh
@@ -25,7 +25,8 @@ maybe_timeout () {
 		"$@"
 	fi
 }
-verify_fsmonitor_works () {
+
+test_lazy_prereq FSMONITOR_WORKS '
 	git init test_fsmonitor_smoke || return 1

 	GIT_TRACE_FSMONITOR="$PWD/smoke.trace" &&
@@ -50,9 +51,9 @@ verify_fsmonitor_works () {
 	ret=$?
 	rm -rf test_fsmonitor_smoke smoke.trace
 	return $ret
-}
+'

-if ! verify_fsmonitor_works
+if ! test_have_prereq FSMONITOR_WORKS
 then
 	skip_all="filesystem does not deliver fsmonitor events (container/overlayfs?)"
 	test_done

-- 
2.54.0.1064.gd145956f57.dirty

^ permalink raw reply related

* [PATCH v2 0/4] t: fix broken TAP output
From: Patrick Steinhardt @ 2026-06-03  5:39 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano
In-Reply-To: <20260602-pks-t7527-fix-tap-output-v1-0-db3da2a1b137@pks.im>

Hi,

this small patch series fixes another instance of broken TAP output that
has landed via 4d11b9c218 (Merge branch 'pt/fsmonitor-linux', 2026-05-31).

As this has happened multiple times by now I decided to have a look at
whether we can fix this class of issues a bit more holistically. So this
series also contains a change that makes prove bail out when it sees
invalid TAP output, which uncovers a small set of preexisting issues in
our test suite.

Changes in v2:
  - Fix waiting for p4d, and deduplicate the logic that does this.
  - Link to v1: https://patch.msgid.link/20260602-pks-t7527-fix-tap-output-v1-0-db3da2a1b137@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (4):
      t7527: fix broken TAP output
      t/test-lib: silence EBUSY errors on Windows during test cleanup
      t/lib-git-p4: silence output when killing p4d and its watchdog
      t: let prove fail when parsing invalid TAP output

 t/lib-git-p4.sh              |  4 ++--
 t/t7527-builtin-fsmonitor.sh |  7 ++++---
 t/test-lib.sh                | 10 ++++++++--
 3 files changed, 14 insertions(+), 7 deletions(-)

Range-diff versus v1:

1:  09977059d1 = 1:  18b4fc7b81 t7527: fix broken TAP output
2:  162d3d42d8 = 2:  8dd921534b t/test-lib: silence EBUSY errors on Windows during test cleanup
3:  4ecb8cb1ce < -:  ---------- t/lib-git-p4: silence output when killing p4d and its watchdog
-:  ---------- > 3:  8b343176fe t/lib-git-p4: silence output when killing p4d and its watchdog
4:  95fb0d07ae = 4:  e69aa0ab79 t: let prove fail when parsing invalid TAP output

---
base-commit: 1666c1265231b0bc5f613fbbf3f0a9896cdef76e
change-id: 20260601-pks-t7527-fix-tap-output-105da1d73df0


^ permalink raw reply

* Re: [PATCH 1/2] b4: introduce configuration for the Git project
From: Junio C Hamano @ 2026-06-03  2:59 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Patrick Steinhardt, git
In-Reply-To: <8dbdb553-9633-46bb-8a51-040d06d0d10e@ramsayjones.plus.com>

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> On 02/06/2026 2:32 pm, Junio C Hamano wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>> 
>>> We're about to extend our documentation to recommend b4 for sending
>>> patch series ot the mailing list. Prepare for this by introducing a b4
>>> configuration so that the tool knows to honor our preferences. For now,
>>> this configuration does two things:
>>> ...
>> (hence making the tree dirty).
>
> Hmm, for those of us not in the know, perhaps mention the b4 documentation
> at 'b4.docs.kernel.org' (which includes how to install b4 ... ;) ).

Thanks for raising an excellent point.

^ permalink raw reply

* Re: [PATCH 1/2] b4: introduce configuration for the Git project
From: Weijie Yuan @ 2026-06-03  2:12 UTC (permalink / raw)
  To: Tuomas Ahola; +Cc: Patrick Steinhardt, git, Junio C Hamano
In-Reply-To: <20260602170955.Z4b7y%taahol@utu.fi>

On Tue, Jun 02, 2026 at 08:09:55PM +0300, Tuomas Ahola wrote:
> Huh?  Doesn't MyFirstContribution speak *against* shallow threading?
>
> 	        [...]  make sure to replace it with the correct Message-ID for your
> 	**previous cover letter** - that is, if you're sending v2, use the Message-ID
> 	from v1; if you're sending v3, use the Message-ID from v2.

I don't get it. Doesn't shallow threading means every following patches
are replying to the cover letter? Replying to the previous one is
--chain-reply-to, if I'm not mistaken.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
From: Colin Stagner @ 2026-06-03  1:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ian Jackson, git, Christian Heusel, george, Christian Hesse,
	Phillip Wood
In-Reply-To: <xmqqv7c13o5l.fsf@gitster.g>

On 6/1/26 17:13, Junio C Hamano wrote:

> I am tempted to mark the topic as stalled, to be discarded for
> inaction

No objection. I'd still like to see this reviewed, but we can revisit 
this later if interest develops.

> While I do agree that avoiding bash-isms in the main part of Git and
> sticking to vanilla POSIX has merit, this particular one seems more
> like an artificial limit imposed by dash than sticking to the POSIX
> as the common denoninator, at least to me.

Correct, this topic is a workaround for an artificial limit. The limit 
is Debian-specific and was introduced as a downstream patch in 2018 [1], 
[2].

This git-subtree issue has been reported before in

   <CAN7rbOve-EFOGPjr1wrD77r-3RQ+3+qso82_oV5Qud-skobL7w@mail.gmail.com>,

   <26263.63341.878041.155047@chiark.greenend.org.uk>,

and probably other places. These are old reports, and I haven't found 
anyone there still interested in a fix.



Colin


[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=579815

[2]: 
https://sources.debian.org/patches/dash/0.5.12-12/0009-dash-Fix-stack-overflow-from-infinite-recursion-in-s.patch/


^ permalink raw reply

* Re: [PATCH] http: preserve wwwauth_headers across redirects
From: Aaron Plattner @ 2026-06-03  0:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Rahul Rameshbabu
In-Reply-To: <xmqqpl28scll.fsf@gitster.g>

On 6/2/26 5:15 PM, Junio C Hamano wrote:
> Aaron Plattner <aplattner@nvidia.com> writes:
> 
>> diff --git a/http.c b/http.c
>> index ea9b16861b..cac8c9bfc9 100644
>> --- a/http.c
>> +++ b/http.c
>> @@ -2425,7 +2425,21 @@ static int http_request_recoverable(const char *url,
>>   	if (options->effective_url && options->base_url) {
>>   		if (update_url_from_redirect(options->base_url,
>>   					     url, options->effective_url)) {
>> +			struct strvec wwwauth_headers = STRVEC_INIT;
>> +
>> +			/*
>> +			 * Preserve wwwauth_headers across the call to
>> +			 * credential_from_url(): if the effective URL doesn't
>> +			 * specify its own credentials, a credential helper
>> +			 * might need the wwwauth[] array from the server's
>> +			 * redirect response in order to authenticate.
>> +			 */
>> +			strvec_pushv(&wwwauth_headers,
>> +				     http_auth.wwwauth_headers.v);
>>   			credential_from_url(&http_auth, options->base_url->buf);
>> +			strvec_pushv(&http_auth.wwwauth_headers,
>> +				     wwwauth_headers.v);
>> +			strvec_clear(&wwwauth_headers);
>>   			url = options->effective_url->buf;
>>   		}
>>   	}
> 
> As strvec_pushv() makes copies of the strings contained in .v[]
> array, the above will
> 
>   - make a deep copy of http_auth.wwwauth_headers.v[] and store it away
>     in wwwauth_headers.v[];
> 
>   - let credential_from_url() get rid of
>     http_auth.wwwauth_headers.v[] (the original is freed here, but we
>     have a deep copy stashed away safely), and perhaps add some of
>     its own there; then
> 
>   - add what we stashed away back to http_auth.wwwauth_headers.v[].
> 
> So it does not leak and it does not have use-after-free, either,
> which is good, even though it may be a bit inefficient having to
> copy these strings so many times.

I agree, although in the grand scheme wwwauth_headers is a drop in the 
bucket compared to, say, nearly any git object.

I tried to find an easy way to just not clear it in the first place, but 
that doesn't really match what credential_clear() is intended to do.
> I briefly wondered if it is unconditionally adding back the original
> wwwauth_headers always the right thing to do, but I think this is
> good.  In the context of http_request_recovorable(), the redirect
> has already happened, and the request to the redirect target has
> failed with a 401. The wwwauth_headers currently in http_auth were
> populated from this 401 response from the redirect target. Since we
> are updating http_auth's URL to match this redirect target (in order
> to query the helper for the correct host), the headers we currently
> have are the active challenges for this new URL. Thus, they must be
> preserved and passed to the helper.

This matches my understanding and I think it points to a more 
fundamental design issue: wwwauth_headers aren't really credentials at 
all, and maybe they shouldn't be in struct credential in the first 
place. I wonder if it would make sense to encapsulate it in some other 
http-related structure that lives alongside the credentials, presumably 
along with the protocol, host, and path.

> A few design questions that came to my mind are:
> 
>   - Is wwwauth_headers the _only_ thing that needs to be preserved in
>     the existing credential in http_auth?  Will it stay to be the
>     only thing, or will we need to rethink what this patch did in the
>     future when we add such a new member to "struct credential"?
> 
>   - If we need to preserve some other members in "struct credential",
>     or if we add such members to the struct in the future, what would
>     be the recommended way to extend what this patch does to cover?
> 
> If we add new members in the future to store other transient
> response-based authentication state (e.g. Authentication-Info
> headers, or proxy authentication states), they will be wiped by
> credential_from_url() and will need to be preserved the same way,
> no?  This observation and thought experiment may hint that the
> manual save-and-restore approach is not robust against future
> extensions of struct credential.
> 
> The current approach of manually saving and restoring
> wwwauth_headers in http.c creates a tight coupling between the HTTP
> layer and the internals of struct credential. If new transient
> fields are added in the future, developers must remember to update
> http.c to preserve them, which may be error-prone.
> 
> I wonder if it would make the design more robust and future-proof to
> encapsulate this logic in credential.c instead.  For example, we
> could introduce a helper function:
> 
>      void credential_update_url(struct credential *c, const char *url)
> 
> that does what the new code added around credential_from_url() by
> this patch does, perhaps?

Yeah, maybe. I'll think about this design some more.

-- Aaron

^ permalink raw reply

* Re: [PATCH] http: preserve wwwauth_headers across redirects
From: Junio C Hamano @ 2026-06-03  0:15 UTC (permalink / raw)
  To: Aaron Plattner; +Cc: git, Rahul Rameshbabu
In-Reply-To: <20260602161150.1527493-1-aplattner@nvidia.com>

Aaron Plattner <aplattner@nvidia.com> writes:

> diff --git a/http.c b/http.c
> index ea9b16861b..cac8c9bfc9 100644
> --- a/http.c
> +++ b/http.c
> @@ -2425,7 +2425,21 @@ static int http_request_recoverable(const char *url,
>  	if (options->effective_url && options->base_url) {
>  		if (update_url_from_redirect(options->base_url,
>  					     url, options->effective_url)) {
> +			struct strvec wwwauth_headers = STRVEC_INIT;
> +
> +			/*
> +			 * Preserve wwwauth_headers across the call to
> +			 * credential_from_url(): if the effective URL doesn't
> +			 * specify its own credentials, a credential helper
> +			 * might need the wwwauth[] array from the server's
> +			 * redirect response in order to authenticate.
> +			 */
> +			strvec_pushv(&wwwauth_headers,
> +				     http_auth.wwwauth_headers.v);
>  			credential_from_url(&http_auth, options->base_url->buf);
> +			strvec_pushv(&http_auth.wwwauth_headers,
> +				     wwwauth_headers.v);
> +			strvec_clear(&wwwauth_headers);
>  			url = options->effective_url->buf;
>  		}
>  	}

As strvec_pushv() makes copies of the strings contained in .v[]
array, the above will

 - make a deep copy of http_auth.wwwauth_headers.v[] and store it away
   in wwwauth_headers.v[];

 - let credential_from_url() get rid of
   http_auth.wwwauth_headers.v[] (the original is freed here, but we
   have a deep copy stashed away safely), and perhaps add some of
   its own there; then

 - add what we stashed away back to http_auth.wwwauth_headers.v[].

So it does not leak and it does not have use-after-free, either,
which is good, even though it may be a bit inefficient having to
copy these strings so many times.

I briefly wondered if it is unconditionally adding back the original
wwwauth_headers always the right thing to do, but I think this is
good.  In the context of http_request_recovorable(), the redirect
has already happened, and the request to the redirect target has
failed with a 401. The wwwauth_headers currently in http_auth were
populated from this 401 response from the redirect target. Since we
are updating http_auth's URL to match this redirect target (in order
to query the helper for the correct host), the headers we currently
have are the active challenges for this new URL. Thus, they must be
preserved and passed to the helper.

A few design questions that came to my mind are:

 - Is wwwauth_headers the _only_ thing that needs to be preserved in
   the existing credential in http_auth?  Will it stay to be the
   only thing, or will we need to rethink what this patch did in the
   future when we add such a new member to "struct credential"?

 - If we need to preserve some other members in "struct credential",
   or if we add such members to the struct in the future, what would
   be the recommended way to extend what this patch does to cover?

If we add new members in the future to store other transient
response-based authentication state (e.g. Authentication-Info
headers, or proxy authentication states), they will be wiped by
credential_from_url() and will need to be preserved the same way,
no?  This observation and thought experiment may hint that the
manual save-and-restore approach is not robust against future
extensions of struct credential.

The current approach of manually saving and restoring
wwwauth_headers in http.c creates a tight coupling between the HTTP
layer and the internals of struct credential. If new transient
fields are added in the future, developers must remember to update
http.c to preserve them, which may be error-prone.

I wonder if it would make the design more robust and future-proof to
encapsulate this logic in credential.c instead.  For example, we
could introduce a helper function:

    void credential_update_url(struct credential *c, const char *url)

that does what the new code added around credential_from_url() by
this patch does, perhaps?

^ permalink raw reply

* [PATCH v3] index-pack: retain child bases in delta cache
From: Arijit Banerjee via GitGitGadget @ 2026-06-03  0:05 UTC (permalink / raw)
  To: git
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee, Jeff King, Arijit Banerjee, Arijit Banerjee
In-Reply-To: <pull.2131.v2.git.1780330402264.gitgitgadget@gmail.com>

From: Arijit Banerjee <arijit@effectiveailabs.com>

When resolving a delta whose result has children of its own,
index-pack adds the result to work_head, accounts its data in
base_cache_used, and calls prune_base_data(). It then immediately frees
that same data.

This bypasses the existing delta base cache policy and can force later
descendants to reconstruct the queued base again. Let the existing
delta_base_cache_limit pruning policy decide whether to keep or evict
the data instead.

This does not add a new cache or increase the cache limit. The object
data is already accounted in base_cache_used before prune_base_data()
runs. Once all direct children of a base have been dispatched, and no
thread is actively retaining that base for patch_delta(), release the
cached bytes. The base_data struct itself remains alive until the
existing children_remaining bookkeeping says the whole subtree is done.

On a quiet Ubuntu 24.04 VM with 16 vCPUs, 32 GiB RAM, and local SSD,
standard p5302-pack-index.sh runs improved as follows:

  libgit2:  3.17(11.49+0.60)  ->   2.69(10.52+0.28), 15.1% faster
  redis:    5.84(15.56+0.63)  ->   4.95(14.05+0.32), 15.2% faster
  git.git: 11.17(38.04+1.29)  ->   9.67(35.29+0.60), 13.4% faster
  cpython: 32.69(117.85+4.37) ->  28.60(109.25+1.91), 12.5% faster
  linux:  279.22(797.69+40.86) -> 236.34(723.13+19.02), 15.4% faster

The linux p5302 number is from a single repeat; the others are from the
default three repeats.

End-to-end local full-clone spot checks also improved:

  libgit2:  5.00s ->   4.54s, 9.2% faster
  redis:    8.75s ->   7.92s, 9.5% faster
  git.git: 25.04s ->  23.71s, 5.3% faster
  cpython: 56.72s ->  55.94s, 1.4% faster
  linux:  556.17s -> 523.83s, 5.8% faster

t/t5302-pack-index.sh passed, and GitGitGadget's linux-leaks CI also
exercised that test under SANITIZE=leak.

Signed-off-by: Arijit Banerjee <arijit@effectiveailabs.com>
---
    index-pack: retain child bases in delta cache
    
    Speed up the local index-pack phase used by clone/fetch for large
    delta-compressed packs.
    
    When index-pack reconstructs a child base and queues it for resolving
    descendant deltas, it currently frees that data immediately. This can
    force the same base to be reconstructed again. This patch keeps the data
    in the existing delta base cache instead of immediately freeing it.
    
    This does not add a new cache or increase the cache limit. The object
    data is already accounted in base_cache_used, and prune_base_data() is
    already called at this point.
    
    To keep the retained data lifetime precise, the patch also releases
    cached bytes once all direct children of a base have been dispatched and
    no thread is actively retaining that base for patch_delta(). The
    base_data struct itself still stays alive until the existing
    children_remaining bookkeeping says the whole subtree is done.
    
    Changes since v2:
    
     * Addressed Jeff King's review question by releasing cached base data
       after all direct children have been dispatched, while keeping the
       existing subtree bookkeeping intact.
     * Re-ran t/t5302-pack-index.sh, p5302-pack-index.sh, and end-to-end
       full clone spot checks with the precise-release version.
    
    Changes since v1:
    
     * Added benchmark results and leak-safety context to the commit
       message.
     * Included standard p5302-pack-index.sh perf-suite results.
    
    Correctness:
    
     * t/t5302-pack-index.sh passed.
     * GitGitGadget's linux-leaks CI exercises t5302-pack-index.sh under
       SANITIZE=leak.
    
    Benchmarks on a quiet Ubuntu 24.04 VM, 16 vCPU, 32 GiB RAM, local SSD:
    
    Standard p5302-pack-index.sh perf-suite results using
    GIT_PERF_LARGE_REPO=<repo> ./run HEAD~1 HEAD -- p5302-pack-index.sh:
    
    repo HEAD~1 HEAD wall-time change libgit2 3.17(11.49+0.60)
    2.69(10.52+0.28) 15.1% faster redis 5.84(15.56+0.63) 4.95(14.05+0.32)
    15.2% faster git.git 11.17(38.04+1.29) 9.67(35.29+0.60) 13.4% faster
    cpython 32.69(117.85+4.37) 28.60(109.25+1.91) 12.5% faster linux
    279.22(797.69+40.86) 236.34(723.13+19.02) 15.4% faster
    
    The linux p5302 row is from a single repeat; the others use the default
    three repeats. These timings isolate the index-pack phase affected by
    this patch.
    
    End-to-end local full-clone spot checks also showed wins, though these
    timings include both server-side pack-objects and client-side index-pack
    running concurrently over a local file:// transport, so they are not
    isolated index-pack timings.
    
    These runs used git clone --bare --no-local, dropped page cache before
    each clone, and used the matching build's bin-wrappers/git as the client
    plus the matching bin-wrappers/git-upload-pack via --upload-pack.
    
    repo baseline patched wall-time change libgit2 5.00s 4.54s 9.2% faster
    redis 8.75s 7.92s 9.5% faster git.git 25.04s 23.71s 5.3% faster cpython
    56.72s 55.94s 1.4% faster linux 556.17s 523.83s 5.8% faster

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2131%2Farijit91%2Findex-pack-retain-child-base-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2131/arijit91/index-pack-retain-child-base-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/2131

Range-diff vs v2:

 1:  42eca38f51 ! 1:  51967f9edf index-pack: retain child bases in delta cache
     @@ Commit message
      
          This does not add a new cache or increase the cache limit. The object
          data is already accounted in base_cache_used before prune_base_data()
     -    runs, and the existing pruning and base cleanup paths still release it.
     +    runs. Once all direct children of a base have been dispatched, and no
     +    thread is actively retaining that base for patch_delta(), release the
     +    cached bytes. The base_data struct itself remains alive until the
     +    existing children_remaining bookkeeping says the whole subtree is done.
      
          On a quiet Ubuntu 24.04 VM with 16 vCPUs, 32 GiB RAM, and local SSD,
     -    direct index-pack timings on single-pack Linux fixtures improved as
     -    follows:
     +    standard p5302-pack-index.sh runs improved as follows:
      
     -      linux blobless: 69.17s -> 57.98s (16.2% faster), RSS flat
     -      linux full:     280.72s -> 236.32s (15.8% faster), RSS +1.9%
     +      libgit2:  3.17(11.49+0.60)  ->   2.69(10.52+0.28), 15.1% faster
     +      redis:    5.84(15.56+0.63)  ->   4.95(14.05+0.32), 15.2% faster
     +      git.git: 11.17(38.04+1.29)  ->   9.67(35.29+0.60), 13.4% faster
     +      cpython: 32.69(117.85+4.37) ->  28.60(109.25+1.91), 12.5% faster
     +      linux:  279.22(797.69+40.86) -> 236.34(723.13+19.02), 15.4% faster
      
     -    Five-repeat medians on public repositories also improved:
     +    The linux p5302 number is from a single repeat; the others are from the
     +    default three repeats.
      
     -      git.git:  12.31s -> 10.70s (13.1% faster)
     -      libgit2:   3.35s ->  2.88s (14.0% faster)
     -      redis:     6.52s ->  5.64s (13.5% faster)
     -      cpython:  33.02s -> 31.44s (4.8% faster)
     +    End-to-end local full-clone spot checks also improved:
      
     -    The standard p5302 perf test on a smaller git.git fixture was neutral:
     -
     -      5302.9 index-pack default threads:
     -        11.21(38.07+1.33) -> 11.16(37.90+1.31), -0.4%
     +      libgit2:  5.00s ->   4.54s, 9.2% faster
     +      redis:    8.75s ->   7.92s, 9.5% faster
     +      git.git: 25.04s ->  23.71s, 5.3% faster
     +      cpython: 56.72s ->  55.94s, 1.4% faster
     +      linux:  556.17s -> 523.83s, 5.8% faster
      
          t/t5302-pack-index.sh passed, and GitGitGadget's linux-leaks CI also
          exercised that test under SANITIZE=leak.
     @@ Commit message
          Signed-off-by: Arijit Banerjee <arijit@effectiveailabs.com>
      
       ## builtin/index-pack.c ##
     +@@ builtin/index-pack.c: static void free_base_data(struct base_data *c)
     + 	}
     + }
     + 
     ++static int base_data_has_remaining_direct_children(struct base_data *c)
     ++{
     ++	return c->ref_first <= c->ref_last ||
     ++	       c->ofs_first <= c->ofs_last;
     ++}
     ++
     + static void prune_base_data(struct base_data *retain)
     + {
     + 	struct list_head *pos;
     +@@ builtin/index-pack.c: static void *threaded_second_pass(void *data)
     + 		}
     + 
     + 		work_lock();
     +-		if (parent)
     ++		if (parent) {
     + 			parent->retain_data--;
     ++			if (!parent->retain_data &&
     ++			    !base_data_has_remaining_direct_children(parent))
     ++				free_base_data(parent);
     ++		}
     + 
     + 		if (child && child->data) {
     + 			/*
      @@ builtin/index-pack.c: static void *threaded_second_pass(void *data)
       			list_add(&child->list, &work_head);
       			base_cache_used += child->size;


 builtin/index-pack.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index cf0bd8280d..00b4dff419 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -433,6 +433,12 @@ static void free_base_data(struct base_data *c)
 	}
 }
 
+static int base_data_has_remaining_direct_children(struct base_data *c)
+{
+	return c->ref_first <= c->ref_last ||
+	       c->ofs_first <= c->ofs_last;
+}
+
 static void prune_base_data(struct base_data *retain)
 {
 	struct list_head *pos;
@@ -1201,8 +1207,12 @@ static void *threaded_second_pass(void *data)
 		}
 
 		work_lock();
-		if (parent)
+		if (parent) {
 			parent->retain_data--;
+			if (!parent->retain_data &&
+			    !base_data_has_remaining_direct_children(parent))
+				free_base_data(parent);
+		}
 
 		if (child && child->data) {
 			/*
@@ -1212,7 +1222,6 @@ static void *threaded_second_pass(void *data)
 			list_add(&child->list, &work_head);
 			base_cache_used += child->size;
 			prune_base_data(NULL);
-			free_base_data(child);
 		} else if (child) {
 			/*
 			 * This child does not have its own children. It may be

base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d
-- 
gitgitgadget

^ permalink raw reply related

* Re: [PATCH] worktree: record creation time and free-form note
From: Kiesel, Norbert @ 2026-06-03  0:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq1peots9i.fsf@gitster.g>

Yes, I could change my PR to use $GIT_COMMON_DIR/worktrees/$worktree/description
instead of the currently used $GIT_COMMON_DIR/worktrees/$worktree/note.

Give me a day, and I can create the updated diff.

Best,
  Norbert

On Tue, Jun 2, 2026 at 4:52 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Kiesel, Norbert" <norbert.kiesel@creditkarma.com> writes:
>
> > From 130cd5e4a25e6672b2a97268e1100b6ef03fa552 Mon Sep 17 00:00:00 2001
> > From: Norbert Kiesel <norbert.kiesel@creditkarma.com>
> > Date: Mon, 1 Jun 2026 17:03:39 -0700
> > Subject: [PATCH] worktree: record creation time and free-form note
> >
> > Add per-worktree metadata so users can answer "what is this worktree
> > for, and when did I make it?" without resorting to external notes.
>
> Although I am not personally interested in this topic all that much,
> let me point out that we have $GIT_DIR/description file that may be
> useful for something like this.  It has been the canonical place for
> the main repository to identify itself long before secondary worktrees
> were invented and $GIT_COMMON_DIR/worktrees/$worktree/description would
> be a natural extension of the concept, I'd presume.



-- 
Norbert Kiesel | Staff Software Engineer | Credit Karma
norbert.kiesel@creditkarma.com | www.creditkarma.com

This email may contain confidential and privileged information. Any
review, use, distribution, or disclosure by anyone other than the
intended recipient(s) is prohibited. If you are not the intended
recipient, please contact the sender by reply email and delete all
copies of this message.

^ permalink raw reply

* Re: [PATCH] worktree: record creation time and free-form note
From: Junio C Hamano @ 2026-06-02 23:57 UTC (permalink / raw)
  To: Kiesel, Norbert; +Cc: git
In-Reply-To: <xmqq1peots9i.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> "Kiesel, Norbert" <norbert.kiesel@creditkarma.com> writes:
>
>> From 130cd5e4a25e6672b2a97268e1100b6ef03fa552 Mon Sep 17 00:00:00 2001
>> From: Norbert Kiesel <norbert.kiesel@creditkarma.com>
>> Date: Mon, 1 Jun 2026 17:03:39 -0700
>> Subject: [PATCH] worktree: record creation time and free-form note

Ah, I forgot to mention another thing.  Please do not add these four
lines to your message body.  The information belongs to the e-mail
header, and as long as your e-mail software is configured correctly
there shouldn't be a need to use From: or override the time when the
patch was made public with Date: in-body header.

Thanks.

^ permalink raw reply

* Re: [PATCH] worktree: record creation time and free-form note
From: Junio C Hamano @ 2026-06-02 23:52 UTC (permalink / raw)
  To: Kiesel, Norbert; +Cc: git
In-Reply-To: <CAPGaHku+RAV+FA3C0md0xHiavfdB_anoqcMM06MAiU1VyMAdLA@mail.gmail.com>

"Kiesel, Norbert" <norbert.kiesel@creditkarma.com> writes:

> From 130cd5e4a25e6672b2a97268e1100b6ef03fa552 Mon Sep 17 00:00:00 2001
> From: Norbert Kiesel <norbert.kiesel@creditkarma.com>
> Date: Mon, 1 Jun 2026 17:03:39 -0700
> Subject: [PATCH] worktree: record creation time and free-form note
>
> Add per-worktree metadata so users can answer "what is this worktree
> for, and when did I make it?" without resorting to external notes.

Although I am not personally interested in this topic all that much,
let me point out that we have $GIT_DIR/description file that may be
useful for something like this.  It has been the canonical place for
the main repository to identify itself long before secondary worktrees
were invented and $GIT_COMMON_DIR/worktrees/$worktree/description would
be a natural extension of the concept, I'd presume.

^ permalink raw reply

* Re: [PATCH v2] prio-queue: use cascade-down for faster extract-min
From: Kristofer Karlsson @ 2026-06-02 22:40 UTC (permalink / raw)
  To: René Scharfe; +Cc: Kristofer Karlsson via GitGitGadget, git
In-Reply-To: <90270818-c52b-4611-8da2-6cee20628fc2@web.de>

On Tue, 2 Jun 2026 at 18:37, René Scharfe <l.s.r@web.de> wrote:
>
> Would you be interested in benchmarking the following patch for making
> prio_queue_replace() unnecessary by doing its optimization
> automatically?  I get a 1% performance hit for the describe command
> that I can't explain.  And it leaves the heap unbalanced after a
> prio_queue_get(), which complicates things, so I found it lacking.
> But I wonder how it stacks up against your cascade approach for your
> use case and if there's anything to salvage.
>
> René

Thank you for the detailed feedback and the patch! It was very
helpful to have a concrete alternative to compare against.

I spent some time benchmarking the different approaches on a
large monorepo with a wide DAG.

All measurements include the nonstale O(1) tracking from my other
series as a common base, since that dominates the merge-base path.

The approaches I compared:

  1. cascade-only: the sift_up_rebalance from this patch (v2)
  2. rene-lazy: your deferred sift_down_root patch
  3. cascade+lazy: cascade for unfused gets, lazy fusion for
     get+put pairs

Results (10 runs, 1 warmup, CPU pinned to performance):

  merge-base --all master master~1000 (~4s workload):

    cascade-only   4.18s (median)
    rene-lazy      4.25s
    cascade+lazy   4.24s

  rev-list --count master~1000..master (~3.8s workload):

    cascade-only   3.86s
    rene-lazy      3.75s
    cascade+lazy   3.74s

The lazy approaches show a small win on rev-list (~3%) where get+put
pairs are common in limit_list. On merge-base --all, everything is
within noise, the prio_queue is a small fraction of total runtime
there. Combining cascade with lazy fusion didn't produce additional
gains beyond what each gives individually.

Looking at your patch, I think the deferred sift-down logic is
essentially the same optimization as the lazy_queue wrapper you
wrote for describe.c - both defer the work from get and fuse it
with a following put. So I'd be hesitant to add a second form of
that deferral directly into prio_queue when lazy_queue already
"owns" that responsibility as a wrapper.

That said, I think it would make sense to fold lazy_queue entirely
into prio_queue. It's an optimization that never hurts as far as I can
tell, and it would simplify several callers. pop_most_recent_commit
and show-branch both independently re-implement the same
peek+replace pattern that lazy_queue formalizes. Making it automatic
in prio_queue would clean up all of them.

I have a local branch exploring that direction. Maybe it makes more
sense to do the lazy_queue fold first, and then see if the cascade
change is still worth adding on top?

Either way, I think the two directions are complementary - cascade
reduces comparisons per sift, while lazy fusion can eliminate full
rebalance cycles.

I'm on a company offsite now so I may be slow to answer, but I will
definitely resume this when I get back home.

- Kristofer

^ permalink raw reply

* [PATCH v2 4/4] pack-objects: support `--delta-islands` with `--path-walk`
From: Taylor Blau @ 2026-06-02 22:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <cover.1780438896.git.me@ttaylorr.com>

Since the inception of `--path-walk`, this option has had a documented
incompatibility with `--delta-islands`.

When discussing those original patches on the list, a message from
Stolee in [1] noted the following:

    this could be remedied by [...] doing a separate walk to identify
    islands using the normal method

In a related portion of the thread, Peff explains[2]:

    The delta islands code already does its own tree walk to propagate
    the bits down (it does rely on the base walk's show_commit() to
    propagate through the commits).

    Once each object has its island bitmaps, I think however you
    choose to come up with delta candidates [...] you should be able
    to use it. It's fundamentally just answering the question of "am
    I allowed to delta between these two objects".

That is similar to what this patch does, and it turns out the cheaper
option is sufficient: perform the same island side effects from the
path-walk callback rather than doing a second walk.

Recall how delta-islands are computed during a normal repack:

 - `show_commit()` calls `propagate_island_marks()` for each commit,
   which merges the commit's island bitset onto its root tree object and
   onto each of its parent commits.

 - `show_object()` for a tree records the tree's depth derived from the
   slash-separated pathname. Subsequent `resolve_tree_islands()` uses
   that depth to walk trees in increasing-depth order, propagating each
   tree's marks to its children.

 - At delta-search time, `in_same_island()` enforces that a delta
   target's island bitmap is a subset of its base's: every island that
   reaches the target must also reach the base.

Path-walk's enumeration callback is `add_objects_by_path()`. It already
adds objects to `to_pack`, but until now did not perform the
island-related side effects. Two things are needed:

 - For each commit batch, call `propagate_island_marks()` on commits,
   exactly as `show_commit()` does.

   We have to be careful about the order in which we call this function,
   and we must see a commit before its parents in order to have
   island marks to propagate.

   The path-walk batch preserves that order. Path-walk appends commits
   to its `OBJ_COMMIT` batch as they come back from the same
   `get_revision()` loop the regular traversal uses, and
   `add_objects_by_path()` iterates the batch in array order. So every
   commit reaches `propagate_island_marks()` in the same sequence that
   `show_commit()` would have seen it, and the descendant-first chain
   that the algorithm relies on is intact.

   Skip island propagation for excluded commits to match the regular
   traversal, whose `show_commit()` callback is only invoked for
   interesting commits. Boundary commits may still be present in
   path-walk's callback so they can serve as thin-pack bases, but they
   should not contribute island marks.

 - For each tree batch, record the tree's depth from the path. Use the
   `record_tree_depth()` helper from the previous commit so both
   callbacks behave identically, including the max-depth-wins behavior
   when a tree is reached via more than one path. The helper accepts
   both the `show_object()` path shape ("foo", "foo/bar") and the
   path-walk shape with a trailing slash ("foo/", "foo/bar/"), so depths
   recorded from either traversal mode are directly comparable.

   This is implicit in the implementation sketch from Peff above.
   `resolve_tree_islands()` sorts trees by `oe->tree_depth` in
   increasing-depth order before propagating marks down, so that a
   parent tree's marks are finalized before its children inherit them.
   Without recording the depth at path-walk time, every
   path-walk-discovered tree would land at depth 0 in `to_pack`, the
   sort would lose its ordering, and children could inherit marks from
   parents whose own contributions had not yet been merged in.

With those two pieces in place, `resolve_tree_islands()` receives the
same island inputs from path-walk as it would from the regular
traversal, so the existing island checks can be reused unchanged.

Drop the documented incompatibility between `--path-walk` and
`--delta-islands`, and add t5320 coverage for path-walk island repacks
with and without bitmap writing, as well as the same-island case where a
delta remains allowed.

[1]: https://lore.kernel.org/git/9aa2471b-0850-4707-9733-d3b33609f5f2@gmail.com/
[2]: https://lore.kernel.org/git/20240911063203.GA1538586@coredump.intra.peff.net/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-pack-objects.adoc | 14 +++++++-------
 builtin/pack-objects.c              | 22 ++++++++++++++++++----
 t/t5320-delta-islands.sh            | 29 +++++++++++++++++++++++++++++
 3 files changed, 54 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 0adce8961a3..65cd00c152f 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -402,13 +402,13 @@ will be automatically changed to version `1`.
 	of filenames that cause collisions in Git's default name-hash
 	algorithm.
 +
-Incompatible with `--delta-islands`. When `--use-bitmap-index` is
-specified with `--path-walk`, a successful bitmap traversal is used for
-object enumeration, with path-walk remaining as the fallback traversal
-when the bitmap cannot satisfy the request. The `--path-walk` option
-supports the `--filter=<spec>` forms `blob:none`, `blob:limit=<n>`,
-`tree:0`, `object:type=<type>`, and `sparse:<oid>`. These supported filter
-types can be combined with the `combine:<spec>+<spec>` form.
+When `--use-bitmap-index` is specified with `--path-walk`, a successful
+bitmap traversal is used for object enumeration, with path-walk
+remaining as the fallback traversal when the bitmap cannot satisfy the
+request. The `--path-walk` option supports the `--filter=<spec>` forms
+`blob:none`, `blob:limit=<n>`, `tree:0`, `object:type=<type>`, and
+`sparse:<oid>`. These supported filter types can be combined with the
+`combine:<spec>+<spec>` form.
 
 
 DELTA ISLANDS
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ec02e2b21d2..f48ea7a888b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4737,13 +4737,29 @@ static int add_objects_by_path(const char *path,
 
 		add_object_entry(oid, type, path, exclude);
 
-		if (type == OBJ_COMMIT && write_bitmap_index) {
+		if (type == OBJ_COMMIT) {
 			struct commit *commit;
 
+			if (!write_bitmap_index && !use_delta_islands)
+				continue;
+
 			commit = lookup_commit(the_repository, oid);
 			if (!commit)
 				die(_("could not find commit %s"), oid_to_hex(oid));
-			index_commit_for_bitmap(commit);
+			if (write_bitmap_index)
+				index_commit_for_bitmap(commit);
+			/*
+			 * Skip island propagation for boundary commits.
+			 * The regular traversal's show_commit() is only
+			 * called for interesting commits; matching that
+			 * here keeps path-walk from doing extra work that
+			 * would only be a no-op anyway (boundary commits
+			 * are not in island_marks).
+			 */
+			if (use_delta_islands && !exclude)
+				propagate_island_marks(the_repository, commit);
+		} else if (type == OBJ_TREE && use_delta_islands) {
+			record_tree_depth(oid, path);
 		}
 	}
 
@@ -5205,8 +5221,6 @@ int cmd_pack_objects(int argc,
 		const char *option = NULL;
 		if (!path_walk_filter_compatible(&filter_options))
 			option = "--filter";
-		else if (use_delta_islands)
-			option = "--delta-islands";
 
 		if (option) {
 			warning(_("cannot use %s with %s"),
diff --git a/t/t5320-delta-islands.sh b/t/t5320-delta-islands.sh
index 2c961c70963..9b28344a0a3 100755
--- a/t/t5320-delta-islands.sh
+++ b/t/t5320-delta-islands.sh
@@ -53,6 +53,35 @@ test_expect_success 'separate islands disallows delta' '
 	! is_delta_base $two $one
 '
 
+test_expect_success 'path-walk island repack respects islands' '
+	GIT_TRACE2_EVENT="$(pwd)/trace.path-walk-islands" \
+		git -c "pack.island=refs/heads/(.*)" repack -adfi \
+		--path-walk 2>err &&
+	test_region pack-objects path-walk trace.path-walk-islands &&
+	test_grep ! "cannot use --delta-islands with --path-walk" err &&
+	! is_delta_base $one $two &&
+	! is_delta_base $two $one
+'
+
+test_expect_success 'path-walk island bitmap repack respects islands' '
+	GIT_TRACE2_EVENT="$(pwd)/trace.path-walk-island-bitmap" \
+		git -c "pack.island=refs/heads/(.*)" repack -a -d -f -i -b \
+		--path-walk 2>err &&
+	test_region pack-objects path-walk trace.path-walk-island-bitmap &&
+	test_path_is_file .git/objects/pack/*.bitmap &&
+	git rev-list --test-bitmap --use-bitmap-index one &&
+	test_grep ! "cannot use --delta-islands with --path-walk" err &&
+	! is_delta_base $one $two &&
+	! is_delta_base $two $one
+'
+
+test_expect_success 'path-walk same island allows delta' '
+	GIT_TRACE2_EVENT="$(pwd)/trace.path-walk-same-island" \
+		git -c "pack.island=refs/heads" repack -adfi --path-walk &&
+	test_region pack-objects path-walk trace.path-walk-same-island &&
+	is_delta_base $one $two
+'
+
 test_expect_success 'same island allows delta' '
 	git -c "pack.island=refs/heads" repack -adfi &&
 	is_delta_base $one $two
-- 
2.54.0.23.gae57607b57f

^ permalink raw reply related

* [PATCH v2 3/4] pack-objects: extract `record_tree_depth()` helper
From: Taylor Blau @ 2026-06-02 22:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <cover.1780438896.git.me@ttaylorr.com>

Prepare for a subsequent change that needs to record tree depths from a
second call site by factoring the delta-islands tree-depth bookkeeping
out of `show_object()` and into a helper, `record_tree_depth()`.

The helper looks up the object in `to_pack`, returns early when the
object was not added there, computes the depth from the slash count in
the supplied name, and preserves the existing max-depth-wins behavior
when a tree is reached by more than one path.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e4dcb563b7d..ec02e2b21d2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2722,6 +2722,22 @@ static inline void oe_set_tree_depth(struct packing_data *pack,
 	pack->tree_depth[e - pack->objects] = tree_depth;
 }
 
+static void record_tree_depth(const struct object_id *oid, const char *name)
+{
+	const char *p;
+	unsigned depth;
+	struct object_entry *ent;
+
+	/* the empty string is a root tree, which is depth 0 */
+	depth = *name ? 1 : 0;
+	for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
+		depth++;
+
+	ent = packlist_find(&to_pack, oid);
+	if (ent && depth > oe_tree_depth(&to_pack, ent))
+		oe_set_tree_depth(&to_pack, ent, depth);
+}
+
 /*
  * Return the size of the object without doing any delta
  * reconstruction (so non-deltas are true object sizes, but deltas
@@ -4375,20 +4391,8 @@ static void show_object(struct object *obj, const char *name,
 	add_preferred_base_object(name);
 	add_object_entry(&obj->oid, obj->type, name, 0);
 
-	if (use_delta_islands) {
-		const char *p;
-		unsigned depth;
-		struct object_entry *ent;
-
-		/* the empty string is a root tree, which is depth 0 */
-		depth = *name ? 1 : 0;
-		for (p = strchr(name, '/'); p; p = strchr(p + 1, '/'))
-			depth++;
-
-		ent = packlist_find(&to_pack, &obj->oid);
-		if (ent && depth > oe_tree_depth(&to_pack, ent))
-			oe_set_tree_depth(&to_pack, ent, depth);
-	}
+	if (use_delta_islands)
+		record_tree_depth(&obj->oid, name);
 }
 
 static void show_object__ma_allow_any(struct object *obj, const char *name, void *data)
-- 
2.54.0.23.gae57607b57f


^ permalink raw reply related

* [PATCH v2 2/4] pack-objects: support reachability bitmaps with `--path-walk`
From: Taylor Blau @ 2026-06-02 22:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <cover.1780438896.git.me@ttaylorr.com>

When 'pack-objects' is invoked with '--path-walk', it prevents us from
using reachability bitmaps.

This behavior dates back to 70664d2865c (pack-objects: add --path-walk
option, 2025-05-16), which included a comment in the relevant portion of
the command-line arguments handling that read as follows:

    /*
     * We must disable the bitmaps because we are removing
     * the --objects / --objects-edge[-aggressive] options.
     */

In fb2c309b7d3 (pack-objects: pass --objects with --path-walk,
2026-05-02), path-walk learned to pass '--objects' again, but still
kept bitmap traversal disabled. That leaves two useful cases
unsupported:

 * A path-walk repack that writes bitmaps does not give the bitmap
   selector any commits, because path-walk reveals commits through
   `add_objects_by_path()` rather than through `show_commit()`, where
   `index_commit_for_bitmap()` is normally called.

 * An invocation like "git pack-objects --use-bitmap-index --path-walk"
   never tries an existing bitmap, even when one is available and could
   answer the request.

Fortunately for us, neither restriction is required.

 * On the writing side: teach the path-walk object callback to call
   `index_commit_for_bitmap()` for commits that it adds to the pack.
   That gives the bitmap selector the commit candidates it would have
   seen from the regular traversal.

 * For bitmap reading, keep passing '--objects' to the internal rev_list
   machinery, but stop clearing `use_bitmap_index`. If an existing
   bitmap can answer the request, use it; otherwise fall back to
   path-walk's own enumeration.

As a result, we can see significantly reduced pack sizes from p5311
before this commit:

    Test                                      HEAD^             HEAD
    ----------------------------------------------------------------------------------
    5311.38: server (1 days, --path-walk)     2.56(2.52+0.03)   0.01(0.01+0.00) -99.6%
    5311.39: size   (1 days, --path-walk)              123.9K            123.9K +0.0%
    5311.40: client (1 days, --path-walk)     0.00(0.01+0.00)   0.00(0.00+0.00) =
    5311.42: server (2 days, --path-walk)     2.57(2.52+0.05)   0.01(0.01+0.00) -99.6%
    5311.43: size   (2 days, --path-walk)              123.9K            123.9K +0.0%
    5311.44: client (2 days, --path-walk)     0.00(0.00+0.00)   0.00(0.00+0.00) =
    5311.46: server (4 days, --path-walk)     2.58(2.51+0.07)   0.01(0.01+0.00) -99.6%
    5311.47: size   (4 days, --path-walk)              123.9K            123.9K +0.0%
    5311.48: client (4 days, --path-walk)     0.00(0.00+0.00)   0.00(0.00+0.00) =
    5311.50: server (8 days, --path-walk)     2.58(2.53+0.04)   0.02(0.02+0.00) -99.2%
    5311.51: size   (8 days, --path-walk)              152.4K            152.4K +0.0%
    5311.52: client (8 days, --path-walk)     0.00(0.01+0.00)   0.00(0.01+0.00) =
    5311.54: server (16 days, --path-walk)    2.58(2.52+0.05)   0.03(0.02+0.00) -98.8%
    5311.55: size   (16 days, --path-walk)             205.3K            205.3K +0.0%
    5311.56: client (16 days, --path-walk)    0.01(0.01+0.00)   0.01(0.01+0.00) +0.0%
    5311.58: server (32 days, --path-walk)    2.59(2.53+0.06)   0.03(0.03+0.00) -98.8%
    5311.59: size   (32 days, --path-walk)             209.3K            209.3K +0.0%
    5311.60: client (32 days, --path-walk)    0.01(0.02+0.00)   0.01(0.02+0.00) +0.0%
    5311.62: server (64 days, --path-walk)    2.70(2.76+0.06)   0.16(0.24+0.04) -94.1%
    5311.63: size   (64 days, --path-walk)               4.1M              4.1M +0.0%
    5311.64: client (64 days, --path-walk)    0.44(0.50+0.02)   0.44(0.51+0.02) +0.0%
    5311.66: server (128 days, --path-walk)   2.88(3.20+0.05)   0.34(0.65+0.05) -88.2%
    5311.67: size   (128 days, --path-walk)              9.0M              9.0M -0.0%
    5311.68: client (128 days, --path-walk)   0.93(1.22+0.07)   0.93(1.20+0.08) +0.0%

We get the same size of output pack, but this commit allows us to do so
in a significantly shorter amount of time. Intuitively, we're generating
the same pack (hence the unchanged 'test_size' output from run to run),
but varying how we get there. Before this commit, pack-objects prefers
'--path-walk' to '--use-bitmap-index', so we generate the output pack by
performing a normal '--path-walk' traversal. With this commit, we are
operating over a *repacked* state (that itself was done with a
'--path-walk' traversal), but are able to perform pack-reuse on that
repacked state via bitmaps.

There is one wrinkle when it comes to '--boundary', which we must not
pass into the bitmap walk in the presence of both '--path-walk' and
'--use-bitmap-index'. Path-walk needs boundary commits when it performs
its own traversal, in order to discover bases for thin packs, but the
bitmap traversal does not expect this. Work around this by setting
`revs->boundary` as late as possible within the '--path-walk' traversal,
after any bitmap attempt has either succeeded or declined to answer the
request.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-pack-objects.adoc |  6 +++--
 builtin/pack-objects.c              | 18 +++++++++++++--
 t/perf/p5311-pack-bitmaps-fetch.sh  | 14 +++++++----
 t/t5310-pack-bitmaps.sh             | 36 +++++++++++++++++++++++++++++
 4 files changed, 66 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 8a27aa19fd3..0adce8961a3 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -402,8 +402,10 @@ will be automatically changed to version `1`.
 	of filenames that cause collisions in Git's default name-hash
 	algorithm.
 +
-Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
-ignored in the presence of `--path-walk`. The `--path-walk` option
+Incompatible with `--delta-islands`. When `--use-bitmap-index` is
+specified with `--path-walk`, a successful bitmap traversal is used for
+object enumeration, with path-walk remaining as the fallback traversal
+when the bitmap cannot satisfy the request. The `--path-walk` option
 supports the `--filter=<spec>` forms `blob:none`, `blob:limit=<n>`,
 `tree:0`, `object:type=<type>`, and `sparse:<oid>`. These supported filter
 types can be combined with the `combine:<spec>+<spec>` form.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b783dc62bc9..e4dcb563b7d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4732,6 +4732,15 @@ static int add_objects_by_path(const char *path,
 			continue;
 
 		add_object_entry(oid, type, path, exclude);
+
+		if (type == OBJ_COMMIT && write_bitmap_index) {
+			struct commit *commit;
+
+			commit = lookup_commit(the_repository, oid);
+			if (!commit)
+				die(_("could not find commit %s"), oid_to_hex(oid));
+			index_commit_for_bitmap(commit);
+		}
 	}
 
 	oe_end = to_pack.nr_objects;
@@ -4764,6 +4773,13 @@ static int get_object_list_path_walk(struct rev_info *revs)
 	info.path_fn = add_objects_by_path;
 	info.path_fn_data = &processed;
 
+	/*
+	 * Path-walk needs boundary commits to discover thin-pack bases, but
+	 * bitmap traversal does not understand the boundary state. Set it
+	 * here so any prior bitmap attempt sees the usual non-boundary walk.
+	 */
+	revs->boundary = 1;
+
 	/*
 	 * Allow the --[no-]sparse option to be interesting here, if only
 	 * for testing purposes. Paths with no interesting objects will not
@@ -5195,9 +5211,7 @@ int cmd_pack_objects(int argc,
 		}
 	}
 	if (path_walk) {
-		strvec_push(&rp, "--boundary");
 		strvec_push(&rp, "--objects");
-		use_bitmap_index = 0;
 	} else if (thin) {
 		use_internal_rev_list = 1;
 		strvec_push(&rp, shallow
diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
index 5bea5c64e7b..1b115d921a1 100755
--- a/t/perf/p5311-pack-bitmaps-fetch.sh
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -4,15 +4,18 @@ test_description='performance of fetches from bitmapped packs'
 . ./perf-lib.sh
 
 test_fetch_bitmaps () {
+	argv=$1
+	export argv
+
 	test_expect_success 'setup test directory' '
 		rm -fr * .git
 	'
 
 	test_perf_default_repo
 
-	test_expect_success 'create bitmapped server repo' '
+	test_expect_success "create bitmapped server repo ${argv:+($argv)}" '
 		git config pack.writebitmaps true &&
-		git repack -ad
+		git repack -ad $argv
 	'
 
 	# simulate a fetch from a repository that last fetched N days ago, for
@@ -20,7 +23,7 @@ test_fetch_bitmaps () {
 	# and assume the first entry in the chain that is N days older than the current
 	# HEAD is where the HEAD would have been then.
 	for days in 1 2 4 8 16 32 64 128; do
-		title=$(printf '%10s' "($days days)")
+		title=$(printf '%10s' "($days days${argv:+, $argv})")
 		test_expect_success "setup revs from $days days ago" '
 			now=$(git log -1 --format=%ct HEAD) &&
 			then=$(($now - ($days * 86400))) &&
@@ -47,6 +50,9 @@ test_fetch_bitmaps () {
 	done
 }
 
-test_fetch_bitmaps
+for argv in '' --path-walk
+do
+	test_fetch_bitmaps $argv || return 1
+done
 
 test_done
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index f693cb56691..69c5da1580a 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -577,6 +577,42 @@ test_bitmap_cases
 
 sane_unset GIT_TEST_PACK_USE_BITMAP_BOUNDARY_TRAVERSAL
 
+test_expect_success 'path-walk repack can write and use bitmap indexes' '
+	test_when_finished "rm -rf path-walk-bitmap" &&
+	git init path-walk-bitmap &&
+	(
+		cd path-walk-bitmap &&
+		test_commit first &&
+		test_commit second &&
+		test_commit third &&
+
+		git repack -a -d -b --path-walk &&
+		git rev-list --test-bitmap --use-bitmap-index HEAD &&
+
+		git rev-parse HEAD >in &&
+
+		git rev-list --objects --no-object-names HEAD >expect.raw &&
+		sort expect.raw >expect &&
+
+		for reuse in true false
+		do
+			: >trace.txt &&
+
+			GIT_TRACE2_EVENT="$(pwd)/trace.txt" \
+			git -c pack.allowPackReuse=$reuse pack-objects \
+				--stdout --revs --path-walk --use-bitmap-index \
+				<in >out.pack &&
+			grep "\"category\":\"bitmap\",\"key\":\"bitmap/hits\"" trace.txt &&
+
+			git index-pack out.pack &&
+
+			list_packed_objects out.idx >actual.raw &&
+			sort actual.raw >actual &&
+			test_cmp expect actual || return 1
+		done
+	)
+'
+
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
 	test_must_fail git repack -d 2>err &&
-- 
2.54.0.23.gae57607b57f


^ permalink raw reply related

* [PATCH v2 1/4] t/perf: drop p5311's lookup-table permutation
From: Taylor Blau @ 2026-06-02 22:21 UTC (permalink / raw)
  To: git; +Cc: Derrick Stolee, Junio C Hamano, Jeff King, Elijah Newren
In-Reply-To: <cover.1780438896.git.me@ttaylorr.com>

p5311 measures the cost of serving a fetch from a bitmapped pack and
indexing the resulting pack on the client. Since 761416ef91d
(bitmap-lookup-table: add performance tests for lookup table,
2022-08-14), p5311 effectively runs itself twice: once with the bitmap's
lookup table extension enabled, and again with it disabled.

This comparison has served its useful purpose, as the lookup table is
almost four years old, and the de-facto default in server-side Git
deployments.

A following commit will want to test a different combination (repacking
with and without '--path-walk' instead of the lookup table). Instead of
multiplying the current test count by two again to produce four
variations of `test_fetch_bitmaps()`, drop the lookup table option to
reduce the number of perf tests we run. Retain `test_fetch_bitmaps()`
itself, since we will use this in the future for the new
parameterization.

(As an aside, a future commit outside of this series will adjust the
default value of 'pack.writeBitmapLookupTable' to "true", matching the
de-facto norm for deployments where the existence of bitmap lookup
tables is meaningful. Punt on that to a later series and instead make
the minimal change for now.)

Suggested-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5311-pack-bitmaps-fetch.sh | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
index 047efb995d6..5bea5c64e7b 100755
--- a/t/perf/p5311-pack-bitmaps-fetch.sh
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -12,7 +12,6 @@ test_fetch_bitmaps () {

 	test_expect_success 'create bitmapped server repo' '
 		git config pack.writebitmaps true &&
-		git config pack.writeBitmapLookupTable '"$1"' &&
 		git repack -ad
 	'

@@ -32,7 +31,7 @@ test_fetch_bitmaps () {
 			} >revs
 		'

-		test_perf "server $title (lookup=$1)" '
+		test_perf "server $title" '
 			git pack-objects --stdout --revs \
 					--thin --delta-base-offset \
 					<revs >tmp.pack
@@ -42,13 +41,12 @@ test_fetch_bitmaps () {
 			test_file_size tmp.pack
 		'

-		test_perf "client $title (lookup=$1)" '
+		test_perf "client $title" '
 			git index-pack --stdin --fix-thin <tmp.pack
 		'
 	done
 }

-test_fetch_bitmaps true
-test_fetch_bitmaps false
+test_fetch_bitmaps

 test_done
-- 
2.54.0.23.gae57607b57f

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox