* [PATCH v2] stash: reuse cached index entries in --patch temporary index
From: Adam Johnson via GitGitGadget @ 2026-05-22 23:12 UTC (permalink / raw)
To: git
Cc: Thomas Gummerer, Elijah Newren, Phillip Wood, Victoria Dye,
Adam Johnson, Adam Johnson
In-Reply-To: <pull.2306.git.git.1779194605735.gitgitgadget@gmail.com>
From: Adam Johnson <me@adamj.eu>
`git stash -p` prepares the interactive selection by creating a
temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
running the `add -p` machinery.
That temporary index was created by running `git read-tree HEAD`. The
resulting index had no useful cached stat data or fsmonitor-valid bits
from the real index. When `run_add_p()` refreshed that temporary index
before showing the first prompt, it could end up lstat(2)-ing every
tracked file, even in a repository where `git diff` and `git restore -p`
can use fsmonitor to avoid that work.
Create the temporary index in-process instead. Use `unpack_trees()` to
reset the real index contents to HEAD while writing the result to the
temporary index path. For paths whose index entries already match HEAD,
`oneway_merge()` reuses the existing cache entries, preserving their
cached stat data and `CE_FSMONITOR_VALID` state.
This makes the refresh performed by `run_add_p()` behave like the one
used by `git restore -p`: unchanged paths can be skipped via fsmonitor
instead of being scanned again.
In a 206k file repository with `core.fsmonitor` enabled and a one-line
change in one file, time to first prompt dropped from 34.774 seconds to
0.659 seconds. The new perf test file demonstrates similar improvements,
with maen times for without- and with-fsmonitor cases dropping from 6.90
and 6.83 seconds to 0.55 and 0.28 seconds, respectively.
Signed-off-by: Adam Johnson <me@adamj.eu>
---
stash: reuse cached index entries in --patch temporary index
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2306%2Fadamchainz%2Faj%2Foptimize-stash-patch-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2306/adamchainz/aj/optimize-stash-patch-v2
Pull-Request: https://github.com/git/git/pull/2306
Range-diff vs v1:
1: b228160cc4 ! 1: 8785572c4d stash: reuse cached index entries in --patch temporary index
@@ Commit message
In a 206k file repository with `core.fsmonitor` enabled and a one-line
change in one file, time to first prompt dropped from 34.774 seconds to
- 0.659 seconds.
+ 0.659 seconds. The new perf test file demonstrates similar improvements,
+ with maen times for without- and with-fsmonitor cases dropping from 6.90
+ and 6.83 seconds to 0.55 and 0.28 seconds, respectively.
Signed-off-by: Adam Johnson <me@adamj.eu>
@@ builtin/stash.c: static int reset_tree(struct object_id *i_tree, int update, int
+ struct lock_file lock_file = LOCK_INIT;
+
+ repo_read_index_preload(the_repository, NULL, 0);
-+ if (refresh_index(the_repository->index, REFRESH_QUIET, NULL, NULL, NULL))
-+ return -1;
++ refresh_index(the_repository->index, REFRESH_QUIET, NULL, NULL, NULL);
+
+ hold_lock_file_for_update(&lock_file, index_path, LOCK_DIE_ON_ERROR);
+
@@ builtin/stash.c: static int stash_patch(struct stash_info *info, const struct pa
goto done;
}
- ## t/t3904-stash-patch.sh ##
-@@ t/t3904-stash-patch.sh: test_expect_success 'none of this moved HEAD' '
- verify_saved_head
- '
-
-+test_expect_success 'stash -p with unmodified tracked files present' '
-+ git reset --hard &&
-+ echo line1 >alpha &&
-+ echo line1 >beta &&
-+ git add alpha beta &&
-+ git commit -m "add alpha and beta" &&
-+ echo line2 >>alpha &&
-+ echo y | git stash -p &&
-+ echo line1 >expect &&
-+ test_cmp expect alpha &&
-+ test_cmp expect beta &&
-+ git stash pop &&
-+ printf "line1\nline2\n" >expect &&
-+ test_cmp expect alpha &&
-+ echo line1 >expect &&
-+ test_cmp expect beta
+ ## t/perf/p3904-stash-patch.sh (new) ##
+@@
++#!/bin/sh
++
++test_description="Performance tests for git stash -p"
++
++. ./perf-lib.sh
++
++test_perf_fresh_repo
++
++test_expect_success "setup" '
++ mkdir files &&
++ test_seq 1 100000 | while read i; do
++ echo "content $i" >files/$i.txt || return 1
++ done &&
++ git add files/ &&
++ git commit -q -m "add tracked files" &&
++ echo modified >files/1.txt
+'
+
- test_expect_success 'stash -p with split hunk' '
- git reset --hard &&
- cat >test <<-\EOF &&
++test_perf "stash -p, no fsmonitor" \
++ --setup 'echo modified >files/1.txt' '
++ printf "q\n" | git stash -p >/dev/null 2>&1 || true
++'
++
++if test_have_prereq FSMONITOR_DAEMON
++then
++ test_expect_success "enable builtin fsmonitor" '
++ git config core.fsmonitor true &&
++ git fsmonitor--daemon start &&
++ git update-index --fsmonitor &&
++ git status >/dev/null 2>&1
++ '
++
++ test_perf "stash -p, builtin fsmonitor" \
++ --setup 'echo modified >files/1.txt && git status >/dev/null 2>&1' '
++ printf "q\n" | git stash -p >/dev/null 2>&1 || true
++ '
++
++ test_expect_success "stop builtin fsmonitor" '
++ git fsmonitor--daemon stop
++ '
++fi
++
++test_done
builtin/stash.c | 70 +++++++++++++++++++++++++++++++++----
t/perf/p3904-stash-patch.sh | 43 +++++++++++++++++++++++
2 files changed, 107 insertions(+), 6 deletions(-)
create mode 100755 t/perf/p3904-stash-patch.sh
diff --git a/builtin/stash.c b/builtin/stash.c
index 32dbc97b47..c4809f299a 100644
--- a/builtin/stash.c
+++ b/builtin/stash.c
@@ -372,6 +372,56 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
return 0;
}
+static int create_index_from_tree(const struct object_id *tree_id,
+ const char *index_path)
+{
+ int nr_trees = 1;
+ int ret = 0;
+ struct unpack_trees_options opts;
+ struct tree_desc t[MAX_UNPACK_TREES];
+ struct tree *tree;
+ struct index_state dst_istate = INDEX_STATE_INIT(the_repository);
+ struct lock_file lock_file = LOCK_INIT;
+
+ repo_read_index_preload(the_repository, NULL, 0);
+ refresh_index(the_repository->index, REFRESH_QUIET, NULL, NULL, NULL);
+
+ hold_lock_file_for_update(&lock_file, index_path, LOCK_DIE_ON_ERROR);
+
+ memset(&opts, 0, sizeof(opts));
+
+ tree = repo_parse_tree_indirect(the_repository, tree_id);
+ if (!tree || repo_parse_tree(the_repository, tree)) {
+ ret = -1;
+ goto done;
+ }
+
+ init_tree_desc(t, &tree->object.oid, tree->buffer, tree->size);
+
+ opts.head_idx = 1;
+ opts.src_index = the_repository->index;
+ opts.dst_index = &dst_istate;
+ opts.merge = 1;
+ opts.reset = UNPACK_RESET_PROTECT_UNTRACKED;
+ opts.fn = oneway_merge;
+
+ if (unpack_trees(nr_trees, t, &opts)) {
+ ret = -1;
+ goto done;
+ }
+
+ if (write_locked_index(&dst_istate, &lock_file, COMMIT_LOCK)) {
+ ret = error(_("unable to write new index file"));
+ goto done;
+ }
+
+done:
+ release_index(&dst_istate);
+ if (ret)
+ rollback_lock_file(&lock_file);
+ return ret;
+}
+
static int diff_tree_binary(struct strbuf *out, struct object_id *w_commit)
{
struct child_process cp = CHILD_PROCESS_INIT;
@@ -1321,18 +1371,26 @@ static int stash_patch(struct stash_info *info, const struct pathspec *ps,
struct interactive_options *interactive_opts)
{
int ret = 0;
- struct child_process cp_read_tree = CHILD_PROCESS_INIT;
struct child_process cp_diff_tree = CHILD_PROCESS_INIT;
+ struct commit *head_commit;
+ const struct object_id *head_tree;
struct index_state istate = INDEX_STATE_INIT(the_repository);
char *old_index_env = NULL, *old_repo_index_file;
remove_path(stash_index_path.buf);
- cp_read_tree.git_cmd = 1;
- strvec_pushl(&cp_read_tree.args, "read-tree", "HEAD", NULL);
- strvec_pushf(&cp_read_tree.env, "GIT_INDEX_FILE=%s",
- stash_index_path.buf);
- if (run_command(&cp_read_tree)) {
+ head_commit = lookup_commit(the_repository, &info->b_commit);
+ if (!head_commit || repo_parse_commit(the_repository, head_commit)) {
+ ret = -1;
+ goto done;
+ }
+ head_tree = get_commit_tree_oid(head_commit);
+ if (!head_tree) {
+ ret = -1;
+ goto done;
+ }
+
+ if (create_index_from_tree(head_tree, stash_index_path.buf)) {
ret = -1;
goto done;
}
diff --git a/t/perf/p3904-stash-patch.sh b/t/perf/p3904-stash-patch.sh
new file mode 100755
index 0000000000..4cfce638be
--- /dev/null
+++ b/t/perf/p3904-stash-patch.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+test_description="Performance tests for git stash -p"
+
+. ./perf-lib.sh
+
+test_perf_fresh_repo
+
+test_expect_success "setup" '
+ mkdir files &&
+ test_seq 1 100000 | while read i; do
+ echo "content $i" >files/$i.txt || return 1
+ done &&
+ git add files/ &&
+ git commit -q -m "add tracked files" &&
+ echo modified >files/1.txt
+'
+
+test_perf "stash -p, no fsmonitor" \
+ --setup 'echo modified >files/1.txt' '
+ printf "q\n" | git stash -p >/dev/null 2>&1 || true
+'
+
+if test_have_prereq FSMONITOR_DAEMON
+then
+ test_expect_success "enable builtin fsmonitor" '
+ git config core.fsmonitor true &&
+ git fsmonitor--daemon start &&
+ git update-index --fsmonitor &&
+ git status >/dev/null 2>&1
+ '
+
+ test_perf "stash -p, builtin fsmonitor" \
+ --setup 'echo modified >files/1.txt && git status >/dev/null 2>&1' '
+ printf "q\n" | git stash -p >/dev/null 2>&1 || true
+ '
+
+ test_expect_success "stop builtin fsmonitor" '
+ git fsmonitor--daemon stop
+ '
+fi
+
+test_done
base-commit: 7bcaabddcf68bd0702697da5904c3b68c52f94cf
--
gitgitgadget
^ permalink raw reply related
* Re: [PATCH v2 04/11] git-gui: use rev-parse exclusively to find a repository
From: Mark Levedahl @ 2026-05-22 23:00 UTC (permalink / raw)
To: Johannes Sixt; +Cc: egg_mushroomcow, bootaina702, git
In-Reply-To: <8d1488ec-c4de-4ddd-b3cd-e1e8b4a343bf@kdbg.org>
On 5/22/26 4:46 AM, Johannes Sixt wrote:
> Am 20.05.26 um 22:24 schrieb Mark Levedahl:
>
> Sorry, but I cannot agree with "prefix is only known after the worktree
> is found". The prefix is a property that can be known even if we haven't
> asked where the top-level of the working tree is. See more below.
>
>> This is true even when running the repository
>> picker: that option provides a list of prior selections, and does no
>> validation on the list beyond checking that the directories exist. For
>> now, just initialize _prefix along with other global variables.
>>
>
> You cannot leave the _prefix empty, because it breaks `git gui browser
> master dir` when invoked from a subdirectory of the working tree.
>
> This line must remain. I see that you add it back in later patch. There
> may be some motivation to move prefix discovery, but there is no
> motivation to remove it at this point.
Never mind, I confused myself on the process. THis patch has only affected repo discovery,
worktree discovery is later so should be left untouched here.
Mark
^ permalink raw reply
* Re: [PATCH v5 00/13] pack-objects: integrate --path-walk and some --filter options
From: Taylor Blau @ 2026-05-22 22:40 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, newren, peff, ps,
Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
On Fri, May 22, 2026 at 06:24:24PM +0000, Derrick Stolee via GitGitGadget wrote:
> Range-diff vs v4:
>
> 1: 0840110116 = 1: 0840110116 t5620: make test work with path-walk var
> 2: d7c87545f3 = 2: d7c87545f3 pack-objects: pass --objects with --path-walk
> 3: fb8a0f9c43 ! 3: 697ef716d2 t/perf: add pack-objects filter and path-walk benchmark
> @@ t/perf/p5315-pack-objects-filter.sh (new)
> + awk "{print \$4;}" >top-dirs &&
> + top_nr=$(wc -l <top-dirs) &&
> +
> -+ >depth2-dirs &&
> + while read tdir
> + do
> -+ git ls-tree -d --name-only "HEAD:$tdir" 2>/dev/null || return 1
> -+ done <top-dirs >depth2-dirs.raw &&
> -+ sed "s|^|$tdir/|" <depth2-dirs.raw >depth2-dirs &&
> ++ git ls-tree -d --format="$tdir/%(path)" "HEAD:$tdir" || return 1
> ++ done <top-dirs >depth2-dirs &&
> +
> + d2_nr=$(wc -l <depth2-dirs) &&
> +
> 4: e77c8a6bbc = 4: 91845bcef0 path-walk: always emit directly-requested objects
> 5: f4904f81e0 = 5: fdb9361198 path-walk: support blobless filter
> 6: f37467e46f = 6: 89726faf7e backfill: die on incompatible filter options
> 7: 133c1b156c = 7: 3884d4737f path-walk: support blob size limit filter
> 8: 0f517be8e3 = 8: 31b4ef0fa1 path-walk: add pl_sparse_trees to control tree pruning
> 9: b4dc09ab69 = 9: 7d8f0aa036 pack-objects: support sparse:oid filter with path-walk
> 10: 0b1eed0790 = 10: a68676d0de t6601: tag otherwise-unreachable trees
> 11: b23244c4c2 = 11: b0db73c6cc path-walk: support `tree:0` filter
> 12: 7e1e503361 = 12: 6845988f50 path-walk: support `object:type` filter
> 13: a615b1a707 = 13: d33d899251 path-walk: support `combine` filter
The range-diff looks good to me. Thanks!
Thanks,
Taylor
^ permalink raw reply
* Re: [PATCH] stash: reuse cached index entries in --patch temporary index
From: Adam Johnson @ 2026-05-22 20:55 UTC (permalink / raw)
To: Junio C Hamano, Adam Johnson
Cc: git, Thomas Gummerer, Elijah Newren, Phillip Wood, Victoria Dye
In-Reply-To: <xmqqldde6cl5.fsf@gitster.g>
> Is this "non-zero return from refresh_index() leads to a failure"
> intended?
Good catch, it’s not needed. Removing, we can make the call
unconditional.
On Wed, 20 May 2026, at 03:26, Junio C Hamano wrote:
> "Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > 2 files changed, 83 insertions(+), 6 deletions(-)
> >
> > diff --git a/builtin/stash.c b/builtin/stash.c
> > index 32dbc97b47..48189cb9f7 100644
> > --- a/builtin/stash.c
> > +++ b/builtin/stash.c
> > @@ -372,6 +372,57 @@ static int reset_tree(struct object_id *i_tree, int update, int reset)
> > return 0;
> > }
> >
> > +static int create_index_from_tree(const struct object_id *tree_id,
> > + const char *index_path)
> > +{
> > + int nr_trees = 1;
> > + int ret = 0;
> > + struct unpack_trees_options opts;
> > + struct tree_desc t[MAX_UNPACK_TREES];
> > + struct tree *tree;
> > + struct index_state dst_istate = INDEX_STATE_INIT(the_repository);
> > + struct lock_file lock_file = LOCK_INIT;
> > +
> > + repo_read_index_preload(the_repository, NULL, 0);
> > + if (refresh_index(the_repository->index, REFRESH_QUIET, NULL, NULL, NULL))
> > + return -1;
>
> Is this "non-zero return from refresh_index() leads to a failure"
> intended? The old "git read-tree HEAD" wouldn't have cared if the
> original index were unmerged, for example, but with this update, we
> will see an immediate failure. There are other conditions that
> refresh_index() flips its local variable has_errors on, which leads
> to its non-zero return.
>
> Since "git stash -p" is almost always invoked when the user has
> unstaged modifications, I am not sure allowing refresh_index() to
> notice and barf is what we want here.
>
^ permalink raw reply
* Re: [PATCH] stash: reuse cached index entries in --patch temporary index
From: Adam Johnson @ 2026-05-22 20:53 UTC (permalink / raw)
To: Junio C Hamano, Adam Johnson
Cc: git, Thomas Gummerer, Elijah Newren, Phillip Wood, Victoria Dye
In-Reply-To: <xmqqse7m6deh.fsf@gitster.g>
> I however have to wonder if simply replacing the external process
> invocation with "git read-tree -m HEAD" (i.e., oneway merge) gives
> a similar speed-up.
Good idea, I just tried this, but it does not help. The subprocess runs
with GIT_INDEX_FILE set to a temporary index, so oneway_merge
never uses the CE_FSMONITOR_VALID fast path.
The in-process approach is necessary because it lets us set
opts.src_index to the real index with cached stat and fsmonitor data,
before switching GIT_INDEX_FILE.
> What I read from the proposed log message is that the change is
> purely about performance and should not change any behaviour. Why
> do we need a new test in t/t3904? I would not have surprised if we
> saw a new test in t/perf/, though.
Ah yeah, my bad. I added this while iterating to catch a bug I introduced,
but it's not necessary for the final patch. Will remove.
On Wed, 20 May 2026, at 03:08, Junio C Hamano wrote:
> "Adam Johnson via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Adam Johnson <me@adamj.eu>
> >
> > `git stash -p` prepares the interactive selection by creating a
> > temporary index at HEAD, switching `GIT_INDEX_FILE` to it, and then
> > running the `add -p` machinery.
> >
> > That temporary index was created by running `git read-tree HEAD`. The
> > resulting index had no useful cached stat data or fsmonitor-valid bits
> > from the real index. When `run_add_p()` refreshed that temporary index
> > before showing the first prompt, it could end up lstat(2)-ing every
> > tracked file, even in a repository where `git diff` and `git restore -p`
> > can use fsmonitor to avoid that work.
> >
> > Create the temporary index in-process instead. Use `unpack_trees()` to
> > reset the real index contents to HEAD while writing the result to the
> > temporary index path. For paths whose index entries already match HEAD,
> > `oneway_merge()` reuses the existing cache entries, preserving their
> > cached stat data and `CE_FSMONITOR_VALID` state.
>
> Clever. As the fsmonitor_valid bit is in-core only, updating the
> index in-process would be an obvious and probably the only sensible
> way to preserve it.
>
> I however have to wonder if simply replacing the external process
> invocation with "git read-tree -m HEAD" (i.e., oneway merge) gives
> a similar speed-up.
>
> > This makes the refresh performed by `run_add_p()` behave like the one
> > used by `git restore -p`: unchanged paths can be skipped via fsmonitor
> > instead of being scanned again.
> >
> > In a 206k file repository with `core.fsmonitor` enabled and a one-line
> > change in one file, time to first prompt dropped from 34.774 seconds to
> > 0.659 seconds.
>
> Interesting.
>
> > diff --git a/t/t3904-stash-patch.sh b/t/t3904-stash-patch.sh
> > index 90a4ff2c10..4b3241c8cd 100755
> > --- a/t/t3904-stash-patch.sh
> > +++ b/t/t3904-stash-patch.sh
> > @@ -84,6 +84,24 @@ test_expect_success 'none of this moved HEAD' '
> > verify_saved_head
> > '
> >
> > +test_expect_success 'stash -p with unmodified tracked files present' '
> > + git reset --hard &&
> > + echo line1 >alpha &&
> > + echo line1 >beta &&
> > + git add alpha beta &&
> > + git commit -m "add alpha and beta" &&
> > + echo line2 >>alpha &&
> > + echo y | git stash -p &&
> > + echo line1 >expect &&
> > + test_cmp expect alpha &&
> > + test_cmp expect beta &&
> > + git stash pop &&
> > + printf "line1\nline2\n" >expect &&
> > + test_cmp expect alpha &&
> > + echo line1 >expect &&
> > + test_cmp expect beta
> > +'
>
> What I read from the proposed log message is that the change is
> purely about performance and should not change any behaviour. Why
> do we need a new test in t/t3904? I would not have surprised if we
> saw a new test in t/perf/, though.
>
> Thanks.
>
^ permalink raw reply
* Re: [PATCH 1/5] xdiff: support external hunks via xpparam_t
From: Michael Montalbo @ 2026-05-22 19:06 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Michael Montalbo via GitGitGadget, git
In-Reply-To: <xmqq33zkui4q.fsf@gitster.g>
On Thu, May 21, 2026 at 10:29 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +/*
> > + * Populate the changed[] arrays from externally supplied hunks,
> > + * bypassing the diff algorithm. Validates that hunks are in order,
> > + * non-overlapping, and within bounds.
> > + *
> > + * Returns 0 on success, -1 on validation failure.
> > + */
> > +static int xdl_populate_hunks_from_external(xdfenv_t *xe,
> > + const struct xdl_hunk *hunks,
> > + size_t nr_hunks)
> > +{
> > + size_t i;
> > + long j, prev_old_end = 0, prev_new_end = 0;
> > + long total_old = 0, total_new = 0;
> > +
> > + /*
> > + * Clear changed[] arrays. xdl_prepare_env() may have dirtied
> > + * them via xdl_cleanup_records(). The allocation is nrec + 2
> > + * elements; changed points one past the start (see xprepare.c).
> > + */
> > + memset(xe->xdf1.changed - 1, 0,
> > + (xe->xdf1.nrec + 2) * sizeof(bool));
> > + memset(xe->xdf2.changed - 1, 0,
> > + (xe->xdf2.nrec + 2) * sizeof(bool));
>
> This, especially the starting offset of -1, looks horrible. The
> internal layout of xdfenv_t might happen to match the way the above
> code expects, which is how xdl_prepare_ctx() may have give you, but
> it somehow feels brittle. I guess the assumption that changed[]
> does not point at the beginning of the allocated area (e.g., it is a
> no-no to free(xe->xdf1.changed) or realloc() it) is so pervasive that
> it cannot be helped. Sigh.
>
Agreed it is ugly. I wanted to make sure the entire changed[] including
sentinels were clear as a defensive measure for downstream callers
(xdl_change_compact). I agree this results in something that is ugly
and brittle, but in the end I thought it was superior to relying on the
fact that upstream zeroes the entire changed[] array. Maybe if the
comment was more explicit about why this is happening it would be
helpful?
/*
* Clear changed[] arrays including sentinels.
* xdl_prepare_env() may have dirtied them via
* xdl_cleanup_records(), and xdl_change_compact() reads
* the sentinel at changed[-1] during backward scans.
*/
> > int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
> > xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
> > xdchange_t *xscr;
> > xdfenv_t xe;
> > emit_func_t ef = xecfg->hunk_func ? xdl_call_hunk_func : xdl_emit_diff;
> >
> > - if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
> > -
> > - return -1;
> > + if (xpp->external_hunks) {
> > + if (xdl_prepare_env(mf1, mf2, xpp, &xe) < 0)
> > + return -1;
> > + if (xdl_populate_hunks_from_external(&xe,
> > + xpp->external_hunks,
> > + xpp->external_hunks_nr) < 0) {
> > + /*
> > + * Invalid external hunks; fall back to the
> > + * builtin diff algorithm. Re-runs
> > + * xdl_prepare_env() via xdl_do_diff().
> > + */
> > + xdl_free_env(&xe);
> > + if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0)
> > + return -1;
>
> If the external tool keeps sending bogus hunks, silently falling
> back to what we would have done if there weren't any external stuff
> may be necessary to pleasantly keep using Git, but two and a half
> short comments here.
>
> (1) "What we would have done" is exactly the same as what appears
> in the corresponding "else" block. Can we make sure that we do
> not have to keep updating both copies in the future with some
> code rearrangement?
>
How about something like this:
if (xpp->external_hunks) {
if (xdl_prepare_env(mf1, mf2, xpp, &xe) < 0)
return -1;
if (xdl_populate_hunks_from_external(&xe,
xpp->external_hunks,
xpp->external_hunks_nr) == 0)
goto diff_done;
xdl_free_env(&xe);
}
if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0)
return -1;
diff_done:
> (2) The writer of the external tool may want to see some trace of
> warning under certain flags when a failure of the tool forces
> the receiving end to fallback.
>
In diff.c how about we emit a warning rather than a trace on
fallback:
warning(_("diff process failed for '%s',"
" falling back to builtin diff"),
name_a);
> (3) If the tool throws too many broken replies, perhaps we want to
> disable it automatically?
>
For the RFC I wanted to keep it simple, but I definitely agree. A configurable
failure policy makes a lot of sense to me (e.g., disable after N failures).
> > + }
> > + } else {
> > + if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0)
> > + return -1;
> > }
> > +
> > if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
> > xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
> > xdl_build_script(&xe, &xscr) < 0) {
^ permalink raw reply
* Re: [PATCH 0/3] line-log: integrate -L with the standard log output pipeline
From: D. Ben Knoble @ 2026-05-22 18:48 UTC (permalink / raw)
To: Michael Montalbo via GitGitGadget; +Cc: git, Michael Montalbo, Junio C Hamano
In-Reply-To: <pull.2094.git.1777349126.gitgitgadget@gmail.com>
Hi Michael,
On Tue, Apr 28, 2026 at 12:06 AM Michael Montalbo via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> Since its introduction, git log -L has short-circuited from
> log_tree_commit() into its own output function, bypassing log_tree_diff()
> and log_tree_diff_flush(). This skips no_free save/restore,
> always_show_header, diff_free() cleanup, and means that pickaxe (-S, -G,
> --find-object) and --diff-filter cannot suppress commits whose pairs are all
> filtered out, because show_log() runs before diffcore_std().
>
> This series restructures the flow so that -L goes through the same
> log_tree_diff() -> log_tree_diff_flush() path as normal single-parent and
> merge diffs, then uses that to enable several non-patch diff formats.
Cleanup by itself to shrink the number of concepts in the code is
already a good thing IMO, so getting additional features out of it is
even nicer.
> Patch 1: revision: move -L setup before output_format-to-diff derivation
>
> Preparatory reorder in setup_revisions(). The -L block sets a default
> DIFF_FORMAT_PATCH when no format is requested; move it before the derivation
> of revs->diff from output_format so the default is visible to that check. No
> behavior change on its own.
Straightforward, nice.
>
> Patch 2: line-log: integrate -L output with the standard log-tree pipeline
>
> Rename line_log_print() to line_log_queue_pairs(), stripping it down to only
> queue pre-computed filepairs. log_tree_diff_flush() handles show_log(),
> diffcore_std(), and diff_flush(). This fixes pickaxe and --diff-filter
> suppression, and aligns the commit/diff separator with the rest of log
> output. Also rejects --full-diff, which is meaningless when filepairs are
> pre-computed.
At first I questioned the removal of the DIFF_FORMAT_NO_OUTPUT
conditional in line_log_queue_pairs, but now that it only queues pairs
it shouldn't be checking output formats. Good.
I also noted that log_tree_diff() returns the result of
log_tree_diff_flush() in the -L case, which is a bit different from
the other patterns. I think the difference is that the other cases
have some conditional logic around the log_tree_diff_flush cases (?)
but I'm not sure. Perhaps that branch should also be looking at
opt->loginfo ?
Finally, I wonder if in describing the removal of the early return:
> - Remove the early return in log_tree_commit() that bypassed
> no_free save/restore, always_show_header, and diff_free().
we might want to be more explicit that this is _because_ line-level
diff is now handled in the regular pipeline?
[I suppose we could, in theory, split the rejection of --full-diff to
a separate prep commit, idk.j]
> Patch 3: line-log: allow non-patch diff formats with -L
>
> Expand the allowlist to accept --raw, --name-only, --name-status, and
> --summary. These only read filepair metadata already set by the line-log
> machinery. Diff stat formats (--stat, --numstat, --shortstat, --dirstat)
> remain blocked because they call compute_diffstat() on full blob content and
> would show whole-file statistics rather than range-scoped ones.
Short and sweet.
The stat formats are kind of like --full-diff, and I think they should
probably all be rejected or all allowed: since the stats are based on
the full-diff, it makes sense to enable them if we can also make -L +
--full-diff semantically sensible.
Otherwise, we'd need to find a way to make the stat formats scoped for -L.
> Michael Montalbo (3):
> revision: move -L setup before output_format-to-diff derivation
> line-log: integrate -L output with the standard log-tree pipeline
> line-log: allow non-patch diff formats with -L
>
> Documentation/line-range-options.adoc | 10 +-
> line-log.c | 30 ++----
> line-log.h | 2 +-
> log-tree.c | 9 +-
> revision.c | 25 +++--
> t/t4211-line-log.sh | 99 ++++++++++++++++---
> t/t4211/sha1/expect.parallel-change-f-to-main | 1 -
> .../sha256/expect.parallel-change-f-to-main | 1 -
> 8 files changed, 120 insertions(+), 57 deletions(-)
>
>
> base-commit: 9f223ef1c026d91c7ac68cc0211bde255dda6199
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2094%2Fmmontalbo%2Fmm%2Fline-log-use-log-tree-diff-flush-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2094/mmontalbo/mm/line-log-use-log-tree-diff-flush-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/2094
> --
> gitgitgadget
A few other comments:
- Tests should use test_grep; some do, but some don't.
- There is one occurrence of "sed | grep" that I wonder if we want to
rewrite to avoid issues with exit status one side of the pipe?
Thanks for working on this!
[Apologies for the unusual review format; this was easier for me at
the moment than digging up the individual patches, and I don't think
_most_ of the review would benefit from spreading out across multiple
mails.]
--
D. Ben Knoble
^ permalink raw reply
* [PATCH v5 13/13] path-walk: support `combine` filter
From: Taylor Blau via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Taylor Blau
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Taylor Blau <me@ttaylorr.com>
The `combine` filter takes the intersection of its children, that is:
objects are shown only when all child filters would admit the object.
The preceding patches added support for many individual filter types.
Enable users to compose these filters by implementing support for the
`combine` filter type.
Mapping intersection onto path_walk_info works because every supported
child filter is a monotonic restriction:
- `blob:none`, `tree:0` unconditionally clear `info->blobs` and (for
`tree:0`) `info->trees`; clearing an already-cleared flag is a
no-op.
- `object:type=X` is now expressed as an AND of each type flag with the
filtered type, so applying multiple such filters only refines the
existing set rather than overwrites it.
- `blob:limit=N` has to compose too: the intersection of "size < L1"
and "size < L2" is "size < min(L1, L2)".
Update the `LOFC_BLOB_LIMIT` handler to take the running minimum when
`info->blob_limit` is already set, so a combined filter with, e.g.,
both "blob:limit=10" and "blob:limit=5" produces a limit of 5
regardless of ordering.
- `sparse:oid` is left unchanged. A `combine` filter that includes a
`sparse:oid` is allowed at most once, since the existing handler
refuses to overwrite `info->pl`. Two `sparse:oid` filters in a single
`combine` would be unusual and are rejected with a warning, matching
the standalone `sparse:oid` behavior.
Implementation-wise, the existing `prepare_filters()` called
`list_objects_filter_release()` inside each case branch. That works fine for
top-level filters, but `combine` filters need to recurse over its child
filters without releasing each one in turn (since the parent's release
iterates the sub array). Split `prepare_filters()` into a recursive helper
that performs only the mutation, plus a thin wrapper that calls the helper
and then releases the top-level filter once.
The `LOFC_COMBINE` case in the helper just walks `sub_nr` and recurses;
child filters are released by the wrapper's single
`list_objects_filter_release()` call on the parent (which itself recursively
releases each sub-filter, the same way it always has).
If any sub-filter is unsupported (e.g. "tree:1", "sparse:<path>", or a
not-yet-supported choice), the recursion bubbles a failure up and the
existing pack-objects/backfill fallback paths kick in.
Add coverage in t6601:
- "combine:blob:none+tree:0" collapses to "tree:0"
- "combine:object:type=blob+blob:limit=3" yields only the blobs
smaller than three bytes
- "combine:object:type=blob+object:type=tree" intersects to empty
- "combine:tree:1+blob:none" reports the "tree:1" error.
Update Documentation/git-pack-objects.adoc to add combine to the
list of supported --filter forms.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-pack-objects.adoc | 3 +-
path-walk.c | 25 ++++++++--
t/t6601-path-walk.sh | 71 +++++++++++++++++++++++++++++
3 files changed, 93 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index f2852ebd31..8a27aa19fd 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -405,7 +405,8 @@ will be automatically changed to version `1`.
Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
ignored in the presence of `--path-walk`. The `--path-walk` option
supports the `--filter=<spec>` forms `blob:none`, `blob:limit=<n>`,
-`tree:0`, `object:type=<type>`, and `sparse:<oid>`.
+`tree:0`, `object:type=<type>`, and `sparse:<oid>`. These supported filter
+types can be combined with the `combine:<spec>+<spec>` form.
DELTA ISLANDS
diff --git a/path-walk.c b/path-walk.c
index 418972e753..94ff90bd15 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -571,8 +571,8 @@ static int setup_pending_objects(struct path_walk_info *info,
return 0;
}
-static int prepare_filters(struct path_walk_info *info,
- struct list_objects_filter_options *options)
+static int prepare_filters_one(struct path_walk_info *info,
+ struct list_objects_filter_options *options)
{
switch (options->choice) {
case LOFC_DISABLED:
@@ -589,7 +589,8 @@ static int prepare_filters(struct path_walk_info *info,
if (info) {
if (!options->blob_limit_value)
info->blobs = 0;
- else
+ else if (!info->blob_limit ||
+ info->blob_limit > options->blob_limit_value)
info->blob_limit = options->blob_limit_value;
list_objects_filter_release(options);
}
@@ -604,7 +605,6 @@ static int prepare_filters(struct path_walk_info *info,
if (info) {
info->trees = 0;
info->blobs = 0;
- list_objects_filter_release(options);
}
return 1;
@@ -656,8 +656,13 @@ static int prepare_filters(struct path_walk_info *info,
warning(_("sparse filter is not cone-mode compatible"));
return 0;
}
+ }
+ return 1;
- list_objects_filter_release(options);
+ case LOFC_COMBINE:
+ for (size_t i = 0; i < options->sub_nr; i++) {
+ if (!prepare_filters_one(info, &options->sub[i]))
+ return 0;
}
return 1;
@@ -668,6 +673,16 @@ static int prepare_filters(struct path_walk_info *info,
}
}
+static int prepare_filters(struct path_walk_info *info,
+ struct list_objects_filter_options *options)
+{
+ if (!prepare_filters_one(info, options))
+ return 0;
+ if (info)
+ list_objects_filter_release(options);
+ return 1;
+}
+
int path_walk_filter_compatible(struct list_objects_filter_options *options)
{
return prepare_filters(NULL, options);
diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh
index 0fd8e61c76..e9fcd85e75 100755
--- a/t/t6601-path-walk.sh
+++ b/t/t6601-path-walk.sh
@@ -727,6 +727,77 @@ test_expect_success 'all, object:type=blob filter' '
test_cmp_sorted expect out
'
+test_expect_success 'all, combine:blob:none+tree:0 filter' '
+ test-tool path-walk \
+ --filter=combine:blob:none+tree:0 -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ 1:tag:/tags:$(git rev-parse refs/tags/first)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.1)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.2)
+ 1:tag:/tags:$(git rev-parse refs/tags/third)
+ 1:tag:/tags:$(git rev-parse refs/tags/fourth)
+ 1:tag:/tags:$(git rev-parse refs/tags/tree-tag)
+ 1:tag:/tags:$(git rev-parse refs/tags/blob-tag)
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag^{})
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{})
+ 3:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag^{tree})
+ 3:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag2)
+ blobs:2
+ commits:4
+ tags:7
+ trees:2
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'all, combine:object:type=blob+blob:limit=3 filter' '
+ test-tool path-walk \
+ --filter=combine:object:type=blob+blob:limit=3 \
+ -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag^{})
+ 0:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{})
+ 1:blob:a:$(git rev-parse base~2:a)
+ 2:blob:left/b:$(git rev-parse base~2:left/b)
+ 3:blob:right/c:$(git rev-parse base~2:right/c)
+ 4:blob:right/d:$(git rev-parse base~1:right/d)
+ blobs:6
+ commits:0
+ tags:0
+ trees:0
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'all, combine of disjoint object:types is empty' '
+ test-tool path-walk \
+ --filter=combine:object:type=blob+object:type=tree \
+ -- --all >out &&
+
+ cat >expect <<-EOF &&
+ blobs:0
+ commits:0
+ tags:0
+ trees:0
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'combine: rejects unsupported subfilters' '
+ test_must_fail test-tool path-walk \
+ --filter=combine:tree:1+blob:none -- --all 2>err &&
+ test_grep "tree:1 filter not supported by the path-walk API" err
+'
+
test_expect_success 'setup sparse filter blob' '
# Cone-mode patterns: include root, exclude all dirs, include left/
cat >patterns <<-\EOF &&
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 12/13] path-walk: support `object:type` filter
From: Taylor Blau via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Taylor Blau
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Taylor Blau <me@ttaylorr.com>
The `object:type` filter accepts only objects of a single type; it is
the second member of the object-info-only filter family that bitmap
traversal already supports.
Like `blob:none` and `tree:0`, it can be evaluated with nothing more
than the object's type, which is exactly the granularity path-walk's
existing info->{commits,trees,blobs,tags} flags already control.
Map `LOFC_OBJECT_TYPE` in `prepare_filters()` by AND-ing each flag
against the filtered type. A single `object:type=X` filter
applied to the default info (all flags = 1) leaves `info->X = 1` and
all the others 0, which is what we want.
Using an AND rather than straight assignment prepares us for a
subsequent change to implement combined object filters.
The path-walk machinery is mostly already wired for the per-type
distinction:
- `walk_path()` calls `path_fn` for a batch only when the corresponding
`info->X` flag is set, so unwanted types are silently not reported.
- `add_tree_entries()` skips tree entries of type `OBJ_BLOB` when
`info->blobs` is unset, so we don't even allocate paths for them.
- The commit-walk loop short-circuits the root-tree fetch when
`!info->trees && !info->blobs`, so commit-only filters don't descend
into trees at all.
But there are a couple of side effects of the "trees off, blobs on" case
that need fixing:
1. 'setup_pending_objects()' previously skipped pending trees as soon
as `info->trees` was zero. For 'object:type=blob' the call site
needs those pending trees: a lightweight tag pointing to a tree, or
an annotated tag whose peeled target is a tree, can both reach
blobs that are otherwise unreachable from any commit's root tree.
Loosen the gate to "if (!info->trees && !info->blobs) continue" and
similarly retrieve the root_tree_list whenever either trees or
blobs are wanted.
2. The revision machinery's `handle_commit()` drops pending trees when
`revs->tree_objects` is zero (see the 'OBJ_TREE' handler in
revision.c), so by the time path-walk sees the pending list
after `prepare_revision_walk()` the tree-bearing pendings would
already be gone. Fix this by setting
revs->tree_objects = info->trees || info->blobs
so pending trees survive `prepare_revision_walk()` whenever we
need to walk into them. Path-walk still resets tree_objects to
zero immediately after `prepare_revision_walk()` returns, so the
rev-walk itself never enumerates trees redundantly with
path-walk's own descent.
Add coverage in t6601 for each of the four `object:type` values. The
'object:type=blob' test in particular asserts that file2 and child/file
(both reachable only through tag-pointed trees) show up in the output,
exercising the pending-tree fix.
Update Documentation/git-pack-objects.adoc to add object:type to
the list of supported --filter forms.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-pack-objects.adoc | 2 +-
path-walk.c | 13 ++++-
path-walk.h | 6 +++
t/t6601-path-walk.sh | 84 +++++++++++++++++++++++++++++
4 files changed, 103 insertions(+), 2 deletions(-)
diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index c86219be91..f2852ebd31 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -405,7 +405,7 @@ will be automatically changed to version `1`.
Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
ignored in the presence of `--path-walk`. The `--path-walk` option
supports the `--filter=<spec>` forms `blob:none`, `blob:limit=<n>`,
-`tree:0`, and `sparse:<oid>`.
+`tree:0`, `object:type=<type>`, and `sparse:<oid>`.
DELTA ISLANDS
diff --git a/path-walk.c b/path-walk.c
index cb67b8ce86..418972e753 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -382,7 +382,7 @@ static int walk_path(struct path_walk_context *ctx,
ret = ctx->info->path_fn(path, &filtered, list->type,
ctx->info->path_fn_data);
oid_array_clear(&filtered);
- } else if (path_is_for_direct_objects(path) ||
+ } else if ((!ctx->info->strict_types && path_is_for_direct_objects(path)) ||
(list->type == OBJ_TREE && ctx->info->trees) ||
(list->type == OBJ_BLOB && ctx->info->blobs) ||
(list->type == OBJ_TAG && ctx->info->tags)) {
@@ -608,6 +608,17 @@ static int prepare_filters(struct path_walk_info *info,
}
return 1;
+ case LOFC_OBJECT_TYPE:
+ if (info) {
+ info->commits &= options->object_type == OBJ_COMMIT;
+ info->tags &= options->object_type == OBJ_TAG;
+ info->trees &= options->object_type == OBJ_TREE;
+ info->blobs &= options->object_type == OBJ_BLOB;
+ info->strict_types = 1;
+ list_objects_filter_release(options);
+ }
+ return 1;
+
case LOFC_SPARSE_OID:
if (info) {
struct object_id sparse_oid;
diff --git a/path-walk.h b/path-walk.h
index 7e57ae5f65..a2652b2d46 100644
--- a/path-walk.h
+++ b/path-walk.h
@@ -47,6 +47,12 @@ struct path_walk_info {
int blobs;
int tags;
+ /**
+ * If 'strict_types' is 0, then direct object requests will no longer
+ * override the object type restrictions.
+ */
+ int strict_types;
+
/**
* If non-zero, specifies a maximum blob size. Blobs with a
* size equal to or greater than this limit will not be
diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh
index 566db7c7e3..0fd8e61c76 100755
--- a/t/t6601-path-walk.sh
+++ b/t/t6601-path-walk.sh
@@ -643,6 +643,90 @@ test_expect_success 'tree:1 filter is rejected' '
test_grep "tree:1 filter not supported by the path-walk API" err
'
+test_expect_success 'all, object:type=commit filter' '
+ test-tool path-walk --filter=object:type=commit -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ blobs:0
+ commits:4
+ tags:0
+ trees:0
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'all, object:type=tag filter' '
+ test-tool path-walk --filter=object:type=tag -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:tag:/tags:$(git rev-parse refs/tags/first)
+ 0:tag:/tags:$(git rev-parse refs/tags/second.1)
+ 0:tag:/tags:$(git rev-parse refs/tags/second.2)
+ 0:tag:/tags:$(git rev-parse refs/tags/third)
+ 0:tag:/tags:$(git rev-parse refs/tags/fourth)
+ 0:tag:/tags:$(git rev-parse refs/tags/tree-tag)
+ 0:tag:/tags:$(git rev-parse refs/tags/blob-tag)
+ blobs:0
+ commits:0
+ tags:7
+ trees:0
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'all, object:type=tree filter' '
+ test-tool path-walk --filter=object:type=tree -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:tree::$(git rev-parse topic^{tree})
+ 0:tree::$(git rev-parse base^{tree})
+ 0:tree::$(git rev-parse base~1^{tree})
+ 0:tree::$(git rev-parse base~2^{tree})
+ 1:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag^{})
+ 1:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag2^{})
+ 2:tree:a/:$(git rev-parse base:a)
+ 3:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
+ 4:tree:left/:$(git rev-parse base:left)
+ 4:tree:left/:$(git rev-parse base~2:left)
+ 5:tree:right/:$(git rev-parse topic:right)
+ 5:tree:right/:$(git rev-parse base~1:right)
+ 5:tree:right/:$(git rev-parse base~2:right)
+ blobs:0
+ commits:0
+ tags:0
+ trees:13
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'all, object:type=blob filter' '
+ test-tool path-walk --filter=object:type=blob -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag^{})
+ 0:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{})
+ 1:blob:a:$(git rev-parse base~2:a)
+ 2:blob:left/b:$(git rev-parse base:left/b)
+ 2:blob:left/b:$(git rev-parse base~2:left/b)
+ 3:blob:right/c:$(git rev-parse base~2:right/c)
+ 3:blob:right/c:$(git rev-parse topic:right/c)
+ 4:blob:right/d:$(git rev-parse base~1:right/d)
+ blobs:8
+ commits:0
+ tags:0
+ trees:0
+ EOF
+
+ test_cmp_sorted expect out
+'
+
test_expect_success 'setup sparse filter blob' '
# Cone-mode patterns: include root, exclude all dirs, include left/
cat >patterns <<-\EOF &&
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 11/13] path-walk: support `tree:0` filter
From: Taylor Blau via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Taylor Blau
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Taylor Blau <me@ttaylorr.com>
The `tree:0` object filter omits all trees and blobs from the result,
keeping only commits and tags. Consequently, this filter type should
has a fairly straightforward integration with path-walk, as the decision
to include an object depends only on its type and does not depend on any
path-sensitive state.
Mapping it onto `path_walk_info` is direct: set `info->trees = 0` and
`info->blobs = 0` in `prepare_filters()` when the `LOFC_TREE_DEPTH`
choice is requested with depth zero. The existing code already plumbs
those flags through the rest of the walk:
- 'walk_objects_by_path()' sets `revs->blob_objects = info->blobs` and
`revs->tree_objects = info->trees` before `prepare_revision_walk()`,
so the revision walk doesn't try to enumerate trees or blobs itself.
- The commit-walk loop short-circuits the root-tree fetch with
"if (!info->trees && !info->blobs) continue;", so we never even
look up the root tree, let alone descend into it.
- `setup_pending_objects()` skips pending trees and blobs based on
the same flags.
This means the path-walk doesn't allocate or expand any tree structures
at all under `tree:0`, which matches the intended behavior of the
filter.
However, this requires first fixing some issues with how the path-walk
API handles directly-requested trees _and_ trees requested through
lightweight tags. These changes create substantial updates to
t6601-path-walk.sh, which the previous change highlighted as a problem
by tagging otherwise-unreachable trees and having them not appear in the
output.
Non-zero tree-depth filters are not supported. Those depend on the depth
at which a tree is visited, which is a path-walk concept the filter
machinery doesn't currently share with the path-walk API. Reject them in
`prepare_filters()` with a helpful error and let pack-objects fall back
to the regular traversal, the same way it already does for unsupported
filters.
Add coverage in t6601 for both `--all` and a single-branch case to
confirm that no trees or blobs are emitted, and a separate test that
`tree:1` is rejected with the expected error message. Place the new
tests before "setup sparse filter blob" so they run on the original set
of refs, before the orphan branch that the sparse-tree tests create.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-pack-objects.adoc | 4 +-
path-walk.c | 53 +++++++--
t/t6601-path-walk.sh | 165 ++++++++++++++++++----------
3 files changed, 152 insertions(+), 70 deletions(-)
diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index e38853391b..c86219be91 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -404,8 +404,8 @@ will be automatically changed to version `1`.
+
Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
ignored in the presence of `--path-walk`. The `--path-walk` option
-supports the `--filter=<spec>` forms `blob:none`, `blob:limit=<n>`, and
-`sparse:<oid>`.
+supports the `--filter=<spec>` forms `blob:none`, `blob:limit=<n>`,
+`tree:0`, and `sparse:<oid>`.
DELTA ISLANDS
diff --git a/path-walk.c b/path-walk.c
index ce38dcf1e9..cb67b8ce86 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -390,11 +390,18 @@ static int walk_path(struct path_walk_context *ctx,
ctx->info->path_fn_data);
}
- /* Expand data for children. */
- if (list->type == OBJ_TREE) {
+ /*
+ * Expand tree children, except when the set is directly requested
+ * _and_ we are otherwise filtering out trees.
+ */
+ if (list->type == OBJ_TREE &&
+ (!path_is_for_direct_objects(path) || ctx->info->trees)) {
+ /* Use root path if expanding from tagged/direct trees. */
+ const char *expand_path = !strcmp(path, "/tagged-trees")
+ ? root_path : path;
for (size_t i = 0; i < list->oids.nr; i++) {
ret |= add_tree_entries(ctx,
- path,
+ expand_path,
&list->oids.oid[i]);
}
}
@@ -442,12 +449,12 @@ static int setup_pending_objects(struct path_walk_info *info,
{
struct type_and_oid_list *tags = NULL;
struct type_and_oid_list *tagged_blobs = NULL;
- struct type_and_oid_list *root_tree_list = NULL;
+ struct type_and_oid_list *tagged_trees = NULL;
if (info->tags)
CALLOC_ARRAY(tags, 1);
CALLOC_ARRAY(tagged_blobs, 1);
- root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
+ CALLOC_ARRAY(tagged_trees, 1);
/*
* Pending objects include:
@@ -491,14 +498,15 @@ static int setup_pending_objects(struct path_walk_info *info,
switch (obj->type) {
case OBJ_TREE:
- if (pending->path) {
- char *path = *pending->path ? xstrfmt("%s/", pending->path)
- : xstrdup("");
+ if (pending->path && *pending->path) {
+ char *path = xstrfmt("%s/", pending->path);
add_path_to_list(ctx, path, OBJ_TREE, &obj->oid, 1);
free(path);
+ } else if (!pending->path || !info->trees) {
+ oid_array_append(&tagged_trees->oids, &obj->oid);
} else {
- /* assume a root tree, such as a lightweight tag. */
- oid_array_append(&root_tree_list->oids, &obj->oid);
+ add_path_to_list(ctx, root_path, OBJ_TREE,
+ &obj->oid, 1);
}
break;
@@ -535,6 +543,18 @@ static int setup_pending_objects(struct path_walk_info *info,
free(tagged_blobs);
}
}
+ if (tagged_trees) {
+ if (tagged_trees->oids.nr) {
+ const char *tagged_tree_path = "/tagged-trees";
+ tagged_trees->type = OBJ_TREE;
+ tagged_trees->maybe_interesting = 1;
+ strmap_put(&ctx->paths_to_lists, tagged_tree_path, tagged_trees);
+ push_to_stack(ctx, tagged_tree_path);
+ } else {
+ oid_array_clear(&tagged_trees->oids);
+ free(tagged_trees);
+ }
+ }
if (tags) {
if (tags->oids.nr) {
const char *tag_path = "/tags";
@@ -575,6 +595,19 @@ static int prepare_filters(struct path_walk_info *info,
}
return 1;
+ case LOFC_TREE_DEPTH:
+ if (options->tree_exclude_depth) {
+ error(_("tree:%lu filter not supported by the path-walk API"),
+ options->tree_exclude_depth);
+ return 0;
+ }
+ if (info) {
+ info->trees = 0;
+ info->blobs = 0;
+ list_objects_filter_release(options);
+ }
+ return 1;
+
case LOFC_SPARSE_OID:
if (info) {
struct object_id sparse_oid;
diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh
index 92c524d145..566db7c7e3 100755
--- a/t/t6601-path-walk.sh
+++ b/t/t6601-path-walk.sh
@@ -77,23 +77,23 @@ test_expect_success 'all' '
3:tree::$(git rev-parse base^{tree})
3:tree::$(git rev-parse base~1^{tree})
3:tree::$(git rev-parse base~2^{tree})
- 3:tree::$(git rev-parse refs/tags/tree-tag^{})
- 3:tree::$(git rev-parse refs/tags/tree-tag2^{})
4:blob:a:$(git rev-parse base~2:a)
- 5:blob:file2:$(git rev-parse refs/tags/tree-tag2^{}:file2)
- 6:tree:a/:$(git rev-parse base:a)
- 7:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
- 8:blob:child/file:$(git rev-parse refs/tags/tree-tag:child/file)
- 9:tree:left/:$(git rev-parse base:left)
- 9:tree:left/:$(git rev-parse base~2:left)
- 10:blob:left/b:$(git rev-parse base~2:left/b)
- 10:blob:left/b:$(git rev-parse base:left/b)
- 11:tree:right/:$(git rev-parse topic:right)
- 11:tree:right/:$(git rev-parse base~1:right)
- 11:tree:right/:$(git rev-parse base~2:right)
- 12:blob:right/c:$(git rev-parse base~2:right/c)
- 12:blob:right/c:$(git rev-parse topic:right/c)
- 13:blob:right/d:$(git rev-parse base~1:right/d)
+ 5:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag^{})
+ 5:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag2^{})
+ 6:blob:file2:$(git rev-parse refs/tags/tree-tag2^{}:file2)
+ 7:tree:a/:$(git rev-parse base:a)
+ 8:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
+ 9:blob:child/file:$(git rev-parse refs/tags/tree-tag:child/file)
+ 10:tree:left/:$(git rev-parse base:left)
+ 10:tree:left/:$(git rev-parse base~2:left)
+ 11:blob:left/b:$(git rev-parse base~2:left/b)
+ 11:blob:left/b:$(git rev-parse base:left/b)
+ 12:tree:right/:$(git rev-parse topic:right)
+ 12:tree:right/:$(git rev-parse base~1:right)
+ 12:tree:right/:$(git rev-parse base~2:right)
+ 13:blob:right/c:$(git rev-parse base~2:right/c)
+ 13:blob:right/c:$(git rev-parse topic:right/c)
+ 14:blob:right/d:$(git rev-parse base~1:right/d)
blobs:10
commits:4
tags:7
@@ -471,15 +471,15 @@ test_expect_success 'all, blob:none filter' '
3:tree::$(git rev-parse base^{tree})
3:tree::$(git rev-parse base~1^{tree})
3:tree::$(git rev-parse base~2^{tree})
- 3:tree::$(git rev-parse refs/tags/tree-tag^{})
- 3:tree::$(git rev-parse refs/tags/tree-tag2^{})
- 4:tree:a/:$(git rev-parse base:a)
- 5:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
- 6:tree:left/:$(git rev-parse base:left)
- 6:tree:left/:$(git rev-parse base~2:left)
- 7:tree:right/:$(git rev-parse topic:right)
- 7:tree:right/:$(git rev-parse base~1:right)
- 7:tree:right/:$(git rev-parse base~2:right)
+ 4:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag^{})
+ 4:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag2^{})
+ 5:tree:a/:$(git rev-parse base:a)
+ 6:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
+ 7:tree:left/:$(git rev-parse base:left)
+ 7:tree:left/:$(git rev-parse base~2:left)
+ 8:tree:right/:$(git rev-parse topic:right)
+ 8:tree:right/:$(git rev-parse base~1:right)
+ 8:tree:right/:$(git rev-parse base~2:right)
blobs:2
commits:4
tags:7
@@ -533,15 +533,15 @@ test_expect_success 'all, blob:limit=0 filter' '
3:tree::$(git rev-parse base^{tree})
3:tree::$(git rev-parse base~1^{tree})
3:tree::$(git rev-parse base~2^{tree})
- 3:tree::$(git rev-parse refs/tags/tree-tag^{})
- 3:tree::$(git rev-parse refs/tags/tree-tag2^{})
- 4:tree:a/:$(git rev-parse base:a)
- 5:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
- 6:tree:left/:$(git rev-parse base:left)
- 6:tree:left/:$(git rev-parse base~2:left)
- 7:tree:right/:$(git rev-parse topic:right)
- 7:tree:right/:$(git rev-parse base~1:right)
- 7:tree:right/:$(git rev-parse base~2:right)
+ 4:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag^{})
+ 4:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag2^{})
+ 5:tree:a/:$(git rev-parse base:a)
+ 6:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
+ 7:tree:left/:$(git rev-parse base:left)
+ 7:tree:left/:$(git rev-parse base~2:left)
+ 8:tree:right/:$(git rev-parse topic:right)
+ 8:tree:right/:$(git rev-parse base~1:right)
+ 8:tree:right/:$(git rev-parse base~2:right)
blobs:2
commits:4
tags:7
@@ -572,19 +572,19 @@ test_expect_success 'all, blob:limit=3 filter' '
3:tree::$(git rev-parse base^{tree})
3:tree::$(git rev-parse base~1^{tree})
3:tree::$(git rev-parse base~2^{tree})
- 3:tree::$(git rev-parse refs/tags/tree-tag^{})
- 3:tree::$(git rev-parse refs/tags/tree-tag2^{})
4:blob:a:$(git rev-parse base~2:a)
- 5:tree:a/:$(git rev-parse base:a)
- 6:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
- 7:tree:left/:$(git rev-parse base:left)
- 7:tree:left/:$(git rev-parse base~2:left)
- 8:blob:left/b:$(git rev-parse base~2:left/b)
- 9:tree:right/:$(git rev-parse topic:right)
- 9:tree:right/:$(git rev-parse base~1:right)
- 9:tree:right/:$(git rev-parse base~2:right)
- 10:blob:right/c:$(git rev-parse base~2:right/c)
- 11:blob:right/d:$(git rev-parse base~1:right/d)
+ 5:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag^{})
+ 5:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag2^{})
+ 6:tree:a/:$(git rev-parse base:a)
+ 7:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
+ 8:tree:left/:$(git rev-parse base:left)
+ 8:tree:left/:$(git rev-parse base~2:left)
+ 9:blob:left/b:$(git rev-parse base~2:left/b)
+ 10:tree:right/:$(git rev-parse topic:right)
+ 10:tree:right/:$(git rev-parse base~1:right)
+ 10:tree:right/:$(git rev-parse base~2:right)
+ 11:blob:right/c:$(git rev-parse base~2:right/c)
+ 12:blob:right/d:$(git rev-parse base~1:right/d)
blobs:6
commits:4
tags:7
@@ -594,6 +594,55 @@ test_expect_success 'all, blob:limit=3 filter' '
test_cmp_sorted expect out
'
+test_expect_success 'all, tree:0 filter' '
+ test-tool path-walk --filter=tree:0 -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ 1:tag:/tags:$(git rev-parse refs/tags/first)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.1)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.2)
+ 1:tag:/tags:$(git rev-parse refs/tags/third)
+ 1:tag:/tags:$(git rev-parse refs/tags/fourth)
+ 1:tag:/tags:$(git rev-parse refs/tags/tree-tag)
+ 1:tag:/tags:$(git rev-parse refs/tags/blob-tag)
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag^{})
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{})
+ 3:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag^{tree})
+ 3:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag2)
+ blobs:2
+ commits:4
+ tags:7
+ trees:2
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'topic only, tree:0 filter' '
+ test-tool path-walk --filter=tree:0 -- topic >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ blobs:0
+ commits:3
+ tags:0
+ trees:0
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'tree:1 filter is rejected' '
+ test_must_fail test-tool path-walk --filter=tree:1 -- --all 2>err &&
+ test_grep "tree:1 filter not supported by the path-walk API" err
+'
+
test_expect_success 'setup sparse filter blob' '
# Cone-mode patterns: include root, exclude all dirs, include left/
cat >patterns <<-\EOF &&
@@ -625,19 +674,19 @@ test_expect_success 'all, sparse:oid filter' '
3:tree::$(git rev-parse base^{tree})
3:tree::$(git rev-parse base~1^{tree})
3:tree::$(git rev-parse base~2^{tree})
- 3:tree::$(git rev-parse refs/tags/tree-tag^{})
- 3:tree::$(git rev-parse refs/tags/tree-tag2^{})
4:blob:a:$(git rev-parse base~2:a)
- 5:blob:file2:$(git rev-parse refs/tags/tree-tag2^{}:file2)
- 6:tree:a/:$(git rev-parse base:a)
- 7:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
- 8:tree:left/:$(git rev-parse base:left)
- 8:tree:left/:$(git rev-parse base~2:left)
- 9:blob:left/b:$(git rev-parse base~2:left/b)
- 9:blob:left/b:$(git rev-parse base:left/b)
- 10:tree:right/:$(git rev-parse topic:right)
- 10:tree:right/:$(git rev-parse base~1:right)
- 10:tree:right/:$(git rev-parse base~2:right)
+ 5:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag^{})
+ 5:tree:/tagged-trees:$(git rev-parse refs/tags/tree-tag2^{})
+ 6:blob:file2:$(git rev-parse refs/tags/tree-tag2^{}:file2)
+ 7:tree:a/:$(git rev-parse base:a)
+ 8:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
+ 9:tree:left/:$(git rev-parse base:left)
+ 9:tree:left/:$(git rev-parse base~2:left)
+ 10:blob:left/b:$(git rev-parse base~2:left/b)
+ 10:blob:left/b:$(git rev-parse base:left/b)
+ 11:tree:right/:$(git rev-parse topic:right)
+ 11:tree:right/:$(git rev-parse base~1:right)
+ 11:tree:right/:$(git rev-parse base~2:right)
blobs:6
commits:4
tags:7
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 10/13] t6601: tag otherwise-unreachable trees
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
The tests in t6601-path-walk.sh demonstrate the behavior of the
path-walk API under different conditions. One thing that I noticed while
updating the behavior of directly-requested objects is that we don't
actually emit tagged trees. This was previously not noticed due to those
tagged trees actually being reachable from commits that we are including
in the path-walk.
Update the test setup to have tree-tag and tree-tag2 point to trees that
are otherwise unreachable.
It is worth noting that this does not meaningfully change any of the
other test cases, demontrating the bug.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/t6601-path-walk.sh | 24 +++++++++++-------------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh
index ac294867a5..92c524d145 100755
--- a/t/t6601-path-walk.sh
+++ b/t/t6601-path-walk.sh
@@ -7,17 +7,15 @@ test_description='direct path-walk API tests'
test_expect_success 'setup test repository' '
git checkout -b base &&
- # Make some objects that will only be reachable
- # via non-commit tags.
- mkdir child &&
- echo file >child/file &&
- git add child &&
- git commit -m "will abandon" &&
- git tag -a -m "tree" tree-tag HEAD^{tree} &&
- echo file2 >file2 &&
- git add file2 &&
- git commit --amend -m "will abandon" &&
- git tag tree-tag2 HEAD^{tree} &&
+ # Create tree objects that are only reachable via tags,
+ # not from any commit in the history.
+ child_blob_oid=$(echo "child blob content" | git hash-object -t blob -w --stdin) &&
+ child_tree_oid=$(printf "100644 blob %s\tfile\n" "$child_blob_oid" | git mktree) &&
+ tree_tag_oid=$(printf "040000 tree %s\tchild\n" "$child_tree_oid" | git mktree) &&
+ git tag -a -m "tree" tree-tag "$tree_tag_oid" &&
+ file2_blob_oid=$(echo "tagged tree file2" | git hash-object -t blob -w --stdin) &&
+ tree_tag2_oid=$(printf "040000 tree %s\tchild\n100644 blob %s\tfile2\n" "$child_tree_oid" "$file2_blob_oid" | git mktree) &&
+ git tag tree-tag2 "$tree_tag2_oid" &&
echo blob >file &&
blob_oid=$(git hash-object -t blob -w --stdin <file) &&
@@ -26,7 +24,7 @@ test_expect_success 'setup test repository' '
blob2_oid=$(git hash-object -t blob -w --stdin <file2) &&
git tag blob-tag2 "$blob2_oid" &&
- rm -fr child file file2 &&
+ rm -fr file file2 &&
mkdir left &&
mkdir right &&
@@ -34,7 +32,7 @@ test_expect_success 'setup test repository' '
echo b >left/b &&
echo c >right/c &&
git add . &&
- git commit --amend -m "first" &&
+ git commit -m "first" &&
git tag -m "first" first HEAD &&
echo d >right/d &&
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 09/13] pack-objects: support sparse:oid filter with path-walk
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
The --filter=sparse:<oid> option to 'git pack-objects' allows focusing
an object set to a sparse-checkout definition. This reduces the set of
matching blobs while retaining all reachable trees. No server currently
supports fetching with this filter because it is expensive to compute
and reachability bitmaps do not help without a significant effort to
extend the bitmap feature to store bitmaps for each supported sparse-
checkout definition.
Without focusing on serving fetches and clones with these filters, there
are still benefits that could be realized by making this faster. With
the sparse index, it's more realistic now than ever to be able to
operate a local clone that was bootstrapped by a packfile created with
a sparse filter, because the missing trees are not needed to move a
sparse-checkout from one commit to another or to view the history of any
path in scope. Such clones could perhaps be bootstrapped by partial
bundles.
Previously, constructing these sparse packs has been incredibly
computationally inefficient. The revision walk that explores which
objects are in scope spends a lot of time checking each object to see if
it matches the sparse-checkout patterns, causing quadratic behavior
(number of objects times number of sparse-checkout patterns). This
improves somewhat when using cone-mode sparse-checkout patterns that can
use hashtables and prefix matches to determine containment. However, the
check per object is still too expensive for most cases.
This is where the path-walk feature comes in. We can proceed as normal
by placing objects in bins by path and _then_ check a group of objects
all at once. Since sparse:<oid> only restricts blobs, the path-walk must
include all reachable trees while using the cone-mode patterns to skip
blobs at paths outside the sparse scope. This establishes a baseline for
a potential future "treesparse:<oid>" filter that would also restrict
trees, but introducing such a new filter is deferred to a later change.
The implementation here is focused around loading the sparse-checkout
patterns from the provided object ID and checking that the patterns are
indeed cone-mode patterns. We can then load the correct pattern list
into the path walk context and use the logic that already exists from
bff45557675 (backfill: add --sparse option, 2025-02-03), though that
feature loads sparse-checkout patterns from the worktree's local
settings and also restricts tree objects. We use a combination of errors
and warnings to signal problems during this load. The difference is that
errors are likely fatal for the non-path-walk version while the warnings
are probably just implementation details for the path-walk version and
the 'git pack-objects' command can fall back to the revision walk
version.
Now that the SEEN flag is deferred until after pattern checks (from the
previous commit), handle the case where a tree with a shared OID appears
at both an out-of-cone and in-cone path. When trees are not being pruned
(pl_sparse_trees == 0), the path-walk re-walks the tree at the in-cone
path so that in-cone blobs within it are discovered. The new tests in
t5317 and t6601 demonstrate this behavior and would fail without these
changes.
The performance test p5315 shows the impact of this change when using
sparse filters:
Test HEAD~1 HEAD
----------------------------------------------------------------------
5315.10: repack (sparse:oid) 77.98 77.47 -0.7%
5315.11: repack size (sparse:oid) 187.5M 187.4M -0.0%
5315.12: repack (sparse:oid, --path-walk) 77.91 31.41 -59.7%
5315.13: repack size (sparse:oid, --path-walk) 187.5M 161.1M -14.1%
These performance tests were run on the Git repository. The --path-walk
feature shows meaningful space savings (14% smaller for sparse packs)
and dramatic time savings (60% faster) by leveraging the path-walk's
ability to skip blobs outside the sparse scope.
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blaue <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-backfill.adoc | 4 +
Documentation/git-pack-objects.adoc | 3 +-
builtin/pack-objects.c | 16 ++-
path-walk.c | 81 ++++++++++++++-
t/t5317-pack-objects-filter-objects.sh | 125 +++++++++++++++++++++++
t/t6601-path-walk.sh | 131 +++++++++++++++++++++++++
6 files changed, 350 insertions(+), 10 deletions(-)
diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
index c0a3b80615..82d6a1969d 100644
--- a/Documentation/git-backfill.adoc
+++ b/Documentation/git-backfill.adoc
@@ -80,6 +80,10 @@ OPTIONS
+
You may also use commit-limiting options understood by
linkgit:git-rev-list[1] such as `--first-parent`, `--since`, or pathspecs.
++
+Most `--filter=<spec>` options don't work with the purpose of
+`git backfill`, but the `sparse:<oid>` filter is integrated to provide a
+focused set of paths to download, distinct from the `--sparse` option.
SEE ALSO
--------
diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 85ae48b699..e38853391b 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -404,7 +404,8 @@ will be automatically changed to version `1`.
+
Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
ignored in the presence of `--path-walk`. The `--path-walk` option
-supports the `--filter=<spec>` forms `blob:none` and `blob:limit=<n>`.
+supports the `--filter=<spec>` forms `blob:none`, `blob:limit=<n>`, and
+`sparse:<oid>`.
DELTA ISLANDS
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index bc9fb5b457..b783dc62bc 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -4754,7 +4754,7 @@ static int add_objects_by_path(const char *path,
return 0;
}
-static void get_object_list_path_walk(struct rev_info *revs)
+static int get_object_list_path_walk(struct rev_info *revs)
{
struct path_walk_info info = PATH_WALK_INFO_INIT;
unsigned int processed = 0;
@@ -4777,8 +4777,9 @@ static void get_object_list_path_walk(struct rev_info *revs)
result = walk_objects_by_path(&info);
trace2_region_leave("pack-objects", "path-walk", revs->repo);
- if (result)
- die(_("failed to pack objects via path-walk"));
+ path_walk_info_clear(&info);
+
+ return result;
}
static void get_object_list(struct rev_info *revs, struct strvec *argv)
@@ -4841,8 +4842,13 @@ static void get_object_list(struct rev_info *revs, struct strvec *argv)
fn_show_object = show_object;
if (path_walk) {
- get_object_list_path_walk(revs);
- } else {
+ if (get_object_list_path_walk(revs)) {
+ warning(_("failed to pack objects via path-walk"));
+ path_walk = 0;
+ }
+ }
+
+ if (!path_walk) {
if (prepare_revision_walk(revs))
die(_("revision walk setup failed"));
mark_edges_uninteresting(revs, show_edge, sparse);
diff --git a/path-walk.c b/path-walk.c
index 225857bbc8..ce38dcf1e9 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -10,6 +10,7 @@
#include "hex.h"
#include "list-objects.h"
#include "list-objects-filter-options.h"
+#include "object-name.h"
#include "odb.h"
#include "object.h"
#include "oid-array.h"
@@ -180,10 +181,6 @@ static int add_tree_entries(struct path_walk_context *ctx,
return -1;
}
- /* Skip this object if already seen. */
- if (o->flags & SEEN)
- continue;
-
strbuf_setlen(&path, base_len);
strbuf_add(&path, entry.path, entry.pathlen);
@@ -194,6 +191,40 @@ static int add_tree_entries(struct path_walk_context *ctx,
if (type == OBJ_TREE)
strbuf_addch(&path, '/');
+ if (o->flags & SEEN) {
+ /*
+ * A tree with a shared OID may appear at multiple
+ * paths. Even though we already added this tree to
+ * the output at some other path, we still need to
+ * walk into it at this in-cone path to discover
+ * blobs that were not found at the earlier
+ * out-of-cone path.
+ *
+ * Only do this for paths not yet in our map, to
+ * avoid duplicate entries when the same tree OID
+ * appears at the same path across multiple commits.
+ */
+ if (type == OBJ_TREE && ctx->info->pl &&
+ ctx->info->pl->use_cone_patterns &&
+ !ctx->info->pl_sparse_trees &&
+ !strmap_contains(&ctx->paths_to_lists, path.buf)) {
+ int dtype;
+ enum pattern_match_result m;
+ m = path_matches_pattern_list(path.buf, path.len,
+ path.buf + base_len,
+ &dtype,
+ ctx->info->pl,
+ ctx->repo->index);
+ if (m != NOT_MATCHED) {
+ add_path_to_list(ctx, path.buf, type,
+ &entry.oid,
+ !(o->flags & UNINTERESTING));
+ push_to_stack(ctx, path.buf);
+ }
+ }
+ continue;
+ }
+
if (ctx->info->pl) {
int dtype;
enum pattern_match_result match;
@@ -544,6 +575,48 @@ static int prepare_filters(struct path_walk_info *info,
}
return 1;
+ case LOFC_SPARSE_OID:
+ if (info) {
+ struct object_id sparse_oid;
+ struct repository *repo = info->revs->repo;
+
+ if (info->pl) {
+ warning(_("sparse filter cannot be combined with existing sparse patterns"));
+ return 0;
+ }
+
+ if (repo_get_oid_with_flags(repo,
+ options->sparse_oid_name,
+ &sparse_oid,
+ GET_OID_BLOB)) {
+ error(_("unable to access sparse blob in '%s'"),
+ options->sparse_oid_name);
+ return 0;
+ }
+
+ CALLOC_ARRAY(info->pl, 1);
+ info->pl->use_cone_patterns = 1;
+
+ if (add_patterns_from_blob_to_list(&sparse_oid, "", 0,
+ info->pl) < 0) {
+ clear_pattern_list(info->pl);
+ FREE_AND_NULL(info->pl);
+ error(_("unable to parse sparse filter data in '%s'"),
+ oid_to_hex(&sparse_oid));
+ return 0;
+ }
+
+ if (!info->pl->use_cone_patterns) {
+ clear_pattern_list(info->pl);
+ FREE_AND_NULL(info->pl);
+ warning(_("sparse filter is not cone-mode compatible"));
+ return 0;
+ }
+
+ list_objects_filter_release(options);
+ }
+ return 1;
+
default:
error(_("object filter '%s' not supported by the path-walk API"),
list_objects_filter_spec(options));
diff --git a/t/t5317-pack-objects-filter-objects.sh b/t/t5317-pack-objects-filter-objects.sh
index 501d715b9a..dddb79ba62 100755
--- a/t/t5317-pack-objects-filter-objects.sh
+++ b/t/t5317-pack-objects-filter-objects.sh
@@ -478,4 +478,129 @@ test_expect_success 'verify pack-objects w/ --missing=allow-any' '
EOF
'
+# Test that --path-walk produces the same object set as standard traversal
+# when using sparse:oid filters with cone-mode patterns.
+#
+# The sparse:oid filter restricts only blobs, not trees. Both standard
+# and path-walk should produce identical sets of blobs, commits, and trees.
+
+test_expect_success 'setup pw_sparse for path-walk comparison' '
+ git init pw_sparse &&
+ mkdir -p pw_sparse/inc/sub pw_sparse/exc/sub &&
+
+ for n in 1 2
+ do
+ echo "inc $n" >pw_sparse/inc/file$n &&
+ echo "inc sub $n" >pw_sparse/inc/sub/file$n &&
+ echo "exc $n" >pw_sparse/exc/file$n &&
+ echo "exc sub $n" >pw_sparse/exc/sub/file$n &&
+ echo "root $n" >pw_sparse/root$n || return 1
+ done &&
+
+ git -C pw_sparse add . &&
+ git -C pw_sparse commit -m "first" &&
+
+ echo "inc 1 modified" >pw_sparse/inc/file1 &&
+ echo "exc 1 modified" >pw_sparse/exc/file1 &&
+ echo "root 1 modified" >pw_sparse/root1 &&
+ git -C pw_sparse add . &&
+ git -C pw_sparse commit -m "second" &&
+
+ # Cone-mode sparse pattern: include root + inc/
+ printf "/*\n!/*/\n/inc/\n" |
+ git -C pw_sparse hash-object -w --stdin >sparse_oid
+'
+
+test_expect_success 'sparse:oid with --path-walk produces same blobs' '
+ oid=$(cat sparse_oid) &&
+
+ git -C pw_sparse pack-objects --revs --stdout \
+ --filter=sparse:oid=$oid >standard.pack <<-EOF &&
+ HEAD
+ EOF
+ git -C pw_sparse index-pack ../standard.pack &&
+ git -C pw_sparse verify-pack -v ../standard.pack >standard_verify &&
+
+ git -C pw_sparse pack-objects --revs --stdout \
+ --path-walk --filter=sparse:oid=$oid >pathwalk.pack <<-EOF &&
+ HEAD
+ EOF
+ git -C pw_sparse index-pack ../pathwalk.pack &&
+ git -C pw_sparse verify-pack -v ../pathwalk.pack >pathwalk_verify &&
+
+ # Blobs must match exactly
+ grep -E "^[0-9a-f]{40} blob" standard_verify |
+ awk "{print \$1}" | sort >standard_blobs &&
+ grep -E "^[0-9a-f]{40} blob" pathwalk_verify |
+ awk "{print \$1}" | sort >pathwalk_blobs &&
+ test_cmp standard_blobs pathwalk_blobs &&
+
+ # Commits must match exactly
+ grep -E "^[0-9a-f]{40} commit" standard_verify |
+ awk "{print \$1}" | sort >standard_commits &&
+ grep -E "^[0-9a-f]{40} commit" pathwalk_verify |
+ awk "{print \$1}" | sort >pathwalk_commits &&
+ test_cmp standard_commits pathwalk_commits
+'
+
+test_expect_success 'sparse:oid with --path-walk includes all trees' '
+ # The sparse:oid filter restricts only blobs, not trees.
+ # Both standard and path-walk should include the same trees.
+ grep -E "^[0-9a-f]{40} tree" standard_verify |
+ awk "{print \$1}" | sort >standard_trees &&
+ grep -E "^[0-9a-f]{40} tree" pathwalk_verify |
+ awk "{print \$1}" | sort >pathwalk_trees &&
+
+ test_cmp standard_trees pathwalk_trees
+'
+
+# Test the edge case where the same tree/blob OID appears at both an
+# in-cone and out-of-cone path. When sibling directories have identical
+# contents, they share a tree OID. The path-walk defers marking objects
+# SEEN until after checking sparse patterns, so an object at an out-of-cone
+# path can still be discovered at an in-cone path.
+
+test_expect_success 'setup pw_shared for shared OID across cone boundary' '
+ git init pw_shared &&
+ mkdir pw_shared/aaa pw_shared/zzz &&
+ echo "shared content" >pw_shared/aaa/file &&
+ echo "shared content" >pw_shared/zzz/file &&
+ echo "root file" >pw_shared/rootfile &&
+ git -C pw_shared add . &&
+ git -C pw_shared commit -m "aaa and zzz share tree OID" &&
+
+ # Verify they share a tree OID
+ aaa_tree=$(git -C pw_shared rev-parse HEAD:aaa) &&
+ zzz_tree=$(git -C pw_shared rev-parse HEAD:zzz) &&
+ test "$aaa_tree" = "$zzz_tree" &&
+
+ # Cone pattern: include root + zzz/ (not aaa/)
+ printf "/*\n!/*/\n/zzz/\n" |
+ git -C pw_shared hash-object -w --stdin >shared_sparse_oid
+'
+
+test_expect_success 'shared tree OID: --path-walk blobs match standard' '
+ oid=$(cat shared_sparse_oid) &&
+
+ git -C pw_shared pack-objects --revs --stdout \
+ --filter=sparse:oid=$oid >shared_std.pack <<-EOF &&
+ HEAD
+ EOF
+ git -C pw_shared index-pack ../shared_std.pack &&
+ git -C pw_shared verify-pack -v ../shared_std.pack >shared_std_verify &&
+
+ git -C pw_shared pack-objects --revs --stdout \
+ --path-walk --filter=sparse:oid=$oid >shared_pw.pack <<-EOF &&
+ HEAD
+ EOF
+ git -C pw_shared index-pack ../shared_pw.pack &&
+ git -C pw_shared verify-pack -v ../shared_pw.pack >shared_pw_verify &&
+
+ grep -E "^[0-9a-f]{40} blob" shared_std_verify |
+ awk "{print \$1}" | sort >shared_std_blobs &&
+ grep -E "^[0-9a-f]{40} blob" shared_pw_verify |
+ awk "{print \$1}" | sort >shared_pw_blobs &&
+ test_cmp shared_std_blobs shared_pw_blobs
+'
+
test_done
diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh
index 02ad83dfb0..ac294867a5 100755
--- a/t/t6601-path-walk.sh
+++ b/t/t6601-path-walk.sh
@@ -596,4 +596,135 @@ test_expect_success 'all, blob:limit=3 filter' '
test_cmp_sorted expect out
'
+test_expect_success 'setup sparse filter blob' '
+ # Cone-mode patterns: include root, exclude all dirs, include left/
+ cat >patterns <<-\EOF &&
+ /*
+ !/*/
+ /left/
+ EOF
+ sparse_oid=$(git hash-object -w -t blob patterns)
+'
+
+test_expect_success 'all, sparse:oid filter' '
+ test-tool path-walk --filter=sparse:oid=$sparse_oid -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ 1:tag:/tags:$(git rev-parse refs/tags/first)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.1)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.2)
+ 1:tag:/tags:$(git rev-parse refs/tags/third)
+ 1:tag:/tags:$(git rev-parse refs/tags/fourth)
+ 1:tag:/tags:$(git rev-parse refs/tags/tree-tag)
+ 1:tag:/tags:$(git rev-parse refs/tags/blob-tag)
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag^{})
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{})
+ 3:tree::$(git rev-parse topic^{tree})
+ 3:tree::$(git rev-parse base^{tree})
+ 3:tree::$(git rev-parse base~1^{tree})
+ 3:tree::$(git rev-parse base~2^{tree})
+ 3:tree::$(git rev-parse refs/tags/tree-tag^{})
+ 3:tree::$(git rev-parse refs/tags/tree-tag2^{})
+ 4:blob:a:$(git rev-parse base~2:a)
+ 5:blob:file2:$(git rev-parse refs/tags/tree-tag2^{}:file2)
+ 6:tree:a/:$(git rev-parse base:a)
+ 7:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
+ 8:tree:left/:$(git rev-parse base:left)
+ 8:tree:left/:$(git rev-parse base~2:left)
+ 9:blob:left/b:$(git rev-parse base~2:left/b)
+ 9:blob:left/b:$(git rev-parse base:left/b)
+ 10:tree:right/:$(git rev-parse topic:right)
+ 10:tree:right/:$(git rev-parse base~1:right)
+ 10:tree:right/:$(git rev-parse base~2:right)
+ blobs:6
+ commits:4
+ tags:7
+ trees:13
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'topic only, sparse:oid filter' '
+ test-tool path-walk --filter=sparse:oid=$sparse_oid -- topic >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ 1:tree::$(git rev-parse topic^{tree})
+ 1:tree::$(git rev-parse base~1^{tree})
+ 1:tree::$(git rev-parse base~2^{tree})
+ 2:blob:a:$(git rev-parse base~2:a)
+ 3:tree:left/:$(git rev-parse base~2:left)
+ 4:blob:left/b:$(git rev-parse base~2:left/b)
+ 5:tree:right/:$(git rev-parse topic:right)
+ 5:tree:right/:$(git rev-parse base~1:right)
+ 5:tree:right/:$(git rev-parse base~2:right)
+ blobs:2
+ commits:3
+ tags:0
+ trees:7
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+# Demonstrate the SEEN flag ordering issue: when the same tree/blob OID
+# appears at two sibling paths where one is in-cone and the other is
+# out-of-cone, the path-walk must still discover blobs at the in-cone
+# path even when the shared tree OID was first encountered out-of-cone.
+# Since sparse:oid includes all trees, the out-of-cone tree (aaa/) is
+# walked first, and its blob is skipped. The path-walk then re-walks
+# the same tree OID at the in-cone path (zzz/) to find the blob there.
+
+test_expect_success 'setup shared tree OID across cone boundary' '
+ git checkout --orphan shared-tree &&
+ git rm -rf . &&
+ mkdir aaa zzz &&
+ echo "shared content" >aaa/file &&
+ echo "shared content" >zzz/file &&
+ echo "root file" >rootfile &&
+ git add aaa zzz rootfile &&
+ git commit -m "aaa and zzz have same tree OID" &&
+
+ # Verify they really share a tree OID
+ aaa_tree=$(git rev-parse HEAD:aaa) &&
+ zzz_tree=$(git rev-parse HEAD:zzz) &&
+ test "$aaa_tree" = "$zzz_tree" &&
+
+ # Cone pattern: include root + zzz/ (not aaa/)
+ cat >shared-patterns <<-\EOF &&
+ /*
+ !/*/
+ /zzz/
+ EOF
+ shared_sparse_oid=$(git hash-object -w -t blob shared-patterns)
+'
+
+test_expect_success 'sparse:oid with shared tree OID across cone boundary' '
+ test-tool path-walk \
+ --filter=sparse:oid=$shared_sparse_oid \
+ -- shared-tree >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse shared-tree)
+ 1:tree::$(git rev-parse shared-tree^{tree})
+ 2:blob:rootfile:$(git rev-parse shared-tree:rootfile)
+ 3:tree:aaa/:$(git rev-parse shared-tree:aaa)
+ 4:tree:zzz/:$(git rev-parse shared-tree:zzz)
+ 5:blob:zzz/file:$(git rev-parse shared-tree:zzz/file)
+ blobs:2
+ commits:1
+ tags:0
+ trees:3
+ EOF
+
+ test_cmp_sorted expect out
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 08/13] path-walk: add pl_sparse_trees to control tree pruning
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
The path-walk API prunes trees and blobs when a sparse-checkout pattern
list is provided, which is the correct behavior for 'git backfill
--sparse' since it only needs to fill in objects at paths within the
sparse cone.
However, a future change will use the path-walk API with a sparse:<oid>
filter that restricts only blobs while retaining all reachable trees.
To support both behaviors, add a 'pl_sparse_trees' flag to
path_walk_info. When set (as in 'git backfill --sparse' and the
--stdin-pl test helper mode), the sparse patterns prune both trees and
blobs. When unset, only blobs are filtered and all trees are walked and
reported.
Additionally, move the SEEN flag assignment in add_tree_entries() to
after the sparse pattern and pathspec checks. Previously, SEEN was set
immediately upon discovering an object, before checking whether its path
matched the sparse patterns. When the same object ID appeared at
multiple paths (e.g. sibling directories with identical contents), the
first path to be visited would mark the object as SEEN. If that path was
outside the sparse cone, the object would be skipped there but also
never discovered at its in-cone path.
By deferring the SEEN flag until after the checks pass, objects that are
skipped due to sparse filtering remain discoverable at other paths where
they may be in scope.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
builtin/backfill.c | 1 +
path-walk.c | 5 +++--
path-walk.h | 6 ++++++
t/helper/test-path-walk.c | 6 +++++-
t/t6601-path-walk.sh | 37 +++++++++++++++++++++++++++++++++++++
5 files changed, 52 insertions(+), 3 deletions(-)
diff --git a/builtin/backfill.c b/builtin/backfill.c
index 5254a42711..e71e0f4742 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -109,6 +109,7 @@ static int do_backfill(struct backfill_context *ctx)
if (ctx->sparse) {
CALLOC_ARRAY(info.pl, 1);
+ info.pl_sparse_trees = 1;
if (get_sparse_checkout_patterns(info.pl)) {
path_walk_info_clear(&info);
return error(_("problem loading sparse-checkout"));
diff --git a/path-walk.c b/path-walk.c
index 04b924d4de..225857bbc8 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -183,7 +183,6 @@ static int add_tree_entries(struct path_walk_context *ctx,
/* Skip this object if already seen. */
if (o->flags & SEEN)
continue;
- o->flags |= SEEN;
strbuf_setlen(&path, base_len);
strbuf_add(&path, entry.path, entry.pathlen);
@@ -204,7 +203,8 @@ static int add_tree_entries(struct path_walk_context *ctx,
ctx->repo->index);
if (ctx->info->pl->use_cone_patterns &&
- match == NOT_MATCHED)
+ match == NOT_MATCHED &&
+ (type == OBJ_BLOB || ctx->info->pl_sparse_trees))
continue;
else if (!ctx->info->pl->use_cone_patterns &&
type == OBJ_BLOB &&
@@ -239,6 +239,7 @@ static int add_tree_entries(struct path_walk_context *ctx,
continue;
}
+ o->flags |= SEEN;
add_path_to_list(ctx, path.buf, type, &entry.oid,
!(o->flags & UNINTERESTING));
diff --git a/path-walk.h b/path-walk.h
index 60ceb65433..7e57ae5f65 100644
--- a/path-walk.h
+++ b/path-walk.h
@@ -76,8 +76,14 @@ struct path_walk_info {
* of the cone. If not in cone mode, then all tree paths will be
* explored but the path_fn will only be called when the path matches
* the sparse-checkout patterns.
+ *
+ * When 'pl_sparse_trees' is zero, the sparse patterns only restrict
+ * blobs and all trees are included in the walk output. This matches
+ * the behavior of the sparse:oid object filter. When nonzero, trees
+ * are also pruned by the sparse patterns (as used by backfill).
*/
struct pattern_list *pl;
+ int pl_sparse_trees;
};
#define PATH_WALK_INFO_INIT { \
diff --git a/t/helper/test-path-walk.c b/t/helper/test-path-walk.c
index 88f86ae0dc..3f2b50a9aa 100644
--- a/t/helper/test-path-walk.c
+++ b/t/helper/test-path-walk.c
@@ -68,7 +68,7 @@ static int emit_block(const char *path, struct oid_array *oids,
int cmd__path_walk(int argc, const char **argv)
{
- int res, stdin_pl = 0;
+ int res, stdin_pl = 0, pl_sparse_trees = -1;
struct rev_info revs = REV_INFO_INIT;
struct path_walk_info info = PATH_WALK_INFO_INIT;
struct path_walk_test_data data = { 0 };
@@ -89,6 +89,8 @@ int cmd__path_walk(int argc, const char **argv)
N_("toggle aggressive edge walk")),
OPT_BOOL(0, "stdin-pl", &stdin_pl,
N_("read a pattern list over stdin")),
+ OPT_BOOL(0, "pl-sparse-trees", &pl_sparse_trees,
+ N_("toggle pruning of trees by sparse patterns")),
OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
OPT_END(),
};
@@ -116,6 +118,8 @@ int cmd__path_walk(int argc, const char **argv)
if (stdin_pl) {
struct strbuf in = STRBUF_INIT;
CALLOC_ARRAY(info.pl, 1);
+ info.pl_sparse_trees = (pl_sparse_trees >= 0) ?
+ pl_sparse_trees : 1;
info.pl->use_cone_patterns = 1;
diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh
index 45f366d738..02ad83dfb0 100755
--- a/t/t6601-path-walk.sh
+++ b/t/t6601-path-walk.sh
@@ -206,6 +206,43 @@ test_expect_success 'base & topic, sparse' '
test_cmp_sorted expect out
'
+test_expect_success 'base & topic, sparse, no tree pruning' '
+ cat >patterns <<-EOF &&
+ /*
+ !/*/
+ /left/
+ EOF
+
+ test-tool path-walk --stdin-pl --no-pl-sparse-trees \
+ -- base topic <patterns >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ 1:tree::$(git rev-parse topic^{tree})
+ 1:tree::$(git rev-parse base^{tree})
+ 1:tree::$(git rev-parse base~1^{tree})
+ 1:tree::$(git rev-parse base~2^{tree})
+ 2:blob:a:$(git rev-parse base~2:a)
+ 3:tree:a/:$(git rev-parse base:a)
+ 4:tree:left/:$(git rev-parse base:left)
+ 4:tree:left/:$(git rev-parse base~2:left)
+ 5:blob:left/b:$(git rev-parse base~2:left/b)
+ 5:blob:left/b:$(git rev-parse base:left/b)
+ 6:tree:right/:$(git rev-parse topic:right)
+ 6:tree:right/:$(git rev-parse base~1:right)
+ 6:tree:right/:$(git rev-parse base~2:right)
+ blobs:3
+ commits:4
+ tags:0
+ trees:10
+ EOF
+
+ test_cmp_sorted expect out
+'
+
test_expect_success 'topic only' '
test-tool path-walk -- topic >out &&
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 07/13] path-walk: support blob size limit filter
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
Extend the path-walk API to handle the 'blob:limit=<size>' object
filter natively. This filter omits blobs whose size is equal to or
greater than the given limit, matching the semantics used by the
list-objects-filter machinery.
When revs->filter.choice is LOFC_BLOB_LIMIT, the prepare_filters()
method stores the limit value in info->blob_limit and clears the filter
from revs. If the limit is zero, this degenerates to blob:none (all
blobs excluded), so info->blobs is set to 0 instead.
During walk_path(), blob batches are filtered before being delivered to
the callback: each blob's size is checked via odb_read_object_info(),
and only blobs strictly smaller than the limit are included. Blobs whose
size cannot be determined (e.g. missing in a partial clone) are
conservatively included, matching the existing filter behavior. Empty
batches after filtering are skipped entirely.
The check for inclusion in the path batch looks a little strange at
first glance. We use odb_read_object_info() to read the object's size.
Based on all of the assumptions to this point, this _should_ return
OBJ_BLOB. Since we are focused on the size filter, we use a
short-circuited OR (||) to skip the size check if that method returns a
different object type.
Notice that this inspection of object sizes requires the content to be
present in the repository. The odb_read_object_info() call will download
a missing blob on-demand. This means that the use of the path-walk API
within 'git backfill' would not operate nicely with this filter type.
The intention of that command is to download missing blobs in batches.
Downloading objects one-by-one would go against the point. Update the
validation in 'git backfill' to add its own compatibility check on top
of path_walk_filter_compatible().
Add tests for blob:limit=0 (equivalent to blob:none) and blob:limit=3
(which exercises partial filtering within a batch where some blobs are
kept and others are excluded).
Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-pack-objects.adoc | 2 +-
builtin/backfill.c | 2 +
path-walk.c | 41 +++++++++++++--
path-walk.h | 7 +++
t/t5620-backfill.sh | 2 +-
t/t6601-path-walk.sh | 82 +++++++++++++++++++++++++++++
6 files changed, 130 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index 2994faf988..85ae48b699 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -404,7 +404,7 @@ will be automatically changed to version `1`.
+
Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
ignored in the presence of `--path-walk`. The `--path-walk` option
-supports the `--filter=<spec>` form `blob:none`.
+supports the `--filter=<spec>` forms `blob:none` and `blob:limit=<n>`.
DELTA ISLANDS
diff --git a/builtin/backfill.c b/builtin/backfill.c
index b80f9ebe69..5254a42711 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -98,6 +98,8 @@ static void reject_unsupported_rev_list_options(struct rev_info *revs)
"--diff-merges");
if (!path_walk_filter_compatible(&revs->filter))
die(_("cannot backfill with these filter options"));
+ if (revs->filter.blob_limit_value)
+ die(_("cannot backfill with blob size limits"));
}
static int do_backfill(struct backfill_context *ctx)
diff --git a/path-walk.c b/path-walk.c
index bd81508163..04b924d4de 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -10,6 +10,7 @@
#include "hex.h"
#include "list-objects.h"
#include "list-objects-filter-options.h"
+#include "odb.h"
#include "object.h"
#include "oid-array.h"
#include "path.h"
@@ -327,13 +328,35 @@ static int walk_path(struct path_walk_context *ctx,
/*
* Evaluate function pointer on this data, if requested.
* Ignore object type filters for tagged objects (path starts
- * with `/`).
+ * with `/`), first for blobs and then other types.
*/
- if ((list->type == OBJ_TREE && (ctx->info->trees || path_is_for_direct_objects(path))) ||
- (list->type == OBJ_BLOB && (ctx->info->blobs || path_is_for_direct_objects(path))) ||
- (list->type == OBJ_TAG && ctx->info->tags))
+ if (list->type == OBJ_BLOB &&
+ ctx->info->blob_limit &&
+ !path_is_for_direct_objects(path)) {
+ struct oid_array filtered = OID_ARRAY_INIT;
+
+ for (size_t i = 0; i < list->oids.nr; i++) {
+ unsigned long size;
+
+ if (odb_read_object_info(ctx->repo->objects,
+ &list->oids.oid[i],
+ &size) != OBJ_BLOB ||
+ size < ctx->info->blob_limit)
+ oid_array_append(&filtered,
+ &list->oids.oid[i]);
+ }
+
+ if (filtered.nr)
+ ret = ctx->info->path_fn(path, &filtered, list->type,
+ ctx->info->path_fn_data);
+ oid_array_clear(&filtered);
+ } else if (path_is_for_direct_objects(path) ||
+ (list->type == OBJ_TREE && ctx->info->trees) ||
+ (list->type == OBJ_BLOB && ctx->info->blobs) ||
+ (list->type == OBJ_TAG && ctx->info->tags)) {
ret = ctx->info->path_fn(path, &list->oids, list->type,
ctx->info->path_fn_data);
+ }
/* Expand data for children. */
if (list->type == OBJ_TREE) {
@@ -510,6 +533,16 @@ static int prepare_filters(struct path_walk_info *info,
}
return 1;
+ case LOFC_BLOB_LIMIT:
+ if (info) {
+ if (!options->blob_limit_value)
+ info->blobs = 0;
+ else
+ info->blob_limit = options->blob_limit_value;
+ list_objects_filter_release(options);
+ }
+ return 1;
+
default:
error(_("object filter '%s' not supported by the path-walk API"),
list_objects_filter_spec(options));
diff --git a/path-walk.h b/path-walk.h
index a1736ecb2b..60ceb65433 100644
--- a/path-walk.h
+++ b/path-walk.h
@@ -47,6 +47,13 @@ struct path_walk_info {
int blobs;
int tags;
+ /**
+ * If non-zero, specifies a maximum blob size. Blobs with a
+ * size equal to or greater than this limit will not be
+ * emitted unless included in 'pending'.
+ */
+ unsigned long blob_limit;
+
/**
* When 'prune_all_uninteresting' is set and a path has all objects
* marked as UNINTERESTING, then the path-walk will not visit those
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index ede89f8c33..d2ea68e065 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -20,7 +20,7 @@ test_expect_success 'backfill rejects incompatible filter options' '
test_grep "cannot backfill with these filter options" err &&
test_must_fail git backfill --objects --filter=blob:limit=10m 2>err &&
- test_grep "cannot backfill with these filter options" err
+ test_grep "cannot backfill with blob size limits" err
'
# We create objects in the 'src' repo.
diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh
index b0ee31ee2d..45f366d738 100755
--- a/t/t6601-path-walk.sh
+++ b/t/t6601-path-walk.sh
@@ -477,4 +477,86 @@ test_expect_success 'topic only, blob:none filter' '
test_cmp_sorted expect out
'
+test_expect_success 'all, blob:limit=0 filter' '
+ test-tool path-walk --filter=blob:limit=0 -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ 1:tag:/tags:$(git rev-parse refs/tags/first)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.1)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.2)
+ 1:tag:/tags:$(git rev-parse refs/tags/third)
+ 1:tag:/tags:$(git rev-parse refs/tags/fourth)
+ 1:tag:/tags:$(git rev-parse refs/tags/tree-tag)
+ 1:tag:/tags:$(git rev-parse refs/tags/blob-tag)
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag^{})
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{})
+ 3:tree::$(git rev-parse topic^{tree})
+ 3:tree::$(git rev-parse base^{tree})
+ 3:tree::$(git rev-parse base~1^{tree})
+ 3:tree::$(git rev-parse base~2^{tree})
+ 3:tree::$(git rev-parse refs/tags/tree-tag^{})
+ 3:tree::$(git rev-parse refs/tags/tree-tag2^{})
+ 4:tree:a/:$(git rev-parse base:a)
+ 5:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
+ 6:tree:left/:$(git rev-parse base:left)
+ 6:tree:left/:$(git rev-parse base~2:left)
+ 7:tree:right/:$(git rev-parse topic:right)
+ 7:tree:right/:$(git rev-parse base~1:right)
+ 7:tree:right/:$(git rev-parse base~2:right)
+ blobs:2
+ commits:4
+ tags:7
+ trees:13
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'all, blob:limit=3 filter' '
+ test-tool path-walk --filter=blob:limit=3 -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ 1:tag:/tags:$(git rev-parse refs/tags/first)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.1)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.2)
+ 1:tag:/tags:$(git rev-parse refs/tags/third)
+ 1:tag:/tags:$(git rev-parse refs/tags/fourth)
+ 1:tag:/tags:$(git rev-parse refs/tags/tree-tag)
+ 1:tag:/tags:$(git rev-parse refs/tags/blob-tag)
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag^{})
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{})
+ 3:tree::$(git rev-parse topic^{tree})
+ 3:tree::$(git rev-parse base^{tree})
+ 3:tree::$(git rev-parse base~1^{tree})
+ 3:tree::$(git rev-parse base~2^{tree})
+ 3:tree::$(git rev-parse refs/tags/tree-tag^{})
+ 3:tree::$(git rev-parse refs/tags/tree-tag2^{})
+ 4:blob:a:$(git rev-parse base~2:a)
+ 5:tree:a/:$(git rev-parse base:a)
+ 6:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
+ 7:tree:left/:$(git rev-parse base:left)
+ 7:tree:left/:$(git rev-parse base~2:left)
+ 8:blob:left/b:$(git rev-parse base~2:left/b)
+ 9:tree:right/:$(git rev-parse topic:right)
+ 9:tree:right/:$(git rev-parse base~1:right)
+ 9:tree:right/:$(git rev-parse base~2:right)
+ 10:blob:right/c:$(git rev-parse base~2:right/c)
+ 11:blob:right/d:$(git rev-parse base~1:right/d)
+ blobs:6
+ commits:4
+ tags:7
+ trees:13
+ EOF
+
+ test_cmp_sorted expect out
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 06/13] backfill: die on incompatible filter options
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
The 'git backfill' command uses the path-walk API in a critical way: it
uses the objects output from the command to find the batches of missing
objects that should be requested from the server. Unlike 'git
pack-objects', we cannot fall back to another mechanism.
The previous change added the path_walk_filter_compatible() method that
we can reuse here. Use it during argument validation in cmd_backfill().
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
builtin/backfill.c | 5 ++---
t/t5620-backfill.sh | 8 ++++++++
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/builtin/backfill.c b/builtin/backfill.c
index 7ffab2ea74..b80f9ebe69 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -96,9 +96,8 @@ static void reject_unsupported_rev_list_options(struct rev_info *revs)
if (revs->explicit_diff_merges)
die(_("'%s' cannot be used with 'git backfill'"),
"--diff-merges");
- if (revs->filter.choice)
- die(_("'%s' cannot be used with 'git backfill'"),
- "--filter");
+ if (!path_walk_filter_compatible(&revs->filter))
+ die(_("cannot backfill with these filter options"));
}
static int do_backfill(struct backfill_context *ctx)
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 94f35ce190..ede89f8c33 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -15,6 +15,14 @@ test_expect_success 'backfill rejects unexpected arguments' '
test_grep "unrecognized argument: --unexpected-arg" err
'
+test_expect_success 'backfill rejects incompatible filter options' '
+ test_must_fail git backfill --objects --filter=tree:1 2>err &&
+ test_grep "cannot backfill with these filter options" err &&
+
+ test_must_fail git backfill --objects --filter=blob:limit=10m 2>err &&
+ test_grep "cannot backfill with these filter options" err
+'
+
# We create objects in the 'src' repo.
test_expect_success 'setup repo for object creation' '
echo "{print \$1}" >print_1.awk &&
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 05/13] path-walk: support blobless filter
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
The 'git pack-objects' command can opt-in to using the path-walk API for
scanning the objects. Currently, this option is dynamically disabled if
combined with '--filter=<X>', even when using a simple filter such as
'blob:none' to signal a blobless packfile. This is a common scenario for
repos at scale, so is worth integrating.
Also, users can opt-in to the '--path-walk' option by default through
the pack.usePathWalk=true config option. When using that in a blobless
partial clone, the following warning can appear even though the user did
not specify either option directly:
warning: cannot use --filter with --path-walk
Teach the path-walk API to handle the 'blob:none' object filter
natively. When revs->filter.choice is LOFC_BLOB_NONE, the path-walk
sets info->blobs to 0 (skipping all blob objects) and clears the
filter from revs so that prepare_revision_walk() does not reject the
configuration.
This check is implemented in the static prepare_filters() method, which
will simultaneously check if the input filters are compatible and will
make the appropriate mutations to the path_walk_info and filters if the
path_walk_info is non-NULL. This allows us to use this logic both in the
API method path_walk_filter_compatible() for use in
builtin/pack-objects.c and as a prep step in walk_objects_by_path().
Update the test helper (test-path-walk) to accept --filter=<spec>
as a test-tool option (before '--'), applying it to revs after
setup_revisions() to avoid the --objects requirement check. We can also
revert recent GIT_TEST_PACK_PATH_WALK overrides in t5620.
Also switch test-path-walk from REV_INFO_INIT with manual repo
assignment to repo_init_revisions(), which properly initializes
the filter_spec strbuf needed for filter parsing.
Add tests for blob:none with --all and with a single branch.
The performance test p5315 shows the impact of this change when using
blobless filters:
Test HEAD~1 HEAD
---------------------------------------------------------------------
5315.6: repack (blob:none) 13.53 13.87 +2.5%
5315.7: repack size (blob:none) 137.7M 137.8M +0.1%
5315.8: repack (blob:none, --path-walk) 13.51 23.43 +73.4%
5315.9: repack size (blob:none, --path-walk) 137.7M 115.2M -16.3%
These performance tests were run on the Git repository. The --path-walk
feature shows meaningful space savings (16% smaller for blobless packs)
at the cost of increased computation time due to the two compression
passes. This data demonstrates that the feature is engaged and provides
real compression benefits when --no-reuse-delta forces fresh deltas.
Co-Authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-pack-objects.adoc | 6 +--
builtin/pack-objects.c | 2 +-
path-walk.c | 30 ++++++++++++++
path-walk.h | 7 ++++
t/helper/test-path-walk.c | 11 ++++-
t/t5620-backfill.sh | 9 -----
t/t6601-path-walk.sh | 62 +++++++++++++++++++++++++++++
7 files changed, 113 insertions(+), 14 deletions(-)
diff --git a/Documentation/git-pack-objects.adoc b/Documentation/git-pack-objects.adoc
index b78175fbe1..2994faf988 100644
--- a/Documentation/git-pack-objects.adoc
+++ b/Documentation/git-pack-objects.adoc
@@ -402,9 +402,9 @@ will be automatically changed to version `1`.
of filenames that cause collisions in Git's default name-hash
algorithm.
+
-Incompatible with `--delta-islands`, `--shallow`, or `--filter`. The
-`--use-bitmap-index` option will be ignored in the presence of
-`--path-walk.`
+Incompatible with `--delta-islands`. The `--use-bitmap-index` option is
+ignored in the presence of `--path-walk`. The `--path-walk` option
+supports the `--filter=<spec>` form `blob:none`.
DELTA ISLANDS
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4338962904..bc9fb5b457 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -5177,7 +5177,7 @@ int cmd_pack_objects(int argc,
if (path_walk) {
const char *option = NULL;
- if (filter_options.choice)
+ if (!path_walk_filter_compatible(&filter_options))
option = "--filter";
else if (use_delta_islands)
option = "--delta-islands";
diff --git a/path-walk.c b/path-walk.c
index 05bfc1c114..bd81508163 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -9,6 +9,7 @@
#include "hashmap.h"
#include "hex.h"
#include "list-objects.h"
+#include "list-objects-filter-options.h"
#include "object.h"
#include "oid-array.h"
#include "path.h"
@@ -495,6 +496,32 @@ static int setup_pending_objects(struct path_walk_info *info,
return 0;
}
+static int prepare_filters(struct path_walk_info *info,
+ struct list_objects_filter_options *options)
+{
+ switch (options->choice) {
+ case LOFC_DISABLED:
+ return 1;
+
+ case LOFC_BLOB_NONE:
+ if (info) {
+ info->blobs = 0;
+ list_objects_filter_release(options);
+ }
+ return 1;
+
+ default:
+ error(_("object filter '%s' not supported by the path-walk API"),
+ list_objects_filter_spec(options));
+ return 0;
+ }
+}
+
+int path_walk_filter_compatible(struct list_objects_filter_options *options)
+{
+ return prepare_filters(NULL, options);
+}
+
/**
* Given the configuration of 'info', walk the commits based on 'info->revs' and
* call 'info->path_fn' on each discovered path.
@@ -522,6 +549,9 @@ int walk_objects_by_path(struct path_walk_info *info)
trace2_region_enter("path-walk", "commit-walk", info->revs->repo);
+ if (!prepare_filters(info, &info->revs->filter))
+ return -1;
+
CALLOC_ARRAY(commit_list, 1);
commit_list->type = OBJ_COMMIT;
diff --git a/path-walk.h b/path-walk.h
index 657eeda8ec..a1736ecb2b 100644
--- a/path-walk.h
+++ b/path-walk.h
@@ -90,3 +90,10 @@ void path_walk_info_clear(struct path_walk_info *info);
* Returns nonzero on an error.
*/
int walk_objects_by_path(struct path_walk_info *info);
+
+struct list_objects_filter_options;
+/**
+ * Given a set of options for filtering objects, return 1 if the options
+ * are compatible with the path-walk API and 0 otherwise.
+ */
+int path_walk_filter_compatible(struct list_objects_filter_options *options);
diff --git a/t/helper/test-path-walk.c b/t/helper/test-path-walk.c
index fe63002c2b..88f86ae0dc 100644
--- a/t/helper/test-path-walk.c
+++ b/t/helper/test-path-walk.c
@@ -4,6 +4,7 @@
#include "dir.h"
#include "environment.h"
#include "hex.h"
+#include "list-objects-filter-options.h"
#include "object-name.h"
#include "object.h"
#include "pretty.h"
@@ -71,6 +72,8 @@ int cmd__path_walk(int argc, const char **argv)
struct rev_info revs = REV_INFO_INIT;
struct path_walk_info info = PATH_WALK_INFO_INIT;
struct path_walk_test_data data = { 0 };
+ struct list_objects_filter_options filter_options =
+ LIST_OBJECTS_FILTER_INIT;
struct option options[] = {
OPT_BOOL(0, "blobs", &info.blobs,
N_("toggle inclusion of blob objects")),
@@ -86,11 +89,12 @@ int cmd__path_walk(int argc, const char **argv)
N_("toggle aggressive edge walk")),
OPT_BOOL(0, "stdin-pl", &stdin_pl,
N_("read a pattern list over stdin")),
+ OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
OPT_END(),
};
setup_git_directory();
- revs.repo = the_repository;
+ repo_init_revisions(the_repository, &revs, NULL);
argc = parse_options(argc, argv, NULL,
options, path_walk_usage,
@@ -101,6 +105,10 @@ int cmd__path_walk(int argc, const char **argv)
else
usage(path_walk_usage[0]);
+ /* Apply the filter after setup_revisions to avoid the --objects check. */
+ if (filter_options.choice)
+ list_objects_filter_copy(&revs.filter, &filter_options);
+
info.revs = &revs;
info.path_fn = emit_block;
info.path_fn_data = &data;
@@ -129,6 +137,7 @@ int cmd__path_walk(int argc, const char **argv)
free(info.pl);
}
+ list_objects_filter_release(&filter_options);
release_revisions(&revs);
return res;
}
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index e174290787..94f35ce190 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -298,9 +298,6 @@ test_expect_success 'backfill with prefix pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # If we enable --path-walk here, we will get a warning overs stderr
- # due to incompatibilities with --filter.
- GIT_TEST_PACK_PATH_WALK=0 \
git -C backfill-path backfill HEAD -- d/f 2>err &&
test_must_be_empty err &&
@@ -318,9 +315,6 @@ test_expect_success 'backfill with multiple pathspecs' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # If we enable --path-walk here, we will get a warning overs stderr
- # due to incompatibilities with --filter.
- GIT_TEST_PACK_PATH_WALK=0 \
git -C backfill-path backfill HEAD -- d/f a 2>err &&
test_must_be_empty err &&
@@ -338,9 +332,6 @@ test_expect_success 'backfill with wildcard pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # If we enable --path-walk here, we will get a warning overs stderr
- # due to incompatibilities with --filter.
- GIT_TEST_PACK_PATH_WALK=0 \
git -C backfill-path backfill HEAD -- "d/file.*.txt" 2>err &&
test_must_be_empty err &&
diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh
index 56bd1e3c5b..b0ee31ee2d 100755
--- a/t/t6601-path-walk.sh
+++ b/t/t6601-path-walk.sh
@@ -415,4 +415,66 @@ test_expect_success 'trees are reported exactly once' '
test_line_count = 1 out-filtered
'
+test_expect_success 'all, blob:none filter' '
+ test-tool path-walk --filter=blob:none -- --all >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ 1:tag:/tags:$(git rev-parse refs/tags/first)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.1)
+ 1:tag:/tags:$(git rev-parse refs/tags/second.2)
+ 1:tag:/tags:$(git rev-parse refs/tags/third)
+ 1:tag:/tags:$(git rev-parse refs/tags/fourth)
+ 1:tag:/tags:$(git rev-parse refs/tags/tree-tag)
+ 1:tag:/tags:$(git rev-parse refs/tags/blob-tag)
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag^{})
+ 2:blob:/tagged-blobs:$(git rev-parse refs/tags/blob-tag2^{})
+ 3:tree::$(git rev-parse topic^{tree})
+ 3:tree::$(git rev-parse base^{tree})
+ 3:tree::$(git rev-parse base~1^{tree})
+ 3:tree::$(git rev-parse base~2^{tree})
+ 3:tree::$(git rev-parse refs/tags/tree-tag^{})
+ 3:tree::$(git rev-parse refs/tags/tree-tag2^{})
+ 4:tree:a/:$(git rev-parse base:a)
+ 5:tree:child/:$(git rev-parse refs/tags/tree-tag:child)
+ 6:tree:left/:$(git rev-parse base:left)
+ 6:tree:left/:$(git rev-parse base~2:left)
+ 7:tree:right/:$(git rev-parse topic:right)
+ 7:tree:right/:$(git rev-parse base~1:right)
+ 7:tree:right/:$(git rev-parse base~2:right)
+ blobs:2
+ commits:4
+ tags:7
+ trees:13
+ EOF
+
+ test_cmp_sorted expect out
+'
+
+test_expect_success 'topic only, blob:none filter' '
+ test-tool path-walk --filter=blob:none -- topic >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ 1:tree::$(git rev-parse topic^{tree})
+ 1:tree::$(git rev-parse base~1^{tree})
+ 1:tree::$(git rev-parse base~2^{tree})
+ 2:tree:left/:$(git rev-parse base~2:left)
+ 3:tree:right/:$(git rev-parse topic:right)
+ 3:tree:right/:$(git rev-parse base~1:right)
+ 3:tree:right/:$(git rev-parse base~2:right)
+ blobs:0
+ commits:3
+ tags:0
+ trees:7
+ EOF
+
+ test_cmp_sorted expect out
+'
+
test_done
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 04/13] path-walk: always emit directly-requested objects
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
We are preparing to integrate the path-walk API with some --filter options
in 'git pack-objects', but there is a subtle issue that is revealed when
those are put together and the test suite is run with
GIT_TEST_PACK_PATH_WALK=1.
When a filter reduces the set of requested objects, this results in
filtering out directly-requested objects, such as in the download of needed
blobs in a blobless partial clone.
The root cause is that the scan of pending objects in the path-walk API
respects the filters set in the path_walk_info instead of overriding them
for pending objects.
We can tell that a path is part of the directly-referenced objects if its
path name starts with '/' (other paths, including root trees never have this
starting character). Create a path_is_for_direct_objects() to make this
meaning clear, especially as we add more references in the future as we
integrate the path-walk API with partial clone filter options.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/technical/api-path-walk.adoc | 7 ++++
path-walk.c | 42 ++++++++++++++--------
path-walk.h | 5 +++
3 files changed, 39 insertions(+), 15 deletions(-)
diff --git a/Documentation/technical/api-path-walk.adoc b/Documentation/technical/api-path-walk.adoc
index a67de1b143..6e17b13d61 100644
--- a/Documentation/technical/api-path-walk.adoc
+++ b/Documentation/technical/api-path-walk.adoc
@@ -48,6 +48,13 @@ commits.
applications could disable some options to make it simpler to walk
the objects or to have fewer calls to `path_fn`.
+
+Note that objects directly requested as pending objects (such as targets
+of lightweight tags or other ref tips) are always emitted to `path_fn`,
+even when the corresponding type flag is disabled. Only objects
+discovered during the tree walk are subject to these type filters. This
+ensures that objects specifically requested through the revision input
+are never silently dropped.
++
While it is possible to walk only commits in this way, consumers would be
better off using the revision walk API instead.
diff --git a/path-walk.c b/path-walk.c
index 6e426af433..05bfc1c114 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -248,6 +248,17 @@ static int add_tree_entries(struct path_walk_context *ctx,
return 0;
}
+/*
+ * Paths starting with '/' (e.g., "/tags", "/tagged-blobs") hold objects that
+ * were directly requested by 'pending' objects rather than discovered during
+ * tree traversal.
+ */
+static int path_is_for_direct_objects(const char *path)
+{
+ ASSERT(path);
+ return path[0] == '/';
+}
+
/*
* For each path in paths_to_explore, walk the trees another level
* and add any found blobs to the batch (but only if they exist and
@@ -306,14 +317,19 @@ static int walk_path(struct path_walk_context *ctx,
if (list->type == OBJ_BLOB &&
ctx->revs->prune_data.nr &&
+ !path_is_for_direct_objects(path) &&
!match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
path, strlen(path), 0,
NULL, 0))
return 0;
- /* Evaluate function pointer on this data, if requested. */
- if ((list->type == OBJ_TREE && ctx->info->trees) ||
- (list->type == OBJ_BLOB && ctx->info->blobs) ||
+ /*
+ * Evaluate function pointer on this data, if requested.
+ * Ignore object type filters for tagged objects (path starts
+ * with `/`).
+ */
+ if ((list->type == OBJ_TREE && (ctx->info->trees || path_is_for_direct_objects(path))) ||
+ (list->type == OBJ_BLOB && (ctx->info->blobs || path_is_for_direct_objects(path))) ||
(list->type == OBJ_TAG && ctx->info->tags))
ret = ctx->info->path_fn(path, &list->oids, list->type,
ctx->info->path_fn_data);
@@ -374,10 +390,8 @@ static int setup_pending_objects(struct path_walk_info *info,
if (info->tags)
CALLOC_ARRAY(tags, 1);
- if (info->blobs)
- CALLOC_ARRAY(tagged_blobs, 1);
- if (info->trees)
- root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
+ CALLOC_ARRAY(tagged_blobs, 1);
+ root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
/*
* Pending objects include:
@@ -421,8 +435,6 @@ static int setup_pending_objects(struct path_walk_info *info,
switch (obj->type) {
case OBJ_TREE:
- if (!info->trees)
- continue;
if (pending->path) {
char *path = *pending->path ? xstrfmt("%s/", pending->path)
: xstrdup("");
@@ -435,8 +447,6 @@ static int setup_pending_objects(struct path_walk_info *info,
break;
case OBJ_BLOB:
- if (!info->blobs)
- continue;
if (pending->path)
add_path_to_list(ctx, pending->path, OBJ_BLOB, &obj->oid, 1);
else
@@ -532,15 +542,17 @@ int walk_objects_by_path(struct path_walk_info *info)
push_to_stack(&ctx, root_path);
/*
- * Set these values before preparing the walk to catch
- * lightweight tags pointing to non-commits and indexed objects.
+ * Ensure that prepare_revision_walk() keeps all pending objects
+ * even through an object type filter.
*/
- info->revs->blob_objects = info->blobs;
- info->revs->tree_objects = info->trees;
+ info->revs->blob_objects = info->revs->tree_objects = 1;
if (prepare_revision_walk(info->revs))
die(_("failed to setup revision walk"));
+ info->revs->blob_objects = info->blobs;
+ info->revs->tree_objects = info->trees;
+
/*
* Walk trees to mark them as UNINTERESTING.
* This is particularly important when 'edge_aggressive' is set.
diff --git a/path-walk.h b/path-walk.h
index 5ef5a8440e..657eeda8ec 100644
--- a/path-walk.h
+++ b/path-walk.h
@@ -36,6 +36,11 @@ struct path_walk_info {
/**
* Initialize which object types the path_fn should be called on. This
* could also limit the walk to skip blobs if not set.
+ *
+ * Note: even when 'blobs' or 'trees' is disabled, objects that are
+ * directly requested as pending objects will still be emitted to
+ * path_fn. Only objects discovered during the tree walk are filtered by
+ * these flags.
*/
int commits;
int trees;
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 03/13] t/perf: add pack-objects filter and path-walk benchmark
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
Add p5315-pack-objects-filter.sh to measure the performance of
'git pack-objects --revs --all' under different filter and traversal
combinations:
* no filter (baseline)
* --filter=blob:none (blobless)
* --filter=sparse:oid=<oid> (cone-mode sparse)
Each filter scenario is tested both with and without --path-walk,
producing paired measurements that show the impact of the path-walk
traversal for each filter type as we integrate the --path-walk feature
with different --filter options. It currently has no integration so
falls back to the standard revision walk. Thus, there are no significant
differences in the current results other than a full repack (and even
then, the --path-walk feature is not incredibly different for the
default Git repository):
Test HEAD
-----------------------------------------------------
5315.2: repack (no filter) 27.91
5315.3: repack size (no filter) 250.7M
5315.4: repack (no filter, --path-walk) 34.92
5315.5: repack size (no filter, --path-walk) 220.0M
5315.6: repack (blob:none) 13.63
5315.7: repack size (blob:none) 137.6M
5315.8: repack (blob:none, --path-walk) 13.48
5315.9: repack size (blob:none, --path-walk) 137.7M
5315.10: repack (sparse:oid) 72.67
5315.11: repack size (sparse:oid) 187.4M
5315.12: repack (sparse:oid, --path-walk) 72.47
5315.13: repack size (sparse:oid, --path-walk) 187.4M
The sparse filter definition is built automatically by sampling
depth-2 directories from the test repository, making the test work
on any repo passed via GIT_PERF_LARGE_REPO. For repos that lack
depth-2 directories, a single top-level directory is used; for flat
repos, the sparse tests are skipped via prerequisite.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/perf/p5315-pack-objects-filter.sh | 129 ++++++++++++++++++++++++++++
1 file changed, 129 insertions(+)
create mode 100755 t/perf/p5315-pack-objects-filter.sh
diff --git a/t/perf/p5315-pack-objects-filter.sh b/t/perf/p5315-pack-objects-filter.sh
new file mode 100755
index 0000000000..445ff25be2
--- /dev/null
+++ b/t/perf/p5315-pack-objects-filter.sh
@@ -0,0 +1,129 @@
+#!/bin/sh
+
+test_description='Tests pack-objects performance with filters and --path-walk'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success 'setup filter inputs' '
+ # Sample a few depth-2 directories from the test repo to build
+ # a cone-mode sparse-checkout definition. The sampling picks
+ # directories at evenly-spaced positions so the choice is stable
+ # and scales to repos of any shape.
+
+ git ls-tree -d HEAD >top-entries &&
+ grep "^040000" top-entries |
+ awk "{print \$4;}" >top-dirs &&
+ top_nr=$(wc -l <top-dirs) &&
+
+ while read tdir
+ do
+ git ls-tree -d --format="$tdir/%(path)" "HEAD:$tdir" || return 1
+ done <top-dirs >depth2-dirs &&
+
+ d2_nr=$(wc -l <depth2-dirs) &&
+
+ if test "$d2_nr" -ge 2
+ then
+ # Pick two directories from evenly-spaced positions.
+ first=$(sed -n "1p" depth2-dirs) &&
+ mid=$(sed -n "$((d2_nr / 2 + 1))p" depth2-dirs) &&
+
+ p1=$(dirname "$first") &&
+ p2=$(dirname "$mid") &&
+
+ # Build cone-mode sparse-checkout patterns.
+ {
+ echo "/*" &&
+ echo "!/*/" &&
+ echo "/$p1/" &&
+ echo "!/$p1/*/" &&
+ if test "$p1" != "$p2"
+ then
+ echo "/$p2/" &&
+ echo "!/$p2/*/"
+ fi &&
+ echo "/$first/" &&
+ if test "$first" != "$mid"
+ then
+ echo "/$mid/"
+ fi
+ } >sparse-patterns &&
+
+ git hash-object -w sparse-patterns >sparse-oid &&
+ echo "Sparse cone: $first $mid" &&
+ cat sparse-patterns &&
+ test_set_prereq SPARSE_OID
+ elif test "$top_nr" -ge 1
+ then
+ # Fallback: use a single top-level directory.
+ first=$(sed -n "1p" top-dirs) &&
+ {
+ echo "/*" &&
+ echo "!/*/" &&
+ echo "/$first/"
+ } >sparse-patterns &&
+
+ git hash-object -w sparse-patterns >sparse-oid &&
+ echo "Sparse cone: $first" &&
+ cat sparse-patterns &&
+ test_set_prereq SPARSE_OID
+ fi
+'
+
+test_perf 'repack (no filter)' '
+ git pack-objects --stdout --no-reuse-delta --revs --all </dev/null >pk
+'
+
+test_size 'repack size (no filter)' '
+ test_file_size pk
+'
+
+test_perf 'repack (no filter, --path-walk)' '
+ git pack-objects --stdout --no-reuse-delta --revs --all --path-walk </dev/null >pk
+'
+
+test_size 'repack size (no filter, --path-walk)' '
+ test_file_size pk
+'
+
+test_perf 'repack (blob:none)' '
+ git pack-objects --stdout --no-reuse-delta --revs --all --filter=blob:none </dev/null >pk
+'
+
+test_size 'repack size (blob:none)' '
+ test_file_size pk
+'
+
+test_perf 'repack (blob:none, --path-walk)' '
+ git pack-objects --stdout --no-reuse-delta --revs --all --path-walk \
+ --filter=blob:none </dev/null >pk
+'
+
+test_size 'repack size (blob:none, --path-walk)' '
+ test_file_size pk
+'
+
+test_perf 'repack (sparse:oid)' \
+ --prereq SPARSE_OID '
+ git pack-objects --stdout --no-reuse-delta --revs --all \
+ --filter=sparse:oid=$(cat sparse-oid) </dev/null >pk
+'
+
+test_size 'repack size (sparse:oid)' \
+ --prereq SPARSE_OID '
+ test_file_size pk
+'
+
+test_perf 'repack (sparse:oid, --path-walk)' \
+ --prereq SPARSE_OID '
+ git pack-objects --stdout --no-reuse-delta --revs --all --path-walk \
+ --filter=sparse:oid=$(cat sparse-oid) </dev/null >pk
+'
+
+test_size 'repack size (sparse:oid, --path-walk)' \
+ --prereq SPARSE_OID '
+ test_file_size pk
+'
+
+test_done
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 02/13] pack-objects: pass --objects with --path-walk
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
When 'git pack-objects' has the --path-walk option enabled, it uses a
different set of revision walk parameters than normal. For one,
--objects was previously assumed by the path-walk API and could be
omitted. We also needed --boundary to allow discovering UNINTERESTING
objects to use as delta bases.
We will be updating the path-walk API soon to work with some filter
options. However, the revision machinery will trigger a fatal error:
fatal: object filtering requires --objects
The fix is easy: add the --objects option as an argument. This has no
effect on the path-walk API but does simplify the revision option
parsing for the objects filter.
We can remove the comment about "removing" the options because they were
never removed and instead not added. We still need to disable using
bitmaps.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
builtin/pack-objects.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index dd2480a73d..4338962904 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -5190,10 +5190,7 @@ int cmd_pack_objects(int argc,
}
if (path_walk) {
strvec_push(&rp, "--boundary");
- /*
- * We must disable the bitmaps because we are removing
- * the --objects / --objects-edge[-aggressive] options.
- */
+ strvec_push(&rp, "--objects");
use_bitmap_index = 0;
} else if (thin) {
use_internal_rev_list = 1;
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 01/13] t5620: make test work with path-walk var
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee, Derrick Stolee
In-Reply-To: <pull.2101.v5.git.1779474277.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
The GIT_TEST_PACK_PATH_WALK test variable allows enabling the
--path-walk option to 'git pack-objects' by default. This sometimes
engages the warning that --path-walk is incompatible with the --filter
option. These tests in t5620 fail due to this warning over stderr in
this case. Disable this variable for this moment until these options
work together.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/t5620-backfill.sh | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 94f35ce190..e174290787 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -298,6 +298,9 @@ test_expect_success 'backfill with prefix pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
+ # If we enable --path-walk here, we will get a warning overs stderr
+ # due to incompatibilities with --filter.
+ GIT_TEST_PACK_PATH_WALK=0 \
git -C backfill-path backfill HEAD -- d/f 2>err &&
test_must_be_empty err &&
@@ -315,6 +318,9 @@ test_expect_success 'backfill with multiple pathspecs' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
+ # If we enable --path-walk here, we will get a warning overs stderr
+ # due to incompatibilities with --filter.
+ GIT_TEST_PACK_PATH_WALK=0 \
git -C backfill-path backfill HEAD -- d/f a 2>err &&
test_must_be_empty err &&
@@ -332,6 +338,9 @@ test_expect_success 'backfill with wildcard pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
+ # If we enable --path-walk here, we will get a warning overs stderr
+ # due to incompatibilities with --filter.
+ GIT_TEST_PACK_PATH_WALK=0 \
git -C backfill-path backfill HEAD -- "d/file.*.txt" 2>err &&
test_must_be_empty err &&
--
gitgitgadget
^ permalink raw reply related
* [PATCH v5 00/13] pack-objects: integrate --path-walk and some --filter options
From: Derrick Stolee via GitGitGadget @ 2026-05-22 18:24 UTC (permalink / raw)
To: git
Cc: christian.couder, gitster, johannes.schindelin, johncai86,
karthik.188, kristofferhaugsbakk, me, newren, peff, ps,
Taylor Blau, Derrick Stolee
In-Reply-To: <pull.2101.v4.git.1778707135.gitgitgadget@gmail.com>
NOTE: This series is based on en/backfill-fixes-and-edges.
The 'git pack-objects' command has a '--path-walk' option that uses the
path-walk API instead of a typical revision walk to group objects into
chunks by path name instead of relying solely on name-hashes to group
similar files together. (It also does a second compression pass looking for
better deltas after the first pass that is focused within chunks per path.)
The '--path-walk' feature was not previously integrated with the '--filter'
feature, so a warning would appear and disable the path-walk API when a
filter is given. This patch series integrates these together in the
following ways:
* --filter=blob:none updates the path-walk API options to skip blobs.
* --filter=blob:limit=<size> adds a scan to a list of blob objects to
remove objects that are too large.
* --filter=sparse:<oid> adds a scan to the chunks to validate that the
paths match the sparse-checkout patterns.
In particular, this last check is significantly faster than the previous
algorithm because it can check all objects at a given path simultaneously
instead of checking all sparse-checkout patterns for each discovered blob
object.
A subtlety must be added here, in that we must change how we mark an object
as "seen" during the path-walk. We may need to add an object to multiple
paths and only mark it as "seen" if it indeed matched the sparse-checkout
patterns as the path is accepted for emitting to the callback. This adds a
new filter that the "seen" objects must also be removed from later chunks to
avoid sending the same object as grouped to multiple chunks.
There's also a subtle detail here in that the path-walk API also prunes tree
paths based on cone-mode sparse-checkouts, to enable 'git backfill --sparse'
operating quickly for small sparse-checkout scopes. But the
--filter=sparse:<oid> feature doesn't prune trees!
As a future step, I do plan to recommend that we add a treesparse:<oid>
setting that does allow us to trim the tree set by cone-mode sparse
patterns. At the time that partial clone filters were being created, cone
mode sparse-checkout didn't exist and neither did the sparse index. Those
features together make a smaller tree set possible, assuming the user never
needs to change their scope. This would be a significant change so it is not
implemented here, though the git pack-objects integration would be quick
after this series completes.
Neither the sparse:<oid> or hypothetical treesparse:<oid> options are or
should necessarily be supported by Git servers. It's too expensive to
compute dynamically and it doesn't work well with reachability bitmaps. What
becomes possible with this change is that it becomes reasonably fast to
construct bundles with these filters that can bootstrap a working
environment with the full history of all files within a given
sparse-checkout scope.
Performance Results
===================
Since the '--path-walk' option is ignored in today's Git version when a
'--filter' is added, the performance matches the behavior without
'--path-walk'. For the tables below, you can compare the rows against each
other (time and then packfile size) for the mode without and then with
'--path-walk' as a representation of "before" and "after". (These tables are
repeated in the commit messages as new implementations improve specific
rows.)
I chose a number of open source repositories of various sizes and shapes:
git/git
=======
Test HEAD
-------------------------------------------------------------------
5315.2: repack (no filter) 27.73
5315.3: repack size (no filter) 250.6M
5315.4: repack (no filter, --path-walk) 35.19
5315.5: repack size (no filter, --path-walk) 220.1M
5315.6: repack (blob:none) 13.42
5315.7: repack size (blob:none) 137.6M
5315.8: repack (blob:none, --path-walk) 20.98
5315.9: repack size (blob:none, --path-walk) 115.2M
5315.10: repack (sparse:oid) 72.53
5315.11: repack size (sparse:oid) 187.5M
5315.12: repack (sparse:oid, --path-walk) 29.00
5315.13: repack size (sparse:oid, --path-walk) 161.0M
nodejs/node
===========
Test HEAD
--------------------------------------------------------------------
5315.2: repack (no filter) 75.53
5315.3: repack size (no filter) 0.9G
5315.4: repack (no filter, --path-walk) 80.54
5315.5: repack size (no filter, --path-walk) 885.7M
5315.6: repack (blob:none) 12.65
5315.7: repack size (blob:none) 148.6M
5315.8: repack (blob:none, --path-walk) 17.60
5315.9: repack size (blob:none, --path-walk) 134.6M
5315.10: repack (sparse:oid) 518.84
5315.11: repack size (sparse:oid) 153.4M
5315.12: repack (sparse:oid, --path-walk) 27.99
5315.13: repack size (sparse:oid, --path-walk) 139.4M
microsoft/fluentui
==================
Test HEAD
--------------------------------------------------------------------
5315.2: repack (no filter) 146.77
5315.3: repack size (no filter) 562.1M
5315.4: repack (no filter, --path-walk) 72.82
5315.5: repack size (no filter, --path-walk) 172.6M
5315.6: repack (blob:none) 4.84
5315.7: repack size (blob:none) 62.7M
5315.8: repack (blob:none, --path-walk) 5.19
5315.9: repack size (blob:none, --path-walk) 59.9M
5315.10: repack (sparse:oid) 59.95
5315.11: repack size (sparse:oid) 85.6M
5315.12: repack (sparse:oid, --path-walk) 15.16
5315.13: repack size (sparse:oid, --path-walk) 72.5M
microsoftdocs/azure-devops-docs
===============================
Test HEAD
---------------------------------------------------------------------
5315.2: repack (no filter) 4.41
5315.3: repack size (no filter) 1.6G
5315.4: repack (no filter, --path-walk) 6.00
5315.5: repack size (no filter, --path-walk) 1.6G
5315.6: repack (blob:none) 1.35
5315.7: repack size (blob:none) 60.0M
5315.8: repack (blob:none, --path-walk) 1.23
5315.9: repack size (blob:none, --path-walk) 60.0M
5315.10: repack (sparse:oid) 138.24
5315.11: repack size (sparse:oid) 84.4M
5315.12: repack (sparse:oid, --path-walk) 1.86
5315.13: repack size (sparse:oid, --path-walk) 84.4M
Performance conclusions
=======================
As seen in earlier series around the '--path-walk' feature, the space
savings can be valuable but is not always guaranteed. When the space savings
doesn't happen, then the time spent is generally slower because of the
two-pass mechanism. The microsoftdocs/azure-devops-docs repo demonstrates
this case quite clearly.
However, even in these cases the 'sparse:<oid>' filters are much faster
because of the ability to check an entire set of objects against the
sparse-checkout patterns only once.
Thanks, -Stolee
UPDATES IN V2
=============
* Rebased onto en/backfill-fixes-and-edges to properly integrate with the
incompatible rev-list options logic in that series.
* Updated documentation as behavior changes. Credit to Taylor Blau for
presenting these suggestions in his RFC [2].
* Added three patches of Taylor's to extend more filter options.
UPDATES IN V3
=============
Upon realizing that the tests were not passing with
GIT_TEST_PACK_PATH_WALK=1, I spent a lot of time reworking each patch to
pass all tests with that variable enabled. This led to a lot of meaningful
changes:
* A new patch updates t5620-backfill.sh because they are currently failing
due to a check for quiet stderr checks. These changes are reverted later
when the filters are integrated so the warning stops being written.
* I move the logic for the path-walk API emitting "directly requested"
objects (non-commits in the 'pending' list). This is substantial enough
to be its own patch.
* The filtering logic is pulled entirely within the path-walk API instead
of needing integration within builtin/pack-objects.c.
* The tree:0 filter had a lot to be desired when fetching missing objects,
so is substantially updated.
* The object-type filter requires a change to the typical direct-request
behavior, including a new 'strict_types' member that prevents ever
allowing objects against type.
* The combine filter needed better logic around multiple blob size limits,
to take the smaller of the two.
* The t6601-path-walk.sh script has many test updates to better reflect the
new behavior, as required by the other partial clone tests under
GIT_TEST_PACK_PATH_WALK=1.
* Doc updates for 'form' to 'forms' when multiple forms are supported.
I've also updated Taylor's bitmap-related patches into three commits on top
of this series (see [3]).
UPDATES IN V4
=============
Thanks, Taylor for the careful review.
* Several typos are fixed.
* The performance test is corrected for issues around piping Git commands
and made more robust to the existence of submodules.
* BIG: The tree:0 patch is significantly updated in this version. Taylor
correctly smelled a problem with the new logic to emit the /tagged-trees
object set, and that signaled that those trees were previously never
emitted. I update the test to demonstrate that changing the data shape
(including tagged trees that are otherwise-unreachable) doesn't change
the test behavior, signaling a bug. The behavior change details all the
complexities of visiting only directly-requested trees under a tree:0
filter and recursing on all trees in other cases.
UPDATES IN V5
=============
Small adjustments to the performance test script.
P.S. I've CC'd the folks who were on the original path-walk feature thread
[1]
[1]
https://lore.kernel.org/git/pull.1819.git.1741571455.gitgitgadget@gmail.com/
[2] https://lore.kernel.org/git/cover.1777853408.git.me@ttaylorr.com/
[3]
https://github.com/derrickstolee/git/compare/path-walk-filters...derrickstolee:git:path-walk-bitmaps
Derrick Stolee (10):
t5620: make test work with path-walk var
pack-objects: pass --objects with --path-walk
t/perf: add pack-objects filter and path-walk benchmark
path-walk: always emit directly-requested objects
path-walk: support blobless filter
backfill: die on incompatible filter options
path-walk: support blob size limit filter
path-walk: add pl_sparse_trees to control tree pruning
pack-objects: support sparse:oid filter with path-walk
t6601: tag otherwise-unreachable trees
Taylor Blau (3):
path-walk: support `tree:0` filter
path-walk: support `object:type` filter
path-walk: support `combine` filter
Documentation/git-backfill.adoc | 4 +
Documentation/git-pack-objects.adoc | 8 +-
Documentation/technical/api-path-walk.adoc | 7 +
builtin/backfill.c | 8 +-
builtin/pack-objects.c | 23 +-
path-walk.c | 270 ++++++++--
path-walk.h | 31 ++
t/helper/test-path-walk.c | 17 +-
t/perf/p5315-pack-objects-filter.sh | 129 +++++
t/t5317-pack-objects-filter-objects.sh | 125 +++++
t/t5620-backfill.sh | 8 +
t/t6601-path-walk.sh | 572 +++++++++++++++++++--
12 files changed, 1124 insertions(+), 78 deletions(-)
create mode 100755 t/perf/p5315-pack-objects-filter.sh
base-commit: a1ad4a0fca14cdeb55ab9fb065551b15cafa8a4f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2101%2Fderrickstolee%2Fpath-walk-filters-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2101/derrickstolee/path-walk-filters-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/2101
Range-diff vs v4:
1: 0840110116 = 1: 0840110116 t5620: make test work with path-walk var
2: d7c87545f3 = 2: d7c87545f3 pack-objects: pass --objects with --path-walk
3: fb8a0f9c43 ! 3: 697ef716d2 t/perf: add pack-objects filter and path-walk benchmark
@@ t/perf/p5315-pack-objects-filter.sh (new)
+ awk "{print \$4;}" >top-dirs &&
+ top_nr=$(wc -l <top-dirs) &&
+
-+ >depth2-dirs &&
+ while read tdir
+ do
-+ git ls-tree -d --name-only "HEAD:$tdir" 2>/dev/null || return 1
-+ done <top-dirs >depth2-dirs.raw &&
-+ sed "s|^|$tdir/|" <depth2-dirs.raw >depth2-dirs &&
++ git ls-tree -d --format="$tdir/%(path)" "HEAD:$tdir" || return 1
++ done <top-dirs >depth2-dirs &&
+
+ d2_nr=$(wc -l <depth2-dirs) &&
+
4: e77c8a6bbc = 4: 91845bcef0 path-walk: always emit directly-requested objects
5: f4904f81e0 = 5: fdb9361198 path-walk: support blobless filter
6: f37467e46f = 6: 89726faf7e backfill: die on incompatible filter options
7: 133c1b156c = 7: 3884d4737f path-walk: support blob size limit filter
8: 0f517be8e3 = 8: 31b4ef0fa1 path-walk: add pl_sparse_trees to control tree pruning
9: b4dc09ab69 = 9: 7d8f0aa036 pack-objects: support sparse:oid filter with path-walk
10: 0b1eed0790 = 10: a68676d0de t6601: tag otherwise-unreachable trees
11: b23244c4c2 = 11: b0db73c6cc path-walk: support `tree:0` filter
12: 7e1e503361 = 12: 6845988f50 path-walk: support `object:type` filter
13: a615b1a707 = 13: d33d899251 path-walk: support `combine` filter
--
gitgitgadget
^ permalink raw reply
* Re: [PATCH v4 00/13] pack-objects: integrate --path-walk and some --filter options
From: Derrick Stolee @ 2026-05-22 18:24 UTC (permalink / raw)
To: Taylor Blau
Cc: Derrick Stolee via GitGitGadget, git, christian.couder, gitster,
johannes.schindelin, johncai86, karthik.188, kristofferhaugsbakk,
newren, peff, ps
In-Reply-To: <ahCGL+AsIaR+63Pr@nand.local>
On 5/22/2026 12:37 PM, Taylor Blau wrote:
> On Thu, May 21, 2026 at 07:01:33PM -0400, Derrick Stolee wrote:
>>> I'm curious what your thoughts are there. I think barring that things
>>> are near-complete here, though I did note one issue with the t/perf
>>> changes (that is my fault for having a bad suggestion on the earlier
>>> round).
>>
>> I like the suggested change to t/perf but I don't share your concerns
>> around the '/' character in the path (I go deeper into why in the
>> thread).
>
> Sounds good. I think a minor re-roll for that would be good, but I
> better understand your viewpoint around the '/' leading character now,
> so I think other than that we're good to go from my perspective.
Thanks! v5 is on its way with the update to the perf script.
Thanks,
-Stolee
^ permalink raw reply
* Re: I discovered a minor issue with `git fetch`.
From: Ben Knoble @ 2026-05-22 17:46 UTC (permalink / raw)
To: SURA; +Cc: git
In-Reply-To: <CAD6AYr9YmcnkdW=Nx=HUKcuaNbv1ukrAbXRnKyGibCQDy8N3hQ@mail.gmail.com>
> Le 22 mai 2026 à 03:48, SURA <surak8806@gmail.com> a écrit :
>
> Hello everyone
>
> The child processes spawned by `git fetch` can become zombie processes.
> In most scenarios, these zombie processes are reaped by Process 1, so
> this typically doesn't cause any problems.
>
> However, within a Docker container, the application service itself is
> sometimes designated as Process 1 (for instance, a service written in
> Go). Since these application services lack the capability to reap
> zombie processes, the zombies will gradually exhaust the available PID
> resources.
See also lore.kernel.org/git/202602231615147.3294516-1-cshung@gmail.com and subsequent discussion for related material.
>
> Here are the simple steps to reproduce this issue:
> 1. `git clone https://github.com/SURA907/pid-1.git`
> 2. `cd pid-1`
> 3. `docker build -t pid-1 .`
> 4. `docker run -d --name pid-1 pid-1:latest`
> 5. `docker exec -it pid-1 /bin/bash`
> 6. `mkdir repo && cd repo && git init --bare`
> 7. `ps -ef`
> ------
> UID PID PPID C STIME TTY TIME CMD
> root 1 0 0 07:16 ? 00:00:00 tail -f /dev/null
> root 7 0 0 07:16 pts/0 00:00:00 /bin/bash
> root 13 0 0 07:16 pts/1 00:00:00 /bin/bash
> root 29 7 0 07:17 pts/0 00:00:00 ps -ef
> ------
>
> 8. `git fetch https://github.com/git/git.git`
> 9. `ps -ef` (Run this command from a separate terminal session
> connected to the container)
> ------
> UID PID PPID C STIME TTY TIME CMD
> root 1 0 0 07:16 ? 00:00:00 tail -f /dev/null
> root 7 0 0 07:16 pts/0 00:00:00 /bin/bash
> root 13 0 0 07:16 pts/1 00:00:00 /bin/bash
> root 30 13 1 07:17 pts/1 00:00:00 git fetch https://github.com/git/git.git
> root 31 30 0 07:17 pts/1 00:00:00 /usr/local/libexec/git-core/git
> remote-https https://github.com/git/git.git
> https://github.com/git/git.git
> root 32 31 2 07:17 pts/1 00:00:00
> /usr/local/libexec/git-core/git-remote-https
> https://github.com/git/git.git https://github.com/git/git.git
> root 36 30 30 07:17 pts/1 00:00:00 /usr/local/libexec/git-core/git
> index-pack --stdin -v --fix-thin --keep=fetch-pack 30 on sura-pc
> --pack_header=2,399455
> root 38 7 0 07:17 pts/0 00:00:00 ps -ef
> ------
>
> 10. ps -ef (after fetch ends)
> ------
> UID PID PPID C STIME TTY TIME CMD
> root 1 0 0 07:16 ? 00:00:00 tail -f /dev/null
> root 7 0 0 07:16 pts/0 00:00:00 /bin/bash
> root 13 0 0 07:16 pts/1 00:00:00 /bin/bash
> root 52 1 0 07:19 ? 00:00:00 [git] <defunct>
> root 53 7 0 07:19 pts/0 00:00:00 ps -ef
> ------
>
> A zombie process has appeared. It appears to originate from a `fetch`
> subprocess that terminates very quickly; despite several attempts, I
> have been unable to successfully capture it.
>
> This issue was discovered within a legacy service. A few days after
> upgrading to Git 2.53.0, the system's PID resources were exhausted by
> zombie processes. This is likely the result of recent changes, as this
> problem did not exist in earlier versions (2.4x).
>
> To be honest, this is not an urgent matter; I have already deployed
> `tini` as the init process (PID 1) to prevent the service from
> becoming unavailable.
>
^ permalink raw reply
* Re: [PATCH 0/5] [RFC] diff: add diff.<driver>.process for external hunk providers
From: Michael Montalbo @ 2026-05-22 17:19 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Michael Montalbo via GitGitGadget, git
In-Reply-To: <xmqq8q9cui5c.fsf@gitster.g>
On Thu, May 21, 2026 at 10:29 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Michael Montalbo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > This series adds diff.<driver>.process, a long-running subprocess protocol
> > that lets external tools provide hunks to git's diff and blame pipelines.
> >
> > Over the past 18 years, git's diff pipeline accumulated many features that
> > operate on hunks: word diff, function context, color-moved, indent
> > heuristic, blame. External tools can replace the pipeline entirely
> > (diff.<driver>.command) or select among builtin algorithms
> > (diff.<driver>.algorithm), but there is no way for a tool to provide
> > line-change information into the pipeline. Tools that understand code
> > structure (tree-sitter parsers, format-aware analyzers, tools like
> > Difftastic and Mergiraf) must bypass git's pipeline and lose access to
> > everything downstream.
> >
> > The protocol follows filter.<driver>.process: pkt-line over stdin/stdout,
> > capability negotiation, one tool invocation per git command. The tool
> > receives file pairs and returns hunk descriptors that git feeds into the
> > standard xdiff pipeline. All output features work normally.
> >
> > Zero hunks with status=success means the tool considers the files
> > equivalent. git diff shows no output for the file, and git blame skips the
> > commit, attributing lines to earlier commits.
> >
> > On error or tool crash, git falls back silently to the builtin diff
> > algorithm. The feature is opt-in via diff.<driver>.process and
> > .gitattributes; unconfigured files are unaffected.
> >
> > The series includes git diff-process-normalize, a built-in tool that
> > compares files line by line ignoring whitespace (same logic as "git diff -w"
> > via xdiff_compare_lines):
>
> Interesting.
>
> If the goal is purely to normalize content before comparison
> (e.g. stripping comments or canonicalizing formatting), we already
> have the `textconv` mechanism. While `textconv` is a "one-shot"
> per-file process, it is significantly simpler.
>
> I suspect, however, that the primary focus here is to allow external
> tools to provide structural alignment (e.g. for AST- aware diffs
> like Difftastic or Mergiraf) without losing the original content in
> the display. Unlike `textconv`, which transforms the text the user
> sees, this protocol lets the display remain identical to the source
> while using a custom engine for the line-matching logic.
>
> If that is the intent, it should be stated more explicitly in the
> documentation and commit messages. The "whitespace-normalize"
> demonstration in [PATCH 5/5] is misleading because it's exactly the
> case where `textconv` would be sufficient.
>
Thank you for looking at this.
Yes, you have correctly identified the primary focus. My intention with the
whitespace normalization example was to provide a kind of "hello world"
diff process that would showcase how such a tool could interact with
the pipeline further down (i.e., blame vs diff output). However, I do agree
that it is a confusing example because it seems to clash with something
`textconv` already provides. I will update messaging across the series to
make the true intention of these changes more clear.
> I am afraid that the use of a long-running subprocess for every
> diff/blame invocation adds significant complexity and overhead. In
> particular, wouldn't the `blame` implementation performs a
> round-trip to the subprocess for every commit in the history? Even
> with a persistent process, the overhead of serializing and
> deserializing the entire file content twice (old and new) for every
> commit could be prohibitive for large files or deep histories.
>
> So, I dunno.
I hear you on this point. Anecdotally, my measurement of running blame
on diff.c:
Performance:
without process: 0.57s
with process: 0.67s
Blame attribution:
without process: 726 unique commits
with process: 721 unique commits (5 whitespace-only skipped)
Skipped commits:
0ea7d5b6f8 diff.c: fix some recent whitespace style violations
2775d92c53 diff.c: stylefix
4b25d091ba Fix a bunch of pointer declarations (codestyle)
a6080a0a44 War on whitespace
eeefa7c90e Style fixes, add a space after if/for/while.
I was imagining we could potentially optimize performance by extending the
protocol to enable passing OIDs as a capability so tools could read objects
directly without needing any serialization/deserialization.
^ permalink raw reply
* Re: Why do we need to wait 1s between a git add and commit
From: Fabrice SALVAIRE @ 2026-05-22 17:13 UTC (permalink / raw)
To: Kristoffer Haugsbakk, git
In-Reply-To: <b403477d-5587-4afc-bd02-dbd207c22e67@app.fastmail.com>
Hi,
Yes it throws 3000+ sequential subprocess calls at the speed of Python...
And the sleep time seems to be critical, 1s is ok but 100ms is not so ok.
I also have the feeling this is due to a git upgrade. But I didn't test
to downgrade.
That is a major issue if we have to be slow while using Git...
It is not unusual to write a shell script with a sequence of add/commit.
Le 22/05/2026 à 17:12, Kristoffer Haugsbakk a écrit :
> Hi
>
> On Fri, May 22, 2026, at 14:28, Fabrice SALVAIRE wrote:
>> I wrote a Python tool to dump a wiki to a git repository, that does
>> basically a succession of subprocess calls to git add and commit.
>>
>> Recently, I discovered this tool doesn't work any longer and that git
>> commit (2.54 on Fedora 42 / 43) crashes randomly.
>>
>> I cannot explain this behavior since my code is trivial.
>>
>> I had the intuition to add a sleep time of 1s just after a git call, and
>> it solves the issue.
>>
>> I noticed for some cases that another call to git commit were
>> successful. For most cases, git fsck and gitk report issues.
>>
>> It looks like the state of the git repository was not yet completed
>> before the end of the git subprocess.
> This might be caused by git-maintenance(1) being run in the background
> without locking? That’s a new issue in Git 2.54.0. See:
>
> https://lore.kernel.org/git/20260509175249.GA2336928@coredump.intra.peff.net/
>
> The following script reproduces the issue on Git 2.53.0. I am guessing
> that your script does something similar? It depends on how many commits
> it creates in a short timeframe.
>
> https://lore.kernel.org/git/20260508180341.GB737125@coredump.intra.peff.net/
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox