From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, gitster@pobox.com,
johannes.schindelin@gmx.de, johncai86@gmail.com,
karthik.188@gmail.com, kristofferhaugsbakk@fastmail.com,
me@ttaylorr.com, newren@gmail.com, peff@peff.net, ps@pks.im,
Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v3 08/12] path-walk: add pl_sparse_trees to control tree pruning
Date: Mon, 11 May 2026 18:13:05 +0000
Message-ID: <2360a5be812b6f8f7e9ccb36e8b5f3347ec646f5.1778523189.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.2101.v3.git.1778523189.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>

The path-walk API prunes trees and blobs when a sparse-checkout pattern
list is provided, which is the correct behavior for 'git backfill
--sparse' since it only needs to fill in objects at paths within the
sparse cone.

However, a future change will use the path-walk API with a sparse:<oid>
filter that restricts only blobs while retaining all reachable trees.

To support both behaviors, add a 'pl_sparse_trees' flag to
path_walk_info. When set (as in 'git backfill --sparse' and the
--stdin-pl test helper mode), the sparse patterns prune both trees and
blobs. When unset, only blobs are filtered and all trees are walked and
reported.
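
As a rough illustration of the cone-mode decision described above, here is
a minimal sketch. The helper name skip_entry() and the toy enum are
hypothetical and are not part of path-walk.c; the boolean 'matched' stands
in for the result of the pattern match.

```c
#include <stdbool.h>

/* Toy stand-in for the object types that matter here (hypothetical). */
enum toy_type { TOY_TREE, TOY_BLOB };

/*
 * Hypothetical helper: returns true when an entry should be skipped
 * in cone mode. Matching entries are always kept. Non-matching blobs
 * are always skipped. Non-matching trees are skipped only when
 * pl_sparse_trees is set.
 */
static bool skip_entry(bool matched, enum toy_type type, bool pl_sparse_trees)
{
	if (matched)
		return false;
	return type == TOY_BLOB || pl_sparse_trees;
}
```

With pl_sparse_trees unset, skip_entry(false, TOY_TREE, false) is false, so
an out-of-cone tree is still walked, while an out-of-cone blob is skipped
either way.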

Additionally, move the SEEN flag assignment in add_tree_entries() to
after the sparse pattern and pathspec checks. Previously, SEEN was set
immediately upon discovering an object, before checking whether its path
matched the sparse patterns. When the same object ID appeared at
multiple paths (e.g. sibling directories with identical contents), the
first path to be visited would mark the object as SEEN. If that path was
outside the sparse cone, the object would be skipped there but also
never discovered at its in-cone path.

By deferring the SEEN flag until after the checks pass, objects that are
skipped due to sparse filtering remain discoverable at other paths where
they may be in scope.
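
The ordering problem can be reproduced with a small toy model. This is
only a sketch under stated assumptions: the structs, the TOY_SEEN bit, and
the walk_entries() and demo() helpers are hypothetical and merely imitate
the shape of the loop in add_tree_entries(). The in_cone field stands in
for the sparse pattern and pathspec checks.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SEEN 1u

struct toy_object { unsigned flags; };

/* One tree entry: the object it points to and whether its path is in the cone. */
struct toy_entry { struct toy_object *obj; bool in_cone; };

/*
 * Walk the entries and return how many were emitted. When eager_seen is
 * true, the flag is set before the cone check (the old ordering). When
 * false, it is set only after the check passes (the new ordering).
 */
static int walk_entries(const struct toy_entry *entries, size_t nr,
			bool eager_seen)
{
	int emitted = 0;

	for (size_t i = 0; i < nr; i++) {
		struct toy_object *o = entries[i].obj;

		if (o->flags & TOY_SEEN)
			continue;
		if (eager_seen)
			o->flags |= TOY_SEEN;
		if (!entries[i].in_cone)
			continue;

		o->flags |= TOY_SEEN;
		emitted++;
	}
	return emitted;
}

/* Same object at two paths: first outside the cone, then inside it. */
static int demo(bool eager_seen)
{
	struct toy_object shared = { 0 };
	struct toy_entry entries[] = {
		{ &shared, false },
		{ &shared, true },
	};

	return walk_entries(entries, 2, eager_seen);
}
```

In this model demo(true) returns 0 because the out-of-cone visit marks the
shared object before it is rejected, and demo(false) returns 1 because the
object stays unmarked until the in-cone path accepts it.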
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
builtin/backfill.c | 1 +
path-walk.c | 5 +++--
path-walk.h | 6 ++++++
t/helper/test-path-walk.c | 6 +++++-
t/t6601-path-walk.sh | 37 +++++++++++++++++++++++++++++++++++++
5 files changed, 52 insertions(+), 3 deletions(-)
diff --git a/builtin/backfill.c b/builtin/backfill.c
index 5254a42711..e71e0f4742 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -109,6 +109,7 @@ static int do_backfill(struct backfill_context *ctx)
if (ctx->sparse) {
CALLOC_ARRAY(info.pl, 1);
+ info.pl_sparse_trees = 1;
if (get_sparse_checkout_patterns(info.pl)) {
path_walk_info_clear(&info);
return error(_("problem loading sparse-checkout"));
diff --git a/path-walk.c b/path-walk.c
index 16fdfd7c5a..21cc40c392 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -183,7 +183,6 @@ static int add_tree_entries(struct path_walk_context *ctx,
/* Skip this object if already seen. */
if (o->flags & SEEN)
continue;
- o->flags |= SEEN;
strbuf_setlen(&path, base_len);
strbuf_add(&path, entry.path, entry.pathlen);
@@ -204,7 +203,8 @@ static int add_tree_entries(struct path_walk_context *ctx,
ctx->repo->index);
if (ctx->info->pl->use_cone_patterns &&
- match == NOT_MATCHED)
+ match == NOT_MATCHED &&
+ (type == OBJ_BLOB || ctx->info->pl_sparse_trees))
continue;
else if (!ctx->info->pl->use_cone_patterns &&
type == OBJ_BLOB &&
@@ -239,6 +239,7 @@ static int add_tree_entries(struct path_walk_context *ctx,
continue;
}
+ o->flags |= SEEN;
add_path_to_list(ctx, path.buf, type, &entry.oid,
!(o->flags & UNINTERESTING));
diff --git a/path-walk.h b/path-walk.h
index 60ceb65433..7e57ae5f65 100644
--- a/path-walk.h
+++ b/path-walk.h
@@ -76,8 +76,14 @@ struct path_walk_info {
* of the cone. If not in cone mode, then all tree paths will be
* explored but the path_fn will only be called when the path matches
* the sparse-checkout patterns.
+ *
+ * When 'pl_sparse_trees' is zero, the sparse patterns only restrict
+ * blobs and all trees are included in the walk output. This matches
+ * the behavior of the sparse:oid object filter. When nonzero, trees
+ * are also pruned by the sparse patterns (as used by backfill).
*/
struct pattern_list *pl;
+ int pl_sparse_trees;
};
#define PATH_WALK_INFO_INIT { \
diff --git a/t/helper/test-path-walk.c b/t/helper/test-path-walk.c
index 88f86ae0dc..3f2b50a9aa 100644
--- a/t/helper/test-path-walk.c
+++ b/t/helper/test-path-walk.c
@@ -68,7 +68,7 @@ static int emit_block(const char *path, struct oid_array *oids,
int cmd__path_walk(int argc, const char **argv)
{
- int res, stdin_pl = 0;
+ int res, stdin_pl = 0, pl_sparse_trees = -1;
struct rev_info revs = REV_INFO_INIT;
struct path_walk_info info = PATH_WALK_INFO_INIT;
struct path_walk_test_data data = { 0 };
@@ -89,6 +89,8 @@ int cmd__path_walk(int argc, const char **argv)
N_("toggle aggressive edge walk")),
OPT_BOOL(0, "stdin-pl", &stdin_pl,
N_("read a pattern list over stdin")),
+ OPT_BOOL(0, "pl-sparse-trees", &pl_sparse_trees,
+ N_("toggle pruning of trees by sparse patterns")),
OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
OPT_END(),
};
@@ -116,6 +118,8 @@ int cmd__path_walk(int argc, const char **argv)
if (stdin_pl) {
struct strbuf in = STRBUF_INIT;
CALLOC_ARRAY(info.pl, 1);
+ info.pl_sparse_trees = (pl_sparse_trees >= 0) ?
+ pl_sparse_trees : 1;
info.pl->use_cone_patterns = 1;
diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh
index 45f366d738..02ad83dfb0 100755
--- a/t/t6601-path-walk.sh
+++ b/t/t6601-path-walk.sh
@@ -206,6 +206,43 @@ test_expect_success 'base & topic, sparse' '
test_cmp_sorted expect out
'
+test_expect_success 'base & topic, sparse, no tree pruning' '
+ cat >patterns <<-EOF &&
+ /*
+ !/*/
+ /left/
+ EOF
+
+ test-tool path-walk --stdin-pl --no-pl-sparse-trees \
+ -- base topic <patterns >out &&
+
+ cat >expect <<-EOF &&
+ 0:commit::$(git rev-parse topic)
+ 0:commit::$(git rev-parse base)
+ 0:commit::$(git rev-parse base~1)
+ 0:commit::$(git rev-parse base~2)
+ 1:tree::$(git rev-parse topic^{tree})
+ 1:tree::$(git rev-parse base^{tree})
+ 1:tree::$(git rev-parse base~1^{tree})
+ 1:tree::$(git rev-parse base~2^{tree})
+ 2:blob:a:$(git rev-parse base~2:a)
+ 3:tree:a/:$(git rev-parse base:a)
+ 4:tree:left/:$(git rev-parse base:left)
+ 4:tree:left/:$(git rev-parse base~2:left)
+ 5:blob:left/b:$(git rev-parse base~2:left/b)
+ 5:blob:left/b:$(git rev-parse base:left/b)
+ 6:tree:right/:$(git rev-parse topic:right)
+ 6:tree:right/:$(git rev-parse base~1:right)
+ 6:tree:right/:$(git rev-parse base~2:right)
+ blobs:3
+ commits:4
+ tags:0
+ trees:10
+ EOF
+
+ test_cmp_sorted expect out
+'
+
test_expect_success 'topic only' '
test-tool path-walk -- topic >out &&
--
gitgitgadget