From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, gitster@pobox.com,
johannes.schindelin@gmx.de, johncai86@gmail.com,
karthik.188@gmail.com, kristofferhaugsbakk@fastmail.com,
me@ttaylorr.com, newren@gmail.com, peff@peff.net, ps@pks.im,
Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v4 04/13] path-walk: always emit directly-requested objects
Date: Wed, 13 May 2026 21:18:46 +0000 [thread overview]
Message-ID: <e77c8a6bbc22da3428751f81ff5ee79aa5364237.1778707135.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2101.v4.git.1778707135.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
We are preparing to integrate the path-walk API with some --filter options
in 'git pack-objects', but there is a subtle issue that is revealed when
those are put together and the test suite is run with
GIT_TEST_PACK_PATH_WALK=1.
When a filter reduces the set of requested objects, this results in
filtering out directly-requested objects, such as in the download of needed
blobs in a blobless partial clone.
The root cause is that the scan of pending objects in the path-walk API
respects the filters set in the path_walk_info instead of overriding them
for pending objects.
We can tell that a path is part of the directly-referenced objects if its
path name starts with '/' (other paths, including root trees never have this
starting character). Create a path_is_for_direct_objects() to make this
meaning clear, especially as we add more references in the future as we
integrate the path-walk API with partial clone filter options.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/technical/api-path-walk.adoc | 7 ++++
path-walk.c | 42 ++++++++++++++--------
path-walk.h | 5 +++
3 files changed, 39 insertions(+), 15 deletions(-)
diff --git a/Documentation/technical/api-path-walk.adoc b/Documentation/technical/api-path-walk.adoc
index a67de1b143..6e17b13d61 100644
--- a/Documentation/technical/api-path-walk.adoc
+++ b/Documentation/technical/api-path-walk.adoc
@@ -48,6 +48,13 @@ commits.
applications could disable some options to make it simpler to walk
the objects or to have fewer calls to `path_fn`.
+
+Note that objects directly requested as pending objects (such as targets
+of lightweight tags or other ref tips) are always emitted to `path_fn`,
+even when the corresponding type flag is disabled. Only objects
+discovered during the tree walk are subject to these type filters. This
+ensures that objects specifically requested through the revision input
+are never silently dropped.
++
While it is possible to walk only commits in this way, consumers would be
better off using the revision walk API instead.
diff --git a/path-walk.c b/path-walk.c
index 6e426af433..05bfc1c114 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -248,6 +248,17 @@ static int add_tree_entries(struct path_walk_context *ctx,
return 0;
}
+/*
+ * Paths starting with '/' (e.g., "/tags", "/tagged-blobs") hold objects that
+ * were directly requested by 'pending' objects rather than discovered during
+ * tree traversal.
+ */
+static int path_is_for_direct_objects(const char *path)
+{
+ ASSERT(path);
+ return path[0] == '/';
+}
+
/*
* For each path in paths_to_explore, walk the trees another level
* and add any found blobs to the batch (but only if they exist and
@@ -306,14 +317,19 @@ static int walk_path(struct path_walk_context *ctx,
if (list->type == OBJ_BLOB &&
ctx->revs->prune_data.nr &&
+ !path_is_for_direct_objects(path) &&
!match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
path, strlen(path), 0,
NULL, 0))
return 0;
- /* Evaluate function pointer on this data, if requested. */
- if ((list->type == OBJ_TREE && ctx->info->trees) ||
- (list->type == OBJ_BLOB && ctx->info->blobs) ||
+ /*
+ * Evaluate function pointer on this data, if requested.
+ * Ignore object type filters for tagged objects (path starts
+ * with `/`).
+ */
+ if ((list->type == OBJ_TREE && (ctx->info->trees || path_is_for_direct_objects(path))) ||
+ (list->type == OBJ_BLOB && (ctx->info->blobs || path_is_for_direct_objects(path))) ||
(list->type == OBJ_TAG && ctx->info->tags))
ret = ctx->info->path_fn(path, &list->oids, list->type,
ctx->info->path_fn_data);
@@ -374,10 +390,8 @@ static int setup_pending_objects(struct path_walk_info *info,
if (info->tags)
CALLOC_ARRAY(tags, 1);
- if (info->blobs)
- CALLOC_ARRAY(tagged_blobs, 1);
- if (info->trees)
- root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
+ CALLOC_ARRAY(tagged_blobs, 1);
+ root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
/*
* Pending objects include:
@@ -421,8 +435,6 @@ static int setup_pending_objects(struct path_walk_info *info,
switch (obj->type) {
case OBJ_TREE:
- if (!info->trees)
- continue;
if (pending->path) {
char *path = *pending->path ? xstrfmt("%s/", pending->path)
: xstrdup("");
@@ -435,8 +447,6 @@ static int setup_pending_objects(struct path_walk_info *info,
break;
case OBJ_BLOB:
- if (!info->blobs)
- continue;
if (pending->path)
add_path_to_list(ctx, pending->path, OBJ_BLOB, &obj->oid, 1);
else
@@ -532,15 +542,17 @@ int walk_objects_by_path(struct path_walk_info *info)
push_to_stack(&ctx, root_path);
/*
- * Set these values before preparing the walk to catch
- * lightweight tags pointing to non-commits and indexed objects.
+ * Ensure that prepare_revision_walk() keeps all pending objects
+ * even through an object type filter.
*/
- info->revs->blob_objects = info->blobs;
- info->revs->tree_objects = info->trees;
+ info->revs->blob_objects = info->revs->tree_objects = 1;
if (prepare_revision_walk(info->revs))
die(_("failed to setup revision walk"));
+ info->revs->blob_objects = info->blobs;
+ info->revs->tree_objects = info->trees;
+
/*
* Walk trees to mark them as UNINTERESTING.
* This is particularly important when 'edge_aggressive' is set.
diff --git a/path-walk.h b/path-walk.h
index 5ef5a8440e..657eeda8ec 100644
--- a/path-walk.h
+++ b/path-walk.h
@@ -36,6 +36,11 @@ struct path_walk_info {
/**
* Initialize which object types the path_fn should be called on. This
* could also limit the walk to skip blobs if not set.
+ *
+ * Note: even when 'blobs' or 'trees' is disabled, objects that are
+ * directly requested as pending objects will still be emitted to
+ * path_fn. Only objects discovered during the tree walk are filtered by
+ * these flags.
*/
int commits;
int trees;
--
gitgitgadget
next prev parent reply other threads:[~2026-05-13 21:19 UTC|newest]
Thread overview: 71+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-02 14:15 [PATCH 0/7] pack-objects: integrate --path-walk and some --filter options Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 1/7] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-04 0:49 ` Junio C Hamano
2026-05-04 12:01 ` Derrick Stolee
2026-05-02 14:15 ` [PATCH 2/7] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 3/7] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 4/7] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-03 22:59 ` Junio C Hamano
2026-05-04 12:09 ` Derrick Stolee
2026-05-02 14:15 ` [PATCH 5/7] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 6/7] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 7/7] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 00/10] pack-objects: integrate --path-walk and some --filter options Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 01/10] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 02/10] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 03/10] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 04/10] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 05/10] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 06/10] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 07/10] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 08/10] path-walk: support `tree:0` filter Taylor Blau via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 09/10] path-walk: support `object:type` filter Taylor Blau via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 10/10] path-walk: support `combine` filter Taylor Blau via GitGitGadget
2026-05-05 16:18 ` [PATCH v2 00/10] pack-objects: integrate --path-walk and some --filter options Derrick Stolee
2026-05-05 19:01 ` Taylor Blau
2026-05-05 19:44 ` Derrick Stolee
2026-05-05 20:42 ` Taylor Blau
2026-05-07 11:40 ` Derrick Stolee
2026-05-11 3:05 ` Junio C Hamano
2026-05-11 13:58 ` Derrick Stolee
2026-05-11 18:12 ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
2026-05-11 18:12 ` [PATCH v3 01/12] t5620: make test work with path-walk var Derrick Stolee via GitGitGadget
2026-05-12 1:03 ` Taylor Blau
2026-05-11 18:12 ` [PATCH v3 02/12] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-12 1:04 ` Taylor Blau
2026-05-11 18:13 ` [PATCH v3 03/12] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-12 1:11 ` Taylor Blau
2026-05-13 18:23 ` Derrick Stolee
2026-05-11 18:13 ` [PATCH v3 04/12] path-walk: always emit directly-requested objects Derrick Stolee via GitGitGadget
2026-05-12 1:23 ` Taylor Blau
2026-05-13 18:29 ` Derrick Stolee
2026-05-11 18:13 ` [PATCH v3 05/12] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-11 18:38 ` Taylor Blau
2026-05-11 19:44 ` Derrick Stolee
2026-05-11 18:13 ` [PATCH v3 06/12] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-12 1:26 ` Taylor Blau
2026-05-11 18:13 ` [PATCH v3 07/12] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-12 1:33 ` Taylor Blau
2026-05-13 18:35 ` Derrick Stolee
2026-05-11 18:13 ` [PATCH v3 08/12] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-11 18:13 ` [PATCH v3 09/12] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-11 18:13 ` [PATCH v3 10/12] path-walk: support `tree:0` filter Taylor Blau via GitGitGadget
2026-05-12 1:41 ` Taylor Blau
2026-05-13 19:46 ` Derrick Stolee
2026-05-11 18:13 ` [PATCH v3 11/12] path-walk: support `object:type` filter Taylor Blau via GitGitGadget
2026-05-11 18:13 ` [PATCH v3 12/12] path-walk: support `combine` filter Taylor Blau via GitGitGadget
2026-05-12 1:43 ` [PATCH v3 00/12] pack-objects: integrate --path-walk and some --filter options Taylor Blau
2026-05-13 21:18 ` [PATCH v4 00/13] " Derrick Stolee via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 01/13] t5620: make test work with path-walk var Derrick Stolee via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 02/13] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 03/13] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-13 21:18 ` Derrick Stolee via GitGitGadget [this message]
2026-05-13 21:18 ` [PATCH v4 05/13] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 06/13] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 07/13] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 08/13] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 09/13] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 10/13] t6601: tag otherwise-unreachable trees Derrick Stolee via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 11/13] path-walk: support `tree:0` filter Taylor Blau via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 12/13] path-walk: support `object:type` filter Taylor Blau via GitGitGadget
2026-05-13 21:18 ` [PATCH v4 13/13] path-walk: support `combine` filter Taylor Blau via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e77c8a6bbc22da3428751f81ff5ee79aa5364237.1778707135.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=johannes.schindelin@gmx.de \
--cc=johncai86@gmail.com \
--cc=karthik.188@gmail.com \
--cc=kristofferhaugsbakk@fastmail.com \
--cc=me@ttaylorr.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox