Git development
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, gitster@pobox.com,
	johannes.schindelin@gmx.de, johncai86@gmail.com,
	karthik.188@gmail.com, kristofferhaugsbakk@fastmail.com,
	me@ttaylorr.com, newren@gmail.com, peff@peff.net, ps@pks.im,
	Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v4 04/13] path-walk: always emit directly-requested objects
Date: Wed, 13 May 2026 21:18:46 +0000	[thread overview]
Message-ID: <e77c8a6bbc22da3428751f81ff5ee79aa5364237.1778707135.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2101.v4.git.1778707135.gitgitgadget@gmail.com>

From: Derrick Stolee <stolee@gmail.com>

We are preparing to integrate the path-walk API with some --filter options
in 'git pack-objects', but there is a subtle issue that is revealed when
those are put together and the test suite is run with
GIT_TEST_PACK_PATH_WALK=1.

When a filter reduces the set of requested objects, this results in
filtering out directly-requested objects, such as in the download of needed
blobs in a blobless partial clone.

The root cause is that the scan of pending objects in the path-walk API
respects the filters set in the path_walk_info instead of overriding them
for pending objects.

We can tell that a path is part of the directly-referenced objects if its
path name starts with '/' (other paths, including root trees never have this
starting character). Create a path_is_for_direct_objects() to make this
meaning clear, especially as we add more references in the future as we
integrate the path-walk API with partial clone filter options.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
 Documentation/technical/api-path-walk.adoc |  7 ++++
 path-walk.c                                | 42 ++++++++++++++--------
 path-walk.h                                |  5 +++
 3 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/Documentation/technical/api-path-walk.adoc b/Documentation/technical/api-path-walk.adoc
index a67de1b143..6e17b13d61 100644
--- a/Documentation/technical/api-path-walk.adoc
+++ b/Documentation/technical/api-path-walk.adoc
@@ -48,6 +48,13 @@ commits.
 	applications could disable some options to make it simpler to walk
 	the objects or to have fewer calls to `path_fn`.
 +
+Note that objects directly requested as pending objects (such as targets
+of lightweight tags or other ref tips) are always emitted to `path_fn`,
+even when the corresponding type flag is disabled. Only objects
+discovered during the tree walk are subject to these type filters. This
+ensures that objects specifically requested through the revision input
+are never silently dropped.
++
 While it is possible to walk only commits in this way, consumers would be
 better off using the revision walk API instead.
 
diff --git a/path-walk.c b/path-walk.c
index 6e426af433..05bfc1c114 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -248,6 +248,17 @@ static int add_tree_entries(struct path_walk_context *ctx,
 	return 0;
 }
 
+/*
+ * Paths starting with '/' (e.g., "/tags", "/tagged-blobs") hold objects that
+ * were directly requested by 'pending' objects rather than discovered during
+ * tree traversal.
+ */
+static int path_is_for_direct_objects(const char *path)
+{
+	ASSERT(path);
+	return path[0] == '/';
+}
+
 /*
  * For each path in paths_to_explore, walk the trees another level
  * and add any found blobs to the batch (but only if they exist and
@@ -306,14 +317,19 @@ static int walk_path(struct path_walk_context *ctx,
 
 	if (list->type == OBJ_BLOB &&
 	    ctx->revs->prune_data.nr &&
+	    !path_is_for_direct_objects(path) &&
 	    !match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
 			   path, strlen(path), 0,
 			   NULL, 0))
 		return 0;
 
-	/* Evaluate function pointer on this data, if requested. */
-	if ((list->type == OBJ_TREE && ctx->info->trees) ||
-	    (list->type == OBJ_BLOB && ctx->info->blobs) ||
+	/*
+	 * Evaluate function pointer on this data, if requested.
+	 * Ignore object type filters for tagged objects (path starts
+	 * with `/`).
+	 */
+	if ((list->type == OBJ_TREE && (ctx->info->trees || path_is_for_direct_objects(path))) ||
+	    (list->type == OBJ_BLOB && (ctx->info->blobs || path_is_for_direct_objects(path))) ||
 	    (list->type == OBJ_TAG && ctx->info->tags))
 		ret = ctx->info->path_fn(path, &list->oids, list->type,
 					ctx->info->path_fn_data);
@@ -374,10 +390,8 @@ static int setup_pending_objects(struct path_walk_info *info,
 
 	if (info->tags)
 		CALLOC_ARRAY(tags, 1);
-	if (info->blobs)
-		CALLOC_ARRAY(tagged_blobs, 1);
-	if (info->trees)
-		root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
+	CALLOC_ARRAY(tagged_blobs, 1);
+	root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
 
 	/*
 	 * Pending objects include:
@@ -421,8 +435,6 @@ static int setup_pending_objects(struct path_walk_info *info,
 
 		switch (obj->type) {
 		case OBJ_TREE:
-			if (!info->trees)
-				continue;
 			if (pending->path) {
 				char *path = *pending->path ? xstrfmt("%s/", pending->path)
 							    : xstrdup("");
@@ -435,8 +447,6 @@ static int setup_pending_objects(struct path_walk_info *info,
 			break;
 
 		case OBJ_BLOB:
-			if (!info->blobs)
-				continue;
 			if (pending->path)
 				add_path_to_list(ctx, pending->path, OBJ_BLOB, &obj->oid, 1);
 			else
@@ -532,15 +542,17 @@ int walk_objects_by_path(struct path_walk_info *info)
 	push_to_stack(&ctx, root_path);
 
 	/*
-	 * Set these values before preparing the walk to catch
-	 * lightweight tags pointing to non-commits and indexed objects.
+	 * Ensure that prepare_revision_walk() keeps all pending objects
+	 * even through an object type filter.
 	 */
-	info->revs->blob_objects = info->blobs;
-	info->revs->tree_objects = info->trees;
+	info->revs->blob_objects = info->revs->tree_objects = 1;
 
 	if (prepare_revision_walk(info->revs))
 		die(_("failed to setup revision walk"));
 
+	info->revs->blob_objects = info->blobs;
+	info->revs->tree_objects = info->trees;
+
 	/*
 	 * Walk trees to mark them as UNINTERESTING.
 	 * This is particularly important when 'edge_aggressive' is set.
diff --git a/path-walk.h b/path-walk.h
index 5ef5a8440e..657eeda8ec 100644
--- a/path-walk.h
+++ b/path-walk.h
@@ -36,6 +36,11 @@ struct path_walk_info {
 	/**
 	 * Initialize which object types the path_fn should be called on. This
 	 * could also limit the walk to skip blobs if not set.
+	 *
+	 * Note: even when 'blobs' or 'trees' is disabled, objects that are
+	 * directly requested as pending objects will still be emitted to
+	 * path_fn. Only objects discovered during the tree walk are filtered by
+	 * these flags.
 	 */
 	int commits;
 	int trees;
-- 
gitgitgadget


  parent reply	other threads:[~2026-05-13 21:19 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-02 14:15 [PATCH 0/7] pack-objects: integrate --path-walk and some --filter options Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 1/7] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-04  0:49   ` Junio C Hamano
2026-05-04 12:01     ` Derrick Stolee
2026-05-02 14:15 ` [PATCH 2/7] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 3/7] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 4/7] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-03 22:59   ` Junio C Hamano
2026-05-04 12:09     ` Derrick Stolee
2026-05-02 14:15 ` [PATCH 5/7] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 6/7] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 7/7] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 00/10] pack-objects: integrate --path-walk and some --filter options Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 01/10] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 02/10] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 03/10] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 04/10] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 05/10] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 06/10] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 07/10] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 08/10] path-walk: support `tree:0` filter Taylor Blau via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 09/10] path-walk: support `object:type` filter Taylor Blau via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 10/10] path-walk: support `combine` filter Taylor Blau via GitGitGadget
2026-05-05 16:18   ` [PATCH v2 00/10] pack-objects: integrate --path-walk and some --filter options Derrick Stolee
2026-05-05 19:01     ` Taylor Blau
2026-05-05 19:44       ` Derrick Stolee
2026-05-05 20:42         ` Taylor Blau
2026-05-07 11:40           ` Derrick Stolee
2026-05-11  3:05         ` Junio C Hamano
2026-05-11 13:58           ` Derrick Stolee
2026-05-11 18:12   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
2026-05-11 18:12     ` [PATCH v3 01/12] t5620: make test work with path-walk var Derrick Stolee via GitGitGadget
2026-05-12  1:03       ` Taylor Blau
2026-05-11 18:12     ` [PATCH v3 02/12] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-12  1:04       ` Taylor Blau
2026-05-11 18:13     ` [PATCH v3 03/12] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-12  1:11       ` Taylor Blau
2026-05-13 18:23         ` Derrick Stolee
2026-05-11 18:13     ` [PATCH v3 04/12] path-walk: always emit directly-requested objects Derrick Stolee via GitGitGadget
2026-05-12  1:23       ` Taylor Blau
2026-05-13 18:29         ` Derrick Stolee
2026-05-11 18:13     ` [PATCH v3 05/12] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-11 18:38       ` Taylor Blau
2026-05-11 19:44         ` Derrick Stolee
2026-05-11 18:13     ` [PATCH v3 06/12] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-12  1:26       ` Taylor Blau
2026-05-11 18:13     ` [PATCH v3 07/12] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-12  1:33       ` Taylor Blau
2026-05-13 18:35         ` Derrick Stolee
2026-05-11 18:13     ` [PATCH v3 08/12] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-11 18:13     ` [PATCH v3 09/12] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-11 18:13     ` [PATCH v3 10/12] path-walk: support `tree:0` filter Taylor Blau via GitGitGadget
2026-05-12  1:41       ` Taylor Blau
2026-05-13 19:46         ` Derrick Stolee
2026-05-11 18:13     ` [PATCH v3 11/12] path-walk: support `object:type` filter Taylor Blau via GitGitGadget
2026-05-11 18:13     ` [PATCH v3 12/12] path-walk: support `combine` filter Taylor Blau via GitGitGadget
2026-05-12  1:43     ` [PATCH v3 00/12] pack-objects: integrate --path-walk and some --filter options Taylor Blau
2026-05-13 21:18     ` [PATCH v4 00/13] " Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 01/13] t5620: make test work with path-walk var Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 02/13] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 03/13] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` Derrick Stolee via GitGitGadget [this message]
2026-05-13 21:18       ` [PATCH v4 05/13] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 06/13] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 07/13] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 08/13] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 09/13] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 10/13] t6601: tag otherwise-unreachable trees Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 11/13] path-walk: support `tree:0` filter Taylor Blau via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 12/13] path-walk: support `object:type` filter Taylor Blau via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 13/13] path-walk: support `combine` filter Taylor Blau via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e77c8a6bbc22da3428751f81ff5ee79aa5364237.1778707135.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=johncai86@gmail.com \
    --cc=karthik.188@gmail.com \
    --cc=kristofferhaugsbakk@fastmail.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox