All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, gitster@pobox.com,
	johannes.schindelin@gmx.de, johncai86@gmail.com,
	karthik.188@gmail.com, kristofferhaugsbakk@fastmail.com,
	me@ttaylorr.com, newren@gmail.com, peff@peff.net, ps@pks.im,
	Taylor Blau <me@ttaylorr.com>, Derrick Stolee <stolee@gmail.com>,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v4 04/13] path-walk: always emit directly-requested objects
Date: Wed, 13 May 2026 21:18:46 +0000	[thread overview]
Message-ID: <e77c8a6bbc22da3428751f81ff5ee79aa5364237.1778707135.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2101.v4.git.1778707135.gitgitgadget@gmail.com>

From: Derrick Stolee <stolee@gmail.com>

We are preparing to integrate the path-walk API with some --filter options
in 'git pack-objects', but there is a subtle issue that is revealed when
those are put together and the test suite is run with
GIT_TEST_PACK_PATH_WALK=1.

When a filter reduces the set of requested objects, this results in
filtering out directly-requested objects, such as in the download of needed
blobs in a blobless partial clone.

The root cause is that the scan of pending objects in the path-walk API
respects the filters set in the path_walk_info instead of overriding them
for pending objects.

We can tell that a path is part of the directly-referenced objects if its
path name starts with '/' (other paths, including root trees never have this
starting character). Create a path_is_for_direct_objects() to make this
meaning clear, especially as we add more references in the future as we
integrate the path-walk API with partial clone filter options.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
 Documentation/technical/api-path-walk.adoc |  7 ++++
 path-walk.c                                | 42 ++++++++++++++--------
 path-walk.h                                |  5 +++
 3 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/Documentation/technical/api-path-walk.adoc b/Documentation/technical/api-path-walk.adoc
index a67de1b143..6e17b13d61 100644
--- a/Documentation/technical/api-path-walk.adoc
+++ b/Documentation/technical/api-path-walk.adoc
@@ -48,6 +48,13 @@ commits.
 	applications could disable some options to make it simpler to walk
 	the objects or to have fewer calls to `path_fn`.
 +
+Note that objects directly requested as pending objects (such as targets
+of lightweight tags or other ref tips) are always emitted to `path_fn`,
+even when the corresponding type flag is disabled. Only objects
+discovered during the tree walk are subject to these type filters. This
+ensures that objects specifically requested through the revision input
+are never silently dropped.
++
 While it is possible to walk only commits in this way, consumers would be
 better off using the revision walk API instead.
 
diff --git a/path-walk.c b/path-walk.c
index 6e426af433..05bfc1c114 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -248,6 +248,17 @@ static int add_tree_entries(struct path_walk_context *ctx,
 	return 0;
 }
 
+/*
+ * Paths starting with '/' (e.g., "/tags", "/tagged-blobs") hold objects that
+ * were directly requested by 'pending' objects rather than discovered during
+ * tree traversal.
+ */
+static int path_is_for_direct_objects(const char *path)
+{
+	ASSERT(path);
+	return path[0] == '/';
+}
+
 /*
  * For each path in paths_to_explore, walk the trees another level
  * and add any found blobs to the batch (but only if they exist and
@@ -306,14 +317,19 @@ static int walk_path(struct path_walk_context *ctx,
 
 	if (list->type == OBJ_BLOB &&
 	    ctx->revs->prune_data.nr &&
+	    !path_is_for_direct_objects(path) &&
 	    !match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
 			   path, strlen(path), 0,
 			   NULL, 0))
 		return 0;
 
-	/* Evaluate function pointer on this data, if requested. */
-	if ((list->type == OBJ_TREE && ctx->info->trees) ||
-	    (list->type == OBJ_BLOB && ctx->info->blobs) ||
+	/*
+	 * Evaluate function pointer on this data, if requested.
+	 * Ignore object type filters for tagged objects (path starts
+	 * with `/`).
+	 */
+	if ((list->type == OBJ_TREE && (ctx->info->trees || path_is_for_direct_objects(path))) ||
+	    (list->type == OBJ_BLOB && (ctx->info->blobs || path_is_for_direct_objects(path))) ||
 	    (list->type == OBJ_TAG && ctx->info->tags))
 		ret = ctx->info->path_fn(path, &list->oids, list->type,
 					ctx->info->path_fn_data);
@@ -374,10 +390,8 @@ static int setup_pending_objects(struct path_walk_info *info,
 
 	if (info->tags)
 		CALLOC_ARRAY(tags, 1);
-	if (info->blobs)
-		CALLOC_ARRAY(tagged_blobs, 1);
-	if (info->trees)
-		root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
+	CALLOC_ARRAY(tagged_blobs, 1);
+	root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
 
 	/*
 	 * Pending objects include:
@@ -421,8 +435,6 @@ static int setup_pending_objects(struct path_walk_info *info,
 
 		switch (obj->type) {
 		case OBJ_TREE:
-			if (!info->trees)
-				continue;
 			if (pending->path) {
 				char *path = *pending->path ? xstrfmt("%s/", pending->path)
 							    : xstrdup("");
@@ -435,8 +447,6 @@ static int setup_pending_objects(struct path_walk_info *info,
 			break;
 
 		case OBJ_BLOB:
-			if (!info->blobs)
-				continue;
 			if (pending->path)
 				add_path_to_list(ctx, pending->path, OBJ_BLOB, &obj->oid, 1);
 			else
@@ -532,15 +542,17 @@ int walk_objects_by_path(struct path_walk_info *info)
 	push_to_stack(&ctx, root_path);
 
 	/*
-	 * Set these values before preparing the walk to catch
-	 * lightweight tags pointing to non-commits and indexed objects.
+	 * Ensure that prepare_revision_walk() keeps all pending objects
+	 * even through an object type filter.
 	 */
-	info->revs->blob_objects = info->blobs;
-	info->revs->tree_objects = info->trees;
+	info->revs->blob_objects = info->revs->tree_objects = 1;
 
 	if (prepare_revision_walk(info->revs))
 		die(_("failed to setup revision walk"));
 
+	info->revs->blob_objects = info->blobs;
+	info->revs->tree_objects = info->trees;
+
 	/*
 	 * Walk trees to mark them as UNINTERESTING.
 	 * This is particularly important when 'edge_aggressive' is set.
diff --git a/path-walk.h b/path-walk.h
index 5ef5a8440e..657eeda8ec 100644
--- a/path-walk.h
+++ b/path-walk.h
@@ -36,6 +36,11 @@ struct path_walk_info {
 	/**
 	 * Initialize which object types the path_fn should be called on. This
 	 * could also limit the walk to skip blobs if not set.
+	 *
+	 * Note: even when 'blobs' or 'trees' is disabled, objects that are
+	 * directly requested as pending objects will still be emitted to
+	 * path_fn. Only objects discovered during the tree walk are filtered by
+	 * these flags.
 	 */
 	int commits;
 	int trees;
-- 
gitgitgadget


  parent reply	other threads:[~2026-05-13 21:19 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-02 14:15 [PATCH 0/7] pack-objects: integrate --path-walk and some --filter options Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 1/7] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-04  0:49   ` Junio C Hamano
2026-05-04 12:01     ` Derrick Stolee
2026-05-02 14:15 ` [PATCH 2/7] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 3/7] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 4/7] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-03 22:59   ` Junio C Hamano
2026-05-04 12:09     ` Derrick Stolee
2026-05-02 14:15 ` [PATCH 5/7] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 6/7] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-02 14:15 ` [PATCH 7/7] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-04 20:21 ` [PATCH v2 00/10] pack-objects: integrate --path-walk and some --filter options Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 01/10] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 02/10] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 03/10] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 04/10] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 05/10] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 06/10] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 07/10] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 08/10] path-walk: support `tree:0` filter Taylor Blau via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 09/10] path-walk: support `object:type` filter Taylor Blau via GitGitGadget
2026-05-04 20:21   ` [PATCH v2 10/10] path-walk: support `combine` filter Taylor Blau via GitGitGadget
2026-05-05 16:18   ` [PATCH v2 00/10] pack-objects: integrate --path-walk and some --filter options Derrick Stolee
2026-05-05 19:01     ` Taylor Blau
2026-05-05 19:44       ` Derrick Stolee
2026-05-05 20:42         ` Taylor Blau
2026-05-07 11:40           ` Derrick Stolee
2026-05-11  3:05         ` Junio C Hamano
2026-05-11 13:58           ` Derrick Stolee
2026-05-11 18:12   ` [PATCH v3 00/12] " Derrick Stolee via GitGitGadget
2026-05-11 18:12     ` [PATCH v3 01/12] t5620: make test work with path-walk var Derrick Stolee via GitGitGadget
2026-05-12  1:03       ` Taylor Blau
2026-05-11 18:12     ` [PATCH v3 02/12] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-12  1:04       ` Taylor Blau
2026-05-11 18:13     ` [PATCH v3 03/12] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-12  1:11       ` Taylor Blau
2026-05-13 18:23         ` Derrick Stolee
2026-05-11 18:13     ` [PATCH v3 04/12] path-walk: always emit directly-requested objects Derrick Stolee via GitGitGadget
2026-05-12  1:23       ` Taylor Blau
2026-05-13 18:29         ` Derrick Stolee
2026-05-11 18:13     ` [PATCH v3 05/12] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-11 18:38       ` Taylor Blau
2026-05-11 19:44         ` Derrick Stolee
2026-05-11 18:13     ` [PATCH v3 06/12] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-12  1:26       ` Taylor Blau
2026-05-11 18:13     ` [PATCH v3 07/12] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-12  1:33       ` Taylor Blau
2026-05-13 18:35         ` Derrick Stolee
2026-05-11 18:13     ` [PATCH v3 08/12] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-11 18:13     ` [PATCH v3 09/12] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-11 18:13     ` [PATCH v3 10/12] path-walk: support `tree:0` filter Taylor Blau via GitGitGadget
2026-05-12  1:41       ` Taylor Blau
2026-05-13 19:46         ` Derrick Stolee
2026-05-11 18:13     ` [PATCH v3 11/12] path-walk: support `object:type` filter Taylor Blau via GitGitGadget
2026-05-11 18:13     ` [PATCH v3 12/12] path-walk: support `combine` filter Taylor Blau via GitGitGadget
2026-05-12  1:43     ` [PATCH v3 00/12] pack-objects: integrate --path-walk and some --filter options Taylor Blau
2026-05-13 21:18     ` [PATCH v4 00/13] " Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 01/13] t5620: make test work with path-walk var Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 02/13] pack-objects: pass --objects with --path-walk Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 03/13] t/perf: add pack-objects filter and path-walk benchmark Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` Derrick Stolee via GitGitGadget [this message]
2026-05-13 21:18       ` [PATCH v4 05/13] path-walk: support blobless filter Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 06/13] backfill: die on incompatible filter options Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 07/13] path-walk: support blob size limit filter Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 08/13] path-walk: add pl_sparse_trees to control tree pruning Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 09/13] pack-objects: support sparse:oid filter with path-walk Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 10/13] t6601: tag otherwise-unreachable trees Derrick Stolee via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 11/13] path-walk: support `tree:0` filter Taylor Blau via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 12/13] path-walk: support `object:type` filter Taylor Blau via GitGitGadget
2026-05-13 21:18       ` [PATCH v4 13/13] path-walk: support `combine` filter Taylor Blau via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e77c8a6bbc22da3428751f81ff5ee79aa5364237.1778707135.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=johncai86@gmail.com \
    --cc=karthik.188@gmail.com \
    --cc=kristofferhaugsbakk@fastmail.com \
    --cc=me@ttaylorr.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.