From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com,
	Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>,
	r.siddharth.shrimali@gmail.com, ps@pks.im,
	Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v2 5/6] path-walk: support wildcard pathspecs for blob filtering
Date: Mon, 23 Mar 2026 11:40:18 +0000
Message-ID: <1168edfb96c739581dd510f86bf1dbf5e28603e6.1774266019.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.2070.v2.git.1774266019.gitgitgadget@gmail.com>

From: Derrick Stolee <stolee@gmail.com>

Previously, walk_objects_by_path() cleared the entire pathspec whenever
it contained wildcards or magic, silently dropping the filter. As a
result, all blobs were downloaded regardless of the given pathspec.
Wildcard pathspecs like "d/file.*.txt" are useful for narrowing which
blobs to process (e.g., during 'git backfill').

Support wildcard pathspecs by making two changes:

 1. Add an 'exact_pathspecs' flag to path_walk_context. When the
    pathspec has no wildcards or magic, set this flag and use the
    existing fast-path prefix matching in add_tree_entries(). When
    wildcards are present, skip that block since prefix matching
    cannot handle glob patterns.

 2. Add a match_pathspec() check in walk_path() to filter out blobs
    whose full path does not match the pathspec. This provides the
    actual blob-level filtering for wildcard pathspecs.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
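The two-stage filtering described above can be sketched in isolation.
This is a minimal illustration, not the code in path-walk.c: the helper
names (is_exact_pathspec, tree_entry_allowed, blob_allowed) are
hypothetical, a single pattern stands in for 'struct pathspec', and
POSIX fnmatch() stands in for match_pathspec(), whose matching rules
differ in detail.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the 'exact_pathspecs' flag: no glob characters. */
static bool is_exact_pathspec(const char *pattern)
{
	assert(pattern != NULL);
	return strpbrk(pattern, "*?[\\") == NULL;
}

/*
 * Tree-level pruning. Only exact pathspecs may prune here; with a
 * wildcard, every tree entry is kept and filtering is deferred to
 * the blob-level check. The prefix test is deliberately loose: it
 * keeps an entry when either string is a prefix of the other.
 */
static bool tree_entry_allowed(const char *pattern, const char *path)
{
	size_t plen, len, n;

	if (!is_exact_pathspec(pattern))
		return true;

	plen = strlen(pattern);
	len = strlen(path);
	n = plen < len ? plen : len;
	return strncmp(pattern, path, n) == 0;
}

/*
 * Blob-level filtering on the full path. Exact pathspecs match the
 * path itself or anything below it as a directory; wildcard
 * pathspecs are handed to fnmatch() (an assumption of this sketch).
 */
static bool blob_allowed(const char *pattern, const char *path)
{
	size_t plen = strlen(pattern);

	if (!is_exact_pathspec(pattern))
		return fnmatch(pattern, path, 0) == 0;

	if (strncmp(pattern, path, plen))
		return false;
	return path[plen] == '\0' || path[plen] == '/' ||
	       (plen > 0 && pattern[plen - 1] == '/');
}
```

In this sketch, a wildcard pattern never prunes at the tree level, so
the blob-level check is the only thing that narrows the result. That
mirrors why the patch adds the per-blob check in walk_path() instead of
extending the prefix logic in add_tree_entries().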
 path-walk.c         | 22 +++++++++++++---------
 t/t5620-backfill.sh |  7 +++----
 2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/path-walk.c b/path-walk.c
index 0d640e2f24..6b83e0e1d5 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -63,6 +63,8 @@ struct path_walk_context {
 	 */
 	struct prio_queue path_stack;
 	struct strset path_stack_pushed;
+
+	unsigned exact_pathspecs:1;
 };
 
 static int compare_by_type(const void *one, const void *two, void *cb_data)
@@ -207,7 +209,7 @@ static int add_tree_entries(struct path_walk_context *ctx,
 				 match != MATCHED)
 				continue;
 		}
-		if (ctx->revs->prune_data.nr) {
+		if (ctx->revs->prune_data.nr && ctx->exact_pathspecs) {
 			struct pathspec *pd = &ctx->revs->prune_data;
 			bool found = false;
 
@@ -303,6 +305,13 @@ static int walk_path(struct path_walk_context *ctx,
 			return 0;
 	}
 
+	if (list->type == OBJ_BLOB &&
+	    ctx->revs->prune_data.nr &&
+	    !match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
+			   path, strlen(path), 0,
+			   NULL, 0))
+		return 0;
+
 	/* Evaluate function pointer on this data, if requested. */
 	if ((list->type == OBJ_TREE && ctx->info->trees) ||
 	    (list->type == OBJ_BLOB && ctx->info->blobs) ||
@@ -511,14 +520,9 @@ int walk_objects_by_path(struct path_walk_info *info)
 		info->revs->tag_objects = 1;
 
 	if (ctx.revs->prune_data.nr) {
-		/*
-		 * Only exact prefix pathspecs are currently supported.
-		 * Clear any wildcard or magic pathspecs to avoid
-		 * incorrect prefix matching.
-		 */
-		if (ctx.revs->prune_data.has_wildcard ||
-		    ctx.revs->prune_data.magic)
-			clear_pathspec(&ctx.revs->prune_data);
+		if (!ctx.revs->prune_data.has_wildcard &&
+		    !ctx.revs->prune_data.magic)
+			ctx.exact_pathspecs = 1;
 	}
 
 	/* Insert a single list for the root tree into the paths. */
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 52f6484ca1..c6f54ee91c 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -307,12 +307,11 @@ test_expect_success 'backfill with wildcard pathspec' '
 	git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
 	test_line_count = 48 missing &&
 
-	# TODO: The wildcard pathspec should limit downloaded blobs,
-	# but currently all blobs are downloaded.
-	git -C backfill-path backfill HEAD -- "d/file.*.txt" &&
+	git -C backfill-path backfill HEAD -- "d/file.*.txt" 2>err &&
+	test_must_be_empty err &&
 
 	git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
-	test_line_count = 0 missing
+	test_line_count = 40 missing
 '
 
 test_expect_success 'backfill with --all' '
-- 
gitgitgadget

