* [PATCH 1/5] revision: include object-name.h
2026-03-17 0:29 [PATCH 0/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
@ 2026-03-17 0:29 ` Derrick Stolee via GitGitGadget
2026-03-17 21:52 ` Junio C Hamano
2026-03-17 0:29 ` [PATCH 2/5] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
` (6 subsequent siblings)
7 siblings, 1 reply; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-17 0:29 UTC (permalink / raw)
To: git; +Cc: gitster, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The REV_INFO_INIT macro includes a use of the DEFAULT_ABBREV macro, which is
defined in object-name.h. Include it in revision.h so consumers of
REV_INFO_INIT do not need to include this hidden dependency.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
revision.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/revision.h b/revision.h
index b36acfc2d9..18c9bbd822 100644
--- a/revision.h
+++ b/revision.h
@@ -4,6 +4,7 @@
#include "commit.h"
#include "grep.h"
#include "notes.h"
+#include "object-name.h"
#include "oidset.h"
#include "pretty.h"
#include "diff.h"
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH 1/5] revision: include object-name.h
2026-03-17 0:29 ` [PATCH 1/5] revision: include object-name.h Derrick Stolee via GitGitGadget
@ 2026-03-17 21:52 ` Junio C Hamano
0 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2026-03-17 21:52 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> The REV_INFO_INIT macro includes a use of the DEFAULT_ABBREV macro, which is
> defined in object-name.h. Include it in revision.h so consumers of
> REV_INFO_INIT do not need to include this hidden dependency.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> revision.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/revision.h b/revision.h
> index b36acfc2d9..18c9bbd822 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -4,6 +4,7 @@
> #include "commit.h"
> #include "grep.h"
> #include "notes.h"
> +#include "object-name.h"
> #include "oidset.h"
> #include "pretty.h"
> #include "diff.h"
OK. Other symbols REV_INFO_INIT needs are REV_SORT_IN_GRAPH_ORDER
(in <commit.h>), CMIT_FMT_DEFAULT (in <pretty.h>), and STRVEC_INIT
(in <strvec.h>), and all three are already included there.
Makes sense.
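
The hidden dependency discussed above can be illustrated with a minimal
sketch. It does not use Git's real headers: WIDGET_INIT, DEFAULT_WIDTH,
and struct widget are hypothetical stand-ins for REV_INFO_INIT,
DEFAULT_ABBREV, and struct rev_info. The point is only that an
initializer macro expands at its use site, so every macro it mentions
must already be defined there.

```c
/* Stand-in for object-name.h (hypothetical): owns the default value. */
#define DEFAULT_WIDTH (-1)

/*
 * Stand-in for revision.h (hypothetical): the initializer refers to
 * DEFAULT_WIDTH, so a consumer can only expand WIDGET_INIT if the
 * definition above is visible at that point.
 */
struct widget {
	int width;
	const char *name;
};
#define WIDGET_INIT { .width = DEFAULT_WIDTH, .name = "" }

/* A consumer that only wants to use the initializer macro. */
int widget_default_width(void)
{
	struct widget w = WIDGET_INIT;

	return w.width;
}
```

If the DEFAULT_WIDTH definition lived in a header the consumer had not
included, the expansion of WIDGET_INIT would fail to compile, which is
the situation the patch avoids by including object-name.h from
revision.h.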
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH 2/5] t5620: prepare branched repo for revision tests
2026-03-17 0:29 [PATCH 0/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-17 0:29 ` [PATCH 1/5] revision: include object-name.h Derrick Stolee via GitGitGadget
@ 2026-03-17 0:29 ` Derrick Stolee via GitGitGadget
2026-03-17 0:29 ` [PATCH 3/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (5 subsequent siblings)
7 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-17 0:29 UTC (permalink / raw)
To: git; +Cc: gitster, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Prepare the test infrastructure for upcoming changes that teach 'git
backfill' to accept revision arguments and pathspecs.
Add test_tick before each commit in the setup loop so that commit dates
are deterministic. This enables reliable testing with '--since'.
Rename the 'd/e/' directory to 'd/f/' so that the prefix 'd/f' is
ambiguous with the files 'd/file.*.txt'. This exercises the subtlety
in prefix pathspec matching that will be added in a later commit.
Create a branched version of the test repository (src-revs) with:
- A 'side' branch merged into main, adding s/file.{1,2}.txt with
two versions (4 new blobs, 52 total from main HEAD).
- An unmerged 'other' branch adding o/file.{1,2}.txt (2 more blobs,
54 total reachable from --all).
This structure makes --all, --first-parent, and --since produce
meaningfully different results when used with 'git backfill'.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/t5620-backfill.sh | 52 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 50 insertions(+), 2 deletions(-)
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 58c81556e7..1331949be4 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -15,7 +15,7 @@ test_expect_success 'setup repo for object creation' '
git init src &&
mkdir -p src/a/b/c &&
- mkdir -p src/d/e &&
+ mkdir -p src/d/f &&
for i in 1 2
do
@@ -26,8 +26,9 @@ test_expect_success 'setup repo for object creation' '
echo "Version $i of file a/b/$n" > src/a/b/file.$n.txt &&
echo "Version $i of file a/b/c/$n" > src/a/b/c/file.$n.txt &&
echo "Version $i of file d/$n" > src/d/file.$n.txt &&
- echo "Version $i of file d/e/$n" > src/d/e/file.$n.txt &&
+ echo "Version $i of file d/f/$n" > src/d/f/file.$n.txt &&
git -C src add . &&
+ test_tick &&
git -C src commit -m "Iteration $n" || return 1
done
done
@@ -41,6 +42,53 @@ test_expect_success 'setup bare clone for server' '
git -C srv.bare config --local uploadpack.allowanysha1inwant 1
'
+# Create a version of the repo with branches for testing revision
+# arguments like --all, --first-parent, and --since.
+#
+# main: 8 commits (linear) + merge of side branch
+# 48 original blobs + 4 side blobs = 52 blobs from main HEAD
+# side: 2 commits adding s/file.{1,2}.txt (v1, v2), merged into main
+# other: 1 commit adding o/file.{1,2}.txt (not merged)
+# 54 total blobs reachable from --all
+test_expect_success 'setup branched repo for revision tests' '
+ git clone src src-revs &&
+
+ # Side branch from tip of main with unique files
+ git -C src-revs checkout -b side HEAD &&
+ mkdir -p src-revs/s &&
+ echo "Side version 1 of file 1" >src-revs/s/file.1.txt &&
+ echo "Side version 1 of file 2" >src-revs/s/file.2.txt &&
+ test_tick &&
+ git -C src-revs add . &&
+ git -C src-revs commit -m "Side commit 1" &&
+
+ echo "Side version 2 of file 1" >src-revs/s/file.1.txt &&
+ echo "Side version 2 of file 2" >src-revs/s/file.2.txt &&
+ test_tick &&
+ git -C src-revs add . &&
+ git -C src-revs commit -m "Side commit 2" &&
+
+ # Merge side into main
+ git -C src-revs checkout main &&
+ test_tick &&
+ git -C src-revs merge side --no-ff -m "Merge side branch" &&
+
+ # Other branch (not merged) for --all testing
+ git -C src-revs checkout -b other main~1 &&
+ mkdir -p src-revs/o &&
+ echo "Other content 1" >src-revs/o/file.1.txt &&
+ echo "Other content 2" >src-revs/o/file.2.txt &&
+ test_tick &&
+ git -C src-revs add . &&
+ git -C src-revs commit -m "Other commit" &&
+
+ git -C src-revs checkout main &&
+
+ git clone --bare "file://$(pwd)/src-revs" srv-revs.bare &&
+ git -C srv-revs.bare config --local uploadpack.allowfilter 1 &&
+ git -C srv-revs.bare config --local uploadpack.allowanysha1inwant 1
+'
+
# do basic partial clone from "srv.bare"
test_expect_success 'do partial clone 1, backfill gets all objects' '
git clone --no-checkout --filter=blob:none \
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH 3/5] backfill: accept revision arguments
2026-03-17 0:29 [PATCH 0/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-17 0:29 ` [PATCH 1/5] revision: include object-name.h Derrick Stolee via GitGitGadget
2026-03-17 0:29 ` [PATCH 2/5] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
@ 2026-03-17 0:29 ` Derrick Stolee via GitGitGadget
2026-03-17 22:01 ` Junio C Hamano
` (2 more replies)
2026-03-17 0:29 ` [PATCH 4/5] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
` (4 subsequent siblings)
7 siblings, 3 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-17 0:29 UTC (permalink / raw)
To: git; +Cc: gitster, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The existing implementation of 'git backfill' only includes downloading
missing blobs reachable from HEAD. Advanced uses may desire more general
commit limiting options, such as '--all' for all references, specifying a
commit range via negative references, or specifying a recency of use such as
with '--since=<date>'.
All of these options are available if we use setup_revisions() to parse the
unknown arguments with the revision machinery. This opens up a large number
of possibilities, only a small set of which are tested here.
For documentation, we avoid duplicating the option documentation and instead
link to the documentation of 'git rev-list'.
Note that these arguments currently allow specifying a pathspec, which
modifies the commit history checks but does not limit the paths used in the
backfill logic. This will be updated in a future change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-backfill.adoc | 3 +
builtin/backfill.c | 19 ++--
t/t5620-backfill.sh | 156 ++++++++++++++++++++++++++++++++
3 files changed, 172 insertions(+), 6 deletions(-)
diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
index b8394dcf22..fdfe22d623 100644
--- a/Documentation/git-backfill.adoc
+++ b/Documentation/git-backfill.adoc
@@ -63,9 +63,12 @@ OPTIONS
current sparse-checkout. If the sparse-checkout feature is enabled,
then `--sparse` is assumed and can be disabled with `--no-sparse`.
+You may also specify the commit limiting options from linkgit:git-rev-list[1].
+
SEE ALSO
--------
linkgit:git-clone[1].
+linkgit:git-rev-list[1].
GIT
---
diff --git a/builtin/backfill.c b/builtin/backfill.c
index e80fc1b694..1b5595b27c 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -35,6 +35,7 @@ struct backfill_context {
struct oid_array current_batch;
size_t min_batch_size;
int sparse;
+ struct rev_info revs;
};
static void backfill_context_clear(struct backfill_context *ctx)
@@ -80,7 +81,6 @@ static int fill_missing_blobs(const char *path UNUSED,
static int do_backfill(struct backfill_context *ctx)
{
- struct rev_info revs;
struct path_walk_info info = PATH_WALK_INFO_INIT;
int ret;
@@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
}
}
- repo_init_revisions(ctx->repo, &revs, "");
- handle_revision_arg("HEAD", &revs, 0, 0);
+ /* Walk from HEAD if otherwise unspecified. */
+ if (!ctx->revs.pending.nr)
+ handle_revision_arg("HEAD", &ctx->revs, 0, 0);
info.blobs = 1;
info.tags = info.commits = info.trees = 0;
- info.revs = &revs;
+ info.revs = &ctx->revs;
info.path_fn = fill_missing_blobs;
info.path_fn_data = ctx;
@@ -109,7 +110,6 @@ static int do_backfill(struct backfill_context *ctx)
download_batch(ctx);
path_walk_info_clear(&info);
- release_revisions(&revs);
return ret;
}
@@ -121,6 +121,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
.current_batch = OID_ARRAY_INIT,
.min_batch_size = 50000,
.sparse = 0,
+ .revs = REV_INFO_INIT,
};
struct option options[] = {
OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
@@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
builtin_backfill_usage, options);
argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
- 0);
+ PARSE_OPT_KEEP_UNKNOWN_OPT |
+ PARSE_OPT_KEEP_ARGV0 |
+ PARSE_OPT_KEEP_DASHDASH);
+
+ repo_init_revisions(repo, &ctx.revs, prefix);
+ argc = setup_revisions(argc, argv, &ctx.revs, NULL);
repo_config(repo, git_default_config, NULL);
@@ -143,5 +149,6 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
result = do_backfill(&ctx);
backfill_context_clear(&ctx);
+ release_revisions(&ctx.revs);
return result;
}
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 1331949be4..db66d8b614 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -224,6 +224,162 @@ test_expect_success 'backfill --sparse without cone mode (negative)' '
test_line_count = 12 missing
'
+test_expect_success 'backfill with revision range' '
+ test_when_finished rm -rf backfill-revs &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-revs &&
+
+ # No blobs yet
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ git -C backfill-revs backfill HEAD~2..HEAD &&
+
+ # 30 objects downloaded.
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 18 missing
+'
+
+test_expect_success 'backfill with revisions over stdin' '
+ test_when_finished rm -rf backfill-revs &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-revs &&
+
+ # No blobs yet
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ cat >in <<-EOF &&
+ HEAD
+ ^HEAD~2
+ EOF
+
+ git -C backfill-revs backfill --stdin <in &&
+
+ # 30 objects downloaded.
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 18 missing
+'
+
+test_expect_success 'backfill with prefix pathspec' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The pathspec should limit the downloaded blobs to
+ # only those matching the prefix "d/f", but currently all
+ # blobs are downloaded.
+ git -C backfill-path backfill HEAD -- d/f &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with multiple pathspecs' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The pathspecs should limit the downloaded blobs to
+ # only those matching "d/f" or "a", but currently all blobs
+ # are downloaded.
+ git -C backfill-path backfill HEAD -- d/f a &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with wildcard pathspec' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The wildcard pathspec should limit downloaded blobs,
+ # but currently all blobs are downloaded.
+ git -C backfill-path backfill HEAD -- "d/file.*.txt" &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with --all' '
+ test_when_finished rm -rf backfill-all &&
+ git clone --no-checkout --filter=blob:none \
+ "file://$(pwd)/srv-revs.bare" backfill-all &&
+
+ # All blobs from all refs are missing
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 54 missing &&
+
+ # Backfill from HEAD gets main blobs only
+ git -C backfill-all backfill HEAD &&
+
+ # Other branch blobs still missing
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 2 missing &&
+
+ # Backfill with --all gets everything
+ git -C backfill-all backfill --all &&
+
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with --first-parent' '
+ test_when_finished rm -rf backfill-fp &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv-revs.bare" backfill-fp &&
+
+ git -C backfill-fp rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 52 missing &&
+
+ # --first-parent skips the side branch commits, so
+ # s/file.{1,2}.txt v1 blobs (only in side commit 1) are missed.
+ git -C backfill-fp backfill --first-parent HEAD &&
+
+ git -C backfill-fp rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 2 missing
+'
+
+test_expect_success 'backfill with --since' '
+ test_when_finished rm -rf backfill-since &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv-revs.bare" backfill-since &&
+
+ git -C backfill-since rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 52 missing &&
+
+ # Use a cutoff between commits 4 and 5 (between v1 and v2
+ # iterations). Commits 5-8 still carry v1 of files 2-4 in
+ # their trees, but v1 of file.1.txt is only in commits 1-4.
+ SINCE=$(git -C backfill-since log --first-parent --reverse \
+ --format=%ct HEAD~1 | sed -n 5p) &&
+ git -C backfill-since backfill --since="@$((SINCE - 1))" HEAD &&
+
+ # 6 missing: v1 of file.1.txt in all 6 directories
+ git -C backfill-since rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 6 missing
+'
+
. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH 3/5] backfill: accept revision arguments
2026-03-17 0:29 ` [PATCH 3/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
@ 2026-03-17 22:01 ` Junio C Hamano
2026-03-18 15:37 ` Kristoffer Haugsbakk
2026-03-19 9:54 ` Patrick Steinhardt
2 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2026-03-17 22:01 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> - repo_init_revisions(ctx->repo, &revs, "");
> - handle_revision_arg("HEAD", &revs, 0, 0);
So we used to "cheat" and did an initialization without even knowing
in which directory we were started ...
> + /* Walk from HEAD if otherwise unspecified. */
> + if (!ctx->revs.pending.nr)
> + handle_revision_arg("HEAD", &ctx->revs, 0, 0);
... but by initializing the revs correctly in the caller, we would
be correcting it. Looking good.
> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
> builtin_backfill_usage, options);
>
> argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
> - 0);
> + PARSE_OPT_KEEP_UNKNOWN_OPT |
> + PARSE_OPT_KEEP_ARGV0 |
> + PARSE_OPT_KEEP_DASHDASH);
> +
> + repo_init_revisions(repo, &ctx.revs, prefix);
> + argc = setup_revisions(argc, argv, &ctx.revs, NULL);
OK.
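
The two-stage parsing in the quoted hunk can be sketched in isolation.
This is not Git's parse_options() implementation; parse_known() and
struct known_opts are hypothetical names, and the sketch only models
the combined effect of the three flags as used here: options the
command owns are consumed, while argv[0], unknown arguments, and "--"
with everything after it are left in place for a second parser (the
role setup_revisions() plays in the patch).

```c
#include <stdlib.h>
#include <string.h>

/* Options owned by the command itself (hypothetical subset). */
struct known_opts {
	int sparse;
	long min_batch_size;
};

/*
 * First pass: consume the options this command understands and compact
 * argv in place so that argv[0] and every unrecognized argument
 * survive in their original order. Returns the new argc.
 */
int parse_known(int argc, const char **argv, struct known_opts *opts)
{
	static const char batch[] = "--min-batch-size=";
	size_t batch_len = strlen(batch);
	int out = 1; /* argv[0] is kept */
	int i;

	for (i = 1; i < argc; i++) {
		if (!strcmp(argv[i], "--")) {
			/* Keep "--" and everything after it untouched. */
			while (i < argc)
				argv[out++] = argv[i++];
			break;
		}
		if (!strcmp(argv[i], "--sparse"))
			opts->sparse = 1;
		else if (!strncmp(argv[i], batch, batch_len))
			opts->min_batch_size = strtol(argv[i] + batch_len, NULL, 10);
		else
			argv[out++] = argv[i]; /* unknown: leave for pass two */
	}
	return out;
}
```

With the arguments "backfill --sparse --all --min-batch-size=10 HEAD --
--sparse d/f", the first pass would leave "backfill --all HEAD --
--sparse d/f" for the second parser; the "--sparse" after "--" is
preserved because it is a path, not an option.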
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/5] backfill: accept revision arguments
2026-03-17 0:29 ` [PATCH 3/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-17 22:01 ` Junio C Hamano
@ 2026-03-18 15:37 ` Kristoffer Haugsbakk
2026-03-23 0:31 ` Derrick Stolee
2026-03-19 9:54 ` Patrick Steinhardt
2 siblings, 1 reply; 46+ messages in thread
From: Kristoffer Haugsbakk @ 2026-03-18 15:37 UTC (permalink / raw)
To: Koji Nakamaru, git; +Cc: Junio C Hamano, Derrick Stolee
On Tue, Mar 17, 2026, at 01:29, Derrick Stolee via GitGitGadget wrote:
>[snip]
> diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
> index b8394dcf22..fdfe22d623 100644
> --- a/Documentation/git-backfill.adoc
> +++ b/Documentation/git-backfill.adoc
> @@ -63,9 +63,12 @@ OPTIONS
> current sparse-checkout. If the sparse-checkout feature is enabled,
> then `--sparse` is assumed and can be disabled with `--no-sparse`.
>
> +You may also specify the commit limiting options from linkgit:git-rev-list[1].
> +
> SEE ALSO
> --------
> linkgit:git-clone[1].
> +linkgit:git-rev-list[1].
Should there be a comma between these two?
>[snip]
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/5] backfill: accept revision arguments
2026-03-18 15:37 ` Kristoffer Haugsbakk
@ 2026-03-23 0:31 ` Derrick Stolee
0 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee @ 2026-03-23 0:31 UTC (permalink / raw)
To: Kristoffer Haugsbakk, Koji Nakamaru, git; +Cc: Junio C Hamano
On 3/18/26 11:37 AM, Kristoffer Haugsbakk wrote:
> On Tue, Mar 17, 2026, at 01:29, Derrick Stolee via GitGitGadget wrote:
>> [snip]
>> diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
>> index b8394dcf22..fdfe22d623 100644
>> --- a/Documentation/git-backfill.adoc
>> +++ b/Documentation/git-backfill.adoc
>> @@ -63,9 +63,12 @@ OPTIONS
>> current sparse-checkout. If the sparse-checkout feature is enabled,
>> then `--sparse` is assumed and can be disabled with `--no-sparse`.
>>
>> +You may also specify the commit limiting options from linkgit:git-rev-list[1].
>> +
>> SEE ALSO
>> --------
>> linkgit:git-clone[1].
>> +linkgit:git-rev-list[1].
>
> Should there be a comma between these two?
Good catch. Also there shouldn't be a hard stop, either.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/5] backfill: accept revision arguments
2026-03-17 0:29 ` [PATCH 3/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-17 22:01 ` Junio C Hamano
2026-03-18 15:37 ` Kristoffer Haugsbakk
@ 2026-03-19 9:54 ` Patrick Steinhardt
2026-03-23 0:35 ` Derrick Stolee
2 siblings, 1 reply; 46+ messages in thread
From: Patrick Steinhardt @ 2026-03-19 9:54 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee
On Tue, Mar 17, 2026 at 12:29:19AM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
> The existing implementation of 'git backfill' only includes downloading
> missing blobs reachable from HEAD. Advanced uses may desire more general
> commit limiting options, such as '--all' for all references, specifying a
> commit range via negative references, or specifying a recency of use such as
> with '--since=<date>'.
>
> All of these options are available if we use setup_revisions() to parse the
> unknown arguments with the revision machinery. This opens up a large number
> of possibilities, only a small set of which are tested here.
>
> For documentation, we avoid duplicating the option documentation and instead
> link to the documentation of 'git rev-list'.
>
> Note that these arguments currently allow specifying a pathspec, which
> modifies the commit history checks but does not limit the paths used in the
> backfill logic. This will be updated in a future change.
Makes me wonder whether reversing the order would have avoided this
slight awkwardness. But let's just stick with the current order, the end
result would be the same anyway.
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> Documentation/git-backfill.adoc | 3 +
> builtin/backfill.c | 19 ++--
> t/t5620-backfill.sh | 156 ++++++++++++++++++++++++++++++++
> 3 files changed, 172 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index e80fc1b694..1b5595b27c 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
> }
> }
>
> - repo_init_revisions(ctx->repo, &revs, "");
> - handle_revision_arg("HEAD", &revs, 0, 0);
> + /* Walk from HEAD if otherwise unspecified. */
> + if (!ctx->revs.pending.nr)
> + handle_revision_arg("HEAD", &ctx->revs, 0, 0);
Can we use `add_head_to_pending(&ctx->revs)` instead?
Patrick
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 3/5] backfill: accept revision arguments
2026-03-19 9:54 ` Patrick Steinhardt
@ 2026-03-23 0:35 ` Derrick Stolee
0 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee @ 2026-03-23 0:35 UTC (permalink / raw)
To: Patrick Steinhardt, Derrick Stolee via GitGitGadget; +Cc: git, gitster
On 3/19/26 5:54 AM, Patrick Steinhardt wrote:
> On Tue, Mar 17, 2026 at 12:29:19AM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The existing implementation of 'git backfill' only includes downloading
>> missing blobs reachable from HEAD. Advanced uses may desire more general
>> commit limiting options, such as '--all' for all references, specifying a
>> commit range via negative references, or specifying a recency of use such as
>> with '--since=<date>'.
>>
>> All of these options are available if we use setup_revisions() to parse the
>> unknown arguments with the revision machinery. This opens up a large number
>> of possibilities, only a small set of which are tested here.
>>
>> For documentation, we avoid duplicating the option documentation and instead
>> link to the documentation of 'git rev-list'.
>>
>> Note that these arguments currently allow specifying a pathspec, which
>> modifies the commit history checks but does not limit the paths used in the
>> backfill logic. This will be updated in a future change.
>
> Makes me wonder whether reversing the order would have avoided this
> slight awkwardness. But let's just stick with the current order, the end
> result would be the same anyway.
True, we could have added the pathspec logic first, but we wouldn't be able
to test it right away because the parsing comes through the rev-list.
>> Signed-off-by: Derrick Stolee <stolee@gmail.com>
>> ---
>> Documentation/git-backfill.adoc | 3 +
>> builtin/backfill.c | 19 ++--
>> t/t5620-backfill.sh | 156 ++++++++++++++++++++++++++++++++
>> 3 files changed, 172 insertions(+), 6 deletions(-)
>>
>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>> index e80fc1b694..1b5595b27c 100644
>> --- a/builtin/backfill.c
>> +++ b/builtin/backfill.c
>> @@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
>> }
>> }
>>
>> - repo_init_revisions(ctx->repo, &revs, "");
>> - handle_revision_arg("HEAD", &revs, 0, 0);
>> + /* Walk from HEAD if otherwise unspecified. */
>> + if (!ctx->revs.pending.nr)
>> + handle_revision_arg("HEAD", &ctx->revs, 0, 0);
>
> Can we use `add_head_to_pending(&ctx->revs)` instead?
Nice. We absolutely can and should.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH 4/5] backfill: work with prefix pathspecs
2026-03-17 0:29 [PATCH 0/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2026-03-17 0:29 ` [PATCH 3/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
@ 2026-03-17 0:29 ` Derrick Stolee via GitGitGadget
2026-03-17 22:10 ` Junio C Hamano
` (2 more replies)
2026-03-17 0:29 ` [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
7 siblings, 3 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-17 0:29 UTC (permalink / raw)
To: git; +Cc: gitster, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The previous change allowed specifying revision arguments over the 'git
backfill' command-line. This created the opportunity for pathspecs that
specify a smaller set of starting commits, but otherwise did not restrict
the blob paths that were downloaded.
Update the path-walk API to accept certain kinds of pathspecs and to
silently ignore anything too complex. The current behavior focuses on
pathspecs that match paths exactly. This includes exact filenames,
including directory names as prefixes. Pathspecs containing wildcards
or magic are cleared so the path walk downloads all blobs, as before.
The reason for this restriction is to allow for a faster execution by
pruning the path walk to only trees that could contribute towards one of
those paths as a parent directory.
The test directory 'd/f/' (next to 'd/file*.txt') was prepared in a
previous commit to exercise the subtlety in prefix matching.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
path-walk.c | 55 +++++++++++++++++++++++++++++++++++++++++++++
t/t5620-backfill.sh | 16 +++++--------
2 files changed, 61 insertions(+), 10 deletions(-)
diff --git a/path-walk.c b/path-walk.c
index 364e4cfa19..e1ad4b0208 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -206,6 +206,49 @@ static int add_tree_entries(struct path_walk_context *ctx,
match != MATCHED)
continue;
}
+ if (ctx->revs->prune_data.nr) {
+ struct pathspec *pd = &ctx->revs->prune_data;
+ bool found = false;
+
+ for (int i = 0; i < pd->nr; i++) {
+ struct pathspec_item *item = &pd->items[i];
+
+ /*
+ * Is this path a parent directory of
+ * the pathspec item?
+ */
+ if (path.len < (size_t)item->len &&
+ !strncmp(path.buf, item->match, path.len) &&
+ item->match[path.len - 1] == '/') {
+ found = true;
+ break;
+ }
+
+ /*
+ * Or, is the pathspec an exact match?
+ */
+ if (path.len == (size_t)item->len &&
+ !strcmp(path.buf, item->match)) {
+ found = true;
+ break;
+ }
+
+ /*
+ * Or, is the pathspec a directory prefix
+ * match?
+ */
+ if (path.len > (size_t)item->len &&
+ !strncmp(path.buf, item->match, item->len) &&
+ path.buf[item->len] == '/') {
+ found = true;
+ break;
+ }
+ }
+
+ /* Skip paths that do not match the prefix. */
+ if (!found)
+ continue;
+ }
add_path_to_list(ctx, path.buf, type, &entry.oid,
!(o->flags & UNINTERESTING));
@@ -481,6 +524,18 @@ int walk_objects_by_path(struct path_walk_info *info)
if (info->tags)
info->revs->tag_objects = 1;
+ if (ctx.revs->prune_data.nr) {
+ /*
+ * Only exact prefix pathspecs are currently supported.
+ * Clear any wildcard or magic pathspecs to avoid
+ * incorrect prefix matching.
+ */
+ struct pathspec *pd = &ctx.revs->prune_data;
+
+ if (pd->has_wildcard || pd->magic)
+ pd->nr = 0;
+ }
+
/* Insert a single list for the root tree into the paths. */
CALLOC_ARRAY(root_tree_list, 1);
root_tree_list->type = OBJ_TREE;
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index db66d8b614..52f6484ca1 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -273,13 +273,11 @@ test_expect_success 'backfill with prefix pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # TODO: The pathspec should limit the downloaded blobs to
- # only those matching the prefix "d/f", but currently all
- # blobs are downloaded.
- git -C backfill-path backfill HEAD -- d/f &&
+ git -C backfill-path backfill HEAD -- d/f 2>err &&
+ test_must_be_empty err &&
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
- test_line_count = 0 missing
+ test_line_count = 40 missing
'
test_expect_success 'backfill with multiple pathspecs' '
@@ -292,13 +290,11 @@ test_expect_success 'backfill with multiple pathspecs' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # TODO: The pathspecs should limit the downloaded blobs to
- # only those matching "d/f" or "a", but currently all blobs
- # are downloaded.
- git -C backfill-path backfill HEAD -- d/f a &&
+ git -C backfill-path backfill HEAD -- d/f a 2>err &&
+ test_must_be_empty err &&
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
- test_line_count = 0 missing
+ test_line_count = 16 missing
'
test_expect_success 'backfill with wildcard pathspec' '
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH 4/5] backfill: work with prefix pathspecs
2026-03-17 0:29 ` [PATCH 4/5] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
@ 2026-03-17 22:10 ` Junio C Hamano
2026-03-18 13:15 ` Derrick Stolee
2026-03-19 9:55 ` Patrick Steinhardt
2026-03-19 10:15 ` Patrick Steinhardt
2 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2026-03-17 22:10 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> The previous change allowed specifying revision arguments over the 'git
> backfill' command-line. This created the opportunity for pathspecs that
> specify a smaller set of starting commits, but otherwise did not restrict
> the blob paths that were downloaded.
"pathspecs that specify a smaller set of starting commits" is
puzzling, as starting commits would be coming from the revision
arguments. "opportunity for pathspec to further filter commits
to those that touch only the matching paths...", or something?
> Update the path-walk API to accept certain kinds of pathspecs and to
> silently ignore anything too complex.
Hmph, "silently ignore", instead of "no, you cannot use that! and
die", or at least "sorry, I cannot do that, so the result may not be
what you wanted, you've been warned"?
> The current behavior focuses on
> pathspecs that match paths exactly. This includes exact filenames,
> including directory names as prefixes. Pathspecs containing wildcards
> or magic are cleared so the path walk downloads all blobs, as before.
Ah, "we punt and lift the limitation to grab everything, so at least
everything you wanted to have will become available to you, even
though we may download more than what you asked"? OK, users would
survive that, and as we improve the pathspec support, the user
experience would only improve. OK.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 4/5] backfill: work with prefix pathspecs
2026-03-17 22:10 ` Junio C Hamano
@ 2026-03-18 13:15 ` Derrick Stolee
2026-03-19 9:54 ` Patrick Steinhardt
0 siblings, 1 reply; 46+ messages in thread
From: Derrick Stolee @ 2026-03-18 13:15 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git
On 3/17/2026 6:10 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> The previous change allowed specifying revision arguments over the 'git
>> backfill' command-line. This created the opportunity for pathspecs that
>> specify a smaller set of starting commits, but otherwise did not restrict
>> the blob paths that were downloaded.
>
> "pathspecs that specify a smaller set of starting commits" is
> puzzling, as starting commits would be coming from the revision
> arguments. "opportunity for pathspec to further filter commits
> to those that touch only the matching paths...", or something?
You're right. I'm using "starting commits" incorrectly. My view was too
focused on how the path-walk API starts from the commits output by the
revision walk to get a list of root trees and then walks by path from
that point.
I'll reword to make this more clear.
>> Update the path-walk API to accept certain kinds of pathspecs and to
>> silently ignore anything too complex.
>
> Hmph, "silently ignore", instead of "no, you cannot use that! and
> die", or at least "sorry, I cannot do that, so the result may not be
> what you wanted, you've been warned"?
The behavior when silently ignoring is to over-download. The revision
walk still filters commits, but the path-walk then walks paths beyond
that pathspec. This will be fixed in the next commit, so adding an
error case didn't seem worth it. I'll do a better job foreshadowing.
>> The current behavior focuses on
>> pathspecs that match paths exactly. This includes exact filenames,
>> including directory names as prefixes. Pathspecs containing wildcards
>> or magic are cleared so the path walk downloads all blobs, as before.
>
> Ah, "we punt and lift the limitation to grab everything, so at least
> everything you wanted to have will become available to you, even
> though we may download more than what you asked"? OK, users would
> survive that, and as we improve the pathspec support, the user
> experience would only improve. OK.
Exactly. I can word things better.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 4/5] backfill: work with prefix pathspecs
2026-03-18 13:15 ` Derrick Stolee
@ 2026-03-19 9:54 ` Patrick Steinhardt
0 siblings, 0 replies; 46+ messages in thread
From: Patrick Steinhardt @ 2026-03-19 9:54 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git
On Wed, Mar 18, 2026 at 09:15:00AM -0400, Derrick Stolee wrote:
> On 3/17/2026 6:10 PM, Junio C Hamano wrote:
> >> Update the path-walk API to accept certain kinds of pathspecs and to
> >> silently ignore anything too complex.
> >
> > Hmph, "silently ignore", instead of "no, you cannot use that! and
> > die", or at least "sorry, I cannot do that, so the result may not be
> > what you wanted, you've been warned"?
>
> The behavior when silently ignoring is to over-download. The revision
> walk still filters commits, but the path-walk then walks paths beyond
> that pathspec. This will be fixed in the next commit, so adding an
> error case didn't seem worth it. I'll do a better job foreshadowing.
I guess this is a fine tradeoff when documented properly. But I think in
that case we should make very clear that this behaviour may change in
the future if we find a way to efficiently limit the pathwalk, too.
Patrick
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 4/5] backfill: work with prefix pathspecs
2026-03-17 0:29 ` [PATCH 4/5] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
2026-03-17 22:10 ` Junio C Hamano
@ 2026-03-19 9:55 ` Patrick Steinhardt
2026-03-19 10:15 ` Patrick Steinhardt
2 siblings, 0 replies; 46+ messages in thread
From: Patrick Steinhardt @ 2026-03-19 9:55 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee
On Tue, Mar 17, 2026 at 12:29:20AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/path-walk.c b/path-walk.c
> index 364e4cfa19..e1ad4b0208 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -481,6 +524,18 @@ int walk_objects_by_path(struct path_walk_info *info)
> if (info->tags)
> info->revs->tag_objects = 1;
>
> + if (ctx.revs->prune_data.nr) {
> + /*
> + * Only exact prefix pathspecs are currently supported.
> + * Clear any wildcard or magic pathspecs to avoid
> + * incorrect prefix matching.
> + */
> + struct pathspec *pd = &ctx.revs->prune_data;
> +
> + if (pd->has_wildcard || pd->magic)
> + pd->nr = 0;
> + }
Huh, curious. Won't this cause a leak? I guess we should rather use
`clear_pathspec()` here.
Also shows that this path is missing test coverage.
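To illustrate the concern, here is a minimal sketch using toy types. The names `toy_item`, `toy_pathspec`, `toy_pathspec_add()` and `toy_pathspec_clear()` are hypothetical stand-ins and are not Git's actual `struct pathspec` or `clear_pathspec()`; the sketch only assumes that each item owns heap memory, so resetting the count alone would leave that memory allocated.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for a pathspec container; all names are hypothetical. */
struct toy_item {
	char *match;
};

struct toy_pathspec {
	int nr;
	struct toy_item *items;
};

/* Append a heap-allocated copy of "match"; returns 0 on success, -1 on failure. */
int toy_pathspec_add(struct toy_pathspec *ps, const char *match)
{
	size_t len = strlen(match) + 1;
	struct toy_item *grown;
	char *copy;

	grown = realloc(ps->items, (ps->nr + 1) * sizeof(*grown));
	if (!grown)
		return -1;
	ps->items = grown;

	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, match, len);
	ps->items[ps->nr++].match = copy;
	return 0;
}

/*
 * Release every item and the array itself. Writing only "ps->nr = 0"
 * would hide the items from later loops while leaving them allocated.
 */
void toy_pathspec_clear(struct toy_pathspec *ps)
{
	for (int i = 0; i < ps->nr; i++)
		free(ps->items[i].match);
	free(ps->items);
	ps->items = NULL;
	ps->nr = 0;
}
```

In this toy model, a dedicated clear function keeps the "forget the pathspec" step and the "release its memory" step together, which is the shape of the suggestion above.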
Patrick
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 4/5] backfill: work with prefix pathspecs
2026-03-17 0:29 ` [PATCH 4/5] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
2026-03-17 22:10 ` Junio C Hamano
2026-03-19 9:55 ` Patrick Steinhardt
@ 2026-03-19 10:15 ` Patrick Steinhardt
2026-03-23 0:47 ` Derrick Stolee
2 siblings, 1 reply; 46+ messages in thread
From: Patrick Steinhardt @ 2026-03-19 10:15 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee
On Tue, Mar 17, 2026 at 12:29:20AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/path-walk.c b/path-walk.c
> index 364e4cfa19..e1ad4b0208 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -206,6 +206,49 @@ static int add_tree_entries(struct path_walk_context *ctx,
> match != MATCHED)
> continue;
> }
> + if (ctx->revs->prune_data.nr) {
> + struct pathspec *pd = &ctx->revs->prune_data;
> + bool found = false;
> +
> + for (int i = 0; i < pd->nr; i++) {
> + struct pathspec_item *item = &pd->items[i];
> +
> + /*
> + * Is this path a parent directory of
> + * the pathspec item?
> + */
> + if (path.len < (size_t)item->len &&
> + !strncmp(path.buf, item->match, path.len) &&
> + item->match[path.len - 1] == '/') {
> + found = true;
> + break;
> + }
> +
> + /*
> + * Or, is the pathspec an exact match?
> + */
> + if (path.len == (size_t)item->len &&
> + !strcmp(path.buf, item->match)) {
> + found = true;
> + break;
> + }
> +
> + /*
> + * Or, is the pathspec a directory prefix
> + * match?
> + */
> + if (path.len > (size_t)item->len &&
> + !strncmp(path.buf, item->match, item->len) &&
> + path.buf[item->len] == '/') {
> + found = true;
> + break;
> + }
Ah, one more thing: we could expose `dir_prefix()` from "path.c" and
reuse it here.
Patrick
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 4/5] backfill: work with prefix pathspecs
2026-03-19 10:15 ` Patrick Steinhardt
@ 2026-03-23 0:47 ` Derrick Stolee
0 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee @ 2026-03-23 0:47 UTC (permalink / raw)
To: Patrick Steinhardt, Derrick Stolee via GitGitGadget; +Cc: git, gitster
On 3/19/26 6:15 AM, Patrick Steinhardt wrote:
> Ah, one more thing: we could expose `dir_prefix()` from "path.c" and
> reuse it here.
Good idea. This becomes
/*
* Continue if either is a directory prefix
* of the other.
*/
if (dir_prefix(path.buf, item->match) ||
dir_prefix(item->match, path.buf)) {
found = true;
break;
}
With the idea that we need to walk the parents of each prefix in
addition to walking all of their children.
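The two-way check can be sketched in isolation as follows. This is only an illustration: `is_dir_prefix()` is a hypothetical stand-in whose semantics (equal paths, or a leading-directory match, with an optional trailing slash on the prefix) are assumed here and may differ from the real `dir_prefix()` in "path.c"; `path_is_relevant()` is likewise an invented name.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for a directory-prefix test: true when "prefix"
 * equals "path" or names one of its leading directories. A trailing
 * slash on "prefix" (as in "d/") is accepted.
 */
static bool is_dir_prefix(const char *prefix, const char *path)
{
	size_t len = strlen(prefix);

	if (strncmp(path, prefix, len))
		return false;
	if (len == 0 || prefix[len - 1] == '/')
		return true;
	return path[len] == '\0' || path[len] == '/';
}

/*
 * A walked path is worth visiting if it is a parent of the pathspec
 * (we must descend through it) or lies at or below the pathspec
 * (it is part of the requested subtree).
 */
bool path_is_relevant(const char *path, const char *spec)
{
	return is_dir_prefix(path, spec) || is_dir_prefix(spec, path);
}
```

Under these assumptions, a pathspec of "d/f" keeps the parent tree "d/" and the child "d/f/a.txt", but rejects the sibling "d/file.txt" because the match must end on a directory boundary.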
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering
2026-03-17 0:29 [PATCH 0/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2026-03-17 0:29 ` [PATCH 4/5] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
@ 2026-03-17 0:29 ` Derrick Stolee via GitGitGadget
2026-03-17 22:19 ` Junio C Hamano
2026-03-17 21:45 ` [PATCH 0/5] backfill: accept revision arguments Junio C Hamano
` (2 subsequent siblings)
7 siblings, 1 reply; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-17 0:29 UTC (permalink / raw)
To: git; +Cc: gitster, Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Previously, walk_objects_by_path() silently ignored pathspecs containing
wildcards or magic by clearing them. This caused all blobs to be
downloaded regardless of the given pathspec. Wildcard pathspecs like
"d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
during 'git backfill').
Support wildcard pathspecs by making three changes:
1. Add an 'exact_pathspecs' flag to path_walk_context. When the
pathspec has no wildcards or magic, set this flag and use the
existing fast-path prefix matching in add_tree_entries(). When
wildcards are present, skip that block since prefix matching
cannot handle glob patterns.
2. Disable revision-level commit pruning (revs->prune = 0) for
wildcard pathspecs. The revision walk uses the pathspec to filter
commits via TREESAME detection. For exact prefix pathspecs this
works well, but wildcard pathspecs may fail to match through
TREESAME because fnmatch with WM_PATHNAME does not cross directory
boundaries. Disabling pruning ensures all commits are visited and
their trees are available for the path-walk to filter.
3. Add a match_pathspec() check in walk_path() to filter out blobs
whose full path does not match the pathspec. This provides the
actual blob-level filtering for wildcard pathspecs.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
path-walk.c | 22 ++++++++++++++--------
t/t5620-backfill.sh | 7 +++----
2 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/path-walk.c b/path-walk.c
index e1ad4b0208..67fb0f7572 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -62,6 +62,8 @@ struct path_walk_context {
*/
struct prio_queue path_stack;
struct strset path_stack_pushed;
+
+ unsigned exact_pathspecs:1;
};
static int compare_by_type(const void *one, const void *two, void *cb_data)
@@ -206,7 +208,7 @@ static int add_tree_entries(struct path_walk_context *ctx,
match != MATCHED)
continue;
}
- if (ctx->revs->prune_data.nr) {
+ if (ctx->revs->prune_data.nr && ctx->exact_pathspecs) {
struct pathspec *pd = &ctx->revs->prune_data;
bool found = false;
@@ -317,6 +319,13 @@ static int walk_path(struct path_walk_context *ctx,
return 0;
}
+ if (list->type == OBJ_BLOB &&
+ ctx->revs->prune_data.nr &&
+ !match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
+ path, strlen(path), 0,
+ NULL, 0))
+ return 0;
+
/* Evaluate function pointer on this data, if requested. */
if ((list->type == OBJ_TREE && ctx->info->trees) ||
(list->type == OBJ_BLOB && ctx->info->blobs) ||
@@ -525,15 +534,12 @@ int walk_objects_by_path(struct path_walk_info *info)
info->revs->tag_objects = 1;
if (ctx.revs->prune_data.nr) {
- /*
- * Only exact prefix pathspecs are currently supported.
- * Clear any wildcard or magic pathspecs to avoid
- * incorrect prefix matching.
- */
struct pathspec *pd = &ctx.revs->prune_data;
- if (pd->has_wildcard || pd->magic)
- pd->nr = 0;
+ if (!pd->has_wildcard && !pd->magic)
+ ctx.exact_pathspecs = 1;
+ else
+ ctx.revs->prune = 0;
}
/* Insert a single list for the root tree into the paths. */
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 52f6484ca1..c6f54ee91c 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -307,12 +307,11 @@ test_expect_success 'backfill with wildcard pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # TODO: The wildcard pathspec should limit downloaded blobs,
- # but currently all blobs are downloaded.
- git -C backfill-path backfill HEAD -- "d/file.*.txt" &&
+ git -C backfill-path backfill HEAD -- "d/file.*.txt" 2>err &&
+ test_must_be_empty err &&
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
- test_line_count = 0 missing
+ test_line_count = 40 missing
'
test_expect_success 'backfill with --all' '
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering
2026-03-17 0:29 ` [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
@ 2026-03-17 22:19 ` Junio C Hamano
2026-03-18 13:16 ` Derrick Stolee
0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2026-03-17 22:19 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> Previously, walk_objects_by_path() silently ignored pathspecs containing
> wildcards or magic by clearing them. This caused all blobs to be
> downloaded regardless of the given pathspec. Wildcard pathspecs like
> "d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
> during 'git backfill').
>
> Support wildcard pathspecs by making three changes:
>
> 1. Add an 'exact_pathspecs' flag to path_walk_context. When the
> pathspec has no wildcards or magic, set this flag and use the
> existing fast-path prefix matching in add_tree_entries(). When
> wildcards are present, skip that block since prefix matching
> cannot handle glob patterns.
>
> 2. Disable revision-level commit pruning (revs->prune = 0) for
> wildcard pathspecs. The revision walk uses the pathspec to filter
> commits via TREESAME detection. For exact prefix pathspecs this
> works well, but wildcard pathspecs may fail to match through
> TREESAME because fnmatch with WM_PATHNAME does not cross directory
> boundaries. Disabling pruning ensures all commits are visited and
> their trees are available for the path-walk to filter.
Hmph, I wonder how significant an impact does it have on the
performance that we have to disable pruning here. With the bog
standard tree traversal, wouldn't tree_entry_interesting() already
be capable of doing this, even with fnmatch / WM_PATHNAME ?
> 3. Add a match_pathspec() check in walk_path() to filter out blobs
> whose full path does not match the pathspec. This provides the
> actual blob-level filtering for wildcard pathspecs.
>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The latter person cannot sign DCO or vouch for the origin of what
they have written in this patch, can they?
> ---
> path-walk.c | 22 ++++++++++++++--------
> t/t5620-backfill.sh | 7 +++----
> 2 files changed, 17 insertions(+), 12 deletions(-)
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering
2026-03-17 22:19 ` Junio C Hamano
@ 2026-03-18 13:16 ` Derrick Stolee
2026-03-23 1:33 ` Derrick Stolee
0 siblings, 1 reply; 46+ messages in thread
From: Derrick Stolee @ 2026-03-18 13:16 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git
On 3/17/2026 6:19 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> Previously, walk_objects_by_path() silently ignored pathspecs containing
>> wildcards or magic by clearing them. This caused all blobs to be
>> downloaded regardless of the given pathspec. Wildcard pathspecs like
>> "d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
>> during 'git backfill').
>>
>> Support wildcard pathspecs by making three changes:
>>
>> 1. Add an 'exact_pathspecs' flag to path_walk_context. When the
>> pathspec has no wildcards or magic, set this flag and use the
>> existing fast-path prefix matching in add_tree_entries(). When
>> wildcards are present, skip that block since prefix matching
>> cannot handle glob patterns.
>>
>> 2. Disable revision-level commit pruning (revs->prune = 0) for
>> wildcard pathspecs. The revision walk uses the pathspec to filter
>> commits via TREESAME detection. For exact prefix pathspecs this
>> works well, but wildcard pathspecs may fail to match through
>> TREESAME because fnmatch with WM_PATHNAME does not cross directory
>> boundaries. Disabling pruning ensures all commits are visited and
>> their trees are available for the path-walk to filter.
>
> Hmph, I wonder how significant an impact does it have on the
> performance that we have to disable pruning here. With the bog
> standard tree traversal, wouldn't tree_entry_interesting() already
> be capable of doing this, even with fnmatch / WM_PATHNAME ?
I will explore what's possible here and see what I can do.
>> 3. Add a match_pathspec() check in walk_path() to filter out blobs
>> whose full path does not match the pathspec. This provides the
>> actual blob-level filtering for wildcard pathspecs.
>>
>> Signed-off-by: Derrick Stolee <stolee@gmail.com>
>> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
>
> The latter person cannot sign DCO or vouch for the origin of what
> they have written in this patch, can they?
No they cannot. Sorry for this error.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering
2026-03-18 13:16 ` Derrick Stolee
@ 2026-03-23 1:33 ` Derrick Stolee
0 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee @ 2026-03-23 1:33 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget; +Cc: git
On 3/18/26 9:16 AM, Derrick Stolee wrote:
> On 3/17/2026 6:19 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>> From: Derrick Stolee <stolee@gmail.com>
>>>
>>> Previously, walk_objects_by_path() silently ignored pathspecs containing
>>> wildcards or magic by clearing them. This caused all blobs to be
>>> downloaded regardless of the given pathspec. Wildcard pathspecs like
>>> "d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
>>> during 'git backfill').
>>>
>>> Support wildcard pathspecs by making three changes:
>>>
>>> 1. Add an 'exact_pathspecs' flag to path_walk_context. When the
>>> pathspec has no wildcards or magic, set this flag and use the
>>> existing fast-path prefix matching in add_tree_entries(). When
>>> wildcards are present, skip that block since prefix matching
>>> cannot handle glob patterns.
>>>
>>> 2. Disable revision-level commit pruning (revs->prune = 0) for
>>> wildcard pathspecs. The revision walk uses the pathspec to filter
>>> commits via TREESAME detection. For exact prefix pathspecs this
>>> works well, but wildcard pathspecs may fail to match through
>>> TREESAME because fnmatch with WM_PATHNAME does not cross directory
>>> boundaries. Disabling pruning ensures all commits are visited and
>>> their trees are available for the path-walk to filter.
>>
>> Hmph, I wonder how significant an impact does it have on the
>> performance that we have to disable pruning here. With the bog
>> standard tree traversal, wouldn't tree_entry_interesting() already
>> be capable of doing this, even with fnmatch / WM_PATHNAME ?
>
> I will explore what's possible here and see what I can do.
I must have needed the 'revs->prune = 0' at some point during development
and left it even though it isn't actually necessary. Leaving it
implicitly at '1' should indeed be faster due to traversing fewer commits
and parsing fewer trees while still reaching all necessary blobs.
Only changes 1 and 3 are necessary.
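For readers unfamiliar with the blob-level filter in change 3, the idea can be approximated with POSIX `fnmatch()`. This is a sketch by analogy only: Git uses its own pathspec matching rather than `fnmatch()`, and `blob_path_matches()` is a hypothetical helper name. The `FNM_PATHNAME` flag is used to mirror the "does not cross directory boundaries" behavior mentioned in the commit message.

```c
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a blob's full path matches a
 * glob pattern. With FNM_PATHNAME, '*' never matches a '/' character.
 */
bool blob_path_matches(const char *pattern, const char *path)
{
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

With this helper, the pattern "d/file.*.txt" accepts "d/file.1.txt" and rejects "d/other.txt", while "d/*.txt" rejects "d/f/a.txt" because the wildcard stops at the directory separator. The filter runs per blob path, so it does not depend on which commits the revision walk pruned.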
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 0/5] backfill: accept revision arguments
2026-03-17 0:29 [PATCH 0/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (4 preceding siblings ...)
2026-03-17 0:29 ` [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
@ 2026-03-17 21:45 ` Junio C Hamano
2026-03-19 9:54 ` Patrick Steinhardt
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
7 siblings, 0 replies; 46+ messages in thread
From: Junio C Hamano @ 2026-03-17 21:45 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> The git backfill command assists in downloading missing blobs for blobless
> partial clones. However, its current version lacks some valuable
> functionality. It currently:
>
> 1. Only walks commits reachable from HEAD.
> 2. It walks all reachable commits to the full history.
> 3. It can focus on the current sparse-checkout definition, but otherwise it
> doesn't focus on a given pathspec.
>
> All of these are being updated by this patch series, which allows rev-list
> options to impact the path-walk. These include:
>
> 1. Specifying a given refspec, including --all.
Makes sense. You can only be on a single branch at a time, but may
want to work on multiple topics in reasonably quick succession in a
single repository. Being able to prepare enough material to go back
to when working on whichever topic in a single backfill invocation
would be a welcome addition.
> 2. Modifying the commit walk, including --first-parent, commit ranges, or
> recency using --since.
> 3. Modifying the set of paths to download using pathspecs.
Both are good mechanisms to express which subset of history you will
be working on.
> One particularly valuable situation here is that now a user can run git
> backfill -- <path> to download all versions of a specific file or a specific
> directory, accelerating history queries within that path without downloading
> more than necessary. This can accelerate git blame or git log -L for these
> paths, where normally those commands download missing blobs one-by-one
> during its diff algorithms.
Yup. Even if your project is a huge monorepo that contains all, you
do not necessarily have to look at everything the organization has
all the time. "git blame -C -C -C" would of course not work in such
an environment (would it end up on-demand lazy fetching these blobs, or
are there ways to say "I know the object store of my repository is
only sparsely populated, and I do not want you to on-demand download
the missing blobs---do your best to work with only what is already
available"?), but that's a tradeoff a monorepo makes.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 0/5] backfill: accept revision arguments
2026-03-17 0:29 [PATCH 0/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (5 preceding siblings ...)
2026-03-17 21:45 ` [PATCH 0/5] backfill: accept revision arguments Junio C Hamano
@ 2026-03-19 9:54 ` Patrick Steinhardt
2026-03-19 12:59 ` Derrick Stolee
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
7 siblings, 1 reply; 46+ messages in thread
From: Patrick Steinhardt @ 2026-03-19 9:54 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget; +Cc: git, gitster, Derrick Stolee
On Tue, Mar 17, 2026 at 12:29:16AM +0000, Derrick Stolee via GitGitGadget wrote:
> The git backfill command assists in downloading missing blobs for blobless
> partial clones. However, its current version lacks some valuable
> functionality. It currently:
>
> 1. Only walks commits reachable from HEAD.
> 2. It walks all reachable commits to the full history.
> 3. It can focus on the current sparse-checkout definition, but otherwise it
> doesn't focus on a given pathspec.
>
> All of these are being updated by this patch series, which allows rev-list
> options to impact the path-walk. These include:
>
> 1. Specifying a given refspec, including --all.
> 2. Modifying the commit walk, including --first-parent, commit ranges, or
> recency using --since.
> 3. Modifying the set of paths to download using pathspecs.
>
> One particularly valuable situation here is that now a user can run git
> backfill -- <path> to download all versions of a specific file or a specific
> directory, accelerating history queries within that path without downloading
> more than necessary. This can accelerate git blame or git log -L for these
> paths, where normally those commands download missing blobs one-by-one
> during its diff algorithms.
Nice.
I think especially blaming is a bit of a sore spot -- downloading blobs
one by one simply doesn't cut it there. I wonder whether we can easily
use the backfill mechanism to fetch blobs automatically in git-blame(1)
so that the user doesn't need to know about git-backfill(1) at all?
Patrick
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 0/5] backfill: accept revision arguments
2026-03-19 9:54 ` Patrick Steinhardt
@ 2026-03-19 12:59 ` Derrick Stolee
2026-03-20 7:35 ` Patrick Steinhardt
0 siblings, 1 reply; 46+ messages in thread
From: Derrick Stolee @ 2026-03-19 12:59 UTC (permalink / raw)
To: Patrick Steinhardt, Derrick Stolee via GitGitGadget; +Cc: git, gitster
On 3/19/2026 5:54 AM, Patrick Steinhardt wrote:
> I think especially blaming is a bit of a sore spot -- downloading blobs
> one by one simply doesn't cut it there. I wonder whether we can easily
> use the backfill mechanism to fetch blobs automatically in git-blame(1)
> so that the user doesn't need to know about git-backfill(1) at all?
I've thought about this a bit, and I'm not sure that we want to run
'git backfill' directly. Instead, it would be nice if we did a "staged"
algorithm for 'git blame':
1. Walk commits according to the pathspec to collect the commits that
changed the path.
2. Collect the list of blob OIDs that will be needed for computing diffs
for the line-tracking algorithm.
3. In batches, download groups of missing blobs and then process them
for line-tracking diffs. (Stop if all lines are blamed; continue to
next batch if more lines are needed.)
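The batching in step 3 could look roughly like the sketch below. Nothing here is existing Git code: `struct blob_ref` and `next_missing_batch()` are hypothetical names, and the "present" flag stands in for whatever object-store lookup a real implementation would use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one blob the blame would need to diff. */
struct blob_ref {
	const char *oid;
	bool present;	/* already in the local object store? */
};

/*
 * Scan forward from *pos and collect the indexes of up to batch_size
 * missing blobs into out[]. Returns how many were collected; *pos is
 * advanced so the next call resumes where this one stopped.
 */
size_t next_missing_batch(const struct blob_ref *blobs, size_t nr,
			  size_t *pos, size_t *out, size_t batch_size)
{
	size_t n = 0;

	while (*pos < nr && n < batch_size) {
		if (!blobs[*pos].present)
			out[n++] = *pos;
		(*pos)++;
	}
	return n;
}
```

A caller would fetch each returned batch in one request, process the diffs for that range, and stop early once every line has been blamed; a return value of zero means no further downloads are needed.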
This would be a significant rewrite of the blame algorithm, though. I
briefly considered this approach about a year ago and decided it would
be easier to start with 'git backfill' and see whether that satisfies
most needs.
The biggest reason to maybe avoid 'git backfill HEAD -- <path>' before
_every_ blame operation is that this will add overhead on repeated
calls that may be obnoxious in its own way. Maybe doing an opt-in
'git blame --backfill <path>' would make it easier for users to opt-in
when they want to.
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH 0/5] backfill: accept revision arguments
2026-03-19 12:59 ` Derrick Stolee
@ 2026-03-20 7:35 ` Patrick Steinhardt
0 siblings, 0 replies; 46+ messages in thread
From: Patrick Steinhardt @ 2026-03-20 7:35 UTC (permalink / raw)
To: Derrick Stolee; +Cc: Derrick Stolee via GitGitGadget, git, gitster
On Thu, Mar 19, 2026 at 08:59:01AM -0400, Derrick Stolee wrote:
> On 3/19/2026 5:54 AM, Patrick Steinhardt wrote:
> > I think especially blaming is a bit of a sore spot -- downloading blobs
> > one by one simply doesn't cut it there. I wonder whether we can easily
> > use the backfill mechanism to fetch blobs automatically in git-blame(1)
> > so that the user doesn't need to know about git-backfill(1) at all?
>
> I've thought about this a bit, and I'm not sure that we want to run
> 'git backfill' directly. Instead, it would be nice if we did a "staged"
> algorithm for 'git blame':
>
> 1. Walk commits according to the pathspec to collect the commits that
> changed the path.
>
> 2. Collect the list of blob OIDs that will be needed for computing diffs
> for the line-tracking algorithm.
>
> 3. In batches, download groups of missing blobs and then process them
> for line-tracking diffs. (Stop if all lines are blamed; continue to
> next batch if more lines are needed.)
>
> This would be a significant rewrite of the blame algorithm, though. I
> briefly considered this approach about a year ago and decided it would
> be easier to start with 'git backfill' and see whether that satisfies
> most needs.
>
> The biggest reason to maybe avoid 'git backfill HEAD -- <path>' before
> _every_ blame operation is that this will add overhead on repeated
> calls that may be obnoxious in its own way. Maybe doing an opt-in
> 'git blame --backfill <path>' would make it easier for users to opt-in
> when they want to.
That's fair. I fully agree that just blindly doing this would likely
be inefficient. Ideally, the batching logic would only kick in whenever
we see a missing object.
Anyway, this definitely doesn't have to be part of this series, I was
mostly wondering how hard it is to do.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 0/6] backfill: accept revision arguments
2026-03-17 0:29 [PATCH 0/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (6 preceding siblings ...)
2026-03-19 9:54 ` Patrick Steinhardt
@ 2026-03-23 11:40 ` Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` [PATCH v2 1/6] revision: include object-name.h Derrick Stolee via GitGitGadget
` (6 more replies)
7 siblings, 7 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-23 11:40 UTC (permalink / raw)
To: git; +Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee
The git backfill command assists in downloading missing blobs for blobless
partial clones. However, its current version lacks some valuable
functionality. It currently:
1. Only walks commits reachable from HEAD.
2. It walks all reachable commits to the full history.
3. It can focus on the current sparse-checkout definition, but otherwise it
doesn't focus on a given pathspec.
All of these are being updated by this patch series, which allows rev-list
options to impact the path-walk. These include:
1. Specifying a given refspec, including --all.
2. Modifying the commit walk, including --first-parent, commit ranges, or
recency using --since.
3. Modifying the set of paths to download using pathspecs.
One particularly valuable situation here is that now a user can run git
backfill -- <path> to download all versions of a specific file or a specific
directory, accelerating history queries within that path without downloading
more than necessary. This can accelerate git blame or git log -L for these
paths, where normally those commands download missing blobs one-by-one
during their diff algorithms.
This patch series is organized in the following way:
1. A missing #include is added to prevent future compilation issues.
2. The test repo in t5620 is expanded to make later tests more interesting.
3. The backfill builtin parses the rev-list arguments. We test the top
arguments that work as expected, though the pathspec arguments need
extra work.
4. Update the path-walk logic to work efficiently with some pathspecs, such
as fixed prefix pathspecs, accelerating the computation.
5. For more complicated pathspecs, do a post-filter in builtin/backfill.c
instead of restricting the walk in the path-walk API.
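The prefix check behind items 4 and 5 can be sketched in isolation. This is a
minimal sketch, not the code from the series: dir_prefix_sketch() restates the
contract documented for dir_prefix() in path.h, and tree_may_match() is a
hypothetical helper name that mirrors the two-way test added to path-walk.c.
It assumes tree paths are passed without a trailing '/'.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Standalone restatement of the documented contract: 'buf' equals
 * 'dir', or 'buf' starts with 'dir' followed by '/'.
 */
static bool dir_prefix_sketch(const char *buf, const char *dir)
{
	size_t len = strlen(dir);

	return !strncmp(buf, dir, len) &&
	       (buf[len] == '/' || buf[len] == '\0');
}

/*
 * Hypothetical helper: keep walking 'tree_path' if it is a parent of
 * the pathspec, equal to it, or somewhere below it.
 */
static bool tree_may_match(const char *tree_path, const char *spec)
{
	return dir_prefix_sketch(tree_path, spec) ||
	       dir_prefix_sketch(spec, tree_path);
}
```

With the pathspec "d/f", the tree "d" is kept because it is a parent, "d/f" is
kept as an exact match, and "d/file.1.txt" is rejected even though it shares
the leading bytes "d/f". That last case is the one the renamed test directory
is meant to exercise.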
The main goal of this series is to make such customizations possible, and to
improve performance where common use cases are expected. I'm open to
feedback as to whether we should consider more detailed performance analysis
or whether we should wait for how users interact with these new options
before overoptimizing unlikely use cases.
Updates in v2
=============
* Hard stops are replaced with a comma (and no punctuation) in the docs.
* add_head_to_pending() simplifies some code.
* My poor explanation of "starting commits" is updated.
* Language around temporary prefix restriction is clarified.
* Prefix match logic is simplified with dir_prefix().
* Temporary memory leak (introduced in v1's patch 4 and removed in v1's
patch 5) is removed in v2's patch 4.
* Commit pruning is reenabled in v2's patch 5. There was no need to disable
it given how the logic in that patch works.
* Add a new patch with a test demonstrating the new behavior that was being
discussed in [1].
[1]
https://lore.kernel.org/git/20260321031643.5185-1-r.siddharth.shrimali@gmail.com/
Thanks, -Stolee
Derrick Stolee (6):
revision: include object-name.h
t5620: prepare branched repo for revision tests
backfill: accept revision arguments
backfill: work with prefix pathspecs
path-walk: support wildcard pathspecs for blob filtering
t5620: test backfill's unknown argument handling
Documentation/git-backfill.adoc | 5 +-
builtin/backfill.c | 19 ++-
path-walk.c | 44 +++++++
path.c | 2 +-
path.h | 6 +
revision.h | 1 +
t/t5620-backfill.sh | 211 +++++++++++++++++++++++++++++++-
7 files changed, 278 insertions(+), 10 deletions(-)
base-commit: 67ad42147a7acc2af6074753ebd03d904476118f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2070%2Fderrickstolee%2Fbackfill-revs-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2070/derrickstolee/backfill-revs-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2070
Range-diff vs v1:
1: fda0239103 = 1: fda0239103 revision: include object-name.h
2: 55a45b2fc8 = 2: 55a45b2fc8 t5620: prepare branched repo for revision tests
3: dc6652c84c ! 3: 610a162973 backfill: accept revision arguments
@@ Documentation/git-backfill.adoc: OPTIONS
+
SEE ALSO
--------
- linkgit:git-clone[1].
-+linkgit:git-rev-list[1].
+-linkgit:git-clone[1].
++linkgit:git-clone[1],
++linkgit:git-rev-list[1]
GIT
---
@@ builtin/backfill.c: static int do_backfill(struct backfill_context *ctx)
- handle_revision_arg("HEAD", &revs, 0, 0);
+ /* Walk from HEAD if otherwise unspecified. */
+ if (!ctx->revs.pending.nr)
-+ handle_revision_arg("HEAD", &ctx->revs, 0, 0);
++ add_head_to_pending(&ctx->revs);
info.blobs = 1;
info.tags = info.commits = info.trees = 0;
4: 977f62faa5 ! 4: f8f2c61326 backfill: work with prefix pathspecs
@@ Commit message
backfill: work with prefix pathspecs
The previous change allowed specifying revision arguments over the 'git
- backfill' command-line. This created the opportunity for pathspecs that
- specify a smaller set of starting commits, but otherwise did not restrict
- the blob paths that were downloaded.
+ backfill' command-line. This created the opportunity for restricting the
+ initial commit set by filtering the revision walk through a pathspec. Other
+ than filtering the commit set (and thereby the root trees), this did not
+ restrict the path-walk implementation of 'git backfill' and did not restrict
+ the blobs that were downloaded to only those matching the pathspec.
Update the path-walk API to accept certain kinds of pathspecs and to
- silently ignore anything too complex. The current behavior focuses on
- pathspecs that match paths exactly. This includes exact filenames,
- including directory names as prefixes. Pathspecs containing wildcards
- or magic are cleared so the path walk downloads all blobs, as before.
+ silently ignore anything too complex, for now. We will update this in the
+ next change to properly restrict to even complex pathspecs.
+
+ The current behavior focuses on pathspecs that match paths exactly. This
+ includes exact filenames, including directory names as prefixes. Pathspecs
+ containing wildcards or magic are cleared so the path walk downloads all
+ blobs, as before.
The reason for this restriction is to allow for a faster execution by
pruning the path walk to only trees that could contribute towards one of
@@ Commit message
Signed-off-by: Derrick Stolee <stolee@gmail.com>
## path-walk.c ##
+@@
+ #include "list-objects.h"
+ #include "object.h"
+ #include "oid-array.h"
++#include "path.h"
+ #include "prio-queue.h"
+ #include "repository.h"
+ #include "revision.h"
@@ path-walk.c: static int add_tree_entries(struct path_walk_context *ctx,
match != MATCHED)
continue;
@@ path-walk.c: static int add_tree_entries(struct path_walk_context *ctx,
+ struct pathspec *pd = &ctx->revs->prune_data;
+ bool found = false;
+
++ /* remove '/' for these checks. */
++ path.buf[path.len - 1] = 0;
++
+ for (int i = 0; i < pd->nr; i++) {
+ struct pathspec_item *item = &pd->items[i];
+
+ /*
-+ * Is this path a parent directory of
-+ * the pathspec item?
-+ */
-+ if (path.len < (size_t)item->len &&
-+ !strncmp(path.buf, item->match, path.len) &&
-+ item->match[path.len - 1] == '/') {
-+ found = true;
-+ break;
-+ }
-+
-+ /*
-+ * Or, is the pathspec an exact match?
-+ */
-+ if (path.len == (size_t)item->len &&
-+ !strcmp(path.buf, item->match)) {
-+ found = true;
-+ break;
-+ }
-+
-+ /*
-+ * Or, is the pathspec a directory prefix
-+ * match?
++ * Continue if either is a directory prefix
++ * of the other.
+ */
-+ if (path.len > (size_t)item->len &&
-+ !strncmp(path.buf, item->match, item->len) &&
-+ path.buf[item->len] == '/') {
++ if (dir_prefix(path.buf, item->match) ||
++ dir_prefix(item->match, path.buf)) {
+ found = true;
+ break;
+ }
+ }
+
++ /* return '/' after these checks. */
++ path.buf[path.len - 1] = '/';
++
+ /* Skip paths that do not match the prefix. */
+ if (!found)
+ continue;
@@ path-walk.c: int walk_objects_by_path(struct path_walk_info *info)
+ * Clear any wildcard or magic pathspecs to avoid
+ * incorrect prefix matching.
+ */
-+ struct pathspec *pd = &ctx.revs->prune_data;
-+
-+ if (pd->has_wildcard || pd->magic)
-+ pd->nr = 0;
++ if (ctx.revs->prune_data.has_wildcard ||
++ ctx.revs->prune_data.magic)
++ clear_pathspec(&ctx.revs->prune_data);
+ }
+
/* Insert a single list for the root tree into the paths. */
CALLOC_ARRAY(root_tree_list, 1);
root_tree_list->type = OBJ_TREE;
+ ## path.c ##
+@@ path.c: static void strbuf_cleanup_path(struct strbuf *sb)
+ strbuf_remove(sb, 0, path - sb->buf);
+ }
+
+-static int dir_prefix(const char *buf, const char *dir)
++int dir_prefix(const char *buf, const char *dir)
+ {
+ int len = strlen(dir);
+ return !strncmp(buf, dir, len) &&
+
+ ## path.h ##
+@@ path.h: const char *repo_submodule_path_replace(struct repository *repo,
+ const char *fmt, ...)
+ __attribute__((format (printf, 4, 5)));
+
++/*
++ * Given a directory name 'dir' (not ending with a trailing '/'),
++ * determine if 'buf' is equal to 'dir' or has prefix 'dir'+'/'.
++ */
++int dir_prefix(const char *buf, const char *dir);
++
+ void report_linked_checkout_garbage(struct repository *r);
+
+ /*
+
## t/t5620-backfill.sh ##
@@ t/t5620-backfill.sh: test_expect_success 'backfill with prefix pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
5: beb1c92554 ! 5: 1168edfb96 path-walk: support wildcard pathspecs for blob filtering
@@ Commit message
"d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
during 'git backfill').
- Support wildcard pathspecs by making three changes:
+ Support wildcard pathspecs by making two changes:
1. Add an 'exact_pathspecs' flag to path_walk_context. When the
pathspec has no wildcards or magic, set this flag and use the
@@ Commit message
wildcards are present, skip that block since prefix matching
cannot handle glob patterns.
- 2. Disable revision-level commit pruning (revs->prune = 0) for
- wildcard pathspecs. The revision walk uses the pathspec to filter
- commits via TREESAME detection. For exact prefix pathspecs this
- works well, but wildcard pathspecs may fail to match through
- TREESAME because fnmatch with WM_PATHNAME does not cross directory
- boundaries. Disabling pruning ensures all commits are visited and
- their trees are available for the path-walk to filter.
-
- 3. Add a match_pathspec() check in walk_path() to filter out blobs
+ 2. Add a match_pathspec() check in walk_path() to filter out blobs
whose full path does not match the pathspec. This provides the
actual blob-level filtering for wildcard pathspecs.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## path-walk.c ##
@@ path-walk.c: struct path_walk_context {
@@ path-walk.c: int walk_objects_by_path(struct path_walk_info *info)
- * Clear any wildcard or magic pathspecs to avoid
- * incorrect prefix matching.
- */
- struct pathspec *pd = &ctx.revs->prune_data;
-
-- if (pd->has_wildcard || pd->magic)
-- pd->nr = 0;
-+ if (!pd->has_wildcard && !pd->magic)
+- if (ctx.revs->prune_data.has_wildcard ||
+- ctx.revs->prune_data.magic)
+- clear_pathspec(&ctx.revs->prune_data);
++ if (!ctx.revs->prune_data.has_wildcard &&
++ !ctx.revs->prune_data.magic)
+ ctx.exact_pathspecs = 1;
-+ else
-+ ctx.revs->prune = 0;
}
/* Insert a single list for the root tree into the paths. */
-: ---------- > 6: 9699650aa7 t5620: test backfill's unknown argument handling
--
gitgitgadget
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 1/6] revision: include object-name.h
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
@ 2026-03-23 11:40 ` Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` [PATCH v2 2/6] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
` (5 subsequent siblings)
6 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-23 11:40 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The REV_INFO_INIT macro includes a use of the DEFAULT_ABBREV macro, which is
defined in object-name.h. Include it in revision.h so consumers of
REV_INFO_INIT do not need to include this hidden dependency.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
revision.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/revision.h b/revision.h
index b36acfc2d9..18c9bbd822 100644
--- a/revision.h
+++ b/revision.h
@@ -4,6 +4,7 @@
#include "commit.h"
#include "grep.h"
#include "notes.h"
+#include "object-name.h"
#include "oidset.h"
#include "pretty.h"
#include "diff.h"
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH v2 2/6] t5620: prepare branched repo for revision tests
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` [PATCH v2 1/6] revision: include object-name.h Derrick Stolee via GitGitGadget
@ 2026-03-23 11:40 ` Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` [PATCH v2 3/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (4 subsequent siblings)
6 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-23 11:40 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Prepare the test infrastructure for upcoming changes that teach 'git
backfill' to accept revision arguments and pathspecs.
Add test_tick before each commit in the setup loop so that commit dates
are deterministic. This enables reliable testing with '--since'.
Rename the 'd/e/' directory to 'd/f/' so that the prefix 'd/f' is
ambiguous with the files 'd/file.*.txt'. This exercises the subtlety
in prefix pathspec matching that will be added in a later commit.
Create a branched version of the test repository (src-revs) with:
- A 'side' branch merged into main, adding s/file.{1,2}.txt with
two versions (4 new blobs, 52 total from main HEAD).
- An unmerged 'other' branch adding o/file.{1,2}.txt (2 more blobs,
54 total reachable from --all).
This structure makes --all, --first-parent, and --since produce
meaningfully different results when used with 'git backfill'.
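The blob counts quoted above can be checked with a little arithmetic. The
sketch below assumes six directories holding four files each at two versions
(figures taken from the test comments, which mention 48 original blobs and
"all 6 directories"); the constant and function names are made up for
illustration.

```c
/* Counts taken from the commit message and the test comments. */
enum {
	DIRS = 6,          /* top level, a, a/b, a/b/c, d, d/f */
	FILES_PER_DIR = 4, /* file.1.txt .. file.4.txt */
	VERSIONS = 2,      /* two iterations over every file */
	SIDE_FILES = 2,    /* s/file.{1,2}.txt, two versions each */
	OTHER_FILES = 2,   /* o/file.{1,2}.txt, one version each */
};

/* Blobs in the original linear history. */
static int blobs_original(void)
{
	return DIRS * FILES_PER_DIR * VERSIONS;
}

/* Blobs reachable from main once 'side' is merged. */
static int blobs_from_main(void)
{
	return blobs_original() + SIDE_FILES * VERSIONS;
}

/* Blobs reachable from all refs, including the unmerged 'other'. */
static int blobs_from_all(void)
{
	return blobs_from_main() + OTHER_FILES;
}
```

These evaluate to 48, 52, and 54, matching the line counts the later tests
expect from 'rev-list --missing=print'.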
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/t5620-backfill.sh | 52 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 50 insertions(+), 2 deletions(-)
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 58c81556e7..1331949be4 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -15,7 +15,7 @@ test_expect_success 'setup repo for object creation' '
git init src &&
mkdir -p src/a/b/c &&
- mkdir -p src/d/e &&
+ mkdir -p src/d/f &&
for i in 1 2
do
@@ -26,8 +26,9 @@ test_expect_success 'setup repo for object creation' '
echo "Version $i of file a/b/$n" > src/a/b/file.$n.txt &&
echo "Version $i of file a/b/c/$n" > src/a/b/c/file.$n.txt &&
echo "Version $i of file d/$n" > src/d/file.$n.txt &&
- echo "Version $i of file d/e/$n" > src/d/e/file.$n.txt &&
+ echo "Version $i of file d/f/$n" > src/d/f/file.$n.txt &&
git -C src add . &&
+ test_tick &&
git -C src commit -m "Iteration $n" || return 1
done
done
@@ -41,6 +42,53 @@ test_expect_success 'setup bare clone for server' '
git -C srv.bare config --local uploadpack.allowanysha1inwant 1
'
+# Create a version of the repo with branches for testing revision
+# arguments like --all, --first-parent, and --since.
+#
+# main: 8 commits (linear) + merge of side branch
+# 48 original blobs + 4 side blobs = 52 blobs from main HEAD
+# side: 2 commits adding s/file.{1,2}.txt (v1, v2), merged into main
+# other: 1 commit adding o/file.{1,2}.txt (not merged)
+# 54 total blobs reachable from --all
+test_expect_success 'setup branched repo for revision tests' '
+ git clone src src-revs &&
+
+ # Side branch from tip of main with unique files
+ git -C src-revs checkout -b side HEAD &&
+ mkdir -p src-revs/s &&
+ echo "Side version 1 of file 1" >src-revs/s/file.1.txt &&
+ echo "Side version 1 of file 2" >src-revs/s/file.2.txt &&
+ test_tick &&
+ git -C src-revs add . &&
+ git -C src-revs commit -m "Side commit 1" &&
+
+ echo "Side version 2 of file 1" >src-revs/s/file.1.txt &&
+ echo "Side version 2 of file 2" >src-revs/s/file.2.txt &&
+ test_tick &&
+ git -C src-revs add . &&
+ git -C src-revs commit -m "Side commit 2" &&
+
+ # Merge side into main
+ git -C src-revs checkout main &&
+ test_tick &&
+ git -C src-revs merge side --no-ff -m "Merge side branch" &&
+
+ # Other branch (not merged) for --all testing
+ git -C src-revs checkout -b other main~1 &&
+ mkdir -p src-revs/o &&
+ echo "Other content 1" >src-revs/o/file.1.txt &&
+ echo "Other content 2" >src-revs/o/file.2.txt &&
+ test_tick &&
+ git -C src-revs add . &&
+ git -C src-revs commit -m "Other commit" &&
+
+ git -C src-revs checkout main &&
+
+ git clone --bare "file://$(pwd)/src-revs" srv-revs.bare &&
+ git -C srv-revs.bare config --local uploadpack.allowfilter 1 &&
+ git -C srv-revs.bare config --local uploadpack.allowanysha1inwant 1
+'
+
# do basic partial clone from "srv.bare"
test_expect_success 'do partial clone 1, backfill gets all objects' '
git clone --no-checkout --filter=blob:none \
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH v2 3/6] backfill: accept revision arguments
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` [PATCH v2 1/6] revision: include object-name.h Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` [PATCH v2 2/6] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
@ 2026-03-23 11:40 ` Derrick Stolee via GitGitGadget
2026-03-24 7:59 ` Patrick Steinhardt
2026-03-23 11:40 ` [PATCH v2 4/6] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
6 siblings, 1 reply; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-23 11:40 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The existing implementation of 'git backfill' only downloads missing blobs
reachable from HEAD. Advanced uses may desire more general
commit limiting options, such as '--all' for all references, specifying a
commit range via negative references, or specifying a recency of use such as
with '--since=<date>'.
All of these options are available if we use setup_revisions() to parse the
unknown arguments with the revision machinery. This opens up a large number
of possibilities, only a small set of which are tested here.
For documentation, we avoid duplicating the option documentation and instead
link to the documentation of 'git rev-list'.
Note that these arguments currently allow specifying a pathspec, which
modifies the commit history checks but does not limit the paths used in the
backfill logic. This will be updated in a future change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-backfill.adoc | 5 +-
builtin/backfill.c | 19 ++--
t/t5620-backfill.sh | 156 ++++++++++++++++++++++++++++++++
3 files changed, 173 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
index b8394dcf22..246ab417c2 100644
--- a/Documentation/git-backfill.adoc
+++ b/Documentation/git-backfill.adoc
@@ -63,9 +63,12 @@ OPTIONS
current sparse-checkout. If the sparse-checkout feature is enabled,
then `--sparse` is assumed and can be disabled with `--no-sparse`.
+You may also specify the commit limiting options from linkgit:git-rev-list[1].
+
SEE ALSO
--------
-linkgit:git-clone[1].
+linkgit:git-clone[1],
+linkgit:git-rev-list[1]
GIT
---
diff --git a/builtin/backfill.c b/builtin/backfill.c
index e80fc1b694..90c9d84793 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -35,6 +35,7 @@ struct backfill_context {
struct oid_array current_batch;
size_t min_batch_size;
int sparse;
+ struct rev_info revs;
};
static void backfill_context_clear(struct backfill_context *ctx)
@@ -80,7 +81,6 @@ static int fill_missing_blobs(const char *path UNUSED,
static int do_backfill(struct backfill_context *ctx)
{
- struct rev_info revs;
struct path_walk_info info = PATH_WALK_INFO_INIT;
int ret;
@@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
}
}
- repo_init_revisions(ctx->repo, &revs, "");
- handle_revision_arg("HEAD", &revs, 0, 0);
+ /* Walk from HEAD if otherwise unspecified. */
+ if (!ctx->revs.pending.nr)
+ add_head_to_pending(&ctx->revs);
info.blobs = 1;
info.tags = info.commits = info.trees = 0;
- info.revs = &revs;
+ info.revs = &ctx->revs;
info.path_fn = fill_missing_blobs;
info.path_fn_data = ctx;
@@ -109,7 +110,6 @@ static int do_backfill(struct backfill_context *ctx)
download_batch(ctx);
path_walk_info_clear(&info);
- release_revisions(&revs);
return ret;
}
@@ -121,6 +121,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
.current_batch = OID_ARRAY_INIT,
.min_batch_size = 50000,
.sparse = 0,
+ .revs = REV_INFO_INIT,
};
struct option options[] = {
OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
@@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
builtin_backfill_usage, options);
argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
- 0);
+ PARSE_OPT_KEEP_UNKNOWN_OPT |
+ PARSE_OPT_KEEP_ARGV0 |
+ PARSE_OPT_KEEP_DASHDASH);
+
+ repo_init_revisions(repo, &ctx.revs, prefix);
+ argc = setup_revisions(argc, argv, &ctx.revs, NULL);
repo_config(repo, git_default_config, NULL);
@@ -143,5 +149,6 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
result = do_backfill(&ctx);
backfill_context_clear(&ctx);
+ release_revisions(&ctx.revs);
return result;
}
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 1331949be4..db66d8b614 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -224,6 +224,162 @@ test_expect_success 'backfill --sparse without cone mode (negative)' '
test_line_count = 12 missing
'
+test_expect_success 'backfill with revision range' '
+ test_when_finished rm -rf backfill-revs &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-revs &&
+
+ # No blobs yet
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ git -C backfill-revs backfill HEAD~2..HEAD &&
+
+ # 30 objects downloaded.
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 18 missing
+'
+
+test_expect_success 'backfill with revisions over stdin' '
+ test_when_finished rm -rf backfill-revs &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-revs &&
+
+ # No blobs yet
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ cat >in <<-EOF &&
+ HEAD
+ ^HEAD~2
+ EOF
+
+ git -C backfill-revs backfill --stdin <in &&
+
+ # 30 objects downloaded.
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 18 missing
+'
+
+test_expect_success 'backfill with prefix pathspec' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The pathspec should limit the downloaded blobs to
+ # only those matching the prefix "d/f", but currently all
+ # blobs are downloaded.
+ git -C backfill-path backfill HEAD -- d/f &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with multiple pathspecs' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The pathspecs should limit the downloaded blobs to
+ # only those matching "d/f" or "a", but currently all blobs
+ # are downloaded.
+ git -C backfill-path backfill HEAD -- d/f a &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with wildcard pathspec' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The wildcard pathspec should limit downloaded blobs,
+ # but currently all blobs are downloaded.
+ git -C backfill-path backfill HEAD -- "d/file.*.txt" &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with --all' '
+ test_when_finished rm -rf backfill-all &&
+ git clone --no-checkout --filter=blob:none \
+ "file://$(pwd)/srv-revs.bare" backfill-all &&
+
+ # All blobs from all refs are missing
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 54 missing &&
+
+ # Backfill from HEAD gets main blobs only
+ git -C backfill-all backfill HEAD &&
+
+ # Other branch blobs still missing
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 2 missing &&
+
+ # Backfill with --all gets everything
+ git -C backfill-all backfill --all &&
+
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with --first-parent' '
+ test_when_finished rm -rf backfill-fp &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv-revs.bare" backfill-fp &&
+
+ git -C backfill-fp rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 52 missing &&
+
+ # --first-parent skips the side branch commits, so
+ # s/file.{1,2}.txt v1 blobs (only in side commit 1) are missed.
+ git -C backfill-fp backfill --first-parent HEAD &&
+
+ git -C backfill-fp rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 2 missing
+'
+
+test_expect_success 'backfill with --since' '
+ test_when_finished rm -rf backfill-since &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv-revs.bare" backfill-since &&
+
+ git -C backfill-since rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 52 missing &&
+
+ # Use a cutoff between commits 4 and 5 (between v1 and v2
+ # iterations). Commits 5-8 still carry v1 of files 2-4 in
+ # their trees, but v1 of file.1.txt is only in commits 1-4.
+ SINCE=$(git -C backfill-since log --first-parent --reverse \
+ --format=%ct HEAD~1 | sed -n 5p) &&
+ git -C backfill-since backfill --since="@$((SINCE - 1))" HEAD &&
+
+ # 6 missing: v1 of file.1.txt in all 6 directories
+ git -C backfill-since rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 6 missing
+'
+
. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v2 3/6] backfill: accept revision arguments
2026-03-23 11:40 ` [PATCH v2 3/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
@ 2026-03-24 7:59 ` Patrick Steinhardt
2026-03-26 12:55 ` Derrick Stolee
0 siblings, 1 reply; 46+ messages in thread
From: Patrick Steinhardt @ 2026-03-24 7:59 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, gitster, Kristoffer Haugsbakk, r.siddharth.shrimali,
Derrick Stolee
On Mon, Mar 23, 2026 at 11:40:16AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index e80fc1b694..90c9d84793 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
> builtin_backfill_usage, options);
>
> argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
> - 0);
> + PARSE_OPT_KEEP_UNKNOWN_OPT |
> + PARSE_OPT_KEEP_ARGV0 |
> + PARSE_OPT_KEEP_DASHDASH);
> +
> + repo_init_revisions(repo, &ctx.revs, prefix);
> + argc = setup_revisions(argc, argv, &ctx.revs, NULL);
We should probably die here in case we still have unknown arguments.
Patrick
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 3/6] backfill: accept revision arguments
2026-03-24 7:59 ` Patrick Steinhardt
@ 2026-03-26 12:55 ` Derrick Stolee
0 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee @ 2026-03-26 12:55 UTC (permalink / raw)
To: Patrick Steinhardt, Derrick Stolee via GitGitGadget
Cc: git, gitster, Kristoffer Haugsbakk, r.siddharth.shrimali
On 3/24/2026 3:59 AM, Patrick Steinhardt wrote:
> On Mon, Mar 23, 2026 at 11:40:16AM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/builtin/backfill.c b/builtin/backfill.c
>> index e80fc1b694..90c9d84793 100644
>> --- a/builtin/backfill.c
>> +++ b/builtin/backfill.c
>> @@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>> builtin_backfill_usage, options);
>>
>> argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
>> - 0);
>> + PARSE_OPT_KEEP_UNKNOWN_OPT |
>> + PARSE_OPT_KEEP_ARGV0 |
>> + PARSE_OPT_KEEP_DASHDASH);
>> +
>> + repo_init_revisions(repo, &ctx.revs, prefix);
>> + argc = setup_revisions(argc, argv, &ctx.revs, NULL);
>
> We should probably die here in case we still have unknown arguments.
That is indeed the fix for the bad test in patch 6. I'll make the
necessary update in v3's patch 6 along with the test for it.
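
For readers following along, the shape of such a check can be sketched without
any Git internals. This is only an illustration, not the v3 change:
consume_known_args() is a hypothetical toy parser, and the sketch assumes the
parser returns how many arguments it left behind (counting argv[0]), which is
how the patch uses the return value of setup_revisions().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for an argument parser: it consumes what it
 * recognizes, compacts the leftovers to the front of argv (after
 * argv[0]), and returns the new argument count.
 */
static int consume_known_args(int argc, const char **argv)
{
	int kept = 1;

	for (int i = 1; i < argc; i++) {
		/* Toy rule: "--all" and anything not starting with '-' is known. */
		if (!strcmp(argv[i], "--all") || argv[i][0] != '-')
			continue;
		argv[kept++] = argv[i];
	}
	return kept;
}

/* Returns the first unrecognized argument, or NULL if none remain. */
static const char *first_unknown_arg(int argc, const char **argv)
{
	argc = consume_known_args(argc, argv);
	return argc > 1 ? argv[1] : NULL;
}
```

A caller would report an error and exit when first_unknown_arg() returns
non-NULL, instead of silently ignoring the leftover argument.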
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 4/6] backfill: work with prefix pathspecs
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2026-03-23 11:40 ` [PATCH v2 3/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
@ 2026-03-23 11:40 ` Derrick Stolee via GitGitGadget
2026-03-24 7:59 ` Patrick Steinhardt
2026-03-23 11:40 ` [PATCH v2 5/6] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
6 siblings, 1 reply; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-23 11:40 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The previous change allowed specifying revision arguments over the 'git
backfill' command-line. This created the opportunity for restricting the
initial commit set by filtering the revision walk through a pathspec. Other
than filtering the commit set (and thereby the root trees), this did not
restrict the path-walk implementation of 'git backfill' and did not restrict
the blobs that were downloaded to only those matching the pathspec.
Update the path-walk API to accept certain kinds of pathspecs and to
silently ignore anything too complex, for now. We will update this in the
next change to properly restrict to even complex pathspecs.
The current behavior focuses on pathspecs that match paths exactly. This
covers exact filenames as well as directory names used as prefixes. Pathspecs
containing wildcards or magic are cleared so the path walk downloads all
blobs, as before.
The reason for this restriction is to allow for a faster execution by
pruning the path walk to only trees that could contribute towards one of
those paths as a parent directory.
The test directory 'd/f/' (next to 'd/file*.txt') was prepared in a
previous commit to exercise the subtlety in prefix matching.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
path-walk.c | 40 ++++++++++++++++++++++++++++++++++++++++
path.c | 2 +-
path.h | 6 ++++++
t/t5620-backfill.sh | 16 ++++++----------
4 files changed, 53 insertions(+), 11 deletions(-)
diff --git a/path-walk.c b/path-walk.c
index 364e4cfa19..0d640e2f24 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -11,6 +11,7 @@
#include "list-objects.h"
#include "object.h"
#include "oid-array.h"
+#include "path.h"
#include "prio-queue.h"
#include "repository.h"
#include "revision.h"
@@ -206,6 +207,34 @@ static int add_tree_entries(struct path_walk_context *ctx,
match != MATCHED)
continue;
}
+ if (ctx->revs->prune_data.nr) {
+ struct pathspec *pd = &ctx->revs->prune_data;
+ bool found = false;
+
+ /* remove '/' for these checks. */
+ path.buf[path.len - 1] = 0;
+
+ for (int i = 0; i < pd->nr; i++) {
+ struct pathspec_item *item = &pd->items[i];
+
+ /*
+ * Continue if either is a directory prefix
+ * of the other.
+ */
+ if (dir_prefix(path.buf, item->match) ||
+ dir_prefix(item->match, path.buf)) {
+ found = true;
+ break;
+ }
+ }
+
+ /* return '/' after these checks. */
+ path.buf[path.len - 1] = '/';
+
+ /* Skip paths that do not match the prefix. */
+ if (!found)
+ continue;
+ }
add_path_to_list(ctx, path.buf, type, &entry.oid,
!(o->flags & UNINTERESTING));
@@ -481,6 +510,17 @@ int walk_objects_by_path(struct path_walk_info *info)
if (info->tags)
info->revs->tag_objects = 1;
+ if (ctx.revs->prune_data.nr) {
+ /*
+ * Only exact prefix pathspecs are currently supported.
+ * Clear any wildcard or magic pathspecs to avoid
+ * incorrect prefix matching.
+ */
+ if (ctx.revs->prune_data.has_wildcard ||
+ ctx.revs->prune_data.magic)
+ clear_pathspec(&ctx.revs->prune_data);
+ }
+
/* Insert a single list for the root tree into the paths. */
CALLOC_ARRAY(root_tree_list, 1);
root_tree_list->type = OBJ_TREE;
diff --git a/path.c b/path.c
index d726537622..aebb10b2e9 100644
--- a/path.c
+++ b/path.c
@@ -57,7 +57,7 @@ static void strbuf_cleanup_path(struct strbuf *sb)
strbuf_remove(sb, 0, path - sb->buf);
}
-static int dir_prefix(const char *buf, const char *dir)
+int dir_prefix(const char *buf, const char *dir)
{
int len = strlen(dir);
return !strncmp(buf, dir, len) &&
diff --git a/path.h b/path.h
index 0ec95a0b07..829fafd7e9 100644
--- a/path.h
+++ b/path.h
@@ -114,6 +114,12 @@ const char *repo_submodule_path_replace(struct repository *repo,
const char *fmt, ...)
__attribute__((format (printf, 4, 5)));
+/*
+ * Given a directory name 'dir' (not ending with a trailing '/'),
+ * determine if 'buf' is equal to 'dir' or has prefix 'dir'+'/'.
+ */
+int dir_prefix(const char *buf, const char *dir);
+
void report_linked_checkout_garbage(struct repository *r);
/*
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index db66d8b614..52f6484ca1 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -273,13 +273,11 @@ test_expect_success 'backfill with prefix pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # TODO: The pathspec should limit the downloaded blobs to
- # only those matching the prefix "d/f", but currently all
- # blobs are downloaded.
- git -C backfill-path backfill HEAD -- d/f &&
+ git -C backfill-path backfill HEAD -- d/f 2>err &&
+ test_must_be_empty err &&
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
- test_line_count = 0 missing
+ test_line_count = 40 missing
'
test_expect_success 'backfill with multiple pathspecs' '
@@ -292,13 +290,11 @@ test_expect_success 'backfill with multiple pathspecs' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # TODO: The pathspecs should limit the downloaded blobs to
- # only those matching "d/f" or "a", but currently all blobs
- # are downloaded.
- git -C backfill-path backfill HEAD -- d/f a &&
+ git -C backfill-path backfill HEAD -- d/f a 2>err &&
+ test_must_be_empty err &&
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
- test_line_count = 0 missing
+ test_line_count = 16 missing
'
test_expect_success 'backfill with wildcard pathspec' '
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v2 4/6] backfill: work with prefix pathspecs
2026-03-23 11:40 ` [PATCH v2 4/6] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
@ 2026-03-24 7:59 ` Patrick Steinhardt
2026-03-26 12:58 ` Derrick Stolee
0 siblings, 1 reply; 46+ messages in thread
From: Patrick Steinhardt @ 2026-03-24 7:59 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, gitster, Kristoffer Haugsbakk, r.siddharth.shrimali,
Derrick Stolee
On Mon, Mar 23, 2026 at 11:40:17AM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/path-walk.c b/path-walk.c
> index 364e4cfa19..0d640e2f24 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -206,6 +207,34 @@ static int add_tree_entries(struct path_walk_context *ctx,
> match != MATCHED)
> continue;
> }
> + if (ctx->revs->prune_data.nr) {
> + struct pathspec *pd = &ctx->revs->prune_data;
> + bool found = false;
> +
> + /* remove '/' for these checks. */
> + path.buf[path.len - 1] = 0;
Hm. Is this _always_ safe to do? We add the directory separator a few
lines further up, but only in the case where `type == OBJ_TREE`. So in
reverse this may mean that there are cases where we don't have a
trailing '/'.
Maybe we should instead:
did_strip_suffix = strbuf_strip_suffix(&path, "/");
...
if (did_strip_suffix)
strbuf_addch(&path, '/');
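Reader's note: the strip-then-restore pattern can be tried outside of Git
with a small stand-in buffer. This is only a sketch; struct toy_buf and the
toy_*() helpers are hypothetical and are not Git's strbuf API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for 'struct strbuf'. */
struct toy_buf {
	char buf[64];
	size_t len;
};

/* Copy 's' into the buffer, truncating if it does not fit. */
static void toy_set(struct toy_buf *b, const char *s)
{
	size_t n = strlen(s);

	if (n >= sizeof(b->buf))
		n = sizeof(b->buf) - 1;
	memcpy(b->buf, s, n);
	b->buf[n] = '\0';
	b->len = n;
}

/* Remove one trailing '/' if present; report whether it was removed. */
static bool toy_strip_slash(struct toy_buf *b)
{
	if (!b->len || b->buf[b->len - 1] != '/')
		return false;
	b->buf[--b->len] = '\0';
	return true;
}

/* Append a '/' if there is room for it. */
static void toy_add_slash(struct toy_buf *b)
{
	if (b->len + 1 < sizeof(b->buf)) {
		b->buf[b->len++] = '/';
		b->buf[b->len] = '\0';
	}
}
```

Because the restore step runs only when the strip step reports success, a
path without a trailing '/' (a blob path, for example) is left untouched
instead of losing its last character.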
Patrick
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 4/6] backfill: work with prefix pathspecs
2026-03-24 7:59 ` Patrick Steinhardt
@ 2026-03-26 12:58 ` Derrick Stolee
0 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee @ 2026-03-26 12:58 UTC (permalink / raw)
To: Patrick Steinhardt, Derrick Stolee via GitGitGadget
Cc: git, gitster, Kristoffer Haugsbakk, r.siddharth.shrimali
On 3/24/2026 3:59 AM, Patrick Steinhardt wrote:
> On Mon, Mar 23, 2026 at 11:40:17AM +0000, Derrick Stolee via GitGitGadget wrote:
>> diff --git a/path-walk.c b/path-walk.c
>> index 364e4cfa19..0d640e2f24 100644
>> --- a/path-walk.c
>> +++ b/path-walk.c
>> @@ -206,6 +207,34 @@ static int add_tree_entries(struct path_walk_context *ctx,
>> match != MATCHED)
>> continue;
>> }
>> + if (ctx->revs->prune_data.nr) {
>> + struct pathspec *pd = &ctx->revs->prune_data;
>> + bool found = false;
>> +
>> + /* remove '/' for these checks. */
>> + path.buf[path.len - 1] = 0;
>
> Hm. Is this _always_ safe to do? We add the directory separator a few
> lines further up, but only in the case where `type == OBJ_TREE`. So in
> reverse this may mean that there are cases where we don't have a
> trailing '/'.
>
> Maybe we should instead:
>
> did_strip_suffix = strbuf_strip_suffix(path, "/");
>
> ...
>
> if (did_strip_suffix)
> strbuf_addch(path, "/");
This is much cleaner, too! Thanks.
-Stolee
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v2 5/6] path-walk: support wildcard pathspecs for blob filtering
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2026-03-23 11:40 ` [PATCH v2 4/6] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
@ 2026-03-23 11:40 ` Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` [PATCH v2 6/6] t5620: test backfill's unknown argument handling Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
6 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-23 11:40 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Previously, walk_objects_by_path() silently ignored pathspecs containing
wildcards or magic by clearing them. This caused all blobs to be
downloaded regardless of the given pathspec. Wildcard pathspecs like
"d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
during 'git backfill').
Support wildcard pathspecs by making two changes:
1. Add an 'exact_pathspecs' flag to path_walk_context. When the
pathspec has no wildcards or magic, set this flag and use the
existing fast-path prefix matching in add_tree_entries(). When
wildcards are present, skip that block since prefix matching
cannot handle glob patterns.
2. Add a match_pathspec() check in walk_path() to filter out blobs
whose full path does not match the pathspec. This provides the
actual blob-level filtering for wildcard pathspecs.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
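Reader's note (not part of the patch): the two stages described above can
be sketched in standalone C. This is an illustration under stated
assumptions: POSIX fnmatch() stands in for Git's own pathspec matching, the
set of characters treated as wildcards is an assumption, and all toy_*()
names are hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: any of these characters makes a pathspec a glob. */
static bool toy_has_wildcard(const char *spec)
{
	return strpbrk(spec, "*?[\\") != NULL;
}

/* True when 'buf' equals 'dir' or starts with 'dir' followed by '/'. */
static bool toy_dir_prefix(const char *buf, const char *dir)
{
	size_t len = strlen(dir);

	return !strncmp(buf, dir, len) &&
	       (buf[len] == '/' || buf[len] == '\0');
}

/*
 * Stage 1 (tree pruning): only valid for exact pathspecs. With a
 * wildcard, prefix matching is skipped and every tree is kept.
 */
static bool toy_keep_tree(const char *tree_path, const char *spec)
{
	if (toy_has_wildcard(spec))
		return true;
	return toy_dir_prefix(tree_path, spec) ||
	       toy_dir_prefix(spec, tree_path);
}

/* Stage 2 (blob filter): match the full blob path against the pathspec. */
static bool toy_keep_blob(const char *blob_path, const char *spec)
{
	if (toy_has_wildcard(spec))
		return fnmatch(spec, blob_path, 0) == 0;
	return toy_dir_prefix(blob_path, spec);
}
```

The trade-off this shows: a wildcard pathspec still visits every tree, so
the saving comes from skipping blobs, while an exact pathspec also avoids
walking unrelated trees.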
path-walk.c | 22 +++++++++++++---------
t/t5620-backfill.sh | 7 +++----
2 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/path-walk.c b/path-walk.c
index 0d640e2f24..6b83e0e1d5 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -63,6 +63,8 @@ struct path_walk_context {
*/
struct prio_queue path_stack;
struct strset path_stack_pushed;
+
+ unsigned exact_pathspecs:1;
};
static int compare_by_type(const void *one, const void *two, void *cb_data)
@@ -207,7 +209,7 @@ static int add_tree_entries(struct path_walk_context *ctx,
match != MATCHED)
continue;
}
- if (ctx->revs->prune_data.nr) {
+ if (ctx->revs->prune_data.nr && ctx->exact_pathspecs) {
struct pathspec *pd = &ctx->revs->prune_data;
bool found = false;
@@ -303,6 +305,13 @@ static int walk_path(struct path_walk_context *ctx,
return 0;
}
+ if (list->type == OBJ_BLOB &&
+ ctx->revs->prune_data.nr &&
+ !match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
+ path, strlen(path), 0,
+ NULL, 0))
+ return 0;
+
/* Evaluate function pointer on this data, if requested. */
if ((list->type == OBJ_TREE && ctx->info->trees) ||
(list->type == OBJ_BLOB && ctx->info->blobs) ||
@@ -511,14 +520,9 @@ int walk_objects_by_path(struct path_walk_info *info)
info->revs->tag_objects = 1;
if (ctx.revs->prune_data.nr) {
- /*
- * Only exact prefix pathspecs are currently supported.
- * Clear any wildcard or magic pathspecs to avoid
- * incorrect prefix matching.
- */
- if (ctx.revs->prune_data.has_wildcard ||
- ctx.revs->prune_data.magic)
- clear_pathspec(&ctx.revs->prune_data);
+ if (!ctx.revs->prune_data.has_wildcard &&
+ !ctx.revs->prune_data.magic)
+ ctx.exact_pathspecs = 1;
}
/* Insert a single list for the root tree into the paths. */
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 52f6484ca1..c6f54ee91c 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -307,12 +307,11 @@ test_expect_success 'backfill with wildcard pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # TODO: The wildcard pathspec should limit downloaded blobs,
- # but currently all blobs are downloaded.
- git -C backfill-path backfill HEAD -- "d/file.*.txt" &&
+ git -C backfill-path backfill HEAD -- "d/file.*.txt" 2>err &&
+ test_must_be_empty err &&
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
- test_line_count = 0 missing
+ test_line_count = 40 missing
'
test_expect_success 'backfill with --all' '
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH v2 6/6] t5620: test backfill's unknown argument handling
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
` (4 preceding siblings ...)
2026-03-23 11:40 ` [PATCH v2 5/6] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
@ 2026-03-23 11:40 ` Derrick Stolee via GitGitGadget
2026-03-23 15:29 ` Junio C Hamano
2026-03-26 15:14 ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
6 siblings, 1 reply; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-23 11:40 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Before the recent changes to parse rev-list arguments inside of 'git
backfill', the builtin would take arbitrary arguments without complaint (and
ignore them). This was noticed and a patch was sent [1] which motivates this
change to encode this behavior in test.
[1] https://lore.kernel.org/git/20260321031643.5185-1-r.siddharth.shrimali@gmail.com/
Reported-by: Siddharth Shrimali <r.siddharth.shrimali@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/t5620-backfill.sh | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index c6f54ee91c..85740f1f13 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -7,6 +7,14 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
. ./test-lib.sh
+test_expect_success 'backfill rejects unexpected arguments' '
+ test_must_fail git backfill unexpected-arg 2>err &&
+ test_grep "ambiguous argument .*unexpected-arg" err &&
+
+ test_must_fail git backfill --all --firt-parent unexpected-arg 2>err &&
+ test_grep "ambiguous argument .*unexpected-arg" err
+'
+
# We create objects in the 'src' repo.
test_expect_success 'setup repo for object creation' '
echo "{print \$1}" >print_1.awk &&
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* Re: [PATCH v2 6/6] t5620: test backfill's unknown argument handling
2026-03-23 11:40 ` [PATCH v2 6/6] t5620: test backfill's unknown argument handling Derrick Stolee via GitGitGadget
@ 2026-03-23 15:29 ` Junio C Hamano
2026-03-23 20:39 ` Derrick Stolee
0 siblings, 1 reply; 46+ messages in thread
From: Junio C Hamano @ 2026-03-23 15:29 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Derrick Stolee <stolee@gmail.com>
>
> Before the recent changes to parse rev-list arguments inside of 'git
> backfill', the builtin would take arbitrary arguments without complaint (and
> ignore them). This was noticed and a patch was sent [1] which motivates this
> change to encode this behavior in test.
>
> [1] https://lore.kernel.org/git/20260321031643.5185-1-r.siddharth.shrimali@gmail.com/
>
> Reported-by: Siddharth Shrimali <r.siddharth.shrimali@gmail.com>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
> ---
> t/t5620-backfill.sh | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
> index c6f54ee91c..85740f1f13 100755
> --- a/t/t5620-backfill.sh
> +++ b/t/t5620-backfill.sh
> @@ -7,6 +7,14 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>
> . ./test-lib.sh
>
> +test_expect_success 'backfill rejects unexpected arguments' '
> + test_must_fail git backfill unexpected-arg 2>err &&
> + test_grep "ambiguous argument .*unexpected-arg" err &&
> +
> + test_must_fail git backfill --all --firt-parent unexpected-arg 2>err &&
> + test_grep "ambiguous argument .*unexpected-arg" err
> +'
Hmph, I would have expected that an earlier --firt-parent on the
command line would trigger "unknown option" instead.
Having said that, if the code lets the setup_revisions() parse the
command line, the usual "unless disambiguated with a double-dash
'--', stop at the first non-revision and take everything as paths
but for safety all of them must refer to an existing path in the
working tree" behaviour should trigger, and it is not specific to
"backfill", and may already be tested centrally (if not, I do not
object to such a new set of tests).
For any cmd that take revisions and pathspec (e.g., log, rev-list,
grep) these should hold true:
$ git $cmd [<options>]... Makefile HEAD
Without disambiguation the command should say "Ah, Makefile
is not a revision, so we will see no more revisions, and
everything, including the current one we are looking at, must be
an existing path on the working tree", and barfs on HEAD that
does not exist as a file/directory.
$ git $cmd [<options>]... Makefile -- HEAD
With disambiguation, the command should verify everything before
the double-dash to be a rev, and barf that Makefile is not a
rev.
$ git $cmd [<options>]... -- Makefile HEAD
With disambiguation, the command should take everything after
the double-dash to be a pathspec element without barfing. After
all, it may be referring to a path that used to exist in some
revision the command will look at.
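Reader's note: the three cases above can be modeled with a toy classifier.
This is a simplified sketch and not the logic in setup_revisions(): it
ignores options, treats "HEAD" as the only revision and "Makefile" as the
only existing path, and every toy_*() name is hypothetical. The argument
array holds only the non-option arguments, without the command name.

```c
#include <stdbool.h>
#include <string.h>

enum toy_verdict {
	TOY_ARGS_OK,
	TOY_ERR_NOT_A_REV,
	TOY_ERR_NOT_A_PATH
};

/* Assumption for the sketch: "HEAD" is the only revision. */
static bool toy_is_rev(const char *arg)
{
	return !strcmp(arg, "HEAD");
}

/* Assumption for the sketch: "Makefile" is the only existing path. */
static bool toy_path_exists(const char *arg)
{
	return !strcmp(arg, "Makefile");
}

static enum toy_verdict toy_classify(int argc, const char **argv)
{
	int dashdash = -1;
	int i;

	for (i = 0; i < argc; i++) {
		if (!strcmp(argv[i], "--")) {
			dashdash = i;
			break;
		}
	}

	if (dashdash >= 0) {
		/* Everything before "--" must be a revision. */
		for (i = 0; i < dashdash; i++)
			if (!toy_is_rev(argv[i]))
				return TOY_ERR_NOT_A_REV;
		/* Everything after "--" is a pathspec, taken unchecked. */
		return TOY_ARGS_OK;
	}

	/* No "--": skip leading revisions, then require existing paths. */
	for (i = 0; i < argc && toy_is_rev(argv[i]); i++)
		;
	for (; i < argc; i++)
		if (!toy_path_exists(argv[i]))
			return TOY_ERR_NOT_A_PATH;
	return TOY_ARGS_OK;
}
```

In this model "Makefile HEAD" fails because HEAD is not an existing path,
"Makefile -- HEAD" fails because Makefile is not a revision, and
"-- Makefile HEAD" is accepted.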
Thanks.
^ permalink raw reply [flat|nested] 46+ messages in thread
* Re: [PATCH v2 6/6] t5620: test backfill's unknown argument handling
2026-03-23 15:29 ` Junio C Hamano
@ 2026-03-23 20:39 ` Derrick Stolee
0 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee @ 2026-03-23 20:39 UTC (permalink / raw)
To: Junio C Hamano, Derrick Stolee via GitGitGadget
Cc: git, Kristoffer Haugsbakk, r.siddharth.shrimali, ps
On 3/23/2026 11:29 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Derrick Stolee <stolee@gmail.com>
>>
>> Before the recent changes to parse rev-list arguments inside of 'git
>> backfill', the builtin would take arbitrary arguments without complaint (and
>> ignore them). This was noticed and a patch was sent [1] which motivates this
>> change to encode this behavior in test.
>>
>> [1] https://lore.kernel.org/git/20260321031643.5185-1-r.siddharth.shrimali@gmail.com/
>>
>> Reported-by: Siddharth Shrimali <r.siddharth.shrimali@gmail.com>
>> Signed-off-by: Derrick Stolee <stolee@gmail.com>
>> ---
>> t/t5620-backfill.sh | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
>> index c6f54ee91c..85740f1f13 100755
>> --- a/t/t5620-backfill.sh
>> +++ b/t/t5620-backfill.sh
>> @@ -7,6 +7,14 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>>
>> . ./test-lib.sh
>>
>> +test_expect_success 'backfill rejects unexpected arguments' '
>> + test_must_fail git backfill unexpected-arg 2>err &&
>> + test_grep "ambiguous argument .*unexpected-arg" err &&
>> +
>> + test_must_fail git backfill --all --firt-parent unexpected-arg 2>err &&
>> + test_grep "ambiguous argument .*unexpected-arg" err
>> +'
>
> Hmph, I would have expected that an earlier --firt-parent on the
> command line would trigger "unknown option" instead.
Interesting that my mistype has demonstrated an interesting
behavior here. It turns out that random options starting with
'--' are accepted here, including --unexpected-arg.
This means that we actually have room here for some improvement!
I'll see what can be done to make even these arguments be seen
as failures.
> Having said that, if the code lets the setup_revisions() parse the
> command line, the usual "unless disambiguated with a double-dash
> '--', stop at the first non-revision and take everything as paths
> but for safety all of them must refer to an existing path in the
> working tree" behaviour should trigger, and it is not specific to
> "backfill", and may already be tested centrally (if not, I do not
> object to such a new set of tests).
Thanks,
-Stolee
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v3 0/6] backfill: accept revision arguments
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
` (5 preceding siblings ...)
2026-03-23 11:40 ` [PATCH v2 6/6] t5620: test backfill's unknown argument handling Derrick Stolee via GitGitGadget
@ 2026-03-26 15:14 ` Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 1/6] revision: include object-name.h Derrick Stolee via GitGitGadget
` (6 more replies)
6 siblings, 7 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-26 15:14 UTC (permalink / raw)
To: git; +Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee
The git backfill command assists in downloading missing blobs for blobless
partial clones. However, its current version lacks some valuable
functionality. It currently:
1. Only walks commits reachable from HEAD.
2. Walks all reachable commits through the full history.
3. Can focus on the current sparse-checkout definition, but otherwise
cannot be limited to a given pathspec.
All of these are being updated by this patch series, which allows rev-list
options to impact the path-walk. These include:
1. Specifying the starting references, including --all.
2. Modifying the commit walk, including --first-parent, commit ranges, or
recency using --since.
3. Modifying the set of paths to download using pathspecs.
One particularly valuable situation here is that now a user can run git
backfill -- <path> to download all versions of a specific file or a specific
directory, accelerating history queries within that path without downloading
more than necessary. This can accelerate git blame or git log -L for these
paths, where normally those commands download missing blobs one-by-one
during their diff algorithms.
This patch series is organized in the following way:
1. A missing #include is added to prevent future compilation issues.
2. The test repo in t5620 is expanded to make later tests more interesting.
3. The backfill builtin parses the rev-list arguments. We test the top
arguments that work as expected, though the pathspec arguments need
extra work.
4. Update the path-walk logic to work efficiently with some pathspecs, such
as fixed prefix pathspecs, accelerating the computation.
5. For more complicated pathspecs, such as wildcards, skip the prefix
pruning and filter blobs by their full path inside the path-walk API.
6. Reject unrecognized arguments in the builtin and test that behavior.
The main goal of this series is to make such customizations possible, and to
improve performance where common use cases are expected. I'm open to
feedback as to whether we should consider more detailed performance analysis
or whether we should wait for how users interact with these new options
before overoptimizing unlikely use cases.
Updates in v2
=============
* Hard stops are replaced with a comma (and no punctuation) in the docs.
* add_head_to_pending() simplifies some code.
* My poor explanation of "starting commits" is updated.
* Language around temporary prefix restriction is clarified.
* Prefix match logic is simplified with dir_prefix().
* Temporary memory leak (introduced in v1's patch 4 and removed in v1's
patch 5) is removed in v2's patch 4.
* Commit pruning is reenabled in v2's patch 5. There was no need for that
with the way the logic works in the patch.
* Add a new patch with a test demonstrating the new behavior that was being
discussed in [1].
[1]
https://lore.kernel.org/git/20260321031643.5185-1-r.siddharth.shrimali@gmail.com/
Updates in v3
=============
* Fixed the argument checks to actually catch unknown arguments, because
the revision machinery will skip unknown options starting with --.
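Reader's note: the leftover-argument check can be sketched in standalone C.
This is a toy model, assuming a parser that consumes the options it knows
and moves everything else to the front of the argument list; the toy_*()
names and the two "known" options are hypothetical choices for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption for the sketch: only these two options are recognized. */
static bool toy_is_known(const char *arg)
{
	return !strcmp(arg, "--all") || !strcmp(arg, "--first-parent");
}

/*
 * Consume known options, move everything else to argv[1..], and
 * return the new argc. argv[0] is the command name and is always kept.
 */
static int toy_parse(int argc, const char **argv)
{
	int left = 1;

	for (int i = 1; i < argc; i++) {
		if (toy_is_known(argv[i]))
			continue;
		argv[left++] = argv[i];
	}
	return left;
}

/* Return the first leftover argument, or NULL if all were consumed. */
static const char *toy_first_unrecognized(int argc, const char **argv)
{
	argc = toy_parse(argc, argv);
	return argc > 1 ? argv[1] : NULL;
}
```

A caller that treats a non-NULL result as fatal rejects
"--all --unexpected-arg --first-parent" even though the unknown option sits
between two valid ones.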
Thanks,
-Stolee
Derrick Stolee (6):
revision: include object-name.h
t5620: prepare branched repo for revision tests
backfill: accept revision arguments
backfill: work with prefix pathspecs
path-walk: support wildcard pathspecs for blob filtering
t5620: test backfill's unknown argument handling
Documentation/git-backfill.adoc | 5 +-
builtin/backfill.c | 22 +++-
path-walk.c | 43 +++++++
path.c | 2 +-
path.h | 6 +
revision.h | 1 +
t/t5620-backfill.sh | 211 +++++++++++++++++++++++++++++++-
7 files changed, 280 insertions(+), 10 deletions(-)
base-commit: 67ad42147a7acc2af6074753ebd03d904476118f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2070%2Fderrickstolee%2Fbackfill-revs-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2070/derrickstolee/backfill-revs-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/2070
Range-diff vs v2:
1: fda0239103 = 1: fda0239103 revision: include object-name.h
2: 55a45b2fc8 = 2: 55a45b2fc8 t5620: prepare branched repo for revision tests
3: 610a162973 = 3: 610a162973 backfill: accept revision arguments
4: f8f2c61326 ! 4: 7223124fb3 backfill: work with prefix pathspecs
@@ path-walk.c: static int add_tree_entries(struct path_walk_context *ctx,
+ if (ctx->revs->prune_data.nr) {
+ struct pathspec *pd = &ctx->revs->prune_data;
+ bool found = false;
++ int did_strip_suffix = strbuf_strip_suffix(&path, "/");
+
-+ /* remove '/' for these checks. */
-+ path.buf[path.len - 1] = 0;
+
+ for (int i = 0; i < pd->nr; i++) {
+ struct pathspec_item *item = &pd->items[i];
@@ path-walk.c: static int add_tree_entries(struct path_walk_context *ctx,
+ }
+ }
+
-+ /* return '/' after these checks. */
-+ path.buf[path.len - 1] = '/';
++ if (did_strip_suffix)
++ strbuf_addch(&path, '/');
+
+ /* Skip paths that do not match the prefix. */
+ if (!found)
5: 1168edfb96 ! 5: 1ea278bd10 path-walk: support wildcard pathspecs for blob filtering
@@ path-walk.c: static int add_tree_entries(struct path_walk_context *ctx,
+ if (ctx->revs->prune_data.nr && ctx->exact_pathspecs) {
struct pathspec *pd = &ctx->revs->prune_data;
bool found = false;
-
+ int did_strip_suffix = strbuf_strip_suffix(&path, "/");
@@ path-walk.c: static int walk_path(struct path_walk_context *ctx,
return 0;
}
6: 9699650aa7 ! 6: b6423f9595 t5620: test backfill's unknown argument handling
@@ Commit message
Before the recent changes to parse rev-list arguments inside of 'git
backfill', the builtin would take arbitrary arguments without complaint (and
- ignore them). This was noticed and a patch was sent [1] which motivates this
- change to encode this behavior in test.
+ ignore them). This was noticed and a patch was sent [1] which motivates
+ this change.
[1] https://lore.kernel.org/git/20260321031643.5185-1-r.siddharth.shrimali@gmail.com/
+ Note that the revision machinery can output an "ambiguous argument"
+ warning if a value not starting with '--' is found and doesn't make
+ sense as a reference or a pathspec. For unrecognized arguments starting
+ with '--' we need to add logic into builtin/backfill.c to catch leftover
+ arguments.
+
Reported-by: Siddharth Shrimali <r.siddharth.shrimali@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
+ ## builtin/backfill.c ##
+@@ builtin/backfill.c: int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
+ repo_init_revisions(repo, &ctx.revs, prefix);
+ argc = setup_revisions(argc, argv, &ctx.revs, NULL);
+
++ if (argc > 1)
++ die(_("unrecognized argument: %s"), argv[1]);
++
+ repo_config(repo, git_default_config, NULL);
+
+ if (ctx.sparse < 0)
+
## t/t5620-backfill.sh ##
@@ t/t5620-backfill.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
@@ t/t5620-backfill.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+ test_must_fail git backfill unexpected-arg 2>err &&
+ test_grep "ambiguous argument .*unexpected-arg" err &&
+
-+ test_must_fail git backfill --all --firt-parent unexpected-arg 2>err &&
-+ test_grep "ambiguous argument .*unexpected-arg" err
++ test_must_fail git backfill --all --unexpected-arg --first-parent 2>err &&
++ test_grep "unrecognized argument: --unexpected-arg" err
+'
+
# We create objects in the 'src' repo.
--
gitgitgadget
^ permalink raw reply [flat|nested] 46+ messages in thread
* [PATCH v3 1/6] revision: include object-name.h
2026-03-26 15:14 ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
@ 2026-03-26 15:14 ` Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 2/6] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
` (5 subsequent siblings)
6 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-26 15:14 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The REV_INFO_INIT macro includes a use of the DEFAULT_ABBREV macro, which is
defined in object-name.h. Include it in revision.h so consumers of
REV_INFO_INIT do not need to include this hidden dependency.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
revision.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/revision.h b/revision.h
index b36acfc2d9..18c9bbd822 100644
--- a/revision.h
+++ b/revision.h
@@ -4,6 +4,7 @@
#include "commit.h"
#include "grep.h"
#include "notes.h"
+#include "object-name.h"
#include "oidset.h"
#include "pretty.h"
#include "diff.h"
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH v3 2/6] t5620: prepare branched repo for revision tests
2026-03-26 15:14 ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 1/6] revision: include object-name.h Derrick Stolee via GitGitGadget
@ 2026-03-26 15:14 ` Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 3/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (4 subsequent siblings)
6 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-26 15:14 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Prepare the test infrastructure for upcoming changes that teach 'git
backfill' to accept revision arguments and pathspecs.
Add test_tick before each commit in the setup loop so that commit dates
are deterministic. This enables reliable testing with '--since'.
Rename the 'd/e/' directory to 'd/f/' so that the prefix 'd/f' is
ambiguous with the files 'd/file.*.txt'. This exercises the subtlety
in prefix pathspec matching that will be added in a later commit.
Create a branched version of the test repository (src-revs) with:
- A 'side' branch merged into main, adding s/file.{1,2}.txt with
two versions (4 new blobs, 52 total from main HEAD).
- An unmerged 'other' branch adding o/file.{1,2}.txt (2 more blobs,
54 total reachable from --all).
This structure makes --all, --first-parent, and --since produce
meaningfully different results when used with 'git backfill'.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
t/t5620-backfill.sh | 52 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 50 insertions(+), 2 deletions(-)
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 58c81556e7..1331949be4 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -15,7 +15,7 @@ test_expect_success 'setup repo for object creation' '
git init src &&
mkdir -p src/a/b/c &&
- mkdir -p src/d/e &&
+ mkdir -p src/d/f &&
for i in 1 2
do
@@ -26,8 +26,9 @@ test_expect_success 'setup repo for object creation' '
echo "Version $i of file a/b/$n" > src/a/b/file.$n.txt &&
echo "Version $i of file a/b/c/$n" > src/a/b/c/file.$n.txt &&
echo "Version $i of file d/$n" > src/d/file.$n.txt &&
- echo "Version $i of file d/e/$n" > src/d/e/file.$n.txt &&
+ echo "Version $i of file d/f/$n" > src/d/f/file.$n.txt &&
git -C src add . &&
+ test_tick &&
git -C src commit -m "Iteration $n" || return 1
done
done
@@ -41,6 +42,53 @@ test_expect_success 'setup bare clone for server' '
git -C srv.bare config --local uploadpack.allowanysha1inwant 1
'
+# Create a version of the repo with branches for testing revision
+# arguments like --all, --first-parent, and --since.
+#
+# main: 8 commits (linear) + merge of side branch
+# 48 original blobs + 4 side blobs = 52 blobs from main HEAD
+# side: 2 commits adding s/file.{1,2}.txt (v1, v2), merged into main
+# other: 1 commit adding o/file.{1,2}.txt (not merged)
+# 54 total blobs reachable from --all
+test_expect_success 'setup branched repo for revision tests' '
+ git clone src src-revs &&
+
+ # Side branch from tip of main with unique files
+ git -C src-revs checkout -b side HEAD &&
+ mkdir -p src-revs/s &&
+ echo "Side version 1 of file 1" >src-revs/s/file.1.txt &&
+ echo "Side version 1 of file 2" >src-revs/s/file.2.txt &&
+ test_tick &&
+ git -C src-revs add . &&
+ git -C src-revs commit -m "Side commit 1" &&
+
+ echo "Side version 2 of file 1" >src-revs/s/file.1.txt &&
+ echo "Side version 2 of file 2" >src-revs/s/file.2.txt &&
+ test_tick &&
+ git -C src-revs add . &&
+ git -C src-revs commit -m "Side commit 2" &&
+
+ # Merge side into main
+ git -C src-revs checkout main &&
+ test_tick &&
+ git -C src-revs merge side --no-ff -m "Merge side branch" &&
+
+ # Other branch (not merged) for --all testing
+ git -C src-revs checkout -b other main~1 &&
+ mkdir -p src-revs/o &&
+ echo "Other content 1" >src-revs/o/file.1.txt &&
+ echo "Other content 2" >src-revs/o/file.2.txt &&
+ test_tick &&
+ git -C src-revs add . &&
+ git -C src-revs commit -m "Other commit" &&
+
+ git -C src-revs checkout main &&
+
+ git clone --bare "file://$(pwd)/src-revs" srv-revs.bare &&
+ git -C srv-revs.bare config --local uploadpack.allowfilter 1 &&
+ git -C srv-revs.bare config --local uploadpack.allowanysha1inwant 1
+'
+
# do basic partial clone from "srv.bare"
test_expect_success 'do partial clone 1, backfill gets all objects' '
git clone --no-checkout --filter=blob:none \
--
gitgitgadget
^ permalink raw reply related [flat|nested] 46+ messages in thread
* [PATCH v3 3/6] backfill: accept revision arguments
2026-03-26 15:14 ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 1/6] revision: include object-name.h Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 2/6] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
@ 2026-03-26 15:14 ` Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 4/6] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
` (3 subsequent siblings)
6 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-26 15:14 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The existing implementation of 'git backfill' downloads only the missing
blobs reachable from HEAD. Advanced uses may desire more general
commit limiting options, such as '--all' for all references, specifying a
commit range via negative references, or limiting by recency, as with
'--since=<date>'.
All of these options are available if we use setup_revisions() to parse the
unknown arguments with the revision machinery. This opens up a large number
of possibilities, only a small set of which are tested here.
For documentation, we avoid duplicating the option documentation and instead
link to the documentation of 'git rev-list'.
Note that these arguments currently allow specifying a pathspec, which
modifies the commit history checks but does not limit the paths used in the
backfill logic. This will be updated in a future change.
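As a small illustration of the "commit range via negative references" case: Git's revision syntax treats the two-dot range A..B as shorthand for "^A B", which is why the range test and the --stdin test in this patch are expected to fetch the same objects. The C sketch below is not Git's parser. sketch_split_range() is a hypothetical helper that handles only the simple two-dot form with both sides present.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git: split a two-dot range "A..B"
 * into an excluded tip (A) and an included tip (B), mirroring the
 * equivalence between "A..B" and "^A B".
 * Returns 0 on success, -1 if 'range' is not of that simple form
 * (no "..", a missing side, or the three-dot "A...B" form).
 */
static int sketch_split_range(const char *range,
			      char *exclude, char *include, size_t size)
{
	const char *dots;
	size_t left_len;

	assert(range && exclude && include && size > 0);
	dots = strstr(range, "..");
	if (!dots || dots == range || dots[2] == '.' || dots[2] == '\0')
		return -1;
	left_len = (size_t)(dots - range);
	if (left_len >= size || strlen(dots + 2) >= size)
		return -1;
	memcpy(exclude, range, left_len);
	exclude[left_len] = '\0';
	strcpy(include, dots + 2);
	return 0;
}
```

Calling it with "HEAD~2..HEAD" yields "HEAD~2" as the excluded tip and "HEAD" as the included tip, which corresponds to the two lines fed over stdin in the test.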
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-backfill.adoc | 5 +-
builtin/backfill.c | 19 ++--
t/t5620-backfill.sh | 156 ++++++++++++++++++++++++++++++++
3 files changed, 173 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
index b8394dcf22..246ab417c2 100644
--- a/Documentation/git-backfill.adoc
+++ b/Documentation/git-backfill.adoc
@@ -63,9 +63,12 @@ OPTIONS
current sparse-checkout. If the sparse-checkout feature is enabled,
then `--sparse` is assumed and can be disabled with `--no-sparse`.
+You may also specify the commit limiting options from linkgit:git-rev-list[1].
+
SEE ALSO
--------
-linkgit:git-clone[1].
+linkgit:git-clone[1],
+linkgit:git-rev-list[1]
GIT
---
diff --git a/builtin/backfill.c b/builtin/backfill.c
index e80fc1b694..90c9d84793 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -35,6 +35,7 @@ struct backfill_context {
struct oid_array current_batch;
size_t min_batch_size;
int sparse;
+ struct rev_info revs;
};
static void backfill_context_clear(struct backfill_context *ctx)
@@ -80,7 +81,6 @@ static int fill_missing_blobs(const char *path UNUSED,
static int do_backfill(struct backfill_context *ctx)
{
- struct rev_info revs;
struct path_walk_info info = PATH_WALK_INFO_INIT;
int ret;
@@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
}
}
- repo_init_revisions(ctx->repo, &revs, "");
- handle_revision_arg("HEAD", &revs, 0, 0);
+ /* Walk from HEAD if otherwise unspecified. */
+ if (!ctx->revs.pending.nr)
+ add_head_to_pending(&ctx->revs);
info.blobs = 1;
info.tags = info.commits = info.trees = 0;
- info.revs = &revs;
+ info.revs = &ctx->revs;
info.path_fn = fill_missing_blobs;
info.path_fn_data = ctx;
@@ -109,7 +110,6 @@ static int do_backfill(struct backfill_context *ctx)
download_batch(ctx);
path_walk_info_clear(&info);
- release_revisions(&revs);
return ret;
}
@@ -121,6 +121,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
.current_batch = OID_ARRAY_INIT,
.min_batch_size = 50000,
.sparse = 0,
+ .revs = REV_INFO_INIT,
};
struct option options[] = {
OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
@@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
builtin_backfill_usage, options);
argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
- 0);
+ PARSE_OPT_KEEP_UNKNOWN_OPT |
+ PARSE_OPT_KEEP_ARGV0 |
+ PARSE_OPT_KEEP_DASHDASH);
+
+ repo_init_revisions(repo, &ctx.revs, prefix);
+ argc = setup_revisions(argc, argv, &ctx.revs, NULL);
repo_config(repo, git_default_config, NULL);
@@ -143,5 +149,6 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
result = do_backfill(&ctx);
backfill_context_clear(&ctx);
+ release_revisions(&ctx.revs);
return result;
}
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 1331949be4..db66d8b614 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -224,6 +224,162 @@ test_expect_success 'backfill --sparse without cone mode (negative)' '
test_line_count = 12 missing
'
+test_expect_success 'backfill with revision range' '
+ test_when_finished rm -rf backfill-revs &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-revs &&
+
+ # No blobs yet
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ git -C backfill-revs backfill HEAD~2..HEAD &&
+
+ # 30 objects downloaded.
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 18 missing
+'
+
+test_expect_success 'backfill with revisions over stdin' '
+ test_when_finished rm -rf backfill-revs &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-revs &&
+
+ # No blobs yet
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ cat >in <<-EOF &&
+ HEAD
+ ^HEAD~2
+ EOF
+
+ git -C backfill-revs backfill --stdin <in &&
+
+ # 30 objects downloaded.
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 18 missing
+'
+
+test_expect_success 'backfill with prefix pathspec' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The pathspec should limit the downloaded blobs to
+ # only those matching the prefix "d/f", but currently all
+ # blobs are downloaded.
+ git -C backfill-path backfill HEAD -- d/f &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with multiple pathspecs' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The pathspecs should limit the downloaded blobs to
+ # only those matching "d/f" or "a", but currently all blobs
+ # are downloaded.
+ git -C backfill-path backfill HEAD -- d/f a &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with wildcard pathspec' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The wildcard pathspec should limit downloaded blobs,
+ # but currently all blobs are downloaded.
+ git -C backfill-path backfill HEAD -- "d/file.*.txt" &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with --all' '
+ test_when_finished rm -rf backfill-all &&
+ git clone --no-checkout --filter=blob:none \
+ "file://$(pwd)/srv-revs.bare" backfill-all &&
+
+ # All blobs from all refs are missing
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 54 missing &&
+
+ # Backfill from HEAD gets main blobs only
+ git -C backfill-all backfill HEAD &&
+
+ # Other branch blobs still missing
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 2 missing &&
+
+ # Backfill with --all gets everything
+ git -C backfill-all backfill --all &&
+
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with --first-parent' '
+ test_when_finished rm -rf backfill-fp &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv-revs.bare" backfill-fp &&
+
+ git -C backfill-fp rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 52 missing &&
+
+ # --first-parent skips the side branch commits, so
+ # s/file.{1,2}.txt v1 blobs (only in side commit 1) are missed.
+ git -C backfill-fp backfill --first-parent HEAD &&
+
+ git -C backfill-fp rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 2 missing
+'
+
+test_expect_success 'backfill with --since' '
+ test_when_finished rm -rf backfill-since &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv-revs.bare" backfill-since &&
+
+ git -C backfill-since rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 52 missing &&
+
+ # Use a cutoff between commits 4 and 5 (between v1 and v2
+ # iterations). Commits 5-8 still carry v1 of files 2-4 in
+ # their trees, but v1 of file.1.txt is only in commits 1-4.
+ SINCE=$(git -C backfill-since log --first-parent --reverse \
+ --format=%ct HEAD~1 | sed -n 5p) &&
+ git -C backfill-since backfill --since="@$((SINCE - 1))" HEAD &&
+
+ # 6 missing: v1 of file.1.txt in all 6 directories
+ git -C backfill-since rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 6 missing
+'
+
. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd
--
gitgitgadget
* [PATCH v3 4/6] backfill: work with prefix pathspecs
2026-03-26 15:14 ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (2 preceding siblings ...)
2026-03-26 15:14 ` [PATCH v3 3/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
@ 2026-03-26 15:14 ` Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 5/6] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
` (2 subsequent siblings)
6 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-26 15:14 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
The previous change allowed specifying revision arguments on the 'git
backfill' command line. This created the opportunity for restricting the
initial commit set by filtering the revision walk through a pathspec. Other
than filtering the commit set (and thereby the root trees), this did not
restrict the path-walk implementation of 'git backfill' and did not restrict
the blobs that were downloaded to only those matching the pathspec.
Update the path-walk API to accept certain kinds of pathspecs and to
silently ignore anything too complex, for now. We will update this in the
next change to properly restrict to even complex pathspecs.
The current behavior focuses on pathspecs that match paths exactly. This
covers exact filenames as well as directory names used as prefixes. Pathspecs
containing wildcards or magic are cleared so the path walk downloads all
blobs, as before.
The reason for this restriction is to allow for a faster execution by
pruning the path walk to only trees that could contribute towards one of
those paths as a parent directory.
The test directory 'd/f/' (next to 'd/file*.txt') was prepared in a
previous commit to exercise the subtlety in prefix matching.
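To make the prefix subtlety concrete, here is a minimal standalone sketch of the two-way check described above. sketch_dir_prefix() is a stand-in written from the path.h comment in this patch, not a copy of Git's function, and sketch_keep_path() and the path "d/f/inner.txt" are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Stand-in for dir_prefix() as documented in path.h: true if 'buf'
 * equals 'dir' or starts with 'dir' followed by '/'. 'dir' must not
 * end with a trailing '/'.
 */
static bool sketch_dir_prefix(const char *buf, const char *dir)
{
	size_t len;

	assert(buf && dir);
	len = strlen(dir);
	return !strncmp(buf, dir, len) &&
	       (buf[len] == '/' || buf[len] == '\0');
}

/*
 * Hypothetical helper: should the walk keep 'path' for pathspec 'spec'?
 * Keep it if it is at or below the pathspec (a match), or if it is a
 * parent directory of the pathspec (needed to descend toward a match).
 */
static bool sketch_keep_path(const char *path, const char *spec)
{
	return sketch_dir_prefix(path, spec) ||
	       sketch_dir_prefix(spec, path);
}
```

With the pathspec "d/f", this keeps "d" (a parent to descend through) and "d/f/inner.txt" (below the prefix), but rejects "d/file.1.txt": the strings share the leading characters "d/f", yet the next character is not a directory separator.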
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
path-walk.c | 39 +++++++++++++++++++++++++++++++++++++++
path.c | 2 +-
path.h | 6 ++++++
t/t5620-backfill.sh | 16 ++++++----------
4 files changed, 52 insertions(+), 11 deletions(-)
diff --git a/path-walk.c b/path-walk.c
index 364e4cfa19..3750552978 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -11,6 +11,7 @@
#include "list-objects.h"
#include "object.h"
#include "oid-array.h"
+#include "path.h"
#include "prio-queue.h"
#include "repository.h"
#include "revision.h"
@@ -206,6 +207,33 @@ static int add_tree_entries(struct path_walk_context *ctx,
match != MATCHED)
continue;
}
+ if (ctx->revs->prune_data.nr) {
+ struct pathspec *pd = &ctx->revs->prune_data;
+ bool found = false;
+ int did_strip_suffix = strbuf_strip_suffix(&path, "/");
+
+
+ for (int i = 0; i < pd->nr; i++) {
+ struct pathspec_item *item = &pd->items[i];
+
+ /*
+ * Continue if either is a directory prefix
+ * of the other.
+ */
+ if (dir_prefix(path.buf, item->match) ||
+ dir_prefix(item->match, path.buf)) {
+ found = true;
+ break;
+ }
+ }
+
+ if (did_strip_suffix)
+ strbuf_addch(&path, '/');
+
+ /* Skip paths that do not match the prefix. */
+ if (!found)
+ continue;
+ }
add_path_to_list(ctx, path.buf, type, &entry.oid,
!(o->flags & UNINTERESTING));
@@ -481,6 +509,17 @@ int walk_objects_by_path(struct path_walk_info *info)
if (info->tags)
info->revs->tag_objects = 1;
+ if (ctx.revs->prune_data.nr) {
+ /*
+ * Only exact prefix pathspecs are currently supported.
+ * Clear any wildcard or magic pathspecs to avoid
+ * incorrect prefix matching.
+ */
+ if (ctx.revs->prune_data.has_wildcard ||
+ ctx.revs->prune_data.magic)
+ clear_pathspec(&ctx.revs->prune_data);
+ }
+
/* Insert a single list for the root tree into the paths. */
CALLOC_ARRAY(root_tree_list, 1);
root_tree_list->type = OBJ_TREE;
diff --git a/path.c b/path.c
index d726537622..aebb10b2e9 100644
--- a/path.c
+++ b/path.c
@@ -57,7 +57,7 @@ static void strbuf_cleanup_path(struct strbuf *sb)
strbuf_remove(sb, 0, path - sb->buf);
}
-static int dir_prefix(const char *buf, const char *dir)
+int dir_prefix(const char *buf, const char *dir)
{
int len = strlen(dir);
return !strncmp(buf, dir, len) &&
diff --git a/path.h b/path.h
index 0ec95a0b07..829fafd7e9 100644
--- a/path.h
+++ b/path.h
@@ -114,6 +114,12 @@ const char *repo_submodule_path_replace(struct repository *repo,
const char *fmt, ...)
__attribute__((format (printf, 4, 5)));
+/*
+ * Given a directory name 'dir' (not ending with a trailing '/'),
+ * determine if 'buf' is equal to 'dir' or has prefix 'dir'+'/'.
+ */
+int dir_prefix(const char *buf, const char *dir);
+
void report_linked_checkout_garbage(struct repository *r);
/*
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index db66d8b614..52f6484ca1 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -273,13 +273,11 @@ test_expect_success 'backfill with prefix pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # TODO: The pathspec should limit the downloaded blobs to
- # only those matching the prefix "d/f", but currently all
- # blobs are downloaded.
- git -C backfill-path backfill HEAD -- d/f &&
+ git -C backfill-path backfill HEAD -- d/f 2>err &&
+ test_must_be_empty err &&
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
- test_line_count = 0 missing
+ test_line_count = 40 missing
'
test_expect_success 'backfill with multiple pathspecs' '
@@ -292,13 +290,11 @@ test_expect_success 'backfill with multiple pathspecs' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # TODO: The pathspecs should limit the downloaded blobs to
- # only those matching "d/f" or "a", but currently all blobs
- # are downloaded.
- git -C backfill-path backfill HEAD -- d/f a &&
+ git -C backfill-path backfill HEAD -- d/f a 2>err &&
+ test_must_be_empty err &&
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
- test_line_count = 0 missing
+ test_line_count = 16 missing
'
test_expect_success 'backfill with wildcard pathspec' '
--
gitgitgadget
* [PATCH v3 5/6] path-walk: support wildcard pathspecs for blob filtering
2026-03-26 15:14 ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (3 preceding siblings ...)
2026-03-26 15:14 ` [PATCH v3 4/6] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
@ 2026-03-26 15:14 ` Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 6/6] t5620: test backfill's unknown argument handling Derrick Stolee via GitGitGadget
2026-03-27 7:07 ` [PATCH v3 0/6] backfill: accept revision arguments Patrick Steinhardt
6 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-26 15:14 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Previously, walk_objects_by_path() silently ignored pathspecs containing
wildcards or magic by clearing them. This caused all blobs to be
downloaded regardless of the given pathspec. Wildcard pathspecs like
"d/file.*.txt" are useful for narrowing which blobs to process (e.g.,
during 'git backfill').
Support wildcard pathspecs by making two changes:
1. Add an 'exact_pathspecs' flag to path_walk_context. When the
pathspec has no wildcards or magic, set this flag and use the
existing fast-path prefix matching in add_tree_entries(). When
wildcards are present, skip that block since prefix matching
cannot handle glob patterns.
2. Add a match_pathspec() check in walk_path() to filter out blobs
whose full path does not match the pathspec. This provides the
actual blob-level filtering for wildcard pathspecs.
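The two steps above can be sketched outside of Git as follows. This is only an approximation under stated assumptions: POSIX fnmatch() stands in for Git's own pathspec matcher, the glob-character test is a simplification of how Git decides that a pathspec has wildcards, and both function names are hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/*
 * Simplified stand-in for the "has wildcard" decision: if the pathspec
 * contains a glob character, prefix-based tree pruning cannot be used.
 */
static bool sketch_has_wildcard(const char *spec)
{
	assert(spec);
	return strpbrk(spec, "*?[") != NULL;
}

/*
 * Blob-level check: does the full blob path match the wildcard
 * pathspec? fnmatch() is used here in place of Git's matcher.
 */
static bool sketch_blob_matches(const char *path, const char *spec)
{
	assert(path && spec);
	return fnmatch(spec, path, 0) == 0;
}
```

For the pathspec "d/file.*.txt" used in the test, the first function reports a wildcard, so tree pruning is skipped, and the second accepts "d/file.1.txt" while rejecting paths in other directories.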
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
path-walk.c | 22 +++++++++++++---------
t/t5620-backfill.sh | 7 +++----
2 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/path-walk.c b/path-walk.c
index 3750552978..2aa3e7d8a4 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -63,6 +63,8 @@ struct path_walk_context {
*/
struct prio_queue path_stack;
struct strset path_stack_pushed;
+
+ unsigned exact_pathspecs:1;
};
static int compare_by_type(const void *one, const void *two, void *cb_data)
@@ -207,7 +209,7 @@ static int add_tree_entries(struct path_walk_context *ctx,
match != MATCHED)
continue;
}
- if (ctx->revs->prune_data.nr) {
+ if (ctx->revs->prune_data.nr && ctx->exact_pathspecs) {
struct pathspec *pd = &ctx->revs->prune_data;
bool found = false;
int did_strip_suffix = strbuf_strip_suffix(&path, "/");
@@ -302,6 +304,13 @@ static int walk_path(struct path_walk_context *ctx,
return 0;
}
+ if (list->type == OBJ_BLOB &&
+ ctx->revs->prune_data.nr &&
+ !match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
+ path, strlen(path), 0,
+ NULL, 0))
+ return 0;
+
/* Evaluate function pointer on this data, if requested. */
if ((list->type == OBJ_TREE && ctx->info->trees) ||
(list->type == OBJ_BLOB && ctx->info->blobs) ||
@@ -510,14 +519,9 @@ int walk_objects_by_path(struct path_walk_info *info)
info->revs->tag_objects = 1;
if (ctx.revs->prune_data.nr) {
- /*
- * Only exact prefix pathspecs are currently supported.
- * Clear any wildcard or magic pathspecs to avoid
- * incorrect prefix matching.
- */
- if (ctx.revs->prune_data.has_wildcard ||
- ctx.revs->prune_data.magic)
- clear_pathspec(&ctx.revs->prune_data);
+ if (!ctx.revs->prune_data.has_wildcard &&
+ !ctx.revs->prune_data.magic)
+ ctx.exact_pathspecs = 1;
}
/* Insert a single list for the root tree into the paths. */
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 52f6484ca1..c6f54ee91c 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -307,12 +307,11 @@ test_expect_success 'backfill with wildcard pathspec' '
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
test_line_count = 48 missing &&
- # TODO: The wildcard pathspec should limit downloaded blobs,
- # but currently all blobs are downloaded.
- git -C backfill-path backfill HEAD -- "d/file.*.txt" &&
+ git -C backfill-path backfill HEAD -- "d/file.*.txt" 2>err &&
+ test_must_be_empty err &&
git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
- test_line_count = 0 missing
+ test_line_count = 40 missing
'
test_expect_success 'backfill with --all' '
--
gitgitgadget
* [PATCH v3 6/6] t5620: test backfill's unknown argument handling
2026-03-26 15:14 ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (4 preceding siblings ...)
2026-03-26 15:14 ` [PATCH v3 5/6] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
@ 2026-03-26 15:14 ` Derrick Stolee via GitGitGadget
2026-03-27 7:07 ` [PATCH v3 0/6] backfill: accept revision arguments Patrick Steinhardt
6 siblings, 0 replies; 46+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2026-03-26 15:14 UTC (permalink / raw)
To: git
Cc: gitster, Kristoffer Haugsbakk, r.siddharth.shrimali, ps,
Derrick Stolee, Derrick Stolee
From: Derrick Stolee <stolee@gmail.com>
Before the recent changes to parse rev-list arguments inside of 'git
backfill', the builtin would take arbitrary arguments without complaint (and
ignore them). This was noticed and a patch was sent [1] which motivates
this change.
[1] https://lore.kernel.org/git/20260321031643.5185-1-r.siddharth.shrimali@gmail.com/
Note that the revision machinery can output an "ambiguous argument"
warning if a value not starting with '--' is found and doesn't make
sense as a reference or a pathspec. For unrecognized arguments starting
with '--' we need to add logic into builtin/backfill.c to catch leftover
arguments.
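The leftover check can be pictured with a toy model. The sketch below is not the revision machinery: sketch_is_known() and sketch_consume_args() are hypothetical, and the only behavior assumed from the patch is that the parser returns a new argc in which argv[0] is kept and unrecognized arguments are compacted after it, so that "argc > 1" signals leftovers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical set of options the toy parser understands. */
static int sketch_is_known(const char *arg)
{
	return !strcmp(arg, "--all") || !strcmp(arg, "--first-parent");
}

/*
 * Toy second-stage parser: consume every recognized argument and
 * compact the unrecognized ones toward the front, after argv[0].
 * Returns the new argc, so a result greater than 1 means leftovers.
 */
static int sketch_consume_args(int argc, const char **argv)
{
	int left = 1;

	assert(argc >= 1 && argv);
	for (int i = 1; i < argc; i++) {
		if (!sketch_is_known(argv[i]))
			argv[left++] = argv[i];
	}
	return left;
}
```

Given "--all --unexpected-arg --first-parent", the toy parser returns 2 and leaves "--unexpected-arg" in argv[1], which is the value the die() message in this patch would report.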
Reported-by: Siddharth Shrimali <r.siddharth.shrimali@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
builtin/backfill.c | 3 +++
t/t5620-backfill.sh | 8 ++++++++
2 files changed, 11 insertions(+)
diff --git a/builtin/backfill.c b/builtin/backfill.c
index 90c9d84793..edc19c01e5 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -142,6 +142,9 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
repo_init_revisions(repo, &ctx.revs, prefix);
argc = setup_revisions(argc, argv, &ctx.revs, NULL);
+ if (argc > 1)
+ die(_("unrecognized argument: %s"), argv[1]);
+
repo_config(repo, git_default_config, NULL);
if (ctx.sparse < 0)
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index c6f54ee91c..2c347a91fe 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -7,6 +7,14 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
. ./test-lib.sh
+test_expect_success 'backfill rejects unexpected arguments' '
+ test_must_fail git backfill unexpected-arg 2>err &&
+ test_grep "ambiguous argument .*unexpected-arg" err &&
+
+ test_must_fail git backfill --all --unexpected-arg --first-parent 2>err &&
+ test_grep "unrecognized argument: --unexpected-arg" err
+'
+
# We create objects in the 'src' repo.
test_expect_success 'setup repo for object creation' '
echo "{print \$1}" >print_1.awk &&
--
gitgitgadget
* Re: [PATCH v3 0/6] backfill: accept revision arguments
2026-03-26 15:14 ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
` (5 preceding siblings ...)
2026-03-26 15:14 ` [PATCH v3 6/6] t5620: test backfill's unknown argument handling Derrick Stolee via GitGitGadget
@ 2026-03-27 7:07 ` Patrick Steinhardt
6 siblings, 0 replies; 46+ messages in thread
From: Patrick Steinhardt @ 2026-03-27 7:07 UTC (permalink / raw)
To: Derrick Stolee via GitGitGadget
Cc: git, gitster, Kristoffer Haugsbakk, r.siddharth.shrimali,
Derrick Stolee
On Thu, Mar 26, 2026 at 03:14:48PM +0000, Derrick Stolee via GitGitGadget wrote:
> 6: 9699650aa7 ! 6: b6423f9595 t5620: test backfill's unknown argument handling
> @@ Commit message
>
> Before the recent changes to parse rev-list arguments inside of 'git
> backfill', the builtin would take arbitrary arguments without complaint (and
> - ignore them). This was noticed and a patch was sent [1] which motivates this
> - change to encode this behavior in test.
> + ignore them). This was noticed and a patch was sent [1] which motivates
> + this change.
>
> [1] https://lore.kernel.org/git/20260321031643.5185-1-r.siddharth.shrimali@gmail.com/
>
> + Note that the revision machinery can output an "ambiguous argument"
> + warning if a value not starting with '--' is found and doesn't make
> + sense as a reference or a pathspec. For unrecognized arguments starting
> + with '--' we need to add logic into builtin/backfill.c to catch leftover
> + arguments.
> +
> Reported-by: Siddharth Shrimali <r.siddharth.shrimali@gmail.com>
> Signed-off-by: Derrick Stolee <stolee@gmail.com>
>
> + ## builtin/backfill.c ##
> +@@ builtin/backfill.c: int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
> + repo_init_revisions(repo, &ctx.revs, prefix);
> + argc = setup_revisions(argc, argv, &ctx.revs, NULL);
> +
> ++ if (argc > 1)
> ++ die(_("unrecognized argument: %s"), argv[1]);
> ++
> + repo_config(repo, git_default_config, NULL);
> +
> + if (ctx.sparse < 0)
> +
> ## t/t5620-backfill.sh ##
I would've expected this chunk to already come in patch 3, but that
alone isn't really worth a reroll. All the other changes look good to
me, thanks!
Patrick