From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com,
Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>,
r.siddharth.shrimali@gmail.com, ps@pks.im,
Derrick Stolee <stolee@gmail.com>,
Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v2 3/6] backfill: accept revision arguments
Date: Mon, 23 Mar 2026 11:40:16 +0000 [thread overview]
Message-ID: <610a162973a7ad59eba4ef4d5a9288f1fea1d2e8.1774266019.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2070.v2.git.1774266019.gitgitgadget@gmail.com>
From: Derrick Stolee <stolee@gmail.com>
The existing implementation of 'git backfill' only includes downloading
missing blobs reachable from HEAD. Advanced uses may desire more general
commit limiting options, such as '--all' for all references, specifying a
commit range via negative references, or specifying a recency of use such as
with '--since=<date>'.
All of these options are available if we use setup_revisions() to parse the
unknown arguments with the revision machinery. This opens up a large number
of possibilities, only a small set of which are tested here.
For documentation, we avoid duplicating the option documentation and instead
link to the documentation of 'git rev-list'.
Note that these arguments currently allow specifying a pathspec, which
modifies the commit history checks but does not limit the paths used in the
backfill logic. This will be updated in a future change.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
Documentation/git-backfill.adoc | 5 +-
builtin/backfill.c | 19 ++--
t/t5620-backfill.sh | 156 ++++++++++++++++++++++++++++++++
3 files changed, 173 insertions(+), 7 deletions(-)
diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
index b8394dcf22..246ab417c2 100644
--- a/Documentation/git-backfill.adoc
+++ b/Documentation/git-backfill.adoc
@@ -63,9 +63,12 @@ OPTIONS
current sparse-checkout. If the sparse-checkout feature is enabled,
then `--sparse` is assumed and can be disabled with `--no-sparse`.
+You may also specify the commit limiting options from linkgit:git-rev-list[1].
+
SEE ALSO
--------
-linkgit:git-clone[1].
+linkgit:git-clone[1],
+linkgit:git-rev-list[1]
GIT
---
diff --git a/builtin/backfill.c b/builtin/backfill.c
index e80fc1b694..90c9d84793 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -35,6 +35,7 @@ struct backfill_context {
struct oid_array current_batch;
size_t min_batch_size;
int sparse;
+ struct rev_info revs;
};
static void backfill_context_clear(struct backfill_context *ctx)
@@ -80,7 +81,6 @@ static int fill_missing_blobs(const char *path UNUSED,
static int do_backfill(struct backfill_context *ctx)
{
- struct rev_info revs;
struct path_walk_info info = PATH_WALK_INFO_INIT;
int ret;
@@ -92,13 +92,14 @@ static int do_backfill(struct backfill_context *ctx)
}
}
- repo_init_revisions(ctx->repo, &revs, "");
- handle_revision_arg("HEAD", &revs, 0, 0);
+ /* Walk from HEAD if otherwise unspecified. */
+ if (!ctx->revs.pending.nr)
+ add_head_to_pending(&ctx->revs);
info.blobs = 1;
info.tags = info.commits = info.trees = 0;
- info.revs = &revs;
+ info.revs = &ctx->revs;
info.path_fn = fill_missing_blobs;
info.path_fn_data = ctx;
@@ -109,7 +110,6 @@ static int do_backfill(struct backfill_context *ctx)
download_batch(ctx);
path_walk_info_clear(&info);
- release_revisions(&revs);
return ret;
}
@@ -121,6 +121,7 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
.current_batch = OID_ARRAY_INIT,
.min_batch_size = 50000,
.sparse = 0,
+ .revs = REV_INFO_INIT,
};
struct option options[] = {
OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
@@ -134,7 +135,12 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
builtin_backfill_usage, options);
argc = parse_options(argc, argv, prefix, options, builtin_backfill_usage,
- 0);
+ PARSE_OPT_KEEP_UNKNOWN_OPT |
+ PARSE_OPT_KEEP_ARGV0 |
+ PARSE_OPT_KEEP_DASHDASH);
+
+ repo_init_revisions(repo, &ctx.revs, prefix);
+ argc = setup_revisions(argc, argv, &ctx.revs, NULL);
repo_config(repo, git_default_config, NULL);
@@ -143,5 +149,6 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
result = do_backfill(&ctx);
backfill_context_clear(&ctx);
+ release_revisions(&ctx.revs);
return result;
}
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 1331949be4..db66d8b614 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -224,6 +224,162 @@ test_expect_success 'backfill --sparse without cone mode (negative)' '
test_line_count = 12 missing
'
+test_expect_success 'backfill with revision range' '
+ test_when_finished rm -rf backfill-revs &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-revs &&
+
+ # No blobs yet
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ git -C backfill-revs backfill HEAD~2..HEAD &&
+
+ # 30 objects downloaded.
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 18 missing
+'
+
+test_expect_success 'backfill with revisions over stdin' '
+ test_when_finished rm -rf backfill-revs &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-revs &&
+
+ # No blobs yet
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ cat >in <<-EOF &&
+ HEAD
+ ^HEAD~2
+ EOF
+
+ git -C backfill-revs backfill --stdin <in &&
+
+ # 30 objects downloaded.
+ git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 18 missing
+'
+
+test_expect_success 'backfill with prefix pathspec' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The pathspec should limit the downloaded blobs to
+ # only those matching the prefix "d/f", but currently all
+ # blobs are downloaded.
+ git -C backfill-path backfill HEAD -- d/f &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with multiple pathspecs' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The pathspecs should limit the downloaded blobs to
+ # only those matching "d/f" or "a", but currently all blobs
+ # are downloaded.
+ git -C backfill-path backfill HEAD -- d/f a &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with wildcard pathspec' '
+ test_when_finished rm -rf backfill-path &&
+ git clone --bare --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv.bare" backfill-path &&
+
+ # No blobs yet
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 48 missing &&
+
+ # TODO: The wildcard pathspec should limit downloaded blobs,
+ # but currently all blobs are downloaded.
+ git -C backfill-path backfill HEAD -- "d/file.*.txt" &&
+
+ git -C backfill-path rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with --all' '
+ test_when_finished rm -rf backfill-all &&
+ git clone --no-checkout --filter=blob:none \
+ "file://$(pwd)/srv-revs.bare" backfill-all &&
+
+ # All blobs from all refs are missing
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 54 missing &&
+
+ # Backfill from HEAD gets main blobs only
+ git -C backfill-all backfill HEAD &&
+
+ # Other branch blobs still missing
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 2 missing &&
+
+ # Backfill with --all gets everything
+ git -C backfill-all backfill --all &&
+
+ git -C backfill-all rev-list --quiet --objects --all --missing=print >missing &&
+ test_line_count = 0 missing
+'
+
+test_expect_success 'backfill with --first-parent' '
+ test_when_finished rm -rf backfill-fp &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv-revs.bare" backfill-fp &&
+
+ git -C backfill-fp rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 52 missing &&
+
+ # --first-parent skips the side branch commits, so
+ # s/file.{1,2}.txt v1 blobs (only in side commit 1) are missed.
+ git -C backfill-fp backfill --first-parent HEAD &&
+
+ git -C backfill-fp rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 2 missing
+'
+
+test_expect_success 'backfill with --since' '
+ test_when_finished rm -rf backfill-since &&
+ git clone --no-checkout --filter=blob:none \
+ --single-branch --branch=main \
+ "file://$(pwd)/srv-revs.bare" backfill-since &&
+
+ git -C backfill-since rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 52 missing &&
+
+ # Use a cutoff between commits 4 and 5 (between v1 and v2
+ # iterations). Commits 5-8 still carry v1 of files 2-4 in
+ # their trees, but v1 of file.1.txt is only in commits 1-4.
+ SINCE=$(git -C backfill-since log --first-parent --reverse \
+ --format=%ct HEAD~1 | sed -n 5p) &&
+ git -C backfill-since backfill --since="@$((SINCE - 1))" HEAD &&
+
+ # 6 missing: v1 of file.1.txt in all 6 directories
+ git -C backfill-since rev-list --quiet --objects --missing=print HEAD >missing &&
+ test_line_count = 6 missing
+'
+
. "$TEST_DIRECTORY"/lib-httpd.sh
start_httpd
--
gitgitgadget
next prev parent reply other threads:[~2026-03-23 11:40 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-17 0:29 [PATCH 0/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-17 0:29 ` [PATCH 1/5] revision: include object-name.h Derrick Stolee via GitGitGadget
2026-03-17 21:52 ` Junio C Hamano
2026-03-17 0:29 ` [PATCH 2/5] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
2026-03-17 0:29 ` [PATCH 3/5] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-17 22:01 ` Junio C Hamano
2026-03-18 15:37 ` Kristoffer Haugsbakk
2026-03-23 0:31 ` Derrick Stolee
2026-03-19 9:54 ` Patrick Steinhardt
2026-03-23 0:35 ` Derrick Stolee
2026-03-17 0:29 ` [PATCH 4/5] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
2026-03-17 22:10 ` Junio C Hamano
2026-03-18 13:15 ` Derrick Stolee
2026-03-19 9:54 ` Patrick Steinhardt
2026-03-19 9:55 ` Patrick Steinhardt
2026-03-19 10:15 ` Patrick Steinhardt
2026-03-23 0:47 ` Derrick Stolee
2026-03-17 0:29 ` [PATCH 5/5] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
2026-03-17 22:19 ` Junio C Hamano
2026-03-18 13:16 ` Derrick Stolee
2026-03-23 1:33 ` Derrick Stolee
2026-03-17 21:45 ` [PATCH 0/5] backfill: accept revision arguments Junio C Hamano
2026-03-19 9:54 ` Patrick Steinhardt
2026-03-19 12:59 ` Derrick Stolee
2026-03-20 7:35 ` Patrick Steinhardt
2026-03-23 11:40 ` [PATCH v2 0/6] " Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` [PATCH v2 1/6] revision: include object-name.h Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` [PATCH v2 2/6] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` Derrick Stolee via GitGitGadget [this message]
2026-03-24 7:59 ` [PATCH v2 3/6] backfill: accept revision arguments Patrick Steinhardt
2026-03-26 12:55 ` Derrick Stolee
2026-03-23 11:40 ` [PATCH v2 4/6] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
2026-03-24 7:59 ` Patrick Steinhardt
2026-03-26 12:58 ` Derrick Stolee
2026-03-23 11:40 ` [PATCH v2 5/6] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
2026-03-23 11:40 ` [PATCH v2 6/6] t5620: test backfill's unknown argument handling Derrick Stolee via GitGitGadget
2026-03-23 15:29 ` Junio C Hamano
2026-03-23 20:39 ` Derrick Stolee
2026-03-26 15:14 ` [PATCH v3 0/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 1/6] revision: include object-name.h Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 2/6] t5620: prepare branched repo for revision tests Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 3/6] backfill: accept revision arguments Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 4/6] backfill: work with prefix pathspecs Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 5/6] path-walk: support wildcard pathspecs for blob filtering Derrick Stolee via GitGitGadget
2026-03-26 15:14 ` [PATCH v3 6/6] t5620: test backfill's unknown argument handling Derrick Stolee via GitGitGadget
2026-03-27 7:07 ` [PATCH v3 0/6] backfill: accept revision arguments Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=610a162973a7ad59eba4ef4d5a9288f1fea1d2e8.1774266019.git.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kristofferhaugsbakk@fastmail.com \
--cc=ps@pks.im \
--cc=r.siddharth.shrimali@gmail.com \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.