From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com,
Kristoffer Haugsbakk <kristofferhaugsbakk@fastmail.com>,
r.siddharth.shrimali@gmail.com, ps@pks.im,
Derrick Stolee <stolee@gmail.com>
Subject: [PATCH v3 0/6] backfill: accept revision arguments
Date: Thu, 26 Mar 2026 15:14:48 +0000
Message-ID: <pull.2070.v3.git.1774538094.gitgitgadget@gmail.com>
In-Reply-To: <pull.2070.v2.git.1774266019.gitgitgadget@gmail.com>
The git backfill command assists in downloading missing blobs for blobless
partial clones. However, its current version lacks some valuable
functionality. It currently:
1. Only walks commits reachable from HEAD.
2. Walks all reachable commits through the full history.
3. Can focus on the current sparse-checkout definition, but otherwise
cannot focus on a given pathspec.
All of these are being updated by this patch series, which allows rev-list
options to impact the path-walk. These include:
1. Specifying the starting revisions, including --all.
2. Modifying the commit walk, including --first-parent, commit ranges, or
recency using --since.
3. Modifying the set of paths to download using pathspecs.
One particularly valuable situation here is that now a user can run git
backfill -- <path> to download all versions of a specific file or a specific
directory, accelerating history queries within that path without downloading
more than necessary. This can accelerate git blame or git log -L for these
paths, where normally those commands download missing blobs one-by-one
during their diff algorithms.
This patch series is organized in the following way:
1. A missing #include is added to prevent future compilation issues.
2. The test repo in t5620 is expanded to make later tests more interesting.
3. The backfill builtin parses the rev-list arguments. We test the main
arguments that already work as expected, though the pathspec arguments
need extra work.
4. Update the path-walk logic to work efficiently with some pathspecs, such
as fixed prefix pathspecs, accelerating the computation.
5. For more complicated pathspecs, do a post-filter in builtin/backfill.c
instead of restricting the walk in the path-walk API.
6. A final test demonstrates how backfill handles unknown arguments.
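To illustrate the prefix restriction described in items 4 and 5, here is a
minimal sketch. It is not the path-walk.c code from this series. The names
tree_may_match() and is_dir_prefix() are hypothetical, and the sketch
assumes that a tree is worth walking when it is either an ancestor of a
prefix pathspec or lies inside one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of 's' without any trailing '/' characters. */
static size_t len_without_slash(const char *s)
{
	size_t len = strlen(s);

	while (len && s[len - 1] == '/')
		len--;
	return len;
}

/*
 * Hypothetical helper: true if 'dir' equals 'path' or is a leading
 * directory of it. The empty string stands for the root tree, which
 * is a leading directory of every path.
 */
static bool is_dir_prefix(const char *dir, size_t dir_len,
			  const char *path, size_t path_len)
{
	if (!dir_len)
		return true;
	if (dir_len > path_len || strncmp(dir, path, dir_len))
		return false;
	return path_len == dir_len || path[dir_len] == '/';
}

/*
 * Hypothetical sketch of the walk restriction: keep a tree if it leads
 * towards some prefix pathspec (ancestor) or lies inside one (descendant).
 */
static bool tree_may_match(const char *tree_path,
			   const char *const *specs, size_t nr)
{
	size_t tree_len = len_without_slash(tree_path);

	for (size_t i = 0; i < nr; i++) {
		size_t spec_len = len_without_slash(specs[i]);

		if (is_dir_prefix(tree_path, tree_len, specs[i], spec_len) ||
		    is_dir_prefix(specs[i], spec_len, tree_path, tree_len))
			return true;
	}
	return false;
}
```

With the pathspecs "dir/sub" and "docs/", this sketch keeps "dir/",
"dir/sub/deep/" and "docs/api/", and skips "dirx/" and "dir/other/". A
wildcard pathspec has no fixed directory to compare against in this way,
which is why item 5 filters the results afterwards instead.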
The main goal of this series is to make such customizations possible, and to
improve performance where common use cases are expected. I'm open to
feedback as to whether we should consider more detailed performance analysis
or whether we should wait for how users interact with these new options
before overoptimizing unlikely use cases.
Updates in v2
=============
* Hard stops are replaced with a comma (and no punctuation) in the docs.
* add_head_to_pending() simplifies some code.
* My poor explanation of "starting commits" is updated.
* Language around temporary prefix restriction is clarified.
* Prefix match logic is simplified with dir_prefix().
* Temporary memory leak (introduced in v1's patch 4 and removed in v1's
patch 5) is removed in v2's patch 4.
* Commit pruning is reenabled in v2's patch 5. There was no need for that
with the way the logic works in the patch.
* Add a new patch with a test demonstrating the new behavior that was being
discussed in [1].
[1]
https://lore.kernel.org/git/20260321031643.5185-1-r.siddharth.shrimali@gmail.com/
Updates in v3
=============
* Fixed the argument checks to actually catch unknown arguments, because
the revision machinery will skip unknown options starting with --.
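The v3 fix can be pictured with a small sketch. It is not the code in
builtin/backfill.c. Here consume_known_options() is a hypothetical stand-in
for setup_revisions(), assumed only to move unrecognized '--' options to the
front of argv and return the new count, as the argc check in patch 6
suggests. first_unrecognized() is likewise a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the revision parser: it consumes the options
 * it knows, treats plain words as revisions or paths, and compacts any
 * unknown "--" option into argv[1..]. Returns the new argument count.
 */
static int consume_known_options(int argc, const char **argv)
{
	int kept = 1; /* argv[0] is the command name and always stays */

	for (int i = 1; i < argc; i++) {
		if (!strcmp(argv[i], "--all") ||
		    !strcmp(argv[i], "--first-parent"))
			continue; /* recognized option, consumed */
		if (!strncmp(argv[i], "--", 2))
			argv[kept++] = argv[i]; /* unknown option, left over */
		/* anything else is taken as a revision or path in this toy */
	}
	return kept;
}

/*
 * The caller-side check: anything still in argv after parsing is an
 * argument nobody recognized. Returns it, or NULL if all were consumed.
 */
static const char *first_unrecognized(int argc, const char **argv)
{
	argc = consume_known_options(argc, argv);
	return argc > 1 ? argv[1] : NULL;
}
```

For the arguments "--all --unexpected-arg --first-parent", the sketch
returns "--unexpected-arg", which is the value the real builtin passes to
die() in the new test.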
Thanks,
-Stolee
Derrick Stolee (6):
revision: include object-name.h
t5620: prepare branched repo for revision tests
backfill: accept revision arguments
backfill: work with prefix pathspecs
path-walk: support wildcard pathspecs for blob filtering
t5620: test backfill's unknown argument handling
Documentation/git-backfill.adoc | 5 +-
builtin/backfill.c | 22 +++-
path-walk.c | 43 +++++++
path.c | 2 +-
path.h | 6 +
revision.h | 1 +
t/t5620-backfill.sh | 211 +++++++++++++++++++++++++++++++-
7 files changed, 280 insertions(+), 10 deletions(-)
base-commit: 67ad42147a7acc2af6074753ebd03d904476118f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2070%2Fderrickstolee%2Fbackfill-revs-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2070/derrickstolee/backfill-revs-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/2070
Range-diff vs v2:
1: fda0239103 = 1: fda0239103 revision: include object-name.h
2: 55a45b2fc8 = 2: 55a45b2fc8 t5620: prepare branched repo for revision tests
3: 610a162973 = 3: 610a162973 backfill: accept revision arguments
4: f8f2c61326 ! 4: 7223124fb3 backfill: work with prefix pathspecs
@@ path-walk.c: static int add_tree_entries(struct path_walk_context *ctx,
+ if (ctx->revs->prune_data.nr) {
+ struct pathspec *pd = &ctx->revs->prune_data;
+ bool found = false;
++ int did_strip_suffix = strbuf_strip_suffix(&path, "/");
+
-+ /* remove '/' for these checks. */
-+ path.buf[path.len - 1] = 0;
+
+ for (int i = 0; i < pd->nr; i++) {
+ struct pathspec_item *item = &pd->items[i];
@@ path-walk.c: static int add_tree_entries(struct path_walk_context *ctx,
+ }
+ }
+
-+ /* return '/' after these checks. */
-+ path.buf[path.len - 1] = '/';
++ if (did_strip_suffix)
++ strbuf_addch(&path, '/');
+
+ /* Skip paths that do not match the prefix. */
+ if (!found)
5: 1168edfb96 ! 5: 1ea278bd10 path-walk: support wildcard pathspecs for blob filtering
@@ path-walk.c: static int add_tree_entries(struct path_walk_context *ctx,
+ if (ctx->revs->prune_data.nr && ctx->exact_pathspecs) {
struct pathspec *pd = &ctx->revs->prune_data;
bool found = false;
-
+ int did_strip_suffix = strbuf_strip_suffix(&path, "/");
@@ path-walk.c: static int walk_path(struct path_walk_context *ctx,
return 0;
}
6: 9699650aa7 ! 6: b6423f9595 t5620: test backfill's unknown argument handling
@@ Commit message
Before the recent changes to parse rev-list arguments inside of 'git
backfill', the builtin would take arbitrary arguments without complaint (and
- ignore them). This was noticed and a patch was sent [1] which motivates this
- change to encode this behavior in test.
+ ignore them). This was noticed and a patch was sent [1] which motivates
+ this change.
[1] https://lore.kernel.org/git/20260321031643.5185-1-r.siddharth.shrimali@gmail.com/
+ Note that the revision machinery can output an "ambiguous argument"
+ warning if a value not starting with '--' is found and doesn't make
+ sense as a reference or a pathspec. For unrecognized arguments starting
+ with '--' we need to add logic into builtin/backfill.c to catch leftover
+ arguments.
+
Reported-by: Siddharth Shrimali <r.siddharth.shrimali@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
+ ## builtin/backfill.c ##
+@@ builtin/backfill.c: int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
+ repo_init_revisions(repo, &ctx.revs, prefix);
+ argc = setup_revisions(argc, argv, &ctx.revs, NULL);
+
++ if (argc > 1)
++ die(_("unrecognized argument: %s"), argv[1]);
++
+ repo_config(repo, git_default_config, NULL);
+
+ if (ctx.sparse < 0)
+
## t/t5620-backfill.sh ##
@@ t/t5620-backfill.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
@@ t/t5620-backfill.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+ test_must_fail git backfill unexpected-arg 2>err &&
+ test_grep "ambiguous argument .*unexpected-arg" err &&
+
-+ test_must_fail git backfill --all --firt-parent unexpected-arg 2>err &&
-+ test_grep "ambiguous argument .*unexpected-arg" err
++ test_must_fail git backfill --all --unexpected-arg --first-parent 2>err &&
++ test_grep "unrecognized argument: --unexpected-arg" err
+'
+
# We create objects in the 'src' repo.
--
gitgitgadget