* [GSoC PATCH] backfill: add --[no-]progress option @ 2026-03-29 15:24 Trieu Huynh 2026-04-06 13:16 ` Derrick Stolee 0 siblings, 1 reply; 6+ messages in thread From: Trieu Huynh @ 2026-03-29 15:24 UTC (permalink / raw) To: git; +Cc: Trieu Huynh 'git backfill' is silent when downloading missing objects, giving no feedback during potentially long-running operations on large repositories. By contrast, 'git fetch', 'git gc', and 'git index-pack' all support --[no-]progress. Add a --[no-]progress option to 'git backfill' by: - Calling display_progress() in download_batch() to advance the counter after each batch is fetched. - Defaulting to showing progress when stderr is a terminal (isatty(2)), matching the behaviour of 'git fetch'. - Allowing the user to override the default via --[no-]progress. nit: progress.h was already included but unused, this commit puts it to use. Add tests to verify that: - --progress shows "Downloading batches" on stderr. - --no-progress suppresses output even on a TTY. - No flag on a non-TTY is silent (isatty default). Signed-off-by: Trieu Huynh <vikingtc4@gmail.com> --- builtin/backfill.c | 16 ++++++++++++++++ t/t5620-backfill.sh | 24 ++++++++++++++++++++++++ 2 files changed, 40 insertions(+) diff --git a/builtin/backfill.c b/builtin/backfill.c index e9a33e81be..27a301f9b2 100644 --- a/builtin/backfill.c +++ b/builtin/backfill.c @@ -35,6 +35,9 @@ struct backfill_context { struct oid_array current_batch; size_t min_batch_size; int sparse; + int show_progress; + size_t batches_requested; + struct progress *progress; }; static void backfill_context_clear(struct backfill_context *ctx) @@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx) * avoid possible duplicate downloads of the same objects. */ odb_reprepare(ctx->repo->objects); + display_progress(ctx->progress, ++ctx->batches_requested); } static int fill_missing_blobs(const char *path UNUSED, @@ -120,12 +124,15 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit .current_batch = OID_ARRAY_INIT, .min_batch_size = 50000, .sparse = 0, + .show_progress = -1, }; struct option options[] = { OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size, N_("Minimum number of objects to request at a time")), OPT_BOOL(0, "sparse", &ctx.sparse, N_("Restrict the missing objects to the current sparse-checkout")), + OPT_BOOL(0, "progress", &ctx.show_progress, + N_("show progress while downloading missing objects")), OPT_END(), }; struct repo_config_values *cfg = repo_config_values(the_repository); @@ -141,7 +148,16 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit if (ctx.sparse < 0) ctx.sparse = cfg->apply_sparse_checkout; + if (ctx.show_progress < 0) + ctx.show_progress = isatty(2); + + if (ctx.show_progress) + ctx.progress = start_progress(ctx.repo, _("Downloading batches"), 0); + result = do_backfill(&ctx); + if (ctx.show_progress) + stop_progress(&ctx.progress); + backfill_context_clear(&ctx); return result; } diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh index 58c81556e7..ff67e8ecea 100755 --- a/t/t5620-backfill.sh +++ b/t/t5620-backfill.sh @@ -77,6 +77,30 @@ test_expect_success 'do partial clone 2, backfill min batch size' ' test_line_count = 0 revs2 ' +test_expect_success 'backfill --progress shows progress' ' + git clone --no-checkout --filter=blob:none \ + --single-branch --branch=main \ + "file://$(pwd)/srv.bare" clone-progress && + git -C clone-progress backfill --progress 2>err && + test_grep "Downloading batches" err +' + +test_expect_success 'backfill --no-progress is silent' ' + git clone --no-checkout --filter=blob:none \ + --single-branch --branch=main \ + "file://$(pwd)/srv.bare" clone-no-progress && + git -C clone-no-progress backfill --no-progress 2>err && + test_grep ! "Downloading batches" err +' + +test_expect_success 'backfill no flag on non-TTY is silent' ' + git clone --no-checkout --filter=blob:none \ + --single-branch --branch=main \ + "file://$(pwd)/srv.bare" clone-notty && + git -C clone-notty backfill 2>err && + test_grep ! "Downloading batches" err +' + test_expect_success 'backfill --sparse without sparse-checkout fails' ' git init not-sparse && test_must_fail git -C not-sparse backfill --sparse 2>err && -- 2.43.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [GSoC PATCH] backfill: add --[no-]progress option 2026-03-29 15:24 [GSoC PATCH] backfill: add --[no-]progress option Trieu Huynh @ 2026-04-06 13:16 ` Derrick Stolee 2026-04-06 17:35 ` Junio C Hamano 2026-04-07 19:15 ` Trieu Huynh 0 siblings, 2 replies; 6+ messages in thread From: Derrick Stolee @ 2026-04-06 13:16 UTC (permalink / raw) To: Trieu Huynh, git On 3/29/2026 11:24 AM, Trieu Huynh wrote: > 'git backfill' is silent when downloading missing objects, giving > no feedback during potentially long-running operations on large > repositories. By contrast, 'git fetch', 'git gc', and > 'git index-pack' all support --[no-]progress. I wouldn't use the word "silent" because the output is actually quite verbose by default. Each batch has progress output with the remote. For example, this is the output I get when running 'git backfill' on a blobless partial clone of the Git repo: $ git backfill remote: Enumerating objects: 50083, done. remote: Counting objects: 100% (865/865), done. remote: Compressing objects: 100% (177/177), done. remote: Total 50083 (delta 760), reused 688 (delta 688), pack-reused 49218 (from 1) Receiving objects: 100% (50083/50083), 37.13 MiB | 27.75 MiB/s, done. Resolving deltas: 100% (47710/47710), done. remote: Enumerating objects: 50393, done. remote: Counting objects: 100% (1559/1559), done. remote: Compressing objects: 100% (366/366), done. remote: Total 50393 (delta 1366), reused 1193 (delta 1193), pack-reused 48834 (from 2) Receiving objects: 100% (50393/50393), 44.56 MiB | 31.56 MiB/s, done. Resolving deltas: 100% (47261/47261), done. remote: Enumerating objects: 50000, done. remote: Counting objects: 100% (2313/2313), done. remote: Compressing objects: 100% (592/592), done. remote: Total 50000 (delta 1982), reused 1721 (delta 1721), pack-reused 47687 (from 2) Receiving objects: 100% (50000/50000), 90.49 MiB | 17.85 MiB/s, done. Resolving deltas: 100% (45321/45321), done. remote: Enumerating objects: 2155, done. remote: Counting objects: 100% (27/27), done. remote: Compressing objects: 100% (26/26), done. remote: Total 2155 (delta 6), reused 1 (delta 1), pack-reused 2128 (from 1) Receiving objects: 100% (2155/2155), 891.74 KiB | 3.75 MiB/s, done. Resolving deltas: 100% (1717/1717), done. With your patch, I think there would be some extra progress indicators between these batched fetch requests. Before moving forward with review of this patch, I think that it would be valuable to demonstrate the full output with and without your change. In addition, I think there would be value in a progress indicator _instead_ of these verbose outputs from the remote. That would require a change to how we initialize the fetches in a quiet mode. (I also understand that this output would probably not be the same if we have a filesystem protocol for fetching from a local repo, like we frequently do in the test suite.) > static void backfill_context_clear(struct backfill_context *ctx) > @@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx) > * avoid possible duplicate downloads of the same objects. > */ > odb_reprepare(ctx->repo->objects); > + display_progress(ctx->progress, ++ctx->batches_requested); This looks correct. My preference is to not use prefix operators like this on struct members (it reads like you are incrementing 'ctx' and not 'batches_requested', even though it is correct). However, I'm not sure that we want the progress to indicate the number of _batches_ but instead should be the number of _objects_. > static int fill_missing_blobs(const char *path UNUSED, > @@ -120,12 +124,15 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit > .current_batch = OID_ARRAY_INIT, > .min_batch_size = 50000, > .sparse = 0, > + .show_progress = -1, > }; > struct option options[] = { > OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size, > N_("Minimum number of objects to request at a time")), > OPT_BOOL(0, "sparse", &ctx.sparse, > N_("Restrict the missing objects to the current sparse-checkout")), > + OPT_BOOL(0, "progress", &ctx.show_progress, > + N_("show progress while downloading missing objects")), > OPT_END(), > }; I hope that this does not cause any issues with the recent changes to include the rev-list options in git-backfill. Worth checking. > +test_expect_success 'backfill --progress shows progress' ' > + git clone --no-checkout --filter=blob:none \ > + --single-branch --branch=main \ > + "file://$(pwd)/srv.bare" clone-progress && > + git -C clone-progress backfill --progress 2>err && > + test_grep "Downloading batches" err > +' > + > +test_expect_success 'backfill --no-progress is silent' ' > + git clone --no-checkout --filter=blob:none \ > + --single-branch --branch=main \ > + "file://$(pwd)/srv.bare" clone-no-progress && > + git -C clone-no-progress backfill --no-progress 2>err && > + test_grep ! "Downloading batches" err > +' > + > +test_expect_success 'backfill no flag on non-TTY is silent' ' > + git clone --no-checkout --filter=blob:none \ > + --single-branch --branch=main \ > + "file://$(pwd)/srv.bare" clone-notty && > + git -C clone-notty backfill 2>err && > + test_grep ! "Downloading batches" err > +' What you are missing here is that the progress isn't silent when a TTY is present. There are several tests in the test suite that use the TTY prerequisite for this kind of behavior, such as this one from t9211-scalar-clone.sh: test_expect_success TTY 'progress with tty' ' enlistment=progress1 && test_config -C to-clone uploadpack.allowfilter true && test_config -C to-clone uploadpack.allowanysha1inwant true && test_terminal env GIT_PROGRESS_DELAY=0 \ scalar clone "file://$(pwd)/to-clone" "$enlistment" 2>stderr && grep "Enumerating objects" stderr >actual && test_line_count = 2 actual && cleanup_clone $enlistment ' Thanks, -Stolee ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [GSoC PATCH] backfill: add --[no-]progress option 2026-04-06 13:16 ` Derrick Stolee @ 2026-04-06 17:35 ` Junio C Hamano 2026-04-07 19:22 ` Trieu Huynh 2026-04-07 19:15 ` Trieu Huynh 1 sibling, 1 reply; 6+ messages in thread From: Junio C Hamano @ 2026-04-06 17:35 UTC (permalink / raw) To: Derrick Stolee; +Cc: Trieu Huynh, git Derrick Stolee <stolee@gmail.com> writes: > On 3/29/2026 11:24 AM, Trieu Huynh wrote: >> 'git backfill' is silent when downloading missing objects, giving >> no feedback during potentially long-running operations on large >> repositories. By contrast, 'git fetch', 'git gc', and >> 'git index-pack' all support --[no-]progress. > > I wouldn't use the word "silent" because the output is actually > quite verbose by default. ;-) > With your patch, I think there would be some extra progress > indicators between these batched fetch requests. >> static void backfill_context_clear(struct backfill_context *ctx) >> @@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx) >> * avoid possible duplicate downloads of the same objects. >> */ >> odb_reprepare(ctx->repo->objects); >> + display_progress(ctx->progress, ++ctx->batches_requested); > > This looks correct. My preference is to not use prefix operators > like this on struct members (it reads like you are incrementing > 'ctx' and not 'batches_requested', even though it is correct). Thanks for paying extra attention to such details. In general, post-increment and pre-decrement are the norm when evaluated in a void context, so the use of pre-increment above violates that norm too. > However, I'm not sure that we want the progress to indicate the > number of _batches_ but instead should be the number of _objects_. True, too. Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [GSoC PATCH] backfill: add --[no-]progress option 2026-04-06 17:35 ` Junio C Hamano @ 2026-04-07 19:22 ` Trieu Huynh 2026-04-07 19:42 ` Junio C Hamano 0 siblings, 1 reply; 6+ messages in thread From: Trieu Huynh @ 2026-04-07 19:22 UTC (permalink / raw) To: Junio C Hamano; +Cc: Derrick Stolee, git On Mon, Apr 06, 2026 at 10:35:58AM -0700, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > > > On 3/29/2026 11:24 AM, Trieu Huynh wrote: > >> 'git backfill' is silent when downloading missing objects, giving > >> no feedback during potentially long-running operations on large > >> repositories. By contrast, 'git fetch', 'git gc', and > >> 'git index-pack' all support --[no-]progress. > > > > I wouldn't use the word "silent" because the output is actually > > quite verbose by default. > > ;-) > > > With your patch, I think there would be some extra progress > > indicators between these batched fetch requests. > > >> static void backfill_context_clear(struct backfill_context *ctx) > >> @@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx) > >> * avoid possible duplicate downloads of the same objects. > >> */ > >> odb_reprepare(ctx->repo->objects); > >> + display_progress(ctx->progress, ++ctx->batches_requested); > > > > This looks correct. My preference is to not use prefix operators > > like this on struct members (it reads like you are incrementing > > 'ctx' and not 'batches_requested', even though it is correct). > > Thanks for paying extra attention to such details. In general, > post-increment and pre-decrement are the norm when evaluated in a > void context, so the use of pre-increment above violates that norm > too. > Thanks for pointing it out. Will update, eg: ++counter; foo(counter); > > However, I'm not sure that we want the progress to indicate the > > number of _batches_ but instead should be the number of _objects_. > > True, too. > Make sense to me, worth checking if we can feasibly track the total object count instead of just batches to make the progress more meaningful. > Thanks. Thank you for all your kind review. I'll update v2 as per your comments. Hopefully, it can address these concerns. BRs, Trieu Huynh ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [GSoC PATCH] backfill: add --[no-]progress option 2026-04-07 19:22 ` Trieu Huynh @ 2026-04-07 19:42 ` Junio C Hamano 0 siblings, 0 replies; 6+ messages in thread From: Junio C Hamano @ 2026-04-07 19:42 UTC (permalink / raw) To: Trieu Huynh; +Cc: Derrick Stolee, git Trieu Huynh <vikingtc4@gmail.com> writes: >> >> + display_progress(ctx->progress, ++ctx->batches_requested); >> > >> > This looks correct. My preference is to not use prefix operators >> > like this on struct members (it reads like you are incrementing >> > 'ctx' and not 'batches_requested', even though it is correct). >> >> Thanks for paying extra attention to such details. In general, >> post-increment and pre-decrement are the norm when evaluated in a >> void context, so the use of pre-increment above violates that norm >> too. >> > Thanks for pointing it out. Will update, eg: > ++counter; > foo(counter); I think you meant "counter++; foo(counter);" instead. Otherwise, the first line is exactly a pre-increment in a void context. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [GSoC PATCH] backfill: add --[no-]progress option 2026-04-06 13:16 ` Derrick Stolee 2026-04-06 17:35 ` Junio C Hamano @ 2026-04-07 19:15 ` Trieu Huynh 1 sibling, 0 replies; 6+ messages in thread From: Trieu Huynh @ 2026-04-07 19:15 UTC (permalink / raw) To: Derrick Stolee; +Cc: git On Mon, Apr 06, 2026 at 09:16:30AM -0400, Derrick Stolee wrote: > On 3/29/2026 11:24 AM, Trieu Huynh wrote: > > 'git backfill' is silent when downloading missing objects, giving > > no feedback during potentially long-running operations on large > > repositories. By contrast, 'git fetch', 'git gc', and > > 'git index-pack' all support --[no-]progress. > > I wouldn't use the word "silent" because the output is actually > quite verbose by default. Each batch has progress output with the > remote. For example, this is the output I get when running 'git > backfill' on a blobless partial clone of the Git repo: > Make sense to me, will reword it to be more precise in v2. > $ git backfill > remote: Enumerating objects: 50083, done. > remote: Counting objects: 100% (865/865), done. > remote: Compressing objects: 100% (177/177), done. > remote: Total 50083 (delta 760), reused 688 (delta 688), pack-reused 49218 (from 1) > Receiving objects: 100% (50083/50083), 37.13 MiB | 27.75 MiB/s, done. > Resolving deltas: 100% (47710/47710), done. > remote: Enumerating objects: 50393, done. > remote: Counting objects: 100% (1559/1559), done. > remote: Compressing objects: 100% (366/366), done. > remote: Total 50393 (delta 1366), reused 1193 (delta 1193), pack-reused 48834 (from 2) > Receiving objects: 100% (50393/50393), 44.56 MiB | 31.56 MiB/s, done. > Resolving deltas: 100% (47261/47261), done. > remote: Enumerating objects: 50000, done. > remote: Counting objects: 100% (2313/2313), done. > remote: Compressing objects: 100% (592/592), done. > remote: Total 50000 (delta 1982), reused 1721 (delta 1721), pack-reused 47687 (from 2) > Receiving objects: 100% (50000/50000), 90.49 MiB | 17.85 MiB/s, done. > Resolving deltas: 100% (45321/45321), done. > remote: Enumerating objects: 2155, done. > remote: Counting objects: 100% (27/27), done. > remote: Compressing objects: 100% (26/26), done. > remote: Total 2155 (delta 6), reused 1 (delta 1), pack-reused 2128 (from 1) > Receiving objects: 100% (2155/2155), 891.74 KiB | 3.75 MiB/s, done. > Resolving deltas: 100% (1717/1717), done. > > With your patch, I think there would be some extra progress > indicators between these batched fetch requests. > > Before moving forward with review of this patch, I think that > it would be valuable to demonstrate the full output with and > without your change. > Agree, will include a side-by-side comparison in the v2. > In addition, I think there would be value in a progress indicator > _instead_ of these verbose outputs from the remote. That would > require a change to how we initialize the fetches in a quiet mode. > > (I also understand that this output would probably not be the same > if we have a filesystem protocol for fetching from a local repo, > like we frequently do in the test suite.) > Agreed, initialize the fetch in quiet mode and replacing it with a single consolidated indicator would be a nicer UX. I'll look into how git-fetch sets up quiet mode and try to wire that through download_batch() in v2. > > static void backfill_context_clear(struct backfill_context *ctx) > > @@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx) > > * avoid possible duplicate downloads of the same objects. > > */ > > odb_reprepare(ctx->repo->objects); > > + display_progress(ctx->progress, ++ctx->batches_requested); > > This looks correct. My preference is to not use prefix operators > like this on struct members (it reads like you are incrementing > 'ctx' and not 'batches_requested', even though it is correct). > > However, I'm not sure that we want the progress to indicate the > number of _batches_ but instead should be the number of _objects_. > > > static int fill_missing_blobs(const char *path UNUSED, > > @@ -120,12 +124,15 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit > > .current_batch = OID_ARRAY_INIT, > > .min_batch_size = 50000, > > .sparse = 0, > > + .show_progress = -1, > > }; > > struct option options[] = { > > OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size, > > N_("Minimum number of objects to request at a time")), > > OPT_BOOL(0, "sparse", &ctx.sparse, > > N_("Restrict the missing objects to the current sparse-checkout")), > > + OPT_BOOL(0, "progress", &ctx.show_progress, > > + N_("show progress while downloading missing objects")), > > OPT_END(), > > }; > > I hope that this does not cause any issues with the recent changes > to include the rev-list options in git-backfill. Worth checking. > Thanks for point, worth checking if any conflicts. > > +test_expect_success 'backfill --progress shows progress' ' > > + git clone --no-checkout --filter=blob:none \ > > + --single-branch --branch=main \ > > + "file://$(pwd)/srv.bare" clone-progress && > > + git -C clone-progress backfill --progress 2>err && > > + test_grep "Downloading batches" err > > +' > > + > > +test_expect_success 'backfill --no-progress is silent' ' > > + git clone --no-checkout --filter=blob:none \ > > + --single-branch --branch=main \ > > + "file://$(pwd)/srv.bare" clone-no-progress && > > + git -C clone-no-progress backfill --no-progress 2>err && > > + test_grep ! "Downloading batches" err > > +' > > + > > +test_expect_success 'backfill no flag on non-TTY is silent' ' > > + git clone --no-checkout --filter=blob:none \ > > + --single-branch --branch=main \ > > + "file://$(pwd)/srv.bare" clone-notty && > > + git -C clone-notty backfill 2>err && > > + test_grep ! "Downloading batches" err > > +' > > What you are missing here is that the progress isn't silent when > a TTY is present. There are several tests in the test suite that > use the TTY prerequisite for this kind of behavior, such as this > one from t9211-scalar-clone.sh: > > test_expect_success TTY 'progress with tty' ' > enlistment=progress1 && > > test_config -C to-clone uploadpack.allowfilter true && > test_config -C to-clone uploadpack.allowanysha1inwant true && > > test_terminal env GIT_PROGRESS_DELAY=0 \ > scalar clone "file://$(pwd)/to-clone" "$enlistment" 2>stderr && > grep "Enumerating objects" stderr >actual && > test_line_count = 2 actual && > cleanup_clone $enlistment > ' > Agree, will add such that test as per above reference. > Thanks, > -Stolee > BRs, Trieu Huynh ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-04-07 19:42 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-29 15:24 [GSoC PATCH] backfill: add --[no-]progress option Trieu Huynh 2026-04-06 13:16 ` Derrick Stolee 2026-04-06 17:35 ` Junio C Hamano 2026-04-07 19:22 ` Trieu Huynh 2026-04-07 19:42 ` Junio C Hamano 2026-04-07 19:15 ` Trieu Huynh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox