Git development
 help / color / mirror / Atom feed
* [GSoC PATCH] backfill: add --[no-]progress option
@ 2026-03-29 15:24 Trieu Huynh
  2026-04-06 13:16 ` Derrick Stolee
  0 siblings, 1 reply; 6+ messages in thread
From: Trieu Huynh @ 2026-03-29 15:24 UTC (permalink / raw)
  To: git; +Cc: Trieu Huynh

'git backfill' is silent when downloading missing objects, giving
no feedback during potentially long-running operations on large
repositories. By contrast, 'git fetch', 'git gc', and
'git index-pack' all support --[no-]progress.

Add a --[no-]progress option to 'git backfill' by:

 - Calling display_progress() in download_batch() to advance the
   counter after each batch is fetched.
 - Defaulting to showing progress when stderr is a terminal
   (isatty(2)), matching the behaviour of 'git fetch'.
 - Allowing the user to override the default via --[no-]progress.

nit: progress.h was already included but unused, this commit puts
it to use.

Add tests to verify that:
 - --progress shows "Downloading batches" on stderr.
 - --no-progress suppresses output even on a TTY.
 - No flag on a non-TTY is silent (isatty default).

Signed-off-by: Trieu Huynh <vikingtc4@gmail.com>
---
 builtin/backfill.c  | 16 ++++++++++++++++
 t/t5620-backfill.sh | 24 ++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/builtin/backfill.c b/builtin/backfill.c
index e9a33e81be..27a301f9b2 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -35,6 +35,9 @@ struct backfill_context {
 	struct oid_array current_batch;
 	size_t min_batch_size;
 	int sparse;
+	int show_progress;
+	size_t batches_requested;
+	struct progress *progress;
 };
 
 static void backfill_context_clear(struct backfill_context *ctx)
@@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx)
 	 * avoid possible duplicate downloads of the same objects.
 	 */
 	odb_reprepare(ctx->repo->objects);
+	display_progress(ctx->progress, ++ctx->batches_requested);
 }
 
 static int fill_missing_blobs(const char *path UNUSED,
@@ -120,12 +124,15 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
 		.current_batch = OID_ARRAY_INIT,
 		.min_batch_size = 50000,
 		.sparse = 0,
+		.show_progress = -1,
 	};
 	struct option options[] = {
 		OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
 			     N_("Minimum number of objects to request at a time")),
 		OPT_BOOL(0, "sparse", &ctx.sparse,
 			 N_("Restrict the missing objects to the current sparse-checkout")),
+		OPT_BOOL(0, "progress", &ctx.show_progress,
+			 N_("show progress while downloading missing objects")),
 		OPT_END(),
 	};
 	struct repo_config_values *cfg = repo_config_values(the_repository);
@@ -141,7 +148,16 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
 	if (ctx.sparse < 0)
 		ctx.sparse = cfg->apply_sparse_checkout;
 
+	if (ctx.show_progress < 0)
+		ctx.show_progress = isatty(2);
+
+	if (ctx.show_progress)
+		ctx.progress = start_progress(ctx.repo, _("Downloading batches"), 0);
+
 	result = do_backfill(&ctx);
+	if (ctx.show_progress)
+		stop_progress(&ctx.progress);
+
 	backfill_context_clear(&ctx);
 	return result;
 }
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index 58c81556e7..ff67e8ecea 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -77,6 +77,30 @@ test_expect_success 'do partial clone 2, backfill min batch size' '
 	test_line_count = 0 revs2
 '
 
+test_expect_success 'backfill --progress shows progress' '
+	git clone --no-checkout --filter=blob:none \
+		--single-branch --branch=main \
+		"file://$(pwd)/srv.bare" clone-progress &&
+	git -C clone-progress backfill --progress 2>err &&
+	test_grep "Downloading batches" err
+'
+
+test_expect_success 'backfill --no-progress is silent' '
+	git clone --no-checkout --filter=blob:none \
+		--single-branch --branch=main \
+		"file://$(pwd)/srv.bare" clone-no-progress &&
+	git -C clone-no-progress backfill --no-progress 2>err &&
+	test_grep ! "Downloading batches" err
+'
+
+test_expect_success 'backfill no flag on non-TTY is silent' '
+	git clone --no-checkout --filter=blob:none \
+		--single-branch --branch=main \
+		"file://$(pwd)/srv.bare" clone-notty &&
+	git -C clone-notty backfill 2>err &&
+	test_grep ! "Downloading batches" err
+'
+
 test_expect_success 'backfill --sparse without sparse-checkout fails' '
 	git init not-sparse &&
 	test_must_fail git -C not-sparse backfill --sparse 2>err &&
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [GSoC PATCH] backfill: add --[no-]progress option
  2026-03-29 15:24 [GSoC PATCH] backfill: add --[no-]progress option Trieu Huynh
@ 2026-04-06 13:16 ` Derrick Stolee
  2026-04-06 17:35   ` Junio C Hamano
  2026-04-07 19:15   ` Trieu Huynh
  0 siblings, 2 replies; 6+ messages in thread
From: Derrick Stolee @ 2026-04-06 13:16 UTC (permalink / raw)
  To: Trieu Huynh, git

On 3/29/2026 11:24 AM, Trieu Huynh wrote:
> 'git backfill' is silent when downloading missing objects, giving
> no feedback during potentially long-running operations on large
> repositories. By contrast, 'git fetch', 'git gc', and
> 'git index-pack' all support --[no-]progress.

I wouldn't use the word "silent" because the output is actually
quite verbose by default. Each batch has progress output with the
remote. For example, this is the output I get when running 'git
backfill' on a blobless partial clone of the Git repo:

$ git backfill
remote: Enumerating objects: 50083, done.
remote: Counting objects: 100% (865/865), done.
remote: Compressing objects: 100% (177/177), done.
remote: Total 50083 (delta 760), reused 688 (delta 688), pack-reused 49218 (from 1)
Receiving objects: 100% (50083/50083), 37.13 MiB | 27.75 MiB/s, done.
Resolving deltas: 100% (47710/47710), done.
remote: Enumerating objects: 50393, done.
remote: Counting objects: 100% (1559/1559), done.
remote: Compressing objects: 100% (366/366), done.
remote: Total 50393 (delta 1366), reused 1193 (delta 1193), pack-reused 48834 (from 2)
Receiving objects: 100% (50393/50393), 44.56 MiB | 31.56 MiB/s, done.
Resolving deltas: 100% (47261/47261), done.
remote: Enumerating objects: 50000, done.
remote: Counting objects: 100% (2313/2313), done.
remote: Compressing objects: 100% (592/592), done.
remote: Total 50000 (delta 1982), reused 1721 (delta 1721), pack-reused 47687 (from 2)
Receiving objects: 100% (50000/50000), 90.49 MiB | 17.85 MiB/s, done.
Resolving deltas: 100% (45321/45321), done.
remote: Enumerating objects: 2155, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 2155 (delta 6), reused 1 (delta 1), pack-reused 2128 (from 1)
Receiving objects: 100% (2155/2155), 891.74 KiB | 3.75 MiB/s, done.
Resolving deltas: 100% (1717/1717), done.

With your patch, I think there would be some extra progress
indicators between these batched fetch requests.

Before moving forward with review of this patch, I think that
it would be valuable to demonstrate the full output with and
without your change.

In addition, I think there would be value in a progress indicator
_instead_ of these verbose outputs from the remote. That would
require a change to how we initialize the fetches in a quiet mode.

(I also understand that this output would probably not be the same
if we have a filesystem protocol for fetching from a local repo,
like we frequently do in the test suite.)

>  static void backfill_context_clear(struct backfill_context *ctx)
> @@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx)
>  	 * avoid possible duplicate downloads of the same objects.
>  	 */
>  	odb_reprepare(ctx->repo->objects);
> +	display_progress(ctx->progress, ++ctx->batches_requested);

This looks correct. My preference is to not use prefix operators
like this on struct members (it reads like you are incrementing
'ctx' and not 'batches_requested', even though it is correct).

However, I'm not sure that we want the progress to indicate the
number of _batches_ but instead should be the number of _objects_.
  
>  static int fill_missing_blobs(const char *path UNUSED,
> @@ -120,12 +124,15 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>  		.current_batch = OID_ARRAY_INIT,
>  		.min_batch_size = 50000,
>  		.sparse = 0,
> +		.show_progress = -1,
>  	};
>  	struct option options[] = {
>  		OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
>  			     N_("Minimum number of objects to request at a time")),
>  		OPT_BOOL(0, "sparse", &ctx.sparse,
>  			 N_("Restrict the missing objects to the current sparse-checkout")),
> +		OPT_BOOL(0, "progress", &ctx.show_progress,
> +			 N_("show progress while downloading missing objects")),
>  		OPT_END(),
>  	};

I hope that this does not cause any issues with the recent changes
to include the rev-list options in git-backfill. Worth checking.

> +test_expect_success 'backfill --progress shows progress' '
> +	git clone --no-checkout --filter=blob:none \
> +		--single-branch --branch=main \
> +		"file://$(pwd)/srv.bare" clone-progress &&
> +	git -C clone-progress backfill --progress 2>err &&
> +	test_grep "Downloading batches" err
> +'
> +
> +test_expect_success 'backfill --no-progress is silent' '
> +	git clone --no-checkout --filter=blob:none \
> +		--single-branch --branch=main \
> +		"file://$(pwd)/srv.bare" clone-no-progress &&
> +	git -C clone-no-progress backfill --no-progress 2>err &&
> +	test_grep ! "Downloading batches" err
> +'
> +
> +test_expect_success 'backfill no flag on non-TTY is silent' '
> +	git clone --no-checkout --filter=blob:none \
> +		--single-branch --branch=main \
> +		"file://$(pwd)/srv.bare" clone-notty &&
> +	git -C clone-notty backfill 2>err &&
> +	test_grep ! "Downloading batches" err
> +'

What you are missing here is that the progress isn't silent when
a TTY is present. There are several tests in the test suite that
use the TTY prerequisite for this kind of behavior, such as this
one from t9211-scalar-clone.sh:

test_expect_success TTY 'progress with tty' '
	enlistment=progress1 &&

	test_config -C to-clone uploadpack.allowfilter true &&
	test_config -C to-clone uploadpack.allowanysha1inwant true &&

	test_terminal env GIT_PROGRESS_DELAY=0 \
		scalar clone "file://$(pwd)/to-clone" "$enlistment" 2>stderr &&
	grep "Enumerating objects" stderr >actual &&
	test_line_count = 2 actual &&
	cleanup_clone $enlistment
'

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GSoC PATCH] backfill: add --[no-]progress option
  2026-04-06 13:16 ` Derrick Stolee
@ 2026-04-06 17:35   ` Junio C Hamano
  2026-04-07 19:22     ` Trieu Huynh
  2026-04-07 19:15   ` Trieu Huynh
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2026-04-06 17:35 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Trieu Huynh, git

Derrick Stolee <stolee@gmail.com> writes:

> On 3/29/2026 11:24 AM, Trieu Huynh wrote:
>> 'git backfill' is silent when downloading missing objects, giving
>> no feedback during potentially long-running operations on large
>> repositories. By contrast, 'git fetch', 'git gc', and
>> 'git index-pack' all support --[no-]progress.
>
> I wouldn't use the word "silent" because the output is actually
> quite verbose by default.

;-)

> With your patch, I think there would be some extra progress
> indicators between these batched fetch requests.

>>  static void backfill_context_clear(struct backfill_context *ctx)
>> @@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx)
>>  	 * avoid possible duplicate downloads of the same objects.
>>  	 */
>>  	odb_reprepare(ctx->repo->objects);
>> +	display_progress(ctx->progress, ++ctx->batches_requested);
>
> This looks correct. My preference is to not use prefix operators
> like this on struct members (it reads like you are incrementing
> 'ctx' and not 'batches_requested', even though it is correct).

Thanks for paying extra attention to such details.  In general,
post-increment and pre-decrement are the norm when evaluated in a
void context, so the use of pre-increment above violates that norm
too.

> However, I'm not sure that we want the progress to indicate the
> number of _batches_ but instead should be the number of _objects_.

True, too.

Thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GSoC PATCH] backfill: add --[no-]progress option
  2026-04-06 13:16 ` Derrick Stolee
  2026-04-06 17:35   ` Junio C Hamano
@ 2026-04-07 19:15   ` Trieu Huynh
  1 sibling, 0 replies; 6+ messages in thread
From: Trieu Huynh @ 2026-04-07 19:15 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

On Mon, Apr 06, 2026 at 09:16:30AM -0400, Derrick Stolee wrote:
> On 3/29/2026 11:24 AM, Trieu Huynh wrote:
> > 'git backfill' is silent when downloading missing objects, giving
> > no feedback during potentially long-running operations on large
> > repositories. By contrast, 'git fetch', 'git gc', and
> > 'git index-pack' all support --[no-]progress.
> 
> I wouldn't use the word "silent" because the output is actually
> quite verbose by default. Each batch has progress output with the
> remote. For example, this is the output I get when running 'git
> backfill' on a blobless partial clone of the Git repo:
> 
Make sense to me, will reword it to be more precise in v2.
> $ git backfill
> remote: Enumerating objects: 50083, done.
> remote: Counting objects: 100% (865/865), done.
> remote: Compressing objects: 100% (177/177), done.
> remote: Total 50083 (delta 760), reused 688 (delta 688), pack-reused 49218 (from 1)
> Receiving objects: 100% (50083/50083), 37.13 MiB | 27.75 MiB/s, done.
> Resolving deltas: 100% (47710/47710), done.
> remote: Enumerating objects: 50393, done.
> remote: Counting objects: 100% (1559/1559), done.
> remote: Compressing objects: 100% (366/366), done.
> remote: Total 50393 (delta 1366), reused 1193 (delta 1193), pack-reused 48834 (from 2)
> Receiving objects: 100% (50393/50393), 44.56 MiB | 31.56 MiB/s, done.
> Resolving deltas: 100% (47261/47261), done.
> remote: Enumerating objects: 50000, done.
> remote: Counting objects: 100% (2313/2313), done.
> remote: Compressing objects: 100% (592/592), done.
> remote: Total 50000 (delta 1982), reused 1721 (delta 1721), pack-reused 47687 (from 2)
> Receiving objects: 100% (50000/50000), 90.49 MiB | 17.85 MiB/s, done.
> Resolving deltas: 100% (45321/45321), done.
> remote: Enumerating objects: 2155, done.
> remote: Counting objects: 100% (27/27), done.
> remote: Compressing objects: 100% (26/26), done.
> remote: Total 2155 (delta 6), reused 1 (delta 1), pack-reused 2128 (from 1)
> Receiving objects: 100% (2155/2155), 891.74 KiB | 3.75 MiB/s, done.
> Resolving deltas: 100% (1717/1717), done.
> 
> With your patch, I think there would be some extra progress
> indicators between these batched fetch requests.
> 
> Before moving forward with review of this patch, I think that
> it would be valuable to demonstrate the full output with and
> without your change.
> 
Agree, will include a side-by-side comparison in the v2.
> In addition, I think there would be value in a progress indicator
> _instead_ of these verbose outputs from the remote. That would
> require a change to how we initialize the fetches in a quiet mode.
> 
> (I also understand that this output would probably not be the same
> if we have a filesystem protocol for fetching from a local repo,
> like we frequently do in the test suite.)
> 
Agreed, initialize the fetch in quiet mode and replacing it with a
single consolidated indicator would be a nicer UX. I'll look into
how git-fetch sets up quiet mode and try to wire that through
download_batch() in v2.
> >  static void backfill_context_clear(struct backfill_context *ctx)
> > @@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx)
> >  	 * avoid possible duplicate downloads of the same objects.
> >  	 */
> >  	odb_reprepare(ctx->repo->objects);
> > +	display_progress(ctx->progress, ++ctx->batches_requested);
> 
> This looks correct. My preference is to not use prefix operators
> like this on struct members (it reads like you are incrementing
> 'ctx' and not 'batches_requested', even though it is correct).
> 
> However, I'm not sure that we want the progress to indicate the
> number of _batches_ but instead should be the number of _objects_.
>   
> >  static int fill_missing_blobs(const char *path UNUSED,
> > @@ -120,12 +124,15 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
> >  		.current_batch = OID_ARRAY_INIT,
> >  		.min_batch_size = 50000,
> >  		.sparse = 0,
> > +		.show_progress = -1,
> >  	};
> >  	struct option options[] = {
> >  		OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
> >  			     N_("Minimum number of objects to request at a time")),
> >  		OPT_BOOL(0, "sparse", &ctx.sparse,
> >  			 N_("Restrict the missing objects to the current sparse-checkout")),
> > +		OPT_BOOL(0, "progress", &ctx.show_progress,
> > +			 N_("show progress while downloading missing objects")),
> >  		OPT_END(),
> >  	};
> 
> I hope that this does not cause any issues with the recent changes
> to include the rev-list options in git-backfill. Worth checking.
> 
Thanks for point, worth checking if any conflicts.
> > +test_expect_success 'backfill --progress shows progress' '
> > +	git clone --no-checkout --filter=blob:none \
> > +		--single-branch --branch=main \
> > +		"file://$(pwd)/srv.bare" clone-progress &&
> > +	git -C clone-progress backfill --progress 2>err &&
> > +	test_grep "Downloading batches" err
> > +'
> > +
> > +test_expect_success 'backfill --no-progress is silent' '
> > +	git clone --no-checkout --filter=blob:none \
> > +		--single-branch --branch=main \
> > +		"file://$(pwd)/srv.bare" clone-no-progress &&
> > +	git -C clone-no-progress backfill --no-progress 2>err &&
> > +	test_grep ! "Downloading batches" err
> > +'
> > +
> > +test_expect_success 'backfill no flag on non-TTY is silent' '
> > +	git clone --no-checkout --filter=blob:none \
> > +		--single-branch --branch=main \
> > +		"file://$(pwd)/srv.bare" clone-notty &&
> > +	git -C clone-notty backfill 2>err &&
> > +	test_grep ! "Downloading batches" err
> > +'
> 
> What you are missing here is that the progress isn't silent when
> a TTY is present. There are several tests in the test suite that
> use the TTY prerequisite for this kind of behavior, such as this
> one from t9211-scalar-clone.sh:
> 
> test_expect_success TTY 'progress with tty' '
> 	enlistment=progress1 &&
> 
> 	test_config -C to-clone uploadpack.allowfilter true &&
> 	test_config -C to-clone uploadpack.allowanysha1inwant true &&
> 
> 	test_terminal env GIT_PROGRESS_DELAY=0 \
> 		scalar clone "file://$(pwd)/to-clone" "$enlistment" 2>stderr &&
> 	grep "Enumerating objects" stderr >actual &&
> 	test_line_count = 2 actual &&
> 	cleanup_clone $enlistment
> '
> 
Agree, will add such that test as per above reference.
> Thanks,
> -Stolee
> 
BRs,
Trieu Huynh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GSoC PATCH] backfill: add --[no-]progress option
  2026-04-06 17:35   ` Junio C Hamano
@ 2026-04-07 19:22     ` Trieu Huynh
  2026-04-07 19:42       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Trieu Huynh @ 2026-04-07 19:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git

On Mon, Apr 06, 2026 at 10:35:58AM -0700, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
> > On 3/29/2026 11:24 AM, Trieu Huynh wrote:
> >> 'git backfill' is silent when downloading missing objects, giving
> >> no feedback during potentially long-running operations on large
> >> repositories. By contrast, 'git fetch', 'git gc', and
> >> 'git index-pack' all support --[no-]progress.
> >
> > I wouldn't use the word "silent" because the output is actually
> > quite verbose by default.
> 
> ;-)
> 
> > With your patch, I think there would be some extra progress
> > indicators between these batched fetch requests.
> 
> >>  static void backfill_context_clear(struct backfill_context *ctx)
> >> @@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx)
> >>  	 * avoid possible duplicate downloads of the same objects.
> >>  	 */
> >>  	odb_reprepare(ctx->repo->objects);
> >> +	display_progress(ctx->progress, ++ctx->batches_requested);
> >
> > This looks correct. My preference is to not use prefix operators
> > like this on struct members (it reads like you are incrementing
> > 'ctx' and not 'batches_requested', even though it is correct).
> 
> Thanks for paying extra attention to such details.  In general,
> post-increment and pre-decrement are the norm when evaluated in a
> void context, so the use of pre-increment above violates that norm
> too.
> 
Thanks for pointing it out. Will update, eg:
++counter;
foo(counter);
> > However, I'm not sure that we want the progress to indicate the
> > number of _batches_ but instead should be the number of _objects_.
> 
> True, too.
> 
Make sense to me, worth checking if we can feasibly track the total
object count instead of just batches to make the progress more
meaningful.
> Thanks.
Thank you for all your kind review.
I'll update v2 as per your comments. Hopefully, it can address
these concerns. 

BRs,
Trieu Huynh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [GSoC PATCH] backfill: add --[no-]progress option
  2026-04-07 19:22     ` Trieu Huynh
@ 2026-04-07 19:42       ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2026-04-07 19:42 UTC (permalink / raw)
  To: Trieu Huynh; +Cc: Derrick Stolee, git

Trieu Huynh <vikingtc4@gmail.com> writes:

>> >> +	display_progress(ctx->progress, ++ctx->batches_requested);
>> >
>> > This looks correct. My preference is to not use prefix operators
>> > like this on struct members (it reads like you are incrementing
>> > 'ctx' and not 'batches_requested', even though it is correct).
>> 
>> Thanks for paying extra attention to such details.  In general,
>> post-increment and pre-decrement are the norm when evaluated in a
>> void context, so the use of pre-increment above violates that norm
>> too.
>> 
> Thanks for pointing it out. Will update, eg:
> ++counter;
> foo(counter);

I think you meant "counter++; foo(counter);" instead.  Otherwise,
the first line is exactly a pre-increment in a void context.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-04-07 19:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-29 15:24 [GSoC PATCH] backfill: add --[no-]progress option Trieu Huynh
2026-04-06 13:16 ` Derrick Stolee
2026-04-06 17:35   ` Junio C Hamano
2026-04-07 19:22     ` Trieu Huynh
2026-04-07 19:42       ` Junio C Hamano
2026-04-07 19:15   ` Trieu Huynh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox