Git development
From: Derrick Stolee <stolee@gmail.com>
To: Trieu Huynh <vikingtc4@gmail.com>, git@vger.kernel.org
Subject: Re: [GSoC PATCH] backfill: add --[no-]progress option
Date: Mon, 6 Apr 2026 09:16:30 -0400
Message-ID: <8db10441-2fce-43ad-bcdc-331d26ec38ed@gmail.com>
In-Reply-To: <20260329152443.525493-1-vikingtc4@gmail.com>

On 3/29/2026 11:24 AM, Trieu Huynh wrote:
> 'git backfill' is silent when downloading missing objects, giving
> no feedback during potentially long-running operations on large
> repositories. By contrast, 'git fetch', 'git gc', and
> 'git index-pack' all support --[no-]progress.

I wouldn't use the word "silent" because the output is actually
quite verbose by default. Each batch has progress output with the
remote. For example, this is the output I get when running 'git
backfill' on a blobless partial clone of the Git repo:

$ git backfill
remote: Enumerating objects: 50083, done.
remote: Counting objects: 100% (865/865), done.
remote: Compressing objects: 100% (177/177), done.
remote: Total 50083 (delta 760), reused 688 (delta 688), pack-reused 49218 (from 1)
Receiving objects: 100% (50083/50083), 37.13 MiB | 27.75 MiB/s, done.
Resolving deltas: 100% (47710/47710), done.
remote: Enumerating objects: 50393, done.
remote: Counting objects: 100% (1559/1559), done.
remote: Compressing objects: 100% (366/366), done.
remote: Total 50393 (delta 1366), reused 1193 (delta 1193), pack-reused 48834 (from 2)
Receiving objects: 100% (50393/50393), 44.56 MiB | 31.56 MiB/s, done.
Resolving deltas: 100% (47261/47261), done.
remote: Enumerating objects: 50000, done.
remote: Counting objects: 100% (2313/2313), done.
remote: Compressing objects: 100% (592/592), done.
remote: Total 50000 (delta 1982), reused 1721 (delta 1721), pack-reused 47687 (from 2)
Receiving objects: 100% (50000/50000), 90.49 MiB | 17.85 MiB/s, done.
Resolving deltas: 100% (45321/45321), done.
remote: Enumerating objects: 2155, done.
remote: Counting objects: 100% (27/27), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 2155 (delta 6), reused 1 (delta 1), pack-reused 2128 (from 1)
Receiving objects: 100% (2155/2155), 891.74 KiB | 3.75 MiB/s, done.
Resolving deltas: 100% (1717/1717), done.

With your patch, I think there would be some extra progress
indicators between these batched fetch requests.

Before moving forward with review of this patch, I think that
it would be valuable to demonstrate the full output with and
without your change.

In addition, I think there would be value in a progress indicator
_instead_ of these verbose outputs from the remote. That would
require a change to how we initialize the fetches in a quiet mode.

(I also understand that this output would probably not be the same
if we have a filesystem protocol for fetching from a local repo,
like we frequently do in the test suite.)

>  static void backfill_context_clear(struct backfill_context *ctx)
> @@ -54,6 +57,7 @@ static void download_batch(struct backfill_context *ctx)
>  	 * avoid possible duplicate downloads of the same objects.
>  	 */
>  	odb_reprepare(ctx->repo->objects);
> +	display_progress(ctx->progress, ++ctx->batches_requested);

This looks correct. My preference is to not use prefix operators
like this on struct members (it reads like you are incrementing
'ctx' and not 'batches_requested', even though it is correct).
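
As a minimal sketch of the style suggested here, the increment can sit on
its own line before the call. The struct definitions and the
display_progress() stub below are simplified stand-ins written for this
example, not the real definitions from the backfill builtin or progress.h:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for git's 'struct progress': remembers the last value. */
struct progress {
	uint64_t last;
};

/* Stub shaped like git's display_progress(); a NULL pointer means "no progress". */
static void display_progress(struct progress *p, uint64_t n)
{
	if (p)
		p->last = n;
}

/* Reduced stand-in for the context struct touched by the patch. */
struct backfill_context {
	struct progress *progress;
	uint64_t batches_requested;
};

static void download_batch(struct backfill_context *ctx)
{
	assert(ctx);

	/* Increment on its own line so it is obvious which member changes. */
	ctx->batches_requested++;
	display_progress(ctx->progress, ctx->batches_requested);
}
```

The behavior is the same as the prefix form in the patch; only the
readability of the member update changes.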

However, I'm not sure that we want the progress to count the
number of _batches_; I think it should count the number of
_objects_ instead.
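
One possible shape for counting objects, sketched under assumptions: the
'objects_requested' counter and the 'current_batch_nr' field are
hypothetical names for this example (the latter stands in for the size of
the 'current_batch' array), and display_progress() is a local stub rather
than the real function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for git's 'struct progress': remembers the last value. */
struct progress {
	uint64_t last;
};

/* Stub shaped like git's display_progress(); a NULL pointer means "no progress". */
static void display_progress(struct progress *p, uint64_t n)
{
	if (p)
		p->last = n;
}

/* Reduced stand-in for the backfill context. */
struct backfill_context {
	struct progress *progress;
	size_t current_batch_nr;    /* assumed: number of objects in the pending batch */
	uint64_t objects_requested; /* assumed: running total across all batches */
};

static void download_batch(struct backfill_context *ctx)
{
	assert(ctx);

	/* ... the real code would request the batched objects here ... */

	/* Advance the progress by the number of objects, not by one batch. */
	ctx->objects_requested += ctx->current_batch_nr;
	display_progress(ctx->progress, ctx->objects_requested);

	/* The batch has been consumed. */
	ctx->current_batch_nr = 0;
}
```

With this shape, the total shown grows by the batch size on each request,
which tracks the amount of work done more closely than a batch count.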
  
>  static int fill_missing_blobs(const char *path UNUSED,
> @@ -120,12 +124,15 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
>  		.current_batch = OID_ARRAY_INIT,
>  		.min_batch_size = 50000,
>  		.sparse = 0,
> +		.show_progress = -1,
>  	};
>  	struct option options[] = {
>  		OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
>  			     N_("Minimum number of objects to request at a time")),
>  		OPT_BOOL(0, "sparse", &ctx.sparse,
>  			 N_("Restrict the missing objects to the current sparse-checkout")),
> +		OPT_BOOL(0, "progress", &ctx.show_progress,
> +			 N_("show progress while downloading missing objects")),
>  		OPT_END(),
>  	};

I hope that this does not cause any issues with the recent changes
to include the rev-list options in git-backfill. Worth checking.

> +test_expect_success 'backfill --progress shows progress' '
> +	git clone --no-checkout --filter=blob:none \
> +		--single-branch --branch=main \
> +		"file://$(pwd)/srv.bare" clone-progress &&
> +	git -C clone-progress backfill --progress 2>err &&
> +	test_grep "Downloading batches" err
> +'
> +
> +test_expect_success 'backfill --no-progress is silent' '
> +	git clone --no-checkout --filter=blob:none \
> +		--single-branch --branch=main \
> +		"file://$(pwd)/srv.bare" clone-no-progress &&
> +	git -C clone-no-progress backfill --no-progress 2>err &&
> +	test_grep ! "Downloading batches" err
> +'
> +
> +test_expect_success 'backfill no flag on non-TTY is silent' '
> +	git clone --no-checkout --filter=blob:none \
> +		--single-branch --branch=main \
> +		"file://$(pwd)/srv.bare" clone-notty &&
> +	git -C clone-notty backfill 2>err &&
> +	test_grep ! "Downloading batches" err
> +'

What you are missing here is a test for the case where a TTY is
present, where the progress isn't silent. There are several tests in the test suite that
use the TTY prerequisite for this kind of behavior, such as this
one from t9211-scalar-clone.sh:

test_expect_success TTY 'progress with tty' '
	enlistment=progress1 &&

	test_config -C to-clone uploadpack.allowfilter true &&
	test_config -C to-clone uploadpack.allowanysha1inwant true &&

	test_terminal env GIT_PROGRESS_DELAY=0 \
		scalar clone "file://$(pwd)/to-clone" "$enlistment" 2>stderr &&
	grep "Enumerating objects" stderr >actual &&
	test_line_count = 2 actual &&
	cleanup_clone $enlistment
'
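
For context, the TTY-dependent default usually comes from resolving a
tri-state flag (-1 meaning "not given on the command line") against
whether stderr is a terminal. The helper below is a hypothetical sketch of
that pattern, assuming the patch resolves 'show_progress' this way; the
function name is made up, and the terminal check is passed in as a
parameter (in real code it would typically come from isatty(2)) so the
logic can be exercised without a terminal:

```c
#include <assert.h>

/*
 * Resolve a tri-state progress flag:
 *   -1  option not given: follow whether stderr is a terminal
 *    0  --no-progress:    always off
 *    1  --progress:       always on
 */
static int resolve_show_progress(int flag, int stderr_is_tty)
{
	assert(flag >= -1 && flag <= 1);

	if (flag < 0)
		return stderr_is_tty ? 1 : 0;
	return flag;
}
```

Under this assumption, the three tests in the patch cover the explicit-on,
explicit-off, and default non-TTY cases, while the default TTY case is the
one that needs the TTY prerequisite.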

Thanks,
-Stolee



