Git development
 help / color / mirror / Atom feed
* [RFC GSoC PATCH] backfill: skip downloading for empty batches
@ 2026-03-31 12:12 Trieu Huynh
  2026-04-01 11:50 ` Patrick Steinhardt
  0 siblings, 1 reply; 3+ messages in thread
From: Trieu Huynh @ 2026-03-31 12:12 UTC (permalink / raw)
  To: git; +Cc: Trieu Huynh

When git backfill finishes its object walk, it unconditionally calls
download_batch to process any remaining objects. If the repository
is already up-to-date (no missing objects found), this call still
performs an unnecessary directory scan via odb_reprepare.

Fix it by adding a check in do_backfill to ensure download_batch is only
called if the current batch actually contains objects (nr > 0).

To facilitate testing and provide better telemetry, add a trace2 data
event for batches_requested. This allows us to verify that no batches
are processed when the command is run on an up-to-date repository.

Add a test case in t5620-backfill.sh to ensure silence and efficiency
when no objects are missing.

Signed-off-by: Trieu Huynh <vikingtc4@gmail.com>
---
Need discussion:
1. Is adding trace2_data_intmax() the preferred way to verify this 
   behavior in our test suite, or should we rely on redirection of 
   stderr to check for progress messages when the progress option
   is supported?

 builtin/backfill.c  |  3 ++-
 t/t5620-backfill.sh | 16 ++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/builtin/backfill.c b/builtin/backfill.c
index 0f31844ce7..67f9f28daf 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -58,6 +58,7 @@ static void download_batch(struct backfill_context *ctx)
 	 */
 	odb_reprepare(ctx->repo->objects);
 	display_progress(ctx->progress, ++ctx->batches_requested);
+	trace2_data_intmax("backfill", ctx->repo, "batches_requested", ctx->batches_requested);
 }
 
 static int fill_missing_blobs(const char *path UNUSED,
@@ -109,7 +110,7 @@ static int do_backfill(struct backfill_context *ctx)
 	ret = walk_objects_by_path(&info);
 
 	/* Download the objects that did not fill a batch. */
-	if (!ret)
+	if ( (!ret) && (ctx->current_batch.nr > 0) )
 		download_batch(ctx);
 
 	path_walk_info_clear(&info);
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index a1a8d736db..d3cc4022bf 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -221,6 +221,22 @@ test_expect_success 'backfill --sparse without cone mode (negative)' '
 	test_line_count = 12 missing
 '
 
+test_expect_success 'backfill does not request batches when up-to-date' '
+	git clone --no-checkout --filter=blob:none \
+		--single-branch --branch=main \
+		"file://$(pwd)/srv.bare" backfill-up-to-date &&
+
+	# First trigger to have a full download
+	git -C backfill-up-to-date backfill &&
+
+	# Second trigger to verify when already have a full download previously
+	GIT_TRACE2_EVENT="$(pwd)/up-to-date-trace" git \
+		-C backfill-up-to-date backfill &&
+
+	# Verify no  batches_request occurr
+	test_grep ! "batches_requested" up-to-date-trace
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [RFC GSoC PATCH] backfill: skip downloading for empty batches
  2026-03-31 12:12 [RFC GSoC PATCH] backfill: skip downloading for empty batches Trieu Huynh
@ 2026-04-01 11:50 ` Patrick Steinhardt
  2026-04-01 19:44   ` Trieu Huynh
  0 siblings, 1 reply; 3+ messages in thread
From: Patrick Steinhardt @ 2026-04-01 11:50 UTC (permalink / raw)
  To: Trieu Huynh; +Cc: git

On Tue, Mar 31, 2026 at 09:12:04PM +0900, Trieu Huynh wrote:
> When git backfill finishes its object walk, it unconditionally calls
> download_batch to process any remaining objects. If the repository
> is already up-to-date (no missing objects found), this call still
> performs an unnecessary directory scan via odb_reprepare.
> 
> Fix it by adding a check in do_backfill to ensure download_batch is only
> called if the current batch actually contains objects (nr > 0).
> 
> To facilitate testing and provide better telemetry, add a trace2 data
> event for batches_requested. This allows us to verify that no batches
> are processed when the command is run on an up-to-date repository.
> 
> Add a test case in t5620-backfill.sh to ensure silence and efficiency
> when no objects are missing.
> 
> Signed-off-by: Trieu Huynh <vikingtc4@gmail.com>
> ---
> Need discussion:
> 1. Is adding trace2_data_intmax() the preferred way to verify this 
>    behavior in our test suite, or should we rely on redirection of 
>    stderr to check for progress messages when the progress option
>    is supported?

I think adding a call to trace2 only for the test itself doesn't make a
lot of sense if we already have another way to verify. But would we
actually see any progress messages? `promisor_remote_get_direct()` knows
to bail out early in case there is nothing to be downloaded, so the only
difference really is the call to `odb_reprepare()`.

Or is it? This part here...

> diff --git a/builtin/backfill.c b/builtin/backfill.c
> index 0f31844ce7..67f9f28daf 100644
> --- a/builtin/backfill.c
> +++ b/builtin/backfill.c
> @@ -58,6 +58,7 @@ static void download_batch(struct backfill_context *ctx)
>  	 */
>  	odb_reprepare(ctx->repo->objects);
>  	display_progress(ctx->progress, ++ctx->batches_requested);
> +	trace2_data_intmax("backfill", ctx->repo, "batches_requested", ctx->batches_requested);
>  }
>  
>  static int fill_missing_blobs(const char *path UNUSED,

... looks different. What commit is this patch based on? There is no
call to `display_progress()` on "master", and you didn't mention any
other dependency in your cover letter. Please note such dependencies
when you post a patch that has any requirements.

But in any case, this here would cause us to print "batches_requested"
events repeatedly, which doesn't make a lot of sense.

> @@ -109,7 +110,7 @@ static int do_backfill(struct backfill_context *ctx)
>  	ret = walk_objects_by_path(&info);
>  
>  	/* Download the objects that did not fill a batch. */
> -	if (!ret)
> +	if ( (!ret) && (ctx->current_batch.nr > 0) )
>  		download_batch(ctx);
>  
>  	path_walk_info_clear(&info);

Please pay attention to our coding guidlines, see
"Documentation/CodingGuidelines".

I guess a more robust fix would add the check in `download_batch()`
itself, but I guess both alternatives work. But overall, it's sensible
to avoid repreparing the ODB in case we know nothing has changed.

> diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
> index a1a8d736db..d3cc4022bf 100755
> --- a/t/t5620-backfill.sh
> +++ b/t/t5620-backfill.sh
> @@ -221,6 +221,22 @@ test_expect_success 'backfill --sparse without cone mode (negative)' '
>  	test_line_count = 12 missing
>  '
>  
> +test_expect_success 'backfill does not request batches when up-to-date' '
> +	git clone --no-checkout --filter=blob:none \
> +		--single-branch --branch=main \
> +		"file://$(pwd)/srv.bare" backfill-up-to-date &&
> +
> +	# First trigger to have a full download
> +	git -C backfill-up-to-date backfill &&
> +
> +	# Second trigger to verify when already have a full download previously
> +	GIT_TRACE2_EVENT="$(pwd)/up-to-date-trace" git \
> +		-C backfill-up-to-date backfill &&
> +
> +	# Verify no  batches_request occurr
> +	test_grep ! "batches_requested" up-to-date-trace
> +'

I'm ultimately not sure whether this change even needs a test. We're not
changing any user-visible behaviour, we're simply skipping some
pointless busywork that doesn't do much, but that shouldn't really hurt
much, either.

Patrick

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [RFC GSoC PATCH] backfill: skip downloading for empty batches
  2026-04-01 11:50 ` Patrick Steinhardt
@ 2026-04-01 19:44   ` Trieu Huynh
  0 siblings, 0 replies; 3+ messages in thread
From: Trieu Huynh @ 2026-04-01 19:44 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On Wed, Apr 01, 2026 at 01:50:55PM +0200, Patrick Steinhardt wrote:
> On Tue, Mar 31, 2026 at 09:12:04PM +0900, Trieu Huynh wrote:
> > When git backfill finishes its object walk, it unconditionally calls
> > download_batch to process any remaining objects. If the repository
> > is already up-to-date (no missing objects found), this call still
> > performs an unnecessary directory scan via odb_reprepare.
> > 
> > Fix it by adding a check in do_backfill to ensure download_batch is only
> > called if the current batch actually contains objects (nr > 0).
> > 
> > To facilitate testing and provide better telemetry, add a trace2 data
> > event for batches_requested. This allows us to verify that no batches
> > are processed when the command is run on an up-to-date repository.
> > 
> > Add a test case in t5620-backfill.sh to ensure silence and efficiency
> > when no objects are missing.
> > 
> > Signed-off-by: Trieu Huynh <vikingtc4@gmail.com>
> > ---
> > Need discussion:
> > 1. Is adding trace2_data_intmax() the preferred way to verify this 
> >    behavior in our test suite, or should we rely on redirection of 
> >    stderr to check for progress messages when the progress option
> >    is supported?
> 
> I think adding a call to trace2 only for the test itself doesn't make a
> lot of sense if we already have another way to verify. But would we
> actually see any progress messages? `promisor_remote_get_direct()` knows
> to bail out early in case there is nothing to be downloaded, so the only
> difference really is the call to `odb_reprepare()`.
> 
> Or is it? This part here...
> 
currently, I have no idea to verify the change, so I'm adding a trace2 here.
> > diff --git a/builtin/backfill.c b/builtin/backfill.c
> > index 0f31844ce7..67f9f28daf 100644
> > --- a/builtin/backfill.c
> > +++ b/builtin/backfill.c
> > @@ -58,6 +58,7 @@ static void download_batch(struct backfill_context *ctx)
> >  	 */
> >  	odb_reprepare(ctx->repo->objects);
> >  	display_progress(ctx->progress, ++ctx->batches_requested);
> > +	trace2_data_intmax("backfill", ctx->repo, "batches_requested", ctx->batches_requested);
> >  }
> >  
> >  static int fill_missing_blobs(const char *path UNUSED,
> 
> ... looks different. What commit is this patch based on? There is no
> call to `display_progress()` on "master", and you didn't mention any
> other dependency in your cover letter. Please note such dependencies
> when you post a patch that has any requirements.
> 
I was submit another patch to support --[no-]progress option.
https://lore.kernel.org/git/20260329152443.525493-1-vikingtc4@gmail.com/
it should be based on master's latest rather than this change, sorry for
the confusion, will rebase on v2.
> But in any case, this here would cause us to print "batches_requested"
> events repeatedly, which doesn't make a lot of sense.
> 
ack, but I wonder if it should be defined method to verify if it can skip
when no objects are missing or not here.
> > @@ -109,7 +110,7 @@ static int do_backfill(struct backfill_context *ctx)
> >  	ret = walk_objects_by_path(&info);
> >  
> >  	/* Download the objects that did not fill a batch. */
> > -	if (!ret)
> > +	if ( (!ret) && (ctx->current_batch.nr > 0) )
> >  		download_batch(ctx);
> >  
> >  	path_walk_info_clear(&info);
> 
> Please pay attention to our coding guidlines, see
> "Documentation/CodingGuidelines".
> 
ack, I got it.
> I guess a more robust fix would add the check in `download_batch()`
> itself, but I guess both alternatives work. But overall, it's sensible
> to avoid repreparing the ODB in case we know nothing has changed.
> 
ack, waiting for other reviews.
> > diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
> > index a1a8d736db..d3cc4022bf 100755
> > --- a/t/t5620-backfill.sh
> > +++ b/t/t5620-backfill.sh
> > @@ -221,6 +221,22 @@ test_expect_success 'backfill --sparse without cone mode (negative)' '
> >  	test_line_count = 12 missing
> >  '
> >  
> > +test_expect_success 'backfill does not request batches when up-to-date' '
> > +	git clone --no-checkout --filter=blob:none \
> > +		--single-branch --branch=main \
> > +		"file://$(pwd)/srv.bare" backfill-up-to-date &&
> > +
> > +	# First trigger to have a full download
> > +	git -C backfill-up-to-date backfill &&
> > +
> > +	# Second trigger to verify when already have a full download previously
> > +	GIT_TRACE2_EVENT="$(pwd)/up-to-date-trace" git \
> > +		-C backfill-up-to-date backfill &&
> > +
> > +	# Verify no  batches_request occurr
> > +	test_grep ! "batches_requested" up-to-date-trace
> > +'
> 
> I'm ultimately not sure whether this change even needs a test. We're not
> changing any user-visible behaviour, we're simply skipping some
> pointless busywork that doesn't do much, but that shouldn't really hurt
> much, either.
> 
ack, can provide steps to verify in commit msg instead of adding a test.
> Patrick

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-04-01 19:45 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-31 12:12 [RFC GSoC PATCH] backfill: skip downloading for empty batches Trieu Huynh
2026-04-01 11:50 ` Patrick Steinhardt
2026-04-01 19:44   ` Trieu Huynh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox