Git development
* [RFC GSoC PATCH] backfill: skip downloading for empty batches
@ 2026-03-31 12:12 Trieu Huynh
  2026-04-01 11:50 ` Patrick Steinhardt
  0 siblings, 1 reply; 3+ messages in thread
From: Trieu Huynh @ 2026-03-31 12:12 UTC
  To: git; +Cc: Trieu Huynh

When git backfill finishes its object walk, it unconditionally calls
download_batch to process any remaining objects. If the repository
is already up-to-date (no missing objects found), this call still
performs an unnecessary directory scan via odb_reprepare.

Fix this by calling download_batch() from do_backfill() only when the
current batch actually contains objects (nr > 0).

To facilitate testing and provide better telemetry, add a trace2 data
event for batches_requested. This allows us to verify that no batches
are processed when the command is run on an up-to-date repository.

Add a test case in t5620-backfill.sh to verify that no batch is
requested when no objects are missing.

Signed-off-by: Trieu Huynh <vikingtc4@gmail.com>
---
Open question:
1. Is adding trace2_data_intmax() the preferred way to verify this
   behavior in our test suite, or should we instead redirect stderr
   and check for progress messages where the progress option is
   supported?
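
For discussion, the guard described above can be sketched in isolation
as follows. This is only a minimal illustration, not the code in
builtin/backfill.c: the struct and function names prefixed with
"sketch_" are hypothetical stand-ins, and the download step is reduced
to a counter so the behavior can be checked without a repository.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real backfill structs. */
struct sketch_batch {
	size_t nr; /* number of object IDs queued in the current batch */
};

struct sketch_context {
	struct sketch_batch current_batch;
	int batches_requested; /* how many batches have been requested so far */
};

/* Stand-in for download_batch(): count the request and empty the batch. */
static void sketch_download_batch(struct sketch_context *ctx)
{
	ctx->batches_requested++;
	ctx->current_batch.nr = 0;
}

/*
 * Stand-in for the tail of do_backfill(): flush the final, partially
 * filled batch only when the walk succeeded and something is queued.
 */
static int sketch_finish_walk(struct sketch_context *ctx, int walk_ret)
{
	if (!walk_ret && ctx->current_batch.nr)
		sketch_download_batch(ctx);
	return walk_ret;
}
```

With an empty batch the counter stays at zero, which is the condition
the new test looks for in the trace2 output; with a non-empty batch it
is incremented exactly once.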

 builtin/backfill.c  |  3 ++-
 t/t5620-backfill.sh | 16 ++++++++++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/builtin/backfill.c b/builtin/backfill.c
index 0f31844ce7..67f9f28daf 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -58,6 +58,7 @@ static void download_batch(struct backfill_context *ctx)
 	 */
 	odb_reprepare(ctx->repo->objects);
 	display_progress(ctx->progress, ++ctx->batches_requested);
+	trace2_data_intmax("backfill", ctx->repo, "batches_requested", ctx->batches_requested);
 }
 
 static int fill_missing_blobs(const char *path UNUSED,
@@ -109,7 +110,7 @@ static int do_backfill(struct backfill_context *ctx)
 	ret = walk_objects_by_path(&info);
 
 	/* Download the objects that did not fill a batch. */
-	if (!ret)
+	if (!ret && ctx->current_batch.nr)
 		download_batch(ctx);
 
 	path_walk_info_clear(&info);
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index a1a8d736db..d3cc4022bf 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -221,6 +221,22 @@ test_expect_success 'backfill --sparse without cone mode (negative)' '
 	test_line_count = 12 missing
 '
 
+test_expect_success 'backfill does not request batches when up-to-date' '
+	git clone --no-checkout --filter=blob:none \
+		--single-branch --branch=main \
+		"file://$(pwd)/srv.bare" backfill-up-to-date &&
+
+	# First run downloads every missing blob
+	git -C backfill-up-to-date backfill &&
+
+	# Second run should find nothing left to download
+	GIT_TRACE2_EVENT="$(pwd)/up-to-date-trace" git \
+		-C backfill-up-to-date backfill &&
+
+	# Verify that no batch was requested
+	test_grep ! "batches_requested" up-to-date-trace
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
2.43.0



Thread overview: 3+ messages
2026-03-31 12:12 [RFC GSoC PATCH] backfill: skip downloading for empty batches Trieu Huynh
2026-04-01 11:50 ` Patrick Steinhardt
2026-04-01 19:44   ` Trieu Huynh
