From: "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <stolee@gmail.com>,
	Elijah Newren <newren@gmail.com>
Subject: [PATCH 3/3] backfill: default to grabbing edge blobs too
Date: Wed, 15 Apr 2026 23:58:02 +0000
Message-ID: <607ed38e2a8ae94266b4a3d51610e604cca8df4f.1776297482.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.2088.git.1776297482.gitgitgadget@gmail.com>

From: Elijah Newren <newren@gmail.com>

Commit 302aff09223f (backfill: accept revision arguments, 2026-03-26) added
support for accepting revision arguments to backfill.  This allows users
to do things like

   git backfill --remotes ^v2.3.0

and then run many commands without triggering on-demand downloads of
blobs.  However, if they have topics based on v2.3.0, they will likely
still trigger on-demand downloads.  Consider, for example, the command

   git log -p v2.3.0..topic

This would still trigger on-demand blob loading after the backfill
command above, because the commit(s) with v2.3.0 as a parent will need
to diff against the blobs in v2.3.0.  In fact, multiple commands need
blobs from the lower boundary (A below) of the revision range:

   * git log -p A..B                # After backfill A..B
   * git replay --onto TARGET A..B  # After backfill TARGET^! A..B
   * git checkout A && git merge B  # After backfill A...B

Add an extra --[no-]include-edges flag to allow grabbing blobs from
edge commits.  Since the point of backfill is to prevent on-demand blob
loading and these are common commands, default to --include-edges.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-backfill.adoc |   9 ++-
 builtin/backfill.c              |   8 ++-
 t/t5620-backfill.sh             | 110 ++++++++++++++++++++++++++++++--
 3 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-backfill.adoc b/Documentation/git-backfill.adoc
index bf26d7694f..c0a3b80615 100644
--- a/Documentation/git-backfill.adoc
+++ b/Documentation/git-backfill.adoc
@@ -9,7 +9,7 @@ git-backfill - Download missing objects in a partial clone
 SYNOPSIS
 --------
 [synopsis]
-git backfill [--min-batch-size=<n>] [--[no-]sparse] [<revision-range>]
+git backfill [--min-batch-size=<n>] [--[no-]sparse] [--[no-]include-edges] [<revision-range>]
 
 DESCRIPTION
 -----------
@@ -63,6 +63,13 @@ OPTIONS
 	current sparse-checkout. If the sparse-checkout feature is enabled,
 	then `--sparse` is assumed and can be disabled with `--no-sparse`.
 
+`--include-edges`::
+`--no-include-edges`::
+	Include blobs from boundary commits in the backfill.  Useful in
+	preparation for commands like `git log -p A..B` or `git replay
+	--onto TARGET A..B`, where A..B normally excludes A but you need
+	the blobs from A as well.  `--include-edges` is the default.
+
 `<revision-range>`::
 	Backfill only blobs reachable from commits in the specified
 	revision range.  When no _<revision-range>_ is specified, it
diff --git a/builtin/backfill.c b/builtin/backfill.c
index e934d360fd..7ffab2ea74 100644
--- a/builtin/backfill.c
+++ b/builtin/backfill.c
@@ -26,7 +26,7 @@
 #include "path-walk.h"
 
 static const char * const builtin_backfill_usage[] = {
-	N_("git backfill [--min-batch-size=<n>] [--[no-]sparse] [<revision-range>]"),
+	N_("git backfill [--min-batch-size=<n>] [--[no-]sparse] [--[no-]include-edges] [<revision-range>]"),
 	NULL
 };
 
@@ -35,6 +35,7 @@ struct backfill_context {
 	struct oid_array current_batch;
 	size_t min_batch_size;
 	int sparse;
+	int include_edges;
 	struct rev_info revs;
 };
 
@@ -116,6 +117,8 @@ static int do_backfill(struct backfill_context *ctx)
 	/* Walk from HEAD if otherwise unspecified. */
 	if (!ctx->revs.pending.nr)
 		add_head_to_pending(&ctx->revs);
+	if (ctx->include_edges)
+		ctx->revs.edge_hint = 1;
 
 	info.blobs = 1;
 	info.tags = info.commits = info.trees = 0;
@@ -143,12 +146,15 @@ int cmd_backfill(int argc, const char **argv, const char *prefix, struct reposit
 		.min_batch_size = 50000,
 		.sparse = -1,
 		.revs = REV_INFO_INIT,
+		.include_edges = 1,
 	};
 	struct option options[] = {
 		OPT_UNSIGNED(0, "min-batch-size", &ctx.min_batch_size,
 			     N_("Minimum number of objects to request at a time")),
 		OPT_BOOL(0, "sparse", &ctx.sparse,
 			 N_("Restrict the missing objects to the current sparse-checkout")),
+		OPT_BOOL(0, "include-edges", &ctx.include_edges,
+			 N_("Include blobs from boundary commits in the backfill")),
 		OPT_END(),
 	};
 	struct repo_config_values *cfg = repo_config_values(the_repository);
diff --git a/t/t5620-backfill.sh b/t/t5620-backfill.sh
index f3b5e39493..94f35ce190 100755
--- a/t/t5620-backfill.sh
+++ b/t/t5620-backfill.sh
@@ -257,11 +257,12 @@ test_expect_success 'backfill with revision range' '
 	git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
 	test_line_count = 48 missing &&
 
-	git -C backfill-revs backfill HEAD~2..HEAD &&
+	GIT_TRACE2_EVENT="$(pwd)/backfill-trace" git -C backfill-revs backfill HEAD~2..HEAD &&
 
-	# 30 objects downloaded.
+	# 36 objects downloaded, 12 still missing
+	test_trace2_data promisor fetch_count 36 <backfill-trace &&
 	git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
-	test_line_count = 18 missing
+	test_line_count = 12 missing
 '
 
 test_expect_success 'backfill with revisions over stdin' '
@@ -279,11 +280,12 @@ test_expect_success 'backfill with revisions over stdin' '
 	^HEAD~2
 	EOF
 
-	git -C backfill-revs backfill --stdin <in &&
+	GIT_TRACE2_EVENT="$(pwd)/backfill-trace" git -C backfill-revs backfill --stdin <in &&
 
-	# 30 objects downloaded.
+	# 36 objects downloaded, 12 still missing
+	test_trace2_data promisor fetch_count 36 <backfill-trace &&
 	git -C backfill-revs rev-list --quiet --objects --missing=print HEAD >missing &&
-	test_line_count = 18 missing
+	test_line_count = 12 missing
 '
 
 test_expect_success 'backfill with prefix pathspec' '
@@ -398,6 +400,102 @@ test_expect_success 'backfill with --since' '
 	test_line_count = 6 missing
 '
 
+test_expect_success 'backfill range with include-edges enables fetch-free git-log' '
+	git clone --no-checkout --filter=blob:none	\
+		--single-branch --branch=main		\
+		"file://$(pwd)/srv.bare" backfill-log &&
+
+	# Backfill the range with default include edges.
+	git -C backfill-log backfill HEAD~2..HEAD &&
+
+	# git log -p needs edge blobs for the "before" side of
+	# diffs.  With edge inclusion, all needed blobs are local.
+	GIT_TRACE2_EVENT="$(pwd)/log-trace" git \
+		-C backfill-log log -p HEAD~2..HEAD >log-output &&
+
+	# No promisor fetches should have been needed.
+	! grep "fetch_count" log-trace
+'
+
+test_expect_success 'backfill range without include edges causes on-demand fetches in git-log' '
+	git clone --no-checkout --filter=blob:none	\
+		--single-branch --branch=main		\
+		"file://$(pwd)/srv.bare" backfill-log-no-bdy &&
+
+	# Backfill WITHOUT include edges -- file.3 v1 blobs are missing.
+	git -C backfill-log-no-bdy backfill --no-include-edges HEAD~2..HEAD &&
+
+	# git log -p HEAD~2..HEAD computes diff of commit 7 against
+	# commit 6.  It needs file.3 v1 (the "before" side), which was
+	# not backfilled.  This triggers on-demand promisor fetches.
+	GIT_TRACE2_EVENT="$(pwd)/log-no-bdy-trace" git \
+		-C backfill-log-no-bdy log -p HEAD~2..HEAD >log-output &&
+
+	grep "fetch_count" log-no-bdy-trace
+'
+
+test_expect_success 'backfill range enables fetch-free replay' '
+	# Create a repo with a branch to replay.
+	git init replay-src &&
+	(
+		cd replay-src &&
+		git config uploadpack.allowfilter 1 &&
+		git config uploadpack.allowanysha1inwant 1 &&
+		test_commit base &&
+		git checkout -b topic &&
+		test_commit topic-change &&
+		git checkout main &&
+		test_commit main-change
+	) &&
+	git clone --bare --filter=blob:none \
+		"file://$(pwd)/replay-src" replay-dest.git &&
+
+	# Backfill the replay range: --onto main, replaying topic~1..topic.
+	# For replay, we need TARGET^! plus the range.
+	main_oid=$(git -C replay-dest.git rev-parse main) &&
+	topic_oid=$(git -C replay-dest.git rev-parse topic) &&
+	base_oid=$(git -C replay-dest.git rev-parse topic~1) &&
+	git -C replay-dest.git backfill \
+		"$main_oid^!" "$base_oid..$topic_oid" &&
+
+	# Now replay should complete without any promisor fetches.
+	GIT_TRACE2_EVENT="$(pwd)/replay-trace" git -C replay-dest.git \
+		replay --onto main topic~1..topic >replay-out &&
+
+	! grep "fetch_count" replay-trace
+'
+
+test_expect_success 'backfill enables fetch-free merge' '
+	# Create a repo with two branches to merge.
+	git init merge-src &&
+	(
+		cd merge-src &&
+		git config uploadpack.allowfilter 1 &&
+		git config uploadpack.allowanysha1inwant 1 &&
+		test_commit merge-base &&
+		git checkout -b side &&
+		test_commit side-change &&
+		git checkout main &&
+		test_commit main-side-change
+	) &&
+	git clone --filter=blob:none \
+		"file://$(pwd)/merge-src" merge-dest &&
+
+	# The clone checked out main, fetching its blobs.
+	# Backfill the three endpoint commits needed for merge.
+	main_oid=$(git -C merge-dest rev-parse origin/main) &&
+	side_oid=$(git -C merge-dest rev-parse origin/side) &&
+	mbase=$(git -C merge-dest merge-base origin/main origin/side) &&
+	git -C merge-dest backfill --no-include-edges \
+		"$main_oid^!" "$side_oid^!" "$mbase^!" &&
+
+	# Merge should complete without promisor fetches.
+	GIT_TRACE2_EVENT="$(pwd)/merge-trace" git -C merge-dest \
+		merge origin/side -m "test merge" &&
+
+	! grep "fetch_count" merge-trace
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
-- 
gitgitgadget