public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] builtin/pack-objects: don't fetch objects when merging packs
@ 2026-02-11 12:44 Patrick Steinhardt
  2026-02-11 17:21 ` Junio C Hamano
  2026-02-12 23:46 ` Taylor Blau
  0 siblings, 2 replies; 4+ messages in thread
From: Patrick Steinhardt @ 2026-02-11 12:44 UTC (permalink / raw)
  To: git; +Cc: Lukas Wanko

The "--stdin-packs" option can be used to merge objects from multiple
packfiles given via stdin into a new packfile. One big upside of this
option is that we don't have to perform a complete rev walk to enumerate
objects. Instead, we can simply enumerate all objects that are part of
the specified packfiles, which can be significantly faster in very large
repositories.

There is one downside though: when we don't perform a rev walk we also
don't have a good way to learn about the respective object's names. As a
consequence, we cannot use the name hashes as a heuristic to get better
delta selection.

We try to offset this downside though by performing a localized rev
walk: we queue all objects that we're about to repack as interesting,
and all objects from excluded packfiles as uninteresting. We then
perform a best-effort rev walk that allows us to fill in object names.

There is one gotcha here though: when "--exclude-promisor-objects" has
not been given we will perform backfill fetches for any promised objects
that are missing. This used to not be an issue though as this option was
mutually exclusive with "--stdin-packs". But that has changed recently,
and starting with dcc9c7ef47 (builtin/repack: handle promisor packs with
geometric repacking, 2026-01-05) we will now repack promisor packs
during geometric compaction. The consequence is that a geometric repack
may now perform a bunch of backfill fetches.

We of course cannot passe "--exclude-promisor-objects" to fix this
issue -- after all, the whole intent is to repack objects part of a
promisor pack. But arguably we don't have to: the rev walk is intended
as best effort, and we already configure it to ignore missing links to
other objects. So we can adapt the walk to unconditionally disable
fetching any missing objects.

Do so and add a test that verifies we don't backfill any objects.

Reported-by: Lukas Wanko <lwanko@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Hi,

we've recently encountered this issue in a partial clone of one of our
own repositoires. Thanks!

Patrick
---
 builtin/pack-objects.c        | 10 ++++++++++
 t/t5331-pack-objects-stdin.sh | 18 ++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9807dd0eff..4053f9659f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3925,8 +3925,16 @@ static void add_unreachable_loose_objects(struct rev_info *revs);
 
 static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 {
+	int prev_fetch_if_missing = fetch_if_missing;
 	struct rev_info revs;
 
+	/*
+	 * The revision walk may hit objects that are promised, only. As the
+	 * walk is best-effort though we don't want to perform backfill fetches
+	 * for them.
+	 */
+	fetch_if_missing = 0;
+
 	repo_init_revisions(the_repository, &revs, NULL);
 	/*
 	 * Use a revision walk to fill in the namehash of objects in the include
@@ -3962,6 +3970,8 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
 			   stdin_packs_found_nr);
 	trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
 			   stdin_packs_hints_nr);
+
+	fetch_if_missing = prev_fetch_if_missing;
 }
 
 static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index cd949025b9..c3bbc76b0d 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -358,6 +358,24 @@ test_expect_success '--stdin-packs with promisors' '
 	)
 '
 
+test_expect_success '--stdin-packs does not perform backfill fetch' '
+	test_when_finished "rm -rf remote client" &&
+
+	git init remote &&
+	test_commit_bulk -C remote 10 &&
+	git -C remote config set --local uploadpack.allowfilter 1 &&
+	git -C remote config set --local uploadpack.allowanysha1inwant 1 &&
+
+	git clone --filter=tree:0 "file://$(pwd)/remote" client &&
+	(
+		cd client &&
+		ls .git/objects/pack/*.promisor | sed "s|.*/||; s/\.promisor$/.pack/" >packs &&
+		test_line_count -gt 1 packs &&
+		GIT_TRACE2_EVENT="$(pwd)/event.log" git pack-objects --stdin-packs pack <packs &&
+		test_grep ! "\"event\":\"child_start\"" event.log
+	)
+'
+
 stdin_packs__follow_with_only () {
 	rm -fr stdin_packs__follow_with_only &&
 	git init stdin_packs__follow_with_only &&

---
base-commit: 864f55e1906897b630333675a52874c0fec2a45c
change-id: 20260210-pks-pack-objects-stdin-skip-backfill-fetch-f69e55091b11


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-02-12 23:46 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-11 12:44 [PATCH] builtin/pack-objects: don't fetch objects when merging packs Patrick Steinhardt
2026-02-11 17:21 ` Junio C Hamano
2026-02-12  6:37   ` Patrick Steinhardt
2026-02-12 23:46 ` Taylor Blau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox