* [PATCH] builtin/pack-objects: don't fetch objects when merging packs
@ 2026-02-11 12:44 Patrick Steinhardt
2026-02-11 17:21 ` Junio C Hamano
2026-02-12 23:46 ` Taylor Blau
0 siblings, 2 replies; 4+ messages in thread
From: Patrick Steinhardt @ 2026-02-11 12:44 UTC (permalink / raw)
To: git; +Cc: Lukas Wanko
The "--stdin-packs" option can be used to merge objects from multiple
packfiles given via stdin into a new packfile. One big upside of this
option is that we don't have to perform a complete rev walk to enumerate
objects. Instead, we can simply enumerate all objects that are part of
the specified packfiles, which can be significantly faster in very large
repositories.
There is one downside though: when we don't perform a rev walk we also
don't have a good way to learn about the respective object's names. As a
consequence, we cannot use the name hashes as a heuristic to get better
delta selection.
We try to offset this downside though by performing a localized rev
walk: we queue all objects that we're about to repack as interesting,
and all objects from excluded packfiles as uninteresting. We then
perform a best-effort rev walk that allows us to fill in object names.
There is one gotcha here though: when "--exclude-promisor-objects" has
not been given we will perform backfill fetches for any promised objects
that are missing. This used to not be an issue though as this option was
mutually exclusive with "--stdin-packs". But that has changed recently,
and starting with dcc9c7ef47 (builtin/repack: handle promisor packs with
geometric repacking, 2026-01-05) we will now repack promisor packs
during geometric compaction. The consequence is that a geometric repack
may now perform a bunch of backfill fetches.
We of course cannot passe "--exclude-promisor-objects" to fix this
issue -- after all, the whole intent is to repack objects part of a
promisor pack. But arguably we don't have to: the rev walk is intended
as best effort, and we already configure it to ignore missing links to
other objects. So we can adapt the walk to unconditionally disable
fetching any missing objects.
Do so and add a test that verifies we don't backfill any objects.
Reported-by: Lukas Wanko <lwanko@gitlab.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Hi,
we've recently encountered this issue in a partial clone of one of our
own repositoires. Thanks!
Patrick
---
builtin/pack-objects.c | 10 ++++++++++
t/t5331-pack-objects-stdin.sh | 18 ++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 9807dd0eff..4053f9659f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3925,8 +3925,16 @@ static void add_unreachable_loose_objects(struct rev_info *revs);
static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
{
+ int prev_fetch_if_missing = fetch_if_missing;
struct rev_info revs;
+ /*
+ * The revision walk may hit objects that are promised, only. As the
+ * walk is best-effort though we don't want to perform backfill fetches
+ * for them.
+ */
+ fetch_if_missing = 0;
+
repo_init_revisions(the_repository, &revs, NULL);
/*
* Use a revision walk to fill in the namehash of objects in the include
@@ -3962,6 +3970,8 @@ static void read_stdin_packs(enum stdin_packs_mode mode, int rev_list_unpacked)
stdin_packs_found_nr);
trace2_data_intmax("pack-objects", the_repository, "stdin_packs_hints",
stdin_packs_hints_nr);
+
+ fetch_if_missing = prev_fetch_if_missing;
}
static void add_cruft_object_entry(const struct object_id *oid, enum object_type type,
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index cd949025b9..c3bbc76b0d 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -358,6 +358,24 @@ test_expect_success '--stdin-packs with promisors' '
)
'
+test_expect_success '--stdin-packs does not perform backfill fetch' '
+ test_when_finished "rm -rf remote client" &&
+
+ git init remote &&
+ test_commit_bulk -C remote 10 &&
+ git -C remote config set --local uploadpack.allowfilter 1 &&
+ git -C remote config set --local uploadpack.allowanysha1inwant 1 &&
+
+ git clone --filter=tree:0 "file://$(pwd)/remote" client &&
+ (
+ cd client &&
+ ls .git/objects/pack/*.promisor | sed "s|.*/||; s/\.promisor$/.pack/" >packs &&
+ test_line_count -gt 1 packs &&
+ GIT_TRACE2_EVENT="$(pwd)/event.log" git pack-objects --stdin-packs pack <packs &&
+ test_grep ! "\"event\":\"child_start\"" event.log
+ )
+'
+
stdin_packs__follow_with_only () {
rm -fr stdin_packs__follow_with_only &&
git init stdin_packs__follow_with_only &&
---
base-commit: 864f55e1906897b630333675a52874c0fec2a45c
change-id: 20260210-pks-pack-objects-stdin-skip-backfill-fetch-f69e55091b11
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH] builtin/pack-objects: don't fetch objects when merging packs
2026-02-11 12:44 [PATCH] builtin/pack-objects: don't fetch objects when merging packs Patrick Steinhardt
@ 2026-02-11 17:21 ` Junio C Hamano
2026-02-12 6:37 ` Patrick Steinhardt
2026-02-12 23:46 ` Taylor Blau
1 sibling, 1 reply; 4+ messages in thread
From: Junio C Hamano @ 2026-02-11 17:21 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Lukas Wanko
Patrick Steinhardt <ps@pks.im> writes:
> The "--stdin-packs" option can be used to merge objects from multiple
> packfiles given via stdin into a new packfile. One big upside of this
> option is that we don't have to perform a complete rev walk to enumerate
> objects. Instead, we can simply enumerate all objects that are part of
> the specified packfiles, which can be significantly faster in very large
> repositories.
>
> There is one downside though: when we don't perform a rev walk we also
> don't have a good way to learn about the respective object's names. As a
> consequence, we cannot use the name hashes as a heuristic to get better
> delta selection.
>
> We try to offset this downside though by performing a localized rev
> walk: we queue all objects that we're about to repack as interesting,
> and all objects from excluded packfiles as uninteresting. We then
> perform a best-effort rev walk that allows us to fill in object names.
>
> There is one gotcha here though: when "--exclude-promisor-objects" has
> not been given we will perform backfill fetches for any promised objects
> that are missing. This used to not be an issue though as this option was
> mutually exclusive with "--stdin-packs". But that has changed recently,
> and starting with dcc9c7ef47 (builtin/repack: handle promisor packs with
> geometric repacking, 2026-01-05) we will now repack promisor packs
> during geometric compaction. The consequence is that a geometric repack
> may now perform a bunch of backfill fetches.
>
> We of course cannot passe "--exclude-promisor-objects" to fix this
> issue -- after all, the whole intent is to repack objects part of a
> promisor pack. But arguably we don't have to: the rev walk is intended
> as best effort, and we already configure it to ignore missing links to
> other objects. So we can adapt the walk to unconditionally disable
> fetching any missing objects.
"passe" -> "pass".
Other than that, very nicely described, and the implementation is
surprisingly simple (thanks to a single global variable, and
asumption that makes it safe to use such a single global variable,
i.e., there is just one packing operation running at a time).
Will queue. Thanks.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] builtin/pack-objects: don't fetch objects when merging packs
2026-02-11 17:21 ` Junio C Hamano
@ 2026-02-12 6:37 ` Patrick Steinhardt
0 siblings, 0 replies; 4+ messages in thread
From: Patrick Steinhardt @ 2026-02-12 6:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Lukas Wanko
On Wed, Feb 11, 2026 at 09:21:16AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > The "--stdin-packs" option can be used to merge objects from multiple
> > packfiles given via stdin into a new packfile. One big upside of this
> > option is that we don't have to perform a complete rev walk to enumerate
> > objects. Instead, we can simply enumerate all objects that are part of
> > the specified packfiles, which can be significantly faster in very large
> > repositories.
> >
> > There is one downside though: when we don't perform a rev walk we also
> > don't have a good way to learn about the respective object's names. As a
> > consequence, we cannot use the name hashes as a heuristic to get better
> > delta selection.
> >
> > We try to offset this downside though by performing a localized rev
> > walk: we queue all objects that we're about to repack as interesting,
> > and all objects from excluded packfiles as uninteresting. We then
> > perform a best-effort rev walk that allows us to fill in object names.
> >
> > There is one gotcha here though: when "--exclude-promisor-objects" has
> > not been given we will perform backfill fetches for any promised objects
> > that are missing. This used to not be an issue though as this option was
> > mutually exclusive with "--stdin-packs". But that has changed recently,
> > and starting with dcc9c7ef47 (builtin/repack: handle promisor packs with
> > geometric repacking, 2026-01-05) we will now repack promisor packs
> > during geometric compaction. The consequence is that a geometric repack
> > may now perform a bunch of backfill fetches.
> >
> > We of course cannot passe "--exclude-promisor-objects" to fix this
> > issue -- after all, the whole intent is to repack objects part of a
> > promisor pack. But arguably we don't have to: the rev walk is intended
> > as best effort, and we already configure it to ignore missing links to
> > other objects. So we can adapt the walk to unconditionally disable
> > fetching any missing objects.
>
> "passe" -> "pass".
Oops, right. Fixed locally, and I saw that you also fixed it in your
version.
> Other than that, very nicely described, and the implementation is
> surprisingly simple (thanks to a single global variable, and
> asumption that makes it safe to use such a single global variable,
> i.e., there is just one packing operation running at a time).
>
> Will queue. Thanks.
Thanks!
Patrick
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] builtin/pack-objects: don't fetch objects when merging packs
2026-02-11 12:44 [PATCH] builtin/pack-objects: don't fetch objects when merging packs Patrick Steinhardt
2026-02-11 17:21 ` Junio C Hamano
@ 2026-02-12 23:46 ` Taylor Blau
1 sibling, 0 replies; 4+ messages in thread
From: Taylor Blau @ 2026-02-12 23:46 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Lukas Wanko
On Wed, Feb 11, 2026 at 01:44:59PM +0100, Patrick Steinhardt wrote:
> The "--stdin-packs" option can be used to merge objects from multiple
> packfiles given via stdin into a new packfile. One big upside of this
> option is that we don't have to perform a complete rev walk to enumerate
> objects. Instead, we can simply enumerate all objects that are part of
> the specified packfiles, which can be significantly faster in very large
> repositories.
>
> There is one downside though: when we don't perform a rev walk we also
> don't have a good way to learn about the respective object's names. As a
> consequence, we cannot use the name hashes as a heuristic to get better
> delta selection.
>
> We try to offset this downside though by performing a localized rev
> walk: we queue all objects that we're about to repack as interesting,
> and all objects from excluded packfiles as uninteresting. We then
> perform a best-effort rev walk that allows us to fill in object names.
Nicely explained. I think it's reasonable to ask "well why are we doing
a revwalk, you just told me that --stdin-packs does not require a
(potentially expensive) traversal?". I think that the exposition you
provided here answers that question nicely.
> There is one gotcha here though: when "--exclude-promisor-objects" has
> not been given we will perform backfill fetches for any promised objects
> that are missing. This used to not be an issue though as this option was
> mutually exclusive with "--stdin-packs". But that has changed recently,
> and starting with dcc9c7ef47 (builtin/repack: handle promisor packs with
> geometric repacking, 2026-01-05) we will now repack promisor packs
> during geometric compaction. The consequence is that a geometric repack
> may now perform a bunch of backfill fetches.
Great find!
> We of course cannot passe "--exclude-promisor-objects" to fix this
s/passe/pass, though I think Junio noted this below.
> issue -- after all, the whole intent is to repack objects part of a
> promisor pack. But arguably we don't have to: the rev walk is intended
> as best effort, and we already configure it to ignore missing links to
> other objects. So we can adapt the walk to unconditionally disable
> fetching any missing objects.
Yep, I think this is the right trade-off.
> ---
> builtin/pack-objects.c | 10 ++++++++++
> t/t5331-pack-objects-stdin.sh | 18 ++++++++++++++++++
> 2 files changed, 28 insertions(+)
The implementation and test look exactly as expected. Thanks for jumping
on this and fixing it!
Thanks,
Taylor
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-02-12 23:46 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-11 12:44 [PATCH] builtin/pack-objects: don't fetch objects when merging packs Patrick Steinhardt
2026-02-11 17:21 ` Junio C Hamano
2026-02-12 6:37 ` Patrick Steinhardt
2026-02-12 23:46 ` Taylor Blau
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox