From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed
Date: Tue, 9 Dec 2025 21:48:47 -0500 [thread overview]
Message-ID: <aTjfj45uFl/f3b4K@nand.local> (raw)
In-Reply-To: <20251208-pks-skip-noop-rewrite-v1-2-430d52dba9f0@pks.im>
On Mon, Dec 08, 2025 at 07:27:15PM +0100, Patrick Steinhardt wrote:
> Address this issue by introducing a new function that determines whether
> a rewrite of the MIDX would cause any user-visible changes. This covers
> the following cases:
>
> - No multi-pack index exists at all.
>
> - The user asked us to write a bitmap, and we don't have any.
>
> - The request preferred pack is different than the one that we have.
>
> - The packfiles covered by the MIDX are changing.
I can't think of any cases beyond the ones you listed here that would
require us to regenerate the MIDX. One kind-of-exception here would be:
$ git repack [...] --write-midx --write-bitmap-index
$ git repack [...] --write-midx
where the second repack would generate an identical MIDX, but does not
want to retain a bitmap. That case is already handled in the MIDX
writing code if you search for "want_bitmap".
That makes me wonder whether the repack layer is the most appropriate
one to handle this logic. It seems like write_midx_internal() would
reasonably be able to detect whether or not the MIDX we have already is
up-to-date with respect to the given input.
I think that makes some things about your patch easier and other things
a little harder ;-).
- On the "easier" front: while both the MIDX code and the portion of
the repack code that drives it receive the same set of packs to
include, the MIDX code already has the packs it would compare
in a standard format. That would avoid you having to handle ends_with
ends_with(include_name, ".idx") and ends_with(existing_name, ".pack")
as special cases, which would be nice.
- On the "harder" front: when driving MIDX generation with the
'--stdin-packs' option, we *don't* load an existing MIDX ever since
0c5a62f14bc (midx-write.c: do not read existing MIDX with
`packs_to_include`, 2024-06-11).
I don't think that "harder" one is a show-stopper, though. Commit
t0c5a62f14bc has enough gory details around how we generate pack IDs and
various subtle assumptions about how and when we load packs that I am
very hesitant to recommend changing it given its fragility (though we
should examine and harden any fragilities within midx-write.c, maybe
just separately ;-)).
So I don't think that we should make that change ahead of this patch.
While you can't rely on being able to read 'ctx.m', I think you could
load the MIDX belonging to "source" ad-hoc after we have computed the
packs to fill from the MIDX's perspective, which is right around where
that want_bitmap code lives.
I suspect that that may simplify some of your patch, though please let
me know if that turns out not to be the case.
> Only if any of these conditions trigger we decide to write a new MIDX.
> This allows us to significantly reduce the time for repacks that end up
> doing nothing:
>
> Benchmark 1: git repack --geometric=2 --write-midx -d
> Time (mean ± σ): 3.183 s ± 0.078 s [User: 2.924 s, System: 0.219 s]
> Range (min … max): 2.985 s … 3.260 s 10 runs
>
> Benchmark 2: ./git repack --geometric=2 --write-midx -d
> Time (mean ± σ): 102.5 ms ± 1.0 ms [User: 89.3 ms, System: 12.7 ms]
> Range (min … max): 101.3 ms … 105.3 ms 28 runs
>
> Summary
> ./git repack --geometric=2 --write-midx -d ran
> 31.06 ± 0.82 times faster than git repack --geometric=2 --write-midx -d
Makes sense since we're effectively timing just MIDX generation here.
(I'll avoid reading midx_needs_update() too closely here in case you
make any changes based on the above. Glancing over it briefly, it all
seemed reasonable to me.)
> +test_expect_success 'up-to-date multi-pack-index is retained' '
> + test_when_finished "rm -fr midx-up-to-date" &&
> + git init midx-up-to-date &&
> + (
> + cd midx-up-to-date &&
> +
> + # Write the initial pack that contains the most objects. This
> + # will be the preferred pack.
> + test_commit first &&
> + test_commit second &&
> + git repack -Ad --write-midx &&
> + test_midx_is_retained -Ad &&
Should the MIDX be retained in this case? I think there is a reasonable
argument to be made in either direction here.
On the one hand: we performed the expected repack, and doing so did not
cause the existing MIDX to be invalid, so we left it alone. On the other
hand: the caller asked us to repack without writing a MIDX, so could be
surprised that one exists.
> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> index 9fc1626fbf..980599961c 100755
> --- a/t/t7703-repack-geometric.sh
> +++ b/t/t7703-repack-geometric.sh
> @@ -287,6 +287,86 @@ test_expect_success '--geometric with pack.packSizeLimit' '
> )
> '
>
> +test_expect_success '--geometric --write-midx retains up-to-date MIDX without bitmap index' '
> + test_when_finished "rm -fr repo" &&
> + git init repo &&
> + (
> + cd repo &&
> + test_commit initial &&
> +
> + test_path_is_missing .git/objects/pack/multi-pack-index &&
> + git repack --geometric=2 --write-midx --no-write-bitmap-index &&
> + test_path_is_file .git/objects/pack/multi-pack-index &&
> + test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
This portion is very similar to the new function added in t5319. I
wonder if it's worth sticking that in t/lib-midx.sh or similar?
I was wondering what this test was exercising that wasn't covered in the
earlier script, but since the geometric code path is quite different
from the non-geometric one, I think having coverage there is definitely
worthwhile.
> +test_expect_success '--geometric --write-midx regenerates MIDX when preferred pack changes' '
> + test_when_finished "rm -fr repo" &&
> + git init repo &&
> + (
> + cd repo &&
> +
> + test_commit first &&
> + test_commit second &&
> + test_commit third &&
> + git repack --geometric=2 --write-midx --write-bitmap-index &&
> + test_commit fourth &&
> + git repack --geometric=2 --write-midx --write-bitmap-index &&
This part is a little subtle since we're relying on the geometric repack
to pick up any loose objects that don't appear in existing pack(s). I
might suggest something like:
for c in first second third
do
test_commit $c || return 1
done &&
git repack -d && # ...
, to make that step explicit, but I don't think it's a huge deal.
> +
> + ls .git/objects/pack/*.pack >packs &&
Here and below you should be able to use $packdir instead of writing out
".git/objects/pack" explicitly.
> + test_line_count = 2 packs &&
> + preferred_pack=$(test-tool read-midx --preferred-pack .git/objects) &&
> + other_pack=$(ls .git/objects/pack/*.idx | grep -v "$preferred_pack") &&
> + echo "$preferred_pack" &&
> + echo "$other_pack" &&
Stray debug output?
> +
> + # Rewrite the multi-pack index with the current preferred pack.
> + # git-repack(1) should decide to _not_ repack the MIDX in that
s/repack the MIDX/rewrite the MIDX/
> + # case. This is mostly a sanity check to verify that the reason
> + # for the repack really only is the changed preferred pack.
> + rm -f .git/objects/pack/multi-pack-index* &&
Same note as above except here you can use $midx. I don't think you ever
have a .git/objects/pack/multi-pack-index.d here, so a standard removal
should do the trick. If you did, you would need to pass '-r' as well.
> + git multi-pack-index write --bitmap --preferred-pack="$preferred_pack" &&
> + test-tool chmtime =0 .git/objects/pack/multi-pack-index &&
> + ls -l .git/objects/pack/ >expect &&
> + git repack --geometric=2 --write-midx --write-bitmap-index &&
I'm having a little bit of a hard time following this one. Are we
guaranteed to pick $preferred_pack as our preferred pack here in the
second repack?
Thanks,
Taylor
next prev parent reply other threads:[~2025-12-10 2:48 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-08 18:27 [PATCH 0/2] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-08 18:27 ` [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10 0:22 ` Taylor Blau
2025-12-10 9:40 ` Patrick Steinhardt
2025-12-18 21:13 ` Taylor Blau
2025-12-08 18:27 ` [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed Patrick Steinhardt
2025-12-10 2:48 ` Taylor Blau [this message]
2025-12-10 9:40 ` Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 1/3] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 2/3] midx-write: extract function to test whether MIDX needs updating Patrick Steinhardt
2025-12-10 12:52 ` [PATCH v2 3/3] midx-write: skip rewriting MIDX with `--stdin-packs` unless needed Patrick Steinhardt
2025-12-11 8:46 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Junio C Hamano
2025-12-12 7:33 ` Patrick Steinhardt
2025-12-18 21:18 ` Taylor Blau
2025-12-19 6:25 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aTjfj45uFl/f3b4K@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).