From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed
Date: Wed, 10 Dec 2025 10:40:21 +0100
Message-ID: <aTlABTCWRPNvUEGc@pks.im>
In-Reply-To: <aTjfj45uFl/f3b4K@nand.local>
On Tue, Dec 09, 2025 at 09:48:47PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 07:27:15PM +0100, Patrick Steinhardt wrote:
> > Address this issue by introducing a new function that determines whether
> > a rewrite of the MIDX would cause any user-visible changes. This covers
> > the following cases:
> >
> > - No multi-pack index exists at all.
> >
> > - The user asked us to write a bitmap, and we don't have any.
> >
> > - The requested preferred pack is different than the one that we have.
> >
> > - The packfiles covered by the MIDX are changing.
>
> I can't think of any cases beyond the ones you listed here that would
> require us to regenerate the MIDX. One kind-of-exception here would be:
>
> $ git repack [...] --write-midx --write-bitmap-index
> $ git repack [...] --write-midx
>
> where the second repack would generate an identical MIDX, but does not
> want to retain a bitmap. That case is already handled in the MIDX
> writing code if you search for "want_bitmap".
>
> That makes me wonder whether the repack layer is the most appropriate
> one to handle this logic. It seems like write_midx_internal() would
> reasonably be able to detect whether or not the MIDX we have already is
> up-to-date with respect to the given input.
One upside of having it in git-repack(1) is that we need to care about
fewer situations in general, as we are operating on a higher level. And
because of that we can make more assumptions.

That being said, putting it into `write_midx_internal()` has the benefit
that we're of course covering more potential cases where we can avoid a
needless rewrite of the MIDX, and that we have better information to
decide whether it would be needed or not.
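For illustration only, the four cases from the patch description could be
expressed as a single predicate. This is a minimal sketch and not code from
the patch: `struct midx_state`, `struct midx_request` and
`midx_needs_rewrite()` are hypothetical names, and both pack lists are
assumed to be sorted and to use the same naming scheme. It does not model
the case where an existing bitmap should be dropped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the MIDX currently on disk (not a Git struct). */
struct midx_state {
	bool exists;
	bool has_bitmap;
	const char *preferred_pack; /* NULL if unknown */
	const char **packs;         /* assumed sorted */
	size_t packs_nr;
};

/* Hypothetical description of what the caller asked for. */
struct midx_request {
	bool want_bitmap;
	const char *preferred_pack; /* NULL means "no preference" */
	const char **packs;         /* assumed sorted */
	size_t packs_nr;
};

/* Element-wise comparison; only valid because both lists are sorted. */
static bool same_packs(const char **a, size_t a_nr,
		       const char **b, size_t b_nr)
{
	if (a_nr != b_nr)
		return false;
	for (size_t i = 0; i < a_nr; i++)
		if (strcmp(a[i], b[i]))
			return false;
	return true;
}

static bool midx_needs_rewrite(const struct midx_state *cur,
			       const struct midx_request *req)
{
	/* Case 1: no multi-pack index exists at all. */
	if (!cur->exists)
		return true;
	/* Case 2: a bitmap was requested, but we don't have one. */
	if (req->want_bitmap && !cur->has_bitmap)
		return true;
	/* Case 3: the requested preferred pack differs from the current one. */
	if (req->preferred_pack &&
	    (!cur->preferred_pack ||
	     strcmp(req->preferred_pack, cur->preferred_pack)))
		return true;
	/* Case 4: the set of packs covered by the MIDX is changing. */
	if (!same_packs(cur->packs, cur->packs_nr, req->packs, req->packs_nr))
		return true;
	return false;
}
```

The checks are ordered so that the cheap flag comparisons run before the
pack list is walked.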
> I think that makes some things about your patch easier and other things
> a little harder ;-).
>
> - On the "easier" front: while both the MIDX code and the portion of
> the repack code that drives it receive the same set of packs to
> include, the MIDX code already has the packs it would compare
> in a standard format. That would avoid you having to handle
> ends_with(include_name, ".idx") and ends_with(existing_name, ".pack")
> as special cases, which would be nice.
>
> - On the "harder" front: when driving MIDX generation with the
> '--stdin-packs' option, we *don't* load an existing MIDX ever since
> 0c5a62f14bc (midx-write.c: do not read existing MIDX with
> `packs_to_include`, 2024-06-11).
>
> I don't think that "harder" one is a show-stopper, though. Commit
> 0c5a62f14bc has enough gory details around how we generate pack IDs and
> various subtle assumptions about how and when we load packs that I am
> very hesitant to recommend changing it given its fragility (though we
> should examine and harden any fragilities within midx-write.c, maybe
> just separately ;-)).
>
> So I don't think that we should make that change ahead of this patch.
> While you can't rely on being able to read 'ctx.m', I think you could
> load the MIDX belonging to "source" ad-hoc after we have computed the
> packs to fill from the MIDX's perspective, which is right around where
> that want_bitmap code lives.
Yeah, we're already loading the MIDX on demand in git-repack(1), as we
have closed the object database by the time we're about to write the
MIDX. So overall the change is rather easy to make.
Also, now that I see that we already have some short-circuiting
conditions in git-multi-pack-index(1) I guess it makes sense to extend
those checks. They explicitly don't cover `--stdin-packs` rewrites right
now, so adding that check is an obvious improvement.
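As a rough sketch of what such a `--stdin-packs` check has to compare, the
following treats "pack-X.idx" and "pack-X.pack" as the same pack, which is
the special-casing mentioned in the quoted text. It is not taken from
midx-write.c: `pack_base_len()`, `same_pack()` and `same_pack_set()` are
hypothetical helpers, and neither list is assumed to contain duplicates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of `name` without a trailing ".idx" or ".pack" suffix, if any. */
static size_t pack_base_len(const char *name)
{
	static const char *suffixes[] = { ".idx", ".pack" };
	size_t len = strlen(name);

	for (size_t i = 0; i < sizeof(suffixes) / sizeof(*suffixes); i++) {
		size_t slen = strlen(suffixes[i]);
		if (len >= slen && !strcmp(name + len - slen, suffixes[i]))
			return len - slen;
	}
	return len;
}

/* Two names refer to the same pack if they match up to the suffix. */
static bool same_pack(const char *a, const char *b)
{
	size_t alen = pack_base_len(a);
	size_t blen = pack_base_len(b);

	return alen == blen && !strncmp(a, b, alen);
}

/*
 * Order-independent comparison. With equal counts and no duplicates,
 * finding every entry of `a` in `b` implies that the sets are equal.
 */
static bool same_pack_set(const char **a, size_t a_nr,
			  const char **b, size_t b_nr)
{
	if (a_nr != b_nr)
		return false;
	for (size_t i = 0; i < a_nr; i++) {
		bool found = false;
		for (size_t j = 0; j < b_nr && !found; j++)
			found = same_pack(a[i], b[j]);
		if (!found)
			return false;
	}
	return true;
}
```

The quadratic lookup keeps the sketch short; with names already in one
standard format, a sorted element-wise comparison would do instead.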
One thing I'm a bit torn on is whether or not to handle preferred packs
in the adjusted logic. We don't do so right now either, so I think I'll
drop this for now. I can see an argument that us not handling a changed
preferred pack is a bug though, in which case I'm happy to iterate.
Thanks for your feedback!
Patrick