From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed
Date: Wed, 10 Dec 2025 10:40:21 +0100
Message-ID: <aTlABTCWRPNvUEGc@pks.im>
In-Reply-To: <aTjfj45uFl/f3b4K@nand.local>
On Tue, Dec 09, 2025 at 09:48:47PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 07:27:15PM +0100, Patrick Steinhardt wrote:
> > Address this issue by introducing a new function that determines whether
> > a rewrite of the MIDX would cause any user-visible changes. This covers
> > the following cases:
> >
> > - No multi-pack index exists at all.
> >
> > - The user asked us to write a bitmap, and we don't have any.
> >
> > - The requested preferred pack is different than the one that we have.
> >
> > - The packfiles covered by the MIDX are changing.
>
> I can't think of any cases beyond the ones you listed here that would
> require us to regenerate the MIDX. One kind-of-exception here would be:
>
> $ git repack [...] --write-midx --write-bitmap-index
> $ git repack [...] --write-midx
>
> where the second repack would generate an identical MIDX, but does not
> want to retain a bitmap. That case is already handled in the MIDX
> writing code if you search for "want_bitmap".
>
> That makes me wonder whether the repack layer is the most appropriate
> one to handle this logic. It seems like write_midx_internal() would
> reasonably be able to detect whether or not the MIDX we have already is
> up-to-date with respect to the given input.
One upside of having it in git-repack(1) is that we need to care about
fewer situations in general as we are operating on a higher level. And
because of that we can make more assumptions.
That being said, putting it into `write_midx_internal()` has the benefit
that we're of course covering more potential cases where we can avoid a
needless rewrite of the MIDX, and that we have better information to
decide whether it would be needed or not.
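
For illustration, the four conditions from the quoted commit message can
be written as a single predicate. This is only a minimal sketch and not
code from the patch series: `struct midx_state`, `struct midx_request`,
`same_packs()` and `midx_needs_rewrite()` are hypothetical names, and the
pack lists are assumed to be sorted and already in a common naming format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the MIDX currently on disk; not a Git structure. */
struct midx_state {
	bool exists;
	bool has_bitmap;
	const char *preferred_pack;	/* NULL if unknown */
	const char **packs;		/* assumed sorted */
	size_t nr_packs;
};

/* Hypothetical description of what the caller asked for. */
struct midx_request {
	bool want_bitmap;
	const char *preferred_pack;	/* NULL if the caller has no preference */
	const char **packs;		/* assumed sorted */
	size_t nr_packs;
};

/* Compare two sorted pack lists element by element. */
static bool same_packs(const char **a, size_t na, const char **b, size_t nb)
{
	if (na != nb)
		return false;
	for (size_t i = 0; i < na; i++)
		if (strcmp(a[i], b[i]))
			return false;
	return true;
}

/* Return true if rewriting the MIDX would cause a user-visible change. */
static bool midx_needs_rewrite(const struct midx_state *cur,
			       const struct midx_request *req)
{
	/* No multi-pack index exists at all. */
	if (!cur->exists)
		return true;
	/* A bitmap was requested, and we don't have one. */
	if (req->want_bitmap && !cur->has_bitmap)
		return true;
	/* The requested preferred pack differs from the current one. */
	if (req->preferred_pack &&
	    (!cur->preferred_pack ||
	     strcmp(req->preferred_pack, cur->preferred_pack)))
		return true;
	/* The set of packfiles covered by the MIDX is changing. */
	return !same_packs(cur->packs, cur->nr_packs,
			   req->packs, req->nr_packs);
}
```

The sketch leaves out the reverse bitmap case (an existing bitmap that the
caller no longer wants), which per the quoted text is already handled by
the "want_bitmap" logic in the MIDX writing code.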
> I think that makes some things about your patch easier and other things
> a little harder ;-).
>
> - On the "easier" front: while both the MIDX code and the portion of
> the repack code that drives it receive the same set of packs to
> include, the MIDX code already has the packs it would compare
> in a standard format. That would avoid you having to handle
> ends_with(include_name, ".idx") and ends_with(existing_name, ".pack")
> as special cases, which would be nice.
>
> - On the "harder" front: when driving MIDX generation with the
> '--stdin-packs' option, we *don't* load an existing MIDX ever since
> 0c5a62f14bc (midx-write.c: do not read existing MIDX with
> `packs_to_include`, 2024-06-11).
>
> I don't think that "harder" one is a show-stopper, though. Commit
> 0c5a62f14bc has enough gory details around how we generate pack IDs and
> various subtle assumptions about how and when we load packs that I am
> very hesitant to recommend changing it given its fragility (though we
> should examine and harden any fragilities within midx-write.c, maybe
> just separately ;-)).
>
> So I don't think that we should make that change ahead of this patch.
> While you can't rely on being able to read 'ctx.m', I think you could
> load the MIDX belonging to "source" ad-hoc after we have computed the
> packs to fill from the MIDX's perspective, which is right around where
> that want_bitmap code lives.
Yeah, we're already loading the MIDX on-demand in git-repack(1), as we
have closed the object database by the time we're about to write the
MIDX. So overall the change is rather easy to make.
Also, now that I see that we already have some short-circuiting
conditions in git-multi-pack-index(1) I guess it makes sense to extend
those checks. They explicitly don't cover `--stdin-packs` rewrites right
now, so adding that check is an obvious improvement.
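
To make the naming issue from the quoted "easier" point concrete: a check
at the repack layer has to treat "pack-X.idx" and "pack-X.pack" as the
same pack before it can compare the requested packs against the ones in
the existing MIDX. A minimal sketch of such a comparison follows. The
helper names `len_without_suffix()`, `pack_base_len()` and `same_pack()`
are hypothetical and do not exist in Git.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of `name` without a trailing `ext`, or its full length if absent. */
static size_t len_without_suffix(const char *name, const char *ext)
{
	size_t len = strlen(name), ext_len = strlen(ext);

	if (len >= ext_len && !strcmp(name + len - ext_len, ext))
		return len - ext_len;
	return len;
}

/* Length of the pack name with a trailing ".idx" or ".pack" stripped. */
static size_t pack_base_len(const char *name)
{
	size_t len = len_without_suffix(name, ".idx");

	if (len == strlen(name))
		len = len_without_suffix(name, ".pack");
	return len;
}

/* True if both names refer to the same pack, ignoring the extension. */
static bool same_pack(const char *a, const char *b)
{
	size_t a_len = pack_base_len(a), b_len = pack_base_len(b);

	return a_len == b_len && !strncmp(a, b, a_len);
}
```

Doing the comparison inside the MIDX writing code would make this kind of
normalization unnecessary, since both sides are already in one format
there.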
One thing I'm a bit torn on is whether or not to handle preferred packs
in the adjusted logic. We don't do so right now either, so I think I'll
drop this for now. I can see an argument that us not handling a changed
preferred pack is a bug though, in which case I'm happy to iterate.
Thanks for your feedback!
Patrick