git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed
Date: Wed, 10 Dec 2025 10:40:21 +0100
Message-ID: <aTlABTCWRPNvUEGc@pks.im>
In-Reply-To: <aTjfj45uFl/f3b4K@nand.local>

On Tue, Dec 09, 2025 at 09:48:47PM -0500, Taylor Blau wrote:
> On Mon, Dec 08, 2025 at 07:27:15PM +0100, Patrick Steinhardt wrote:
> > Address this issue by introducing a new function that determines whether
> > a rewrite of the MIDX would cause any user-visible changes. This covers
> > the following cases:
> >
> >   - No multi-pack index exists at all.
> >
> >   - The user asked us to write a bitmap, and we don't have any.
> >
> >   - The requested preferred pack is different than the one that we have.
> >
> >   - The packfiles covered by the MIDX are changing.
> 
> I can't think of any cases beyond the ones you listed here that would
> require us to regenerate the MIDX. One kind-of-exception here would be:
> 
>     $ git repack [...] --write-midx --write-bitmap-index
>     $ git repack [...] --write-midx
> 
> where the second repack would generate an identical MIDX, but does not
> want to retain a bitmap. That case is already handled in the MIDX
> writing code if you search for "want_bitmap".
> 
> That makes me wonder whether the repack layer is the most appropriate
> one to handle this logic. It seems like write_midx_internal() would
> reasonably be able to detect whether or not the MIDX we have already is
> up-to-date with respect to the given input.

One upside of having it in git-repack(1) is that we need to care about
fewer situations in general, as we are operating at a higher level. And
because of that we can make more assumptions.

That being said, putting it into `write_midx_internal()` has the benefit
that we're of course covering more potential cases where we can avoid a
needless rewrite of the MIDX, and that we have better information to
decide whether it would be needed or not.
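
For illustration, the four cases from the commit message could be
checked roughly as follows. This is only a minimal sketch: `struct
midx_state`, `struct midx_request` and `midx_needs_update()` are
hypothetical names made up for this example and not the code from the
patch, and both pack lists are assumed to be sorted the same way.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of the MIDX on disk; not Git's real struct. */
struct midx_state {
	bool exists;
	bool has_bitmap;
	const char *preferred_pack; /* NULL if unknown */
	const char **packs;         /* sorted pack names */
	size_t nr_packs;
};

/* Hypothetical description of what the caller asks us to write. */
struct midx_request {
	bool want_bitmap;
	const char *preferred_pack; /* NULL if the caller has no preference */
	const char **packs;         /* sorted pack names */
	size_t nr_packs;
};

static bool same_name(const char *a, const char *b)
{
	if (!a || !b)
		return a == b;
	return !strcmp(a, b);
}

/* Returns true if rewriting the MIDX would cause a user-visible change. */
bool midx_needs_update(const struct midx_state *cur,
		       const struct midx_request *req)
{
	/* No multi-pack index exists at all. */
	if (!cur->exists)
		return true;
	/* A bitmap was requested, but we don't have one. */
	if (req->want_bitmap && !cur->has_bitmap)
		return true;
	/* The requested preferred pack differs from the current one. */
	if (req->preferred_pack &&
	    !same_name(req->preferred_pack, cur->preferred_pack))
		return true;
	/* The set of packfiles covered by the MIDX is changing. */
	if (req->nr_packs != cur->nr_packs)
		return true;
	for (size_t i = 0; i < req->nr_packs; i++)
		if (strcmp(req->packs[i], cur->packs[i]))
			return true;
	return false;
}
```

The sketch deliberately does not cover the opposite bitmap case (a
bitmap exists but is no longer wanted), which the quoted text says is
already handled in the MIDX writing code.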

> I think that makes some things about your patch easier and other things
> a little harder ;-).
> 
>  - On the "easier" front: while both the MIDX code and the portion of
>    the repack code that drives it receive the same set of packs to
>    include, the MIDX code already has the packs it would compare
>    in a standard format. That would avoid you having to handle
>    ends_with(include_name, ".idx") and ends_with(existing_name, ".pack")
>    as special cases, which would be nice.
> 
>  - On the "harder" front: when driving MIDX generation with the
>    '--stdin-packs' option, we *don't* load an existing MIDX ever since
>    0c5a62f14bc (midx-write.c: do not read existing MIDX with
>    `packs_to_include`, 2024-06-11).
> 
> I don't think that "harder" one is a show-stopper, though. Commit
> 0c5a62f14bc has enough gory details around how we generate pack IDs and
> various subtle assumptions about how and when we load packs that I am
> very hesitant to recommend changing it given its fragility (though we
> should examine and harden any fragilities within midx-write.c, maybe
> just separately ;-)).
> 
> So I don't think that we should make that change ahead of this patch.
> While you can't rely on being able to read 'ctx.m', I think you could
> load the MIDX belonging to "source" ad-hoc after we have computed the
> packs to fill from the MIDX's perspective, which is right around where
> that want_bitmap code lives.

Yeah, we're already loading the MIDX on-demand in git-repack(1), as we
have closed the object database by the time we're about to write the
MIDX. So overall the change is rather easy to make.

Also, now that I see that we already have some short-circuiting
conditions in git-multi-pack-index(1) I guess it makes sense to extend
those checks. They explicitly don't cover `--stdin-packs` rewrites right
now, so adding that check is an obvious improvement.
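
Such a check boils down to comparing the requested packs against the
ones the existing MIDX already covers. A minimal sketch of that
comparison is shown here for the case where the two lists use different
file extensions (".idx" versus ".pack"), as mentioned in the quoted
text. The helper names `stem_len()`, `same_pack()` and `same_pack_set()`
are hypothetical and do not exist in Git, and both lists are assumed to
be sorted by name without the extension.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Length of `name` without a trailing ".idx" or ".pack", if present. */
static size_t stem_len(const char *name)
{
	static const char *const exts[] = { ".idx", ".pack" };
	size_t len = strlen(name);

	for (size_t i = 0; i < sizeof(exts) / sizeof(exts[0]); i++) {
		size_t ext_len = strlen(exts[i]);
		if (len >= ext_len && !strcmp(name + len - ext_len, exts[i]))
			return len - ext_len;
	}
	return len;
}

/* True if both names refer to the same pack, ignoring the extension. */
bool same_pack(const char *a, const char *b)
{
	size_t a_len = stem_len(a), b_len = stem_len(b);

	return a_len == b_len && !strncmp(a, b, a_len);
}

/* True if both arrays list the same packs in the same order. */
bool same_pack_set(const char **a, size_t a_nr,
		   const char **b, size_t b_nr)
{
	if (a_nr != b_nr)
		return false;
	for (size_t i = 0; i < a_nr; i++)
		if (!same_pack(a[i], b[i]))
			return false;
	return true;
}
```

If the comparison happens inside the MIDX writing code instead, where
pack names are already in one format, a plain strcmp() per entry would
be enough and the extension handling could go away.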

One thing I'm a bit torn on is whether or not to handle preferred packs
in the adjusted logic. We don't do so right now either, so I think I'll
drop this for now. I can see an argument that us not handling a changed
preferred pack is a bug though, in which case I'm happy to iterate.

Thanks for your feedback!

Patrick

Thread overview: 16+ messages
2025-12-08 18:27 [PATCH 0/2] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-08 18:27 ` [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10  0:22   ` Taylor Blau
2025-12-10  9:40     ` Patrick Steinhardt
2025-12-18 21:13       ` Taylor Blau
2025-12-08 18:27 ` [PATCH 2/2] builtin/repack: don't regenerate MIDX unless needed Patrick Steinhardt
2025-12-10  2:48   ` Taylor Blau
2025-12-10  9:40     ` Patrick Steinhardt [this message]
2025-12-10 12:52 ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 1/3] midx: fix `BUG()` when getting preferred pack without a reverse index Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 2/3] midx-write: extract function to test whether MIDX needs updating Patrick Steinhardt
2025-12-10 12:52   ` [PATCH v2 3/3] midx-write: skip rewriting MIDX with `--stdin-packs` unless needed Patrick Steinhardt
2025-12-11  8:46   ` [PATCH v2 0/3] builtin/repack: avoid rewriting up-to-date MIDX Junio C Hamano
2025-12-12  7:33     ` Patrick Steinhardt
2025-12-18 21:18       ` Taylor Blau
2025-12-19  6:25         ` Patrick Steinhardt
