From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
Patrick Steinhardt <ps@pks.im>,
git@vger.kernel.org
Subject: Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
Date: Fri, 23 May 2025 13:46:20 -0400 [thread overview]
Message-ID: <aDC0bK+NOuuVvQtb@nand.local> (raw)
In-Reply-To: <20250523020820.GB559000@coredump.intra.peff.net>
On Thu, May 22, 2025 at 10:08:20PM -0400, Jeff King wrote:
> > (As an aside, I think I recall you suggesting a while ago that it might
> > be interesting to define "global" things with a different type than
> > "local" ones to prevent this sort of confusion. That would allow us to
> > keep both "pack_int_id" and the return value of "midx_for_pack()" in
> > scope at the same time, without the possibility of using one when you
> > meant to use the other.)
>
> Yeah. The trouble is that it becomes awkward in C, since the language
> will happily intermix two integer typedefs. So you have to wrap them in
> a struct and access the struct fields everywhere.
Yup :-<.
> > > - There's a similar case in midx-write.c:want_included_pack(). That
> > > one seems to have the same local/global confusion, but I do not
> > > obviously see anything preventing it from being fed a non-base midx.
> > > So it might possibly be buggy?
> >
> > Yeah, this spot is definitely broken. At minimum it would need something
> > like:
> >
> > --- 8< ---
> > diff --git a/midx-write.c b/midx-write.c
> > index 0897cbd829..54a04f7b75 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -1634,9 +1634,10 @@ static int want_included_pack(struct repository *r,
> > uint32_t pack_int_id)
> > {
> > struct packed_git *p;
> > + midx_for_pack(&m, pack_int_id);
> > if (prepare_midx_pack(r, m, pack_int_id))
> > return 0;
> > - p = m->packs[pack_int_id];
> > + p = m->packs[pack_int_id - m->num_packs_in_base];
> > if (!pack_kept_objects && p->pack_keep)
> > return 0;
> > if (p->is_cruft)
> > --- >8 ---
>
> Yep, that's exactly what I was envisioning. I guess it's probably
> possible to trigger in practice by writing a new midx based on an
> existing incremental state. I'll let you figure that part out. :)
Hmm. I thought that this spot was broken last night, but looking at it
again today I think that it is actually OK. I started writing an
analysis of why in response to your email, and then decided to throw it
away since those details really should live in the patch message. Here's
what I came up with:
--- 8< ---
Subject: [PATCH] midx-write.c: guard against incremental MIDXs in
want_included_pack()
The function want_included_pack() is used to determine whether or not a
the packfile corresponding to some given pack_int_id should be included
in a 'git multi-pack-index repack' operation.
This spot looks like it would be broken, particularly in:
struct packed_git *p;
if (prepare_midx_pack(r, m, pack_int_id))
return 0;
p = m->packs[pack_int_id];
, when pack_int_id is greater than m->num_pack_in_base (supposing that
m->num_packs_in_base is non-zero, or equivalently that m->base_midx is
non-NULL).
Suppose we have two MIDXs in an incremental MIDX chain, each having two
packs:
- M0 = {packs: [P0, P1], base_midx: NULL}
- M1 = {packs: [P2, P3], base_midx: M0}
noting that each pack is identified by its global pack_int_id within the
chain.
So if you do something like:
want_included_pack(the_repository, M1, pack_kept_objects, 2);
that function will call prepare_midx_pack(), which is smart enough to
realize that the pack of interest is in the current layer (M1), and
knows how to adjust its global pack_int_id into an index into the
current layer's 'packs' array.
But the following line:
p = m->packs[pack_int_id]; /* m is M1, pack_int_id is 2 */
looks broken, since each layer of the MIDX only maintains an array of
the packs stored within that layer, and 'm' wasn't adjusted back to
point at M1->base_midx (M0).
The right thing to do would be:
struct packed_git *p;
if (prepare_midx_pack(r, m, pack_int_id))
return 0;
/* open-code midx.c::midx_for_pack(), which is private */
while (m && pack_int_id < m->num_packs_in_base)
m = m->base_midx;
if (!m)
BUG("broken midx?");
if (pack_int_id >= m->num_packs + m->num_packs_in_base)
BUG("broken pack_int_id?");
p = m->packs[pack_int_id - m->num_packs_in_base];
But that would be overkill, since this function never deals with
incremental MIDXs having more than one layer! To see why, observe that
want_included_pack() has two callers:
- midx-write.c::fill_included_packs_all()
- midx-write.c::fill_included_packs_batch()
and those functions' sole caller is in midx-write.c::midx_repack(),
which dispatches a call to one or the other depending on whether or not
the batch_size is non-zero.
And at the very top of midx_repack(), we have a guard against
non-trivial incremental MIDX chains:
if (m->base_midx)
die(_("cannot repack an incremental multi-pack-index"));
So want_included_pack() is OK becuase it will never encounter a
situation where it has to chase backwards through the '->base_midx'
pointer. But that is not immediately clear from reading the code, and is
too fragile for my comfort. Make this more clear by adding an ASSERT()
to the above effect.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/midx-write.c b/midx-write.c
index 0897cbd829..991c42d1dc 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1634,6 +1634,9 @@ static int want_included_pack(struct repository *r,
uint32_t pack_int_id)
{
struct packed_git *p;
+
+ ASSERT(m && !m->base_midx);
+
if (prepare_midx_pack(r, m, pack_int_id))
return 0;
p = m->packs[pack_int_id];
@@ -1653,6 +1656,8 @@ static void fill_included_packs_all(struct repository *r,
uint32_t i;
int pack_kept_objects = 0;
+ ASSERT(m && !m->base_midx);
+
repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
for (i = 0; i < m->num_packs; i++) {
@@ -1673,6 +1678,8 @@ static void fill_included_packs_batch(struct repository *r,
struct repack_info *pack_info;
int pack_kept_objects = 0;
+ ASSERT(m && !m->base_midx);
+
CALLOC_ARRAY(pack_info, m->num_packs);
repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
--
2.49.0.221.g485f5f8636.dirty
--- >8 ---
> > > Likewise fill_included_packs_batch() in the same file.
> >
> > I think this one is actually OK for the same reason as the
> > expire_midx_packs() case. Its sole caller in midx_repack() has:
> >
> > if (m->base_midx)
> > die(_("cannot repack an incremental multi-pack-index"));
> >
> > , so we are OK there. We might want to add an ASSERT() in
> > fill_included_packs_batch() to make it clearer, though.
>
> Ah, good. Yes, I agree that an assertion would be a nice bit of
> documentation/safety. Though for these cases I think they only care
> about the pack (not the containing midx), so changing the return type of
> prepare_midx_pack() might be an even easier way to get that safety.
Ironically, the same argument applies for this function (and the _all()
variant) as well ;-).
Thanks,
Taylor
next prev parent reply other threads:[~2025-05-23 17:46 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-16 8:55 [PATCH] packfile: avoid access(3p) calls for missing packs Patrick Steinhardt
2025-05-16 18:34 ` Junio C Hamano
2025-05-19 6:52 ` Jeff King
2025-05-19 15:46 ` Junio C Hamano
2025-05-20 6:45 ` Patrick Steinhardt
2025-05-22 5:28 ` Jeff King
2025-05-23 1:02 ` Taylor Blau
2025-05-23 2:03 ` Jeff King
2025-05-20 9:53 ` [PATCH v2 0/2] " Patrick Steinhardt
2025-05-20 9:53 ` [PATCH v2 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
2025-05-23 1:03 ` Taylor Blau
2025-05-20 9:53 ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
2025-05-22 5:32 ` Jeff King
2025-05-22 15:47 ` Junio C Hamano
2025-05-22 16:59 ` Jeff King
2025-05-22 18:44 ` Junio C Hamano
2025-05-23 1:22 ` Taylor Blau
2025-05-23 2:08 ` Jeff King
2025-05-23 17:46 ` Taylor Blau [this message]
2025-05-25 18:41 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
2025-05-25 18:41 ` [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack Taylor Blau
2025-05-26 7:23 ` Patrick Steinhardt
2025-05-28 2:00 ` Taylor Blau
2025-05-25 18:41 ` [PATCH 2/5] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
2025-05-26 7:23 ` Patrick Steinhardt
2025-05-28 2:08 ` Taylor Blau
2025-05-25 18:41 ` [PATCH 3/5] midx-write.c: simplify fill_packs_from_midx() Taylor Blau
2025-05-26 7:23 ` Patrick Steinhardt
2025-05-28 2:15 ` Taylor Blau
2025-05-25 18:42 ` [PATCH 4/5] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
2025-05-25 18:42 ` [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
2025-05-26 7:24 ` Patrick Steinhardt
2025-05-28 2:18 ` Taylor Blau
2025-05-28 11:53 ` Patrick Steinhardt
2025-05-28 22:58 ` [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics Taylor Blau
2025-05-28 22:59 ` [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()` Taylor Blau
2025-05-29 20:47 ` Junio C Hamano
2025-06-03 22:22 ` Taylor Blau
2025-05-29 20:51 ` Junio C Hamano
2025-06-03 22:23 ` Taylor Blau
2025-05-28 22:59 ` [PATCH v2 2/4] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
2025-05-28 22:59 ` [PATCH v2 3/4] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
2025-05-28 22:59 ` [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
2025-05-30 6:50 ` Jeff King
2025-06-03 22:27 ` Taylor Blau
2025-08-28 23:25 ` Junio C Hamano
2025-05-23 1:31 ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Taylor Blau
2025-05-23 2:18 ` Jeff King
2025-05-21 13:24 ` [PATCH v2 0/2] packfile: avoid access(3p) calls for missing packs Junio C Hamano
2025-05-28 12:24 ` [PATCH v3 " Patrick Steinhardt
2025-05-28 12:24 ` [PATCH v3 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
2025-05-28 12:24 ` [PATCH v3 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
2025-05-30 6:27 ` [PATCH v3 0/2] packfile: avoid access(3p) calls for missing packs Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aDC0bK+NOuuVvQtb@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).