git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] packfile: avoid access(3p) calls for missing packs
@ 2025-05-16  8:55 Patrick Steinhardt
  2025-05-16 18:34 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-16  8:55 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau

The function `add_packed_git()` adds a packfile to the known list of
packs in case it exists. In case the packfile doesn't look like a pack
though, or in case it doesn't exist, the function will simply return a
`NULL` pointer.

The ordering in which those checks are done is somewhat curious though:
we first perform a couple of access(3p) syscalls for auxiliary files
like ".keep" before we determine whether the ".pack" file itself exists.
And if we see that the ".pack" file does not exist, we bail out and thus
effectively discard all the information. This means that the access(3p)
calls were a complete waste of time.

The reason why we do this is likely because we reuse `p->pack_name` to
derive the other files' names, as well, so that we only have to allocate
this buffer once. As such, we have to compute the packfile name last so
that it doesn't get overwritten by the other names.

All of this likely doesn't matter in the general case: Git shouldn't end
up looking too many nonexistent packfiles anyway. But there are edge
cases where it _does_ matter. One such edge case that we have hit in our
production systems was a stale "multi-pack-index" file that contained
references to nonexistent packfiles. As there is no negative lookup
cache this causes us to repeatedly look up the same packfiles via the
multi-pack index, only to realize that they don't exist. This translated
into hundreds of thousands of syscalls to access(3p) and stat(3p).

While the issue was entirely self-made because the multi-pack index
should have been regenerated, we can still reduce the number of syscalls
by 75% in the case of nonexistent packfiles by reordering these calls.
This requires us to restore the final packfile name after the access(3p)
calls. But given that this is a mere memory copy it is very unlikely to
matter in comparison to the four syscalls we performed beforehand.

Another mitigation would be to introduce a negative lookup cache so that
we don't try to add missing packfiles repeatedly. But that refactoring
would be a bit more involved for dubious gains, so for now the author
has decided against it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
Hi,

this patch addresses an issue we have recently seen in our production
systems due to a stale MIDX. The MIDX contained entries for packfiles
that didn't exist anymore, which caused Git to repeatedly look up those
packfiles. Each missing packfile resulted in four repeated syscalls:
three access(3p) calls to check for supporting data structures, and one
call to stat(3p) to check for the packfile itself. The first three calls
are essentially wasted though when the stat(3p) call itself fails, which
is being fixed by this patch.

I doubt that the patch matters in almost any repository, but given that
the refactoring is trivial I thought to submit the patch regardless of
that. Another step would be to introduce a negative lookup cache -- but
that would be a bit more involved, so I decided against it for now as I
don't want to introduce complexity for dubious gains.

Thanks!

Patrick
---
 packfile.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/packfile.c b/packfile.c
index d91016f1c7f..870d48bd949 100644
--- a/packfile.c
+++ b/packfile.c
@@ -737,6 +737,12 @@ struct packed_git *add_packed_git(struct repository *r, const char *path,
 	p = alloc_packed_git(r, alloc);
 	memcpy(p->pack_name, path, path_len);
 
+	xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
+	if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
+		free(p);
+		return NULL;
+	}
+
 	xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
 	if (!access(p->pack_name, F_OK))
 		p->pack_keep = 1;
@@ -749,11 +755,8 @@ struct packed_git *add_packed_git(struct repository *r, const char *path,
 	if (!access(p->pack_name, F_OK))
 		p->is_cruft = 1;
 
+	/* Restore the final packfile name. */
 	xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
-	if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
-		free(p);
-		return NULL;
-	}
 
 	/* ok, it looks sane as far as we can check without
 	 * actually mapping the pack file.

---
base-commit: 1a8a4971cc6c179c4dd711f4a7f5d7178f4b3ab7
change-id: 20250516-pks-pack-avoid-stats-on-missing-8e3b75755cf0


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH] packfile: avoid access(3p) calls for missing packs
  2025-05-16  8:55 [PATCH] packfile: avoid access(3p) calls for missing packs Patrick Steinhardt
@ 2025-05-16 18:34 ` Junio C Hamano
  2025-05-19  6:52   ` Jeff King
  2025-05-20  9:53 ` [PATCH v2 0/2] " Patrick Steinhardt
  2025-05-28 12:24 ` [PATCH v3 " Patrick Steinhardt
  2 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2025-05-16 18:34 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau

Patrick Steinhardt <ps@pks.im> writes:

> The function `add_packed_git()` adds a packfile to the known list of
> packs in case it exists. In case the packfile doesn't look like a pack
> though, or in case it doesn't exist, the function will simply return a
> `NULL` pointer.
>
> The ordering in which those checks are done is somewhat curious though:
> we first perform a couple of access(3p) syscalls for auxiliary files
> like ".keep" before we determine whether the ".pack" file itself exists.
> And if we see that the ".pack" file does not exist, we bail out and thus
> effectively discard all the information. This means that the access(3p)
> calls were a complete waste of time.
>
> The reason why we do this is likely because we reuse `p->pack_name` to
> derive the other files' names, as well, so that we only have to allocate
> this buffer once. As such, we have to compute the packfile name last so
> that it doesn't get overwritten by the other names.

I vaguely recall that the order of checks between .idx and .pack
were deliberately chosen, as we do not want to add and attempt to
use .pack file before its associated .idx file is ready since
without the latter the former is unusable at runtime.  

    Side note: It may have been more important back when the name of
    the packfile was a hash of names of the objects in the pack,
    which meant that when you repack a quiescent and fully repacked
    repository twice with different parameters, you could have ended
    up with a packfile whose name is the same but the contents as a
    bytestream (hence the object offset) are different, making a
    stale .idx not pointing into the correct position in the new
    .pack file.  These days the name of the packfile is based on the
    contents of the pack as a bytestream, so we no longer suffer
    in such a scenario.

But I wouldn't be surprised if checks for other auxiliary files were
added later to come next to .idx not .pack, exactly because .pack is
the primary thing.

> All of this likely doesn't matter in the general case: Git shouldn't end
> up looking too many nonexistent packfiles anyway.

In any case, we shouldn't attempt to use .pack when its associated
files are missing.  It is not about nonexistent but about incomplete
(including "in the process of being written").

> But there are edge
> cases where it _does_ matter. One such edge case that we have hit in our
> production systems was a stale "multi-pack-index" file that contained
> references to nonexistent packfiles. As there is no negative lookup
> cache this causes us to repeatedly look up the same packfiles via the
> multi-pack index, only to realize that they don't exist. This translated
> into hundreds of thousands of syscalls to access(3p) and stat(3p).

Ouch.  Multi-pack index is a more recent invention, and I am not
surprised if such bugs still lurk.  And as you hinted, a solution
may be a negative "known-not-to-exist" list, but given one such
stale midx, would we try to open the same missing packfiles over and
over within the same process?

> while the issue was entirely self-made because the multi-pack index
> should have been regenerated, we can still reduce the number of syscalls
> by 75% in the case of nonexistent packfiles by reordering these calls.

That sounds more like a band-aid, if we still do the remaining 25%
that we somehow know would be unnecessary.

Note that I do not know if the ordering that I think was originally
introduced for correctness still matters for correctness.  The first
paragraph of this message was written without consulting "git log",
lore archive, or even the current state of the source, but from
vague memory.


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] packfile: avoid access(3p) calls for missing packs
  2025-05-16 18:34 ` Junio C Hamano
@ 2025-05-19  6:52   ` Jeff King
  2025-05-19 15:46     ` Junio C Hamano
                       ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: Jeff King @ 2025-05-19  6:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Taylor Blau

On Fri, May 16, 2025 at 11:34:10AM -0700, Junio C Hamano wrote:

> > The reason why we do this is likely because we reuse `p->pack_name` to
> > derive the other files' names, as well, so that we only have to allocate
> > this buffer once. As such, we have to compute the packfile name last so
> > that it doesn't get overwritten by the other names.
> 
> I vaguely recall that the order of checks between .idx and .pack
> were deliberately chosen, as we do not want to add and attempt to
> use .pack file before its associated .idx file is ready since
> without the latter the former is unusable at runtime.  

It is definitely intentional and important that we discover packs by
their .idx file first. The writing side uses the creation of the .idx
file (via atomic rename) as the final step in making a pack accessible.
Before that, a reader could see a .pack without its various accessory
files (and so not realize it has a .keep file, for example).

>     Side note: It may have been more important back when the name of
>     the packfile was a hash of names of the objects in the pack,
>     which meant that when you repack a quiescent and fully repacked
>     repository twice with different parameters, you could have ended
>     up with a packfile whose name is the same but the contents as a
>     bytestream (hence the object offset) are different, making a
>     stale .idx not pointing into the correct position in the new
>     .pack file.  These days the name of the packfile is based on the
>     contents of the pack as a bytestream, so we no longer suffer
>     in such a scenario.

I don't think reading the .idx file first was sufficient to protect us
there, since the .pack could be replaced racily after the reader opened
the .idx. I don't recall if we were subject to that race back then (and
it just didn't come up enough to worry about), or if repack would throw
away an identically-named file. Anyway, as you note, it's no longer an
issue.

But I think that is mostly orthogonal to how the auxiliary files are
handled. I think Patrick's guess is correct that the order we have is
simply because it was most convenient to end up with ".pack" in the
name.

> > All of this likely doesn't matter in the general case: Git shouldn't end
> > up looking too many nonexistent packfiles anyway.
> 
> In any case, we shouldn't attempt to use .pack when its associated
> files are missing.  It is not about nonexistent but about incomplete
> (including "in the process of being written").

This shouldn't be a race for files being written. The .idx is written
last, so the .pack should always be there. And the midx is written after
that, so again, the pack should always be there. At least for newly
written packs.

We might see the opposite: packs going away because they were repacked
(and perhaps even a new midx generated). In that case you might see
a race like:

  - process R opens the existing midx, which mentions pack P (but does
    not yet open P)

  - process W repacks, deleting P and generating a new midx with P'

  - process R wants object X; the midx claims it is in pack P, so it
    tries to open that but fails

What should happen here is that R will fail to find the object at all,
should hit reprepare_packed_git(), and will then pick up the new pack.

I don't recall offhand whether reprepare_packed_git() will open the new
midx. If it doesn't, we'd still find the object via the actual pack
.idx, but we may end up consulting the now-stale midx over and over (so
we'll get the right answer, but there may be some room for
optimization).

> > while the issue was entirely self-made because the multi-pack index
> > should have been regenerated, we can still reduce the number of syscalls
> > by 75% in the case of nonexistent packfiles by reordering these calls.
> 
> That sounds more like a band-aid, if we still do the remaining 25%
> that we somehow know would be unnecessary.

Yeah, that was my gut reaction, too. add_packed_git() is not
optimized[1] at all. But it shouldn't have to be, because it's meant to
be called once per process. Even with the re-ordering, we'd still make a
bunch of pointless stat() calls for the .pack file we know is not there.

The code in prepare_midx_pack() converts a numeric pack id (referenced
inside the midx) into a "struct packed_git" pointer, caching the results
in multi_pack_index->packs. That field holds NULL for "we have not
looked it up yet" or a valid pointer to a packed_git. It probably needs
to hold a third state: "we tried and failed".

Something like this (large untested) patch:

diff --git a/midx.c b/midx.c
index 3d0015f782..354b1f886c 100644
--- a/midx.c
+++ b/midx.c
@@ -405,7 +405,7 @@ void close_midx(struct multi_pack_index *m)
 	munmap((unsigned char *)m->data, m->data_len);
 
 	for (i = 0; i < m->num_packs; i++) {
-		if (m->packs[i])
+		if (m->packs[i] && m->packs[i] != (void *)(intptr_t)-1)
 			m->packs[i]->multi_pack_index = 0;
 	}
 	FREE_AND_NULL(m->packs);
@@ -458,6 +458,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 
 	pack_int_id = midx_for_pack(&m, pack_int_id);
 
+	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
+		return 1;
 	if (m->packs[pack_int_id])
 		return 0;
 
@@ -482,8 +484,10 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 	strbuf_release(&pack_name);
 	strbuf_release(&key);
 
-	if (!p)
+	if (!p) {
+		m->packs[pack_int_id] = (void *)(intptr_t)-1;
 		return 1;
+	}
 
 	p->multi_pack_index = 1;
 	m->packs[pack_int_id] = p;

-Peff

[1] There probably are optimization opportunities in add_packed_git(). I
    don't think re-ordering will help much in the common case that we
    actually do have the pack. But really, most callers do not care
    about these auxiliary files at all! We could simply skip them during
    the initial setup and lazy-load them via accessor functions.

    I _think_ that should be OK with respect to races. For a newly added
    pack, we know they will always be in place before the matching .idx
    file (per the logic I outlined above). For a pack that goes away, we
    might racily fail to see its auxiliary file. But that is mostly true
    now (we might see its .idx, and then the .promisor file is deleted
    before we call access()). It does increase the size of that window,
    though (and in particular lets it happen even if the pack has
    actually been opened).

    I'm not sure how much that would matter in practice. OTOH, I'm not
    sure that saving a few access() calls would be all that meaningful,
    unless you have a zillion packs. And if you have a zillion packs,
    you're likely to get much bigger speedups in other areas by
    repacking them anyway. So it hasn't really been an area I've
    pursued before.

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH] packfile: avoid access(3p) calls for missing packs
  2025-05-19  6:52   ` Jeff King
@ 2025-05-19 15:46     ` Junio C Hamano
  2025-05-20  6:45     ` Patrick Steinhardt
  2025-05-23  1:02     ` Taylor Blau
  2 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2025-05-19 15:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Taylor Blau

Jeff King <peff@peff.net> writes:

> The code in prepare_midx_pack() converts a numeric pack id (referenced
> inside the midx) into a "struct packed_git" pointer, caching the results
> in multi_pack_index->packs. That field holds NULL for "we have not
> looked it up yet" or a valid pointer to a packed_git. It probably needs
> to hold a third state: "we tried and failed".

Yeah, this was exactly why I asked "are we having repeated failures
in the same process", and I am happy with this direction.

> Something like this (large untested) patch:
>
> diff --git a/midx.c b/midx.c
> index 3d0015f782..354b1f886c 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -405,7 +405,7 @@ void close_midx(struct multi_pack_index *m)
>  	munmap((unsigned char *)m->data, m->data_len);
>  
>  	for (i = 0; i < m->num_packs; i++) {
> -		if (m->packs[i])
> +		if (m->packs[i] && m->packs[i] != (void *)(intptr_t)-1)
>  			m->packs[i]->multi_pack_index = 0;
>  	}
>  	FREE_AND_NULL(m->packs);
> @@ -458,6 +458,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
>  
>  	pack_int_id = midx_for_pack(&m, pack_int_id);
>  
> +	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
> +		return 1;
>  	if (m->packs[pack_int_id])
>  		return 0;
>  
> @@ -482,8 +484,10 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
>  	strbuf_release(&pack_name);
>  	strbuf_release(&key);
>  
> -	if (!p)
> +	if (!p) {
> +		m->packs[pack_int_id] = (void *)(intptr_t)-1;
>  		return 1;
> +	}
>  
>  	p->multi_pack_index = 1;
>  	m->packs[pack_int_id] = p;

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] packfile: avoid access(3p) calls for missing packs
  2025-05-19  6:52   ` Jeff King
  2025-05-19 15:46     ` Junio C Hamano
@ 2025-05-20  6:45     ` Patrick Steinhardt
  2025-05-22  5:28       ` Jeff King
  2025-05-23  1:02     ` Taylor Blau
  2 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-20  6:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Taylor Blau

On Mon, May 19, 2025 at 02:52:21AM -0400, Jeff King wrote:
> On Fri, May 16, 2025 at 11:34:10AM -0700, Junio C Hamano wrote:
> 
> > > The reason why we do this is likely because we reuse `p->pack_name` to
> > > derive the other files' names, as well, so that we only have to allocate
> > > this buffer once. As such, we have to compute the packfile name last so
> > > that it doesn't get overwritten by the other names.
> > 
> > I vaguely recall that the order of checks between .idx and .pack
> > were deliberately chosen, as we do not want to add and attempt to
> > use .pack file before its associated .idx file is ready since
> > without the latter the former is unusable at runtime.  
> 
> It is definitely intentional and important that we discover packs by
> their .idx file first. The writing side uses the creation of the .idx
> file (via atomic rename) as the final step in making a pack accessible.
> Before that, a reader could see a .pack without its various accessory
> files (and so not realize it has a .keep file, for example).

Yup. But that order doesn't change anyway in this patch. What changes is
only the order for auxiliary and non-mandatory ".keep", ".promisor" and
".mtimes" files.

> >     Side note: It may have been more important back when the name of
> >     the packfile was a hash of names of the objects in the pack,
> >     which meant that when you repack a quiescent and fully repacked
> >     repository twice with different parameters, you could have ended
> >     up with a packfile whose name is the same but the contents as a
> >     bytestream (hence the object offset) are different, making a
> >     stale .idx not pointing into the correct position in the new
> >     .pack file.  These days the name of the packfile is based on the
> >     contents of the pack as a bytestream, so we no longer suffer
> >     in such a scenario.
> 
> I don't think reading the .idx file first was sufficient to protect us
> there, since the .pack could be replaced racily after the reader opened
> the .idx. I don't recall if we were subject to that race back then (and
> it just didn't come up enough to worry about), or if repack would throw
> away an identically-named file. Anyway, as you note, it's no longer an
> issue.
> 
> But I think that is mostly orthogonal to how the auxiliary files are
> handled. I think Patrick's guess is correct that the order we have is
> simply because it was most convenient to end up with ".pack" in the
> name.

Taking a step back though: do we always ensure the order in which we
those auxiliary files are created and deleted? If we know that those are
always:

  - Moved into place before their ".idx" file.

  - Deleted after deleting their ".pack" file.

Then reordering may cause us to race indeed so that we see the packfile,
but miss the auxiliary data structures. `unlink_pack_path()` does seem
to ensure the latter property, as both ".idx" and ".pack" are deleted
first. I'm not quite sure about the former, but it seems like we also do
this.

So I think that the proposed patch is wrong. There should definitely be
a comment in that function though to say that this order is intentional
and not merely an optimization.

[snip]
> > > while the issue was entirely self-made because the multi-pack index
> > > should have been regenerated, we can still reduce the number of syscalls
> > > by 75% in the case of nonexistent packfiles by reordering these calls.
> > 
> > That sounds more like a band-aid, if we still do the remaining 25%
> > that we somehow know would be unnecessary.
> 
> Yeah, that was my gut reaction, too. add_packed_git() is not
> optimized[1] at all. But it shouldn't have to be, because it's meant to
> be called once per process. Even with the re-ordering, we'd still make a
> bunch of pointless stat() calls for the .pack file we know is not there.

It certainly is, as mentioned in the cover letter. It was a supposedly
cheap win as I didn't want to introduce a negative lookup cache for an
edge case like this. But the patch below looks simple enough -- I feared
it would be a lot more involved.

> The code in prepare_midx_pack() converts a numeric pack id (referenced
> inside the midx) into a "struct packed_git" pointer, caching the results
> in multi_pack_index->packs. That field holds NULL for "we have not
> looked it up yet" or a valid pointer to a packed_git. It probably needs
> to hold a third state: "we tried and failed".
> 
> Something like this (large untested) patch:
> 
> diff --git a/midx.c b/midx.c
> index 3d0015f782..354b1f886c 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -405,7 +405,7 @@ void close_midx(struct multi_pack_index *m)
>  	munmap((unsigned char *)m->data, m->data_len);
>  
>  	for (i = 0; i < m->num_packs; i++) {
> -		if (m->packs[i])
> +		if (m->packs[i] && m->packs[i] != (void *)(intptr_t)-1)
>  			m->packs[i]->multi_pack_index = 0;
>  	}
>  	FREE_AND_NULL(m->packs);
> @@ -458,6 +458,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
>  
>  	pack_int_id = midx_for_pack(&m, pack_int_id);
>  
> +	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
> +		return 1;
>  	if (m->packs[pack_int_id])
>  		return 0;
>  
> @@ -482,8 +484,10 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
>  	strbuf_release(&pack_name);
>  	strbuf_release(&key);
>  
> -	if (!p)
> +	if (!p) {
> +		m->packs[pack_int_id] = (void *)(intptr_t)-1;
>  		return 1;
> +	}
>  
>  	p->multi_pack_index = 1;
>  	m->packs[pack_int_id] = p;
> 
> -Peff
> 
> [1] There probably are optimization opportunities in add_packed_git(). I
>     don't think re-ordering will help much in the common case that we
>     actually do have the pack. But really, most callers do not care
>     about these auxiliary files at all! We could simply skip them during
>     the initial setup and lazy-load them via accessor functions.
> 
>     I _think_ that should be OK with respect to races. For a newly added
>     pack, we know they will always be in place before the matching .idx
>     file (per the logic I outlined above). For a pack that goes away, we
>     might racily fail to see its auxiliary file. But that is mostly true
>     now (we might see its .idx, and then the .promisor file is deleted
>     before we call access()). It does increase the size of that window,
>     though (and in particular lets it happen even if the pack has
>     actually been opened).

I'm not sure it would be okay, as mentioned above. The current ordering
ensures that we always see auxiliary data structures in case the ".pack"
file exists. If we started to cache then that wouldn't be the case
anymore.

>     I'm not sure how much that would matter in practice. OTOH, I'm not
>     sure that saving a few access() calls would be all that meaningful,
>     unless you have a zillion packs. And if you have a zillion packs,
>     you're likely to get much bigger speedups in other areas by
>     repacking them anyway. So it hasn't really been an area I've
>     pursued before.

It probably doesn't in the general case at all.

Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 0/2] packfile: avoid access(3p) calls for missing packs
  2025-05-16  8:55 [PATCH] packfile: avoid access(3p) calls for missing packs Patrick Steinhardt
  2025-05-16 18:34 ` Junio C Hamano
@ 2025-05-20  9:53 ` Patrick Steinhardt
  2025-05-20  9:53   ` [PATCH v2 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
                     ` (2 more replies)
  2025-05-28 12:24 ` [PATCH v3 " Patrick Steinhardt
  2 siblings, 3 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-20  9:53 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King, Junio C Hamano

Hi,

this patch addresses an issue we have recently seen in our production
systems due to a stale MIDX. The MIDX contained entries for packfiles
that didn't exist anymore, which caused Git to repeatedly look up those
packfiles. Each missing packfile resulted in four repeated syscalls:
three access(3p) calls to check for supporting data structures, and one
call to stat(3p) to check for the packfile itself. The first three calls
are essentially wasted though when the stat(3p) call itself fails, which
is being fixed by this patch.

I doubt that the patch matters in almost any repository, but given that
the refactoring is trivial I thought to submit the patch regardless of
that. Another step would be to introduce a negative lookup cache -- but
that would be a bit more involved, so I decided against it for now as I
don't want to introduce complexity for dubious gains.

Changes in v2:
  - Drop the patch that reorders syscalls and add a comment explaining
    why the order is important.
  - Add a negative lookup cache for indexed packfiles.
  - Link to v1: https://lore.kernel.org/r/20250516-pks-pack-avoid-stats-on-missing-v1-1-e2ef4d8798a3@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (2):
      packfile: explain ordering of how we look up auxiliary pack files
      midx: stop repeatedly looking up nonexistent packfiles

 midx.c     | 10 ++++++++--
 packfile.c | 11 +++++++++++
 2 files changed, 19 insertions(+), 2 deletions(-)

Range-diff versus v1:

1:  31ffb3712ca < -:  ----------- packfile: avoid access(3p) calls for missing packs
-:  ----------- > 1:  6125b84389d packfile: explain ordering of how we look up auxiliary pack files
-:  ----------- > 2:  8cb82a771c0 midx: stop repeatedly looking up nonexistent packfiles

---
base-commit: 1a8a4971cc6c179c4dd711f4a7f5d7178f4b3ab7
change-id: 20250516-pks-pack-avoid-stats-on-missing-8e3b75755cf0


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 1/2] packfile: explain ordering of how we look up auxiliary pack files
  2025-05-20  9:53 ` [PATCH v2 0/2] " Patrick Steinhardt
@ 2025-05-20  9:53   ` Patrick Steinhardt
  2025-05-23  1:03     ` Taylor Blau
  2025-05-20  9:53   ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
  2025-05-21 13:24   ` [PATCH v2 0/2] packfile: avoid access(3p) calls for missing packs Junio C Hamano
  2 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-20  9:53 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King, Junio C Hamano

When adding a packfile to an object database we perform four syscalls:

  - Three calls to access(3p) are done to check for auxiliary data
    structures.

  - One call to stat(3p) is done to check for the ".pack" itself.

One curious bit is that we perform the access(3p) calls before checking
for the packfile itself, but if the packfile doesn't exist we discard
all results. The access(3p) calls are thus essentially wasted, so one
may be triggered to reorder those calls so that we can short-circuit the
other syscalls in case the packfile does not exist.

The order in which we look up files is quite important though to help
avoid races:

  - When installing a packfile we move auxiliary data structures into
    place before we install the ".idx" file.

  - When deleting a packfile we first delete the ".idx" and ".pack"
    files before deleting auxiliary data structures.

As such, to avoid any races with concurrently created or deleted packs
we need to make sure that we _first_ read auxiliary data structures
before we read the corresponding ".idx" or ".pack" file. Otherwise it
may easily happen that we return a populated but misclassified pack.

Add a comment to `add_packed_git()` to make future readers aware of this
ordering requirement.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 packfile.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/packfile.c b/packfile.c
index d91016f1c7f..933036e2606 100644
--- a/packfile.c
+++ b/packfile.c
@@ -737,6 +737,17 @@ struct packed_git *add_packed_git(struct repository *r, const char *path,
 	p = alloc_packed_git(r, alloc);
 	memcpy(p->pack_name, path, path_len);
 
+	/*
+	 * Note that we have to check auxiliary data structures before we check
+	 * for the ".pack" file to exist to avoid races with a packfile that is
+	 * in the process of being deleted. The ".pack" file is unlinked before
+	 * its auxiliary data structures, so we know that we either get a
+	 * consistent snapshot of all data structures or that we'll fail to
+	 * stat(3p) the packfile itself and thus return `NULL`.
+	 *
+	 * As such, we cannot bail out before the access(3p) calls in case the
+	 * packfile doesn't exist without doing two stat(3p) calls for it.
+	 */
 	xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
 	if (!access(p->pack_name, F_OK))
 		p->pack_keep = 1;

-- 
2.49.0.1151.ga128411c76.dirty


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-20  9:53 ` [PATCH v2 0/2] " Patrick Steinhardt
  2025-05-20  9:53   ` [PATCH v2 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
@ 2025-05-20  9:53   ` Patrick Steinhardt
  2025-05-22  5:32     ` Jeff King
  2025-05-21 13:24   ` [PATCH v2 0/2] packfile: avoid access(3p) calls for missing packs Junio C Hamano
  2 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-20  9:53 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King, Junio C Hamano

The multi-pack index acts as a cache across a set of packfiles so that
we can quickly look up which of those packfiles contains a given object.
As such, the multi-pack index naturally needs to be updated every time
one of the packfiles goes away, or otherwise the multi-pack index has
grown stale.

A stale multi-pack index should be handled gracefully by Git though, and
in fact it is: if the indexed pack cannot be found we simply ignore it
and eventually we fall back to doing the object lookup by just iterating
through all packs, even if those aren't indexed.

But while this fallback works, it has one significant downside: we don't
cache the fact that a pack has vanished. This leads to us repeatedly
trying to look up the same pack only to realize that it (still) doesn't
exist.

This issue can be easily demonstrated by creating a repository with a
stale multi-pack index and a couple of objects. We do so by creating a
repository with two packfiles, both of which are indexed by the
multi-pack index, and then repack those two packfiles. Note that we have
to move the multi-pack-index before doing the final repack, as Git knows
to delete it otherwise.

    $ git init repo
    $ cd repo/
    $ git config set maintenance.auto false
    $ for i in $(seq 1000); do printf "%d-original" $i >file-$i; done
    $ git add .
    $ git commit -moriginal
    $ git repack -dl
    $ for i in $(seq 1000); do printf "%d-modified" $i >file-$i; done
    $ git commit -a -mmodified
    $ git repack -dl
    $ git multi-pack-index write
    $ mv .git/objects/pack/multi-pack-index .
    $ git repack -Adl
    $ mv multi-pack-index .git/objects/pack/

Commands that cause a lot of objects lookups will now repeatedly invoke
`add_packed_git()`, which leads to three failed access(3p) calls as well
as one failed stat(3p) call. The following strace for example is done
for `git log --patch` in the above repository:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     74.67    0.024693           1     18038     18031 access
     25.33    0.008378           1      6045      6017 newfstatat
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.033071           1     24083     24048 total

Fix the issue by introducing a negative lookup cache for indexed packs.
This cache works by simply storing an invalid pointer for a missing pack
when `prepare_midx_pack()` fails to look up the pack. Most users of the
`packs` array don't need to be adjusted, either, as they all know to
call `prepare_midx_pack()` before accessing the array.

With this change in place we can now see a significantly reduced number
of syscalls:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     73.58    0.000323           5        60        28 newfstatat
     26.42    0.000116           5        23        16 access
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000439           5        83        44 total

Furthermore, this change also results in a speedup:

    Benchmark 1: git log --patch (revision = HEAD~)
      Time (mean ± σ):      50.4 ms ±   2.5 ms    [User: 22.0 ms, System: 24.4 ms]
      Range (min … max):    45.4 ms …  54.9 ms    53 runs

    Benchmark 2: git log --patch (revision = HEAD)
      Time (mean ± σ):      12.7 ms ±   0.4 ms    [User: 11.1 ms, System: 1.6 ms]
      Range (min … max):    12.4 ms …  15.0 ms    191 runs

    Summary
      git log --patch (revision = HEAD) ran
        3.96 ± 0.22 times faster than git log --patch (revision = HEAD~)

In the end, it should in theory never be necessary to have this negative
lookup cache given that we know to update the multi-pack index together
with repacks. But as the change is quite contained and as the speedup
can be significant as demonstrated above, it does feel sensible to have
the negative lookup cache regardless.

Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 3d0015f7828..fbce88bd463 100644
--- a/midx.c
+++ b/midx.c
@@ -405,7 +405,7 @@ void close_midx(struct multi_pack_index *m)
 	munmap((unsigned char *)m->data, m->data_len);
 
 	for (i = 0; i < m->num_packs; i++) {
-		if (m->packs[i])
+		if (m->packs[i] && m->packs[i] != (void *)(intptr_t)-1)
 			m->packs[i]->multi_pack_index = 0;
 	}
 	FREE_AND_NULL(m->packs);
@@ -458,6 +458,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 
 	pack_int_id = midx_for_pack(&m, pack_int_id);
 
+	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
+		return 1;
 	if (m->packs[pack_int_id])
 		return 0;
 
@@ -482,8 +484,10 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 	strbuf_release(&pack_name);
 	strbuf_release(&key);
 
-	if (!p)
+	if (!p) {
+		m->packs[pack_int_id] = (void *)(intptr_t)-1;
 		return 1;
+	}
 
 	p->multi_pack_index = 1;
 	m->packs[pack_int_id] = p;
@@ -495,6 +499,8 @@ struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
 				   uint32_t pack_int_id)
 {
 	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+	if (m->packs[local_pack_int_id] == (void *)(intptr_t)-1)
+		return NULL;
 	return m->packs[local_pack_int_id];
 }
 

-- 
2.49.0.1151.ga128411c76.dirty


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 0/2] packfile: avoid access(3p) calls for missing packs
  2025-05-20  9:53 ` [PATCH v2 0/2] " Patrick Steinhardt
  2025-05-20  9:53   ` [PATCH v2 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
  2025-05-20  9:53   ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
@ 2025-05-21 13:24   ` Junio C Hamano
  2 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2025-05-21 13:24 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, Jeff King

Patrick Steinhardt <ps@pks.im> writes:

> Changes in v2:
>   - Drop the patch that reorders syscalls and add a comment explaining
>     why the order is important.

Much nicer.

>   - Add a negative lookup cache for indexed packfiles.

Looking very good.

Thanks, queued.


>   - Link to v1: https://lore.kernel.org/r/20250516-pks-pack-avoid-stats-on-missing-v1-1-e2ef4d8798a3@pks.im
>
> Thanks!
>
> Patrick
>
> ---
> Patrick Steinhardt (2):
>       packfile: explain ordering of how we look up auxiliary pack files
>       midx: stop repeatedly looking up nonexistent packfiles
>
>  midx.c     | 10 ++++++++--
>  packfile.c | 11 +++++++++++
>  2 files changed, 19 insertions(+), 2 deletions(-)
>
> Range-diff versus v1:
>
> 1:  31ffb3712ca < -:  ----------- packfile: avoid access(3p) calls for missing packs
> -:  ----------- > 1:  6125b84389d packfile: explain ordering of how we look up auxiliary pack files
> -:  ----------- > 2:  8cb82a771c0 midx: stop repeatedly looking up nonexistent packfiles
>
> ---
> base-commit: 1a8a4971cc6c179c4dd711f4a7f5d7178f4b3ab7
> change-id: 20250516-pks-pack-avoid-stats-on-missing-8e3b75755cf0

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] packfile: avoid access(3p) calls for missing packs
  2025-05-20  6:45     ` Patrick Steinhardt
@ 2025-05-22  5:28       ` Jeff King
  0 siblings, 0 replies; 53+ messages in thread
From: Jeff King @ 2025-05-22  5:28 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Junio C Hamano, git, Taylor Blau

On Tue, May 20, 2025 at 08:45:06AM +0200, Patrick Steinhardt wrote:

> Taking a step back though: do we always ensure the order in which we
> those auxiliary files are created and deleted? If we know that those are
> always:
> 
>   - Moved into place before their ".idx" file.
> 
>   - Deleted after deleting their ".pack" file.
> 
> Then reordering may cause us to race indeed so that we see the packfile,
> but miss the auxiliary data structures. `unlink_pack_path()` does seem
> to ensure the latter property, as both ".idx" and ".pack" are deleted
> first. I'm not quite sure about the former, but it seems like we also do
> this.
> 
> So I think that the proposed patch is wrong. There should definitely be
> a comment in that function though to say that this order is intentional
> and not merely an optimization.

Hmm, interesting. Yes, I agree that deleting the .pack before any other
auxiliary files should catch the race, given the ordering on the read
side in add_packed_git().

But I don't think we ever discussed that ordering or its implications.
So I assumed all bets were off with respect to racing.

However, it seems we lucked into doing the correct thing by virtue of
adding to the end of the list in unlink_pack_path() repeatedly (though I
think if you dig all the way back into git-repack.sh, when there were
just .keep files, there were periods where we did it in the wrong
order).

But if we are doing it in the best order, whether correct or not, I
agree we should not disrupt that.

> > [1] There probably are optimization opportunities in add_packed_git(). I
> >     don't think re-ordering will help much in the common case that we
> >     actually do have the pack. But really, most callers do not care
> >     about these auxiliary files at all! We could simply skip them during
> >     the initial setup and lazy-load them via accessor functions.
> > 
> >     I _think_ that should be OK with respect to races. For a newly added
> >     pack, we know they will always be in place before the matching .idx
> >     file (per the logic I outlined above). For a pack that goes away, we
> >     might racily fail to see its auxiliary file. But that is mostly true
> >     now (we might see its .idx, and then the .promisor file is deleted
> >     before we call access()). It does increase the size of that window,
> >     though (and in particular lets it happen even if the pack has
> >     actually been opened).
> 
> I'm not sure it would be okay, as mentioned above. The current ordering
> ensures that we always see auxiliary data structures in case the ".pack"
> file exists. If we started to cache then that wouldn't be the case
> anymore.

Yeah, you're right. I thought it was already racy, but it luckily is
not. It does still feel unfortunate that we have to spend syscalls
loading information that won't be used by most commands, but it may not
be a measurable performance issue. I think if we did want to lazy-load
that information, we'd probably need to do an extra stat() of .pack
afterwards to check for the deletion race (so repack and pack-objects
would pay the penalty, but most other commands would not).

Anyway, that can be explored another day.

-Peff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-20  9:53   ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
@ 2025-05-22  5:32     ` Jeff King
  2025-05-22 15:47       ` Junio C Hamano
  2025-05-23  1:31       ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Taylor Blau
  0 siblings, 2 replies; 53+ messages in thread
From: Jeff King @ 2025-05-22  5:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, Junio C Hamano

On Tue, May 20, 2025 at 11:53:10AM +0200, Patrick Steinhardt wrote:

> @@ -458,6 +458,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
>  
>  	pack_int_id = midx_for_pack(&m, pack_int_id);
>  
> +	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
> +		return 1;
>  	if (m->packs[pack_int_id])
>  		return 0;

I did wonder while writing this if we might be able to hide the magic
number and gross casting inside a constant or macro. I think just:

  #define MIDX_PACK_ERROR ((void *)(intptr_t)-1)

would be enough?

Though...

> @@ -495,6 +499,8 @@ struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
>  				   uint32_t pack_int_id)
>  {
>  	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
> +	if (m->packs[local_pack_int_id] == (void *)(intptr_t)-1)
> +		return NULL;
>  	return m->packs[local_pack_int_id];

Yuck, yet another spot that needs to be aware of the new tri-state
value. One alternative is using an auxiliary array to cache the errors,
and then only the lookup function needs to care. Like:

diff --git a/midx.c b/midx.c
index c1adff4404..df71ead50b 100644
--- a/midx.c
+++ b/midx.c
@@ -186,6 +186,7 @@ static struct multi_pack_index *load_multi_pack_index_one(struct repository *r,
 
 	CALLOC_ARRAY(m->pack_names, m->num_packs);
 	CALLOC_ARRAY(m->packs, m->num_packs);
+	CALLOC_ARRAY(m->pack_err, m->num_packs);
 
 	cur_pack_name = (const char *)m->chunk_pack_names;
 	for (i = 0; i < m->num_packs; i++) {
@@ -408,6 +409,7 @@ void close_midx(struct multi_pack_index *m)
 		if (m->packs[i])
 			m->packs[i]->multi_pack_index = 0;
 	}
+	FREE_AND_NULL(m->pack_errs);
 	FREE_AND_NULL(m->packs);
 	FREE_AND_NULL(m->pack_names);
 	free(m);
@@ -460,6 +462,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 
 	if (m->packs[pack_int_id])
 		return 0;
+	if (m->pack_errs[pack_int_id])
+		return 1;
 
 	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
 		    m->pack_names[pack_int_id]);
@@ -482,8 +486,10 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 	strbuf_release(&pack_name);
 	strbuf_release(&key);
 
-	if (!p)
+	if (!p) {
+		m->pack_errs[pack_int_id] = 1;
 		return 1;
+	}
 
 	p->multi_pack_index = 1;
 	m->packs[pack_int_id] = p;

You could even lazy-malloc the extra array if you wanted to optimize the
common no-errors case, but I'm not sure it's a big deal.

-Peff

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-22  5:32     ` Jeff King
@ 2025-05-22 15:47       ` Junio C Hamano
  2025-05-22 16:59         ` Jeff King
  2025-05-23  1:31       ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Taylor Blau
  1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2025-05-22 15:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Taylor Blau

Jeff King <peff@peff.net> writes:

> @@ -460,6 +462,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
>  
>  	if (m->packs[pack_int_id])
>  		return 0;
> +	if (m->pack_errs[pack_int_id])
> +		return 1;
>  
>  	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
>  		    m->pack_names[pack_int_id]);
> @@ -482,8 +486,10 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
>  	strbuf_release(&pack_name);
>  	strbuf_release(&key);
>  
> -	if (!p)
> +	if (!p) {
> +		m->pack_errs[pack_int_id] = 1;
>  		return 1;
> +	}
>  
>  	p->multi_pack_index = 1;
>  	m->packs[pack_int_id] = p;
>
> You could even lazy-malloc the extra array if you wanted to optimize the
> common no-errors case, but I'm not sure it's a big deal.

So the idea is to leave m->packs[] for unused pack NULL instead of
magic value, so that users of that array do not need to check?

I think that is a lot safer than the magic "we know this fails"
value that existing callers that have long trusted that a non-NULL
.packs[] element is a valid pack.

By the way, I suspect I am not reading the code correctly, but I am
not sure what fill_midx_entry() does with a failed case.

	midx_for_object(&m, pos);
	pack_int_id = nth_midxed_pack_int_id(m, pos);

	if (prepare_midx_pack(r, m, pack_int_id))
		return 0;

With or without cached failure, this should return 0 when and only
when m->packs[pack_int_id] is a usable pack.  But what about the
access on the next line?

	p = m->packs[pack_int_id - m->num_packs_in_base];

Do we have any guarantee that we called prepare_midx_pack() for 
the pack at (pack_int_id - m->num_packs_in_base)th slot?  Can p
be NULL here?  And with the magic "we know this fails" value, can p
be that magic value?

	/*
	* We are about to tell the caller where they can locate the
	* requested object.  We better make sure the packfile is
	* still here and can be accessed before supplying that
	* answer, as it may have been deleted since the MIDX was
	* loaded!
	*/
	if (!is_pack_valid(p))
		return 0;

That value is passed to is_pack_valid(), which I do not think even
takes NULL.

Puzzled.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-22 15:47       ` Junio C Hamano
@ 2025-05-22 16:59         ` Jeff King
  2025-05-22 18:44           ` Junio C Hamano
  2025-05-23  1:22           ` Taylor Blau
  0 siblings, 2 replies; 53+ messages in thread
From: Jeff King @ 2025-05-22 16:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git, Taylor Blau

On Thu, May 22, 2025 at 08:47:58AM -0700, Junio C Hamano wrote:

> So the idea is to leave m->packs[] for unused pack NULL instead of
> magic value, so that users of that array do not need to check?
> 
> I think that is a lot safer than the magic "we know this fails"
> value that existing callers that have long trusted that a non-NULL
> .packs[] element is a valid pack.

Right. It would protect any further users of that assumption that we
missed.

> By the way, I suspect I am not reading the code correctly, but I am
> not sure what fill_midx_entry() does with a failed case.
> 
> 	midx_for_object(&m, pos);
> 	pack_int_id = nth_midxed_pack_int_id(m, pos);
> 
> 	if (prepare_midx_pack(r, m, pack_int_id))
> 		return 0;
> 
> With or without cached failure, this should return 0 when and only
> when m->packs[pack_int_id] is a usable pack.  But what about the
> access on the next line?

I think there's a subtlety here with incremental midx's, in that a pack
id can be "global" within the whole midx chain, or a local index into a
specific chain element's list of packs.

In fill_midx_entry(), I think we get such a global id back from
nth_midxed_pack_int_id(). And then when we hand that to
prepare_midx_pack(), it is converted into a local midx/id pair with:

  pack_int_id = midx_for_pack(&m, pack_int_id);

where the end of that function is the same:

  return pack_int_id - m->num_packs_in_base;

you saw elsewhere. So at that point we have a local midx and index,
which is what prepare_midx_pack() fills via the m->packs[pack_int_id]
field.

So when you say "only when m->packs[pack_int_id] is a usable pack", you
are talking about the local m/pack_int_id within that function.

But back in the caller...

> 	p = m->packs[pack_int_id - m->num_packs_in_base];
> 
> Do we have any guarantee that we called prepare_midx_pack() for 
> the pack at (pack_int_id - m->num_packs_in_base)th slot?  Can p
> be NULL here?  And with the magic "we know this fails" value, can p
> be that magic value?

Our pack_int_id is the global one, so it needs to be adjusted. But this
pack pointer we access is the same one that was filled (or not) by
prepare_midx_pack(). So it cannot be NULL or the magic "fails" value,
because prepare_midx_pack() returned 0.

So I think this code is fine.


One thing that did puzzle me: in prepare_midx_pack() we not only adjust
the pack_int_id, but we may walk back through the midx chain to find the
correct multi_pack_index struct. Wouldn't the caller need to do the
same?

The answer is that it does. The midx_for_object() call in
fill_midx_entry() does that same walk, storing the result in its local
"m" variable. So the walk backwards in prepare_midx_pack() is
superfluous for this particular caller, who we know is already handing
us the desired multi_pack_index struct, and it could just do:

  pack_int_id -= m->num_packs_in_base;

rather than calling midx_for_pack(). But the same is not necessarily
true for other callers, so we should continue calling that function.

I suspect this would all be a bit more obvious if prepare_midx_pack()
simply returned the pack pointer, avoiding the need for callers to look
at m->packs at all (and making it a true cache, internally only to
prepare_midx_pack()).

Looking at other callers of prepare_midx_pack():

  - in fill_packs_from_midx(), we do not adjust our "m" to match the
    index. But that is OK, because we adjust our local index (which we
    get by iterating from 0 to m->num_packs) to a global index when
    calling the function:

      if (prepare_midx_pack(ctx->repo, m, m->num_packs_in_base + i))
	...
      open_pack_index(m->packs[i]);

    which is fine.

  - I'm less sure of the call in expire_midx_packs(). It iterates over
    num_packs in the same way, but does:

      if (prepare_midx_pack(r, m, i))
              continue;

    and then looks at m->packs[i]. That would be wrong if "m" is not the
    first item in the chain. Ah, I see. Earlier we do:

      if (m->base_midx)
              die(_("cannot expire packs from an incremental multi-pack-index"));

    so we know that the global and local ids are equivalent in this
    instance (since the "base" midx . Still seems a bit fragile.

  - There's a similar case in midx-write.c:want_included_pack(). That
    one seems to have the same local/global confusion, but I do not
    obviously see anything preventing it from being fed a non-base midx.
    So it might possibly be buggy?

    Likewise fill_included_packs_batch() in the same file.

    In both cases I think if prepare_midx_pack() returned a pointer, we
    could just use it directly.

  - In nth_bitmapped_pack(), we call midx_for_pack() ourselves to get
    the right midx. That's good, but getting a pack pointer from
    prepare_midx_pack() wouldn't help us, because we still look at the
    midx struct for other reasons.

So I dunno. It might be a useful refactor, but it doesn't make the
problem go away entirely. I'm suspicious of those calls in midx-write.c,
but Taylor can probably say more about whether they're wrong or if
there's some less-obvious reason we'd only see a base midx there.

-Peff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-22 16:59         ` Jeff King
@ 2025-05-22 18:44           ` Junio C Hamano
  2025-05-23  1:22           ` Taylor Blau
  1 sibling, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2025-05-22 18:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Taylor Blau

Jeff King <peff@peff.net> writes:

> But back in the caller...
>
>> 	p = m->packs[pack_int_id - m->num_packs_in_base];
>> ...
>
> Our pack_int_id is the global one, so it needs to be adjusted. But this
> pack pointer we access is the same one that was filled (or not) by
> prepare_midx_pack(). So it cannot be NULL or the magic "fails" value,
> because prepare_midx_pack() returned 0.
>
> So I think this code is fine.

Ahh, I missed that call to midx_for_pack() in prepare_midx_pack()
that modifies pack_int_id variable.  Of course, it did not help that
the implementation detail of adjusting by m->num_packs_in_base which
is done in midx_for_pack() is not abstracted out and the caller needs
to do the same as above.  That confused me.

Thanks for an explanation.



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] packfile: avoid access(3p) calls for missing packs
  2025-05-19  6:52   ` Jeff King
  2025-05-19 15:46     ` Junio C Hamano
  2025-05-20  6:45     ` Patrick Steinhardt
@ 2025-05-23  1:02     ` Taylor Blau
  2025-05-23  2:03       ` Jeff King
  2 siblings, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-05-23  1:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Patrick Steinhardt, git

On Mon, May 19, 2025 at 02:52:21AM -0400, Jeff King wrote:
> On Fri, May 16, 2025 at 11:34:10AM -0700, Junio C Hamano wrote:
>
> > > The reason why we do this is likely because we reuse `p->pack_name` to
> > > derive the other files' names, as well, so that we only have to allocate
> > > this buffer once. As such, we have to compute the packfile name last so
> > > that it doesn't get overwritten by the other names.
> >
> > I vaguely recall that the order of checks between .idx and .pack
> > were deliberately chosen, as we do not want to add and attempt to
> > use .pack file before its associated .idx file is ready since
> > without the latter the former is unusable at runtime.
>
> It is definitely intentional and important that we discover packs by
> their .idx file first. The writing side uses the creation of the .idx
> file (via atomic rename) as the final step in making a pack accessible.
> Before that, a reader could see a .pack without its various accessory
> files (and so not realize it has a .keep file, for example).

Yup. Or seeing a pack before its .bitmap is written, in which case any
clones or fetches unfortunate enough to take place after the pack was
read but before the .bitmap was written  would be significantly slower
than they otherwise would have been with bitmaps.

> > > All of this likely doesn't matter in the general case: Git shouldn't end
> > > up looking too many nonexistent packfiles anyway.
> >
> > In any case, we shouldn't attempt to use .pack when its associated
> > files are missing.  It is not about nonexistent but about incomplete
> > (including "in the process of being written").
>
> This shouldn't be a race for files being written. The .idx is written
> last, so the .pack should always be there. And the midx is written after
> that, so again, the pack should always be there. At least for newly
> written packs.
>
> We might see the opposite: packs going away because they were repacked
> (and perhaps even a new midx generated). In that case you might see
> a race like:
>
>   - process R opens the existing midx, which mentions pack P (but does
>     not yet open P)
>
>   - process W repacks, deleting P and generating a new midx with P'
>
>   - process R wants object X; the midx claims it is in pack P, so it
>     tries to open that but fails
>
> What should happen here is that R will fail to find the object at all,
> should hit reprepare_packed_git(), and will then pick up the new pack.
>
> I don't recall offhand whether reprepare_packed_git() will open the new
> midx. If it doesn't, we'd still find the object via the actual pack
> .idx, but we may end up consulting the now-stale midx over and over (so
> we'll get the right answer, but there may be some room for
> optimization).

This all sounds right to me. And we do end up loading a new MIDX via
reprepare_packed_git(): that function calls prepare_packed_git() (which
doesn't immediately return, since we just zero'd out the
r->objects->packed_git_initialized flag). We then enumerate the ODBs,
calling prepare_multi_pack_index_one() and prepare_packed_git_one() for
each.

From there we end up in load_multi_pack_index(), which gets a freshened
view of the MIDX file. If all goes well there, then
prepare_multi_pack_index_one() assigns it to
r->objects->multi_pack_index as expected.

> The code in prepare_midx_pack() converts a numeric pack id (referenced
> inside the midx) into a "struct packed_git" pointer, caching the results
> in multi_pack_index->packs. That field holds NULL for "we have not
> looked it up yet" or a valid pointer to a packed_git. It probably needs
> to hold a third state: "we tried and failed".

Yup, that seems like a good direction to me.

> [1] There probably are optimization opportunities in add_packed_git(). I
>     don't think re-ordering will help much in the common case that we
>     actually do have the pack. But really, most callers do not care
>     about these auxiliary files at all! We could simply skip them during
>     the initial setup and lazy-load them via accessor functions.

Nice idea. I initially though, "that's not right, we definitely need to
populate the flag bits, e.g., p->is_cruft". But that was before I read
"and lazy-load them via accessor functions". That turns those fields
into a tri-state, which is a nice way to avoid this altogether.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 1/2] packfile: explain ordering of how we look up auxiliary pack files
  2025-05-20  9:53   ` [PATCH v2 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
@ 2025-05-23  1:03     ` Taylor Blau
  0 siblings, 0 replies; 53+ messages in thread
From: Taylor Blau @ 2025-05-23  1:03 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano

On Tue, May 20, 2025 at 11:53:09AM +0200, Patrick Steinhardt wrote:
> Add a comment to `add_packed_git()` to make future readers aware of this
> ordering requirement.

Thanks, I am really glad that you added this comment to aide future
readers.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-22 16:59         ` Jeff King
  2025-05-22 18:44           ` Junio C Hamano
@ 2025-05-23  1:22           ` Taylor Blau
  2025-05-23  2:08             ` Jeff King
  1 sibling, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-05-23  1:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Patrick Steinhardt, git

On Thu, May 22, 2025 at 12:59:24PM -0400, Jeff King wrote:
> > By the way, I suspect I am not reading the code correctly, but I am
> > not sure what fill_midx_entry() does with a failed case.
> >
> > 	midx_for_object(&m, pos);
> > 	pack_int_id = nth_midxed_pack_int_id(m, pos);
> >
> > 	if (prepare_midx_pack(r, m, pack_int_id))
> > 		return 0;
> >
> > With or without cached failure, this should return 0 when and only
> > when m->packs[pack_int_id] is a usable pack.  But what about the
> > access on the next line?
>
> I think there's a subtlety here with incremental midx's, in that a pack
> id can be "global" within the whole midx chain, or a local index into a
> specific chain element's list of packs.

If you'll indulge me in a bit of pedantry for a moment, I would add that
the pack_int_id is *always* global within the whole MIDX chain. It only
happens to additionally be the correct local index when
m->num_packs_in_base is zero.

The thing here:

    m->packs[pack_int_id - m->num_packs_in_base];
             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

is indexing into the current layer's pack array, which only contains the
packs at that layer, so the index there is no longer really a
pack_int_id at all.

But...

> In fill_midx_entry(), I think we get such a global id back from
> nth_midxed_pack_int_id(). And then when we hand that to
> prepare_midx_pack(), it is converted into a local midx/id pair with:
>
>   pack_int_id = midx_for_pack(&m, pack_int_id);

...this confusion is really my fault, since the thing midx_for_pack() is
returning isn't really a pack_int_id in the classic sense at all. I
think I chose this name to avoid making the patch noisy, and to avoid
the possibility of accidentally using the global identifier when you
meant to use the local index.

(As an aside, I think I recall you suggesting a while ago that it might
be interesting to define "global" things with a different type than
"local" ones to prevent this sort of confusion. That would allow us to
keep both "pack_int_id" and the return value of "midx_for_pack()" in
scope at the same time, without the possibility of using one when you
meant to use the other.)

> where the end of that function is the same:
>
>   return pack_int_id - m->num_packs_in_base;
>
> you saw elsewhere. So at that point we have a local midx and index,
> which is what prepare_midx_pack() fills via the m->packs[pack_int_id]
> field.
>
> So when you say "only when m->packs[pack_int_id] is a usable pack", you
> are talking about the local m/pack_int_id within that function.
>
> But back in the caller...
>
> > 	p = m->packs[pack_int_id - m->num_packs_in_base];
> >
> > Do we have any guarantee that we called prepare_midx_pack() for
> > the pack at (pack_int_id - m->num_packs_in_base)th slot?  Can p
> > be NULL here?  And with the magic "we know this fails" value, can p
> > be that magic value?
>
> Our pack_int_id is the global one, so it needs to be adjusted. But this
> pack pointer we access is the same one that was filled (or not) by
> prepare_midx_pack(). So it cannot be NULL or the magic "fails" value,
> because prepare_midx_pack() returned 0.
>
> So I think this code is fine.

Thanks for the analysis.

> One thing that did puzzle me: in prepare_midx_pack() we not only adjust
> the pack_int_id, but we may walk back through the midx chain to find the
> correct multi_pack_index struct. Wouldn't the caller need to do the
> same?
>
> The answer is that it does. The midx_for_object() call in
> fill_midx_entry() does that same walk, storing the result in its local
> "m" variable. So the walk backwards in prepare_midx_pack() is
> superfluous for this particular caller, who we know is already handing
> us the desired multi_pack_index struct, and it could just do:
>
>   pack_int_id -= m->num_packs_in_base;

Right. You could imagine that after getting the pack_int_id back from
nth_midxed_pack_int_id(), we could do:

    midx_for_pack(&m, pack_int_id);

, which would be a noop because of the chain invariant that we never
duplicate objects or packs anywhere in the chain. (IOW, if you know
object X is in pack Y, and you move your MIDX pointer to the layer
containing X, you are guaranteed to be able to find Y in that layer's
array of packs.)

> rather than calling midx_for_pack(). But the same is not necessarily
> true for other callers, so we should continue calling that function.

...exactly ;-).

> I suspect this would all be a bit more obvious if prepare_midx_pack()
> simply returned the pack pointer, avoiding the need for callers to look
> at m->packs at all (and making it a true cache, internally only to
> prepare_midx_pack()).
>
> Looking at other callers of prepare_midx_pack():
>
>   - in fill_packs_from_midx(), we do not adjust our "m" to match the
>     index. But that is OK, because we adjust our local index (which we
>     get by iterating from 0 to m->num_packs) to a global index when
>     calling the function:
>
>       if (prepare_midx_pack(ctx->repo, m, m->num_packs_in_base + i))
> 	...
>       open_pack_index(m->packs[i]);
>
>     which is fine.
>
>   - I'm less sure of the call in expire_midx_packs(). It iterates over
>     num_packs in the same way, but does:
>
>       if (prepare_midx_pack(r, m, i))
>               continue;
>
>     and then looks at m->packs[i]. That would be wrong if "m" is not the
>     first item in the chain. Ah, I see. Earlier we do:
>
>       if (m->base_midx)
>               die(_("cannot expire packs from an incremental multi-pack-index"));
>
>     so we know that the global and local ids are equivalent in this
>     instance (since the "base" midx . Still seems a bit fragile.

Yeah, I think you can only meaningfully expire packs from the most
recent layer, which is why I added that guard. I agree it is still
fragile, though.

>   - There's a similar case in midx-write.c:want_included_pack(). That
>     one seems to have the same local/global confusion, but I do not
>     obviously see anything preventing it from being fed a non-base midx.
>     So it might possibly be buggy?

Yeah, this spot is definitely broken. At minimum it would need something
like:

--- 8< ---
diff --git a/midx-write.c b/midx-write.c
index 0897cbd829..54a04f7b75 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1634,9 +1634,10 @@ static int want_included_pack(struct repository *r,
 			      uint32_t pack_int_id)
 {
 	struct packed_git *p;
+	midx_for_pack(&m, pack_int_id);
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return 0;
-	p = m->packs[pack_int_id];
+	p = m->packs[pack_int_id - m->num_packs_in_base];
 	if (!pack_kept_objects && p->pack_keep)
 		return 0;
 	if (p->is_cruft)
--- >8 ---

>     Likewise fill_included_packs_batch() in the same file.

I think this one is actually OK for the same reason as the
expire_midx_packs() case. Its sole caller in midx_repack() has:

    if (m->base_midx)
        die(_("cannot repack an incremental multi-pack-index"));

, so we are OK there. We might want to add an ASSERT() in
fill_included_packs_batch() to make it clearer, though.

>     In both cases I think if prepare_midx_pack() returned a pointer, we
>     could just use it directly.

Agreed.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-22  5:32     ` Jeff King
  2025-05-22 15:47       ` Junio C Hamano
@ 2025-05-23  1:31       ` Taylor Blau
  2025-05-23  2:18         ` Jeff King
  1 sibling, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-05-23  1:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Junio C Hamano

On Thu, May 22, 2025 at 01:32:35AM -0400, Jeff King wrote:
> On Tue, May 20, 2025 at 11:53:10AM +0200, Patrick Steinhardt wrote:
>
> > @@ -458,6 +458,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
> >
> >  	pack_int_id = midx_for_pack(&m, pack_int_id);
> >
> > +	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
> > +		return 1;
> >  	if (m->packs[pack_int_id])
> >  		return 0;
>
> I did wonder while writing this if we might be able to hide the magic
> number and gross casting inside a constant or macro. I think just:
>
>   #define MIDX_PACK_ERROR ((void *)(intptr_t)-1)
>
> would be enough?
>
> Though...

I agree with the longer-term goal of having prepare_midx_pack() just
return a pointer to a struct packed_git. But in the meantime, I do think
having a #define for the "oops, I tried to load this packfile and it was
broken" case is a good idea.

> > @@ -495,6 +499,8 @@ struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
> >  				   uint32_t pack_int_id)
> >  {
> >  	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
> > +	if (m->packs[local_pack_int_id] == (void *)(intptr_t)-1)
> > +		return NULL;
> >  	return m->packs[local_pack_int_id];
>
> Yuck, yet another spot that needs to be aware of the new tri-state
> value. One alternative is using an auxiliary array to cache the errors,
> and then only the lookup function needs to care. Like:

I like this direction, though I dislike having a separate array that we
need to keep in sync with m->packs. It might be nice to have an array
like:

  struct {
      struct packed_git *p;
      unsigned err:1;
  } *packs;

, which would allow you to keep the error state next to the packed_git
itself.

I wonder if changing the signature to:

    int prepare_midx_pack(struct repository *r,
                          struct multi_pack_index *m,
                          uint32_t pack_int_id,
                          struct packed_git **p_out);

would be a good idea. It allows you to pass garbage input (like a
non-existent pack_int_id) and get a useful error back. It also allows
you to pass a pack_int_id that is valid, but cannot be loaded and get a
useful error back via the return value.

But I think without actually trying it and seeing what the fallout looks
like, it's hard to say whether or not the above is a step in the right
direction.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH] packfile: avoid access(3p) calls for missing packs
  2025-05-23  1:02     ` Taylor Blau
@ 2025-05-23  2:03       ` Jeff King
  0 siblings, 0 replies; 53+ messages in thread
From: Jeff King @ 2025-05-23  2:03 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, Patrick Steinhardt, git

On Thu, May 22, 2025 at 09:02:06PM -0400, Taylor Blau wrote:

> > I don't recall offhand whether reprepare_packed_git() will open the new
> > midx. If it doesn't, we'd still find the object via the actual pack
> > .idx, but we may end up consulting the now-stale midx over and over (so
> > we'll get the right answer, but there may be some room for
> > optimization).
> 
> This all sounds right to me. And we do end up loading a new MIDX via
> reprepare_packed_git(): that function calls prepare_packed_git() (which
> doesn't immediately return, since we just zero'd out the
> r->objects->packed_git_initialized flag). We then enumerate the ODBs,
> calling prepare_multi_pack_index_one() and prepare_packed_git_one() for
> each.
> 
> From there we end up in load_multi_pack_index(), which gets a freshened
> view of the MIDX file. If all goes well there, then
> prepare_multi_pack_index_one() assigns it to
> r->objects->multi_pack_index as expected.

One thing that puzzled me here is how prepare_multi_pack_index_one()
deals with an existing r->objects->multi_pack_index:

          m = load_multi_pack_index(r, object_dir, local);
  
          if (m) {
                  struct multi_pack_index *mp = r->objects->multi_pack_index;
                  if (mp) {
                          m->next = mp->next;
                          mp->next = m;
                  } else
                          r->objects->multi_pack_index = m;
                  return 1;
          }

I don't see reprepare_packed_git() zero-ing out the existing midx
pointer. So would we hit that "if (mp)" block, in which case we seem to
create a loop of next pointers? But the old midx would still be the
first one in the loop. Would we still consult it?

Or am I just reading this totally wrong?

-Peff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-23  1:22           ` Taylor Blau
@ 2025-05-23  2:08             ` Jeff King
  2025-05-23 17:46               ` Taylor Blau
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2025-05-23  2:08 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, Patrick Steinhardt, git

On Thu, May 22, 2025 at 09:22:23PM -0400, Taylor Blau wrote:

> > I think there's a subtlety here with incremental midx's, in that a pack
> > id can be "global" within the whole midx chain, or a local index into a
> > specific chain element's list of packs.
> 
> If you'll indulge me in a bit of pedantry for a moment, I would add that
> the pack_int_id is *always* global within the whole MIDX chain. It only
> happens to additionally be the correct local index when
> m->num_packs_in_base is zero.

Yeah, I agree with that...

> The thing here:
> 
>     m->packs[pack_int_id - m->num_packs_in_base];
>              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> is indexing into the current layer's pack array, which only contains the
> packs at that layer, so the index there is no longer really a
> pack_int_id at all.

...and this...

> > In fill_midx_entry(), I think we get such a global id back from
> > nth_midxed_pack_int_id(). And then when we hand that to
> > prepare_midx_pack(), it is converted into a local midx/id pair with:
> >
> >   pack_int_id = midx_for_pack(&m, pack_int_id);
> 
> ...this confusion is really my fault, since the thing midx_for_pack() is
> returning isn't really a pack_int_id in the classic sense at all. I
> think I chose this name to avoid making the patch noisy, and to avoid
> the possibility of accidentally using the global identifier when you
> meant to use the local index.

...but yeah, this is the part that got me. ;) I think elsewhere you
called it local_pack_int_id, in nth_midxed_pack(). Which made things
more clear (though still calls it a pack_int_id ;) ).

> (As an aside, I think I recall you suggesting a while ago that it might
> be interesting to define "global" things with a different type than
> "local" ones to prevent this sort of confusion. That would allow us to
> keep both "pack_int_id" and the return value of "midx_for_pack()" in
> scope at the same time, without the possibility of using one when you
> meant to use the other.)

Yeah. The trouble is that it becomes awkward in C, since the language
will happily intermix two integer typedefs. So you have to wrap them in
a struct and access the struct fields everywhere.

> >   - There's a similar case in midx-write.c:want_included_pack(). That
> >     one seems to have the same local/global confusion, but I do not
> >     obviously see anything preventing it from being fed a non-base midx.
> >     So it might possibly be buggy?
> 
> Yeah, this spot is definitely broken. At minimum it would need something
> like:
> 
> --- 8< ---
> diff --git a/midx-write.c b/midx-write.c
> index 0897cbd829..54a04f7b75 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1634,9 +1634,10 @@ static int want_included_pack(struct repository *r,
>  			      uint32_t pack_int_id)
>  {
>  	struct packed_git *p;
> +	midx_for_pack(&m, pack_int_id);
>  	if (prepare_midx_pack(r, m, pack_int_id))
>  		return 0;
> -	p = m->packs[pack_int_id];
> +	p = m->packs[pack_int_id - m->num_packs_in_base];
>  	if (!pack_kept_objects && p->pack_keep)
>  		return 0;
>  	if (p->is_cruft)
> --- >8 ---

Yep, that's exactly what I was envisioning. I guess it's probably
possible to trigger in practice by writing a new midx based on an
existing incremental state. I'll let you figure that part out. :)

> >     Likewise fill_included_packs_batch() in the same file.
> 
> I think this one is actually OK for the same reason as the
> expire_midx_packs() case. Its sole caller in midx_repack() has:
> 
>     if (m->base_midx)
>         die(_("cannot repack an incremental multi-pack-index"));
> 
> , so we are OK there. We might want to add an ASSERT() in
> fill_included_packs_batch() to make it clearer, though.

Ah, good. Yes, I agree that an assertion would be a nice bit of
documentation/safety. Though for these cases I think they only care
about the pack (not the containing midx), so changing the return type of
prepare_midx_pack() might be an even easier way to get that safety.

-Peff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-23  1:31       ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Taylor Blau
@ 2025-05-23  2:18         ` Jeff King
  0 siblings, 0 replies; 53+ messages in thread
From: Jeff King @ 2025-05-23  2:18 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Patrick Steinhardt, git, Junio C Hamano

On Thu, May 22, 2025 at 09:31:48PM -0400, Taylor Blau wrote:

> > Yuck, yet another spot that needs to be aware of the new tri-state
> > value. One alternative is using an auxiliary array to cache the errors,
> > and then only the lookup function needs to care. Like:
> 
> I like this direction, though I dislike having a separate array that we
> need to keep in sync with m->packs. It might be nice to have an array
> like:
> 
>   struct {
>       struct packed_git *p;
>       unsigned err:1;
>   } *packs;
> 
> , which would allow you to keep the error state next to the packed_git
> itself.

In general, yes, I think array-of-struct is better than struct-of-array.
In this particular case the latter is not too bad because the management
is all handled centrally in the midx constructor.

The downside of doing array-of-struct as you propose is that every site
that uses it will need to be modified. But that is at least a one-time
pain and not an ongoing maintenance burden (unlike the "magic" value
approach).

> I wonder if changing the signature to:
> 
>     int prepare_midx_pack(struct repository *r,
>                           struct multi_pack_index *m,
>                           uint32_t pack_int_id,
>                           struct packed_git **p_out);
> 
> would be a good idea. It allows you to pass garbage input (like a
> non-existent pack_int_id) and get a useful error back. It also allows
> you to pass a pack_int_id that is valid, but cannot be loaded and get a
> useful error back via the return value.

Would the return value be a richer set of values than the current
success/fail? If not, then I think just returning the pack pointer does
that fine.

I was also tempted to suggest that it should take a "struct
multi_pack_index **", to return the matching midx as an out-parameter.
That would "just work" for callers that want to look at the surrounding
midx, too.

But maybe it gets weird for ones that are (correctly) expecting to find
the pack within the same midx. I.e., code like this:

  for (i = 0; i < m->num_packs; i++) {
	if (prepare_midx_pack(r, m, i + m->num_packs_in_base))
		die(...);
	/* do something with m->pack[i] */
  }

is correct now, and we _shouldn't_ ever need to switch to a different
"m" inside prepare_midx_pack(). But if we ever did, propagating that up
to the caller would be mighty confusing.

So perhaps better to leave it as-is, and let the caller explicitly do
midx_for_pack() or whatever to find the right "m" as necessary.

> But I think without actually trying it and seeing what the fallout looks
> like, it's hard to say whether or not the above is a step in the right
> direction.

Yup. All of this can wait, too. Patrick's series is fixing its own
localized issue (however he wants to structure the extra bit of
storage). Most of what we're talking about here is future-proofing so it
can happen at our leisure. I think the only other actual bug is the
want_included_pack() one discussed in the other part of the thread.

-Peff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-23  2:08             ` Jeff King
@ 2025-05-23 17:46               ` Taylor Blau
  2025-05-25 18:41                 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
  0 siblings, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-05-23 17:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Patrick Steinhardt, git

On Thu, May 22, 2025 at 10:08:20PM -0400, Jeff King wrote:
> > (As an aside, I think I recall you suggesting a while ago that it might
> > be interesting to define "global" things with a different type than
> > "local" ones to prevent this sort of confusion. That would allow us to
> > keep both "pack_int_id" and the return value of "midx_for_pack()" in
> > scope at the same time, without the possibility of using one when you
> > meant to use the other.)
>
> Yeah. The trouble is that it becomes awkward in C, since the language
> will happily intermix two integer typedefs. So you have to wrap them in
> a struct and access the struct fields everywhere.

Yup :-<.

> > >   - There's a similar case in midx-write.c:want_included_pack(). That
> > >     one seems to have the same local/global confusion, but I do not
> > >     obviously see anything preventing it from being fed a non-base midx.
> > >     So it might possibly be buggy?
> >
> > Yeah, this spot is definitely broken. At minimum it would need something
> > like:
> >
> > --- 8< ---
> > diff --git a/midx-write.c b/midx-write.c
> > index 0897cbd829..54a04f7b75 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -1634,9 +1634,10 @@ static int want_included_pack(struct repository *r,
> >  			      uint32_t pack_int_id)
> >  {
> >  	struct packed_git *p;
> > +	midx_for_pack(&m, pack_int_id);
> >  	if (prepare_midx_pack(r, m, pack_int_id))
> >  		return 0;
> > -	p = m->packs[pack_int_id];
> > +	p = m->packs[pack_int_id - m->num_packs_in_base];
> >  	if (!pack_kept_objects && p->pack_keep)
> >  		return 0;
> >  	if (p->is_cruft)
> > --- >8 ---
>
> Yep, that's exactly what I was envisioning. I guess it's probably
> possible to trigger in practice by writing a new midx based on an
> existing incremental state. I'll let you figure that part out. :)

Hmm. I thought that this spot was broken last night, but looking at it
again today I think that it is actually OK. I started writing an
analysis of why in response to your email, and then decided to throw it
away since those details really should live in the patch message. Here's
what I came up with:

--- 8< ---
Subject: [PATCH] midx-write.c: guard against incremental MIDXs in
 want_included_pack()

The function want_included_pack() is used to determine whether or not a
the packfile corresponding to some given pack_int_id should be included
in a 'git multi-pack-index repack' operation.

This spot looks like it would be broken, particularly in:

    struct packed_git *p;
    if (prepare_midx_pack(r, m, pack_int_id))
        return 0;
    p = m->packs[pack_int_id];

, when pack_int_id is greater than m->num_pack_in_base (supposing that
m->num_packs_in_base is non-zero, or equivalently that m->base_midx is
non-NULL).

Suppose we have two MIDXs in an incremental MIDX chain, each having two
packs:

  - M0 = {packs: [P0, P1], base_midx: NULL}
  - M1 = {packs: [P2, P3], base_midx: M0}

noting that each pack is identified by its global pack_int_id within the
chain.

So if you do something like:

    want_included_pack(the_repository, M1, pack_kept_objects, 2);

that function will call prepare_midx_pack(), which is smart enough to
realize that the pack of interest is in the current layer (M1), and
knows how to adjust its global pack_int_id into an index into the
current layer's 'packs' array.

But the following line:

    p = m->packs[pack_int_id]; /* m is M1, pack_int_id is 2 */

looks broken, since each layer of the MIDX only maintains an array of
the packs stored within that layer, and 'm' wasn't adjusted back to
point at M1->base_midx (M0).

The right thing to do would be:

    struct packed_git *p;
    if (prepare_midx_pack(r, m, pack_int_id))
        return 0;

    /* open-code midx.c::midx_for_pack(), which is private */
    while (m && pack_int_id < m->num_packs_in_base)
        m = m->base_midx;

    if (!m)
        BUG("broken midx?");
    if (pack_int_id >= m->num_packs + m->num_packs_in_base)
        BUG("broken pack_int_id?");

    p = m->packs[pack_int_id - m->num_packs_in_base];

But that would be overkill, since this function never deals with
incremental MIDXs having more than one layer! To see why, observe that
want_included_pack() has two callers:

  - midx-write.c::fill_included_packs_all()
  - midx-write.c::fill_included_packs_batch()

and those functions' sole caller is in midx-write.c::midx_repack(),
which dispatches a call to one or the other depending on whether or not
the batch_size is non-zero.

And at the very top of midx_repack(), we have a guard against
non-trivial incremental MIDX chains:

    if (m->base_midx)
        die(_("cannot repack an incremental multi-pack-index"));

So want_included_pack() is OK becuase it will never encounter a
situation where it has to chase backwards through the '->base_midx'
pointer. But that is not immediately clear from reading the code, and is
too fragile for my comfort. Make this more clear by adding an ASSERT()
to the above effect.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/midx-write.c b/midx-write.c
index 0897cbd829..991c42d1dc 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1634,6 +1634,9 @@ static int want_included_pack(struct repository *r,
 			      uint32_t pack_int_id)
 {
 	struct packed_git *p;
+
+	ASSERT(m && !m->base_midx);
+
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return 0;
 	p = m->packs[pack_int_id];
@@ -1653,6 +1656,8 @@ static void fill_included_packs_all(struct repository *r,
 	uint32_t i;
 	int pack_kept_objects = 0;

+	ASSERT(m && !m->base_midx);
+
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);

 	for (i = 0; i < m->num_packs; i++) {
@@ -1673,6 +1678,8 @@ static void fill_included_packs_batch(struct repository *r,
 	struct repack_info *pack_info;
 	int pack_kept_objects = 0;

+	ASSERT(m && !m->base_midx);
+
 	CALLOC_ARRAY(pack_info, m->num_packs);

 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
--
2.49.0.221.g485f5f8636.dirty
--- >8 ---

> > >     Likewise fill_included_packs_batch() in the same file.
> >
> > I think this one is actually OK for the same reason as the
> > expire_midx_packs() case. Its sole caller in midx_repack() has:
> >
> >     if (m->base_midx)
> >         die(_("cannot repack an incremental multi-pack-index"));
> >
> > , so we are OK there. We might want to add an ASSERT() in
> > fill_included_packs_batch() to make it clearer, though.
>
> Ah, good. Yes, I agree that an assertion would be a nice bit of
> documentation/safety. Though for these cases I think they only care
> about the pack (not the containing midx), so changing the return type of
> prepare_midx_pack() might be an even easier way to get that safety.

Ironically, the same argument applies for this function (and the _all()
variant) as well ;-).

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics
  2025-05-23 17:46               ` Taylor Blau
@ 2025-05-25 18:41                 ` Taylor Blau
  2025-05-25 18:41                   ` [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack Taylor Blau
                                     ` (5 more replies)
  0 siblings, 6 replies; 53+ messages in thread
From: Taylor Blau @ 2025-05-25 18:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

[This brief series is based on ps/midx-negative-packfile-cache].

This series implements a suggestion from this thread to have
prepare_midx_pack() return a pointer to packed_git structure identified
by the given pack_int_id.

This convention makes converting a pack_int_id into a bona-fide pack
significantly less error-prone, particularly when dealing with
incremental MIDXs.

Along the way, I noticed a couple of opportunities for clean-up and
light refactorings, which make up the first four patches.

In the future, I think we could take this further by:

  - applying the same pattern to the 'pack_names' array, and

  - renaming both arrays something like 'packs_' or similar to indicate
    that they are "private" and should not be accessed outside of
    midx.c.

We might be able to go even further than that by making the entire
structure opaque to external callers by defining it within midx.c
itself. We don't get there in this series, but it sets us on the right
direction.

Thanks in advance for your review!

Taylor Blau (5):
  pack-bitmap.c: fix broken warning() when missing MIDX'd pack
  midx-write.c: guard against incremental MIDXs in want_included_pack()
  midx-write.c: simplify fill_packs_from_midx()
  midx-write.c: extract inner loop from fill_packs_from_midx()
  midx: return a `packed_git` pointer from `prepare_midx_pack()`

 midx-write.c  | 114 +++++++++++++++++++++++++++++++-------------------
 midx.c        |  81 ++++++++++++++++++-----------------
 midx.h        |   4 +-
 pack-bitmap.c |   4 +-
 4 files changed, 115 insertions(+), 88 deletions(-)


base-commit: 73fe0882eec06f7b9ea021fe062a32cd1731e574
-- 
2.49.0.641.gb9c9c4c3bd

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack
  2025-05-25 18:41                 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
@ 2025-05-25 18:41                   ` Taylor Blau
  2025-05-26  7:23                     ` Patrick Steinhardt
  2025-05-25 18:41                   ` [PATCH 2/5] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
                                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-05-25 18:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

In commit 44f9fd6496 (pack-bitmap.c: check preferred pack validity when
opening MIDX bitmap, 2022-05-24) we started opening all packs contained
within a MIDX when loading its corresponding bitmap.

However, if a pack is missing, then we will emit a warning like:

    warning: could not open pack pack-$HASH.pack

Later on commit f31a17cea5 (pack-bitmap.c: open and store incremental
bitmap layers, 2025-03-20) updated this code to work with incremental
MIDX bitmaps, but did not adjust the index into the 'pack_names' field.

So if there is a pack in an incremental MIDX chain with a pack in a MIDX
layer with a non-zero number of packs in its base layer(s) (in other
words, any MIDX layer outside of the first one) that cannot be loaded,
we will do an out-of-bounds lookup.

Adjust the lookup into the 'pack_names' array by the number of packs in
the base to prevent a potential SIGSEGV here.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index b9f1d86604..99c4927e9c 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -490,7 +490,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 	for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
 		if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
 			warning(_("could not open pack %s"),
-				bitmap_git->midx->pack_names[i]);
+				bitmap_git->midx->pack_names[i - bitmap_git->midx->num_packs_in_base]);
 			goto cleanup;
 		}
 	}
-- 
2.49.0.641.gb9c9c4c3bd


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 2/5] midx-write.c: guard against incremental MIDXs in want_included_pack()
  2025-05-25 18:41                 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
  2025-05-25 18:41                   ` [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack Taylor Blau
@ 2025-05-25 18:41                   ` Taylor Blau
  2025-05-26  7:23                     ` Patrick Steinhardt
  2025-05-25 18:41                   ` [PATCH 3/5] midx-write.c: simplify fill_packs_from_midx() Taylor Blau
                                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-05-25 18:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

The function want_included_pack() is used to determine whether or not a
the packfile corresponding to some given pack_int_id should be included
in a 'git multi-pack-index repack' operation.

This spot looks like it would be broken, particularly in:

    struct packed_git *p;
    if (prepare_midx_pack(r, m, pack_int_id))
        return 0;
    p = m->packs[pack_int_id];

, when pack_int_id is greater than m->num_pack_in_base (supposing that
m->num_packs_in_base is non-zero, or equivalently that m->base_midx is
non-NULL).

Suppose we have two MIDXs in an incremental MIDX chain, each having two
packs:

  - M0 = {packs: [P0, P1], base_midx: NULL}
  - M1 = {packs: [P2, P3], base_midx: M0}

noting that each pack is identified by its global pack_int_id within the
chain.

So if you do something like:

    want_included_pack(the_repository, M1, pack_kept_objects, 2);

that function will call prepare_midx_pack(), which is smart enough to
realize that the pack of interest is in the current layer (M1), and
knows how to adjust its global pack_int_id into an index into the
current layer's 'packs' array.

But the following line:

    p = m->packs[pack_int_id]; /* m is M1, pack_int_id is 2 */

looks broken, since each layer of the MIDX only maintains an array of
the packs stored within that layer, and 'm' wasn't adjusted back to
point at M1->base_midx (M0).

The right thing to do would be:

    struct packed_git *p;
    if (prepare_midx_pack(r, m, pack_int_id))
        return 0;

    /* open-code midx.c::midx_for_pack(), which is private */
    while (m && pack_int_id < m->num_packs_in_base)
        m = m->base_midx;

    if (!m)
        BUG("broken midx?");
    if (pack_int_id >= m->num_packs + m->num_packs_in_base)
        BUG("broken pack_int_id?");

    p = m->packs[pack_int_id - m->num_packs_in_base];

But that would be overkill, since this function never deals with
incremental MIDXs having more than one layer! To see why, observe that
want_included_pack() has two callers:

  - midx-write.c::fill_included_packs_all()
  - midx-write.c::fill_included_packs_batch()

and those functions' sole caller is in midx-write.c::midx_repack(),
which dispatches a call to one or the other depending on whether or not
the batch_size is non-zero.

And at the very top of midx_repack(), we have a guard against
non-trivial incremental MIDX chains:

    if (m->base_midx)
        die(_("cannot repack an incremental multi-pack-index"));

So want_included_pack() is OK becuase it will never encounter a
situation where it has to chase backwards through the '->base_midx'
pointer. But that is not immediately clear from reading the code, and is
too fragile for my comfort. Make this more clear by adding an ASSERT()
to the above effect.

Apply the same treatment to each of the fill_included_packs-related
functions as well, since those are deceptively OK by the same reasoning.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/midx-write.c b/midx-write.c
index dd3b3070e5..e4a3830d45 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1636,6 +1636,9 @@ static int want_included_pack(struct repository *r,
 			      uint32_t pack_int_id)
 {
 	struct packed_git *p;
+
+	ASSERT(m && !m->base_midx);
+
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return 0;
 	p = m->packs[pack_int_id];
@@ -1655,6 +1658,8 @@ static void fill_included_packs_all(struct repository *r,
 	uint32_t i;
 	int pack_kept_objects = 0;
 
+	ASSERT(m && !m->base_midx);
+
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
@@ -1675,6 +1680,8 @@ static void fill_included_packs_batch(struct repository *r,
 	struct repack_info *pack_info;
 	int pack_kept_objects = 0;
 
+	ASSERT(m && !m->base_midx);
+
 	CALLOC_ARRAY(pack_info, m->num_packs);
 
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
-- 
2.49.0.641.gb9c9c4c3bd


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 3/5] midx-write.c: simplify fill_packs_from_midx()
  2025-05-25 18:41                 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
  2025-05-25 18:41                   ` [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack Taylor Blau
  2025-05-25 18:41                   ` [PATCH 2/5] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
@ 2025-05-25 18:41                   ` Taylor Blau
  2025-05-26  7:23                     ` Patrick Steinhardt
  2025-05-25 18:42                   ` [PATCH 4/5] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
                                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-05-25 18:41 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

When enumerating packs within fill_packs_from_midx(), we open the pack's
index iff we are either writing a reverse index, have a preferred pack
(by name) or both.

Let's simplify this into a single case by setting the
MIDX_WRITE_REV_INDEX flag bit when we have a preferred_pack_name. This
is a little bit of a shortcut to reduce the line length in the loop
below. But it sets us up nicely to extract the inner loop of this
function out into its own function, where we will no longer have to pass
the preferred_pack_name.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index e4a3830d45..7802a4b694 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -943,6 +943,19 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
 {
 	struct multi_pack_index *m;
 
+	if (preferred_pack_name) {
+		/*
+		 * If a preferred pack is specified, need to have
+		 * packed_git's loaded to ensure the chosen preferred
+		 * pack has a non-zero object count.
+		 *
+		 * Trick ourselves into thinking that we're writing a
+		 * reverse index in this case in order to open up the
+		 * pack index file.
+		 */
+		flags |= MIDX_WRITE_REV_INDEX;
+	}
+
 	for (m = ctx->m; m; m = m->base_midx) {
 		uint32_t i;
 
@@ -953,13 +966,8 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
 			 * If generating a reverse index, need to have
 			 * packed_git's loaded to compare their
 			 * mtimes and object count.
-			 *
-			 * If a preferred pack is specified, need to
-			 * have packed_git's loaded to ensure the chosen
-			 * preferred pack has a non-zero object count.
 			 */
-			if (flags & MIDX_WRITE_REV_INDEX ||
-			    preferred_pack_name) {
+			if (flags & MIDX_WRITE_REV_INDEX) {
 				if (prepare_midx_pack(ctx->repo, m,
 						      m->num_packs_in_base + i)) {
 					error(_("could not load pack"));
-- 
2.49.0.641.gb9c9c4c3bd


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 4/5] midx-write.c: extract inner loop from fill_packs_from_midx()
  2025-05-25 18:41                 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
                                     ` (2 preceding siblings ...)
  2025-05-25 18:41                   ` [PATCH 3/5] midx-write.c: simplify fill_packs_from_midx() Taylor Blau
@ 2025-05-25 18:42                   ` Taylor Blau
  2025-05-25 18:42                   ` [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
  2025-05-28 22:58                   ` [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics Taylor Blau
  5 siblings, 0 replies; 53+ messages in thread
From: Taylor Blau @ 2025-05-25 18:42 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

The function fill_packs_from_midx() does relatively little, but ends up
in a doubly-nested loop because we're enumerating each pack within each
layer of the incremental MIDX chain.

Let's de-dent the inner loop of fill_packs_from_midx() by extracting its
contents into a separate function, and calling that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 62 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 36 insertions(+), 26 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 7802a4b694..2afb67d728 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -938,6 +938,38 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 	return result;
 }
 
+static int fill_packs_from_midx_1(struct write_midx_context *ctx,
+				  struct multi_pack_index *m,
+				  uint32_t flags)
+{
+	for (uint32_t i = 0; i < m->num_packs; i++) {
+		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+
+		/*
+		 * If generating a reverse index, need to have
+		 * packed_git's loaded to compare their
+		 * mtimes and object count.
+		 */
+		if (flags & MIDX_WRITE_REV_INDEX) {
+			if (prepare_midx_pack(ctx->repo, m,
+					      m->num_packs_in_base + i)) {
+				error(_("could not load pack"));
+				return 1;
+			}
+
+			if (open_pack_index(m->packs[i]))
+				die(_("could not open index for %s"),
+				    m->packs[i]->pack_name);
+		}
+
+		fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
+			       m->pack_names[i],
+			       m->num_packs_in_base + i);
+	}
+
+	return 0;
+}
+
 static int fill_packs_from_midx(struct write_midx_context *ctx,
 				const char *preferred_pack_name, uint32_t flags)
 {
@@ -957,33 +989,11 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
 	}
 
 	for (m = ctx->m; m; m = m->base_midx) {
-		uint32_t i;
-
-		for (i = 0; i < m->num_packs; i++) {
-			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-
-			/*
-			 * If generating a reverse index, need to have
-			 * packed_git's loaded to compare their
-			 * mtimes and object count.
-			 */
-			if (flags & MIDX_WRITE_REV_INDEX) {
-				if (prepare_midx_pack(ctx->repo, m,
-						      m->num_packs_in_base + i)) {
-					error(_("could not load pack"));
-					return 1;
-				}
-
-				if (open_pack_index(m->packs[i]))
-					die(_("could not open index for %s"),
-					    m->packs[i]->pack_name);
-			}
-
-			fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
-				       m->pack_names[i],
-				       m->num_packs_in_base + i);
-		}
+		int ret = fill_packs_from_midx_1(ctx, m, flags);
+		if (ret)
+			return ret;
 	}
+
 	return 0;
 }
 
-- 
2.49.0.641.gb9c9c4c3bd


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()`
  2025-05-25 18:41                 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
                                     ` (3 preceding siblings ...)
  2025-05-25 18:42                   ` [PATCH 4/5] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
@ 2025-05-25 18:42                   ` Taylor Blau
  2025-05-26  7:24                     ` Patrick Steinhardt
  2025-05-28 22:58                   ` [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics Taylor Blau
  5 siblings, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-05-25 18:42 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

The prepare_midx_pack() function is designed to convert a MIDX-specific
pack_int_id for a given pack into a pointer into an actual `packed_git`
structure.

In general, these calls look something like:

    struct packed_git *p;
    if (prepare_midx_pack(the_repository, some_midx, some_pack_int_id))
        die("could not load pack xyz");
    p = some_midx->packs[some_pack_int_id];

, and in a pre-incremental MIDX world, this pattern works well. However,
in a post-incremental MIDX world, this pattern is a little more prone to
errors.

These errors can happen when the given 'pack_int_id' is not usable as an
index into the 'm->packs' array. And this happens in all layers but the
bottom-most one in an incremental MIDX chain. Each layer stores only the
packs that are local to that layer of the chain, and offsets them by the
total number of packs in the base MIDX(s).

But there is other awkwardness here. Thinking back to the above snippet,
suppose that the pack with ID 'some_pack_int_id' is in a layer in the
middle of the MIDX chain. Then it is still invalid to do:

    p = some_midx->packs[some_pack_int_id - some_midx->num_packs_in_base];

, becuase the top-most layer (here 'some_midx') may not even have that
pack! So we would have to chase down the '->base_midx' pointer in order
to get the correct result. midx.c has a helper to do this (called
'midx_for_pack()'), but it is meant only for internal use.

That means that a full, working version of the above adjusted to handle
incremental MIDXs would look something like:

    struct packed_git *p;
    if (prepare_midx_pack(the_repository, some_midx, some_pack_int_id))
        die("could not load pack xyz");

    while (m && pack_int_id < m->num_packs_in_base)
        m = m->base_midx;

    if (!m)
        BUG("broken midx?");
    if (pack_int_id >= m->num_packs + m->num_packs_in_base)
        BUG("broken pack_int_id?");

    p = m->packs[pack_int_id - m->num_packs_in_base];

, which is far too verbose to access a single pack by its pack_int_id in
a MIDX chain.

Let's instead have prepare_midx_pack() return a pointer to the
packed_git structure itself, hiding the above as an implementation
detail of prepare_midx_pack(). This patch turns the above snippet into:

    struct packed_git *p = prepare_midx_pack(the_repository, some_midx,
                                             some_pack_int_id);
    if (!p)
        die("could not load pack xyz");

making it far easier and less error-prone to access packs by their
pack_int_id in a MIDX chain.

(In the future, we may want to consider similar treatment for, e.g., the
pack_names array. Likewise, it might make sense to rename the "packs"
member of the MIDX structure to suggest that it shouldn't be accessed
directly outside of midx.c.)

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c  | 37 +++++++++++------------
 midx.c        | 81 +++++++++++++++++++++++++--------------------------
 midx.h        |  4 ++-
 pack-bitmap.c |  2 +-
 4 files changed, 63 insertions(+), 61 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index 2afb67d728..e8dc771665 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -943,6 +943,8 @@ static int fill_packs_from_midx_1(struct write_midx_context *ctx,
 				  uint32_t flags)
 {
 	for (uint32_t i = 0; i < m->num_packs; i++) {
+		struct packed_git *p = NULL;
+
 		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
 
 		/*
@@ -951,18 +953,19 @@ static int fill_packs_from_midx_1(struct write_midx_context *ctx,
 		 * mtimes and object count.
 		 */
 		if (flags & MIDX_WRITE_REV_INDEX) {
-			if (prepare_midx_pack(ctx->repo, m,
-					      m->num_packs_in_base + i)) {
+			p = prepare_midx_pack(ctx->repo, m,
+					      m->num_packs_in_base + i);
+			if (!p) {
 				error(_("could not load pack"));
 				return 1;
 			}
 
-			if (open_pack_index(m->packs[i]))
+			if (open_pack_index(p))
 				die(_("could not open index for %s"),
-				    m->packs[i]->pack_name);
+				    p->pack_name);
 		}
 
-		fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
+		fill_pack_info(&ctx->info[ctx->nr++], p,
 			       m->pack_names[i],
 			       m->num_packs_in_base + i);
 	}
@@ -1596,20 +1599,19 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 					  _("Finding and deleting unreferenced packfiles"),
 					  m->num_packs);
 	for (i = 0; i < m->num_packs; i++) {
+		struct packed_git *p;
 		char *pack_name;
 		display_progress(progress, i + 1);
 
 		if (count[i])
 			continue;
 
-		if (prepare_midx_pack(r, m, i))
+		p = prepare_midx_pack(r, m, i);
+		if (!p || p->pack_keep || p->is_cruft)
 			continue;
 
-		if (m->packs[i]->pack_keep || m->packs[i]->is_cruft)
-			continue;
-
-		pack_name = xstrdup(m->packs[i]->pack_name);
-		close_pack(m->packs[i]);
+		pack_name = xstrdup(p->pack_name);
+		close_pack(p);
 
 		string_list_insert(&packs_to_drop, m->pack_names[i]);
 		unlink_pack_path(pack_name, 0);
@@ -1657,9 +1659,9 @@ static int want_included_pack(struct repository *r,
 
 	ASSERT(m && !m->base_midx);
 
-	if (prepare_midx_pack(r, m, pack_int_id))
+	p = prepare_midx_pack(r, m, pack_int_id);
+	if (!p)
 		return 0;
-	p = m->packs[pack_int_id];
 	if (!pack_kept_objects && p->pack_keep)
 		return 0;
 	if (p->is_cruft)
@@ -1705,12 +1707,11 @@ static void fill_included_packs_batch(struct repository *r,
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
+		struct packed_git *p = prepare_midx_pack(r, m, i);
+
 		pack_info[i].pack_int_id = i;
-
-		if (prepare_midx_pack(r, m, i))
-			continue;
-
-		pack_info[i].mtime = m->packs[i]->mtime;
+		if (p)
+			pack_info[i].mtime = p->mtime;
 	}
 
 	for (i = 0; i < m->num_objects; i++) {
diff --git a/midx.c b/midx.c
index fbce88bd46..f7f509cf46 100644
--- a/midx.c
+++ b/midx.c
@@ -449,50 +449,48 @@ static uint32_t midx_for_pack(struct multi_pack_index **_m,
 	return pack_int_id - m->num_packs_in_base;
 }
 
-int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
-		      uint32_t pack_int_id)
+struct packed_git *prepare_midx_pack(struct repository *r,
+				     struct multi_pack_index *m,
+				     uint32_t pack_int_id)
 {
-	struct strbuf pack_name = STRBUF_INIT;
-	struct strbuf key = STRBUF_INIT;
-	struct packed_git *p;
+	uint32_t pack_pos = midx_for_pack(&m, pack_int_id);
 
-	pack_int_id = midx_for_pack(&m, pack_int_id);
+	if (!m->packs[pack_pos]) {
+		struct strbuf pack_name = STRBUF_INIT;
+		struct strbuf key = STRBUF_INIT;
+		struct packed_git *p;
 
-	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
-		return 1;
-	if (m->packs[pack_int_id])
-		return 0;
+		strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
+			    m->pack_names[pack_pos]);
 
-	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
-		    m->pack_names[pack_int_id]);
-
-	/* pack_map holds the ".pack" name, but we have the .idx */
-	strbuf_addbuf(&key, &pack_name);
-	strbuf_strip_suffix(&key, ".idx");
-	strbuf_addstr(&key, ".pack");
-	p = hashmap_get_entry_from_hash(&r->objects->pack_map,
-					strhash(key.buf), key.buf,
-					struct packed_git, packmap_ent);
-	if (!p) {
-		p = add_packed_git(r, pack_name.buf, pack_name.len, m->local);
-		if (p) {
-			install_packed_git(r, p);
-			list_add_tail(&p->mru, &r->objects->packed_git_mru);
+		/* pack_map holds the ".pack" name, but we have the .idx */
+		strbuf_addbuf(&key, &pack_name);
+		strbuf_strip_suffix(&key, ".idx");
+		strbuf_addstr(&key, ".pack");
+		p = hashmap_get_entry_from_hash(&r->objects->pack_map,
+						strhash(key.buf), key.buf,
+						struct packed_git, packmap_ent);
+		if (!p) {
+			p = add_packed_git(r, pack_name.buf, pack_name.len,
+					   m->local);
+			if (p) {
+				install_packed_git(r, p);
+				list_add_tail(&p->mru,
+					      &r->objects->packed_git_mru);
+			}
 		}
-	}
 
-	strbuf_release(&pack_name);
-	strbuf_release(&key);
+		strbuf_release(&pack_name);
+		strbuf_release(&key);
 
-	if (!p) {
-		m->packs[pack_int_id] = (void *)(intptr_t)-1;
-		return 1;
+		m->packs[pack_pos] = p ? p : (void *)(intptr_t)-1;
+		if (p)
+			p->multi_pack_index = 1;
 	}
 
-	p->multi_pack_index = 1;
-	m->packs[pack_int_id] = p;
-
-	return 0;
+	if (m->packs[pack_pos] == (void *)(intptr_t)-1)
+		return NULL;
+	return m->packs[pack_pos];
 }
 
 struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
@@ -514,10 +512,11 @@ int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 	if (!m->chunk_bitmapped_packs)
 		return error(_("MIDX does not contain the BTMP chunk"));
 
-	if (prepare_midx_pack(r, m, pack_int_id))
-		return error(_("could not load bitmapped pack %"PRIu32), pack_int_id);
+	bp->p = prepare_midx_pack(r, m, pack_int_id);
+	if (!bp->p)
+		return error(_("could not load bitmapped pack %"PRIu32),
+			     pack_int_id);
 
-	bp->p = m->packs[local_pack_int_id];
 	bp->bitmap_pos = get_be32((char *)m->chunk_bitmapped_packs +
 				  MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id);
 	bp->bitmap_nr = get_be32((char *)m->chunk_bitmapped_packs +
@@ -614,9 +613,9 @@ int fill_midx_entry(struct repository *r,
 	midx_for_object(&m, pos);
 	pack_int_id = nth_midxed_pack_int_id(m, pos);
 
-	if (prepare_midx_pack(r, m, pack_int_id))
+	p = prepare_midx_pack(r, m, pack_int_id);
+	if (!p)
 		return 0;
-	p = m->packs[pack_int_id - m->num_packs_in_base];
 
 	/*
 	* We are about to tell the caller where they can locate the
@@ -917,7 +916,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 						  _("Looking for referenced packfiles"),
 						  m->num_packs + m->num_packs_in_base);
 	for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
-		if (prepare_midx_pack(r, m, i))
+		if (!prepare_midx_pack(r, m, i))
 			midx_report("failed to load pack in position %d", i);
 
 		display_progress(progress, i + 1);
diff --git a/midx.h b/midx.h
index 9d1374cbd5..612999fe6a 100644
--- a/midx.h
+++ b/midx.h
@@ -104,7 +104,9 @@ void get_split_midx_filename_ext(const struct git_hash_algo *hash_algo,
 struct multi_pack_index *load_multi_pack_index(struct repository *r,
 					       const char *object_dir,
 					       int local);
-int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
+struct packed_git *prepare_midx_pack(struct repository *r,
+				     struct multi_pack_index *m,
+				     uint32_t pack_int_id);
 struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
 				   uint32_t pack_int_id);
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 99c4927e9c..648b9c0212 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -488,7 +488,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 	}
 
 	for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
-		if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
+		if (!prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
 			warning(_("could not open pack %s"),
 				bitmap_git->midx->pack_names[i - bitmap_git->midx->num_packs_in_base]);
 			goto cleanup;
-- 
2.49.0.641.gb9c9c4c3bd

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack
  2025-05-25 18:41                   ` [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack Taylor Blau
@ 2025-05-26  7:23                     ` Patrick Steinhardt
  2025-05-28  2:00                       ` Taylor Blau
  0 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-26  7:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Sun, May 25, 2025 at 02:41:51PM -0400, Taylor Blau wrote:
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index b9f1d86604..99c4927e9c 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -490,7 +490,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
>  	for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
>  		if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
>  			warning(_("could not open pack %s"),
> -				bitmap_git->midx->pack_names[i]);
> +				bitmap_git->midx->pack_names[i - bitmap_git->midx->num_packs_in_base]);

Doesn't this cause a negative array index though in the case where
`prepare_midx_pack()` returns an error for any `i` smaller than the
number of packs in base?

Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/5] midx-write.c: guard against incremental MIDXs in want_included_pack()
  2025-05-25 18:41                   ` [PATCH 2/5] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
@ 2025-05-26  7:23                     ` Patrick Steinhardt
  2025-05-28  2:08                       ` Taylor Blau
  0 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-26  7:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Sun, May 25, 2025 at 02:41:54PM -0400, Taylor Blau wrote:
[snip]
> So want_included_pack() is OK becuase it will never encounter a

s/becuase/because/

> situation where it has to chase backwards through the '->base_midx'
> pointer. But that is not immediately clear from reading the code, and is
> too fragile for my comfort. Make this more clear by adding an ASSERT()
> to the above effect.
> 
> Apply the same treatment to each of the fill_included_packs-related
> functions as well, since those are deceptively OK by the same reasoning.

Ok.

Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/5] midx-write.c: simplify fill_packs_from_midx()
  2025-05-25 18:41                   ` [PATCH 3/5] midx-write.c: simplify fill_packs_from_midx() Taylor Blau
@ 2025-05-26  7:23                     ` Patrick Steinhardt
  2025-05-28  2:15                       ` Taylor Blau
  0 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-26  7:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Sun, May 25, 2025 at 02:41:57PM -0400, Taylor Blau wrote:
> diff --git a/midx-write.c b/midx-write.c
> index e4a3830d45..7802a4b694 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -943,6 +943,19 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
>  {
>  	struct multi_pack_index *m;
>  
> +	if (preferred_pack_name) {
> +		/*
> +		 * If a preferred pack is specified, need to have
> +		 * packed_git's loaded to ensure the chosen preferred
> +		 * pack has a non-zero object count.
> +		 *
> +		 * Trick ourselves into thinking that we're writing a
> +		 * reverse index in this case in order to open up the
> +		 * pack index file.
> +		 */
> +		flags |= MIDX_WRITE_REV_INDEX;
> +	}

This change feels a bit weird to me. Sure, it does allow us to pull out
the loop in the subsequent patch. But honestly, that makes this
workaround even weirder in that we now set unrelated flags in some
function and expect a different function to only honor it in order to
open the packfile.

Shouldn't we instead have a separate flag for the new function that
tells it whether or not it is supposed to prepare the pack?

Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()`
  2025-05-25 18:42                   ` [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
@ 2025-05-26  7:24                     ` Patrick Steinhardt
  2025-05-28  2:18                       ` Taylor Blau
  0 siblings, 1 reply; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-26  7:24 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Sun, May 25, 2025 at 02:42:03PM -0400, Taylor Blau wrote:
> diff --git a/midx.c b/midx.c
> index fbce88bd46..f7f509cf46 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -449,50 +449,48 @@ static uint32_t midx_for_pack(struct multi_pack_index **_m,
>  	return pack_int_id - m->num_packs_in_base;
>  }
>  
> -int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
> -		      uint32_t pack_int_id)
> +struct packed_git *prepare_midx_pack(struct repository *r,
> +				     struct multi_pack_index *m,
> +				     uint32_t pack_int_id)
>  {
> -	struct strbuf pack_name = STRBUF_INIT;
> -	struct strbuf key = STRBUF_INIT;
> -	struct packed_git *p;
> +	uint32_t pack_pos = midx_for_pack(&m, pack_int_id);
>  
> -	pack_int_id = midx_for_pack(&m, pack_int_id);
> +	if (!m->packs[pack_pos]) {
> +		struct strbuf pack_name = STRBUF_INIT;
> +		struct strbuf key = STRBUF_INIT;
> +		struct packed_git *p;
>  
> -	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
> -		return 1;

Ah, so this series builds on top of my patch that introduces the
negative lookup cache? That wasn't quite clear to me and makes it a bit
hard to iterate on my patch now. Could I suggest that you maybe include
that patch as part of this series so that those can be iterated on in
tandem?

Overall I think that this change is quite sensible and hides away at
least some of the complexity. Thanks!

Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack
  2025-05-26  7:23                     ` Patrick Steinhardt
@ 2025-05-28  2:00                       ` Taylor Blau
  0 siblings, 0 replies; 53+ messages in thread
From: Taylor Blau @ 2025-05-28  2:00 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano

On Mon, May 26, 2025 at 09:23:48AM +0200, Patrick Steinhardt wrote:
> On Sun, May 25, 2025 at 02:41:51PM -0400, Taylor Blau wrote:
> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index b9f1d86604..99c4927e9c 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -490,7 +490,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
> >  	for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
> >  		if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
> >  			warning(_("could not open pack %s"),
> > -				bitmap_git->midx->pack_names[i]);
> > +				bitmap_git->midx->pack_names[i - bitmap_git->midx->num_packs_in_base]);
>
> Doesn't this cause a negative array index though in the case where
> `prepare_midx_pack()` returns an error for any `i` smaller than the
> number of packs in base?

Nice catch, yeah, this is definitely broken when the value i is smaller
than m->num_packs_in_base.

This is another spot that would benefit from similar treatment where
callers (outside of midx.c and *maybe* midx-write.c) access the
pack_names array through a function liked nth_midxed_pack_name() or
similar.

I'll adjust this and send a new round.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 2/5] midx-write.c: guard against incremental MIDXs in want_included_pack()
  2025-05-26  7:23                     ` Patrick Steinhardt
@ 2025-05-28  2:08                       ` Taylor Blau
  0 siblings, 0 replies; 53+ messages in thread
From: Taylor Blau @ 2025-05-28  2:08 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano

On Mon, May 26, 2025 at 09:23:56AM +0200, Patrick Steinhardt wrote:
> On Sun, May 25, 2025 at 02:41:54PM -0400, Taylor Blau wrote:
> [snip]
> > So want_included_pack() is OK becuase it will never encounter a
>
> s/becuase/because/

Thanks for spotting.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 3/5] midx-write.c: simplify fill_packs_from_midx()
  2025-05-26  7:23                     ` Patrick Steinhardt
@ 2025-05-28  2:15                       ` Taylor Blau
  0 siblings, 0 replies; 53+ messages in thread
From: Taylor Blau @ 2025-05-28  2:15 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano

On Mon, May 26, 2025 at 09:23:59AM +0200, Patrick Steinhardt wrote:
> On Sun, May 25, 2025 at 02:41:57PM -0400, Taylor Blau wrote:
> > diff --git a/midx-write.c b/midx-write.c
> > index e4a3830d45..7802a4b694 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -943,6 +943,19 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
> >  {
> >  	struct multi_pack_index *m;
> >
> > +	if (preferred_pack_name) {
> > +		/*
> > +		 * If a preferred pack is specified, need to have
> > +		 * packed_git's loaded to ensure the chosen preferred
> > +		 * pack has a non-zero object count.
> > +		 *
> > +		 * Trick ourselves into thinking that we're writing a
> > +		 * reverse index in this case in order to open up the
> > +		 * pack index file.
> > +		 */
> > +		flags |= MIDX_WRITE_REV_INDEX;
> > +	}
>
> This change feels a bit weird to me. Sure, it does allow us to pull out
> the loop in the subsequent patch. But honestly, that makes this
> workaround even weirder in that we now set unrelated flags in some
> function and expect a different function to only honor it in order to
> open the packfile.
>
> Shouldn't we instead have a separate flag for the new function that
> tells it whether or not it is supposed to prepare the pack?

Yeah, I like that much better, thanks!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()`
  2025-05-26  7:24                     ` Patrick Steinhardt
@ 2025-05-28  2:18                       ` Taylor Blau
  2025-05-28 11:53                         ` Patrick Steinhardt
  0 siblings, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-05-28  2:18 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Jeff King, Junio C Hamano

On Mon, May 26, 2025 at 09:24:03AM +0200, Patrick Steinhardt wrote:
> On Sun, May 25, 2025 at 02:42:03PM -0400, Taylor Blau wrote:
> > diff --git a/midx.c b/midx.c
> > index fbce88bd46..f7f509cf46 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -449,50 +449,48 @@ static uint32_t midx_for_pack(struct multi_pack_index **_m,
> >  	return pack_int_id - m->num_packs_in_base;
> >  }
> >
> > -int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
> > -		      uint32_t pack_int_id)
> > +struct packed_git *prepare_midx_pack(struct repository *r,
> > +				     struct multi_pack_index *m,
> > +				     uint32_t pack_int_id)
> >  {
> > -	struct strbuf pack_name = STRBUF_INIT;
> > -	struct strbuf key = STRBUF_INIT;
> > -	struct packed_git *p;
> > +	uint32_t pack_pos = midx_for_pack(&m, pack_int_id);
> >
> > -	pack_int_id = midx_for_pack(&m, pack_int_id);
> > +	if (!m->packs[pack_pos]) {
> > +		struct strbuf pack_name = STRBUF_INIT;
> > +		struct strbuf key = STRBUF_INIT;
> > +		struct packed_git *p;
> >
> > -	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
> > -		return 1;
>
> Ah, so this series builds on top of my patch that introduces the
> negative lookup cache? That wasn't quite clear to me and makes it a bit
> hard to iterate on my patch now.

Eek, sorry for the confusion. I mentioned it in the beginning of my
cover letter as well as the (less visible) "base-commit" identifier. Is
there another spot where I could have highlighted the dependency more
clearly?

> Could I suggest that you maybe include that patch as part of this
> series so that those can be iterated on in tandem?

I could, however your branch is already tentatively marked for 'next',
and Junio applied this topic to a branch based on yours as I had
suggested. So we could change it, but I think it would be more of a
hassle for Junio, so I'd rather avoid doing so.

Are there changes that you want to make on top of your patch?

> Overall I think that this change is quite sensible and hides away at
> least some of the complexity. Thanks!

Much appreciated :-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()`
  2025-05-28  2:18                       ` Taylor Blau
@ 2025-05-28 11:53                         ` Patrick Steinhardt
  0 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-28 11:53 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Jeff King, Junio C Hamano

On Tue, May 27, 2025 at 10:18:06PM -0400, Taylor Blau wrote:
> On Mon, May 26, 2025 at 09:24:03AM +0200, Patrick Steinhardt wrote:
> > On Sun, May 25, 2025 at 02:42:03PM -0400, Taylor Blau wrote:
> > > diff --git a/midx.c b/midx.c
> > > index fbce88bd46..f7f509cf46 100644
> > > --- a/midx.c
> > > +++ b/midx.c
> > > @@ -449,50 +449,48 @@ static uint32_t midx_for_pack(struct multi_pack_index **_m,
> > >  	return pack_int_id - m->num_packs_in_base;
> > >  }
> > >
> > > -int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
> > > -		      uint32_t pack_int_id)
> > > +struct packed_git *prepare_midx_pack(struct repository *r,
> > > +				     struct multi_pack_index *m,
> > > +				     uint32_t pack_int_id)
> > >  {
> > > -	struct strbuf pack_name = STRBUF_INIT;
> > > -	struct strbuf key = STRBUF_INIT;
> > > -	struct packed_git *p;
> > > +	uint32_t pack_pos = midx_for_pack(&m, pack_int_id);
> > >
> > > -	pack_int_id = midx_for_pack(&m, pack_int_id);
> > > +	if (!m->packs[pack_pos]) {
> > > +		struct strbuf pack_name = STRBUF_INIT;
> > > +		struct strbuf key = STRBUF_INIT;
> > > +		struct packed_git *p;
> > >
> > > -	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
> > > -		return 1;
> >
> > Ah, so this series builds on top of my patch that introduces the
> > negative lookup cache? That wasn't quite clear to me and makes it a bit
> > hard to iterate on my patch now.
> 
> Eek, sorry for the confusion. I mentioned it in the beginning of my
> cover letter as well as the (less visible) "base-commit" identifier. Is
> there another spot where I could have highlighted the dependency more
> clearly?

Oh, it was right at the top of that mail. Maybe the square brackets have
made me skip it? No idea. In any case it was present, I just happened to
read over it.

> > Could I suggest that you maybe include that patch as part of this
> > series so that those can be iterated on in tandem?
> 
> I could, however your branch is already tentatively marked for 'next',
> and Junio applied this topic to a branch based on yours as I had
> suggested. So we could change it, but I think it would be more of a
> hassle for Junio, so I'd rather avoid doing so.
> 
> Are there changes that you want to make on top of your patch?

The only change was that I wanted to add is the `(void *)(intptr_t)-1`
part with a macro, as proposed by Peff. You know, it should be trivial
to adapt to that change, so I'll just send out a new version now, which
should then probably be merged down to `next` soonish anyway.

Patrick

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v3 0/2] packfile: avoid access(3p) calls for missing packs
  2025-05-16  8:55 [PATCH] packfile: avoid access(3p) calls for missing packs Patrick Steinhardt
  2025-05-16 18:34 ` Junio C Hamano
  2025-05-20  9:53 ` [PATCH v2 0/2] " Patrick Steinhardt
@ 2025-05-28 12:24 ` Patrick Steinhardt
  2025-05-28 12:24   ` [PATCH v3 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
                     ` (2 more replies)
  2 siblings, 3 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-28 12:24 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King, Junio C Hamano

Hi,

this patch addresses an issue we have recently seen in our production
systems due to a stale MIDX. The MIDX contained entries for packfiles
that didn't exist anymore, which caused Git to repeatedly look up those
packfiles. Each missing packfile resulted in four repeated syscalls:
three access(3p) calls to check for supporting data structures, and one
call to stat(3p) to check for the packfile itself. The first three calls
are essentially wasted though when the stat(3p) call itself fails, which
is being fixed by this patch.

I doubt that the patch matters in almost any repository, but given that
the refactoring is trivial I thought to submit the patch regardless of
that. Another step would be to introduce a negative lookup cache -- but
that would be a bit more involved, so I decided against it for now as I
don't want to introduce complexity for dubious gains.

Changes in v2:
  - Drop the patch that reorders syscalls and add a comment explaining
    why the order is important.
  - Add a negative lookup cache for indexed packfiles.
  - Link to v1: https://lore.kernel.org/r/20250516-pks-pack-avoid-stats-on-missing-v1-1-e2ef4d8798a3@pks.im

Changes in v3:
  - Use a macro to hide away the `(void *)(intptr_t)-1` magic.
  - Link to v2: https://lore.kernel.org/r/20250520-pks-pack-avoid-stats-on-missing-v2-0-333c5217fb05@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (2):
      packfile: explain ordering of how we look up auxiliary pack files
      midx: stop repeatedly looking up nonexistent packfiles

 midx.c     | 12 ++++++++++--
 packfile.c | 11 +++++++++++
 2 files changed, 21 insertions(+), 2 deletions(-)

Range-diff versus v2:

1:  cf601a63e98 = 1:  c4087674967 packfile: explain ordering of how we look up auxiliary pack files
2:  e3108f7ce48 ! 2:  2fa98231464 midx: stop repeatedly looking up nonexistent packfiles
    @@ Commit message
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## midx.c ##
    +@@
    + #include "pack-bitmap.h"
    + #include "pack-revindex.h"
    + 
    ++#define MIDX_PACK_ERROR ((void *)(intptr_t)-1)
    ++
    + int midx_checksum_valid(struct multi_pack_index *m);
    + void clear_midx_files_ext(const char *object_dir, const char *ext,
    + 			  const char *keep_hash);
     @@ midx.c: void close_midx(struct multi_pack_index *m)
      	munmap((unsigned char *)m->data, m->data_len);
      
      	for (i = 0; i < m->num_packs; i++) {
     -		if (m->packs[i])
    -+		if (m->packs[i] && m->packs[i] != (void *)(intptr_t)-1)
    ++		if (m->packs[i] && m->packs[i] != MIDX_PACK_ERROR)
      			m->packs[i]->multi_pack_index = 0;
      	}
      	FREE_AND_NULL(m->packs);
    @@ midx.c: int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
      
      	pack_int_id = midx_for_pack(&m, pack_int_id);
      
    -+	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
    ++	if (m->packs[pack_int_id] == MIDX_PACK_ERROR)
     +		return 1;
      	if (m->packs[pack_int_id])
      		return 0;
    @@ midx.c: int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
      
     -	if (!p)
     +	if (!p) {
    -+		m->packs[pack_int_id] = (void *)(intptr_t)-1;
    ++		m->packs[pack_int_id] = MIDX_PACK_ERROR;
      		return 1;
     +	}
      
    @@ midx.c: struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
      				   uint32_t pack_int_id)
      {
      	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
    -+	if (m->packs[local_pack_int_id] == (void *)(intptr_t)-1)
    ++	if (m->packs[local_pack_int_id] == MIDX_PACK_ERROR)
     +		return NULL;
      	return m->packs[local_pack_int_id];
      }

---
base-commit: 1a8a4971cc6c179c4dd711f4a7f5d7178f4b3ab7
change-id: 20250516-pks-pack-avoid-stats-on-missing-8e3b75755cf0


^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v3 1/2] packfile: explain ordering of how we look up auxiliary pack files
  2025-05-28 12:24 ` [PATCH v3 " Patrick Steinhardt
@ 2025-05-28 12:24   ` Patrick Steinhardt
  2025-05-28 12:24   ` [PATCH v3 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
  2025-05-30  6:27   ` [PATCH v3 0/2] packfile: avoid access(3p) calls for missing packs Jeff King
  2 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-28 12:24 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King, Junio C Hamano

When adding a packfile to an object database we perform four syscalls:

  - Three calls to access(3p) are done to check for auxiliary data
    structures.

  - One call to stat(3p) is done to check for the ".pack" itself.

One curious bit is that we perform the access(3p) calls before checking
for the packfile itself, but if the packfile doesn't exist we discard
all results. The access(3p) calls are thus essentially wasted, so one
may be triggered to reorder those calls so that we can short-circuit the
other syscalls in case the packfile does not exist.

The order in which we look up files is quite important though to help
avoid races:

  - When installing a packfile we move auxiliary data structures into
    place before we install the ".idx" file.

  - When deleting a packfile we first delete the ".idx" and ".pack"
    files before deleting auxiliary data structures.

As such, to avoid any races with concurrently created or deleted packs
we need to make sure that we _first_ read auxiliary data structures
before we read the corresponding ".idx" or ".pack" file. Otherwise it
may easily happen that we return a populated but misclassified pack.

Add a comment to `add_packed_git()` to make future readers aware of this
ordering requirement.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 packfile.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/packfile.c b/packfile.c
index d91016f1c7f..933036e2606 100644
--- a/packfile.c
+++ b/packfile.c
@@ -737,6 +737,17 @@ struct packed_git *add_packed_git(struct repository *r, const char *path,
 	p = alloc_packed_git(r, alloc);
 	memcpy(p->pack_name, path, path_len);
 
+	/*
+	 * Note that we have to check auxiliary data structures before we check
+	 * for the ".pack" file to exist to avoid races with a packfile that is
+	 * in the process of being deleted. The ".pack" file is unlinked before
+	 * its auxiliary data structures, so we know that we either get a
+	 * consistent snapshot of all data structures or that we'll fail to
+	 * stat(3p) the packfile itself and thus return `NULL`.
+	 *
+	 * As such, we cannot bail out before the access(3p) calls in case the
+	 * packfile doesn't exist without doing two stat(3p) calls for it.
+	 */
 	xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
 	if (!access(p->pack_name, F_OK))
 		p->pack_keep = 1;

-- 
2.49.0.1266.g31b7d2e469.dirty


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v3 2/2] midx: stop repeatedly looking up nonexistent packfiles
  2025-05-28 12:24 ` [PATCH v3 " Patrick Steinhardt
  2025-05-28 12:24   ` [PATCH v3 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
@ 2025-05-28 12:24   ` Patrick Steinhardt
  2025-05-30  6:27   ` [PATCH v3 0/2] packfile: avoid access(3p) calls for missing packs Jeff King
  2 siblings, 0 replies; 53+ messages in thread
From: Patrick Steinhardt @ 2025-05-28 12:24 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Jeff King, Junio C Hamano

The multi-pack index acts as a cache across a set of packfiles so that
we can quickly look up which of those packfiles contains a given object.
As such, the multi-pack index naturally needs to be updated every time
one of the packfiles goes away, or otherwise the multi-pack index has
grown stale.

A stale multi-pack index should be handled gracefully by Git though, and
in fact it is: if the indexed pack cannot be found we simply ignore it
and eventually we fall back to doing the object lookup by just iterating
through all packs, even if those aren't indexed.

But while this fallback works, it has one significant downside: we don't
cache the fact that a pack has vanished. This leads to us repeatedly
trying to look up the same pack only to realize that it (still) doesn't
exist.

This issue can be easily demonstrated by creating a repository with a
stale multi-pack index and a couple of objects. We do so by creating a
repository with two packfiles, both of which are indexed by the
multi-pack index, and then repack those two packfiles. Note that we have
to move the multi-pack-index before doing the final repack, as Git knows
to delete it otherwise.

    $ git init repo
    $ cd repo/
    $ git config set maintenance.auto false
    $ for i in $(seq 1000); do printf "%d-original" $i >file-$i; done
    $ git add .
    $ git commit -moriginal
    $ git repack -dl
    $ for i in $(seq 1000); do printf "%d-modified" $i >file-$i; done
    $ git commit -a -mmodified
    $ git repack -dl
    $ git multi-pack-index write
    $ mv .git/objects/pack/multi-pack-index .
    $ git repack -Adl
    $ mv multi-pack-index .git/objects/pack/

Commands that cause a lot of objects lookups will now repeatedly invoke
`add_packed_git()`, which leads to three failed access(3p) calls as well
as one failed stat(3p) call. The following strace for example is done
for `git log --patch` in the above repository:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     74.67    0.024693           1     18038     18031 access
     25.33    0.008378           1      6045      6017 newfstatat
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.033071           1     24083     24048 total

Fix the issue by introducing a negative lookup cache for indexed packs.
This cache works by simply storing an invalid pointer for a missing pack
when `prepare_midx_pack()` fails to look up the pack. Most users of the
`packs` array don't need to be adjusted, either, as they all know to
call `prepare_midx_pack()` before accessing the array.

With this change in place we can now see a significantly reduced number
of syscalls:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     73.58    0.000323           5        60        28 newfstatat
     26.42    0.000116           5        23        16 access
    ------ ----------- ----------- --------- --------- ----------------
    100.00    0.000439           5        83        44 total

Furthermore, this change also results in a speedup:

    Benchmark 1: git log --patch (revision = HEAD~)
      Time (mean ± σ):      50.4 ms ±   2.5 ms    [User: 22.0 ms, System: 24.4 ms]
      Range (min … max):    45.4 ms …  54.9 ms    53 runs

    Benchmark 2: git log --patch (revision = HEAD)
      Time (mean ± σ):      12.7 ms ±   0.4 ms    [User: 11.1 ms, System: 1.6 ms]
      Range (min … max):    12.4 ms …  15.0 ms    191 runs

    Summary
      git log --patch (revision = HEAD) ran
        3.96 ± 0.22 times faster than git log --patch (revision = HEAD~)

In the end, it should in theory never be necessary to have this negative
lookup cache given that we know to update the multi-pack index together
with repacks. But as the change is quite contained and as the speedup
can be significant as demonstrated above, it does feel sensible to have
the negative lookup cache regardless.

Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 3d0015f7828..cd6e766ce2b 100644
--- a/midx.c
+++ b/midx.c
@@ -13,6 +13,8 @@
 #include "pack-bitmap.h"
 #include "pack-revindex.h"
 
+#define MIDX_PACK_ERROR ((void *)(intptr_t)-1)
+
 int midx_checksum_valid(struct multi_pack_index *m);
 void clear_midx_files_ext(const char *object_dir, const char *ext,
 			  const char *keep_hash);
@@ -405,7 +407,7 @@ void close_midx(struct multi_pack_index *m)
 	munmap((unsigned char *)m->data, m->data_len);
 
 	for (i = 0; i < m->num_packs; i++) {
-		if (m->packs[i])
+		if (m->packs[i] && m->packs[i] != MIDX_PACK_ERROR)
 			m->packs[i]->multi_pack_index = 0;
 	}
 	FREE_AND_NULL(m->packs);
@@ -458,6 +460,8 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 
 	pack_int_id = midx_for_pack(&m, pack_int_id);
 
+	if (m->packs[pack_int_id] == MIDX_PACK_ERROR)
+		return 1;
 	if (m->packs[pack_int_id])
 		return 0;
 
@@ -482,8 +486,10 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
 	strbuf_release(&pack_name);
 	strbuf_release(&key);
 
-	if (!p)
+	if (!p) {
+		m->packs[pack_int_id] = MIDX_PACK_ERROR;
 		return 1;
+	}
 
 	p->multi_pack_index = 1;
 	m->packs[pack_int_id] = p;
@@ -495,6 +501,8 @@ struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
 				   uint32_t pack_int_id)
 {
 	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+	if (m->packs[local_pack_int_id] == MIDX_PACK_ERROR)
+		return NULL;
 	return m->packs[local_pack_int_id];
 }
 

-- 
2.49.0.1266.g31b7d2e469.dirty


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics
  2025-05-25 18:41                 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
                                     ` (4 preceding siblings ...)
  2025-05-25 18:42                   ` [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
@ 2025-05-28 22:58                   ` Taylor Blau
  2025-05-28 22:59                     ` [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()` Taylor Blau
                                       ` (3 more replies)
  5 siblings, 4 replies; 53+ messages in thread
From: Taylor Blau @ 2025-05-28 22:58 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

This series has been updated to be based on v3 of
ps/midx-negative-packfile-cache, which is 1f34bf3e08 (midx: stop
repeatedly looking up nonexistent packfiles, 2025-05-28) at the time of
writing.

There are a couple of minor changes from last time, including:

 - using the new MIDX_PACK_ERROR macro instead of casting '-1' to a void
   pointer

 - converting some direct m->pack_names lookups (outside of midx.c and
   midx-write.c with one exception) to use a similar pattern to access
   the nth pack name

As usual, a range-diff against v1 is below for convenience.

Thanks again for your review!

Taylor Blau (4):
  midx: access pack names through `nth_midxed_pack_name()`
  midx-write.c: guard against incremental MIDXs in want_included_pack()
  midx-write.c: extract inner loop from fill_packs_from_midx()
  midx: return a `packed_git` pointer from `prepare_midx_pack()`

 midx-write.c              | 101 +++++++++++++++++++++-----------------
 midx.c                    |  88 +++++++++++++++++----------------
 midx.h                    |   6 ++-
 pack-bitmap.c             |   6 +--
 t/helper/test-read-midx.c |   7 +--
 5 files changed, 116 insertions(+), 92 deletions(-)

Range-diff against v1:
1:  a7082dc7f0 < -:  ---------- packfile: explain ordering of how we look up auxiliary pack files
2:  73fe0882ee < -:  ---------- midx: stop repeatedly looking up nonexistent packfiles
3:  ad7295b11b < -:  ---------- pack-bitmap.c: fix broken warning() when missing MIDX'd pack
5:  5d97b706e1 ! 1:  d3508d3cfb midx-write.c: simplify fill_packs_from_midx()
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    midx-write.c: simplify fill_packs_from_midx()
    +    midx: access pack names through `nth_midxed_pack_name()`
     
    -    When enumerating packs within fill_packs_from_midx(), we open the pack's
    -    index iff we are either writing a reverse index, have a preferred pack
    -    (by name) or both.
    +    Accessing a MIDX's 'pack_names' array is somewhat error-prone when
    +    dealing with incremental MIDX chains, where the (global) pack_int_id for
    +    some pack may differ from the containing layer's index for that pack.
     
    -    Let's simplify this into a single case by setting the
    -    MIDX_WRITE_REV_INDEX flag bit when we have a preferred_pack_name. This
    -    is a little bit of a shortcut to reduce the line length in the loop
    -    below. But it sets us up nicely to extract the inner loop of this
    -    function out into its own function, where we will no longer have to pass
    -    the preferred_pack_name.
    +    Introduce `nth_midxed_pack_name()` in an effort to reduce a common
    +    source of errors by discouraging external callers from accessing a
    +    layer's `pack_names` array directly.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    - ## midx-write.c ##
    -@@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx,
    - {
    - 	struct multi_pack_index *m;
    + ## midx.c ##
    +@@ midx.c: struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
    + 	return m->packs[local_pack_int_id];
    + }
      
    -+	if (preferred_pack_name) {
    -+		/*
    -+		 * If a preferred pack is specified, need to have
    -+		 * packed_git's loaded to ensure the chosen preferred
    -+		 * pack has a non-zero object count.
    -+		 *
    -+		 * Trick ourselves into thinking that we're writing a
    -+		 * reverse index in this case in order to open up the
    -+		 * pack index file.
    -+		 */
    -+		flags |= MIDX_WRITE_REV_INDEX;
    -+	}
    ++const char *nth_midxed_pack_name(struct multi_pack_index *m,
    ++				 uint32_t pack_int_id)
    ++{
    ++	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
    ++	return m->pack_names[local_pack_int_id];
    ++}
     +
    - 	for (m = ctx->m; m; m = m->base_midx) {
    - 		uint32_t i;
    + #define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t))
      
    -@@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx,
    - 			 * If generating a reverse index, need to have
    - 			 * packed_git's loaded to compare their
    - 			 * mtimes and object count.
    --			 *
    --			 * If a preferred pack is specified, need to
    --			 * have packed_git's loaded to ensure the chosen
    --			 * preferred pack has a non-zero object count.
    - 			 */
    --			if (flags & MIDX_WRITE_REV_INDEX ||
    --			    preferred_pack_name) {
    -+			if (flags & MIDX_WRITE_REV_INDEX) {
    - 				if (prepare_midx_pack(ctx->repo, m,
    - 						      m->num_packs_in_base + i)) {
    - 					error(_("could not load pack"));
    + int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
    +
    + ## midx.h ##
    +@@ midx.h: struct multi_pack_index *load_multi_pack_index(struct repository *r,
    + int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
    + struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
    + 				   uint32_t pack_int_id);
    ++const char *nth_midxed_pack_name(struct multi_pack_index *m,
    ++				 uint32_t pack_int_id);
    + int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
    + 		       struct bitmapped_pack *bp, uint32_t pack_int_id);
    + int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
    +
    + ## pack-bitmap.c ##
    +@@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
    + 	for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
    + 		if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
    + 			warning(_("could not open pack %s"),
    +-				bitmap_git->midx->pack_names[i]);
    ++				nth_midxed_pack_name(bitmap_git->midx, i));
    + 			goto cleanup;
    + 		}
    + 	}
    +@@ pack-bitmap.c: void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
    + 			struct bitmapped_pack pack;
    + 			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
    + 				warning(_("unable to load pack: '%s', disabling pack-reuse"),
    +-					bitmap_git->midx->pack_names[i]);
    ++					nth_midxed_pack_name(bitmap_git->midx, i));
    + 				free(packs);
    + 				return;
    + 			}
    +
    + ## t/helper/test-read-midx.c ##
    +@@ t/helper/test-read-midx.c: static int read_midx_file(const char *object_dir, const char *checksum,
    + 	printf("\nnum_objects: %d\n", m->num_objects);
    + 
    + 	printf("packs:\n");
    +-	for (i = 0; i < m->num_packs; i++)
    +-		printf("%s\n", m->pack_names[i]);
    ++	for (i = m->num_packs_in_base; i < m->num_packs + m->num_packs_in_base;
    ++	     i++)
    ++		printf("%s\n", nth_midxed_pack_name(m, i));
    + 
    + 	printf("object-dir: %s\n", m->object_dir);
    + 
    +@@ t/helper/test-read-midx.c: static int read_midx_preferred_pack(const char *object_dir)
    + 		return 1;
    + 	}
    + 
    +-	printf("%s\n", midx->pack_names[preferred_pack]);
    ++	printf("%s\n", nth_midxed_pack_name(midx, preferred_pack));
    + 	close_midx(midx);
    + 	return 0;
    + }
4:  d2f9645aa1 ! 2:  cf3ab81a08 midx-write.c: guard against incremental MIDXs in want_included_pack()
    @@ Commit message
             if (m->base_midx)
                 die(_("cannot repack an incremental multi-pack-index"));
     
    -    So want_included_pack() is OK becuase it will never encounter a
    +    So want_included_pack() is OK because it will never encounter a
         situation where it has to chase backwards through the '->base_midx'
         pointer. But that is not immediately clear from reading the code, and is
         too fragile for my comfort. Make this more clear by adding an ASSERT()
6:  a4080c12df ! 3:  e54988bfea midx-write.c: extract inner loop from fill_packs_from_midx()
    @@ midx-write.c: static struct multi_pack_index *lookup_multi_pack_index(struct rep
      
     +static int fill_packs_from_midx_1(struct write_midx_context *ctx,
     +				  struct multi_pack_index *m,
    -+				  uint32_t flags)
    ++				  int prepare_packs)
     +{
     +	for (uint32_t i = 0; i < m->num_packs; i++) {
    -+		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
    -+
     +		/*
     +		 * If generating a reverse index, need to have
     +		 * packed_git's loaded to compare their
     +		 * mtimes and object count.
     +		 */
    -+		if (flags & MIDX_WRITE_REV_INDEX) {
    ++		if (prepare_packs) {
     +			if (prepare_midx_pack(ctx->repo, m,
     +					      m->num_packs_in_base + i)) {
     +				error(_("could not load pack"));
    @@ midx-write.c: static struct multi_pack_index *lookup_multi_pack_index(struct rep
      static int fill_packs_from_midx(struct write_midx_context *ctx,
      				const char *preferred_pack_name, uint32_t flags)
      {
    -@@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx,
    - 	}
    + 	struct multi_pack_index *m;
    ++	int prepare_packs;
    ++
    ++	/*
    ++	 * If generating a reverse index, need to have packed_git's
    ++	 * loaded to compare their mtimes and object count.
    ++	 */
    ++	prepare_packs = !!(flags & MIDX_WRITE_REV_INDEX || preferred_pack_name);
      
      	for (m = ctx->m; m; m = m->base_midx) {
     -		uint32_t i;
    @@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx,
     -			 * If generating a reverse index, need to have
     -			 * packed_git's loaded to compare their
     -			 * mtimes and object count.
    +-			 *
    +-			 * If a preferred pack is specified, need to
    +-			 * have packed_git's loaded to ensure the chosen
    +-			 * preferred pack has a non-zero object count.
     -			 */
    --			if (flags & MIDX_WRITE_REV_INDEX) {
    +-			if (flags & MIDX_WRITE_REV_INDEX ||
    +-			    preferred_pack_name) {
     -				if (prepare_midx_pack(ctx->repo, m,
     -						      m->num_packs_in_base + i)) {
     -					error(_("could not load pack"));
    @@ midx-write.c: static int fill_packs_from_midx(struct write_midx_context *ctx,
     -				       m->pack_names[i],
     -				       m->num_packs_in_base + i);
     -		}
    -+		int ret = fill_packs_from_midx_1(ctx, m, flags);
    ++		int ret = fill_packs_from_midx_1(ctx, m, prepare_packs);
     +		if (ret)
     +			return ret;
      	}
7:  80699bb3ee ! 4:  e3e21db673 midx: return a `packed_git` pointer from `prepare_midx_pack()`
    @@ Commit message
     
      ## midx-write.c ##
     @@ midx-write.c: static int fill_packs_from_midx_1(struct write_midx_context *ctx,
    - 				  uint32_t flags)
    + 				  int prepare_packs)
      {
      	for (uint32_t i = 0; i < m->num_packs; i++) {
    +-		/*
    +-		 * If generating a reverse index, need to have
    +-		 * packed_git's loaded to compare their
    +-		 * mtimes and object count.
    +-		 */
     +		struct packed_git *p = NULL;
     +
    - 		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
    - 
    - 		/*
    -@@ midx-write.c: static int fill_packs_from_midx_1(struct write_midx_context *ctx,
    - 		 * mtimes and object count.
    - 		 */
    - 		if (flags & MIDX_WRITE_REV_INDEX) {
    ++		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
    + 		if (prepare_packs) {
     -			if (prepare_midx_pack(ctx->repo, m,
     -					      m->num_packs_in_base + i)) {
     +			p = prepare_midx_pack(ctx->repo, m,
    @@ midx-write.c: static int fill_packs_from_midx_1(struct write_midx_context *ctx,
      		}
      
     -		fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
    -+		fill_pack_info(&ctx->info[ctx->nr++], p,
    - 			       m->pack_names[i],
    +-			       m->pack_names[i],
    ++		fill_pack_info(&ctx->info[ctx->nr++], p, m->pack_names[i],
      			       m->num_packs_in_base + i);
      	}
    + 
     @@ midx-write.c: int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
      					  _("Finding and deleting unreferenced packfiles"),
      					  m->num_packs);
    @@ midx.c: static uint32_t midx_for_pack(struct multi_pack_index **_m,
     +		struct strbuf key = STRBUF_INIT;
     +		struct packed_git *p;
      
    --	if (m->packs[pack_int_id] == (void *)(intptr_t)-1)
    +-	if (m->packs[pack_int_id] == MIDX_PACK_ERROR)
     -		return 1;
     -	if (m->packs[pack_int_id])
     -		return 0;
    @@ midx.c: static uint32_t midx_for_pack(struct multi_pack_index **_m,
     +		strbuf_release(&key);
      
     -	if (!p) {
    --		m->packs[pack_int_id] = (void *)(intptr_t)-1;
    +-		m->packs[pack_int_id] = MIDX_PACK_ERROR;
     -		return 1;
    -+		m->packs[pack_pos] = p ? p : (void *)(intptr_t)-1;
    ++		m->packs[pack_pos] = p ? p : MIDX_PACK_ERROR;
     +		if (p)
     +			p->multi_pack_index = 1;
      	}
    @@ midx.c: static uint32_t midx_for_pack(struct multi_pack_index **_m,
     -	m->packs[pack_int_id] = p;
     -
     -	return 0;
    -+	if (m->packs[pack_pos] == (void *)(intptr_t)-1)
    ++	if (m->packs[pack_pos] == MIDX_PACK_ERROR)
     +		return NULL;
     +	return m->packs[pack_pos];
      }
    @@ midx.h: void get_split_midx_filename_ext(const struct git_hash_algo *hash_algo,
     +				     uint32_t pack_int_id);
      struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
      				   uint32_t pack_int_id);
    - int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
    + const char *nth_midxed_pack_name(struct multi_pack_index *m,
     
      ## pack-bitmap.c ##
     @@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
    @@ pack-bitmap.c: static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
     -		if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
     +		if (!prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
      			warning(_("could not open pack %s"),
    - 				bitmap_git->midx->pack_names[i - bitmap_git->midx->num_packs_in_base]);
    + 				nth_midxed_pack_name(bitmap_git->midx, i));
      			goto cleanup;

base-commit: 1f34bf3e082741e053d25b76a0ffe31d9d967594
-- 
2.49.0.640.ga4de40e6a8

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()`
  2025-05-28 22:58                   ` [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics Taylor Blau
@ 2025-05-28 22:59                     ` Taylor Blau
  2025-05-29 20:47                       ` Junio C Hamano
  2025-05-29 20:51                       ` Junio C Hamano
  2025-05-28 22:59                     ` [PATCH v2 2/4] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
                                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 53+ messages in thread
From: Taylor Blau @ 2025-05-28 22:59 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

Accessing a MIDX's 'pack_names' array is somewhat error-prone when
dealing with incremental MIDX chains, where the (global) pack_int_id for
some pack may differ from the containing layer's index for that pack.

Introduce `nth_midxed_pack_name()` in an effort to reduce a common
source of errors by discouraging external callers from accessing a
layer's `pack_names` array directly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                    | 7 +++++++
 midx.h                    | 2 ++
 pack-bitmap.c             | 4 ++--
 t/helper/test-read-midx.c | 7 ++++---
 4 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/midx.c b/midx.c
index cd6e766ce2..6705e77881 100644
--- a/midx.c
+++ b/midx.c
@@ -506,6 +506,13 @@ struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
 	return m->packs[local_pack_int_id];
 }
 
+const char *nth_midxed_pack_name(struct multi_pack_index *m,
+				 uint32_t pack_int_id)
+{
+	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
+	return m->pack_names[local_pack_int_id];
+}
+
 #define MIDX_CHUNK_BITMAPPED_PACKS_WIDTH (2 * sizeof(uint32_t))
 
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
diff --git a/midx.h b/midx.h
index 9d1374cbd5..0fb490f4d4 100644
--- a/midx.h
+++ b/midx.h
@@ -107,6 +107,8 @@ struct multi_pack_index *load_multi_pack_index(struct repository *r,
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
 struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
 				   uint32_t pack_int_id);
+const char *nth_midxed_pack_name(struct multi_pack_index *m,
+				 uint32_t pack_int_id);
 int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 		       struct bitmapped_pack *bp, uint32_t pack_int_id);
 int bsearch_one_midx(const struct object_id *oid, struct multi_pack_index *m,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index b9f1d86604..8ddc150778 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -490,7 +490,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 	for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
 		if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
 			warning(_("could not open pack %s"),
-				bitmap_git->midx->pack_names[i]);
+				nth_midxed_pack_name(bitmap_git->midx, i));
 			goto cleanup;
 		}
 	}
@@ -2469,7 +2469,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 			struct bitmapped_pack pack;
 			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
 				warning(_("unable to load pack: '%s', disabling pack-reuse"),
-					bitmap_git->midx->pack_names[i]);
+					nth_midxed_pack_name(bitmap_git->midx, i));
 				free(packs);
 				return;
 			}
diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index ac81390899..fbed0f6919 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -53,8 +53,9 @@ static int read_midx_file(const char *object_dir, const char *checksum,
 	printf("\nnum_objects: %d\n", m->num_objects);
 
 	printf("packs:\n");
-	for (i = 0; i < m->num_packs; i++)
-		printf("%s\n", m->pack_names[i]);
+	for (i = m->num_packs_in_base; i < m->num_packs + m->num_packs_in_base;
+	     i++)
+		printf("%s\n", nth_midxed_pack_name(m, i));
 
 	printf("object-dir: %s\n", m->object_dir);
 
@@ -108,7 +109,7 @@ static int read_midx_preferred_pack(const char *object_dir)
 		return 1;
 	}
 
-	printf("%s\n", midx->pack_names[preferred_pack]);
+	printf("%s\n", nth_midxed_pack_name(midx, preferred_pack));
 	close_midx(midx);
 	return 0;
 }
-- 
2.49.0.640.ga4de40e6a8


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 2/4] midx-write.c: guard against incremental MIDXs in want_included_pack()
  2025-05-28 22:58                   ` [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics Taylor Blau
  2025-05-28 22:59                     ` [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()` Taylor Blau
@ 2025-05-28 22:59                     ` Taylor Blau
  2025-05-28 22:59                     ` [PATCH v2 3/4] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
  2025-05-28 22:59                     ` [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
  3 siblings, 0 replies; 53+ messages in thread
From: Taylor Blau @ 2025-05-28 22:59 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

The function want_included_pack() is used to determine whether or not a
the packfile corresponding to some given pack_int_id should be included
in a 'git multi-pack-index repack' operation.

This spot looks like it would be broken, particularly in:

    struct packed_git *p;
    if (prepare_midx_pack(r, m, pack_int_id))
        return 0;
    p = m->packs[pack_int_id];

, when pack_int_id is greater than m->num_pack_in_base (supposing that
m->num_packs_in_base is non-zero, or equivalently that m->base_midx is
non-NULL).

Suppose we have two MIDXs in an incremental MIDX chain, each having two
packs:

  - M0 = {packs: [P0, P1], base_midx: NULL}
  - M1 = {packs: [P2, P3], base_midx: M0}

noting that each pack is identified by its global pack_int_id within the
chain.

So if you do something like:

    want_included_pack(the_repository, M1, pack_kept_objects, 2);

that function will call prepare_midx_pack(), which is smart enough to
realize that the pack of interest is in the current layer (M1), and
knows how to adjust its global pack_int_id into an index into the
current layer's 'packs' array.

But the following line:

    p = m->packs[pack_int_id]; /* m is M1, pack_int_id is 2 */

looks broken, since each layer of the MIDX only maintains an array of
the packs stored within that layer, and 'm' wasn't adjusted back to
point at M1->base_midx (M0).

The right thing to do would be:

    struct packed_git *p;
    if (prepare_midx_pack(r, m, pack_int_id))
        return 0;

    /* open-code midx.c::midx_for_pack(), which is private */
    while (m && pack_int_id < m->num_packs_in_base)
        m = m->base_midx;

    if (!m)
        BUG("broken midx?");
    if (pack_int_id >= m->num_packs + m->num_packs_in_base)
        BUG("broken pack_int_id?");

    p = m->packs[pack_int_id - m->num_packs_in_base];

But that would be overkill, since this function never deals with
incremental MIDXs having more than one layer! To see why, observe that
want_included_pack() has two callers:

  - midx-write.c::fill_included_packs_all()
  - midx-write.c::fill_included_packs_batch()

and those functions' sole caller is in midx-write.c::midx_repack(),
which dispatches a call to one or the other depending on whether or not
the batch_size is non-zero.

And at the very top of midx_repack(), we have a guard against
non-trivial incremental MIDX chains:

    if (m->base_midx)
        die(_("cannot repack an incremental multi-pack-index"));

So want_included_pack() is OK because it will never encounter a
situation where it has to chase backwards through the '->base_midx'
pointer. But that is not immediately clear from reading the code, and is
too fragile for my comfort. Make this more clear by adding an ASSERT()
to the above effect.

Apply the same treatment to each of the fill_included_packs-related
functions as well, since those are deceptively OK by the same reasoning.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/midx-write.c b/midx-write.c
index dd3b3070e5..e4a3830d45 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1636,6 +1636,9 @@ static int want_included_pack(struct repository *r,
 			      uint32_t pack_int_id)
 {
 	struct packed_git *p;
+
+	ASSERT(m && !m->base_midx);
+
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return 0;
 	p = m->packs[pack_int_id];
@@ -1655,6 +1658,8 @@ static void fill_included_packs_all(struct repository *r,
 	uint32_t i;
 	int pack_kept_objects = 0;
 
+	ASSERT(m && !m->base_midx);
+
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
@@ -1675,6 +1680,8 @@ static void fill_included_packs_batch(struct repository *r,
 	struct repack_info *pack_info;
 	int pack_kept_objects = 0;
 
+	ASSERT(m && !m->base_midx);
+
 	CALLOC_ARRAY(pack_info, m->num_packs);
 
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
-- 
2.49.0.640.ga4de40e6a8


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 3/4] midx-write.c: extract inner loop from fill_packs_from_midx()
  2025-05-28 22:58                   ` [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics Taylor Blau
  2025-05-28 22:59                     ` [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()` Taylor Blau
  2025-05-28 22:59                     ` [PATCH v2 2/4] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
@ 2025-05-28 22:59                     ` Taylor Blau
  2025-05-28 22:59                     ` [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
  3 siblings, 0 replies; 53+ messages in thread
From: Taylor Blau @ 2025-05-28 22:59 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

The function fill_packs_from_midx() does relatively little, but ends up
in a doubly-nested loop because we're enumerating each pack within each
layer of the incremental MIDX chain.

Let's de-dent the inner loop of fill_packs_from_midx() by extracting its
contents into a separate function, and calling that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c | 72 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 31 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index e4a3830d45..ca2384e291 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -938,44 +938,54 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
 	return result;
 }
 
+static int fill_packs_from_midx_1(struct write_midx_context *ctx,
+				  struct multi_pack_index *m,
+				  int prepare_packs)
+{
+	for (uint32_t i = 0; i < m->num_packs; i++) {
+		/*
+		 * If generating a reverse index, need to have
+		 * packed_git's loaded to compare their
+		 * mtimes and object count.
+		 */
+		if (prepare_packs) {
+			if (prepare_midx_pack(ctx->repo, m,
+					      m->num_packs_in_base + i)) {
+				error(_("could not load pack"));
+				return 1;
+			}
+
+			if (open_pack_index(m->packs[i]))
+				die(_("could not open index for %s"),
+				    m->packs[i]->pack_name);
+		}
+
+		fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
+			       m->pack_names[i],
+			       m->num_packs_in_base + i);
+	}
+
+	return 0;
+}
+
 static int fill_packs_from_midx(struct write_midx_context *ctx,
 				const char *preferred_pack_name, uint32_t flags)
 {
 	struct multi_pack_index *m;
+	int prepare_packs;
+
+	/*
+	 * If generating a reverse index, need to have packed_git's
+	 * loaded to compare their mtimes and object count.
+	 */
+	prepare_packs = !!(flags & MIDX_WRITE_REV_INDEX || preferred_pack_name);
 
 	for (m = ctx->m; m; m = m->base_midx) {
-		uint32_t i;
-
-		for (i = 0; i < m->num_packs; i++) {
-			ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-
-			/*
-			 * If generating a reverse index, need to have
-			 * packed_git's loaded to compare their
-			 * mtimes and object count.
-			 *
-			 * If a preferred pack is specified, need to
-			 * have packed_git's loaded to ensure the chosen
-			 * preferred pack has a non-zero object count.
-			 */
-			if (flags & MIDX_WRITE_REV_INDEX ||
-			    preferred_pack_name) {
-				if (prepare_midx_pack(ctx->repo, m,
-						      m->num_packs_in_base + i)) {
-					error(_("could not load pack"));
-					return 1;
-				}
-
-				if (open_pack_index(m->packs[i]))
-					die(_("could not open index for %s"),
-					    m->packs[i]->pack_name);
-			}
-
-			fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
-				       m->pack_names[i],
-				       m->num_packs_in_base + i);
-		}
+		int ret = fill_packs_from_midx_1(ctx, m, prepare_packs);
+		if (ret)
+			return ret;
 	}
+
 	return 0;
 }
 
-- 
2.49.0.640.ga4de40e6a8


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()`
  2025-05-28 22:58                   ` [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics Taylor Blau
                                       ` (2 preceding siblings ...)
  2025-05-28 22:59                     ` [PATCH v2 3/4] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
@ 2025-05-28 22:59                     ` Taylor Blau
  2025-05-30  6:50                       ` Jeff King
  3 siblings, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-05-28 22:59 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King, Junio C Hamano

The prepare_midx_pack() function is designed to convert a MIDX-specific
pack_int_id for a given pack into a pointer into an actual `packed_git`
structure.

In general, these calls look something like:

    struct packed_git *p;
    if (prepare_midx_pack(the_repository, some_midx, some_pack_int_id))
        die("could not load pack xyz");
    p = some_midx->packs[some_pack_int_id];

, and in a pre-incremental MIDX world, this pattern works well. However,
in a post-incremental MIDX world, this pattern is a little more prone to
errors.

These errors can happen when the given 'pack_int_id' is not usable as an
index into the 'm->packs' array. And this happens in all layers but the
bottom-most one in an incremental MIDX chain. Each layer stores only the
packs that are local to that layer of the chain, and offsets them by the
total number of packs in the base MIDX(s).

But there is other awkwardness here. Thinking back to the above snippet,
suppose that the pack with ID 'some_pack_int_id' is in a layer in the
middle of the MIDX chain. Then it is still invalid to do:

    p = some_midx->packs[some_pack_int_id - some_midx->num_packs_in_base];

, becuase the top-most layer (here 'some_midx') may not even have that
pack! So we would have to chase down the '->base_midx' pointer in order
to get the correct result. midx.c has a helper to do this (called
'midx_for_pack()'), but it is meant only for internal use.

That means that a full, working version of the above adjusted to handle
incremental MIDXs would look something like:

    struct packed_git *p;
    if (prepare_midx_pack(the_repository, some_midx, some_pack_int_id))
        die("could not load pack xyz");

    while (m && pack_int_id < m->num_packs_in_base)
        m = m->base_midx;

    if (!m)
        BUG("broken midx?");
    if (pack_int_id >= m->num_packs + m->num_packs_in_base)
        BUG("broken pack_int_id?");

    p = m->packs[pack_int_id - m->num_packs_in_base];

, which is far too verbose to access a single pack by its pack_int_id in
a MIDX chain.

Let's instead have prepare_midx_pack() return a pointer to the
packed_git structure itself, hiding the above as an implementation
detail of prepare_midx_pack(). This patch turns the above snippet into:

    struct packed_git *p = prepare_midx_pack(the_repository, some_midx,
                                             some_pack_int_id);
    if (!p)
        die("could not load pack xyz");

making it far easier and less error-prone to access packs by their
pack_int_id in a MIDX chain.

(In the future, we may want to consider similar treatment for, e.g., the
pack_names array. Likewise, it might make sense to rename the "packs"
member of the MIDX structure to suggest that it shouldn't be accessed
directly outside of midx.c.)

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx-write.c  | 44 +++++++++++++---------------
 midx.c        | 81 +++++++++++++++++++++++++--------------------------
 midx.h        |  4 ++-
 pack-bitmap.c |  2 +-
 4 files changed, 64 insertions(+), 67 deletions(-)

diff --git a/midx-write.c b/midx-write.c
index ca2384e291..fc74be813d 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -943,25 +943,23 @@ static int fill_packs_from_midx_1(struct write_midx_context *ctx,
 				  int prepare_packs)
 {
 	for (uint32_t i = 0; i < m->num_packs; i++) {
-		/*
-		 * If generating a reverse index, need to have
-		 * packed_git's loaded to compare their
-		 * mtimes and object count.
-		 */
+		struct packed_git *p = NULL;
+
+		ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
 		if (prepare_packs) {
-			if (prepare_midx_pack(ctx->repo, m,
-					      m->num_packs_in_base + i)) {
+			p = prepare_midx_pack(ctx->repo, m,
+					      m->num_packs_in_base + i);
+			if (!p) {
 				error(_("could not load pack"));
 				return 1;
 			}
 
-			if (open_pack_index(m->packs[i]))
+			if (open_pack_index(p))
 				die(_("could not open index for %s"),
-				    m->packs[i]->pack_name);
+				    p->pack_name);
 		}
 
-		fill_pack_info(&ctx->info[ctx->nr++], m->packs[i],
-			       m->pack_names[i],
+		fill_pack_info(&ctx->info[ctx->nr++], p, m->pack_names[i],
 			       m->num_packs_in_base + i);
 	}
 
@@ -1588,20 +1586,19 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 					  _("Finding and deleting unreferenced packfiles"),
 					  m->num_packs);
 	for (i = 0; i < m->num_packs; i++) {
+		struct packed_git *p;
 		char *pack_name;
 		display_progress(progress, i + 1);
 
 		if (count[i])
 			continue;
 
-		if (prepare_midx_pack(r, m, i))
+		p = prepare_midx_pack(r, m, i);
+		if (!p || p->pack_keep || p->is_cruft)
 			continue;
 
-		if (m->packs[i]->pack_keep || m->packs[i]->is_cruft)
-			continue;
-
-		pack_name = xstrdup(m->packs[i]->pack_name);
-		close_pack(m->packs[i]);
+		pack_name = xstrdup(p->pack_name);
+		close_pack(p);
 
 		string_list_insert(&packs_to_drop, m->pack_names[i]);
 		unlink_pack_path(pack_name, 0);
@@ -1649,9 +1646,9 @@ static int want_included_pack(struct repository *r,
 
 	ASSERT(m && !m->base_midx);
 
-	if (prepare_midx_pack(r, m, pack_int_id))
+	p = prepare_midx_pack(r, m, pack_int_id);
+	if (!p)
 		return 0;
-	p = m->packs[pack_int_id];
 	if (!pack_kept_objects && p->pack_keep)
 		return 0;
 	if (p->is_cruft)
@@ -1697,12 +1694,11 @@ static void fill_included_packs_batch(struct repository *r,
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
+		struct packed_git *p = prepare_midx_pack(r, m, i);
+
 		pack_info[i].pack_int_id = i;
-
-		if (prepare_midx_pack(r, m, i))
-			continue;
-
-		pack_info[i].mtime = m->packs[i]->mtime;
+		if (p)
+			pack_info[i].mtime = p->mtime;
 	}
 
 	for (i = 0; i < m->num_objects; i++) {
diff --git a/midx.c b/midx.c
index 6705e77881..a2e7a3ec0e 100644
--- a/midx.c
+++ b/midx.c
@@ -451,50 +451,48 @@ static uint32_t midx_for_pack(struct multi_pack_index **_m,
 	return pack_int_id - m->num_packs_in_base;
 }
 
-int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
-		      uint32_t pack_int_id)
+struct packed_git *prepare_midx_pack(struct repository *r,
+				     struct multi_pack_index *m,
+				     uint32_t pack_int_id)
 {
-	struct strbuf pack_name = STRBUF_INIT;
-	struct strbuf key = STRBUF_INIT;
-	struct packed_git *p;
+	uint32_t pack_pos = midx_for_pack(&m, pack_int_id);
 
-	pack_int_id = midx_for_pack(&m, pack_int_id);
+	if (!m->packs[pack_pos]) {
+		struct strbuf pack_name = STRBUF_INIT;
+		struct strbuf key = STRBUF_INIT;
+		struct packed_git *p;
 
-	if (m->packs[pack_int_id] == MIDX_PACK_ERROR)
-		return 1;
-	if (m->packs[pack_int_id])
-		return 0;
+		strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
+			    m->pack_names[pack_pos]);
 
-	strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
-		    m->pack_names[pack_int_id]);
-
-	/* pack_map holds the ".pack" name, but we have the .idx */
-	strbuf_addbuf(&key, &pack_name);
-	strbuf_strip_suffix(&key, ".idx");
-	strbuf_addstr(&key, ".pack");
-	p = hashmap_get_entry_from_hash(&r->objects->pack_map,
-					strhash(key.buf), key.buf,
-					struct packed_git, packmap_ent);
-	if (!p) {
-		p = add_packed_git(r, pack_name.buf, pack_name.len, m->local);
-		if (p) {
-			install_packed_git(r, p);
-			list_add_tail(&p->mru, &r->objects->packed_git_mru);
+		/* pack_map holds the ".pack" name, but we have the .idx */
+		strbuf_addbuf(&key, &pack_name);
+		strbuf_strip_suffix(&key, ".idx");
+		strbuf_addstr(&key, ".pack");
+		p = hashmap_get_entry_from_hash(&r->objects->pack_map,
+						strhash(key.buf), key.buf,
+						struct packed_git, packmap_ent);
+		if (!p) {
+			p = add_packed_git(r, pack_name.buf, pack_name.len,
+					   m->local);
+			if (p) {
+				install_packed_git(r, p);
+				list_add_tail(&p->mru,
+					      &r->objects->packed_git_mru);
+			}
 		}
-	}
 
-	strbuf_release(&pack_name);
-	strbuf_release(&key);
+		strbuf_release(&pack_name);
+		strbuf_release(&key);
 
-	if (!p) {
-		m->packs[pack_int_id] = MIDX_PACK_ERROR;
-		return 1;
+		m->packs[pack_pos] = p ? p : MIDX_PACK_ERROR;
+		if (p)
+			p->multi_pack_index = 1;
 	}
 
-	p->multi_pack_index = 1;
-	m->packs[pack_int_id] = p;
-
-	return 0;
+	if (m->packs[pack_pos] == MIDX_PACK_ERROR)
+		return NULL;
+	return m->packs[pack_pos];
 }
 
 struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
@@ -523,10 +521,11 @@ int nth_bitmapped_pack(struct repository *r, struct multi_pack_index *m,
 	if (!m->chunk_bitmapped_packs)
 		return error(_("MIDX does not contain the BTMP chunk"));
 
-	if (prepare_midx_pack(r, m, pack_int_id))
-		return error(_("could not load bitmapped pack %"PRIu32), pack_int_id);
+	bp->p = prepare_midx_pack(r, m, pack_int_id);
+	if (!bp->p)
+		return error(_("could not load bitmapped pack %"PRIu32),
+			     pack_int_id);
 
-	bp->p = m->packs[local_pack_int_id];
 	bp->bitmap_pos = get_be32((char *)m->chunk_bitmapped_packs +
 				  MIDX_CHUNK_BITMAPPED_PACKS_WIDTH * local_pack_int_id);
 	bp->bitmap_nr = get_be32((char *)m->chunk_bitmapped_packs +
@@ -623,9 +622,9 @@ int fill_midx_entry(struct repository *r,
 	midx_for_object(&m, pos);
 	pack_int_id = nth_midxed_pack_int_id(m, pos);
 
-	if (prepare_midx_pack(r, m, pack_int_id))
+	p = prepare_midx_pack(r, m, pack_int_id);
+	if (!p)
 		return 0;
-	p = m->packs[pack_int_id - m->num_packs_in_base];
 
 	/*
 	* We are about to tell the caller where they can locate the
@@ -926,7 +925,7 @@ int verify_midx_file(struct repository *r, const char *object_dir, unsigned flag
 						  _("Looking for referenced packfiles"),
 						  m->num_packs + m->num_packs_in_base);
 	for (i = 0; i < m->num_packs + m->num_packs_in_base; i++) {
-		if (prepare_midx_pack(r, m, i))
+		if (!prepare_midx_pack(r, m, i))
 			midx_report("failed to load pack in position %d", i);
 
 		display_progress(progress, i + 1);
diff --git a/midx.h b/midx.h
index 0fb490f4d4..4ac05b8234 100644
--- a/midx.h
+++ b/midx.h
@@ -104,7 +104,9 @@ void get_split_midx_filename_ext(const struct git_hash_algo *hash_algo,
 struct multi_pack_index *load_multi_pack_index(struct repository *r,
 					       const char *object_dir,
 					       int local);
-int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
+struct packed_git *prepare_midx_pack(struct repository *r,
+				     struct multi_pack_index *m,
+				     uint32_t pack_int_id);
 struct packed_git *nth_midxed_pack(struct multi_pack_index *m,
 				   uint32_t pack_int_id);
 const char *nth_midxed_pack_name(struct multi_pack_index *m,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 8ddc150778..1f72e605d4 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -488,7 +488,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 	}
 
 	for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
-		if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
+		if (!prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
 			warning(_("could not open pack %s"),
 				nth_midxed_pack_name(bitmap_git->midx, i));
 			goto cleanup;
-- 
2.49.0.640.ga4de40e6a8

^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()`
  2025-05-28 22:59                     ` [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()` Taylor Blau
@ 2025-05-29 20:47                       ` Junio C Hamano
  2025-06-03 22:22                         ` Taylor Blau
  2025-05-29 20:51                       ` Junio C Hamano
  1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2025-05-29 20:47 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Patrick Steinhardt, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> Accessing a MIDX's 'pack_names' array is somewhat error-prone when
> dealing with incremental MIDX chains, where the (global) pack_int_id for
> some pack may differ from the containing layer's index for that pack.
>
> Introduce `nth_midxed_pack_name()` in an effort to reduce a common
> source of errors by discouraging external callers from accessing a
> layer's `pack_names` array directly.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx.c                    | 7 +++++++
>  midx.h                    | 2 ++
>  pack-bitmap.c             | 4 ++--
>  t/helper/test-read-midx.c | 7 ++++---
>  4 files changed, 15 insertions(+), 5 deletions(-)

Hmph, I am not sure if an accessor really makes it harder to make
mistakes, but I'd expect it to be mechanical rewrite from a[n] to
fn(a, n)?

> +const char *nth_midxed_pack_name(struct multi_pack_index *m,
> +				 uint32_t pack_int_id)
> +{
> +	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
> +	return m->pack_names[local_pack_int_id];
> +}

OK, midx_for_pack() takes a pack_int_id, finds the midx that
contains the pack (by updating the 'm' via its pointer arg), and
turns pack_int_id into local offset into m->pack_names[] array,
and returns that string.

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index b9f1d86604..8ddc150778 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -490,7 +490,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
>  	for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
>  		if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
>  			warning(_("could not open pack %s"),
> -				bitmap_git->midx->pack_names[i]);
> +				nth_midxed_pack_name(bitmap_git->midx, i));

This loop runs from 0 to (num_packs + num_packs_in_base).  I
understand if it runs from num_packs_in_base to (num_packs +
num_packs_in_base), iterating only on this layer, but probably this
just tries to open everything (i.e. in addition to num_packs we
have, we know num_packs_in_base packs are there in our base layer(s),
so we iterate from 0 to that number).

The updated code converts the global 'i', which runs from 0 to
"everything under us" num_packs + num_packs_in_base, to
corresponding layer midx plus offset in it, so it looks good, but
then, is the original reference to bitmap_git->midx->pack_names[i]
even correct?  If we have a base, i can run larger than
bitmap_git->midx->num_packs, which is the size of the array
bitmap_git->midx->pack_names[].

Or, unlike how the proposed log message portrayed this change as
(i.e. code clean up), does this patch fix real bugs that manifest
only when midx files are chained?

> @@ -2469,7 +2469,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
>  			struct bitmapped_pack pack;
>  			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
>  				warning(_("unable to load pack: '%s', disabling pack-reuse"),
> -					bitmap_git->midx->pack_names[i]);
> +					nth_midxed_pack_name(bitmap_git->midx, i));
>  				free(packs);
>  				return;
>  			}

Similar to the above, this is also in a loop that runs from 0 to
num_packs+num_packs_in_base.  Is the array access to find the name
for the error message in the original even correct when midx are
chained?

> diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
> index ac81390899..fbed0f6919 100644
> --- a/t/helper/test-read-midx.c
> +++ b/t/helper/test-read-midx.c
> @@ -53,8 +53,9 @@ static int read_midx_file(const char *object_dir, const char *checksum,
>  	printf("\nnum_objects: %d\n", m->num_objects);
>  
>  	printf("packs:\n");
> -	for (i = 0; i < m->num_packs; i++)
> -		printf("%s\n", m->pack_names[i]);
> +	for (i = m->num_packs_in_base; i < m->num_packs + m->num_packs_in_base;
> +	     i++)
> +		printf("%s\n", nth_midxed_pack_name(m, i));

OK.  This used to iterate from 0 to num_packs using the local
offset.  Now it iterates from num_packs_in_base to num_packs_in_base+num_packs,
meaning we iterate over packs in the given midx.  No change in
behaviour, as accesses to m->pack_names[i] using the local offset in
the original was correct, and the updated code iterates using the
global offset.  This is not a bugfix but is a code cleanup.

> @@ -108,7 +109,7 @@ static int read_midx_preferred_pack(const char *object_dir)
>  		return 1;
>  	}
>  
> -	printf("%s\n", midx->pack_names[preferred_pack]);
> +	printf("%s\n", nth_midxed_pack_name(midx, preferred_pack));

Again, is the original buggy when midx are chained?

>  	close_midx(midx);
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()`
  2025-05-28 22:59                     ` [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()` Taylor Blau
  2025-05-29 20:47                       ` Junio C Hamano
@ 2025-05-29 20:51                       ` Junio C Hamano
  2025-06-03 22:23                         ` Taylor Blau
  1 sibling, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2025-05-29 20:51 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Patrick Steinhardt, Jeff King

Taylor Blau <me@ttaylorr.com> writes:

> Accessing a MIDX's 'pack_names' array is somewhat error-prone when
> dealing with incremental MIDX chains, where the (global) pack_int_id for
> some pack may differ from the containing layer's index for that pack.
>
> Introduce `nth_midxed_pack_name()` in an effort to reduce a common
> source of errors by discouraging external callers from accessing a
> layer's `pack_names` array directly.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  midx.c                    | 7 +++++++
>  midx.h                    | 2 ++
>  pack-bitmap.c             | 4 ++--
>  t/helper/test-read-midx.c | 7 ++++---
>  4 files changed, 15 insertions(+), 5 deletions(-)

One thing I forgot to ask.  Should we expect that

	$ git grep -E -e '(\.|->)pack_names\['

to give hits only from the implementation of nth_midxed_pack_name()?

> +const char *nth_midxed_pack_name(struct multi_pack_index *m,
> +				 uint32_t pack_int_id)
> +{
> +	uint32_t local_pack_int_id = midx_for_pack(&m, pack_int_id);
> +	return m->pack_names[local_pack_int_id];
> +}

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v3 0/2] packfile: avoid access(3p) calls for missing packs
  2025-05-28 12:24 ` [PATCH v3 " Patrick Steinhardt
  2025-05-28 12:24   ` [PATCH v3 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
  2025-05-28 12:24   ` [PATCH v3 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
@ 2025-05-30  6:27   ` Jeff King
  2 siblings, 0 replies; 53+ messages in thread
From: Jeff King @ 2025-05-30  6:27 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, Junio C Hamano

On Wed, May 28, 2025 at 02:24:09PM +0200, Patrick Steinhardt wrote:

> Changes in v3:
>   - Use a macro to hide away the `(void *)(intptr_t)-1` magic.
>   - Link to v2: https://lore.kernel.org/r/20250520-pks-pack-avoid-stats-on-missing-v2-0-333c5217fb05@pks.im

Thanks, this looks good to me.

-Peff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()`
  2025-05-28 22:59                     ` [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
@ 2025-05-30  6:50                       ` Jeff King
  2025-06-03 22:27                         ` Taylor Blau
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff King @ 2025-05-30  6:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Wed, May 28, 2025 at 06:59:09PM -0400, Taylor Blau wrote:

> Let's instead have prepare_midx_pack() return a pointer to the
> packed_git structure itself, hiding the above as an implementation
> detail of prepare_midx_pack(). This patch turns the above snippet into:
> 
>     struct packed_git *p = prepare_midx_pack(the_repository, some_midx,
>                                              some_pack_int_id);
>     if (!p)
>         die("could not load pack xyz");
> 
> making it far easier and less error-prone to access packs by their
> pack_int_id in a MIDX chain.

So obviously I like this direction, but a few small comments:

> (In the future, we may want to consider similar treatment for, e.g., the
> pack_names array. Likewise, it might make sense to rename the "packs"
> member of the MIDX structure to suggest that it shouldn't be accessed
> directly outside of midx.c.)

Is this note still valid for v2? It looks like patch 1 adds
nth_midxed_pack_name() and tries to use it everywhere.

> @@ -1649,9 +1646,9 @@ static int want_included_pack(struct repository *r,
>  
>  	ASSERT(m && !m->base_midx);
>  
> -	if (prepare_midx_pack(r, m, pack_int_id))
> +	p = prepare_midx_pack(r, m, pack_int_id);
> +	if (!p)
>  		return 0;
> -	p = m->packs[pack_int_id];
>  	if (!pack_kept_objects && p->pack_keep)
>  		return 0;
>  	if (p->is_cruft)

The ASSERT() in the context is from earlier in the series. But do we
need it once we have this patch? We no longer look at pack_int_id except
to pass it to prepare_midx_pack(), which handles non-base midx slices
just fine.

So we could loosen the assertion now. Or we could wait for later when
somebody wants/needs to do so, but I'm not sure how easy they would find
it to dig in the history. They would find the commit that added the
ASSERT(), but may not realize that this later commit made it OK to
loosen.

I didn't check the other ASSERT() spots from that earlier patch (IIRC,
some of them may actually look use the pack_int_id for other things, and
wouldn't be ready for non-base slices).

> -int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
> -		      uint32_t pack_int_id)
> +struct packed_git *prepare_midx_pack(struct repository *r,
> +				     struct multi_pack_index *m,
> +				     uint32_t pack_int_id)

We used to return "1" for failure and "0" for success. Now we're
reversed: we return NULL for failure and non-zero for success.

So code like:

  if (prepare_midx_pack(...))
	return error("yikes");

needs to be updated, but the compiler won't help us because it is happy
to convert both an int and a pointer into a boolean check.

Should we rename the function to make sure we catch any callers for
topics in flight?

I'd have thought we could call it nth_midxed_pack(), but that seems to
exist already, with the caveat that it never prepares the pack, but only
serves what's in the cache. I wonder if we could simply replace that
with what prepare_midx_pack() does, but it may be more conservative to
leave the two separate. So I guess nth_midxed_pack_load() or something.

-Peff

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()`
  2025-05-29 20:47                       ` Junio C Hamano
@ 2025-06-03 22:22                         ` Taylor Blau
  0 siblings, 0 replies; 53+ messages in thread
From: Taylor Blau @ 2025-06-03 22:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Jeff King

On Thu, May 29, 2025 at 01:47:44PM -0700, Junio C Hamano wrote:
> > diff --git a/pack-bitmap.c b/pack-bitmap.c
> > index b9f1d86604..8ddc150778 100644
> > --- a/pack-bitmap.c
> > +++ b/pack-bitmap.c
> > @@ -490,7 +490,7 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
> >  	for (i = 0; i < bitmap_git->midx->num_packs + bitmap_git->midx->num_packs_in_base; i++) {
> >  		if (prepare_midx_pack(bitmap_repo(bitmap_git), bitmap_git->midx, i)) {
> >  			warning(_("could not open pack %s"),
> > -				bitmap_git->midx->pack_names[i]);
> > +				nth_midxed_pack_name(bitmap_git->midx, i));
>
> This loop runs from 0 to (num_packs + num_packs_in_base).  I
> understand if it runs from num_packs_in_base to (num_packs +
> num_packs_in_base), iterating only on this layer, but probably this
> just tries to open everything (i.e. in addition to num_packs we
> have, we know num_packs_in_base packs are there in our base layer(s),
> so we iterate from 0 to that number).
>
> The updated code converts the global 'i', which runs from 0 to
> "everything under us" num_packs + num_packs_in_base, to
> corresponding layer midx plus offset in it, so it looks good, but
> then, is the original reference to bitmap_git->midx->pack_names[i]
> even correct?  If we have a base, i can run larger than
> bitmap_git->midx->num_packs, which is the size of the array
> bitmap_git->midx->pack_names[].
>
> Or, unlike how the proposed log message portrayed this change as
> (i.e. code clean up), does this patch fix real bugs that manifest
> only when midx files are chained?

Right; the original code was buggy if we had a failure opening a MIDX'd
pack outside of the base layer in an incremental MIDX bitmap. Reading
the proposed log message again, I see what you're saying. I am happy to
clarify that this is indeed a bugfix, not just a cleanup.

> > @@ -2469,7 +2469,7 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
> >  			struct bitmapped_pack pack;
> >  			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
> >  				warning(_("unable to load pack: '%s', disabling pack-reuse"),
> > -					bitmap_git->midx->pack_names[i]);
> > +					nth_midxed_pack_name(bitmap_git->midx, i));
> >  				free(packs);
> >  				return;
> >  			}
>
> Similar to the above, this is also in a loop that runs from 0 to
> num_packs+num_packs_in_base.  Is the array access to find the name
> for the error message in the original even correct when midx are
> chained?

Right; this spot suffers from the same bug as the previous hunk.

> > diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
> > index ac81390899..fbed0f6919 100644
> > --- a/t/helper/test-read-midx.c
> > +++ b/t/helper/test-read-midx.c
> > @@ -53,8 +53,9 @@ static int read_midx_file(const char *object_dir, const char *checksum,
> >  	printf("\nnum_objects: %d\n", m->num_objects);
> >
> >  	printf("packs:\n");
> > -	for (i = 0; i < m->num_packs; i++)
> > -		printf("%s\n", m->pack_names[i]);
> > +	for (i = m->num_packs_in_base; i < m->num_packs + m->num_packs_in_base;
> > +	     i++)
> > +		printf("%s\n", nth_midxed_pack_name(m, i));
>
> OK.  This used to iterate from 0 to num_packs using the local
> offset.  Now it iterates from num_packs_in_base to num_packs_in_base+num_packs,
> meaning we iterate over packs in the given midx.  No change in
> behaviour, as accesses to m->pack_names[i] using the local offset in
> the original was correct, and the updated code iterates using the
> global offset.  This is not a bugfix but is a code cleanup.

Right.

> > @@ -108,7 +109,7 @@ static int read_midx_preferred_pack(const char *object_dir)
> >  		return 1;
> >  	}
> >
> > -	printf("%s\n", midx->pack_names[preferred_pack]);
> > +	printf("%s\n", nth_midxed_pack_name(midx, preferred_pack));
>
> Again, is the original buggy when midx are chained?

This is also a bugfix.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()`
  2025-05-29 20:51                       ` Junio C Hamano
@ 2025-06-03 22:23                         ` Taylor Blau
  0 siblings, 0 replies; 53+ messages in thread
From: Taylor Blau @ 2025-06-03 22:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Jeff King

On Thu, May 29, 2025 at 01:51:03PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Accessing a MIDX's 'pack_names' array is somewhat error-prone when
> > dealing with incremental MIDX chains, where the (global) pack_int_id for
> > some pack may differ from the containing layer's index for that pack.
> >
> > Introduce `nth_midxed_pack_name()` in an effort to reduce a common
> > source of errors by discouraging external callers from accessing a
> > layer's `pack_names` array directly.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> >  midx.c                    | 7 +++++++
> >  midx.h                    | 2 ++
> >  pack-bitmap.c             | 4 ++--
> >  t/helper/test-read-midx.c | 7 ++++---
> >  4 files changed, 15 insertions(+), 5 deletions(-)
>
> One thing I forgot to ask.  Should we expect that
>
> 	$ git grep -E -e '(\.|->)pack_names\['
>
> to give hits only from the implementation of nth_midxed_pack_name()?

I waffled on this a bit... and I think the answer is "almost". Certainly
callers outside of midx.c or midx-write.c should not be touching
"pack_names" directly.

But there are a couple of spots in midx-write.c that are really more
convenient to directly look at the pack_names array rather than going
through the accessor.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()`
  2025-05-30  6:50                       ` Jeff King
@ 2025-06-03 22:27                         ` Taylor Blau
  2025-08-28 23:25                           ` Junio C Hamano
  0 siblings, 1 reply; 53+ messages in thread
From: Taylor Blau @ 2025-06-03 22:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Patrick Steinhardt, Junio C Hamano

On Fri, May 30, 2025 at 02:50:34AM -0400, Jeff King wrote:
> On Wed, May 28, 2025 at 06:59:09PM -0400, Taylor Blau wrote:
>
> > Let's instead have prepare_midx_pack() return a pointer to the
> > packed_git structure itself, hiding the above as an implementation
> > detail of prepare_midx_pack(). This patch turns the above snippet into:
> >
> >     struct packed_git *p = prepare_midx_pack(the_repository, some_midx,
> >                                              some_pack_int_id);
> >     if (!p)
> >         die("could not load pack xyz");
> >
> > making it far easier and less error-prone to access packs by their
> > pack_int_id in a MIDX chain.
>
> So obviously I like this direction, but a few small comments:
>
> > (In the future, we may want to consider similar treatment for, e.g., the
> > pack_names array. Likewise, it might make sense to rename the "packs"
> > member of the MIDX structure to suggest that it shouldn't be accessed
> > directly outside of midx.c.)
>
> Is this note still valid for v2? It looks like patch 1 adds
> nth_midxed_pack_name() and tries to use it everywhere.

Yeah, we should get rid of this. I had written it before I wrote what is
now the first patch in this series, and neglected to remove it before
sending out the latest round.

> > @@ -1649,9 +1646,9 @@ static int want_included_pack(struct repository *r,
> >
> >  	ASSERT(m && !m->base_midx);
> >
> > -	if (prepare_midx_pack(r, m, pack_int_id))
> > +	p = prepare_midx_pack(r, m, pack_int_id);
> > +	if (!p)
> >  		return 0;
> > -	p = m->packs[pack_int_id];
> >  	if (!pack_kept_objects && p->pack_keep)
> >  		return 0;
> >  	if (p->is_cruft)
>
> The ASSERT() in the context is from earlier in the series. But do we
> need it once we have this patch? We no longer look at pack_int_id except
> to pass it to prepare_midx_pack(), which handles non-base midx slices
> just fine.
>
> So we could loosen the assertion now. Or we could wait for later when
> somebody wants/needs to do so, but I'm not sure how easy they would find
> it to dig in the history. They would find the commit that added the
> ASSERT(), but may not realize that this later commit made it OK to
> loosen.
>
> I didn't check the other ASSERT() spots from that earlier patch (IIRC,
> some of them may actually look use the pack_int_id for other things, and
> wouldn't be ready for non-base slices).

We could loosen the assertion here, but part of me likes keeping it as a
self-documenting note that this function is only intended to be used for
non-incremental MIDXs.

> > -int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
> > -		      uint32_t pack_int_id)
> > +struct packed_git *prepare_midx_pack(struct repository *r,
> > +				     struct multi_pack_index *m,
> > +				     uint32_t pack_int_id)
>
> We used to return "1" for failure and "0" for success. Now we're
> reversed: we return NULL for failure and non-zero for success.
>
> So code like:
>
>   if (prepare_midx_pack(...))
> 	return error("yikes");
>
> needs to be updated, but the compiler won't help us because it is happy
> to convert both an int and a pointer into a boolean check.
>
> Should we rename the function to make sure we catch any callers for
> topics in flight?
>
> I'd have thought we could call it nth_midxed_pack(), but that seems to
> exist already, with the caveat that it never prepares the pack, but only
> serves what's in the cache. I wonder if we could simply replace that
> with what prepare_midx_pack() does, but it may be more conservative to
> leave the two separate. So I guess nth_midxed_pack_load() or something.

In general there aren't a ton of in-flight changes in the MIDX code at
any given time, so I think we could get away without renaming it. But I
don't mind erring on the side of caution here, either.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()`
  2025-06-03 22:27                         ` Taylor Blau
@ 2025-08-28 23:25                           ` Junio C Hamano
  0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2025-08-28 23:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, git, Patrick Steinhardt

Taylor Blau <me@ttaylorr.com> writes:

> On Fri, May 30, 2025 at 02:50:34AM -0400, Jeff King wrote:
>> ...
>> Is this note still valid for v2? It looks like patch 1 adds
>> nth_midxed_pack_name() and tries to use it everywhere.
>
> Yeah, we should get rid of this. I had written it before I wrote what is
> now the first patch in this series, and neglected to remove it before
> sending out the latest round.
> ...
>> I'd have thought we could call it nth_midxed_pack(), but that seems to
>> exist already, with the caveat that it never prepares the pack, but only
>> serves what's in the cache. I wonder if we could simply replace that
>> with what prepare_midx_pack() does, but it may be more conservative to
>> leave the two separate. So I guess nth_midxed_pack_load() or something.
>
> In general there aren't a ton of in-flight changes in the MIDX code at
> any given time, so I think we could get away without renaming it. But I
> don't mind erring on the side of caution here, either.

The topic went dark after this message.  Are there still unresolved
issues, or do we want to get it unstuck?

Thanks.

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2025-08-28 23:25 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-16  8:55 [PATCH] packfile: avoid access(3p) calls for missing packs Patrick Steinhardt
2025-05-16 18:34 ` Junio C Hamano
2025-05-19  6:52   ` Jeff King
2025-05-19 15:46     ` Junio C Hamano
2025-05-20  6:45     ` Patrick Steinhardt
2025-05-22  5:28       ` Jeff King
2025-05-23  1:02     ` Taylor Blau
2025-05-23  2:03       ` Jeff King
2025-05-20  9:53 ` [PATCH v2 0/2] " Patrick Steinhardt
2025-05-20  9:53   ` [PATCH v2 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
2025-05-23  1:03     ` Taylor Blau
2025-05-20  9:53   ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
2025-05-22  5:32     ` Jeff King
2025-05-22 15:47       ` Junio C Hamano
2025-05-22 16:59         ` Jeff King
2025-05-22 18:44           ` Junio C Hamano
2025-05-23  1:22           ` Taylor Blau
2025-05-23  2:08             ` Jeff King
2025-05-23 17:46               ` Taylor Blau
2025-05-25 18:41                 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
2025-05-25 18:41                   ` [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack Taylor Blau
2025-05-26  7:23                     ` Patrick Steinhardt
2025-05-28  2:00                       ` Taylor Blau
2025-05-25 18:41                   ` [PATCH 2/5] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
2025-05-26  7:23                     ` Patrick Steinhardt
2025-05-28  2:08                       ` Taylor Blau
2025-05-25 18:41                   ` [PATCH 3/5] midx-write.c: simplify fill_packs_from_midx() Taylor Blau
2025-05-26  7:23                     ` Patrick Steinhardt
2025-05-28  2:15                       ` Taylor Blau
2025-05-25 18:42                   ` [PATCH 4/5] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
2025-05-25 18:42                   ` [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
2025-05-26  7:24                     ` Patrick Steinhardt
2025-05-28  2:18                       ` Taylor Blau
2025-05-28 11:53                         ` Patrick Steinhardt
2025-05-28 22:58                   ` [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()` Taylor Blau
2025-05-29 20:47                       ` Junio C Hamano
2025-06-03 22:22                         ` Taylor Blau
2025-05-29 20:51                       ` Junio C Hamano
2025-06-03 22:23                         ` Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 2/4] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 3/4] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
2025-05-30  6:50                       ` Jeff King
2025-06-03 22:27                         ` Taylor Blau
2025-08-28 23:25                           ` Junio C Hamano
2025-05-23  1:31       ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Taylor Blau
2025-05-23  2:18         ` Jeff King
2025-05-21 13:24   ` [PATCH v2 0/2] packfile: avoid access(3p) calls for missing packs Junio C Hamano
2025-05-28 12:24 ` [PATCH v3 " Patrick Steinhardt
2025-05-28 12:24   ` [PATCH v3 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
2025-05-28 12:24   ` [PATCH v3 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
2025-05-30  6:27   ` [PATCH v3 0/2] packfile: avoid access(3p) calls for missing packs Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).