git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>,
	Patrick Steinhardt <ps@pks.im>,
	git@vger.kernel.org
Subject: Re: [PATCH] packfile: avoid access(3p) calls for missing packs
Date: Thu, 22 May 2025 21:02:06 -0400	[thread overview]
Message-ID: <aC/JDrF5uJh9XxXj@nand.local> (raw)
In-Reply-To: <20250519065221.GC102701@coredump.intra.peff.net>

On Mon, May 19, 2025 at 02:52:21AM -0400, Jeff King wrote:
> On Fri, May 16, 2025 at 11:34:10AM -0700, Junio C Hamano wrote:
>
> > > The reason why we do this is likely because we reuse `p->pack_name` to
> > > derive the other files' names, as well, so that we only have to allocate
> > > this buffer once. As such, we have to compute the packfile name last so
> > > that it doesn't get overwritten by the other names.
> >
> > I vaguely recall that the order of checks between .idx and .pack
> > were deliberately chosen, as we do not want to add and attempt to
> > use .pack file before its associated .idx file is ready since
> > without the latter the former is unusable at runtime.
>
> It is definitely intentional and important that we discover packs by
> their .idx file first. The writing side uses the creation of the .idx
> file (via atomic rename) as the final step in making a pack accessible.
> Before that, a reader could see a .pack without its various accessory
> files (and so not realize it has a .keep file, for example).

Yup. Or seeing a pack before its .bitmap is written, in which case any
clones or fetches unfortunate enough to take place after the pack was
read but before the .bitmap was written  would be significantly slower
than they otherwise would have been with bitmaps.

> > > All of this likely doesn't matter in the general case: Git shouldn't end
> > > up looking too many nonexistent packfiles anyway.
> >
> > In any case, we shouldn't attempt to use .pack when its associated
> > files are missing.  It is not about nonexistent but about incomplete
> > (including "in the process of being written").
>
> This shouldn't be a race for files being written. The .idx is written
> last, so the .pack should always be there. And the midx is written after
> that, so again, the pack should always be there. At least for newly
> written packs.
>
> We might see the opposite: packs going away because they were repacked
> (and perhaps even a new midx generated). In that case you might see
> a race like:
>
>   - process R opens the existing midx, which mentions pack P (but does
>     not yet open P)
>
>   - process W repacks, deleting P and generating a new midx with P'
>
>   - process R wants object X; the midx claims it is in pack P, so it
>     tries to open that but fails
>
> What should happen here is that R will fail to find the object at all,
> should hit reprepare_packed_git(), and will then pick up the new pack.
>
> I don't recall offhand whether reprepare_packed_git() will open the new
> midx. If it doesn't, we'd still find the object via the actual pack
> .idx, but we may end up consulting the now-stale midx over and over (so
> we'll get the right answer, but there may be some room for
> optimization).

This all sounds right to me. And we do end up loading a new MIDX via
reprepare_packed_git(): that function calls prepare_packed_git() (which
doesn't immediately return, since we just zero'd out the
r->objects->packed_git_initialized flag). We then enumerate the ODBs,
calling prepare_multi_pack_index_one() and prepare_packed_git_one() for
each.

From there we end up in load_multi_pack_index(), which gets a freshened
view of the MIDX file. If all goes well there, then
prepare_multi_pack_index_one() assigns it to
r->objects->multi_pack_index as expected.

> The code in prepare_midx_pack() converts a numeric pack id (referenced
> inside the midx) into a "struct packed_git" pointer, caching the results
> in multi_pack_index->packs. That field holds NULL for "we have not
> looked it up yet" or a valid pointer to a packed_git. It probably needs
> to hold a third state: "we tried and failed".

Yup, that seems like a good direction to me.

> [1] There probably are optimization opportunities in add_packed_git(). I
>     don't think re-ordering will help much in the common case that we
>     actually do have the pack. But really, most callers do not care
>     about these auxiliary files at all! We could simply skip them during
>     the initial setup and lazy-load them via accessor functions.

Nice idea. I initially though, "that's not right, we definitely need to
populate the flag bits, e.g., p->is_cruft". But that was before I read
"and lazy-load them via accessor functions". That turns those fields
into a tri-state, which is a nice way to avoid this altogether.

Thanks,
Taylor

  parent reply	other threads:[~2025-05-23  1:02 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-16  8:55 [PATCH] packfile: avoid access(3p) calls for missing packs Patrick Steinhardt
2025-05-16 18:34 ` Junio C Hamano
2025-05-19  6:52   ` Jeff King
2025-05-19 15:46     ` Junio C Hamano
2025-05-20  6:45     ` Patrick Steinhardt
2025-05-22  5:28       ` Jeff King
2025-05-23  1:02     ` Taylor Blau [this message]
2025-05-23  2:03       ` Jeff King
2025-05-20  9:53 ` [PATCH v2 0/2] " Patrick Steinhardt
2025-05-20  9:53   ` [PATCH v2 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
2025-05-23  1:03     ` Taylor Blau
2025-05-20  9:53   ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
2025-05-22  5:32     ` Jeff King
2025-05-22 15:47       ` Junio C Hamano
2025-05-22 16:59         ` Jeff King
2025-05-22 18:44           ` Junio C Hamano
2025-05-23  1:22           ` Taylor Blau
2025-05-23  2:08             ` Jeff King
2025-05-23 17:46               ` Taylor Blau
2025-05-25 18:41                 ` [PATCH 0/5] midx: improve prepare_midx_pack() ergonomics Taylor Blau
2025-05-25 18:41                   ` [PATCH 1/5] pack-bitmap.c: fix broken warning() when missing MIDX'd pack Taylor Blau
2025-05-26  7:23                     ` Patrick Steinhardt
2025-05-28  2:00                       ` Taylor Blau
2025-05-25 18:41                   ` [PATCH 2/5] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
2025-05-26  7:23                     ` Patrick Steinhardt
2025-05-28  2:08                       ` Taylor Blau
2025-05-25 18:41                   ` [PATCH 3/5] midx-write.c: simplify fill_packs_from_midx() Taylor Blau
2025-05-26  7:23                     ` Patrick Steinhardt
2025-05-28  2:15                       ` Taylor Blau
2025-05-25 18:42                   ` [PATCH 4/5] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
2025-05-25 18:42                   ` [PATCH 5/5] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
2025-05-26  7:24                     ` Patrick Steinhardt
2025-05-28  2:18                       ` Taylor Blau
2025-05-28 11:53                         ` Patrick Steinhardt
2025-05-28 22:58                   ` [PATCH v2 0/4] midx: improve prepare_midx_pack() ergonomics Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 1/4] midx: access pack names through `nth_midxed_pack_name()` Taylor Blau
2025-05-29 20:47                       ` Junio C Hamano
2025-06-03 22:22                         ` Taylor Blau
2025-05-29 20:51                       ` Junio C Hamano
2025-06-03 22:23                         ` Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 2/4] midx-write.c: guard against incremental MIDXs in want_included_pack() Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 3/4] midx-write.c: extract inner loop from fill_packs_from_midx() Taylor Blau
2025-05-28 22:59                     ` [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()` Taylor Blau
2025-05-30  6:50                       ` Jeff King
2025-06-03 22:27                         ` Taylor Blau
2025-08-28 23:25                           ` Junio C Hamano
2025-05-23  1:31       ` [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles Taylor Blau
2025-05-23  2:18         ` Jeff King
2025-05-21 13:24   ` [PATCH v2 0/2] packfile: avoid access(3p) calls for missing packs Junio C Hamano
2025-05-28 12:24 ` [PATCH v3 " Patrick Steinhardt
2025-05-28 12:24   ` [PATCH v3 1/2] packfile: explain ordering of how we look up auxiliary pack files Patrick Steinhardt
2025-05-28 12:24   ` [PATCH v3 2/2] midx: stop repeatedly looking up nonexistent packfiles Patrick Steinhardt
2025-05-30  6:27   ` [PATCH v3 0/2] packfile: avoid access(3p) calls for missing packs Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aC/JDrF5uJh9XxXj@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).