git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH 2/8] packfile: move the MRU list into the packfile store
Date: Thu, 30 Oct 2025 09:59:20 +0100	[thread overview]
Message-ID: <aQMo6DxZqYc6gEjQ@pks.im> (raw)
In-Reply-To: <aQKXpM8g3Oy3DVAa@nand.local>

On Wed, Oct 29, 2025 at 06:39:32PM -0400, Taylor Blau wrote:
> On Tue, Oct 28, 2025 at 12:08:32PM +0100, Patrick Steinhardt wrote:
> > Packfiles have two lists associated to them:
> >
> >   - A list that keeps track of packfiles in the order that they were
> >     added to a packfile store.
> >
> >   - A list that keeps track of packfiles in most-recently-used order so
> >     that packfiles that are more likely to contain a specific object are
> >     ordered towards the front.
> >
> > Both of these lists are hosted by `struct packed_git` itself, So to
> > identify all packfiles in a repository you simply need to grab the first
> > packfile and then iterate the `->next` pointers or the MRU list. This
> > pattern has the problem that all packfiles are part of the same list,
> > regardless of whether or not they belong to the same object source.
> >
> > With the upcoming pluggable object database effort this needs to change:
> > packfiles should be contained by a single object source, and reading an
> > object from any such packfile should use that source to look up the
> > object. Consequently, we need to break up the global lists of packfiles
> 
> s/lists/list/

It's actually two: the MRU-ordered and the mtime-ordered one.

> > into per-object-source lists.
> 
> How does this work for alternates? My understanding is that each
> alternate now has its own object source. So to perform an object lookup
> in a repository with alternate(s), I am assuming that at some layer we
> need to iterate over those sources to then enumerate the packs in that
> source looking for some object.
> 
> I would have imagined that packfile.c::find_pack_entry() would have to
> be adjusted in a similar way as above, but I couldn't find the changes
> in this series, so I feel like I must be missing something in my
> understanding of how this all works together :-).
> 
> Are packs from different sources still connected somehow such that
> iterating over the list of packs from one source will enumerate the list
> of packs from all sources?

This whole mechanism isn't yet part of this series :) So right now we
don't really change anything yet, and the list of packfiles is still a
global list across all alternates. This series is thus basically only
paving the path towards having per-alternate packfile stores.

Moving the packfile store into the sources is going to be part of the
next series.

> > A first step towards this goal is to move those lists ouf of `struct
> 
> s/ouf/out/
> 
> > packed_git` and into the packfile store. While the packfile store is
> > currently sitting on the `struct object_database` level, the intent is
> > to push it down one level into the `struct odb_source` in a subsequent
> > patch series.
> 
> Before sending, I was confused by "Consequently, we need to break up the
> global lists of packfiles [...]", since it wasn't clear whether or not
> this series realizes that goal, or pushes us in the direction towards
> it.
> 
> But this clarifies things, and I think is the reason that we do not see
> more invasive changes like needing to enumerate the MRU cache of each
> store in order to find an object like I mentioned above.

Yup, exactly.

> > Introduce a new `struct packfile_list` that is used to manage lists of
> > packfiles and use it to store the list of most-recently-used packfiles
> > in `struct packfile_store`. For now, the new list type is only used in a
> > single spot, but we'll expand its usage in subsequent patches.
> 
> I am a little curious why we need a new list type and implementation
> here. Is it to avoid exposing the list as part of struct packed_git like
> we are forced to do with list_head?
> 
> I could imagine that you might want to avoid exposing the "struct
> list_head mru" part of packed_git to avoid the suggestion that all
> packfiles (including those from different sources) are part of the same
> list. But if that's the case, I wonder if we couldn't have kept the same
> mru list and clarified via comment that it is per-store, not global.
> 
> I suppose that is a bit of a foot-gun, and perhaps that is what you are
> trying to do here, but after reading the patch message a few times I
> wasn't clear on what the motivation for the new type was.

Yes, this is one of the reasons, it very much feels like a foot-gun to
me. I found it significantly harder to work with the list embedded into
the packfiles themselves, and it made it significantly harder to check
whether the split really is done correctly. So by moving the list into
the packfile store it now becomes obvious in our code's layout, and it
becomes much easier to build an implicitly-correct mental model.

The second reason though is that packfiles aren't only used in the
context of our ODB, but also by other layers like our transport. I want
to have a clean split so that a packfile is a completely separate entity
that can exist without an object store. But if the list pointers used by
the store are embedded into the packfile itself, then that boundary gets
a lot more fuzzy.

So it's basically separation of concerns: `struct packed_git` should
only ever be concerned about a singular packfile. And the packfile store
is then concerned with managing a set of packfiles.

Thanks!

Patrick

  reply	other threads:[~2025-10-30  8:59 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-28 11:08 [PATCH 0/8] packfiles: track pack lists via the packfile store Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 1/8] packfile: use a `strmap` to store packs by name Patrick Steinhardt
2025-10-29 22:16   ` Taylor Blau
2025-10-28 11:08 ` [PATCH 2/8] packfile: move the MRU list into the packfile store Patrick Steinhardt
2025-10-29 22:39   ` Taylor Blau
2025-10-30  8:59     ` Patrick Steinhardt [this message]
2025-10-28 11:08 ` [PATCH 3/8] http: refactor subsystem to use `packfile_list`s Patrick Steinhardt
2025-10-29 14:24   ` Toon Claes
2025-10-30  8:58     ` Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 4/8] packfile: fix approximation of object counts Patrick Steinhardt
2025-10-29 22:49   ` Taylor Blau
2025-10-30  8:58     ` Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 5/8] builtin/pack-objects: simplify logic to find kept or nonlocal objects Patrick Steinhardt
2025-10-29 14:55   ` Toon Claes
2025-10-29 23:15     ` Taylor Blau
2025-10-30  8:59     ` Patrick Steinhardt
2025-10-29 23:13   ` Taylor Blau
2025-10-30  8:58     ` Patrick Steinhardt
2025-10-30  9:31   ` Toon Claes
2025-10-30  9:52     ` Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 6/8] packfile: move list of packs into the packfile store Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 7/8] packfile: always add packfiles to MRU when adding a pack Patrick Steinhardt
2025-10-29 23:25   ` Taylor Blau
2025-10-30  8:58     ` Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 8/8] packfile: track packs via the MRU list exclusively Patrick Steinhardt
2025-10-30 10:38 ` [PATCH v2 0/8] packfiles: track pack lists via the packfile store Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 1/8] packfile: use a `strmap` to store packs by name Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 2/8] packfile: move the MRU list into the packfile store Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 3/8] http: refactor subsystem to use `packfile_list`s Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 4/8] packfile: fix approximation of object counts Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 5/8] builtin/pack-objects: simplify logic to find kept or nonlocal objects Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 6/8] packfile: move list of packs into the packfile store Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 7/8] packfile: always add packfiles to MRU when adding a pack Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 8/8] packfile: track packs via the MRU list exclusively Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQMo6DxZqYc6gEjQ@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).