git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: [PATCH 2/8] packfile: move the MRU list into the packfile store
Date: Wed, 29 Oct 2025 18:39:32 -0400	[thread overview]
Message-ID: <aQKXpM8g3Oy3DVAa@nand.local> (raw)
In-Reply-To: <20251028-pks-packfiles-store-drop-list-v1-2-1a3b82030a7a@pks.im>

On Tue, Oct 28, 2025 at 12:08:32PM +0100, Patrick Steinhardt wrote:
> Packfiles have two lists associated to them:
>
>   - A list that keeps track of packfiles in the order that they were
>     added to a packfile store.
>
>   - A list that keeps track of packfiles in most-recently-used order so
>     that packfiles that are more likely to contain a specific object are
>     ordered towards the front.
>
> Both of these lists are hosted by `struct packed_git` itself, So to
> identify all packfiles in a repository you simply need to grab the first
> packfile and then iterate the `->next` pointers or the MRU list. This
> pattern has the problem that all packfiles are part of the same list,
> regardless of whether or not they belong to the same object source.
>
> With the upcoming pluggable object database effort this needs to change:
> packfiles should be contained by a single object source, and reading an
> object from any such packfile should use that source to look up the
> object. Consequently, we need to break up the global lists of packfiles

s/lists/list/

> into per-object-source lists.

How does this work for alternates? My understanding is that each
alternate now has its own object source. So to perform an object lookup
in a repository with alternate(s), I am assuming that at some layer we
need to iterate over those sources to then enumerate the packs in that
source looking for some object.

I would have imagined that packfile.c::find_pack_entry() would have to
be adjusted in a similar way as above, but I couldn't find the changes
in this series, so I feel like I must be missing something in my
understanding of how this all works together :-).

Are packs from different sources still connected somehow such that
iterating over the list of packs from one source will enumerate the list
of packs from all sources?

> A first step towards this goal is to move those lists ouf of `struct

s/ouf/out/

> packed_git` and into the packfile store. While the packfile store is
> currently sitting on the `struct object_database` level, the intent is
> to push it down one level into the `struct odb_source` in a subsequent
> patch series.

Before sending, I was confused by "Consequently, we need to break up the
global lists of packfiles [...]", since it wasn't clear whether or not
this series realizes that goal, or pushes us in the direction towards
it.

But this clarifies things, and I think is the reason that we do not see
more invasive changes like needing to enumerate the MRU cache of each
store in order to find an object like I mentioned above.

> Introduce a new `struct packfile_list` that is used to manage lists of
> packfiles and use it to store the list of most-recently-used packfiles
> in `struct packfile_store`. For now, the new list type is only used in a
> single spot, but we'll expand its usage in subsequent patches.

I am a little curious why we need a new list type and implementation
here. Is it to avoid exposing the list as part of struct packed_git like
we are forced to do with list_head?

I could imagine that you might want to avoid exposing the "struct
list_head mru" part of packed_git to avoid the suggestion that all
packfiles (including those from different sources) are part of the same
list. But if that's the case, I wonder if we couldn't have kept the same
mru list and clarified via comment that it is per-store, not global.

I suppose that is a bit of a foot-gun, and perhaps that is what you are
trying to do here, but after reading the patch message a few times I
wasn't clear on what the motivation for the new type was.

Thanks,
Taylor

  reply	other threads:[~2025-10-29 22:39 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-28 11:08 [PATCH 0/8] packfiles: track pack lists via the packfile store Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 1/8] packfile: use a `strmap` to store packs by name Patrick Steinhardt
2025-10-29 22:16   ` Taylor Blau
2025-10-28 11:08 ` [PATCH 2/8] packfile: move the MRU list into the packfile store Patrick Steinhardt
2025-10-29 22:39   ` Taylor Blau [this message]
2025-10-30  8:59     ` Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 3/8] http: refactor subsystem to use `packfile_list`s Patrick Steinhardt
2025-10-29 14:24   ` Toon Claes
2025-10-30  8:58     ` Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 4/8] packfile: fix approximation of object counts Patrick Steinhardt
2025-10-29 22:49   ` Taylor Blau
2025-10-30  8:58     ` Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 5/8] builtin/pack-objects: simplify logic to find kept or nonlocal objects Patrick Steinhardt
2025-10-29 14:55   ` Toon Claes
2025-10-29 23:15     ` Taylor Blau
2025-10-30  8:59     ` Patrick Steinhardt
2025-10-29 23:13   ` Taylor Blau
2025-10-30  8:58     ` Patrick Steinhardt
2025-10-30  9:31   ` Toon Claes
2025-10-30  9:52     ` Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 6/8] packfile: move list of packs into the packfile store Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 7/8] packfile: always add packfiles to MRU when adding a pack Patrick Steinhardt
2025-10-29 23:25   ` Taylor Blau
2025-10-30  8:58     ` Patrick Steinhardt
2025-10-28 11:08 ` [PATCH 8/8] packfile: track packs via the MRU list exclusively Patrick Steinhardt
2025-10-30 10:38 ` [PATCH v2 0/8] packfiles: track pack lists via the packfile store Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 1/8] packfile: use a `strmap` to store packs by name Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 2/8] packfile: move the MRU list into the packfile store Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 3/8] http: refactor subsystem to use `packfile_list`s Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 4/8] packfile: fix approximation of object counts Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 5/8] builtin/pack-objects: simplify logic to find kept or nonlocal objects Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 6/8] packfile: move list of packs into the packfile store Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 7/8] packfile: always add packfiles to MRU when adding a pack Patrick Steinhardt
2025-10-30 10:38   ` [PATCH v2 8/8] packfile: track packs via the MRU list exclusively Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQKXpM8g3Oy3DVAa@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).