From: Patrick Steinhardt <ps@pks.im>
To: Justin Tobler <jltobler@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 05/10] packfile: move packfile store into object source
Date: Thu, 18 Dec 2025 07:50:48 +0100 [thread overview]
Message-ID: <aUOkSDz_UTSM92YT@pks.im> (raw)
In-Reply-To: <coihoxu2loroo4xrlog226c6ib7hy6wfsvlbfvanhp3xksezie@buisls2ii3d5>
On Wed, Dec 17, 2025 at 06:52:53PM -0600, Justin Tobler wrote:
> On 25/12/15 08:36AM, Patrick Steinhardt wrote:
> > diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> > index 7849005ccb..b8a7757cfd 100644
> > --- a/builtin/fast-import.c
> > +++ b/builtin/fast-import.c
> > @@ -900,7 +900,7 @@ static void end_packfile(void)
> > idx_name = keep_pack(create_index());
> >
> > /* Register the packfile with core git's machinery. */
> > - new_p = packfile_store_load_pack(pack_data->repo->objects->packfiles,
> > + new_p = packfile_store_load_pack(pack_data->repo->objects->sources->packfiles,
> > idx_name, 1);
>
> Naive question: it looks likes we are only using the primary source's packfile
> store here. Is that fine?
We write the new packfiles to the primary source, yup. That's in line
how we write objects in general: they always end up in the primary
source. In contrast to that, reading data will involve all sources.
> > diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> > index a7e901e49c..b67fb0256c 100644
> > --- a/builtin/index-pack.c
> > +++ b/builtin/index-pack.c
> > @@ -1638,7 +1638,7 @@ static void final(const char *final_pack_name, const char *curr_pack_name,
> > hash, "idx", 1);
> >
> > if (do_fsck_object && startup_info->have_repository)
> > - packfile_store_load_pack(the_repository->objects->packfiles,
> > + packfile_store_load_pack(the_repository->objects->sources->packfiles,
>
> Does packfile_store_load_pack() also load stores from all sources?
No, it doesn't, and in this case here we also don't want to. The sources
have already been loaded beforehand if we have a repo, but here we want
to activate the new packfile that we have just indexed. This is done so
that we can then perform fsck checks for the objects contained in that
specific packfile. And hence, we really only want to activate that
single packfile with our primary packfile store.
> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index e86b8f387a..7fd90a9996 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -1529,49 +1529,53 @@ static int want_cruft_object_mtime(struct repository *r,
> > const struct object_id *oid,
> > unsigned flags, uint32_t mtime)
> > {
> > - struct packed_git **cache = packfile_store_get_kept_pack_cache(r->objects->packfiles, flags);
> > + struct odb_source *source;
> >
> > - for (; *cache; cache++) {
> > - struct packed_git *p = *cache;
> > - off_t ofs;
> > - uint32_t candidate_mtime;
> > + for (source = r->objects->sources; source; source = source->next) {
> > + struct packed_git **cache = packfile_store_get_kept_pack_cache(source->packfiles, flags);
> >
> > - ofs = find_pack_entry_one(oid, p);
> > - if (!ofs)
> > - continue;
> > + for (; *cache; cache++) {
> > + struct packed_git *p = *cache;
> > + off_t ofs;
> > + uint32_t candidate_mtime;
> >
> > - /*
> > - * We have a copy of the object 'oid' in a non-cruft
> > - * pack. We can avoid packing an additional copy
> > - * regardless of what the existing copy's mtime is since
> > - * it is outside of a cruft pack.
> > - */
> > - if (!p->is_cruft)
> > - return 0;
> > -
> > - /*
> > - * If we have a copy of the object 'oid' in a cruft
> > - * pack, then either read the cruft pack's mtime for
> > - * that object, or, if that can't be loaded, assume the
> > - * pack's mtime itself.
> > - */
> > - if (!load_pack_mtimes(p)) {
> > - uint32_t pos;
> > - if (offset_to_pack_pos(p, ofs, &pos) < 0)
> > + ofs = find_pack_entry_one(oid, p);
> > + if (!ofs)
> > continue;
> > - candidate_mtime = nth_packed_mtime(p, pos);
> > - } else {
> > - candidate_mtime = p->mtime;
> > - }
> >
> > - /*
> > - * We have a surviving copy of the object in a cruft
> > - * pack whose mtime is greater than or equal to the one
> > - * we are considering. We can thus avoid packing an
> > - * additional copy of that object.
> > - */
> > - if (mtime <= candidate_mtime)
> > - return 0;
> > + /*
> > + * We have a copy of the object 'oid' in a non-cruft
> > + * pack. We can avoid packing an additional copy
> > + * regardless of what the existing copy's mtime is since
> > + * it is outside of a cruft pack.
> > + */
> > + if (!p->is_cruft)
> > + return 0;
> > +
> > + /*
> > + * If we have a copy of the object 'oid' in a cruft
> > + * pack, then either read the cruft pack's mtime for
> > + * that object, or, if that can't be loaded, assume the
> > + * pack's mtime itself.
> > + */
> > + if (!load_pack_mtimes(p)) {
> > + uint32_t pos;
> > + if (offset_to_pack_pos(p, ofs, &pos) < 0)
> > + continue;
> > + candidate_mtime = nth_packed_mtime(p, pos);
> > + } else {
> > + candidate_mtime = p->mtime;
> > + }
> > +
> > + /*
> > + * We have a surviving copy of the object in a cruft
> > + * pack whose mtime is greater than or equal to the one
> > + * we are considering. We can thus avoid packing an
> > + * additional copy of that object.
> > + */
> > + if (mtime <= candidate_mtime)
> > + return 0;
> > + }
>
> Ok, this all looks the same, but repeated for each source.
>
> Naive question: If a the same OID were to exist in multiple ODB sources,
> could this effect the behavior now that there are separate packfile
> stores?
At least it would be the same behaviour as beforehand. We used to
iterate through all packfiles before, and we still do that now until we
find either:
- A non-cruft pack that contains the object.
- A cruft pack that contains it with a more recent mtime.
The idea here is that we only want to pack the object into a _new_ cruft
pack if it would have a more recent mtime in such a newer pack. If that
wouldn't be the case there is no need to write the object into the cruft
pack that we're just about to write -- it would be immediately stale
anyway.
So it's basically the whole idea of this loop that the object may exist
in multiple packfiles, and we're trying to figure out what our actions
should be based on that.
> > diff --git a/http.c b/http.c
> > index 41f850db16..7815f144de 100644
> > --- a/http.c
> > +++ b/http.c
> > @@ -2544,7 +2544,7 @@ void http_install_packfile(struct packed_git *p,
> > struct packfile_list *list_to_remove_from)
> > {
> > packfile_list_remove(list_to_remove_from, p);
> > - packfile_store_add_pack(the_repository->objects->packfiles, p);
> > + packfile_store_add_pack(the_repository->objects->sources->packfiles, p);
>
> Here we are always adding the packfile to the primary packfile store. A
> thus relies of the primary source having a packfile store. In order to
> move to pluggable ODBs I suppose this is one of spots that will have to
> be resolved later.
Indeed, we will eventually want to grow a new interfaces that allows us
to write a whole pack into the ODB. From thereon, the backend can then
do whatever it wants with the packfile.
Patrick
next prev parent reply other threads:[~2025-12-18 6:50 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-15 7:36 [PATCH 00/10] Start tracking packfiles per object database source Patrick Steinhardt
2025-12-15 7:36 ` [PATCH 01/10] packfile: create store via its owning source Patrick Steinhardt
2025-12-15 21:30 ` Justin Tobler
2025-12-16 8:36 ` Patrick Steinhardt
2025-12-15 7:36 ` [PATCH 02/10] packfile: pass source to `prepare_pack()` Patrick Steinhardt
2025-12-15 21:38 ` Justin Tobler
2025-12-15 7:36 ` [PATCH 03/10] packfile: refactor kept-pack cache to work with packfile stores Patrick Steinhardt
2025-12-15 21:56 ` Justin Tobler
2025-12-16 9:09 ` Patrick Steinhardt
2025-12-15 7:36 ` [PATCH 04/10] packfile: refactor misleading code when unusing pack windows Patrick Steinhardt
2025-12-15 7:36 ` [PATCH 05/10] packfile: move packfile store into object source Patrick Steinhardt
2025-12-18 0:52 ` Justin Tobler
2025-12-18 6:50 ` Patrick Steinhardt [this message]
2025-12-15 7:36 ` [PATCH 06/10] packfile: only prepare owning store in `packfile_store_get_packs()` Patrick Steinhardt
2025-12-18 0:58 ` Justin Tobler
2025-12-15 7:36 ` [PATCH 07/10] packfile: only prepare owning store in `packfile_store_prepare()` Patrick Steinhardt
2025-12-15 7:36 ` [PATCH 08/10] packfile: inline `find_kept_pack_entry()` Patrick Steinhardt
2025-12-18 1:06 ` Justin Tobler
2025-12-18 6:48 ` Patrick Steinhardt
2025-12-15 7:36 ` [PATCH 09/10] packfile: refactor `find_pack_entry()` to work on the packfile store Patrick Steinhardt
2025-12-15 7:36 ` [PATCH 10/10] packfile: move MIDX into " Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 00/10] Start tracking packfiles per object database source Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 01/10] packfile: create store via its owning source Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 02/10] packfile: pass source to `prepare_pack()` Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 03/10] packfile: refactor kept-pack cache to work with packfile stores Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 04/10] packfile: refactor misleading code when unusing pack windows Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 05/10] packfile: move packfile store into object source Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 06/10] packfile: only prepare owning store in `packfile_store_get_packs()` Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 07/10] packfile: only prepare owning store in `packfile_store_prepare()` Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 08/10] packfile: inline `find_kept_pack_entry()` Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 09/10] packfile: refactor `find_pack_entry()` to work on the packfile store Patrick Steinhardt
2025-12-18 6:55 ` [PATCH v2 10/10] packfile: move MIDX into " Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aUOkSDz_UTSM92YT@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=jltobler@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).