From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources
Date: Thu, 10 Jul 2025 19:56:28 -0400 [thread overview]
Message-ID: <aHBTLFvL+eQcw6J0@nand.local> (raw)
In-Reply-To: <20250709-b4-pks-midx-via-odb-alternate-v1-5-f31150d21331@pks.im>
On Wed, Jul 09, 2025 at 09:54:53AM +0200, Patrick Steinhardt wrote:
> The function `get_multi_pack_index()` loads multi-pack indices via
> `prepare_packed_git()` and then returns the linked list of multi-pack
> indices that is stored in `struct object_database`. That list is in the
> process of being removed though in favor of storing the MIDX as part of
> the object database source it belongs to.
>
> Refactor `get_multi_pack_index()` so that it returns the multi-pack
> index for a single object source. Callers are now expected to call this
> function for each source they are interested in. This requires them to
> iterate through alternates, so we have to prepare alternate object
> sources before doing so.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> builtin/pack-objects.c | 9 ++++++---
> builtin/repack.c | 4 ++--
> midx-write.c | 22 ++--------------------
> object-name.c | 21 ++++++++++++++-------
> pack-bitmap.c | 20 ++++++++++++++------
> packfile.c | 30 +++++++++++-------------------
> packfile.h | 3 +--
> 7 files changed, 50 insertions(+), 59 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 5781dec9808..f889e69e07d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1701,7 +1701,6 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
> {
> int want;
> struct list_head *pos;
> - struct multi_pack_index *m;
>
> if (!exclude && local && has_loose_object_nonlocal(oid))
> return 0;
> @@ -1721,9 +1720,13 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
> *found_offset = 0;
> }
>
> - for (m = get_multi_pack_index(the_repository); m; m = m->next) {
> + odb_prepare_alternates(the_repository->objects);
> +
> + for (struct odb_source *source = the_repository->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
All makes sense up to this point. I would suggest adding the following
to the top of this function:
struct repository *r = the_repository;
struct odb_source *source;
to shorten up these lines a bit, but I'm nit-picking so if you feel
strongly that the existing patch is clearer, I won't object.
> struct pack_entry e;
> - if (fill_midx_entry(the_repository, oid, &e, m)) {
> +
> + if (m && fill_midx_entry(the_repository, oid, &e, m)) {
Good.
> want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset, found_mtime);
> if (want != -1)
> return want;
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 9bbf032b6dd..5956df5d927 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -218,9 +218,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
> static void remove_redundant_pack(const char *dir_name, const char *base_name)
> {
> struct strbuf buf = STRBUF_INIT;
> - struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
> + struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources);
Is the first source always guaranteed to be the local one? I assume that the
answer here is "yes", and there are certainly other places in the code
where we make a similar assumption. But just wanted to make sure as
this popped into my head while reading.
> strbuf_addf(&buf, "%s.pack", base_name);
> - if (m && midx_contains_pack(m, buf.buf))
> + if (m && m->local && midx_contains_pack(m, buf.buf))
...hmm, maybe not?
> clear_midx_file(the_repository);
> strbuf_insertf(&buf, 0, "%s/", dir_name);
> unlink_pack_path(buf.buf, 1);
> diff --git a/midx-write.c b/midx-write.c
> index f2cfb85476e..c1ae62d3549 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
> static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
> const char *object_dir)
> {
> - struct multi_pack_index *result = NULL;
> - struct multi_pack_index *cur;
> - char *obj_dir_real = real_pathdup(object_dir, 1);
> - struct strbuf cur_path_real = STRBUF_INIT;
> -
> - /* Ensure the given object_dir is local, or a known alternate. */
> - odb_find_source(r->objects, obj_dir_real);
> -
> - for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
> - strbuf_realpath(&cur_path_real, cur->object_dir, 1);
> - if (!strcmp(obj_dir_real, cur_path_real.buf)) {
> - result = cur;
> - goto cleanup;
> - }
> - }
> -
> -cleanup:
> - free(obj_dir_real);
> - strbuf_release(&cur_path_real);
> - return result;
> + struct odb_source *source = odb_find_source(r->objects, object_dir);
> + return get_multi_pack_index(source);
When I first read this I wondered what would happen if we passed in an
unknown object_dir such that odb_find_source() returned NULL. But that
function will never return NULL, and instead will die() if given a bogus
object_dir.
So this is fine, though I would have imagined that we'd return NULL
within odb_find_source() and let the caller die() (or not).
> }
>
> static int fill_packs_from_midx(struct write_midx_context *ctx,
> diff --git a/object-name.c b/object-name.c
> index ddafe7f9b13..1e7fdcb90a8 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -198,16 +198,19 @@ static void unique_in_pack(struct packed_git *p,
>
> static void find_short_packed_object(struct disambiguate_state *ds)
> {
> - struct multi_pack_index *m;
> struct packed_git *p;
>
> /* Skip, unless oids from the storage hash algorithm are wanted */
> if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
> return;
>
> - for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
> - m = m->next)
> - unique_in_midx(m, ds);
> + odb_prepare_alternates(ds->repo->objects);
> + for (struct odb_source *source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + if (m)
> + unique_in_midx(m, ds);
> + }
> +
Makes sense, though now having seen this pattern a few times, I am
wondering if it would be worth it to add a utility function that takes a
callback and iterates over the various MIDXs. But perhaps that is taking
DRY-ing things up a little too far ;-).
For what it's worth, I do think that what you wrote here makes more
logical sense: MIDXs are tied to individual alternate object DBs, which
here means that there is one MIDX per "struct odb_source". It just is a
little more verbose to type out.
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 0a4af199c05..7b51d381837 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -692,14 +692,16 @@ static int open_midx_bitmap(struct repository *r,
> struct bitmap_index *bitmap_git)
> {
> int ret = -1;
> - struct multi_pack_index *midx;
>
> assert(!bitmap_git->map);
>
> - for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
> - if (!open_midx_bitmap_1(bitmap_git, midx))
> + odb_prepare_alternates(r->objects);
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *midx = get_multi_pack_index(source);
> + if (midx && !open_midx_bitmap_1(bitmap_git, midx))
> ret = 0;
> }
Makes sense.
> +
Stray diff?
> @@ -3307,9 +3309,15 @@ int verify_bitmap_files(struct repository *r)
> {
> int res = 0;
>
> - for (struct multi_pack_index *m = get_multi_pack_index(r);
> - m; m = m->next) {
> - char *midx_bitmap_name = midx_bitmap_filename(m);
> + odb_prepare_alternates(r->objects);
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + char *midx_bitmap_name;
> +
> + if (!m)
> + continue;
> +
> + midx_bitmap_name = midx_bitmap_filename(m);
This one looks good as well.
> diff --git a/packfile.c b/packfile.c
> index e5d9d7ac8bc..e1ced050451 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -963,14 +963,17 @@ static void prepare_packed_git(struct repository *r);
> unsigned long repo_approximate_object_count(struct repository *r)
> {
> if (!r->objects->approximate_object_count_valid) {
> - unsigned long count;
> - struct multi_pack_index *m;
> + unsigned long count = 0;
> struct packed_git *p;
>
> prepare_packed_git(r);
I was wondering where the odb_prepare_alternates() call was in this one,
but it is a byproduct of calling prepare_packed_git(). So this spot
looks OK as well.
> - count = 0;
> - for (m = get_multi_pack_index(r); m; m = m->next)
> - count += m->num_objects;
> +
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + if (m)
> + count += m->num_objects;
Unrelated to your patch, I wonder if it makes sense to check for
overflow here. I think so, though presumably if we are overflowing
"unsigned long" then likely we have much bigger problems to worry about
;-).
The rest looks good to me.
Thanks,
Taylor
next prev parent reply other threads:[~2025-07-10 23:56 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 1/8] midx: start tracking per object database source Patrick Steinhardt
2025-07-10 20:56 ` Justin Tobler
2025-07-10 23:10 ` Taylor Blau
2025-07-10 23:19 ` Taylor Blau
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
2025-07-10 21:07 ` Justin Tobler
2025-07-10 23:14 ` Taylor Blau
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 3/8] midx: stop using linked list when closing MIDX Patrick Steinhardt
2025-07-10 21:31 ` Justin Tobler
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-10 23:22 ` Taylor Blau
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 4/8] midx: track whether we have loaded the MIDX Patrick Steinhardt
2025-07-10 22:16 ` Justin Tobler
2025-07-10 23:26 ` Taylor Blau
2025-07-15 8:27 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
2025-07-10 22:35 ` Justin Tobler
2025-07-10 23:56 ` Taylor Blau [this message]
2025-07-15 8:27 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 6/8] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 7/8] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 8/8] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
2025-07-10 22:48 ` Justin Tobler
2025-07-09 22:04 ` [PATCH 0/8] odb: track multi-pack-indices via their object sources Junio C Hamano
2025-07-10 23:58 ` Taylor Blau
2025-07-15 8:27 ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 1/7] midx: start tracking per object database source Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 2/7] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 3/7] midx: stop using linked list when closing MIDX Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 4/7] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 5/7] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 6/7] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 7/7] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
2025-07-15 21:59 ` [PATCH v2 0/7] odb: track multi-pack-indices via their object sources Justin Tobler
2025-07-23 21:22 ` Junio C Hamano
2025-07-24 8:00 ` Patrick Steinhardt
2025-08-12 21:58 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aHBTLFvL+eQcw6J0@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.