git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH 1/8] midx: start tracking per object database source
Date: Thu, 10 Jul 2025 19:10:02 -0400	[thread overview]
Message-ID: <aHBISjW/OErOy3Cp@nand.local> (raw)
In-Reply-To: <20250709-b4-pks-midx-via-odb-alternate-v1-1-f31150d21331@pks.im>

On Wed, Jul 09, 2025 at 09:54:49AM +0200, Patrick Steinhardt wrote:
> Multi-pack indices are tracked via `struct multi_pack_index`. This data
> structure is stored as a linked list inside `struct object_database`,
> which is the global database that spans across all of the object
> sources.
>
> This layout causes two problems:
>
>   - Multi-pack indices aren't global to an object database, but instead
>     there can be one multi-pack index per source. This creates a
>     mismatch between the on-disk layout and how things are organized in
>     the object database subsystems and makes some parts, like figuring
>     out whether a source has an MIDX, quite awkward.

This is a little confusing to me. What do we consider to be an "object
database", and what do we consider to be a "source"?

For instance, if I have a repository with one or more alternates, I
would imagine that each alternate is a separate "source", and the
sources together comprise the object database. Does that match the way
you're thinking about it?

If so, that makes sense. But if not (i.e., we consider all alternates to
belong to the same object database and share a single source), then I
don't know how this will interact with the existing MIDX alternates
mechanism.

Some clarification here would be helpful, I think.

> Refactor `prepare_multi_pack_index_one()` so that it works on a specific
> source, which allows us to easily store a pointer to the multi-pack
> index inside of it. For now, this pointer exists next to the existing
> linked list we have in the object database. Users will be adjusted in
> subsequent patches to instead use the per-source pointers.

OK.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  midx.c     | 19 +++++++++++--------
>  midx.h     |  7 ++++---
>  odb.h      | 11 +++++++++--
>  packfile.c |  4 +++-
>  4 files changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 3c5bc821730..a91231bfcdf 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -724,28 +724,29 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
>  	return 0;
>  }
>
> -int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local)
> +int prepare_multi_pack_index_one(struct odb_source *source, int local)

OK, so the combination of a repository and "object_dir" path becomes a
new "struct odb_source", which makes sense. But shouldn't "local" also
be a property of the odb_source?

I am not familiar enough with the odb_source work thus far to know
whether that is a good idea or not, but just something that stood out
to me while reading.

>  {
> +	struct repository *r = source->odb->repo;
>  	struct multi_pack_index *m;
> -	struct multi_pack_index *m_search;
>
>  	prepare_repo_settings(r);
>  	if (!r->settings.core_multi_pack_index)
>  		return 0;
>
> -	for (m_search = r->objects->multi_pack_index; m_search; m_search = m_search->next)
> -		if (!strcmp(object_dir, m_search->object_dir))
> -			return 1;
> -
> -	m = load_multi_pack_index(r, object_dir, local);
> +	if (source->multi_pack_index)
> +		return 1;

This makes sense. The old code was walking along the linked list of
MIDXs across alternates to see if we have already loaded one for the
given object_dir. But since there is a one-to-one relationship between
odb_source and object_dir, it suffices to see whether or not we have
loaded the MIDX for a given source at all.

> +	m = load_multi_pack_index(r, source->path, local);
>  	if (m) {
>  		struct multi_pack_index *mp = r->objects->multi_pack_index;
>  		if (mp) {
>  			m->next = mp->next;
>  			mp->next = m;
> -		} else
> +		} else {
>  			r->objects->multi_pack_index = m;
> +		}

I am glad that we are cleaning these up (since our style guide says that
if one or more if/else branch(es) has braces, they all should), but it
does make reading this patch a little more difficult. Not a big deal,
but I would have rather seen this as a separate cleanup patch.

> +		source->multi_pack_index = m;

As an aside, I see that we're calling this new field "multi_pack_index".
I wonder if it would be better to call it "midx", since typing out
"multi_pack_index" is a little verbose, and "midx" is a common
abbreviation that I don't think we would cause any confusion with.

> @@ -837,6 +838,8 @@ void clear_midx_file(struct repository *r)
>  	if (r->objects && r->objects->multi_pack_index) {
>  		close_midx(r->objects->multi_pack_index);
>  		r->objects->multi_pack_index = NULL;
> +		for (struct odb_source *source = r->objects->sources; source; source = source->next)
> +			source->multi_pack_index = NULL;

Makes sense.

> diff --git a/midx.h b/midx.h
> index 9d1374cbd58..b1626a9a7c4 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -3,11 +3,12 @@
>
>  #include "string-list.h"
>
> +struct bitmapped_pack;
> +struct git_hash_algo;
>  struct object_id;
> +struct odb_source;
>  struct pack_entry;
>  struct repository;
> -struct bitmapped_pack;
> -struct git_hash_algo;

(I'm nit-picking, but a similar note here that the unrelated changes
make it harder to see what is actually going on in this hunk, which is
adding a declaration for "struct odb_source".)

> diff --git a/odb.h b/odb.h
> index e922f256802..8e79c7be520 100644
> --- a/odb.h
> +++ b/odb.h
> @@ -9,10 +9,11 @@
>  #include "string-list.h"
>  #include "thread-utils.h"
>
> +struct multi_pack_index;
>  struct oidmap;
>  struct oidtree;
> -struct strbuf;
>  struct repository;
> +struct strbuf;

(Same here.)

Thanks,
Taylor

  parent reply	other threads:[~2025-07-10 23:10 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-09  7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 1/8] midx: start tracking per object database source Patrick Steinhardt
2025-07-10 20:56   ` Justin Tobler
2025-07-10 23:10   ` Taylor Blau [this message]
2025-07-10 23:19     ` Taylor Blau
2025-07-15  8:26       ` Patrick Steinhardt
2025-07-15  8:26     ` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
2025-07-10 21:07   ` Justin Tobler
2025-07-10 23:14   ` Taylor Blau
2025-07-15  8:26     ` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 3/8] midx: stop using linked list when closing MIDX Patrick Steinhardt
2025-07-10 21:31   ` Justin Tobler
2025-07-15  8:26     ` Patrick Steinhardt
2025-07-10 23:22   ` Taylor Blau
2025-07-15  8:26     ` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 4/8] midx: track whether we have loaded the MIDX Patrick Steinhardt
2025-07-10 22:16   ` Justin Tobler
2025-07-10 23:26     ` Taylor Blau
2025-07-15  8:27       ` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
2025-07-10 22:35   ` Justin Tobler
2025-07-10 23:56   ` Taylor Blau
2025-07-15  8:27     ` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 6/8] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 7/8] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
2025-07-09  7:54 ` [PATCH 8/8] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
2025-07-10 22:48   ` Justin Tobler
2025-07-09 22:04 ` [PATCH 0/8] odb: track multi-pack-indices via their object sources Junio C Hamano
2025-07-10 23:58   ` Taylor Blau
2025-07-15  8:27     ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 1/7] midx: start tracking per object database source Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 2/7] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 3/7] midx: stop using linked list when closing MIDX Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 4/7] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 5/7] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 6/7] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
2025-07-15 11:29   ` [PATCH v2 7/7] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
2025-07-15 21:59   ` [PATCH v2 0/7] odb: track multi-pack-indices via their object sources Justin Tobler
2025-07-23 21:22   ` Junio C Hamano
2025-07-24  8:00     ` Patrick Steinhardt
2025-08-12 21:58       ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aHBISjW/OErOy3Cp@nand.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).