* [PATCH 1/8] midx: start tracking per object database source
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
@ 2025-07-09 7:54 ` Patrick Steinhardt
2025-07-10 20:56 ` Justin Tobler
2025-07-10 23:10 ` Taylor Blau
2025-07-09 7:54 ` [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
` (8 subsequent siblings)
9 siblings, 2 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-09 7:54 UTC (permalink / raw)
To: git
Multi-pack indices are tracked via `struct multi_pack_index`. This data
structure is stored as a linked list inside `struct object_database`,
which is the global database that spans across all of the object
sources.
This layout causes two problems:
- Multi-pack indices aren't global to an object database, but instead
there can be one multi-pack index per source. This creates a
mismatch between the on-disk layout and how things are organized in
the object database subsystems and makes some parts, like figuring
out whether a source has an MIDX, quite awkward.
- Multi-pack indices are an implementation detail of how efficient
access for packfiles work. As such, they are neither relevant in the
context of loose objects, nor in a potential future where we have
pluggable backends.
Refactor `prepare_multi_pack_index_one()` so that it works on a specific
source, which allows us to easily store a pointer to the multi-pack
index inside of it. For now, this pointer exists next to the existing
linked list we have in the object database. Users will be adjusted in
subsequent patches to instead use the per-source pointers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
midx.c | 19 +++++++++++--------
midx.h | 7 ++++---
odb.h | 11 +++++++++--
packfile.c | 4 +++-
4 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/midx.c b/midx.c
index 3c5bc821730..a91231bfcdf 100644
--- a/midx.c
+++ b/midx.c
@@ -724,28 +724,29 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
return 0;
}
-int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local)
+int prepare_multi_pack_index_one(struct odb_source *source, int local)
{
+ struct repository *r = source->odb->repo;
struct multi_pack_index *m;
- struct multi_pack_index *m_search;
prepare_repo_settings(r);
if (!r->settings.core_multi_pack_index)
return 0;
- for (m_search = r->objects->multi_pack_index; m_search; m_search = m_search->next)
- if (!strcmp(object_dir, m_search->object_dir))
- return 1;
-
- m = load_multi_pack_index(r, object_dir, local);
+ if (source->multi_pack_index)
+ return 1;
+ m = load_multi_pack_index(r, source->path, local);
if (m) {
struct multi_pack_index *mp = r->objects->multi_pack_index;
if (mp) {
m->next = mp->next;
mp->next = m;
- } else
+ } else {
r->objects->multi_pack_index = m;
+ }
+ source->multi_pack_index = m;
+
return 1;
}
@@ -837,6 +838,8 @@ void clear_midx_file(struct repository *r)
if (r->objects && r->objects->multi_pack_index) {
close_midx(r->objects->multi_pack_index);
r->objects->multi_pack_index = NULL;
+ for (struct odb_source *source = r->objects->sources; source; source = source->next)
+ source->multi_pack_index = NULL;
}
if (remove_path(midx.buf))
diff --git a/midx.h b/midx.h
index 9d1374cbd58..b1626a9a7c4 100644
--- a/midx.h
+++ b/midx.h
@@ -3,11 +3,12 @@
#include "string-list.h"
+struct bitmapped_pack;
+struct git_hash_algo;
struct object_id;
+struct odb_source;
struct pack_entry;
struct repository;
-struct bitmapped_pack;
-struct git_hash_algo;
#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
#define MIDX_VERSION 1
@@ -123,7 +124,7 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
int midx_contains_pack(struct multi_pack_index *m,
const char *idx_or_pack_name);
int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id);
-int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
+int prepare_multi_pack_index_one(struct odb_source *source, int local);
/*
* Variant of write_midx_file which writes a MIDX containing only the packs
diff --git a/odb.h b/odb.h
index e922f256802..8e79c7be520 100644
--- a/odb.h
+++ b/odb.h
@@ -9,10 +9,11 @@
#include "string-list.h"
#include "thread-utils.h"
+struct multi_pack_index;
struct oidmap;
struct oidtree;
-struct strbuf;
struct repository;
+struct strbuf;
/*
* Compute the exact path an alternate is at and returns it. In case of
@@ -55,6 +56,13 @@ struct odb_source {
/* Map between object IDs for loose objects. */
struct loose_object_map *loose_map;
+ /*
+ * private data
+ *
+ * should only be accessed directly by packfile.c and midx.c
+ */
+ struct multi_pack_index *multi_pack_index;
+
/*
* This is a temporary object store created by the tmp_objdir
* facility. Disable ref updates since the objects in the store
@@ -75,7 +83,6 @@ struct odb_source {
};
struct packed_git;
-struct multi_pack_index;
struct cached_object_entry;
/*
diff --git a/packfile.c b/packfile.c
index af9ccfdba62..16efc2fdca3 100644
--- a/packfile.c
+++ b/packfile.c
@@ -372,6 +372,8 @@ void close_object_store(struct object_database *o)
if (o->multi_pack_index) {
close_midx(o->multi_pack_index);
o->multi_pack_index = NULL;
+ for (struct odb_source *source = o->sources; source; source = source->next)
+ source->multi_pack_index = NULL;
}
close_commit_graph(o);
@@ -1037,7 +1039,7 @@ static void prepare_packed_git(struct repository *r)
odb_prepare_alternates(r->objects);
for (source = r->objects->sources; source; source = source->next) {
int local = (source == r->objects->sources);
- prepare_multi_pack_index_one(r, source->path, local);
+ prepare_multi_pack_index_one(source, local);
prepare_packed_git_one(r, source->path, local);
}
rearrange_packed_git(r);
--
2.50.1.327.g047016eb4a.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* Re: [PATCH 1/8] midx: start tracking per object database source
2025-07-09 7:54 ` [PATCH 1/8] midx: start tracking per object database source Patrick Steinhardt
@ 2025-07-10 20:56 ` Justin Tobler
2025-07-10 23:10 ` Taylor Blau
1 sibling, 0 replies; 43+ messages in thread
From: Justin Tobler @ 2025-07-10 20:56 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 25/07/09 09:54AM, Patrick Steinhardt wrote:
> Multi-pack indices are tracked via `struct multi_pack_index`. This data
> structure is stored as a linked list inside `struct object_database`,
> which is the global database that spans across all of the object
> sources.
>
> This layout causes two problems:
>
> - Multi-pack indices aren't global to an object database, but instead
> there can be one multi-pack index per source. This creates a
> mismatch between the on-disk layout and how things are organized in
> the object database subsystems and makes some parts, like figuring
> out whether a source has an MIDX, quite awkward.
In a future where there is a separate implementation for each object
source, each multi-pack index is really only relevant for a particular
source. Some source implementations will also not require a midx to
begin with. For organization purposes, tracking a single midx per source
for now seems much more ideal.
> - Multi-pack indices are an implementation detail of how efficient
> access for packfiles work. As such, they are neither relevant in the
> context of loose objects, nor in a potential future where we have
> pluggable backends.
Agreed.
> Refactor `prepare_multi_pack_index_one()` so that it works on a specific
> source, which allows us to easily store a pointer to the multi-pack
> index inside of it. For now, this pointer exists next to the existing
> linked list we have in the object database. Users will be adjusted in
> subsequent patches to instead use the per-source pointers.
Looking at `prepare_multi_pack_index_one()`, this function is
responsible for checking if a multi-pack index is already loaded in the
linked list and, if not, loading/inserting it into the list. So now we
will want it to handle setting up the object source midx as well.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> midx.c | 19 +++++++++++--------
> midx.h | 7 ++++---
> odb.h | 11 +++++++++--
> packfile.c | 4 +++-
> 4 files changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 3c5bc821730..a91231bfcdf 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -724,28 +724,29 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
> return 0;
> }
>
> -int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local)
> +int prepare_multi_pack_index_one(struct odb_source *source, int local)
> {
> + struct repository *r = source->odb->repo;
> struct multi_pack_index *m;
> - struct multi_pack_index *m_search;
>
> prepare_repo_settings(r);
> if (!r->settings.core_multi_pack_index)
> return 0;
>
> - for (m_search = r->objects->multi_pack_index; m_search; m_search = m_search->next)
> - if (!strcmp(object_dir, m_search->object_dir))
> - return 1;
We are able to avoid searching the linked list because we know if the
source already has a multi_pack_index it must also be inserted into the
linked list. Makes sense.
> - m = load_multi_pack_index(r, object_dir, local);
> + if (source->multi_pack_index)
> + return 1;
>
> + m = load_multi_pack_index(r, source->path, local);
> if (m) {
> struct multi_pack_index *mp = r->objects->multi_pack_index;
> if (mp) {
> m->next = mp->next;
> mp->next = m;
> - } else
> + } else {
> r->objects->multi_pack_index = m;
> + }
> + source->multi_pack_index = m;
We still insert the multi-pack index into the list, but more importantly
we now store it in the source it pertains to. It is a bit awkward that
source now points to a list of midx when we really only care about the
first entry, but I suspect that will be addressed later in the series by
doing away with the list entirely.
> +
> return 1;
> }
>
> @@ -837,6 +838,8 @@ void clear_midx_file(struct repository *r)
> if (r->objects && r->objects->multi_pack_index) {
> close_midx(r->objects->multi_pack_index);
> r->objects->multi_pack_index = NULL;
> + for (struct odb_source *source = r->objects->sources; source; source = source->next)
> + source->multi_pack_index = NULL;
> }
>
> if (remove_path(midx.buf))
> diff --git a/midx.h b/midx.h
> index 9d1374cbd58..b1626a9a7c4 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -3,11 +3,12 @@
>
> #include "string-list.h"
>
> +struct bitmapped_pack;
> +struct git_hash_algo;
> struct object_id;
> +struct odb_source;
> struct pack_entry;
> struct repository;
> -struct bitmapped_pack;
> -struct git_hash_algo;
>
> #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
> #define MIDX_VERSION 1
> @@ -123,7 +124,7 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
> int midx_contains_pack(struct multi_pack_index *m,
> const char *idx_or_pack_name);
> int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id);
> -int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
> +int prepare_multi_pack_index_one(struct odb_source *source, int local);
>
> /*
> * Variant of write_midx_file which writes a MIDX containing only the packs
> diff --git a/odb.h b/odb.h
> index e922f256802..8e79c7be520 100644
> --- a/odb.h
> +++ b/odb.h
> @@ -9,10 +9,11 @@
> #include "string-list.h"
> #include "thread-utils.h"
>
> +struct multi_pack_index;
> struct oidmap;
> struct oidtree;
> -struct strbuf;
> struct repository;
> +struct strbuf;
>
> /*
> * Compute the exact path an alternate is at and returns it. In case of
> @@ -55,6 +56,13 @@ struct odb_source {
> /* Map between object IDs for loose objects. */
> struct loose_object_map *loose_map;
>
> + /*
> + * private data
> + *
> + * should only be accessed directly by packfile.c and midx.c
> + */
> + struct multi_pack_index *multi_pack_index;
> +
> /*
> * This is a temporary object store created by the tmp_objdir
> * facility. Disable ref updates since the objects in the store
> @@ -75,7 +83,6 @@ struct odb_source {
> };
>
> struct packed_git;
> -struct multi_pack_index;
> struct cached_object_entry;
>
> /*
> diff --git a/packfile.c b/packfile.c
> index af9ccfdba62..16efc2fdca3 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -372,6 +372,8 @@ void close_object_store(struct object_database *o)
> if (o->multi_pack_index) {
> close_midx(o->multi_pack_index);
> o->multi_pack_index = NULL;
> + for (struct odb_source *source = o->sources; source; source = source->next)
> + source->multi_pack_index = NULL;
Ok, so cleanup of the multi-pack index is still handled by recursively
iterating through the global list via `cose_midx()`, but we now unset
the midx in each source. Looks good.
-Justin
> }
>
> close_commit_graph(o);
> @@ -1037,7 +1039,7 @@ static void prepare_packed_git(struct repository *r)
> odb_prepare_alternates(r->objects);
> for (source = r->objects->sources; source; source = source->next) {
> int local = (source == r->objects->sources);
> - prepare_multi_pack_index_one(r, source->path, local);
> + prepare_multi_pack_index_one(source, local);
> prepare_packed_git_one(r, source->path, local);
> }
> rearrange_packed_git(r);
>
> --
> 2.50.1.327.g047016eb4a.dirty
>
>
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH 1/8] midx: start tracking per object database source
2025-07-09 7:54 ` [PATCH 1/8] midx: start tracking per object database source Patrick Steinhardt
2025-07-10 20:56 ` Justin Tobler
@ 2025-07-10 23:10 ` Taylor Blau
2025-07-10 23:19 ` Taylor Blau
2025-07-15 8:26 ` Patrick Steinhardt
1 sibling, 2 replies; 43+ messages in thread
From: Taylor Blau @ 2025-07-10 23:10 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Wed, Jul 09, 2025 at 09:54:49AM +0200, Patrick Steinhardt wrote:
> Multi-pack indices are tracked via `struct multi_pack_index`. This data
> structure is stored as a linked list inside `struct object_database`,
> which is the global database that spans across all of the object
> sources.
>
> This layout causes two problems:
>
> - Multi-pack indices aren't global to an object database, but instead
> there can be one multi-pack index per source. This creates a
> mismatch between the on-disk layout and how things are organized in
> the object database subsystems and makes some parts, like figuring
> out whether a source has an MIDX, quite awkward.
This is a little confusing to me. What do we consider to be an "object
database", and what do we consider to be a "source"?
For instance, if I have a repository with one or more alternates, I
would imagine that each alternate is a separate "source", and the
sources together comprise the object database. Does that match the way
you're thinking about it?
If so, that makes sense. But if not (i.e., we consider all alternates to
belong to the same object database and share a single source), then I
don't know how this will interact with the existing MIDX alternates
mechanism.
Some clarification here would be helpful, I think.
> Refactor `prepare_multi_pack_index_one()` so that it works on a specific
> source, which allows us to easily store a pointer to the multi-pack
> index inside of it. For now, this pointer exists next to the existing
> linked list we have in the object database. Users will be adjusted in
> subsequent patches to instead use the per-source pointers.
OK.
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> midx.c | 19 +++++++++++--------
> midx.h | 7 ++++---
> odb.h | 11 +++++++++--
> packfile.c | 4 +++-
> 4 files changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 3c5bc821730..a91231bfcdf 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -724,28 +724,29 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
> return 0;
> }
>
> -int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local)
> +int prepare_multi_pack_index_one(struct odb_source *source, int local)
OK, so the combination of a repository and "object_dir" path becomes a
new "struct odb_source", which makes sense. But shouldn't "local" also
be a property of the odb_source?
I am not familiar enough with the odb_source work thus far to know
whether that is a good idea or not, but just something that stood out
to me while reading.
> {
> + struct repository *r = source->odb->repo;
> struct multi_pack_index *m;
> - struct multi_pack_index *m_search;
>
> prepare_repo_settings(r);
> if (!r->settings.core_multi_pack_index)
> return 0;
>
> - for (m_search = r->objects->multi_pack_index; m_search; m_search = m_search->next)
> - if (!strcmp(object_dir, m_search->object_dir))
> - return 1;
> -
> - m = load_multi_pack_index(r, object_dir, local);
> + if (source->multi_pack_index)
> + return 1;
This makes sense. The old code was walking along the linked list of
MIDXs across alternates to see if we have already loaded one for the
given object_dir. But since there is a one-to-one relationship between
odb_source and object_dir, it suffices to see whether or not we have
loaded the MIDX for a given source at all.
> + m = load_multi_pack_index(r, source->path, local);
> if (m) {
> struct multi_pack_index *mp = r->objects->multi_pack_index;
> if (mp) {
> m->next = mp->next;
> mp->next = m;
> - } else
> + } else {
> r->objects->multi_pack_index = m;
> + }
I am glad that we are cleaning these up (since our style guide says that
if one or more if/else branch(es) has braces, they all should), but it
does make reading this patch a little more difficult. Not a big deal,
but I would have rather seen this as a separate cleanup patch.
> + source->multi_pack_index = m;
As an aside, I see that we're calling this new field "multi_pack_index".
I wonder if it would be better to call it "midx", since typing out
"multi_pack_index" is a little verbose, and "midx" is a common
abbreviation that I don't think we would cause any confusion with.
> @@ -837,6 +838,8 @@ void clear_midx_file(struct repository *r)
> if (r->objects && r->objects->multi_pack_index) {
> close_midx(r->objects->multi_pack_index);
> r->objects->multi_pack_index = NULL;
> + for (struct odb_source *source = r->objects->sources; source; source = source->next)
> + source->multi_pack_index = NULL;
Makes sense.
> diff --git a/midx.h b/midx.h
> index 9d1374cbd58..b1626a9a7c4 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -3,11 +3,12 @@
>
> #include "string-list.h"
>
> +struct bitmapped_pack;
> +struct git_hash_algo;
> struct object_id;
> +struct odb_source;
> struct pack_entry;
> struct repository;
> -struct bitmapped_pack;
> -struct git_hash_algo;
(I'm nit-picking, but a similar note here that the unrelated changes
make it harder to see what is actually going on in this hunk, which is
adding a declaration for "struct odb_source".)
> diff --git a/odb.h b/odb.h
> index e922f256802..8e79c7be520 100644
> --- a/odb.h
> +++ b/odb.h
> @@ -9,10 +9,11 @@
> #include "string-list.h"
> #include "thread-utils.h"
>
> +struct multi_pack_index;
> struct oidmap;
> struct oidtree;
> -struct strbuf;
> struct repository;
> +struct strbuf;
(Same here.)
Thanks,
Taylor
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH 1/8] midx: start tracking per object database source
2025-07-10 23:10 ` Taylor Blau
@ 2025-07-10 23:19 ` Taylor Blau
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-15 8:26 ` Patrick Steinhardt
1 sibling, 1 reply; 43+ messages in thread
From: Taylor Blau @ 2025-07-10 23:19 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Thu, Jul 10, 2025 at 07:10:02PM -0400, Taylor Blau wrote:
> On Wed, Jul 09, 2025 at 09:54:49AM +0200, Patrick Steinhardt wrote:
> > Multi-pack indices are tracked via `struct multi_pack_index`. This data
> > structure is stored as a linked list inside `struct object_database`,
> > which is the global database that spans across all of the object
> > sources.
> >
> > This layout causes two problems:
> >
> > - Multi-pack indices aren't global to an object database, but instead
> > there can be one multi-pack index per source. This creates a
> > mismatch between the on-disk layout and how things are organized in
> > the object database subsystems and makes some parts, like figuring
> > out whether a source has an MIDX, quite awkward.
>
> This is a little confusing to me. What do we consider to be an "object
> database", and what do we consider to be a "source"?
>
> For instance, if I have a repository with one or more alternates, I
> would imagine that each alternate is a separate "source", and the
> sources together comprise the object database. Does that match the way
> you're thinking about it?
>
> If so, that makes sense. But if not (i.e., we consider all alternates to
> belong to the same object database and share a single source), then I
> don't know how this will interact with the existing MIDX alternates
> mechanism.
>
> Some clarification here would be helpful, I think.
Ahh..., after applying part of this series locally, the documentation in
"struct odb_source" clarifies that (at least as of yet) there is one
source per alternate, along with a source for the local object database,
and that these collection of sources comprise the object database.
Makes sense.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/8] midx: start tracking per object database source
2025-07-10 23:19 ` Taylor Blau
@ 2025-07-15 8:26 ` Patrick Steinhardt
0 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 8:26 UTC (permalink / raw)
To: Taylor Blau; +Cc: git
On Thu, Jul 10, 2025 at 07:19:16PM -0400, Taylor Blau wrote:
> On Thu, Jul 10, 2025 at 07:10:02PM -0400, Taylor Blau wrote:
> > On Wed, Jul 09, 2025 at 09:54:49AM +0200, Patrick Steinhardt wrote:
> > > Multi-pack indices are tracked via `struct multi_pack_index`. This data
> > > structure is stored as a linked list inside `struct object_database`,
> > > which is the global database that spans across all of the object
> > > sources.
> > >
> > > This layout causes two problems:
> > >
> > > - Multi-pack indices aren't global to an object database, but instead
> > > there can be one multi-pack index per source. This creates a
> > > mismatch between the on-disk layout and how things are organized in
> > > the object database subsystems and makes some parts, like figuring
> > > out whether a source has an MIDX, quite awkward.
> >
> > This is a little confusing to me. What do we consider to be an "object
> > database", and what do we consider to be a "source"?
> >
> > For instance, if I have a repository with one or more alternates, I
> > would imagine that each alternate is a separate "source", and the
> > sources together comprise the object database. Does that match the way
> > you're thinking about it?
> >
> > If so, that makes sense. But if not (i.e., we consider all alternates to
> > belong to the same object database and share a single source), then I
> > don't know how this will interact with the existing MIDX alternates
> > mechanism.
> >
> > Some clarification here would be helpful, I think.
>
> Ahh..., after applying part of this series locally, the documentation in
> "struct odb_source" clarifies that (at least as of yet) there is one
> source per alternate, along with a source for the local object database,
> and that these collection of sources comprise the object database.
>
> Makes sense.
I'll still add a note, as all of this is a rather recent development.
Doesn't hurt to reiterate some of those points.
Patrick
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 1/8] midx: start tracking per object database source
2025-07-10 23:10 ` Taylor Blau
2025-07-10 23:19 ` Taylor Blau
@ 2025-07-15 8:26 ` Patrick Steinhardt
1 sibling, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 8:26 UTC (permalink / raw)
To: Taylor Blau; +Cc: git
On Thu, Jul 10, 2025 at 07:10:02PM -0400, Taylor Blau wrote:
> On Wed, Jul 09, 2025 at 09:54:49AM +0200, Patrick Steinhardt wrote:
> > Multi-pack indices are tracked via `struct multi_pack_index`. This data
> > structure is stored as a linked list inside `struct object_database`,
> > which is the global database that spans across all of the object
> > sources.
> >
> > This layout causes two problems:
> >
> > - Multi-pack indices aren't global to an object database, but instead
> > there can be one multi-pack index per source. This creates a
> > mismatch between the on-disk layout and how things are organized in
> > the object database subsystems and makes some parts, like figuring
> > out whether a source has an MIDX, quite awkward.
>
> This is a little confusing to me. What do we consider to be an "object
> database", and what do we consider to be a "source"?
>
> For instance, if I have a repository with one or more alternates, I
> would imagine that each alternate is a separate "source", and the
> sources together comprise the object database. Does that match the way
> you're thinking about it?
>
> If so, that makes sense. But if not (i.e., we consider all alternates to
> belong to the same object database and share a single source), then I
> don't know how this will interact with the existing MIDX alternates
> mechanism.
>
> Some clarification here would be helpful, I think.
I'll add some, sure. This whole infra is currently developing, so it
doesn't hurt to reiterate some of the points that have already landed.
> > diff --git a/midx.c b/midx.c
> > index 3c5bc821730..a91231bfcdf 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -724,28 +724,29 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
> > return 0;
> > }
> >
> > -int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local)
> > +int prepare_multi_pack_index_one(struct odb_source *source, int local)
>
> OK, so the combination of a repository and "object_dir" path becomes a
> new "struct odb_source", which makes sense. But shouldn't "local" also
> be a property of the odb_source?
>
> I am not familiar enough with the odb_source work thus far to know
> whether that is a good idea or not, but just something that stood out
> to me while reading.
Yes, you're right on point. I have a follow-up patch series that'll
eventually get rid of some duplicate information that is now present in
the ODB source connected to an MIDX.
[snip]
> > + source->multi_pack_index = m;
>
> As an aside, I see that we're calling this new field "multi_pack_index".
> I wonder if it would be better to call it "midx", since typing out
> "multi_pack_index" is a little verbose, and "midx" is a common
> abbreviation that I don't think we would cause any confusion with.
Sure, I can do that.
> > diff --git a/midx.h b/midx.h
> > index 9d1374cbd58..b1626a9a7c4 100644
> > --- a/midx.h
> > +++ b/midx.h
> > @@ -3,11 +3,12 @@
> >
> > #include "string-list.h"
> >
> > +struct bitmapped_pack;
> > +struct git_hash_algo;
> > struct object_id;
> > +struct odb_source;
> > struct pack_entry;
> > struct repository;
> > -struct bitmapped_pack;
> > -struct git_hash_algo;
>
> (I'm nit-picking, but a similar note here that the unrelated changes
> make it harder to see what is actually going on in this hunk, which is
> adding a declaration for "struct odb_source".)
>
> > diff --git a/odb.h b/odb.h
> > index e922f256802..8e79c7be520 100644
> > --- a/odb.h
> > +++ b/odb.h
> > @@ -9,10 +
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 1/8] midx: start tracking per object database source Patrick Steinhardt
@ 2025-07-09 7:54 ` Patrick Steinhardt
2025-07-10 21:07 ` Justin Tobler
2025-07-10 23:14 ` Taylor Blau
2025-07-09 7:54 ` [PATCH 3/8] midx: stop using linked list when closing MIDX Patrick Steinhardt
` (7 subsequent siblings)
9 siblings, 2 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-09 7:54 UTC (permalink / raw)
To: git
In the preceding commit we have refactored how we load multi-pack
indices so that we take take the source as input for which we want to
load the MIDX. As part of this refactoring we started to store a pointer
to the MIDX in `struct odb_source` itself.
Refactor loading of packfiles in the same way: instead of passing in the
object directory, we now pass in the source for which we want to load
packfiles. This allows us to simplify the code because we don't have to
search for a corresponding MIDX anymore, but we can instead directly use
the MIDX that we have already prepared beforehand.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/packfile.c b/packfile.c
index 16efc2fdca3..b43dd2fe6cb 100644
--- a/packfile.c
+++ b/packfile.c
@@ -935,22 +935,17 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
report_garbage(PACKDIR_FILE_GARBAGE, full_name);
}
-static void prepare_packed_git_one(struct repository *r, char *objdir, int local)
+static void prepare_packed_git_one(struct odb_source *source, int local)
{
- struct prepare_pack_data data;
struct string_list garbage = STRING_LIST_INIT_DUP;
+ struct prepare_pack_data data = {
+ .m = source->multi_pack_index,
+ .r = source->odb->repo,
+ .garbage = &garbage,
+ .local = local,
+ };
- data.m = r->objects->multi_pack_index;
-
- /* look for the multi-pack-index for this object directory */
- while (data.m && strcmp(data.m->object_dir, objdir))
- data.m = data.m->next;
-
- data.r = r;
- data.garbage = &garbage;
- data.local = local;
-
- for_each_file_in_pack_dir(objdir, prepare_pack, &data);
+ for_each_file_in_pack_dir(source->path, prepare_pack, &data);
report_pack_garbage(data.garbage);
string_list_clear(data.garbage, 0);
@@ -1040,7 +1035,7 @@ static void prepare_packed_git(struct repository *r)
for (source = r->objects->sources; source; source = source->next) {
int local = (source == r->objects->sources);
prepare_multi_pack_index_one(source, local);
- prepare_packed_git_one(r, source->path, local);
+ prepare_packed_git_one(source, local);
}
rearrange_packed_git(r);
--
2.50.1.327.g047016eb4a.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* Re: [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources
2025-07-09 7:54 ` [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
@ 2025-07-10 21:07 ` Justin Tobler
2025-07-10 23:14 ` Taylor Blau
1 sibling, 0 replies; 43+ messages in thread
From: Justin Tobler @ 2025-07-10 21:07 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 25/07/09 09:54AM, Patrick Steinhardt wrote:
> In the preceding commit we have refactored how we load multi-pack
> indices so that we take take the source as input for which we want to
s/take take/take/
> load the MIDX. As part of this refactoring we started to store a pointer
> to the MIDX in `struct odb_source` itself.
>
> Refactor loading of packfiles in the same way: instead of passing in the
> object directory, we now pass in the source for which we want to load
> packfiles. This allows us to simplify the code because we don't have to
> search for a corresponding MIDX anymore, but we can instead directly use
> the MIDX that we have already prepared beforehand.
The `odb_source` now contains readily contains MIDX, objdir, and
repository which allows us to simplify. Nice
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> packfile.c | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/packfile.c b/packfile.c
> index 16efc2fdca3..b43dd2fe6cb 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -935,22 +935,17 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
> report_garbage(PACKDIR_FILE_GARBAGE, full_name);
> }
>
> -static void prepare_packed_git_one(struct repository *r, char *objdir, int local)
> +static void prepare_packed_git_one(struct odb_source *source, int local)
> {
> - struct prepare_pack_data data;
> struct string_list garbage = STRING_LIST_INIT_DUP;
> + struct prepare_pack_data data = {
> + .m = source->multi_pack_index,
> + .r = source->odb->repo,
> + .garbage = &garbage,
> + .local = local,
> + };
>
> - data.m = r->objects->multi_pack_index;
> -
> - /* look for the multi-pack-index for this object directory */
> - while (data.m && strcmp(data.m->object_dir, objdir))
> - data.m = data.m->next;
Much simpler now :)
> -
> - data.r = r;
> - data.garbage = &garbage;
> - data.local = local;
> -
> - for_each_file_in_pack_dir(objdir, prepare_pack, &data);
> + for_each_file_in_pack_dir(source->path, prepare_pack, &data);
>
> report_pack_garbage(data.garbage);
> string_list_clear(data.garbage, 0);
> @@ -1040,7 +1035,7 @@ static void prepare_packed_git(struct repository *r)
> for (source = r->objects->sources; source; source = source->next) {
> int local = (source == r->objects->sources);
> prepare_multi_pack_index_one(source, local);
> - prepare_packed_git_one(r, source->path, local);
> + prepare_packed_git_one(source, local);
> }
> rearrange_packed_git(r);
>
>
> --
> 2.50.1.327.g047016eb4a.dirty
>
>
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources
2025-07-09 7:54 ` [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
2025-07-10 21:07 ` Justin Tobler
@ 2025-07-10 23:14 ` Taylor Blau
2025-07-15 8:26 ` Patrick Steinhardt
1 sibling, 1 reply; 43+ messages in thread
From: Taylor Blau @ 2025-07-10 23:14 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Wed, Jul 09, 2025 at 09:54:50AM +0200, Patrick Steinhardt wrote:
> In the preceding commit we have refactored how we load multi-pack
> indices so that we take take the source as input for which we want to
> load the MIDX. As part of this refactoring we started to store a pointer
> to the MIDX in `struct odb_source` itself.
The first sentence here is a little confusing, but may read more clearly
if written as:
In the preceding commit we refactored how we load multi-pack
indices to take a corresponding "source" as input.
> Refactor loading of packfiles in the same way: instead of passing in the
> object directory, we now pass in the source for which we want to load
s/for/from
> packfiles. This allows us to simplify the code because we don't have to
> search for a corresponding MIDX anymore, but we can instead directly use
> the MIDX that we have already prepared beforehand.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> packfile.c | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/packfile.c b/packfile.c
> index 16efc2fdca3..b43dd2fe6cb 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -935,22 +935,17 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
> report_garbage(PACKDIR_FILE_GARBAGE, full_name);
> }
>
> -static void prepare_packed_git_one(struct repository *r, char *objdir, int local)
> +static void prepare_packed_git_one(struct odb_source *source, int local)
> {
> - struct prepare_pack_data data;
> struct string_list garbage = STRING_LIST_INIT_DUP;
> + struct prepare_pack_data data = {
> + .m = source->multi_pack_index,
> + .r = source->odb->repo,
> + .garbage = &garbage,
> + .local = local,
> + };
>
> - data.m = r->objects->multi_pack_index;
> -
> - /* look for the multi-pack-index for this object directory */
> - while (data.m && strcmp(data.m->object_dir, objdir))
> - data.m = data.m->next;
Right, since we know that the MIDX corresponding to this source belongs
to the same "object_dir" path. Having an ASSERT() here may make that
more clear, but this change looks correct to me.
I am still a little unclear on how sources and ODBs correspond to one
another, but under my working assumption from the previous patch, I
think this is right.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources
2025-07-10 23:14 ` Taylor Blau
@ 2025-07-15 8:26 ` Patrick Steinhardt
0 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 8:26 UTC (permalink / raw)
To: Taylor Blau; +Cc: git
On Thu, Jul 10, 2025 at 07:14:29PM -0400, Taylor Blau wrote:
> On Wed, Jul 09, 2025 at 09:54:50AM +0200, Patrick Steinhardt wrote:
> > In the preceding commit we have refactored how we load multi-pack
> > indices so that we take take the source as input for which we want to
> > load the MIDX. As part of this refactoring we started to store a pointer
> > to the MIDX in `struct odb_source` itself.
>
> The first sentence here is a little confusing, but may read more clearly
> if written as:
>
> In the preceding commit we refactored how we load multi-pack
> indices to take a corresponding "source" as input.
Will use, thanks.
> > diff --git a/packfile.c b/packfile.c
> > index 16efc2fdca3..b43dd2fe6cb 100644
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@ -935,22 +935,17 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
> > report_garbage(PACKDIR_FILE_GARBAGE, full_name);
> > }
> >
> > -static void prepare_packed_git_one(struct repository *r, char *objdir, int local)
> > +static void prepare_packed_git_one(struct odb_source *source, int local)
> > {
> > - struct prepare_pack_data data;
> > struct string_list garbage = STRING_LIST_INIT_DUP;
> > + struct prepare_pack_data data = {
> > + .m = source->multi_pack_index,
> > + .r = source->odb->repo,
> > + .garbage = &garbage,
> > + .local = local,
> > + };
> >
> > - data.m = r->objects->multi_pack_index;
> > -
> > - /* look for the multi-pack-index for this object directory */
> > - while (data.m && strcmp(data.m->object_dir, objdir))
> > - data.m = data.m->next;
>
> Right, since we know that the MIDX corresponding to this source belongs
> to the same "object_dir" path. Having an ASSERT() here may make that
> more clear, but this change looks correct to me.
>
> I am still a little unclear on how sources and ODBs correspond to one
> another, but under my working assumption from the previous patch, I
> think this is right.
Yes, your understanding is correct :)
Patrick
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 3/8] midx: stop using linked list when closing MIDX
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 1/8] midx: start tracking per object database source Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 2/8] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
@ 2025-07-09 7:54 ` Patrick Steinhardt
2025-07-10 21:31 ` Justin Tobler
2025-07-10 23:22 ` Taylor Blau
2025-07-09 7:54 ` [PATCH 4/8] midx: track whether we have loaded the MIDX Patrick Steinhardt
` (6 subsequent siblings)
9 siblings, 2 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-09 7:54 UTC (permalink / raw)
To: git
When calling `close_midx()` we not only close the multi-pack index for
one object source, but instead we iterate through the whole linked list
of MIDXs to close all of them. This linked list is about to go away in
favor of using the new per-source pointer to its respective MIDX.
Refactor the function to iterate through sources instead.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
midx.c | 11 ++++++-----
packfile.c | 10 +++++-----
2 files changed, 11 insertions(+), 10 deletions(-)
diff --git a/midx.c b/midx.c
index a91231bfcdf..416b3e8b54f 100644
--- a/midx.c
+++ b/midx.c
@@ -401,7 +401,6 @@ void close_midx(struct multi_pack_index *m)
if (!m)
return;
- close_midx(m->next);
close_midx(m->base_midx);
munmap((unsigned char *)m->data, m->data_len);
@@ -835,11 +834,13 @@ void clear_midx_file(struct repository *r)
get_midx_filename(r->hash_algo, &midx, r->objects->sources->path);
- if (r->objects && r->objects->multi_pack_index) {
- close_midx(r->objects->multi_pack_index);
- r->objects->multi_pack_index = NULL;
- for (struct odb_source *source = r->objects->sources; source; source = source->next)
+ if (r->objects) {
+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
+ if (source->multi_pack_index)
+ close_midx(source->multi_pack_index);
source->multi_pack_index = NULL;
+ }
+ r->objects->multi_pack_index = NULL;
}
if (remove_path(midx.buf))
diff --git a/packfile.c b/packfile.c
index b43dd2fe6cb..546c161d0c1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -369,12 +369,12 @@ void close_object_store(struct object_database *o)
else
close_pack(p);
- if (o->multi_pack_index) {
- close_midx(o->multi_pack_index);
- o->multi_pack_index = NULL;
- for (struct odb_source *source = o->sources; source; source = source->next)
- source->multi_pack_index = NULL;
+ for (struct odb_source *source = o->sources; source; source = source->next) {
+ if (source->multi_pack_index)
+ close_midx(source->multi_pack_index);
+ source->multi_pack_index = NULL;
}
+ o->multi_pack_index = NULL;
close_commit_graph(o);
}
--
2.50.1.327.g047016eb4a.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* Re: [PATCH 3/8] midx: stop using linked list when closing MIDX
2025-07-09 7:54 ` [PATCH 3/8] midx: stop using linked list when closing MIDX Patrick Steinhardt
@ 2025-07-10 21:31 ` Justin Tobler
2025-07-15 8:26 ` Patrick Steinhardt
2025-07-10 23:22 ` Taylor Blau
1 sibling, 1 reply; 43+ messages in thread
From: Justin Tobler @ 2025-07-10 21:31 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 25/07/09 09:54AM, Patrick Steinhardt wrote:
> When calling `close_midx()` we not only close the multi-pack index for
> one object source, but instead we iterate through the whole linked list
> of MIDXs to close all of them. This linked list is about to go away in
> favor of using the new per-source pointer to its respective MIDX.
>
> Refactor the function to iterate through sources instead.
The `close_midx()` function itself is not iterating though the sources.
Rather each of the callsites are now resposible to ensure `close_midx()`
is called separately for each source. It might be nice to clarify this a
bit in the message.
I also noticed that there are several other existing `close_midx()`
callsites that we leave as-is. Each of these sites though looks like
they don't care about globally closing all MIDXs so they should be fine.
This might also we worth mentioning.
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> midx.c | 11 ++++++-----
> packfile.c | 10 +++++-----
> 2 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index a91231bfcdf..416b3e8b54f 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -401,7 +401,6 @@ void close_midx(struct multi_pack_index *m)
> if (!m)
> return;
>
> - close_midx(m->next);
By dropping this recursive call, now only the single MIDX chain is
closed. Other entries in the list are no longer recursively interated
through and will have to `close_midx()` called on them explicitly.
> close_midx(m->base_midx);
>
> munmap((unsigned char *)m->data, m->data_len);
> @@ -835,11 +834,13 @@ void clear_midx_file(struct repository *r)
>
> get_midx_filename(r->hash_algo, &midx, r->objects->sources->path);
>
> - if (r->objects && r->objects->multi_pack_index) {
> - close_midx(r->objects->multi_pack_index);
> - r->objects->multi_pack_index = NULL;
> - for (struct odb_source *source = r->objects->sources; source; source = source->next)
> + if (r->objects) {
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + if (source->multi_pack_index)
> + close_midx(source->multi_pack_index);
> source->multi_pack_index = NULL;
> + }
> + r->objects->multi_pack_index = NULL;
At this callsite we want to close all of the loaded MIDX and do so my
iterating through each of the sources. Since we still have the global
MIDX for now, we explicitly unset it.
> }
>
> if (remove_path(midx.buf))
> diff --git a/packfile.c b/packfile.c
> index b43dd2fe6cb..546c161d0c1 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -369,12 +369,12 @@ void close_object_store(struct object_database *o)
> else
> close_pack(p);
>
> - if (o->multi_pack_index) {
> - close_midx(o->multi_pack_index);
> - o->multi_pack_index = NULL;
> - for (struct odb_source *source = o->sources; source; source = source->next)
> - source->multi_pack_index = NULL;
> + for (struct odb_source *source = o->sources; source; source = source->next) {
> + if (source->multi_pack_index)
> + close_midx(source->multi_pack_index);
> + source->multi_pack_index = NULL;
> }
> + o->multi_pack_index = NULL;
Same here. Looking good :)
-Justin
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH 3/8] midx: stop using linked list when closing MIDX
2025-07-10 21:31 ` Justin Tobler
@ 2025-07-15 8:26 ` Patrick Steinhardt
0 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 8:26 UTC (permalink / raw)
To: Justin Tobler; +Cc: git
On Thu, Jul 10, 2025 at 04:31:24PM -0500, Justin Tobler wrote:
> On 25/07/09 09:54AM, Patrick Steinhardt wrote:
> > When calling `close_midx()` we not only close the multi-pack index for
> > one object source, but instead we iterate through the whole linked list
> > of MIDXs to close all of them. This linked list is about to go away in
> > favor of using the new per-source pointer to its respective MIDX.
> >
> > Refactor the function to iterate through sources instead.
>
> The `close_midx()` function itself is not iterating though the sources.
> Rather each of the callsites are now resposible to ensure `close_midx()`
> is called separately for each source. It might be nice to clarify this a
> bit in the message.
>
> I also noticed that there are several other existing `close_midx()`
> callsites that we leave as-is. Each of these sites though looks like
> they don't care about globally closing all MIDXs so they should be fine.
> This might also we worth mentioning.
Indeed. I'll add a note.
Patrick
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 3/8] midx: stop using linked list when closing MIDX
2025-07-09 7:54 ` [PATCH 3/8] midx: stop using linked list when closing MIDX Patrick Steinhardt
2025-07-10 21:31 ` Justin Tobler
@ 2025-07-10 23:22 ` Taylor Blau
2025-07-15 8:26 ` Patrick Steinhardt
1 sibling, 1 reply; 43+ messages in thread
From: Taylor Blau @ 2025-07-10 23:22 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Wed, Jul 09, 2025 at 09:54:51AM +0200, Patrick Steinhardt wrote:
> When calling `close_midx()` we not only close the multi-pack index for
> one object source, but instead we iterate through the whole linked list
> of MIDXs to close all of them. This linked list is about to go away in
> favor of using the new per-source pointer to its respective MIDX.
>
> Refactor the function to iterate through sources instead.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> midx.c | 11 ++++++-----
> packfile.c | 10 +++++-----
> 2 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index a91231bfcdf..416b3e8b54f 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -401,7 +401,6 @@ void close_midx(struct multi_pack_index *m)
> if (!m)
> return;
>
> - close_midx(m->next);
OK, so previously this recursive call to "close_midx()" would have freed
up the `m->next` MIDX...
> close_midx(m->base_midx);
>
> munmap((unsigned char *)m->data, m->data_len);
> @@ -835,11 +834,13 @@ void clear_midx_file(struct repository *r)
>
> get_midx_filename(r->hash_algo, &midx, r->objects->sources->path);
>
> - if (r->objects && r->objects->multi_pack_index) {
> - close_midx(r->objects->multi_pack_index);
> - r->objects->multi_pack_index = NULL;
> - for (struct odb_source *source = r->objects->sources; source; source = source->next)
> + if (r->objects) {
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + if (source->multi_pack_index)
> + close_midx(source->multi_pack_index);
> source->multi_pack_index = NULL;
...and then this line would NULL the now-free()'d memory out.
But instead we are directly iterating through the sources and both
closing and NULL-ing out their respective MIDXs (if any).
As an aside: I know we do the C99-style for loop with declarations in
many places, but in this instance it seems to have produced an awfully
long line. I wonder if in this instance it would be better to write:
struct odb_source *source;
for (source = r->objects->sources; source; source = source->next) {
/* ... */
}
That's still a little lengthy, but it's fewer than 80 characters ;-).
> + }
> + r->objects->multi_pack_index = NULL;
Presumably this pointer will go away at some point in the future as
well?
> }
>
> if (remove_path(midx.buf))
> diff --git a/packfile.c b/packfile.c
> index b43dd2fe6cb..546c161d0c1 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -369,12 +369,12 @@ void close_object_store(struct object_database *o)
> else
> close_pack(p);
>
> - if (o->multi_pack_index) {
> - close_midx(o->multi_pack_index);
> - o->multi_pack_index = NULL;
> - for (struct odb_source *source = o->sources; source; source = source->next)
> - source->multi_pack_index = NULL;
> + for (struct odb_source *source = o->sources; source; source = source->next) {
Same comment here as well.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH 3/8] midx: stop using linked list when closing MIDX
2025-07-10 23:22 ` Taylor Blau
@ 2025-07-15 8:26 ` Patrick Steinhardt
0 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 8:26 UTC (permalink / raw)
To: Taylor Blau; +Cc: git
On Thu, Jul 10, 2025 at 07:22:58PM -0400, Taylor Blau wrote:
> On Wed, Jul 09, 2025 at 09:54:51AM +0200, Patrick Steinhardt wrote:
> > diff --git a/midx.c b/midx.c
> > index a91231bfcdf..416b3e8b54f 100644
> > --- a/midx.c
> > +++ b/midx.c
> > @@ -835,11 +834,13 @@ void clear_midx_file(struct repository *r)
> >
> > get_midx_filename(r->hash_algo, &midx, r->objects->sources->path);
> >
> > - if (r->objects && r->objects->multi_pack_index) {
> > - close_midx(r->objects->multi_pack_index);
> > - r->objects->multi_pack_index = NULL;
> > - for (struct odb_source *source = r->objects->sources; source; source = source->next)
> > + if (r->objects) {
> > + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> > + if (source->multi_pack_index)
> > + close_midx(source->multi_pack_index);
> > source->multi_pack_index = NULL;
>
> ...and then this line would NULL the now-free()'d memory out.
>
> But instead we are directly iterating through the sources and both
> closing and NULL-ing out their respective MIDXs (if any).
>
> As an aside: I know we do the C99-style for loop with declarations in
> many places, but in this instance it seems to have produced an awfully
> long line. I wonder if in this instance it would be better to write:
>
> struct odb_source *source;
> for (source = r->objects->sources; source; source = source->next) {
> /* ... */
> }
>
> That's still a little lengthy, but it's fewer than 80 characters ;-).
Fair, can adapt.
> > + }
> > + r->objects->multi_pack_index = NULL;
>
> Presumably this pointer will go away at some point in the future as
> well?
Yup, exactly, it's removed at the end of this series.
Patrick
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 4/8] midx: track whether we have loaded the MIDX
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
` (2 preceding siblings ...)
2025-07-09 7:54 ` [PATCH 3/8] midx: stop using linked list when closing MIDX Patrick Steinhardt
@ 2025-07-09 7:54 ` Patrick Steinhardt
2025-07-10 22:16 ` Justin Tobler
2025-07-09 7:54 ` [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
` (5 subsequent siblings)
9 siblings, 1 reply; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-09 7:54 UTC (permalink / raw)
To: git
When calling `prepare_multi_pack_index_one()` we know to skip loading a
multi-pack index that we have already loaded beforehand. While this
works well in case there actually is a multi-pack index, it doesn't work
when we already tried to load a nonexistent one.
This doesn't cause problems with the current layout, where users
typically iterate through MIDXs via the linked list stored in the object
database. But that linked list is going away, and those users will
instead have to call `get_multi_pack_index()` for each object source. So
if one of those sources doesn't have an MIDX, we may end up trying to
repeatedly load it even though we know it doesn't exist.
Address this issue by introducing a new variable that tracks whether we
have tried to load multi-pack index of a given source.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
midx.c | 12 ++++++------
odb.h | 1 +
packfile.c | 1 +
3 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/midx.c b/midx.c
index 416b3e8b54f..6d3a166fa01 100644
--- a/midx.c
+++ b/midx.c
@@ -728,13 +728,13 @@ int prepare_multi_pack_index_one(struct odb_source *source, int local)
struct repository *r = source->odb->repo;
struct multi_pack_index *m;
+ if (source->multi_pack_index_loaded)
+ return !!source->multi_pack_index;
+
prepare_repo_settings(r);
if (!r->settings.core_multi_pack_index)
return 0;
- if (source->multi_pack_index)
- return 1;
-
m = load_multi_pack_index(r, source->path, local);
if (m) {
struct multi_pack_index *mp = r->objects->multi_pack_index;
@@ -745,11 +745,10 @@ int prepare_multi_pack_index_one(struct odb_source *source, int local)
r->objects->multi_pack_index = m;
}
source->multi_pack_index = m;
-
- return 1;
}
- return 0;
+ source->multi_pack_index_loaded = 1;
+ return !!source->multi_pack_index;
}
int midx_checksum_valid(struct multi_pack_index *m)
@@ -839,6 +838,7 @@ void clear_midx_file(struct repository *r)
if (source->multi_pack_index)
close_midx(source->multi_pack_index);
source->multi_pack_index = NULL;
+ source->multi_pack_index_loaded = 0;
}
r->objects->multi_pack_index = NULL;
}
diff --git a/odb.h b/odb.h
index 8e79c7be520..b39534dd55b 100644
--- a/odb.h
+++ b/odb.h
@@ -62,6 +62,7 @@ struct odb_source {
* should only be accessed directly by packfile.c and midx.c
*/
struct multi_pack_index *multi_pack_index;
+ int multi_pack_index_loaded;
/*
* This is a temporary object store created by the tmp_objdir
diff --git a/packfile.c b/packfile.c
index 546c161d0c1..e5d9d7ac8bc 100644
--- a/packfile.c
+++ b/packfile.c
@@ -373,6 +373,7 @@ void close_object_store(struct object_database *o)
if (source->multi_pack_index)
close_midx(source->multi_pack_index);
source->multi_pack_index = NULL;
+ source->multi_pack_index_loaded = 0;
}
o->multi_pack_index = NULL;
--
2.50.1.327.g047016eb4a.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* Re: [PATCH 4/8] midx: track whether we have loaded the MIDX
2025-07-09 7:54 ` [PATCH 4/8] midx: track whether we have loaded the MIDX Patrick Steinhardt
@ 2025-07-10 22:16 ` Justin Tobler
2025-07-10 23:26 ` Taylor Blau
0 siblings, 1 reply; 43+ messages in thread
From: Justin Tobler @ 2025-07-10 22:16 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 25/07/09 09:54AM, Patrick Steinhardt wrote:
> When calling `prepare_multi_pack_index_one()` we know to skip loading a
> multi-pack index that we have already loaded beforehand. While this
> works well in case there actually is a multi-pack index, it doesn't work
> when we already tried to load a nonexistent one.
>
> This doesn't cause problems with the current layout, where users
> typically iterate through MIDXs via the linked list stored in the object
> database. But that linked list is going away, and those users will
> instead have to call `get_multi_pack_index()` for each object source. So
> if one of those sources doesn't have an MIDX, we may end up trying to
> repeatedly load it even though we know it doesn't exist.
IIUC, in its current form `get_multi_pack_index()` returns the global
list of MIDXs. The MIDXs are loaded when calling `prepare_packed_git()`
into both the global `r->objects->multi_pack_index` and
`source->multi_pack_index` for each source as appropriate.
Looking at `prepare_packed_git()`, it checks
`r->objects->packed_git_initialized` to see if it has already been
initialized. If the intent is to start calling `get_multi_pack_index()`
for each source individually, doesn't `prepare_packed_git()` still only
execute once regardless already?
> Address this issue by introducing a new variable that tracks whether we
> have tried to load multi-pack index of a given source.
The contents of the patch look good, but I'm not entirely sure
introducing a separate variable to track if the source has attempted to
load a MIDX is useful.
-Justin
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 4/8] midx: track whether we have loaded the MIDX
2025-07-10 22:16 ` Justin Tobler
@ 2025-07-10 23:26 ` Taylor Blau
2025-07-15 8:27 ` Patrick Steinhardt
0 siblings, 1 reply; 43+ messages in thread
From: Taylor Blau @ 2025-07-10 23:26 UTC (permalink / raw)
To: Justin Tobler; +Cc: Patrick Steinhardt, git
On Thu, Jul 10, 2025 at 05:16:12PM -0500, Justin Tobler wrote:
> Looking at `prepare_packed_git()`, it checks
> `r->objects->packed_git_initialized` to see if it has already been
> initialized. If the intent is to start calling `get_multi_pack_index()`
> for each source individually, doesn't `prepare_packed_git()` still only
> execute once regardless already?
I was wondering the same thing. Perhaps that packed_git_initialized
field is going away sometime in the future and this is its logical
replacement (at least for MIDXs)?
In either case, that would be worth clarifying. If it's not doing
anything (i.e., because we have no plans to get rid of
packed_git_initialized), then I agree that this patch could probably be
dropped, but I suspect that I don't have the full picture in my head yet.
> > Address this issue by introducing a new variable that tracks whether we
> > have tried to load multi-pack index of a given source.
>
> The contents of the patch look good, but I'm not entirely sure
> introducing a separate variable to track if the source has attempted to
> load a MIDX is useful.
Yup.
> -Justin
Thanks,
Taylor
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 4/8] midx: track whether we have loaded the MIDX
2025-07-10 23:26 ` Taylor Blau
@ 2025-07-15 8:27 ` Patrick Steinhardt
0 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 8:27 UTC (permalink / raw)
To: Taylor Blau; +Cc: Justin Tobler, git
On Thu, Jul 10, 2025 at 07:26:16PM -0400, Taylor Blau wrote:
> On Thu, Jul 10, 2025 at 05:16:12PM -0500, Justin Tobler wrote:
> > Looking at `prepare_packed_git()`, it checks
> > `r->objects->packed_git_initialized` to see if it has already been
> > initialized. If the intent is to start calling `get_multi_pack_index()`
> > for each source individually, doesn't `prepare_packed_git()` still only
> > execute once regardless already?
>
> I was wondering the same thing. Perhaps that packed_git_initialized
> field is going away sometime in the future and this is its logical
> replacement (at least for MIDXs)?
>
> In either case, that would be worth clarifying. If it's not doing
> anything (i.e., because we have no plans to get rid of
> packed_git_initialized), then I agree that this patch could probably be
> dropped, but I suspect that I don't have the full picture in my head yet.
>
> > > Address this issue by introducing a new variable that tracks whether we
> > > have tried to load multi-pack index of a given source.
> >
> > The contents of the patch look good, but I'm not entirely sure
> > introducing a separate variable to track if the source has attempted to
> > load a MIDX is useful.
>
> Yup.
Hm, true, you're both correct, I missed that we already have this guard.
I'll drop the patch.
Patrick
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
` (3 preceding siblings ...)
2025-07-09 7:54 ` [PATCH 4/8] midx: track whether we have loaded the MIDX Patrick Steinhardt
@ 2025-07-09 7:54 ` Patrick Steinhardt
2025-07-10 22:35 ` Justin Tobler
2025-07-10 23:56 ` Taylor Blau
2025-07-09 7:54 ` [PATCH 6/8] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
` (4 subsequent siblings)
9 siblings, 2 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-09 7:54 UTC (permalink / raw)
To: git
The function `get_multi_pack_index()` loads multi-pack indices via
`prepare_packed_git()` and then returns the linked list of multi-pack
indices that is stored in `struct object_database`. That list is in the
process of being removed though in favor of storing the MIDX as part of
the object database source it belongs to.
Refactor `get_multi_pack_index()` so that it returns the multi-pack
index for a single object source. Callers are now expected to call this
function for each source they are interested in. This requires them to
iterate through alternates, so we have to prepare alternate object
sources before doing so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/pack-objects.c | 9 ++++++---
builtin/repack.c | 4 ++--
midx-write.c | 22 ++--------------------
object-name.c | 21 ++++++++++++++-------
pack-bitmap.c | 20 ++++++++++++++------
packfile.c | 30 +++++++++++-------------------
packfile.h | 3 +--
7 files changed, 50 insertions(+), 59 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5781dec9808..f889e69e07d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1701,7 +1701,6 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
{
int want;
struct list_head *pos;
- struct multi_pack_index *m;
if (!exclude && local && has_loose_object_nonlocal(oid))
return 0;
@@ -1721,9 +1720,13 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
*found_offset = 0;
}
- for (m = get_multi_pack_index(the_repository); m; m = m->next) {
+ odb_prepare_alternates(the_repository->objects);
+
+ for (struct odb_source *source = the_repository->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
struct pack_entry e;
- if (fill_midx_entry(the_repository, oid, &e, m)) {
+
+ if (m && fill_midx_entry(the_repository, oid, &e, m)) {
want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset, found_mtime);
if (want != -1)
return want;
diff --git a/builtin/repack.c b/builtin/repack.c
index 9bbf032b6dd..5956df5d927 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -218,9 +218,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
static void remove_redundant_pack(const char *dir_name, const char *base_name)
{
struct strbuf buf = STRBUF_INIT;
- struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
+ struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources);
strbuf_addf(&buf, "%s.pack", base_name);
- if (m && midx_contains_pack(m, buf.buf))
+ if (m && m->local && midx_contains_pack(m, buf.buf))
clear_midx_file(the_repository);
strbuf_insertf(&buf, 0, "%s/", dir_name);
unlink_pack_path(buf.buf, 1);
diff --git a/midx-write.c b/midx-write.c
index f2cfb85476e..c1ae62d3549 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
const char *object_dir)
{
- struct multi_pack_index *result = NULL;
- struct multi_pack_index *cur;
- char *obj_dir_real = real_pathdup(object_dir, 1);
- struct strbuf cur_path_real = STRBUF_INIT;
-
- /* Ensure the given object_dir is local, or a known alternate. */
- odb_find_source(r->objects, obj_dir_real);
-
- for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
- strbuf_realpath(&cur_path_real, cur->object_dir, 1);
- if (!strcmp(obj_dir_real, cur_path_real.buf)) {
- result = cur;
- goto cleanup;
- }
- }
-
-cleanup:
- free(obj_dir_real);
- strbuf_release(&cur_path_real);
- return result;
+ struct odb_source *source = odb_find_source(r->objects, object_dir);
+ return get_multi_pack_index(source);
}
static int fill_packs_from_midx(struct write_midx_context *ctx,
diff --git a/object-name.c b/object-name.c
index ddafe7f9b13..1e7fdcb90a8 100644
--- a/object-name.c
+++ b/object-name.c
@@ -198,16 +198,19 @@ static void unique_in_pack(struct packed_git *p,
static void find_short_packed_object(struct disambiguate_state *ds)
{
- struct multi_pack_index *m;
struct packed_git *p;
/* Skip, unless oids from the storage hash algorithm are wanted */
if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
return;
- for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
- m = m->next)
- unique_in_midx(m, ds);
+ odb_prepare_alternates(ds->repo->objects);
+ for (struct odb_source *source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ if (m)
+ unique_in_midx(m, ds);
+ }
+
for (p = get_packed_git(ds->repo); p && !ds->ambiguous;
p = p->next)
unique_in_pack(p, ds);
@@ -792,11 +795,15 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
static void find_abbrev_len_packed(struct min_abbrev_data *mad)
{
- struct multi_pack_index *m;
struct packed_git *p;
- for (m = get_multi_pack_index(mad->repo); m; m = m->next)
- find_abbrev_len_for_midx(m, mad);
+ odb_prepare_alternates(mad->repo->objects);
+ for (struct odb_source *source = mad->repo->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ if (m)
+ find_abbrev_len_for_midx(m, mad);
+ }
+
for (p = get_packed_git(mad->repo); p; p = p->next)
find_abbrev_len_for_pack(p, mad);
}
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 0a4af199c05..7b51d381837 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -692,14 +692,16 @@ static int open_midx_bitmap(struct repository *r,
struct bitmap_index *bitmap_git)
{
int ret = -1;
- struct multi_pack_index *midx;
assert(!bitmap_git->map);
- for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
- if (!open_midx_bitmap_1(bitmap_git, midx))
+ odb_prepare_alternates(r->objects);
+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *midx = get_multi_pack_index(source);
+ if (midx && !open_midx_bitmap_1(bitmap_git, midx))
ret = 0;
}
+
return ret;
}
@@ -3307,9 +3309,15 @@ int verify_bitmap_files(struct repository *r)
{
int res = 0;
- for (struct multi_pack_index *m = get_multi_pack_index(r);
- m; m = m->next) {
- char *midx_bitmap_name = midx_bitmap_filename(m);
+ odb_prepare_alternates(r->objects);
+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ char *midx_bitmap_name;
+
+ if (!m)
+ continue;
+
+ midx_bitmap_name = midx_bitmap_filename(m);
res |= verify_bitmap_file(r->hash_algo, midx_bitmap_name);
free(midx_bitmap_name);
}
diff --git a/packfile.c b/packfile.c
index e5d9d7ac8bc..e1ced050451 100644
--- a/packfile.c
+++ b/packfile.c
@@ -963,14 +963,17 @@ static void prepare_packed_git(struct repository *r);
unsigned long repo_approximate_object_count(struct repository *r)
{
if (!r->objects->approximate_object_count_valid) {
- unsigned long count;
- struct multi_pack_index *m;
+ unsigned long count = 0;
struct packed_git *p;
prepare_packed_git(r);
- count = 0;
- for (m = get_multi_pack_index(r); m; m = m->next)
- count += m->num_objects;
+
+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ if (m)
+ count += m->num_objects;
+ }
+
for (p = r->objects->packed_git; p; p = p->next) {
if (open_pack_index(p))
continue;
@@ -1074,21 +1077,10 @@ struct packed_git *get_packed_git(struct repository *r)
return r->objects->packed_git;
}
-struct multi_pack_index *get_multi_pack_index(struct repository *r)
-{
- prepare_packed_git(r);
- return r->objects->multi_pack_index;
-}
-
-struct multi_pack_index *get_local_multi_pack_index(struct repository *r)
+struct multi_pack_index *get_multi_pack_index(struct odb_source *source)
{
- struct multi_pack_index *m = get_multi_pack_index(r);
-
- /* no need to iterate; we always put the local one first (if any) */
- if (m && m->local)
- return m;
-
- return NULL;
+ prepare_packed_git(source->odb->repo);
+ return source->multi_pack_index;
}
struct packed_git *get_all_packs(struct repository *r)
diff --git a/packfile.h b/packfile.h
index 53c3b7d3b43..f16753f2a9b 100644
--- a/packfile.h
+++ b/packfile.h
@@ -147,8 +147,7 @@ void install_packed_git(struct repository *r, struct packed_git *pack);
struct packed_git *get_packed_git(struct repository *r);
struct list_head *get_packed_git_mru(struct repository *r);
-struct multi_pack_index *get_multi_pack_index(struct repository *r);
-struct multi_pack_index *get_local_multi_pack_index(struct repository *r);
+struct multi_pack_index *get_multi_pack_index(struct odb_source *source);
struct packed_git *get_all_packs(struct repository *r);
/*
--
2.50.1.327.g047016eb4a.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* Re: [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources
2025-07-09 7:54 ` [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
@ 2025-07-10 22:35 ` Justin Tobler
2025-07-10 23:56 ` Taylor Blau
1 sibling, 0 replies; 43+ messages in thread
From: Justin Tobler @ 2025-07-10 22:35 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 25/07/09 09:54AM, Patrick Steinhardt wrote:
> The function `get_multi_pack_index()` loads multi-pack indices via
> `prepare_packed_git()` and then returns the linked list of multi-pack
> indices that is stored in `struct object_database`. That list is in the
> process of being removed though in favor of storing the MIDX as part of
> the object database source it belongs to.
>
> Refactor `get_multi_pack_index()` so that it returns the multi-pack
> index for a single object source. Callers are now expected to call this
> function for each source they are interested in. This requires them to
> iterate through alternates, so we have to prepare alternate object
> sources before doing so.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> builtin/pack-objects.c | 9 ++++++---
> builtin/repack.c | 4 ++--
> midx-write.c | 22 ++--------------------
> object-name.c | 21 ++++++++++++++-------
> pack-bitmap.c | 20 ++++++++++++++------
> packfile.c | 30 +++++++++++-------------------
> packfile.h | 3 +--
> 7 files changed, 50 insertions(+), 59 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 5781dec9808..f889e69e07d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1701,7 +1701,6 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
> {
> int want;
> struct list_head *pos;
> - struct multi_pack_index *m;
>
> if (!exclude && local && has_loose_object_nonlocal(oid))
> return 0;
> @@ -1721,9 +1720,13 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
> *found_offset = 0;
> }
>
> - for (m = get_multi_pack_index(the_repository); m; m = m->next) {
> + odb_prepare_alternates(the_repository->objects);
Previously this explicit call to `odb_prepare_alternates()` was not
necessary because it was done when `prepare_packed_git()` was invoked.
Now that we iterate though the sources upfront to call
`get_multi_pack_index()` on, we need to prepare those sources first.
Makes sense.
> +
> + for (struct odb_source *source = the_repository->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> struct pack_entry e;
> - if (fill_midx_entry(the_repository, oid, &e, m)) {
> +
> + if (m && fill_midx_entry(the_repository, oid, &e, m)) {
> want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset, found_mtime);
> if (want != -1)
> return want;
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 9bbf032b6dd..5956df5d927 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -218,9 +218,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
> static void remove_redundant_pack(const char *dir_name, const char *base_name)
> {
> struct strbuf buf = STRBUF_INIT;
> - struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
> + struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources);
> strbuf_addf(&buf, "%s.pack", base_name);
> - if (m && midx_contains_pack(m, buf.buf))
> + if (m && m->local && midx_contains_pack(m, buf.buf))
> clear_midx_file(the_repository);
> strbuf_insertf(&buf, 0, "%s/", dir_name);
> unlink_pack_path(buf.buf, 1);
> diff --git a/midx-write.c b/midx-write.c
> index f2cfb85476e..c1ae62d3549 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
> static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
> const char *object_dir)
> {
> - struct multi_pack_index *result = NULL;
> - struct multi_pack_index *cur;
> - char *obj_dir_real = real_pathdup(object_dir, 1);
> - struct strbuf cur_path_real = STRBUF_INIT;
> -
> - /* Ensure the given object_dir is local, or a known alternate. */
> - odb_find_source(r->objects, obj_dir_real);
> -
> - for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
> - strbuf_realpath(&cur_path_real, cur->object_dir, 1);
> - if (!strcmp(obj_dir_real, cur_path_real.buf)) {
> - result = cur;
> - goto cleanup;
> - }
> - }
> -
> -cleanup:
> - free(obj_dir_real);
> - strbuf_release(&cur_path_real);
> - return result;
> + struct odb_source *source = odb_find_source(r->objects, object_dir);
> + return get_multi_pack_index(source);
Nice and simple :)
> }
>
> static int fill_packs_from_midx(struct write_midx_context *ctx,
> diff --git a/object-name.c b/object-name.c
> index ddafe7f9b13..1e7fdcb90a8 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -198,16 +198,19 @@ static void unique_in_pack(struct packed_git *p,
>
> static void find_short_packed_object(struct disambiguate_state *ds)
> {
> - struct multi_pack_index *m;
> struct packed_git *p;
>
> /* Skip, unless oids from the storage hash algorithm are wanted */
> if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
> return;
>
> - for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
> - m = m->next)
> - unique_in_midx(m, ds);
> + odb_prepare_alternates(ds->repo->objects);
> + for (struct odb_source *source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + if (m)
> + unique_in_midx(m, ds);
> + }
> +
> for (p = get_packed_git(ds->repo); p && !ds->ambiguous;
> p = p->next)
> unique_in_pack(p, ds);
> @@ -792,11 +795,15 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
>
> static void find_abbrev_len_packed(struct min_abbrev_data *mad)
> {
> - struct multi_pack_index *m;
> struct packed_git *p;
>
> - for (m = get_multi_pack_index(mad->repo); m; m = m->next)
> - find_abbrev_len_for_midx(m, mad);
> + odb_prepare_alternates(mad->repo->objects);
> + for (struct odb_source *source = mad->repo->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + if (m)
> + find_abbrev_len_for_midx(m, mad);
> + }
> +
> for (p = get_packed_git(mad->repo); p; p = p->next)
> find_abbrev_len_for_pack(p, mad);
> }
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 0a4af199c05..7b51d381837 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -692,14 +692,16 @@ static int open_midx_bitmap(struct repository *r,
> struct bitmap_index *bitmap_git)
> {
> int ret = -1;
> - struct multi_pack_index *midx;
>
> assert(!bitmap_git->map);
>
> - for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
> - if (!open_midx_bitmap_1(bitmap_git, midx))
> + odb_prepare_alternates(r->objects);
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *midx = get_multi_pack_index(source);
> + if (midx && !open_midx_bitmap_1(bitmap_git, midx))
> ret = 0;
> }
> +
> return ret;
> }
>
> @@ -3307,9 +3309,15 @@ int verify_bitmap_files(struct repository *r)
> {
> int res = 0;
>
> - for (struct multi_pack_index *m = get_multi_pack_index(r);
> - m; m = m->next) {
> - char *midx_bitmap_name = midx_bitmap_filename(m);
> + odb_prepare_alternates(r->objects);
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + char *midx_bitmap_name;
> +
> + if (!m)
> + continue;
> +
> + midx_bitmap_name = midx_bitmap_filename(m);
> res |= verify_bitmap_file(r->hash_algo, midx_bitmap_name);
> free(midx_bitmap_name);
> }
> diff --git a/packfile.c b/packfile.c
> index e5d9d7ac8bc..e1ced050451 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -963,14 +963,17 @@ static void prepare_packed_git(struct repository *r);
> unsigned long repo_approximate_object_count(struct repository *r)
> {
> if (!r->objects->approximate_object_count_valid) {
> - unsigned long count;
> - struct multi_pack_index *m;
> + unsigned long count = 0;
> struct packed_git *p;
>
> prepare_packed_git(r);
> - count = 0;
> - for (m = get_multi_pack_index(r); m; m = m->next)
> - count += m->num_objects;
> +
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + if (m)
> + count += m->num_objects;
> + }
> +
> for (p = r->objects->packed_git; p; p = p->next) {
> if (open_pack_index(p))
> continue;
> @@ -1074,21 +1077,10 @@ struct packed_git *get_packed_git(struct repository *r)
> return r->objects->packed_git;
> }
>
> -struct multi_pack_index *get_multi_pack_index(struct repository *r)
> -{
> - prepare_packed_git(r);
> - return r->objects->multi_pack_index;
> -}
> -
> -struct multi_pack_index *get_local_multi_pack_index(struct repository *r)
> +struct multi_pack_index *get_multi_pack_index(struct odb_source *source)
> {
> - struct multi_pack_index *m = get_multi_pack_index(r);
> -
> - /* no need to iterate; we always put the local one first (if any) */
> - if (m && m->local)
> - return m;
> -
> - return NULL;
> + prepare_packed_git(source->odb->repo);
> + return source->multi_pack_index;
Now get_multi_pack_index() provides the MIDX for the specified source.
> }
>
> struct packed_git *get_all_packs(struct repository *r)
> diff --git a/packfile.h b/packfile.h
> index 53c3b7d3b43..f16753f2a9b 100644
> --- a/packfile.h
> +++ b/packfile.h
> @@ -147,8 +147,7 @@ void install_packed_git(struct repository *r, struct packed_git *pack);
>
> struct packed_git *get_packed_git(struct repository *r);
> struct list_head *get_packed_git_mru(struct repository *r);
> -struct multi_pack_index *get_multi_pack_index(struct repository *r);
> -struct multi_pack_index *get_local_multi_pack_index(struct repository *r);
> +struct multi_pack_index *get_multi_pack_index(struct odb_source *source);
> struct packed_git *get_all_packs(struct repository *r);
>
> /*
>
> --
> 2.50.1.327.g047016eb4a.dirty
>
>
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources
2025-07-09 7:54 ` [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
2025-07-10 22:35 ` Justin Tobler
@ 2025-07-10 23:56 ` Taylor Blau
2025-07-15 8:27 ` Patrick Steinhardt
1 sibling, 1 reply; 43+ messages in thread
From: Taylor Blau @ 2025-07-10 23:56 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Wed, Jul 09, 2025 at 09:54:53AM +0200, Patrick Steinhardt wrote:
> The function `get_multi_pack_index()` loads multi-pack indices via
> `prepare_packed_git()` and then returns the linked list of multi-pack
> indices that is stored in `struct object_database`. That list is in the
> process of being removed though in favor of storing the MIDX as part of
> the object database source it belongs to.
>
> Refactor `get_multi_pack_index()` so that it returns the multi-pack
> index for a single object source. Callers are now expected to call this
> function for each source they are interested in. This requires them to
> iterate through alternates, so we have to prepare alternate object
> sources before doing so.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> builtin/pack-objects.c | 9 ++++++---
> builtin/repack.c | 4 ++--
> midx-write.c | 22 ++--------------------
> object-name.c | 21 ++++++++++++++-------
> pack-bitmap.c | 20 ++++++++++++++------
> packfile.c | 30 +++++++++++-------------------
> packfile.h | 3 +--
> 7 files changed, 50 insertions(+), 59 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 5781dec9808..f889e69e07d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1701,7 +1701,6 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
> {
> int want;
> struct list_head *pos;
> - struct multi_pack_index *m;
>
> if (!exclude && local && has_loose_object_nonlocal(oid))
> return 0;
> @@ -1721,9 +1720,13 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
> *found_offset = 0;
> }
>
> - for (m = get_multi_pack_index(the_repository); m; m = m->next) {
> + odb_prepare_alternates(the_repository->objects);
> +
> + for (struct odb_source *source = the_repository->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
All makes sense up to this point. I would suggest adding the following
to the top of this function:
struct repository *r = the_repository;
struct odb_source *source;
to shorten up these lines a bit, but I'm nit-picking so if you feel
strongly that the existing patch is clearer, I won't object.
> struct pack_entry e;
> - if (fill_midx_entry(the_repository, oid, &e, m)) {
> +
> + if (m && fill_midx_entry(the_repository, oid, &e, m)) {
Good.
> want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset, found_mtime);
> if (want != -1)
> return want;
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 9bbf032b6dd..5956df5d927 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -218,9 +218,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
> static void remove_redundant_pack(const char *dir_name, const char *base_name)
> {
> struct strbuf buf = STRBUF_INIT;
> - struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
> + struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources);
Is the first source always guaranteed to be the local one? I assume that the
answer here is "yes", and there are certainly other places in the code
where we make a similar assumption. But just wanted to make sure as
this popped into my head while reading.
> strbuf_addf(&buf, "%s.pack", base_name);
> - if (m && midx_contains_pack(m, buf.buf))
> + if (m && m->local && midx_contains_pack(m, buf.buf))
...hmm, maybe not?
> clear_midx_file(the_repository);
> strbuf_insertf(&buf, 0, "%s/", dir_name);
> unlink_pack_path(buf.buf, 1);
> diff --git a/midx-write.c b/midx-write.c
> index f2cfb85476e..c1ae62d3549 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
> static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
> const char *object_dir)
> {
> - struct multi_pack_index *result = NULL;
> - struct multi_pack_index *cur;
> - char *obj_dir_real = real_pathdup(object_dir, 1);
> - struct strbuf cur_path_real = STRBUF_INIT;
> -
> - /* Ensure the given object_dir is local, or a known alternate. */
> - odb_find_source(r->objects, obj_dir_real);
> -
> - for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
> - strbuf_realpath(&cur_path_real, cur->object_dir, 1);
> - if (!strcmp(obj_dir_real, cur_path_real.buf)) {
> - result = cur;
> - goto cleanup;
> - }
> - }
> -
> -cleanup:
> - free(obj_dir_real);
> - strbuf_release(&cur_path_real);
> - return result;
> + struct odb_source *source = odb_find_source(r->objects, object_dir);
> + return get_multi_pack_index(source);
When I first read this I wondered what would happen if we passed in an
unknown object_dir such that odb_find_source() returned NULL. But that
function will never return NULL, and instead will die() if given a bogus
object_dir.
So this is fine, though I would have imagined that we'd return NULL
within odb_find_source() and let the caller die() (or not).
> }
>
> static int fill_packs_from_midx(struct write_midx_context *ctx,
> diff --git a/object-name.c b/object-name.c
> index ddafe7f9b13..1e7fdcb90a8 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -198,16 +198,19 @@ static void unique_in_pack(struct packed_git *p,
>
> static void find_short_packed_object(struct disambiguate_state *ds)
> {
> - struct multi_pack_index *m;
> struct packed_git *p;
>
> /* Skip, unless oids from the storage hash algorithm are wanted */
> if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
> return;
>
> - for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
> - m = m->next)
> - unique_in_midx(m, ds);
> + odb_prepare_alternates(ds->repo->objects);
> + for (struct odb_source *source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + if (m)
> + unique_in_midx(m, ds);
> + }
> +
Makes sense, though now having seen this pattern a few times, I am
wondering if it would be worth it to add a utility function that takes a
callback and iterates over the various MIDXs. But perhaps that is taking
DRY-ing things up a little too far ;-).
For what it's worth, I do think that what you wrote here makes more
logical sense: MIDXs are tied to individual alternate object DBs, which
here means that there is one MIDX per "struct odb_source". It just is a
little more verbose to type out.
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 0a4af199c05..7b51d381837 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -692,14 +692,16 @@ static int open_midx_bitmap(struct repository *r,
> struct bitmap_index *bitmap_git)
> {
> int ret = -1;
> - struct multi_pack_index *midx;
>
> assert(!bitmap_git->map);
>
> - for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
> - if (!open_midx_bitmap_1(bitmap_git, midx))
> + odb_prepare_alternates(r->objects);
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *midx = get_multi_pack_index(source);
> + if (midx && !open_midx_bitmap_1(bitmap_git, midx))
> ret = 0;
> }
Makes sense.
> +
Stray diff?
> @@ -3307,9 +3309,15 @@ int verify_bitmap_files(struct repository *r)
> {
> int res = 0;
>
> - for (struct multi_pack_index *m = get_multi_pack_index(r);
> - m; m = m->next) {
> - char *midx_bitmap_name = midx_bitmap_filename(m);
> + odb_prepare_alternates(r->objects);
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + char *midx_bitmap_name;
> +
> + if (!m)
> + continue;
> +
> + midx_bitmap_name = midx_bitmap_filename(m);
This one looks good as well.
> diff --git a/packfile.c b/packfile.c
> index e5d9d7ac8bc..e1ced050451 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -963,14 +963,17 @@ static void prepare_packed_git(struct repository *r);
> unsigned long repo_approximate_object_count(struct repository *r)
> {
> if (!r->objects->approximate_object_count_valid) {
> - unsigned long count;
> - struct multi_pack_index *m;
> + unsigned long count = 0;
> struct packed_git *p;
>
> prepare_packed_git(r);
I was wondering where the odb_prepare_alternates() call was in this one,
but it is a byproduct of calling prepare_packed_git(). So this spot
looks OK as well.
> - count = 0;
> - for (m = get_multi_pack_index(r); m; m = m->next)
> - count += m->num_objects;
> +
> + for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> + struct multi_pack_index *m = get_multi_pack_index(source);
> + if (m)
> + count += m->num_objects;
Unrelated to your patch, I wonder if it makes sense to check for
overflow here. I think so, though presumably if we are overflowing
"unsigned long" then likely we have much bigger problems to worry about
;-).
The rest looks good to me.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources
2025-07-10 23:56 ` Taylor Blau
@ 2025-07-15 8:27 ` Patrick Steinhardt
0 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 8:27 UTC (permalink / raw)
To: Taylor Blau; +Cc: git
On Thu, Jul 10, 2025 at 07:56:28PM -0400, Taylor Blau wrote:
> On Wed, Jul 09, 2025 at 09:54:53AM +0200, Patrick Steinhardt wrote:
> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index 9bbf032b6dd..5956df5d927 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
> > @@ -218,9 +218,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
> > static void remove_redundant_pack(const char *dir_name, const char *base_name)
> > {
> > struct strbuf buf = STRBUF_INIT;
> > - struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
> > + struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources);
>
> Is the first source always guaranteed to be the local one? I assume that the
> answer here is "yes", and there are certainly other places in the code
> where we make a similar assumption. But just wanted to make sure as
> this popped into my head while reading.
>
> > strbuf_addf(&buf, "%s.pack", base_name);
> > - if (m && midx_contains_pack(m, buf.buf))
> > + if (m && m->local && midx_contains_pack(m, buf.buf))
>
> ...hmm, maybe not?
The answer is "almost always". The only exception is in case there is a
temporary object source added via `odb_set_temporary_primary_source()`,
for example for quarantine directories.
Generally speaking this patch here doesn't really change anything, and
we could just as well drop the `m->local` part above. But I think that
overall we're not doing a good job to track the local object source, so
it felt safer to me to add the above guard.
> > diff --git a/midx-write.c b/midx-write.c
> > index f2cfb85476e..c1ae62d3549 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
> > static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
> > const char *object_dir)
> > {
> > - struct multi_pack_index *result = NULL;
> > - struct multi_pack_index *cur;
> > - char *obj_dir_real = real_pathdup(object_dir, 1);
> > - struct strbuf cur_path_real = STRBUF_INIT;
> > -
> > - /* Ensure the given object_dir is local, or a known alternate. */
> > - odb_find_source(r->objects, obj_dir_real);
> > -
> > - for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
> > - strbuf_realpath(&cur_path_real, cur->object_dir, 1);
> > - if (!strcmp(obj_dir_real, cur_path_real.buf)) {
> > - result = cur;
> > - goto cleanup;
> > - }
> > - }
> > -
> > -cleanup:
> > - free(obj_dir_real);
> > - strbuf_release(&cur_path_real);
> > - return result;
> > + struct odb_source *source = odb_find_source(r->objects, object_dir);
> > + return get_multi_pack_index(source);
>
> When I first read this I wondered what would happen if we passed in an
> unknown object_dir such that odb_find_source() returned NULL. But that
> function will never return NULL, and instead will die() if given a bogus
> object_dir.
>
> So this is fine, though I would have imagined that we'd return NULL
> within odb_find_source() and let the caller die() (or not).
Fully agreed. I've got a follow-up patch series that does this
refactoring.
> > diff --git a/object-name.c b/object-name.c
> > index ddafe7f9b13..1e7fdcb90a8 100644
> > --- a/object-name.c
> > +++ b/object-name.c
> > @@ -198,16 +198,19 @@ static void unique_in_pack(struct packed_git *p,
> >
> > static void find_short_packed_object(struct disambiguate_state *ds)
> > {
> > - struct multi_pack_index *m;
> > struct packed_git *p;
> >
> > /* Skip, unless oids from the storage hash algorithm are wanted */
> > if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
> > return;
> >
> > - for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
> > - m = m->next)
> > - unique_in_midx(m, ds);
> > + odb_prepare_alternates(ds->repo->objects);
> > + for (struct odb_source *source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
> > + struct multi_pack_index *m = get_multi_pack_index(source);
> > + if (m)
> > + unique_in_midx(m, ds);
> > + }
> > +
>
> Makes sense, though now having seen this pattern a few times, I am
> wondering if it would be worth it to add a utility function that takes a
> callback and iterates over the various MIDXs. But perhaps that is taking
> DRY-ing things up a little too far ;-).
>
> For what it's worth, I do think that what you wrote here makes more
> logical sense: MIDXs are tied to individual alternate object DBs, which
> here means that there is one MIDX per "struct odb_source". It just is a
> little more verbose to type out.
I was wondering whether it would make sense to add a looping macro, but
I ultimately decided against it as it doesn't make too much of a
difference to really matter. I wouldn't mind adding it though if you or
others feel strongly about it.
Patrick
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 6/8] packfile: stop using linked MIDX list in `find_pack_entry()`
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
` (4 preceding siblings ...)
2025-07-09 7:54 ` [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
@ 2025-07-09 7:54 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 7/8] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
` (3 subsequent siblings)
9 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-09 7:54 UTC (permalink / raw)
To: git
Refactor `find_pack_entry()` so that we stop using the linked list of
multi-pack indices. Note that there is no need to explicitly prepare
alternates, and neither do we have to use `get_multi_pack_index()`,
because `prepare_packed_git()` already takes care of populating all data
structures for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/packfile.c b/packfile.c
index e1ced050451..776a72678f5 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2073,16 +2073,15 @@ static int fill_pack_entry(const struct object_id *oid,
int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e)
{
struct list_head *pos;
- struct multi_pack_index *m;
prepare_packed_git(r);
- if (!r->objects->packed_git && !r->objects->multi_pack_index)
- return 0;
- for (m = r->objects->multi_pack_index; m; m = m->next) {
- if (fill_midx_entry(r, oid, e, m))
+ for (struct odb_source *source = r->objects->sources; source; source = source->next)
+ if (source->multi_pack_index && fill_midx_entry(r, oid, e, source->multi_pack_index))
return 1;
- }
+
+ if (!r->objects->packed_git)
+ return 0;
list_for_each(pos, &r->objects->packed_git_mru) {
struct packed_git *p = list_entry(pos, struct packed_git, mru);
--
2.50.1.327.g047016eb4a.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* [PATCH 7/8] packfile: stop using linked MIDX list in `get_all_packs()`
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
` (5 preceding siblings ...)
2025-07-09 7:54 ` [PATCH 6/8] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
@ 2025-07-09 7:54 ` Patrick Steinhardt
2025-07-09 7:54 ` [PATCH 8/8] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
` (2 subsequent siblings)
9 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-09 7:54 UTC (permalink / raw)
To: git
Refactor `get_all_packs()` so that we stop using the linked list of
multi-pack indices. Note that there is no need to explicitly prepare
alternates, and neither do we have to use `get_multi_pack_index()`,
because `prepare_packed_git()` already takes care of populating all data
structures for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/packfile.c b/packfile.c
index 776a72678f5..3eeec20906b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1085,12 +1085,13 @@ struct multi_pack_index *get_multi_pack_index(struct odb_source *source)
struct packed_git *get_all_packs(struct repository *r)
{
- struct multi_pack_index *m;
-
prepare_packed_git(r);
- for (m = r->objects->multi_pack_index; m; m = m->next) {
- uint32_t i;
- for (i = 0; i < m->num_packs + m->num_packs_in_base; i++)
+
+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = source->multi_pack_index;
+ if (!m)
+ continue;
+ for (uint32_t i = 0; i < m->num_packs + m->num_packs_in_base; i++)
prepare_midx_pack(r, m, i);
}
--
2.50.1.327.g047016eb4a.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* [PATCH 8/8] midx: remove now-unused linked list of multi-pack indices
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
` (6 preceding siblings ...)
2025-07-09 7:54 ` [PATCH 7/8] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
@ 2025-07-09 7:54 ` Patrick Steinhardt
2025-07-10 22:48 ` Justin Tobler
2025-07-09 22:04 ` [PATCH 0/8] odb: track multi-pack-indices via their object sources Junio C Hamano
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
9 siblings, 1 reply; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-09 7:54 UTC (permalink / raw)
To: git
In the preceding commits we have migrated all users of the linked list
of multi-pack indices to instead use those stored in the object database
sources. Remove those now-unused pointers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
midx.c | 16 ++--------------
midx.h | 2 --
odb.h | 7 -------
packfile.c | 1 -
4 files changed, 2 insertions(+), 24 deletions(-)
diff --git a/midx.c b/midx.c
index 6d3a166fa01..27623e8cbb7 100644
--- a/midx.c
+++ b/midx.c
@@ -726,7 +726,6 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
int prepare_multi_pack_index_one(struct odb_source *source, int local)
{
struct repository *r = source->odb->repo;
- struct multi_pack_index *m;
if (source->multi_pack_index_loaded)
return !!source->multi_pack_index;
@@ -735,19 +734,9 @@ int prepare_multi_pack_index_one(struct odb_source *source, int local)
if (!r->settings.core_multi_pack_index)
return 0;
- m = load_multi_pack_index(r, source->path, local);
- if (m) {
- struct multi_pack_index *mp = r->objects->multi_pack_index;
- if (mp) {
- m->next = mp->next;
- mp->next = m;
- } else {
- r->objects->multi_pack_index = m;
- }
- source->multi_pack_index = m;
- }
-
+ source->multi_pack_index = load_multi_pack_index(r, source->path, local);
source->multi_pack_index_loaded = 1;
+
return !!source->multi_pack_index;
}
@@ -840,7 +829,6 @@ void clear_midx_file(struct repository *r)
source->multi_pack_index = NULL;
source->multi_pack_index_loaded = 0;
}
- r->objects->multi_pack_index = NULL;
}
if (remove_path(midx.buf))
diff --git a/midx.h b/midx.h
index b1626a9a7c4..c4192c92d44 100644
--- a/midx.h
+++ b/midx.h
@@ -35,8 +35,6 @@ struct repository;
"GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL"
struct multi_pack_index {
- struct multi_pack_index *next;
-
const unsigned char *data;
size_t data_len;
diff --git a/odb.h b/odb.h
index b39534dd55b..8ff523a33de 100644
--- a/odb.h
+++ b/odb.h
@@ -124,13 +124,6 @@ struct object_database {
struct commit_graph *commit_graph;
unsigned commit_graph_attempted : 1; /* if loading has been attempted */
- /*
- * private data
- *
- * should only be accessed directly by packfile.c and midx.c
- */
- struct multi_pack_index *multi_pack_index;
-
/*
* private data
*
diff --git a/packfile.c b/packfile.c
index 3eeec20906b..453a38395bb 100644
--- a/packfile.c
+++ b/packfile.c
@@ -375,7 +375,6 @@ void close_object_store(struct object_database *o)
source->multi_pack_index = NULL;
source->multi_pack_index_loaded = 0;
}
- o->multi_pack_index = NULL;
close_commit_graph(o);
}
--
2.50.1.327.g047016eb4a.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* Re: [PATCH 8/8] midx: remove now-unused linked list of multi-pack indices
2025-07-09 7:54 ` [PATCH 8/8] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
@ 2025-07-10 22:48 ` Justin Tobler
0 siblings, 0 replies; 43+ messages in thread
From: Justin Tobler @ 2025-07-10 22:48 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On 25/07/09 09:54AM, Patrick Steinhardt wrote:
> In the preceding commits we have migrated all users of the linked list
> of multi-pack indices to instead use those stored in the object database
> sources. Remove those now-unused pointers.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> midx.c | 16 ++--------------
> midx.h | 2 --
> odb.h | 7 -------
> packfile.c | 1 -
> 4 files changed, 2 insertions(+), 24 deletions(-)
>
> diff --git a/midx.c b/midx.c
> index 6d3a166fa01..27623e8cbb7 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -726,7 +726,6 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
> int prepare_multi_pack_index_one(struct odb_source *source, int local)
> {
> struct repository *r = source->odb->repo;
> - struct multi_pack_index *m;
>
> if (source->multi_pack_index_loaded)
> return !!source->multi_pack_index;
> @@ -735,19 +734,9 @@ int prepare_multi_pack_index_one(struct odb_source *source, int local)
> if (!r->settings.core_multi_pack_index)
> return 0;
>
> - m = load_multi_pack_index(r, source->path, local);
> - if (m) {
> - struct multi_pack_index *mp = r->objects->multi_pack_index;
> - if (mp) {
> - m->next = mp->next;
> - mp->next = m;
> - } else {
> - r->objects->multi_pack_index = m;
> - }
> - source->multi_pack_index = m;
> - }
> -
> + source->multi_pack_index = load_multi_pack_index(r, source->path, local);
Now that we are dropping the MIDX list stored in the object database, we
no longer have to store. Nice and simple.
> source->multi_pack_index_loaded = 1;
> +
> return !!source->multi_pack_index;
> }
>
> @@ -840,7 +829,6 @@ void clear_midx_file(struct repository *r)
> source->multi_pack_index = NULL;
> source->multi_pack_index_loaded = 0;
> }
> - r->objects->multi_pack_index = NULL;
> }
>
> if (remove_path(midx.buf))
> diff --git a/midx.h b/midx.h
> index b1626a9a7c4..c4192c92d44 100644
> --- a/midx.h
> +++ b/midx.h
> @@ -35,8 +35,6 @@ struct repository;
> "GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL"
>
> struct multi_pack_index {
> - struct multi_pack_index *next;
We no longer need the next pointer since each `struct multi_pack_index`
represents an indepented MIDX chain.
> -
> const unsigned char *data;
> size_t data_len;
>
> diff --git a/odb.h b/odb.h
> index b39534dd55b..8ff523a33de 100644
> --- a/odb.h
> +++ b/odb.h
> @@ -124,13 +124,6 @@ struct object_database {
> struct commit_graph *commit_graph;
> unsigned commit_graph_attempted : 1; /* if loading has been attempted */
>
> - /*
> - * private data
> - *
> - * should only be accessed directly by packfile.c and midx.c
> - */
> - struct multi_pack_index *multi_pack_index;
Now each object source specifies its own MIDX and this one can go away.
Nice :)
> -
> /*
> * private data
> *
> diff --git a/packfile.c b/packfile.c
> index 3eeec20906b..453a38395bb 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -375,7 +375,6 @@ void close_object_store(struct object_database *o)
> source->multi_pack_index = NULL;
> source->multi_pack_index_loaded = 0;
> }
> - o->multi_pack_index = NULL;
>
> close_commit_graph(o);
> }
>
> --
> 2.50.1.327.g047016eb4a.dirty
>
>
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 0/8] odb: track multi-pack-indices via their object sources
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
` (7 preceding siblings ...)
2025-07-09 7:54 ` [PATCH 8/8] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
@ 2025-07-09 22:04 ` Junio C Hamano
2025-07-10 23:58 ` Taylor Blau
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
9 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2025-07-09 22:04 UTC (permalink / raw)
To: Patrick Steinhardt, Taylor Blau; +Cc: git
Patrick Steinhardt <ps@pks.im> writes:
> This patch series thus refactors the codebase to stop tracking MIDX's
> globally. Instead, they are being pushed down one level so that every
> `struct odb_source` has an optional MIDX itself. This simplifies some of
> our code and will make it easier in a future iteration to move the data
> into a packfile-specific object source backend.
>
> This series is built on top of a30f80fde92 (The eighth batch,
> 2025-07-08) with "ps/object-store" at 841a03b4046 (odb: rename
> `read_object_with_reference()`, 2025-07-01) merged into it.
You do not have to deal with it just yet, but FYI, another topic in
flight has a commit that adds a few more callers to a function this
topic renames away. Namely, 5ee86c27 (repack: exclude cruft pack(s)
from the MIDX where possible, 2025-06-23).
If this topic needs to be rerolled after the other topic graduates
to 'master', we may need to see this topic rebased on a newer
'master' with something like the attached patch squashed in, but
because the other topic is at least a few more days away from
'next', and it might still need another final finishing touch
iteration, let's keep these two topics independent from each other a
bit longer, and let me deal with this trivial semantic conflict
resolution, at least for now.
Thanks.
diff --git a/builtin/repack.c b/builtin/repack.c
index a74b2ca7f3..21723866b9 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -1531,7 +1531,7 @@ int cmd_repack(int argc,
* midx_has_unknown_packs() will make the decision for
* us.
*/
- if (!get_local_multi_pack_index(the_repository))
+ if (!get_multi_pack_index(the_repository->objects->sources))
midx_must_contain_cruft = 1;
}
@@ -1614,9 +1614,9 @@ int cmd_repack(int argc,
string_list_sort(&names);
- if (get_local_multi_pack_index(the_repository)) {
+ if (get_multi_pack_index(the_repository->objects->sources)) {
struct multi_pack_index *m =
- get_local_multi_pack_index(the_repository);
+ get_multi_pack_index(the_repository->objects->sources);
ALLOC_ARRAY(midx_pack_names,
m->num_packs + m->num_packs_in_base);
--
2.50.1-382-gda22511645
^ permalink raw reply related [flat|nested] 43+ messages in thread* Re: [PATCH 0/8] odb: track multi-pack-indices via their object sources
2025-07-09 22:04 ` [PATCH 0/8] odb: track multi-pack-indices via their object sources Junio C Hamano
@ 2025-07-10 23:58 ` Taylor Blau
2025-07-15 8:27 ` Patrick Steinhardt
0 siblings, 1 reply; 43+ messages in thread
From: Taylor Blau @ 2025-07-10 23:58 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Patrick Steinhardt, git
On Wed, Jul 09, 2025 at 03:04:44PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > This patch series thus refactors the codebase to stop tracking MIDX's
> > globally. Instead, they are being pushed down one level so that every
> > `struct odb_source` has an optional MIDX itself. This simplifies some of
> > our code and will make it easier in a future iteration to move the data
> > into a packfile-specific object source backend.
> >
> > This series is built on top of a30f80fde92 (The eighth batch,
> > 2025-07-08) with "ps/object-store" at 841a03b4046 (odb: rename
> > `read_object_with_reference()`, 2025-07-01) merged into it.
>
> You do not have to deal with it just yet, but FYI, another topic in
> flight has a commit that adds a few more callers to a function this
> topic renames away. Namely, 5ee86c27 (repack: exclude cruft pack(s)
> from the MIDX where possible, 2025-06-23).
Yup, there are a handful of new get_local_multi_pack_index() calls in
that topic.
> If this topic needs to be rerolled after the other topic graduates
> to 'master', we may need to see this topic rebased on a newer
> 'master' with something like the attached patch squashed in, but
> because the other topic is at least a few more days away from
> 'next', and it might still need another final finishing touch
> iteration, let's keep these two topics independent from each other a
> bit longer, and let me deal with this trivial semantic conflict
> resolution, at least for now.
>
> Thanks.
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index a74b2ca7f3..21723866b9 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
Assuming that in these three cases that the first entry in
the_repository->objects->sources refers to the local object database,
then I agree with the proposed changes.
Thanks for flagging it :-).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 0/8] odb: track multi-pack-indices via their object sources
2025-07-10 23:58 ` Taylor Blau
@ 2025-07-15 8:27 ` Patrick Steinhardt
0 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 8:27 UTC (permalink / raw)
To: Taylor Blau; +Cc: Junio C Hamano, git
On Thu, Jul 10, 2025 at 07:58:42PM -0400, Taylor Blau wrote:
> On Wed, Jul 09, 2025 at 03:04:44PM -0700, Junio C Hamano wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> > > This patch series thus refactors the codebase to stop tracking MIDX's
> > > globally. Instead, they are being pushed down one level so that every
> > > `struct odb_source` has an optional MIDX itself. This simplifies some of
> > > our code and will make it easier in a future iteration to move the data
> > > into a packfile-specific object source backend.
> > >
> > > This series is built on top of a30f80fde92 (The eighth batch,
> > > 2025-07-08) with "ps/object-store" at 841a03b4046 (odb: rename
> > > `read_object_with_reference()`, 2025-07-01) merged into it.
> >
> > You do not have to deal with it just yet, but FYI, another topic in
> > flight has a commit that adds a few more callers to a function this
> > topic renames away. Namely, 5ee86c27 (repack: exclude cruft pack(s)
> > from the MIDX where possible, 2025-06-23).
>
> Yup, there are a handful of new get_local_multi_pack_index() calls in
> that topic.
>
> > If this topic needs to be rerolled after the other topic graduates
> > to 'master', we may need to see this topic rebased on a newer
> > 'master' with something like the attached patch squashed in, but
> > because the other topic is at least a few more days away from
> > 'next', and it might still need another final finishing touch
> > iteration, let's keep these two topics independent from each other a
> > bit longer, and let me deal with this trivial semantic conflict
> > resolution, at least for now.
> >
> > Thanks.
> >
> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index a74b2ca7f3..21723866b9 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
>
> Assuming that in these three cases that the first entry in
> the_repository->objects->sources refers to the local object database,
> then I agree with the proposed changes.
>
> Thanks for flagging it :-).
Indeed, thanks. As the topic has been merged to "next" by now I'll
rebase my series on top of it.
Patrick
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 0/7] odb: track multi-pack-indices via their object sources
2025-07-09 7:54 [PATCH 0/8] odb: track multi-pack-indices via their object sources Patrick Steinhardt
` (8 preceding siblings ...)
2025-07-09 22:04 ` [PATCH 0/8] odb: track multi-pack-indices via their object sources Junio C Hamano
@ 2025-07-15 11:29 ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 1/7] midx: start tracking per object database source Patrick Steinhardt
` (8 more replies)
9 siblings, 9 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 11:29 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Justin Tobler, Junio C Hamano
Hi,
multi-pack-indices are tracked via `struct multi_pack_index`. This data
structure is stored inside `struct object_database`, which is the global
database that spans across all of the object sources
This layout causes two problems:
- Multi-pack indices aren't global to an object database, but instead
there can be one multi-pack index per object source. This creates a
mismatch between the on-disk layout and how things are organized in
the object database subsystems and makes some parts, like figuring
out whether an object source has an MIDX, quite awkward.
- Multi-pack indices are an implementation detail of how efficient
access for packfiles work. As such, they are neither relevant in the
context of loose objects, nor in a potential future where we have
pluggable backends.
This patch series thus refactors the codebase to stop tracking MIDX's
globally. Instead, they are being pushed down one level so that every
`struct odb_source` has an optional MIDX itself. This simplifies some of
our code and will make it easier in a future iteration to move the data
into a packfile-specific object source backend.
Changes in v2:
- Changed the base of this series. It is now built on top of
a30f80fde92 (The eighth batch, 2025-07-08) with "ps/object-store" at
841a03b4046 (odb: rename `read_object_with_reference()`, 2025-07-01)
and "tb/midx-avoid-cruft-packs" at 5ee86c273bf (repack: exclude
cruft pack(s) from the MIDX where possible, 2025-06-23) merged into
it.
- Re-explain the split between object databases and object sources
to help readers out a bit, given that this is a rather recent
change.
- Rename `struct odb_source::multi_pack_index` to `struct
odb_source::midx`.
- Fix some overly long lines when looping through the individual
sources.
- Drop the patch that guards re-loading MIDXs, as we already have the
guard via `packed_git_initialized`.
- Remove some while-at-it changes to make the diffs easier to read.
- Link to v1: https://lore.kernel.org/r/20250709-b4-pks-midx-via-odb-alternate-v1-0-f31150d21331@pks.im
Thanks!
Patrick
---
Patrick Steinhardt (7):
midx: start tracking per object database source
packfile: refactor `prepare_packed_git_one()` to work on sources
midx: stop using linked list when closing MIDX
packfile: refactor `get_multi_pack_index()` to work on sources
packfile: stop using linked MIDX list in `find_pack_entry()`
packfile: stop using linked MIDX list in `get_all_packs()`
midx: remove now-unused linked list of multi-pack indices
builtin/pack-objects.c | 10 ++++--
builtin/repack.c | 10 +++---
midx-write.c | 22 ++-----------
midx.c | 36 ++++++++-------------
midx.h | 5 ++-
object-name.c | 22 +++++++++----
odb.h | 16 +++++-----
pack-bitmap.c | 21 ++++++++----
packfile.c | 86 ++++++++++++++++++++++----------------------------
packfile.h | 3 +-
10 files changed, 107 insertions(+), 124 deletions(-)
Range-diff versus v1:
1: d5a98780836 ! 1: 0617b4bfaf7 midx: start tracking per object database source
@@ Commit message
This layout causes two problems:
- - Multi-pack indices aren't global to an object database, but instead
- there can be one multi-pack index per source. This creates a
- mismatch between the on-disk layout and how things are organized in
- the object database subsystems and makes some parts, like figuring
- out whether a source has an MIDX, quite awkward.
+ - Object databases consist of multiple object sources (e.g. one source
+ per alternate object directory), where each multi-pack index is
+ specific to one of those sources. Regardless of that though, the
+ MIDX is not tracked per source, but tracked globally for the whole
+ object database. This creates a mismatch between the on-disk layout
+ and how things are organized in the object database subsystems and
+ makes some parts, like figuring out whether a source has an MIDX,
+ quite awkward.
- Multi-pack indices are an implementation detail of how efficient
access for packfiles work. As such, they are neither relevant in the
@@ midx.c: int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_i
- return 1;
-
- m = load_multi_pack_index(r, object_dir, local);
-+ if (source->multi_pack_index)
++ if (source->midx)
+ return 1;
+ m = load_multi_pack_index(r, source->path, local);
@@ midx.c: int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_i
+ } else {
r->objects->multi_pack_index = m;
+ }
-+ source->multi_pack_index = m;
++ source->midx = m;
+
return 1;
}
@@ midx.c: void clear_midx_file(struct repository *r)
close_midx(r->objects->multi_pack_index);
r->objects->multi_pack_index = NULL;
+ for (struct odb_source *source = r->objects->sources; source; source = source->next)
-+ source->multi_pack_index = NULL;
++ source->midx = NULL;
}
if (remove_path(midx.buf))
## midx.h ##
-@@
-
- #include "string-list.h"
-
-+struct bitmapped_pack;
-+struct git_hash_algo;
- struct object_id;
-+struct odb_source;
- struct pack_entry;
+@@ midx.h: struct pack_entry;
struct repository;
--struct bitmapped_pack;
--struct git_hash_algo;
+ struct bitmapped_pack;
+ struct git_hash_algo;
++struct odb_source;
#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
#define MIDX_VERSION 1
@@ midx.h: int fill_midx_entry(struct repository *r, const struct object_id *oid, s
* Variant of write_midx_file which writes a MIDX containing only the packs
## odb.h ##
-@@
- #include "string-list.h"
- #include "thread-utils.h"
-
-+struct multi_pack_index;
- struct oidmap;
+@@ odb.h: struct oidmap;
struct oidtree;
--struct strbuf;
+ struct strbuf;
struct repository;
-+struct strbuf;
++struct multi_pack_index;
/*
* Compute the exact path an alternate is at and returns it. In case of
@@ odb.h: struct odb_source {
+ *
+ * should only be accessed directly by packfile.c and midx.c
+ */
-+ struct multi_pack_index *multi_pack_index;
++ struct multi_pack_index *midx;
+
/*
* This is a temporary object store created by the tmp_objdir
@@ packfile.c: void close_object_store(struct object_database *o)
close_midx(o->multi_pack_index);
o->multi_pack_index = NULL;
+ for (struct odb_source *source = o->sources; source; source = source->next)
-+ source->multi_pack_index = NULL;
++ source->midx = NULL;
}
close_commit_graph(o);
2: 8315835a35a ! 2: f81ea4ea950 packfile: refactor `prepare_packed_git_one()` to work on sources
@@ Metadata
## Commit message ##
packfile: refactor `prepare_packed_git_one()` to work on sources
- In the preceding commit we have refactored how we load multi-pack
- indices so that we take take the source as input for which we want to
- load the MIDX. As part of this refactoring we started to store a pointer
- to the MIDX in `struct odb_source` itself.
+ In the preceding commit we refactored how we load multi-pack indices to
+ take a corresponding "source" as input. As part of this refactoring we
+ started to store a pointer to the MIDX in `struct odb_source` itself.
Refactor loading of packfiles in the same way: instead of passing in the
- object directory, we now pass in the source for which we want to load
+ object directory, we now pass in the source from which we want to load
packfiles. This allows us to simplify the code because we don't have to
search for a corresponding MIDX anymore, but we can instead directly use
the MIDX that we have already prepared beforehand.
@@ packfile.c: static void prepare_pack(const char *full_name, size_t full_name_len
- struct prepare_pack_data data;
struct string_list garbage = STRING_LIST_INIT_DUP;
+ struct prepare_pack_data data = {
-+ .m = source->multi_pack_index,
++ .m = source->midx,
+ .r = source->odb->repo,
+ .garbage = &garbage,
+ .local = local,
3: d4e426c028a ! 3: 7bf5ce4f766 midx: stop using linked list when closing MIDX
@@ Commit message
Refactor the function to iterate through sources instead.
+ Note that after this patch, there's a couple of callsites left that
+ continue to use `close_midx()` without iterating through all sources.
+ These are all cases where we don't care about the MIDX from other
+ sources though, so it's fine to keep them as-is.
+
Signed-off-by: Patrick Steinhardt <ps@pks.im>
## midx.c ##
@@ midx.c: void clear_midx_file(struct repository *r)
- r->objects->multi_pack_index = NULL;
- for (struct odb_source *source = r->objects->sources; source; source = source->next)
+ if (r->objects) {
-+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
-+ if (source->multi_pack_index)
-+ close_midx(source->multi_pack_index);
- source->multi_pack_index = NULL;
++ struct odb_source *source;
++
++ for (source = r->objects->sources; source; source = source->next) {
++ if (source->midx)
++ close_midx(source->midx);
+ source->midx = NULL;
+ }
+ r->objects->multi_pack_index = NULL;
}
@@ midx.c: void clear_midx_file(struct repository *r)
if (remove_path(midx.buf))
## packfile.c ##
+@@ packfile.c: void close_pack(struct packed_git *p)
+
+ void close_object_store(struct object_database *o)
+ {
++ struct odb_source *source;
+ struct packed_git *p;
+
+ for (p = o->packed_git; p; p = p->next)
@@ packfile.c: void close_object_store(struct object_database *o)
else
close_pack(p);
@@ packfile.c: void close_object_store(struct object_database *o)
- close_midx(o->multi_pack_index);
- o->multi_pack_index = NULL;
- for (struct odb_source *source = o->sources; source; source = source->next)
-- source->multi_pack_index = NULL;
-+ for (struct odb_source *source = o->sources; source; source = source->next) {
-+ if (source->multi_pack_index)
-+ close_midx(source->multi_pack_index);
-+ source->multi_pack_index = NULL;
+- source->midx = NULL;
++ for (source = o->sources; source; source = source->next) {
++ if (source->midx)
++ close_midx(source->midx);
++ source->midx = NULL;
}
+ o->multi_pack_index = NULL;
4: 06661849d72 < -: ----------- midx: track whether we have loaded the MIDX
5: 2b7ae703f48 ! 4: abf26698a61 packfile: refactor `get_multi_pack_index()` to work on sources
@@ Commit message
## builtin/pack-objects.c ##
@@ builtin/pack-objects.c: static int want_object_in_pack_mtime(const struct object_id *oid,
+ uint32_t found_mtime)
{
int want;
++ struct odb_source *source;
struct list_head *pos;
- struct multi_pack_index *m;
@@ builtin/pack-objects.c: static int want_object_in_pack_mtime(const struct object
- for (m = get_multi_pack_index(the_repository); m; m = m->next) {
+ odb_prepare_alternates(the_repository->objects);
+
-+ for (struct odb_source *source = the_repository->objects->sources; source; source = source->next) {
++ for (source = the_repository->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
struct pack_entry e;
- if (fill_midx_entry(the_repository, oid, &e, m)) {
@@ builtin/repack.c: static void mark_packs_for_deletion(struct existing_packs *exi
clear_midx_file(the_repository);
strbuf_insertf(&buf, 0, "%s/", dir_name);
unlink_pack_path(buf.buf, 1);
+@@ builtin/repack.c: int cmd_repack(int argc,
+ * midx_has_unknown_packs() will make the decision for
+ * us.
+ */
+- if (!get_local_multi_pack_index(the_repository))
++ if (!get_multi_pack_index(the_repository->objects->sources))
+ midx_must_contain_cruft = 1;
+ }
+
+@@ builtin/repack.c: int cmd_repack(int argc,
+
+ string_list_sort(&names);
+
+- if (get_local_multi_pack_index(the_repository)) {
++ if (get_multi_pack_index(the_repository->objects->sources)) {
+ struct multi_pack_index *m =
+- get_local_multi_pack_index(the_repository);
++ get_multi_pack_index(the_repository->objects->sources);
+
+ ALLOC_ARRAY(midx_pack_names,
+ m->num_packs + m->num_packs_in_base);
## midx-write.c ##
@@ midx-write.c: static int write_midx_bitmap(struct write_midx_context *ctx,
@@ object-name.c: static void unique_in_pack(struct packed_git *p,
static void find_short_packed_object(struct disambiguate_state *ds)
{
- struct multi_pack_index *m;
++ struct odb_source *source;
struct packed_git *p;
/* Skip, unless oids from the storage hash algorithm are wanted */
@@ object-name.c: static void unique_in_pack(struct packed_git *p,
- m = m->next)
- unique_in_midx(m, ds);
+ odb_prepare_alternates(ds->repo->objects);
-+ for (struct odb_source *source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
++ for (source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ if (m)
+ unique_in_midx(m, ds);
@@ object-name.c: static void find_abbrev_len_for_pack(struct packed_git *p,
}
## pack-bitmap.c ##
-@@ pack-bitmap.c: static int open_midx_bitmap(struct repository *r,
+@@ pack-bitmap.c: static int open_pack_bitmap(struct repository *r,
+ static int open_midx_bitmap(struct repository *r,
struct bitmap_index *bitmap_git)
{
++ struct odb_source *source;
int ret = -1;
- struct multi_pack_index *midx;
@@ pack-bitmap.c: static int open_midx_bitmap(struct repository *r,
- for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
- if (!open_midx_bitmap_1(bitmap_git, midx))
+ odb_prepare_alternates(r->objects);
-+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
++ for (source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *midx = get_multi_pack_index(source);
+ if (midx && !open_midx_bitmap_1(bitmap_git, midx))
ret = 0;
}
-+
return ret;
- }
+@@ pack-bitmap.c: static int verify_bitmap_file(const struct git_hash_algo *algop,
-@@ pack-bitmap.c: int verify_bitmap_files(struct repository *r)
+ int verify_bitmap_files(struct repository *r)
{
++ struct odb_source *source;
int res = 0;
- for (struct multi_pack_index *m = get_multi_pack_index(r);
- m; m = m->next) {
- char *midx_bitmap_name = midx_bitmap_filename(m);
+ odb_prepare_alternates(r->objects);
-+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
++ for (source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ char *midx_bitmap_name;
+
@@ packfile.c: static void prepare_packed_git(struct repository *r);
if (!r->objects->approximate_object_count_valid) {
- unsigned long count;
- struct multi_pack_index *m;
++ struct odb_source *source;
+ unsigned long count = 0;
struct packed_git *p;
@@ packfile.c: static void prepare_packed_git(struct repository *r);
- for (m = get_multi_pack_index(r); m; m = m->next)
- count += m->num_objects;
+
-+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
++ for (source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ if (m)
+ count += m->num_objects;
@@ packfile.c: struct packed_git *get_packed_git(struct repository *r)
-
- return NULL;
+ prepare_packed_git(source->odb->repo);
-+ return source->multi_pack_index;
++ return source->midx;
}
struct packed_git *get_all_packs(struct repository *r)
6: 5f9d1d62514 ! 5: 05264cb723d packfile: stop using linked MIDX list in `find_pack_entry()`
@@ packfile.c: static int fill_pack_entry(const struct object_id *oid,
- for (m = r->objects->multi_pack_index; m; m = m->next) {
- if (fill_midx_entry(r, oid, e, m))
+ for (struct odb_source *source = r->objects->sources; source; source = source->next)
-+ if (source->multi_pack_index && fill_midx_entry(r, oid, e, source->multi_pack_index))
++ if (source->midx && fill_midx_entry(r, oid, e, source->midx))
return 1;
- }
+
7: 020761d0ed5 ! 6: 995581b2979 packfile: stop using linked MIDX list in `get_all_packs()`
@@ packfile.c: struct multi_pack_index *get_multi_pack_index(struct odb_source *sou
- for (i = 0; i < m->num_packs + m->num_packs_in_base; i++)
+
+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
-+ struct multi_pack_index *m = source->multi_pack_index;
++ struct multi_pack_index *m = source->midx;
+ if (!m)
+ continue;
+ for (uint32_t i = 0; i < m->num_packs + m->num_packs_in_base; i++)
8: d34129034ef ! 7: 7320e4cbe37 midx: remove now-unused linked list of multi-pack indices
@@ midx.c: int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_i
struct repository *r = source->odb->repo;
- struct multi_pack_index *m;
- if (source->multi_pack_index_loaded)
- return !!source->multi_pack_index;
-@@ midx.c: int prepare_multi_pack_index_one(struct odb_source *source, int local)
+ prepare_repo_settings(r);
if (!r->settings.core_multi_pack_index)
- return 0;
+@@ midx.c: int prepare_multi_pack_index_one(struct odb_source *source, int local)
+ if (source->midx)
+ return 1;
- m = load_multi_pack_index(r, source->path, local);
- if (m) {
@@ midx.c: int prepare_multi_pack_index_one(struct odb_source *source, int local)
- } else {
- r->objects->multi_pack_index = m;
- }
-- source->multi_pack_index = m;
+- source->midx = m;
++ source->midx = load_multi_pack_index(r, source->path, local);
+
+- return 1;
- }
-
-+ source->multi_pack_index = load_multi_pack_index(r, source->path, local);
- source->multi_pack_index_loaded = 1;
-+
- return !!source->multi_pack_index;
+- return 0;
++ return !!source->midx;
}
+ int midx_checksum_valid(struct multi_pack_index *m)
@@ midx.c: void clear_midx_file(struct repository *r)
- source->multi_pack_index = NULL;
- source->multi_pack_index_loaded = 0;
+ close_midx(source->midx);
+ source->midx = NULL;
}
- r->objects->multi_pack_index = NULL;
}
@@ midx.c: void clear_midx_file(struct repository *r)
if (remove_path(midx.buf))
## midx.h ##
-@@ midx.h: struct repository;
+@@ midx.h: struct odb_source;
"GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL"
struct multi_pack_index {
@@ odb.h: struct object_database {
## packfile.c ##
@@ packfile.c: void close_object_store(struct object_database *o)
- source->multi_pack_index = NULL;
- source->multi_pack_index_loaded = 0;
+ close_midx(source->midx);
+ source->midx = NULL;
}
- o->multi_pack_index = NULL;
---
base-commit: 10b5a133fe89e14e8b4a0d441119d71ab3620ba8
change-id: 20250513-b4-pks-midx-via-odb-alternate-d4b5940a28cd
^ permalink raw reply [flat|nested] 43+ messages in thread* [PATCH v2 1/7] midx: start tracking per object database source
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
@ 2025-07-15 11:29 ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 2/7] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
` (7 subsequent siblings)
8 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 11:29 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Justin Tobler, Junio C Hamano
Multi-pack indices are tracked via `struct multi_pack_index`. This data
structure is stored as a linked list inside `struct object_database`,
which is the global database that spans across all of the object
sources.
This layout causes two problems:
- Object databases consist of multiple object sources (e.g. one source
per alternate object directory), where each multi-pack index is
specific to one of those sources. Regardless of that though, the
MIDX is not tracked per source, but tracked globally for the whole
object database. This creates a mismatch between the on-disk layout
and how things are organized in the object database subsystems and
makes some parts, like figuring out whether a source has an MIDX,
quite awkward.
- Multi-pack indices are an implementation detail of how efficient
access for packfiles work. As such, they are neither relevant in the
context of loose objects, nor in a potential future where we have
pluggable backends.
Refactor `prepare_multi_pack_index_one()` so that it works on a specific
source, which allows us to easily store a pointer to the multi-pack
index inside of it. For now, this pointer exists next to the existing
linked list we have in the object database. Users will be adjusted in
subsequent patches to instead use the per-source pointers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
midx.c | 19 +++++++++++--------
midx.h | 3 ++-
odb.h | 9 ++++++++-
packfile.c | 4 +++-
4 files changed, 24 insertions(+), 11 deletions(-)
diff --git a/midx.c b/midx.c
index 3c5bc821730..2f64c26058f 100644
--- a/midx.c
+++ b/midx.c
@@ -724,28 +724,29 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
return 0;
}
-int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local)
+int prepare_multi_pack_index_one(struct odb_source *source, int local)
{
+ struct repository *r = source->odb->repo;
struct multi_pack_index *m;
- struct multi_pack_index *m_search;
prepare_repo_settings(r);
if (!r->settings.core_multi_pack_index)
return 0;
- for (m_search = r->objects->multi_pack_index; m_search; m_search = m_search->next)
- if (!strcmp(object_dir, m_search->object_dir))
- return 1;
-
- m = load_multi_pack_index(r, object_dir, local);
+ if (source->midx)
+ return 1;
+ m = load_multi_pack_index(r, source->path, local);
if (m) {
struct multi_pack_index *mp = r->objects->multi_pack_index;
if (mp) {
m->next = mp->next;
mp->next = m;
- } else
+ } else {
r->objects->multi_pack_index = m;
+ }
+ source->midx = m;
+
return 1;
}
@@ -837,6 +838,8 @@ void clear_midx_file(struct repository *r)
if (r->objects && r->objects->multi_pack_index) {
close_midx(r->objects->multi_pack_index);
r->objects->multi_pack_index = NULL;
+ for (struct odb_source *source = r->objects->sources; source; source = source->next)
+ source->midx = NULL;
}
if (remove_path(midx.buf))
diff --git a/midx.h b/midx.h
index 9d1374cbd58..639a6f50e45 100644
--- a/midx.h
+++ b/midx.h
@@ -8,6 +8,7 @@ struct pack_entry;
struct repository;
struct bitmapped_pack;
struct git_hash_algo;
+struct odb_source;
#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
#define MIDX_VERSION 1
@@ -123,7 +124,7 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
int midx_contains_pack(struct multi_pack_index *m,
const char *idx_or_pack_name);
int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id);
-int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
+int prepare_multi_pack_index_one(struct odb_source *source, int local);
/*
* Variant of write_midx_file which writes a MIDX containing only the packs
diff --git a/odb.h b/odb.h
index e922f256802..f09dba1fe1d 100644
--- a/odb.h
+++ b/odb.h
@@ -13,6 +13,7 @@ struct oidmap;
struct oidtree;
struct strbuf;
struct repository;
+struct multi_pack_index;
/*
* Compute the exact path an alternate is at and returns it. In case of
@@ -55,6 +56,13 @@ struct odb_source {
/* Map between object IDs for loose objects. */
struct loose_object_map *loose_map;
+ /*
+ * private data
+ *
+ * should only be accessed directly by packfile.c and midx.c
+ */
+ struct multi_pack_index *midx;
+
/*
* This is a temporary object store created by the tmp_objdir
* facility. Disable ref updates since the objects in the store
@@ -75,7 +83,6 @@ struct odb_source {
};
struct packed_git;
-struct multi_pack_index;
struct cached_object_entry;
/*
diff --git a/packfile.c b/packfile.c
index af9ccfdba62..8bdd85fc7e7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -372,6 +372,8 @@ void close_object_store(struct object_database *o)
if (o->multi_pack_index) {
close_midx(o->multi_pack_index);
o->multi_pack_index = NULL;
+ for (struct odb_source *source = o->sources; source; source = source->next)
+ source->midx = NULL;
}
close_commit_graph(o);
@@ -1037,7 +1039,7 @@ static void prepare_packed_git(struct repository *r)
odb_prepare_alternates(r->objects);
for (source = r->objects->sources; source; source = source->next) {
int local = (source == r->objects->sources);
- prepare_multi_pack_index_one(r, source->path, local);
+ prepare_multi_pack_index_one(source, local);
prepare_packed_git_one(r, source->path, local);
}
rearrange_packed_git(r);
--
2.50.1.404.ge9779f6434.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* [PATCH v2 2/7] packfile: refactor `prepare_packed_git_one()` to work on sources
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 1/7] midx: start tracking per object database source Patrick Steinhardt
@ 2025-07-15 11:29 ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 3/7] midx: stop using linked list when closing MIDX Patrick Steinhardt
` (6 subsequent siblings)
8 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 11:29 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Justin Tobler, Junio C Hamano
In the preceding commit we refactored how we load multi-pack indices to
take a corresponding "source" as input. As part of this refactoring we
started to store a pointer to the MIDX in `struct odb_source` itself.
Refactor loading of packfiles in the same way: instead of passing in the
object directory, we now pass in the source from which we want to load
packfiles. This allows us to simplify the code because we don't have to
search for a corresponding MIDX anymore, but we can instead directly use
the MIDX that we have already prepared beforehand.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/packfile.c b/packfile.c
index 8bdd85fc7e7..0b3142973b6 100644
--- a/packfile.c
+++ b/packfile.c
@@ -935,22 +935,17 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
report_garbage(PACKDIR_FILE_GARBAGE, full_name);
}
-static void prepare_packed_git_one(struct repository *r, char *objdir, int local)
+static void prepare_packed_git_one(struct odb_source *source, int local)
{
- struct prepare_pack_data data;
struct string_list garbage = STRING_LIST_INIT_DUP;
+ struct prepare_pack_data data = {
+ .m = source->midx,
+ .r = source->odb->repo,
+ .garbage = &garbage,
+ .local = local,
+ };
- data.m = r->objects->multi_pack_index;
-
- /* look for the multi-pack-index for this object directory */
- while (data.m && strcmp(data.m->object_dir, objdir))
- data.m = data.m->next;
-
- data.r = r;
- data.garbage = &garbage;
- data.local = local;
-
- for_each_file_in_pack_dir(objdir, prepare_pack, &data);
+ for_each_file_in_pack_dir(source->path, prepare_pack, &data);
report_pack_garbage(data.garbage);
string_list_clear(data.garbage, 0);
@@ -1040,7 +1035,7 @@ static void prepare_packed_git(struct repository *r)
for (source = r->objects->sources; source; source = source->next) {
int local = (source == r->objects->sources);
prepare_multi_pack_index_one(source, local);
- prepare_packed_git_one(r, source->path, local);
+ prepare_packed_git_one(source, local);
}
rearrange_packed_git(r);
--
2.50.1.404.ge9779f6434.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* [PATCH v2 3/7] midx: stop using linked list when closing MIDX
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 1/7] midx: start tracking per object database source Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 2/7] packfile: refactor `prepare_packed_git_one()` to work on sources Patrick Steinhardt
@ 2025-07-15 11:29 ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 4/7] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
` (5 subsequent siblings)
8 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 11:29 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Justin Tobler, Junio C Hamano
When calling `close_midx()` we not only close the multi-pack index for
one object source, but instead we iterate through the whole linked list
of MIDXs to close all of them. This linked list is about to go away in
favor of using the new per-source pointer to its respective MIDX.
Refactor the function to iterate through sources instead.
Note that after this patch, there's a couple of callsites left that
continue to use `close_midx()` without iterating through all sources.
These are all cases where we don't care about the MIDX from other
sources though, so it's fine to keep them as-is.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
midx.c | 13 ++++++++-----
packfile.c | 11 ++++++-----
2 files changed, 14 insertions(+), 10 deletions(-)
diff --git a/midx.c b/midx.c
index 2f64c26058f..472d6bf17ab 100644
--- a/midx.c
+++ b/midx.c
@@ -401,7 +401,6 @@ void close_midx(struct multi_pack_index *m)
if (!m)
return;
- close_midx(m->next);
close_midx(m->base_midx);
munmap((unsigned char *)m->data, m->data_len);
@@ -835,11 +834,15 @@ void clear_midx_file(struct repository *r)
get_midx_filename(r->hash_algo, &midx, r->objects->sources->path);
- if (r->objects && r->objects->multi_pack_index) {
- close_midx(r->objects->multi_pack_index);
- r->objects->multi_pack_index = NULL;
- for (struct odb_source *source = r->objects->sources; source; source = source->next)
+ if (r->objects) {
+ struct odb_source *source;
+
+ for (source = r->objects->sources; source; source = source->next) {
+ if (source->midx)
+ close_midx(source->midx);
source->midx = NULL;
+ }
+ r->objects->multi_pack_index = NULL;
}
if (remove_path(midx.buf))
diff --git a/packfile.c b/packfile.c
index 0b3142973b6..7b350f018ca 100644
--- a/packfile.c
+++ b/packfile.c
@@ -361,6 +361,7 @@ void close_pack(struct packed_git *p)
void close_object_store(struct object_database *o)
{
+ struct odb_source *source;
struct packed_git *p;
for (p = o->packed_git; p; p = p->next)
@@ -369,12 +370,12 @@ void close_object_store(struct object_database *o)
else
close_pack(p);
- if (o->multi_pack_index) {
- close_midx(o->multi_pack_index);
- o->multi_pack_index = NULL;
- for (struct odb_source *source = o->sources; source; source = source->next)
- source->midx = NULL;
+ for (source = o->sources; source; source = source->next) {
+ if (source->midx)
+ close_midx(source->midx);
+ source->midx = NULL;
}
+ o->multi_pack_index = NULL;
close_commit_graph(o);
}
--
2.50.1.404.ge9779f6434.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* [PATCH v2 4/7] packfile: refactor `get_multi_pack_index()` to work on sources
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
` (2 preceding siblings ...)
2025-07-15 11:29 ` [PATCH v2 3/7] midx: stop using linked list when closing MIDX Patrick Steinhardt
@ 2025-07-15 11:29 ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 5/7] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
` (4 subsequent siblings)
8 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 11:29 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Justin Tobler, Junio C Hamano
The function `get_multi_pack_index()` loads multi-pack indices via
`prepare_packed_git()` and then returns the linked list of multi-pack
indices that is stored in `struct object_database`. That list is in the
process of being removed though in favor of storing the MIDX as part of
the object database source it belongs to.
Refactor `get_multi_pack_index()` so that it returns the multi-pack
index for a single object source. Callers are now expected to call this
function for each source they are interested in. This requires them to
iterate through alternates, so we have to prepare alternate object
sources before doing so.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
builtin/pack-objects.c | 10 +++++++---
builtin/repack.c | 10 +++++-----
midx-write.c | 22 ++--------------------
object-name.c | 22 +++++++++++++++-------
pack-bitmap.c | 21 +++++++++++++++------
packfile.c | 31 ++++++++++++-------------------
packfile.h | 3 +--
7 files changed, 57 insertions(+), 62 deletions(-)
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 067b9e322a9..3dd84495b86 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1706,8 +1706,8 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
uint32_t found_mtime)
{
int want;
+ struct odb_source *source;
struct list_head *pos;
- struct multi_pack_index *m;
if (!exclude && local && has_loose_object_nonlocal(oid))
return 0;
@@ -1727,9 +1727,13 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
*found_offset = 0;
}
- for (m = get_multi_pack_index(the_repository); m; m = m->next) {
+ odb_prepare_alternates(the_repository->objects);
+
+ for (source = the_repository->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
struct pack_entry e;
- if (fill_midx_entry(the_repository, oid, &e, m)) {
+
+ if (m && fill_midx_entry(the_repository, oid, &e, m)) {
want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset, found_mtime);
if (want != -1)
return want;
diff --git a/builtin/repack.c b/builtin/repack.c
index 5e89d96df13..d63e1a9fec2 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -223,9 +223,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
static void remove_redundant_pack(const char *dir_name, const char *base_name)
{
struct strbuf buf = STRBUF_INIT;
- struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
+ struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources);
strbuf_addf(&buf, "%s.pack", base_name);
- if (m && midx_contains_pack(m, buf.buf))
+ if (m && m->local && midx_contains_pack(m, buf.buf))
clear_midx_file(the_repository);
strbuf_insertf(&buf, 0, "%s/", dir_name);
unlink_pack_path(buf.buf, 1);
@@ -1531,7 +1531,7 @@ int cmd_repack(int argc,
* midx_has_unknown_packs() will make the decision for
* us.
*/
- if (!get_local_multi_pack_index(the_repository))
+ if (!get_multi_pack_index(the_repository->objects->sources))
midx_must_contain_cruft = 1;
}
@@ -1614,9 +1614,9 @@ int cmd_repack(int argc,
string_list_sort(&names);
- if (get_local_multi_pack_index(the_repository)) {
+ if (get_multi_pack_index(the_repository->objects->sources)) {
struct multi_pack_index *m =
- get_local_multi_pack_index(the_repository);
+ get_multi_pack_index(the_repository->objects->sources);
ALLOC_ARRAY(midx_pack_names,
m->num_packs + m->num_packs_in_base);
diff --git a/midx-write.c b/midx-write.c
index f2cfb85476e..c1ae62d3549 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
const char *object_dir)
{
- struct multi_pack_index *result = NULL;
- struct multi_pack_index *cur;
- char *obj_dir_real = real_pathdup(object_dir, 1);
- struct strbuf cur_path_real = STRBUF_INIT;
-
- /* Ensure the given object_dir is local, or a known alternate. */
- odb_find_source(r->objects, obj_dir_real);
-
- for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
- strbuf_realpath(&cur_path_real, cur->object_dir, 1);
- if (!strcmp(obj_dir_real, cur_path_real.buf)) {
- result = cur;
- goto cleanup;
- }
- }
-
-cleanup:
- free(obj_dir_real);
- strbuf_release(&cur_path_real);
- return result;
+ struct odb_source *source = odb_find_source(r->objects, object_dir);
+ return get_multi_pack_index(source);
}
static int fill_packs_from_midx(struct write_midx_context *ctx,
diff --git a/object-name.c b/object-name.c
index ddafe7f9b13..27138b55b47 100644
--- a/object-name.c
+++ b/object-name.c
@@ -198,16 +198,20 @@ static void unique_in_pack(struct packed_git *p,
static void find_short_packed_object(struct disambiguate_state *ds)
{
- struct multi_pack_index *m;
+ struct odb_source *source;
struct packed_git *p;
/* Skip, unless oids from the storage hash algorithm are wanted */
if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
return;
- for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
- m = m->next)
- unique_in_midx(m, ds);
+ odb_prepare_alternates(ds->repo->objects);
+ for (source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ if (m)
+ unique_in_midx(m, ds);
+ }
+
for (p = get_packed_git(ds->repo); p && !ds->ambiguous;
p = p->next)
unique_in_pack(p, ds);
@@ -792,11 +796,15 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
static void find_abbrev_len_packed(struct min_abbrev_data *mad)
{
- struct multi_pack_index *m;
struct packed_git *p;
- for (m = get_multi_pack_index(mad->repo); m; m = m->next)
- find_abbrev_len_for_midx(m, mad);
+ odb_prepare_alternates(mad->repo->objects);
+ for (struct odb_source *source = mad->repo->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ if (m)
+ find_abbrev_len_for_midx(m, mad);
+ }
+
for (p = get_packed_git(mad->repo); p; p = p->next)
find_abbrev_len_for_pack(p, mad);
}
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 0a4af199c05..64278e2acf7 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -691,13 +691,15 @@ static int open_pack_bitmap(struct repository *r,
static int open_midx_bitmap(struct repository *r,
struct bitmap_index *bitmap_git)
{
+ struct odb_source *source;
int ret = -1;
- struct multi_pack_index *midx;
assert(!bitmap_git->map);
- for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
- if (!open_midx_bitmap_1(bitmap_git, midx))
+ odb_prepare_alternates(r->objects);
+ for (source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *midx = get_multi_pack_index(source);
+ if (midx && !open_midx_bitmap_1(bitmap_git, midx))
ret = 0;
}
return ret;
@@ -3305,11 +3307,18 @@ static int verify_bitmap_file(const struct git_hash_algo *algop,
int verify_bitmap_files(struct repository *r)
{
+ struct odb_source *source;
int res = 0;
- for (struct multi_pack_index *m = get_multi_pack_index(r);
- m; m = m->next) {
- char *midx_bitmap_name = midx_bitmap_filename(m);
+ odb_prepare_alternates(r->objects);
+ for (source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ char *midx_bitmap_name;
+
+ if (!m)
+ continue;
+
+ midx_bitmap_name = midx_bitmap_filename(m);
res |= verify_bitmap_file(r->hash_algo, midx_bitmap_name);
free(midx_bitmap_name);
}
diff --git a/packfile.c b/packfile.c
index 7b350f018ca..d0f38a02035 100644
--- a/packfile.c
+++ b/packfile.c
@@ -963,14 +963,18 @@ static void prepare_packed_git(struct repository *r);
unsigned long repo_approximate_object_count(struct repository *r)
{
if (!r->objects->approximate_object_count_valid) {
- unsigned long count;
- struct multi_pack_index *m;
+ struct odb_source *source;
+ unsigned long count = 0;
struct packed_git *p;
prepare_packed_git(r);
- count = 0;
- for (m = get_multi_pack_index(r); m; m = m->next)
- count += m->num_objects;
+
+ for (source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = get_multi_pack_index(source);
+ if (m)
+ count += m->num_objects;
+ }
+
for (p = r->objects->packed_git; p; p = p->next) {
if (open_pack_index(p))
continue;
@@ -1074,21 +1078,10 @@ struct packed_git *get_packed_git(struct repository *r)
return r->objects->packed_git;
}
-struct multi_pack_index *get_multi_pack_index(struct repository *r)
-{
- prepare_packed_git(r);
- return r->objects->multi_pack_index;
-}
-
-struct multi_pack_index *get_local_multi_pack_index(struct repository *r)
+struct multi_pack_index *get_multi_pack_index(struct odb_source *source)
{
- struct multi_pack_index *m = get_multi_pack_index(r);
-
- /* no need to iterate; we always put the local one first (if any) */
- if (m && m->local)
- return m;
-
- return NULL;
+ prepare_packed_git(source->odb->repo);
+ return source->midx;
}
struct packed_git *get_all_packs(struct repository *r)
diff --git a/packfile.h b/packfile.h
index 53c3b7d3b43..f16753f2a9b 100644
--- a/packfile.h
+++ b/packfile.h
@@ -147,8 +147,7 @@ void install_packed_git(struct repository *r, struct packed_git *pack);
struct packed_git *get_packed_git(struct repository *r);
struct list_head *get_packed_git_mru(struct repository *r);
-struct multi_pack_index *get_multi_pack_index(struct repository *r);
-struct multi_pack_index *get_local_multi_pack_index(struct repository *r);
+struct multi_pack_index *get_multi_pack_index(struct odb_source *source);
struct packed_git *get_all_packs(struct repository *r);
/*
--
2.50.1.404.ge9779f6434.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* [PATCH v2 5/7] packfile: stop using linked MIDX list in `find_pack_entry()`
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
` (3 preceding siblings ...)
2025-07-15 11:29 ` [PATCH v2 4/7] packfile: refactor `get_multi_pack_index()` to work on sources Patrick Steinhardt
@ 2025-07-15 11:29 ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 6/7] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
` (3 subsequent siblings)
8 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 11:29 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Justin Tobler, Junio C Hamano
Refactor `find_pack_entry()` so that we stop using the linked list of
multi-pack indices. Note that there is no need to explicitly prepare
alternates, and neither do we have to use `get_multi_pack_index()`,
because `prepare_packed_git()` already takes care of populating all data
structures for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/packfile.c b/packfile.c
index d0f38a02035..2d19c53ea96 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2074,16 +2074,15 @@ static int fill_pack_entry(const struct object_id *oid,
int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e)
{
struct list_head *pos;
- struct multi_pack_index *m;
prepare_packed_git(r);
- if (!r->objects->packed_git && !r->objects->multi_pack_index)
- return 0;
- for (m = r->objects->multi_pack_index; m; m = m->next) {
- if (fill_midx_entry(r, oid, e, m))
+ for (struct odb_source *source = r->objects->sources; source; source = source->next)
+ if (source->midx && fill_midx_entry(r, oid, e, source->midx))
return 1;
- }
+
+ if (!r->objects->packed_git)
+ return 0;
list_for_each(pos, &r->objects->packed_git_mru) {
struct packed_git *p = list_entry(pos, struct packed_git, mru);
--
2.50.1.404.ge9779f6434.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* [PATCH v2 6/7] packfile: stop using linked MIDX list in `get_all_packs()`
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
` (4 preceding siblings ...)
2025-07-15 11:29 ` [PATCH v2 5/7] packfile: stop using linked MIDX list in `find_pack_entry()` Patrick Steinhardt
@ 2025-07-15 11:29 ` Patrick Steinhardt
2025-07-15 11:29 ` [PATCH v2 7/7] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
` (2 subsequent siblings)
8 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 11:29 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Justin Tobler, Junio C Hamano
Refactor `get_all_packs()` so that we stop using the linked list of
multi-pack indices. Note that there is no need to explicitly prepare
alternates, and neither do we have to use `get_multi_pack_index()`,
because `prepare_packed_git()` already takes care of populating all data
structures for us.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
packfile.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/packfile.c b/packfile.c
index 2d19c53ea96..ff33692f4b5 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1086,12 +1086,13 @@ struct multi_pack_index *get_multi_pack_index(struct odb_source *source)
struct packed_git *get_all_packs(struct repository *r)
{
- struct multi_pack_index *m;
-
prepare_packed_git(r);
- for (m = r->objects->multi_pack_index; m; m = m->next) {
- uint32_t i;
- for (i = 0; i < m->num_packs + m->num_packs_in_base; i++)
+
+ for (struct odb_source *source = r->objects->sources; source; source = source->next) {
+ struct multi_pack_index *m = source->midx;
+ if (!m)
+ continue;
+ for (uint32_t i = 0; i < m->num_packs + m->num_packs_in_base; i++)
prepare_midx_pack(r, m, i);
}
--
2.50.1.404.ge9779f6434.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* [PATCH v2 7/7] midx: remove now-unused linked list of multi-pack indices
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
` (5 preceding siblings ...)
2025-07-15 11:29 ` [PATCH v2 6/7] packfile: stop using linked MIDX list in `get_all_packs()` Patrick Steinhardt
@ 2025-07-15 11:29 ` Patrick Steinhardt
2025-07-15 21:59 ` [PATCH v2 0/7] odb: track multi-pack-indices via their object sources Justin Tobler
2025-07-23 21:22 ` Junio C Hamano
8 siblings, 0 replies; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-15 11:29 UTC (permalink / raw)
To: git; +Cc: Taylor Blau, Justin Tobler, Junio C Hamano
In the preceding commits we have migrated all users of the linked list
of multi-pack indices to instead use those stored in the object database
sources. Remove those now-unused pointers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
midx.c | 18 ++----------------
midx.h | 2 --
odb.h | 7 -------
packfile.c | 1 -
4 files changed, 2 insertions(+), 26 deletions(-)
diff --git a/midx.c b/midx.c
index 472d6bf17ab..7d407682e60 100644
--- a/midx.c
+++ b/midx.c
@@ -726,7 +726,6 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
int prepare_multi_pack_index_one(struct odb_source *source, int local)
{
struct repository *r = source->odb->repo;
- struct multi_pack_index *m;
prepare_repo_settings(r);
if (!r->settings.core_multi_pack_index)
@@ -735,21 +734,9 @@ int prepare_multi_pack_index_one(struct odb_source *source, int local)
if (source->midx)
return 1;
- m = load_multi_pack_index(r, source->path, local);
- if (m) {
- struct multi_pack_index *mp = r->objects->multi_pack_index;
- if (mp) {
- m->next = mp->next;
- mp->next = m;
- } else {
- r->objects->multi_pack_index = m;
- }
- source->midx = m;
+ source->midx = load_multi_pack_index(r, source->path, local);
- return 1;
- }
-
- return 0;
+ return !!source->midx;
}
int midx_checksum_valid(struct multi_pack_index *m)
@@ -842,7 +829,6 @@ void clear_midx_file(struct repository *r)
close_midx(source->midx);
source->midx = NULL;
}
- r->objects->multi_pack_index = NULL;
}
if (remove_path(midx.buf))
diff --git a/midx.h b/midx.h
index 639a6f50e45..076382de8ac 100644
--- a/midx.h
+++ b/midx.h
@@ -35,8 +35,6 @@ struct odb_source;
"GIT_TEST_MULTI_PACK_INDEX_WRITE_INCREMENTAL"
struct multi_pack_index {
- struct multi_pack_index *next;
-
const unsigned char *data;
size_t data_len;
diff --git a/odb.h b/odb.h
index f09dba1fe1d..09177bf430d 100644
--- a/odb.h
+++ b/odb.h
@@ -123,13 +123,6 @@ struct object_database {
struct commit_graph *commit_graph;
unsigned commit_graph_attempted : 1; /* if loading has been attempted */
- /*
- * private data
- *
- * should only be accessed directly by packfile.c and midx.c
- */
- struct multi_pack_index *multi_pack_index;
-
/*
* private data
*
diff --git a/packfile.c b/packfile.c
index ff33692f4b5..5d73932f50c 100644
--- a/packfile.c
+++ b/packfile.c
@@ -375,7 +375,6 @@ void close_object_store(struct object_database *o)
close_midx(source->midx);
source->midx = NULL;
}
- o->multi_pack_index = NULL;
close_commit_graph(o);
}
--
2.50.1.404.ge9779f6434.dirty
^ permalink raw reply related [flat|nested] 43+ messages in thread* Re: [PATCH v2 0/7] odb: track multi-pack-indices via their object sources
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
` (6 preceding siblings ...)
2025-07-15 11:29 ` [PATCH v2 7/7] midx: remove now-unused linked list of multi-pack indices Patrick Steinhardt
@ 2025-07-15 21:59 ` Justin Tobler
2025-07-23 21:22 ` Junio C Hamano
8 siblings, 0 replies; 43+ messages in thread
From: Justin Tobler @ 2025-07-15 21:59 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Taylor Blau, Junio C Hamano
On 25/07/15 01:29PM, Patrick Steinhardt wrote:
> Hi,
>
> multi-pack-indices are tracked via `struct multi_pack_index`. This data
> structure is stored inside `struct object_database`, which is the global
> database that spans across all of the object sources
>
> This layout causes two problems:
>
> - Multi-pack indices aren't global to an object database, but instead
> there can be one multi-pack index per object source. This creates a
> mismatch between the on-disk layout and how things are organized in
> the object database subsystems and makes some parts, like figuring
> out whether an object source has an MIDX, quite awkward.
>
> - Multi-pack indices are an implementation detail of how efficient
> access for packfiles work. As such, they are neither relevant in the
> context of loose objects, nor in a potential future where we have
> pluggable backends.
>
> This patch series thus refactors the codebase to stop tracking MIDX's
> globally. Instead, they are being pushed down one level so that every
> `struct odb_source` has an optional MIDX itself. This simplifies some of
> our code and will make it easier in a future iteration to move the data
> into a packfile-specific object source backend.
>
> Changes in v2:
> - Changed the base of this series. It is now built on top of
> a30f80fde92 (The eighth batch, 2025-07-08) with "ps/object-store" at
> 841a03b4046 (odb: rename `read_object_with_reference()`, 2025-07-01)
> and "tb/midx-avoid-cruft-packs" at 5ee86c273bf (repack: exclude
> cruft pack(s) from the MIDX where possible, 2025-06-23) merged into
> it.
> - Re-explain the split between object databases and object sources
> to help readers out a bit, given that this is a rather recent
> change.
> - Rename `struct odb_source::multi_pack_index` to `struct
> odb_source::midx`.
> - Fix some overly long lines when looping through the individual
> sources.
> - Drop the patch that guards re-loading MIDXs, as we already have the
> guard via `packed_git_initialized`.
> - Remove some while-at-it changes to make the diffs easier to read.
> - Link to v1: https://lore.kernel.org/r/20250709-b4-pks-midx-via-odb-alternate-v1-0-f31150d21331@pks.im
Thanks Patrick! From the range-diff, this version looks good to me :)
-Justin
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH v2 0/7] odb: track multi-pack-indices via their object sources
2025-07-15 11:29 ` [PATCH v2 0/7] " Patrick Steinhardt
` (7 preceding siblings ...)
2025-07-15 21:59 ` [PATCH v2 0/7] odb: track multi-pack-indices via their object sources Justin Tobler
@ 2025-07-23 21:22 ` Junio C Hamano
2025-07-24 8:00 ` Patrick Steinhardt
8 siblings, 1 reply; 43+ messages in thread
From: Junio C Hamano @ 2025-07-23 21:22 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git, Taylor Blau, Justin Tobler
Patrick Steinhardt <ps@pks.im> writes:
> Changes in v2:
> - Changed the base of this series. It is now built on top of
> a30f80fde92 (The eighth batch, 2025-07-08) with "ps/object-store" at
> 841a03b4046 (odb: rename `read_object_with_reference()`, 2025-07-01)
> and "tb/midx-avoid-cruft-packs" at 5ee86c273bf (repack: exclude
> cruft pack(s) from the MIDX where possible, 2025-06-23) merged into
> it.
> - Re-explain the split between object databases and object sources
> to help readers out a bit, given that this is a rather recent
> change.
> - Rename `struct odb_source::multi_pack_index` to `struct
> odb_source::midx`.
> - Fix some overly long lines when looping through the individual
> sources.
> - Drop the patch that guards re-loading MIDXs, as we already have the
> guard via `packed_git_initialized`.
> - Remove some while-at-it changes to make the diffs easier to read.
> - Link to v1: https://lore.kernel.org/r/20250709-b4-pks-midx-via-odb-alternate-v1-0-f31150d21331@pks.im
Shall we mark the topic for 'next' now?
We haven't seen any comments on this iteration.
Thanks.
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: [PATCH v2 0/7] odb: track multi-pack-indices via their object sources
2025-07-23 21:22 ` Junio C Hamano
@ 2025-07-24 8:00 ` Patrick Steinhardt
2025-08-12 21:58 ` Taylor Blau
0 siblings, 1 reply; 43+ messages in thread
From: Patrick Steinhardt @ 2025-07-24 8:00 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Taylor Blau, Justin Tobler
On Wed, Jul 23, 2025 at 02:22:08PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > Changes in v2:
> > - Changed the base of this series. It is now built on top of
> > a30f80fde92 (The eighth batch, 2025-07-08) with "ps/object-store" at
> > 841a03b4046 (odb: rename `read_object_with_reference()`, 2025-07-01)
> > and "tb/midx-avoid-cruft-packs" at 5ee86c273bf (repack: exclude
> > cruft pack(s) from the MIDX where possible, 2025-06-23) merged into
> > it.
> > - Re-explain the split between object databases and object sources
> > to help readers out a bit, given that this is a rather recent
> > change.
> > - Rename `struct odb_source::multi_pack_index` to `struct
> > odb_source::midx`.
> > - Fix some overly long lines when looping through the individual
> > sources.
> > - Drop the patch that guards re-loading MIDXs, as we already have the
> > guard via `packed_git_initialized`.
> > - Remove some while-at-it changes to make the diffs easier to read.
> > - Link to v1: https://lore.kernel.org/r/20250709-b4-pks-midx-via-odb-alternate-v1-0-f31150d21331@pks.im
>
> Shall we mark the topic for 'next' now?
>
> We haven't seen any comments on this iteration.
Almost all of the comments on the previous version were about style, so
nothing significant has changed in this version except for a couple of
renames and style fixes. Which means that I'm fine with the comments I
got for v1, but I wouldn't mind waiting two or three more days until
this gets merged down.
Patrick
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 0/7] odb: track multi-pack-indices via their object sources
2025-07-24 8:00 ` Patrick Steinhardt
@ 2025-08-12 21:58 ` Taylor Blau
0 siblings, 0 replies; 43+ messages in thread
From: Taylor Blau @ 2025-08-12 21:58 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: Junio C Hamano, git, Justin Tobler, Derrick Stolee
On Thu, Jul 24, 2025 at 10:00:02AM +0200, Patrick Steinhardt wrote:
> On Wed, Jul 23, 2025 at 02:22:08PM -0700, Junio C Hamano wrote:
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> > > Changes in v2:
> > > - Changed the base of this series. It is now built on top of
> > > a30f80fde92 (The eighth batch, 2025-07-08) with "ps/object-store" at
> > > 841a03b4046 (odb: rename `read_object_with_reference()`, 2025-07-01)
> > > and "tb/midx-avoid-cruft-packs" at 5ee86c273bf (repack: exclude
> > > cruft pack(s) from the MIDX where possible, 2025-06-23) merged into
> > > it.
> > > - Re-explain the split between object databases and object sources
> > > to help readers out a bit, given that this is a rather recent
> > > change.
> > > - Rename `struct odb_source::multi_pack_index` to `struct
> > > odb_source::midx`.
> > > - Fix some overly long lines when looping through the individual
> > > sources.
> > > - Drop the patch that guards re-loading MIDXs, as we already have the
> > > guard via `packed_git_initialized`.
> > > - Remove some while-at-it changes to make the diffs easier to read.
> > > - Link to v1: https://lore.kernel.org/r/20250709-b4-pks-midx-via-odb-alternate-v1-0-f31150d21331@pks.im
> >
> > Shall we mark the topic for 'next' now?
> >
> > We haven't seen any comments on this iteration.
>
> Almost all of the comments on the previous version were about style, so
> nothing significant has changed in this version except for a couple of
> renames and style fixes. Which means that I'm fine with the comments I
> got for v1, but I wouldn't mind waiting two or three more days until
> this gets merged down.
Sorry for dropping this off of my review queue -- I read the range-diff
and the new round looks good to me. I would, however, like to hear from
Stolee (CC'd) on some of the `--object-dir` behavior touched by this
series before merging.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 43+ messages in thread