* [PATCH 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
@ 2024-05-23 16:38 ` Taylor Blau
2024-05-29 7:47 ` Patrick Steinhardt
2024-05-23 16:38 ` [PATCH 2/8] midx-write.c: reduce argument count for `get_sorted_entries()` Taylor Blau
` (7 subsequent siblings)
8 siblings, 1 reply; 34+ messages in thread
From: Taylor Blau @ 2024-05-23 16:38 UTC (permalink / raw)
To: git
When passing a preferred pack to the MIDX write machinery, we ensure
that the given preferred pack is non-empty since 5d3cd09a808 (midx:
reject empty `--preferred-pack`'s, 2021-08-31).
However packs are only loaded (via `write_midx_internal()`, though a
subsequent patch will refactor this code out to its own function) when
the `MIDX_WRITE_REV_INDEX` flag is set.
So if a caller runs:
$ git multi-pack-index write --preferred-pack=...
with both (a) an existing MIDX, and (b) specifies a pack from that MIDX
as the preferred one, without passing `--bitmap`, then the check added
in 5d3cd09a808 will result in a segfault.
Note that packs loaded from disk which don't appear in an existing MIDX
do not trigger this issue, as those packs are loaded unconditionally. We
conditionally load packs from a MIDX since we tolerate MIDXs whose
packs do not resolve (i.e., via the MIDX write after removing
unreferenced packs via 'git multi-pack-index expire').
In practice, this isn't possible to trigger when running `git
multi-pack-index write` from via `git repack`, as the latter always
passes `--stdin-packs`, which prevents us from loading an existing MIDX,
as it forces all packs to be read from disk.
But a future commit in this series will change that behavior to
unconditionally load an existing MIDX, even with `--stdin-packs`, making
this behavior trigger-able from 'repack' much more easily.
Prevent this from being an issue by removing the segfault altogether by
calling `prepare_midx_pack()` on packs loaded from an existing MIDX when
either the `MIDX_WRITE_REV_INDEX` flag is set *or* we specified a
`--preferred-pack`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 8 +++++++-
t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/midx-write.c b/midx-write.c
index 9d096d5a28..03e95ae821 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -930,11 +930,17 @@ static int write_midx_internal(const char *object_dir,
for (i = 0; i < ctx.m->num_packs; i++) {
ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
- if (flags & MIDX_WRITE_REV_INDEX) {
+ if (flags & MIDX_WRITE_REV_INDEX ||
+ preferred_pack_name) {
/*
* If generating a reverse index, need to have
* packed_git's loaded to compare their
* mtimes and object count.
+ *
+ * If a preferred pack is specified,
+ * need to have packed_git's loaded to
+ * ensure the chosen preferred pack has
+ * a non-zero object count.
*/
if (prepare_midx_pack(the_repository, ctx.m, i)) {
error(_("could not load pack"));
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index dd09134db0..10d2a6bf92 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -350,6 +350,29 @@ test_expect_success 'preferred packs must be non-empty' '
)
'
+test_expect_success 'preferred pack from existing MIDX without bitmaps' '
+ git init preferred-without-bitmaps &&
+ (
+ cd preferred-without-bitmaps &&
+
+ test_commit one &&
+ pack="$(git pack-objects --all $objdir/pack/pack </dev/null)" &&
+
+ git multi-pack-index write &&
+
+ # make another pack so that the subsequent MIDX write
+ # has something to do
+ test_commit two &&
+ git repack -d &&
+
+ # write a new MIDX without bitmaps reusing the singular
+ # pack from the existing MIDX as the preferred pack in
+ # the new MIDX
+ git multi-pack-index write --preferred-pack=pack-$pack.pack
+ )
+
+'
+
test_expect_success 'verify multi-pack-index success' '
git multi-pack-index verify --object-dir=$objdir
'
--
2.45.1.217.g9bb58e2bf5a.dirty
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps
2024-05-23 16:38 ` [PATCH 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps Taylor Blau
@ 2024-05-29 7:47 ` Patrick Steinhardt
2024-05-29 22:29 ` Taylor Blau
0 siblings, 1 reply; 34+ messages in thread
From: Patrick Steinhardt @ 2024-05-29 7:47 UTC (permalink / raw)
To: Taylor Blau; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 3719 bytes --]
On Thu, May 23, 2024 at 12:38:03PM -0400, Taylor Blau wrote:
> When passing a preferred pack to the MIDX write machinery, we ensure
> that the given preferred pack is non-empty since 5d3cd09a808 (midx:
> reject empty `--preferred-pack`'s, 2021-08-31).
>
> However packs are only loaded (via `write_midx_internal()`, though a
> subsequent patch will refactor this code out to its own function) when
> the `MIDX_WRITE_REV_INDEX` flag is set.
>
> So if a caller runs:
>
> $ git multi-pack-index write --preferred-pack=...
>
> with both (a) an existing MIDX, and (b) specifies a pack from that MIDX
> as the preferred one, without passing `--bitmap`, then the check added
> in 5d3cd09a808 will result in a segfault.
The check you're talking about is the following one, right?
if (ctx.preferred_pack_idx > -1) {
struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
if (!preferred->num_objects) {
error(_("cannot select preferred pack %s with no objects"),
preferred->pack_name);
result = 1;
goto cleanup;
}
}
And the segfault is because the index wasn't populated, and thus
`ctx.info[ctx.preferred_pack_idx].p == NULL`?
> Note that packs loaded from disk which don't appear in an existing MIDX
> do not trigger this issue, as those packs are loaded unconditionally. We
> conditionally load packs from a MIDX since we tolerate MIDXs whose
> packs do not resolve (i.e., via the MIDX write after removing
> unreferenced packs via 'git multi-pack-index expire').
>
> In practice, this isn't possible to trigger when running `git
> multi-pack-index write` from via `git repack`, as the latter always
s/from via/via/
> passes `--stdin-packs`, which prevents us from loading an existing MIDX,
> as it forces all packs to be read from disk.
>
> But a future commit in this series will change that behavior to
> unconditionally load an existing MIDX, even with `--stdin-packs`, making
> this behavior trigger-able from 'repack' much more easily.
>
> Prevent this from being an issue by removing the segfault altogether by
Removing segfaults is always good :)
> calling `prepare_midx_pack()` on packs loaded from an existing MIDX when
> either the `MIDX_WRITE_REV_INDEX` flag is set *or* we specified a
> `--preferred-pack`.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> midx-write.c | 8 +++++++-
> t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++
> 2 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/midx-write.c b/midx-write.c
> index 9d096d5a28..03e95ae821 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -930,11 +930,17 @@ static int write_midx_internal(const char *object_dir,
> for (i = 0; i < ctx.m->num_packs; i++) {
> ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
>
> - if (flags & MIDX_WRITE_REV_INDEX) {
> + if (flags & MIDX_WRITE_REV_INDEX ||
> + preferred_pack_name) {
> /*
> * If generating a reverse index, need to have
> * packed_git's loaded to compare their
> * mtimes and object count.
> + *
> + * If a preferred pack is specified,
> + * need to have packed_git's loaded to
> + * ensure the chosen preferred pack has
> + * a non-zero object count.
> */
> if (prepare_midx_pack(the_repository, ctx.m, i)) {
> error(_("could not load pack"));
We now end up loading all packs, but in practice it should be sufficient
to only load the preferred pack, right? Is there a particular reason why
we now load all packs?
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps
2024-05-29 7:47 ` Patrick Steinhardt
@ 2024-05-29 22:29 ` Taylor Blau
0 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:29 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Wed, May 29, 2024 at 09:47:52AM +0200, Patrick Steinhardt wrote:
> On Thu, May 23, 2024 at 12:38:03PM -0400, Taylor Blau wrote:
> > When passing a preferred pack to the MIDX write machinery, we ensure
> > that the given preferred pack is non-empty since 5d3cd09a808 (midx:
> > reject empty `--preferred-pack`'s, 2021-08-31).
> >
> > However packs are only loaded (via `write_midx_internal()`, though a
> > subsequent patch will refactor this code out to its own function) when
> > the `MIDX_WRITE_REV_INDEX` flag is set.
> >
> > So if a caller runs:
> >
> > $ git multi-pack-index write --preferred-pack=...
> >
> > with both (a) an existing MIDX, and (b) specifies a pack from that MIDX
> > as the preferred one, without passing `--bitmap`, then the check added
> > in 5d3cd09a808 will result in a segfault.
>
> The check you're talking about is the following one, right?
>
> if (ctx.preferred_pack_idx > -1) {
> struct packed_git *preferred = ctx.info[ctx.preferred_pack_idx].p;
> if (!preferred->num_objects) {
> error(_("cannot select preferred pack %s with no objects"),
> preferred->pack_name);
> result = 1;
> goto cleanup;
> }
> }
>
> And the segfault is because the index wasn't populated, and thus
> `ctx.info[ctx.preferred_pack_idx].p == NULL`?
Exactly.
> > Note that packs loaded from disk which don't appear in an existing MIDX
> > do not trigger this issue, as those packs are loaded unconditionally. We
> > conditionally load packs from a MIDX since we tolerate MIDXs whose
> > packs do not resolve (i.e., via the MIDX write after removing
> > unreferenced packs via 'git multi-pack-index expire').
> >
> > In practice, this isn't possible to trigger when running `git
> > multi-pack-index write` from via `git repack`, as the latter always
>
> s/from via/via/
Ah, oops, thanks.
> > passes `--stdin-packs`, which prevents us from loading an existing MIDX,
> > as it forces all packs to be read from disk.
> >
> > But a future commit in this series will change that behavior to
> > unconditionally load an existing MIDX, even with `--stdin-packs`, making
> > this behavior trigger-able from 'repack' much more easily.
> >
> > Prevent this from being an issue by removing the segfault altogether by
>
> Removing segfaults is always good :)
>
> > calling `prepare_midx_pack()` on packs loaded from an existing MIDX when
> > either the `MIDX_WRITE_REV_INDEX` flag is set *or* we specified a
> > `--preferred-pack`.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>
> > ---
> > midx-write.c | 8 +++++++-
> > t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++
> > 2 files changed, 30 insertions(+), 1 deletion(-)
> >
> > diff --git a/midx-write.c b/midx-write.c
> > index 9d096d5a28..03e95ae821 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -930,11 +930,17 @@ static int write_midx_internal(const char *object_dir,
> > for (i = 0; i < ctx.m->num_packs; i++) {
> > ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
> >
> > - if (flags & MIDX_WRITE_REV_INDEX) {
> > + if (flags & MIDX_WRITE_REV_INDEX ||
> > + preferred_pack_name) {
> > /*
> > * If generating a reverse index, need to have
> > * packed_git's loaded to compare their
> > * mtimes and object count.
> > + *
> > + * If a preferred pack is specified,
> > + * need to have packed_git's loaded to
> > + * ensure the chosen preferred pack has
> > + * a non-zero object count.
> > */
> > if (prepare_midx_pack(the_repository, ctx.m, i)) {
> > error(_("could not load pack"));
>
> We now end up loading all packs, but in practice it should be sufficient
> to only load the preferred pack, right? Is there a particular reason why
> we now load all packs?
I had originally considered that, but discarded the idea because it
seemed like it might be fragile to depend on those two points in the
code having the exact same notion of what the preferred pack is.
Since it's cheap enough to load all of these packs in unconditionally
anyways, that's the route that I ended up taking here.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 2/8] midx-write.c: reduce argument count for `get_sorted_entries()`
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
2024-05-23 16:38 ` [PATCH 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps Taylor Blau
@ 2024-05-23 16:38 ` Taylor Blau
2024-05-29 7:47 ` Patrick Steinhardt
2024-05-23 16:38 ` [PATCH 3/8] midx-write.c: pass `start_pack` to `get_sorted_entries()` Taylor Blau
` (6 subsequent siblings)
8 siblings, 1 reply; 34+ messages in thread
From: Taylor Blau @ 2024-05-23 16:38 UTC (permalink / raw)
To: git
The function `midx-write.c::get_sorted_entries()` is responsible for
constructing the array of OIDs from a given list of packs which will
comprise the MIDX being written.
The singular call-site for this function looks something like:
ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr,
&ctx.entries_nr,
ctx.preferred_pack_idx);
This function has five formal arguments, all of which are members of the
shared `struct write_midx_context` used to track various pieces of
information about the MIDX being written.
The function `get_sorted_entries()` dates back to fe1ed56f5e4 (midx:
sort and deduplicate objects from packfiles, 2018-07-12), which came
shortly after 396f257018a (multi-pack-index: read packfile list,
2018-07-12). The latter patch introduced the `pack_list` structure,
which was a precursor to the structure we now know as
`write_midx_context` (c.f. 577dc49696a (midx: rename pack_info to
write_midx_context, 2021-02-18)).
At the time, `get_sorted_entries()` likely could have used the pack_list
structure introduced earlier in 396f257018a, but understandably did not
since the structure only contained three fields (only two of which were
relevant to `get_sorted_entries()`) at the time.
Simplify the declaration of this function by taking a single pointer to
the whole `struct write_midx_context` instead of various members within
it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 41 ++++++++++++++++++-----------------------
1 file changed, 18 insertions(+), 23 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 03e95ae821..ad32e8953d 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -299,21 +299,17 @@ static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
* Copy only the de-duplicated entries (selected by most-recent modified time
* of a packfile containing the object).
*/
-static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
- struct pack_info *info,
- uint32_t nr_packs,
- size_t *nr_objects,
- int preferred_pack)
+static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx)
{
uint32_t cur_fanout, cur_pack, cur_object;
size_t alloc_objects, total_objects = 0;
struct midx_fanout fanout = { 0 };
struct pack_midx_entry *deduplicated_entries = NULL;
- uint32_t start_pack = m ? m->num_packs : 0;
+ uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
- for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++)
+ for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
total_objects = st_add(total_objects,
- info[cur_pack].p->num_objects);
+ ctx->info[cur_pack].p->num_objects);
/*
* As we de-duplicate by fanout value, we expect the fanout
@@ -324,25 +320,25 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
ALLOC_ARRAY(fanout.entries, fanout.alloc);
ALLOC_ARRAY(deduplicated_entries, alloc_objects);
- *nr_objects = 0;
+ ctx->entries_nr = 0;
for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
fanout.nr = 0;
- if (m)
- midx_fanout_add_midx_fanout(&fanout, m, cur_fanout,
- preferred_pack);
+ if (ctx->m)
+ midx_fanout_add_midx_fanout(&fanout, ctx->m, cur_fanout,
+ ctx->preferred_pack_idx);
- for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
- int preferred = cur_pack == preferred_pack;
+ for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) {
+ int preferred = cur_pack == ctx->preferred_pack_idx;
midx_fanout_add_pack_fanout(&fanout,
- info, cur_pack,
+ ctx->info, cur_pack,
preferred, cur_fanout);
}
- if (-1 < preferred_pack && preferred_pack < start_pack)
- midx_fanout_add_pack_fanout(&fanout, info,
- preferred_pack, 1,
+ if (-1 < ctx->preferred_pack_idx && ctx->preferred_pack_idx < start_pack)
+ midx_fanout_add_pack_fanout(&fanout, ctx->info,
+ ctx->preferred_pack_idx, 1,
cur_fanout);
midx_fanout_sort(&fanout);
@@ -356,12 +352,12 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
&fanout.entries[cur_object].oid))
continue;
- ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1),
+ ALLOC_GROW(deduplicated_entries, st_add(ctx->entries_nr, 1),
alloc_objects);
- memcpy(&deduplicated_entries[*nr_objects],
+ memcpy(&deduplicated_entries[ctx->entries_nr],
&fanout.entries[cur_object],
sizeof(struct pack_midx_entry));
- (*nr_objects)++;
+ ctx->entries_nr++;
}
}
@@ -1055,8 +1051,7 @@ static int write_midx_internal(const char *object_dir,
}
}
- ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
- ctx.preferred_pack_idx);
+ ctx.entries = get_sorted_entries(&ctx);
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
--
2.45.1.217.g9bb58e2bf5a.dirty
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 2/8] midx-write.c: reduce argument count for `get_sorted_entries()`
2024-05-23 16:38 ` [PATCH 2/8] midx-write.c: reduce argument count for `get_sorted_entries()` Taylor Blau
@ 2024-05-29 7:47 ` Patrick Steinhardt
2024-05-29 22:35 ` Taylor Blau
0 siblings, 1 reply; 34+ messages in thread
From: Patrick Steinhardt @ 2024-05-29 7:47 UTC (permalink / raw)
To: Taylor Blau; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 2023 bytes --]
On Thu, May 23, 2024 at 12:38:06PM -0400, Taylor Blau wrote:
> diff --git a/midx-write.c b/midx-write.c
> index 03e95ae821..ad32e8953d 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -299,21 +299,17 @@ static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
> * Copy only the de-duplicated entries (selected by most-recent modified time
> * of a packfile containing the object).
> */
> -static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
> - struct pack_info *info,
> - uint32_t nr_packs,
> - size_t *nr_objects,
> - int preferred_pack)
> +static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx)
> {
> uint32_t cur_fanout, cur_pack, cur_object;
> size_t alloc_objects, total_objects = 0;
> struct midx_fanout fanout = { 0 };
> struct pack_midx_entry *deduplicated_entries = NULL;
> - uint32_t start_pack = m ? m->num_packs : 0;
> + uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
>
> - for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++)
> + for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
> total_objects = st_add(total_objects,
> - info[cur_pack].p->num_objects);
> + ctx->info[cur_pack].p->num_objects);
>
> /*
> * As we de-duplicate by fanout value, we expect the fanout
> @@ -324,25 +320,25 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
>
> ALLOC_ARRAY(fanout.entries, fanout.alloc);
> ALLOC_ARRAY(deduplicated_entries, alloc_objects);
> - *nr_objects = 0;
> + ctx->entries_nr = 0;
Nit: I think it's a bit surprising that a getter function would modify
the passed in structure. It's also a bit puzzling that we assign
`entries_nr` in here, but rely on the caller to set the corresponding
`entries` field. I think we should either have the caller assign both
fields, or we should rename the function and assign both of these fields
in the function.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 2/8] midx-write.c: reduce argument count for `get_sorted_entries()`
2024-05-29 7:47 ` Patrick Steinhardt
@ 2024-05-29 22:35 ` Taylor Blau
0 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:35 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Wed, May 29, 2024 at 09:47:57AM +0200, Patrick Steinhardt wrote:
> Nit: I think it's a bit surprising that a getter function would modify
> the passed in structure. It's also a bit puzzling that we assign
> `entries_nr` in here, but rely on the caller to set the corresponding
> `entries` field. I think we should either have the caller assign both
> fields, or we should rename the function and assign both of these fields
> in the function.
Thanks, I agree and should have changed the name of this function in the
existing round.
I renamed it to "compute_sorted_entries()" which takes only a single
"struct write_midx_context *", and assigns both "ctx->entries" and
"ctx->entries_nr".
Thanks,
Taylor
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 3/8] midx-write.c: pass `start_pack` to `get_sorted_entries()`
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
2024-05-23 16:38 ` [PATCH 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps Taylor Blau
2024-05-23 16:38 ` [PATCH 2/8] midx-write.c: reduce argument count for `get_sorted_entries()` Taylor Blau
@ 2024-05-23 16:38 ` Taylor Blau
2024-05-29 7:48 ` Patrick Steinhardt
2024-05-23 16:38 ` [PATCH 4/8] midx-write.c: extract `should_include_pack()` Taylor Blau
` (5 subsequent siblings)
8 siblings, 1 reply; 34+ messages in thread
From: Taylor Blau @ 2024-05-23 16:38 UTC (permalink / raw)
To: git
The function `get_sorted_entries()` is broadly responsible for
building an array of the objects to be written into a MIDX based on the
provided list of packs.
If we have loaded an existing MIDX, however, we may not use all of its
packs, despite loading them into the ctx->info array.
The existing implementation simply skips past the first
ctx->m->num_packs (if ctx->m is non-NULL, indicating that we loaded an
existing MIDX). Future changes (outside the scope of this patch series)
to the MIDX code will require us to skip *at most* that number[^1].
We could tag each pack with a bit that indicates the pack's contents
should be included in the MIDX. But we can just as easily determine the
number of packs to skip by passing in the number of packs we learned
about after processing an existing MIDX.
[^1]: Kind of. The real number will be bounded by the number of packs in
a MIDX layer, and the number of packs in its base layer(s), but that
concept hasn't been fully defined yet.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index ad32e8953d..cf7e391b6e 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -299,13 +299,13 @@ static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
* Copy only the de-duplicated entries (selected by most-recent modified time
* of a packfile containing the object).
*/
-static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx)
+static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx,
+ uint32_t start_pack)
{
uint32_t cur_fanout, cur_pack, cur_object;
size_t alloc_objects, total_objects = 0;
struct midx_fanout fanout = { 0 };
struct pack_midx_entry *deduplicated_entries = NULL;
- uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
total_objects = st_add(total_objects,
@@ -886,7 +886,7 @@ static int write_midx_internal(const char *object_dir,
{
struct strbuf midx_name = STRBUF_INIT;
unsigned char midx_hash[GIT_MAX_RAWSZ];
- uint32_t i;
+ uint32_t i, start_pack;
struct hashfile *f = NULL;
struct lock_file lk;
struct write_midx_context ctx = { 0 };
@@ -954,6 +954,8 @@ static int write_midx_internal(const char *object_dir,
}
}
+ start_pack = ctx.nr;
+
ctx.pack_paths_checked = 0;
if (flags & MIDX_PROGRESS)
ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
@@ -1051,7 +1053,7 @@ static int write_midx_internal(const char *object_dir,
}
}
- ctx.entries = get_sorted_entries(&ctx);
+ ctx.entries = get_sorted_entries(&ctx, start_pack);
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
--
2.45.1.217.g9bb58e2bf5a.dirty
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 3/8] midx-write.c: pass `start_pack` to `get_sorted_entries()`
2024-05-23 16:38 ` [PATCH 3/8] midx-write.c: pass `start_pack` to `get_sorted_entries()` Taylor Blau
@ 2024-05-29 7:48 ` Patrick Steinhardt
2024-05-29 22:36 ` Taylor Blau
0 siblings, 1 reply; 34+ messages in thread
From: Patrick Steinhardt @ 2024-05-29 7:48 UTC (permalink / raw)
To: Taylor Blau; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 1164 bytes --]
On Thu, May 23, 2024 at 12:38:09PM -0400, Taylor Blau wrote:
> The function `get_sorted_entries()` is broadly responsible for
> building an array of the objects to be written into a MIDX based on the
> provided list of packs.
>
> If we have loaded an existing MIDX, however, we may not use all of its
> packs, despite loading them into the ctx->info array.
>
> The existing implementation simply skips past the first
> ctx->m->num_packs (if ctx->m is non-NULL, indicating that we loaded an
> existing MIDX). Future changes (outside the scope of this patch series)
> to the MIDX code will require us to skip *at most* that number[^1].
>
> We could tag each pack with a bit that indicates the pack's contents
> should be included in the MIDX. But we can just as easily determine the
> number of packs to skip by passing in the number of packs we learned
> about after processing an existing MIDX.
Will we always want to skip packs from the start of the array? Or may it
happen that we want to skip packs in the middle of it? It's a bit hard
to judge because there isn't a ton of context when exactly we'll want to
skip, and why.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 3/8] midx-write.c: pass `start_pack` to `get_sorted_entries()`
2024-05-29 7:48 ` Patrick Steinhardt
@ 2024-05-29 22:36 ` Taylor Blau
0 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:36 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Wed, May 29, 2024 at 09:48:03AM +0200, Patrick Steinhardt wrote:
> Will we always want to skip packs from the start of the array? Or may it
> happen that we want to skip packs in the middle of it? It's a bit hard
> to judge because there isn't a ton of context when exactly we'll want to
> skip, and why.
It's always from the start of the array, but I'll add some more context
into the patch message as to why this is.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 4/8] midx-write.c: extract `should_include_pack()`
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (2 preceding siblings ...)
2024-05-23 16:38 ` [PATCH 3/8] midx-write.c: pass `start_pack` to `get_sorted_entries()` Taylor Blau
@ 2024-05-23 16:38 ` Taylor Blau
2024-05-29 7:48 ` Patrick Steinhardt
2024-05-23 16:38 ` [PATCH 5/8] midx-write.c: extract `fill_packs_from_midx()` Taylor Blau
` (4 subsequent siblings)
8 siblings, 1 reply; 34+ messages in thread
From: Taylor Blau @ 2024-05-23 16:38 UTC (permalink / raw)
To: git
The add_pack_to_midx() callback used via for_each_file_in_pack_dir() is
used to add packs with .idx files to the MIDX being written.
Within this function, we have a pair of checks that discards packs
which:
- appear in an existing MIDX, if we successfully read an existing MIDX
from disk
- or, appear in the "to_include" list, if invoking the MIDX write
machinery with the `--stdin-packs` command-line argument.
In a future commit will want to call a slight variant of these checks
from the code that reuses all packs from an existing MIDX, as well as
the current location via add_pack_to_midx(). The latter will be
modified in subsequent commits to only reuse packs which appear in the
to_include list, if one was given.
Prepare for that step by extracting these checks as a subroutine that
may be called from both places.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 48 ++++++++++++++++++++++++++++--------------------
1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index cf7e391b6e..f593cf1593 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -100,6 +100,32 @@ struct write_midx_context {
struct string_list *to_include;
};
+static int should_include_pack(const struct write_midx_context *ctx,
+ const char *file_name)
+{
+ /*
+ * Note that at most one of ctx->m and ctx->to_include are set,
+ * so we are testing midx_contains_pack() and
+ * string_list_has_string() independently (guarded by the
+ * appropriate NULL checks).
+ *
+ * We could support passing to_include while reusing an existing
+ * MIDX, but don't currently since the reuse process drags
+ * forward all packs from an existing MIDX (without checking
+ * whether or not they appear in the to_include list).
+ *
+ * If we added support for that, these next two conditional
+ * should be performed independently (likely checking
+ * to_include before the existing MIDX).
+ */
+ if (ctx->m && midx_contains_pack(ctx->m, file_name))
+ return 0;
+ else if (ctx->to_include &&
+ !string_list_has_string(ctx->to_include, file_name))
+ return 0;
+ return 1;
+}
+
static void add_pack_to_midx(const char *full_path, size_t full_path_len,
const char *file_name, void *data)
{
@@ -108,29 +134,11 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
if (ends_with(file_name, ".idx")) {
display_progress(ctx->progress, ++ctx->pack_paths_checked);
- /*
- * Note that at most one of ctx->m and ctx->to_include are set,
- * so we are testing midx_contains_pack() and
- * string_list_has_string() independently (guarded by the
- * appropriate NULL checks).
- *
- * We could support passing to_include while reusing an existing
- * MIDX, but don't currently since the reuse process drags
- * forward all packs from an existing MIDX (without checking
- * whether or not they appear in the to_include list).
- *
- * If we added support for that, these next two conditional
- * should be performed independently (likely checking
- * to_include before the existing MIDX).
- */
- if (ctx->m && midx_contains_pack(ctx->m, file_name))
- return;
- else if (ctx->to_include &&
- !string_list_has_string(ctx->to_include, file_name))
+
+ if (!should_include_pack(ctx, file_name))
return;
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-
p = add_packed_git(full_path, full_path_len, 0);
if (!p) {
warning(_("failed to add packfile '%s'"),
--
2.45.1.217.g9bb58e2bf5a.dirty
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 4/8] midx-write.c: extract `should_include_pack()`
2024-05-23 16:38 ` [PATCH 4/8] midx-write.c: extract `should_include_pack()` Taylor Blau
@ 2024-05-29 7:48 ` Patrick Steinhardt
2024-05-29 22:40 ` Taylor Blau
0 siblings, 1 reply; 34+ messages in thread
From: Patrick Steinhardt @ 2024-05-29 7:48 UTC (permalink / raw)
To: Taylor Blau; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 650 bytes --]
On Thu, May 23, 2024 at 12:38:13PM -0400, Taylor Blau wrote:
> The add_pack_to_midx() callback used via for_each_file_in_pack_dir() is
> used to add packs with .idx files to the MIDX being written.
>
> Within this function, we have a pair of checks that discards packs
> which:
>
> - appear in an existing MIDX, if we successfully read an existing MIDX
> from disk
>
> - or, appear in the "to_include" list, if invoking the MIDX write
> machinery with the `--stdin-packs` command-line argument.
>
> In a future commit will want to call a slight variant of these checks
Either s/In a/A/ or s/commit/&, we/.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 4/8] midx-write.c: extract `should_include_pack()`
2024-05-29 7:48 ` Patrick Steinhardt
@ 2024-05-29 22:40 ` Taylor Blau
0 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:40 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Wed, May 29, 2024 at 09:48:08AM +0200, Patrick Steinhardt wrote:
> On Thu, May 23, 2024 at 12:38:13PM -0400, Taylor Blau wrote:
> > The add_pack_to_midx() callback used via for_each_file_in_pack_dir() is
> > used to add packs with .idx files to the MIDX being written.
> >
> > Within this function, we have a pair of checks that discards packs
> > which:
> >
> > - appear in an existing MIDX, if we successfully read an existing MIDX
> > from disk
> >
> > - or, appear in the "to_include" list, if invoking the MIDX write
> > machinery with the `--stdin-packs` command-line argument.
> >
> > In a future commit will want to call a slight variant of these checks
>
> Either s/In a/A/ or s/commit/&, we/.
Thanks for your careful review, it is much appreciated! I meant to write
the first one, and will correct it in the subsequent round :-).
Thanks,
Taylor
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 5/8] midx-write.c: extract `fill_packs_from_midx()`
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (3 preceding siblings ...)
2024-05-23 16:38 ` [PATCH 4/8] midx-write.c: extract `should_include_pack()` Taylor Blau
@ 2024-05-23 16:38 ` Taylor Blau
2024-05-23 16:38 ` [PATCH 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include` Taylor Blau
` (3 subsequent siblings)
8 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-23 16:38 UTC (permalink / raw)
To: git
When write_midx_internal() loads an existing MIDX, all packs are copied
forward into the new MIDX. Improve the readability of
write_midx_internal() by extracting this functionality out into a
separate function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 68 +++++++++++++++++++++++++++++-----------------------
1 file changed, 38 insertions(+), 30 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index f593cf1593..9712ac044f 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -885,6 +885,40 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
return result;
}
+static int fill_packs_from_midx(struct write_midx_context *ctx,
+ const char *preferred_pack_name, uint32_t flags)
+{
+ uint32_t i;
+
+ for (i = 0; i < ctx->m->num_packs; i++) {
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+
+ if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
+ /*
+ * If generating a reverse index, need to have
+ * packed_git's loaded to compare their
+ * mtimes and object count.
+ *
+ *
+ * If a preferred pack is specified, need to
+ * have packed_git's loaded to ensure the chosen
+ * preferred pack has a non-zero object count.
+ */
+ if (prepare_midx_pack(the_repository, ctx->m, i))
+ return error(_("could not load pack"));
+
+ if (open_pack_index(ctx->m->packs[i]))
+ die(_("could not open index for %s"),
+ ctx->m->packs[i]->pack_name);
+ }
+
+ fill_pack_info(&ctx->info[ctx->nr++], ctx->m->packs[i],
+ ctx->m->pack_names[i], i);
+ }
+
+ return 0;
+}
+
static int write_midx_internal(const char *object_dir,
struct string_list *packs_to_include,
struct string_list *packs_to_drop,
@@ -930,36 +964,10 @@ static int write_midx_internal(const char *object_dir,
ctx.info = NULL;
ALLOC_ARRAY(ctx.info, ctx.alloc);
- if (ctx.m) {
- for (i = 0; i < ctx.m->num_packs; i++) {
- ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
-
- if (flags & MIDX_WRITE_REV_INDEX ||
- preferred_pack_name) {
- /*
- * If generating a reverse index, need to have
- * packed_git's loaded to compare their
- * mtimes and object count.
- *
- * If a preferred pack is specified,
- * need to have packed_git's loaded to
- * ensure the chosen preferred pack has
- * a non-zero object count.
- */
- if (prepare_midx_pack(the_repository, ctx.m, i)) {
- error(_("could not load pack"));
- result = 1;
- goto cleanup;
- }
-
- if (open_pack_index(ctx.m->packs[i]))
- die(_("could not open index for %s"),
- ctx.m->packs[i]->pack_name);
- }
-
- fill_pack_info(&ctx.info[ctx.nr++], ctx.m->packs[i],
- ctx.m->pack_names[i], i);
- }
+ if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
+ flags) < 0) {
+ result = 1;
+ goto cleanup;
}
start_pack = ctx.nr;
--
2.45.1.217.g9bb58e2bf5a.dirty
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include`
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (4 preceding siblings ...)
2024-05-23 16:38 ` [PATCH 5/8] midx-write.c: extract `fill_packs_from_midx()` Taylor Blau
@ 2024-05-23 16:38 ` Taylor Blau
2024-05-29 7:48 ` Patrick Steinhardt
2024-05-23 16:38 ` [PATCH 7/8] midx: replace `get_midx_rev_filename()` with a generic helper Taylor Blau
` (2 subsequent siblings)
8 siblings, 1 reply; 34+ messages in thread
From: Taylor Blau @ 2024-05-23 16:38 UTC (permalink / raw)
To: git
Avoid unconditionally copying all packs from an existing MIDX into a new
MIDX by checking that packs added via `fill_packs_from_midx()` don't
appear in the `to_include` set, if one was provided.
Do so by calling `should_include_pack()` from both `add_pack_to_midx()`
and `fill_packs_from_midx()`.
In order to make this work, teach `should_include_pack()` a new
"exclude_from_midx" parameter, which allows skipping the first check.
This is done so that the caller in `fill_packs_from_midx()` doesn't
reject all of the packs it provided since they appear in an existing
MIDX by definition.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 42 +++++++++++-------------------------------
1 file changed, 11 insertions(+), 31 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 9712ac044f..36ac4ab65b 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -101,27 +101,13 @@ struct write_midx_context {
};
static int should_include_pack(const struct write_midx_context *ctx,
- const char *file_name)
+ const char *file_name,
+ int exclude_from_midx)
{
- /*
- * Note that at most one of ctx->m and ctx->to_include are set,
- * so we are testing midx_contains_pack() and
- * string_list_has_string() independently (guarded by the
- * appropriate NULL checks).
- *
- * We could support passing to_include while reusing an existing
- * MIDX, but don't currently since the reuse process drags
- * forward all packs from an existing MIDX (without checking
- * whether or not they appear in the to_include list).
- *
- * If we added support for that, these next two conditional
- * should be performed independently (likely checking
- * to_include before the existing MIDX).
- */
- if (ctx->m && midx_contains_pack(ctx->m, file_name))
+ if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name))
return 0;
- else if (ctx->to_include &&
- !string_list_has_string(ctx->to_include, file_name))
+ if (ctx->to_include && !string_list_has_string(ctx->to_include,
+ file_name))
return 0;
return 1;
}
@@ -135,7 +121,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
if (ends_with(file_name, ".idx")) {
display_progress(ctx->progress, ++ctx->pack_paths_checked);
- if (!should_include_pack(ctx, file_name))
+ if (!should_include_pack(ctx, file_name, 1))
return;
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
@@ -891,6 +877,9 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
uint32_t i;
for (i = 0; i < ctx->m->num_packs; i++) {
+ if (!should_include_pack(ctx, ctx->m->pack_names[i], 0))
+ continue;
+
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
@@ -945,15 +934,7 @@ static int write_midx_internal(const char *object_dir,
die_errno(_("unable to create leading directories of %s"),
midx_name.buf);
- if (!packs_to_include) {
- /*
- * Only reference an existing MIDX when not filtering which
- * packs to include, since all packs and objects are copied
- * blindly from an existing MIDX if one is present.
- */
- ctx.m = lookup_multi_pack_index(the_repository, object_dir);
- }
-
+ ctx.m = lookup_multi_pack_index(the_repository, object_dir);
if (ctx.m && !midx_checksum_valid(ctx.m)) {
warning(_("ignoring existing multi-pack-index; checksum mismatch"));
ctx.m = NULL;
@@ -962,6 +943,7 @@ static int write_midx_internal(const char *object_dir,
ctx.nr = 0;
ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
ctx.info = NULL;
+ ctx.to_include = packs_to_include;
ALLOC_ARRAY(ctx.info, ctx.alloc);
if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
@@ -978,8 +960,6 @@ static int write_midx_internal(const char *object_dir,
else
ctx.progress = NULL;
- ctx.to_include = packs_to_include;
-
for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
stop_progress(&ctx.progress);
--
2.45.1.217.g9bb58e2bf5a.dirty
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include`
2024-05-23 16:38 ` [PATCH 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include` Taylor Blau
@ 2024-05-29 7:48 ` Patrick Steinhardt
2024-05-29 22:46 ` Taylor Blau
0 siblings, 1 reply; 34+ messages in thread
From: Patrick Steinhardt @ 2024-05-29 7:48 UTC (permalink / raw)
To: Taylor Blau; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 2399 bytes --]
On Thu, May 23, 2024 at 12:38:19PM -0400, Taylor Blau wrote:
> Avoid unconditionally copying all packs from an existing MIDX into a new
> MIDX by checking that packs added via `fill_packs_from_midx()` don't
> appear in the `to_include` set, if one was provided.
>
> Do so by calling `should_include_pack()` from both `add_pack_to_midx()`
> and `fill_packs_from_midx()`.
This is missing an explanation why exactly we want that. Is the current
behaviour a bug? Is it a preparation for a future change? Is this change
expected to modify any existing behaviour?
Reading through the patch we now unconditionally load the existing MIDX
when writing a new one, but I'm not exactly sure what the effect of that
is going to be.
[snip]
> diff --git a/midx-write.c b/midx-write.c
> index 9712ac044f..36ac4ab65b 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -101,27 +101,13 @@ struct write_midx_context {
> };
>
> static int should_include_pack(const struct write_midx_context *ctx,
> - const char *file_name)
> + const char *file_name,
> + int exclude_from_midx)
> {
> - /*
> - * Note that at most one of ctx->m and ctx->to_include are set,
> - * so we are testing midx_contains_pack() and
> - * string_list_has_string() independently (guarded by the
> - * appropriate NULL checks).
> - *
> - * We could support passing to_include while reusing an existing
> - * MIDX, but don't currently since the reuse process drags
> - * forward all packs from an existing MIDX (without checking
> - * whether or not they appear in the to_include list).
> - *
> - * If we added support for that, these next two conditional
> - * should be performed independently (likely checking
> - * to_include before the existing MIDX).
> - */
> - if (ctx->m && midx_contains_pack(ctx->m, file_name))
> + if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name))
> return 0;
> - else if (ctx->to_include &&
> - !string_list_has_string(ctx->to_include, file_name))
> + if (ctx->to_include && !string_list_has_string(ctx->to_include,
> + file_name))
The second branch is a no-op change, right? The only change was that you
converted from `else if` to `if`. I'd propose to either keep this as-is,
or to do this change in the preceding patch already that introduces this
function.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include`
2024-05-29 7:48 ` Patrick Steinhardt
@ 2024-05-29 22:46 ` Taylor Blau
0 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:46 UTC (permalink / raw)
To: Patrick Steinhardt; +Cc: git
On Wed, May 29, 2024 at 09:48:20AM +0200, Patrick Steinhardt wrote:
> On Thu, May 23, 2024 at 12:38:19PM -0400, Taylor Blau wrote:
> > Avoid unconditionally copying all packs from an existing MIDX into a new
> > MIDX by checking that packs added via `fill_packs_from_midx()` don't
> > appear in the `to_include` set, if one was provided.
> >
> > Do so by calling `should_include_pack()` from both `add_pack_to_midx()`
> > and `fill_packs_from_midx()`.
>
> This is missing an explanation why exactly we want that. Is the current
> behaviour a bug? Is it a preparation for a future change? Is this change
> expected to modify any existing behaviour?
>
> Reading through the patch we now unconditionally load the existing MIDX
> when writing a new one, but I'm not exactly sure what the effect of that
> is going to be.
Very fair. The short answer is that this is a prerequisite for the
incremental MIDX series that I'm working on. The longer answer is that
an incremental MIDX-aware writer needs to be able to consult with the
existing MIDX (if one exists) and exclude any objects which already
appear in an earlier layer of the MIDX. This is done because we cannot
have the same object appear in multiple layers of the MIDX, for reasons
that are probably not interesting to this series.
I put a more concise version of the explanation above into this patch
which I'll send another round of in v2 of this series shortly.
> [snip]
> > diff --git a/midx-write.c b/midx-write.c
> > index 9712ac044f..36ac4ab65b 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -101,27 +101,13 @@ struct write_midx_context {
> > };
> >
> > static int should_include_pack(const struct write_midx_context *ctx,
> > - const char *file_name)
> > + const char *file_name,
> > + int exclude_from_midx)
> > {
> > - /*
> > - * Note that at most one of ctx->m and ctx->to_include are set,
> > - * so we are testing midx_contains_pack() and
> > - * string_list_has_string() independently (guarded by the
> > - * appropriate NULL checks).
> > - *
> > - * We could support passing to_include while reusing an existing
> > - * MIDX, but don't currently since the reuse process drags
> > - * forward all packs from an existing MIDX (without checking
> > - * whether or not they appear in the to_include list).
> > - *
> > - * If we added support for that, these next two conditional
> > - * should be performed independently (likely checking
> > - * to_include before the existing MIDX).
> > - */
> > - if (ctx->m && midx_contains_pack(ctx->m, file_name))
> > + if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name))
> > return 0;
> > - else if (ctx->to_include &&
> > - !string_list_has_string(ctx->to_include, file_name))
> > + if (ctx->to_include && !string_list_has_string(ctx->to_include,
> > + file_name))
>
> The second branch is a no-op change, right? The only change was that you
> converted from `else if` to `if`. I'd propose to either keep this as-is,
> or to do this change in the preceding patch already that introduces this
> function.
It is a no-op, but I would rather keep these separate to keep the
previous step a pure code movement rather than introducing any textual
changes.
Thanks,
Taylor
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 7/8] midx: replace `get_midx_rev_filename()` with a generic helper
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (5 preceding siblings ...)
2024-05-23 16:38 ` [PATCH 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include` Taylor Blau
@ 2024-05-23 16:38 ` Taylor Blau
2024-05-23 16:38 ` [PATCH 8/8] pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper Taylor Blau
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
8 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-23 16:38 UTC (permalink / raw)
To: git
Commit f894081deae (pack-revindex: read multi-pack reverse indexes,
2021-03-30) introduced the `get_midx_rev_filename()` helper (later
modified by commit 60980aed786 (midx.c: write MIDX filenames to
strbuf, 2021-10-26)).
This function returns the location of the classic ".rev" files we used
to write for MIDXs (prior to 95e8383bac1 (midx.c: make changing the
preferred pack safe, 2022-01-25)), which is always of the form:
$GIT_DIR/objects/pack/multi-pack-index-$HASH.rev
Replace this function with a generic helper that populates a strbuf with
the above form, replacing the ".rev" extension with a caller-provided
argument.
This will allow us to remove a similarly-defined function in the
pack-bitmap code (used to determine the location of a MIDX .bitmap file)
by reimplementing it in terms of `get_midx_filename_ext()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx.c | 14 ++++++++------
midx.h | 6 +++++-
pack-revindex.c | 3 ++-
3 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/midx.c b/midx.c
index 6f07de3688..bc4797196f 100644
--- a/midx.c
+++ b/midx.c
@@ -24,14 +24,16 @@ const unsigned char *get_midx_checksum(struct multi_pack_index *m)
}
void get_midx_filename(struct strbuf *out, const char *object_dir)
+{
+ get_midx_filename_ext(out, object_dir, NULL, NULL);
+}
+
+void get_midx_filename_ext(struct strbuf *out, const char *object_dir,
+ const unsigned char *hash, const char *ext)
{
strbuf_addf(out, "%s/pack/multi-pack-index", object_dir);
-}
-
-void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m)
-{
- get_midx_filename(out, m->object_dir);
- strbuf_addf(out, "-%s.rev", hash_to_hex(get_midx_checksum(m)));
+ if (ext)
+ strbuf_addf(out, "-%s.%s", hash_to_hex(hash), ext);
}
static int midx_read_oid_fanout(const unsigned char *chunk_start,
diff --git a/midx.h b/midx.h
index dc477dff44..8554f2d616 100644
--- a/midx.h
+++ b/midx.h
@@ -74,9 +74,13 @@ struct multi_pack_index {
#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
+#define MIDX_EXT_REV "rev"
+#define MIDX_EXT_BITMAP "bitmap"
+
const unsigned char *get_midx_checksum(struct multi_pack_index *m);
void get_midx_filename(struct strbuf *out, const char *object_dir);
-void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m);
+void get_midx_filename_ext(struct strbuf *out, const char *object_dir,
+ const unsigned char *hash, const char *ext);
struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
diff --git a/pack-revindex.c b/pack-revindex.c
index a7624d8be8..fc63aa76a2 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -381,7 +381,8 @@ int load_midx_revindex(struct multi_pack_index *m)
trace2_data_string("load_midx_revindex", the_repository,
"source", "rev");
- get_midx_rev_filename(&revindex_name, m);
+ get_midx_filename_ext(&revindex_name, m->object_dir,
+ get_midx_checksum(m), MIDX_EXT_REV);
ret = load_revindex_from_disk(revindex_name.buf,
m->num_objects,
--
2.45.1.217.g9bb58e2bf5a.dirty
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 8/8] pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (6 preceding siblings ...)
2024-05-23 16:38 ` [PATCH 7/8] midx: replace `get_midx_rev_filename()` with a generic helper Taylor Blau
@ 2024-05-23 16:38 ` Taylor Blau
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
8 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-23 16:38 UTC (permalink / raw)
To: git
Now that we have the `get_midx_filename_ext()` helper, we can
reimplement the `midx_bitmap_filename()` function in terms of it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 35c5ef9d3c..fe8e8a51d3 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -309,9 +309,8 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
char *midx_bitmap_filename(struct multi_pack_index *midx)
{
struct strbuf buf = STRBUF_INIT;
-
- get_midx_filename(&buf, midx->object_dir);
- strbuf_addf(&buf, "-%s.bitmap", hash_to_hex(get_midx_checksum(midx)));
+ get_midx_filename_ext(&buf, midx->object_dir, get_midx_checksum(midx),
+ MIDX_EXT_BITMAP);
return strbuf_detach(&buf, NULL);
}
--
2.45.1.217.g9bb58e2bf5a.dirty
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (7 preceding siblings ...)
2024-05-23 16:38 ` [PATCH 8/8] pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper Taylor Blau
@ 2024-05-29 22:55 ` Taylor Blau
2024-05-29 22:55 ` [PATCH v2 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps Taylor Blau
` (8 more replies)
8 siblings, 9 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:55 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
This is a small reroll of my series which has a grab-bag of midx-write
related cleanups that I pulled out of a larger series to implement
incremental MIDX chains.
This series is mostly the same as last time, but notable changes
include:
- renamed `get_sorted_entries()` to `compute_sorted_entries()` to
avoid confusion that the function performs a side-effecting write to
`ctx->entries_nr`.
- fixed a handful of typos
- add explanations in a couple of the patches to better motivate the
change.
Thanks to Patrick Steinhardt for his careful review on the previous
round!
Taylor Blau (8):
midx-write.c: tolerate `--preferred-pack` without bitmaps
midx-write.c: reduce argument count for `get_sorted_entries()`
midx-write.c: pass `start_pack` to `compute_sorted_entries()`
midx-write.c: extract `should_include_pack()`
midx-write.c: extract `fill_packs_from_midx()`
midx-write.c: support reading an existing MIDX with `packs_to_include`
midx: replace `get_midx_rev_filename()` with a generic helper
pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper
midx-write.c | 161 ++++++++++++++++++------------------
midx.c | 14 ++--
midx.h | 6 +-
pack-bitmap.c | 5 +-
pack-revindex.c | 3 +-
t/t5319-multi-pack-index.sh | 23 ++++++
6 files changed, 119 insertions(+), 93 deletions(-)
Range-diff against v1:
1: c753bc379b ! 1: ad268bd264 midx-write.c: tolerate `--preferred-pack` without bitmaps
@@ Commit message
unreferenced packs via 'git multi-pack-index expire').
In practice, this isn't possible to trigger when running `git
- multi-pack-index write` from via `git repack`, as the latter always
- passes `--stdin-packs`, which prevents us from loading an existing MIDX,
- as it forces all packs to be read from disk.
+ multi-pack-index write` from `git repack`, as the latter always passes
+ `--stdin-packs`, which prevents us from loading an existing MIDX, as it
+ forces all packs to be read from disk.
But a future commit in this series will change that behavior to
unconditionally load an existing MIDX, even with `--stdin-packs`, making
2: 07dad5a581 ! 2: 9d422efe6e midx-write.c: reduce argument count for `get_sorted_entries()`
@@ Commit message
Simplify the declaration of this function by taking a single pointer to
the whole `struct write_midx_context` instead of various members within
- it.
+ it. Since this function is now computing the entire result (populating
+ both `ctx->entries`, and `ctx->entries_nr`), rename it to something that
+ doesn't start with "get_" to make clear that this function has a
+ side-effect.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
@@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout
- uint32_t nr_packs,
- size_t *nr_objects,
- int preferred_pack)
-+static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx)
++static void compute_sorted_entries(struct write_midx_context *ctx)
{
uint32_t cur_fanout, cur_pack, cur_object;
size_t alloc_objects, total_objects = 0;
struct midx_fanout fanout = { 0 };
- struct pack_midx_entry *deduplicated_entries = NULL;
+- struct pack_midx_entry *deduplicated_entries = NULL;
- uint32_t start_pack = m ? m->num_packs : 0;
+ uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
@@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout
/*
* As we de-duplicate by fanout value, we expect the fanout
@@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
+ alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16;
ALLOC_ARRAY(fanout.entries, fanout.alloc);
- ALLOC_ARRAY(deduplicated_entries, alloc_objects);
+- ALLOC_ARRAY(deduplicated_entries, alloc_objects);
- *nr_objects = 0;
++ ALLOC_ARRAY(ctx->entries, alloc_objects);
+ ctx->entries_nr = 0;
for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
@@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pac
continue;
- ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1),
-+ ALLOC_GROW(deduplicated_entries, st_add(ctx->entries_nr, 1),
++ ALLOC_GROW(ctx->entries, st_add(ctx->entries_nr, 1),
alloc_objects);
- memcpy(&deduplicated_entries[*nr_objects],
-+ memcpy(&deduplicated_entries[ctx->entries_nr],
++ memcpy(&ctx->entries[ctx->entries_nr],
&fanout.entries[cur_object],
sizeof(struct pack_midx_entry));
- (*nr_objects)++;
@@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pac
}
}
+ free(fanout.entries);
+- return deduplicated_entries;
+ }
+
+ static int write_midx_pack_names(struct hashfile *f, void *data)
@@ midx-write.c: static int write_midx_internal(const char *object_dir,
}
}
- ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
- ctx.preferred_pack_idx);
-+ ctx.entries = get_sorted_entries(&ctx);
++ compute_sorted_entries(&ctx);
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
3: 7acf4557dc ! 3: e81296f8cc midx-write.c: pass `start_pack` to `get_sorted_entries()`
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- midx-write.c: pass `start_pack` to `get_sorted_entries()`
+ midx-write.c: pass `start_pack` to `compute_sorted_entries()`
- The function `get_sorted_entries()` is broadly responsible for
+ The function `compute_sorted_entries()` is broadly responsible for
building an array of the objects to be written into a MIDX based on the
provided list of packs.
@@ Commit message
The existing implementation simply skips past the first
ctx->m->num_packs (if ctx->m is non-NULL, indicating that we loaded an
- existing MIDX). Future changes (outside the scope of this patch series)
- to the MIDX code will require us to skip *at most* that number[^1].
+ existing MIDX). This is because we read objects in packs from an
+ existing MIDX via the MIDX itself, rather than from the pack-level
+ fanout to guarantee a de-duplicated result (see: a40498a1265 (midx: use
+ existing midx when writing new one, 2018-07-12)).
+
+ Future changes (outside the scope of this patch series) to the MIDX code
+ will require us to skip *at most* that number[^1].
We could tag each pack with a bit that indicates the pack's contents
should be included in the MIDX. But we can just as easily determine the
@@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout
* Copy only the de-duplicated entries (selected by most-recent modified time
* of a packfile containing the object).
*/
--static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx)
-+static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx,
-+ uint32_t start_pack)
+-static void compute_sorted_entries(struct write_midx_context *ctx)
++static void compute_sorted_entries(struct write_midx_context *ctx,
++ uint32_t start_pack)
{
uint32_t cur_fanout, cur_pack, cur_object;
size_t alloc_objects, total_objects = 0;
struct midx_fanout fanout = { 0 };
- struct pack_midx_entry *deduplicated_entries = NULL;
- uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
@@ midx-write.c: static int write_midx_internal(const char *object_dir,
}
}
-- ctx.entries = get_sorted_entries(&ctx);
-+ ctx.entries = get_sorted_entries(&ctx, start_pack);
+- compute_sorted_entries(&ctx);
++ compute_sorted_entries(&ctx, start_pack);
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
4: 3908546ea8 ! 4: 9cd9257465 midx-write.c: extract `should_include_pack()`
@@ Commit message
- or, appear in the "to_include" list, if invoking the MIDX write
machinery with the `--stdin-packs` command-line argument.
- In a future commit will want to call a slight variant of these checks
- from the code that reuses all packs from an existing MIDX, as well as
- the current location via add_pack_to_midx(). The latter will be
- modified in subsequent commits to only reuse packs which appear in the
- to_include list, if one was given.
+ A future commit will want to call a slight variant of these checks from
+ the code that reuses all packs from an existing MIDX, as well as the
+ current location via add_pack_to_midx(). The latter will be modified in
+ subsequent commits to only reuse packs which appear in the to_include
+ list, if one was given.
Prepare for that step by extracting these checks as a subroutine that
may be called from both places.
5: dc813ea1ca = 5: 1bb890e87c midx-write.c: extract `fill_packs_from_midx()`
6: 61268114c6 ! 6: fe187b1939 midx-write.c: support reading an existing MIDX with `packs_to_include`
@@ Commit message
reject all of the packs it provided since they appear in an existing
MIDX by definition.
+ The sum total of this change is that we are now able to read and
+ reference objects in an existing MIDX even when given a non-NULL
+ `packs_to_include`. This is a prerequisite step for incremental MIDXs,
+ which need to load any existing MIDX (if one is present) in order to
+ determine whether or not an object already appears in an earlier portion
+ of the MIDX to avoid duplicating it across multiple portions.
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## midx-write.c ##
7: f4c0167f43 = 7: f4f977c1c7 midx: replace `get_midx_rev_filename()` with a generic helper
8: 79e3f7f83f = 8: bcadaf9278 pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper
base-commit: 3a57aa566a21e7a510c64881bc6bdff7eb397988
--
2.45.1.321.gbcadaf92783
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
@ 2024-05-29 22:55 ` Taylor Blau
2024-05-29 22:55 ` [PATCH v2 2/8] midx-write.c: reduce argument count for `get_sorted_entries()` Taylor Blau
` (7 subsequent siblings)
8 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:55 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
When passing a preferred pack to the MIDX write machinery, we ensure
that the given preferred pack is non-empty since 5d3cd09a808 (midx:
reject empty `--preferred-pack`'s, 2021-08-31).
However packs are only loaded (via `write_midx_internal()`, though a
subsequent patch will refactor this code out to its own function) when
the `MIDX_WRITE_REV_INDEX` flag is set.
So if a caller runs:
$ git multi-pack-index write --preferred-pack=...
with both (a) an existing MIDX, and (b) specifies a pack from that MIDX
as the preferred one, without passing `--bitmap`, then the check added
in 5d3cd09a808 will result in a segfault.
Note that packs loaded from disk which don't appear in an existing MIDX
do not trigger this issue, as those packs are loaded unconditionally. We
conditionally load packs from a MIDX since we tolerate MIDXs whose
packs do not resolve (i.e., via the MIDX write after removing
unreferenced packs via 'git multi-pack-index expire').
In practice, this isn't possible to trigger when running `git
multi-pack-index write` from `git repack`, as the latter always passes
`--stdin-packs`, which prevents us from loading an existing MIDX, as it
forces all packs to be read from disk.
But a future commit in this series will change that behavior to
unconditionally load an existing MIDX, even with `--stdin-packs`, making
this behavior trigger-able from 'repack' much more easily.
Prevent this from being an issue by removing the segfault altogether by
calling `prepare_midx_pack()` on packs loaded from an existing MIDX when
either the `MIDX_WRITE_REV_INDEX` flag is set *or* we specified a
`--preferred-pack`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 8 +++++++-
t/t5319-multi-pack-index.sh | 23 +++++++++++++++++++++++
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/midx-write.c b/midx-write.c
index 2fa120c213..591b79bcbf 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -935,11 +935,17 @@ static int write_midx_internal(const char *object_dir,
for (i = 0; i < ctx.m->num_packs; i++) {
ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
- if (flags & MIDX_WRITE_REV_INDEX) {
+ if (flags & MIDX_WRITE_REV_INDEX ||
+ preferred_pack_name) {
/*
* If generating a reverse index, need to have
* packed_git's loaded to compare their
* mtimes and object count.
+ *
+ * If a preferred pack is specified,
+ * need to have packed_git's loaded to
+ * ensure the chosen preferred pack has
+ * a non-zero object count.
*/
if (prepare_midx_pack(the_repository, ctx.m, i)) {
error(_("could not load pack"));
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index dd09134db0..10d2a6bf92 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -350,6 +350,29 @@ test_expect_success 'preferred packs must be non-empty' '
)
'
+test_expect_success 'preferred pack from existing MIDX without bitmaps' '
+ git init preferred-without-bitmaps &&
+ (
+ cd preferred-without-bitmaps &&
+
+ test_commit one &&
+ pack="$(git pack-objects --all $objdir/pack/pack </dev/null)" &&
+
+ git multi-pack-index write &&
+
+ # make another pack so that the subsequent MIDX write
+ # has something to do
+ test_commit two &&
+ git repack -d &&
+
+ # write a new MIDX without bitmaps reusing the singular
+ # pack from the existing MIDX as the preferred pack in
+ # the new MIDX
+ git multi-pack-index write --preferred-pack=pack-$pack.pack
+ )
+
+'
+
test_expect_success 'verify multi-pack-index success' '
git multi-pack-index verify --object-dir=$objdir
'
--
2.45.1.321.gbcadaf92783
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 2/8] midx-write.c: reduce argument count for `get_sorted_entries()`
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
2024-05-29 22:55 ` [PATCH v2 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps Taylor Blau
@ 2024-05-29 22:55 ` Taylor Blau
2024-05-29 22:55 ` [PATCH v2 3/8] midx-write.c: pass `start_pack` to `compute_sorted_entries()` Taylor Blau
` (6 subsequent siblings)
8 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:55 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
The function `midx-write.c::get_sorted_entries()` is responsible for
constructing the array of OIDs from a given list of packs which will
comprise the MIDX being written.
The singular call-site for this function looks something like:
ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr,
&ctx.entries_nr,
ctx.preferred_pack_idx);
This function has five formal arguments, all of which are members of the
shared `struct write_midx_context` used to track various pieces of
information about the MIDX being written.
The function `get_sorted_entries()` dates back to fe1ed56f5e4 (midx:
sort and deduplicate objects from packfiles, 2018-07-12), which came
shortly after 396f257018a (multi-pack-index: read packfile list,
2018-07-12). The latter patch introduced the `pack_list` structure,
which was a precursor to the structure we now know as
`write_midx_context` (c.f. 577dc49696a (midx: rename pack_info to
write_midx_context, 2021-02-18)).
At the time, `get_sorted_entries()` likely could have used the pack_list
structure introduced earlier in 396f257018a, but understandably did not
since the structure only contained three fields (only two of which were
relevant to `get_sorted_entries()`) at the time.
Simplify the declaration of this function by taking a single pointer to
the whole `struct write_midx_context` instead of various members within
it. Since this function is now computing the entire result (populating
both `ctx->entries`, and `ctx->entries_nr`), rename it to something that
doesn't start with "get_" to make clear that this function has a
side-effect.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 45 +++++++++++++++++++--------------------------
1 file changed, 19 insertions(+), 26 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 591b79bcbf..15965ceb70 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -299,21 +299,16 @@ static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
* Copy only the de-duplicated entries (selected by most-recent modified time
* of a packfile containing the object).
*/
-static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
- struct pack_info *info,
- uint32_t nr_packs,
- size_t *nr_objects,
- int preferred_pack)
+static void compute_sorted_entries(struct write_midx_context *ctx)
{
uint32_t cur_fanout, cur_pack, cur_object;
size_t alloc_objects, total_objects = 0;
struct midx_fanout fanout = { 0 };
- struct pack_midx_entry *deduplicated_entries = NULL;
- uint32_t start_pack = m ? m->num_packs : 0;
+ uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
- for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++)
+ for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
total_objects = st_add(total_objects,
- info[cur_pack].p->num_objects);
+ ctx->info[cur_pack].p->num_objects);
/*
* As we de-duplicate by fanout value, we expect the fanout
@@ -323,26 +318,26 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16;
ALLOC_ARRAY(fanout.entries, fanout.alloc);
- ALLOC_ARRAY(deduplicated_entries, alloc_objects);
- *nr_objects = 0;
+ ALLOC_ARRAY(ctx->entries, alloc_objects);
+ ctx->entries_nr = 0;
for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
fanout.nr = 0;
- if (m)
- midx_fanout_add_midx_fanout(&fanout, m, cur_fanout,
- preferred_pack);
+ if (ctx->m)
+ midx_fanout_add_midx_fanout(&fanout, ctx->m, cur_fanout,
+ ctx->preferred_pack_idx);
- for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
- int preferred = cur_pack == preferred_pack;
+ for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++) {
+ int preferred = cur_pack == ctx->preferred_pack_idx;
midx_fanout_add_pack_fanout(&fanout,
- info, cur_pack,
+ ctx->info, cur_pack,
preferred, cur_fanout);
}
- if (-1 < preferred_pack && preferred_pack < start_pack)
- midx_fanout_add_pack_fanout(&fanout, info,
- preferred_pack, 1,
+ if (-1 < ctx->preferred_pack_idx && ctx->preferred_pack_idx < start_pack)
+ midx_fanout_add_pack_fanout(&fanout, ctx->info,
+ ctx->preferred_pack_idx, 1,
cur_fanout);
midx_fanout_sort(&fanout);
@@ -356,17 +351,16 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
&fanout.entries[cur_object].oid))
continue;
- ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1),
+ ALLOC_GROW(ctx->entries, st_add(ctx->entries_nr, 1),
alloc_objects);
- memcpy(&deduplicated_entries[*nr_objects],
+ memcpy(&ctx->entries[ctx->entries_nr],
&fanout.entries[cur_object],
sizeof(struct pack_midx_entry));
- (*nr_objects)++;
+ ctx->entries_nr++;
}
}
free(fanout.entries);
- return deduplicated_entries;
}
static int write_midx_pack_names(struct hashfile *f, void *data)
@@ -1060,8 +1054,7 @@ static int write_midx_internal(const char *object_dir,
}
}
- ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
- ctx.preferred_pack_idx);
+ compute_sorted_entries(&ctx);
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
--
2.45.1.321.gbcadaf92783
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 3/8] midx-write.c: pass `start_pack` to `compute_sorted_entries()`
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
2024-05-29 22:55 ` [PATCH v2 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps Taylor Blau
2024-05-29 22:55 ` [PATCH v2 2/8] midx-write.c: reduce argument count for `get_sorted_entries()` Taylor Blau
@ 2024-05-29 22:55 ` Taylor Blau
2024-05-30 6:59 ` Jeff King
2024-05-29 22:55 ` [PATCH v2 4/8] midx-write.c: extract `should_include_pack()` Taylor Blau
` (5 subsequent siblings)
8 siblings, 1 reply; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:55 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
The function `compute_sorted_entries()` is broadly responsible for
building an array of the objects to be written into a MIDX based on the
provided list of packs.
If we have loaded an existing MIDX, however, we may not use all of its
packs, despite loading them into the ctx->info array.
The existing implementation simply skips past the first
ctx->m->num_packs (if ctx->m is non-NULL, indicating that we loaded an
existing MIDX). This is because we read objects in packs from an
existing MIDX via the MIDX itself, rather than from the pack-level
fanout to guarantee a de-duplicated result (see: a40498a1265 (midx: use
existing midx when writing new one, 2018-07-12)).
Future changes (outside the scope of this patch series) to the MIDX code
will require us to skip *at most* that number[^1].
We could tag each pack with a bit that indicates the pack's contents
should be included in the MIDX. But we can just as easily determine the
number of packs to skip by passing in the number of packs we learned
about after processing an existing MIDX.
[^1]: Kind of. The real number will be bounded by the number of packs in
a MIDX layer, and the number of packs in its base layer(s), but that
concept hasn't been fully defined yet.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 15965ceb70..949a66e973 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -299,12 +299,12 @@ static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
* Copy only the de-duplicated entries (selected by most-recent modified time
* of a packfile containing the object).
*/
-static void compute_sorted_entries(struct write_midx_context *ctx)
+static void compute_sorted_entries(struct write_midx_context *ctx,
+ uint32_t start_pack)
{
uint32_t cur_fanout, cur_pack, cur_object;
size_t alloc_objects, total_objects = 0;
struct midx_fanout fanout = { 0 };
- uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
total_objects = st_add(total_objects,
@@ -889,7 +889,7 @@ static int write_midx_internal(const char *object_dir,
{
struct strbuf midx_name = STRBUF_INIT;
unsigned char midx_hash[GIT_MAX_RAWSZ];
- uint32_t i;
+ uint32_t i, start_pack;
struct hashfile *f = NULL;
struct lock_file lk;
struct write_midx_context ctx = { 0 };
@@ -957,6 +957,8 @@ static int write_midx_internal(const char *object_dir,
}
}
+ start_pack = ctx.nr;
+
ctx.pack_paths_checked = 0;
if (flags & MIDX_PROGRESS)
ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
@@ -1054,7 +1056,7 @@ static int write_midx_internal(const char *object_dir,
}
}
- compute_sorted_entries(&ctx);
+ compute_sorted_entries(&ctx, start_pack);
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
--
2.45.1.321.gbcadaf92783
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 3/8] midx-write.c: pass `start_pack` to `compute_sorted_entries()`
2024-05-29 22:55 ` [PATCH v2 3/8] midx-write.c: pass `start_pack` to `compute_sorted_entries()` Taylor Blau
@ 2024-05-30 6:59 ` Jeff King
0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2024-05-30 6:59 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano
On Wed, May 29, 2024 at 06:55:28PM -0400, Taylor Blau wrote:
> The function `compute_sorted_entries()` is broadly responsible for
> building an array of the objects to be written into a MIDX based on the
> provided list of packs.
>
> If we have loaded an existing MIDX, however, we may not use all of its
> packs, despite loading them into the ctx->info array.
>
> The existing implementation simply skips past the first
> ctx->m->num_packs (if ctx->m is non-NULL, indicating that we loaded an
> existing MIDX). This is because we read objects in packs from an
> existing MIDX via the MIDX itself, rather than from the pack-level
> fanout to guarantee a de-duplicated result (see: a40498a1265 (midx: use
> existing midx when writing new one, 2018-07-12)).
Hmm, if the last patch had not removed the separate array/nr pair for
packs, then you could just pass (array + start_pack, nr - start_pack). :)
But I think it is probably reasonable to keep the notion of "here are
all the packs" and "you are skipping some of them" clear through the
call stack anyway. Especially if the notion of start_pack gets more
complicated:
> Future changes (outside the scope of this patch series) to the MIDX code
> will require us to skip *at most* that number[^1].
So this patch seems fine to me.
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 4/8] midx-write.c: extract `should_include_pack()`
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (2 preceding siblings ...)
2024-05-29 22:55 ` [PATCH v2 3/8] midx-write.c: pass `start_pack` to `compute_sorted_entries()` Taylor Blau
@ 2024-05-29 22:55 ` Taylor Blau
2024-05-29 22:55 ` [PATCH v2 5/8] midx-write.c: extract `fill_packs_from_midx()` Taylor Blau
` (4 subsequent siblings)
8 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:55 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
The add_pack_to_midx() callback used via for_each_file_in_pack_dir() is
used to add packs with .idx files to the MIDX being written.
Within this function, we have a pair of checks that discards packs
which:
- appear in an existing MIDX, if we successfully read an existing MIDX
from disk
- or, appear in the "to_include" list, if invoking the MIDX write
machinery with the `--stdin-packs` command-line argument.
A future commit will want to call a slight variant of these checks from
the code that reuses all packs from an existing MIDX, as well as the
current location via add_pack_to_midx(). The latter will be modified in
subsequent commits to only reuse packs which appear in the to_include
list, if one was given.
Prepare for that step by extracting these checks as a subroutine that
may be called from both places.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 48 ++++++++++++++++++++++++++++--------------------
1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index 949a66e973..a80709726a 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -100,6 +100,32 @@ struct write_midx_context {
struct string_list *to_include;
};
+static int should_include_pack(const struct write_midx_context *ctx,
+ const char *file_name)
+{
+ /*
+ * Note that at most one of ctx->m and ctx->to_include are set,
+ * so we are testing midx_contains_pack() and
+ * string_list_has_string() independently (guarded by the
+ * appropriate NULL checks).
+ *
+ * We could support passing to_include while reusing an existing
+ * MIDX, but don't currently since the reuse process drags
+ * forward all packs from an existing MIDX (without checking
+ * whether or not they appear in the to_include list).
+ *
+ * If we added support for that, these next two conditional
+ * should be performed independently (likely checking
+ * to_include before the existing MIDX).
+ */
+ if (ctx->m && midx_contains_pack(ctx->m, file_name))
+ return 0;
+ else if (ctx->to_include &&
+ !string_list_has_string(ctx->to_include, file_name))
+ return 0;
+ return 1;
+}
+
static void add_pack_to_midx(const char *full_path, size_t full_path_len,
const char *file_name, void *data)
{
@@ -108,29 +134,11 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
if (ends_with(file_name, ".idx")) {
display_progress(ctx->progress, ++ctx->pack_paths_checked);
- /*
- * Note that at most one of ctx->m and ctx->to_include are set,
- * so we are testing midx_contains_pack() and
- * string_list_has_string() independently (guarded by the
- * appropriate NULL checks).
- *
- * We could support passing to_include while reusing an existing
- * MIDX, but don't currently since the reuse process drags
- * forward all packs from an existing MIDX (without checking
- * whether or not they appear in the to_include list).
- *
- * If we added support for that, these next two conditional
- * should be performed independently (likely checking
- * to_include before the existing MIDX).
- */
- if (ctx->m && midx_contains_pack(ctx->m, file_name))
- return;
- else if (ctx->to_include &&
- !string_list_has_string(ctx->to_include, file_name))
+
+ if (!should_include_pack(ctx, file_name))
return;
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
-
p = add_packed_git(full_path, full_path_len, 0);
if (!p) {
warning(_("failed to add packfile '%s'"),
--
2.45.1.321.gbcadaf92783
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 5/8] midx-write.c: extract `fill_packs_from_midx()`
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (3 preceding siblings ...)
2024-05-29 22:55 ` [PATCH v2 4/8] midx-write.c: extract `should_include_pack()` Taylor Blau
@ 2024-05-29 22:55 ` Taylor Blau
2024-05-29 22:55 ` [PATCH v2 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include` Taylor Blau
` (3 subsequent siblings)
8 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:55 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
When write_midx_internal() loads an existing MIDX, all packs are copied
forward into the new MIDX. Improve the readability of
write_midx_internal() by extracting this functionality out into a
separate function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 68 +++++++++++++++++++++++++++++-----------------------
1 file changed, 38 insertions(+), 30 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index a80709726a..c93e1ae305 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -888,6 +888,40 @@ static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
return result;
}
+static int fill_packs_from_midx(struct write_midx_context *ctx,
+ const char *preferred_pack_name, uint32_t flags)
+{
+ uint32_t i;
+
+ for (i = 0; i < ctx->m->num_packs; i++) {
+ ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
+
+ if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
+ /*
+ * If generating a reverse index, need to have
+ * packed_git's loaded to compare their
+ * mtimes and object count.
+ *
+ *
+ * If a preferred pack is specified, need to
+ * have packed_git's loaded to ensure the chosen
+ * preferred pack has a non-zero object count.
+ */
+ if (prepare_midx_pack(the_repository, ctx->m, i))
+ return error(_("could not load pack"));
+
+ if (open_pack_index(ctx->m->packs[i]))
+ die(_("could not open index for %s"),
+ ctx->m->packs[i]->pack_name);
+ }
+
+ fill_pack_info(&ctx->info[ctx->nr++], ctx->m->packs[i],
+ ctx->m->pack_names[i], i);
+ }
+
+ return 0;
+}
+
static int write_midx_internal(const char *object_dir,
struct string_list *packs_to_include,
struct string_list *packs_to_drop,
@@ -933,36 +967,10 @@ static int write_midx_internal(const char *object_dir,
ctx.info = NULL;
ALLOC_ARRAY(ctx.info, ctx.alloc);
- if (ctx.m) {
- for (i = 0; i < ctx.m->num_packs; i++) {
- ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
-
- if (flags & MIDX_WRITE_REV_INDEX ||
- preferred_pack_name) {
- /*
- * If generating a reverse index, need to have
- * packed_git's loaded to compare their
- * mtimes and object count.
- *
- * If a preferred pack is specified,
- * need to have packed_git's loaded to
- * ensure the chosen preferred pack has
- * a non-zero object count.
- */
- if (prepare_midx_pack(the_repository, ctx.m, i)) {
- error(_("could not load pack"));
- result = 1;
- goto cleanup;
- }
-
- if (open_pack_index(ctx.m->packs[i]))
- die(_("could not open index for %s"),
- ctx.m->packs[i]->pack_name);
- }
-
- fill_pack_info(&ctx.info[ctx.nr++], ctx.m->packs[i],
- ctx.m->pack_names[i], i);
- }
+ if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
+ flags) < 0) {
+ result = 1;
+ goto cleanup;
}
start_pack = ctx.nr;
--
2.45.1.321.gbcadaf92783
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include`
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (4 preceding siblings ...)
2024-05-29 22:55 ` [PATCH v2 5/8] midx-write.c: extract `fill_packs_from_midx()` Taylor Blau
@ 2024-05-29 22:55 ` Taylor Blau
2024-05-30 7:05 ` Jeff King
2024-05-29 22:55 ` [PATCH v2 7/8] midx: replace `get_midx_rev_filename()` with a generic helper Taylor Blau
` (2 subsequent siblings)
8 siblings, 1 reply; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:55 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Avoid unconditionally copying all packs from an existing MIDX into a new
MIDX by checking that packs added via `fill_packs_from_midx()` don't
appear in the `to_include` set, if one was provided.
Do so by calling `should_include_pack()` from both `add_pack_to_midx()`
and `fill_packs_from_midx()`.
In order to make this work, teach `should_include_pack()` a new
"exclude_from_midx" parameter, which allows skipping the first check.
This is done so that the caller in `fill_packs_from_midx()` doesn't
reject all of the packs it provided since they appear in an existing
MIDX by definition.
The sum total of this change is that we are now able to read and
reference objects in an existing MIDX even when given a non-NULL
`packs_to_include`. This is a prerequisite step for incremental MIDXs,
which need to load any existing MIDX (if one is present) in order to
determine whether or not an object already appears in an earlier portion
of the MIDX to avoid duplicating it across multiple portions.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx-write.c | 42 +++++++++++-------------------------------
1 file changed, 11 insertions(+), 31 deletions(-)
diff --git a/midx-write.c b/midx-write.c
index c93e1ae305..8b96c566e7 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -101,27 +101,13 @@ struct write_midx_context {
};
static int should_include_pack(const struct write_midx_context *ctx,
- const char *file_name)
+ const char *file_name,
+ int exclude_from_midx)
{
- /*
- * Note that at most one of ctx->m and ctx->to_include are set,
- * so we are testing midx_contains_pack() and
- * string_list_has_string() independently (guarded by the
- * appropriate NULL checks).
- *
- * We could support passing to_include while reusing an existing
- * MIDX, but don't currently since the reuse process drags
- * forward all packs from an existing MIDX (without checking
- * whether or not they appear in the to_include list).
- *
- * If we added support for that, these next two conditional
- * should be performed independently (likely checking
- * to_include before the existing MIDX).
- */
- if (ctx->m && midx_contains_pack(ctx->m, file_name))
+ if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name))
return 0;
- else if (ctx->to_include &&
- !string_list_has_string(ctx->to_include, file_name))
+ if (ctx->to_include && !string_list_has_string(ctx->to_include,
+ file_name))
return 0;
return 1;
}
@@ -135,7 +121,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len,
if (ends_with(file_name, ".idx")) {
display_progress(ctx->progress, ++ctx->pack_paths_checked);
- if (!should_include_pack(ctx, file_name))
+ if (!should_include_pack(ctx, file_name, 1))
return;
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
@@ -894,6 +880,9 @@ static int fill_packs_from_midx(struct write_midx_context *ctx,
uint32_t i;
for (i = 0; i < ctx->m->num_packs; i++) {
+ if (!should_include_pack(ctx, ctx->m->pack_names[i], 0))
+ continue;
+
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
if (flags & MIDX_WRITE_REV_INDEX || preferred_pack_name) {
@@ -948,15 +937,7 @@ static int write_midx_internal(const char *object_dir,
die_errno(_("unable to create leading directories of %s"),
midx_name.buf);
- if (!packs_to_include) {
- /*
- * Only reference an existing MIDX when not filtering which
- * packs to include, since all packs and objects are copied
- * blindly from an existing MIDX if one is present.
- */
- ctx.m = lookup_multi_pack_index(the_repository, object_dir);
- }
-
+ ctx.m = lookup_multi_pack_index(the_repository, object_dir);
if (ctx.m && !midx_checksum_valid(ctx.m)) {
warning(_("ignoring existing multi-pack-index; checksum mismatch"));
ctx.m = NULL;
@@ -965,6 +946,7 @@ static int write_midx_internal(const char *object_dir,
ctx.nr = 0;
ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
ctx.info = NULL;
+ ctx.to_include = packs_to_include;
ALLOC_ARRAY(ctx.info, ctx.alloc);
if (ctx.m && fill_packs_from_midx(&ctx, preferred_pack_name,
@@ -981,8 +963,6 @@ static int write_midx_internal(const char *object_dir,
else
ctx.progress = NULL;
- ctx.to_include = packs_to_include;
-
for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
stop_progress(&ctx.progress);
--
2.45.1.321.gbcadaf92783
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include`
2024-05-29 22:55 ` [PATCH v2 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include` Taylor Blau
@ 2024-05-30 7:05 ` Jeff King
0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2024-05-30 7:05 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano
On Wed, May 29, 2024 at 06:55:39PM -0400, Taylor Blau wrote:
> static int should_include_pack(const struct write_midx_context *ctx,
> - const char *file_name)
> + const char *file_name,
> + int exclude_from_midx)
Hmm, so we grow a new flag...
> {
> - /*
> - * Note that at most one of ctx->m and ctx->to_include are set,
> - * so we are testing midx_contains_pack() and
> - * string_list_has_string() independently (guarded by the
> - * appropriate NULL checks).
> - *
> - * We could support passing to_include while reusing an existing
> - * MIDX, but don't currently since the reuse process drags
> - * forward all packs from an existing MIDX (without checking
> - * whether or not they appear in the to_include list).
> - *
> - * If we added support for that, these next two conditional
> - * should be performed independently (likely checking
> - * to_include before the existing MIDX).
> - */
> - if (ctx->m && midx_contains_pack(ctx->m, file_name))
> + if (exclude_from_midx && ctx->m && midx_contains_pack(ctx->m, file_name))
> return 0;
> - else if (ctx->to_include &&
> - !string_list_has_string(ctx->to_include, file_name))
> + if (ctx->to_include && !string_list_has_string(ctx->to_include,
> + file_name))
> return 0;
...that disables half of what the function is checking.
It makes me wonder if it should just be two functions (or if the callers
should just inline them, since they are themselves basically
one-liners). But I think this is getting into bike-shedding territory.
I'm OK with it as-is.
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 7/8] midx: replace `get_midx_rev_filename()` with a generic helper
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (5 preceding siblings ...)
2024-05-29 22:55 ` [PATCH v2 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include` Taylor Blau
@ 2024-05-29 22:55 ` Taylor Blau
2024-05-30 7:10 ` Jeff King
2024-05-29 22:55 ` [PATCH v2 8/8] pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper Taylor Blau
2024-05-30 7:14 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Jeff King
8 siblings, 1 reply; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:55 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Commit f894081deae (pack-revindex: read multi-pack reverse indexes,
2021-03-30) introduced the `get_midx_rev_filename()` helper (later
modified by commit 60980aed786 (midx.c: write MIDX filenames to
strbuf, 2021-10-26)).
This function returns the location of the classic ".rev" files we used
to write for MIDXs (prior to 95e8383bac1 (midx.c: make changing the
preferred pack safe, 2022-01-25)), which is always of the form:
$GIT_DIR/objects/pack/multi-pack-index-$HASH.rev
Replace this function with a generic helper that populates a strbuf with
the above form, replacing the ".rev" extension with a caller-provided
argument.
This will allow us to remove a similarly-defined function in the
pack-bitmap code (used to determine the location of a MIDX .bitmap file)
by reimplementing it in terms of `get_midx_filename_ext()`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
midx.c | 14 ++++++++------
midx.h | 6 +++++-
pack-revindex.c | 3 ++-
3 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/midx.c b/midx.c
index 6f07de3688..bc4797196f 100644
--- a/midx.c
+++ b/midx.c
@@ -24,14 +24,16 @@ const unsigned char *get_midx_checksum(struct multi_pack_index *m)
}
void get_midx_filename(struct strbuf *out, const char *object_dir)
+{
+ get_midx_filename_ext(out, object_dir, NULL, NULL);
+}
+
+void get_midx_filename_ext(struct strbuf *out, const char *object_dir,
+ const unsigned char *hash, const char *ext)
{
strbuf_addf(out, "%s/pack/multi-pack-index", object_dir);
-}
-
-void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m)
-{
- get_midx_filename(out, m->object_dir);
- strbuf_addf(out, "-%s.rev", hash_to_hex(get_midx_checksum(m)));
+ if (ext)
+ strbuf_addf(out, "-%s.%s", hash_to_hex(hash), ext);
}
static int midx_read_oid_fanout(const unsigned char *chunk_start,
diff --git a/midx.h b/midx.h
index dc477dff44..8554f2d616 100644
--- a/midx.h
+++ b/midx.h
@@ -74,9 +74,13 @@ struct multi_pack_index {
#define MIDX_WRITE_BITMAP_HASH_CACHE (1 << 3)
#define MIDX_WRITE_BITMAP_LOOKUP_TABLE (1 << 4)
+#define MIDX_EXT_REV "rev"
+#define MIDX_EXT_BITMAP "bitmap"
+
const unsigned char *get_midx_checksum(struct multi_pack_index *m);
void get_midx_filename(struct strbuf *out, const char *object_dir);
-void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m);
+void get_midx_filename_ext(struct strbuf *out, const char *object_dir,
+ const unsigned char *hash, const char *ext);
struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id);
diff --git a/pack-revindex.c b/pack-revindex.c
index a7624d8be8..fc63aa76a2 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -381,7 +381,8 @@ int load_midx_revindex(struct multi_pack_index *m)
trace2_data_string("load_midx_revindex", the_repository,
"source", "rev");
- get_midx_rev_filename(&revindex_name, m);
+ get_midx_filename_ext(&revindex_name, m->object_dir,
+ get_midx_checksum(m), MIDX_EXT_REV);
ret = load_revindex_from_disk(revindex_name.buf,
m->num_objects,
--
2.45.1.321.gbcadaf92783
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 7/8] midx: replace `get_midx_rev_filename()` with a generic helper
2024-05-29 22:55 ` [PATCH v2 7/8] midx: replace `get_midx_rev_filename()` with a generic helper Taylor Blau
@ 2024-05-30 7:10 ` Jeff King
0 siblings, 0 replies; 34+ messages in thread
From: Jeff King @ 2024-05-30 7:10 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano
On Wed, May 29, 2024 at 06:55:42PM -0400, Taylor Blau wrote:
> Commit f894081deae (pack-revindex: read multi-pack reverse indexes,
> 2021-03-30) introduced the `get_midx_rev_filename()` helper (later
> modified by commit 60980aed786 (midx.c: write MIDX filenames to
> strbuf, 2021-10-26)).
>
> This function returns the location of the classic ".rev" files we used
> to write for MIDXs (prior to 95e8383bac1 (midx.c: make changing the
> preferred pack safe, 2022-01-25)), which is always of the form:
>
> $GIT_DIR/objects/pack/multi-pack-index-$HASH.rev
>
> Replace this function with a generic helper that populates a strbuf with
> the above form, replacing the ".rev" extension with a caller-provided
> argument.
Makes sense, as we have similar routines for packfiles.
> +void get_midx_filename_ext(struct strbuf *out, const char *object_dir,
> + const unsigned char *hash, const char *ext)
> {
> strbuf_addf(out, "%s/pack/multi-pack-index", object_dir);
> -}
> -
> -void get_midx_rev_filename(struct strbuf *out, struct multi_pack_index *m)
> -{
> - get_midx_filename(out, m->object_dir);
> - strbuf_addf(out, "-%s.rev", hash_to_hex(get_midx_checksum(m)));
> + if (ext)
> + strbuf_addf(out, "-%s.%s", hash_to_hex(hash), ext);
> }
This bare "const unsigned char *hash" caught my eye, as we've mostly
been removing them. But it was present in the original, too; it was just
hidden in the return from get_midx_checksum().
And I'm not sure what the non-ugly version is. We implicitly use
the_hash_algo for these kinds of trailer checksums, and calling them
"struct object_id" is probably even more confusing. I guess they could
be "struct csum_file_trailer" or something, but I'm not sure that would
actually make anything more clear. Anyway, none of this is new in your
patch and we can ignore it for now.
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 8/8] pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (6 preceding siblings ...)
2024-05-29 22:55 ` [PATCH v2 7/8] midx: replace `get_midx_rev_filename()` with a generic helper Taylor Blau
@ 2024-05-29 22:55 ` Taylor Blau
2024-05-30 7:14 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Jeff King
8 siblings, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-29 22:55 UTC (permalink / raw)
To: git; +Cc: Jeff King, Elijah Newren, Patrick Steinhardt, Junio C Hamano
Now that we have the `get_midx_filename_ext()` helper, we can
reimplement the `midx_bitmap_filename()` function in terms of it.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
pack-bitmap.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 35c5ef9d3c..fe8e8a51d3 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -309,9 +309,8 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
char *midx_bitmap_filename(struct multi_pack_index *midx)
{
struct strbuf buf = STRBUF_INIT;
-
- get_midx_filename(&buf, midx->object_dir);
- strbuf_addf(&buf, "-%s.bitmap", hash_to_hex(get_midx_checksum(midx)));
+ get_midx_filename_ext(&buf, midx->object_dir, get_midx_checksum(midx),
+ MIDX_EXT_BITMAP);
return strbuf_detach(&buf, NULL);
}
--
2.45.1.321.gbcadaf92783
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs
2024-05-29 22:55 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
` (7 preceding siblings ...)
2024-05-29 22:55 ` [PATCH v2 8/8] pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper Taylor Blau
@ 2024-05-30 7:14 ` Jeff King
2024-05-30 13:59 ` Taylor Blau
2024-05-31 8:28 ` Patrick Steinhardt
8 siblings, 2 replies; 34+ messages in thread
From: Jeff King @ 2024-05-30 7:14 UTC (permalink / raw)
To: Taylor Blau; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano
On Wed, May 29, 2024 at 06:55:15PM -0400, Taylor Blau wrote:
> This is a small reroll of my series which has a grab-bag of midx-write
> related cleanups that I pulled out of a larger series to implement
> incremental MIDX chains.
These all look pretty reasonable to me. Thanks for breaking them off of
the larger series. I think it's generally nice to get things in smaller
chunks.
Sometimes it is a little tough to evaluate refactorings without seeing
the larger context in which they'd be used. But all of these were either
immediate improvements, or didn't take much imagination to see where
they'd make later things easier.
-Peff
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs
2024-05-30 7:14 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Jeff King
@ 2024-05-30 13:59 ` Taylor Blau
2024-05-31 8:28 ` Patrick Steinhardt
1 sibling, 0 replies; 34+ messages in thread
From: Taylor Blau @ 2024-05-30 13:59 UTC (permalink / raw)
To: Jeff King; +Cc: git, Elijah Newren, Patrick Steinhardt, Junio C Hamano
On Thu, May 30, 2024 at 03:14:32AM -0400, Jeff King wrote:
> On Wed, May 29, 2024 at 06:55:15PM -0400, Taylor Blau wrote:
>
> > This is a small reroll of my series which has a grab-bag of midx-write
> > related cleanups that I pulled out of a larger series to implement
> > incremental MIDX chains.
>
> These all look pretty reasonable to me. Thanks for breaking them off of
> the larger series. I think it's generally nice to get things in smaller
> chunks.
Thanks for reviewing! It is much appreciated, as always :-).
> Sometimes it is a little tough to evaluate refactorings without seeing
> the larger context in which they'd be used. But all of these were either
> immediate improvements, or didn't take much imagination to see where
> they'd make later things easier.
Yeah, I feel like this is something that I'm still trying to find the
right balance with. I think with the pseudo-merge bitmaps series, I
erred on the side of making the series too large, which made it
difficult to easily review.
Seeing the review process on that series made me a little uneasy about
sending the incremental MIDX bitmaps series, which was growing similarly
large. I think that seeing these 8 patches or so get sent in preparation
makes the later series easier to review, but I agree that they are
somewhat unsatisfying on their own.
I think in this case it was a reasonable trade-off to make, but you
could probably make a compelling case either way there ;-).
On the bright side, what was once a ~30 patch series is now three series
(of which this is the first one). The main series is now 13 or so
patches, the end state of which is to have incremental MIDXs working
without support for reachability bitmaps. The final series I'm planning
in this area adds support for reachability bitmaps with incremental
MIDXs, but that series is only half a dozen or so patches.
I'm hoping that by sending it in chunks, the end-to-end review process
will be more straightforward and approachable. If you have any other
ideas to make it easier, definitely let me know!
Thanks,
Taylor
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs
2024-05-30 7:14 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Jeff King
2024-05-30 13:59 ` Taylor Blau
@ 2024-05-31 8:28 ` Patrick Steinhardt
1 sibling, 0 replies; 34+ messages in thread
From: Patrick Steinhardt @ 2024-05-31 8:28 UTC (permalink / raw)
To: Jeff King; +Cc: Taylor Blau, git, Elijah Newren, Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 1447 bytes --]
On Thu, May 30, 2024 at 03:14:32AM -0400, Jeff King wrote:
> On Wed, May 29, 2024 at 06:55:15PM -0400, Taylor Blau wrote:
>
> > This is a small reroll of my series which has a grab-bag of midx-write
> > related cleanups that I pulled out of a larger series to implement
> > incremental MIDX chains.
>
> These all look pretty reasonable to me. Thanks for breaking them off of
> the larger series. I think it's generally nice to get things in smaller
> chunks.
>
> Sometimes it is a little tough to evaluate refactorings without seeing
> the larger context in which they'd be used. But all of these were either
> immediate improvements, or didn't take much imagination to see where
> they'd make later things easier.
I've read through the range-diff and didn't spot anything unexpected
there. So this version looks good from my point of view, thanks.
And I very much agree that it's nice to split out smaller topics like
this. I think it's okay to send patch series that have dozens of commits
when most of the commits are trivial. But the changes in the MIDX aren't
that and require the reviewer to dive deep. For me this has the result
that I need an hour or more on such large patch series to review them.
And because I often do not find the time for that I push them onto my
pile of shame of stuff that I do want to review.
I get though that this isn't always easy, and in any case it's a
tradeoff.
Patrick
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 34+ messages in thread