From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>,
Patrick Steinhardt <ps@pks.im>,
Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs
Date: Wed, 29 May 2024 18:55:15 -0400 [thread overview]
Message-ID: <cover.1717023301.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1716482279.git.me@ttaylorr.com>
This is a small reroll of my series which has a grab-bag of midx-write
related cleanups that I pulled out of a larger series to implement
incremental MIDX chains.
This series is mostly the same as last time, but notable changes
include:
- renamed `get_sorted_entries()` to `compute_sorted_entries()` to
avoid confusion that the function performs a side-effecting write to
`ctx->entries_nr`.
- fixed a handful of typos
- add explanations in a couple of the patches to better motivate the
change.
Thanks to Patrick Steinhardt for his careful review on the previous
round!
Taylor Blau (8):
midx-write.c: tolerate `--preferred-pack` without bitmaps
midx-write.c: reduce argument count for `get_sorted_entries()`
midx-write.c: pass `start_pack` to `compute_sorted_entries()`
midx-write.c: extract `should_include_pack()`
midx-write.c: extract `fill_packs_from_midx()`
midx-write.c: support reading an existing MIDX with `packs_to_include`
midx: replace `get_midx_rev_filename()` with a generic helper
pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper
midx-write.c | 161 ++++++++++++++++++------------------
midx.c | 14 ++--
midx.h | 6 +-
pack-bitmap.c | 5 +-
pack-revindex.c | 3 +-
t/t5319-multi-pack-index.sh | 23 ++++++
6 files changed, 119 insertions(+), 93 deletions(-)
Range-diff against v1:
1: c753bc379b ! 1: ad268bd264 midx-write.c: tolerate `--preferred-pack` without bitmaps
@@ Commit message
unreferenced packs via 'git multi-pack-index expire').
In practice, this isn't possible to trigger when running `git
- multi-pack-index write` from via `git repack`, as the latter always
- passes `--stdin-packs`, which prevents us from loading an existing MIDX,
- as it forces all packs to be read from disk.
+ multi-pack-index write` from `git repack`, as the latter always passes
+ `--stdin-packs`, which prevents us from loading an existing MIDX, as it
+ forces all packs to be read from disk.
But a future commit in this series will change that behavior to
unconditionally load an existing MIDX, even with `--stdin-packs`, making
2: 07dad5a581 ! 2: 9d422efe6e midx-write.c: reduce argument count for `get_sorted_entries()`
@@ Commit message
Simplify the declaration of this function by taking a single pointer to
the whole `struct write_midx_context` instead of various members within
- it.
+ it. Since this function is now computing the entire result (populating
+ both `ctx->entries`, and `ctx->entries_nr`), rename it to something that
+ doesn't start with "get_" to make clear that this function has a
+ side-effect.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
@@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout
- uint32_t nr_packs,
- size_t *nr_objects,
- int preferred_pack)
-+static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx)
++static void compute_sorted_entries(struct write_midx_context *ctx)
{
uint32_t cur_fanout, cur_pack, cur_object;
size_t alloc_objects, total_objects = 0;
struct midx_fanout fanout = { 0 };
- struct pack_midx_entry *deduplicated_entries = NULL;
+- struct pack_midx_entry *deduplicated_entries = NULL;
- uint32_t start_pack = m ? m->num_packs : 0;
+ uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
@@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout
/*
* As we de-duplicate by fanout value, we expect the fanout
@@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
+ alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16;
ALLOC_ARRAY(fanout.entries, fanout.alloc);
- ALLOC_ARRAY(deduplicated_entries, alloc_objects);
+- ALLOC_ARRAY(deduplicated_entries, alloc_objects);
- *nr_objects = 0;
++ ALLOC_ARRAY(ctx->entries, alloc_objects);
+ ctx->entries_nr = 0;
for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
@@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pac
continue;
- ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1),
-+ ALLOC_GROW(deduplicated_entries, st_add(ctx->entries_nr, 1),
++ ALLOC_GROW(ctx->entries, st_add(ctx->entries_nr, 1),
alloc_objects);
- memcpy(&deduplicated_entries[*nr_objects],
-+ memcpy(&deduplicated_entries[ctx->entries_nr],
++ memcpy(&ctx->entries[ctx->entries_nr],
&fanout.entries[cur_object],
sizeof(struct pack_midx_entry));
- (*nr_objects)++;
@@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pac
}
}
+ free(fanout.entries);
+- return deduplicated_entries;
+ }
+
+ static int write_midx_pack_names(struct hashfile *f, void *data)
@@ midx-write.c: static int write_midx_internal(const char *object_dir,
}
}
- ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
- ctx.preferred_pack_idx);
-+ ctx.entries = get_sorted_entries(&ctx);
++ compute_sorted_entries(&ctx);
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
3: 7acf4557dc ! 3: e81296f8cc midx-write.c: pass `start_pack` to `get_sorted_entries()`
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- midx-write.c: pass `start_pack` to `get_sorted_entries()`
+ midx-write.c: pass `start_pack` to `compute_sorted_entries()`
- The function `get_sorted_entries()` is broadly responsible for
+ The function `compute_sorted_entries()` is broadly responsible for
building an array of the objects to be written into a MIDX based on the
provided list of packs.
@@ Commit message
The existing implementation simply skips past the first
ctx->m->num_packs (if ctx->m is non-NULL, indicating that we loaded an
- existing MIDX). Future changes (outside the scope of this patch series)
- to the MIDX code will require us to skip *at most* that number[^1].
+ existing MIDX). This is because we read objects in packs from an
+ existing MIDX via the MIDX itself, rather than from the pack-level
+ fanout to guarantee a de-duplicated result (see: a40498a1265 (midx: use
+ existing midx when writing new one, 2018-07-12)).
+
+ Future changes (outside the scope of this patch series) to the MIDX code
+ will require us to skip *at most* that number[^1].
We could tag each pack with a bit that indicates the pack's contents
should be included in the MIDX. But we can just as easily determine the
@@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout
* Copy only the de-duplicated entries (selected by most-recent modified time
* of a packfile containing the object).
*/
--static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx)
-+static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx,
-+ uint32_t start_pack)
+-static void compute_sorted_entries(struct write_midx_context *ctx)
++static void compute_sorted_entries(struct write_midx_context *ctx,
++ uint32_t start_pack)
{
uint32_t cur_fanout, cur_pack, cur_object;
size_t alloc_objects, total_objects = 0;
struct midx_fanout fanout = { 0 };
- struct pack_midx_entry *deduplicated_entries = NULL;
- uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
@@ midx-write.c: static int write_midx_internal(const char *object_dir,
}
}
-- ctx.entries = get_sorted_entries(&ctx);
-+ ctx.entries = get_sorted_entries(&ctx, start_pack);
+- compute_sorted_entries(&ctx);
++ compute_sorted_entries(&ctx, start_pack);
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
4: 3908546ea8 ! 4: 9cd9257465 midx-write.c: extract `should_include_pack()`
@@ Commit message
- or, appear in the "to_include" list, if invoking the MIDX write
machinery with the `--stdin-packs` command-line argument.
- In a future commit will want to call a slight variant of these checks
- from the code that reuses all packs from an existing MIDX, as well as
- the current location via add_pack_to_midx(). The latter will be
- modified in subsequent commits to only reuse packs which appear in the
- to_include list, if one was given.
+ A future commit will want to call a slight variant of these checks from
+ the code that reuses all packs from an existing MIDX, as well as the
+ current location via add_pack_to_midx(). The latter will be modified in
+ subsequent commits to only reuse packs which appear in the to_include
+ list, if one was given.
Prepare for that step by extracting these checks as a subroutine that
may be called from both places.
5: dc813ea1ca = 5: 1bb890e87c midx-write.c: extract `fill_packs_from_midx()`
6: 61268114c6 ! 6: fe187b1939 midx-write.c: support reading an existing MIDX with `packs_to_include`
@@ Commit message
reject all of the packs it provided since they appear in an existing
MIDX by definition.
+ The sum total of this change is that we are now able to read and
+ reference objects in an existing MIDX even when given a non-NULL
+ `packs_to_include`. This is a prerequisite step for incremental MIDXs,
+ which need to load any existing MIDX (if one is present) in order to
+ determine whether or not an object already appears in an earlier portion
+ of the MIDX to avoid duplicating it across multiple portions.
+
Signed-off-by: Taylor Blau <me@ttaylorr.com>
## midx-write.c ##
7: f4c0167f43 = 7: f4f977c1c7 midx: replace `get_midx_rev_filename()` with a generic helper
8: 79e3f7f83f = 8: bcadaf9278 pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper
base-commit: 3a57aa566a21e7a510c64881bc6bdff7eb397988
--
2.45.1.321.gbcadaf92783
next prev parent reply other threads:[~2024-05-29 22:55 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
2024-05-23 16:38 ` [PATCH 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps Taylor Blau
2024-05-29 7:47 ` Patrick Steinhardt
2024-05-29 22:29 ` Taylor Blau
2024-05-23 16:38 ` [PATCH 2/8] midx-write.c: reduce argument count for `get_sorted_entries()` Taylor Blau
2024-05-29 7:47 ` Patrick Steinhardt
2024-05-29 22:35 ` Taylor Blau
2024-05-23 16:38 ` [PATCH 3/8] midx-write.c: pass `start_pack` to `get_sorted_entries()` Taylor Blau
2024-05-29 7:48 ` Patrick Steinhardt
2024-05-29 22:36 ` Taylor Blau
2024-05-23 16:38 ` [PATCH 4/8] midx-write.c: extract `should_include_pack()` Taylor Blau
2024-05-29 7:48 ` Patrick Steinhardt
2024-05-29 22:40 ` Taylor Blau
2024-05-23 16:38 ` [PATCH 5/8] midx-write.c: extract `fill_packs_from_midx()` Taylor Blau
2024-05-23 16:38 ` [PATCH 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include` Taylor Blau
2024-05-29 7:48 ` Patrick Steinhardt
2024-05-29 22:46 ` Taylor Blau
2024-05-23 16:38 ` [PATCH 7/8] midx: replace `get_midx_rev_filename()` with a generic helper Taylor Blau
2024-05-23 16:38 ` [PATCH 8/8] pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper Taylor Blau
2024-05-29 22:55 ` Taylor Blau [this message]
2024-05-29 22:55 ` [PATCH v2 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps Taylor Blau
2024-05-29 22:55 ` [PATCH v2 2/8] midx-write.c: reduce argument count for `get_sorted_entries()` Taylor Blau
2024-05-29 22:55 ` [PATCH v2 3/8] midx-write.c: pass `start_pack` to `compute_sorted_entries()` Taylor Blau
2024-05-30 6:59 ` Jeff King
2024-05-29 22:55 ` [PATCH v2 4/8] midx-write.c: extract `should_include_pack()` Taylor Blau
2024-05-29 22:55 ` [PATCH v2 5/8] midx-write.c: extract `fill_packs_from_midx()` Taylor Blau
2024-05-29 22:55 ` [PATCH v2 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include` Taylor Blau
2024-05-30 7:05 ` Jeff King
2024-05-29 22:55 ` [PATCH v2 7/8] midx: replace `get_midx_rev_filename()` with a generic helper Taylor Blau
2024-05-30 7:10 ` Jeff King
2024-05-29 22:55 ` [PATCH v2 8/8] pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper Taylor Blau
2024-05-30 7:14 ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Jeff King
2024-05-30 13:59 ` Taylor Blau
2024-05-31 8:28 ` Patrick Steinhardt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1717023301.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).