All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>,
	Patrick Steinhardt <ps@pks.im>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs
Date: Wed, 29 May 2024 18:55:15 -0400	[thread overview]
Message-ID: <cover.1717023301.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1716482279.git.me@ttaylorr.com>

This is a small reroll of my series which has a grab-bag of midx-write
related cleanups that I pulled out of a larger series to implement
incremental MIDX chains.

This series is mostly the same as last time, but notable changes
include:

  - renamed `get_sorted_entries()` to `compute_sorted_entries()` to
    avoid confusion that the function performs a side-effecting write to
    `ctx->entries_nr`.

  - fixed a handful of typos

  - add explanations in a couple of the patches to better motivate the
    change.

Thanks to Patrick Steinhardt for his careful review on the previous
round!

Taylor Blau (8):
  midx-write.c: tolerate `--preferred-pack` without bitmaps
  midx-write.c: reduce argument count for `get_sorted_entries()`
  midx-write.c: pass `start_pack` to `compute_sorted_entries()`
  midx-write.c: extract `should_include_pack()`
  midx-write.c: extract `fill_packs_from_midx()`
  midx-write.c: support reading an existing MIDX with `packs_to_include`
  midx: replace `get_midx_rev_filename()` with a generic helper
  pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper

 midx-write.c                | 161 ++++++++++++++++++------------------
 midx.c                      |  14 ++--
 midx.h                      |   6 +-
 pack-bitmap.c               |   5 +-
 pack-revindex.c             |   3 +-
 t/t5319-multi-pack-index.sh |  23 ++++++
 6 files changed, 119 insertions(+), 93 deletions(-)

Range-diff against v1:
1:  c753bc379b ! 1:  ad268bd264 midx-write.c: tolerate `--preferred-pack` without bitmaps
    @@ Commit message
         unreferenced packs via 'git multi-pack-index expire').
     
         In practice, this isn't possible to trigger when running `git
    -    multi-pack-index write` from via `git repack`, as the latter always
    -    passes `--stdin-packs`, which prevents us from loading an existing MIDX,
    -    as it forces all packs to be read from disk.
    +    multi-pack-index write` from `git repack`, as the latter always passes
    +    `--stdin-packs`, which prevents us from loading an existing MIDX, as it
    +    forces all packs to be read from disk.
     
         But a future commit in this series will change that behavior to
         unconditionally load an existing MIDX, even with `--stdin-packs`, making
2:  07dad5a581 ! 2:  9d422efe6e midx-write.c: reduce argument count for `get_sorted_entries()`
    @@ Commit message
     
         Simplify the declaration of this function by taking a single pointer to
         the whole `struct write_midx_context` instead of various members within
    -    it.
    +    it. Since this function is now computing the entire result (populating
    +    both `ctx->entries`, and `ctx->entries_nr`), rename it to something that
    +    doesn't start with "get_" to make clear that this function has a
    +    side-effect.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout
     -						  uint32_t nr_packs,
     -						  size_t *nr_objects,
     -						  int preferred_pack)
    -+static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx)
    ++static void compute_sorted_entries(struct write_midx_context *ctx)
      {
      	uint32_t cur_fanout, cur_pack, cur_object;
      	size_t alloc_objects, total_objects = 0;
      	struct midx_fanout fanout = { 0 };
    - 	struct pack_midx_entry *deduplicated_entries = NULL;
    +-	struct pack_midx_entry *deduplicated_entries = NULL;
     -	uint32_t start_pack = m ? m->num_packs : 0;
     +	uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
      
    @@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout
      	/*
      	 * As we de-duplicate by fanout value, we expect the fanout
     @@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
    + 	alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16;
      
      	ALLOC_ARRAY(fanout.entries, fanout.alloc);
    - 	ALLOC_ARRAY(deduplicated_entries, alloc_objects);
    +-	ALLOC_ARRAY(deduplicated_entries, alloc_objects);
     -	*nr_objects = 0;
    ++	ALLOC_ARRAY(ctx->entries, alloc_objects);
     +	ctx->entries_nr = 0;
      
      	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
    @@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pac
      				continue;
      
     -			ALLOC_GROW(deduplicated_entries, st_add(*nr_objects, 1),
    -+			ALLOC_GROW(deduplicated_entries, st_add(ctx->entries_nr, 1),
    ++			ALLOC_GROW(ctx->entries, st_add(ctx->entries_nr, 1),
      				   alloc_objects);
     -			memcpy(&deduplicated_entries[*nr_objects],
    -+			memcpy(&deduplicated_entries[ctx->entries_nr],
    ++			memcpy(&ctx->entries[ctx->entries_nr],
      			       &fanout.entries[cur_object],
      			       sizeof(struct pack_midx_entry));
     -			(*nr_objects)++;
    @@ midx-write.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pac
      		}
      	}
      
    + 	free(fanout.entries);
    +-	return deduplicated_entries;
    + }
    + 
    + static int write_midx_pack_names(struct hashfile *f, void *data)
     @@ midx-write.c: static int write_midx_internal(const char *object_dir,
      		}
      	}
      
     -	ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr,
     -					 ctx.preferred_pack_idx);
    -+	ctx.entries = get_sorted_entries(&ctx);
    ++	compute_sorted_entries(&ctx);
      
      	ctx.large_offsets_needed = 0;
      	for (i = 0; i < ctx.entries_nr; i++) {
3:  7acf4557dc ! 3:  e81296f8cc midx-write.c: pass `start_pack` to `get_sorted_entries()`
    @@ Metadata
     Author: Taylor Blau <me@ttaylorr.com>
     
      ## Commit message ##
    -    midx-write.c: pass `start_pack` to `get_sorted_entries()`
    +    midx-write.c: pass `start_pack` to `compute_sorted_entries()`
     
    -    The function `get_sorted_entries()` is broadly responsible for
    +    The function `compute_sorted_entries()` is broadly responsible for
         building an array of the objects to be written into a MIDX based on the
         provided list of packs.
     
    @@ Commit message
     
         The existing implementation simply skips past the first
         ctx->m->num_packs (if ctx->m is non-NULL, indicating that we loaded an
    -    existing MIDX). Future changes (outside the scope of this patch series)
    -    to the MIDX code will require us to skip *at most* that number[^1].
    +    existing MIDX). This is because we read objects in packs from an
    +    existing MIDX via the MIDX itself, rather than from the pack-level
    +    fanout to guarantee a de-duplicated result (see: a40498a1265 (midx: use
    +    existing midx when writing new one, 2018-07-12)).
    +
    +    Future changes (outside the scope of this patch series) to the MIDX code
    +    will require us to skip *at most* that number[^1].
     
         We could tag each pack with a bit that indicates the pack's contents
         should be included in the MIDX. But we can just as easily determine the
    @@ midx-write.c: static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout
       * Copy only the de-duplicated entries (selected by most-recent modified time
       * of a packfile containing the object).
       */
    --static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx)
    -+static struct pack_midx_entry *get_sorted_entries(struct write_midx_context *ctx,
    -+						  uint32_t start_pack)
    +-static void compute_sorted_entries(struct write_midx_context *ctx)
    ++static void compute_sorted_entries(struct write_midx_context *ctx,
    ++				   uint32_t start_pack)
      {
      	uint32_t cur_fanout, cur_pack, cur_object;
      	size_t alloc_objects, total_objects = 0;
      	struct midx_fanout fanout = { 0 };
    - 	struct pack_midx_entry *deduplicated_entries = NULL;
     -	uint32_t start_pack = ctx->m ? ctx->m->num_packs : 0;
      
      	for (cur_pack = start_pack; cur_pack < ctx->nr; cur_pack++)
    @@ midx-write.c: static int write_midx_internal(const char *object_dir,
      		}
      	}
      
    --	ctx.entries = get_sorted_entries(&ctx);
    -+	ctx.entries = get_sorted_entries(&ctx, start_pack);
    +-	compute_sorted_entries(&ctx);
    ++	compute_sorted_entries(&ctx, start_pack);
      
      	ctx.large_offsets_needed = 0;
      	for (i = 0; i < ctx.entries_nr; i++) {
4:  3908546ea8 ! 4:  9cd9257465 midx-write.c: extract `should_include_pack()`
    @@ Commit message
           - or, appear in the "to_include" list, if invoking the MIDX write
             machinery with the `--stdin-packs` command-line argument.
     
    -    In a future commit will want to call a slight variant of these checks
    -    from the code that reuses all packs from an existing MIDX, as well as
    -    the current location via add_pack_to_midx(). The latter will be
    -    modified in subsequent commits to only reuse packs which appear in the
    -    to_include list, if one was given.
    +    A future commit will want to call a slight variant of these checks from
    +    the code that reuses all packs from an existing MIDX, as well as the
    +    current location via add_pack_to_midx(). The latter will be modified in
    +    subsequent commits to only reuse packs which appear in the to_include
    +    list, if one was given.
     
         Prepare for that step by extracting these checks as a subroutine that
         may be called from both places.
5:  dc813ea1ca = 5:  1bb890e87c midx-write.c: extract `fill_packs_from_midx()`
6:  61268114c6 ! 6:  fe187b1939 midx-write.c: support reading an existing MIDX with `packs_to_include`
    @@ Commit message
         reject all of the packs it provided since they appear in an existing
         MIDX by definition.
     
    +    The sum total of this change is that we are now able to read and
    +    reference objects in an existing MIDX even when given a non-NULL
    +    `packs_to_include`. This is a prerequisite step for incremental MIDXs,
    +    which need to load any existing MIDX (if one is present) in order to
    +    determine whether or not an object already appears in an earlier portion
    +    of the MIDX to avoid duplicating it across multiple portions.
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## midx-write.c ##
7:  f4c0167f43 = 7:  f4f977c1c7 midx: replace `get_midx_rev_filename()` with a generic helper
8:  79e3f7f83f = 8:  bcadaf9278 pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper

base-commit: 3a57aa566a21e7a510c64881bc6bdff7eb397988
-- 
2.45.1.321.gbcadaf92783

  parent reply	other threads:[~2024-05-29 22:55 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-23 16:37 [PATCH 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Taylor Blau
2024-05-23 16:38 ` [PATCH 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps Taylor Blau
2024-05-29  7:47   ` Patrick Steinhardt
2024-05-29 22:29     ` Taylor Blau
2024-05-23 16:38 ` [PATCH 2/8] midx-write.c: reduce argument count for `get_sorted_entries()` Taylor Blau
2024-05-29  7:47   ` Patrick Steinhardt
2024-05-29 22:35     ` Taylor Blau
2024-05-23 16:38 ` [PATCH 3/8] midx-write.c: pass `start_pack` to `get_sorted_entries()` Taylor Blau
2024-05-29  7:48   ` Patrick Steinhardt
2024-05-29 22:36     ` Taylor Blau
2024-05-23 16:38 ` [PATCH 4/8] midx-write.c: extract `should_include_pack()` Taylor Blau
2024-05-29  7:48   ` Patrick Steinhardt
2024-05-29 22:40     ` Taylor Blau
2024-05-23 16:38 ` [PATCH 5/8] midx-write.c: extract `fill_packs_from_midx()` Taylor Blau
2024-05-23 16:38 ` [PATCH 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include` Taylor Blau
2024-05-29  7:48   ` Patrick Steinhardt
2024-05-29 22:46     ` Taylor Blau
2024-05-23 16:38 ` [PATCH 7/8] midx: replace `get_midx_rev_filename()` with a generic helper Taylor Blau
2024-05-23 16:38 ` [PATCH 8/8] pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper Taylor Blau
2024-05-29 22:55 ` Taylor Blau [this message]
2024-05-29 22:55   ` [PATCH v2 1/8] midx-write.c: tolerate `--preferred-pack` without bitmaps Taylor Blau
2024-05-29 22:55   ` [PATCH v2 2/8] midx-write.c: reduce argument count for `get_sorted_entries()` Taylor Blau
2024-05-29 22:55   ` [PATCH v2 3/8] midx-write.c: pass `start_pack` to `compute_sorted_entries()` Taylor Blau
2024-05-30  6:59     ` Jeff King
2024-05-29 22:55   ` [PATCH v2 4/8] midx-write.c: extract `should_include_pack()` Taylor Blau
2024-05-29 22:55   ` [PATCH v2 5/8] midx-write.c: extract `fill_packs_from_midx()` Taylor Blau
2024-05-29 22:55   ` [PATCH v2 6/8] midx-write.c: support reading an existing MIDX with `packs_to_include` Taylor Blau
2024-05-30  7:05     ` Jeff King
2024-05-29 22:55   ` [PATCH v2 7/8] midx: replace `get_midx_rev_filename()` with a generic helper Taylor Blau
2024-05-30  7:10     ` Jeff King
2024-05-29 22:55   ` [PATCH v2 8/8] pack-bitmap.c: reimplement `midx_bitmap_filename()` with helper Taylor Blau
2024-05-30  7:14   ` [PATCH v2 0/8] midx-write: miscellaneous clean-ups for incremental MIDXs Jeff King
2024-05-30 13:59     ` Taylor Blau
2024-05-31  8:28     ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1717023301.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.