All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <derrickstolee@github.com>,
	Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/7] pack-revindex: enable on-disk reverse indexes by default
Date: Wed, 12 Apr 2023 18:20:14 -0400	[thread overview]
Message-ID: <cover.1681338013.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1681166596.git.me@ttaylorr.com>

Here is a (tiny) reroll of my series to enable pack reverse-indexes to
be written to disk by default instead of computed on-the-fly in memory.

The original cover letter[1] and commits herein have all of the gore-y
details. Not much has changed since last time, except:

  - squashing in a patch from Stolee to propagate a `struct repository
    *` a little further out from `load_pack_revindex()`

  - a tweak to the linux-TEST-vars CI job to continue running it with
    GIT_TEST_NO_WRITE_REV_INDEX

  - and an additional benchmark to demonstrate the performance of `git
    cat-file --batch-check='%(objectsize:disk)' --batch--all-objects
    --unordered` with and without on-disk reverse indexes

As always, a range-diff is included below for convenience. Thanks in
advance for taking (another) look!

[1]: https://lore.kernel.org/git/cover.1681166596.git.me@ttaylorr.com/

Taylor Blau (7):
  pack-write.c: plug a leak in stage_tmp_packfiles()
  t5325: mark as leak-free
  pack-revindex: make `load_pack_revindex` take a repository
  pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
  pack-revindex: introduce `pack.readReverseIndex`
  config: enable `pack.writeReverseIndex` by default
  t: invert `GIT_TEST_WRITE_REV_INDEX`

 Documentation/config/pack.txt     |  8 +++++++-
 builtin/index-pack.c              |  5 +++--
 builtin/pack-objects.c            |  5 +++--
 ci/run-build-and-tests.sh         |  2 +-
 pack-bitmap.c                     | 23 +++++++++++++----------
 pack-revindex.c                   | 12 +++++++++---
 pack-revindex.h                   |  6 ++++--
 pack-write.c                      |  2 ++
 packfile.c                        |  2 +-
 repo-settings.c                   |  1 +
 repository.h                      |  1 +
 t/README                          |  2 +-
 t/perf/p5312-pack-bitmaps-revs.sh |  3 +--
 t/t5325-reverse-index.sh          | 16 +++++++++++++++-
 14 files changed, 62 insertions(+), 26 deletions(-)

Range-diff against v1:
-:  ----------- > 1:  18be29c3988 pack-write.c: plug a leak in stage_tmp_packfiles()
-:  ----------- > 2:  affb5e2574b t5325: mark as leak-free
1:  be4faf11011 ! 3:  687a9a58924 pack-revindex: make `load_pack_revindex` take a repository
    @@ Commit message
         to `load_pack_revindex`, and update all callers to pass the correct
         instance (in all cases, `the_repository`).
     
    -    Signed-off-by: Taylor Blauy <me@ttaylorr.com>
    +    In certain instances, a new function-local variable is introduced to
    +    take the place of a `struct repository *` argument to the function
    +    itself to avoid propagating the new parameter even further throughout
    +    the tree.
    +
    +    Co-authored-by: Derrick Stolee <derrickstolee@github.com>
    +    Signed-off-by: Derrick Stolee <derrickstolee@github.com>
    +    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## pack-bitmap.c ##
    +@@ pack-bitmap.c: static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
    + 	return 0;
    + }
    + 
    +-static int load_reverse_index(struct bitmap_index *bitmap_git)
    ++static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_git)
    + {
    + 	if (bitmap_is_midx(bitmap_git)) {
    + 		uint32_t i;
     @@ pack-bitmap.c: static int load_reverse_index(struct bitmap_index *bitmap_git)
      		 * since we will need to make use of them in pack-objects.
      		 */
      		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
     -			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
    -+			ret = load_pack_revindex(the_repository,
    -+						 bitmap_git->midx->packs[i]);
    ++			ret = load_pack_revindex(r, bitmap_git->midx->packs[i]);
      			if (ret)
      				return ret;
      		}
      		return 0;
      	}
     -	return load_pack_revindex(bitmap_git->pack);
    -+	return load_pack_revindex(the_repository, bitmap_git->pack);
    ++	return load_pack_revindex(r, bitmap_git->pack);
      }
      
    - static int load_bitmap(struct bitmap_index *bitmap_git)
    +-static int load_bitmap(struct bitmap_index *bitmap_git)
    ++static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
    + {
    + 	assert(bitmap_git->map);
    + 
    + 	bitmap_git->bitmaps = kh_init_oid_map();
    + 	bitmap_git->ext_index.positions = kh_init_oid_pos();
    + 
    +-	if (load_reverse_index(bitmap_git))
    ++	if (load_reverse_index(r, bitmap_git))
    + 		goto failed;
    + 
    + 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
    +@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_git(struct repository *r)
    + {
    + 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
    + 
    +-	if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git))
    ++	if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git))
    + 		return bitmap_git;
    + 
    + 	free_bitmap_index(bitmap_git);
    +@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_git(struct repository *r)
    + 
    + struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx)
    + {
    ++	struct repository *r = the_repository;
    + 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
    + 
    +-	if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(bitmap_git))
    ++	if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git))
    + 		return bitmap_git;
    + 
    + 	free_bitmap_index(bitmap_git);
    +@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
    + 	 * from disk. this is the point of no return; after this the rev_list
    + 	 * becomes invalidated and we must perform the revwalk through bitmaps
    + 	 */
    +-	if (load_bitmap(bitmap_git) < 0)
    ++	if (load_bitmap(revs->repo, bitmap_git) < 0)
    + 		goto cleanup;
    + 
    + 	object_array_clear(&revs->pending);
    +@@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
    + 				       uint32_t *entries,
    + 				       struct bitmap **reuse_out)
    + {
    ++	struct repository *r = the_repository;
    + 	struct packed_git *pack;
    + 	struct bitmap *result = bitmap_git->result;
    + 	struct bitmap *reuse;
    +@@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
    + 
    + 	assert(result);
    + 
    +-	load_reverse_index(bitmap_git);
    ++	load_reverse_index(r, bitmap_git);
    + 
    + 	if (bitmap_is_midx(bitmap_git))
    + 		pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
    +@@ pack-bitmap.c: int rebuild_bitmap(const uint32_t *reposition,
    + uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
    + 				struct packing_data *mapping)
    + {
    ++	struct repository *r = the_repository;
    + 	uint32_t i, num_objects;
    + 	uint32_t *reposition;
    + 
    + 	if (!bitmap_is_midx(bitmap_git))
    +-		load_reverse_index(bitmap_git);
    ++		load_reverse_index(r, bitmap_git);
    + 	else if (load_midx_revindex(bitmap_git->midx) < 0)
    + 		BUG("rebuild_existing_bitmaps: missing required rev-cache "
    + 		    "extension");
     
      ## pack-revindex.c ##
     @@ pack-revindex.c: static int load_pack_revindex_from_disk(struct packed_git *p)
2:  0f368e2347e = 4:  8eec5bacd3a pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
3:  6f692d470cb ! 5:  a62fc3e4ec1 pack-revindex: introduce `pack.readReverseIndex`
    @@ Commit message
               'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran
                 2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'
     
    +    Luckily, the results when running `git cat-file` with `--unordered` are
    +    closer together:
    +
    +        $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
    +        Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
    +          Time (mean ± σ):      5.066 s ±  0.105 s    [User: 4.792 s, System: 0.274 s]
    +          Range (min … max):    4.943 s …  5.220 s    10 runs
    +
    +        Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
    +          Time (mean ± σ):      6.193 s ±  0.069 s    [User: 5.937 s, System: 0.255 s]
    +          Range (min … max):    6.145 s …  6.356 s    10 runs
    +
    +        Summary
    +          'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran
    +            1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
    +
         Because the equilibrium point between these two is highly machine- and
         repository-dependent, allow users to configure whether or not they will
         read any ".rev" file(s) with this configuration knob.
4:  56a0fc0098e = 6:  f8298fb0bac config: enable `pack.writeReverseIndex` by default
5:  9c803799588 ! 7:  edff6a80c63 t: invert `GIT_TEST_WRITE_REV_INDEX`
    @@ Commit message
         disable writing ".rev" files, thereby running the test suite in a mode
         where the reverse index is generated from scratch.
     
    -    This ensures that we are still running and exercising Git's behavior
    -    when forced to generate reverse indexes from scratch.
    +    This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some
    +    spelling of "true", we are still running and exercising Git's behavior
    +    when forced to generate reverse indexes from scratch. Do so by setting
    +    it in the linux-TEST-vars CI run to ensure that we are maintaining good
    +    coverage of this now-legacy code.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ ci/run-build-and-tests.sh: linux-TEST-vars)
      	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
      	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
     -	export GIT_TEST_WRITE_REV_INDEX=1
    ++	export GIT_TEST_NO_WRITE_REV_INDEX=1
      	export GIT_TEST_CHECKOUT_WORKERS=2
      	;;
      linux-clang)
-- 
2.40.0.323.gedff6a80c63

  parent reply	other threads:[~2023-04-12 22:20 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-10 22:53 [PATCH 0/7] pack-revindex: enable on-disk reverse indexes by default Taylor Blau
2023-04-10 22:53 ` [PATCH 1/7] pack-write.c: plug a leak in stage_tmp_packfiles() Taylor Blau
2023-04-11 13:23   ` Derrick Stolee
2023-04-11 21:25     ` Taylor Blau
2023-04-10 22:53 ` [PATCH 2/7] t5325: mark as leak-free Taylor Blau
2023-04-10 22:53 ` [PATCH 3/7] pack-revindex: make `load_pack_revindex` take a repository Taylor Blau
2023-04-11 13:45   ` Derrick Stolee
2023-04-11 21:30     ` Taylor Blau
2023-04-12 17:33       ` Derrick Stolee
2023-04-10 22:53 ` [PATCH 4/7] pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK Taylor Blau
2023-04-10 22:53 ` [PATCH 5/7] pack-revindex: introduce `pack.readReverseIndex` Taylor Blau
2023-04-10 22:53 ` [PATCH 6/7] config: enable `pack.writeReverseIndex` by default Taylor Blau
2023-04-13 16:14   ` Junio C Hamano
2023-04-10 22:53 ` [PATCH 7/7] t: invert `GIT_TEST_WRITE_REV_INDEX` Taylor Blau
2023-04-11 13:51   ` Derrick Stolee
2023-04-11 21:33     ` Taylor Blau
2023-04-12 17:37       ` Derrick Stolee
2023-04-11 13:54 ` [PATCH 0/7] pack-revindex: enable on-disk reverse indexes by default Derrick Stolee
2023-04-11 21:40   ` Taylor Blau
2023-04-12 17:39     ` Derrick Stolee
2023-04-12 22:20 ` Taylor Blau [this message]
2023-04-12 22:20   ` [PATCH v2 1/7] pack-write.c: plug a leak in stage_tmp_packfiles() Taylor Blau
2023-04-12 22:20   ` [PATCH v2 2/7] t5325: mark as leak-free Taylor Blau
2023-04-12 22:20   ` [PATCH v2 3/7] pack-revindex: make `load_pack_revindex` take a repository Taylor Blau
2023-04-12 22:20   ` [PATCH v2 4/7] pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK Taylor Blau
2023-04-12 22:20   ` [PATCH v2 5/7] pack-revindex: introduce `pack.readReverseIndex` Taylor Blau
2023-04-12 22:20   ` [PATCH v2 6/7] config: enable `pack.writeReverseIndex` by default Taylor Blau
2023-04-12 22:20   ` [PATCH v2 7/7] t: invert `GIT_TEST_WRITE_REV_INDEX` Taylor Blau
2023-04-13 13:40   ` [PATCH v2 0/7] pack-revindex: enable on-disk reverse indexes by default Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1681338013.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.