From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <derrickstolee@github.com>,
Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v2 0/7] pack-revindex: enable on-disk reverse indexes by default
Date: Wed, 12 Apr 2023 18:20:14 -0400 [thread overview]
Message-ID: <cover.1681338013.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1681166596.git.me@ttaylorr.com>
Here is a (tiny) reroll of my series to enable pack reverse-indexes to
be written to disk by default instead of computed on-the-fly in memory.
The original cover letter[1] and commits herein have all of the gore-y
details. Not much has changed since last time, except:
- squashing in a patch from Stolee to propagate a `struct repository
*` a little further out from `load_pack_revindex()`
- a tweak to the linux-TEST-vars CI job to continue running it with
GIT_TEST_NO_WRITE_REV_INDEX
- and an additional benchmark to demonstrate the performance of `git
cat-file --batch-check='%(objectsize:disk)' --batch--all-objects
--unordered` with and without on-disk reverse indexes
As always, a range-diff is included below for convenience. Thanks in
advance for taking (another) look!
[1]: https://lore.kernel.org/git/cover.1681166596.git.me@ttaylorr.com/
Taylor Blau (7):
pack-write.c: plug a leak in stage_tmp_packfiles()
t5325: mark as leak-free
pack-revindex: make `load_pack_revindex` take a repository
pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
pack-revindex: introduce `pack.readReverseIndex`
config: enable `pack.writeReverseIndex` by default
t: invert `GIT_TEST_WRITE_REV_INDEX`
Documentation/config/pack.txt | 8 +++++++-
builtin/index-pack.c | 5 +++--
builtin/pack-objects.c | 5 +++--
ci/run-build-and-tests.sh | 2 +-
pack-bitmap.c | 23 +++++++++++++----------
pack-revindex.c | 12 +++++++++---
pack-revindex.h | 6 ++++--
pack-write.c | 2 ++
packfile.c | 2 +-
repo-settings.c | 1 +
repository.h | 1 +
t/README | 2 +-
t/perf/p5312-pack-bitmaps-revs.sh | 3 +--
t/t5325-reverse-index.sh | 16 +++++++++++++++-
14 files changed, 62 insertions(+), 26 deletions(-)
Range-diff against v1:
-: ----------- > 1: 18be29c3988 pack-write.c: plug a leak in stage_tmp_packfiles()
-: ----------- > 2: affb5e2574b t5325: mark as leak-free
1: be4faf11011 ! 3: 687a9a58924 pack-revindex: make `load_pack_revindex` take a repository
@@ Commit message
to `load_pack_revindex`, and update all callers to pass the correct
instance (in all cases, `the_repository`).
- Signed-off-by: Taylor Blauy <me@ttaylorr.com>
+ In certain instances, a new function-local variable is introduced to
+ take the place of a `struct repository *` argument to the function
+ itself to avoid propagating the new parameter even further throughout
+ the tree.
+
+ Co-authored-by: Derrick Stolee <derrickstolee@github.com>
+ Signed-off-by: Derrick Stolee <derrickstolee@github.com>
+ Signed-off-by: Taylor Blau <me@ttaylorr.com>
## pack-bitmap.c ##
+@@ pack-bitmap.c: static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
+ return 0;
+ }
+
+-static int load_reverse_index(struct bitmap_index *bitmap_git)
++static int load_reverse_index(struct repository *r, struct bitmap_index *bitmap_git)
+ {
+ if (bitmap_is_midx(bitmap_git)) {
+ uint32_t i;
@@ pack-bitmap.c: static int load_reverse_index(struct bitmap_index *bitmap_git)
* since we will need to make use of them in pack-objects.
*/
for (i = 0; i < bitmap_git->midx->num_packs; i++) {
- ret = load_pack_revindex(bitmap_git->midx->packs[i]);
-+ ret = load_pack_revindex(the_repository,
-+ bitmap_git->midx->packs[i]);
++ ret = load_pack_revindex(r, bitmap_git->midx->packs[i]);
if (ret)
return ret;
}
return 0;
}
- return load_pack_revindex(bitmap_git->pack);
-+ return load_pack_revindex(the_repository, bitmap_git->pack);
++ return load_pack_revindex(r, bitmap_git->pack);
}
- static int load_bitmap(struct bitmap_index *bitmap_git)
+-static int load_bitmap(struct bitmap_index *bitmap_git)
++static int load_bitmap(struct repository *r, struct bitmap_index *bitmap_git)
+ {
+ assert(bitmap_git->map);
+
+ bitmap_git->bitmaps = kh_init_oid_map();
+ bitmap_git->ext_index.positions = kh_init_oid_pos();
+
+- if (load_reverse_index(bitmap_git))
++ if (load_reverse_index(r, bitmap_git))
+ goto failed;
+
+ if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
+@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_git(struct repository *r)
+ {
+ struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
+
+- if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git))
++ if (!open_bitmap(r, bitmap_git) && !load_bitmap(r, bitmap_git))
+ return bitmap_git;
+
+ free_bitmap_index(bitmap_git);
+@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_git(struct repository *r)
+
+ struct bitmap_index *prepare_midx_bitmap_git(struct multi_pack_index *midx)
+ {
++ struct repository *r = the_repository;
+ struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
+
+- if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(bitmap_git))
++ if (!open_midx_bitmap_1(bitmap_git, midx) && !load_bitmap(r, bitmap_git))
+ return bitmap_git;
+
+ free_bitmap_index(bitmap_git);
+@@ pack-bitmap.c: struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
+ * from disk. this is the point of no return; after this the rev_list
+ * becomes invalidated and we must perform the revwalk through bitmaps
+ */
+- if (load_bitmap(bitmap_git) < 0)
++ if (load_bitmap(revs->repo, bitmap_git) < 0)
+ goto cleanup;
+
+ object_array_clear(&revs->pending);
+@@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
+ uint32_t *entries,
+ struct bitmap **reuse_out)
+ {
++ struct repository *r = the_repository;
+ struct packed_git *pack;
+ struct bitmap *result = bitmap_git->result;
+ struct bitmap *reuse;
+@@ pack-bitmap.c: int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
+
+ assert(result);
+
+- load_reverse_index(bitmap_git);
++ load_reverse_index(r, bitmap_git);
+
+ if (bitmap_is_midx(bitmap_git))
+ pack = bitmap_git->midx->packs[midx_preferred_pack(bitmap_git)];
+@@ pack-bitmap.c: int rebuild_bitmap(const uint32_t *reposition,
+ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
+ struct packing_data *mapping)
+ {
++ struct repository *r = the_repository;
+ uint32_t i, num_objects;
+ uint32_t *reposition;
+
+ if (!bitmap_is_midx(bitmap_git))
+- load_reverse_index(bitmap_git);
++ load_reverse_index(r, bitmap_git);
+ else if (load_midx_revindex(bitmap_git->midx) < 0)
+ BUG("rebuild_existing_bitmaps: missing required rev-cache "
+ "extension");
## pack-revindex.c ##
@@ pack-revindex.c: static int load_pack_revindex_from_disk(struct packed_git *p)
2: 0f368e2347e = 4: 8eec5bacd3a pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
3: 6f692d470cb ! 5: a62fc3e4ec1 pack-revindex: introduce `pack.readReverseIndex`
@@ Commit message
'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran
2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'
+ Luckily, the results when running `git cat-file` with `--unordered` are
+ closer together:
+
+ $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
+ Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
+ Time (mean ± σ): 5.066 s ± 0.105 s [User: 4.792 s, System: 0.274 s]
+ Range (min … max): 4.943 s … 5.220 s 10 runs
+
+ Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
+ Time (mean ± σ): 6.193 s ± 0.069 s [User: 5.937 s, System: 0.255 s]
+ Range (min … max): 6.145 s … 6.356 s 10 runs
+
+ Summary
+ 'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran
+ 1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
+
Because the equilibrium point between these two is highly machine- and
repository-dependent, allow users to configure whether or not they will
read any ".rev" file(s) with this configuration knob.
4: 56a0fc0098e = 6: f8298fb0bac config: enable `pack.writeReverseIndex` by default
5: 9c803799588 ! 7: edff6a80c63 t: invert `GIT_TEST_WRITE_REV_INDEX`
@@ Commit message
disable writing ".rev" files, thereby running the test suite in a mode
where the reverse index is generated from scratch.
- This ensures that we are still running and exercising Git's behavior
- when forced to generate reverse indexes from scratch.
+ This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some
+ spelling of "true", we are still running and exercising Git's behavior
+ when forced to generate reverse indexes from scratch. Do so by setting
+ it in the linux-TEST-vars CI run to ensure that we are maintaining good
+ coverage of this now-legacy code.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
@@ ci/run-build-and-tests.sh: linux-TEST-vars)
export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
- export GIT_TEST_WRITE_REV_INDEX=1
++ export GIT_TEST_NO_WRITE_REV_INDEX=1
export GIT_TEST_CHECKOUT_WORKERS=2
;;
linux-clang)
--
2.40.0.323.gedff6a80c63
next prev parent reply other threads:[~2023-04-12 22:20 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-10 22:53 [PATCH 0/7] pack-revindex: enable on-disk reverse indexes by default Taylor Blau
2023-04-10 22:53 ` [PATCH 1/7] pack-write.c: plug a leak in stage_tmp_packfiles() Taylor Blau
2023-04-11 13:23 ` Derrick Stolee
2023-04-11 21:25 ` Taylor Blau
2023-04-10 22:53 ` [PATCH 2/7] t5325: mark as leak-free Taylor Blau
2023-04-10 22:53 ` [PATCH 3/7] pack-revindex: make `load_pack_revindex` take a repository Taylor Blau
2023-04-11 13:45 ` Derrick Stolee
2023-04-11 21:30 ` Taylor Blau
2023-04-12 17:33 ` Derrick Stolee
2023-04-10 22:53 ` [PATCH 4/7] pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK Taylor Blau
2023-04-10 22:53 ` [PATCH 5/7] pack-revindex: introduce `pack.readReverseIndex` Taylor Blau
2023-04-10 22:53 ` [PATCH 6/7] config: enable `pack.writeReverseIndex` by default Taylor Blau
2023-04-13 16:14 ` Junio C Hamano
2023-04-10 22:53 ` [PATCH 7/7] t: invert `GIT_TEST_WRITE_REV_INDEX` Taylor Blau
2023-04-11 13:51 ` Derrick Stolee
2023-04-11 21:33 ` Taylor Blau
2023-04-12 17:37 ` Derrick Stolee
2023-04-11 13:54 ` [PATCH 0/7] pack-revindex: enable on-disk reverse indexes by default Derrick Stolee
2023-04-11 21:40 ` Taylor Blau
2023-04-12 17:39 ` Derrick Stolee
2023-04-12 22:20 ` Taylor Blau [this message]
2023-04-12 22:20 ` [PATCH v2 1/7] pack-write.c: plug a leak in stage_tmp_packfiles() Taylor Blau
2023-04-12 22:20 ` [PATCH v2 2/7] t5325: mark as leak-free Taylor Blau
2023-04-12 22:20 ` [PATCH v2 3/7] pack-revindex: make `load_pack_revindex` take a repository Taylor Blau
2023-04-12 22:20 ` [PATCH v2 4/7] pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK Taylor Blau
2023-04-12 22:20 ` [PATCH v2 5/7] pack-revindex: introduce `pack.readReverseIndex` Taylor Blau
2023-04-12 22:20 ` [PATCH v2 6/7] config: enable `pack.writeReverseIndex` by default Taylor Blau
2023-04-12 22:20 ` [PATCH v2 7/7] t: invert `GIT_TEST_WRITE_REV_INDEX` Taylor Blau
2023-04-13 13:40 ` [PATCH v2 0/7] pack-revindex: enable on-disk reverse indexes by default Derrick Stolee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1681338013.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).