From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
Elijah Newren <newren@gmail.com>,
Derrick Stolee <stolee@gmail.com>
Subject: Re: [PATCH 6/8] pack-bitmap: sort bitmaps before XORing
Date: Wed, 27 May 2026 12:56:00 -0400 [thread overview]
Message-ID: <ahciIDuESxNa9Fzn@nand.local> (raw)
In-Reply-To: <20260527100406.GG981444@coredump.intra.peff.net>
On Wed, May 27, 2026 at 06:04:06AM -0400, Jeff King wrote:
> On Tue, May 19, 2026 at 12:12:50PM -0400, Taylor Blau wrote:
>
> > Reachability bitmaps may be stored as XORs against nearby bitmaps, up to
> > 10 away. However, when callers provide selected commits in an arbitrary
> > order, the writer may miss good ancestor/descendant pairs and produce
> > much larger bitmap files without changing query coverage.
> >
> > Sort the selected bitmaps in date order (from oldest to newest) before
> > computing XOR offsets, leaving pseudo-merge bitmaps alone (which we will
> > deal with separately in following commits).
>
> That order certainly makes the most sense. I'd have thought we ended up
> there incidentally because of the order in which we consider the
> commits, but perhaps not. I wonder if this got much worse when we
> re-wrote the bitmap generation code a few years ago.
>
> That was in v2.31.0, I think. Repacking linux.git with bitmaps, though,
> I couldn't find any difference in size between v2.30 and v2.31. They're
> both ~67M. But that also didn't shrink with this patch, either.
>
> If you have some spare CPU cycles to burn, I would be interested in a
> comparison of the bitmap size of your test repo using v2.30.0, v2.31.1,
> and this patch.
I started running this experiment, but I don't think I actually have
enough CPU cycles to let it finish ;-). Pre-v2.31 bitmap generation is
*really* slow[^1], and after multiple hours (forcing the same selection
of bitmaps by back-porting and adjusting 'test-tool bitmap') I couldn't
seem to make any meaningful progress.
I'm sure that you could get some plausible numbers out of benchmarking
this on a smaller repository. In case you're interested, here's the
patch I wrote on top of v2.30.0:
--- 8< ---
diff --git a/Makefile b/Makefile
index 7b64106930a..9ce9f2b483c 100644
--- a/Makefile
+++ b/Makefile
@@ -690,6 +690,7 @@ X =
PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
TEST_BUILTINS_OBJS += test-advise.o
+TEST_BUILTINS_OBJS += test-bitmap.o
TEST_BUILTINS_OBJS += test-bloom.o
TEST_BUILTINS_OBJS += test-chmtime.o
TEST_BUILTINS_OBJS += test-config.o
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 5e998bdaa79..55dbb475120 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -113,7 +113,7 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
static struct object **seen_objects;
static unsigned int seen_objects_nr, seen_objects_alloc;
-static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused)
+void bitmap_writer_push_commit(struct commit *commit, struct ewah_bitmap *reused)
{
if (writer.selected_nr >= writer.selected_alloc) {
writer.selected_alloc = (writer.selected_alloc + 32) * 2;
@@ -402,7 +402,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
if (indexed_commits_nr < 100) {
for (i = 0; i < indexed_commits_nr; ++i)
- push_bitmapped_commit(indexed_commits[i], NULL);
+ bitmap_writer_push_commit(indexed_commits[i], NULL);
return;
}
@@ -440,7 +440,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits,
}
}
- push_bitmapped_commit(chosen, reused_bitmap);
+ bitmap_writer_push_commit(chosen, reused_bitmap);
i += next + 1;
display_progress(writer.progress, i);
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 1203120c432..a882efdb16a 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -74,6 +74,8 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack,
struct pack_idx_entry **index,
uint32_t index_nr);
void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack);
+void bitmap_writer_push_commit(struct commit *commit,
+ struct ewah_bitmap *reused);
void bitmap_writer_select_commits(struct commit **indexed_commits,
unsigned int indexed_commits_nr, int max_bitmaps);
void bitmap_writer_build(struct packing_data *to_pack);
diff --git a/t/helper/test-bitmap.c b/t/helper/test-bitmap.c
new file mode 100644
index 00000000000..9be03377c06
--- /dev/null
+++ b/t/helper/test-bitmap.c
@@ -0,0 +1,119 @@
+#define USE_THE_REPOSITORY_VARIABLE
+
+#include "test-tool.h"
+#include "git-compat-util.h"
+#include "commit.h"
+#include "packfile.h"
+#include "pack-bitmap.h"
+
+static int add_packed_object(const struct object_id *oid,
+ struct packed_git *pack,
+ uint32_t pos,
+ void *_data)
+{
+ struct packing_data *packed = _data;
+ struct object_entry *entry;
+ struct object_info oi = OBJECT_INFO_INIT;
+ enum object_type type;
+
+ oi.typep = &type;
+
+ entry = packlist_alloc(packed, oid);
+ entry->in_pack_offset = nth_packed_object_offset(pack, pos);
+ entry->idx.offset = entry->in_pack_offset;
+ if (packed_object_info(the_repository, pack, entry->in_pack_offset, &oi) < 0)
+ die("could not get type of object %s",
+ oid_to_hex(oid));
+ oe_set_type(entry, type);
+ oe_set_in_pack(packed, entry, pack);
+
+ return 0;
+}
+
+static int idx_oid_cmp(const void *va, const void *vb)
+{
+ const struct pack_idx_entry *a = *(const struct pack_idx_entry **)va;
+ const struct pack_idx_entry *b = *(const struct pack_idx_entry **)vb;
+
+ return oidcmp(&a->oid, &b->oid);
+}
+
+static int bitmap_write(const char *basename)
+{
+ struct packed_git *p = NULL;
+ struct packing_data packed = { 0 };
+ struct pack_idx_entry **index;
+ struct strbuf buf = STRBUF_INIT;
+ uint32_t i;
+
+ prepare_repo_settings(the_repository);
+ for (p = get_all_packs(the_repository); p; p = p->next) {
+ if (!strcmp(pack_basename(p), basename))
+ break;
+ }
+
+ if (!p)
+ die("could not find pack '%s'", basename);
+
+ if (open_pack_index(p))
+ die("cannot open pack index for '%s'", p->pack_name);
+
+ prepare_packing_data(the_repository, &packed);
+
+ for_each_object_in_pack(p, add_packed_object, &packed,
+ FOR_EACH_OBJECT_PACK_ORDER);
+
+ /*
+ * Build the index array now that data.packed.objects[] is
+ * fully allocated (packlist_alloc() may have reallocated it
+ * during the loop above).
+ */
+ ALLOC_ARRAY(index, p->num_objects);
+ for (i = 0; i < p->num_objects; i++)
+ index[i] = &packed.objects[i].idx;
+
+ bitmap_writer_build_type_index(&packed, index, p->num_objects);
+
+ while (strbuf_getline_lf(&buf, stdin) != EOF) {
+ struct object_id oid;
+ struct commit *c;
+
+ if (get_oid_hex(buf.buf, &oid))
+ die("invalid OID: %s", buf.buf);
+
+ c = lookup_commit(the_repository, &oid);
+ if (!c || repo_parse_commit(the_repository, c))
+ die("could not parse commit %s", buf.buf);
+
+ bitmap_writer_push_commit(c, NULL);
+ }
+
+ bitmap_writer_build(&packed);
+
+ bitmap_writer_set_checksum(p->hash);
+
+ QSORT(index, p->num_objects, idx_oid_cmp);
+
+ strbuf_reset(&buf);
+ strbuf_addstr(&buf, p->pack_name);
+ strbuf_strip_suffix(&buf, ".pack");
+ strbuf_addstr(&buf, ".bitmap");
+ bitmap_writer_finish(index, p->num_objects, buf.buf, 0);
+
+ strbuf_release(&buf);
+ free(index);
+
+ return 0;
+}
+
+int cmd__bitmap(int argc, const char **argv)
+{
+ setup_git_directory();
+
+ if (argc == 3 && !strcmp(argv[1], "write"))
+ return bitmap_write(argv[2]);
+
+ usage("\ttest-tool bitmap write <pack-basename> < <commit-list>");
+
+ return -1;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 9d6d14d9293..c43d8c0977b 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -15,6 +15,7 @@ struct test_cmd {
static struct test_cmd cmds[] = {
{ "advise", cmd__advise_if_enabled },
+ { "bitmap", cmd__bitmap },
{ "bloom", cmd__bloom },
{ "chmtime", cmd__chmtime },
{ "config", cmd__config },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index a6470ff62c4..27e6e40ffcb 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -5,6 +5,7 @@
#include "git-compat-util.h"
int cmd__advise_if_enabled(int argc, const char **argv);
+int cmd__bitmap(int argc, const char **argv);
int cmd__bloom(int argc, const char **argv);
int cmd__chmtime(int argc, const char **argv);
int cmd__config(int argc, const char **argv);
--- >8 ---
> > On our same testing repository from previous commits, this change shrunk
> > our selection of 1,261 bitmaps from ~635.46 MiB to 176.4 MiB for a
> > ~72.24% reduction in the on-disk size of our *.bitmap file. The time to
> > generate the smaller bitmap file decreased by ~3.69 seconds, though this
> > is likely mostly noise.
>
> Certainly good numbers. The obvious follow-up question is: how does the
> reading side fare? I'd expect it to be a little better, if only because
> there are fewer bytes to consider when XOR-ing. But if there's some
> hidden assumption we're missing, then it could get wildly worse. It
> would be good to confirm that that didn't happen. ;)
It doesn't make a huge difference. Prior to this patch, the timings on
my test repository for 'git rev-list --count --all --objects
--use-bitmap-index' go from:
- 2.180 ± 0.019 seconds (with pseudo-merges)
- 52.149 ± 0.224 seconds (without pseudo-merges)
, and after applying this patch, it changes to:
- 2.611 ± 0.023 seconds (with pseudo-merges)
- 51.963 ± 0.131 seconds (without pseudo-merges)
It looks like there is a minor slow-down on pseudo-merges, and a minor
speed-up without them. The difference is small enough that I'm willing
to treat it as run-to-run noise.
> > static void compute_xor_offsets(struct bitmap_writer *writer)
> > {
> > static const int MAX_XOR_OFFSET_SEARCH = 10;
> >
> > int i, next = 0;
> > + int nr = bitmap_writer_nr_selected_commits(writer);
> > +
> > + if (nr > 1) {
> > + QSORT(writer->selected, nr, bitmapped_commit_date_cmp);
> > +
> > + for (i = 0; i < nr; i++) {
> > + struct bitmapped_commit *stored = &writer->selected[i];
> > + khiter_t hash_pos = kh_get_oid_map(writer->bitmaps,
> > + stored->commit->object.oid);
> > +
> > + if (hash_pos == kh_end(writer->bitmaps))
> > + BUG("selected commit missing from bitmap map: %s",
> > + oid_to_hex(&stored->commit->object.oid));
> > +
> > + kh_value(writer->bitmaps, hash_pos) = stored;
> > + }
> > + }
>
> OK. It took me a minute to wrap my head around this. The real work is
> done by QSORT(). But because we maintain a hash pointing into that
> array, we have to go through each hash entry and fix up its pointer.
Yup.
> Looks correct.
Thanks,
Taylor
[^1]: ...and I have great empathy for Stolee's suffering here when
benchmarking his performance improvements to the bitmap generation
code from back in the day! ;-).
next prev parent reply other threads:[~2026-05-27 16:56 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-19 16:12 [PATCH 0/8] pack-bitmap-write: speed up bitmap generation Taylor Blau
2026-05-19 16:12 ` [PATCH 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` Taylor Blau
2026-05-27 8:57 ` Jeff King
2026-05-27 14:36 ` Taylor Blau
2026-05-19 16:12 ` [PATCH 2/8] pack-bitmap: check subtree bits before recursing Taylor Blau
2026-05-27 9:03 ` Jeff King
2026-05-27 14:36 ` Taylor Blau
2026-05-19 16:12 ` [PATCH 3/8] pack-bitmap: reuse stored selected bitmaps Taylor Blau
2026-05-27 9:24 ` Jeff King
2026-05-27 14:40 ` Taylor Blau
2026-05-19 16:12 ` [PATCH 4/8] pack-bitmap: consolidate `find_object_pos()` success path Taylor Blau
2026-05-20 14:42 ` SZEDER Gábor
2026-05-20 17:12 ` Taylor Blau
2026-05-27 9:27 ` Jeff King
2026-05-19 16:12 ` [PATCH 5/8] pack-bitmap: cache object positions during fill Taylor Blau
2026-05-27 9:45 ` Jeff King
2026-05-27 14:46 ` Taylor Blau
2026-05-19 16:12 ` [PATCH 6/8] pack-bitmap: sort bitmaps before XORing Taylor Blau
2026-05-27 10:04 ` Jeff King
2026-05-27 16:56 ` Taylor Blau [this message]
2026-05-19 16:12 ` [PATCH 7/8] pack-bitmap: remember pseudo-merge parents Taylor Blau
2026-05-19 16:12 ` [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps Taylor Blau
2026-05-27 10:25 ` Jeff King
2026-05-27 19:24 ` Taylor Blau
2026-05-27 10:27 ` [PATCH 0/8] pack-bitmap-write: speed up bitmap generation Jeff King
2026-05-27 19:55 ` [PATCH v2 " Taylor Blau
2026-05-27 19:55 ` [PATCH v2 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` Taylor Blau
2026-05-27 19:55 ` [PATCH v2 2/8] pack-bitmap: check subtree bits before recursing Taylor Blau
2026-05-27 19:55 ` [PATCH v2 3/8] pack-bitmap: reuse stored selected bitmaps Taylor Blau
2026-05-27 19:55 ` [PATCH v2 4/8] pack-bitmap: consolidate `find_object_pos()` success path Taylor Blau
2026-05-27 19:56 ` [PATCH v2 5/8] pack-bitmap: cache object positions during fill Taylor Blau
2026-05-27 19:56 ` [PATCH v2 6/8] pack-bitmap: sort bitmaps before XORing Taylor Blau
2026-05-27 19:56 ` [PATCH v2 7/8] pack-bitmap: remember pseudo-merge parents Taylor Blau
2026-05-27 19:56 ` [PATCH v2 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ahciIDuESxNa9Fzn@nand.local \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=stolee@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox