From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
    Elijah Newren <newren@gmail.com>,
    Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format
Date: Fri, 10 May 2024 13:46:59 +0200
Message-ID: <Zj4JM3ATSMice5do@tanuki>
In-Reply-To: <ZjkHT9XVl7ua8E14@nand.local>
On Mon, May 06, 2024 at 12:37:35PM -0400, Taylor Blau wrote:
> On Mon, May 06, 2024 at 01:52:44PM +0200, Patrick Steinhardt wrote:
> > > diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
> > > index f5d200939b0..63a7177ac08 100644
> > > --- a/Documentation/technical/bitmap-format.txt
> > > +++ b/Documentation/technical/bitmap-format.txt
> > > @@ -255,3 +255,182 @@ triplet is -
> > > xor_row (4 byte integer, network byte order): ::
> > > The position of the triplet whose bitmap is used to compress
> > > this one, or `0xffffffff` if no such bitmap exists.
> > > +
> > > +Pseudo-merge bitmaps
> > > +--------------------
> > > +
> > > +If the `BITMAP_OPT_PSEUDO_MERGES` flag is set, a variable number of
> > > +bytes (preceding the name-hash cache, commit lookup table, and trailing
> > > +checksum) of the `.bitmap` file is used to store pseudo-merge bitmaps.
> >
> > Here you say that the section is supposed to come before some other
> > sections, whereas the first sentence in the "File format" section says
> > that it is the last section in a bitmap file.
>
> This is a quirk of the on-disk .bitmap format. New sections are added
> before existing sections, so if you were reading the file from beginning
> to end, you'd see the pseudo-merges extension, then the lookup table,
> then the name-hash cache (assuming all were written).
>
> I think that describing them in the order they were introduced here
> makes more sense, leaving their layout within the .bitmap file as an
> implementation detail.
>
> If you feel strongly otherwise, let's clean it up outside of this series
> since this whole portion of the documentation would need to be
> reordered.
I don't, thanks for the explanation.
[snip]
> +=== Overview
> +
> A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as
> follows:
> --- >8 ---
>
> > > +For example, suppose there exists a pseudo-merge bitmap with a large
> > > +number of commits, all of which are listed in the `WANTS` section of
> > > +some bitmap traversal query. When pseudo-merge bitmaps are enabled, the
> > > +bitmap machinery can quickly determine there is a pseudo-merge which
> > > +satisfies some subset of the wanted objects on either side of the query.
> > > +Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the
> > > +resulting bitmap. By contrast, without pseudo-merge bitmaps, we would
> > > +have to repeat the decompression and `OR`-ing step over a potentially
> > > +large number of individual bitmaps, which can take proportionally more
> > > +time.
> > > +
> > > +Another benefit of pseudo-merges arises when there is some combination
> > > +of (a) a large number of references, with (b) poor bitmap coverage, and
> > > +(c) deep, nested trees, making fill-in traversal relatively expensive.
> > > +For example, suppose that there are a large enough number of tags where
> > > +bitmapping each of the tags individually is infeasible. Without
> > > +pseudo-merge bitmaps, computing the result of, say, `git rev-list
> > > +--use-bitmap-index --count --objects --tags` would likely require a
> > > +large amount of fill-in traversal. But when a large quantity of those
> > > +tags are stored together in a pseudo-merge bitmap, the bitmap machinery
> > > +can take advantage of the fact that we only care about the union of
> > > +objects reachable from all of those tags, and answer the query much
> > > +faster.
> >
> > I would start the explanation with a discussion of the problem before
> > presenting the solution to those problems. In the current version it's
> > the other way round, you present a solution to a problem that isn't yet
> > explained.
> >
> > It might also be helpful to discuss a bit who is supposed to create
> > those pseudo-merge bitmaps. Does Git do so automatically for all tags?
> > Does the admin have to configure this? If the latter, when do you want
> > to create those and what strategies are there to create them?
>
> The pseudo-merge bitmaps are created by Git itself, configured via the
> options described later on in this series. I'm happy to add a specific
> call-out, but I would rather do it elsewhere outside of
> Documentation/technical/bitmap-format.txt, which I think should be
> mostly focused on the on-disk format.
I think what throws me off here is that you already go somewhat into the
non-technical side by explaining their use cases. This causes us to end
up halfway between "We motivate the changes" and "We document the
technical parts only".
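
For reference, this is roughly how I read the use case quoted above. It
is only a sketch: it uses plain uncompressed word arrays instead of EWAH,
it assumes the "pair of bitmaps" is one bitmap of the pseudo-merge's
commits plus one of the objects reachable from them, and every name in it
is hypothetical rather than taken from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if every bit set in `sub` is also set in `super`. */
static int sketch_is_subset(const uint64_t *sub, const uint64_t *super,
			    size_t nwords)
{
	for (size_t i = 0; i < nwords; i++)
		if (sub[i] & ~super[i])
			return 0;
	return 1;
}

/*
 * If all commits of the pseudo-merge are wanted, OR its object bitmap
 * into `result` in a single pass and return 1. Otherwise leave `result`
 * untouched and return 0, so the caller falls back to per-commit bitmaps
 * or fill-in traversal.
 */
static int sketch_apply_pseudo_merge(const uint64_t *pm_commits,
				     const uint64_t *pm_objects,
				     const uint64_t *wants,
				     uint64_t *result, size_t nwords)
{
	if (!sketch_is_subset(pm_commits, wants, nwords))
		return 0;
	for (size_t i = 0; i < nwords; i++)
		result[i] |= pm_objects[i];
	return 1;
}
```

The point being that one subset check plus one OR stands in for a
decompress-and-OR step per individual commit bitmap.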
[snip]
> > In case you have multiple pseudo-merge bitmaps, is the whole of the
> > above repeated for each bitmap or is it just parts of it?
>
> The "pseudo-merge bitmaps" section contains a variable number of pairs
> of EWAH bitmaps, one pair for each pseudo-merge bitmap. I think this is
> covered below where it says "one or more pseudo-merge bitmaps, each
> containing: [...]", but let me know if I should be more explicit.
>
> > > +* An (optional) extended lookup table (written if and only if there is
> > > + at least one commit which appears in more than one pseudo-merge).
> > > + There are as many entries as commits which appear in multiple
> > > + pseudo-merges. Each entry contains the following:
> > > +
> > > + ** `N`, a 4-byte unsigned value equal to the number of pseudo-merges
> > > + which contain a given commit.
> >
> > How exactly is the given commit identified? Or in other words, given an
> > entry in the lookup table here, how do I figure out what commit it
> > belongs to?
>
> They aren't identified within this section. The extended lookup table is
> indexed into via the lookup table with an offset that is stored in the
> `offset` field when the MSB is set.
Okay. Would this explanation be a good addition to the document?
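
To check my understanding of that indirection, here is a minimal sketch.
It assumes `offset` is a 64-bit field whose most-significant bit acts as
the flag and whose remaining bits carry the position; the field width is
not stated in this thread, and the struct and function names are made up
for illustration.

```c
#include <stdint.h>

/* Assumption: `offset` is 64 bits wide and its MSB is the flag bit. */
#define SKETCH_EXT_FLAG (UINT64_C(1) << 63)

struct sketch_lookup_target {
	int in_extended_table; /* 1: offset points into the extended table */
	uint64_t offset;       /* offset with the flag bit cleared */
};

/* Split a raw `offset` value from a lookup table entry into flag + offset. */
static struct sketch_lookup_target sketch_decode_offset(uint64_t raw)
{
	struct sketch_lookup_target t;

	t.in_extended_table = (raw & SKETCH_EXT_FLAG) != 0;
	t.offset = raw & ~SKETCH_EXT_FLAG;
	return t;
}
```

So the commit is identified by the lookup table entry itself, and the
extended table entry is only ever reached through that entry's `offset`.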
Patrick