From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, "Jonathan Tan" <jonathantanmy@google.com>,
	"Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [PATCH v3 00/17] bloom: changed-path Bloom filters v2 (& sundries)
Date: Tue, 17 Oct 2023 10:45:36 +0200
Message-ID: <ZS5JsDKk8RioQfOA@tanuki>
In-Reply-To: <cover.1696969994.git.me@ttaylorr.com>
On Tue, Oct 10, 2023 at 04:33:17PM -0400, Taylor Blau wrote:
> (Rebased onto the tip of 'master', which is 3a06386e31 (The fifteenth
> batch, 2023-10-04), at the time of writing).
>
> This series is a reroll of the combined efforts of [1] and [2] to
> introduce the v2 changed-path Bloom filters, which fixes a bug in our
> existing implementation of murmur3 paths with non-ASCII characters (when
> the "char" type is signed).
>
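For readers following along, here is a minimal sketch of how a signed
"char" can skew a murmur3-style hash. It is not the code from bloom.c;
the helper names pack_word_signed() and pack_word_unsigned() are made up
for illustration, and only the step that packs four input bytes into a
32-bit word is shown.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not git's actual code: pack four input bytes
 * into one little-endian 32-bit word, as a murmur3-style hash does for
 * each 4-byte block of its input.
 */

/*
 * Bytes read as signed char: any value >= 0x80 is negative, so widening
 * it to uint32_t sign-extends and sets all higher bits of the word.
 */
static uint32_t pack_word_signed(const signed char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/* Bytes read as unsigned char: each byte only touches its own 8 bits. */
static uint32_t pack_word_unsigned(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}
```

For the bytes 0x61 0xC3 0xA9 0x62 ("a", UTF-8 "é", "b"), the signed
variant yields 0xFFFFC361 while the unsigned one yields 0x62A9C361. For
pure ASCII input both agree, which is presumably why only high-bit paths
are affected.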
> In large part, this is the same as the previous round. But this round
> includes some extra bits that address issues pointed out by SZEDER
> Gábor, which are:
>
> - not reading Bloom filters for root commits
> - corrupting Bloom filter reads by tweaking the filter settings
> between layers.
>
> These issues were discussed in (among other places) [3], and [4],
> respectively.
>
> Thanks to Jonathan, Peff, and SZEDER who have helped a great deal in
> assembling these patches. As usual, a range-diff is included below.
> Thanks in advance for your review!
As this patch series has been sitting around without reviews for a week,
I've tried my best to give it a go. Note, though, that this area is mostly
outside of my own comfort zone, so some of the questions and suggestions
might ultimately not apply.
Patrick
> [1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
> [2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/
> [3]: https://public-inbox.org/git/20201015132147.GB24954@szeder.dev/
> [4]: https://lore.kernel.org/git/20230830200218.GA5147@szeder.dev/
>
> Jonathan Tan (4):
> gitformat-commit-graph: describe version 2 of BDAT
> t4216: test changed path filters with high bit paths
> repo-settings: introduce commitgraph.changedPathsVersion
> commit-graph: new filter ver. that fixes murmur3
>
> Taylor Blau (13):
> t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
> revision.c: consult Bloom filters for root commits
> commit-graph: ensure Bloom filters are read with consistent settings
> t/helper/test-read-graph.c: extract `dump_graph_info()`
> bloom.h: make `load_bloom_filter_from_graph()` public
> t/helper/test-read-graph: implement `bloom-filters` mode
> bloom: annotate filters with hash version
> bloom: prepare to discard incompatible Bloom filters
> commit-graph.c: unconditionally load Bloom filters
> commit-graph: drop unnecessary `graph_read_bloom_data_context`
> object.h: fix mis-aligned flag bits table
> commit-graph: reuse existing Bloom filters where possible
> bloom: introduce `deinit_bloom_filters()`
>
> Documentation/config/commitgraph.txt | 26 ++-
> Documentation/gitformat-commit-graph.txt | 9 +-
> bloom.c | 208 ++++++++++++++++-
> bloom.h | 38 +++-
> commit-graph.c | 61 ++++-
> object.h | 3 +-
> oss-fuzz/fuzz-commit-graph.c | 2 +-
> repo-settings.c | 6 +-
> repository.h | 2 +-
> revision.c | 26 ++-
> t/helper/test-bloom.c | 9 +-
> t/helper/test-read-graph.c | 67 ++++--
> t/t0095-bloom.sh | 8 +
> t/t4216-log-bloom.sh | 272 ++++++++++++++++++++++-
> 14 files changed, 682 insertions(+), 55 deletions(-)
>
> Range-diff against v2:
> 10: 002a06d1e9 ! 1: fe671d616c t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
> @@ Commit message
> indicating that no filters were used.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## t/t4216-log-bloom.sh ##
> @@ t/t4216-log-bloom.sh: test_bloom_filters_used () {
> -: ---------- > 2: 7d0fa93543 revision.c: consult Bloom filters for root commits
> -: ---------- > 3: 2ecc0a2d58 commit-graph: ensure Bloom filters are read with consistent settings
> 1: 5fa681b58e ! 4: 17703ed89a gitformat-commit-graph: describe version 2 of BDAT
> @@ Commit message
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## Documentation/gitformat-commit-graph.txt ##
> @@ Documentation/gitformat-commit-graph.txt: All multi-byte numbers are in network byte order.
> 2: 623d840575 ! 5: 94552abf45 t/helper/test-read-graph.c: extract `dump_graph_info()`
> @@ Commit message
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## t/helper/test-read-graph.c ##
> @@
> 3: bc9d77ae60 ! 6: 3d81efa27b bloom.h: make `load_bloom_filter_from_graph()` public
> @@ Commit message
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## bloom.c ##
> @@ bloom.c: static inline unsigned char get_bitmask(uint32_t pos)
> 4: ac7008aed3 ! 7: d23cd89037 t/helper/test-read-graph: implement `bloom-filters` mode
> @@ Commit message
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## t/helper/test-read-graph.c ##
> @@ t/helper/test-read-graph.c: static void dump_graph_info(struct commit_graph *graph)
> @@ t/helper/test-read-graph.c: int cmd__read_graph(int argc UNUSED, const char **ar
> - return 0;
> + return ret;
> }
> ++
> ++
> 5: 71755ba856 ! 8: cba766f224 t4216: test changed path filters with high bit paths
> @@ Commit message
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## t/t4216-log-bloom.sh ##
> -@@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty commits' '
> - )
> +@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' '
> + ! grep "disabling Bloom filters" err
> '
>
> +get_first_changed_path_filter () {
> 6: 9768d92c0f ! 9: a08a961f41 repo-settings: introduce commitgraph.changedPathsVersion
> @@ Commit message
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## Documentation/config/commitgraph.txt ##
> @@ Documentation/config/commitgraph.txt: commitGraph.maxNewFilters::
> 7: f911b4bfab = 10: 61d44519a5 commit-graph: new filter ver. that fixes murmur3
> 8: 35009900df ! 11: a8c10f8de8 bloom: annotate filters with hash version
> @@ Commit message
> Bloom filter.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## bloom.c ##
> @@ bloom.c: int load_bloom_filter_from_graph(struct commit_graph *g,
> 9: 138bc16905 ! 12: 2ba10a4b4b bloom: prepare to discard incompatible Bloom filters
> @@ Commit message
> `get_or_compute_bloom_filter()`.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## bloom.c ##
> @@ bloom.c: static void init_truncated_large_filter(struct bloom_filter *filter,
> 11: 2437e62813 ! 13: 09d8669c3a commit-graph.c: unconditionally load Bloom filters
> @@ Commit message
> either "1" or "2".
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## commit-graph.c ##
> @@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
> 12: fe8fb2f5fe ! 14: 0d4f9dc4ee commit-graph: drop unnecessary `graph_read_bloom_data_context`
> @@ Commit message
>
> Noticed-by: Jonathan Tan <jonathantanmy@google.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## commit-graph.c ##
> @@ commit-graph.c: static int graph_read_oid_lookup(const unsigned char *chunk_start,
> 13: 825af91e11 ! 15: 1f7f27bc47 object.h: fix mis-aligned flag bits table
> @@ Commit message
> Bit position 23 is one column too far to the left.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## object.h ##
> @@ object.h: void object_array_init(struct object_array *array);
> 14: 593b317192 ! 16: abbef95ae8 commit-graph: reuse existing Bloom filters where possible
> @@ Commit message
> commits by their generation number.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> - Signed-off-by: Junio C Hamano <gitster@pobox.com>
> - Signed-off-by: Taylor Blau <me@ttaylorr.com>
>
> ## bloom.c ##
> @@
> 15: 8bf2c9cf98 = 17: ca362408d5 bloom: introduce `deinit_bloom_filters()`
> --
> 2.42.0.342.g8bb3a896ee