From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: "Jonathan Tan" <jonathantanmy@google.com>,
	"Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH v3 00/17] bloom: changed-path Bloom filters v2 (& sundries)
Date: Tue, 10 Oct 2023 16:33:17 -0400
Message-ID: <cover.1696969994.git.me@ttaylorr.com>
In-Reply-To: <cover.1692654233.git.me@ttaylorr.com>

(Rebased onto the tip of 'master', which is 3a06386e31 (The fifteenth
batch, 2023-10-04), at the time of writing).

This series is a reroll of the combined efforts of [1] and [2] to
introduce the v2 changed-path Bloom filters, which fix a bug in our
existing implementation of murmur3 when hashing paths with non-ASCII
characters (when the "char" type is signed).
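
To illustrate the class of bug described above, here is a minimal
sketch (not the code from bloom.c; the function names are hypothetical)
of how packing four path bytes into a 32-bit word can go wrong when the
bytes are read through a signed "char". A byte with its high bit set is
sign-extended before the shift, so it clobbers the higher bytes of the
word:

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack four bytes little-endian, reading them as
 * signed chars. Bytes >= 0x80 become negative and are sign-extended
 * when converted to uint32_t, which sets all higher bits.
 */
uint32_t pack_le_signed(const signed char *p)
{
	uint32_t b1 = (uint32_t)p[0];
	uint32_t b2 = ((uint32_t)p[1]) << 8;
	uint32_t b3 = ((uint32_t)p[2]) << 16;
	uint32_t b4 = ((uint32_t)p[3]) << 24;
	return b1 | b2 | b3 | b4;
}

/*
 * Hypothetical helper: the same packing, but reading the bytes as
 * unsigned chars, so every byte contributes only its own 8 bits.
 */
uint32_t pack_le_unsigned(const unsigned char *p)
{
	uint32_t b1 = (uint32_t)p[0];
	uint32_t b2 = ((uint32_t)p[1]) << 8;
	uint32_t b3 = ((uint32_t)p[2]) << 16;
	uint32_t b4 = ((uint32_t)p[3]) << 24;
	return b1 | b2 | b3 | b4;
}
```

For pure-ASCII input the two helpers agree. For input containing a
UTF-8 sequence such as 0xC3 0xA9, they produce different words, so any
hash built on top of them differs as well.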

In large part, this is the same as the previous round. But this round
includes some extra bits that address issues pointed out by SZEDER
Gábor, which are:

  - not reading Bloom filters for root commits
  - corrupting Bloom filter reads by tweaking the filter settings
    between layers.

These issues were discussed in (among other places) [3] and [4],
respectively.
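
As a rough illustration of the second issue, the following sketch walks
a chain of commit-graph layers and checks that every layer carrying
Bloom filters uses the same settings. It is not the code from this
series: the struct names, fields, and function are hypothetical and
only meant to show the shape of such a consistency check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for per-layer Bloom filter settings. */
struct filter_settings {
	uint32_t hash_version;
	uint32_t num_hashes;
	uint32_t bits_per_entry;
};

/* Hypothetical stand-in for one layer of a split commit-graph. */
struct graph_layer {
	const struct filter_settings *settings; /* NULL: no filters */
	const struct graph_layer *base;         /* NULL: bottom layer */
};

/*
 * Return 1 if all layers that have filters agree on their settings,
 * and 0 otherwise. Layers without filters are skipped.
 */
int layers_agree(const struct graph_layer *top)
{
	const struct filter_settings *first = NULL;
	const struct graph_layer *g;

	for (g = top; g; g = g->base) {
		if (!g->settings)
			continue;
		if (!first) {
			first = g->settings;
			continue;
		}
		if (g->settings->hash_version != first->hash_version ||
		    g->settings->num_hashes != first->num_hashes ||
		    g->settings->bits_per_entry != first->bits_per_entry)
			return 0;
	}
	return 1;
}
```

A caller that gets 0 back could, for example, decline to use the
filters at all rather than interpret some of them with the wrong
settings.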

Thanks to Jonathan, Peff, and SZEDER, who have helped a great deal in
assembling these patches. As usual, a range-diff is included below.
Thanks in advance for your review!

[1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
[2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/
[3]: https://public-inbox.org/git/20201015132147.GB24954@szeder.dev/
[4]: https://lore.kernel.org/git/20230830200218.GA5147@szeder.dev/

Jonathan Tan (4):
  gitformat-commit-graph: describe version 2 of BDAT
  t4216: test changed path filters with high bit paths
  repo-settings: introduce commitgraph.changedPathsVersion
  commit-graph: new filter ver. that fixes murmur3

Taylor Blau (13):
  t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
  revision.c: consult Bloom filters for root commits
  commit-graph: ensure Bloom filters are read with consistent settings
  t/helper/test-read-graph.c: extract `dump_graph_info()`
  bloom.h: make `load_bloom_filter_from_graph()` public
  t/helper/test-read-graph: implement `bloom-filters` mode
  bloom: annotate filters with hash version
  bloom: prepare to discard incompatible Bloom filters
  commit-graph.c: unconditionally load Bloom filters
  commit-graph: drop unnecessary `graph_read_bloom_data_context`
  object.h: fix mis-aligned flag bits table
  commit-graph: reuse existing Bloom filters where possible
  bloom: introduce `deinit_bloom_filters()`

 Documentation/config/commitgraph.txt     |  26 ++-
 Documentation/gitformat-commit-graph.txt |   9 +-
 bloom.c                                  | 208 ++++++++++++++++-
 bloom.h                                  |  38 +++-
 commit-graph.c                           |  61 ++++-
 object.h                                 |   3 +-
 oss-fuzz/fuzz-commit-graph.c             |   2 +-
 repo-settings.c                          |   6 +-
 repository.h                             |   2 +-
 revision.c                               |  26 ++-
 t/helper/test-bloom.c                    |   9 +-
 t/helper/test-read-graph.c               |  67 ++++--
 t/t0095-bloom.sh                         |   8 +
 t/t4216-log-bloom.sh                     | 272 ++++++++++++++++++++++-
 14 files changed, 682 insertions(+), 55 deletions(-)

Range-diff against v2:
10:  002a06d1e9 !  1:  fe671d616c t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
    @@ Commit message
         indicating that no filters were used.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## t/t4216-log-bloom.sh ##
     @@ t/t4216-log-bloom.sh: test_bloom_filters_used () {
 -:  ---------- >  2:  7d0fa93543 revision.c: consult Bloom filters for root commits
 -:  ---------- >  3:  2ecc0a2d58 commit-graph: ensure Bloom filters are read with consistent settings
 1:  5fa681b58e !  4:  17703ed89a gitformat-commit-graph: describe version 2 of BDAT
    @@ Commit message
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## Documentation/gitformat-commit-graph.txt ##
     @@ Documentation/gitformat-commit-graph.txt: All multi-byte numbers are in network byte order.
 2:  623d840575 !  5:  94552abf45 t/helper/test-read-graph.c: extract `dump_graph_info()`
    @@ Commit message
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## t/helper/test-read-graph.c ##
     @@
 3:  bc9d77ae60 !  6:  3d81efa27b bloom.h: make `load_bloom_filter_from_graph()` public
    @@ Commit message
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## bloom.c ##
     @@ bloom.c: static inline unsigned char get_bitmask(uint32_t pos)
 4:  ac7008aed3 !  7:  d23cd89037 t/helper/test-read-graph: implement `bloom-filters` mode
    @@ Commit message
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## t/helper/test-read-graph.c ##
     @@ t/helper/test-read-graph.c: static void dump_graph_info(struct commit_graph *graph)
    @@ t/helper/test-read-graph.c: int cmd__read_graph(int argc UNUSED, const char **ar
     -	return 0;
     +	return ret;
      }
    ++
    ++
 5:  71755ba856 !  8:  cba766f224 t4216: test changed path filters with high bit paths
    @@ Commit message
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## t/t4216-log-bloom.sh ##
    -@@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty commits' '
    - 	)
    +@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' '
    + 	! grep "disabling Bloom filters" err
      '
      
     +get_first_changed_path_filter () {
 6:  9768d92c0f !  9:  a08a961f41 repo-settings: introduce commitgraph.changedPathsVersion
    @@ Commit message
         Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## Documentation/config/commitgraph.txt ##
     @@ Documentation/config/commitgraph.txt: commitGraph.maxNewFilters::
 7:  f911b4bfab = 10:  61d44519a5 commit-graph: new filter ver. that fixes murmur3
 8:  35009900df ! 11:  a8c10f8de8 bloom: annotate filters with hash version
    @@ Commit message
         Bloom filter.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## bloom.c ##
     @@ bloom.c: int load_bloom_filter_from_graph(struct commit_graph *g,
 9:  138bc16905 ! 12:  2ba10a4b4b bloom: prepare to discard incompatible Bloom filters
    @@ Commit message
         `get_or_compute_bloom_filter()`.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## bloom.c ##
     @@ bloom.c: static void init_truncated_large_filter(struct bloom_filter *filter,
11:  2437e62813 ! 13:  09d8669c3a commit-graph.c: unconditionally load Bloom filters
    @@ Commit message
         either "1" or "2".
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## commit-graph.c ##
     @@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
12:  fe8fb2f5fe ! 14:  0d4f9dc4ee commit-graph: drop unnecessary `graph_read_bloom_data_context`
    @@ Commit message
     
         Noticed-by: Jonathan Tan <jonathantanmy@google.com>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## commit-graph.c ##
     @@ commit-graph.c: static int graph_read_oid_lookup(const unsigned char *chunk_start,
13:  825af91e11 ! 15:  1f7f27bc47 object.h: fix mis-aligned flag bits table
    @@ Commit message
         Bit position 23 is one column too far to the left.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## object.h ##
     @@ object.h: void object_array_init(struct object_array *array);
14:  593b317192 ! 16:  abbef95ae8 commit-graph: reuse existing Bloom filters where possible
    @@ Commit message
           commits by their generation number.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
    -    Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## bloom.c ##
     @@
15:  8bf2c9cf98 = 17:  ca362408d5 bloom: introduce `deinit_bloom_filters()`
-- 
2.42.0.342.g8bb3a896ee
