All of lore.kernel.org
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: "Elijah Newren" <newren@gmail.com>, "Jeff King" <peff@peff.net>,
	"Jiang Xin" <worldhello.net@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v7 00/16] bloom: changed-path Bloom filters v2 (& sundries)
Date: Tue, 25 Jun 2024 13:39:01 -0400	[thread overview]
Message-ID: <cover.1719333276.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1706741516.git.me@ttaylorr.com>

(Rebased onto the tip of 'master', which is now 1e1586e4ed (The
sixteenth batch, 2024-06-24) at the time of writing).

This series is another minor reroll of the combined efforts of [1] and
[2] to introduce the v2 changed-path Bloom filters, which fixes a bug in
our existing implementation of murmur3 paths with non-ASCII characters
(when the "char" type is signed).

This version addresses the remaining comments from SZEDER around more
thorough testing of merging commit-graph layers with incompatible Bloom
filters versions, and ensuring the result is as expected.

Thanks again to Jonathan, Peff, and SZEDER who have helped a great deal
in assembling these patches. As usual, a range-diff is included below.

Thanks in advance for your review!

[1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
[2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/

Jonathan Tan (1):
  gitformat-commit-graph: describe version 2 of BDAT

Taylor Blau (15):
  t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
  revision.c: consult Bloom filters for root commits
  commit-graph: ensure Bloom filters are read with consistent settings
  t/helper/test-read-graph.c: extract `dump_graph_info()`
  bloom.h: make `load_bloom_filter_from_graph()` public
  t/helper/test-read-graph: implement `bloom-filters` mode
  t4216: test changed path filters with high bit paths
  repo-settings: introduce commitgraph.changedPathsVersion
  bloom: annotate filters with hash version
  bloom: prepare to discard incompatible Bloom filters
  commit-graph: unconditionally load Bloom filters
  commit-graph: new Bloom filter version that fixes murmur3
  object.h: fix mis-aligned flag bits table
  commit-graph: reuse existing Bloom filters where possible
  bloom: introduce `deinit_bloom_filters()`

 Documentation/config/commitgraph.txt     |  29 +-
 Documentation/gitformat-commit-graph.txt |   9 +-
 bloom.c                                  | 208 ++++++++++++++-
 bloom.h                                  |  38 ++-
 commit-graph.c                           |  64 ++++-
 object.h                                 |   3 +-
 oss-fuzz/fuzz-commit-graph.c             |   2 +-
 repo-settings.c                          |   6 +-
 repository.h                             |   2 +-
 revision.c                               |  26 +-
 t/helper/test-bloom.c                    |   9 +-
 t/helper/test-read-graph.c               |  67 ++++-
 t/t0095-bloom.sh                         |   8 +
 t/t4216-log-bloom.sh                     | 325 ++++++++++++++++++++++-
 14 files changed, 738 insertions(+), 58 deletions(-)

Range-diff against v6:
 1:  9df34a2f4f =  1:  ee651fee33 t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
 2:  a6dc377f1b =  2:  5d88ad6c90 revision.c: consult Bloom filters for root commits
 3:  a77ab941bc !  3:  f6cf5bfc4e commit-graph: ensure Bloom filters are read with consistent settings
    @@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
     +test_expect_success 'merge graph layers with incompatible Bloom settings' '
     +	# Ensure that incompatible Bloom filters are ignored when
     +	# merging existing layers.
    -+	git -C $repo commit-graph write --reachable --changed-paths 2>err &&
    ++	>trace2.txt &&
    ++	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
    ++		git -C $repo commit-graph write --reachable --changed-paths 2>err &&
     +	grep "disabling Bloom filters for commit-graph layer .$layer." err &&
    ++	grep "{\"hash_version\":1,\"num_hashes\":7,\"bits_per_entry\":10,\"max_changed_paths\":512" trace2.txt &&
     +
     +	test_path_is_file $repo/$graph &&
     +	test_dir_is_empty $repo/$graphdir &&
 4:  56a9fdaff0 =  4:  0041600f31 gitformat-commit-graph: describe version 2 of BDAT
 5:  7484a82f7f =  5:  6e7f317551 t/helper/test-read-graph.c: extract `dump_graph_info()`
 6:  48343f93a2 =  6:  ae74fbad3e bloom.h: make `load_bloom_filter_from_graph()` public
 7:  286fd7dcdb =  7:  0dfd1a361e t/helper/test-read-graph: implement `bloom-filters` mode
 8:  7de7b89da0 =  8:  fbcaa686b1 t4216: test changed path filters with high bit paths
 9:  b13c9b8ff9 !  9:  60c063ca4a repo-settings: introduce commitgraph.changedPathsVersion
    @@ commit-graph.c: static void validate_mixed_bloom_settings(struct commit_graph *g
     
      ## oss-fuzz/fuzz-commit-graph.c ##
     @@ oss-fuzz/fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    - 	 * possible.
      	 */
    + 	repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
      	the_repository->settings.commit_graph_generation_version = 2;
     -	the_repository->settings.commit_graph_read_changed_paths = 1;
     +	the_repository->settings.commit_graph_changed_paths_version = 1;
10:  09c44c51a5 = 10:  ce3a15a517 bloom: annotate filters with hash version
11:  d4995ef600 = 11:  99ab9cf448 bloom: prepare to discard incompatible Bloom filters
12:  c8e9bb7c88 = 12:  99e66d1dba commit-graph: unconditionally load Bloom filters
13:  d2f11c082d ! 13:  2e945c3d2e commit-graph: new Bloom filter version that fixes murmur3
    @@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
     +	if (r->settings.commit_graph_changed_paths_version < -1
     +	    || r->settings.commit_graph_changed_paths_version > 2) {
     +		warning(_("attempting to write a commit-graph, but "
    -+			  "'commitgraph.changedPathsVersion' (%d) is not supported"),
    ++			  "'commitGraph.changedPathsVersion' (%d) is not supported"),
     +			r->settings.commit_graph_changed_paths_version);
     +		return 0;
     +	}
    @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible
      	test_must_be_empty err
      '
      
    ++# chosen to be the same under all Unicode normalization forms
    ++CENT=$(printf "\302\242")
    ++
     +test_expect_success 'ensure Bloom filter with incompatible versions are ignored' '
     +	rm "$repo/$graph" &&
     +
    @@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible
     +	cat >expect.err <<-EOF &&
     +	warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings
     +	EOF
    -+	test_cmp expect.err err
    ++	test_cmp expect.err err &&
    ++
    ++	# Merge the two layers with incompatible bloom filter versions,
    ++	# ensuring that the v2 filters are used.
    ++	>trace2.txt &&
    ++	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
    ++		git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write --reachable --changed-paths 2>err &&
    ++	grep "disabling Bloom filters for commit-graph layer .$layer." err &&
    ++	grep "{\"hash_version\":2,\"num_hashes\":7,\"bits_per_entry\":10,\"max_changed_paths\":512" trace2.txt
     +'
     +
      get_first_changed_path_filter () {
      	test-tool read-graph bloom-filters >filters.dat &&
      	head -n 1 filters.dat
    + }
    + 
    +-# chosen to be the same under all Unicode normalization forms
    +-CENT=$(printf "\302\242")
    +-
    + test_expect_success 'set up repo with high bit path, version 1 changed-path' '
    + 	git init highbit1 &&
    + 	test_commit -C highbit1 c1 "$CENT" &&
     @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when version 1 requested' '
      	)
      '
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +test_expect_success 'version 1 changed-path not used when version 2 requested' '
     +	(
     +		cd highbit1 &&
    -+		git config --add commitgraph.changedPathsVersion 2 &&
    ++		git config --add commitGraph.changedPathsVersion 2 &&
     +		test_bloom_filters_not_used "-- another$CENT"
     +	)
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +test_expect_success 'version 1 changed-path used when autodetect requested' '
     +	(
     +		cd highbit1 &&
    -+		git config --add commitgraph.changedPathsVersion -1 &&
    ++		git config --add commitGraph.changedPathsVersion -1 &&
     +		test_bloom_filters_used "-- another$CENT"
     +	)
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +	git -C highbit1 commit-graph write --reachable --changed-paths &&
     +	(
     +		cd highbit1 &&
    -+		git config --add commitgraph.changedPathsVersion -1 &&
    ++		git config --add commitGraph.changedPathsVersion -1 &&
     +		echo "options: bloom(1,10,7) read_generation_data" >expect &&
     +		test-tool read-graph >full &&
     +		grep options full >actual &&
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +
     +test_expect_success 'set up repo with high bit path, version 2 changed-path' '
     +	git init highbit2 &&
    -+	git -C highbit2 config --add commitgraph.changedPathsVersion 2 &&
    ++	git -C highbit2 config --add commitGraph.changedPathsVersion 2 &&
     +	test_commit -C highbit2 c2 "$CENT" &&
     +	git -C highbit2 commit-graph write --reachable --changed-paths
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +test_expect_success 'version 2 changed-path not used when version 1 requested' '
     +	(
     +		cd highbit2 &&
    -+		git config --add commitgraph.changedPathsVersion 1 &&
    ++		git config --add commitGraph.changedPathsVersion 1 &&
     +		test_bloom_filters_not_used "-- another$CENT"
     +	)
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +test_expect_success 'version 2 changed-path used when autodetect requested' '
     +	(
     +		cd highbit2 &&
    -+		git config --add commitgraph.changedPathsVersion -1 &&
    ++		git config --add commitGraph.changedPathsVersion -1 &&
     +		test_bloom_filters_used "-- another$CENT"
     +	)
     +'
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +	git -C highbit2 commit-graph write --reachable --changed-paths &&
     +	(
     +		cd highbit2 &&
    -+		git config --add commitgraph.changedPathsVersion -1 &&
    ++		git config --add commitGraph.changedPathsVersion -1 &&
     +		echo "options: bloom(2,10,7) read_generation_data" >expect &&
     +		test-tool read-graph >full &&
     +		grep options full >actual &&
    @@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
     +test_expect_success 'when writing commit graph, do not reuse changed-path of another version' '
     +	git init doublewrite &&
     +	test_commit -C doublewrite c "$CENT" &&
    -+	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
    ++	git -C doublewrite config --add commitGraph.changedPathsVersion 1 &&
     +	git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	for v in -2 3
     +	do
    -+		git -C doublewrite config --add commitgraph.changedPathsVersion $v &&
    ++		git -C doublewrite config --add commitGraph.changedPathsVersion $v &&
     +		git -C doublewrite commit-graph write --reachable --changed-paths 2>err &&
     +		cat >expect <<-EOF &&
    -+		warning: attempting to write a commit-graph, but ${SQ}commitgraph.changedPathsVersion${SQ} ($v) is not supported
    ++		warning: attempting to write a commit-graph, but ${SQ}commitGraph.changedPathsVersion${SQ} ($v) is not supported
     +		EOF
     +		test_cmp expect err || return 1
     +	done &&
    -+	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
    ++	git -C doublewrite config --add commitGraph.changedPathsVersion 2 &&
     +	git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	(
     +		cd doublewrite &&
14:  9f54a376fb = 14:  242f023135 object.h: fix mis-aligned flag bits table
15:  67991dea7c ! 15:  1b80023e57 commit-graph: reuse existing Bloom filters where possible
    @@ bloom.c: static void init_truncated_large_filter(struct bloom_filter *filter,
     +		struct tree_desc desc;
     +		struct name_entry entry;
     +
    -+		init_tree_desc(&desc, t->buffer, t->size);
    ++		init_tree_desc(&desc, &t->object.oid, t->buffer, t->size);
     +		while (tree_entry(&desc, &entry)) {
     +			size_t i;
     +			for (i = 0; i < entry.pathlen; i++) {
    @@ t/t4216-log-bloom.sh: test_expect_success 'when writing another commit graph, pr
      	git init doublewrite &&
      	test_commit -C doublewrite c "$CENT" &&
     +
    - 	git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
    + 	git -C doublewrite config --add commitGraph.changedPathsVersion 1 &&
    ++	>trace2.txt &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	test_filter_computed 1 trace2.txt &&
    @@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reu
      		test_cmp expect err || return 1
      	done &&
     +
    - 	git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
    + 	git -C doublewrite config --add commitGraph.changedPathsVersion 2 &&
     -	git -C doublewrite commit-graph write --reachable --changed-paths &&
    ++	>trace2.txt &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git -C doublewrite commit-graph write --reachable --changed-paths &&
     +	test_filter_computed 1 trace2.txt &&
    @@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reu
     +
     +	test_commit -C upgrade base no-high-bits &&
     +
    -+	git -C upgrade config --add commitgraph.changedPathsVersion 1 &&
    ++	git -C upgrade config --add commitGraph.changedPathsVersion 1 &&
    ++	>trace2.txt &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git -C upgrade commit-graph write --reachable --changed-paths &&
     +	test_filter_computed 1 trace2.txt &&
     +	test_filter_upgraded 0 trace2.txt &&
     +
    -+	git -C upgrade config --add commitgraph.changedPathsVersion 2 &&
    ++	git -C upgrade config --add commitGraph.changedPathsVersion 2 &&
    ++	>trace2.txt &&
     +	GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
     +		git -C upgrade commit-graph write --reachable --changed-paths &&
     +	test_filter_computed 0 trace2.txt &&
16:  12058a074d = 16:  db9991f339 bloom: introduce `deinit_bloom_filters()`

base-commit: 1e1586e4ed626bde864339c10570bc0e73f0ab97
-- 
2.45.2.664.g446e6a2b1f

  parent reply	other threads:[~2024-06-25 17:39 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 22:52 [PATCH v6 00/16] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2024-01-31 22:52 ` [PATCH v6 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 02/16] revision.c: consult Bloom filters for root commits Taylor Blau
2024-01-31 22:52 ` [PATCH v6 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-01-31 22:52 ` [PATCH v6 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-01-31 22:52 ` [PATCH v6 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-01-31 22:52 ` [PATCH v6 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-01-31 22:52 ` [PATCH v6 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-01-31 22:52 ` [PATCH v6 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-01-31 22:52 ` [PATCH v6 10/16] bloom: annotate filters with hash version Taylor Blau
2024-01-31 22:52 ` [PATCH v6 11/16] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-01-31 22:52 ` [PATCH v6 12/16] commit-graph: unconditionally load " Taylor Blau
2024-01-31 22:52 ` [PATCH v6 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-02-25 22:15   ` SZEDER Gábor
2024-02-28  0:11   ` Jiang Xin
2024-01-31 22:52 ` [PATCH v6 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-01-31 22:52 ` [PATCH v6 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-01-31 22:53 ` [PATCH v6 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2024-06-25 17:39 ` Taylor Blau [this message]
2024-06-25 17:39   ` [PATCH v7 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-06-25 17:39   ` [PATCH v7 02/16] revision.c: consult Bloom filters for root commits Taylor Blau
2024-06-25 17:39   ` [PATCH v7 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-06-25 17:39   ` [PATCH v7 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-06-25 17:39   ` [PATCH v7 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-06-25 17:39   ` [PATCH v7 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-06-25 17:39   ` [PATCH v7 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-06-25 17:39   ` [PATCH v7 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-06-25 17:39   ` [PATCH v7 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-06-25 17:39   ` [PATCH v7 10/16] bloom: annotate filters with hash version Taylor Blau
2024-06-25 17:39   ` [PATCH v7 11/16] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-06-25 17:40   ` [PATCH v7 12/16] commit-graph: unconditionally load " Taylor Blau
2024-06-25 17:40   ` [PATCH v7 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-06-25 17:40   ` [PATCH v7 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-06-25 17:40   ` [PATCH v7 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-06-25 17:40   ` [PATCH v7 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1719333276.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    --cc=worldhello.net@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.