From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: "Elijah Newren" <newren@gmail.com>, "Jeff King" <peff@peff.net>,
"Jiang Xin" <worldhello.net@gmail.com>,
"Jonathan Tan" <jonathantanmy@google.com>,
"Junio C Hamano" <gitster@pobox.com>,
"SZEDER Gábor" <szeder.dev@gmail.com>,
"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v7 00/16] bloom: changed-path Bloom filters v2 (& sundries)
Date: Tue, 25 Jun 2024 13:39:01 -0400 [thread overview]
Message-ID: <cover.1719333276.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1706741516.git.me@ttaylorr.com>
(Rebased onto the tip of 'master', which is now 1e1586e4ed (The
sixteenth batch, 2024-06-24) at the time of writing).
This series is another minor reroll of the combined efforts of [1] and
[2] to introduce the v2 changed-path Bloom filters, which fixes a bug in
our existing implementation of murmur3 paths with non-ASCII characters
(when the "char" type is signed).
This version addresses the remaining comments from SZEDER around more
thorough testing of merging commit-graph layers with incompatible Bloom
filters versions, and ensuring the result is as expected.
Thanks again to Jonathan, Peff, and SZEDER who have helped a great deal
in assembling these patches. As usual, a range-diff is included below.
Thanks in advance for your review!
[1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
[2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/
Jonathan Tan (1):
gitformat-commit-graph: describe version 2 of BDAT
Taylor Blau (15):
t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
revision.c: consult Bloom filters for root commits
commit-graph: ensure Bloom filters are read with consistent settings
t/helper/test-read-graph.c: extract `dump_graph_info()`
bloom.h: make `load_bloom_filter_from_graph()` public
t/helper/test-read-graph: implement `bloom-filters` mode
t4216: test changed path filters with high bit paths
repo-settings: introduce commitgraph.changedPathsVersion
bloom: annotate filters with hash version
bloom: prepare to discard incompatible Bloom filters
commit-graph: unconditionally load Bloom filters
commit-graph: new Bloom filter version that fixes murmur3
object.h: fix mis-aligned flag bits table
commit-graph: reuse existing Bloom filters where possible
bloom: introduce `deinit_bloom_filters()`
Documentation/config/commitgraph.txt | 29 +-
Documentation/gitformat-commit-graph.txt | 9 +-
bloom.c | 208 ++++++++++++++-
bloom.h | 38 ++-
commit-graph.c | 64 ++++-
object.h | 3 +-
oss-fuzz/fuzz-commit-graph.c | 2 +-
repo-settings.c | 6 +-
repository.h | 2 +-
revision.c | 26 +-
t/helper/test-bloom.c | 9 +-
t/helper/test-read-graph.c | 67 ++++-
t/t0095-bloom.sh | 8 +
t/t4216-log-bloom.sh | 325 ++++++++++++++++++++++-
14 files changed, 738 insertions(+), 58 deletions(-)
Range-diff against v6:
1: 9df34a2f4f = 1: ee651fee33 t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
2: a6dc377f1b = 2: 5d88ad6c90 revision.c: consult Bloom filters for root commits
3: a77ab941bc ! 3: f6cf5bfc4e commit-graph: ensure Bloom filters are read with consistent settings
@@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
+test_expect_success 'merge graph layers with incompatible Bloom settings' '
+ # Ensure that incompatible Bloom filters are ignored when
+ # merging existing layers.
-+ git -C $repo commit-graph write --reachable --changed-paths 2>err &&
++ >trace2.txt &&
++ GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
++ git -C $repo commit-graph write --reachable --changed-paths 2>err &&
+ grep "disabling Bloom filters for commit-graph layer .$layer." err &&
++ grep "{\"hash_version\":1,\"num_hashes\":7,\"bits_per_entry\":10,\"max_changed_paths\":512" trace2.txt &&
+
+ test_path_is_file $repo/$graph &&
+ test_dir_is_empty $repo/$graphdir &&
4: 56a9fdaff0 = 4: 0041600f31 gitformat-commit-graph: describe version 2 of BDAT
5: 7484a82f7f = 5: 6e7f317551 t/helper/test-read-graph.c: extract `dump_graph_info()`
6: 48343f93a2 = 6: ae74fbad3e bloom.h: make `load_bloom_filter_from_graph()` public
7: 286fd7dcdb = 7: 0dfd1a361e t/helper/test-read-graph: implement `bloom-filters` mode
8: 7de7b89da0 = 8: fbcaa686b1 t4216: test changed path filters with high bit paths
9: b13c9b8ff9 ! 9: 60c063ca4a repo-settings: introduce commitgraph.changedPathsVersion
@@ commit-graph.c: static void validate_mixed_bloom_settings(struct commit_graph *g
## oss-fuzz/fuzz-commit-graph.c ##
@@ oss-fuzz/fuzz-commit-graph.c: int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
- * possible.
*/
+ repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
the_repository->settings.commit_graph_generation_version = 2;
- the_repository->settings.commit_graph_read_changed_paths = 1;
+ the_repository->settings.commit_graph_changed_paths_version = 1;
10: 09c44c51a5 = 10: ce3a15a517 bloom: annotate filters with hash version
11: d4995ef600 = 11: 99ab9cf448 bloom: prepare to discard incompatible Bloom filters
12: c8e9bb7c88 = 12: 99e66d1dba commit-graph: unconditionally load Bloom filters
13: d2f11c082d ! 13: 2e945c3d2e commit-graph: new Bloom filter version that fixes murmur3
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
+ if (r->settings.commit_graph_changed_paths_version < -1
+ || r->settings.commit_graph_changed_paths_version > 2) {
+ warning(_("attempting to write a commit-graph, but "
-+ "'commitgraph.changedPathsVersion' (%d) is not supported"),
++ "'commitGraph.changedPathsVersion' (%d) is not supported"),
+ r->settings.commit_graph_changed_paths_version);
+ return 0;
+ }
@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible
test_must_be_empty err
'
++# chosen to be the same under all Unicode normalization forms
++CENT=$(printf "\302\242")
++
+test_expect_success 'ensure Bloom filter with incompatible versions are ignored' '
+ rm "$repo/$graph" &&
+
@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible
+ cat >expect.err <<-EOF &&
+ warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings
+ EOF
-+ test_cmp expect.err err
++ test_cmp expect.err err &&
++
++ # Merge the two layers with incompatible bloom filter versions,
++ # ensuring that the v2 filters are used.
++ >trace2.txt &&
++ GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
++ git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write --reachable --changed-paths 2>err &&
++ grep "disabling Bloom filters for commit-graph layer .$layer." err &&
++ grep "{\"hash_version\":2,\"num_hashes\":7,\"bits_per_entry\":10,\"max_changed_paths\":512" trace2.txt
+'
+
get_first_changed_path_filter () {
test-tool read-graph bloom-filters >filters.dat &&
head -n 1 filters.dat
+ }
+
+-# chosen to be the same under all Unicode normalization forms
+-CENT=$(printf "\302\242")
+-
+ test_expect_success 'set up repo with high bit path, version 1 changed-path' '
+ git init highbit1 &&
+ test_commit -C highbit1 c1 "$CENT" &&
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when version 1 requested' '
)
'
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
+test_expect_success 'version 1 changed-path not used when version 2 requested' '
+ (
+ cd highbit1 &&
-+ git config --add commitgraph.changedPathsVersion 2 &&
++ git config --add commitGraph.changedPathsVersion 2 &&
+ test_bloom_filters_not_used "-- another$CENT"
+ )
+'
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
+test_expect_success 'version 1 changed-path used when autodetect requested' '
+ (
+ cd highbit1 &&
-+ git config --add commitgraph.changedPathsVersion -1 &&
++ git config --add commitGraph.changedPathsVersion -1 &&
+ test_bloom_filters_used "-- another$CENT"
+ )
+'
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
+ git -C highbit1 commit-graph write --reachable --changed-paths &&
+ (
+ cd highbit1 &&
-+ git config --add commitgraph.changedPathsVersion -1 &&
++ git config --add commitGraph.changedPathsVersion -1 &&
+ echo "options: bloom(1,10,7) read_generation_data" >expect &&
+ test-tool read-graph >full &&
+ grep options full >actual &&
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
+
+test_expect_success 'set up repo with high bit path, version 2 changed-path' '
+ git init highbit2 &&
-+ git -C highbit2 config --add commitgraph.changedPathsVersion 2 &&
++ git -C highbit2 config --add commitGraph.changedPathsVersion 2 &&
+ test_commit -C highbit2 c2 "$CENT" &&
+ git -C highbit2 commit-graph write --reachable --changed-paths
+'
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
+test_expect_success 'version 2 changed-path not used when version 1 requested' '
+ (
+ cd highbit2 &&
-+ git config --add commitgraph.changedPathsVersion 1 &&
++ git config --add commitGraph.changedPathsVersion 1 &&
+ test_bloom_filters_not_used "-- another$CENT"
+ )
+'
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
+test_expect_success 'version 2 changed-path used when autodetect requested' '
+ (
+ cd highbit2 &&
-+ git config --add commitgraph.changedPathsVersion -1 &&
++ git config --add commitGraph.changedPathsVersion -1 &&
+ test_bloom_filters_used "-- another$CENT"
+ )
+'
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
+ git -C highbit2 commit-graph write --reachable --changed-paths &&
+ (
+ cd highbit2 &&
-+ git config --add commitgraph.changedPathsVersion -1 &&
++ git config --add commitGraph.changedPathsVersion -1 &&
+ echo "options: bloom(2,10,7) read_generation_data" >expect &&
+ test-tool read-graph >full &&
+ grep options full >actual &&
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when vers
+test_expect_success 'when writing commit graph, do not reuse changed-path of another version' '
+ git init doublewrite &&
+ test_commit -C doublewrite c "$CENT" &&
-+ git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
++ git -C doublewrite config --add commitGraph.changedPathsVersion 1 &&
+ git -C doublewrite commit-graph write --reachable --changed-paths &&
+ for v in -2 3
+ do
-+ git -C doublewrite config --add commitgraph.changedPathsVersion $v &&
++ git -C doublewrite config --add commitGraph.changedPathsVersion $v &&
+ git -C doublewrite commit-graph write --reachable --changed-paths 2>err &&
+ cat >expect <<-EOF &&
-+ warning: attempting to write a commit-graph, but ${SQ}commitgraph.changedPathsVersion${SQ} ($v) is not supported
++ warning: attempting to write a commit-graph, but ${SQ}commitGraph.changedPathsVersion${SQ} ($v) is not supported
+ EOF
+ test_cmp expect err || return 1
+ done &&
-+ git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
++ git -C doublewrite config --add commitGraph.changedPathsVersion 2 &&
+ git -C doublewrite commit-graph write --reachable --changed-paths &&
+ (
+ cd doublewrite &&
14: 9f54a376fb = 14: 242f023135 object.h: fix mis-aligned flag bits table
15: 67991dea7c ! 15: 1b80023e57 commit-graph: reuse existing Bloom filters where possible
@@ bloom.c: static void init_truncated_large_filter(struct bloom_filter *filter,
+ struct tree_desc desc;
+ struct name_entry entry;
+
-+ init_tree_desc(&desc, t->buffer, t->size);
++ init_tree_desc(&desc, &t->object.oid, t->buffer, t->size);
+ while (tree_entry(&desc, &entry)) {
+ size_t i;
+ for (i = 0; i < entry.pathlen; i++) {
@@ t/t4216-log-bloom.sh: test_expect_success 'when writing another commit graph, pr
git init doublewrite &&
test_commit -C doublewrite c "$CENT" &&
+
- git -C doublewrite config --add commitgraph.changedPathsVersion 1 &&
+ git -C doublewrite config --add commitGraph.changedPathsVersion 1 &&
++ >trace2.txt &&
+ GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+ git -C doublewrite commit-graph write --reachable --changed-paths &&
+ test_filter_computed 1 trace2.txt &&
@@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reu
test_cmp expect err || return 1
done &&
+
- git -C doublewrite config --add commitgraph.changedPathsVersion 2 &&
+ git -C doublewrite config --add commitGraph.changedPathsVersion 2 &&
- git -C doublewrite commit-graph write --reachable --changed-paths &&
++ >trace2.txt &&
+ GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+ git -C doublewrite commit-graph write --reachable --changed-paths &&
+ test_filter_computed 1 trace2.txt &&
@@ t/t4216-log-bloom.sh: test_expect_success 'when writing commit graph, do not reu
+
+ test_commit -C upgrade base no-high-bits &&
+
-+ git -C upgrade config --add commitgraph.changedPathsVersion 1 &&
++ git -C upgrade config --add commitGraph.changedPathsVersion 1 &&
++ >trace2.txt &&
+ GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+ git -C upgrade commit-graph write --reachable --changed-paths &&
+ test_filter_computed 1 trace2.txt &&
+ test_filter_upgraded 0 trace2.txt &&
+
-+ git -C upgrade config --add commitgraph.changedPathsVersion 2 &&
++ git -C upgrade config --add commitGraph.changedPathsVersion 2 &&
++ >trace2.txt &&
+ GIT_TRACE2_EVENT="$(pwd)/trace2.txt" \
+ git -C upgrade commit-graph write --reachable --changed-paths &&
+ test_filter_computed 0 trace2.txt &&
16: 12058a074d = 16: db9991f339 bloom: introduce `deinit_bloom_filters()`
base-commit: 1e1586e4ed626bde864339c10570bc0e73f0ab97
--
2.45.2.664.g446e6a2b1f
next prev parent reply other threads:[~2024-06-25 17:39 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-31 22:52 [PATCH v6 00/16] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2024-01-31 22:52 ` [PATCH v6 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 02/16] revision.c: consult Bloom filters for root commits Taylor Blau
2024-01-31 22:52 ` [PATCH v6 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-01-31 22:52 ` [PATCH v6 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-01-31 22:52 ` [PATCH v6 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-01-31 22:52 ` [PATCH v6 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-01-31 22:52 ` [PATCH v6 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-01-31 22:52 ` [PATCH v6 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-01-31 22:52 ` [PATCH v6 10/16] bloom: annotate filters with hash version Taylor Blau
2024-01-31 22:52 ` [PATCH v6 11/16] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-01-31 22:52 ` [PATCH v6 12/16] commit-graph: unconditionally load " Taylor Blau
2024-01-31 22:52 ` [PATCH v6 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-02-25 22:15 ` SZEDER Gábor
2024-02-28 0:11 ` Jiang Xin
2024-01-31 22:52 ` [PATCH v6 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-01-31 22:52 ` [PATCH v6 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-01-31 22:53 ` [PATCH v6 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2024-06-25 17:39 ` Taylor Blau [this message]
2024-06-25 17:39 ` [PATCH v7 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-06-25 17:39 ` [PATCH v7 02/16] revision.c: consult Bloom filters for root commits Taylor Blau
2024-06-25 17:39 ` [PATCH v7 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-06-25 17:39 ` [PATCH v7 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-06-25 17:39 ` [PATCH v7 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-06-25 17:39 ` [PATCH v7 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-06-25 17:39 ` [PATCH v7 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-06-25 17:39 ` [PATCH v7 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-06-25 17:39 ` [PATCH v7 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-06-25 17:39 ` [PATCH v7 10/16] bloom: annotate filters with hash version Taylor Blau
2024-06-25 17:39 ` [PATCH v7 11/16] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-06-25 17:40 ` [PATCH v7 12/16] commit-graph: unconditionally load " Taylor Blau
2024-06-25 17:40 ` [PATCH v7 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-06-25 17:40 ` [PATCH v7 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-06-25 17:40 ` [PATCH v7 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-06-25 17:40 ` [PATCH v7 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1719333276.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
--cc=newren@gmail.com \
--cc=peff@peff.net \
--cc=szeder.dev@gmail.com \
--cc=worldhello.net@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).