From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: "Jeff King" <peff@peff.net>, "Junio C Hamano" <gitster@pobox.com>,
"Jonathan Tan" <jonathantanmy@google.com>,
"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH v6 00/16] bloom: changed-path Bloom filters v2 (& sundries)
Date: Wed, 31 Jan 2024 17:52:08 -0500 [thread overview]
Message-ID: <cover.1706741516.git.me@ttaylorr.com> (raw)
(Rebased onto the tip of 'master', which is now bc7ee2e5e1 (The twelfth
batch, 2024-01-30), at the time of writing).
This series is a minor reroll of the combined efforts of [1] and [2] to
introduce the v2 changed-path Bloom filters, which fixes a bug in our
existing implementation of murmur3 paths with non-ASCII characters (when
the "char" type is signed).
This version is nearly identical to the previous round, modulo some
patch reorganization prompted by a test failure noticed by SZEDER [3].
I'm still not sure how that failure got through, but this round fixes
it, and I've double checked that 'git rebase -ix "make test"' passes on
across all revisions.
Thanks again to Jonathan, Peff, and SZEDER who have helped a great deal
in assembling these patches. As usual, a range-diff is included below.
Thanks in advance for your review!
[1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
[2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/
[3]: https://lore.kernel.org/git/20240129212614.GB9612@szeder.dev/
Jonathan Tan (1):
gitformat-commit-graph: describe version 2 of BDAT
Taylor Blau (15):
t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
revision.c: consult Bloom filters for root commits
commit-graph: ensure Bloom filters are read with consistent settings
t/helper/test-read-graph.c: extract `dump_graph_info()`
bloom.h: make `load_bloom_filter_from_graph()` public
t/helper/test-read-graph: implement `bloom-filters` mode
t4216: test changed path filters with high bit paths
repo-settings: introduce commitgraph.changedPathsVersion
bloom: annotate filters with hash version
bloom: prepare to discard incompatible Bloom filters
commit-graph: unconditionally load Bloom filters
commit-graph: new Bloom filter version that fixes murmur3
object.h: fix mis-aligned flag bits table
commit-graph: reuse existing Bloom filters where possible
bloom: introduce `deinit_bloom_filters()`
Documentation/config/commitgraph.txt | 29 ++-
Documentation/gitformat-commit-graph.txt | 9 +-
bloom.c | 208 ++++++++++++++-
bloom.h | 38 ++-
commit-graph.c | 64 ++++-
object.h | 3 +-
oss-fuzz/fuzz-commit-graph.c | 2 +-
repo-settings.c | 6 +-
repository.h | 2 +-
revision.c | 26 +-
t/helper/test-bloom.c | 9 +-
t/helper/test-read-graph.c | 67 ++++-
t/t0095-bloom.sh | 8 +
t/t4216-log-bloom.sh | 310 ++++++++++++++++++++++-
14 files changed, 723 insertions(+), 58 deletions(-)
Range-diff against v5:
1: 6292e974c5 = 1: 9df34a2f4f t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
2: 6437cf5888 = 2: a6dc377f1b revision.c: consult Bloom filters for root commits
3: d5a73b9fa6 ! 3: a77ab941bc commit-graph: ensure Bloom filters are read with consistent settings
@@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
+ done
+'
+
-+test_expect_success 'ensure incompatible Bloom filters are ignored' '
++test_expect_success 'ensure Bloom filters with incompatible settings are ignored' '
+ # Compute Bloom filters with "unusual" settings.
+ git -C $repo rev-parse one >in &&
+ GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
4: 85af5e80ef = 4: 56a9fdaff0 gitformat-commit-graph: describe version 2 of BDAT
5: b5ff3a9b34 = 5: 7484a82f7f t/helper/test-read-graph.c: extract `dump_graph_info()`
6: 1b45dc8d85 = 6: 48343f93a2 bloom.h: make `load_bloom_filter_from_graph()` public
7: 42b5126016 = 7: 286fd7dcdb t/helper/test-read-graph: implement `bloom-filters` mode
8: 2029a2e30c = 8: 7de7b89da0 t4216: test changed path filters with high bit paths
9: 79c6c3025a ! 9: b13c9b8ff9 repo-settings: introduce commitgraph.changedPathsVersion
@@ repository.h: struct repo_settings {
int gc_write_commit_graph;
int fetch_write_commit_graph;
int command_requires_full_index;
-
- ## t/t4216-log-bloom.sh ##
-@@ t/t4216-log-bloom.sh: test_expect_success 'setup for mixed Bloom setting tests' '
- done
- '
-
--test_expect_success 'ensure incompatible Bloom filters are ignored' '
-+test_expect_success 'ensure Bloom filters with incompatible settings are ignored' '
- # Compute Bloom filters with "unusual" settings.
- git -C $repo rev-parse one >in &&
- GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
-@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' '
- test_must_be_empty err
- '
-
-+test_expect_success 'ensure Bloom filter with incompatible versions are ignored' '
-+ rm "$repo/$graph" &&
-+
-+ git -C $repo log --oneline --no-decorate -- $CENT >expect &&
-+
-+ # Compute v1 Bloom filters for commits at the bottom.
-+ git -C $repo rev-parse HEAD^ >in &&
-+ git -C $repo commit-graph write --stdin-commits --changed-paths \
-+ --split <in &&
-+
-+ # Compute v2 Bloomfilters for the rest of the commits at the top.
-+ git -C $repo rev-parse HEAD >in &&
-+ git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write \
-+ --stdin-commits --changed-paths --split=no-merge <in &&
-+
-+ test_line_count = 2 $repo/$chain &&
-+
-+ git -C $repo log --oneline --no-decorate -- $CENT >actual 2>err &&
-+ test_cmp expect actual &&
-+
-+ layer="$(head -n 1 $repo/$chain)" &&
-+ cat >expect.err <<-EOF &&
-+ warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings
-+ EOF
-+ test_cmp expect.err err
-+'
-+
- get_first_changed_path_filter () {
- test-tool read-graph bloom-filters >filters.dat &&
- head -n 1 filters.dat
11: a73c77a5ba = 10: 09c44c51a5 bloom: annotate filters with hash version
12: 0c62b36206 = 11: d4995ef600 bloom: prepare to discard incompatible Bloom filters
13: ff348fc49d ! 12: c8e9bb7c88 commit-graph.c: unconditionally load Bloom filters
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- commit-graph.c: unconditionally load Bloom filters
+ commit-graph: unconditionally load Bloom filters
In an earlier commit, we began ignoring the Bloom data ("BDAT") chunk
for commit-graphs whose Bloom filters were computed using a hash version
- incompatible with the value of `commitGraph.changedPathVersion`.
+ incompatible with the value of `commitGraph.changedPathVersion`.
Now that the Bloom API has been hardened to discard these incompatible
filters (with the exception of low-level APIs), we can safely load these
@@ Commit message
## commit-graph.c ##
@@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
-
+ g->chunk_bloom_data_size = chunk_size;
hash_version = get_be32(chunk_start);
-- if (*c->commit_graph_changed_paths_version == -1)
-- *c->commit_graph_changed_paths_version = hash_version;
-- else if (hash_version != *c->commit_graph_changed_paths_version)
+- if (hash_version != 1)
- return 0;
-
- g->chunk_bloom_data = chunk_start;
- g->chunk_bloom_data_size = chunk_size;
g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
+ g->bloom_filter_settings->hash_version = hash_version;
+ g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
ctx->write_generation_data = (get_configured_generation_version(r) == 2);
ctx->num_generation_data_overflows = 0;
-- bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2
-- ? 2 : 1;
+ bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version;
bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
bloom_settings.bits_per_entry);
bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
/* We have changed-paths already. Keep them in the next graph */
- if (g && g->bloom_filter_settings) {
+ if (g && g->chunk_bloom_data) {
ctx->changed_paths = 1;
- ctx->bloom_settings = g->bloom_filter_settings;
+
10: cfa5c358f8 ! 13: d2f11c082d commit-graph: new Bloom filter version that fixes murmur3
@@ bloom.h: int load_bloom_filter_from_graph(struct commit_graph *g,
size_t len,
## commit-graph.c ##
-@@ commit-graph.c: static int graph_read_bloom_index(const unsigned char *chunk_start,
- return 0;
- }
-
-+struct graph_read_bloom_data_context {
-+ struct commit_graph *g;
-+ int *commit_graph_changed_paths_version;
-+};
-+
- static int graph_read_bloom_data(const unsigned char *chunk_start,
+@@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
size_t chunk_size, void *data)
{
-- struct commit_graph *g = data;
-+ struct graph_read_bloom_data_context *c = data;
-+ struct commit_graph *g = c->g;
- uint32_t hash_version;
+ struct commit_graph *g = data;
+- uint32_t hash_version;
if (chunk_size < BLOOMDATA_CHUNK_HEADER_SIZE) {
+ warning(_("ignoring too-small changed-path chunk"
@@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
- return -1;
- }
-+ hash_version = get_be32(chunk_start);
-+
-+ if (*c->commit_graph_changed_paths_version == -1)
-+ *c->commit_graph_changed_paths_version = hash_version;
-+ else if (hash_version != *c->commit_graph_changed_paths_version)
-+ return 0;
-+
g->chunk_bloom_data = chunk_start;
g->chunk_bloom_data_size = chunk_size;
- hash_version = get_be32(chunk_start);
--
-- if (hash_version != 1)
-- return 0;
--
+
g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
- g->bloom_filter_settings->hash_version = hash_version;
+- g->bloom_filter_settings->hash_version = hash_version;
++ g->bloom_filter_settings->hash_version = get_be32(chunk_start);
g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
-@@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repo_settings *s,
- }
-
- if (s->commit_graph_changed_paths_version) {
-+ struct graph_read_bloom_data_context context = {
-+ .g = graph,
-+ .commit_graph_changed_paths_version = &s->commit_graph_changed_paths_version
-+ };
- read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
- graph_read_bloom_index, graph);
- read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
-- graph_read_bloom_data, graph);
-+ graph_read_bloom_data, &context);
- }
-
- if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
+ g->bloom_filter_settings->bits_per_entry = get_be32(chunk_start + 8);
+ g->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
}
if (!commit_graph_compatible(r))
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
CALLOC_ARRAY(ctx, 1);
ctx->r = r;
-@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
- ctx->write_generation_data = (get_configured_generation_version(r) == 2);
- ctx->num_generation_data_overflows = 0;
-
-+ bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2
-+ ? 2 : 1;
- bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
- bloom_settings.bits_per_entry);
- bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
g = ctx->r->objects->commit_graph;
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
- if (g && g->chunk_bloom_data) {
+ if (g && g->bloom_filter_settings) {
ctx->changed_paths = 1;
- ctx->bloom_settings = g->bloom_filter_settings;
- }
+
+ /* don't propagate the hash_version unless unspecified */
## t/helper/test-bloom.c ##
@@ t/helper/test-bloom.c: static void get_bloom_filter_for_commit(const struct object_id *commit_oid)
@@ t/t0095-bloom.sh: test_expect_success 'compute unseeded murmur3 hash for test st
Hashes:0x5615800c|0x5b966560|0x61174ab4|0x66983008|0x6c19155c|0x7199fab0|0x771ae004|
## t/t4216-log-bloom.sh ##
+@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' '
+ test_must_be_empty err
+ '
+
++test_expect_success 'ensure Bloom filter with incompatible versions are ignored' '
++ rm "$repo/$graph" &&
++
++ git -C $repo log --oneline --no-decorate -- $CENT >expect &&
++
++ # Compute v1 Bloom filters for commits at the bottom.
++ git -C $repo rev-parse HEAD^ >in &&
++ git -C $repo commit-graph write --stdin-commits --changed-paths \
++ --split <in &&
++
++ # Compute v2 Bloomfilters for the rest of the commits at the top.
++ git -C $repo rev-parse HEAD >in &&
++ git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write \
++ --stdin-commits --changed-paths --split=no-merge <in &&
++
++ test_line_count = 2 $repo/$chain &&
++
++ git -C $repo log --oneline --no-decorate -- $CENT >actual 2>err &&
++ test_cmp expect actual &&
++
++ layer="$(head -n 1 $repo/$chain)" &&
++ cat >expect.err <<-EOF &&
++ warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings
++ EOF
++ test_cmp expect.err err
++'
++
+ get_first_changed_path_filter () {
+ test-tool read-graph bloom-filters >filters.dat &&
+ head -n 1 filters.dat
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when version 1 requested' '
)
'
14: acc0a097b3 < -: ---------- commit-graph: drop unnecessary `graph_read_bloom_data_context`
15: da401c8853 = 14: 9f54a376fb object.h: fix mis-aligned flag bits table
16: 2674967309 = 15: 67991dea7c commit-graph: reuse existing Bloom filters where possible
17: bea7ec7b3f ! 16: 12058a074d bloom: introduce `deinit_bloom_filters()`
@@ bloom.h: void add_key_to_filter(const struct bloom_key *key,
BLOOM_NOT_COMPUTED = (1 << 0),
## commit-graph.c ##
-@@ commit-graph.c: struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r)
- void close_commit_graph(struct raw_object_store *o)
- {
+@@ commit-graph.c: void close_commit_graph(struct raw_object_store *o)
+ return;
+
clear_commit_graph_data_slab(&commit_graph_data_slab);
+ deinit_bloom_filters();
free_commit_graph(o->commit_graph);
base-commit: bc7ee2e5e16f0d1e710ef8fab3db59ab11f2bbe7
--
2.43.0.509.g253f65a7fc
next reply other threads:[~2024-01-31 22:52 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-31 22:52 Taylor Blau [this message]
2024-01-31 22:52 ` [PATCH v6 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 02/16] revision.c: consult Bloom filters for root commits Taylor Blau
2024-01-31 22:52 ` [PATCH v6 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-01-31 22:52 ` [PATCH v6 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-01-31 22:52 ` [PATCH v6 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-01-31 22:52 ` [PATCH v6 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-01-31 22:52 ` [PATCH v6 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-01-31 22:52 ` [PATCH v6 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-01-31 22:52 ` [PATCH v6 10/16] bloom: annotate filters with hash version Taylor Blau
2024-01-31 22:52 ` [PATCH v6 11/16] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-01-31 22:52 ` [PATCH v6 12/16] commit-graph: unconditionally load " Taylor Blau
2024-01-31 22:52 ` [PATCH v6 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-02-25 22:15 ` SZEDER Gábor
2024-02-28 0:11 ` Jiang Xin
2024-01-31 22:52 ` [PATCH v6 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-01-31 22:52 ` [PATCH v6 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-01-31 22:53 ` [PATCH v6 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2024-06-25 17:39 ` [PATCH v7 00/16] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2024-06-25 17:39 ` [PATCH v7 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-06-25 17:39 ` [PATCH v7 02/16] revision.c: consult Bloom filters for root commits Taylor Blau
2024-06-25 17:39 ` [PATCH v7 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-06-25 17:39 ` [PATCH v7 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-06-25 17:39 ` [PATCH v7 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-06-25 17:39 ` [PATCH v7 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-06-25 17:39 ` [PATCH v7 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-06-25 17:39 ` [PATCH v7 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-06-25 17:39 ` [PATCH v7 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-06-25 17:39 ` [PATCH v7 10/16] bloom: annotate filters with hash version Taylor Blau
2024-06-25 17:39 ` [PATCH v7 11/16] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-06-25 17:40 ` [PATCH v7 12/16] commit-graph: unconditionally load " Taylor Blau
2024-06-25 17:40 ` [PATCH v7 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-06-25 17:40 ` [PATCH v7 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-06-25 17:40 ` [PATCH v7 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-06-25 17:40 ` [PATCH v7 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1706741516.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
--cc=peff@peff.net \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).