From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: "Jeff King" <peff@peff.net>, "Junio C Hamano" <gitster@pobox.com>,
"Jonathan Tan" <jonathantanmy@google.com>,
"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: [PATCH v6 00/16] bloom: changed-path Bloom filters v2 (& sundries)
Date: Wed, 31 Jan 2024 17:52:08 -0500 [thread overview]
Message-ID: <cover.1706741516.git.me@ttaylorr.com> (raw)
(Rebased onto the tip of 'master', which is now bc7ee2e5e1 (The twelfth
batch, 2024-01-30), at the time of writing).
This series is a minor reroll of the combined efforts of [1] and [2] to
introduce the v2 changed-path Bloom filters, which fixes a bug in our
existing implementation of murmur3 paths with non-ASCII characters (when
the "char" type is signed).
This version is nearly identical to the previous round, modulo some
patch reorganization prompted by a test failure noticed by SZEDER [3].
I'm still not sure how that failure got through, but this round fixes
it, and I've double checked that 'git rebase -ix "make test"' passes on
across all revisions.
Thanks again to Jonathan, Peff, and SZEDER who have helped a great deal
in assembling these patches. As usual, a range-diff is included below.
Thanks in advance for your review!
[1]: https://lore.kernel.org/git/cover.1684790529.git.jonathantanmy@google.com/
[2]: https://lore.kernel.org/git/cover.1691426160.git.me@ttaylorr.com/
[3]: https://lore.kernel.org/git/20240129212614.GB9612@szeder.dev/
Jonathan Tan (1):
gitformat-commit-graph: describe version 2 of BDAT
Taylor Blau (15):
t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
revision.c: consult Bloom filters for root commits
commit-graph: ensure Bloom filters are read with consistent settings
t/helper/test-read-graph.c: extract `dump_graph_info()`
bloom.h: make `load_bloom_filter_from_graph()` public
t/helper/test-read-graph: implement `bloom-filters` mode
t4216: test changed path filters with high bit paths
repo-settings: introduce commitgraph.changedPathsVersion
bloom: annotate filters with hash version
bloom: prepare to discard incompatible Bloom filters
commit-graph: unconditionally load Bloom filters
commit-graph: new Bloom filter version that fixes murmur3
object.h: fix mis-aligned flag bits table
commit-graph: reuse existing Bloom filters where possible
bloom: introduce `deinit_bloom_filters()`
Documentation/config/commitgraph.txt | 29 ++-
Documentation/gitformat-commit-graph.txt | 9 +-
bloom.c | 208 ++++++++++++++-
bloom.h | 38 ++-
commit-graph.c | 64 ++++-
object.h | 3 +-
oss-fuzz/fuzz-commit-graph.c | 2 +-
repo-settings.c | 6 +-
repository.h | 2 +-
revision.c | 26 +-
t/helper/test-bloom.c | 9 +-
t/helper/test-read-graph.c | 67 ++++-
t/t0095-bloom.sh | 8 +
t/t4216-log-bloom.sh | 310 ++++++++++++++++++++++-
14 files changed, 723 insertions(+), 58 deletions(-)
Range-diff against v5:
1: 6292e974c5 = 1: 9df34a2f4f t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
2: 6437cf5888 = 2: a6dc377f1b revision.c: consult Bloom filters for root commits
3: d5a73b9fa6 ! 3: a77ab941bc commit-graph: ensure Bloom filters are read with consistent settings
@@ t/t4216-log-bloom.sh: test_expect_success 'Bloom generation backfills empty comm
+ done
+'
+
-+test_expect_success 'ensure incompatible Bloom filters are ignored' '
++test_expect_success 'ensure Bloom filters with incompatible settings are ignored' '
+ # Compute Bloom filters with "unusual" settings.
+ git -C $repo rev-parse one >in &&
+ GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
4: 85af5e80ef = 4: 56a9fdaff0 gitformat-commit-graph: describe version 2 of BDAT
5: b5ff3a9b34 = 5: 7484a82f7f t/helper/test-read-graph.c: extract `dump_graph_info()`
6: 1b45dc8d85 = 6: 48343f93a2 bloom.h: make `load_bloom_filter_from_graph()` public
7: 42b5126016 = 7: 286fd7dcdb t/helper/test-read-graph: implement `bloom-filters` mode
8: 2029a2e30c = 8: 7de7b89da0 t4216: test changed path filters with high bit paths
9: 79c6c3025a ! 9: b13c9b8ff9 repo-settings: introduce commitgraph.changedPathsVersion
@@ repository.h: struct repo_settings {
int gc_write_commit_graph;
int fetch_write_commit_graph;
int command_requires_full_index;
-
- ## t/t4216-log-bloom.sh ##
-@@ t/t4216-log-bloom.sh: test_expect_success 'setup for mixed Bloom setting tests' '
- done
- '
-
--test_expect_success 'ensure incompatible Bloom filters are ignored' '
-+test_expect_success 'ensure Bloom filters with incompatible settings are ignored' '
- # Compute Bloom filters with "unusual" settings.
- git -C $repo rev-parse one >in &&
- GIT_TEST_BLOOM_SETTINGS_NUM_HASHES=3 git -C $repo commit-graph write \
-@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' '
- test_must_be_empty err
- '
-
-+test_expect_success 'ensure Bloom filter with incompatible versions are ignored' '
-+ rm "$repo/$graph" &&
-+
-+ git -C $repo log --oneline --no-decorate -- $CENT >expect &&
-+
-+ # Compute v1 Bloom filters for commits at the bottom.
-+ git -C $repo rev-parse HEAD^ >in &&
-+ git -C $repo commit-graph write --stdin-commits --changed-paths \
-+ --split <in &&
-+
-+ # Compute v2 Bloomfilters for the rest of the commits at the top.
-+ git -C $repo rev-parse HEAD >in &&
-+ git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write \
-+ --stdin-commits --changed-paths --split=no-merge <in &&
-+
-+ test_line_count = 2 $repo/$chain &&
-+
-+ git -C $repo log --oneline --no-decorate -- $CENT >actual 2>err &&
-+ test_cmp expect actual &&
-+
-+ layer="$(head -n 1 $repo/$chain)" &&
-+ cat >expect.err <<-EOF &&
-+ warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings
-+ EOF
-+ test_cmp expect.err err
-+'
-+
- get_first_changed_path_filter () {
- test-tool read-graph bloom-filters >filters.dat &&
- head -n 1 filters.dat
11: a73c77a5ba = 10: 09c44c51a5 bloom: annotate filters with hash version
12: 0c62b36206 = 11: d4995ef600 bloom: prepare to discard incompatible Bloom filters
13: ff348fc49d ! 12: c8e9bb7c88 commit-graph.c: unconditionally load Bloom filters
@@ Metadata
Author: Taylor Blau <me@ttaylorr.com>
## Commit message ##
- commit-graph.c: unconditionally load Bloom filters
+ commit-graph: unconditionally load Bloom filters
In an earlier commit, we began ignoring the Bloom data ("BDAT") chunk
for commit-graphs whose Bloom filters were computed using a hash version
- incompatible with the value of `commitGraph.changedPathVersion`.
+ incompatible with the value of `commitGraph.changedPathVersion`.
Now that the Bloom API has been hardened to discard these incompatible
filters (with the exception of low-level APIs), we can safely load these
@@ Commit message
## commit-graph.c ##
@@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
-
+ g->chunk_bloom_data_size = chunk_size;
hash_version = get_be32(chunk_start);
-- if (*c->commit_graph_changed_paths_version == -1)
-- *c->commit_graph_changed_paths_version = hash_version;
-- else if (hash_version != *c->commit_graph_changed_paths_version)
+- if (hash_version != 1)
- return 0;
-
- g->chunk_bloom_data = chunk_start;
- g->chunk_bloom_data_size = chunk_size;
g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
+ g->bloom_filter_settings->hash_version = hash_version;
+ g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
ctx->write_generation_data = (get_configured_generation_version(r) == 2);
ctx->num_generation_data_overflows = 0;
-- bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2
-- ? 2 : 1;
+ bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version;
bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
bloom_settings.bits_per_entry);
bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
/* We have changed-paths already. Keep them in the next graph */
- if (g && g->bloom_filter_settings) {
+ if (g && g->chunk_bloom_data) {
ctx->changed_paths = 1;
- ctx->bloom_settings = g->bloom_filter_settings;
+
10: cfa5c358f8 ! 13: d2f11c082d commit-graph: new Bloom filter version that fixes murmur3
@@ bloom.h: int load_bloom_filter_from_graph(struct commit_graph *g,
size_t len,
## commit-graph.c ##
-@@ commit-graph.c: static int graph_read_bloom_index(const unsigned char *chunk_start,
- return 0;
- }
-
-+struct graph_read_bloom_data_context {
-+ struct commit_graph *g;
-+ int *commit_graph_changed_paths_version;
-+};
-+
- static int graph_read_bloom_data(const unsigned char *chunk_start,
+@@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
size_t chunk_size, void *data)
{
-- struct commit_graph *g = data;
-+ struct graph_read_bloom_data_context *c = data;
-+ struct commit_graph *g = c->g;
- uint32_t hash_version;
+ struct commit_graph *g = data;
+- uint32_t hash_version;
if (chunk_size < BLOOMDATA_CHUNK_HEADER_SIZE) {
+ warning(_("ignoring too-small changed-path chunk"
@@ commit-graph.c: static int graph_read_bloom_data(const unsigned char *chunk_start,
- return -1;
- }
-+ hash_version = get_be32(chunk_start);
-+
-+ if (*c->commit_graph_changed_paths_version == -1)
-+ *c->commit_graph_changed_paths_version = hash_version;
-+ else if (hash_version != *c->commit_graph_changed_paths_version)
-+ return 0;
-+
g->chunk_bloom_data = chunk_start;
g->chunk_bloom_data_size = chunk_size;
- hash_version = get_be32(chunk_start);
--
-- if (hash_version != 1)
-- return 0;
--
+
g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
- g->bloom_filter_settings->hash_version = hash_version;
+- g->bloom_filter_settings->hash_version = hash_version;
++ g->bloom_filter_settings->hash_version = get_be32(chunk_start);
g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
-@@ commit-graph.c: struct commit_graph *parse_commit_graph(struct repo_settings *s,
- }
-
- if (s->commit_graph_changed_paths_version) {
-+ struct graph_read_bloom_data_context context = {
-+ .g = graph,
-+ .commit_graph_changed_paths_version = &s->commit_graph_changed_paths_version
-+ };
- read_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
- graph_read_bloom_index, graph);
- read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
-- graph_read_bloom_data, graph);
-+ graph_read_bloom_data, &context);
- }
-
- if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
+ g->bloom_filter_settings->bits_per_entry = get_be32(chunk_start + 8);
+ g->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
}
if (!commit_graph_compatible(r))
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
CALLOC_ARRAY(ctx, 1);
ctx->r = r;
-@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
- ctx->write_generation_data = (get_configured_generation_version(r) == 2);
- ctx->num_generation_data_overflows = 0;
-
-+ bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2
-+ ? 2 : 1;
- bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
- bloom_settings.bits_per_entry);
- bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
g = ctx->r->objects->commit_graph;
@@ commit-graph.c: int write_commit_graph(struct object_directory *odb,
- if (g && g->chunk_bloom_data) {
+ if (g && g->bloom_filter_settings) {
ctx->changed_paths = 1;
- ctx->bloom_settings = g->bloom_filter_settings;
- }
+
+ /* don't propagate the hash_version unless unspecified */
## t/helper/test-bloom.c ##
@@ t/helper/test-bloom.c: static void get_bloom_filter_for_commit(const struct object_id *commit_oid)
@@ t/t0095-bloom.sh: test_expect_success 'compute unseeded murmur3 hash for test st
Hashes:0x5615800c|0x5b966560|0x61174ab4|0x66983008|0x6c19155c|0x7199fab0|0x771ae004|
## t/t4216-log-bloom.sh ##
+@@ t/t4216-log-bloom.sh: test_expect_success 'merge graph layers with incompatible Bloom settings' '
+ test_must_be_empty err
+ '
+
++test_expect_success 'ensure Bloom filter with incompatible versions are ignored' '
++ rm "$repo/$graph" &&
++
++ git -C $repo log --oneline --no-decorate -- $CENT >expect &&
++
++ # Compute v1 Bloom filters for commits at the bottom.
++ git -C $repo rev-parse HEAD^ >in &&
++ git -C $repo commit-graph write --stdin-commits --changed-paths \
++ --split <in &&
++
++ # Compute v2 Bloomfilters for the rest of the commits at the top.
++ git -C $repo rev-parse HEAD >in &&
++ git -C $repo -c commitGraph.changedPathsVersion=2 commit-graph write \
++ --stdin-commits --changed-paths --split=no-merge <in &&
++
++ test_line_count = 2 $repo/$chain &&
++
++ git -C $repo log --oneline --no-decorate -- $CENT >actual 2>err &&
++ test_cmp expect actual &&
++
++ layer="$(head -n 1 $repo/$chain)" &&
++ cat >expect.err <<-EOF &&
++ warning: disabling Bloom filters for commit-graph layer $SQ$layer$SQ due to incompatible settings
++ EOF
++ test_cmp expect.err err
++'
++
+ get_first_changed_path_filter () {
+ test-tool read-graph bloom-filters >filters.dat &&
+ head -n 1 filters.dat
@@ t/t4216-log-bloom.sh: test_expect_success 'version 1 changed-path used when version 1 requested' '
)
'
14: acc0a097b3 < -: ---------- commit-graph: drop unnecessary `graph_read_bloom_data_context`
15: da401c8853 = 14: 9f54a376fb object.h: fix mis-aligned flag bits table
16: 2674967309 = 15: 67991dea7c commit-graph: reuse existing Bloom filters where possible
17: bea7ec7b3f ! 16: 12058a074d bloom: introduce `deinit_bloom_filters()`
@@ bloom.h: void add_key_to_filter(const struct bloom_key *key,
BLOOM_NOT_COMPUTED = (1 << 0),
## commit-graph.c ##
-@@ commit-graph.c: struct bloom_filter_settings *get_bloom_filter_settings(struct repository *r)
- void close_commit_graph(struct raw_object_store *o)
- {
+@@ commit-graph.c: void close_commit_graph(struct raw_object_store *o)
+ return;
+
clear_commit_graph_data_slab(&commit_graph_data_slab);
+ deinit_bloom_filters();
free_commit_graph(o->commit_graph);
base-commit: bc7ee2e5e16f0d1e710ef8fab3db59ab11f2bbe7
--
2.43.0.509.g253f65a7fc
next reply other threads:[~2024-01-31 22:52 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-31 22:52 Taylor Blau [this message]
2024-01-31 22:52 ` [PATCH v6 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 02/16] revision.c: consult Bloom filters for root commits Taylor Blau
2024-01-31 22:52 ` [PATCH v6 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-01-31 22:52 ` [PATCH v6 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-01-31 22:52 ` [PATCH v6 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-01-31 22:52 ` [PATCH v6 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-01-31 22:52 ` [PATCH v6 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-01-31 22:52 ` [PATCH v6 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-01-31 22:52 ` [PATCH v6 10/16] bloom: annotate filters with hash version Taylor Blau
2024-01-31 22:52 ` [PATCH v6 11/16] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-01-31 22:52 ` [PATCH v6 12/16] commit-graph: unconditionally load " Taylor Blau
2024-01-31 22:52 ` [PATCH v6 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-02-25 22:15 ` SZEDER Gábor
2024-02-28 0:11 ` Jiang Xin
2024-01-31 22:52 ` [PATCH v6 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-01-31 22:52 ` [PATCH v6 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-01-31 22:53 ` [PATCH v6 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2024-06-25 17:39 ` [PATCH v7 00/16] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2024-06-25 17:39 ` [PATCH v7 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-06-25 17:39 ` [PATCH v7 02/16] revision.c: consult Bloom filters for root commits Taylor Blau
2024-06-25 17:39 ` [PATCH v7 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-06-25 17:39 ` [PATCH v7 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-06-25 17:39 ` [PATCH v7 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-06-25 17:39 ` [PATCH v7 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-06-25 17:39 ` [PATCH v7 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-06-25 17:39 ` [PATCH v7 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-06-25 17:39 ` [PATCH v7 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-06-25 17:39 ` [PATCH v7 10/16] bloom: annotate filters with hash version Taylor Blau
2024-06-25 17:39 ` [PATCH v7 11/16] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-06-25 17:40 ` [PATCH v7 12/16] commit-graph: unconditionally load " Taylor Blau
2024-06-25 17:40 ` [PATCH v7 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-06-25 17:40 ` [PATCH v7 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-06-25 17:40 ` [PATCH v7 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-06-25 17:40 ` [PATCH v7 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1706741516.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
--cc=peff@peff.net \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.