From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, "Jonathan Tan" <jonathantanmy@google.com>,
"Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
"SZEDER Gábor" <szeder.dev@gmail.com>
Subject: Re: [PATCH v3 13/17] commit-graph.c: unconditionally load Bloom filters
Date: Tue, 17 Oct 2023 10:45:29 +0200 [thread overview]
Message-ID: <ZS5JqdUmUWZ8OMoo@tanuki> (raw)
In-Reply-To: <09d8669c3a074e7a2ace9d650a345244b2362f7e.1696969994.git.me@ttaylorr.com>
[-- Attachment #1: Type: text/plain, Size: 4073 bytes --]
On Tue, Oct 10, 2023 at 04:33:59PM -0400, Taylor Blau wrote:
> In 9e4df4da07 (commit-graph: new filter ver. that fixes murmur3,
Nit: It's a bit funny to read this reference to a commit ID when the
commit in question is part of the same series. Isn't it likely to grow
stale?
> 2023-08-01), we began ignoring the Bloom data ("BDAT") chunk for
> commit-graphs whose Bloom filters were computed using a hash version
> incompatible with the value of `commitGraph.changedPathVersion`.
>
> Now that the Bloom API has been hardened to discard these incompatible
> filters (with the exception of low-level APIs), we can safely load these
> Bloom filters unconditionally.
>
> We no longer want to return early from `graph_read_bloom_data()`, and
> similarly do not want to set the bloom_settings' `hash_version` field as
> a side-effect. The latter is because we want to wait until we know which
> Bloom settings we're using (either the defaults, from the GIT_TEST
> variables, or from the previous commit-graph layer) before deciding what
> hash_version to use.
>
> If we detect an existing BDAT chunk, we'll infer the rest of the
> settings (e.g., number of hashes, bits per entry, and maximum number of
> changed paths) from the earlier graph layer. The hash_version will be
> inferred from the previous layer as well, unless one has already been
> specified via configuration.
>
> Once all of that is done, we normalize the value of the hash_version to
> either "1" or "2".
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> commit-graph.c | 19 ++++++++++---------
> 1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index db623afd09..fa3b58e762 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -327,12 +327,6 @@ static int graph_read_bloom_data(const unsigned char *chunk_start,
> uint32_t hash_version;
> hash_version = get_be32(chunk_start);
>
> - if (*c->commit_graph_changed_paths_version == -1) {
> - *c->commit_graph_changed_paths_version = hash_version;
> - } else if (hash_version != *c->commit_graph_changed_paths_version) {
> - return 0;
> - }
> -
> g->chunk_bloom_data = chunk_start;
> g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
> g->bloom_filter_settings->hash_version = hash_version;
> @@ -2473,8 +2467,7 @@ int write_commit_graph(struct object_directory *odb,
> ctx->write_generation_data = (get_configured_generation_version(r) == 2);
> ctx->num_generation_data_overflows = 0;
>
> - bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version == 2
> - ? 2 : 1;
> + bloom_settings.hash_version = r->settings.commit_graph_changed_paths_version;
> bloom_settings.bits_per_entry = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_BITS_PER_ENTRY",
> bloom_settings.bits_per_entry);
> bloom_settings.num_hashes = git_env_ulong("GIT_TEST_BLOOM_SETTINGS_NUM_HASHES",
> @@ -2506,10 +2499,18 @@ int write_commit_graph(struct object_directory *odb,
> /* We have changed-paths already. Keep them in the next graph */
> if (g && g->bloom_filter_settings) {
> ctx->changed_paths = 1;
> - ctx->bloom_settings = g->bloom_filter_settings;
> +
> + /* don't propagate the hash_version unless unspecified */
> + if (bloom_settings.hash_version == -1)
> + bloom_settings.hash_version = g->bloom_filter_settings->hash_version;
> + bloom_settings.bits_per_entry = g->bloom_filter_settings->bits_per_entry;
> + bloom_settings.num_hashes = g->bloom_filter_settings->num_hashes;
> + bloom_settings.max_changed_paths = g->bloom_filter_settings->max_changed_paths;
> }
> }
>
> + bloom_settings.hash_version = bloom_settings.hash_version == 2 ? 2 : 1;
> +
What if there is a future version of Git that writes Bloom filters with
hash version 3? Should we really normalize that to `1`?
Patrick
> if (ctx->split) {
> struct commit_graph *g = ctx->r->objects->commit_graph;
>
> --
> 2.42.0.342.g8bb3a896ee
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2023-10-17 8:45 UTC|newest]
Thread overview: 76+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-21 21:43 [PATCH 00/15] bloom: changed-path Bloom filters v2 Taylor Blau
2023-08-21 21:43 ` [PATCH 01/15] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2023-08-21 21:44 ` [PATCH 02/15] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2023-08-21 21:44 ` [PATCH 03/15] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2023-08-21 21:44 ` [PATCH 04/15] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2023-08-21 21:44 ` [PATCH 05/15] t4216: test changed path filters with high bit paths Taylor Blau
2023-08-21 21:44 ` [PATCH 06/15] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2023-08-21 21:44 ` [PATCH 07/15] commit-graph: new filter ver. that fixes murmur3 Taylor Blau
2023-08-26 15:06 ` SZEDER Gábor
2023-08-29 16:31 ` Jonathan Tan
2023-08-30 20:02 ` SZEDER Gábor
2023-09-01 20:56 ` Jonathan Tan
2023-09-25 23:03 ` Taylor Blau
2023-10-08 14:35 ` SZEDER Gábor
2023-10-09 18:17 ` Taylor Blau
2023-10-09 19:31 ` Taylor Blau
2023-10-09 19:52 ` Junio C Hamano
2023-10-10 20:34 ` Taylor Blau
2023-08-21 21:44 ` [PATCH 08/15] bloom: annotate filters with hash version Taylor Blau
2023-08-21 21:44 ` [PATCH 09/15] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-08-21 21:44 ` [PATCH 10/15] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-08-21 21:44 ` [PATCH 11/15] commit-graph.c: unconditionally load Bloom filters Taylor Blau
2023-08-21 21:44 ` [PATCH 12/15] commit-graph: drop unnecessary `graph_read_bloom_data_context` Taylor Blau
2023-08-21 21:44 ` [PATCH 13/15] object.h: fix mis-aligned flag bits table Taylor Blau
2023-08-21 21:44 ` [PATCH 14/15] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-08-21 21:44 ` [PATCH 15/15] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2023-08-24 22:22 ` [PATCH 00/15] bloom: changed-path Bloom filters v2 Jonathan Tan
2023-08-25 17:06 ` Jonathan Tan
2023-08-29 22:18 ` Jonathan Tan
2023-08-29 23:16 ` Junio C Hamano
2023-08-30 16:43 ` [PATCH v2 " Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 01/15] gitformat-commit-graph: describe version 2 of BDAT Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 02/15] t/helper/test-read-graph.c: extract `dump_graph_info()` Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 03/15] bloom.h: make `load_bloom_filter_from_graph()` public Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 04/15] t/helper/test-read-graph: implement `bloom-filters` mode Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 05/15] t4216: test changed path filters with high bit paths Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 06/15] repo-settings: introduce commitgraph.changedPathsVersion Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 07/15] commit-graph: new filter ver. that fixes murmur3 Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 08/15] bloom: annotate filters with hash version Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 09/15] bloom: prepare to discard incompatible Bloom filters Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 10/15] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 11/15] commit-graph.c: unconditionally load Bloom filters Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 12/15] commit-graph: drop unnecessary `graph_read_bloom_data_context` Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 13/15] object.h: fix mis-aligned flag bits table Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 14/15] commit-graph: reuse existing Bloom filters where possible Jonathan Tan
2023-08-30 16:43 ` [PATCH v2 15/15] bloom: introduce `deinit_bloom_filters()` Jonathan Tan
2023-08-30 19:38 ` [PATCH v2 00/15] bloom: changed-path Bloom filters v2 Junio C Hamano
2023-10-10 20:33 ` [PATCH v3 00/17] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2023-10-10 20:33 ` [PATCH v3 01/17] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-10-10 20:33 ` [PATCH v3 02/17] revision.c: consult Bloom filters for root commits Taylor Blau
2023-10-10 20:33 ` [PATCH v3 03/17] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2023-10-17 8:45 ` Patrick Steinhardt
2023-10-10 20:33 ` [PATCH v3 04/17] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2023-10-10 20:33 ` [PATCH v3 05/17] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2023-10-17 8:45 ` Patrick Steinhardt
2023-10-18 17:37 ` Taylor Blau
2023-10-18 23:56 ` Junio C Hamano
2023-10-10 20:33 ` [PATCH v3 06/17] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2023-10-10 20:33 ` [PATCH v3 07/17] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2023-10-10 20:33 ` [PATCH v3 08/17] t4216: test changed path filters with high bit paths Taylor Blau
2023-10-17 8:45 ` Patrick Steinhardt
2023-10-18 17:41 ` Taylor Blau
2023-10-10 20:33 ` [PATCH v3 09/17] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2023-10-10 20:33 ` [PATCH v3 10/17] commit-graph: new filter ver. that fixes murmur3 Taylor Blau
2023-10-17 8:45 ` Patrick Steinhardt
2023-10-18 17:46 ` Taylor Blau
2023-10-10 20:33 ` [PATCH v3 11/17] bloom: annotate filters with hash version Taylor Blau
2023-10-10 20:33 ` [PATCH v3 12/17] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-10-10 20:33 ` [PATCH v3 13/17] commit-graph.c: unconditionally load " Taylor Blau
2023-10-17 8:45 ` Patrick Steinhardt [this message]
2023-10-10 20:34 ` [PATCH v3 14/17] commit-graph: drop unnecessary `graph_read_bloom_data_context` Taylor Blau
2023-10-10 20:34 ` [PATCH v3 15/17] object.h: fix mis-aligned flag bits table Taylor Blau
2023-10-10 20:34 ` [PATCH v3 16/17] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-10-10 20:34 ` [PATCH v3 17/17] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2023-10-17 8:45 ` [PATCH v3 00/17] bloom: changed-path Bloom filters v2 (& sundries) Patrick Steinhardt
2023-10-18 17:47 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZS5JqdUmUWZ8OMoo@tanuki \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=szeder.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).