git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: "Elijah Newren" <newren@gmail.com>, "Jeff King" <peff@peff.net>,
	"Jiang Xin" <worldhello.net@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Junio C Hamano" <gitster@pobox.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>,
	"Elijah Newren" <newren@gmail.com>
Subject: [PATCH v7 11/16] bloom: prepare to discard incompatible Bloom filters
Date: Tue, 25 Jun 2024 13:39:57 -0400	[thread overview]
Message-ID: <99ab9cf448eea65db1a561a79223fe70073d1be0.1719333276.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1719333276.git.me@ttaylorr.com>

Callers use the inline `get_bloom_filter()` implementation as a thin
wrapper around `get_or_compute_bloom_filter()`. The former calls the
latter with a value of "0" for `compute_if_not_present`, making
`get_bloom_filter()` the default read-only path for fetching an existing
Bloom filter.

Callers expect the value returned from `get_bloom_filter()` is usable,
that is that it's compatible with the configured value corresponding to
`commitGraph.changedPathsVersion`.

This is OK, since the commit-graph machinery only initializes its BDAT
chunk (thereby enabling it to service Bloom filter queries) when the
Bloom filter hash_version is compatible with our settings. So any value
returned by `get_bloom_filter()` is trivially useable.

However, subsequent commits will load the BDAT chunk even when the Bloom
filters are built with incompatible hash versions. Prepare to handle
this by teaching `get_bloom_filter()` to discard filters that are
incompatible with the configured hash version.

Callers who wish to read incompatible filters (e.g., for upgrading
filters from v1 to v2) may use the lower level routine,
`get_or_compute_bloom_filter()`.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 bloom.c | 20 +++++++++++++++++++-
 bloom.h | 20 ++++++++++++++++++--
 2 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/bloom.c b/bloom.c
index e64e53bc4c..c24489dbcf 100644
--- a/bloom.c
+++ b/bloom.c
@@ -220,6 +220,23 @@ static void init_truncated_large_filter(struct bloom_filter *filter,
 	filter->version = version;
 }
 
+struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
+{
+	struct bloom_filter *filter;
+	int hash_version;
+
+	filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL);
+	if (!filter)
+		return NULL;
+
+	prepare_repo_settings(r);
+	hash_version = r->settings.commit_graph_changed_paths_version;
+
+	if (!(hash_version == -1 || hash_version == filter->version))
+		return NULL; /* unusable filter */
+	return filter;
+}
+
 struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 						 struct commit *c,
 						 int compute_if_not_present,
@@ -245,7 +262,8 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 						     filter, graph_pos);
 	}
 
-	if (filter->data && filter->len)
+	if ((filter->data && filter->len) &&
+	    (!settings || settings->hash_version == filter->version))
 		return filter;
 	if (!compute_if_not_present)
 		return NULL;
diff --git a/bloom.h b/bloom.h
index c9dd7d4022..052a993aab 100644
--- a/bloom.h
+++ b/bloom.h
@@ -108,8 +108,24 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
 						 const struct bloom_filter_settings *settings,
 						 enum bloom_filter_computed *computed);
 
-#define get_bloom_filter(r, c) get_or_compute_bloom_filter( \
-	(r), (c), 0, NULL, NULL)
+/*
+ * Find the Bloom filter associated with the given commit "c".
+ *
+ * If any of the following are true
+ *
+ *   - the repository does not have a commit-graph, or
+ *   - the repository disables reading from the commit-graph, or
+ *   - the given commit does not have a Bloom filter computed, or
+ *   - there is a Bloom filter for commit "c", but it cannot be read
+ *     because the filter uses an incompatible version of murmur3
+ *
+ * , then `get_bloom_filter()` will return NULL. Otherwise, the corresponding
+ * Bloom filter will be returned.
+ *
+ * For callers who wish to inspect Bloom filters with incompatible hash
+ * versions, use get_or_compute_bloom_filter().
+ */
+struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c);
 
 int bloom_filter_contains(const struct bloom_filter *filter,
 			  const struct bloom_key *key,
-- 
2.45.2.664.g446e6a2b1f


  parent reply	other threads:[~2024-06-25 17:40 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-31 22:52 [PATCH v6 00/16] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2024-01-31 22:52 ` [PATCH v6 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 02/16] revision.c: consult Bloom filters for root commits Taylor Blau
2024-01-31 22:52 ` [PATCH v6 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-01-31 22:52 ` [PATCH v6 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-01-31 22:52 ` [PATCH v6 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-01-31 22:52 ` [PATCH v6 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-01-31 22:52 ` [PATCH v6 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-01-31 22:52 ` [PATCH v6 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-01-31 22:52 ` [PATCH v6 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-01-31 22:52 ` [PATCH v6 10/16] bloom: annotate filters with hash version Taylor Blau
2024-01-31 22:52 ` [PATCH v6 11/16] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2024-01-31 22:52 ` [PATCH v6 12/16] commit-graph: unconditionally load " Taylor Blau
2024-01-31 22:52 ` [PATCH v6 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-02-25 22:15   ` SZEDER Gábor
2024-02-28  0:11   ` Jiang Xin
2024-01-31 22:52 ` [PATCH v6 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-01-31 22:52 ` [PATCH v6 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-01-31 22:53 ` [PATCH v6 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau
2024-06-25 17:39 ` [PATCH v7 00/16] bloom: changed-path Bloom filters v2 (& sundries) Taylor Blau
2024-06-25 17:39   ` [PATCH v7 01/16] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2024-06-25 17:39   ` [PATCH v7 02/16] revision.c: consult Bloom filters for root commits Taylor Blau
2024-06-25 17:39   ` [PATCH v7 03/16] commit-graph: ensure Bloom filters are read with consistent settings Taylor Blau
2024-06-25 17:39   ` [PATCH v7 04/16] gitformat-commit-graph: describe version 2 of BDAT Taylor Blau
2024-06-25 17:39   ` [PATCH v7 05/16] t/helper/test-read-graph.c: extract `dump_graph_info()` Taylor Blau
2024-06-25 17:39   ` [PATCH v7 06/16] bloom.h: make `load_bloom_filter_from_graph()` public Taylor Blau
2024-06-25 17:39   ` [PATCH v7 07/16] t/helper/test-read-graph: implement `bloom-filters` mode Taylor Blau
2024-06-25 17:39   ` [PATCH v7 08/16] t4216: test changed path filters with high bit paths Taylor Blau
2024-06-25 17:39   ` [PATCH v7 09/16] repo-settings: introduce commitgraph.changedPathsVersion Taylor Blau
2024-06-25 17:39   ` [PATCH v7 10/16] bloom: annotate filters with hash version Taylor Blau
2024-06-25 17:39   ` Taylor Blau [this message]
2024-06-25 17:40   ` [PATCH v7 12/16] commit-graph: unconditionally load Bloom filters Taylor Blau
2024-06-25 17:40   ` [PATCH v7 13/16] commit-graph: new Bloom filter version that fixes murmur3 Taylor Blau
2024-06-25 17:40   ` [PATCH v7 14/16] object.h: fix mis-aligned flag bits table Taylor Blau
2024-06-25 17:40   ` [PATCH v7 15/16] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2024-06-25 17:40   ` [PATCH v7 16/16] bloom: introduce `deinit_bloom_filters()` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=99ab9cf448eea65db1a561a79223fe70073d1be0.1719333276.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=szeder.dev@gmail.com \
    --cc=worldhello.net@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).