From: Taylor Blau <me@ttaylorr.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, Derrick Stolee <derrickstolee@github.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters
Date: Tue, 29 Aug 2023 15:14:13 -0400 [thread overview]
Message-ID: <ZO5DheoYy/syeJ+9@nand.local> (raw)
In-Reply-To: <20230829164926.367260-1-jonathantanmy@google.com>
On Tue, Aug 29, 2023 at 09:49:26AM -0700, Jonathan Tan wrote:
> > > > + if (!(hash_version == -1 || hash_version == filter->version))
> > >
> > > No need for the comparison to -1 here.
> >
> > I'm not sure I understand your suggestion. When we fetch the filter from
> > get_or_compute_bloom_filter(), we have filter->version set to the
> > hash_version from the containing graph's Bloom filter settings.
> >
> > So (besides the normalization), I would think that:
> >
> > struct bloom_filter_settings *s = get_bloom_filter_settings(r);
> > struct bloom_filter *f = get_bloom_filter(...);
> >
> > assert(s->hash_version == f->version);
> >
> > would hold.
>
> My mention to avoid the comparison to -1 was just for completeness
> - since we're normalizing the value of hash_version to 1 or 2, we no
> longer need to compare it to -1.
>
> As for whether s->hash_version is always equal to f->version, I think
> that it may not be true if for some reason, there are multiple commit
> graph files on disk, not all with the same Bloom filter version.
Apologies for not quite grokking this, but I am still somewhat confused.
Suppose we applied something like your suggestion on top of this patch,
like so:
--- 8< ---
diff --git a/bloom.c b/bloom.c
index 739fa093ba..ee81976342 100644
--- a/bloom.c
+++ b/bloom.c
@@ -253,16 +253,16 @@ static void init_truncated_large_filter(struct bloom_filter *filter,
struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
{
struct bloom_filter *filter;
- int hash_version;
+ struct bloom_filter_settings *settings;
filter = get_or_compute_bloom_filter(r, c, 0, NULL, NULL);
if (!filter)
return NULL;
prepare_repo_settings(r);
- hash_version = r->settings.commit_graph_changed_paths_version;
+ settings = get_bloom_filter_settings(r);
- if (!(hash_version == -1 || hash_version == filter->version))
+ if (filter->version != (settings->hash_version == 2 ? 2 : 1))
return NULL; /* unusable filter */
return filter;
}
--- >8 ---
We have a handful of failing tests, notably including t4216.1, which
tries to read settings->hash_version, but fails since settings is NULL.
I believe this happens when we have a computed Bloom filter prepared for
some commit, but have not yet written a commit graph containing that (or
any) Bloom filter.
But I think we're talking about different things here. The point of
get_bloom_filter() as a wrapper around get_or_compute_bloom_filter() is
to only return Bloom filters that are readable under the configuration
commitGraph.changedPathsVersion.
We have a handle on what version was used to compute Bloom filters in
each layer of the graph by reading its "version" field, which is written
via load_bloom_filter_from_graph().
So we want to return a non-NULL filter exactly when we (a) have a Bloom
filter computed for the given commit, and (b) its version matches the
version specified in commitGraph.chagnedPathsVersion.
Are you saying that we need to do the normalization ahead of time so
that we don't emit filters that have different hash versions when
working in a commit-graph chain where each layer may use different Bloom
filter settings? If so, then I think the change we'd need to make is
limited to:
--- 8< ---
diff --git a/bloom.c b/bloom.c
index 739fa093ba..d5cc23b0a8 100644
--- a/bloom.c
+++ b/bloom.c
@@ -260,9 +260,9 @@ struct bloom_filter *get_bloom_filter(struct repository *r, struct commit *c)
return NULL;
prepare_repo_settings(r);
- hash_version = r->settings.commit_graph_changed_paths_version;
+ hash_version = r->settings.commit_graph_changed_paths_version == 2 ? 2 : 1;
- if (!(hash_version == -1 || hash_version == filter->version))
+ if (hash_version == filter->version)
return NULL; /* unusable filter */
return filter;
}
--- >8 ---
But that doesn't quite work, either, since it's breaking some assumption
from the caller and causing us not to write out any Bloom filters (I
haven't investigated further, but I'm not sure that this is the right
path in the first place...).
So I am not sure what you are suggesting, but I am sure that I'm missing
something.
Thanks,
Taylor
next prev parent reply other threads:[~2023-08-29 19:15 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-07 16:37 [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 1/6] bloom: annotate filters with hash version Taylor Blau
2023-08-11 21:46 ` Jonathan Tan
2023-08-17 19:55 ` Taylor Blau
2023-08-21 20:21 ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-08-11 21:48 ` Jonathan Tan
2023-08-21 20:23 ` Taylor Blau
2023-08-24 22:20 ` Jonathan Tan
2023-08-24 22:47 ` Taylor Blau
2023-08-24 23:05 ` Jonathan Tan
2023-08-25 19:00 ` Taylor Blau
2023-08-29 16:49 ` Jonathan Tan
2023-08-29 19:14 ` Taylor Blau [this message]
2023-08-29 22:04 ` Jonathan Tan
2023-08-07 16:37 ` [RFC PATCH 3/6] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 4/6] commit-graph.c: unconditionally load Bloom filters Taylor Blau
2023-08-11 22:00 ` Jonathan Tan
2023-08-21 20:40 ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 5/6] object.h: fix mis-aligned flag bits table Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 6/6] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-08-11 22:06 ` Jonathan Tan
2023-08-11 22:13 ` [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Jonathan Tan
2023-08-21 20:46 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZO5DheoYy/syeJ+9@nand.local \
--to=me@ttaylorr.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).