From: Taylor Blau <me@ttaylorr.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, Derrick Stolee <derrickstolee@github.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade
Date: Mon, 21 Aug 2023 16:46:53 -0400 [thread overview]
Message-ID: <ZOPNPU/cekgJU7S7@nand.local> (raw)
In-Reply-To: <20230811221337.3331688-1-jonathantanmy@google.com>
On Fri, Aug 11, 2023 at 03:13:37PM -0700, Jonathan Tan wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > On both linux.git and git.git, this series gives a significant speed-up
> > when upgrading Bloom filters from v1 to v2. On linux.git:
> >
> > Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit
> > Time (mean ± σ): 124.873 s ± 0.316 s [User: 124.081 s, System: 0.643 s]
> > Range (min … max): 124.621 s … 125.227 s 3 runs
> >
> > Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths
> > Time (mean ± σ): 79.271 s ± 0.163 s [User: 74.611 s, System: 4.521 s]
> > Range (min … max): 79.112 s … 79.437 s 3 runs
> >
> > Summary
> > 'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran
> > 1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths'
> >
> > On git.git (where we do have some non-ASCII paths), the change goes from
> > 4.163 seconds to 3.348 seconds, for a 1.24x speed-up.
>
> My main concern is that this modifies the code somewhat pervasively
> (tracking the version of Bloom filters and removing assumptions about
> what Bloom filter versions are in memory) in return for what I think
> is a small speedup, when considering that we will perform this
> operation only once for a given repo. I'll defer to server operators
> on this (or other people handling large numbers of repos), though.
Yeah, I think that this is certainly on the riskier side of things. But
I have confidence in our test coverage here, and feel better with your
thorough review.
FWIW, I think that the speed-up is worthwhile (though I'm obviously
biased here). When we were rolling out changed-path Bloom filters to
repositories on GitHub.com, it was apparent that computing them from
scratch in a single shot was infeasible. This led to 809e0327f5
(builtin/commit-graph.c: introduce '--max-new-filters=<n>', 2020-09-18).
That would of course still save us here if we didn't have an upgrade
path, since each recomputed filter would count against the maximum
number we can generate in a single run.
We could introduce a configuration knob, but I'd rather avoid it if
possible. That can also be done on top in a later patch, which should be
easy enough to do if we hear of a compelling use-case for wanting to
disable the upgrade path here.
> Putting that concern aside, I've reviewed the code and assuming that
> we're OK with modifying the code in this way, this patch set looks good
> to me, and hopefully my review will be of some help.
Thanks very much -- it was more than helpful :-). I'll collect up this
and your previous patches (as we have discussed off-list) and tie
everything together in a single series for the maintainer.
Thanks,
Taylor
prev parent reply other threads:[~2023-08-21 20:47 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-07 16:37 [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 1/6] bloom: annotate filters with hash version Taylor Blau
2023-08-11 21:46 ` Jonathan Tan
2023-08-17 19:55 ` Taylor Blau
2023-08-21 20:21 ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-08-11 21:48 ` Jonathan Tan
2023-08-21 20:23 ` Taylor Blau
2023-08-24 22:20 ` Jonathan Tan
2023-08-24 22:47 ` Taylor Blau
2023-08-24 23:05 ` Jonathan Tan
2023-08-25 19:00 ` Taylor Blau
2023-08-29 16:49 ` Jonathan Tan
2023-08-29 19:14 ` Taylor Blau
2023-08-29 22:04 ` Jonathan Tan
2023-08-07 16:37 ` [RFC PATCH 3/6] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 4/6] commit-graph.c: unconditionally load Bloom filters Taylor Blau
2023-08-11 22:00 ` Jonathan Tan
2023-08-21 20:40 ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 5/6] object.h: fix mis-aligned flag bits table Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 6/6] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-08-11 22:06 ` Jonathan Tan
2023-08-11 22:13 ` [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Jonathan Tan
2023-08-21 20:46 ` Taylor Blau [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZOPNPU/cekgJU7S7@nand.local \
--to=me@ttaylorr.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).