From: Taylor Blau <me@ttaylorr.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, Derrick Stolee <derrickstolee@github.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade
Date: Mon, 21 Aug 2023 16:46:53 -0400 [thread overview]
Message-ID: <ZOPNPU/cekgJU7S7@nand.local> (raw)
In-Reply-To: <20230811221337.3331688-1-jonathantanmy@google.com>
On Fri, Aug 11, 2023 at 03:13:37PM -0700, Jonathan Tan wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > On both linux.git and git.git, this series gives a significant speed-up
> > when upgrading Bloom filters from v1 to v2. On linux.git:
> >
> > Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit
> > Time (mean ± σ): 124.873 s ± 0.316 s [User: 124.081 s, System: 0.643 s]
> > Range (min … max): 124.621 s … 125.227 s 3 runs
> >
> > Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths
> > Time (mean ± σ): 79.271 s ± 0.163 s [User: 74.611 s, System: 4.521 s]
> > Range (min … max): 79.112 s … 79.437 s 3 runs
> >
> > Summary
> > 'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran
> > 1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths'
> >
> > On git.git (where we do have some non-ASCII paths), the change goes from
> > 4.163 seconds to 3.348 seconds, for a 1.24x speed-up.
>
> My main concern is that this modifies the code somewhat pervasively
> (tracking the version of Bloom filters and removing assumptions about
> what Bloom filter versions are in memory) in return for what I think
> is a small speedup, when considering that we will perform this
> operation only once for a given repo. I'll defer to server operators
> on this (or other people handling large numbers of repos), though.
Yeah, I think that this is certainly on the riskier side of things. But
I have confidence in our test coverage here, and feel better with your
thorough review.
FWIW, I think that the speed-up is worthwhile (though I'm obviously
biased here). When we were rolling out changed-path Bloom filters to
repositories on GitHub.com, it was apparent that computing them from
scratch in a single shot was infeasible. This led to 809e0327f5
(builtin/commit-graph.c: introduce '--max-new-filters=<n>', 2020-09-18).
That would of course still save us here if we didn't have an upgrade
path, since each recomputed filter would count against the maximum
number we can generate in a single run.
We could introduce a configuration knob, but I'd rather avoid it if
possible. That can also be done on top in a later patch, which should be
easy enough to do if we hear of a compelling use-case for wanting to
disable the upgrade path here.
> Putting that concern aside, I've reviewed the code and assuming that
> we're OK with modifying the code in this way, this patch set looks good
> to me, and hopefully my review will be of some help.
Thanks very much -- it was more than helpful :-). I'll collect up this
and your previous patches (as we have discussed off-list) and tie
everything together in a single series for the maintainer.
Thanks,
Taylor
prev parent reply other threads:[~2023-08-21 20:47 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-07 16:37 [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 1/6] bloom: annotate filters with hash version Taylor Blau
2023-08-11 21:46 ` Jonathan Tan
2023-08-17 19:55 ` Taylor Blau
2023-08-21 20:21 ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-08-11 21:48 ` Jonathan Tan
2023-08-21 20:23 ` Taylor Blau
2023-08-24 22:20 ` Jonathan Tan
2023-08-24 22:47 ` Taylor Blau
2023-08-24 23:05 ` Jonathan Tan
2023-08-25 19:00 ` Taylor Blau
2023-08-29 16:49 ` Jonathan Tan
2023-08-29 19:14 ` Taylor Blau
2023-08-29 22:04 ` Jonathan Tan
2023-08-07 16:37 ` [RFC PATCH 3/6] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 4/6] commit-graph.c: unconditionally load Bloom filters Taylor Blau
2023-08-11 22:00 ` Jonathan Tan
2023-08-21 20:40 ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 5/6] object.h: fix mis-aligned flag bits table Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 6/6] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-08-11 22:06 ` Jonathan Tan
2023-08-11 22:13 ` [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Jonathan Tan
2023-08-21 20:46 ` Taylor Blau [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZOPNPU/cekgJU7S7@nand.local \
--to=me@ttaylorr.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.