git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, Derrick Stolee <derrickstolee@github.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade
Date: Mon, 21 Aug 2023 16:46:53 -0400	[thread overview]
Message-ID: <ZOPNPU/cekgJU7S7@nand.local> (raw)
In-Reply-To: <20230811221337.3331688-1-jonathantanmy@google.com>

On Fri, Aug 11, 2023 at 03:13:37PM -0700, Jonathan Tan wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > On both linux.git and git.git, this series gives a significant speed-up
> > when upgrading Bloom filters from v1 to v2. On linux.git:
> >
> >     Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit
> >       Time (mean ± σ):     124.873 s ±  0.316 s    [User: 124.081 s, System: 0.643 s]
> >       Range (min … max):   124.621 s … 125.227 s    3 runs
> >
> >     Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths
> >       Time (mean ± σ):     79.271 s ±  0.163 s    [User: 74.611 s, System: 4.521 s]
> >       Range (min … max):   79.112 s … 79.437 s    3 runs
> >
> >     Summary
> >       'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran
> >         1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths'
> >
> > On git.git (where we do have some non-ASCII paths), the change goes from
> > 4.163 seconds to 3.348 seconds, for a 1.24x speed-up.
>
> My main concern is that this modifies the code somewhat pervasively
> (tracking the version of Bloom filters and removing assumptions about
> what Bloom filter versions are in memory) in return for what I think
> is a small speedup, when considering that we will perform this
> operation only once for a given repo. I'll defer to server operators
> on this (or other people handling large numbers of repos), though.

Yeah, I think that this is certainly on the riskier side of things. But
I have confidence in our test coverage here, and feel better with your
thorough review.

FWIW, I think that the speed-up is worthwhile (though I'm obviously
biased here). When we were rolling out changed-path Bloom filters to
repositories on GitHub.com, it was apparent that computing them from
scratch in a single shot was infeasible. This led to 809e0327f5
(builtin/commit-graph.c: introduce '--max-new-filters=<n>', 2020-09-18).

That would of course still save us here if we didn't have an upgrade
path, since each recomputed filter would count against the maximum
number we can generate in a single run.

We could introduce a configuration knob, but I'd rather avoid it if
possible. That can also be done on top in a later patch, which should be
easy enough to do if we hear of a compelling use-case for wanting to
disable the upgrade path here.

> Putting that concern aside, I've reviewed the code and assuming that
> we're OK with modifying the code in this way, this patch set looks good
> to me, and hopefully my review will be of some help.

Thanks very much -- it was more than helpful :-). I'll collect up this
and your previous patches (as we have discussed off-list) and tie
everything together in a single series for the maintainer.

Thanks,
Taylor

      reply	other threads:[~2023-08-21 20:47 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-07 16:37 [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 1/6] bloom: annotate filters with hash version Taylor Blau
2023-08-11 21:46   ` Jonathan Tan
2023-08-17 19:55     ` Taylor Blau
2023-08-21 20:21       ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-08-11 21:48   ` Jonathan Tan
2023-08-21 20:23     ` Taylor Blau
2023-08-24 22:20   ` Jonathan Tan
2023-08-24 22:47     ` Taylor Blau
2023-08-24 23:05       ` Jonathan Tan
2023-08-25 19:00         ` Taylor Blau
2023-08-29 16:49           ` Jonathan Tan
2023-08-29 19:14             ` Taylor Blau
2023-08-29 22:04               ` Jonathan Tan
2023-08-07 16:37 ` [RFC PATCH 3/6] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 4/6] commit-graph.c: unconditionally load Bloom filters Taylor Blau
2023-08-11 22:00   ` Jonathan Tan
2023-08-21 20:40     ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 5/6] object.h: fix mis-aligned flag bits table Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 6/6] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-08-11 22:06   ` Jonathan Tan
2023-08-11 22:13 ` [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Jonathan Tan
2023-08-21 20:46   ` Taylor Blau [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZOPNPU/cekgJU7S7@nand.local \
    --to=me@ttaylorr.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).