From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <derrickstolee@github.com>,
Jonathan Tan <jonathantanmy@google.com>,
Junio C Hamano <gitster@pobox.com>
Subject: [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade
Date: Mon, 7 Aug 2023 12:37:42 -0400 [thread overview]
Message-ID: <cover.1691426160.git.me@ttaylorr.com> (raw)
This series is based off of 'jt/path-filter-fix'.
These few patches implement an idea that we discussed in [1], where we
attempt to reuse existing Bloom filters during an upgrade from v1 to v2
Bloom filters while rewriting the commit-graph.
The core idea is that Bloom filters are reusable when there aren't any
non-ASCII paths in a commit's tree-diff against its first parent (or the
empty tree, if none exists). If we assume that a commit's tree-diff
meets those conditions, we can't conclude anything about whether either
tree contains non-ASCII characters, since they could be unmodified on
either side and thus excluded from the tree-diff.
But assuming the RHS (that there aren't any non-ASCII characters present
in the tree's path set) *does* give us that there aren't any such paths
present in the first-parent tree diff, either.
This series checks whether or not commits meet that criteria, and reuses
the existing Bloom filter (if one exists) when possible. In practice, we
end up visiting relatively few trees, since we mark trees we've already
visited.
On both linux.git and git.git, this series gives a significant speed-up
when upgrading Bloom filters from v1 to v2. On linux.git:
Benchmark 1: GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths
Time (mean ± σ): 124.873 s ± 0.316 s [User: 124.081 s, System: 0.643 s]
Range (min … max): 124.621 s … 125.227 s 3 runs
Benchmark 2: GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths
Time (mean ± σ): 79.271 s ± 0.163 s [User: 74.611 s, System: 4.521 s]
Range (min … max): 79.112 s … 79.437 s 3 runs
Summary
'GIT_TEST_UPGRADE_BLOOM_FILTERS=1 git.compile commit-graph write --reachable --changed-paths' ran
1.58 ± 0.01 times faster than 'GIT_TEST_UPGRADE_BLOOM_FILTERS=0 git.compile commit-graph write --reachable --changed-paths'
On git.git (where we do have some non-ASCII paths), the change goes from
4.163 seconds to 3.348 seconds, for a 1.24x speed-up.
I'm sending this as an RFC, since we are in the middle of the -rc phase,
and 'jt/path-filter-fix' isn't expected[2] to merge into 'master' until
we're on the other side of 2.42.
The structure of this series is as follows:
- The first three patches prepare to load the `BDAT` chunk, even when
the graph's Bloom filter settings are incompatible with the value in
`commitGraph.changedPathsVersion`.
- The fourth patch begins loading `BDAT` chunks unconditionally.
- The fifth patch is a clean-up.
- The sixth and final patch implements the approach discussed above.
Thanks in advance for your thoughts and review :-).
[1]: https://lore.kernel.org/git/ZMKvsObx+uaKA8zF@nand.local/
[2]: https://lore.kernel.org/git/xmqqy1it6ykm.fsf@gitster.g/
Taylor Blau (6):
bloom: annotate filters with hash version
bloom: prepare to discard incompatible Bloom filters
t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()`
commit-graph.c: unconditionally load Bloom filters
object.h: fix mis-aligned flag bits table
commit-graph: reuse existing Bloom filters where possible
bloom.c | 117 +++++++++++++++++++++++++++++++++++++++++--
bloom.h | 22 +++++++-
commit-graph.c | 24 +++++----
object.h | 3 +-
t/t4216-log-bloom.sh | 49 ++++++++++++++++--
5 files changed, 195 insertions(+), 20 deletions(-)
--
2.41.0.407.g6d1c33951b
next reply other threads:[~2023-08-07 16:38 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-07 16:37 Taylor Blau [this message]
2023-08-07 16:37 ` [RFC PATCH 1/6] bloom: annotate filters with hash version Taylor Blau
2023-08-11 21:46 ` Jonathan Tan
2023-08-17 19:55 ` Taylor Blau
2023-08-21 20:21 ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 2/6] bloom: prepare to discard incompatible Bloom filters Taylor Blau
2023-08-11 21:48 ` Jonathan Tan
2023-08-21 20:23 ` Taylor Blau
2023-08-24 22:20 ` Jonathan Tan
2023-08-24 22:47 ` Taylor Blau
2023-08-24 23:05 ` Jonathan Tan
2023-08-25 19:00 ` Taylor Blau
2023-08-29 16:49 ` Jonathan Tan
2023-08-29 19:14 ` Taylor Blau
2023-08-29 22:04 ` Jonathan Tan
2023-08-07 16:37 ` [RFC PATCH 3/6] t/t4216-log-bloom.sh: harden `test_bloom_filters_not_used()` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 4/6] commit-graph.c: unconditionally load Bloom filters Taylor Blau
2023-08-11 22:00 ` Jonathan Tan
2023-08-21 20:40 ` Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 5/6] object.h: fix mis-aligned flag bits table Taylor Blau
2023-08-07 16:37 ` [RFC PATCH 6/6] commit-graph: reuse existing Bloom filters where possible Taylor Blau
2023-08-11 22:06 ` Jonathan Tan
2023-08-11 22:13 ` [RFC PATCH 0/6] bloom: reuse existing Bloom filters when possible during upgrade Jonathan Tan
2023-08-21 20:46 ` Taylor Blau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1691426160.git.me@ttaylorr.com \
--to=me@ttaylorr.com \
--cc=derrickstolee@github.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jonathantanmy@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).