From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH] bitmaps: don't recurse into trees already in the bitmap
Date: Tue, 15 Jun 2021 10:17:04 -0400 [thread overview]
Message-ID: <471cb9be-bb72-6a37-ede8-f9421d9d3ebe@gmail.com> (raw)
In-Reply-To: <YMdGGL6v8LrbcAJP@coredump.intra.peff.net>
On 6/14/2021 8:05 AM, Jeff King wrote:
> On Mon, Jun 14, 2021 at 03:27:04AM -0400, Jeff King wrote:
...
> We can use the same optimization we do for commits here: when we are
> about to open a tree, see if it's in the bitmap (either the one we are
> building, or the "seen" bitmap which covers the UNINTERESTING side of
> the bitmap when doing a set-difference).
I was surprised that we were not already doing this. As Git can handle
larger repos now than it could a few years ago, I expect this
optimization to be immediately valuable, and critical in years to come.
> But here are numbers from some other real-world repositories (that are
> not public). This one's tree is comparable in size to linux.git, but has
> ~16k refs (and so less complete bitmap coverage):
>
> Test HEAD^ HEAD
> -------------------------------------------------------------------------
> 5310.4: simulated clone 38.34(39.86+0.74) 33.95(35.53+0.76) -11.5%
> 5310.5: simulated fetch 2.29(6.31+0.35) 2.20(5.97+0.41) -3.9%
> 5310.7: rev-list (commits) 0.99(0.86+0.13) 0.96(0.85+0.11) -3.0%
> 5310.8: rev-list (objects) 11.32(11.04+0.27) 6.59(6.37+0.21) -41.8%
>
> And here's another with a very large tree (~340k entries), and a fairly
> large number of refs (~10k):
>
> Test HEAD^ HEAD
> -------------------------------------------------------------------------
> 5310.3: simulated clone 53.83(54.71+1.54) 39.77(40.76+1.50) -26.1%
> 5310.4: simulated fetch 19.91(20.11+0.56) 19.79(19.98+0.67) -0.6%
> 5310.6: rev-list (commits) 0.54(0.44+0.11) 0.51(0.43+0.07) -5.6%
> 5310.7: rev-list (objects) 24.32(23.59+0.73) 9.85(9.49+0.36) -59.5%
>
> This patch provides substantial improvements in these larger cases, and
> have any drawbacks for smaller ones (the cost of the bitmap check is
> quite small compared to an actual tree traversal).
These many-refs scenarios make sense as something that is difficult to
verify using a single fork of an open-source project, but is common in
many closed-source projects that do not use forking to reduce the ref
count per repo.
I was impressed by how little code was required for this change. LGTM.
Thanks,
-Stolee
next prev parent reply other threads:[~2021-06-15 14:24 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-14 7:27 [PATCH] bitmaps: don't recurse into trees already in the bitmap Jeff King
2021-06-14 12:05 ` Jeff King
2021-06-15 14:17 ` Derrick Stolee [this message]
2021-06-16 12:31 ` Patrick Steinhardt
2021-06-18 12:59 ` Jeff King
2021-06-18 13:35 ` Patrick Steinhardt
2021-06-18 14:10 ` Jeff King
2021-06-22 10:47 ` Patrick Steinhardt
2021-06-22 19:39 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=471cb9be-bb72-6a37-ede8-f9421d9d3ebe@gmail.com \
--to=stolee@gmail.com \
--cc=git@vger.kernel.org \
--cc=peff@peff.net \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).