git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>, git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>
Subject: Re: [PATCH] bitmaps: don't recurse into trees already in the bitmap
Date: Tue, 15 Jun 2021 10:17:04 -0400	[thread overview]
Message-ID: <471cb9be-bb72-6a37-ede8-f9421d9d3ebe@gmail.com> (raw)
In-Reply-To: <YMdGGL6v8LrbcAJP@coredump.intra.peff.net>

On 6/14/2021 8:05 AM, Jeff King wrote:
> On Mon, Jun 14, 2021 at 03:27:04AM -0400, Jeff King wrote:
...
> We can use the same optimization we do for commits here: when we are
> about to open a tree, see if it's in the bitmap (either the one we are
> building, or the "seen" bitmap which covers the UNINTERESTING side of
> the bitmap when doing a set-difference).

I was surprised that we were not already doing this. As Git can handle
larger repos now than it could a few years ago, I expect this
optimization to be immediately valuable, and critical in years to come.

> But here are numbers from some other real-world repositories (that are
> not public). This one's tree is comparable in size to linux.git, but has
> ~16k refs (and so less complete bitmap coverage):
> 
>   Test                         HEAD^               HEAD
>   -------------------------------------------------------------------------
>   5310.4: simulated clone      38.34(39.86+0.74)   33.95(35.53+0.76) -11.5%
>   5310.5: simulated fetch      2.29(6.31+0.35)     2.20(5.97+0.41) -3.9%
>   5310.7: rev-list (commits)   0.99(0.86+0.13)     0.96(0.85+0.11) -3.0%
>   5310.8: rev-list (objects)   11.32(11.04+0.27)   6.59(6.37+0.21) -41.8%
> 
> And here's another with a very large tree (~340k entries), and a fairly
> large number of refs (~10k):
> 
>   Test                         HEAD^               HEAD
>   -------------------------------------------------------------------------
>   5310.3: simulated clone      53.83(54.71+1.54)   39.77(40.76+1.50) -26.1%
>   5310.4: simulated fetch      19.91(20.11+0.56)   19.79(19.98+0.67) -0.6%
>   5310.6: rev-list (commits)   0.54(0.44+0.11)     0.51(0.43+0.07) -5.6%
>   5310.7: rev-list (objects)   24.32(23.59+0.73)   9.85(9.49+0.36) -59.5%
> 
> This patch provides substantial improvements in these larger cases, and
> have any drawbacks for smaller ones (the cost of the bitmap check is
> quite small compared to an actual tree traversal).

These many-refs scenarios make sense as something that is difficult to
verify using a single fork of an open-source project, but is common in
many closed-source projects that do not use forking to reduce the ref
count per repo.

I was impressed by how little code was required for this change. LGTM.

Thanks,
-Stolee

  reply	other threads:[~2021-06-15 14:24 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-14  7:27 [PATCH] bitmaps: don't recurse into trees already in the bitmap Jeff King
2021-06-14 12:05 ` Jeff King
2021-06-15 14:17   ` Derrick Stolee [this message]
2021-06-16 12:31     ` Patrick Steinhardt
2021-06-18 12:59       ` Jeff King
2021-06-18 13:35         ` Patrick Steinhardt
2021-06-18 14:10           ` Jeff King
2021-06-22 10:47           ` Patrick Steinhardt
2021-06-22 19:39             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=471cb9be-bb72-6a37-ede8-f9421d9d3ebe@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).