From: Jeff King <peff@peff.net>
To: Martin Fick <mfick@codeaurora.org>
Cc: Duy Nguyen <pclouds@gmail.com>, Git Mailing List <git@vger.kernel.org>
Subject: Re: Resolving deltas dominates clone time
Date: Tue, 30 Apr 2019 14:02:32 -0400
Message-ID: <20190430180231.GC16729@sigill.intra.peff.net>
In-Reply-To: <3329645.KIYB9vJKXd@mfick-lnx>
On Tue, Apr 23, 2019 at 02:09:31PM -0600, Martin Fick wrote:
> Here are my index-pack results (I only ran them once since they take a while)
> using vgit 1.8.3.2:
>
> Threads  real         user         sys
> 1        108m46.151s  106m14.420s  1m57.192s
> 2        58m14.274s   106m23.158s  5m32.736s
> 3        40m33.351s   106m42.281s  5m40.884s
> 4        31m40.342s   107m20.278s  5m40.675s
> 5        26m0.454s    106m54.370s  5m35.827s
> 12       13m25.304s   107m57.271s  6m26.493s
> 16       10m56.866s   107m46.107s  6m41.330s
> 18       10m18.112s   109m50.893s  7m1.369s
> 20       9m54.010s    113m51.028s  7m53.082s
> 24       9m1.104s     115m8.245s   7m57.156s
> 28       8m26.058s    116m46.311s  8m34.752s
> 32       8m42.967s    140m33.280s  9m59.514s
> 36       8m52.228s    151m28.939s  11m55.590s
> 40       8m22.719s    153m4.496s   12m36.041s
> 44       8m12.419s    166m41.594s  14m7.717s
> 48       8m0.377s     172m3.597s   16m32.041s
> 56       8m22.320s    188m31.426s  17m48.274s
Thanks for the data.
That seems to roughly match my results. Things get obviously better up
to around half of the available processors, and then you get minimal
returns for more CPU (some of yours actually get worse in the middle,
but that may be due to noise; my timings are all best-of-3).
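To put rough numbers on "minimal returns", here is a small sketch that
turns a few of the "real" timings quoted above into speedup and
per-thread efficiency relative to the single-threaded run. The helper
name `to_seconds` and the choice of rows are mine, purely for
illustration; the timings themselves are copied from your table.

```python
import re

def to_seconds(stamp):
    """Convert a time(1)-style value such as '108m46.151s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# A handful of "real" (wall-clock) timings from the quoted table.
real = {
    1: "108m46.151s",
    16: "10m56.866s",
    28: "8m26.058s",
    56: "8m22.320s",
}

baseline = to_seconds(real[1])
speedup = {n: baseline / to_seconds(t) for n, t in real.items()}
efficiency = {n: speedup[n] / n for n in real}

for n in sorted(real):
    print(f"{n:>2} threads: {speedup[n]:5.2f}x speedup, "
          f"{efficiency[n]:.0%} per-thread efficiency")
```

Going from 28 to 56 threads barely moves the wall-clock speedup (it
stays at roughly 13x), while the "user" column in the table keeps
climbing, which is the overly-parallelized inefficiency showing up.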
> I think that if there were no default limit during a clone it could have
> disastrous effects on people using the repo tool from the android project, or
> any other "submodule like" tool that might clone many projects in parallel.
> With the repo tool, people often use a large -j number such as 24 which means
> they end up cloning around 24 projects at a time, and they may do this for
> around 1000 projects. If git clone suddenly started as many threads as there
> are CPUs for each clone, this would likely paralyze the machine.
IMHO this is already a problem, because none of those 24 gits knows
about the others. So they're already using 24*3 cores, though of course
at any given moment some of those 24 may be bottle-necked on the
network.
I suspect that repo should be passing in `-c pack.threads=N`, where `N`
is some formula based around how many cores we want to use, with some
constant fraction applied for how many we expect to be chugging on CPU
at any given point.
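As a minimal sketch of what such a formula might look like (this is not
anything repo actually does; the function names, the default fraction
of 0.5, and the example URL are all assumptions for illustration):

```python
import os

def pack_threads_per_clone(jobs, total_cores=None, cpu_bound_fraction=0.5):
    """Hypothetical heuristic for N in `-c pack.threads=N`.

    jobs: number of clones run in parallel (e.g. repo's -j value).
    total_cores: cores we are willing to use; defaults to all of them.
    cpu_bound_fraction: assumed share of clones busy on CPU at any
        given moment (the rest are presumed to be waiting on the network).
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    if not 0 < cpu_bound_fraction <= 1:
        raise ValueError("cpu_bound_fraction must be in (0, 1]")
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    # Expected number of clones in the CPU-heavy phase at once.
    busy_clones = max(1.0, jobs * cpu_bound_fraction)
    # Split the cores among them, never dropping below one thread.
    return max(1, int(total_cores // busy_clones))

def clone_command(url, threads):
    """Build the argv for a clone with an explicit pack.threads value."""
    return ["git", "clone", "-c", f"pack.threads={threads}", url]

# Example: 24 parallel clones on an assumed 48-core machine.
n = pack_threads_per_clone(jobs=24, total_cores=48)
print(clone_command("https://example.com/project.git", n))
```

The fraction is the part that would need real measurement; a value
picked out of the air could easily be worse than the current default.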
The optimal behavior would probably come from index-pack dynamically
assigning work based on system load, but that gets pretty messy. Ideally
we could just throw all of the load at the kernel's scheduler and let it
do the right thing, but:
  - we clearly get some inefficiencies from being overly-parallelized,
    so we don't want to go too far
  - we have other resources we'd like to keep in use like the network
    and disk. So probably the optimal case would be to have one (or a
    few) index-packs fully utilizing the network, and then as they move
    to the local-only CPU-heavy phase, start a few more on the network,
    and so on.
    There's no way to do that kind of slot-oriented gating now. It
    actually wouldn't be too hard at the low-level of the code, but I'm
    not sure what interface you'd use to communicate "OK, now go" to
    each process.
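To illustrate the idea of slot-oriented gating only (not how git would
do it), here is a toy sketch using threads inside a single Python
process. Everything in it is hypothetical: `PhaseGate`, `run_clones`,
and the `fetch`/`resolve` callables stand in for the network phase and
the delta-resolution phase. It sidesteps the hard part, which is the
cross-process "OK, now go" interface.

```python
import threading

class PhaseGate:
    """Allow at most `slots` workers inside one phase at a time."""

    def __init__(self, slots):
        self._sem = threading.Semaphore(slots)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0  # highest concurrency observed, for inspection

    def __enter__(self):
        self._sem.acquire()  # blocks until a slot is free
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.active -= 1
        self._sem.release()  # hand the slot to the next waiter
        return False

def run_clones(jobs, network_slots, cpu_slots, fetch, resolve):
    """Run every job through a gated network phase, then a gated CPU phase."""
    network_gate = PhaseGate(network_slots)
    cpu_gate = PhaseGate(cpu_slots)

    def worker(job):
        with network_gate:
            pack = fetch(job)   # network-bound phase
        # The network slot is released here, so another job can start
        # fetching while this one waits for (or uses) a CPU slot.
        with cpu_gate:
            resolve(pack)       # CPU-heavy, local-only phase

    threads = [threading.Thread(target=worker, args=(job,)) for job in jobs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return network_gate.peak, cpu_gate.peak
```

With separate gates, a job leaving the network phase frees its slot
right away, which is the "start a few more on the network" behavior
described above.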
> I do suspect it would be nice to have a switch though that repo could use to
> adjust this intelligently, is there some way to adjust threads from a clone, I
> don't see one? I tried using 'GIT_FORCE_THREADS=28 git clone ...' and it
> didn't seem to make a difference?
I think I led you astray earlier by mentioning GIT_FORCE_THREADS. It's
actually just a boolean for "use threads even if we're only
single-threaded". What you actually want is probably:
  git clone -c pack.threads=28 ...
(though I didn't test it to be sure).
-Peff