From: Martin Fick <mfick@codeaurora.org>
To: Git Mailing List <git@vger.kernel.org>
Subject: Resolving deltas dominates clone time
Date: Fri, 19 Apr 2019 15:47:22 -0600
Message-ID: <259296914.jpyqiltySj@mfick-lnx>
We have a serious performance problem with one of our large repos. The repo is
our internal version of the android platform/manifest project. After a clean
"repack -A -d -F", the repo is close to 8G in size and has over 700K refs and
over 8M objects. It takes around 40min to clone locally (same disk to same
disk) using git 1.8.2.1 on a high end machine (56 processors, 128GB RAM)! It
takes around 10mins to reach the resolving deltas phase, which then takes
most of the rest of the time.
While this is a fairly large repo, a straight cp -r of the repo takes less
than 2mins, so I would expect a clone to be on the same order of magnitude in
time. For perspective, I have a kernel/msm repo with a third of the ref count
and double the object count which takes only around 20mins to clone on the
same machine (still slower than I would like).
I mention 1.8.2.1 because we have many old machines which need this. However,
I also tested this with git v2.18 and it is actually much slower still
(~140mins).
Reading the advice on the net, people seem to think that repacking with
shorter delta-chains would help improve this. I have not had any success with
this yet.
I have been thinking about this problem, and I suspect that this compute time
is actually spent doing SHA1 calculations. Is that possible? Some basic back
of the envelope math and scripting seem to show that the repo may actually
contain about 2TB of data if you add up the size of all the objects in the
repo. Some quick research on the net seems to indicate that we might be able
to expect something around 500MB/s throughput on computing SHA1s, does that
seem reasonable? If I really have 2TB of data, should it then take around
66mins to get the SHA1s for all that data? Could my repo clone time really be
dominated by SHA1 math?
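
The back-of-the-envelope estimate above can be sketched as follows. This is a
minimal illustration, not the script used for the 2TB figure: the function
names are hypothetical, it assumes decimal units (1TB = 10^12 bytes, 1MB =
10^6 bytes), and it assumes hashing runs on a single core at a constant rate.
The size-summing helper assumes input lines in the default "<object-id> <type>
<size>" layout printed by git cat-file --batch-check.

```python
import hashlib
import time


def total_object_bytes(batch_check_lines):
    """Sum the size column of `git cat-file --batch-check` style output.

    Each line is expected to look like "<object-id> <type> <size>";
    lines in any other shape (e.g. "<object-id> missing") are skipped.
    """
    total = 0
    for line in batch_check_lines:
        fields = line.split()
        if len(fields) == 3 and fields[2].isdigit():
            total += int(fields[2])
    return total


def sha1_minutes(total_bytes, bytes_per_second):
    """Minutes needed to hash total_bytes on one core at the given rate."""
    return total_bytes / bytes_per_second / 60


def measure_sha1_throughput(buffer_mb=64):
    """Rough single-core SHA-1 rate in bytes/s, from hashing a zero buffer."""
    data = bytes(buffer_mb * 1_000_000)
    start = time.perf_counter()
    hashlib.sha1(data).digest()
    return len(data) / (time.perf_counter() - start)


# The figures from the message: 2TB of object data at 500MB/s.
print(f"{sha1_minutes(2e12, 500e6):.1f} min")
```

With the numbers quoted in the message this prints 66.7 min, which matches the
66mins estimate. On a real repository the lines could come from something like
git cat-file --batch-check --batch-all-objects on a git version that supports
that option, and measure_sha1_throughput() gives a local figure to compare
against the assumed 500MB/s.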
Any advice on how to speed up cloning this repo, or on what to pursue further
in my investigation?
Thanks,
-Martin
--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation