From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Stephen Morton <stephen.c.morton@gmail.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: Git Scaling: What factors most affect Git performance for a large repo?
Date: Fri, 20 Feb 2015 13:09:06 +0100
Message-ID: <CACBZZX4T38j9YU3eiHTfkDoZKsgyJFrnJQNm5WBmb9RDenDOBg@mail.gmail.com>
In-Reply-To: <CACsJy8DkS65axQNY70FrfqR5s-49oOn8j7SAE9BTiRVNrm+ohQ@mail.gmail.com>
On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> Anecdotally I work on a repo at work (where I'm mostly "the Git guy") that's:
>>
>> * Around 500k commits
>> * Around 100k tags
>> * Around 5k branches
>> * Around 500 commits/day, almost entirely to the same branch
>> * 1.5 GB .git checkout.
>> * Mostly text source, but some binaries (we're trying to cut down[1] on those)
>
> Would be nice if you could make an anonymized version of this repo
> public. Working on a "real" large repo is better than an artificial
> one.
Yeah, I'll try to do that.
>> But actually most of "git fetch" is spent in the reachability check
>> subsequently done by "git-rev-list" which takes several seconds. I
>
> I wonder if reachability bitmap could help here..
I could have sworn I had that enabled already but evidently not. I did
test it and it cut down on clone times a bit. Now our daily repacking
is:
git --git-dir={} gc &&
git --git-dir={} pack-refs --all --prune &&
git --git-dir={} repack -Ad --window=250 --depth=100 \
    --write-bitmap-index --pack-kept-objects
It's not clear to me from the documentation whether this should be
enabled only on the server, or on the clients too. In any case I've
enabled it on both.
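
As a sketch of the same setting expressed as repository configuration
rather than repack flags: this assumes a Git version that knows
repack.writeBitmaps (older releases spelled it pack.writebitmaps). The
values are illustrative, not taken from the setup described above.

```ini
# Hypothetical excerpt of a repository's config file.
[repack]
	# Write a bitmap index whenever "git repack -a"/"-A" packs everything.
	writeBitmaps = true
[pack]
	# Let pack-objects use an existing bitmap index when packing to stdout.
	useBitmaps = true
```

The git-config documentation describes pack.useBitmaps as applying when
packing to stdout (for example on the serving side of a fetch), which
suggests the server is where the bitmap matters most for fetches.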
Even with it enabled on both, a "git pull" that pulls down just one
commit on one branch takes 13s. A trace is included at the end of this
mail.
>> haven't looked into it but there's got to be room for optimization
>> there, surely it only has to do reachability checks for new refs, or
>> could run in some "I trust this remote not to send me corrupt data"
>> mode (which would make sense within a company where you can
>> trust your main Git box).
>
> No, it's not just about trusting the server side, it's about catching
> data corruption on the wire as well. We have a trick to avoid
> reachability check in clone case, which is much more expensive than a
> fetch. Maybe we could do something further to help the fetch case _if_
> reachability bitmaps don't help.
Still, if that's indeed a big bottleneck what's the worst-case
scenario here? That the local repository gets hosed? The server will
still recursively validate the objects it gets sent, right?
I wonder if a better trade-off in that case would be to skip this in
some situations and instead put something like "git fsck" in a
cronjob.
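
A minimal sketch of such a cron job, assuming a nightly schedule and a
hypothetical clone at $HOME/src/core (neither is from the setup
described here):

```crontab
# m h dom mon dow  command
# Check object integrity and connectivity once a night at 03:00.
# --no-dangling suppresses the "dangling ..." notices so that only real
# problems produce output, which cron then mails to the crontab owner.
0 3 * * * git --git-dir="$HOME/src/core/.git" fsck --no-dangling
```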
Here's the "git pull" trace mentioned above:
$ time GIT_TRACE=1 git pull
13:06:13.603781 git.c:555 trace: exec: 'git-pull'
13:06:13.603936 run-command.c:351 trace: run_command: 'git-pull'
13:06:13.620615 git.c:349 trace: built-in: git
'rev-parse' '--git-dir'
13:06:13.631602 git.c:349 trace: built-in: git
'rev-parse' '--is-bare-repository'
13:06:13.636103 git.c:349 trace: built-in: git
'rev-parse' '--show-toplevel'
13:06:13.641491 git.c:349 trace: built-in: git 'ls-files' '-u'
13:06:13.719923 git.c:349 trace: built-in: git
'symbolic-ref' '-q' 'HEAD'
13:06:13.728085 git.c:349 trace: built-in: git 'config'
'branch.trunk.rebase'
13:06:13.738160 git.c:349 trace: built-in: git 'config' 'pull.ff'
13:06:13.743286 git.c:349 trace: built-in: git
'rev-parse' '-q' '--verify' 'HEAD'
13:06:13.972091 git.c:349 trace: built-in: git
'rev-parse' '--verify' 'HEAD'
13:06:14.149420 git.c:349 trace: built-in: git
'update-index' '-q' '--ignore-submodules' '--refresh'
13:06:14.294098 git.c:349 trace: built-in: git
'diff-files' '--quiet' '--ignore-submodules'
13:06:14.467711 git.c:349 trace: built-in: git
'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
13:06:14.683419 git.c:349 trace: built-in: git
'rev-parse' '-q' '--git-dir'
13:06:15.189707 git.c:349 trace: built-in: git
'rev-parse' '-q' '--verify' 'HEAD'
13:06:15.335948 git.c:349 trace: built-in: git 'fetch'
'--update-head-ok'
13:06:15.691303 run-command.c:351 trace: run_command: 'ssh'
'git.example.com' 'git-upload-pack '\''/gitrepos/core.git'\'''
13:06:17.095662 run-command.c:351 trace: run_command: 'rev-list'
'--objects' '--stdin' '--not' '--all' '--quiet'
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (6/6), done.
13:06:20.426346 run-command.c:351 trace: run_command:
'unpack-objects' '--pack_header=2,6'
13:06:20.431806 exec_cmd.c:130 trace: exec: 'git'
'unpack-objects' '--pack_header=2,6'
13:06:20.437343 git.c:349 trace: built-in: git
'unpack-objects' '--pack_header=2,6'
remote: Total 6 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (6/6), done.
13:06:20.444196 run-command.c:351 trace: run_command: 'rev-list'
'--objects' '--stdin' '--not' '--all'
13:06:20.447135 exec_cmd.c:130 trace: exec: 'git' 'rev-list'
'--objects' '--stdin' '--not' '--all'
13:06:20.451283 git.c:349 trace: built-in: git
'rev-list' '--objects' '--stdin' '--not' '--all'
From ssh://git.example.com/gitrepos/core
02d33d2..41e72c4 core -> origin/core
13:06:22.559609 run-command.c:351 trace: run_command: 'gc' '--auto'
13:06:22.562176 exec_cmd.c:130 trace: exec: 'git' 'gc' '--auto'
13:06:22.565661 git.c:349 trace: built-in: git 'gc' '--auto'
13:06:22.594980 git.c:349 trace: built-in: git
'rev-parse' '-q' '--verify' 'HEAD'
13:06:22.845728 git.c:349 trace: built-in: git
'show-branch' '--merge-base' 'refs/heads/core'
'41e72c42addc5075e8009a3eebe914fa0ce98b27'
'02d33d2be7f8601c3502fdd89b0946447d7cdf15'
13:06:23.087586 git.c:349 trace: built-in: git 'fmt-merge-msg'
13:06:23.341451 git.c:349 trace: built-in: git
'rev-parse' '--parseopt' '--stuck-long' '--' '--onto'
'41e72c42addc5075e8009a3eebe914fa0ce98b27'
'41e72c42addc5075e8009a3eebe914fa0ce98b27'
13:06:23.350513 git.c:349 trace: built-in: git
'rev-parse' '--git-dir'
13:06:23.362011 git.c:349 trace: built-in: git
'rev-parse' '--is-bare-repository'
13:06:23.365282 git.c:349 trace: built-in: git
'rev-parse' '--show-toplevel'
13:06:23.372589 git.c:349 trace: built-in: git 'config'
'--bool' 'rebase.stat'
13:06:23.377056 git.c:349 trace: built-in: git 'config'
'--bool' 'rebase.autostash'
13:06:23.382102 git.c:349 trace: built-in: git 'config'
'--bool' 'rebase.autosquash'
13:06:23.389458 git.c:349 trace: built-in: git
'rev-parse' '--verify' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
13:06:23.608894 git.c:349 trace: built-in: git
'rev-parse' '--verify' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
13:06:23.894026 git.c:349 trace: built-in: git
'symbolic-ref' '-q' 'HEAD'
13:06:23.898918 git.c:349 trace: built-in: git
'rev-parse' '--verify' 'HEAD'
13:06:24.102269 git.c:349 trace: built-in: git
'rev-parse' '--verify' 'HEAD'
13:06:24.338636 git.c:349 trace: built-in: git
'update-index' '-q' '--ignore-submodules' '--refresh'
13:06:24.539912 git.c:349 trace: built-in: git
'diff-files' '--quiet' '--ignore-submodules'
13:06:24.729362 git.c:349 trace: built-in: git
'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
13:06:24.938533 git.c:349 trace: built-in: git
'merge-base' '41e72c42addc5075e8009a3eebe914fa0ce98b27'
'02d33d2be7f8601c3502fdd89b0946447d7cdf15'
13:06:25.197791 git.c:349 trace: built-in: git 'diff'
'--stat' '--summary' '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
'41e72c42addc5075e8009a3eebe914fa0ce98b27'
[details on updated files]
13:06:25.488275 git.c:349 trace: built-in: git
'checkout' '-q' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
13:06:26.467413 git.c:349 trace: built-in: git
'update-ref' 'ORIG_HEAD' '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
Fast-forwarded trunk to 41e72c42addc5075e8009a3eebe914fa0ce98b27.
13:06:26.716256 git.c:349 trace: built-in: git 'rev-parse' 'HEAD'
13:06:26.958595 git.c:349 trace: built-in: git
'update-ref' '-m' 'rebase finished: refs/heads/core onto
41e72c42addc5075e8009a3eebe914fa0ce98b27' 'refs/heads/core'
'41e72c42addc5075e8009a3eebe914fa0ce98b27'
'02d33d2be7f8601c3502fdd89b0946447d7cdf15'
13:06:27.205320 git.c:349 trace: built-in: git
'symbolic-ref' '-m' 'rebase finished: returning to refs/heads/core'
'HEAD' 'refs/heads/core'
13:06:27.208748 git.c:349 trace: built-in: git 'gc' '--auto'
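
To see where the time goes in a trace like the one above, the gaps
between consecutive timestamps can be computed. The following is a
minimal sketch, not part of Git: trace_gaps is a hypothetical helper
name, and it assumes the "HH:MM:SS.micro file:line trace: ..." layout
shown above and a run that does not cross midnight.

```shell
#!/bin/sh
# trace_gaps: read GIT_TRACE output on stdin and print, for every
# timestamped line, the seconds elapsed until the next timestamped line,
# followed by a tab and the line itself (minus its timestamp).
# Lines without a leading timestamp (progress output, wrapped
# continuation lines) are ignored. The final step is not printed because
# nothing follows it to measure against.
trace_gaps() {
    awk '
    /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+ / {
        split($1, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
        if (have_prev)
            printf "%.3f\t%s\n", now - prev, label
        prev = now
        have_prev = 1
        label = $0
        sub(/^[^ ]+ +/, "", label)             # drop the timestamp field
    }'
}
```

GIT_TRACE=1 writes to stderr, so one way to use it would be
"GIT_TRACE=1 git pull 2>&1 | trace_gaps | sort -rn | head". In the trace
above, the two largest gaps (roughly 3.3s and 2.1s) follow the two
rev-list invocations.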