git.vger.kernel.org archive mirror
From: Michael Haggerty <mhagger@alum.mit.edu>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Duy Nguyen" <pclouds@gmail.com>
Cc: Stephen Morton <stephen.c.morton@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Git Scaling: What factors most affect Git performance for a large repo?
Date: Tue, 24 Feb 2015 13:44:49 +0100
Message-ID: <54EC7241.7000500@alum.mit.edu>
In-Reply-To: <CACBZZX45eCo6YS4EpHvMQjN32+-w5BztfoLiwh_rJTs7FydgoQ@mail.gmail.com>

On 02/20/2015 03:25 PM, Ævar Arnfjörð Bjarmason wrote:
> On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen <pclouds@gmail.com> wrote:
>>> On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
>>> <avarab@gmail.com> wrote:
>>>> Anecdotally I work on a repo at work (where I'm mostly "the Git guy") that's:
>>>>
>>>>  * Around 500k commits
>>>>  * Around 100k tags
>>>>  * Around 5k branches
>>>>  * Around 500 commits/day, almost entirely to the same branch
>>>>  * 1.5 GB .git checkout.
>>>>  * Mostly text source, but some binaries (we're trying to cut down[1] on those)
>>>
>>> Would be nice if you could make an anonymized version of this repo
>>> public. Working on a "real" large repo is better than an artificial
>>> one.
>>
>> Yeah, I'll try to do that.
> 
> tl;dr: After some more testing it turns out the performance issues we
> have are almost entirely due to the number of refs. Some of these I
> knew about and were obvious (e.g. git pull), but some aren't so
> obvious (why does "git log" without "--all" slow down as a function of
> the overall number of refs?).

I'm assuming that you pack your references periodically. (If not, you
should, because reading lots of loose references is very expensive for
the commands that need to iterate over all references!)
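For reference, here is a minimal sketch of packing loose references with "git pack-refs --all". It assumes a POSIX shell and builds a throwaway repository with 100 lightweight tags. The repository, tag names, and count are illustrative only. It counts the loose files under .git/refs/tags/ before and after packing, which assumes the default "files" ref storage.

```shell
#!/bin/sh
# Sketch: pack loose references into a single .git/packed-refs file.
# Illustrative throwaway repository with 100 lightweight tags.
set -e
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false -c tag.gpgsign=false "$@"; }
g init -q
g commit -q --allow-empty -m init

# Each "git tag" writes one loose ref file under .git/refs/tags/
i=0
while [ "$i" -lt 100 ]; do
  g tag "t$i"
  i=$((i + 1))
done
before=$(find "$repo/.git/refs/tags" -type f | wc -l | tr -d ' ')

# Move all refs into .git/packed-refs and prune the loose files
g pack-refs --all
after=$(find "$repo/.git/refs/tags" -type f | wc -l | tr -d ' ')

echo "loose tag files before: $before, after: $after"
```

"git gc" also runs pack-refs by default, so a periodic gc may already take care of this.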

On the other hand, packed refs also have a downside, namely that
whenever even a single packed reference has to be read, the whole
packed-refs file has to be read and parsed. One way that this can bite
you, even with innocuous-seeming commands, is if you haven't disabled
the use of replace references (e.g., by running "git --no-replace-objects
<CMD>" or by setting GIT_NO_REPLACE_OBJECTS). In that case, almost any Git command
has to read the "refs/replace/*" namespace, which, in turn, forces the
whole packed-refs file to be read and parsed. This can take a
significant amount of time if you have a very large number of references.
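A small sketch makes this lookup visible. It assumes a POSIX shell, a throwaway repository, and illustrative commit messages. The script creates a replace ref for one commit and then reads that commit three ways. By default the lookup is redirected through refs/replace/, so the replacement's message comes back. With "git --no-replace-objects" or GIT_NO_REPLACE_OBJECTS set, the original object is read instead. This shows only the switch itself. It does not show the timing difference, which depends on how large packed-refs is.

```shell
#!/bin/sh
# Sketch: replace refs are consulted by default; "git --no-replace-objects"
# or GIT_NO_REPLACE_OBJECTS turns that off. Throwaway repository.
set -e
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        -c commit.gpgsign=false "$@"; }
g init -q
g commit -q --allow-empty -m original
orig=$(g rev-parse HEAD)

# An unrelated root commit (same tree) to act as the replacement
repl=$(g commit-tree -m replacement "$orig^{tree}")

# Writes refs/replace/<orig>, pointing at <repl>
g replace "$orig" "$repl"

# Default: reading <orig> is redirected through refs/replace/
with=$(g log -1 --format=%s "$orig")
# Disabled for a single command via the global option
no_opt=$(g --no-replace-objects log -1 --format=%s "$orig")
# Disabled via the environment variable
no_env=$(GIT_NO_REPLACE_OBJECTS=1 git -C "$repo" log -1 --format=%s "$orig")

echo "default: $with"
echo "--no-replace-objects: $no_opt"
echo "GIT_NO_REPLACE_OBJECTS: $no_env"
```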

So try your experiments with replace references disabled. If that helps,
consider disabling them on your server if you don't need them.
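Before you disable replace references, it can help to check whether a repository has any. If it has none, disabling them cannot change any command's results. The sketch below assumes a POSIX shell. `replace_ref_count` is a hypothetical helper name. Exporting the variable in the server's environment is only one possible way to apply the setting, not a prescribed setup.

```shell
#!/bin/sh
set -e
# Hypothetical helper: count refs under refs/replace/ in repository $1
replace_ref_count() {
  git -C "$1" for-each-ref refs/replace/ | wc -l | tr -d ' '
}

# Demonstration on a throwaway repository (no replace refs exist)
repo=$(mktemp -d)
git -C "$repo" init -q
if [ "$(replace_ref_count "$repo")" -eq 0 ]; then
  # No replacements in use, so disabling them changes no results
  export GIT_NO_REPLACE_OBJECTS=1
fi
echo "GIT_NO_REPLACE_OBJECTS=${GIT_NO_REPLACE_OBJECTS:-unset}"
```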

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu


Thread overview: 33+ messages
2015-02-19 21:26 Git Scaling: What factors most affect Git performance for a large repo? Stephen Morton
2015-02-19 22:21 ` Stefan Beller
2015-02-19 23:06   ` Stephen Morton
2015-02-19 23:15     ` Stefan Beller
2015-02-19 23:29 ` Ævar Arnfjörð Bjarmason
2015-02-20  0:04   ` Duy Nguyen
2015-02-20 12:09     ` Ævar Arnfjörð Bjarmason
2015-02-20 12:11       ` Ævar Arnfjörð Bjarmason
2015-02-20 14:25       ` Ævar Arnfjörð Bjarmason
2015-02-20 21:04         ` Junio C Hamano
2015-03-02 19:36           ` Ævar Arnfjörð Bjarmason
2015-03-02 20:15             ` Junio C Hamano
2015-02-20 22:02         ` Sebastian Schuberth
2015-02-24 12:44         ` Michael Haggerty [this message]
2015-03-02 19:42           ` Ævar Arnfjörð Bjarmason
2015-02-21  3:51       ` Duy Nguyen
2015-02-19 23:38 ` Duy Nguyen
2015-02-20  0:42   ` David Turner
2015-02-20 20:59     ` Junio C Hamano
2015-02-23 20:23       ` David Turner
2015-02-21  4:01     ` Duy Nguyen
2015-02-25 12:02       ` Duy Nguyen
2015-02-20  0:03 ` brian m. carlson
2015-02-20 16:06   ` Stephen Morton
2015-02-20 16:38     ` Matthieu Moy
2015-02-20 17:16     ` brian m. carlson
2015-02-20 22:08   ` Sebastian Schuberth
2015-02-20 22:58     ` brian m. carlson
  -- strict thread matches above, loose matches on Subject: below --
2015-02-20  6:57 Martin Fick
2015-02-20 18:29 ` David Turner
2015-02-20 20:37   ` Martin Fick
2015-02-21  0:41     ` David Turner
2015-02-20 19:27 ` Randall S. Becker
