git.vger.kernel.org archive mirror
From: David Turner <dturner@twopensource.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Stephen Morton <stephen.c.morton@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: Git Scaling: What factors most affect Git performance for a large repo?
Date: Thu, 19 Feb 2015 19:42:49 -0500
Message-ID: <1424392969.30029.15.camel@leckie>
In-Reply-To: <CACsJy8Dortn4fHwF8xSgJ=KoJ9o1qzmc_UyaVq003D2sxFZEuQ@mail.gmail.com>

On Fri, 2015-02-20 at 06:38 +0700, Duy Nguyen wrote:
> >    * 'git push'?
> 
> This one is not affected by how deep your repo's history is, or how
> wide your tree is, so should be quick..
> 
> Ah the number of refs may affect both git-push and git-pull. I think
> Stefan knows better than I in this area.

I can tell you that this is a bit of a problem for us at Twitter.  We
have over 100k refs, which adds ~20MiB of downstream traffic to every
push.
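
For a rough sense of scale, the figures above can be turned into a per-ref cost. This is only a back-of-envelope sketch: it takes exactly 100,000 refs and exactly 20 MiB, although the message says "over 100k" and "~20MiB", so the true per-ref figure is somewhat lower. The helper name bytes_per_ref is hypothetical. The 46-byte overhead assumes one pkt-line per ref (4-byte length prefix, 40-character hex SHA-1, a space, and a trailing newline) and ignores the capability list on the first line.

```python
def bytes_per_ref(traffic_bytes, ref_count):
    """Average advertisement bytes per ref (hypothetical helper)."""
    return traffic_bytes / ref_count


# Assumed round numbers taken from the message: ~20 MiB, 100k refs.
TRAFFIC_BYTES = 20 * 1024 * 1024
REF_COUNT = 100_000

# Fixed per-line cost: length prefix + hex SHA-1 + space + newline.
PKT_LINE_OVERHEAD = 4 + 40 + 1 + 1

per_ref = bytes_per_ref(TRAFFIC_BYTES, REF_COUNT)
print(f"about {per_ref:.0f} bytes per ref, "
      f"of which about {per_ref - PKT_LINE_OVERHEAD:.0f} is the ref name")
```

Under those assumptions each ref costs roughly 210 bytes on the wire, most of it the ref name rather than the SHA.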

I added a hack to improve this locally inside Twitter: the client sends
a Bloom filter of the SHAs that it believes the server knows about; the
server sends only the SHA of master and any refs that are not in the
Bloom filter.  The client uses its local version of the server's refs
as if they had just been sent.  This means that some packs will be
suboptimal, because false positives in the Bloom filter cause some new
refs not to be sent.  Also, if there were a repack between the pull and
the push, some refs might have been deleted on the server; we repack
rarely enough and pull frequently enough that this is hopefully not an
issue.
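
The exchange described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual Twitter patch: the BloomFilter class, its sizing, the SHA-256 double-hashing scheme, and the function names client_build_filter, server_advertise, and client_merge are all hypothetical, and refs are modelled as plain dicts mapping ref name to hex SHA rather than anything from Git's wire protocol.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter over hex SHA strings (illustrative only)."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        n = max(1, expected_items)
        self.num_bits = max(
            8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def client_build_filter(cached_server_refs):
    """Client: summarize the SHAs it believes the server already has."""
    bloom = BloomFilter(len(cached_server_refs))
    for sha in cached_server_refs.values():
        bloom.add(sha)
    return bloom


def server_advertise(server_refs, bloom, always_send=("refs/heads/master",)):
    """Server: send master plus every ref whose SHA misses the filter."""
    return {
        name: sha
        for name, sha in server_refs.items()
        if name in always_send or sha not in bloom
    }


def client_merge(cached_server_refs, advertised):
    """Client: treat cached refs as current, overlaid with what was sent."""
    merged = dict(cached_server_refs)
    merged.update(advertised)
    return merged
```

A Bloom filter never reports a false negative, so every SHA the client put in the filter is suppressed by the server. A new server-side SHA can still collide with the filter and be withheld, which is the false-positive case that produces the suboptimal packs mentioned above. The sketch also shares the other caveat: client_merge keeps cached refs that the server may since have deleted.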

We're still testing to see if this works.  But due to the number of
assumptions it makes, it's probably not that great an idea for general
use.

There are probably more complex schemes to compute minimal (or
small-enough) packs; in particular, if the patch is just a few megs off
of master, it's better to just send the whole pack.  That doesn't work
for us because we've got a log-based replication scheme that the pack
appends to, and we don't want the log to get too big; we want
more-minimal packs than that.  But it might work for others.
