git.vger.kernel.org archive mirror
From: Sam Vilain <sam@vilain.net>
To: Philip Oakley <philipoakley@iee.org>,
	John Fisher <fishook2033@gmail.com>,
	git-users@googlegroups.com
Cc: Git List <git@vger.kernel.org>
Subject: Re: [git-users] worlds slowest git repo- what to do?
Date: Thu, 15 May 2014 12:48:29 -0700
Message-ID: <53751A0D.2020702@vilain.net>
In-Reply-To: <06A2490FC9BC4461A39B982D3C7C85F7@PhilipOakley>

On 05/15/2014 12:06 PM, Philip Oakley wrote:
> From: "John Fisher" <fishook2033@gmail.com>
>> I assert based on one piece of evidence ( a post from a facebook dev)
>> that I now have the worlds biggest and slowest git
>> repository, and I am not a happy guy. I used to have the worlds
>> biggest CVS repository, but CVS can't handle multi-G
>> sized files. So I moved the repo to git, because we are using that
>> for our new projects.
>>
>> goal:
>> keep 150 G of files (mostly binary) from tiny sized to over 8G in a
>> version-control system.
>>
>> problem:
>> git is absurdly slow, think hours, on fast hardware.
>>
>> question:
>> any suggestions beyond these-
>> http://git-annex.branchable.com/
>> https://github.com/jedbrown/git-fat
>> https://github.com/schacon/git-media
>> http://code.google.com/p/boar/
>> subversion
>>

You could shard.  Break the problem up into smaller repositories, e.g. via
submodules.  Try ~128 shards; I'd expect 129 small clones (the shards plus
the main repository) to complete faster than a single 150G clone, as well
as being resumable, etc.
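
As a rough sanity check on those numbers, here is a small sketch. It assumes the 150G is spread evenly across shards, which will not hold with files over 8G, so treat the average as a lower bound on the biggest shard. The function names are made up for illustration.

```python
TOTAL_GB = 150   # repository size from the original post
SHARDS = 128     # shard count suggested above


def average_shard_gb(total_gb: float, shards: int) -> float:
    """Average shard size, assuming a perfectly even split (an idealization)."""
    return total_gb / shards


def clone_count(shards: int) -> int:
    """One clone per shard, plus one for the main repository that references them."""
    return shards + 1


if __name__ == "__main__":
    print(f"{average_shard_gb(TOTAL_GB, SHARDS):.2f} GB per shard on average")
    print(f"{clone_count(SHARDS)} clones in total")
```

Any shard that ends up holding one of the 8G files will be well above that average, so the shard count alone does not bound the worst-case clone.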

The first challenge will be figuring out what to shard on, and how to
lay out the repository.  You could have all of the large files in their
own directory, and then the main repository just has symlinks into the
sharded area.  In that case, I would recommend sharding by the date each
blob was introduced, so that there's a good chance you won't need to clone
everything forever: shards that hold few files used by the current
version could in theory be retired.  Or, if the directory structure
already suits it, you could use submodules "directly".
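
To make the symlink layout concrete, here is a minimal sketch of how a file could be mapped to a date-based shard. Everything specific in it is an assumption for illustration: the `shards/` directory name, one shard per calendar month, and the helper names. It only computes paths; it does not touch git or the filesystem. It also uses a stricter retirement rule than the one above, treating a shard as retirable only when the current version references no files in it.

```python
import posixpath
from datetime import date
from typing import Dict, Iterable, List, Tuple

SHARD_ROOT = "shards"  # hypothetical directory holding one submodule per shard


def shard_for(introduced: date) -> str:
    """Name a shard after the month its blobs were first committed (assumed granularity)."""
    return f"{introduced.year:04d}-{introduced.month:02d}"


def symlink_plan(path: str, introduced: date) -> Tuple[str, str]:
    """Return (link_location, link_target) for one large file.

    link_location is the file's original path in the main repository;
    link_target is a relative path from that location into the shard.
    """
    shard_path = posixpath.join(SHARD_ROOT, shard_for(introduced), path)
    link_dir = posixpath.dirname(path) or "."
    return path, posixpath.relpath(shard_path, start=link_dir)


def retirable_shards(all_shards: Iterable[str],
                     current_files: Dict[str, date]) -> List[str]:
    """Shards that no file in the current version points into (zero-file rule)."""
    live = {shard_for(d) for d in current_files.values()}
    return sorted(set(all_shards) - live)


if __name__ == "__main__":
    print(symlink_plan("assets/video/intro.mov", date(2014, 5, 15)))
    print(retirable_shards(["2013-01", "2014-05"],
                           {"assets/video/intro.mov": date(2014, 5, 15)}))
```

The relative link target keeps the symlinks valid wherever the main repository is checked out, as long as the shard submodules sit under the same top-level directory.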

The second challenge will be writing the filter-branch script for this :-)

Good luck,
Sam
