git.vger.kernel.org archive mirror
From: Nicolas Pitre <nico@cam.org>
To: Resul Cetin <Resul-Cetin@gmx.net>
Cc: git@vger.kernel.org, Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	gentoo-scm@gentoo.org
Subject: Re: Optimizing cloning of a high object count repository
Date: Sat, 13 Dec 2008 16:50:52 -0500 (EST)
Message-ID: <alpine.LFD.2.00.0812131636330.30035@xanadu.home>
In-Reply-To: <alpine.LFD.2.00.0812131347130.30035@xanadu.home>

On Sat, 13 Dec 2008, Nicolas Pitre wrote:

> On Sat, 13 Dec 2008, Resul Cetin wrote:
> 
> > On Saturday 13 December 2008 16:46:50 you wrote:
> > [...]
> > > >  The linux repository seems to be somewhat smaller, but it is in the
> > > > same range for object count and repository size, yet clones of it are
> > > > much faster. Is there any way to optimize server operations such as
> > > > counting and compressing objects to get the same speed as we get from
> > > > git.kernel.org (which does it in nearly no time, so the only limiting
> > > > factor seems to be my bandwidth)?
> > > >  The only other information I have is that Robin H. Johnson made a single
> > > >  ~910MiB pack for the whole repository.
> > >
> > > Make yearly packed repository snapshots and publish them via http.
> > > People can wget the latest snapshot, then pull updates later.
> > That would be a workaround, but it doesn't explain why git.kernel.org delivers
> > the torvalds repository without any notable counting and compressing time. Maybe
> > it has something to do with the config I found inside the repository:
> > http://git.overlays.gentoo.org/gitroot/exp/gentoo-x86.git/config
> > It says that it isn't a bare repository.
> 
> That's not relevant.
> 
> The counting time is a bit unfortunate (although I have plans to speed 
> that up, if only I can find the time).
> 
> You should be able to skip the compression time entirely though, if you 
> do repack the repository first.  And you want it to be as tightly packed 
> as possible for public access.  I'm currently cloning it and the 
> counting phase is not _that_ bad compared to the compression phase.  Try 
> something like 'git repack -a -f -d --window=200' and let it run 
> overnight if necessary.  You need to do this only once, and preferably 
> on a machine with lots of RAM, and preferably on a 64-bit machine.  Once 
> this is done then things should go much more smoothly afterwards.

FYI, I repacked that repository after cloning it, and that operation 
required around 2.5G of resident memory.  Given the address space 
fragmentation, it is possible that a full repack cannot be performed on 
a 32-bit machine.
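
For readers who hit this memory ceiling, a minimal sketch of two settings
that might help is shown below. Both pack.windowMemory and pack.threads are
documented git configuration variables, but the values used here are
assumptions and were not tested against this repository. They only bound the
memory used by the delta search window, so the total footprint of a full
repack can still be dominated by other data structures. The throwaway
repository exists only so the sketch runs on its own.

```shell
set -e

# Throwaway repository so the sketch is self-contained; in practice,
# run the two "git config" commands inside the real repository.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Cap the memory each pack-objects thread may use for its delta window.
# The 512m value is an illustrative assumption, not a recommendation.
git config pack.windowMemory 512m

# Each thread keeps its own window, so fewer threads means less memory
# in use at once, at the cost of a slower repack.
git config pack.threads 1

# Read the settings back to confirm they were stored.
window_memory=$(git config --get pack.windowMemory)
threads=$(git config --get pack.threads)
echo "pack.windowMemory=$window_memory pack.threads=$threads"
```

With a window as large as 500, a memory cap may cause git to shrink the
window dynamically, which can make the resulting pack somewhat larger.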

I did 'git repack -a -f -d --window=500 --depth=100'.  This took less 
than an hour on a quad core machine.  The resulting pack is 695MB in 
size.  That's the amount of data that would be transferred during a 
clone of this repository, and nothing would have to be compressed during 
the clone as everything is already fully compressed.
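
As a rough illustration of the repack described above, the sketch below runs
the same command on a small throwaway repository and then checks the result
with git count-objects. The temporary repository, the three dummy commits,
and the placeholder committer identity are assumptions made only so the
example can run anywhere; on a server, the repack and the check would be run
inside the real repository instead.

```shell
set -e

# Throwaway repository with a few commits (stand-in for the real one).
repo=$(mktemp -d)
cd "$repo"
git init -q .
for i in 1 2 3; do
    echo "revision $i" > file.txt
    git add file.txt
    git -c user.name=Example -c user.email=example@example.invalid \
        commit -q -m "commit $i"
done

# -a: put all reachable objects into a single pack
# -f: discard existing deltas and recompute them from scratch
# -d: delete packs and loose objects made redundant by the new pack
# --window / --depth: search harder for deltas, trading CPU time and
#   memory for a smaller pack
git repack -a -f -d -q --window=500 --depth=100

# "count" is the number of loose objects, "packs" the number of pack
# files, and "size-pack" the total pack size in KiB.
git count-objects -v
packs=$(git count-objects -v | sed -n 's/^packs: //p')
loose=$(git count-objects -v | sed -n 's/^count: //p')
```

After a full repack, a single pack and no loose objects are the expected
result, and size-pack gives an approximation of how much data a fresh clone
would have to transfer.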


Nicolas
