From: Nicolas Pitre <nico@cam.org>
To: Resul Cetin <Resul-Cetin@gmx.net>
Cc: git@vger.kernel.org, Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
gentoo-scm@gentoo.org
Subject: Re: Optimizing cloning of a high object count repository
Date: Sat, 13 Dec 2008 16:50:52 -0500 (EST)
Message-ID: <alpine.LFD.2.00.0812131636330.30035@xanadu.home>
In-Reply-To: <alpine.LFD.2.00.0812131347130.30035@xanadu.home>
On Sat, 13 Dec 2008, Nicolas Pitre wrote:
> On Sat, 13 Dec 2008, Resul Cetin wrote:
>
> > On Saturday 13 December 2008 16:46:50 you wrote:
> > [...]
> > > > The Linux repository seems to be smaller, but it is in the same range
> > > > of object count and repository size, and clones of it are much, much
> > > > faster. Is there any way to optimize server operations like counting
> > > > and compressing objects to get the same speed as we get from
> > > > git.kernel.org (which does it in nearly no time, so that the only
> > > > limiting factor seems to be my bandwidth)?
> > > > The only other information I have is that Robin H. Johnson made a single
> > > > ~910MiB pack for the whole repository.
> > >
> > > Make yearly packed repository snapshots and publish them via http.
> > > People can wget the latest snapshot, then pull updates later.
> > That would be a workaround, but it doesn't explain why git.kernel.org delivers
> > torvalds' repository without any notable counting and compressing time. Maybe
> > it has something to do with the config I found inside the repository:
> > http://git.overlays.gentoo.org/gitroot/exp/gentoo-x86.git/config
> > It says that it isn't a bare repository.
>
> That's not relevant.
>
> The counting time is a bit unfortunate (although I have plans to speed
> that up, if only I can find the time).
>
> You should be able to skip the compression time entirely though, if you
> do repack the repository first. And you want it to be as tightly packed
> as possible for public access. I'm currently cloning it and the
> counting phase is not _that_ bad compared to the compression phase. Try
> something like 'git repack -a -f -d --window=200' and let it run
> overnight if necessary. You need to do this only once, and preferably
> on a machine with lots of RAM, and preferably on a 64-bit machine. Once
> this is done then things should go much more smoothly afterwards.

FYI, I repacked that repository after cloning it, and that operation
required around 2.5G of resident memory. Given the address space
fragmentation, it is possible that a full repack cannot be performed on
a 32-bit machine.

I did 'git repack -a -f -d --window=500 --depth=100'. This took less
than an hour on a quad-core machine. The resulting pack is 695MB in
size. That's the amount of data that would be transferred during a
clone of this repository, and nothing would have to be compressed during
the clone as everything is already fully compressed.

Nicolas
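
For readers who want to try the one-time repack described in the message above, here is a minimal shell sketch. It is not part of the original message. The function names tight_repack and pack_summary are hypothetical, the defaults simply mirror the --window=500 --depth=100 values quoted above, and the sketch assumes a git new enough to support 'git -C' and a 'git count-objects -v' whose size-pack field is reported in KiB.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of git.

# One-time aggressive repack of a repository that will be served for cloning.
#   $1 = path to the repository
#   $2 = delta window size (default 500, as in the message above)
#   $3 = maximum delta chain depth (default 100, as in the message above)
tight_repack() {
    repo=$1
    window=${2:-500}
    depth=${3:-100}
    # -a: pack everything reachable into a single pack
    # -f: recompute deltas instead of reusing existing ones
    # -d: remove packs and loose objects made redundant by the new pack
    git -C "$repo" repack -a -f -d --window="$window" --depth="$depth"
}

# Summarise `git count-objects -v` output read from stdin:
# number of loose objects, number of packs, and total pack size in MiB
# (assumes size-pack is given in KiB).
pack_summary() {
    awk -F': ' '
        $1 == "count"     { loose = $2 }
        $1 == "packs"     { packs = $2 }
        $1 == "size-pack" { kib = $2 }
        END { printf "loose=%d packs=%d pack_mib=%d\n", loose, packs, kib / 1024 }
    '
}
```

A possible invocation, with /path/to/repo.git standing in for the real repository path, is 'tight_repack /path/to/repo.git' followed by 'git -C /path/to/repo.git count-objects -v | pack_summary'. After a successful full repack one would expect a single pack and no loose objects. On a memory-constrained machine, git repack also accepts --window-memory to cap the memory used by the delta window; whether that is enough for a repository of this size is not something this sketch has verified.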