Optimizing cloning of a high object count repository

* Optimizing cloning of a high object count repository
From: Resul Cetin @ 2008-12-13 15:24 UTC
  To: git; +Cc: gentoo-scm

Hi,
there are currently several proposals for moving Gentoo's CVS repository to
another SCM. Tests so far show that SVN would not improve anything (it does
even worse in most performance and size benchmarks). Another idea is to move
to Git. It looks really promising in the size benchmarks, but cloning seems
nearly impossible. The current test repository is available at
git://git.overlays.gentoo.org/exp/gentoo-x86.git; it is around 900MB in size
and has 4696137 objects. Counting the objects on the server takes ages, and
compressing them takes much longer still.
The Linux repository seems to be somewhat smaller, but it is in the same range
for object count and repository size, and yet clones of it are much, much
faster. Is there any way to optimize the server operations, such as counting
and compressing objects, to get the same speed as we get from git.kernel.org
(which does it in nearly no time, so that the only limiting factor seems to be
my bandwidth)?
The only other information I have is that Robin H. Johnson made a single
~910MiB pack for the whole repository.

Thx in advance,
	Resul

* Re: Optimizing cloning of a high object count repository
From: Nguyen Thai Ngoc Duy @ 2008-12-13 15:46 UTC
  To: Resul Cetin; +Cc: git, gentoo-scm

On 12/13/08, Resul Cetin <Resul-Cetin@gmx.net> wrote:
> Hi,
>  there are currently different ideas to move gentoo's cvs repository to an
>  other scm. Current tests showed that svn will not make anything better (it
>  gets in most performance and size based benchmarks even worse). Another idea is
>  to move to git. It looks really promising in size based benchmarks but cloning
>  seems nearly impossible. The current test repository is available at
>  git://git.overlays.gentoo.org/exp/gentoo-x86.git and is around 900MB in size
>  and has 4696137 objects. It really takes ages to do the counting of the
>  objects on the server and compressing takes much longer.
>  The size of the linux repository seems to be smaller but in the same range
>  object count and repository size but clones are much much faster. Is there any
>  way to optimize the server operations like counting and compressing of objects
>  to get the same speed as we get from git.kernel.org (which does it in nearly
>  no time and the only limiting factor seems to be my bandwidth)?
>  The only other information I have is that Robin H. Johnson made a single
>  ~910MiB pack for the whole repository.

Make yearly packed repository snapshots and publish them via http.
People can wget the latest snapshot, then pull updates later.
-- 
Duy
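
A minimal sketch of the snapshot idea above, assuming the snapshot is shipped
as a git bundle (the thread does not name a format, so this is only one
possible choice). The scratch repositories, file names, and commit identity
are placeholders. On a real setup the bundle would be published over HTTP and
fetched with wget before the clone step.

```shell
#!/bin/sh
# Sketch only: "git bundle" as the snapshot format is an assumption.
set -e
work=$(mktemp -d)

# Stand-in for the server-side repository (hypothetical content).
git init -q "$work/server"
cd "$work/server"
echo "first" > file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -m "first commit"

# Server side: write every ref and its history into one snapshot file.
git bundle create "$work/snapshot.bundle" --all

# Client side: once the file has been downloaded, clone from it locally.
git clone -q "$work/snapshot.bundle" "$work/client"

# Point the clone at the live repository so later pulls fetch only new objects.
git -C "$work/client" remote set-url origin "$work/server"

# A commit made on the server after the snapshot was taken.
echo "second" >> file.txt
git -c user.name=Example -c user.email=example@example.invalid \
    commit -q -a -m "second commit"

# The pull transfers just the objects missing from the snapshot.
git -C "$work/client" pull -q
server_head=$(git -C "$work/server" rev-parse HEAD)
client_head=$(git -C "$work/client" rev-parse HEAD)
echo "server: $server_head"
echo "client: $client_head"
```

With this layout the expensive full transfer is a plain file download, and the
git server only has to pack the objects created since the snapshot.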

* Re: Optimizing cloning of a high object count repository
From: Resul Cetin @ 2008-12-13 16:14 UTC
  To: git; +Cc: Nguyen Thai Ngoc Duy, gentoo-scm

On Saturday 13 December 2008 16:46:50 you wrote:
[...]
> >  The size of the linux repository seems to be smaller but in the same
> > range object count and repository size but clones are much much faster.
> > Is there any way to optimize the server operations like counting and
> > compressing of objects to get the same speed as we get from
> > git.kernel.org (which does it in nearly no time and the only limiting
> > factor seems to be my bandwidth)?
> >  The only other information I have is that Robin H. Johnson made a single
> >  ~910MiB pack for the whole repository.
>
> Make yearly packed repository snapshots and publish them via http.
> People can wget the latest snapshot, then pull updates later.
That would be a workaround, but it doesn't explain why git.kernel.org delivers
Torvalds' repository without any noticeable counting and compressing time. Maybe
it has something to do with the config I found inside the repository:
http://git.overlays.gentoo.org/gitroot/exp/gentoo-x86.git/config
It says that it isn't a bare repository.
Before I forget: I was wrong that it is a single 910MB file. Somebody seems to
have repacked it into 7 separate packs.

Regards,
	Resul

* Re: Optimizing cloning of a high object count repository
From: Jean-Luc Herren @ 2008-12-13 16:44 UTC
  To: git; +Cc: Resul Cetin, Nguyen Thai Ngoc Duy, gentoo-scm

Resul Cetin wrote:
> That would be a workaround but it doesn't explain why git.kernel.org delivers
> torvalds repository without any notable counting and compressing time.

If I remember right, git.kernel.org is a quite beefy machine.  But
then again it has a lot more traffic too.  It might be interesting
to know what machine you're on, compared to git.kernel.org.

jlh

* Re: Optimizing cloning of a high object count repository
From: Resul Cetin @ 2008-12-13 18:20 UTC
  To: git; +Cc: Jean-Luc Herren, Nguyen Thai Ngoc Duy, gentoo-scm

On Saturday 13 December 2008 17:44:07 you wrote:
> Resul Cetin wrote:
> > That would be a workaround but it doesn't explain why git.kernel.org
> > delivers torvalds repository without any notable counting and
> > compressing time.
>
> If I remember right, git.kernel.org is a quite beefy machine.  But
> then again it has a lot more traffic too.  It might be interesting
> to know what machine you're on, compared to git.kernel.org.
I don't know what type of machine git.overlays.g.o is, but my Athlon64 3500+
with 4GB RAM has exactly the same problem without any other load. I made a
clone over HTTP and have made no other changes to the repository so far.

http://git.overlays.gentoo.org/gitroot/exp/gentoo-x86.git/ is the HTTP clone
URL.

I will try a few things to reduce the time spent before anything is sent. If
anyone has ideas on how to do that, I would be glad to hear them.

Regards,
	Resul

* Re: Optimizing cloning of a high object count repository
From: Nicolas Pitre @ 2008-12-13 18:56 UTC
  To: Resul Cetin; +Cc: git, Nguyen Thai Ngoc Duy, gentoo-scm

On Sat, 13 Dec 2008, Resul Cetin wrote:

> On Saturday 13 December 2008 16:46:50 you wrote:
> [...]
> > >  The size of the linux repository seems to be smaller but in the same
> > > range object count and repository size but clones are much much faster.
> > > Is there any way to optimize the server operations like counting and
> > > compressing of objects to get the same speed as we get from
> > > git.kernel.org (which does it in nearly no time and the only limiting
> > > factor seems to be my bandwidth)?
> > >  The only other information I have is that Robin H. Johnson made a single
> > >  ~910MiB pack for the whole repository.
> >
> > Make yearly packed repository snapshots and publish them via http.
> > People can wget the latest snapshot, then pull updates later.
> That would be a workaround but it doesn't explain why git.kernel.org delivers
> torvalds repository without any notable counting and compressing time. Maybe
> it has something to do with the config I found inside the repository:
> http://git.overlays.gentoo.org/gitroot/exp/gentoo-x86.git/config
> It says that it isn't a bare repository.

That's not relevant.

The counting time is a bit unfortunate (although I have plans to speed 
that up, if only I can find the time).

You should be able to skip the compression time entirely though, if you 
do repack the repository first.  And you want it to be as tightly packed 
as possible for public access.  I'm currently cloning it and the 
counting phase is not _that_ bad compared to the compression phase.  Try 
something like 'git repack -a -f -d --window=200' and let it run 
overnight if necessary.  You need to do this only once, and preferably 
on a machine with lots of RAM, and preferably on a 64-bit machine.  Once 
this is done then things should go much more smoothly afterwards.

Nicolas
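
A small sketch of the repack suggested above, run against a throwaway
repository so that it is safe to try. The scratch repository, its five
commits, and the commit identity are made up for illustration. Only the
'git repack -a -f -d --window=200' invocation comes from the message itself,
with -q added here to silence progress output.

```shell
#!/bin/sh
# Sketch only: the repository below is a hypothetical stand-in.
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# A few small commits, which git first stores as loose objects.
for i in 1 2 3 4 5; do
    echo "revision $i" >> file.txt
    git add file.txt
    git -c user.name=Example -c user.email=example@example.invalid \
        commit -q -m "commit $i"
done

# "count" is the number of loose objects, "packs" the number of pack files.
git count-objects -v

# -a: put all objects into a single pack
# -f: recompute deltas instead of reusing existing ones
# -d: delete packs and loose objects made redundant by the new pack
git repack -a -f -d -q --window=200

git count-objects -v
packs=$(git count-objects -v | sed -n 's/^packs: //p')
loose=$(git count-objects -v | sed -n 's/^count: //p')
echo "packs=$packs loose=$loose"
```

On a repository the size of gentoo-x86 the repack step is the part that may
need to run overnight. The checks before and after it are cheap.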

* Re: Optimizing cloning of a high object count repository
From: Nicolas Pitre @ 2008-12-13 21:50 UTC
  To: Resul Cetin; +Cc: git, Nguyen Thai Ngoc Duy, gentoo-scm

On Sat, 13 Dec 2008, Nicolas Pitre wrote:

> On Sat, 13 Dec 2008, Resul Cetin wrote:
> 
> > On Saturday 13 December 2008 16:46:50 you wrote:
> > [...]
> > > >  The size of the linux repository seems to be smaller but in the same
> > > > range object count and repository size but clones are much much faster.
> > > > Is there any way to optimize the server operations like counting and
> > > > compressing of objects to get the same speed as we get from
> > > > git.kernel.org (which does it in nearly no time and the only limiting
> > > > factor seems to be my bandwidth)?
> > > >  The only other information I have is that Robin H. Johnson made a single
> > > >  ~910MiB pack for the whole repository.
> > >
> > > Make yearly packed repository snapshots and publish them via http.
> > > People can wget the latest snapshot, then pull updates later.
> > That would be a workaround but it doesn't explain why git.kernel.org delivers
> > torvalds repository without any notable counting and compressing time. Maybe
> > it has something to do with the config I found inside the repository:
> > http://git.overlays.gentoo.org/gitroot/exp/gentoo-x86.git/config
> > It says that it isn't a bare repository.
> 
> That's not relevant.
> 
> The counting time is a bit unfortunate (although I have plans to speed 
> that up, if only I can find the time).
> 
> You should be able to skip the compression time entirely though, if you 
> do repack the repository first.  And you want it to be as tightly packed 
> as possible for public access.  I'm currently cloning it and the 
> counting phase is not _that_ bad compared to the compression phase.  Try 
> something like 'git repack -a -f -d --window=200' and let it run 
> overnight if necessary.  You need to do this only once, and preferably 
> on a machine with lots of RAM, and preferably on a 64-bit machine.  Once 
> this is done then things should go much more smoothly afterwards.

FYI, I repacked that repository after cloning it, and that operation 
required around 2.5G of resident memory.  Given the address space 
fragmentation, it is possible that a full repack cannot be performed on 
a 32-bit machine.

I did 'git repack -a -f -d --window=500 --depth=100'.  This took less 
than an hour on a quad core machine.  The resulting pack is 695MB in 
size.  That's the amount of data that would be transferred during a 
clone of this repository, and nothing would have to be compressed during 
the clone as everything is already fully compressed.


Nicolas
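
A rough sketch of how the result of such a repack can be inspected, again on a
throwaway repository. The scratch directories, commits, and commit identity
are placeholders. The window and depth values are the ones quoted above, and
the file:// clone is only a local stand-in for a clone over the network.

```shell
#!/bin/sh
# Sketch only: a tiny hypothetical repository stands in for gentoo-x86.
set -e
work=$(mktemp -d)
git init -q "$work/server"
cd "$work/server"
for i in 1 2 3; do
    echo "revision $i" >> file.txt
    git add file.txt
    git -c user.name=Example -c user.email=example@example.invalid \
        commit -q -m "commit $i"
done

# Full repack with the window and depth from the message above.
git repack -a -f -d -q --window=500 --depth=100

# size-pack is reported in KiB; it approximates what a full clone transfers.
git count-objects -v | grep -E '^(packs|size-pack):'

# file:// makes git use the normal pack transfer instead of a local copy.
git clone -q "file://$work/server" "$work/client"
server_head=$(git rev-parse HEAD)
client_head=$(git -C "$work/client" rev-parse HEAD)
echo "server: $server_head"
echo "client: $client_head"
```

If memory is tight, 'git repack' also accepts --window-memory=<n> to cap the
memory used by the delta window, probably at some cost in pack size.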
