From: Petr Baudis <pasky@ucw.cz>
To: Sylvain Beucler <beuc@gnu.org>
Cc: savannah-hackers-public@gnu.org, git@vger.kernel.org
Subject: Re: Git hosting techniques
Date: Sat, 4 Nov 2006 13:08:45 +0100
Message-ID: <20061104120845.GA18879@pasky.or.cz>
In-Reply-To: <20061029175446.GE12285@localhost.localdomain>
Hi,
cc'ing git@vger.kernel.org since this might be interesting for other
Git people as well.
On Sun, Oct 29, 2006 at 06:54:46PM CET, Sylvain Beucler wrote:
> We're currently setting up something similar at
> http://cvs.sv.gnu.org/gitweb/,
That's great!
> I would like to know if you considered the ability to autopack
> repositories to optimize space and disk i/o. For example, we're
> experimenting with the coreutils repository which weighs 1.1GB. Since
> you mirror the glibc repository, maybe you have similar issues?
Currently I do it in a rather silly way: when I do an "all-repo check"
every hour (which updates mirrors of external repositories etc.), I also
check for unpacked objects and, if there are any, repack the repository;
see
http://repo.or.cz/w/repo.git?a=blob;f=updatecheck.sh;hb=HEAD
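A minimal Python sketch of that hourly check-then-repack step, not the actual updatecheck.sh (which is a shell script). It assumes `git` is on PATH, and the helper names are hypothetical. It reads the loose-object `count` from `git count-objects -v` and, if it is non-zero, runs a full `git repack -a -d`, i.e. exactly the behaviour criticised below.

```python
import subprocess
import sys


def parse_count_objects(output: str) -> dict:
    """Parse `git count-objects -v` output ("key: value" lines) into ints."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value.strip())
    return stats


def has_loose_objects(output: str) -> bool:
    """True if the repository reports any loose (unpacked) objects."""
    return parse_count_objects(output).get("count", 0) > 0


def check_and_repack(repo_path: str) -> bool:
    """Repack repo_path if it has loose objects; return True if it repacked."""
    out = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    ).stdout
    if not has_loose_objects(out):
        return False
    # Full repack: write all objects into one new pack (-a), then remove the
    # now-redundant old packs and loose objects (-d).
    subprocess.run(["git", "repack", "-a", "-d"], cwd=repo_path, check=True)
    return True


if __name__ == "__main__":
    for repo in sys.argv[1:]:
        print(repo, "repacked" if check_and_repack(repo) else "clean")
```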
This is not optimal behaviour, for two reasons:
(i) A full repack can be a lot of work on large repositories, so we
shouldn't *always* repack; more importantly, we should only rarely do
a full repack (see below).
(ii) This is very unfriendly to those who fetch over HTTP, because
after you do a full repack, they will need to download the whole new
packfile instead of just the missing objects.
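To make point (ii) concrete, here is a deliberately simplified model of a dumb HTTP fetch, assuming the client must download any whole remote pack that contains at least one object it lacks (it ignores loose objects on the server). The function name, pack names, object names and sizes are all made up for illustration.

```python
from typing import Dict, Set


def download_cost(remote_packs: Dict[str, Set[str]],
                  pack_sizes: Dict[str, int],
                  missing: Set[str]) -> int:
    """Simplified dumb-HTTP model: the client downloads every whole remote
    pack that contains at least one object it is missing."""
    needed = {name for name, objects in remote_packs.items() if objects & missing}
    return sum(pack_sizes[name] for name in needed)


# Hypothetical repository: objects a-e are old history, f and g are new.
old, new = {"a", "b", "c", "d", "e"}, {"f", "g"}
sizes = {"pack-old": 100, "pack-new": 2, "pack-full": 101}  # made-up units

# Incremental packing: the new objects live in their own small pack.
print(download_cost({"pack-old": old, "pack-new": new}, sizes, new))  # 2
# After a full repack, everything lives in one new pack.
print(download_cost({"pack-full": old | new}, sizes, new))  # 101
```

A client that already has objects a-e still pays for the whole 101-unit pack after the full repack, because the two new objects now live only inside it.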
The best solution would be to have a more intelligent repacking
strategy, where you have "archival" packs with very old history and a
"current" pack with just the new changes; when you pack the loose
objects, they just get appended to the current pack. Alternatively,
a slightly more complicated but even more flexible "logarithmic"
repacking strategy could be implemented, see
http://news.gmane.org/find-root.php?message_id=<20051112135947.GC30496@pasky.or.cz>
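The linked message has the details of the logarithmic scheme. The sketch below is only one possible geometric policy, an assumption rather than a restatement of that proposal. Packs are represented by their sizes alone (hypothetical units), and `plan_rollup` is a hypothetical helper that decides which small, recent packs to merge so that large "archival" packs are left alone.

```python
from typing import List, Tuple


def plan_rollup(pack_sizes: List[int], factor: int = 2) -> Tuple[List[int], List[int]]:
    """Split packs into (merge, keep) so that, once the `merge` group is
    combined into one pack, every kept pack is at least `factor` times the
    total size of all packs smaller than it."""
    sizes = sorted(pack_sizes)  # smallest (typically newest) first
    prefix = 0                  # running sum of sizes[:j]
    last_violation = -1
    for j, size in enumerate(sizes):
        # A pack not much bigger than everything below it breaks the progression.
        if size < factor * prefix:
            last_violation = j
        prefix += size
    k = last_violation + 1      # merge sizes[:k], keep the rest
    if k < 2:                   # merging zero or one pack changes nothing
        return [], sizes
    return sizes[:k], sizes[k:]


print(plan_rollup([100, 1, 1, 1]))  # ([1, 1, 1], [100])
```

Here the three small packs are merged into one while the 100-unit archival pack stays untouched, so HTTP clients that already have it need not fetch it again; only when recent packs grow close to an older pack's size does that older pack get rewritten too.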
Even with the dumb packing strategy, though, I think it pays off if you
have at least a bit of CPU power to spare. The packing savings are
really immense. For example, with the glibc repository, an incremental
CVS import worth a few days of changes _doubled_ the size of the
repository (from 100M to 200M), while repacking brought it back to the
original size (100M) + epsilon.
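To check such savings on another repository, one option is to compare `git count-objects -v` before and after a full repack: its `size` (loose objects) and `size-pack` (packs) fields are reported in KiB. This is a sketch with hypothetical helper names, assuming `git` is on PATH.

```python
import subprocess


def disk_usage_kib(count_objects_output: str) -> int:
    """Total object storage in KiB from `git count-objects -v` output:
    loose objects ("size") plus packs ("size-pack")."""
    stats = {}
    for line in count_objects_output.splitlines():
        key, _, value = line.partition(":")
        stats[key.strip()] = value.strip()
    return int(stats.get("size", 0)) + int(stats.get("size-pack", 0))


def measure_repack(repo_path: str) -> tuple:
    """Return (before_kib, after_kib) around a full `git repack -a -d`."""
    def usage() -> int:
        out = subprocess.run(["git", "count-objects", "-v"], cwd=repo_path,
                             capture_output=True, text=True, check=True).stdout
        return disk_usage_kib(out)

    before = usage()
    subprocess.run(["git", "repack", "-a", "-d"], cwd=repo_path, check=True)
    return before, usage()
```

For example, `before, after = measure_repack("/path/to/repo.git")` (path hypothetical) gives the two totals to compare.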
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1