From: Andreas Ericsson <ae@op5.se>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Andy Parkins <andyparkins@gmail.com>, git@vger.kernel.org
Subject: Re: git-fetching from a big repository is slow
Date: Thu, 14 Dec 2006 16:06:05 +0100
Message-ID: <4581685D.1070407@op5.se>
In-Reply-To: <Pine.LNX.4.63.0612141513130.3635@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin wrote:
> Hi,
>
> On Thu, 14 Dec 2006, Andreas Ericsson wrote:
>
>> Andy Parkins wrote:
>>> Hello,
>>>
>>> I've got a big repository. I've got two computers. One has the repository
>>> up-to-date (164M after repack); one is behind (30M ish).
>>>
>>> I used git-fetch to try and update; and the sync took HOURS. I zipped the
>>> .git directory and transferred that and it took about 15 minutes to
>>> transfer.
>>>
>>> Am I doing something wrong? The git-fetch was done with a git+ssh:// URL.
>>> The zip transfer with scp (so ssh shouldn't be a factor).
>>>
>> This seems to happen if your repository consists of many large binary files,
>> especially many large binary files of several versions that do not deltify
>> well against each other. Perhaps it's worth adding gzip compression detection
>> to git? I imagine more people than me are tracking gzipped/bzip2'ed content
>> that pretty much never deltifies well against anything else.
>
> Or we add something like the heuristics we discovered in another thread,
> where rename detection (which is related to delta candidate searching) is
> not started if the sizes differ drastically.
>
That wouldn't work for this particular case, though. Our distribution
repository holds ~300 bzip2-compressed tarballs with an average size
of 3 MiB. 240 of those are between 2.5 and 4 MiB, so their sizes don't
differ drastically, but they don't delta well against each other either.
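
A rough sketch of why similar sizes do not help here. It is not git's
delta code: it uses a stand-in measure (the fraction of 16-byte windows
of one file that also occur somewhere in the other) and made-up test
data. The helper names make_versions and shared_window_ratio are
assumptions for illustration only.

```python
import bz2
import random


def make_versions(seed=0):
    """Return two byte strings that differ in only a few short spots."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    # Made-up vocabulary, so the text compresses like ordinary content.
    words = ["".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
             for _ in range(500)]
    old = " ".join(rng.choice(words) for _ in range(4000)).encode("ascii")
    new = bytearray(old)
    for i in range(1, 6):
        pos = i * len(old) // 6
        new[pos:pos + 4] = b"EDIT"  # small same-length replacement
    return old, bytes(new)


def shared_window_ratio(source, target, window=16):
    """Fraction of fixed-size windows of target found anywhere in source."""
    index = {source[i:i + window] for i in range(len(source) - window + 1)}
    starts = range(0, len(target) - window + 1, window)
    hits = sum(target[s:s + window] in index for s in starts)
    return hits / len(starts)


old, new = make_versions()
old_bz2, new_bz2 = bz2.compress(old), bz2.compress(new)
print("compressed sizes:", len(old_bz2), len(new_bz2))
print("shared windows, raw: %.2f" % shared_window_ratio(old, new))
print("shared windows, bz2: %.2f" % shared_window_ratio(old_bz2, new_bz2))
```

The two uncompressed versions share almost every window, so a delta
between them would be tiny. After bzip2 the sizes are still close, but
the byte streams have very little in common, which is the situation a
size-based heuristic cannot detect.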

One option would be a config setting that skips delta attempts for
files with certain suffixes. That way we could tell it to ignore
*.gz, *.tgz and *.bz2, and everything would work just as it does today,
only a lot faster.
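
The suffix idea could look roughly like the sketch below. Nothing in it
is an existing git setting or function: the comma-separated value
format and the names parse_patterns and should_attempt_delta are
hypothetical, chosen only to show the check that would run before any
delta search.

```python
import fnmatch
import posixpath


def parse_patterns(value):
    """Split a hypothetical comma-separated config value into glob patterns."""
    return [p.strip() for p in value.split(",") if p.strip()]


def should_attempt_delta(path, patterns):
    """Return False when the file name matches any no-delta pattern."""
    name = posixpath.basename(path)
    # fnmatchcase keeps matching case-sensitive on every platform.
    return not any(fnmatch.fnmatchcase(name, p) for p in patterns)


# Hypothetical config value, using the suffixes named above.
patterns = parse_patterns("*.gz,*.tgz,*.bz2")

for path in ("dist/foo-1.0.tar.bz2", "dist/bar.tgz", "src/main.c"):
    action = "try deltas" if should_attempt_delta(path, patterns) else "skip deltas"
    print(path, "->", action)
```

Matching on the file name keeps the check cheap: it runs once per
object before any candidate search, and files that do not match are
handled exactly as they are today.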
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se