git.vger.kernel.org archive mirror
From: Andreas Ericsson <ae@op5.se>
To: Geert Bosch <bosch@adacore.com>
Cc: Johannes Schindelin <Johannes.Schindelin@gmx.de>,
	Andy Parkins <andyparkins@gmail.com>,
	git@vger.kernel.org
Subject: Re: git-fetching from a big repository is slow
Date: Thu, 14 Dec 2006 23:28:46 +0100
Message-ID: <4581D01E.9020806@op5.se>
In-Reply-To: <C287764F-6755-4291-A87A-3E8816E90B49@adacore.com>

Geert Bosch wrote:
> 
> On Dec 14, 2006, at 10:06, Andreas Ericsson wrote:
> 
>> It wouldn't work for this particular case though. In our distribution 
>> repository we have ~300 bzip2 compressed tarballs with an average size 
>> of 3MiB. 240 of those are between 2.5 and 4 MiB, so they don't 
>> drastically differ, but neither do they delta well.
>>
>> One option would be to add some sort of config option to skip 
>> attempting deltas of files with a certain suffix. That way we could 
>> just tell it to ignore *.gz,*.tgz,*.bz2 and everything would work just 
>> as it does today, but a lot faster.
> 
> Such special magic based on filenames is always a bad idea. Tomorrow
> somebody comes with .zip files (oh, and of course .ZIP), then it's
> .jpg's other compressed content. In the end git will be doing lots of
> magic and still perform badly on unknown compressed content.
> 

Hence the config option. People can tell git to skip trying to delta 
whatever they want. For this particular mothership repo, we only ever 
work against it when we're at the office, so the resulting data size is 
not an issue, but the delta computation can be a real bottleneck.
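
To make the idea concrete, here is a rough sketch of the suffix check such an
option would need. Nothing like this exists in git as far as this thread goes;
the function names and the way the suffix list is passed in are made up purely
for illustration, and the match is deliberately case-sensitive (so ".ZIP"
would have to be listed separately).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a git function: does path end in suffix? */
int path_has_suffix(const char *path, const char *suffix)
{
	size_t plen = strlen(path);
	size_t slen = strlen(suffix);

	return plen >= slen && strcmp(path + plen - slen, suffix) == 0;
}

/*
 * Hypothetical check: return 1 if the path matches any user-configured
 * suffix and should therefore be left out of delta attempts.
 */
int skip_delta_for_path(const char *path,
			const char *const *suffixes, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (path_has_suffix(path, suffixes[i]))
			return 1;
	return 0;
}
```

With the list { ".gz", ".tgz", ".bz2" } this would skip "foo-1.0.tar.bz2" and
still try deltas for "main.c".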

> There is a very simple way of detecting compressed files: just look at
> the size of the compressed blob and compare against the size of the
> expanded blob. If the compressed blob has a non-trivial size which is
> close to the expanded size, assume the file is not interesting as
> source or target for deltas.
> 
> Example:
>    if (compressed_size > expanded_size / 4 * 3 + 1024) {
>      /* don't try to deltify if blob doesn't compress well */
>      return ...;
>    }
> 

Many compression algorithms generate similar output for similar input. 
Most source-code projects change relatively little between releases, so 
they *could* delta well; it's just that in our repo they don't.
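
For reference, the size check quoted above can be wrapped up as a small
predicate. This is only a sketch of the quoted heuristic; the function name is
invented here, and what the caller does when it returns true is left open,
just as in the original example.

```c
/*
 * Hypothetical predicate, not a git function. Returns 1 when the
 * compressed blob is larger than roughly 3/4 of the expanded blob plus
 * 1024 bytes, i.e. when the blob barely compresses. The 1024-byte slack
 * keeps very small blobs from being flagged.
 */
int blob_compresses_poorly(unsigned long compressed_size,
			   unsigned long expanded_size)
{
	return compressed_size > expanded_size / 4 * 3 + 1024;
}
```

A 3 MiB bzip2 tarball that only shrinks to about 3,100,000 bytes is flagged,
a 100,000-byte source file that shrinks to 30,000 bytes is not. It says
nothing about whether two such blobs would delta well against each other,
which is the point made above.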

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
