From: Nicolas Pitre <nico@cam.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: Huge win, compressing a window of delta runs as a unit
Date: Fri, 18 Aug 2006 12:25:57 -0400 (EDT)
Message-ID: <Pine.LNX.4.64.0608181057440.11359@localhost.localdomain>
In-Reply-To: <9e4733910608180615q4895334bw57c55e59a4ac5482@mail.gmail.com>

On Fri, 18 Aug 2006, Jon Smirl wrote:

> On 8/18/06, Nicolas Pitre <nico@cam.org> wrote:
> > A better way to get such a size saving is to increase the window and
> > depth parameters.  For example, a window of 20 and depth of 20 can
> > usually provide a pack size saving greater than 11% with none of the
> > disadvantages mentioned above.
> 
> Our window size is effectively infinite. I am handing him all of the
> revisions from a single file in optimal order. This includes branches.

In GIT packing terms this is infinite delta _depth_ not _window_.

> He takes these revisions, runs xdiff on them, and then puts the entire
> result into a single zlib blob.

It is not a good idea to have infinite delta depth.  The time to 
browse the repository history then becomes exponential with the number 
of revisions, making the value of such a repository a bit questionable 
(you could as well preserve only the last 2 years of history instead, 
since anything older than that with infinite delta depth is likely to be 
too slow and painful to use).

But just for comparison I did a repack -a -f on the kernel repository 
with --window=50 --depth=5000 which should be a good approximation of 
the best possible delta matching with infinite depth.

Default delta params (window=10 depth=10) : 122103455 
Aggressive deltas (window=50 depth=5000) : 105870516
Reduction : 13%

OK let's try it with delta chains in the same zlib stream using the 
patch I posted yesterday (with a minor tweak allowing the usage of -f 
with git-repack).

Aggressive and grouped deltas (window=50 depth=5000) : 99860685

This is a mere 5.7% reduction over the non-grouped deltas, less than the 
11% reduction I obtained yesterday when the delta depth was kept 
reasonably short.
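
For readers who want to check the percentages, here is a minimal sketch 
that recomputes them from the three pack sizes quoted above.  The 
function name reduction_pct is only an illustrative helper, not anything 
from git itself.

```python
# Pack sizes in bytes, copied from the figures quoted in this message.
default_size = 122103455      # window=10, depth=10
aggressive_size = 105870516   # window=50, depth=5000
grouped_size = 99860685       # window=50, depth=5000, grouped deltas


def reduction_pct(before: int, after: int) -> float:
    """Return the size reduction from `before` to `after` as a percentage."""
    return 100.0 * (before - after) / before


print(f"aggressive vs default: {reduction_pct(default_size, aggressive_size):.1f}%")
print(f"grouped vs aggressive: {reduction_pct(aggressive_size, grouped_size):.1f}%")
```

The first figure comes out at roughly 13% and the second at roughly 
5.7%, matching the numbers given above.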

The increased delta depth is likely to make a large difference on old 
repos with long history, maybe more so and with much less 
complexity than the delta grouping.

> I suspect the size reduction is directly proportional to the age of
> the repository. The kernel repository only has three years worth of
> data in it.  Linus has the full history in another repository that is
> not in general distribution. We can get it from him when he gets back
> from vacation.
> 
> If the repository doesn't contain long delta chains the optimization
> doesn't help that much. On the other hand it doesn't hurt either since
> the chains weren't long.  My repository is four times as old as the
> kernel one and I am getting 4x the benefit.

No, that cannot be right.

Let's assume every whole object is 10 in size and every delta is 1.  
With a delta depth of 10 you can therefore have 1 base object and 10 
delta objects, effectively storing 11 objects for a size of 20.  That 
gives an average of about 1.8 per object, so a 1.8 vs 10 size ratio.

If the delta depth is 100 then you potentially have 1 base object and 
100 deltas for a size ratio of 1.1 vs 10.

If the delta depth is 1000 the ratio becomes 1.01 vs 10.

The size saving is therefore _not_ proportional to the age of the 
repository.  It rather tends to be asymptotic with the delta ratio (but 
imposes an exponential runtime cost when fetching objects out of it).
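
The arithmetic above can be sketched as follows, assuming the same toy 
model (one base object of size 10 followed by a chain of deltas of size 
1 each).  The name average_size and its default parameters are 
illustrative only; real packs will not behave this cleanly.

```python
def average_size(depth: int, base_size: float = 10.0, delta_size: float = 1.0) -> float:
    """Average stored size per object for one base object plus `depth` deltas."""
    total_size = base_size + depth * delta_size
    object_count = depth + 1
    return total_size / object_count


# The per-object size approaches the delta size (1) as the chain grows,
# so each further increase in depth saves less than the previous one.
for depth in (10, 100, 1000, 10000):
    print(f"depth={depth:>5}: {average_size(depth):.3f} vs 10")
```

Going from depth 10 to depth 100 still shrinks the average noticeably, 
but going from 100 to 1000 barely moves it, which is the asymptotic 
behaviour described above.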

The fact that your 4x older repository has a 4x size saving can only be 
due to a packing malfunction, I would say.


Nicolas
