From: Nicolas Pitre <nico@cam.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: Huge win, compressing a window of delta runs as a unit
Date: Mon, 21 Aug 2006 13:48:59 -0400 (EDT) [thread overview]
Message-ID: <Pine.LNX.4.64.0608211309270.3851@localhost.localdomain> (raw)
In-Reply-To: <9e4733910608210914s1157f47eta821584928ce4dd5@mail.gmail.com>
On Mon, 21 Aug 2006, Jon Smirl wrote:
> How about making the length of delta chains an exponential function of
> the number of revs? In Mozilla configure.in has 1,700 revs and it is a
> 250K file. If you store a full copy every 10 revs that is 43MB
> (prezip) of data that almost no one is going to look at.
> The chains
> lengths should reflect the relative probability that someone is going
> to ask to see the revs. That is not at all a uniform function.
1) You can do that already with stock git-repack.
2) The current git-pack-objects code can and does produce a delta "tree"
off of a single base object. It doesn't have to be a linear list.
Therefore, even if the default depth is 10, you may as well have many
deltas pointing to the _same_ base object effectively making the
compression ratio much larger than 1/10.
If for example each object has 2 delta childs, and each of those deltas
also have 2 delta childs, you could have up to 39366 delta objects
attached to a _single_ undeltified base object.
And of course the git-pack-objects code doesn't limit the number of
delta childs in any way so this could theoritically be infinite even
though the max depth is 10. OK the delta matching window limits that
somehow but nothing prevents you from repacking with a larger window
since that parameter has no penalty on the reading of objects out of the
pack.
> Personally I am still in favor of a two pack system. One archival pack
> stores everything in a single chain and size, not speed, is it's most
> important attribute. It is marked readonly and only functions as an
> archive; git-repack never touches it. It might even use a more compact
> compression algorithm.
>
> The second pack is for storing more recent revisions. The archival
> pack would be constructed such that none of the files needed for the
> head revisions of any branch are in it. They would all be in the
> second pack.
Personally I'm still against that. All arguments put forward for a
different or multiple packing system are based on unproven assumptions
so far and none of those arguments actually present significant
advantages over the current system. For example, I was really intriged
by the potential of object grouping into a single zlib stream at first,
but it turns out not to be so great after actual testing.
I still think that a global zlib dictionary is a good idea. Not because
it looks like it'll make packs enormously smaller, but rather because it
impose no performance regression over the current system. And I
strongly believe that you have to have a really strong case for
promoting a solution that carries performance regression, something like
over 25% smaller packs for example. But so far it didn't happen.
> This may be a path to partial repositories. Instead of downloading the
> real archival pack I could download just an index for it. The index
> entries would be marked to indicate that these objects are valid but
> not-present.
An index without the actual objects is simply useless. You could do
without the index entirely in that case anyway since the mere presence
of an entry in the index doesn't give you anything really useful.
Nicolas
next prev parent reply other threads:[~2006-08-21 17:49 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-16 17:20 Huge win, compressing a window of delta runs as a unit Jon Smirl
2006-08-17 4:07 ` Shawn Pearce
2006-08-17 7:56 ` Johannes Schindelin
2006-08-17 8:07 ` Johannes Schindelin
2006-08-17 14:36 ` Jon Smirl
2006-08-17 15:45 ` Johannes Schindelin
2006-08-17 16:33 ` Nicolas Pitre
2006-08-17 17:05 ` Johannes Schindelin
2006-08-17 17:22 ` Jon Smirl
2006-08-17 18:15 ` Nicolas Pitre
2006-08-17 17:17 ` Jon Smirl
2006-08-17 17:32 ` Nicolas Pitre
2006-08-17 18:06 ` Jon Smirl
2006-08-17 17:22 ` Nicolas Pitre
2006-08-17 18:03 ` Jon Smirl
2006-08-17 18:24 ` Nicolas Pitre
2006-08-18 4:03 ` Nicolas Pitre
2006-08-18 12:53 ` Jon Smirl
2006-08-18 16:30 ` Nicolas Pitre
2006-08-18 16:56 ` Jon Smirl
2006-08-21 3:45 ` Nicolas Pitre
2006-08-21 6:46 ` Shawn Pearce
2006-08-21 10:24 ` Jakub Narebski
2006-08-21 16:23 ` Jon Smirl
2006-08-18 13:15 ` Jon Smirl
2006-08-18 13:36 ` Johannes Schindelin
2006-08-18 13:50 ` Jon Smirl
2006-08-19 19:25 ` Linus Torvalds
2006-08-18 16:25 ` Nicolas Pitre
2006-08-21 7:06 ` Shawn Pearce
2006-08-21 14:07 ` Jon Smirl
2006-08-21 15:46 ` Nicolas Pitre
2006-08-21 16:14 ` Jon Smirl
2006-08-21 17:48 ` Nicolas Pitre [this message]
2006-08-21 17:55 ` Nicolas Pitre
2006-08-21 18:01 ` Nicolas Pitre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0608211309270.3851@localhost.localdomain \
--to=nico@cam.org \
--cc=git@vger.kernel.org \
--cc=jonsmirl@gmail.com \
--cc=spearce@spearce.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).