git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Nicolas Pitre" <nico@cam.org>
Cc: "Shawn Pearce" <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Re: Huge win, compressing a window of delta runs as a unit
Date: Thu, 17 Aug 2006 14:03:05 -0400	[thread overview]
Message-ID: <9e4733910608171103j545a7fd5x2b61b97376de04be@mail.gmail.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0608171233370.11359@localhost.localdomain>

On 8/17/06, Nicolas Pitre <nico@cam.org> wrote:
> We're streaving for optimal data storage here so don't be afraid to use
> one of the available types for an "object stream" object.  Because when
> you think of it, the deflating of multiple objects into a single zlib
> stream can be applied to all object types not only deltas.  If ever
> deflating many blobs into one zlib stream is dimmed worth it then the
> encoding will already be ready for it.  Also you can leverage existing
> code to write headers, etc.

Here are two more case that need to be accounted for in the packs.

1) If you zip something and it gets bigger. We need an entry that just
stores the object without it being zipped. Zipping jpegs or mpegs will
likely make them significantly bigger. Or does zlib like already
detect this case and do a null compression?

2) If you delta something and the delta is bigger than the object
being deltaed. The delta code should detect this and store the full
object instead of the delta. Again jpegs and mpegs will trigger this.
You may even want to say that the delta has to be smaller than 80% of
the full object.

Shawn is planning on looking into true dictionary based compression.
That will generate even more types inside of a pack. If dictionary
compression works out full text search can be added with a little more
overhead. True dictionary based compression has the potential to go
even smaller than the current 280MB. The optimal way to do this is for
each pack to contain it's own dictionary.

-- 
Jon Smirl
jonsmirl@gmail.com

  reply	other threads:[~2006-08-17 18:03 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-08-16 17:20 Huge win, compressing a window of delta runs as a unit Jon Smirl
2006-08-17  4:07 ` Shawn Pearce
2006-08-17  7:56   ` Johannes Schindelin
2006-08-17  8:07     ` Johannes Schindelin
2006-08-17 14:36       ` Jon Smirl
2006-08-17 15:45         ` Johannes Schindelin
2006-08-17 16:33           ` Nicolas Pitre
2006-08-17 17:05             ` Johannes Schindelin
2006-08-17 17:22             ` Jon Smirl
2006-08-17 18:15               ` Nicolas Pitre
2006-08-17 17:17           ` Jon Smirl
2006-08-17 17:32             ` Nicolas Pitre
2006-08-17 18:06               ` Jon Smirl
2006-08-17 17:22   ` Nicolas Pitre
2006-08-17 18:03     ` Jon Smirl [this message]
2006-08-17 18:24       ` Nicolas Pitre
2006-08-18  4:03 ` Nicolas Pitre
2006-08-18 12:53   ` Jon Smirl
2006-08-18 16:30     ` Nicolas Pitre
2006-08-18 16:56       ` Jon Smirl
2006-08-21  3:45         ` Nicolas Pitre
2006-08-21  6:46           ` Shawn Pearce
2006-08-21 10:24             ` Jakub Narebski
2006-08-21 16:23             ` Jon Smirl
2006-08-18 13:15   ` Jon Smirl
2006-08-18 13:36     ` Johannes Schindelin
2006-08-18 13:50       ` Jon Smirl
2006-08-19 19:25         ` Linus Torvalds
2006-08-18 16:25     ` Nicolas Pitre
2006-08-21  7:06       ` Shawn Pearce
2006-08-21 14:07         ` Jon Smirl
2006-08-21 15:46         ` Nicolas Pitre
2006-08-21 16:14           ` Jon Smirl
2006-08-21 17:48             ` Nicolas Pitre
2006-08-21 17:55               ` Nicolas Pitre
2006-08-21 18:01                 ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e4733910608171103j545a7fd5x2b61b97376de04be@mail.gmail.com \
    --to=jonsmirl@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=nico@cam.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).