git.vger.kernel.org archive mirror
From: Nicolas Pitre <nico@cam.org>
To: Shawn Pearce <spearce@spearce.org>
Cc: Jon Smirl <jonsmirl@gmail.com>, git <git@vger.kernel.org>
Subject: Re: Huge win, compressing a window of delta runs as a unit
Date: Thu, 17 Aug 2006 13:22:02 -0400 (EDT)
Message-ID: <Pine.LNX.4.64.0608171233370.11359@localhost.localdomain>
In-Reply-To: <20060817040719.GC18500@spearce.org>

On Thu, 17 Aug 2006, Shawn Pearce wrote:

> I'm going to try to integrate this into core GIT this weekend.
> My current idea is to make use of the OBJ_EXT type flag to add
> an extended header field behind the length which describes the
> "chunk" as being a delta chain compressed in one zlib stream.
> I'm not overly concerned about saving lots of space in the header
> here as it looks like we're winning a huge amount of pack space,
> so the extended header will probably itself be a couple of bytes.
> This keeps the shorter reserved types free for other great ideas.  :)

We're striving for optimal data storage here, so don't be afraid to use
one of the available types for an "object stream" object.  When you
think of it, deflating multiple objects into a single zlib stream can
be applied to all object types, not only deltas.  If deflating many
blobs into one zlib stream is ever deemed worth it, then the encoding
will already be ready for it.  Also you can leverage existing code to
write headers, etc.

I'd suggest you use OBJ_GROUP = 0 as a new primary object type.  The
"size" field in the header could then become the number of objects
that are included in the group.  Most of the time that will fit in the
low 4 bits of the first header byte, but if there are more than 15
grouped objects then more bits can be used in the following bytes.
Anyway, all the code to generate and parse that is already there.
If there is ever a need for more extensions, they could be prefixed
with a pure zero byte (an object group with a zero object count, which
is distinguishable from a real group).
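
For illustration, a minimal sketch of the existing pack object header
encoding applied to the proposed group type.  It assumes the usual pack
layout (3 type bits and the low 4 size bits in the first byte, then 7
size bits per continuation byte).  OBJ_GROUP, encode_header and
decode_header are names chosen for this sketch, not existing git code.

```python
OBJ_GROUP = 0  # proposed in this message; type 0 is otherwise unused


def encode_header(obj_type, size):
    """Encode (type, size) as a pack-style variable-length header."""
    if not 0 <= obj_type <= 7:
        raise ValueError("type must fit in 3 bits")
    if size < 0:
        raise ValueError("size must be non-negative")
    current = (obj_type << 4) | (size & 0x0F)  # type + low 4 size bits
    size >>= 4
    out = bytearray()
    while size:
        out.append(current | 0x80)  # set the continuation bit
        current = size & 0x7F       # next 7 size bits
        size >>= 7
    out.append(current)
    return bytes(out)


def decode_header(buf, pos=0):
    """Return (type, size, position just past the header)."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos
```

With this encoding a group of up to 15 objects needs a single header
byte, and a group with a zero count is exactly one 0x00 byte, which is
the escape value suggested above for future extensions.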

Then, having the number of grouped objects, you just have to list the 
usual headers for those objects, which are their type and inflated size 
just like regular object headers, including the base sha1 for deltas.  
Again you already have code to produce and parse those.

And finally just append the objects' payload in a single deflated stream.
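
The layout described above could be written out roughly as follows.
This is only a sketch under stated assumptions: pack_group is a
hypothetical helper, OBJ_GROUP is the proposed type, and type 7 is
taken to be the delta type that carries a 20-byte base SHA-1.

```python
import zlib

OBJ_GROUP = 0  # proposed group type (assumption from this message)
OBJ_BLOB = 3
OBJ_DELTA = 7  # delta whose header is followed by a 20-byte base SHA-1


def encode_header(obj_type, size):
    """Pack-style header: 3 type bits, then a variable-length size."""
    current = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(current | 0x80)
        current = size & 0x7F
        size >>= 7
    out.append(current)
    return bytes(out)


def pack_group(objects):
    """objects: list of (type, payload, base_sha1 or None) tuples."""
    # 1. Group header: the "size" field holds the object count.
    out = bytearray(encode_header(OBJ_GROUP, len(objects)))
    # 2. The usual per-object headers: type and inflated size,
    #    plus the base SHA-1 for deltas.
    for obj_type, payload, base_sha1 in objects:
        out += encode_header(obj_type, len(payload))
        if obj_type == OBJ_DELTA:
            if base_sha1 is None or len(base_sha1) != 20:
                raise ValueError("delta needs a 20-byte base SHA-1")
            out += base_sha1
    # 3. All payloads, concatenated and deflated as one zlib stream.
    out += zlib.compress(b"".join(p for _, p, _ in objects))
    return bytes(out)
```

Because every inflated size is known before the stream starts, a reader
can compute each object's offset inside the inflated data without
touching the compressed bytes.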

This way the reading of an object from a group can be optimized if the
object data is located at the beginning of the stream: you only need
to inflate the bytes leading up to the desired data (possibly caching
those for further delta replaying), inflate the needed data for the
desired object, and then ignore the remainder of the stream.
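
A rough sketch of that partial read, using Python's zlib module in
place of the C zlib calls git would use.  read_from_stream and its
arguments are hypothetical; sizes is assumed to be the list of inflated
sizes taken from the per-object headers.

```python
import zlib


def read_from_stream(stream, sizes, index):
    """Inflate only as far as the end of object `index`, then stop."""
    start = sum(sizes[:index])   # offset of the object in inflated data
    end = start + sizes[index]   # no need to inflate past this point
    inflater = zlib.decompressobj()
    inflated = bytearray()
    data = stream
    while len(inflated) < end:
        # max_length caps the output, so nothing after `end` is inflated.
        chunk = inflater.decompress(data, end - len(inflated))
        if not chunk:
            raise ValueError("stream ended before the requested object")
        inflated += chunk
        data = inflater.unconsumed_tail  # input not yet consumed
    # inflated[:start] holds the earlier objects and could be cached
    # for delta replaying; the rest of the stream is never touched.
    return bytes(inflated[start:end])
```

The cost of reading an object grows with its offset in the stream, so
objects that are read most often would benefit from being placed first.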


Nicolas


Thread overview: 36+ messages
2006-08-16 17:20 Huge win, compressing a window of delta runs as a unit Jon Smirl
2006-08-17  4:07 ` Shawn Pearce
2006-08-17  7:56   ` Johannes Schindelin
2006-08-17  8:07     ` Johannes Schindelin
2006-08-17 14:36       ` Jon Smirl
2006-08-17 15:45         ` Johannes Schindelin
2006-08-17 16:33           ` Nicolas Pitre
2006-08-17 17:05             ` Johannes Schindelin
2006-08-17 17:22             ` Jon Smirl
2006-08-17 18:15               ` Nicolas Pitre
2006-08-17 17:17           ` Jon Smirl
2006-08-17 17:32             ` Nicolas Pitre
2006-08-17 18:06               ` Jon Smirl
2006-08-17 17:22   ` Nicolas Pitre [this message]
2006-08-17 18:03     ` Jon Smirl
2006-08-17 18:24       ` Nicolas Pitre
2006-08-18  4:03 ` Nicolas Pitre
2006-08-18 12:53   ` Jon Smirl
2006-08-18 16:30     ` Nicolas Pitre
2006-08-18 16:56       ` Jon Smirl
2006-08-21  3:45         ` Nicolas Pitre
2006-08-21  6:46           ` Shawn Pearce
2006-08-21 10:24             ` Jakub Narebski
2006-08-21 16:23             ` Jon Smirl
2006-08-18 13:15   ` Jon Smirl
2006-08-18 13:36     ` Johannes Schindelin
2006-08-18 13:50       ` Jon Smirl
2006-08-19 19:25         ` Linus Torvalds
2006-08-18 16:25     ` Nicolas Pitre
2006-08-21  7:06       ` Shawn Pearce
2006-08-21 14:07         ` Jon Smirl
2006-08-21 15:46         ` Nicolas Pitre
2006-08-21 16:14           ` Jon Smirl
2006-08-21 17:48             ` Nicolas Pitre
2006-08-21 17:55               ` Nicolas Pitre
2006-08-21 18:01                 ` Nicolas Pitre
