From: Nicolas Pitre <nico@cam.org>
To: Shawn Pearce <spearce@spearce.org>
Cc: Jon Smirl <jonsmirl@gmail.com>, git <git@vger.kernel.org>
Subject: Re: Huge win, compressing a window of delta runs as a unit
Date: Thu, 17 Aug 2006 13:22:02 -0400 (EDT)
Message-ID: <Pine.LNX.4.64.0608171233370.11359@localhost.localdomain>
In-Reply-To: <20060817040719.GC18500@spearce.org>
On Thu, 17 Aug 2006, Shawn Pearce wrote:
> I'm going to try to integrate this into core GIT this weekend.
> My current idea is to make use of the OBJ_EXT type flag to add
> an extended header field behind the length which describes the
> "chunk" as being a delta chain compressed in one zlib stream.
> I'm not overly concerned about saving lots of space in the header
> here as it looks like we're winning a huge amount of pack space,
> so the extended header will probably itself be a couple of bytes.
> This keeps the shorter reserved types free for other great ideas. :)
We're striving for optimal data storage here, so don't be afraid to use
one of the available types for an "object stream" object. Because when
you think of it, the deflating of multiple objects into a single zlib
stream can be applied to all object types, not only deltas. If
deflating many blobs into one zlib stream is ever deemed worth it, then
the encoding will already be ready for it. Also you can leverage
existing code to write headers, etc.
I'd suggest you use OBJ_GROUP = 0 as a new primary object type. The
"size" field in the header could then become the number of objects
that are included in the group. Most of the time that will fit in the
low 4 bits of the first header byte, but if there are more than 15
grouped objects then more bits can be used on the following bytes.
Anyway so far all the code to generate and parse that is already there.
If ever there is a need for more extensions, they could be prefixed with
a pure zero byte (an object group with a zero object count, which is
distinguishable from a real group).
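
To make the byte layout concrete, here is a minimal sketch in Python of the existing pack-style header as it would apply to a group: the type sits in bits 4-6 of the first byte, the low 4 bits of the size (here, the object count) sit in bits 0-3, and the MSB says that another byte with 7 more size bits follows. OBJ_GROUP = 0 is the value proposed above, not an existing type, and the function names are made up for illustration.

```python
OBJ_GROUP = 0  # proposed in this mail; not an existing pack object type


def encode_header(obj_type, size):
    """Encode a pack-style header: 3-bit type, then a variable-length size."""
    if not 0 <= obj_type <= 7:
        raise ValueError("type must fit in 3 bits")
    if size < 0:
        raise ValueError("size must be non-negative")
    byte = (obj_type << 4) | (size & 0x0F)  # low 4 size bits share byte 0
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # MSB set: more size bytes follow
        byte = size & 0x7F       # next 7 bits of the size
        size >>= 7
    out.append(byte)
    return bytes(out)


def decode_header(buf, pos=0):
    """Parse a header at buf[pos]; return (obj_type, size, next_pos)."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos
```

With these, a group of up to 15 objects takes a single header byte, a group of 16 takes two, and encode_header(OBJ_GROUP, 0) is the pure zero byte that could mark a further extension.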
Then, having the number of grouped objects, you just have to list the
usual headers for those objects, which are their type and inflated size
just like regular object headers, including the base sha1 for deltas.
Again you already have code to produce and parse those.
And finally just append the objects' payload in a single deflated stream.
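
Put together, writing a group could look roughly like the sketch below: the group header carrying the object count, then one regular header per object (plus the 20-byte base sha1 for deltas), then all payloads concatenated and deflated once. This is only an illustration of the layout described here, not code from git. The names write_group and encode_header are hypothetical, and OBJ_DELTA = 7 is an assumed type code for a delta that names its base by sha1.

```python
import zlib

OBJ_GROUP = 0  # proposed in this mail; not an existing pack object type
OBJ_DELTA = 7  # assumed delta type code; a 20-byte base sha1 follows its header


def encode_header(obj_type, size):
    """Pack-style header: 3-bit type, 4 size bits, then 7 size bits per byte."""
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # MSB set: more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def write_group(objects):
    """Serialize a group.

    objects is a list of (obj_type, payload, base_sha1) tuples, where
    base_sha1 is 20 raw bytes for deltas and None otherwise.
    """
    out = bytearray(encode_header(OBJ_GROUP, len(objects)))
    for obj_type, payload, base_sha1 in objects:
        # Same header as a standalone object: type and inflated size.
        out += encode_header(obj_type, len(payload))
        if obj_type == OBJ_DELTA:
            if base_sha1 is None or len(base_sha1) != 20:
                raise ValueError("a delta needs a 20-byte base sha1")
            out += base_sha1
    # All payloads share one zlib stream, in header order.
    out += zlib.compress(b"".join(payload for _, payload, _ in objects))
    return bytes(out)
```

Keeping every header outside the deflated stream means a reader can find out what the group contains, and where each payload starts in the inflated data, without inflating anything.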
This way the reading of an object from a group can be optimized if the
object data is located at the beginning of the stream, such that you only
need to inflate the bytes leading up to the desired data
(possibly caching those for further delta replaying), inflate
the needed data for the desired object, and then ignore the remainder
of the stream.
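
The partial read can be sketched in Python as follows, assuming the layout described above (group header, per-object headers, one deflated stream). The per-object sizes give the offset of the wanted payload in the inflated data, so inflation can stop as soon as that payload is complete. The names read_from_group and decode_header are hypothetical, OBJ_GROUP = 0 is the proposed value, and OBJ_DELTA = 7 is an assumed delta type code.

```python
import zlib

OBJ_GROUP = 0  # proposed in this mail; not an existing pack object type
OBJ_DELTA = 7  # assumed delta type code; a 20-byte base sha1 follows its header


def decode_header(buf, pos):
    """Parse a pack-style header; return (obj_type, size, next_pos)."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


def read_from_group(buf, index):
    """Return (obj_type, data) for object number index of a group."""
    group_type, count, pos = decode_header(buf, 0)
    if group_type != OBJ_GROUP or not 0 <= index < count:
        raise ValueError("not a group, or index out of range")

    entries = []
    for _ in range(count):
        obj_type, size, pos = decode_header(buf, pos)
        if obj_type == OBJ_DELTA:
            pos += 20  # skip the base sha1; this sketch does not use it
        entries.append((obj_type, size))

    # Offsets of the wanted payload inside the inflated stream.
    start = sum(size for _, size in entries[:index])
    obj_type, size = entries[index]
    end = start + size

    # Inflate only the first `end` bytes; the rest is never touched.
    inflater = zlib.decompressobj()
    out = bytearray()
    pending = buf[pos:]
    while len(out) < end and pending:
        out += inflater.decompress(pending, end - len(out))
        pending = inflater.unconsumed_tail
    if len(out) < end:
        raise ValueError("deflated stream is shorter than the headers claim")
    return obj_type, bytes(out[start:end])
```

The first `start` bytes in `out` are the payloads of the preceding objects. A real reader might keep them around for delta replaying instead of discarding them, which this sketch does not attempt.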
Nicolas