git.vger.kernel.org archive mirror
From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Shawn Pearce" <spearce@spearce.org>, git <git@vger.kernel.org>
Subject: Fwd: Git SCM and zlib dictionaries
Date: Tue, 15 Aug 2006 11:19:59 -0400
Message-ID: <9e4733910608150819l6ba602e2q2f52a5693a2bac4d@mail.gmail.com>
In-Reply-To: <00B40C71-72B6-499B-806B-64A140136944@alumni.caltech.edu>

---------- Forwarded message ----------
From: Mark Adler <madler@alumni.caltech.edu>
Date: Aug 15, 2006 10:43 AM
Subject: Re: Git SCM and zlib dictionaries
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Jean-loup Gailly <jloup@gzip.org>


On Aug 15, 2006, at 6:11 AM, Jon Smirl wrote:
> What we are doing is similar to full-text
> search indexing.

If the point of very small (1Kish) compressed chunks is for random
access and individual decompression of those pieces, then there are
other approaches.  You can for example compress many of them together
for better compression (say 32), and accept some speed degradation by
having to decompress on average half (16) of them to get to the one
you want.
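
A minimal sketch of that trade-off, using Python's standard zlib module.
The helper names (compress_group, read_chunk) and the sample chunks are
hypothetical, made up for illustration; nothing here is git's actual pack
code. It compresses 32 roughly 1K chunks as one stream and, to read one
chunk back, inflates only as far as that chunk's end.

```python
import zlib

def compress_group(chunks):
    """Compress all chunks as one zlib stream.

    Returns (blob, offsets); offsets[i] is where chunk i starts in the
    uncompressed stream, and offsets[-1] is the total length.
    """
    offsets = [0]
    for c in chunks:
        offsets.append(offsets[-1] + len(c))
    return zlib.compress(b"".join(chunks)), offsets

def read_chunk(blob, offsets, i, step=256):
    """Recover chunk i, feeding compressed input only until it is covered."""
    d = zlib.decompressobj()
    out = bytearray()
    end = offsets[i + 1]
    pos = 0
    while len(out) < end and pos < len(blob):
        out += d.decompress(blob[pos:pos + step])
        pos += step
    if len(out) < end:
        out += d.flush()
    return bytes(out[offsets[i]:end])

# Hypothetical sample data: 32 similar chunks of about 1K each.
chunks = [
    ("static int handler_%d(struct req *r) { return dispatch(r, %d); }\n"
     % (i, i)).encode() * 16
    for i in range(32)
]

individual_size = sum(len(zlib.compress(c)) for c in chunks)
blob, offsets = compress_group(chunks)
grouped_size = len(blob)
```

Chunks near the front of the group are cheap to reach and chunks near
the end cost almost a full inflate, which is where the "half on average"
figure comes from.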

------------------------------
We have delta runs of about 20 revisions. Compress those 20 blobs as
a group instead of individually. The pack index would point all 20
sha1's to the same blob, with a different type code. You have to load
and unzip most of these objects anyway to compute the revision
from the diffs. Putting them into a single zip means that they share a
single compression table.
-------------------------------
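
A rough sketch of the grouped-blob idea above, assuming a toy in-memory
index. The names GROUP_TYPE, pack_delta_run and load_object are
hypothetical, and object ids are a plain SHA-1 of the content rather
than git's real object hashing; this is not the pack format, only an
illustration of several ids pointing at one shared zlib stream.

```python
import hashlib
import zlib

GROUP_TYPE = "zgroup"  # hypothetical type code marking a grouped entry

def object_id(data):
    # Simplification: plain SHA-1 of the content, not git's object header.
    return hashlib.sha1(data).hexdigest()

def pack_delta_run(blobs):
    """Compress a run of blobs as one stream.

    Returns (packed, index), where index maps each object id to
    (type code, start, length) within the uncompressed stream.
    """
    index = {}
    start = 0
    for b in blobs:
        index[object_id(b)] = (GROUP_TYPE, start, len(b))
        start += len(b)
    return zlib.compress(b"".join(blobs)), index

def load_object(packed, index, oid):
    """Inflate the shared stream once and slice out the requested object."""
    kind, start, length = index[oid]
    if kind != GROUP_TYPE:
        raise ValueError("not a grouped entry")
    data = zlib.decompress(packed)
    return data[start:start + length]

# Hypothetical run of 20 revisions that differ by one line each.
revisions = [
    b"line one\nline two\n" + b"".join(b"change %d\n" % n for n in range(i + 1))
    for i in range(20)
]
packed, index = pack_delta_run(revisions)
```

Since a reader walking the delta chain wants most of the run anyway, the
single inflate in load_object could be cached and reused for all 20 ids.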

Or you can process the whole thing to create a custom coding scheme,
as illustrated in "Managing Gigabytes":

     http://www.cs.mu.oz.au/mg/

mark



-- 
Jon Smirl
jonsmirl@gmail.com
