From: Shawn Pearce <spearce@spearce.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: git <git@vger.kernel.org>
Subject: Re: Compression and dictionaries
Date: Mon, 14 Aug 2006 00:17:05 -0400
Message-ID: <20060814041705.GD18667@spearce.org>
In-Reply-To: <9e4733910608132107j7bca0271g360de3447febbf51@mail.gmail.com>
Jon Smirl <jonsmirl@gmail.com> wrote:
> The zlib doc says to put your most common strings into the fixed
> dictionary. If a string isn't in the fixed dictionary it will get
> handled with an internal dictionary entry. By default zlib runs with
> an empty fixed dictionary and handles everything with the internal
> dictionary.
> Since we are encoding C many strings will always be present (if,
> static, define, const, char, include, int, void, while, continue,
> etc). Do you have any tools to identify the top 500 strings in C
> code? The fixed dictionary would get hardcoded into the git apps.
Actually GIT itself may also benefit from other strings beyond
those commonly found in C-like languages:
'100644 '
'40000 '
'parent '
'tree '
'author '
'committer '
as these occur frequently in trees and commits.
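A minimal sketch of the preset-dictionary idea, written in Python rather than C for brevity. It relies on the `zdict` argument of `zlib.compressobj` and `zlib.decompressobj`, which wraps zlib's `deflateSetDictionary()` and `inflateSetDictionary()`. The dictionary contents, the helper names `deflate` and `inflate`, and the sample commit are assumptions made up for illustration; nothing like this exists in git.

```python
import zlib

# Hypothetical preset dictionary built from the strings listed above.
# The zlib documentation suggests putting the most commonly used strings
# toward the end of the dictionary.
GIT_DICT = b"committer author parent tree 40000 100644 "


def deflate(data, zdict=None):
    """Compress data, optionally priming zlib with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()


def inflate(blob, zdict=None):
    """Decompress blob; the same dictionary must be supplied again."""
    if zdict:
        decomp = zlib.decompressobj(zdict=zdict)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


# Made-up commit body, for illustration only.
SAMPLE_COMMIT = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"parent 0123456789abcdef0123456789abcdef01234567\n"
    b"author A U Thor <author@example.com> 1155528000 -0400\n"
    b"committer C O Mitter <committer@example.com> 1155528000 -0400\n"
    b"\n"
    b"Initial import\n"
)

if __name__ == "__main__":
    plain = deflate(SAMPLE_COMMIT)
    primed = deflate(SAMPLE_COMMIT, GIT_DICT)
    print(len(SAMPLE_COMMIT), len(plain), len(primed))
```

Note that the reader has to be given exactly the same dictionary as the writer, which is why it would have to be hardcoded into the git binaries and could never change without breaking existing objects.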
> A fixed dictionary could conceivably take 5-10% off the size of each entry.
Could be an interesting experiment to see if that's really true
for common loads (e.g. the kernel repo). I don't think anyone has
tried it.
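One way such an experiment might be set up, as a rough sketch only: deflate every object twice, once with and once without the dictionary, and compare the totals. The function names `deflated_size` and `dictionary_savings` are hypothetical, and the caller is assumed to supply the raw object contents (for example, extracted from a repository by some other means).

```python
import zlib


def deflated_size(data, zdict=None):
    """Return the size of data after deflate, with an optional preset dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return len(comp.compress(data) + comp.flush())


def dictionary_savings(objects, zdict):
    """Compare total deflated size without and with a preset dictionary.

    Returns (plain_total, primed_total, fraction_saved). The fraction can
    be negative: a stream that uses a preset dictionary carries an extra
    4-byte dictionary ID in its header, which may outweigh the gain on
    very small objects.
    """
    plain_total = sum(deflated_size(obj) for obj in objects)
    primed_total = sum(deflated_size(obj, zdict) for obj in objects)
    saved = 1.0 - primed_total / plain_total if plain_total else 0.0
    return plain_total, primed_total, saved


if __name__ == "__main__":
    # Tiny made-up inputs; a real run would iterate over repository objects.
    objects = [b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n" * 3]
    print(dictionary_savings(objects, b"parent tree "))
```

Because each git object is deflated separately, the per-object total is the number that matters here, not the size of one large concatenated stream.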
--
Shawn.