From: Shawn Pearce <spearce@spearce.org>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
Subject: Re: Mozilla .git tree
Date: Sat, 2 Sep 2006 13:39:22 -0400
Message-ID: <20060902173922.GA27826@spearce.org>
In-Reply-To: <9e4733910609020720w3633aa0cq5016fb1e223fc4cb@mail.gmail.com>
Jon Smirl <jonsmirl@gmail.com> wrote:
> If you're going to redo the pack formats another big win for the
> Mozilla pack is to convert pack internal sha1 references into file
> offsets within the pack. Doing that will take around 30MB off the
> Mozilla pack size. sha1's are not compressible so this is a direct
> savings.
Right now Junio is working on the index to break the 4 GiB barrier.
I think Junio and Nico have already agreed to change the base SHA1
to an offset instead. That is a problem for the current layout,
though: the base gets written out behind the delta, so you need to
know exactly how many bytes the delta is going to be before you can
correctly compute the offset.
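To make the ordering problem concrete, here is a minimal Python sketch.
It assumes a plain base-128 varint for the offset, which is only an
illustrative choice and not a format anyone in this thread has settled
on; `encode_varint` and `write_delta_then_base` are hypothetical names.
Because the base sits behind the delta, the distance to the base is only
known once the compressed delta exists, so the sketch compresses the
delta into a buffer first.

```python
import zlib

SHA1_LEN = 20  # bytes used today to name the delta base


def encode_varint(n):
    """Encode a non-negative integer as a little-endian base-128 varint."""
    if n < 0:
        raise ValueError("offset must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def write_delta_then_base(delta, base):
    """Lay out [offset-to-base][compressed delta][compressed base].

    Returns the packed bytes and the size of the base reference.
    """
    # The delta has to be compressed before the reference can be written,
    # because its compressed length *is* the distance to the base.
    packed_delta = zlib.compress(delta)
    packed_base = zlib.compress(base)
    ref = encode_varint(len(packed_delta))  # measured from the end of ref
    return ref + packed_delta + packed_base, len(ref)
```

With this encoding, a delta that compresses to fewer than 128 bytes needs
a one-byte reference where a SHA1 takes 20. The cost is the extra
buffering step shown in `write_delta_then_base`.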
> This might reduce memory usage too. The index is only needed to get
> the initial object from the pack. Since index use is lighter it could
> just be open/closed when needed.
True; however, when you are walking a series of commits (to produce
output for `git log`, for example), every time you parse a commit you
need to go back to the .idx to look up the ancestor commit(s).
So you don't want to open/close the .idx file on every object;
instead put the .idx files into the LRU like the .pack files are
(or into their own LRU chain) and maintain some threshold on how
many bytes' worth of .idx is kept live.
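A rough Python sketch of that idea, under some assumptions: `IdxCache`
and its `loader` callback are hypothetical names, and the index contents
are held as plain bytes where a real implementation would more likely
mmap the file. It keeps its own LRU chain and evicts the least recently
used .idx once the live byte count passes the threshold.

```python
from collections import OrderedDict


class IdxCache:
    """Keep recently used .idx contents live, bounded by total bytes."""

    def __init__(self, max_bytes, loader):
        self.max_bytes = max_bytes
        self.loader = loader            # hypothetical callable: path -> bytes
        self.live_bytes = 0
        self._entries = OrderedDict()   # path -> bytes, least recent first

    def get(self, path):
        data = self._entries.get(path)
        if data is not None:
            self._entries.move_to_end(path)  # mark as most recently used
            return data
        data = self.loader(path)
        self._entries[path] = data
        self.live_bytes += len(data)
        # Evict least recently used entries, but never the one just loaded.
        while self.live_bytes > self.max_bytes and len(self._entries) > 1:
            _, old = self._entries.popitem(last=False)
            self.live_bytes -= len(old)
        return data
```

Walking commits then calls `cache.get(idx_path)` for every ancestor
lookup, and only touches the filesystem when the index has been evicted.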
> You could also introduce a zlib dictionary object into the format and
> just leave it empty for now.
No. I'm not sure I'm ready to propose that as a solution for
decreasing pack size. Now that my exams are over I've started
working on a true dictionary based compression implementation.
I want to try to get Git itself repacked under it, then try the
Mozilla pack after I get my new amd64 based system built.
If that's as big a space saver as we're hoping it will be, then
the pack format would be radically different and need to change;
if it doesn't gain us anything (or is worse!) then we can go back
to the drawing board and consider other pack format changes such as
a zlib dictionary. But right now the zlib dictionary's measly 4%
gain isn't very much.
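For reference, this is roughly what a zlib dictionary means in practice:
both sides agree on a preset byte string, and the compressor may refer
back to it as if it had already been seen. The sketch below uses the
`zdict` argument of Python's `zlib` module; the helper names and the
sample dictionary contents are made up for illustration and say nothing
about what a pack-wide dictionary would actually hold.

```python
import zlib


def compress_with_dict(data, zdict):
    """Deflate data using a preset dictionary shared with the reader."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress_with_dict(blob, zdict):
    """Inflate a stream that was produced with the same preset dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


# Illustrative dictionary: strings that commit objects tend to repeat.
SAMPLE_DICT = (
    b"author A U Thor <author@example.com>\n"
    b"committer C O Mitter <committer@example.com>\n"
)
```

The reader must be given exactly the same dictionary, which is why it
would have to be stored in the pack as an object of its own.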
--
Shawn.