From: "Shawn O. Pearce" <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Nicolas Pitre <nico@cam.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
git@vger.kernel.org
Subject: Re: pack v4 status
Date: Tue, 27 Feb 2007 22:45:55 -0500
Message-ID: <20070228034555.GA5597@spearce.org>
In-Reply-To: <7vwt23b54a.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano <junkio@cox.net> wrote:
> Nicolas Pitre <nico@cam.org> writes:
> > The idea is to deal with only tree objects containing the 64K most
> > frequently used base names and fall back to the current tree object
> > encoding for objects that couldn't be represented that way.
>
> Ah, I was wondering the same thing as Linus after seeing Shawn
> talk about the 2-byte prefix on #git. Falling back to an
> alternate encoding for rarer cases makes sense.
Right. Git is already fast, and already compresses the object data
very well. But I think we can make things faster without violating
the basic assumptions of "whole project history", and it just turns
out that the faster encodings also make the data smaller for the
common case of human-maintained source code, which of course is
one of the primary uses for Git, though obviously not the only one.
In the worst-case scenario we'll be doing exactly what we are
doing today with regard to encoding. That performance and disk
space usage is already known and considered "very, very fast" and
"very small". ;-)
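
To make the fallback concrete, here is a rough sketch in Python. It is not
the actual pack v4 layout: the "indexed" record format (2-byte name index,
2-byte mode, 20-byte SHA-1) and the helper names build_name_table and
encode_tree are assumptions made up for illustration. Only the legacy
branch follows the tree object format Git uses today.

```python
import struct

MAX_NAMES = 1 << 16  # a 2-byte index can address 65,536 basenames


def build_name_table(name_counts):
    """Return a {basename: index} map of the most frequently used names.

    Hypothetical helper: ranks names by use count (ties broken by name)
    and keeps at most MAX_NAMES of them.
    """
    ranked = sorted(name_counts, key=lambda n: (-name_counts[n], n))
    return {name: i for i, name in enumerate(ranked[:MAX_NAMES])}


def encode_tree(entries, name_index):
    """Encode a list of (mode, basename, sha1) tree entries.

    Returns ("indexed", data) when every basename is in the table,
    otherwise ("legacy", data) using today's tree object encoding.
    """
    if all(name in name_index for _, name, _ in entries):
        # Assumed compact layout: 2-byte name index, 2-byte mode, raw SHA-1.
        data = b"".join(
            struct.pack(">HH", name_index[name], mode) + sha1
            for mode, name, sha1 in entries
        )
        return "indexed", data
    # Fallback: "<octal mode> <name>\0<20-byte SHA-1>" per entry.
    data = b"".join(
        ("%o %s" % (mode, name)).encode() + b"\0" + sha1
        for mode, name, sha1 in entries
    )
    return "legacy", data
```

A tree whose names are all in the table takes the compact path; a single
unknown basename sends the whole object down the legacy path, so the worst
case costs no more than what we do today.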
In the best-case scenario (human-managed source like linux.git,
git.git) we'll scream with pack v4. In the rev-list stats I posted,
the tree encoding switch alone not only saved 3 MiB of disk space
but also improved total running time by 12.5%. Nico and I know
we can still do better.
With 15k basenames in linux.git we're filling only 23.6% of the
available namespace within a single packfile. I think that by the
time we have enough basenames to break 64K we'll be several years
out and be talking about historical packs vs. active packs.
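
The namespace arithmetic can be checked with a few lines of Python. The
function names are made up for illustration, and the message gives only
the rounded "15k" figure, so the count of 15,466 below is an assumption
chosen because it reproduces the quoted 23.6%.

```python
NAMESPACE = 1 << 16  # distinct values of a 2-byte basename index


def namespace_fill(distinct_basenames):
    """Fraction of the 2-byte index space used within one packfile."""
    return distinct_basenames / NAMESPACE


def fits_in_one_table(distinct_basenames):
    """True while every basename can still get its own 2-byte index."""
    return distinct_basenames <= NAMESPACE


# Assumed count: 15,466 is not from the message, it only matches "23.6%".
print(f"{namespace_fill(15466):.1%}")  # prints 23.6%
```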
--
Shawn.