From: Shawn Pearce <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Nicolas Pitre <nico@cam.org>,
git@vger.kernel.org, Linus Torvalds <torvalds@osdl.org>
Subject: Re: Mozilla .git tree
Date: Fri, 1 Sep 2006 22:04:27 -0400 [thread overview]
Message-ID: <20060902020426.GF24234@spearce.org> (raw)
In-Reply-To: <7vr6yw58xp.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano <junkio@cox.net> wrote:
[snip]
> +A new style .idx file begins with a signature, "\377tOc", and a
> +version number as a 4-byte network byte order integer. The version
> +of this implementation is 2.
Ick. I understand why you did this (and thanks for such a good
explanation of it by the way) but what a horrible signature number
and way to wedge in a version number.
>From now on I propose that if we write a file - especially a binary
file - to disk that we always include a 4 byte version number into
the header of the file. Then we can avoid trying to wedge this into
the file after the fact when some worthwhile improvement requires
changing the format.
I think we probably should have done this when the binary headers
were introduced into loose objects. Would 4 bytes of file format
version number with an (initial version number which wasn't 0x78
in byte 0 and failed the % 31 test) really have hurt that much in
a loose object?
[snip]
> + . 4-byte network byte order integer, recording the location of the
> + next object in the main toc table.
Why not just the 4 byte object entry length? To load an object we
have to go find the next object in the idx file so we can compute
the offset difference. On very large packs (e.g. the Mozilla pack)
the index is 46 MiB. The jump across the index could be the entire
thing from back to front just to compute the size of an object when
the fan-out table and the binary search process really only poked
the tail end of the index when searching for the entry. So we're
demand paging in the front of the index just to compute a length
Sure the scheme you outlined allows a 64 bit difference but
uncompressed objects already can't be larger than 2**32-1 and we
could just as easily move that limit down to say 2**32-16 to leave
room for the object header and zlib header.
--
Shawn.
--
VGER BF report: U 0.5
next prev parent reply other threads:[~2006-09-02 3:21 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <9e4733910608290943g6aa79855q62b98caf4f19510@mail.gmail.com>
[not found] ` <20060829165811.GB21729@spearce.org>
[not found] ` <9e4733910608291037k2d9fb791v18abc19bdddf5e89@mail.gmail.com>
[not found] ` <20060829175819.GE21729@spearce.org>
[not found] ` <9e4733910608291155g782953bbv5df1b74878f4fcf1@mail.gmail.com>
[not found] ` <20060829190548.GK21729@spearce.org>
[not found] ` <9e4733910608291252q130fc723r945e6ab906ca6969@mail.gmail.com>
[not found] ` <20060829232007.GC22935@spearce.org>
[not found] ` <9e4733910608291807q9b896e4sdbfaa9e49de58c2b@mail.gmail.com>
2006-08-30 1:51 ` Mozilla .git tree Shawn Pearce
2006-08-30 2:25 ` Shawn Pearce
2006-08-30 2:58 ` Jon Smirl
2006-08-30 3:10 ` Shawn Pearce
2006-08-30 3:27 ` Jon Smirl
2006-08-30 5:53 ` Nicolas Pitre
2006-08-30 11:42 ` Junio C Hamano
2006-09-01 7:42 ` Junio C Hamano
2006-09-02 1:19 ` Shawn Pearce
2006-09-02 4:01 ` Junio C Hamano
2006-09-02 4:39 ` Shawn Pearce
2006-09-02 11:06 ` Junio C Hamano
2006-09-02 14:20 ` Jon Smirl
2006-09-02 17:39 ` Shawn Pearce
2006-09-02 18:56 ` Linus Torvalds
2006-09-02 20:53 ` Junio C Hamano
2006-09-02 17:44 ` Shawn Pearce
2006-09-02 2:04 ` Shawn Pearce [this message]
2006-09-02 11:02 ` Junio C Hamano
2006-09-02 17:51 ` Shawn Pearce
2006-09-02 20:55 ` Junio C Hamano
2006-09-03 3:54 ` Shawn Pearce
2006-09-01 17:45 ` A Large Angry SCM
2006-09-01 18:35 ` Linus Torvalds
2006-09-01 19:56 ` Junio C Hamano
2006-09-01 23:14 ` [PATCH] pack-objects: re-validate data we copy from elsewhere Junio C Hamano
2006-09-02 0:23 ` Linus Torvalds
2006-09-02 1:39 ` VGER BF report? Johannes Schindelin
2006-09-02 5:58 ` Sam Ravnborg
2006-09-02 1:52 ` [PATCH] pack-objects: re-validate data we copy from elsewhere Junio C Hamano
2006-09-02 3:52 ` Junio C Hamano
2006-09-02 4:52 ` Shawn Pearce
2006-09-02 9:42 ` Junio C Hamano
2006-09-02 17:43 ` Linus Torvalds
2006-09-02 10:09 ` Junio C Hamano
2006-09-02 17:54 ` Shawn Pearce
2006-09-03 21:00 ` Junio C Hamano
2006-09-04 4:10 ` Shawn Pearce
2006-09-04 5:50 ` Junio C Hamano
2006-09-04 6:44 ` Shawn Pearce
2006-09-04 7:39 ` Junio C Hamano
2006-09-03 0:27 ` Linus Torvalds
2006-09-03 0:32 ` Junio C Hamano
2006-09-05 8:12 ` Junio C Hamano
2006-09-02 18:43 ` Linus Torvalds
2006-09-02 20:56 ` Junio C Hamano
2006-09-03 21:48 ` Junio C Hamano
2006-09-03 22:00 ` Linus Torvalds
2006-09-03 22:16 ` Linus Torvalds
2006-09-03 22:34 ` Junio C Hamano
2006-09-04 4:06 ` Junio C Hamano
2006-09-04 15:19 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060902020426.GF24234@spearce.org \
--to=spearce@spearce.org \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
--cc=nico@cam.org \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).