From: Shawn Pearce <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Nicolas Pitre <nico@cam.org>,
git@vger.kernel.org, Linus Torvalds <torvalds@osdl.org>
Subject: Re: Mozilla .git tree
Date: Fri, 1 Sep 2006 21:19:50 -0400
Message-ID: <20060902011950.GB24234@spearce.org>
In-Reply-To: <7vr6yw58xp.fsf@assigned-by-dhcp.cox.net>
Junio C Hamano <junkio@cox.net> wrote:
> Step 3. Work on integrating partial mmap() support with Shawn.
> This is more or less orthogonal to 4GB ceiling (people
> would hit mmap() limit even with a 1.5GB pack), but I
> suspect it would be necessary to be able to tell where
> the end of each pack entry is cheaply to implement
> this.
I was just getting ready to move my partial mmap support over from
fast-import.
I did the implementation a little differently in fast-import than
what I think I'll do in core Git, though. In fast-import I store a
hashtable in memory of all objects in the pack, but I chose not to
store the ending offset (or compressed length) and instead just
guess about where the object ends. I did that to save 4 bytes of
memory per object. :-)
It's necessary to know where the object ends to ensure that your
current mapping (or any remapping you are about to do) covers the
entire object before you start inflating. Otherwise you might
have to remap the pack in the middle of the inflate operation.
(Of course you might need to do this anyway if the compressed object
is larger than your default mapping unit.)
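As a rough illustration of that check (not code from fast-import or core Git), the decision can be written down once the object's end offset is known. The function name `plan_mapping`, its return strings, and the simplification that a fresh mapping starts exactly at the object's offset (page alignment ignored) are all assumptions made for this sketch.

```python
def plan_mapping(map_base, map_len, obj_start, obj_end, unit):
    """Decide how to make bytes [obj_start, obj_end) of a pack readable.

    map_base/map_len describe the current mapping; unit is the default
    mapping size. Hypothetical helper: assumes a new mapping would begin
    exactly at obj_start.
    """
    if map_base <= obj_start and obj_end <= map_base + map_len:
        return "reuse"          # current mapping already covers the object
    if obj_end - obj_start <= unit:
        return "remap"          # one new mapping of `unit` bytes is enough
    return "remap-split"        # object exceeds the unit; must remap mid-inflate


MiB = 1024 * 1024
print(plan_mapping(0, 128 * MiB, 5 * MiB, 6 * MiB, 128 * MiB))
print(plan_mapping(0, 128 * MiB, 127 * MiB, 129 * MiB, 128 * MiB))
print(plan_mapping(0, 128 * MiB, 200 * MiB, 400 * MiB, 128 * MiB))
```

Without a stored end offset the first test cannot be made up front, which is why guessing is needed in that case.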
What I did in fast-import was give inflate whatever was left in
the current mapping; then if I got a Z_OK or Z_BUF_ERROR back from
inflate I moved the mapping to the next 128 MiB chunk, reset my
z_stream's next_in/avail_in accordingly, and called inflate again.
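A minimal sketch of that loop, written in Python rather than the C used by fast-import: a `bytes` object stands in for the pack file and slicing stands in for moving the mapping. The name `inflate_across_windows` and the returned remap counter are assumptions for illustration; `decompressobj().eof` plays the role of seeing Z_STREAM_END instead of Z_OK or Z_BUF_ERROR.

```python
import zlib


def inflate_across_windows(pack, start, window):
    """Inflate the zlib stream at `start`, reading one aligned window at a time.

    Returns (data, remaps), where remaps counts how often the window had
    to be moved before the stream ended.
    """
    d = zlib.decompressobj()
    out = []
    pos = start
    remaps = 0
    while not d.eof:
        if pos >= len(pack):
            raise ValueError("compressed stream is truncated")
        window_end = (pos // window + 1) * window   # end of the current window
        chunk = pack[pos:window_end]                # whatever is left in it
        out.append(d.decompress(chunk))
        pos += len(chunk)
        if not d.eof:
            remaps += 1                             # move on to the next window
    return b"".join(out), remaps


payload = bytes(range(256)) * 40
pack = b"x" * 10 + zlib.compress(payload) + b"next object"
data, remaps = inflate_across_windows(pack, 10, 64)
print(data == payload, remaps >= 1)
```

The tiny 64-byte window is only there to force remaps in the demonstration; with a window larger than the pack the same call finishes without moving at all.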
No, I didn't performance test it to see how frequently I'm mapping
a pack multiple times to get one object. But I'm going to stick my
neck out and say that most objects probably don't have a compressed
length exceeding 128 MiB so we're talking one remap that we would
have had to do anyway if the object spanned over the end of the
current mapping. If the object's starting offset was completely
outside of the current mapping then I rounded the offset down to
the page size (from getpagesize) and remapped; therefore we also
probably only do one remap on objects needing it.
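The rounding step can be sketched as follows, again in Python and under stated assumptions: `map_at` is a hypothetical helper, and `mmap.ALLOCATIONGRANULARITY` is used where the C code would use getpagesize (the two are equal on Unix). The mapping begins at the rounded-down offset, and the caller skips `delta` bytes to reach the object.

```python
import mmap
import os
import tempfile


def map_at(fd, offset, length, granularity=mmap.ALLOCATIONGRANULARITY):
    """Map at least `length` bytes of fd starting at or before `offset`.

    Returns (mapping, delta); the byte at file offset `offset` is
    mapping[delta].
    """
    base = offset - (offset % granularity)   # round down to a page boundary
    delta = offset - base
    size = os.fstat(fd).st_size
    map_len = min(length + delta, size - base)
    m = mmap.mmap(fd, map_len, access=mmap.ACCESS_READ, offset=base)
    return m, delta


g = mmap.ALLOCATIONGRANULARITY
contents = bytes(i % 251 for i in range(2 * g + 100))
with tempfile.TemporaryFile() as f:
    f.write(contents)
    f.flush()
    m, delta = map_at(f.fileno(), g + 37, 16)
    print(delta, m[delta:delta + 16] == contents[g + 37:g + 53])
    m.close()
```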
But having the length or ending offset in the index will help with
copying the object during a repack as well as prevent us from needing
to guess during accesses. So good news indeed that you are adding
it to the index.
--
Shawn.