From: Shawn Pearce <spearce@spearce.org>
To: git@vger.kernel.org
Cc: Jon Smirl <jonsmirl@gmail.com>
Subject: Re: Packfile can't be mapped
Date: Sun, 27 Aug 2006 22:47:20 -0400
Message-ID: <20060828024720.GD24204@spearce.org>
In-Reply-To: <9e4733910608271804j762960a8ud83654c78ebe009a@mail.gmail.com>
Jon Smirl <jonsmirl@gmail.com> wrote:
> git-repack can't handle my 1.75GB pack file. I am running x86 with 3GB
> address space.
>
> -rw-rw-r-- 1 jonsmirl jonsmirl 47221712 Aug 27 20:29 testme.idx
> -rw-rw-r-- 1 jonsmirl jonsmirl 1754317619 Aug 27 20:29 testme.pack
>
> [jonsmirl@jonsmirl t1]$ git-repack -a -f --window=50 --depth=5000
> Generating pack...
> Done counting 1963325 objects.
> fatal: packfile .git/objects/pack/testme.pack cannot be mapped.
> [jonsmirl@jonsmirl t1]$
>
> It is built from Mozilla CVS but it is an intermediate stage of our
> work. The fast-import tool isn't diffing directory tree which makes
> the pack much bigger than it needs to be. Shawn is working on the
> packing code.
I'm going to try to get tree deltas written to the pack sometime this
week. That should compact this intermediate pack down to something
that git-pack-objects would be able to successfully mmap into a
32-bit address space. A complete repack with no delta reuse will
hopefully generate a pack closer to 400 MB in size. But I know
Jon would like to get that pack even smaller. :)
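The "cannot be mapped" failure is consistent with trying to map the whole pack in a single mmap() call. The sketch below shows that pattern, assuming a POSIX system. map_whole_pack is a hypothetical helper written for illustration, not git's actual code. On a 32-bit process the kernel has to find one contiguous free range as large as the file, so a ~1.75 GB pack can fail with ENOMEM even when more than that is free in total across the 3 GB address space.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not git's code): map an entire pack read-only
 * with one mmap() call. Returns NULL on failure. The mapping needs a
 * single contiguous virtual range of st_size bytes, which is what a
 * 32-bit process may not have for a ~1.75 GB file.
 */
static void *map_whole_pack(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len_out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* an established mapping stays valid after close() */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return p;
}
```

On a 64-bit host the same call will usually succeed for a file of this size. Shrinking the pack (or mapping it in pieces) is what lets it fit on 32-bit.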
I should point out that the input stream to fast-import was 20 GB
(completely decompressed revisions from RCS) plus all commit data.
The original CVS ,v files are around 3 GB. A .tar.gz archive of
the ,v files is around 550 MB. Getting down to 1.7 GB without tree
or commit deltas is certainly pretty good. :)
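Taking the rounded sizes above at face value, the back-of-the-envelope ratios look like this (approximate only, since the GB figures themselves are approximate):

```latex
\frac{20\ \text{GB (raw revisions)}}{1.7\ \text{GB (pack)}} \approx 11.8\times,\qquad
\frac{3\ \text{GB (,v files)}}{1.7\ \text{GB (pack)}} \approx 1.8\times,\qquad
\frac{1.7\ \text{GB (pack)}}{0.55\ \text{GB (.tar.gz)}} \approx 3.1\times
```

So the pack is already smaller than the raw ,v files, but still about three times the size of the gzipped tarball, which is the gap tree and commit deltas are meant to close.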
> ---------------------------------------------------
> Alloc'd objects: 1968000 ( 1892000 overflow )
> Total objects: 1967527 ( 41856 duplicates)
> blobs : 633842 ( 0 duplicates)
> trees : 1131208 ( 41856 duplicates)
> commits: 200921 ( 0 duplicates)
> tags : 1556 ( 0 duplicates)
> Total branches: 1600 ( 7985 loads )
> marks: 1048576 ( 200921 unique )
> atoms: 56803
> Memory total: 66908 KiB
> pools: 5408 KiB
> objects: 61500 KiB
> Pack remaps: 9501
> ---------------------------------------------------
> Pack size: 1713200 KiB
> Index size: 46114 KiB
All of that says that, aside from the 1.7 GB output file, fast-import
ran extremely well. Nearly 2 million objects were written into
the output pack file, with 41k duplicate trees (duplicate blobs
were removed by cvs2svn prior to fast-import, so they don't appear).
200k commits were created across 1600 branches. And we did it in
only 67 MB of memory.
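As a rough check on how compact the in-memory object table is, dividing the "objects" line of the statistics by the "Alloc'd objects" count gives the following (assuming KiB means 1024 bytes; this covers only the object table, not the 5408 KiB of pools):

```latex
\frac{61500\ \text{KiB} \times 1024\ \text{B/KiB}}{1\,968\,000\ \text{objects}}
  = \frac{62\,976\,000\ \text{B}}{1\,968\,000\ \text{objects}}
  = 32\ \text{B per object}
```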
We also had ~8000 LRU cache misses related to our branch data;
this just means that cvs2svn likes to frequently jump around
between branches rather than import an entire branch at a time.
Boosting the size of the LRU cache (at the expense of needing more
memory) should reduce those cache misses as well as 'Pack remaps'.
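To illustrate why the access order matters, here is a small LRU simulation in C. It is a generic sketch, not fast-import's branch cache: count_lru_misses and LRU_MAX are hypothetical names, and a "miss" here stands in for a branch "load". It replays a sequence of branch IDs through an LRU cache of a given capacity and counts misses.

```c
#include <assert.h>
#include <stddef.h>

#define LRU_MAX 64 /* hypothetical upper bound on cache size for this sketch */

/*
 * Replay `n` branch accesses through an LRU cache holding at most
 * `capacity` branches and return how many accesses missed (i.e. would
 * have forced the branch to be reloaded). slots[0] is the most
 * recently used entry; slots[used - 1] is the least recently used.
 */
static size_t count_lru_misses(const int *accesses, size_t n, size_t capacity)
{
    int slots[LRU_MAX];
    size_t used = 0, misses = 0;

    assert(capacity > 0 && capacity <= LRU_MAX);
    for (size_t i = 0; i < n; i++) {
        int b = accesses[i];
        size_t pos = used; /* index of b in slots, or `used` if absent */

        for (size_t j = 0; j < used; j++) {
            if (slots[j] == b) {
                pos = j;
                break;
            }
        }
        if (pos == used) { /* miss */
            misses++;
            if (used < capacity)
                used++;
            pos = used - 1; /* reuse a free slot or evict the LRU entry */
        }
        /* move b to the front, shifting more recent entries down by one */
        for (size_t j = pos; j > 0; j--)
            slots[j] = slots[j - 1];
        slots[0] = b;
    }
    return misses;
}
```

Round-robin access over three branches with only two slots misses on every access, while three slots miss only on the first three. Grouping all accesses to one branch together needs just a single slot. That is the same trade-off as importing a branch at a time versus jumping between branches.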
I'd also like to clean up that pack remapping code and move it
into sha1_file.c. It's an implementation of partial pack mapping,
and it is apparently working quite well for us in fast-import.
It may help GIT deal with very large packs (e.g. 1.7 GB) on systems
with a smaller address space (e.g. 32-bit).
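The general idea of partial pack mapping can be sketched as a sliding window. Instead of mapping the whole file, map a fixed-size, page-aligned window around the requested offset, and remap only when a request falls outside it. This is a minimal POSIX sketch of that technique, not the fast-import or sha1_file.c code. struct pack_window, pack_window_open, use_pack and pack_window_close are hypothetical names, and the remaps counter only loosely mirrors the "Pack remaps" statistic above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sliding-window state (not git's actual data structure). */
struct pack_window {
    int fd;
    off_t file_size;
    size_t window_size;   /* preferred number of bytes to map at once */
    unsigned char *base;  /* current mapping, or NULL */
    off_t start;          /* file offset corresponding to base[0] */
    size_t len;           /* bytes in the current mapping */
    unsigned long remaps; /* number of mmap() calls made so far */
};

static int pack_window_open(struct pack_window *w, const char *path,
                            size_t window_size)
{
    struct stat st;

    w->fd = open(path, O_RDONLY);
    if (w->fd < 0)
        return -1;
    if (fstat(w->fd, &st) < 0) {
        close(w->fd);
        return -1;
    }
    w->file_size = st.st_size;
    w->window_size = window_size;
    w->base = NULL;
    w->start = 0;
    w->len = 0;
    w->remaps = 0;
    return 0;
}

/* Return a pointer to `need` readable bytes at `offset`, remapping only
 * when the request is not fully inside the current window. */
static unsigned char *use_pack(struct pack_window *w, off_t offset, size_t need)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t start;
    size_t len;
    void *p;

    assert(offset >= 0 && need > 0 && (off_t)need <= w->file_size - offset);
    if (w->base && offset >= w->start &&
        offset - w->start + (off_t)need <= (off_t)w->len)
        return w->base + (offset - w->start); /* hit: reuse the window */

    if (w->base)
        munmap(w->base, w->len);
    w->base = NULL;

    start = offset - offset % page; /* mmap offsets must be page aligned */
    len = w->window_size;
    if ((off_t)len < offset - start + (off_t)need)
        len = (size_t)(offset - start + (off_t)need);
    if (start + (off_t)len > w->file_size)
        len = (size_t)(w->file_size - start); /* do not map past EOF */

    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, w->fd, start);
    if (p == MAP_FAILED)
        return NULL;
    w->base = p;
    w->start = start;
    w->len = len;
    w->remaps++;
    return w->base + (offset - start);
}

static void pack_window_close(struct pack_window *w)
{
    if (w->base)
        munmap(w->base, w->len);
    close(w->fd);
}
```

Only window_size bytes of address space are in use at any moment, so a 1.7 GB pack can be read by a 32-bit process. The cost is an extra mmap() whenever access jumps outside the current window.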
We're not confident that this import is completely valid yet.
We have a few translation issues we're still working on. But now
that we have a complete pack going from start to finish, we can start
to focus on those issues, especially since this entire process
(,v to .pack) takes less than half a day to run.
--
Shawn.