From: "Dana How" <danahow@gmail.com>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: "Junio C Hamano" <junkio@cox.net>,
"Git Mailing List" <git@vger.kernel.org>,
danahow@gmail.com
Subject: Re: [PATCH] Prevent megablobs from gunking up git packs
Date: Tue, 22 May 2007 00:33:23 -0700
Message-ID: <56b7f5510705220033p43d20aabnbf86f1f6959d611a@mail.gmail.com>
In-Reply-To: <20070522063050.GD11636@spearce.org>
On 5/21/07, Shawn O. Pearce <spearce@spearce.org> wrote:
> Dana How <danahow@gmail.com> wrote:
> > ... Operations
> > such as "git-log --pretty=oneline" were about 30X faster
> > on a cold cache and 2 to 3X faster otherwise. Process sizes
> > remained reasonable.
>
> Can you give me details about your system? Is this a 64 bit binary?
RHEL4/Nahant on an Opteron. Yes, it is a 64-bit binary.
> What is your core.packedGitWindowSize and core.packedGitLimit set to?
I didn't change the defaults.
> It sounds like the packed version was almost 3 GiB smaller, but
> was slower because we were mmap'ing far too much data at startup
> and that was making your OS page in things that you didn't really
> need to have.
The difference in size is because of the "Custom compression levels"
patch -- now the loose objects use Z_BEST_SPEED, whereas the packs
use Z_DEFAULT_COMPRESSION.
> Mind trying git-log with a smaller core.packedGitWindow{Size,Limit}?
> Perhaps it's just as simple as our defaults are far far too high for
> your workload...
I think that's a good idea, and it should be easy to try tomorrow.
It will definitely improve the cold-cache case.
But we need to consider both *read* and *creation* performance.
The portion of the repo I imported to git grows at about 500MB/week
(compressed). Should I run repack -a every week? Every month? Either way,
should I use the default window/depth, or 0/0? With the defaults, run times
are prohibitive (in fact, I've always killed each attempt so the machine
could be used for "real" work); with 0/0, I lose deltification
on all objects.
These megablobs really are outliers and stress the "one size fits
all" approach of packing in git. As a thought experiment,
let's (1) pretend git-repack takes --max-blob-size= and --max-pack-size=,
(2) pretend the patch doesn't add the repack.maxblobsize variable,
and (3) run the following:
% git-repack -a -d --max-blob-size=256
% git-repack --max-pack-size=2047 --window=0 --depth=0
The first step makes a digestible 13MB packfile, and the second
puts all the megablobs into six or more 2GB packfiles. Is there really any
advantage to carrying out the second step? If I'm processing
a 100MB+ blob, do I really care about an extra open(2) call?
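To make the thought experiment concrete, here is a small Python model of
the two-step split. It is a hypothetical sketch, not git code:
--max-blob-size and --max-pack-size are the pretend options from above,
sizes are in MB, and the "next-fit" placement of megablobs (start a new
pack whenever the next blob would push the current one past the cap) is an
assumption about how such a pack writer might behave. The function names
split_by_size and pack_megablobs are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model of the two-step repack thought experiment.
# Sizes are in MB; nothing here is taken from git's source.


def split_by_size(blobs: Dict[str, int], max_blob_mb: int) -> Tuple[List[str], List[str]]:
    """Step 1: keep blobs at or under the threshold (these stay deltifiable)
    and set aside the larger "megablobs"."""
    small = [name for name, size in blobs.items() if size <= max_blob_mb]
    large = [name for name, size in blobs.items() if size > max_blob_mb]
    return small, large


def pack_megablobs(blobs: Dict[str, int], names: List[str], max_pack_mb: int) -> List[List[str]]:
    """Step 2: place megablobs into packs capped at max_pack_mb, with no deltas.

    Next-fit (an assumption): open a new pack whenever the next blob would
    overflow the current one. A blob larger than the cap ends up alone.
    """
    packs: List[List[str]] = []
    current_mb = 0
    for name in names:
        size = blobs[name]
        if not packs or current_mb + size > max_pack_mb:
            packs.append([])
            current_mb = 0
        packs[-1].append(name)
        current_mb += size
    return packs


if __name__ == "__main__":
    # Made-up blob sizes in MB, purely for illustration.
    blobs = {"a.v": 12, "b.gds": 300, "c.gds": 1500, "d.gds": 900, "e.txt": 1}
    small, large = split_by_size(blobs, max_blob_mb=256)
    print(small)  # ['a.v', 'e.txt']
    print(large)  # ['b.gds', 'c.gds', 'd.gds']
    print(pack_megablobs(blobs, large, max_pack_mb=2047))
    # [['b.gds', 'c.gds'], ['d.gds']]
```

With these made-up sizes, step 1 leaves two small blobs for an ordinary
deltified pack, and step 2 produces two delta-free megablob packs. The
question above is whether that second step buys anything over simply
leaving those blobs loose.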
Thanks,
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell