From: Nicolas Pitre <nico@cam.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Lee <clee@kde.org>, Junio C Hamano <junkio@cox.net>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: git-index-pack really does suck..
Date: Tue, 03 Apr 2007 17:21:12 -0400 (EDT)
Message-ID: <alpine.LFD.0.98.0704031705440.28181@xanadu.home>
In-Reply-To: <Pine.LNX.4.64.0704031346250.6730@woody.linux-foundation.org>
On Tue, 3 Apr 2007, Linus Torvalds wrote:
>
>
> On Tue, 3 Apr 2007, Nicolas Pitre wrote:
> > >
> > > Yeah. What happens is that inside the repo, because we do all the
> > > duplicate object checks (verifying that there are no evil hash collisions)
> > > even after fixing the memory leak, we end up keeping *track* of all those
> > > objects.
> >
> > What do you mean?
>
> Look at what we have to do to look up a SHA1 object.. We create all the
> lookup infrastructure, we don't *just* read the object. The delta base
> cache is the most obvious one.
It is capped at 16MB, so we're far from the 200+ MB count.
> > I'm of the opinion that this patch is unnecessary. It only helps in
> > bogus workflows to start with, and it makes the default behavior unsafe
> > (unsafe from a paranoid pov, but still). And in the _normal_ workflow
> > it should never trigger.
>
> Actually, even in the normal workflow it will do all the extra unnecessary
> work, if only because the lookup costs of *not* finding the entry.
>
> Lookie here:
>
> - git index-pack of the *git* pack-file in the v2.6/linux directory (zero
> overlap of objects)
>
> With --paranoid:
>
> 2.75user 0.37system 0:03.13elapsed 99%CPU
> 0major+5583minor pagefaults
>
> Without --paranoid:
>
> 2.55user 0.12system 0:02.68elapsed 99%CPU
> 0major+2957minor pagefaults
>
> See? That's the *normal* workflow. Zero objects found. 7% CPU overhead
> from just the unnecessary work, and almost twice as much memory used. Just
> from the index file lookup etc for a decent-sized project.
7% overhead on two and a half seconds of CPU which, _normally_, happens
when cloning the whole thing over a network connection which, if you're
lucky and have a 6mbps cable connection, will still be spread over 5
minutes of real time. And that is assuming that you're cloning a big
project inside itself, which wouldn't work anyway. Otherwise a big clone
would run index-pack in an empty repo where the cost of looking up
existing objects is zero. That leaves git-fetch, which should concern
itself with much smaller packs, pushing this overhead into the noise.
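To put the numbers above side by side, here is a rough sketch of the arithmetic. The timings are the ones quoted from time(1); the 5-minute clone duration is the estimate from the paragraph above, not a measurement, and the variable names are only illustrative.

```python
# Figures quoted in the thread for git index-pack (from time(1)).
paranoid = {"user": 2.75, "system": 0.37, "elapsed": 3.13, "minor_faults": 5583}
plain = {"user": 2.55, "system": 0.12, "elapsed": 2.68, "minor_faults": 2957}


def relative_overhead(with_check, without_check):
    """Extra cost of the checked run, as a fraction of the unchecked run."""
    return (with_check - without_check) / without_check


# User-time overhead; this appears to be the "7%" figure being discussed.
user_overhead = relative_overhead(paranoid["user"], plain["user"])

# Minor page faults, used in the thread as a rough proxy for memory touched.
fault_ratio = paranoid["minor_faults"] / plain["minor_faults"]

# Assumption: the clone is spread over about 5 minutes of real time.
clone_seconds = 5 * 60
extra_elapsed = paranoid["elapsed"] - plain["elapsed"]
share_of_clone = extra_elapsed / clone_seconds

print(f"user CPU overhead:        {user_overhead:.1%}")
print(f"page fault ratio:         {fault_ratio:.2f}x")
print(f"extra wall-clock time:    {extra_elapsed:.2f}s")
print(f"share of a 5-minute clone: {share_of_clone:.2%}")
```

The point is only one of scale: the extra time is a fraction of a second against a transfer measured in minutes. The page fault ratio is a separate matter and is not explained by this arithmetic.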
> Now, in the KDE situation, the *unnecessary* lookups will be about ten
> times more expensive, both on memory and CPU, just because the repository
> is about 20x the size. Even with no actual hits.
So? When would you really perform such an operation in a meaningful
way?
The memory usage worries me. I still cannot explain nor justify it.
But the CPU overhead is certainly not of any concern in _normal_ usage
scenarios, is it?
If anything, that might be a good test case for the Newton-Raphson pack
lookup idea.
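This message does not spell out the lookup idea, so the following is only a sketch under an assumption: that it refers to guessing an object's position in the sorted pack index from the value of its SHA1, relying on hashes being roughly uniformly distributed. What is shown is plain interpolation search, not git's actual code; the function name and the in-memory list of integers standing in for the index are hypothetical.

```python
import hashlib


def interpolation_lookup(sorted_keys, target):
    """Return the position of target in sorted_keys, or -1 if absent.

    sorted_keys is a sorted list of integers (here: SHA-1 values).
    Each step guesses the position by linear interpolation between the
    keys at both ends of the remaining range, instead of taking the midpoint.
    """
    lo, hi = 0, len(sorted_keys) - 1
    while lo <= hi:
        lo_key, hi_key = sorted_keys[lo], sorted_keys[hi]
        if target < lo_key or target > hi_key:
            return -1  # outside the remaining range: a cheap miss
        if lo_key == hi_key:
            return lo  # range collapsed onto a single key value
        # Estimate where target sits if keys are evenly spread.
        pos = lo + (target - lo_key) * (hi - lo) // (hi_key - lo_key)
        probe = sorted_keys[pos]
        if probe == target:
            return pos
        if probe < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


def sha1_int(data):
    """SHA-1 of data as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


# Stand-in for a pack index: 1000 sorted hash values.
index = sorted(sha1_int(str(i).encode()) for i in range(1000))

present = sha1_int(b"42")
absent = sha1_int(b"not in the index")
print(interpolation_lookup(index, present))
print(interpolation_lookup(index, absent))
```

With uniformly distributed keys, a search like this usually needs far fewer probes than bisection, which matters most for the case discussed here: lookups that find nothing.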
Nicolas