git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Chris Lee <clee@kde.org>, Nicolas Pitre <nico@cam.org>,
	Junio C Hamano <junkio@cox.net>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: git-index-pack really does suck..
Date: Tue, 3 Apr 2007 13:33:24 -0700 (PDT)	[thread overview]
Message-ID: <Pine.LNX.4.64.0704031322490.6730@woody.linux-foundation.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0704031304420.6730@woody.linux-foundation.org>



On Tue, 3 Apr 2007, Linus Torvalds wrote:
> 
> So how about this updated patch? We could certainly make "git pull" imply 
> "--paranoid" if we want to, but even that is likely pretty unnecessary. 
> It's not like anybody has ever shown a SHA1 collision, and if the *local* 
> repository is corrupt (and has an object with the wrong SHA1 - that's what 
> the testsuite checks for), then it's probably good to get the valid object 
> from the remote..

Some trivial timings for indexing just the kernel pack..

Without --paranoid:

	24.61user 2.16system 0:27.04elapsed 99%CPU
	0major+14120minor pagefaults

With --paranoid:

	42.74user 3.04system 0:46.36elapsed 98%CPU
	0major+72768minor pagefaults

so it's a noticeable CPU issue, but it's even more noticeable in memory 
usage (55MB vs 284MB - pagefaults give a good way to look at how much 
memory really got allocated for the process).

All that extra memory is just for SHA1 commit ID information. 

Now, clearly the usage scenario here is a big odd (ie the case where we 
have all the objects already), so in that sense this is very much a 
worst-case situation, and you simply shouldn't *do* something like this, 
but at the same time, I'm just not convinced a very theoretical SHA1 
collision check is worth it. 

Btw, even if we don't have any of the objects, if you have tons and tons 
of objects and do a "git pull", just the *lookup* of the nonexistent 
objects will be expensive: first we won't find it in any pack, then we'll 
look at the loose objects, and then we'll look int he pack *again* due to 
the race avoidance. So looking up nonexistent objects is actually pretty 
expensive.

In fact, "--paranoid" takes one second more for me even totally outside of 
a git repository, just because we waste so much time trying to look up 
non-existent object files ;)

			Linus

  parent reply	other threads:[~2007-04-03 20:44 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-03 15:15 git-index-pack really does suck Linus Torvalds
     [not found] ` <Pi ne.LNX.4.64.0704031413200.6730@woody.linux-foundation.org>
     [not found]   ` <alpine.LFD.0.98. 0704031836350.28181@xanadu.home>
     [not found] ` <db 69205d0704031227q1009eabfhdd82aa3636f25bb6@mail.gmail.com>
     [not found]   ` <Pine.LNX.4.64.07 04031304420.6730@woody.linux-foundation.org>
     [not found]     ` <Pine.LNX.4.64.0704031322490.67 30@woody.linux-foundation.org>
2007-04-03 16:21 ` Linus Torvalds
2007-04-03 16:40   ` Nicolas Pitre
2007-04-03 16:33 ` Nicolas Pitre
2007-04-03 19:27 ` Chris Lee
2007-04-03 19:49   ` Nicolas Pitre
2007-04-03 19:54     ` Chris Lee
2007-04-03 20:18   ` Linus Torvalds
2007-04-03 20:32     ` Nicolas Pitre
2007-04-03 20:40       ` Junio C Hamano
2007-04-03 21:00         ` Linus Torvalds
2007-04-03 21:28           ` Nicolas Pitre
2007-04-03 22:49           ` Chris Lee
2007-04-03 23:12             ` Linus Torvalds
2007-04-03 20:56       ` Linus Torvalds
2007-04-03 21:03         ` Shawn O. Pearce
2007-04-03 21:13           ` Linus Torvalds
2007-04-03 21:17             ` Shawn O. Pearce
2007-04-03 21:26               ` Linus Torvalds
2007-04-03 21:28                 ` Linus Torvalds
2007-04-03 22:31                   ` Junio C Hamano
2007-04-03 22:38                     ` Shawn O. Pearce
2007-04-03 22:41                       ` Junio C Hamano
2007-04-05 10:22                   ` [PATCH 1/2] git-fetch--tool pick-rref Junio C Hamano
2007-04-05 10:22                   ` [PATCH 2/2] git-fetch: use fetch--tool pick-rref to avoid local fetch from alternate Junio C Hamano
2007-04-05 16:15                     ` Shawn O. Pearce
2007-04-05 21:37                       ` Junio C Hamano
2007-04-03 21:34               ` git-index-pack really does suck Nicolas Pitre
2007-04-03 21:37                 ` Shawn O. Pearce
2007-04-03 21:44                   ` Junio C Hamano
2007-04-03 21:53                     ` Shawn O. Pearce
2007-04-03 22:10                       ` Jeff King
2007-04-03 22:40                 ` Dana How
2007-04-03 22:52                   ` Linus Torvalds
2007-04-03 22:31                     ` David Lang
2007-04-03 23:00                   ` Nicolas Pitre
2007-04-03 21:21         ` Nicolas Pitre
2007-04-03 20:33     ` Linus Torvalds [this message]
2007-04-03 21:05       ` Nicolas Pitre
2007-04-03 21:11         ` Shawn O. Pearce
2007-04-03 21:24         ` Linus Torvalds
     [not found]           ` <alpine.LF D.0.98.0704031735470.28181@xanadu.home>
2007-04-03 21:42           ` Nicolas Pitre
2007-04-03 22:07             ` Junio C Hamano
2007-04-03 22:11               ` Shawn O. Pearce
2007-04-03 22:34               ` Nicolas Pitre
2007-04-03 22:14             ` Linus Torvalds
2007-04-03 22:55               ` Nicolas Pitre
2007-04-03 22:36                 ` David Lang
2007-04-04  9:51                   ` Alex Riesen
     [not found]                     ` <P ine.LNX.4.63.0704061455380.24050@qynat.qvtvafvgr.pbz>
2007-04-06 21:56                     ` David Lang
2007-04-06 22:47                       ` Junio C Hamano
2007-04-06 22:49                         ` Junio C Hamano
2007-04-06 22:22                           ` David Lang
2007-04-06 22:55                             ` Junio C Hamano
2007-04-06 22:28                               ` David Lang
2007-04-03 23:29                 ` Linus Torvalds
2007-04-03 20:34     ` Junio C Hamano
2007-04-03 20:53       ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0704031322490.6730@woody.linux-foundation.org \
    --to=torvalds@linux-foundation.org \
    --cc=clee@kde.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).