Re: On data structures and parallelism

Git development
 help / color / mirror / Atom feed

From: Linus Torvalds <torvalds@linux-foundation.org>
To: david@lang.hm
Cc: Heikki Orsila <shdl@zakalwe.fi>, git@vger.kernel.org
Subject: Re: On data structures and parallelism
Date: Sun, 17 May 2009 13:35:44 -0700 (PDT)	[thread overview]
Message-ID: <alpine.LFD.2.01.0905171308010.3301@localhost.localdomain> (raw)
In-Reply-To: <alpine.DEB.1.10.0905171230070.26653@asgard>

On Sun, 17 May 2009, david@lang.hm wrote:
> 
> do things change with SSDs? I've heard that even (especially??) with the Intel
> SSDs you want to have several operations going in paralllel to get the best
> out of them.

There's a slight, but noticeable, improvement.

This is: "echo 3 > /proc/sys/vm/drop_caches; time git diff" run in a loop. 

With 'core.preloadindex = true':

	real	0m1.138s
	real	0m1.116s
	real	0m1.132s
	real	0m1.120s
	real	0m1.106s
	real	0m1.132s

and with it set to 'false':

	real	0m1.256s
	real	0m1.258s
	real	0m1.242s
	real	0m1.240s
	real	0m1.244s
	real	0m1.242s

so it's about a 10% improvement. Which is pretty good, considering 
that

 (a) those disks are fast enough that even for that totally cache-cold 
     case, I get about 35% CPU utilization for the single-threaded case.

     And that's despite this being a 3.2GHz Nehalem box, so 35% CPU is 
     really quite remarkably good. Om my (much slower) laptop with a 
     1.2GHz Core 2, I get 2-3% CPU-time (and the whole operation takes 20 
     seconds).

 (b) Not all the IO ends up being parallelized, since there is a 
     per-directory mutex that means that even though we start 20 threads, 
     it probably gets a much smaller amount of real parallelism due to 
     locking.

in general, the IO parallelization obviously helps most when the IO is 
slow _and_ overlaps perfectly. Perfect overlap doesn't end up happening 
due to the per-directory lookup semaphore (think of it like a bank 
conflict in trying to parallelize memory accesses), but with a slow NFS 
connection you should get reasonably close to that optimal situation.

But with a single spindle, and rotating media, there really is sadly very 
little room for optimization. I suspect a SATA with TCQ disk might be able 
to do _somewhat_ better than my old PATA-only laptop (discounting the fact 
that my PATA laptop harddisk is extra slow due to being just 4200rpm: any 
desktop disk will be much faster), but I doubt the index preloading is 
really all that noticeable.

In fact, I just tested on another machine, and saw no difference 
what-so-ever. If anything, it was slightly slower. I suspect TCQ is a 
bigger win with writes.

			Linus

     prev parent reply	other threads:[~2009-05-17 20:36 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-05-17 15:23 On data structures and parallelism Heikki Orsila
2009-05-17 17:06 ` Linus Torvalds
2009-05-17 17:46   ` Linus Torvalds
2009-05-17 19:31     ` david
2009-05-17 20:35       ` Linus Torvalds [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.01.0905171308010.3301@localhost.localdomain \
    --to=torvalds@linux-foundation.org \
    --cc=david@lang.hm \
    --cc=git@vger.kernel.org \
    --cc=shdl@zakalwe.fi \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox