From: david@lang.hm
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Heikki Orsila <shdl@zakalwe.fi>, git@vger.kernel.org
Subject: Re: On data structures and parallelism
Date: Sun, 17 May 2009 12:31:35 -0700 (PDT)
Message-ID: <alpine.DEB.1.10.0905171230070.26653@asgard>
In-Reply-To: <alpine.LFD.2.01.0905171038320.3301@localhost.localdomain>

On Sun, 17 May 2009, Linus Torvalds wrote:

> On Sun, 17 May 2009, Linus Torvalds wrote:
>>
>> That said, on my laptops, CPU time really _never_ is the issue. Every
>> single time something is slow, the issue is a slow 4200rpm disk that may
>> get 25MB/s off it for linear things in the best case, but seeks take
>> milliseconds and any kind of random access will just kill performance.
>
> Side note - I've several times desperately tried to see if IO parallelism
> helps. It doesn't. Some drives do better if they get many independent
> reads and can just do them concurrently. Sadly, that's pretty rare for
> reads on rotational media, and impossible with legacy IDE drives (that
> don't have the ability to do tagged queueing).
>
> So when I try to do IO in parallel (which git does support for many
> operations), that just makes the whole system come to a screeching halt
> because it now seeks around the disk a lot more. A similar issue that
> often kill parallelism on CPU's (bad cache behavior, and lots of
> outstanding memory requests) kills parallelism on disks too - disk
> performance simply is much _better_ if you do serial things than if you
> try to parallelize the same work.
>
> It would be different if I had a fancy high-end RAID system with tagged
> queueing and lots of spare bandwidth that could be used in parallel. But
> that's not what the git usage scenario often is. All the people pushing
> multi-core seem to always ignore the big issues, and always working on
> nice trivial problems with a small and well-behaved "kernel" that has no
> IO and preferably didn't cache well even when single-threaded (ie
> "streaming" data).

Do things change with SSDs? I've heard that even (especially??) with the
Intel SSDs you want to have several operations going in parallel to get
the best out of them.
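
For anyone who wants to check this on their own drive, here is a rough
sketch (not anything git does) that times the same set of random reads
done serially and then from several threads. The function names, the
4096-byte read size and the worker count are all made up for
illustration, and it assumes a Unix system where os.pread is available.
The numbers only mean something if the file is not already in the page
cache, so use a file much larger than RAM or drop caches first.

```python
import os
import random
import time
from concurrent.futures import ThreadPoolExecutor


def random_offsets(path, count, size=4096, seed=0):
    """Pick `count` pseudo-random byte offsets that leave room for a full read."""
    limit = max(os.path.getsize(path) - size, 0)
    rng = random.Random(seed)
    return [rng.randrange(limit + 1) for _ in range(count)]


def read_at(fd, offset, size):
    # os.pread does not touch the shared file position, so threads can share fd.
    return len(os.pread(fd, size, offset))


def timed_reads(path, offsets, size=4096, workers=1):
    """Read `size` bytes at each offset; return (bytes_read, elapsed_seconds).

    workers == 1 issues the reads one after another; workers > 1 issues
    them concurrently from a thread pool (illustrative only).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        if workers == 1:
            total = sum(read_at(fd, off, size) for off in offsets)
        else:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                total = sum(pool.map(lambda off: read_at(fd, off, size), offsets))
        return total, time.perf_counter() - start
    finally:
        os.close(fd)
```

Calling timed_reads() twice with the same offsets, once with workers=1
and once with, say, workers=8, gives two timings to compare. If the
parallel run is slower, the drive is seek-bound as described above; if
it is faster, the drive can overlap requests.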

David Lang
