linux-ide.vger.kernel.org archive mirror
From: Tejun Heo <htejun@gmail.com>
To: Grant Grundler <grundler@google.com>
Cc: Matthew Wilcox <matthew@wil.cx>,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: libata / scsi separation
Date: Wed, 10 Dec 2008 11:47:05 +0900
Message-ID: <493F2DA9.7040008@gmail.com>
In-Reply-To: <da824cf30812091829g180b3e45i843e356364bc2f9f@mail.gmail.com>

Hello,

Grant Grundler wrote:
> On Tue, Dec 9, 2008 at 5:54 PM, Tejun Heo <htejun@gmail.com> wrote:
>> (cc'ing Jens)
> ...
>> Is the command issue rate really the bottleneck?
> 
> Not directly. It's the lack of CPU left over at high transaction rates
> ( > 10000 IOPS per disk). So yes, the system does bottleneck on CPU
> utilization.
> 
>> It seems a bit
>> unlikely unless you're issuing lots of really small IOs but then again
>> those new SSDs are pretty fast.
> 
> That's the whole point of SSDs (lots of small, random IO).

But on many workloads, filesystems manage to colocate what belongs
together, and with a little help from readahead and the block layer we
manage to dish out decently sized requests.  It would be great to serve
4k requests as fast as we can, but whether (or rather how much) that
should be the focal point of optimization is a slightly different
problem.

> The second desirable attribute SSDs have is consistent response for
> reads. HDs vary from microseconds to 100's of milliseconds. Very long
> tail in the read latency response.
> 
>>> (OK, I haven't measured the overhead of the *SCSI* layer, I've measured
>>> the overhead of the *libata* layer.  I think the point here is that you
>>> can't measure the difference at a macro level unless you're sending a
>>> lot of commands.)
>> How did you measure it?
> 
> Willy presented how he measured SCSI stack at LSF2008. ISTR he was
> advised to use oprofile in his test application so there is probably
> an updated version of these slides:
>     http://iou.parisc-linux.org/lsf2008/IO-latency-Kristen-Carlson-Accardi.pdf

Ah... okay, with a RAM low-level driver.

>> The issue path isn't thick at all although
>> command allocation logic there is a bit brain damaged and should use
>> block layer tag management.  All it does is - allocate qc, interpret
>> SCSI command to ATA command and write it to qc, map dma and build dma
>> table and pass it over to the low level issue function.  The only
>> extra step there is the translation part and I don't think that can
>> take a full microsecond on modern processors.
> 
> Maybe you are counting instructions and not cycles? Every cache miss
> is 200-300 cycles (say 100ns). When running multiple threads, we will
> miss on nearly every spinlock acquisition and probably on several data
> accesses. 1 microsecond isn't a lot when counting this way.
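
The cycle-counting argument quoted above can be written out as a
rough sketch.  The 100 ns per miss and the 10000 IOPS per disk come
from this thread; the number of misses per command (MISSES_PER_CMD)
and the function names are hypothetical, chosen only to show how the
budget adds up, and are not measurements.

```c
#include <assert.h>

/* Figures from the quoted estimate, except MISSES_PER_CMD, which is
 * a hypothetical count used purely for illustration. */
enum {
    MISS_COST_NS   = 100,    /* ~200-300 cycles per cache miss */
    MISSES_PER_CMD = 10,     /* hypothetical: lock + data misses */
    IOPS_PER_DISK  = 10000   /* rate mentioned earlier in the thread */
};

/* Nanoseconds one command spends stalled on cache misses. */
static long long stall_ns_per_cmd(long long misses, long long miss_cost_ns)
{
    assert(misses >= 0 && miss_cost_ns >= 0);
    return misses * miss_cost_ns;
}

/* Nanoseconds of stall accumulated per second at a given command rate. */
static long long stall_ns_per_sec(long long ns_per_cmd, long long iops)
{
    assert(ns_per_cmd >= 0 && iops >= 0);
    return ns_per_cmd * iops;
}
```

With these assumed figures, stall_ns_per_cmd(MISSES_PER_CMD,
MISS_COST_NS) is 1000 ns, i.e. the full microsecond, and
stall_ns_per_sec(1000, IOPS_PER_DISK) is 10,000,000 ns (10 ms) of
stall per second per disk.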

Yeah, ata uses its own locking, and the qc allocation does an atomic
bitop for each bit for no good reason, which can hurt at very high IOPS
with the NCQ tags filled up.  If serving 4k requests as fast as possible
is the goal, I'm not really sure the current SCSI or ATA commands are
the best suited ones.  Both SCSI and ATA are focused on rotating media
with seek latency and thus have SG on the host bus side in most cases
but never on the device side.  If getting the maximum random scattered
access throughput is a must, the best way would be to add SG r/w
commands to ATA and adapt our storage stack accordingly.
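
To make the qc allocation point concrete, here is a minimal userspace
sketch using C11 atomics.  It is not the libata code: the function
names, the 32-bit map and MAX_TAGS are assumptions for illustration.
The first variant does one atomic read-modify-write per bit probed,
so a nearly full tag map costs up to 32 atomic ops; the second scans
a plain snapshot and then claims the bit with a single
compare-and-swap.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_TAGS 32  /* assumed map width; NCQ allows up to 32 tags */

/* Per-bit variant: one atomic RMW for every tag probed. */
static int tag_alloc_per_bit(_Atomic uint32_t *map)
{
    for (int tag = 0; tag < MAX_TAGS; tag++) {
        uint32_t bit = UINT32_C(1) << tag;
        /* fetch_or returns the old value; the tag is ours if it was clear */
        if (!(atomic_fetch_or(map, bit) & bit))
            return tag;
    }
    return -1;  /* all tags busy */
}

/* Index of the lowest clear bit; caller guarantees v != UINT32_MAX. */
static int lowest_zero_bit(uint32_t v)
{
    int i = 0;
    while (v & 1u) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Scan-then-claim variant: plain scan of a snapshot, then one CAS. */
static int tag_alloc_scan_first(_Atomic uint32_t *map)
{
    uint32_t old = atomic_load(map);
    while (old != UINT32_MAX) {
        int tag = lowest_zero_bit(old);
        uint32_t bit = UINT32_C(1) << tag;
        /* on failure, old is reloaded with the current map and we retry */
        if (atomic_compare_exchange_weak(map, &old, old | bit))
            return tag;
    }
    return -1;  /* all tags busy */
}

/* Release a tag obtained from either allocator. */
static void tag_free(_Atomic uint32_t *map, int tag)
{
    assert(tag >= 0 && tag < MAX_TAGS);
    atomic_fetch_and(map, ~(UINT32_C(1) << tag));
}
```

Both variants hand out the lowest free tag; the difference is only in
how many atomic operations hit the shared cache line when most tags
are already taken.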

Thanks.

-- 
tejun
