From: Tejun Heo <htejun@gmail.com>
To: Grant Grundler <grundler@google.com>
Cc: Matthew Wilcox <matthew@wil.cx>,
James Bottomley <James.Bottomley@hansenpartnership.com>,
linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: libata / scsi separation
Date: Wed, 10 Dec 2008 12:44:51 +0900
Message-ID: <493F3B33.8010607@gmail.com>
In-Reply-To: <da824cf30812091923j241f915dmbcb27245c0d0491b@mail.gmail.com>
Hello,
Grant Grundler wrote:
>>> Maybe you are counting instructions and not cycles? Every cache miss
>>> is 200-300 cycles (say 100ns). When running multiple threads, we will
>>> miss on nearly every spinlock acquisition and probably on several data
>>> accesses. 1 microsecond isn't a lot when counting this way.
>> Yeah, ata uses its own locking and the qc allocation does atomic
>> bitops for each bit for no good reason which can hurt for very hi-ops
>> with NCQ tags filled up. If serving 4k requests as fast as possible
>> is the goal, I'm not really sure the current SCSI or ATA commands are
>> the best suited ones. Both SCSI and ATA are focused on rotating media
>> with seek latency
>
> I think existing File Systems and block IO schedulers (except NOOP) are
> tuned for rotating media and access patterns that benefit this media the most.
Actually, the whole stack is optimized toward IO devices with seek
latency, from the hardware to our drivers and the whole block layer
itself.
>> and thus have SG on the host bus side in most cases
>> but never on the device side.
>
> SG == scatter-gather? I'm not sure why that is specific to rotating media.
> Or is this referring to "SCSI-generic" pass through?
I was talking about scatter-gather. Every IO command describes one
contiguous extent of data on the device, and the whole stack from the
bio on down is built that way. The overhead of libata is minute
compared to the cost of the whole path, which includes issuing a
separate command and receiving a completion for each 4k transfer.
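As a rough illustration of the point above (not code from libata or the
block layer), the sketch below counts how many commands a batch of
requests costs when one command can describe only one contiguous extent.
The names struct extent and count_commands() are hypothetical, and the
merge rule is a simplified stand-in for a block-layer back merge.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: one contiguous run of sectors on the device. */
struct extent {
	uint64_t lba;	/* starting sector */
	uint32_t nsect;	/* length in sectors */
};

/*
 * Count the commands needed when each command covers exactly one
 * contiguous extent.  An extent that starts where the previous one
 * ends is folded into the previous command; anything else costs a
 * new command and a new completion.
 */
size_t count_commands(const struct extent *ext, size_t n)
{
	size_t cmds = 0;

	for (size_t i = 0; i < n; i++) {
		if (i > 0 && ext[i - 1].lba + ext[i - 1].nsect == ext[i].lba)
			continue;	/* contiguous: same command */
		cmds++;
	}
	return cmds;
}
```

With truly random 4k requests nothing merges, so the command count
equals the request count, which is the per-transfer cost described
above.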
> In any case, only traversing one fewer layers (SCSI or libata) in
> block code path would help serve 4k requests more efficiently.
Yes, no doubt.
>> If getting the maximum random scattered
>> access throughput is a must, the best way would be adding a SG r/w
>> commands to ATA and adapt our storage stack accordingly.
>
> I don't think everyone wants to throw out the entire stack.
> But adding a passthrough for ATA and connecting that to FUSE might
> be a performant alternative.
I don't know how FUSE would come into play, but if the device can
accept a list of IOs in a single command and reply accordingly, the
block layer (possibly the bio interface too?) can be modified to merge
random IOs into a single request. Things would then be really fast, and
whether we grab one more spinlock at the bottom of the stack wouldn't
really matter.
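A minimal sketch of what such a list-carrying command might look like,
purely as an assumption for illustration: no such ATA command is
defined in this thread, and struct sg_cmd, sg_cmd_fill() and the limit
of 32 extents are all made up here.

```c
#include <stddef.h>
#include <stdint.h>

#define SG_CMD_MAX_EXTENTS 32	/* assumed limit, not from any spec */

/* Hypothetical device-side extent descriptor. */
struct sg_extent {
	uint64_t lba;	/* starting sector */
	uint32_t nsect;	/* length in sectors */
};

/* Hypothetical command payload carrying a list of extents. */
struct sg_cmd {
	uint32_t nr_extents;
	struct sg_extent ext[SG_CMD_MAX_EXTENTS];
};

/*
 * Pack as many pending extents as fit into one command.
 * Returns the number of extents consumed from the input.
 */
size_t sg_cmd_fill(struct sg_cmd *cmd, const struct sg_extent *pending,
		   size_t n)
{
	size_t take = n < SG_CMD_MAX_EXTENTS ? n : SG_CMD_MAX_EXTENTS;

	cmd->nr_extents = (uint32_t)take;
	for (size_t i = 0; i < take; i++)
		cmd->ext[i] = pending[i];
	return take;
}
```

Under these assumptions, 40 scattered 4k requests would go out as two
commands instead of 40, so the per-command overhead is amortized
across the list.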
Thanks.
--
tejun