From: Matthew Wilcox <matthew@wil.cx>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Grant Grundler <grundler@google.com>,
Robert Hancock <hancockrwd@gmail.com>,
linux-scsi@vger.kernel.org, linux-ide@vger.kernel.org
Subject: Re: LSF Papers online?
Date: Thu, 16 Apr 2009 11:45:53 -0600
Message-ID: <20090416174553.GE1926@parisc-linux.org>
In-Reply-To: <1239899823.3762.37.camel@mulgrave.int.hansenpartnership.com>

On Thu, Apr 16, 2009 at 11:37:02AM -0500, James Bottomley wrote:
> of data. I fully agree that some of the less smart SATA controllers
> have a lot of catching up to do in this space, but that isn't
> necessarily a driver issue; you can't polish a turd as the saying
> goes ...

I guess you haven't seen the episode of Mythbusters where they manage
to do exactly that?  ;-)

> IOPS are starting to come up because SSDs are saying they prefer many
> smaller transactions to an accumulated larger one.  I'm still not

I don't think that's what SSDs are saying.  The protocol (and controllers)
still work better if you send down one 128k IO than 32 4k IOs.  But with
the low access latency of flash, it's better to send down a 16k IO now
than to wait around and see if another 16k IO comes along to merge with.
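
To put rough numbers on that trade-off (all figures below are assumptions
for illustration, not measurements):

/* Illustrative only: is it worth plugging a 16k IO in the hope of
 * merging it with a successor?  Overhead and latency figures are
 * assumed, not measured. */
#include <stdio.h>

int main(void)
{
	const double cmd_us = 20;	/* assumed per-command protocol+irq cost */
	const double plug_us = 1000;	/* assumed wait hoping for a merge */
	const double access_us[] = { 100, 8000 };	/* assumed SSD vs disk */
	const char *name[] = { "ssd", "disk" };

	for (int i = 0; i < 2; i++) {
		/* issue now: two back-to-back 16k IOs */
		double now = 2 * (access_us[i] + cmd_us);
		/* plug: wait, then issue one merged 32k IO */
		double merged = plug_us + access_us[i] + cmd_us;
		printf("%-4s  issue-now %6.0fus  wait-and-merge %6.0fus\n",
		       name[i], now, merged);
	}
	return 0;
}

With those numbers the disk wins by waiting and the SSD loses badly,
which is the whole point: merging still helps, but only when it's free.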

> entirely convinced that trying to rightsize is wrong here: most of the
> FS data is getting more contiguous, so even for SSDs we can merge
> without a lot of work.  A simple back of the envelope calculation can
> give the rightsizing: If you want a SSD to max out at its 31 allowed
> tags saturating a 3G sata link, then you're talking 10M per tag per

Better than that: only 8MB of data per tag per second.  SATA effectively
limits you to 250MB/s, which split across 31 tags is about 8MB/s each;
at 4k per IO that's 2016 IOPS per tag.  Of course, this assumes you're
only doing the NCQ commands and not, say, issuing TRIM or something.
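
Or, as a quick sanity check (assuming 250MB/s and decimal-kilobyte IO
sizes; with 4096-byte IOs the numbers come out a few percent lower):

/* Back-of-the-envelope IOPS for a 3G SATA link with 31 NCQ tags. */
#include <stdio.h>

int main(void)
{
	const double link_bps = 250e6;		/* effective payload bytes/sec */
	const int tags = 31;			/* NCQ queue depth */
	const double per_tag = link_bps / tags;	/* ~8MB/s per tag */
	const int io_bytes[] = { 4000, 16000 };

	for (int i = 0; i < 2; i++) {
		double iops = per_tag / io_bytes[i];
		printf("%2dk IO: %4.0f IOPS/tag, %5.0f IOPS total\n",
		       io_bytes[i] / 1000, iops, iops * tags);
	}
	return 0;
}

That prints 2016 IOPS/tag (62500 total) at 4k and 504 IOPS/tag (15625
total) at 16k, a little under the 75k/20k figures quoted above.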

> second.  If we assume a 4k sector size, that's 2500 IOPS per tag
> (there's no real point doing less than 4k, because that has us splitting
> the page cache). Or, to put it another way, over 75k IOPS for a single
> SSD doesn't make sense ... the interesting question is whether it would
> make more sense to align on, say 16k io and so expect to max out at 20k
> IOPS.

If we're serious about getting 2000 IOPS per tag, then the round-trip
inside the kernel to recycle a tag has to be less than 500 microseconds.
Do you have a good idea of how to measure what that is today?  (A rough
instrumentation sketch follows the call path below.)

Here's the call path taken by the AHCI driver:
ahci_interrupt()
  ahci_port_intr()
    ata_qc_complete_multiple()
      ata_qc_complete()
        __ata_qc_complete()
          ata_scsi_qc_complete()        [qc->complete_fn]
            scsi_done()                 [qc->scsidone]
              blk_complete_request()
                __blk_complete_request()
                  raise_softirq_irqoff()

... (softirq) ...

blk_done_softirq()
  scsi_softirq_done()                   [rq->q->softirq_done_fn]
    scsi_finish_command()
      scsi_io_completion()
        scsi_end_request()
          scsi_next_command()
            scsi_run_queue()
              __blk_run_queue()
                blk_invoke_request_fn()
                  scsi_request_fn()     [q->request_fn]
                    scsi_dispatch_cmd()
                      ata_scsi_translate()  [host->hostt->queuecommand]
                        ata_qc_issue()
                          ahci_qc_issue()   [ap->ops->qc_issue]
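
One crude way to measure it would be something like this (a hypothetical
instrumentation sketch; the stamp array and both hook points are invented,
nothing like this is in the tree):

#include <linux/ktime.h>
#include <linux/kernel.h>

/* Stamp each tag when the completion handler sees it finish, and
 * print the delta when the same hardware tag is next issued. */
static ktime_t tag_free_stamp[32];

/* hypothetical hook: call from ahci_port_intr() per completed tag */
static void stamp_tag_free(unsigned int tag)
{
	tag_free_stamp[tag] = ktime_get();
}

/* hypothetical hook: call from ahci_qc_issue() before hitting the
 * PORT_CMD_ISSUE register */
static void report_tag_reuse(unsigned int tag)
{
	if (ktime_to_ns(tag_free_stamp[tag]))
		printk(KERN_DEBUG "tag %u recycled in %lld us\n", tag,
		       ktime_us_delta(ktime_get(), tag_free_stamp[tag]));
}

Even a histogram of those deltas under a seeky load would tell us how
far we are from the 500 microsecond budget.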

I can see a few ways to cut down the latency between knowing a tag is
no longer in use and starting the next command.

We could pretend the AHCI driver has a queue depth of 64, queue up
commands in the driver, swap the tags over, and send out the next queued
command before we run the completion path for the one that just finished.

This is similar to a technique used in some old SCSI drivers that didn't
support tagged commands at all: a second command was queued inside the
driver while the first was executing on the device.
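
Roughly like this (a sketch only: the structure and the issue_on_hw_tag()
helper are invented, and all the locking and error handling is missing):

#include <linux/libata.h>
#include <linux/bitops.h>

#define SOFT_TAGS	64	/* what we tell the block layer */
#define HW_TAGS		31	/* what the device really has */

struct soft_queue {
	struct ata_queued_cmd	*pending[SOFT_TAGS];	/* waiting for a hw tag */
	unsigned int		head, tail;
	unsigned long		hw_tag_map;	/* bit set = hw tag in flight */
};

static void issue_on_hw_tag(struct ata_queued_cmd *qc, unsigned int tag);
					/* hypothetical helper */

/* On completion, hand the freed hardware tag straight to the next
 * queued command, *before* walking the long completion path above. */
static void soft_queue_complete(struct soft_queue *q, unsigned int hw_tag)
{
	if (q->head != q->tail) {
		struct ata_queued_cmd *next = q->pending[q->head % SOFT_TAGS];
		q->head++;
		issue_on_hw_tag(next, hw_tag);	/* swap the tag over */
	} else {
		clear_bit(hw_tag, &q->hw_tag_map);
	}
	/* only now run scsi_done() and friends for the old command */
}

The device would see its next command from inside the interrupt handler,
with the twenty-odd function completion path running after the fact.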

But then, we had that big movement towards eliminating queues from inside
drivers ... maybe we need another way.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."