linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: axboe@kernel.dk (Jens Axboe)
Subject: RFC: Allow block drivers to poll for I/O instead of sleeping
Date: Tue, 25 Jun 2013 16:55:20 +0200	[thread overview]
Message-ID: <20130625145520.GK5594@kernel.dk> (raw)
In-Reply-To: <20130625030140.GZ8211@linux.intel.com>

On Mon, Jun 24 2013, Matthew Wilcox wrote:
> On Mon, Jun 24, 2013@09:15:45AM +0200, Jens Axboe wrote:
> > Willy, I think the general design is fine, hooking in via the bdi is the
> > only way to get back to the right place from where you need to sleep.
> > Some thoughts:
> > 
> > - This should be hooked in via blk-iopoll, both of them should call into
> >   the same driver hook for polling completions.
> 
> I actually started working on this, then I realised that it's actually
> a bad idea.  blk-iopoll's poll function is to poll the single I/O queue
> closest to this CPU.  The iowait poll function is to poll all queues
> that the I/O for this address_space might complete on.

blk_iopoll can be tied to "whatever". It was originally designed to be
tied to the queue, which would make it driver-wide. So there's no intent
for it to poll only a subset of the device, though if you tie it to a
completion queue (which would be most natural), then it'd only find
completions there.

I didn't look at your nvme end of the implementation - if you could
reliably map to the right completion queue, then it would have the same
mapping as iopoll on a per-completion queue basis. If you can't, then
you have to poll all of them. That doesn't seem like it would scale well
for having more than a few applications banging on a device.

> I'm reluctant to ask drivers to define two poll functions, but I'm even
> more reluctant to ask them to define one function with two purposes.
>
> > - It needs to be more intelligent in when you want to poll and when you
> >   want regular irq driven IO.
> 
> Oh yeah, absolutely.  While the example patch didn't show it, I wouldn't
> enable it for all NVMe devices; only ones with sufficiently low latency.
> There's also the ability for the driver to look at the number of
> outstanding I/Os and return an error (eg -EBUSY) to stop spinning.

There might also be read vs write differences. Say some devices complete
writes very quickly, but reads are slower. Or vice versa. And then
there's the inevitable "some IOs are slow, but usually very fast". But
that can't really be handled except giving up on the polling at some
point.

> > - With the former note, the app either needs to opt in (and hence
> >   willingly sacrifice CPU cycles of its scheduling slice) or it needs to
> >   be nicer in when it gives up and goes back to irq driven IO.
> 
> Yup.  I like the way you framed it.  If the task *wants* to spend its
> CPU cycles on polling for I/O instead of giving up the remainder of its
> time slice, then it should be able to do that.  After all, it already can;
> it can submit an I/O request via AIO, and then call io_getevents in a
> tight loop.

Exactly, that was my point. Or it can busy loop just checking the aio
ring, at which point it's really stupid to be IRQ driven at all. It'd be
much better to have the app reap the completion directly.

> So maybe the right way to do this is with a task flag?  If we go that
> route, I'd like to further develop this option to allow I/Os to be
> designated as "low latency" vs "normal".  Taking a page fault would be
> "low latency" for all tasks, not just ones that choose to spin for I/O.

Not sure, I'd have to think about it some more. It's a mix of what the
application decides to do, but also what the underlying device can do.
And there might be fs implications, etc.

-- 
Jens Axboe

  reply	other threads:[~2013-06-25 14:55 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-20 20:17 RFC: Allow block drivers to poll for I/O instead of sleeping Matthew Wilcox
2013-06-23 10:09 ` Ingo Molnar
2013-06-23 18:29   ` Linus Torvalds
2013-06-24  7:17     ` Jens Axboe
2013-06-25  0:11       ` Steven Rostedt
2013-06-25  3:07         ` Matthew Wilcox
2013-06-25 13:57           ` Steven Rostedt
2013-06-25 14:57         ` Jens Axboe
2013-06-24  8:07     ` Ingo Molnar
2013-06-25  3:18       ` Matthew Wilcox
2013-06-25  7:07         ` Bart Van Assche
2013-06-25 15:00         ` Jens Axboe
2013-06-27 18:10     ` Rik van Riel
2013-06-23 22:14   ` David Ahern
2013-06-24  8:21     ` Ingo Molnar
2013-06-24  7:15   ` Jens Axboe
2013-06-24  8:18     ` Ingo Molnar
2013-06-25  3:01     ` Matthew Wilcox
2013-06-25 14:55       ` Jens Axboe [this message]
2013-06-27 18:42 ` Rik van Riel
2013-07-04  1:13 ` Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130625145520.GK5594@kernel.dk \
    --to=axboe@kernel.dk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).