All of lore.kernel.org
 help / color / mirror / Atom feed
From: willy@linux.intel.com (Matthew Wilcox)
Subject: RFC: Allow block drivers to poll for I/O instead of sleeping
Date: Mon, 24 Jun 2013 23:01:40 -0400	[thread overview]
Message-ID: <20130625030140.GZ8211@linux.intel.com> (raw)
In-Reply-To: <20130624071544.GR9422@kernel.dk>

On Mon, Jun 24, 2013@09:15:45AM +0200, Jens Axboe wrote:
> Willy, I think the general design is fine, hooking in via the bdi is the
> only way to get back to the right place from where you need to sleep.
> Some thoughts:
> 
> - This should be hooked in via blk-iopoll, both of them should call into
>   the same driver hook for polling completions.

I actually started working on this, then I realised that it's actually
a bad idea.  blk-iopoll's poll function is to poll the single I/O queue
closest to this CPU.  The iowait poll function is to poll all queues
that the I/O for this address_space might complete on.

I'm reluctant to ask drivers to define two poll functions, but I'm even
more reluctant to ask them to define one function with two purposes.

> - It needs to be more intelligent in when you want to poll and when you
>   want regular irq driven IO.

Oh yeah, absolutely.  While the example patch didn't show it, I wouldn't
enable it for all NVMe devices; only ones with sufficiently low latency.
There's also the ability for the driver to look at the number of
outstanding I/Os and return an error (eg -EBUSY) to stop spinning.

> - With the former note, the app either needs to opt in (and hence
>   willingly sacrifice CPU cycles of its scheduling slice) or it needs to
>   be nicer in when it gives up and goes back to irq driven IO.

Yup.  I like the way you framed it.  If the task *wants* to spend its
CPU cycles on polling for I/O instead of giving up the remainder of its
time slice, then it should be able to do that.  After all, it already can;
it can submit an I/O request via AIO, and then call io_getevents in a
tight loop.

So maybe the right way to do this is with a task flag?  If we go that
route, I'd like to further develop this option to allow I/Os to be
designated as "low latency" vs "normal".  Taking a page fault would be
"low latency" for all tasks, not just ones that choose to spin for I/O.

WARNING: multiple messages have this Message-ID (diff)
From: Matthew Wilcox <willy@linux.intel.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Ingo Molnar <mingo@kernel.org>, Al Viro <viro@zeniv.linux.org.uk>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
	linux-scsi@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: RFC: Allow block drivers to poll for I/O instead of sleeping
Date: Mon, 24 Jun 2013 23:01:40 -0400	[thread overview]
Message-ID: <20130625030140.GZ8211@linux.intel.com> (raw)
In-Reply-To: <20130624071544.GR9422@kernel.dk>

On Mon, Jun 24, 2013 at 09:15:45AM +0200, Jens Axboe wrote:
> Willy, I think the general design is fine, hooking in via the bdi is the
> only way to get back to the right place from where you need to sleep.
> Some thoughts:
> 
> - This should be hooked in via blk-iopoll, both of them should call into
>   the same driver hook for polling completions.

I actually started working on this, then I realised that it's actually
a bad idea.  blk-iopoll's poll function is to poll the single I/O queue
closest to this CPU.  The iowait poll function is to poll all queues
that the I/O for this address_space might complete on.

I'm reluctant to ask drivers to define two poll functions, but I'm even
more reluctant to ask them to define one function with two purposes.

> - It needs to be more intelligent in when you want to poll and when you
>   want regular irq driven IO.

Oh yeah, absolutely.  While the example patch didn't show it, I wouldn't
enable it for all NVMe devices; only ones with sufficiently low latency.
There's also the ability for the driver to look at the number of
outstanding I/Os and return an error (eg -EBUSY) to stop spinning.

> - With the former note, the app either needs to opt in (and hence
>   willingly sacrifice CPU cycles of its scheduling slice) or it needs to
>   be nicer in when it gives up and goes back to irq driven IO.

Yup.  I like the way you framed it.  If the task *wants* to spend its
CPU cycles on polling for I/O instead of giving up the remainder of its
time slice, then it should be able to do that.  After all, it already can;
it can submit an I/O request via AIO, and then call io_getevents in a
tight loop.

So maybe the right way to do this is with a task flag?  If we go that
route, I'd like to further develop this option to allow I/Os to be
designated as "low latency" vs "normal".  Taking a page fault would be
"low latency" for all tasks, not just ones that choose to spin for I/O.

  parent reply	other threads:[~2013-06-25  3:01 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-20 20:17 RFC: Allow block drivers to poll for I/O instead of sleeping Matthew Wilcox
2013-06-20 20:17 ` Matthew Wilcox
2013-06-23 10:09 ` Ingo Molnar
2013-06-23 10:09   ` Ingo Molnar
2013-06-23 18:29   ` Linus Torvalds
2013-06-23 18:29     ` Linus Torvalds
2013-06-24  7:17     ` Jens Axboe
2013-06-24  7:17       ` Jens Axboe
2013-06-25  0:11       ` Steven Rostedt
2013-06-25  0:11         ` Steven Rostedt
2013-06-25  3:07         ` Matthew Wilcox
2013-06-25  3:07           ` Matthew Wilcox
2013-06-25 13:57           ` Steven Rostedt
2013-06-25 13:57             ` Steven Rostedt
2013-06-25 14:57         ` Jens Axboe
2013-06-25 14:57           ` Jens Axboe
2013-06-24  8:07     ` Ingo Molnar
2013-06-24  8:07       ` Ingo Molnar
2013-06-25  3:18       ` Matthew Wilcox
2013-06-25  3:18         ` Matthew Wilcox
2013-06-25  7:07         ` Bart Van Assche
2013-06-25  7:07           ` Bart Van Assche
2013-06-25 15:00         ` Jens Axboe
2013-06-25 15:00           ` Jens Axboe
2013-06-27 18:10     ` Rik van Riel
2013-06-27 18:10       ` Rik van Riel
2013-06-23 22:14   ` David Ahern
2013-06-23 22:14     ` David Ahern
2013-06-24  8:21     ` Ingo Molnar
2013-06-24  8:21       ` Ingo Molnar
2013-06-24  7:15   ` Jens Axboe
2013-06-24  7:15     ` Jens Axboe
2013-06-24  8:18     ` Ingo Molnar
2013-06-24  8:18       ` Ingo Molnar
2013-06-25  3:01     ` Matthew Wilcox [this message]
2013-06-25  3:01       ` Matthew Wilcox
2013-06-25 14:55       ` Jens Axboe
2013-06-25 14:55         ` Jens Axboe
2013-06-27 18:42 ` Rik van Riel
2013-06-27 18:42   ` Rik van Riel
2013-07-04  1:13 ` Shaohua Li
2013-07-04  1:13   ` Shaohua Li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130625030140.GZ8211@linux.intel.com \
    --to=willy@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.