* [PATCH 0/2] Block: Give option to force io polling
@ 2016-03-31 20:19 Jon Derrick
2016-04-05 12:38 ` Christoph Hellwig
2016-05-05 19:44 ` Stephen Bates
0 siblings, 2 replies; 11+ messages in thread
From: Jon Derrick @ 2016-03-31 20:19 UTC (permalink / raw)
To: axboe; +Cc: Jon Derrick, linux-nvme, linux-block, keith.busch, hch,
stephen.bates
In 4.6, enabling io polling in direct-io was switched to a per-io flag.
This had the unintended result of a significant performance difference
between 4.5 and 4.6 when benchmarking with fio's sync engine.
I was able to regain the performance by getting the pvsync2 engine
working with the new p{read,write}v2 syscalls, but this patchset allows
polling to be tried in the direct-io path with the other syscalls.
Rather than having to convert applications to the p{read,write}v2 syscalls,
users can enable this knob, which lets them see the same performance they may
have seen in 4.5, when direct-io always polled.
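For reference, "converting" an application just means issuing its I/O through
the new syscalls with the per-io high-priority flag set. A minimal userspace
sketch (the device path and 4KB block size are placeholders; RWF_HIPRI needs a
4.6 kernel, and the preadv2() wrapper needs a recent glibc, otherwise a raw
syscall(__NR_preadv2, ...) is required):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
        /* placeholder device; any O_DIRECT-capable block device works */
        int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* O_DIRECT wants an aligned buffer */
        void *buf;
        if (posix_memalign(&buf, 4096, 4096))
                return 1;

        struct iovec iov = { .iov_base = buf, .iov_len = 4096 };

        /*
         * RWF_HIPRI marks this single read as high priority, so the block
         * layer can poll for its completion (on queues with polling
         * enabled) instead of sleeping on the interrupt.
         */
        ssize_t ret = preadv2(fd, &iov, 1, 0, RWF_HIPRI);
        if (ret < 0)
                perror("preadv2");
        else
                printf("read %zd bytes (polled)\n", ret);

        free(buf);
        close(fd);
        return 0;
}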
Jon Derrick (2):
block: add queue flag to always poll
block: add forced polling sysfs controls
block/blk-core.c | 8 ++++++++
block/blk-sysfs.c | 38 ++++++++++++++++++++++++++++++++++++++
fs/direct-io.c | 7 ++++++-
include/linux/blkdev.h | 2 ++
4 files changed, 54 insertions(+), 1 deletion(-)
--
2.5.0
* Re: [PATCH 0/2] Block: Give option to force io polling
2016-03-31 20:19 [PATCH 0/2] Block: Give option to force io polling Jon Derrick
@ 2016-04-05 12:38 ` Christoph Hellwig
2016-04-05 15:54 ` Keith Busch
2016-05-05 19:44 ` Stephen Bates
1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2016-04-05 12:38 UTC (permalink / raw)
To: Jon Derrick
Cc: axboe, keith.busch, hch, linux-nvme, linux-block, stephen.bates
On Thu, Mar 31, 2016 at 02:19:12PM -0600, Jon Derrick wrote:
> In 4.6, enabling io polling in direct-io was switched to a per-io flag.
> This had the unintended result of a significant performance difference
> between 4.5 and 4.6 when benchmarking with fio's sync engine.
Only if you manually enabled polling beforehand. I really don't see
how this is going to be useful for actual applications vs benchmarks,
so I'll need more convincing arguments.
* Re: [PATCH 0/2] Block: Give option to force io polling
2016-04-05 12:38 ` Christoph Hellwig
@ 2016-04-05 15:54 ` Keith Busch
2016-04-05 17:27 ` Christoph Hellwig
0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2016-04-05 15:54 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jon Derrick, axboe, linux-nvme, linux-block, stephen.bates
On Tue, Apr 05, 2016 at 05:38:32AM -0700, Christoph Hellwig wrote:
> On Thu, Mar 31, 2016 at 02:19:12PM -0600, Jon Derrick wrote:
> > In 4.6, enabling io polling in direct-io was switched to a per-io flag.
> > This had the unintended result of a significant performance difference
> > between 4.5 and 4.6 when benchmarking with fio's sync engine.
>
> Only if you manually enabled polling beforehand. I really don't see
> how this is going to be useful for actual applications vs benchmarks,
> so I'll need more convincing arguments.
I think it's more about providing opt-in control to make the low-latency
benefit reachable to users with existing programs.
* Re: [PATCH 0/2] Block: Give option to force io polling
2016-04-05 15:54 ` Keith Busch
@ 2016-04-05 17:27 ` Christoph Hellwig
2016-04-05 18:07 ` Stephen Bates
0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2016-04-05 17:27 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Jon Derrick, axboe, linux-nvme, linux-block,
stephen.bates
On Tue, Apr 05, 2016 at 03:54:54PM +0000, Keith Busch wrote:
> I think it's more about providing opt-in control to make the low-latency
> benefit reachable to users with existing programs.
What program benefits from unconditional polling for all I/O on a given
device?
* RE: [PATCH 0/2] Block: Give option to force io polling
2016-04-05 17:27 ` Christoph Hellwig
@ 2016-04-05 18:07 ` Stephen Bates
0 siblings, 0 replies; 11+ messages in thread
From: Stephen Bates @ 2016-04-05 18:07 UTC (permalink / raw)
To: Christoph Hellwig, Keith Busch
Cc: axboe@fb.com, linux-block@vger.kernel.org,
linux-nvme@lists.infradead.org, Jon Derrick
> > I think it's more about providing opt-in control to make the
> > low-latency benefit reachable to users with existing programs.
>
> What program benefits from unconditional polling for all I/O on a given
> device?
End users who want to avail of the polling benefits of low-latency NVMe devices, without having to resort to rewriting their code, would like to see something like what Jon is proposing. As I understand it, this could be polling control per queue rather than per device, but I can see caching applications where polling for the whole NVMe device would be desirable.
Stephen
* RE: [PATCH 0/2] Block: Give option to force io polling
2016-03-31 20:19 [PATCH 0/2] Block: Give option to force io polling Jon Derrick
2016-04-05 12:38 ` Christoph Hellwig
@ 2016-05-05 19:44 ` Stephen Bates
2016-05-08 9:02 ` hch
1 sibling, 1 reply; 11+ messages in thread
From: Stephen Bates @ 2016-05-05 19:44 UTC (permalink / raw)
To: Jon Derrick, axboe@fb.com
Cc: linux-block@vger.kernel.org, keith.busch@intel.com, Stephen Bates,
linux-nvme@lists.infradead.org, hch@infradead.org
>
> In 4.6, enabling io polling in direct-io was switched to a per-io flag.
> This had the unintended result of a significant performance difference
> between 4.5 and 4.6 when benchmarking with fio's sync engine.
>
> I was able to regain the performance by getting the pvsync2 engine working
> with the new p{read,write}v2 syscalls, but this patchset allows polling to be
> tried in the direct-io path with the other syscalls.
>
> Rather than having to convert applications to the p{read,write}v2 syscalls,
> users can enable this knob, which lets them see the same performance they may
> have seen in 4.5, when direct-io always polled.
>
> Jon Derrick (2):
> block: add queue flag to always poll
> block: add forced polling sysfs controls
>
> block/blk-core.c | 8 ++++++++
> block/blk-sysfs.c | 38 ++++++++++++++++++++++++++++++++++++++
> fs/direct-io.c | 7 ++++++-
> include/linux/blkdev.h | 2 ++
> 4 files changed, 54 insertions(+), 1 deletion(-)
>
Hi
Revisiting this discussion from late March...
I am very interested in seeing this added. There are use cases involving super-low-latency (non-NAND based) NVMe devices where users want the fastest possible IO response times for ALL IO to that device. They also have no desire to wait for the new system calls and glibc updates needed to tie their applications into polling, or to rewrite their applications to avail of those new calls. I have done some testing on Jon's "big hammer"; it seems to work well for this use case and applied cleanly against v4.6-rc6.
For the series...
Reviewed-by: Stephen Bates <stephen.bates@microsemi.com>
Tested-by: Stephen Bates <stephen.bates@microsemi.com>
Cheers
Stephen
* Re: [PATCH 0/2] Block: Give option to force io polling
2016-05-05 19:44 ` Stephen Bates
@ 2016-05-08 9:02 ` hch
2016-05-09 14:53 ` Stephen Bates
0 siblings, 1 reply; 11+ messages in thread
From: hch @ 2016-05-08 9:02 UTC (permalink / raw)
To: Stephen Bates
Cc: Jon Derrick, axboe@fb.com, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org, keith.busch@intel.com,
hch@infradead.org
On Thu, May 05, 2016 at 07:44:29PM +0000, Stephen Bates wrote:
> I am very interested in seeing this added. There are use cases involving super-low-latency (non-NAND based) NVMe devices where users want the fastest possible IO response times for ALL IO to that device. They also have no desire to wait for the new system calls and glibc updates needed to tie their applications into polling, or to rewrite their applications to avail of those new calls. I have done some testing on Jon's "big hammer"; it seems to work well for this use case and applied cleanly against v4.6-rc6.
Let's get the driver for those devices merged first, and if you can
provide numbers showing that it's worth it, we can add a tweak to always
enable polling from the driver for those devices.
* RE: [PATCH 0/2] Block: Give option to force io polling
2016-05-08 9:02 ` hch
@ 2016-05-09 14:53 ` Stephen Bates
2016-05-12 7:08 ` hch
0 siblings, 1 reply; 11+ messages in thread
From: Stephen Bates @ 2016-05-09 14:53 UTC (permalink / raw)
To: hch@infradead.org
Cc: axboe@fb.com, linux-block@vger.kernel.org, keith.busch@intel.com,
linux-nvme@lists.infradead.org, Jon Derrick
> On Thu, May 05, 2016 at 07:44:29PM +0000, Stephen Bates wrote:
> > I am very interested in seeing this added. There are use cases involving
> > super-low-latency (non-NAND based) NVMe devices where users want the
> > fastest possible IO response times for ALL IO to that device. They also have
> > no desire to wait for the new system calls and glibc updates needed to tie
> > their applications into polling, or to rewrite their applications to avail of those
> > new calls. I have done some testing on Jon's "big hammer"; it seems to
> > work well for this use case and applied cleanly against v4.6-rc6.
>
> Let's get the driver for those devices merged first, and if you can provide
> numbers showing that it's worth it, we can add a tweak to always enable
> polling from the driver for those devices.
Christoph, this is a DRAM-based NVMe device; the code for polling in NVMe was merged in 4.5, right? We are using the inbox NVMe driver. Here is some performance data:
QD=1, single thread, random 4KB reads
Polling Off: 12us Avg / 40us 99.99% ;
Polling On: 9.5us Avg / 25us 99.99%
Both the average and 99.99% reduction are of interest.
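(For anyone who wants to reproduce this kind of QD=1 number without fio, below is a rough standalone sketch. It is not the tool used for the figures above, it only reports an average rather than a 99.99% tail, the device path and offset range are placeholders, and it assumes a 4.6 kernel plus a glibc that provides the preadv2() wrapper. Run it once with no arguments and once with "poll" to compare.)

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

#define ITERS 100000
#define BS    4096

int main(int argc, char **argv)
{
        /* pass "poll" to set RWF_HIPRI on every read */
        int flags = (argc > 1 && !strcmp(argv[1], "poll")) ? RWF_HIPRI : 0;

        int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);  /* placeholder path */
        if (fd < 0) {
                perror("open");
                return 1;
        }

        void *buf;
        if (posix_memalign(&buf, BS, BS))
                return 1;
        struct iovec iov = { .iov_base = buf, .iov_len = BS };

        srand(1);
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++) {
                /* crude stand-in for random 4KB reads: offsets in the first 1GiB */
                off_t off = (off_t)(rand() % 262144) * BS;
                if (preadv2(fd, &iov, 1, off, flags) != BS) {
                        perror("preadv2");
                        return 1;
                }
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e3;
        printf("%s: %.2f us average per read\n",
               flags ? "polling on" : "polling off", us / ITERS);

        free(buf);
        close(fd);
        return 0;
}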
Cheers
Stephen
* Re: [PATCH 0/2] Block: Give option to force io polling
2016-05-09 14:53 ` Stephen Bates
@ 2016-05-12 7:08 ` hch
2016-05-12 17:27 ` Stephen Bates
0 siblings, 1 reply; 11+ messages in thread
From: hch @ 2016-05-12 7:08 UTC (permalink / raw)
To: Stephen Bates
Cc: hch@infradead.org, axboe@fb.com, linux-block@vger.kernel.org,
keith.busch@intel.com, linux-nvme@lists.infradead.org,
Jon Derrick
On Mon, May 09, 2016 at 02:53:30PM +0000, Stephen Bates wrote:
> Christoph, this is a DRAM-based NVMe device; the code for polling in NVMe was merged in 4.5, right? We are using the inbox NVMe driver. Here is some performance data:
>
> QD=1, single thread, random 4KB reads
>
> Polling Off: 12us Avg / 40us 99.99% ;
> Polling On: 9.5us Avg / 25us 99.99%
>
> Both the average and 99.99% reduction are of interest.
How does CPU usage look for common workloads with polling force
enabled? If it's really an overall win we should just add a quirk
to the NVMe driver to always force polling for this device based on the
PCI ID.
* RE: [PATCH 0/2] Block: Give option to force io polling
2016-05-12 7:08 ` hch
@ 2016-05-12 17:27 ` Stephen Bates
2016-05-12 17:34 ` Derrick, Jonathan
0 siblings, 1 reply; 11+ messages in thread
From: Stephen Bates @ 2016-05-12 17:27 UTC (permalink / raw)
To: hch@infradead.org
Cc: axboe@fb.com, linux-block@vger.kernel.org, keith.busch@intel.com,
linux-nvme@lists.infradead.org, Jon Derrick
>
>
> On Mon, May 09, 2016 at 02:53:30PM +0000, Stephen Bates wrote:
> > Christoph, this is a DRAM-based NVMe device; the code for polling in
> > NVMe was merged in 4.5, right? We are using the inbox NVMe driver. Here is
> > some performance data:
> >
> > QD=1, single thread, random 4KB reads
> >
> > Polling Off: 12us Avg / 40us 99.99% ;
> > Polling On: 9.5us Avg / 25us 99.99%
> >
> > Both the average and 99.99% reduction are of interest.
>
> How does CPU usage look for common workloads with polling force enabled?
Christoph, CPU load added.
Polling Off: 12us Avg / 40us 99.99% ; CPU 0.39 H/W Threads
Polling On: 9.5us Avg / 25us 99.99% ; CPU 0.98 H/W Threads
> If it's really an overall win we should just add a quirk to the NVMe driver to
> always force polling for this device based on the PCI ID.
While I like the "big hammer" approach, I think we have to have some control over when it is swung ;-). Always turning on polling for a specific device seems a bit too Mjölnir [1] of a solution. The benefit of polling only shows up when QD and/or thread count is low, and not everyone will use the device that way. I also suspect other low-latency NVMe devices are coming, and having a quirk for each and every one of them does not make sense. We could use a module param or sysfs entry in the NVMe driver itself if we want to avoid having this control in the block layer?
Stephen
[1] Mjölnir = Thor's Hammer ;-).
* RE: [PATCH 0/2] Block: Give option to force io polling
2016-05-12 17:27 ` Stephen Bates
@ 2016-05-12 17:34 ` Derrick, Jonathan
0 siblings, 0 replies; 11+ messages in thread
From: Derrick, Jonathan @ 2016-05-12 17:34 UTC (permalink / raw)
To: Stephen Bates, hch@infradead.org
Cc: axboe@fb.com, linux-block@vger.kernel.org, Busch, Keith,
linux-nvme@lists.infradead.org
Hi Stephen,
I've got a patch I'm going to send out shortly which does it for a block-device file (as Jeff suggested). It's still Mjölnir but seems to be more in the right place.
Thread overview: 11+ messages
2016-03-31 20:19 [PATCH 0/2] Block: Give option to force io polling Jon Derrick
2016-04-05 12:38 ` Christoph Hellwig
2016-04-05 15:54 ` Keith Busch
2016-04-05 17:27 ` Christoph Hellwig
2016-04-05 18:07 ` Stephen Bates
2016-05-05 19:44 ` Stephen Bates
2016-05-08 9:02 ` hch
2016-05-09 14:53 ` Stephen Bates
2016-05-12 7:08 ` hch
2016-05-12 17:27 ` Stephen Bates
2016-05-12 17:34 ` Derrick, Jonathan