* coalescing in polling mode in 4.9
From: Alex Nln @ 2018-02-02  8:10 UTC

Hello,

Enabling interrupt coalescing on an NVMe device under kernel 4.9
significantly reduces performance when using polling mode.
When I enable coalescing, IOPS drop from 100K to 35K and
latency jumps from 7 usec to 25 usec.

Shouldn't we expect a performance boost in polling mode when
interrupt coalescing is enabled?

Device is an Intel DC P3600.
Coalescing enabled with: nvme set-feature /dev/nvme0 -f 8 -v 0x00ffff

fio-2.16 job file:

[global]
iodepth=1
direct=1
ioengine=pvsync2
hipri
group_reporting
time_based
blocksize=4k
norandommap=1

[job1]
rw=read
filename=/dev/nvme0n1
name=raw=sequential-read
numjobs=1
runtime=60

Thanks
* coalescing in polling mode in 4.9
From: Alex Nln @ 2018-02-05  0:15 UTC

I'd appreciate it if someone could help me understand why we need
interrupts in polling mode at all, and why performance drops when
interrupt coalescing is enabled.

Thanks

On Fri, 2 Feb 2018 00:10:28 -0800 Alex Nln <alex.nlnnfn@gmail.com> wrote:
> Hello,
>
> Enabling interrupt coalescing in nvme device, kernel 4.9
> significantly reduces performance when using polling mode.
> When I enable coalescing, IOPS drops from 100K to 35K and
> latency jumps from 7 usec to 25 usec.
>
> Shouldn't we expect performance boost in polling mode when
> interrupt coalescing enabled?
>
> Device is Intel DC P3600
> Coalescing enabled: nvme set-feature /dev/nvme0 -f 8 -v 0x00ffff
> fio-2.16 file:
>
> [global]
> iodepth=1
> direct=1
> ioengine=pvsync2
> hipri
> group_reporting
> time_based
> blocksize=4k
> norandommap=1
>
> [job1]
> rw=read
> filename=/dev/nvme0n1
> name=raw=sequential-read
> numjobs=1
> runtime=60
>
> Thanks
* coalescing in polling mode in 4.9
From: Sagi Grimberg @ 2018-02-05 14:40 UTC

> Appreciate if someone could help me to understand why do we need interrupts
> in polling mode, and performance drops when interrupts coalescing enabled.

I didn't try with this device. I did try on a P3700 in the past and this
did not happen.
* coalescing in polling mode in 4.9
From: Keith Busch @ 2018-02-05 15:02 UTC

On Fri, Feb 02, 2018 at 12:10:28AM -0800, Alex Nln wrote:
> Enabling interrupt coalescing in nvme device, kernel 4.9
> significantly reduces performance when using polling mode.
> When I enable coalescing, IOPS drops from 100K to 35K and
> latency jumps from 7 usec to 25 usec.
>
> Shouldn't we expect performance boost in polling mode when
> interrupt coalescing enabled?
>
> Device is Intel DC P3600

That's a pretty low latency for this device. Are you running reads to
unmapped blocks?

I've seen the phenomenon occur where higher coalescing settings worsen
performance. That usually means the polling thread exceeded its time:
need_resched() returns true, so the task schedules out and relies on
interrupts, which have been throttled. Maybe hybrid polling would be
better for this.
* coalescing in polling mode in 4.9
From: Nitesh @ 2018-02-05 15:42 UTC

Hi Alex,

I ran into a similar problem not long ago. With coalescing enabled, some
I/Os took very long. Every time need_resched() returns true,
io_schedule() puts the task to sleep, since its state was previously set
to TASK_UNINTERRUPTIBLE. I handled this by setting the task state back
to running before releasing the CPU. The diff is attached below; you may
give it a try.

Hi Keith,

Shouldn't the below be the default way to handle the classic-polling
case, i.e. not relying on interrupts at all? For sleeping during the
polling path (which reduces CPU utilization) we have the hybrid polling
path anyway. If this approach seems acceptable, I can refactor the code
further and submit a patch. I look forward to your suggestions.

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 4a181fc..d2eeedf 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!READ_ONCE(bio.bi_private))
 			break;
-		if (!(iocb->ki_flags & IOCB_HIPRI) ||
-		    !blk_poll(bdev_get_queue(bdev), qc))
+		if (!(iocb->ki_flags & IOCB_HIPRI))
 			io_schedule();
+		else if (!blk_poll(bdev_get_queue(bdev), qc)) {
+			if (need_resched())
+				set_current_state(TASK_RUNNING);
+			io_schedule();
+		}
 	}
 	__set_current_state(TASK_RUNNING);

On Monday 05 February 2018 08:32 PM, Keith Busch wrote:
> On Fri, Feb 02, 2018 at 12:10:28AM -0800, Alex Nln wrote:
>> Enabling interrupt coalescing in nvme device, kernel 4.9
>> significantly reduces performance when using polling mode.
>> When I enable coalescing, IOPS drops from 100K to 35K and
>> latency jumps from 7 usec to 25 usec.
>>
>> Shouldn't we expect performance boost in polling mode when
>> interrupt coalescing enabled?
>>
>> Device is Intel DC P3600
>
> That's a pretty low latency for this device. Are you running reads to
> unmapped blocks?
>
> I've seen the phenomenon occur where higher coalescing settings worsen
> performance. That usually means the polling thread exceeded its time:
> need_resched() returns true, so the task schedules out and relies on
> interrupts, which have been throttled. Maybe hybrid polling would be
> better for this.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
* coalescing in polling mode in 4.9
From: Alex Nln @ 2018-02-06  5:39 UTC

Hi Nitesh,

On Mon, 05 Feb 2018 21:12:12 +0530 Nitesh <nj.shetty@samsung.com> wrote:
> Hi Alex,
>
> I ran into a similar problem not long ago. With coalescing enabled, some
> I/Os took very long. Every time need_resched() returns true,
> io_schedule() puts the task to sleep, since its state was previously set
> to TASK_UNINTERRUPTIBLE. I handled this by setting the task state back
> to running before releasing the CPU. The diff is attached below; you may
> give it a try.

I tested this patch on kernel 4.9 and it solves the problem.
Thanks a lot.

> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 4a181fc..d2eeedf 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>  		set_current_state(TASK_UNINTERRUPTIBLE);
>  		if (!READ_ONCE(bio.bi_private))
>  			break;
> -		if (!(iocb->ki_flags & IOCB_HIPRI) ||
> -		    !blk_poll(bdev_get_queue(bdev), qc))
> +		if (!(iocb->ki_flags & IOCB_HIPRI))
>  			io_schedule();
> +		else if (!blk_poll(bdev_get_queue(bdev), qc)) {
> +			if (need_resched())
> +				set_current_state(TASK_RUNNING);
> +			io_schedule();
> +		}
>  	}
>  	__set_current_state(TASK_RUNNING);
* coalescing in polling mode in 4.9
From: Nitesh Shetty @ 2018-02-09 22:37 UTC

Hi Alex,

Did you observe any benefit in polling performance, i.e. with
interrupt coalescing vs. without interrupt coalescing? I'd appreciate it
if you could share the performance/latency data.

Thanks,
Nitesh

On Tuesday 06 February 2018 11:09 AM, Alex Nln wrote:
> Hi Nitesh,
>
> On Mon, 05 Feb 2018 21:12:12 +0530 Nitesh <nj.shetty@samsung.com> wrote:
>
>> Hi Alex,
>>
>> I ran into a similar problem not long ago. With coalescing enabled, some
>> I/Os took very long. [...]
>
> I tested this patch on kernel 4.9 and it solves the problem.
> Thanks a lot.
>
>> [diff snipped]
* coalescing in polling mode in 4.9
From: Alex Nln @ 2018-02-10 21:08 UTC

Hi Nitesh,

Yes, the benefits are clear, in terms of both IOPS and latency. See below.

coalescing     iops    clat    intr/s
0x0000       108710    8.65    108711
0xff01       113081    8.28     56541
0xff03       114955    8.14     28738
0xff07       116318    8.04     14539
0xff0f       116470    8.02      7279
0xff1f       116908    8.00      3653
0xff3f       116833    8.00      1825
0xff7f       116718    8.00       911
0xffff       116962    8.00       456

It is quite a strange model of I/O processing: polling with interrupts.
Shouldn't the two be mutually exclusive? Why not disable interrupts on a
queue entirely while polling?

Thanks,
Alex

Setup: Intel DC P3600, kernel 4.9.76 + Nitesh's patch that fixes polling,
fio 2.21:

[global]
iodepth=1
direct=1
ioengine=pvsync2
hipri
group_reporting
time_based
norandommap=1

[job1]
rw=read
filename=/dev/nvme0n1
name=raw=sequential-read
numjobs=1
runtime=60

On Sat, 10 Feb 2018 04:07:08 +0530 Nitesh Shetty <nj.shetty@samsung.com> wrote:
> Hi Alex,
>
> Did you observe any benefit in polling performance, i.e. with
> interrupt coalescing vs. without interrupt coalescing? I'd appreciate it
> if you could share the performance/latency data.
>
> Thanks,
> Nitesh
>
> [earlier quotes snipped]
* coalescing in polling mode in 4.9
From: Keith Busch @ 2018-02-12 14:53 UTC

On Sat, Feb 10, 2018 at 01:08:17PM -0800, Alex Nln wrote:
> It is quite a strange model of I/O processing: polling with interrupts.
> Shouldn't the two be mutually exclusive? Why not disable interrupts on a
> queue entirely while polling?

That's a valid point. There are some limitations preventing this at the
moment: the nvme driver currently has no way to know that the command it
is submitting is intended to be polled, and even if it did, we would
still need to settle a policy for creating a special queue or queues
with interrupts disabled and for registering these with the block layer.
* coalescing in polling mode in 4.9
From: Keith Busch @ 2018-02-06 16:29 UTC

On Mon, Feb 05, 2018 at 09:12:12PM +0530, Nitesh wrote:
> diff --git a/fs/block_dev.c b/fs/block_dev.c
> index 4a181fc..d2eeedf 100644
> --- a/fs/block_dev.c
> +++ b/fs/block_dev.c
> @@ -236,9 +236,13 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
>  		set_current_state(TASK_UNINTERRUPTIBLE);
>  		if (!READ_ONCE(bio.bi_private))
>  			break;
> -		if (!(iocb->ki_flags & IOCB_HIPRI) ||
> -		    !blk_poll(bdev_get_queue(bdev), qc))
> +		if (!(iocb->ki_flags & IOCB_HIPRI))
>  			io_schedule();
> +		else if (!blk_poll(bdev_get_queue(bdev), qc)) {
> +			if (need_resched())
> +				set_current_state(TASK_RUNNING);
> +			io_schedule();
> +		}
>  	}
>  	__set_current_state(TASK_RUNNING);

Yeah, I think this looks good. Do you want to send this as a proper patch
to the linux-block mailing list for consideration?
* coalescing in polling mode in 4.9
From: Nitesh @ 2018-02-07  8:27 UTC

Hi Keith,

I will send a proper patch.

On Tuesday 06 February 2018 09:59 PM, Keith Busch wrote:
> On Mon, Feb 05, 2018 at 09:12:12PM +0530, Nitesh wrote:
>> [diff snipped]
>
> Yeah, I think this looks good. Do you want to send this as a proper patch
> to the linux-block mailing list for consideration?
* coalescing in polling mode in 4.9
From: Alex Nln @ 2018-02-06  3:55 UTC

On Mon, 5 Feb 2018 08:02:53 -0700 Keith Busch <keith.busch@intel.com> wrote:
> On Fri, Feb 02, 2018 at 12:10:28AM -0800, Alex Nln wrote:
> > Enabling interrupt coalescing in nvme device, kernel 4.9
> > significantly reduces performance when using polling mode.
> > When I enable coalescing, IOPS drops from 100K to 35K and
> > latency jumps from 7 usec to 25 usec.
> >
> > Shouldn't we expect performance boost in polling mode when
> > interrupt coalescing enabled?
> >
> > Device is Intel DC P3600
>
> That's a pretty low latency for this device. Are you running reads to
> unmapped blocks?

These are synchronous sequential reads; random reads' latency is higher.

> I've seen the phenomenon occur where higher coalescing settings worsen
> performance. That usually means the polling thread exceeded its time:
> need_resched() returns true, so the task schedules out and relies on
> interrupts, which have been throttled. Maybe hybrid polling would be
> better for this.

It looks like the eventual call to need_resched() is a result of the low
performance. When I set coalescing to 0x00FFFF there are about 160 nvme
interrupts/sec but ~35K IOPS, so clearly only a fraction of the CQ
entries are processed in interrupt context; most of the CQ entries are
still processed by polling. Somehow the device delays posting completion
entries when coalescing is enabled. Is that possible?

Thanks for the help.