* [PATCH] nvme: utilize two queue maps, one for reads and one for writes
       [not found] <20181114004148.GA29545@roeck-us.net>
@ 2018-11-14  0:51 ` Jens Axboe
  2018-11-14  1:28   ` Mike Snitzer
  2018-11-14  4:52   ` [PATCH] " Guenter Roeck
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2018-11-14  0:51 UTC (permalink / raw)

On 11/13/18 5:41 PM, Guenter Roeck wrote:
> Hi,
>
> On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
>> NVMe does round-robin between queues by default, which means that
>> sharing a queue map for both reads and writes can be problematic
>> in terms of read servicing. It's much easier to flood the queue
>> with writes and reduce the read servicing.
>>
>> Implement two queue maps, one for reads and one for writes. The
>> write queue count is configurable through the 'write_queues'
>> parameter.
>>
>> By default, we retain the previous behavior of having a single
>> queue set, shared between reads and writes. Setting 'write_queues'
>> to a non-zero value will create two queue sets, one for reads and
>> one for writes, the latter using the configurable number of
>> queues (hardware queue counts permitting).
>>
>> Reviewed-by: Hannes Reinecke <hare@suse.com>
>> Reviewed-by: Keith Busch <keith.busch@intel.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> This patch causes hangs when running recent versions of
> -next with several architectures; see the -next column at
> kerneltests.org/builders for details. Bisect log below; this
> was run with qemu on alpha. Reverting this patch as well as
> "nvme: add separate poll queue map" fixes the problem.

I don't see anything related to what hung, the trace, and so on.
Can you clue me in? Where are the test results with dmesg?

How to reproduce?

-- 
Jens Axboe

^ permalink raw reply [flat|nested] 5+ messages in thread
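A note on how the new knob is used: 'write_queues' as described in the patch
above is an ordinary nvme module parameter, so on a kernel carrying the patch
it would typically be set on the kernel command line (built-in driver) or at
module load time. The lines below are only a minimal sketch; the count of 2 is
purely illustrative, and the sysfs check assumes the parameter is exported
readable, which the commit message does not spell out.

    # built-in nvme driver: append the parameter to the kernel command line
    #     ... root=/dev/nvme0n1 nvme.write_queues=2 ...

    # modular nvme driver: set it at load time instead
    modprobe nvme write_queues=2

    # if the parameter is exported readable, check the active value
    cat /sys/module/nvme/parameters/write_queues

Leaving the parameter at its default of 0 keeps the single queue set shared
between reads and writes, as the commit message notes.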
* nvme: utilize two queue maps, one for reads and one for writes
  2018-11-14  0:51 ` [PATCH] nvme: utilize two queue maps, one for reads and one for writes Jens Axboe
@ 2018-11-14  1:28   ` Mike Snitzer
  2018-11-14  1:36     ` Mike Snitzer
  2018-11-14  4:52   ` [PATCH] " Guenter Roeck
  1 sibling, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2018-11-14  1:28 UTC (permalink / raw)

On Tue, Nov 13 2018 at 7:51pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> On 11/13/18 5:41 PM, Guenter Roeck wrote:
> > Hi,
> >
> > On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
> >> NVMe does round-robin between queues by default, which means that
> >> sharing a queue map for both reads and writes can be problematic
> >> in terms of read servicing. It's much easier to flood the queue
> >> with writes and reduce the read servicing.
> >>
> >> Implement two queue maps, one for reads and one for writes. The
> >> write queue count is configurable through the 'write_queues'
> >> parameter.
> >>
> >> By default, we retain the previous behavior of having a single
> >> queue set, shared between reads and writes. Setting 'write_queues'
> >> to a non-zero value will create two queue sets, one for reads and
> >> one for writes, the latter using the configurable number of
> >> queues (hardware queue counts permitting).
> >>
> >> Reviewed-by: Hannes Reinecke <hare@suse.com>
> >> Reviewed-by: Keith Busch <keith.busch@intel.com>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >
> > This patch causes hangs when running recent versions of
> > -next with several architectures; see the -next column at
> > kerneltests.org/builders for details. Bisect log below; this
> > was run with qemu on alpha. Reverting this patch as well as
> > "nvme: add separate poll queue map" fixes the problem.
>
> I don't see anything related to what hung, the trace, and so on.
> Can you clue me in? Where are the test results with dmesg?
>
> How to reproduce?

Think Guenter should've provided a full kerneltests.org url, but I had a
look and found this for powerpc with -next:
https://kerneltests.org/builders/next-powerpc-next/builds/998/steps/buildcommand/logs/stdio

Has useful logs of the build failure due to block.

(not seeing any -next failure for alpha but Guenter said he was using
qemu so the build failure could've been any arch qemu supports)

Mike

^ permalink raw reply [flat|nested] 5+ messages in thread
* nvme: utilize two queue maps, one for reads and one for writes
  2018-11-14  1:28 ` Mike Snitzer
@ 2018-11-14  1:36   ` Mike Snitzer
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Snitzer @ 2018-11-14  1:36 UTC (permalink / raw)

On Tue, Nov 13 2018 at 8:28pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Nov 13 2018 at 7:51pm -0500,
> Jens Axboe <axboe@kernel.dk> wrote:
>
> > On 11/13/18 5:41 PM, Guenter Roeck wrote:
> > > Hi,
> > >
> > > On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
> > >> NVMe does round-robin between queues by default, which means that
> > >> sharing a queue map for both reads and writes can be problematic
> > >> in terms of read servicing. It's much easier to flood the queue
> > >> with writes and reduce the read servicing.
> > >>
> > >> Implement two queue maps, one for reads and one for writes. The
> > >> write queue count is configurable through the 'write_queues'
> > >> parameter.
> > >>
> > >> By default, we retain the previous behavior of having a single
> > >> queue set, shared between reads and writes. Setting 'write_queues'
> > >> to a non-zero value will create two queue sets, one for reads and
> > >> one for writes, the latter using the configurable number of
> > >> queues (hardware queue counts permitting).
> > >>
> > >> Reviewed-by: Hannes Reinecke <hare@suse.com>
> > >> Reviewed-by: Keith Busch <keith.busch@intel.com>
> > >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > >
> > > This patch causes hangs when running recent versions of
> > > -next with several architectures; see the -next column at
> > > kerneltests.org/builders for details. Bisect log below; this
> > > was run with qemu on alpha. Reverting this patch as well as
> > > "nvme: add separate poll queue map" fixes the problem.
> >
> > I don't see anything related to what hung, the trace, and so on.
> > Can you clue me in? Where are the test results with dmesg?
> >
> > How to reproduce?
>
> Think Guenter should've provided a full kerneltests.org url, but I had a
> look and found this for powerpc with -next:
> https://kerneltests.org/builders/next-powerpc-next/builds/998/steps/buildcommand/logs/stdio
>
> Has useful logs of the build failure due to block.

Take that back, of course I only had a quick look and first scrolled to
this fragment and thought "yeap shows block build failure" (not _really_):

/opt/buildbot/slave/next-next/build/kernel/sched/psi.c: In function 'cgroup_move_task':
/opt/buildbot/slave/next-next/build/include/linux/spinlock.h:273:32: warning: 'rq' may be used uninitialized in this function [-Wmaybe-uninitialized]
 #define raw_spin_unlock(lock)  _raw_spin_unlock(lock)
                                ^~~~~~~~~~~~~~~~
/opt/buildbot/slave/next-next/build/kernel/sched/psi.c:639:13: note: 'rq' was declared here
  struct rq *rq;
             ^~

^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes
  2018-11-14  0:51 ` [PATCH] nvme: utilize two queue maps, one for reads and one for writes Jens Axboe
  2018-11-14  1:28   ` Mike Snitzer
@ 2018-11-14  4:52   ` Guenter Roeck
  2018-11-14 17:12     ` Jens Axboe
  1 sibling, 1 reply; 5+ messages in thread
From: Guenter Roeck @ 2018-11-14  4:52 UTC (permalink / raw)

On Tue, Nov 13, 2018 at 05:51:08PM -0700, Jens Axboe wrote:
> On 11/13/18 5:41 PM, Guenter Roeck wrote:
> > Hi,
> >
> > On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
> >> NVMe does round-robin between queues by default, which means that
> >> sharing a queue map for both reads and writes can be problematic
> >> in terms of read servicing. It's much easier to flood the queue
> >> with writes and reduce the read servicing.
> >>
> >> Implement two queue maps, one for reads and one for writes. The
> >> write queue count is configurable through the 'write_queues'
> >> parameter.
> >>
> >> By default, we retain the previous behavior of having a single
> >> queue set, shared between reads and writes. Setting 'write_queues'
> >> to a non-zero value will create two queue sets, one for reads and
> >> one for writes, the latter using the configurable number of
> >> queues (hardware queue counts permitting).
> >>
> >> Reviewed-by: Hannes Reinecke <hare@suse.com>
> >> Reviewed-by: Keith Busch <keith.busch@intel.com>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >
> > This patch causes hangs when running recent versions of
> > -next with several architectures; see the -next column at
> > kerneltests.org/builders for details. Bisect log below; this
> > was run with qemu on alpha. Reverting this patch as well as
> > "nvme: add separate poll queue map" fixes the problem.
>
> I don't see anything related to what hung, the trace, and so on.
> Can you clue me in? Where are the test results with dmesg?
>
alpha just stalls during boot. parisc reports a hung task
in nvme_reset_work. sparc64 reports EIO when instantiating
the nvme driver, called from nvme_reset_work, and then stalls.
In all three cases, reverting the two mentioned patches fixes
the problem.

https://kerneltests.org/builders/qemu-parisc-next/builds/173/steps/qemubuildcommand_1/logs/stdio

is an example log for parisc.

I didn't check if the other boot failures (ppc looks bad)
have the same root cause.

> How to reproduce?
>
parisc:

qemu-system-hppa -kernel vmlinux -no-reboot \
	-snapshot -device nvme,serial=foo,drive=d0 \
	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0,115200 ' \
	-nographic -monitor null

alpha:

qemu-system-alpha -M clipper -kernel arch/alpha/boot/vmlinux -no-reboot \
	-snapshot -device nvme,serial=foo,drive=d0 \
	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
	-m 128M -nographic -monitor null -serial stdio

sparc64:

qemu-system-sparc64 -M sun4u -cpu 'TI UltraSparc IIi' -m 512 \
	-snapshot -device nvme,serial=foo,drive=d0,bus=pciB \
	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
	-kernel arch/sparc/boot/image -no-reboot \
	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
	-nographic -monitor none

The root file systems are available from the respective subdirectories
of:

https://github.com/groeck/linux-build-test/tree/master/rootfs

Guenter

^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes
  2018-11-14  4:52 ` [PATCH] " Guenter Roeck
@ 2018-11-14 17:12   ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2018-11-14 17:12 UTC (permalink / raw)

On 11/13/18 9:52 PM, Guenter Roeck wrote:
> On Tue, Nov 13, 2018 at 05:51:08PM -0700, Jens Axboe wrote:
>> On 11/13/18 5:41 PM, Guenter Roeck wrote:
>>> Hi,
>>>
>>> On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
>>>> NVMe does round-robin between queues by default, which means that
>>>> sharing a queue map for both reads and writes can be problematic
>>>> in terms of read servicing. It's much easier to flood the queue
>>>> with writes and reduce the read servicing.
>>>>
>>>> Implement two queue maps, one for reads and one for writes. The
>>>> write queue count is configurable through the 'write_queues'
>>>> parameter.
>>>>
>>>> By default, we retain the previous behavior of having a single
>>>> queue set, shared between reads and writes. Setting 'write_queues'
>>>> to a non-zero value will create two queue sets, one for reads and
>>>> one for writes, the latter using the configurable number of
>>>> queues (hardware queue counts permitting).
>>>>
>>>> Reviewed-by: Hannes Reinecke <hare@suse.com>
>>>> Reviewed-by: Keith Busch <keith.busch@intel.com>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>
>>> This patch causes hangs when running recent versions of
>>> -next with several architectures; see the -next column at
>>> kerneltests.org/builders for details. Bisect log below; this
>>> was run with qemu on alpha. Reverting this patch as well as
>>> "nvme: add separate poll queue map" fixes the problem.
>>
>> I don't see anything related to what hung, the trace, and so on.
>> Can you clue me in? Where are the test results with dmesg?
>>
> alpha just stalls during boot. parisc reports a hung task
> in nvme_reset_work. sparc64 reports EIO when instantiating
> the nvme driver, called from nvme_reset_work, and then stalls.
> In all three cases, reverting the two mentioned patches fixes
> the problem.

I think the below patch should fix it.

> https://kerneltests.org/builders/qemu-parisc-next/builds/173/steps/qemubuildcommand_1/logs/stdio
>
> is an example log for parisc.
>
> I didn't check if the other boot failures (ppc looks bad)
> have the same root cause.
>
>> How to reproduce?
>>
> parisc:
>
> qemu-system-hppa -kernel vmlinux -no-reboot \
> 	-snapshot -device nvme,serial=foo,drive=d0 \
> 	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> 	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0,115200 ' \
> 	-nographic -monitor null
>
> alpha:
>
> qemu-system-alpha -M clipper -kernel arch/alpha/boot/vmlinux -no-reboot \
> 	-snapshot -device nvme,serial=foo,drive=d0 \
> 	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> 	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
> 	-m 128M -nographic -monitor null -serial stdio
>
> sparc64:
>
> qemu-system-sparc64 -M sun4u -cpu 'TI UltraSparc IIi' -m 512 \
> 	-snapshot -device nvme,serial=foo,drive=d0,bus=pciB \
> 	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> 	-kernel arch/sparc/boot/image -no-reboot \
> 	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
> 	-nographic -monitor none
>
> The root file systems are available from the respective subdirectories
> of:
>
> https://github.com/groeck/linux-build-test/tree/master/rootfs

This is useful, thanks! I haven't tried it yet, but I was able to
reproduce on x86 with MSI turned off.
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8df868afa363..6c03461ad988 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2098,7 +2098,7 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
                 .nr_sets = ARRAY_SIZE(irq_sets),
                 .sets = irq_sets,
         };
-        int result;
+        int result = 0;
 
         /*
          * For irq sets, we have to ask for minvec == maxvec. This passes
@@ -2113,9 +2113,16 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
                         affd.nr_sets = 1;
 
                 /*
-                 * Need IRQs for read+write queues, and one for the admin queue
+                 * Need IRQs for read+write queues, and one for the admin queue.
+                 * If we can't get more than one vector, we have to share the
+                 * admin queue and IO queue vector. For that case, don't add
+                 * an extra vector for the admin queue, or we'll continue
+                 * asking for 2 and get -ENOSPC in return.
                  */
-                nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
+                if (result == -ENOSPC && nr_io_queues == 1)
+                        nr_io_queues = 1;
+                else
+                        nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
 
                 result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
                                 nr_io_queues,

-- 
Jens Axboe

^ permalink raw reply related [flat|nested] 5+ messages in thread
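A note on the x86 reproduction mentioned above: Jens does not say how MSI was
turned off, so the command below is only a sketch of one plausible setup. It
mirrors Guenter's qemu invocations and relies on the standard 'pci=nomsi'
kernel parameter to disable MSI, which pushes the NVMe driver onto the shared
single-vector path the fix targets; the kernel image and root file system
paths are placeholders in the style of the earlier examples.

qemu-system-x86_64 -m 512 -kernel arch/x86/boot/bzImage -no-reboot \
	-snapshot -device nvme,serial=foo,drive=d0 \
	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0 pci=nomsi' \
	-nographic -monitor none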
Thread overview: 5+ messages
[not found] <20181114004148.GA29545@roeck-us.net>
2018-11-14 0:51 ` [PATCH] nvme: utilize two queue maps, one for reads and one for writes Jens Axboe
2018-11-14 1:28 ` Mike Snitzer
2018-11-14 1:36 ` Mike Snitzer
2018-11-14 4:52 ` [PATCH] " Guenter Roeck
2018-11-14 17:12 ` Jens Axboe