linux-nvme.lists.infradead.org archive mirror
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes
       [not found] <20181114004148.GA29545@roeck-us.net>
@ 2018-11-14  0:51 ` Jens Axboe
  2018-11-14  1:28   ` Mike Snitzer
  2018-11-14  4:52   ` [PATCH] " Guenter Roeck
  0 siblings, 2 replies; 5+ messages in thread
From: Jens Axboe @ 2018-11-14  0:51 UTC


On 11/13/18 5:41 PM, Guenter Roeck wrote:
> Hi,
> 
> On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
>> NVMe does round-robin between queues by default, which means that
>> sharing a queue map for both reads and writes can be problematic
>> in terms of read servicing. It's much easier to flood the queue
>> with writes and reduce the read servicing.
>>
>> Implement two queue maps, one for reads and one for writes. The
>> write queue count is configurable through the 'write_queues'
>> parameter.
>>
>> By default, we retain the previous behavior of having a single
>> queue set, shared between reads and writes. Setting 'write_queues'
>> to a non-zero value will create two queue sets, one for reads and
>> one for writes, the latter using the configurable number of
>> queues (hardware queue counts permitting).
>>
>> Reviewed-by: Hannes Reinecke <hare@suse.com>
>> Reviewed-by: Keith Busch <keith.busch@intel.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> This patch causes hangs when running recent versions of
> -next with several architectures; see the -next column at
> kerneltests.org/builders for details.  Bisect log below; this
> was run with qemu on alpha. Reverting this patch as well as
> "nvme: add separate poll queue map" fixes the problem.

I don't see anything related to what hung, the trace, and so on.
Can you clue me in? Where are the test results with dmesg?

How to reproduce?

-- 
Jens Axboe
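
As a rough illustration of the split the quoted commit message describes,
here is a minimal userspace sketch; the helper name, the shared-set default
and the clamping policy are assumptions for illustration, not code taken from
drivers/nvme/host/pci.c:

/* Sketch: split the granted hardware queues between reads and writes. */
#include <stdio.h>

struct queue_split {
	unsigned int read_queues;	/* queues serving reads */
	unsigned int write_queues;	/* queues serving writes */
};

/*
 * hw_queues:    number of hardware queues the controller granted
 * write_queues: the module parameter; 0 keeps the old shared behavior
 */
static struct queue_split split_queues(unsigned int hw_queues,
				       unsigned int write_queues)
{
	struct queue_split s;

	if (!write_queues || hw_queues < 2) {
		/* Default: one queue set shared by reads and writes. */
		s.read_queues = s.write_queues = hw_queues;
		return s;
	}

	/* Hardware queue counts permitting, dedicate queues to writes,
	 * but always leave at least one queue for reads. */
	if (write_queues >= hw_queues)
		write_queues = hw_queues - 1;

	s.write_queues = write_queues;
	s.read_queues = hw_queues - write_queues;
	return s;
}

int main(void)
{
	struct queue_split s = split_queues(8, 2);

	/* prints "reads: 6 queues, writes: 2 queues" */
	printf("reads: %u queues, writes: %u queues\n",
	       s.read_queues, s.write_queues);
	return 0;
}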


* nvme: utilize two queue maps, one for reads and one for writes
  2018-11-14  0:51 ` [PATCH] nvme: utilize two queue maps, one for reads and one for writes Jens Axboe
@ 2018-11-14  1:28   ` Mike Snitzer
  2018-11-14  1:36     ` Mike Snitzer
  2018-11-14  4:52   ` [PATCH] " Guenter Roeck
  1 sibling, 1 reply; 5+ messages in thread
From: Mike Snitzer @ 2018-11-14  1:28 UTC


On Tue, Nov 13 2018 at  7:51pm -0500,
Jens Axboe <axboe@kernel.dk> wrote:

> On 11/13/18 5:41 PM, Guenter Roeck wrote:
> > Hi,
> > 
> > On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
> >> NVMe does round-robin between queues by default, which means that
> >> sharing a queue map for both reads and writes can be problematic
> >> in terms of read servicing. It's much easier to flood the queue
> >> with writes and reduce the read servicing.
> >>
> >> Implement two queue maps, one for reads and one for writes. The
> >> write queue count is configurable through the 'write_queues'
> >> parameter.
> >>
> >> By default, we retain the previous behavior of having a single
> >> queue set, shared between reads and writes. Setting 'write_queues'
> >> to a non-zero value will create two queue sets, one for reads and
> >> one for writes, the latter using the configurable number of
> >> queues (hardware queue counts permitting).
> >>
> >> Reviewed-by: Hannes Reinecke <hare@suse.com>
> >> Reviewed-by: Keith Busch <keith.busch@intel.com>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > 
> > This patch causes hangs when running recent versions of
> > -next with several architectures; see the -next column at
> > kerneltests.org/builders for details.  Bisect log below; this
> > was run with qemu on alpha. Reverting this patch as well as
> > "nvme: add separate poll queue map" fixes the problem.
> 
> I don't see anything related to what hung, the trace, and so on.
> Can you clue me in? Where are the test results with dmesg?
> 
> How to reproduce?

I think Guenter should've provided a full kerneltests.org URL, but I had a
look and found this for powerpc with -next:
https://kerneltests.org/builders/next-powerpc-next/builds/998/steps/buildcommand/logs/stdio

It has useful logs of the build failure due to block.

(I'm not seeing any -next failure for alpha, but Guenter said he was using
qemu, so the failure could've been on any arch qemu supports.)

Mike


* nvme: utilize two queue maps, one for reads and one for writes
  2018-11-14  1:28   ` Mike Snitzer
@ 2018-11-14  1:36     ` Mike Snitzer
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Snitzer @ 2018-11-14  1:36 UTC


On Tue, Nov 13 2018 at  8:28pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Nov 13 2018 at  7:51pm -0500,
> Jens Axboe <axboe@kernel.dk> wrote:
> 
> > On 11/13/18 5:41 PM, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
> > >> NVMe does round-robin between queues by default, which means that
> > >> sharing a queue map for both reads and writes can be problematic
> > >> in terms of read servicing. It's much easier to flood the queue
> > >> with writes and reduce the read servicing.
> > >>
> > >> Implement two queue maps, one for reads and one for writes. The
> > >> write queue count is configurable through the 'write_queues'
> > >> parameter.
> > >>
> > >> By default, we retain the previous behavior of having a single
> > >> queue set, shared between reads and writes. Setting 'write_queues'
> > >> to a non-zero value will create two queue sets, one for reads and
> > >> one for writes, the latter using the configurable number of
> > >> queues (hardware queue counts permitting).
> > >>
> > >> Reviewed-by: Hannes Reinecke <hare@suse.com>
> > >> Reviewed-by: Keith Busch <keith.busch@intel.com>
> > >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > > 
> > > This patch causes hangs when running recent versions of
> > > -next with several architectures; see the -next column at
> > > kerneltests.org/builders for details.  Bisect log below; this
> > > was run with qemu on alpha. Reverting this patch as well as
> > > "nvme: add separate poll queue map" fixes the problem.
> > 
> > I don't see anything related to what hung, the trace, and so on.
> > Can you clue me in? Where are the test results with dmesg?
> > 
> > How to reproduce?
> 
> Think Guenter should've provided a full kerneltests.org url, but I had a
> look and found this for powerpc with -next:
> https://kerneltests.org/builders/next-powerpc-next/builds/998/steps/buildcommand/logs/stdio
> 
> Has useful logs of the build failure due to block.

I take that back; of course, I only had a quick look, scrolled to this
fragment first, and thought "yep, that shows a block build failure" (it
doesn't, really):

opt/buildbot/slave/next-next/build/kernel/sched/psi.c: In function 'cgroup_move_task':
/opt/buildbot/slave/next-next/build/include/linux/spinlock.h:273:32: warning: 'rq' may be used uninitialized in this function [-Wmaybe-uninitialized]
 #define raw_spin_unlock(lock)  _raw_spin_unlock(lock)
                                ^~~~~~~~~~~~~~~~
/opt/buildbot/slave/next-next/build/kernel/sched/psi.c:639:13: note: 'rq' was declared here
  struct rq *rq;
             ^~


* [PATCH] nvme: utilize two queue maps, one for reads and one for writes
  2018-11-14  0:51 ` [PATCH] nvme: utilize two queue maps, one for reads and one for writes Jens Axboe
  2018-11-14  1:28   ` Mike Snitzer
@ 2018-11-14  4:52   ` Guenter Roeck
  2018-11-14 17:12     ` Jens Axboe
  1 sibling, 1 reply; 5+ messages in thread
From: Guenter Roeck @ 2018-11-14  4:52 UTC


On Tue, Nov 13, 2018 at 05:51:08PM -0700, Jens Axboe wrote:
> On 11/13/18 5:41 PM, Guenter Roeck wrote:
> > Hi,
> > 
> > On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
> >> NVMe does round-robin between queues by default, which means that
> >> sharing a queue map for both reads and writes can be problematic
> >> in terms of read servicing. It's much easier to flood the queue
> >> with writes and reduce the read servicing.
> >>
> >> Implement two queue maps, one for reads and one for writes. The
> >> write queue count is configurable through the 'write_queues'
> >> parameter.
> >>
> >> By default, we retain the previous behavior of having a single
> >> queue set, shared between reads and writes. Setting 'write_queues'
> >> to a non-zero value will create two queue sets, one for reads and
> >> one for writes, the latter using the configurable number of
> >> queues (hardware queue counts permitting).
> >>
> >> Reviewed-by: Hannes Reinecke <hare@suse.com>
> >> Reviewed-by: Keith Busch <keith.busch@intel.com>
> >> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > 
> > This patch causes hangs when running recent versions of
> > -next with several architectures; see the -next column at
> > kerneltests.org/builders for details.  Bisect log below; this
> > was run with qemu on alpha. Reverting this patch as well as
> > "nvme: add separate poll queue map" fixes the problem.
> 
> I don't see anything related to what hung, the trace, and so on.
> Can you clue me in? Where are the test results with dmesg?
> 
alpha just stalls during boot. parisc reports a hung task
in nvme_reset_work. sparc64 reports EIO when instantiating
the nvme driver, called from nvme_reset_work, and then stalls.
In all three cases, reverting the two mentioned patches fixes
the problem.

https://kerneltests.org/builders/qemu-parisc-next/builds/173/steps/qemubuildcommand_1/logs/stdio

is an example log for parisc.

I didn't check if the other boot failures (ppc looks bad)
have the same root cause.

> How to reproduce?
> 
parisc:

qemu-system-hppa -kernel vmlinux -no-reboot \
	-snapshot -device nvme,serial=foo,drive=d0 \
	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0,115200 ' \
	-nographic -monitor null

alpha:

qemu-system-alpha -M clipper -kernel arch/alpha/boot/vmlinux -no-reboot \
	-snapshot -device nvme,serial=foo,drive=d0 \
	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
	-m 128M -nographic -monitor null -serial stdio

sparc64:

qemu-system-sparc64 -M sun4u -cpu 'TI UltraSparc IIi' -m 512 \
	-snapshot -device nvme,serial=foo,drive=d0,bus=pciB \
	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
	-kernel arch/sparc/boot/image -no-reboot \
	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
	-nographic -monitor none

The root file systems are available from the respective subdirectories
of:

https://github.com/groeck/linux-build-test/tree/master/rootfs

Guenter


* [PATCH] nvme: utilize two queue maps, one for reads and one for writes
  2018-11-14  4:52   ` [PATCH] " Guenter Roeck
@ 2018-11-14 17:12     ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2018-11-14 17:12 UTC


On 11/13/18 9:52 PM, Guenter Roeck wrote:
> On Tue, Nov 13, 2018 at 05:51:08PM -0700, Jens Axboe wrote:
>> On 11/13/18 5:41 PM, Guenter Roeck wrote:
>>> Hi,
>>>
>>> On Wed, Oct 31, 2018 at 08:36:31AM -0600, Jens Axboe wrote:
>>>> NVMe does round-robin between queues by default, which means that
>>>> sharing a queue map for both reads and writes can be problematic
>>>> in terms of read servicing. It's much easier to flood the queue
>>>> with writes and reduce the read servicing.
>>>>
>>>> Implement two queue maps, one for reads and one for writes. The
>>>> write queue count is configurable through the 'write_queues'
>>>> parameter.
>>>>
>>>> By default, we retain the previous behavior of having a single
>>>> queue set, shared between reads and writes. Setting 'write_queues'
>>>> to a non-zero value will create two queue sets, one for reads and
>>>> one for writes, the latter using the configurable number of
>>>> queues (hardware queue counts permitting).
>>>>
>>>> Reviewed-by: Hannes Reinecke <hare@suse.com>
>>>> Reviewed-by: Keith Busch <keith.busch@intel.com>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>
>>> This patch causes hangs when running recent versions of
>>> -next with several architectures; see the -next column at
>>> kerneltests.org/builders for details.  Bisect log below; this
>>> was run with qemu on alpha. Reverting this patch as well as
>>> "nvme: add separate poll queue map" fixes the problem.
>>
>> I don't see anything related to what hung, the trace, and so on.
>> Can you clue me in? Where are the test results with dmesg?
>>
> alpha just stalls during boot. parisc reports a hung task
> in nvme_reset_work. sparc64 reports EIO when instantiating
> the nvme driver, called from nvme_reset_work, and then stalls.
> In all three cases, reverting the two mentioned patches fixes
> the problem.

I think the below patch should fix it.

> https://kerneltests.org/builders/qemu-parisc-next/builds/173/steps/qemubuildcommand_1/logs/stdio
> 
> is an example log for parisc.
> 
> I didn't check if the other boot failures (ppc looks bad)
> have the same root cause.
> 
>> How to reproduce?
>>
> parisc:
> 
> qemu-system-hppa -kernel vmlinux -no-reboot \
> 	-snapshot -device nvme,serial=foo,drive=d0 \
> 	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> 	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0,115200 ' \
> 	-nographic -monitor null
> 
> alpha:
> 
> qemu-system-alpha -M clipper -kernel arch/alpha/boot/vmlinux -no-reboot \
> 	-snapshot -device nvme,serial=foo,drive=d0 \
> 	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> 	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
> 	-m 128M -nographic -monitor null -serial stdio
> 
> sparc64:
> 
> qemu-system-sparc64 -M sun4u -cpu 'TI UltraSparc IIi' -m 512 \
> 	-snapshot -device nvme,serial=foo,drive=d0,bus=pciB \
> 	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
> 	-kernel arch/sparc/boot/image -no-reboot \
> 	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
> 	-nographic -monitor none
> 
> The root file systems are available from the respective subdirectories
> of:
> 
> https://github.com/groeck/linux-build-test/tree/master/rootfs

This is useful, thanks! I haven't tried it yet, but I was able to
reproduce on x86 with MSI turned off.


diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8df868afa363..6c03461ad988 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2098,7 +2098,7 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 		.nr_sets = ARRAY_SIZE(irq_sets),
 		.sets = irq_sets,
 	};
-	int result;
+	int result = 0;
 
 	/*
 	 * For irq sets, we have to ask for minvec == maxvec. This passes
@@ -2113,9 +2113,16 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 			affd.nr_sets = 1;
 
 		/*
-		 * Need IRQs for read+write queues, and one for the admin queue
+		 * Need IRQs for read+write queues, and one for the admin queue.
+		 * If we can't get more than one vector, we have to share the
+		 * admin queue and IO queue vector. For that case, don't add
+		 * an extra vector for the admin queue, or we'll continue
+		 * asking for 2 and get -ENOSPC in return.
 		 */
-		nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
+		if (result == -ENOSPC && nr_io_queues == 1)
+			nr_io_queues = 1;
+		else
+			nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
 
 		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
 				nr_io_queues,
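
To make the failure mode concrete, here is a small userspace simulation of
the vector negotiation the hunk above changes. pci_alloc_irq_vectors_affinity()
is stubbed out, and the set sizes and retry policy are simplified, so this
only illustrates the -ENOSPC loop, not the driver's exact logic:

#include <errno.h>
#include <stdio.h>

/* Stub for pci_alloc_irq_vectors_affinity(): the "hardware" can only
 * hand out 'available' vectors in total. */
static int alloc_vectors(int want, int available)
{
	return want > available ? -ENOSPC : want;
}

static int setup_irqs(int nr_io_queues, int available, int apply_fix)
{
	int result = 0;
	int tries = 10;	/* guard so the unfixed variant cannot spin forever */

	while (tries--) {
		int read_queues = nr_io_queues - nr_io_queues / 2;
		int write_queues = nr_io_queues / 2;
		int want;

		/*
		 * The fix: once a request has already failed with -ENOSPC and
		 * we are down to a single IO queue, share the admin vector
		 * with the IO queue instead of asking for one extra vector
		 * (and getting -ENOSPC) again.
		 */
		if (apply_fix && result == -ENOSPC && nr_io_queues == 1)
			want = 1;
		else
			want = read_queues + write_queues + 1;

		result = alloc_vectors(want, available);
		if (result >= 0)
			return result;		/* vectors granted */
		if (nr_io_queues > 1)
			nr_io_queues--;		/* simplified: shrink and retry */
	}
	return result;	/* unfixed case with one vector: still -ENOSPC */
}

int main(void)
{
	/* Only one vector available, as on the failing qemu targets. */
	printf("with fix:    result = %d\n", setup_irqs(4, 1, 1));
	printf("without fix: result = %d (kept asking for 2 vectors)\n",
	       setup_irqs(4, 1, 0));
	return 0;
}

With only one vector available, the unfixed loop keeps requesting two vectors
(one IO plus one admin) and fails every time, which matches the boot stalls
Guenter reported; with the fix it falls back to a single shared vector.
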

-- 
Jens Axboe

