* [PATCH] nvme: utilize two queue maps, one for reads and one for writes [not found] <20181114004148.GA29545@roeck-us.net> @ 2018-11-14 0:51 ` Jens Axboe 2018-11-14 1:28 ` Mike Snitzer 2018-11-14 4:52 ` [PATCH] " Guenter Roeck 0 siblings, 2 replies; 15+ messages in thread From: Jens Axboe @ 2018-11-14 0:51 UTC (permalink / raw) On 11/13/18 5:41 PM, Guenter Roeck wrote: > Hi, > > On Wed, Oct 31, 2018@08:36:31AM -0600, Jens Axboe wrote: >> NVMe does round-robin between queues by default, which means that >> sharing a queue map for both reads and writes can be problematic >> in terms of read servicing. It's much easier to flood the queue >> with writes and reduce the read servicing. >> >> Implement two queue maps, one for reads and one for writes. The >> write queue count is configurable through the 'write_queues' >> parameter. >> >> By default, we retain the previous behavior of having a single >> queue set, shared between reads and writes. Setting 'write_queues' >> to a non-zero value will create two queue sets, one for reads and >> one for writes, the latter using the configurable number of >> queues (hardware queue counts permitting). >> >> Reviewed-by: Hannes Reinecke <hare at suse.com> >> Reviewed-by: Keith Busch <keith.busch at intel.com> >> Signed-off-by: Jens Axboe <axboe at kernel.dk> > > This patch causes hangs when running recent versions of > -next with several architectures; see the -next column at > kerneltests.org/builders for details. Bisect log below; this > was run with qemu on alpha. Reverting this patch as well as > "nvme: add separate poll queue map" fixes the problem. I don't see anything related to what hung, the trace, and so on. Can you clue me in? Where are the test results with dmesg? How to reproduce? -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* nvme: utilize two queue maps, one for reads and one for writes 2018-11-14 0:51 ` [PATCH] nvme: utilize two queue maps, one for reads and one for writes Jens Axboe @ 2018-11-14 1:28 ` Mike Snitzer 2018-11-14 1:36 ` Mike Snitzer 2018-11-14 4:52 ` [PATCH] " Guenter Roeck 1 sibling, 1 reply; 15+ messages in thread From: Mike Snitzer @ 2018-11-14 1:28 UTC (permalink / raw) On Tue, Nov 13 2018 at 7:51pm -0500, Jens Axboe <axboe@kernel.dk> wrote: > On 11/13/18 5:41 PM, Guenter Roeck wrote: > > Hi, > > > > On Wed, Oct 31, 2018@08:36:31AM -0600, Jens Axboe wrote: > >> NVMe does round-robin between queues by default, which means that > >> sharing a queue map for both reads and writes can be problematic > >> in terms of read servicing. It's much easier to flood the queue > >> with writes and reduce the read servicing. > >> > >> Implement two queue maps, one for reads and one for writes. The > >> write queue count is configurable through the 'write_queues' > >> parameter. > >> > >> By default, we retain the previous behavior of having a single > >> queue set, shared between reads and writes. Setting 'write_queues' > >> to a non-zero value will create two queue sets, one for reads and > >> one for writes, the latter using the configurable number of > >> queues (hardware queue counts permitting). > >> > >> Reviewed-by: Hannes Reinecke <hare at suse.com> > >> Reviewed-by: Keith Busch <keith.busch at intel.com> > >> Signed-off-by: Jens Axboe <axboe at kernel.dk> > > > > This patch causes hangs when running recent versions of > > -next with several architectures; see the -next column at > > kerneltests.org/builders for details. Bisect log below; this > > was run with qemu on alpha. Reverting this patch as well as > > "nvme: add separate poll queue map" fixes the problem. > > I don't see anything related to what hung, the trace, and so on. > Can you clue me in? Where are the test results with dmesg? > > How to reproduce? Think Guenter should've provided a full kerneltests.org url, but I had a look and found this for powerpc with -next: https://kerneltests.org/builders/next-powerpc-next/builds/998/steps/buildcommand/logs/stdio Has useful logs of the build failure due to block. (not seeing any -next failure for alpha but Guenter said he was using qemu so the build failure could've been any arch qemu supports) Mike ^ permalink raw reply [flat|nested] 15+ messages in thread
* nvme: utilize two queue maps, one for reads and one for writes 2018-11-14 1:28 ` Mike Snitzer @ 2018-11-14 1:36 ` Mike Snitzer 0 siblings, 0 replies; 15+ messages in thread From: Mike Snitzer @ 2018-11-14 1:36 UTC (permalink / raw) On Tue, Nov 13 2018 at 8:28pm -0500, Mike Snitzer <snitzer@redhat.com> wrote: > On Tue, Nov 13 2018 at 7:51pm -0500, > Jens Axboe <axboe@kernel.dk> wrote: > > > On 11/13/18 5:41 PM, Guenter Roeck wrote: > > > Hi, > > > > > > On Wed, Oct 31, 2018@08:36:31AM -0600, Jens Axboe wrote: > > >> NVMe does round-robin between queues by default, which means that > > >> sharing a queue map for both reads and writes can be problematic > > >> in terms of read servicing. It's much easier to flood the queue > > >> with writes and reduce the read servicing. > > >> > > >> Implement two queue maps, one for reads and one for writes. The > > >> write queue count is configurable through the 'write_queues' > > >> parameter. > > >> > > >> By default, we retain the previous behavior of having a single > > >> queue set, shared between reads and writes. Setting 'write_queues' > > >> to a non-zero value will create two queue sets, one for reads and > > >> one for writes, the latter using the configurable number of > > >> queues (hardware queue counts permitting). > > >> > > >> Reviewed-by: Hannes Reinecke <hare at suse.com> > > >> Reviewed-by: Keith Busch <keith.busch at intel.com> > > >> Signed-off-by: Jens Axboe <axboe at kernel.dk> > > > > > > This patch causes hangs when running recent versions of > > > -next with several architectures; see the -next column at > > > kerneltests.org/builders for details. Bisect log below; this > > > was run with qemu on alpha. Reverting this patch as well as > > > "nvme: add separate poll queue map" fixes the problem. > > > > I don't see anything related to what hung, the trace, and so on. > > Can you clue me in? Where are the test results with dmesg? > > > > How to reproduce? > > Think Guenter should've provided a full kerneltests.org url, but I had a > look and found this for powerpc with -next: > https://kerneltests.org/builders/next-powerpc-next/builds/998/steps/buildcommand/logs/stdio > > Has useful logs of the build failure due to block. Take that back, of course I only had a quick look and first scrolled to this fragment and thought "yeap shows block build failure" (not _really_): opt/buildbot/slave/next-next/build/kernel/sched/psi.c: In function 'cgroup_move_task': /opt/buildbot/slave/next-next/build/include/linux/spinlock.h:273:32: warning: 'rq' may be used uninitialized in this function [-Wmaybe-uninitialized] #define raw_spin_unlock(lock) _raw_spin_unlock(lock) ^~~~~~~~~~~~~~~~ /opt/buildbot/slave/next-next/build/kernel/sched/psi.c:639:13: note: 'rq' was declared here struct rq *rq; ^~ ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes
  2018-11-14  0:51 ` [PATCH] nvme: utilize two queue maps, one for reads and one for writes Jens Axboe
  2018-11-14  1:28   ` Mike Snitzer
@ 2018-11-14  4:52   ` Guenter Roeck
  2018-11-14 17:12     ` Jens Axboe
  1 sibling, 1 reply; 15+ messages in thread
From: Guenter Roeck @ 2018-11-14  4:52 UTC (permalink / raw)

On Tue, Nov 13, 2018@05:51:08PM -0700, Jens Axboe wrote:
> On 11/13/18 5:41 PM, Guenter Roeck wrote:
> > Hi,
> >
> > On Wed, Oct 31, 2018@08:36:31AM -0600, Jens Axboe wrote:
> >> NVMe does round-robin between queues by default, which means that
> >> sharing a queue map for both reads and writes can be problematic
> >> in terms of read servicing. It's much easier to flood the queue
> >> with writes and reduce the read servicing.
> >>
> >> Implement two queue maps, one for reads and one for writes. The
> >> write queue count is configurable through the 'write_queues'
> >> parameter.
> >>
> >> By default, we retain the previous behavior of having a single
> >> queue set, shared between reads and writes. Setting 'write_queues'
> >> to a non-zero value will create two queue sets, one for reads and
> >> one for writes, the latter using the configurable number of
> >> queues (hardware queue counts permitting).
> >>
> >> Reviewed-by: Hannes Reinecke <hare at suse.com>
> >> Reviewed-by: Keith Busch <keith.busch at intel.com>
> >> Signed-off-by: Jens Axboe <axboe at kernel.dk>
> >
> > This patch causes hangs when running recent versions of
> > -next with several architectures; see the -next column at
> > kerneltests.org/builders for details. Bisect log below; this
> > was run with qemu on alpha. Reverting this patch as well as
> > "nvme: add separate poll queue map" fixes the problem.
>
> I don't see anything related to what hung, the trace, and so on.
> Can you clue me in? Where are the test results with dmesg?
>
alpha just stalls during boot. parisc reports a hung task
in nvme_reset_work. sparc64 reports EIO when instantiating
the nvme driver, called from nvme_reset_work, and then stalls.
In all three cases, reverting the two mentioned patches fixes
the problem.

https://kerneltests.org/builders/qemu-parisc-next/builds/173/steps/qemubuildcommand_1/logs/stdio

is an example log for parisc.

I didn't check if the other boot failures (ppc looks bad)
have the same root cause.

> How to reproduce?
>
parisc:

qemu-system-hppa -kernel vmlinux -no-reboot \
	-snapshot -device nvme,serial=foo,drive=d0 \
	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0,115200 ' \
	-nographic -monitor null

alpha:

qemu-system-alpha -M clipper -kernel arch/alpha/boot/vmlinux -no-reboot \
	-snapshot -device nvme,serial=foo,drive=d0 \
	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
	-m 128M -nographic -monitor null -serial stdio

sparc64:

qemu-system-sparc64 -M sun4u -cpu 'TI UltraSparc IIi' -m 512 \
	-snapshot -device nvme,serial=foo,drive=d0,bus=pciB \
	-drive file=rootfs.ext2,if=none,format=raw,id=d0 \
	-kernel arch/sparc/boot/image -no-reboot \
	-append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \
	-nographic -monitor none

The root file systems are available from the respective subdirectories
of:

https://github.com/groeck/linux-build-test/tree/master/rootfs

Guenter

^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes 2018-11-14 4:52 ` [PATCH] " Guenter Roeck @ 2018-11-14 17:12 ` Jens Axboe 0 siblings, 0 replies; 15+ messages in thread From: Jens Axboe @ 2018-11-14 17:12 UTC (permalink / raw) On 11/13/18 9:52 PM, Guenter Roeck wrote: > On Tue, Nov 13, 2018@05:51:08PM -0700, Jens Axboe wrote: >> On 11/13/18 5:41 PM, Guenter Roeck wrote: >>> Hi, >>> >>> On Wed, Oct 31, 2018@08:36:31AM -0600, Jens Axboe wrote: >>>> NVMe does round-robin between queues by default, which means that >>>> sharing a queue map for both reads and writes can be problematic >>>> in terms of read servicing. It's much easier to flood the queue >>>> with writes and reduce the read servicing. >>>> >>>> Implement two queue maps, one for reads and one for writes. The >>>> write queue count is configurable through the 'write_queues' >>>> parameter. >>>> >>>> By default, we retain the previous behavior of having a single >>>> queue set, shared between reads and writes. Setting 'write_queues' >>>> to a non-zero value will create two queue sets, one for reads and >>>> one for writes, the latter using the configurable number of >>>> queues (hardware queue counts permitting). >>>> >>>> Reviewed-by: Hannes Reinecke <hare at suse.com> >>>> Reviewed-by: Keith Busch <keith.busch at intel.com> >>>> Signed-off-by: Jens Axboe <axboe at kernel.dk> >>> >>> This patch causes hangs when running recent versions of >>> -next with several architectures; see the -next column at >>> kerneltests.org/builders for details. Bisect log below; this >>> was run with qemu on alpha. Reverting this patch as well as >>> "nvme: add separate poll queue map" fixes the problem. >> >> I don't see anything related to what hung, the trace, and so on. >> Can you clue me in? Where are the test results with dmesg? >> > alpha just stalls during boot. parisc reports a hung task > in nvme_reset_work. sparc64 reports EIO when instantiating > the nvme driver, called from nvme_reset_work, and then stalls. > In all three cases, reverting the two mentioned patches fixes > the problem. I think the below patch should fix it. > https://kerneltests.org/builders/qemu-parisc-next/builds/173/steps/qemubuildcommand_1/logs/stdio > > is an example log for parisc. > > I didn't check if the other boot failures (ppc looks bad) > have the same root cause. > >> How to reproduce? >> > parisc: > > qemu-system-hppa -kernel vmlinux -no-reboot \ > -snapshot -device nvme,serial=foo,drive=d0 \ > -drive file=rootfs.ext2,if=none,format=raw,id=d0 \ > -append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0,115200 ' \ > -nographic -monitor null > > alpha: > > qemu-system-alpha -M clipper -kernel arch/alpha/boot/vmlinux -no-reboot \ > -snapshot -device nvme,serial=foo,drive=d0 \ > -drive file=rootfs.ext2,if=none,format=raw,id=d0 \ > -append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \ > -m 128M -nographic -monitor null -serial stdio > > sparc64: > > qemu-system-sparc64 -M sun4u -cpu 'TI UltraSparc IIi' -m 512 \ > -snapshot -device nvme,serial=foo,drive=d0,bus=pciB \ > -drive file=rootfs.ext2,if=none,format=raw,id=d0 \ > -kernel arch/sparc/boot/image -no-reboot \ > -append 'root=/dev/nvme0n1 rw rootwait panic=-1 console=ttyS0' \ > -nographic -monitor none > > The root file systems are available from the respective subdirectories > of: > > https://github.com/groeck/linux-build-test/tree/master/rootfs This is useful, thanks! I haven't tried it yet, but I was able to reproduce on x86 with MSI turned off. 
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8df868afa363..6c03461ad988 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2098,7 +2098,7 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 		.nr_sets = ARRAY_SIZE(irq_sets),
 		.sets = irq_sets,
 	};
-	int result;
+	int result = 0;
 
 	/*
 	 * For irq sets, we have to ask for minvec == maxvec. This passes
@@ -2113,9 +2113,16 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 			affd.nr_sets = 1;
 
 		/*
-		 * Need IRQs for read+write queues, and one for the admin queue
+		 * Need IRQs for read+write queues, and one for the admin queue.
+		 * If we can't get more than one vector, we have to share the
+		 * admin queue and IO queue vector. For that case, don't add
+		 * an extra vector for the admin queue, or we'll continue
+		 * asking for 2 and get -ENOSPC in return.
 		 */
-		nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
+		if (result == -ENOSPC && nr_io_queues == 1)
+			nr_io_queues = 1;
+		else
+			nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
 
 		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
 				nr_io_queues,

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 15+ messages in thread
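For readers without the driver source at hand, the hunks above only show fragments of nvme_setup_irqs(). The sketch below reconstructs the surrounding retry loop from the pieces quoted in this thread, as it would look after this first patch (the follow-up messages refine the error handling further). Anything not visible in the quoted hunks, in particular the set-size calculation and the allocation flags, is an assumption rather than the actual kernel code.

/*
 * Reconstructed sketch of the retry loop the hunks above modify; details
 * not visible in the quoted diffs are assumptions, not verbatim driver code.
 */
static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
{
	struct pci_dev *pdev = to_pci_dev(dev->dev);
	int irq_sets[2];
	struct irq_affinity affd = {
		.pre_vectors = 1,		/* one vector reserved for the admin queue */
		.nr_sets = ARRAY_SIZE(irq_sets),
		.sets = irq_sets,
	};
	int result = 0;	/* initialized: it is tested before the first allocation */

	for (;;) {
		/*
		 * Split nr_io_queues between the two sets; the real policy
		 * (and which index holds reads vs. writes) is not visible in
		 * the quoted hunks, so a trivial split stands in here.
		 */
		irq_sets[0] = (nr_io_queues + 1) / 2;
		irq_sets[1] = nr_io_queues - irq_sets[0];
		if (!irq_sets[1])
			affd.nr_sets = 1;

		/*
		 * Ask for read+write vectors plus one for the admin queue,
		 * unless a previous pass already failed with a single IO
		 * queue left: then keep asking for one shared vector so we
		 * don't bounce off -ENOSPC forever.
		 */
		if (result == -ENOSPC && nr_io_queues == 1)
			nr_io_queues = 1;
		else
			nr_io_queues = irq_sets[0] + irq_sets[1] + 1;

		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
				nr_io_queues,
				PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
		if (result == -ENOSPC) {
			/* too few vectors: shed one IO queue and retry */
			nr_io_queues--;
			if (!nr_io_queues)
				return result;
			continue;
		} else if (result <= 0)
			return -EIO;
		break;
	}

	return result;
}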
[parent not found: <20181115182833.GA15729@roeck-us.net>]
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes [not found] <20181115182833.GA15729@roeck-us.net> @ 2018-11-15 18:38 ` Jens Axboe 0 siblings, 0 replies; 15+ messages in thread From: Jens Axboe @ 2018-11-15 18:38 UTC (permalink / raw) On 11/15/18 11:28 AM, Guenter Roeck wrote: > Hi Jens, > >> I think the below patch should fix it. >> > Sorry I wasn't able to test this earlier. Looks like it does > fix the problem; the problem is no longer seen in next-20181115. > Minor comment below. That's fine, thanks for testing! >> /* >> - * Need IRQs for read+write queues, and one for the admin queue >> + * Need IRQs for read+write queues, and one for the admin queue. >> + * If we can't get more than one vector, we have to share the >> + * admin queue and IO queue vector. For that case, don't add >> + * an extra vector for the admin queue, or we'll continue >> + * asking for 2 and get -ENOSPC in return. >> */ >> - nr_io_queues = irq_sets[0] + irq_sets[1] + 1; >> + if (result == -ENOSPC && nr_io_queues == 1) >> + nr_io_queues = 1; > > Setting nr_io_queues to 1 when it already is set to 1 doesn't really do > anything. Is this for clarification ? Guess that does look a bit odd, alternative would be to flip the condition, but I think this one is easier to read. -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
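To make the readability trade-off being discussed concrete, here are the two equivalent shapes of that test, simplified from the hunk above; only the two surrounding statements are shown, and the second form anticipates the flipped condition adopted in the follow-up patch further down the thread.

/* Shape used in the patch: the self-assignment spells out the intent. */
if (result == -ENOSPC && nr_io_queues == 1)
	nr_io_queues = 1;	/* keep sharing one vector between admin and IO */
else
	nr_io_queues = irq_sets[0] + irq_sets[1] + 1;

/*
 * Equivalent shape with the condition flipped, as later used in the
 * follow-up patch (there with `result < 0` instead of `== -ENOSPC`).
 */
if (!(result == -ENOSPC && nr_io_queues == 1))
	nr_io_queues = irq_sets[0] + irq_sets[1] + 1;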
[parent not found: <20181115191126.GA16973@roeck-us.net>]
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes [not found] <20181115191126.GA16973@roeck-us.net> @ 2018-11-15 19:29 ` Jens Axboe 2018-11-15 19:38 ` Guenter Roeck 2018-11-15 19:36 ` Guenter Roeck 1 sibling, 1 reply; 15+ messages in thread From: Jens Axboe @ 2018-11-15 19:29 UTC (permalink / raw) On 11/15/18 12:11 PM, Guenter Roeck wrote: > On Wed, Nov 14, 2018@10:12:44AM -0700, Jens Axboe wrote: >> >> I think the below patch should fix it. >> > > I spoke too early. sparc64, next-20181115: > > [ 14.204370] nvme nvme0: pci function 0000:02:00.0 > [ 14.249956] nvme nvme0: Removing after probe failure status: -5 > [ 14.263496] ------------[ cut here ]------------ > [ 14.263913] WARNING: CPU: 0 PID: 15 at kernel/irq/manage.c:1597 __free_irq+0xa4/0x320 > [ 14.264265] Trying to free already-free IRQ 9 > [ 14.264519] Modules linked in: > [ 14.264961] CPU: 0 PID: 15 Comm: kworker/u2:1 Not tainted 4.20.0-rc2-next-20181115 #1 > [ 14.265555] Workqueue: nvme-reset-wq nvme_reset_work > [ 14.265899] Call Trace: > [ 14.266118] [000000000046944c] __warn+0xcc/0x100 > [ 14.266375] [00000000004694b0] warn_slowpath_fmt+0x30/0x40 > [ 14.266635] [00000000004d4ce4] __free_irq+0xa4/0x320 > [ 14.266867] [00000000004d4ff8] free_irq+0x38/0x80 > [ 14.267092] [00000000007b1874] pci_free_irq+0x14/0x40 > [ 14.267327] [00000000008a5444] nvme_dev_disable+0xe4/0x520 > [ 14.267576] [00000000008a69b8] nvme_reset_work+0x138/0x1c60 > [ 14.267827] [0000000000488dd0] process_one_work+0x230/0x6e0 > [ 14.268079] [00000000004894f4] worker_thread+0x274/0x520 > [ 14.268321] [0000000000490624] kthread+0xe4/0x120 > [ 14.268544] [00000000004060c4] ret_from_fork+0x1c/0x2c > [ 14.268825] [0000000000000000] (null) > [ 14.269089] irq event stamp: 32796 > [ 14.269350] hardirqs last enabled at (32795): [<0000000000b624a4>] _raw_spin_unlock_irqrestore+0x24/0x80 > [ 14.269757] hardirqs last disabled at (32796): [<0000000000b622f4>] _raw_spin_lock_irqsave+0x14/0x60 > [ 14.270566] softirqs last enabled at (32780): [<0000000000b64c18>] __do_softirq+0x238/0x520 > [ 14.271206] softirqs last disabled at (32729): [<000000000042ceec>] do_softirq_own_stack+0x2c/0x40 > [ 14.272288] ---[ end trace cb79ccd2a0a03f3c ]--- > > Looks like an error during probe followed by an error cleanup problem. Did it previous probe fine? Or is the new thing just the fact that we spew a warning on trying to free a non-existing vector? -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes 2018-11-15 19:29 ` Jens Axboe @ 2018-11-15 19:38 ` Guenter Roeck 2018-11-15 19:40 ` Jens Axboe 0 siblings, 1 reply; 15+ messages in thread From: Guenter Roeck @ 2018-11-15 19:38 UTC (permalink / raw) On Thu, Nov 15, 2018@12:29:04PM -0700, Jens Axboe wrote: > On 11/15/18 12:11 PM, Guenter Roeck wrote: > > On Wed, Nov 14, 2018@10:12:44AM -0700, Jens Axboe wrote: > >> > >> I think the below patch should fix it. > >> > > > > I spoke too early. sparc64, next-20181115: > > > > [ 14.204370] nvme nvme0: pci function 0000:02:00.0 > > [ 14.249956] nvme nvme0: Removing after probe failure status: -5 > > [ 14.263496] ------------[ cut here ]------------ > > [ 14.263913] WARNING: CPU: 0 PID: 15 at kernel/irq/manage.c:1597 __free_irq+0xa4/0x320 > > [ 14.264265] Trying to free already-free IRQ 9 > > [ 14.264519] Modules linked in: > > [ 14.264961] CPU: 0 PID: 15 Comm: kworker/u2:1 Not tainted 4.20.0-rc2-next-20181115 #1 > > [ 14.265555] Workqueue: nvme-reset-wq nvme_reset_work > > [ 14.265899] Call Trace: > > [ 14.266118] [000000000046944c] __warn+0xcc/0x100 > > [ 14.266375] [00000000004694b0] warn_slowpath_fmt+0x30/0x40 > > [ 14.266635] [00000000004d4ce4] __free_irq+0xa4/0x320 > > [ 14.266867] [00000000004d4ff8] free_irq+0x38/0x80 > > [ 14.267092] [00000000007b1874] pci_free_irq+0x14/0x40 > > [ 14.267327] [00000000008a5444] nvme_dev_disable+0xe4/0x520 > > [ 14.267576] [00000000008a69b8] nvme_reset_work+0x138/0x1c60 > > [ 14.267827] [0000000000488dd0] process_one_work+0x230/0x6e0 > > [ 14.268079] [00000000004894f4] worker_thread+0x274/0x520 > > [ 14.268321] [0000000000490624] kthread+0xe4/0x120 > > [ 14.268544] [00000000004060c4] ret_from_fork+0x1c/0x2c > > [ 14.268825] [0000000000000000] (null) > > [ 14.269089] irq event stamp: 32796 > > [ 14.269350] hardirqs last enabled at (32795): [<0000000000b624a4>] _raw_spin_unlock_irqrestore+0x24/0x80 > > [ 14.269757] hardirqs last disabled at (32796): [<0000000000b622f4>] _raw_spin_lock_irqsave+0x14/0x60 > > [ 14.270566] softirqs last enabled at (32780): [<0000000000b64c18>] __do_softirq+0x238/0x520 > > [ 14.271206] softirqs last disabled at (32729): [<000000000042ceec>] do_softirq_own_stack+0x2c/0x40 > > [ 14.272288] ---[ end trace cb79ccd2a0a03f3c ]--- > > > > Looks like an error during probe followed by an error cleanup problem. > > Did it previous probe fine? Or is the new thing just the fact that > we spew a warning on trying to free a non-existing vector? > This works fine in mainline, if that is your question. Guenter ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes 2018-11-15 19:38 ` Guenter Roeck @ 2018-11-15 19:40 ` Jens Axboe 2018-11-15 19:43 ` Jens Axboe 0 siblings, 1 reply; 15+ messages in thread From: Jens Axboe @ 2018-11-15 19:40 UTC (permalink / raw) On 11/15/18 12:38 PM, Guenter Roeck wrote: > On Thu, Nov 15, 2018@12:29:04PM -0700, Jens Axboe wrote: >> On 11/15/18 12:11 PM, Guenter Roeck wrote: >>> On Wed, Nov 14, 2018@10:12:44AM -0700, Jens Axboe wrote: >>>> >>>> I think the below patch should fix it. >>>> >>> >>> I spoke too early. sparc64, next-20181115: >>> >>> [ 14.204370] nvme nvme0: pci function 0000:02:00.0 >>> [ 14.249956] nvme nvme0: Removing after probe failure status: -5 >>> [ 14.263496] ------------[ cut here ]------------ >>> [ 14.263913] WARNING: CPU: 0 PID: 15 at kernel/irq/manage.c:1597 __free_irq+0xa4/0x320 >>> [ 14.264265] Trying to free already-free IRQ 9 >>> [ 14.264519] Modules linked in: >>> [ 14.264961] CPU: 0 PID: 15 Comm: kworker/u2:1 Not tainted 4.20.0-rc2-next-20181115 #1 >>> [ 14.265555] Workqueue: nvme-reset-wq nvme_reset_work >>> [ 14.265899] Call Trace: >>> [ 14.266118] [000000000046944c] __warn+0xcc/0x100 >>> [ 14.266375] [00000000004694b0] warn_slowpath_fmt+0x30/0x40 >>> [ 14.266635] [00000000004d4ce4] __free_irq+0xa4/0x320 >>> [ 14.266867] [00000000004d4ff8] free_irq+0x38/0x80 >>> [ 14.267092] [00000000007b1874] pci_free_irq+0x14/0x40 >>> [ 14.267327] [00000000008a5444] nvme_dev_disable+0xe4/0x520 >>> [ 14.267576] [00000000008a69b8] nvme_reset_work+0x138/0x1c60 >>> [ 14.267827] [0000000000488dd0] process_one_work+0x230/0x6e0 >>> [ 14.268079] [00000000004894f4] worker_thread+0x274/0x520 >>> [ 14.268321] [0000000000490624] kthread+0xe4/0x120 >>> [ 14.268544] [00000000004060c4] ret_from_fork+0x1c/0x2c >>> [ 14.268825] [0000000000000000] (null) >>> [ 14.269089] irq event stamp: 32796 >>> [ 14.269350] hardirqs last enabled at (32795): [<0000000000b624a4>] _raw_spin_unlock_irqrestore+0x24/0x80 >>> [ 14.269757] hardirqs last disabled at (32796): [<0000000000b622f4>] _raw_spin_lock_irqsave+0x14/0x60 >>> [ 14.270566] softirqs last enabled at (32780): [<0000000000b64c18>] __do_softirq+0x238/0x520 >>> [ 14.271206] softirqs last disabled at (32729): [<000000000042ceec>] do_softirq_own_stack+0x2c/0x40 >>> [ 14.272288] ---[ end trace cb79ccd2a0a03f3c ]--- >>> >>> Looks like an error during probe followed by an error cleanup problem. >> >> Did it previous probe fine? Or is the new thing just the fact that >> we spew a warning on trying to free a non-existing vector? >> > This works fine in mainline, if that is your question. Yeah, as soon as I sent the other email I realized that. Let me send you a quick patch. -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes
  2018-11-15 19:40 ` Jens Axboe
@ 2018-11-15 19:43   ` Jens Axboe
  2018-11-15 22:06     ` Guenter Roeck
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2018-11-15 19:43 UTC (permalink / raw)

On 11/15/18 12:40 PM, Jens Axboe wrote:
> On 11/15/18 12:38 PM, Guenter Roeck wrote:
>> On Thu, Nov 15, 2018@12:29:04PM -0700, Jens Axboe wrote:
>>> On 11/15/18 12:11 PM, Guenter Roeck wrote:
>>>> On Wed, Nov 14, 2018@10:12:44AM -0700, Jens Axboe wrote:
>>>>>
>>>>> I think the below patch should fix it.
>>>>>
>>>>
>>>> I spoke too early. sparc64, next-20181115:
>>>>
>>>> [ 14.204370] nvme nvme0: pci function 0000:02:00.0
>>>> [ 14.249956] nvme nvme0: Removing after probe failure status: -5
>>>> [ 14.263496] ------------[ cut here ]------------
>>>> [ 14.263913] WARNING: CPU: 0 PID: 15 at kernel/irq/manage.c:1597 __free_irq+0xa4/0x320
>>>> [ 14.264265] Trying to free already-free IRQ 9
>>>> [ 14.264519] Modules linked in:
>>>> [ 14.264961] CPU: 0 PID: 15 Comm: kworker/u2:1 Not tainted 4.20.0-rc2-next-20181115 #1
>>>> [ 14.265555] Workqueue: nvme-reset-wq nvme_reset_work
>>>> [ 14.265899] Call Trace:
>>>> [ 14.266118] [000000000046944c] __warn+0xcc/0x100
>>>> [ 14.266375] [00000000004694b0] warn_slowpath_fmt+0x30/0x40
>>>> [ 14.266635] [00000000004d4ce4] __free_irq+0xa4/0x320
>>>> [ 14.266867] [00000000004d4ff8] free_irq+0x38/0x80
>>>> [ 14.267092] [00000000007b1874] pci_free_irq+0x14/0x40
>>>> [ 14.267327] [00000000008a5444] nvme_dev_disable+0xe4/0x520
>>>> [ 14.267576] [00000000008a69b8] nvme_reset_work+0x138/0x1c60
>>>> [ 14.267827] [0000000000488dd0] process_one_work+0x230/0x6e0
>>>> [ 14.268079] [00000000004894f4] worker_thread+0x274/0x520
>>>> [ 14.268321] [0000000000490624] kthread+0xe4/0x120
>>>> [ 14.268544] [00000000004060c4] ret_from_fork+0x1c/0x2c
>>>> [ 14.268825] [0000000000000000] (null)
>>>> [ 14.269089] irq event stamp: 32796
>>>> [ 14.269350] hardirqs last enabled at (32795): [<0000000000b624a4>] _raw_spin_unlock_irqrestore+0x24/0x80
>>>> [ 14.269757] hardirqs last disabled at (32796): [<0000000000b622f4>] _raw_spin_lock_irqsave+0x14/0x60
>>>> [ 14.270566] softirqs last enabled at (32780): [<0000000000b64c18>] __do_softirq+0x238/0x520
>>>> [ 14.271206] softirqs last disabled at (32729): [<000000000042ceec>] do_softirq_own_stack+0x2c/0x40
>>>> [ 14.272288] ---[ end trace cb79ccd2a0a03f3c ]---
>>>>
>>>> Looks like an error during probe followed by an error cleanup problem.
>>>
>>> Did it previous probe fine? Or is the new thing just the fact that
>>> we spew a warning on trying to free a non-existing vector?
>>>
>> This works fine in mainline, if that is your question.
>
> Yeah, as soon as I sent the other email I realized that. Let me send
> you a quick patch.

How's this?


diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ffbab5b01df4..fd73bfd2d1be 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2088,15 +2088,11 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 			affd.nr_sets = 1;
 
 		/*
-		 * Need IRQs for read+write queues, and one for the admin queue.
-		 * If we can't get more than one vector, we have to share the
-		 * admin queue and IO queue vector. For that case, don't add
-		 * an extra vector for the admin queue, or we'll continue
-		 * asking for 2 and get -ENOSPC in return.
+		 * If we got a failure and we're down to asking for just
+		 * 1 + 1 queues, just ask for a single vector. We'll share
+		 * that between the single IO queue and the admin queue.
 		 */
-		if (result == -ENOSPC && nr_io_queues == 1)
-			nr_io_queues = 1;
-		else
+		if (!(result < 0 && nr_io_queues == 1))
 			nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
 
 		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 15+ messages in thread
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes 2018-11-15 19:43 ` Jens Axboe @ 2018-11-15 22:06 ` Guenter Roeck 2018-11-15 22:12 ` Jens Axboe 0 siblings, 1 reply; 15+ messages in thread From: Guenter Roeck @ 2018-11-15 22:06 UTC (permalink / raw) On Thu, Nov 15, 2018@12:43:40PM -0700, Jens Axboe wrote: > On 11/15/18 12:40 PM, Jens Axboe wrote: > > On 11/15/18 12:38 PM, Guenter Roeck wrote: > >> On Thu, Nov 15, 2018@12:29:04PM -0700, Jens Axboe wrote: > >>> On 11/15/18 12:11 PM, Guenter Roeck wrote: > >>>> On Wed, Nov 14, 2018@10:12:44AM -0700, Jens Axboe wrote: > >>>>> > >>>>> I think the below patch should fix it. > >>>>> > >>>> > >>>> I spoke too early. sparc64, next-20181115: > >>>> > >>>> [ 14.204370] nvme nvme0: pci function 0000:02:00.0 > >>>> [ 14.249956] nvme nvme0: Removing after probe failure status: -5 > >>>> [ 14.263496] ------------[ cut here ]------------ > >>>> [ 14.263913] WARNING: CPU: 0 PID: 15 at kernel/irq/manage.c:1597 __free_irq+0xa4/0x320 > >>>> [ 14.264265] Trying to free already-free IRQ 9 > >>>> [ 14.264519] Modules linked in: > >>>> [ 14.264961] CPU: 0 PID: 15 Comm: kworker/u2:1 Not tainted 4.20.0-rc2-next-20181115 #1 > >>>> [ 14.265555] Workqueue: nvme-reset-wq nvme_reset_work > >>>> [ 14.265899] Call Trace: > >>>> [ 14.266118] [000000000046944c] __warn+0xcc/0x100 > >>>> [ 14.266375] [00000000004694b0] warn_slowpath_fmt+0x30/0x40 > >>>> [ 14.266635] [00000000004d4ce4] __free_irq+0xa4/0x320 > >>>> [ 14.266867] [00000000004d4ff8] free_irq+0x38/0x80 > >>>> [ 14.267092] [00000000007b1874] pci_free_irq+0x14/0x40 > >>>> [ 14.267327] [00000000008a5444] nvme_dev_disable+0xe4/0x520 > >>>> [ 14.267576] [00000000008a69b8] nvme_reset_work+0x138/0x1c60 > >>>> [ 14.267827] [0000000000488dd0] process_one_work+0x230/0x6e0 > >>>> [ 14.268079] [00000000004894f4] worker_thread+0x274/0x520 > >>>> [ 14.268321] [0000000000490624] kthread+0xe4/0x120 > >>>> [ 14.268544] [00000000004060c4] ret_from_fork+0x1c/0x2c > >>>> [ 14.268825] [0000000000000000] (null) > >>>> [ 14.269089] irq event stamp: 32796 > >>>> [ 14.269350] hardirqs last enabled at (32795): [<0000000000b624a4>] _raw_spin_unlock_irqrestore+0x24/0x80 > >>>> [ 14.269757] hardirqs last disabled at (32796): [<0000000000b622f4>] _raw_spin_lock_irqsave+0x14/0x60 > >>>> [ 14.270566] softirqs last enabled at (32780): [<0000000000b64c18>] __do_softirq+0x238/0x520 > >>>> [ 14.271206] softirqs last disabled at (32729): [<000000000042ceec>] do_softirq_own_stack+0x2c/0x40 > >>>> [ 14.272288] ---[ end trace cb79ccd2a0a03f3c ]--- > >>>> > >>>> Looks like an error during probe followed by an error cleanup problem. > >>> > >>> Did it previous probe fine? Or is the new thing just the fact that > >>> we spew a warning on trying to free a non-existing vector? > >>> > >> This works fine in mainline, if that is your question. > > > > Yeah, as soon as I sent the other email I realized that. Let me send > > you a quick patch. > > How's this? > > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > index ffbab5b01df4..fd73bfd2d1be 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -2088,15 +2088,11 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues) > affd.nr_sets = 1; > > /* > - * Need IRQs for read+write queues, and one for the admin queue. > - * If we can't get more than one vector, we have to share the > - * admin queue and IO queue vector. 
For that case, don't add > - * an extra vector for the admin queue, or we'll continue > - * asking for 2 and get -ENOSPC in return. > + * If we got a failure and we're down to asking for just > + * 1 + 1 queues, just ask for a single vector. We'll share > + * that between the single IO queue and the admin queue. > */ > - if (result == -ENOSPC && nr_io_queues == 1) > - nr_io_queues = 1; > - else > + if (!(result < 0 && nr_io_queues == 1)) > nr_io_queues = irq_sets[0] + irq_sets[1] + 1; > Unfortunately, the code doesn't even get here because the call of pci_alloc_irq_vectors_affinity in the first iteration fails with -EINVAL, which results in an immediate return with -EIO. Guenter > result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues, > > -- > Jens Axboe > ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes
  2018-11-15 22:06 ` Guenter Roeck
@ 2018-11-15 22:12   ` Jens Axboe
  0 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2018-11-15 22:12 UTC (permalink / raw)

On 11/15/18 3:06 PM, Guenter Roeck wrote:
> On Thu, Nov 15, 2018@12:43:40PM -0700, Jens Axboe wrote:
>> On 11/15/18 12:40 PM, Jens Axboe wrote:
>>> On 11/15/18 12:38 PM, Guenter Roeck wrote:
>>>> On Thu, Nov 15, 2018@12:29:04PM -0700, Jens Axboe wrote:
>>>>> On 11/15/18 12:11 PM, Guenter Roeck wrote:
>>>>>> On Wed, Nov 14, 2018@10:12:44AM -0700, Jens Axboe wrote:
>>>>>>>
>>>>>>> I think the below patch should fix it.
>>>>>>>
>>>>>>
>>>>>> I spoke too early. sparc64, next-20181115:
>>>>>>
>>>>>> [ 14.204370] nvme nvme0: pci function 0000:02:00.0
>>>>>> [ 14.249956] nvme nvme0: Removing after probe failure status: -5
>>>>>> [ 14.263496] ------------[ cut here ]------------
>>>>>> [ 14.263913] WARNING: CPU: 0 PID: 15 at kernel/irq/manage.c:1597 __free_irq+0xa4/0x320
>>>>>> [ 14.264265] Trying to free already-free IRQ 9
>>>>>> [ 14.264519] Modules linked in:
>>>>>> [ 14.264961] CPU: 0 PID: 15 Comm: kworker/u2:1 Not tainted 4.20.0-rc2-next-20181115 #1
>>>>>> [ 14.265555] Workqueue: nvme-reset-wq nvme_reset_work
>>>>>> [ 14.265899] Call Trace:
>>>>>> [ 14.266118] [000000000046944c] __warn+0xcc/0x100
>>>>>> [ 14.266375] [00000000004694b0] warn_slowpath_fmt+0x30/0x40
>>>>>> [ 14.266635] [00000000004d4ce4] __free_irq+0xa4/0x320
>>>>>> [ 14.266867] [00000000004d4ff8] free_irq+0x38/0x80
>>>>>> [ 14.267092] [00000000007b1874] pci_free_irq+0x14/0x40
>>>>>> [ 14.267327] [00000000008a5444] nvme_dev_disable+0xe4/0x520
>>>>>> [ 14.267576] [00000000008a69b8] nvme_reset_work+0x138/0x1c60
>>>>>> [ 14.267827] [0000000000488dd0] process_one_work+0x230/0x6e0
>>>>>> [ 14.268079] [00000000004894f4] worker_thread+0x274/0x520
>>>>>> [ 14.268321] [0000000000490624] kthread+0xe4/0x120
>>>>>> [ 14.268544] [00000000004060c4] ret_from_fork+0x1c/0x2c
>>>>>> [ 14.268825] [0000000000000000] (null)
>>>>>> [ 14.269089] irq event stamp: 32796
>>>>>> [ 14.269350] hardirqs last enabled at (32795): [<0000000000b624a4>] _raw_spin_unlock_irqrestore+0x24/0x80
>>>>>> [ 14.269757] hardirqs last disabled at (32796): [<0000000000b622f4>] _raw_spin_lock_irqsave+0x14/0x60
>>>>>> [ 14.270566] softirqs last enabled at (32780): [<0000000000b64c18>] __do_softirq+0x238/0x520
>>>>>> [ 14.271206] softirqs last disabled at (32729): [<000000000042ceec>] do_softirq_own_stack+0x2c/0x40
>>>>>> [ 14.272288] ---[ end trace cb79ccd2a0a03f3c ]---
>>>>>>
>>>>>> Looks like an error during probe followed by an error cleanup problem.
>>>>>
>>>>> Did it previous probe fine? Or is the new thing just the fact that
>>>>> we spew a warning on trying to free a non-existing vector?
>>>>>
>>>> This works fine in mainline, if that is your question.
>>>
>>> Yeah, as soon as I sent the other email I realized that. Let me send
>>> you a quick patch.
>>
>> How's this?
>>
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index ffbab5b01df4..fd73bfd2d1be 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -2088,15 +2088,11 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
>>  			affd.nr_sets = 1;
>>  
>>  		/*
>> -		 * Need IRQs for read+write queues, and one for the admin queue.
>> -		 * If we can't get more than one vector, we have to share the
>> -		 * admin queue and IO queue vector. For that case, don't add
>> -		 * an extra vector for the admin queue, or we'll continue
>> -		 * asking for 2 and get -ENOSPC in return.
>> +		 * If we got a failure and we're down to asking for just
>> +		 * 1 + 1 queues, just ask for a single vector. We'll share
>> +		 * that between the single IO queue and the admin queue.
>>  		 */
>> -		if (result == -ENOSPC && nr_io_queues == 1)
>> -			nr_io_queues = 1;
>> -		else
>> +		if (!(result < 0 && nr_io_queues == 1))
>>  			nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
>>
>
> Unfortunately, the code doesn't even get here because the call of
> pci_alloc_irq_vectors_affinity in the first iteration fails with
> -EINVAL, which results in an immediate return with -EIO.

Oh yeah... How about this then?


diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index ffbab5b01df4..4d161daa9c3a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2088,15 +2088,11 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 			affd.nr_sets = 1;
 
 		/*
-		 * Need IRQs for read+write queues, and one for the admin queue.
-		 * If we can't get more than one vector, we have to share the
-		 * admin queue and IO queue vector. For that case, don't add
-		 * an extra vector for the admin queue, or we'll continue
-		 * asking for 2 and get -ENOSPC in return.
+		 * If we got a failure and we're down to asking for just
+		 * 1 + 1 queues, just ask for a single vector. We'll share
+		 * that between the single IO queue and the admin queue.
 		 */
-		if (result == -ENOSPC && nr_io_queues == 1)
-			nr_io_queues = 1;
-		else
+		if (!(result < 0 && nr_io_queues == 1))
 			nr_io_queues = irq_sets[0] + irq_sets[1] + 1;
 
 		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
@@ -2111,6 +2107,9 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues)
 			if (!nr_io_queues)
 				return result;
 			continue;
+		} else if (result == -EINVAL) {
+			nr_io_queues = 1;
+			continue;
 		} else if (result <= 0)
 			return -EIO;
 		break;

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 15+ messages in thread
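Taken together, the error handling at the bottom of the retry loop now distinguishes three cases. The annotated excerpt below restates the logic of the final diff; the comments are editorial and map each branch to the failure modes reported in this thread, while the allocation flags shown are an assumption (they are not visible in the quoted context lines).

		result = pci_alloc_irq_vectors_affinity(pdev, nr_io_queues,
				nr_io_queues,
				PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
		if (result == -ENOSPC) {
			/*
			 * Not enough MSI/MSI-X vectors for the request
			 * (e.g. qemu x86 with MSI turned off): drop one IO
			 * queue and retry with a smaller request.
			 */
			nr_io_queues--;
			if (!nr_io_queues)
				return result;
			continue;
		} else if (result == -EINVAL) {
			/*
			 * The affinity-set request was rejected outright
			 * (e.g. sparc64 with no MSI support at all): retry
			 * asking for a single vector, shared by the admin
			 * queue and the one remaining IO queue.
			 */
			nr_io_queues = 1;
			continue;
		} else if (result <= 0)
			return -EIO;	/* anything else is fatal for the probe */
		break;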
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes [not found] <20181115191126.GA16973@roeck-us.net> 2018-11-15 19:29 ` Jens Axboe @ 2018-11-15 19:36 ` Guenter Roeck 2018-11-15 19:39 ` Jens Axboe 1 sibling, 1 reply; 15+ messages in thread From: Guenter Roeck @ 2018-11-15 19:36 UTC (permalink / raw) On Thu, Nov 15, 2018@11:11:26AM -0800, Guenter Roeck wrote: > On Wed, Nov 14, 2018@10:12:44AM -0700, Jens Axboe wrote: > > > > I think the below patch should fix it. > > > > I spoke too early. sparc64, next-20181115: > > [ 14.204370] nvme nvme0: pci function 0000:02:00.0 > [ 14.249956] nvme nvme0: Removing after probe failure status: -5 > [ 14.263496] ------------[ cut here ]------------ > [ 14.263913] WARNING: CPU: 0 PID: 15 at kernel/irq/manage.c:1597 __free_irq+0xa4/0x320 > [ 14.264265] Trying to free already-free IRQ 9 > [ 14.264519] Modules linked in: > [ 14.264961] CPU: 0 PID: 15 Comm: kworker/u2:1 Not tainted 4.20.0-rc2-next-20181115 #1 > [ 14.265555] Workqueue: nvme-reset-wq nvme_reset_work > [ 14.265899] Call Trace: > [ 14.266118] [000000000046944c] __warn+0xcc/0x100 > [ 14.266375] [00000000004694b0] warn_slowpath_fmt+0x30/0x40 > [ 14.266635] [00000000004d4ce4] __free_irq+0xa4/0x320 > [ 14.266867] [00000000004d4ff8] free_irq+0x38/0x80 > [ 14.267092] [00000000007b1874] pci_free_irq+0x14/0x40 > [ 14.267327] [00000000008a5444] nvme_dev_disable+0xe4/0x520 > [ 14.267576] [00000000008a69b8] nvme_reset_work+0x138/0x1c60 > [ 14.267827] [0000000000488dd0] process_one_work+0x230/0x6e0 > [ 14.268079] [00000000004894f4] worker_thread+0x274/0x520 > [ 14.268321] [0000000000490624] kthread+0xe4/0x120 > [ 14.268544] [00000000004060c4] ret_from_fork+0x1c/0x2c > [ 14.268825] [0000000000000000] (null) > [ 14.269089] irq event stamp: 32796 > [ 14.269350] hardirqs last enabled at (32795): [<0000000000b624a4>] _raw_spin_unlock_irqrestore+0x24/0x80 > [ 14.269757] hardirqs last disabled at (32796): [<0000000000b622f4>] _raw_spin_lock_irqsave+0x14/0x60 > [ 14.270566] softirqs last enabled at (32780): [<0000000000b64c18>] __do_softirq+0x238/0x520 > [ 14.271206] softirqs last disabled at (32729): [<000000000042ceec>] do_softirq_own_stack+0x2c/0x40 > [ 14.272288] ---[ end trace cb79ccd2a0a03f3c ]--- > > Looks like an error during probe followed by an error cleanup problem. > On sparc64, pci_alloc_irq_vectors_affinity() returns -EINVAL (possibly because the controller doesn't support MSI). [ 16.554753] nvme nvme0: pci function 0000:02:00.0 [ 16.622894] nvme 0000:02:00.0: pre alloc: nr_io_queues: 2 result: 0 [ 16.623814] nvme 0000:02:00.0: post alloc: nr_io_queues: 2 result: -22 [ 16.625047] nvme nvme0: Removing after probe failure status: -5 ... and, as result, allocating a single (legacy) interrupt isn't even tried. I didn't try to track down the cleanup failure. Guenter ^ permalink raw reply [flat|nested] 15+ messages in thread
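For context on why a single-vector request is the right fallback on such platforms: when a PCI host bridge offers no MSI/MSI-X at all, the PCI core can still hand out the legacy INTx line, but only when exactly one vector is acceptable and legacy interrupts are allowed. The snippet below is a minimal, hypothetical illustration of that call pattern (it is not the driver code, and the helper name is invented):

#include <linux/pci.h>

/*
 * Hypothetical helper, not actual driver code: request exactly one
 * vector and allow every interrupt type, so the PCI core can fall
 * back to the legacy INTx line when MSI/MSI-X is unavailable.
 */
static int nvme_fallback_to_single_vector(struct pci_dev *pdev)
{
	int ret;

	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);
	if (ret < 0)
		return ret;	/* not even a legacy interrupt is available */

	/* vector 0 is now shared by the admin queue and the single IO queue */
	return 0;
}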
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes 2018-11-15 19:36 ` Guenter Roeck @ 2018-11-15 19:39 ` Jens Axboe 0 siblings, 0 replies; 15+ messages in thread From: Jens Axboe @ 2018-11-15 19:39 UTC (permalink / raw) On 11/15/18 12:36 PM, Guenter Roeck wrote: > On Thu, Nov 15, 2018@11:11:26AM -0800, Guenter Roeck wrote: >> On Wed, Nov 14, 2018@10:12:44AM -0700, Jens Axboe wrote: >>> >>> I think the below patch should fix it. >>> >> >> I spoke too early. sparc64, next-20181115: >> >> [ 14.204370] nvme nvme0: pci function 0000:02:00.0 >> [ 14.249956] nvme nvme0: Removing after probe failure status: -5 >> [ 14.263496] ------------[ cut here ]------------ >> [ 14.263913] WARNING: CPU: 0 PID: 15 at kernel/irq/manage.c:1597 __free_irq+0xa4/0x320 >> [ 14.264265] Trying to free already-free IRQ 9 >> [ 14.264519] Modules linked in: >> [ 14.264961] CPU: 0 PID: 15 Comm: kworker/u2:1 Not tainted 4.20.0-rc2-next-20181115 #1 >> [ 14.265555] Workqueue: nvme-reset-wq nvme_reset_work >> [ 14.265899] Call Trace: >> [ 14.266118] [000000000046944c] __warn+0xcc/0x100 >> [ 14.266375] [00000000004694b0] warn_slowpath_fmt+0x30/0x40 >> [ 14.266635] [00000000004d4ce4] __free_irq+0xa4/0x320 >> [ 14.266867] [00000000004d4ff8] free_irq+0x38/0x80 >> [ 14.267092] [00000000007b1874] pci_free_irq+0x14/0x40 >> [ 14.267327] [00000000008a5444] nvme_dev_disable+0xe4/0x520 >> [ 14.267576] [00000000008a69b8] nvme_reset_work+0x138/0x1c60 >> [ 14.267827] [0000000000488dd0] process_one_work+0x230/0x6e0 >> [ 14.268079] [00000000004894f4] worker_thread+0x274/0x520 >> [ 14.268321] [0000000000490624] kthread+0xe4/0x120 >> [ 14.268544] [00000000004060c4] ret_from_fork+0x1c/0x2c >> [ 14.268825] [0000000000000000] (null) >> [ 14.269089] irq event stamp: 32796 >> [ 14.269350] hardirqs last enabled at (32795): [<0000000000b624a4>] _raw_spin_unlock_irqrestore+0x24/0x80 >> [ 14.269757] hardirqs last disabled at (32796): [<0000000000b622f4>] _raw_spin_lock_irqsave+0x14/0x60 >> [ 14.270566] softirqs last enabled at (32780): [<0000000000b64c18>] __do_softirq+0x238/0x520 >> [ 14.271206] softirqs last disabled at (32729): [<000000000042ceec>] do_softirq_own_stack+0x2c/0x40 >> [ 14.272288] ---[ end trace cb79ccd2a0a03f3c ]--- >> >> Looks like an error during probe followed by an error cleanup problem. >> > On sparc64, pci_alloc_irq_vectors_affinity() returns -EINVAL (possibly > because the controller doesn't support MSI). > > [ 16.554753] nvme nvme0: pci function 0000:02:00.0 > [ 16.622894] nvme 0000:02:00.0: pre alloc: nr_io_queues: 2 result: 0 > [ 16.623814] nvme 0000:02:00.0: post alloc: nr_io_queues: 2 result: -22 > [ 16.625047] nvme nvme0: Removing after probe failure status: -5 > > ... and, as result, allocating a single (legacy) interrupt isn't even tried. > > I didn't try to track down the cleanup failure. OK, then this isn't a new failure in terms of whether the nvme device will work, it's just a cleanup issue. That's less severe than the previous hang :-) -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20181115224634.GA13101@roeck-us.net>]
* [PATCH] nvme: utilize two queue maps, one for reads and one for writes [not found] <20181115224634.GA13101@roeck-us.net> @ 2018-11-15 23:03 ` Jens Axboe 0 siblings, 0 replies; 15+ messages in thread From: Jens Axboe @ 2018-11-15 23:03 UTC (permalink / raw) On 11/15/18 3:46 PM, Guenter Roeck wrote: > On Thu, Nov 15, 2018@03:12:48PM -0700, Jens Axboe wrote: >> On 11/15/18 3:06 PM, Guenter Roeck wrote: >>> On Thu, Nov 15, 2018@12:43:40PM -0700, Jens Axboe wrote: >>>> On 11/15/18 12:40 PM, Jens Axboe wrote: >>>>> On 11/15/18 12:38 PM, Guenter Roeck wrote: >>>>>> On Thu, Nov 15, 2018@12:29:04PM -0700, Jens Axboe wrote: >>>>>>> On 11/15/18 12:11 PM, Guenter Roeck wrote: >>>>>>>> On Wed, Nov 14, 2018@10:12:44AM -0700, Jens Axboe wrote: >>>>>>>>> >>>>>>>>> I think the below patch should fix it. >>>>>>>>> >>>>>>>> >>>>>>>> I spoke too early. sparc64, next-20181115: >>>>>>>> >>>>>>>> [ 14.204370] nvme nvme0: pci function 0000:02:00.0 >>>>>>>> [ 14.249956] nvme nvme0: Removing after probe failure status: -5 >>>>>>>> [ 14.263496] ------------[ cut here ]------------ >>>>>>>> [ 14.263913] WARNING: CPU: 0 PID: 15 at kernel/irq/manage.c:1597 __free_irq+0xa4/0x320 >>>>>>>> [ 14.264265] Trying to free already-free IRQ 9 >>>>>>>> [ 14.264519] Modules linked in: >>>>>>>> [ 14.264961] CPU: 0 PID: 15 Comm: kworker/u2:1 Not tainted 4.20.0-rc2-next-20181115 #1 >>>>>>>> [ 14.265555] Workqueue: nvme-reset-wq nvme_reset_work >>>>>>>> [ 14.265899] Call Trace: >>>>>>>> [ 14.266118] [000000000046944c] __warn+0xcc/0x100 >>>>>>>> [ 14.266375] [00000000004694b0] warn_slowpath_fmt+0x30/0x40 >>>>>>>> [ 14.266635] [00000000004d4ce4] __free_irq+0xa4/0x320 >>>>>>>> [ 14.266867] [00000000004d4ff8] free_irq+0x38/0x80 >>>>>>>> [ 14.267092] [00000000007b1874] pci_free_irq+0x14/0x40 >>>>>>>> [ 14.267327] [00000000008a5444] nvme_dev_disable+0xe4/0x520 >>>>>>>> [ 14.267576] [00000000008a69b8] nvme_reset_work+0x138/0x1c60 >>>>>>>> [ 14.267827] [0000000000488dd0] process_one_work+0x230/0x6e0 >>>>>>>> [ 14.268079] [00000000004894f4] worker_thread+0x274/0x520 >>>>>>>> [ 14.268321] [0000000000490624] kthread+0xe4/0x120 >>>>>>>> [ 14.268544] [00000000004060c4] ret_from_fork+0x1c/0x2c >>>>>>>> [ 14.268825] [0000000000000000] (null) >>>>>>>> [ 14.269089] irq event stamp: 32796 >>>>>>>> [ 14.269350] hardirqs last enabled at (32795): [<0000000000b624a4>] _raw_spin_unlock_irqrestore+0x24/0x80 >>>>>>>> [ 14.269757] hardirqs last disabled at (32796): [<0000000000b622f4>] _raw_spin_lock_irqsave+0x14/0x60 >>>>>>>> [ 14.270566] softirqs last enabled at (32780): [<0000000000b64c18>] __do_softirq+0x238/0x520 >>>>>>>> [ 14.271206] softirqs last disabled at (32729): [<000000000042ceec>] do_softirq_own_stack+0x2c/0x40 >>>>>>>> [ 14.272288] ---[ end trace cb79ccd2a0a03f3c ]--- >>>>>>>> >>>>>>>> Looks like an error during probe followed by an error cleanup problem. >>>>>>> >>>>>>> Did it previous probe fine? Or is the new thing just the fact that >>>>>>> we spew a warning on trying to free a non-existing vector? >>>>>>> >>>>>> This works fine in mainline, if that is your question. >>>>> >>>>> Yeah, as soon as I sent the other email I realized that. Let me send >>>>> you a quick patch. >>>> >>>> How's this? 
>>>> >>>> >>>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c >>>> index ffbab5b01df4..fd73bfd2d1be 100644 >>>> --- a/drivers/nvme/host/pci.c >>>> +++ b/drivers/nvme/host/pci.c >>>> @@ -2088,15 +2088,11 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues) >>>> affd.nr_sets = 1; >>>> >>>> /* >>>> - * Need IRQs for read+write queues, and one for the admin queue. >>>> - * If we can't get more than one vector, we have to share the >>>> - * admin queue and IO queue vector. For that case, don't add >>>> - * an extra vector for the admin queue, or we'll continue >>>> - * asking for 2 and get -ENOSPC in return. >>>> + * If we got a failure and we're down to asking for just >>>> + * 1 + 1 queues, just ask for a single vector. We'll share >>>> + * that between the single IO queue and the admin queue. >>>> */ >>>> - if (result == -ENOSPC && nr_io_queues == 1) >>>> - nr_io_queues = 1; >>>> - else >>>> + if (!(result < 0 && nr_io_queues == 1)) >>>> nr_io_queues = irq_sets[0] + irq_sets[1] + 1; >>>> >>> >>> Unfortunately, the code doesn't even get here because the call of >>> pci_alloc_irq_vectors_affinity in the first iteration fails with >>> -EINVAL, which results in an immediate return with -EIO. >> >> Oh yeah... How about this then? >> > Yes, this one works (at least on sparc64). Do I need to test > on other architectures as well ? Should be fine, hopefully... Thanks for testing! >> @@ -2111,6 +2107,9 @@ static int nvme_setup_irqs(struct nvme_dev *dev, int nr_io_queues) >> if (!nr_io_queues) >> return result; >> continue; >> + } else if (result == -EINVAL) { > > Add an explanation, maybe ? Yeah, I'll add a proper comment, this was just for testing. -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
Thread overview: 15+ messages
[not found] <20181114004148.GA29545@roeck-us.net>
2018-11-14 0:51 ` [PATCH] nvme: utilize two queue maps, one for reads and one for writes Jens Axboe
2018-11-14 1:28 ` Mike Snitzer
2018-11-14 1:36 ` Mike Snitzer
2018-11-14 4:52 ` [PATCH] " Guenter Roeck
2018-11-14 17:12 ` Jens Axboe
[not found] <20181115182833.GA15729@roeck-us.net>
2018-11-15 18:38 ` Jens Axboe
[not found] <20181115191126.GA16973@roeck-us.net>
2018-11-15 19:29 ` Jens Axboe
2018-11-15 19:38 ` Guenter Roeck
2018-11-15 19:40 ` Jens Axboe
2018-11-15 19:43 ` Jens Axboe
2018-11-15 22:06 ` Guenter Roeck
2018-11-15 22:12 ` Jens Axboe
2018-11-15 19:36 ` Guenter Roeck
2018-11-15 19:39 ` Jens Axboe
[not found] <20181115224634.GA13101@roeck-us.net>
2018-11-15 23:03 ` Jens Axboe