* [bug report] blktests nvme/004 failed after offline cpu
@ 2022-06-30 6:02 Yi Zhang
2022-07-04 5:42 ` [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side Yi Zhang
0 siblings, 1 reply; 10+ messages in thread
From: Yi Zhang @ 2022-06-30 6:02 UTC (permalink / raw)
To: open list:NVM EXPRESS DRIVER, linux-block
Hello,

I hit this issue when running blktests on linux-block/for-next after
offlining CPUs; the steps and dmesg log are below. From the log, the
test fails at target connect time. Feel free to let me know if you need
any further info or testing, thanks.
# echo 0 >/sys/devices/system/cpu/cpu0/online
# ./check nvme/004
nvme/004 (test nvme and nvmet UUID NS descriptors) [failed]
runtime ... 0.725s
--- tests/nvme/004.out 2022-06-30 01:50:53.637275584 -0400
+++ /root/blktests/results/nodev/nvme/004.out.bad 2022-06-30
01:55:22.321399448 -0400
@@ -1,5 +1,7 @@
Running nvme/004
-91fdba0d-f87b-4c25-b80f-db7be1418b9e
-uuid.91fdba0d-f87b-4c25-b80f-db7be1418b9e
-NQN:blktests-subsystem-1 disconnected 1 controller(s)
+Failed to write to /dev/nvme-fabrics: Invalid cross-device link
+cat: '/sys/class/nvme/nvme*/subsysnqn': No such file or directory
+cat: /sys/block/n1/uuid: No such file or directory
...
(Run 'diff -u tests/nvme/004.out
/root/blktests/results/nodev/nvme/004.out.bad' to see the entire diff)
# dmesg
[ 1526.169417] numa_remove_cpu cpu 0 node 0: mask now 1-31
[ 1526.170619] smpboot: CPU 0 is now offline
[ 1531.030430] loop: module loaded
[ 1531.115255] run blktests nvme/004 at 2022-06-30 01:55:21
[ 1531.305557] loop0: detected capacity change from 0 to 2097152
[ 1531.354299] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 1531.402815] nvmet: creating nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0035-4b10-8044-b9c04f463333.
[ 1531.404124] nvme nvme0: creating 31 I/O queues.
[ 1531.448181] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
--
Best Regards,
Yi Zhang
^ permalink raw reply [flat|nested] 10+ messages in thread

* Re: [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
2022-06-30 6:02 [bug report] blktests nvme/004 failed after offline cpu Yi Zhang
@ 2022-07-04 5:42 ` Yi Zhang
2022-07-04 23:04 ` Sagi Grimberg
0 siblings, 1 reply; 10+ messages in thread
From: Yi Zhang @ 2022-07-04 5:42 UTC (permalink / raw)
To: open list:NVM EXPRESS DRIVER, linux-block
Cc: Sagi Grimberg, Bart Van Assche, Ming Lei

Updated the subject to better describe the issue.

I also tried to reproduce this on an nvme/rdma environment, and it was
reproducible there too; here are the steps:

# echo 0 >/sys/devices/system/cpu/cpu0/online
# dmesg | tail -10
[  781.577235] smpboot: CPU 0 is now offline
# nvme connect -t rdma -a 172.31.45.202 -s 4420 -n testnqn
Failed to write to /dev/nvme-fabrics: Invalid cross-device link
no controller found: failed to write to nvme-fabrics device

# dmesg
[  781.577235] smpboot: CPU 0 is now offline
[  799.471627] nvme nvme0: creating 39 I/O queues.
[  801.053782] nvme nvme0: mapped 39/0/0 default/read/poll queues.
[  801.064149] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
[  801.073059] nvme nvme0: failed to connect queue: 1 ret=-18

On Thu, Jun 30, 2022 at 2:02 PM Yi Zhang <yi.zhang@redhat.com> wrote:
> [full quote of the original report trimmed]

--
Best Regards,
Yi Zhang

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
2022-07-04 5:42 ` [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side Yi Zhang
@ 2022-07-04 23:04 ` Sagi Grimberg
2022-07-05 0:49 ` Ming Lei
0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2022-07-04 23:04 UTC (permalink / raw)
To: Yi Zhang, open list:NVM EXPRESS DRIVER, linux-block
Cc: Bart Van Assche, Ming Lei

> So I tried this issue on one nvme/rdma environment, and it was also
> reproducible, here are the steps:
>
> # echo 0 >/sys/devices/system/cpu/cpu0/online
> [...]
> [  801.064149] nvme nvme0: Connect command failed, error wo/DNR bit: -16402
> [  801.073059] nvme nvme0: failed to connect queue: 1 ret=-18

This is because of blk_mq_alloc_request_hctx() and was raised before.

IIRC there was reluctance to make it allocate a request for an hctx even
if its associated mapped cpu is offline.

The latest attempt was from Ming:
[PATCH V7 0/3] blk-mq: fix blk_mq_alloc_request_hctx

Don't know where that went tho...

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
2022-07-04 23:04 ` Sagi Grimberg
@ 2022-07-05 0:49 ` Ming Lei
2022-07-06 15:30 ` Sagi Grimberg
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2022-07-05 0:49 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Yi Zhang, open list:NVM EXPRESS DRIVER, linux-block, Bart Van Assche

On Tue, Jul 05, 2022 at 02:04:53AM +0300, Sagi Grimberg wrote:
> [...]
> This is because of blk_mq_alloc_request_hctx() and was raised before.
>
> IIRC there was reluctance to make it allocate a request for an hctx even
> if its associated mapped cpu is offline.
>
> The latest attempt was from Ming:
> [PATCH V7 0/3] blk-mq: fix blk_mq_alloc_request_hctx
>
> Don't know where that went tho...

That attempt relied on the queue used for connecting the io queue having
a non-managed irq; unfortunately that can't be true for all drivers, so
that approach can't go forward.

So far, I'd suggest fixing nvme_*_connect_io_queues() to ignore a failed
io queue, so the nvme host can still be set up with fewer io queues.
Otherwise nvme_*_connect_io_queues() can fail easily, especially with a
1:1 queue mapping.

Thanks,
Ming

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
2022-07-05 0:49 ` Ming Lei
@ 2022-07-06 15:30 ` Sagi Grimberg
2022-07-07 1:46 ` Ming Lei
0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2022-07-06 15:30 UTC (permalink / raw)
To: Ming Lei
Cc: Yi Zhang, open list:NVM EXPRESS DRIVER, linux-block, Bart Van Assche

>> [...]
>> The latest attempt was from Ming:
>> [PATCH V7 0/3] blk-mq: fix blk_mq_alloc_request_hctx
>>
>> Don't know where that went tho...
>
> That attempt relied on the queue used for connecting the io queue having
> a non-managed irq; unfortunately that can't be true for all drivers, so
> that approach can't go forward.

The only consumer is nvme-fabrics, so the others don't matter.
Maybe we need a different interface that allows this relaxation.

> So far, I'd suggest fixing nvme_*_connect_io_queues() to ignore a failed
> io queue, so the nvme host can still be set up with fewer io queues.

What happens when the CPU comes back?
Not sure we can simply ignore it.

> Otherwise nvme_*_connect_io_queues() can fail easily, especially with a
> 1:1 queue mapping.

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
2022-07-06 15:30 ` Sagi Grimberg
@ 2022-07-07 1:46 ` Ming Lei
2022-07-07 7:28 ` Sagi Grimberg
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2022-07-07 1:46 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Yi Zhang, open list:NVM EXPRESS DRIVER, linux-block, Bart Van Assche

On Wed, Jul 06, 2022 at 06:30:43PM +0300, Sagi Grimberg wrote:
> [...]
> The only consumer is nvme-fabrics, so the others don't matter.
> Maybe we need a different interface that allows this relaxation.
>
> > So far, I'd suggest fixing nvme_*_connect_io_queues() to ignore a failed
> > io queue, so the nvme host can still be set up with fewer io queues.
>
> What happens when the CPU comes back? Not sure we can simply ignore it.

Anyway, it is not a good choice to fail the whole controller if only one
queue can't be connected. I meant the queue can be kept as non-LIVE, and
that should work since no io can be issued to this queue while it is
non-LIVE.

Just wondering why we can't re-connect the io queue and set it LIVE once
any CPU in this hctx->cpumask comes online? blk-mq could add a pair of
callbacks for drivers to handle this queue state change.

thanks,
Ming

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
2022-07-07 1:46 ` Ming Lei
@ 2022-07-07 7:28 ` Sagi Grimberg
2022-07-07 8:07 ` Ming Lei
2022-07-26 2:05 ` Ming Lei
0 siblings, 2 replies; 10+ messages in thread
From: Sagi Grimberg @ 2022-07-07 7:28 UTC (permalink / raw)
To: Ming Lei
Cc: Yi Zhang, open list:NVM EXPRESS DRIVER, linux-block, Bart Van Assche

>> [...]
>> What happens when the CPU comes back? Not sure we can simply ignore it.
>
> Anyway, it is not a good choice to fail the whole controller if only one
> queue can't be connected.

That is irrelevant.

> I meant the queue can be kept as non-LIVE, and
> that should work since no io can be issued to this queue while it is
> non-LIVE.

The way nvme-pci behaves is to create all the queues, keep them idle
while their mapped cpu is offline, and have the queue there and ready
when the cpu comes back. It is the simpler approach and I would like to
have it for fabrics too, but to establish a fabrics queue we need to
send a request (connect) to the controller. The fact that we cannot
simply get a reference to a request for a given hw queue is baffling to
me.

> Just wondering why we can't re-connect the io queue and set it LIVE once
> any CPU in this hctx->cpumask comes online? blk-mq could add a pair of
> callbacks for drivers to handle this queue state change.

Certainly possible, but you are creating yet another interface solely
for nvme-fabrics that covers up for the existing interface, which does
not satisfy what nvme-fabrics (its only consumer) would like it to do.

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
2022-07-07 7:28 ` Sagi Grimberg
@ 2022-07-07 8:07 ` Ming Lei
0 siblings, 0 replies; 10+ messages in thread
From: Ming Lei @ 2022-07-07 8:07 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Yi Zhang, open list:NVM EXPRESS DRIVER, linux-block, Bart Van Assche

On Thu, Jul 07, 2022 at 10:28:22AM +0300, Sagi Grimberg wrote:
> [...]
> > Anyway, it is not a good choice to fail the whole controller if only one
> > queue can't be connected.
>
> That is irrelevant.

Isn't that the exact issue reported by Yi? If one cpu is offline, the
controller may fail to be set up at all with a 1:1 mapping; do you think
that behaviour is reasonable?

> The way nvme-pci behaves is to create all the queues, keep them idle
> while their mapped cpu is offline, and have the queue there and ready
> when the cpu comes back. It is the simpler approach and I would like to
> have it for fabrics too, but to establish a fabrics queue we need to
> send a request (connect) to the controller. The fact that we cannot
> simply get a reference to a request for a given hw queue is baffling to
> me.

It is because the connect command needs a request from a specific hctx,
which goes against the blk-mq queue design. Previously this caused a
kernel panic; now the controller can't be set up if any io queue can't
be connected.

> Certainly possible, but you are creating yet another interface solely
> for nvme-fabrics that covers up for the existing interface, which does
> not satisfy what nvme-fabrics (its only consumer) would like it to do.

The interface can be well defined, and may have generic uses, such as
delaying allocation of the request pool until the queue becomes active
(i.e. any cpu in its mapping comes online), to save memory.

Thanks,
Ming

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
2022-07-07 7:28 ` Sagi Grimberg
2022-07-07 8:07 ` Ming Lei
@ 2022-07-26 2:05 ` Ming Lei
2022-07-26 8:56 ` Sagi Grimberg
1 sibling, 1 reply; 10+ messages in thread
From: Ming Lei @ 2022-07-26 2:05 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Yi Zhang, open list:NVM EXPRESS DRIVER, linux-block, Bart Van Assche

On Thu, Jul 07, 2022 at 10:28:22AM +0300, Sagi Grimberg wrote:
> [...]
> Certainly possible, but you are creating yet another interface solely
> for nvme-fabrics that covers up for the existing interface, which does
> not satisfy what nvme-fabrics (its only consumer) would like it to do.

I guess you mean that the others (rdma and tcp) use non-managed irqs, so
they don't need such a change?

That isn't actually true; blk-mq/nvme still can't handle it well. From
blk-mq's viewpoint, if all CPUs in hctx->cpumask are offline, it treats
the hctx as inactive and not workable, and refuses to allocate a request
from that hctx, no matter whether the underlying queue irq is managed or
not.

Now, after 14dc7a18abbe ("block: Fix handling of offline queues in
blk_mq_alloc_request_hctx()"), controller setup can easily break if any
CPU is offline.

I'd suggest fixing the issue in a unified way, since nvme-fabrics needs
to be covered; then nvme's user experience can be improved.

BTW, I guess rdma/tcp/fc queues may take extra or bigger resources than
nvme-pci; if resources are only allocated once a queue becomes active,
queue resource utilization may be improved.

Thanks,
Ming

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side
2022-07-26 2:05 ` Ming Lei
@ 2022-07-26 8:56 ` Sagi Grimberg
0 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2022-07-26 8:56 UTC (permalink / raw)
To: Ming Lei
Cc: Yi Zhang, open list:NVM EXPRESS DRIVER, linux-block, Bart Van Assche

On 7/26/22 05:05, Ming Lei wrote:
> [...]
> From blk-mq's viewpoint, if all CPUs in hctx->cpumask are offline, it
> treats the hctx as inactive and not workable, and refuses to allocate a
> request from that hctx, no matter whether the underlying queue irq is
> managed or not.
>
> Now, after 14dc7a18abbe ("block: Fix handling of offline queues in
> blk_mq_alloc_request_hctx()"), controller setup can easily break if any
> CPU is offline.
>
> I'd suggest fixing the issue in a unified way, since nvme-fabrics needs
> to be covered; then nvme's user experience can be improved.

That is exactly what I want, but unlike pci, nvmf creates the queue
using a connect request that is not driven from a user context. Hence it
would be nice to have an interface to get it done. The alternative would
be to make the nvmf connect not use blk-mq, but that is not a good
alternative in my mind. Having a callback interface for cpu hotplug is
just another interface that every transport will need to implement, and
it makes nvmf different than pci.

> BTW, I guess rdma/tcp/fc queues may take extra or bigger resources than
> nvme-pci; if resources are only allocated once a queue becomes active,
> queue resource utilization may be improved.

That is not a concern whatsoever. Queue resources are cheap enough that
we shouldn't have to care about them at this scale.

^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads: [~2022-07-26 8:56 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-30 6:02 [bug report] blktests nvme/004 failed after offline cpu Yi Zhang
2022-07-04 5:42 ` [bug report] nvme/rdma: nvme connect failed after offline one cpu on host side Yi Zhang
2022-07-04 23:04 ` Sagi Grimberg
2022-07-05 0:49 ` Ming Lei
2022-07-06 15:30 ` Sagi Grimberg
2022-07-07 1:46 ` Ming Lei
2022-07-07 7:28 ` Sagi Grimberg
2022-07-07 8:07 ` Ming Lei
2022-07-26 2:05 ` Ming Lei
2022-07-26 8:56 ` Sagi Grimberg