* [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
[not found] <1119455866.5604170.1527936593726.JavaMail.zimbra@redhat.com>
@ 2018-06-02 11:25 ` Yi Zhang
2018-06-03 12:20 ` Sagi Grimberg
0 siblings, 1 reply; 10+ messages in thread
From: Yi Zhang @ 2018-06-02 11:25 UTC (permalink / raw)
Hi
I would like to report a kernel NULL pointer bug with reset_controller stress operation during fio background, here is the reproducer and kernel log, let me know if you need more info
Reproducer:
1. connect to target
2. do fio stress testing background
3. do reset_controller stress test
num=0
while [ $num -lt 100 ];
do
echo 1 >/sys/block/nvme0n1/device/reset_controller
ret=$?
if [ $ret -eq 1 ]; then
echo "reset_controller operation failed: $num"
break
fi
((num++))
sleep 0.5
done
HW:
04:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
04:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
Target:
[ 90.562051] IPv6: ADDRCONF(NETDEV_UP): mlx5_ib1.8003: link is not ready
[ 90.611005] mlx5_core 0000:04:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(512) RxCqeCmprss(0)
[ 90.620998] mlx5_core 0000:04:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(512) RxCqeCmprss(0)
[ 90.953571] IPv6: ADDRCONF(NETDEV_UP): mlx5_ib1.8003: link is not ready
[ 90.964800] IPv6: ADDRCONF(NETDEV_UP): mlx5_ib1.8003: link is not ready
[ 90.978598] IPv6: ADDRCONF(NETDEV_CHANGE): mlx5_ib1.8003: link becomes ready
[ 1296.312270] null: module loaded
[ 1296.433612] nvmet: adding nsid 1 to subsystem testnqn
[ 1296.440626] nvmet_rdma: enabling port 2 (172.31.1.92:4420)
[ 1313.304302] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:8d2d8eef-dd38-4b2b-bbef-49d95201d83d.
[ 1313.390460] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:8d2d8eef-dd38-4b2b-bbef-49d95201d83d.
[ 1320.424131] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:8d2d8eef-dd38-4b2b-bbef-49d95201d83d.
--snip--
[ 1369.110165] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:8d2d8eef-dd38-4b2b-bbef-49d95201d83d.
[ 1370.069398] mlx5_1ump_cqe:270pid 1960): dump error cqe
[ 1370.076935] 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1370.085528] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1370.094109] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 1370.102664] 00000030: 00 00 00 00 00 00 89 14 01 00 0b 8e 04 08 cf d3
[ 1370.111206] nvmet_rdma: SEND for CQE 0x000000002fd63b83 failed with status remote operation error (11).
[ 1370.123061] nvmet: ctrl 1 fatal error occurred!
Host:
[ 486.369937] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.1.92:4420
[ 486.380175] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 486.389168] nvme nvme0: Property Set error: 7, offset 0x14
[ 486.453361] nvme nvme0: creating 40 I/O queues.
[ 487.172879] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.1.92:4420
[ 493.430382] nvme nvme0: Property Set error: 7, offset 0x14
[ 493.487198] nvme nvme0: creating 40 I/O queues.
[ 495.996666] nvme nvme0: Property Set error: 7, offset 0x14
--snip--
[ 542.174885] nvme nvme0: creating 40 I/O queues.
[ 543.114917] DMAR: DRHD: handling fault status reg 2
[ 543.114961] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
[ 543.121034] DMAR: [DMA Read] Request device [04:00.1] fault addr 8f2c0000 [fault reason 06] PTE Read access is not set
[ 543.130346] PGD 0 P4D 0
[ 543.146236] Oops: 0000 [#1] SMP PTI
[ 543.150673] Modules linked in: nvme_rdma nvme_fabrics nvme_core nvmet_rdma nvmet sch_mqprio ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge 8021q gara
[ 543.234603] sysfillrect sysimgblt fb_sys_fops mlx5_core ttm drm ahci libahci libata crc32c_intel tg3 mlxfw devlink dm_mirror dm_region_hash dm_log dm_mod
[ 543.251468] CPU: 30 PID: 0 Comm: swapper/30 Not tainted 4.17.0-rc7 #1
[ 543.259388] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.6.2 01/08/2016
[ 543.268485] RIP: 0010:__nvme_rdma_recv_done.isra.46+0x1e9/0x350 [nvme_rdma]
[ 543.277009] RSP: 0018:ffff98cc7fbc3e40 EFLAGS: 00010202
[ 543.283589] RAX: 0000000000000000 RBX: ffff98dc2f7836c0 RCX: 0000000000000024
[ 543.292304] RDX: ffff98dc68f91000 RSI: 000000000000003b RDI: ffff98dc6ec21440
[ 543.301032] RBP: ffff98bd44327030 R08: 00000000000003ff R09: 0000000000000fc0
[ 543.309762] R10: 0000000000000000 R11: 0000000000000000 R12: ffff98ca2a2cd8a0
[ 543.318482] R13: ffff98cc7ce10000 R14: ffff98cb54db5e20 R15: ffff98dc78aca400
[ 543.327206] FS: 0000000000000000(0000) GS:ffff98cc7fbc0000(0000) knlGS:0000000000000000
[ 543.337015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 543.344207] CR2: 0000000000000014 CR3: 000000139b00a003 CR4: 00000000001606e0
[ 543.352959] Call Trace:
[ 543.356471] <IRQ>
[ 543.359497] __ib_process_cq+0x7d/0xd0 [ib_core]
[ 543.365436] ib_poll_handler+0x25/0x70 [ib_core]
[ 543.371368] irq_poll_softirq+0xae/0x110
[ 543.376522] __do_softirq+0xd2/0x280
[ 543.381287] irq_exit+0xd5/0xe0
[ 543.385558] do_IRQ+0x4c/0xd0
[ 543.389634] common_interrupt+0xf/0xf
[ 543.394484] </IRQ>
[ 543.397581] RIP: 0010:mwait_idle+0x6c/0x150
[ 543.403009] RSP: 0018:ffffa6f2c649feb0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd8
[ 543.412234] RAX: 0000000000000000 RBX: ffff98bd44641700 RCX: 0000000000000000
[ 543.420981] RDX: 0000000000000000 RSI: 000000000000001e RDI: ffff98cc7fbe30c0
[ 543.429728] RBP: 000000000000001e R08: 0000000000000008 R09: 0000000000000000
[ 543.438474] R10: 0000000000000000 R11: 0000000000004406 R12: 0000000000000000
[ 543.447214] R13: 0000000000000000 R14: ffff98bd44641700 R15: ffff98bd44641700
[ 543.455958] do_idle+0x1a6/0x290
[ 543.460332] cpu_startup_entry+0x6f/0x80
[ 543.465482] start_secondary+0x1aa/0x200
[ 543.470629] secondary_startup_64+0xa5/0xb0
[ 543.476065] Code: e8 bd ec ff ff 44 89 f0 48 8b 4c 24 38 65 48 33 0c 25 28 00 00 00 0f 85 da 00 00 00 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <8b> 50 14 41 39 57 20 0f 8
[ 543.498749] RIP: __nvme_rdma_recv_done.isra.46+0x1e9/0x350 [nvme_rdma] RSP: ffff98cc7fbc3e40
[ 543.508968] CR2: 0000000000000014
[ 543.513447] ---[ end trace b1b498e6cc9d5dae ]---
[ 543.513448] BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
[ 543.576424] Kernel panic - not syncing: Fatal exception in interrupt
[ 543.582845] PGD 0 P4D 0
[ 543.594374] Oops: 0000 [#2] SMP PTI
[ 543.598998] Modules linked in: nvme_rdma nvme_fabrics nvme_core nvmet_rdma nvmet sch_mqprio ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge 8021q gara
[ 543.683696] sysfillrect sysimgblt fb_sys_fops mlx5_core ttm drm ahci libahci libata crc32c_intel tg3 mlxfw devlink dm_mirror dm_region_hash dm_log dm_mod
[ 543.700695] CPU: 33 PID: 0 Comm: swapper/33 Tainted: G D 4.17.0-rc7 #1
[ 543.710223] Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.6.2 01/08/2016
[ 543.719371] RIP: 0010:__nvme_rdma_recv_done.isra.46+0x1e9/0x350 [nvme_rdma]
[ 543.727940] RSP: 0018:ffff98dc7f403e40 EFLAGS: 00010202
[ 543.734561] RAX: 0000000000000000 RBX: ffff98dc2f8b36c0 RCX: 0000000000000018
[ 543.743333] RDX: ffff98dc68f91000 RSI: 0000000000000065 RDI: ffff98dc6ec21d40
[ 543.752101] RBP: ffff98bd44326af0 R08: 00000000000003ff R09: 0000000000000e00
[ 543.760862] R10: 0000000000000000 R11: 0000000000000000 R12: ffff98caac6d97f8
[ 543.769620] R13: ffff98cc7ce10000 R14: ffff98caac3a0870 R15: ffff98db89f45c00
[ 543.778374] FS: 0000000000000000(0000) GS:ffff98dc7f400000(0000) knlGS:0000000000000000
[ 543.788212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 543.795429] CR2: 0000000000000014 CR3: 000000139b00a004 CR4: 00000000001606e0
[ 543.804212] Call Trace:
[ 543.807755] <IRQ>
[ 543.810816] __ib_process_cq+0x7d/0xd0 [ib_core]
[ 543.816791] ib_poll_handler+0x25/0x70 [ib_core]
[ 543.822763] irq_poll_softirq+0xae/0x110
[ 543.827961] __do_softirq+0xd2/0x280
[ 543.832771] irq_exit+0xd5/0xe0
[ 543.837090] do_IRQ+0x4c/0xd0
[ 543.841206] common_interrupt+0xf/0xf
[ 543.846087] </IRQ>
[ 543.849217] RIP: 0010:mwait_idle+0x6c/0x150
[ 543.854678] RSP: 0018:ffffa6f2c64b7eb0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdd
[ 543.863943] RAX: 0000000000000000 RBX: ffff98ccc4db5c00 RCX: 0000000000000000
[ 543.872727] RDX: 0000000000000000 RSI: 0000000000000021 RDI: ffff98dc7f4230c0
[ 543.881506] RBP: 0000000000000021 R08: 0000000000000008 R09: 000000000000b000
[ 543.890286] R10: 0000000000000021 R11: 0000000000000001 R12: 0000000000000000
[ 543.899062] R13: 0000000000000000 R14: ffff98ccc4db5c00 R15: ffff98ccc4db5c00
[ 543.907843] do_idle+0x1a6/0x290
[ 543.912239] cpu_startup_entry+0x6f/0x80
[ 543.917396] start_secondary+0x1aa/0x200
[ 543.922539] secondary_startup_64+0xa5/0xb0
[ 543.927960] Code: e8 bd ec ff ff 44 89 f0 48 8b 4c 24 38 65 48 33 0c 25 28 00 00 00 0f 85 da 00 00 00 48 83 c4 40 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <8b> 50 14 41 39 57 20 0f 8
[ 543.950611] RIP: __nvme_rdma_recv_done.isra.46+0x1e9/0x350 [nvme_rdma] RSP: ffff98dc7f403e40
[ 543.960821] CR2: 0000000000000014
[ 543.965292] ---[ end trace b1b498e6cc9d5daf ]---
Best Regards,
Yi Zhang
^ permalink raw reply [flat|nested] 10+ messages in thread
* [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
2018-06-02 11:25 ` [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer Yi Zhang
@ 2018-06-03 12:20 ` Sagi Grimberg
2018-06-03 12:59 ` Max Gurtovoy
0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2018-06-03 12:20 UTC (permalink / raw)
> Hi
>
> I would like to report a kernel NULL pointer bug with reset_controller stress operation during fio background, here is the reproducer and kernel log, let me know if you need more info
>
> Reproducer:
> 1. connect to target
> 2. do fio stress testing background
> 3. do reset_controller stress test
> num=0
> while [ $num -lt 100 ];
> do
> echo 1 >/sys/block/nvme0n1/device/reset_controller
> ret=$?
> if [ $ret -eq 1 ]; then
> echo "reset_controller operation failed: $num"
> break
> fi
> ((num++))
> sleep 0.5
> done
>
> HW:
> 04:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
> 04:00.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
Hi Yi, thanks for reporting!
Can you check if the below patch makes this go away?
--
commit 5b9b261df4a9f3c5ce04abcae8c74717a2355962
Author: Sagi Grimberg <sagi at grimberg.me>
Date: Sun Feb 25 20:06:52 2018 +0200
nvme-rdma: stop rdma queue pairs before cancelling all reaquests
We cannot risk having queue-pairs accessing user buffers after
their I/O requests were completed.
Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 7b3f08410430..2890e560d043 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -965,12 +965,14 @@ static void nvme_rdma_error_recovery_work(struct
work_struct *work)
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
+ nvme_rdma_stop_io_queues(ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl);
nvme_rdma_destroy_io_queues(ctrl, false);
}
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_cancel_request, &ctrl->ctrl);
nvme_rdma_destroy_admin_queue(ctrl, false);
@@ -1720,8 +1722,11 @@ static void nvme_rdma_shutdown_ctrl(struct
nvme_rdma_ctrl *ctrl, bool shutdown)
{
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
+ nvme_rdma_stop_io_queues(ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl);
+ if (shutdown)
+ nvme_start_queues(&ctrl->ctrl);
nvme_rdma_destroy_io_queues(ctrl, shutdown);
}
@@ -1731,6 +1736,7 @@ static void nvme_rdma_shutdown_ctrl(struct
nvme_rdma_ctrl *ctrl, bool shutdown)
nvme_disable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_cancel_request, &ctrl->ctrl);
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
--
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
2018-06-03 12:20 ` Sagi Grimberg
@ 2018-06-03 12:59 ` Max Gurtovoy
2018-06-03 16:46 ` Yi Zhang
0 siblings, 1 reply; 10+ messages in thread
From: Max Gurtovoy @ 2018-06-03 12:59 UTC (permalink / raw)
hi Sagi,
Israel already sent a similar commit that is on hold (based on your
proposal in the past):
The race is between completing the request at error recovery work
and rdma completions.
If we cancel the request before getting the good rdma completion
we get a NULL deref of the request MR at nvme_rdma_process_nvme_rsp().
When Canceling the request we return its mr to the mr pool (set mr to NULL)
and also unmap its data.
Canceling the requests while the rdma queues are active is not safe.
Because rdma queues are active and we get good rdma completions
that can use the mr pointer which may be NULL.
Completing the request too soon may lead also to performing DMA to/from
user buffers which might have been already unmapped.
The commit fixes the race by draining the QP before
starting the abort commands mechanism.
Signed-off-by: Israel Rukshin <israelr at mellanox.com>
Reviewed-by: Max Gurtovoy <maxg at mellanox.com>
---
drivers/nvme/host/rdma.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c1abfc8..56fccac 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -734,7 +734,6 @@ static struct blk_mq_tag_set
*nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
bool remove)
{
- nvme_rdma_stop_queue(&ctrl->queues[0]);
if (remove) {
blk_cleanup_queue(ctrl->ctrl.admin_q);
nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.admin_tagset);
@@ -817,7 +816,6 @@ static int nvme_rdma_configure_admin_queue(struct
nvme_rdma_ctrl *ctrl,
static void nvme_rdma_destroy_io_queues(struct nvme_rdma_ctrl *ctrl,
bool remove)
{
- nvme_rdma_stop_io_queues(ctrl);
if (remove) {
blk_cleanup_queue(ctrl->ctrl.connect_q);
nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.tagset);
@@ -947,6 +945,7 @@ static void nvme_rdma_reconnect_ctrl_work(struct
work_struct *work)
return;
destroy_admin:
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
nvme_rdma_destroy_admin_queue(ctrl, false);
requeue:
dev_info(ctrl->ctrl.device, "Failed reconnect attempt %d\n",
@@ -963,12 +962,14 @@ static void nvme_rdma_error_recovery_work(struct
work_struct *work)
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
+ nvme_rdma_stop_io_queues(ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl);
nvme_rdma_destroy_io_queues(ctrl, false);
}
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_cancel_request, &ctrl->ctrl);
nvme_rdma_destroy_admin_queue(ctrl, false);
@@ -1725,6 +1726,7 @@ static void nvme_rdma_shutdown_ctrl(struct
nvme_rdma_ctrl *ctrl, bool shutdown)
{
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
+ nvme_rdma_stop_io_queues(ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl);
nvme_rdma_destroy_io_queues(ctrl, shutdown);
@@ -1736,6 +1738,7 @@ static void nvme_rdma_shutdown_ctrl(struct
nvme_rdma_ctrl *ctrl, bool shutdown)
nvme_disable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_cancel_request, &ctrl->ctrl);
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
@@ -2001,6 +2004,7 @@ static struct nvme_ctrl
*nvme_rdma_create_ctrl(struct device *dev,
return &ctrl->ctrl;
out_remove_admin_queue:
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
nvme_rdma_destroy_admin_queue(ctrl, true);
out_kfree_queues:
kfree(ctrl->queues);
This includes also error flows. Christoph said it will be good idea to
add it to the core...
On 6/3/2018 3:20 PM, Sagi Grimberg wrote:
>
>> Hi
>>
>> I would like to report a kernel NULL pointer bug with reset_controller
>> stress operation during fio background, here is the reproducer and
>> kernel log, let me know if you need more info
>>
>> Reproducer:
>> 1. connect to target
>> 2. do fio stress testing background
>> 3. do reset_controller stress test
>> num=0
>> while [ $num -lt 100 ];
>> do
>> ???? echo 1 >/sys/block/nvme0n1/device/reset_controller
>> ???? ret=$?
>> ???? if [ $ret -eq 1 ]; then
>> ???????? echo "reset_controller operation failed: $num"
>> ???????? break
>> ???? fi
>> ???? ((num++))
>> ???? sleep 0.5
>> done
>>
>> HW:
>> 04:00.0 Infiniband controller: Mellanox Technologies MT27700 Family
>> [ConnectX-4]
>> 04:00.1 Infiniband controller: Mellanox Technologies MT27700 Family
>> [ConnectX-4]
>
> Hi Yi, thanks for reporting!
>
> Can you check if the below patch makes this go away?
> --
> commit 5b9b261df4a9f3c5ce04abcae8c74717a2355962
> Author: Sagi Grimberg <sagi at grimberg.me>
> Date:?? Sun Feb 25 20:06:52 2018 +0200
>
> ??? nvme-rdma: stop rdma queue pairs before cancelling all reaquests
>
> ??? We cannot risk having queue-pairs accessing user buffers after
> ??? their I/O requests were completed.
>
> ??? Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 7b3f08410430..2890e560d043 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -965,12 +965,14 @@ static void nvme_rdma_error_recovery_work(struct
> work_struct *work)
>
> ??????? if (ctrl->ctrl.queue_count > 1) {
> ??????????????? nvme_stop_queues(&ctrl->ctrl);
> +?????????????? nvme_rdma_stop_io_queues(ctrl);
> ??????????????? blk_mq_tagset_busy_iter(&ctrl->tag_set,
> ??????????????????????????????????????? nvme_cancel_request, &ctrl->ctrl);
> ??????????????? nvme_rdma_destroy_io_queues(ctrl, false);
> ??????? }
>
> ??????? blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
> +?????? nvme_rdma_stop_queue(&ctrl->queues[0]);
> ??????? blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
> ??????????????????????????????? nvme_cancel_request, &ctrl->ctrl);
> ??????? nvme_rdma_destroy_admin_queue(ctrl, false);
> @@ -1720,8 +1722,11 @@ static void nvme_rdma_shutdown_ctrl(struct
> nvme_rdma_ctrl *ctrl, bool shutdown)
> ?{
> ??????? if (ctrl->ctrl.queue_count > 1) {
> ??????????????? nvme_stop_queues(&ctrl->ctrl);
> +?????????????? nvme_rdma_stop_io_queues(ctrl);
> ??????????????? blk_mq_tagset_busy_iter(&ctrl->tag_set,
> ??????????????????????????????????????? nvme_cancel_request, &ctrl->ctrl);
> +?????????????? if (shutdown)
> +?????????????????????? nvme_start_queues(&ctrl->ctrl);
> ??????????????? nvme_rdma_destroy_io_queues(ctrl, shutdown);
> ??????? }
>
> @@ -1731,6 +1736,7 @@ static void nvme_rdma_shutdown_ctrl(struct
> nvme_rdma_ctrl *ctrl, bool shutdown)
> ??????????????? nvme_disable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
>
> ??????? blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
> +?????? nvme_rdma_stop_queue(&ctrl->queues[0]);
> ??????? blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
> ??????????????????????????????? nvme_cancel_request, &ctrl->ctrl);
> ??????? blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
> --
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.infradead.org%2Fmailman%2Flistinfo%2Flinux-nvme&data=02%7C01%7Cmaxg%40mellanox.com%7C132a350b85164000961108d5c94c70dc%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636636252518868081&sdata=EYlGxGSmzlncFDOxtASojkWxDGwmhVBe%2Bmm%2F0mA9p%2Fw%3D&reserved=0
>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
2018-06-03 12:59 ` Max Gurtovoy
@ 2018-06-03 16:46 ` Yi Zhang
2018-06-05 8:56 ` Ming Lei
0 siblings, 1 reply; 10+ messages in thread
From: Yi Zhang @ 2018-06-03 16:46 UTC (permalink / raw)
Hi Max/Sagi
Thanks for looking this issue.
I just tried Isreal and Sagi's patch, the kernel NULL pointer cannot be reproduced any more.
But the reset operation always hang with less than 100 times, then I found IO hang on Host side and "keep-alive timer expired" observed on target[2].
I also found some other error log during my test[1].
[1]
[ 245.229779] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 245.238728] nvme nvme0: Property Set error: 7, offset 0x14
[2]
Target:
[ 647.523888] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:58282b02-64ae-4155-bad5-e37183e148e9.
--snip--
[ 715.975355] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:58282b02-64ae-4155-bad5-e37183e148e9.
[ 731.214264] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[ 731.222435] nvmet: ctrl 1 fatal error occurred!
Host:
[ 245.219461] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.1.92:4420
[ 245.229779] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[ 245.238728] nvme nvme0: Property Set error: 7, offset 0x14
[ 245.302714] nvme nvme0: creating 40 I/O queues.
[ 246.011104] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.1.92:4420
[ 252.159464] nvme nvme0: Property Set error: 7, offset 0x14
[ 252.204801] nvme nvme0: creating 40 I/O queues.
[ 255.005423] nvme nvme0: Property Set error: 7, offset 0x14
--snip--
[ 426.191472] nvme nvme0: creating 40 I/O queues.
[ 428.994740] nvme nvme0: Property Set error: 7, offset 0x14
[ 429.042588] nvme nvme0: creating 40 I/O queues.
[ 615.682478] INFO: task kworker/u81:8:685 blocked for more than 120 seconds.
[ 615.690891] Not tainted 4.17.0-rc7.Sagi+ #10
[ 615.696721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 615.706003] kworker/u81:8 D 0 685 2 0x80000000
[ 615.712675] Workqueue: writeback wb_workfn (flush-259:0)
[ 615.719150] Call Trace:
[ 615.722423] ? __schedule+0x290/0x870
[ 615.727053] schedule+0x32/0x80
[ 615.731092] io_schedule+0x12/0x40
[ 615.735425] blk_mq_get_tag+0x12e/0x260
[ 615.740240] ? remove_wait_queue+0x60/0x60
[ 615.745343] blk_mq_get_request+0xce/0x390
[ 615.750437] ? __blk_mq_sched_bio_merge+0xec/0x190
[ 615.756311] blk_mq_make_request+0x11c/0x560
[ 615.761608] generic_make_request+0x18c/0x390
[ 615.766996] submit_bio+0x6e/0x130
[ 615.771320] ? guard_bio_eod+0x2c/0xa0
[ 615.776020] submit_bh_wbc+0x157/0x190
[ 615.780727] __block_write_full_page+0x14b/0x400
[ 615.786406] __writepage+0x19/0x50
[ 615.790722] write_cache_pages+0x222/0x470
[ 615.795803] ? bdi_set_max_ratio+0x70/0x70
[ 615.800895] generic_writepages+0x51/0x80
[ 615.805883] ? __wake_up_common_lock+0x87/0xc0
[ 615.811356] do_writepages+0x1a/0x70
[ 615.815849] __writeback_single_inode+0x3d/0x340
[ 615.821508] writeback_sb_inodes+0x24f/0x4b0
[ 615.826784] __writeback_inodes_wb+0x87/0xb0
[ 615.832060] wb_writeback+0x27c/0x310
[ 615.836651] wb_workfn+0x306/0x450
[ 615.840952] process_one_work+0x158/0x360
[ 615.845932] worker_thread+0x47/0x3e0
[ 615.850517] kthread+0xf8/0x130
[ 615.854518] ? max_active_store+0x80/0x80
[ 615.859480] ? kthread_bind+0x10/0x10
[ 615.864063] ret_from_fork+0x35/0x40
Best Regards,
Yi Zhang
----- Original Message -----
From: "Max Gurtovoy" <maxg@mellanox.com>
To: "Sagi Grimberg" <sagi at grimberg.me>, "Yi Zhang" <yi.zhang at redhat.com>, linux-nvme at lists.infradead.org
Sent: Sunday, June 3, 2018 8:59:39 PM
Subject: Re: [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
hi Sagi,
Israel already sent a similar commit that is on hold (based on your
proposal in the past):
The race is between completing the request at error recovery work
and rdma completions.
If we cancel the request before getting the good rdma completion
we get a NULL deref of the request MR at nvme_rdma_process_nvme_rsp().
When Canceling the request we return its mr to the mr pool (set mr to NULL)
and also unmap its data.
Canceling the requests while the rdma queues are active is not safe.
Because rdma queues are active and we get good rdma completions
that can use the mr pointer which may be NULL.
Completing the request too soon may lead also to performing DMA to/from
user buffers which might have been already unmapped.
The commit fixes the race by draining the QP before
starting the abort commands mechanism.
Signed-off-by: Israel Rukshin <israelr at mellanox.com>
Reviewed-by: Max Gurtovoy <maxg at mellanox.com>
---
drivers/nvme/host/rdma.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c1abfc8..56fccac 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -734,7 +734,6 @@ static struct blk_mq_tag_set
*nvme_rdma_alloc_tagset(struct nvme_ctrl *nctrl,
static void nvme_rdma_destroy_admin_queue(struct nvme_rdma_ctrl *ctrl,
bool remove)
{
- nvme_rdma_stop_queue(&ctrl->queues[0]);
if (remove) {
blk_cleanup_queue(ctrl->ctrl.admin_q);
nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.admin_tagset);
@@ -817,7 +816,6 @@ static int nvme_rdma_configure_admin_queue(struct
nvme_rdma_ctrl *ctrl,
static void nvme_rdma_destroy_io_queues(struct nvme_rdma_ctrl *ctrl,
bool remove)
{
- nvme_rdma_stop_io_queues(ctrl);
if (remove) {
blk_cleanup_queue(ctrl->ctrl.connect_q);
nvme_rdma_free_tagset(&ctrl->ctrl, ctrl->ctrl.tagset);
@@ -947,6 +945,7 @@ static void nvme_rdma_reconnect_ctrl_work(struct
work_struct *work)
return;
destroy_admin:
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
nvme_rdma_destroy_admin_queue(ctrl, false);
requeue:
dev_info(ctrl->ctrl.device, "Failed reconnect attempt %d\n",
@@ -963,12 +962,14 @@ static void nvme_rdma_error_recovery_work(struct
work_struct *work)
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
+ nvme_rdma_stop_io_queues(ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl);
nvme_rdma_destroy_io_queues(ctrl, false);
}
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_cancel_request, &ctrl->ctrl);
nvme_rdma_destroy_admin_queue(ctrl, false);
@@ -1725,6 +1726,7 @@ static void nvme_rdma_shutdown_ctrl(struct
nvme_rdma_ctrl *ctrl, bool shutdown)
{
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
+ nvme_rdma_stop_io_queues(ctrl);
blk_mq_tagset_busy_iter(&ctrl->tag_set,
nvme_cancel_request, &ctrl->ctrl);
nvme_rdma_destroy_io_queues(ctrl, shutdown);
@@ -1736,6 +1738,7 @@ static void nvme_rdma_shutdown_ctrl(struct
nvme_rdma_ctrl *ctrl, bool shutdown)
nvme_disable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
nvme_cancel_request, &ctrl->ctrl);
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
@@ -2001,6 +2004,7 @@ static struct nvme_ctrl
*nvme_rdma_create_ctrl(struct device *dev,
return &ctrl->ctrl;
out_remove_admin_queue:
+ nvme_rdma_stop_queue(&ctrl->queues[0]);
nvme_rdma_destroy_admin_queue(ctrl, true);
out_kfree_queues:
kfree(ctrl->queues);
This includes also error flows. Christoph said it will be good idea to
add it to the core...
On 6/3/2018 3:20 PM, Sagi Grimberg wrote:
>
>> Hi
>>
>> I would like to report a kernel NULL pointer bug with reset_controller
>> stress operation during fio background, here is the reproducer and
>> kernel log, let me know if you need more info
>>
>> Reproducer:
>> 1. connect to target
>> 2. do fio stress testing background
>> 3. do reset_controller stress test
>> num=0
>> while [ $num -lt 100 ];
>> do
>> ???? echo 1 >/sys/block/nvme0n1/device/reset_controller
>> ???? ret=$?
>> ???? if [ $ret -eq 1 ]; then
>> ???????? echo "reset_controller operation failed: $num"
>> ???????? break
>> ???? fi
>> ???? ((num++))
>> ???? sleep 0.5
>> done
>>
>> HW:
>> 04:00.0 Infiniband controller: Mellanox Technologies MT27700 Family
>> [ConnectX-4]
>> 04:00.1 Infiniband controller: Mellanox Technologies MT27700 Family
>> [ConnectX-4]
>
> Hi Yi, thanks for reporting!
>
> Can you check if the below patch makes this go away?
> --
> commit 5b9b261df4a9f3c5ce04abcae8c74717a2355962
> Author: Sagi Grimberg <sagi at grimberg.me>
> Date:?? Sun Feb 25 20:06:52 2018 +0200
>
> ??? nvme-rdma: stop rdma queue pairs before cancelling all reaquests
>
> ??? We cannot risk having queue-pairs accessing user buffers after
> ??? their I/O requests were completed.
>
> ??? Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index 7b3f08410430..2890e560d043 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -965,12 +965,14 @@ static void nvme_rdma_error_recovery_work(struct
> work_struct *work)
>
> ??????? if (ctrl->ctrl.queue_count > 1) {
> ??????????????? nvme_stop_queues(&ctrl->ctrl);
> +?????????????? nvme_rdma_stop_io_queues(ctrl);
> ??????????????? blk_mq_tagset_busy_iter(&ctrl->tag_set,
> ??????????????????????????????????????? nvme_cancel_request, &ctrl->ctrl);
> ??????????????? nvme_rdma_destroy_io_queues(ctrl, false);
> ??????? }
>
> ??????? blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
> +?????? nvme_rdma_stop_queue(&ctrl->queues[0]);
> ??????? blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
> ??????????????????????????????? nvme_cancel_request, &ctrl->ctrl);
> ??????? nvme_rdma_destroy_admin_queue(ctrl, false);
> @@ -1720,8 +1722,11 @@ static void nvme_rdma_shutdown_ctrl(struct
> nvme_rdma_ctrl *ctrl, bool shutdown)
> ?{
> ??????? if (ctrl->ctrl.queue_count > 1) {
> ??????????????? nvme_stop_queues(&ctrl->ctrl);
> +?????????????? nvme_rdma_stop_io_queues(ctrl);
> ??????????????? blk_mq_tagset_busy_iter(&ctrl->tag_set,
> ??????????????????????????????????????? nvme_cancel_request, &ctrl->ctrl);
> +?????????????? if (shutdown)
> +?????????????????????? nvme_start_queues(&ctrl->ctrl);
> ??????????????? nvme_rdma_destroy_io_queues(ctrl, shutdown);
> ??????? }
>
> @@ -1731,6 +1736,7 @@ static void nvme_rdma_shutdown_ctrl(struct
> nvme_rdma_ctrl *ctrl, bool shutdown)
> ??????????????? nvme_disable_ctrl(&ctrl->ctrl, ctrl->ctrl.cap);
>
> ??????? blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
> +?????? nvme_rdma_stop_queue(&ctrl->queues[0]);
> ??????? blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
> ??????????????????????????????? nvme_cancel_request, &ctrl->ctrl);
> ??????? blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
> --
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.infradead.org%2Fmailman%2Flistinfo%2Flinux-nvme&data=02%7C01%7Cmaxg%40mellanox.com%7C132a350b85164000961108d5c94c70dc%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636636252518868081&sdata=EYlGxGSmzlncFDOxtASojkWxDGwmhVBe%2Bmm%2F0mA9p%2Fw%3D&reserved=0
>
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
2018-06-03 16:46 ` Yi Zhang
@ 2018-06-05 8:56 ` Ming Lei
2018-06-06 8:32 ` Yi Zhang
0 siblings, 1 reply; 10+ messages in thread
From: Ming Lei @ 2018-06-05 8:56 UTC (permalink / raw)
On Mon, Jun 4, 2018@12:46 AM, Yi Zhang <yi.zhang@redhat.com> wrote:
> Hi Max/Sagi
>
> Thanks for looking this issue.
> I just tried Isreal and Sagi's patch, the kernel NULL pointer cannot be reproduced any more.
> But the reset operation always hang with less than 100 times, then I found IO hang on Host side and "keep-alive timer expired" observed on target[2].
> I also found some other error log during my test[1].
>
> [1]
> [ 245.229779] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
> [ 245.238728] nvme nvme0: Property Set error: 7, offset 0x14
>
> [2]
> Target:
> [ 647.523888] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:58282b02-64ae-4155-bad5-e37183e148e9.
> --snip--
> [ 715.975355] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:58282b02-64ae-4155-bad5-e37183e148e9.
> [ 731.214264] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> [ 731.222435] nvmet: ctrl 1 fatal error occurred!
>
> Host:
> [ 245.219461] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.1.92:4420
> [ 245.229779] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
> [ 245.238728] nvme nvme0: Property Set error: 7, offset 0x14
> [ 245.302714] nvme nvme0: creating 40 I/O queues.
> [ 246.011104] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.1.92:4420
> [ 252.159464] nvme nvme0: Property Set error: 7, offset 0x14
> [ 252.204801] nvme nvme0: creating 40 I/O queues.
> [ 255.005423] nvme nvme0: Property Set error: 7, offset 0x14
> --snip--
> [ 426.191472] nvme nvme0: creating 40 I/O queues.
> [ 428.994740] nvme nvme0: Property Set error: 7, offset 0x14
> [ 429.042588] nvme nvme0: creating 40 I/O queues.
> [ 615.682478] INFO: task kworker/u81:8:685 blocked for more than 120 seconds.
> [ 615.690891] Not tainted 4.17.0-rc7.Sagi+ #10
> [ 615.696721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 615.706003] kworker/u81:8 D 0 685 2 0x80000000
> [ 615.712675] Workqueue: writeback wb_workfn (flush-259:0)
> [ 615.719150] Call Trace:
> [ 615.722423] ? __schedule+0x290/0x870
> [ 615.727053] schedule+0x32/0x80
> [ 615.731092] io_schedule+0x12/0x40
> [ 615.735425] blk_mq_get_tag+0x12e/0x260
> [ 615.740240] ? remove_wait_queue+0x60/0x60
> [ 615.745343] blk_mq_get_request+0xce/0x390
> [ 615.750437] ? __blk_mq_sched_bio_merge+0xec/0x190
> [ 615.756311] blk_mq_make_request+0x11c/0x560
> [ 615.761608] generic_make_request+0x18c/0x390
> [ 615.766996] submit_bio+0x6e/0x130
> [ 615.771320] ? guard_bio_eod+0x2c/0xa0
> [ 615.776020] submit_bh_wbc+0x157/0x190
> [ 615.780727] __block_write_full_page+0x14b/0x400
> [ 615.786406] __writepage+0x19/0x50
> [ 615.790722] write_cache_pages+0x222/0x470
> [ 615.795803] ? bdi_set_max_ratio+0x70/0x70
> [ 615.800895] generic_writepages+0x51/0x80
> [ 615.805883] ? __wake_up_common_lock+0x87/0xc0
> [ 615.811356] do_writepages+0x1a/0x70
> [ 615.815849] __writeback_single_inode+0x3d/0x340
> [ 615.821508] writeback_sb_inodes+0x24f/0x4b0
> [ 615.826784] __writeback_inodes_wb+0x87/0xb0
> [ 615.832060] wb_writeback+0x27c/0x310
> [ 615.836651] wb_workfn+0x306/0x450
> [ 615.840952] process_one_work+0x158/0x360
> [ 615.845932] worker_thread+0x47/0x3e0
> [ 615.850517] kthread+0xf8/0x130
> [ 615.854518] ? max_active_store+0x80/0x80
> [ 615.859480] ? kthread_bind+0x10/0x10
> [ 615.864063] ret_from_fork+0x35/0x40
This one seems one fixed blk-mq issue, and the following merged commit
might be helpful:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6fc46498784e799d3eb95d83079180e413c4e7d
Thanks,
Ming Lei
^ permalink raw reply [flat|nested] 10+ messages in thread
* [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
2018-06-05 8:56 ` Ming Lei
@ 2018-06-06 8:32 ` Yi Zhang
2018-06-06 9:48 ` Max Gurtovoy
0 siblings, 1 reply; 10+ messages in thread
From: Yi Zhang @ 2018-06-06 8:32 UTC (permalink / raw)
Hi Ming
I just tried 4.17.0 + Isreal's patch, but with bad luck, still can
reproduce the issue bellow.
On 06/05/2018 04:56 PM, Ming Lei wrote:
> On Mon, Jun 4, 2018@12:46 AM, Yi Zhang <yi.zhang@redhat.com> wrote:
>> Hi Max/Sagi
>>
>> Thanks for looking this issue.
>> I just tried Isreal and Sagi's patch, the kernel NULL pointer cannot be reproduced any more.
>> But the reset operation always hang with less than 100 times, then I found IO hang on Host side and "keep-alive timer expired" observed on target[2].
>> I also found some other error log during my test[1].
>>
>> [1]
>> [ 245.229779] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
>> [ 245.238728] nvme nvme0: Property Set error: 7, offset 0x14
>>
>> [2]
>> Target:
>> [ 647.523888] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:58282b02-64ae-4155-bad5-e37183e148e9.
>> --snip--
>> [ 715.975355] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:58282b02-64ae-4155-bad5-e37183e148e9.
>> [ 731.214264] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
>> [ 731.222435] nvmet: ctrl 1 fatal error occurred!
>>
>> Host:
>> [ 245.219461] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.1.92:4420
>> [ 245.229779] nvme nvme0: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
>> [ 245.238728] nvme nvme0: Property Set error: 7, offset 0x14
>> [ 245.302714] nvme nvme0: creating 40 I/O queues.
>> [ 246.011104] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.1.92:4420
>> [ 252.159464] nvme nvme0: Property Set error: 7, offset 0x14
>> [ 252.204801] nvme nvme0: creating 40 I/O queues.
>> [ 255.005423] nvme nvme0: Property Set error: 7, offset 0x14
>> --snip--
>> [ 426.191472] nvme nvme0: creating 40 I/O queues.
>> [ 428.994740] nvme nvme0: Property Set error: 7, offset 0x14
>> [ 429.042588] nvme nvme0: creating 40 I/O queues.
>> [ 615.682478] INFO: task kworker/u81:8:685 blocked for more than 120 seconds.
>> [ 615.690891] Not tainted 4.17.0-rc7.Sagi+ #10
>> [ 615.696721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 615.706003] kworker/u81:8 D 0 685 2 0x80000000
>> [ 615.712675] Workqueue: writeback wb_workfn (flush-259:0)
>> [ 615.719150] Call Trace:
>> [ 615.722423] ? __schedule+0x290/0x870
>> [ 615.727053] schedule+0x32/0x80
>> [ 615.731092] io_schedule+0x12/0x40
>> [ 615.735425] blk_mq_get_tag+0x12e/0x260
>> [ 615.740240] ? remove_wait_queue+0x60/0x60
>> [ 615.745343] blk_mq_get_request+0xce/0x390
>> [ 615.750437] ? __blk_mq_sched_bio_merge+0xec/0x190
>> [ 615.756311] blk_mq_make_request+0x11c/0x560
>> [ 615.761608] generic_make_request+0x18c/0x390
>> [ 615.766996] submit_bio+0x6e/0x130
>> [ 615.771320] ? guard_bio_eod+0x2c/0xa0
>> [ 615.776020] submit_bh_wbc+0x157/0x190
>> [ 615.780727] __block_write_full_page+0x14b/0x400
>> [ 615.786406] __writepage+0x19/0x50
>> [ 615.790722] write_cache_pages+0x222/0x470
>> [ 615.795803] ? bdi_set_max_ratio+0x70/0x70
>> [ 615.800895] generic_writepages+0x51/0x80
>> [ 615.805883] ? __wake_up_common_lock+0x87/0xc0
>> [ 615.811356] do_writepages+0x1a/0x70
>> [ 615.815849] __writeback_single_inode+0x3d/0x340
>> [ 615.821508] writeback_sb_inodes+0x24f/0x4b0
>> [ 615.826784] __writeback_inodes_wb+0x87/0xb0
>> [ 615.832060] wb_writeback+0x27c/0x310
>> [ 615.836651] wb_workfn+0x306/0x450
>> [ 615.840952] process_one_work+0x158/0x360
>> [ 615.845932] worker_thread+0x47/0x3e0
>> [ 615.850517] kthread+0xf8/0x130
>> [ 615.854518] ? max_active_store+0x80/0x80
>> [ 615.859480] ? kthread_bind+0x10/0x10
>> [ 615.864063] ret_from_fork+0x35/0x40
> This one seems one fixed blk-mq issue, and the following merged commit
> might be helpful:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6fc46498784e799d3eb95d83079180e413c4e7d
>
>
> Thanks,
> Ming Lei
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
^ permalink raw reply [flat|nested] 10+ messages in thread
* [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
2018-06-06 8:32 ` Yi Zhang
@ 2018-06-06 9:48 ` Max Gurtovoy
2018-06-07 3:20 ` Yi Zhang
0 siblings, 1 reply; 10+ messages in thread
From: Max Gurtovoy @ 2018-06-06 9:48 UTC (permalink / raw)
On 6/6/2018 11:32 AM, Yi Zhang wrote:
> Hi Ming
>
> I just tried 4.17.0 + Isreal's patch, but with bad luck, still can
> reproduce the issue bellow.
can you try Israel's patch + Christoph's "drop abort sequence, kill
BLK_EH_HANDLED" series ?
>
>
> On 06/05/2018 04:56 PM, Ming Lei wrote:
>> On Mon, Jun 4, 2018@12:46 AM, Yi Zhang <yi.zhang@redhat.com> wrote:
>>> Hi Max/Sagi
>>>
>>> Thanks for looking this issue.
>>> I just tried Isreal and Sagi's patch, the kernel NULL pointer cannot
>>> be reproduced any more.
>>> But the reset operation always hang with less than 100 times, then I
>>> found IO hang on Host side and "keep-alive timer expired" observed on
>>> target[2].
>>> I also found some other error log during my test[1].
>>>
>>> [1]
>>> [? 245.229779] nvme nvme0: Removing ctrl: NQN
>>> "nqn.2014-08.org.nvmexpress.discovery"
>>> [? 245.238728] nvme nvme0: Property Set error: 7, offset 0x14
>>>
>>> [2]
>>> Target:
>>> [? 647.523888] nvmet: creating controller 1 for subsystem testnqn for
>>> NQN
>>> nqn.2014-08.org.nvmexpress:uuid:58282b02-64ae-4155-bad5-e37183e148e9.
>>> --snip--
>>> [? 715.975355] nvmet: creating controller 1 for subsystem testnqn for
>>> NQN
>>> nqn.2014-08.org.nvmexpress:uuid:58282b02-64ae-4155-bad5-e37183e148e9.
>>> [? 731.214264] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
>>> [? 731.222435] nvmet: ctrl 1 fatal error occurred!
>>>
>>> Host:
>>> [? 245.219461] nvme nvme0: new ctrl: NQN
>>> "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.1.92:4420
>>> [? 245.229779] nvme nvme0: Removing ctrl: NQN
>>> "nqn.2014-08.org.nvmexpress.discovery"
>>> [? 245.238728] nvme nvme0: Property Set error: 7, offset 0x14
>>> [? 245.302714] nvme nvme0: creating 40 I/O queues.
>>> [? 246.011104] nvme nvme0: new ctrl: NQN "testnqn", addr
>>> 172.31.1.92:4420
>>> [? 252.159464] nvme nvme0: Property Set error: 7, offset 0x14
>>> [? 252.204801] nvme nvme0: creating 40 I/O queues.
>>> [? 255.005423] nvme nvme0: Property Set error: 7, offset 0x14
>>> --snip--
>>> [? 426.191472] nvme nvme0: creating 40 I/O queues.
>>> [? 428.994740] nvme nvme0: Property Set error: 7, offset 0x14
>>> [? 429.042588] nvme nvme0: creating 40 I/O queues.
>>> [? 615.682478] INFO: task kworker/u81:8:685 blocked for more than 120
>>> seconds.
>>> [? 615.690891]?????? Not tainted 4.17.0-rc7.Sagi+ #10
>>> [? 615.696721] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [? 615.706003] kworker/u81:8?? D??? 0?? 685????? 2 0x80000000
>>> [? 615.712675] Workqueue: writeback wb_workfn (flush-259:0)
>>> [? 615.719150] Call Trace:
>>> [? 615.722423]? ? __schedule+0x290/0x870
>>> [? 615.727053]? schedule+0x32/0x80
>>> [? 615.731092]? io_schedule+0x12/0x40
>>> [? 615.735425]? blk_mq_get_tag+0x12e/0x260
>>> [? 615.740240]? ? remove_wait_queue+0x60/0x60
>>> [? 615.745343]? blk_mq_get_request+0xce/0x390
>>> [? 615.750437]? ? __blk_mq_sched_bio_merge+0xec/0x190
>>> [? 615.756311]? blk_mq_make_request+0x11c/0x560
>>> [? 615.761608]? generic_make_request+0x18c/0x390
>>> [? 615.766996]? submit_bio+0x6e/0x130
>>> [? 615.771320]? ? guard_bio_eod+0x2c/0xa0
>>> [? 615.776020]? submit_bh_wbc+0x157/0x190
>>> [? 615.780727]? __block_write_full_page+0x14b/0x400
>>> [? 615.786406]? __writepage+0x19/0x50
>>> [? 615.790722]? write_cache_pages+0x222/0x470
>>> [? 615.795803]? ? bdi_set_max_ratio+0x70/0x70
>>> [? 615.800895]? generic_writepages+0x51/0x80
>>> [? 615.805883]? ? __wake_up_common_lock+0x87/0xc0
>>> [? 615.811356]? do_writepages+0x1a/0x70
>>> [? 615.815849]? __writeback_single_inode+0x3d/0x340
>>> [? 615.821508]? writeback_sb_inodes+0x24f/0x4b0
>>> [? 615.826784]? __writeback_inodes_wb+0x87/0xb0
>>> [? 615.832060]? wb_writeback+0x27c/0x310
>>> [? 615.836651]? wb_workfn+0x306/0x450
>>> [? 615.840952]? process_one_work+0x158/0x360
>>> [? 615.845932]? worker_thread+0x47/0x3e0
>>> [? 615.850517]? kthread+0xf8/0x130
>>> [? 615.854518]? ? max_active_store+0x80/0x80
>>> [? 615.859480]? ? kthread_bind+0x10/0x10
>>> [? 615.864063]? ret_from_fork+0x35/0x40
>> This one seems one fixed blk-mq issue, and the following merged commit
>> might be helpful:
>>
>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.kernel.org%2Fpub%2Fscm%2Flinux%2Fkernel%2Fgit%2Ftorvalds%2Flinux.git%2Fcommit%2F%3Fid%3De6fc46498784e799d3eb95d83079180e413c4e7d&data=02%7C01%7Cmaxg%40mellanox.com%7Cfdd5b3e904f14418836708d5cb880651%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636638707444089428&sdata=Pl1lD%2FfTjRdAzwjEXjqEHWQHavqertDBbnDrkgv%2FMT0%3D&reserved=0
>>
>>
>>
>> Thanks,
>> Ming Lei
>>
>> _______________________________________________
>> Linux-nvme mailing list
>> Linux-nvme at lists.infradead.org
>> https://emea01.safelinks.protection.outlook.com/?url=http%3A%2F%2Flists.infradead.org%2Fmailman%2Flistinfo%2Flinux-nvme&data=02%7C01%7Cmaxg%40mellanox.com%7Cfdd5b3e904f14418836708d5cb880651%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636638707444089428&sdata=yt19IfdH1YMGfHhzdYJEq0OzYTAYa1zehZ4daiSlcO8%3D&reserved=0
>>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
2018-06-06 9:48 ` Max Gurtovoy
@ 2018-06-07 3:20 ` Yi Zhang
2018-06-07 8:27 ` Sagi Grimberg
0 siblings, 1 reply; 10+ messages in thread
From: Yi Zhang @ 2018-06-07 3:20 UTC (permalink / raw)
On 06/06/2018 05:48 PM, Max Gurtovoy wrote:
>
>
> On 6/6/2018 11:32 AM, Yi Zhang wrote:
>> Hi Ming
>>
>> I just tried 4.17.0 + Isreal's patch, but with bad luck, still can
>> reproduce the issue bellow.
>
> can you try Israel's patch + Christoph's "drop abort sequence, kill
> BLK_EH_HANDLED" series ?
Hi Max
Still can reproduce the target " keep-alive timer (15 seconds) expired"
and host IO blocked issue.
Here is the log:
Target:
[? 356.473764] nvmet: creating controller 1 for subsystem testnqn for
NQN nqn.2014-08.org.nvmexpress:uuid:95d13919-162e-48e4-8f00-da8a15ba7522.
[? 359.298784] nvmet: creating controller 1 for subsystem testnqn for
NQN nqn.2014-08.org.nvmexpress:uuid:95d13919-162e-48e4-8f00-da8a15ba7522.
[? 374.408151] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[? 374.416204] nvmet: ctrl 1 fatal error occurred!
Host:
[? 390.853777] nvme nvme0: creating 40 I/O queues.
[? 393.660763] nvme nvme0: Property Set error: 7, offset 0x14
[? 393.713160] nvme nvme0: creating 40 I/O queues.
[? 615.429133] INFO: task kworker/u82:0:21 blocked for more than 120
seconds.
[? 615.437464]?????? Not tainted 4.17.0-rc4.test+ #1
[? 615.443243] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 615.452517] kworker/u82:0?? D??? 0??? 21????? 2 0x80000000
[? 615.459184] Workqueue: nvme-reset-wq nvme_rdma_reset_ctrl_work
[nvme_rdma]
[? 615.467406] Call Trace:
[? 615.470682]? ? __schedule+0x290/0x870
[? 615.475306]? ? select_task_rq_fair+0x9dd/0xce0
[? 615.480812]? schedule+0x32/0x80
[? 615.484858]? schedule_timeout+0x1d8/0x300
[? 615.489872]? ? blk_finish_plug+0x27/0x40
[? 615.494787]? wait_for_completion+0x123/0x190
[? 615.500091]? ? wake_up_q+0x70/0x70
[? 615.504426]? flush_work+0x11a/0x1a0
[? 615.508849]? ? wake_up_worker+0x30/0x30
[? 615.513664]? nvme_stop_ctrl+0x37/0x60 [nvme_core]
[? 615.519442]? nvme_rdma_reset_ctrl_work+0x22/0xb0 [nvme_rdma]
[? 615.526294]? process_one_work+0x158/0x360
[? 615.531299]? worker_thread+0x1cf/0x3e0
[? 615.536005]? kthread+0xf8/0x130
[? 615.540036]? ? max_active_store+0x80/0x80
[? 615.545035]? ? kthread_bind+0x10/0x10
[? 615.549647]? ret_from_fork+0x35/0x40
[? 615.554174] INFO: task kworker/u81:8:673 blocked for more than 120
seconds.
[? 615.562478]?????? Not tainted 4.17.0-rc4.test+ #1
[? 615.568261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 615.577554] kworker/u81:8?? D??? 0?? 673????? 2 0x80000000
[? 615.584234] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[? 615.590806] Call Trace:
[? 615.594087]? ? __schedule+0x290/0x870
[? 615.598725]? schedule+0x32/0x80
[? 615.602777]? schedule_preempt_disabled+0xa/0x10
[? 615.608384]? __mutex_lock.isra.7+0x1cd/0x510
[? 615.613698]? ? bdget+0x3f/0x140
[? 615.617745]? revalidate_disk+0x47/0x80
[? 615.622474]? nvme_validate_ns+0x49/0x7c0 [nvme_core]
[? 615.628563]? ? wake_up_q+0x70/0x70
[? 615.632903]? ? __blk_mq_free_request+0x6d/0x80
[? 615.638407]? nvme_scan_work+0x2f4/0x3b0 [nvme_core]
[? 615.644400]? process_one_work+0x158/0x360
[? 615.649422]? worker_thread+0x47/0x3e0
[? 615.654041]? kthread+0xf8/0x130
[? 615.658086]? ? max_active_store+0x80/0x80
[? 615.663096]? ? kthread_bind+0x10/0x10
[? 615.667717]? ret_from_fork+0x35/0x40
[? 615.672237] INFO: task kworker/u81:9:674 blocked for more than 120
seconds.
[? 615.680553]?????? Not tainted 4.17.0-rc4.test+ #1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
2018-06-07 3:20 ` Yi Zhang
@ 2018-06-07 8:27 ` Sagi Grimberg
2018-06-07 11:02 ` Yi Zhang
0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2018-06-07 8:27 UTC (permalink / raw)
> Still can reproduce the target " keep-alive timer (15 seconds) expired"
> and host IO blocked issue.
Looks like the revalidate_disk is stuck on mutex via ns scan work which
prevents from the reset to start as we try to flush that work.
It appears that it's bdev->bd_mutex that is blocked. Still trying to
understand how it can get blocked.
Yi, would it be possible to try and reproduce this with lockdep to
help us?
> Here is the log:
> Target:
> [? 356.473764] nvmet: creating controller 1 for subsystem testnqn for
> NQN nqn.2014-08.org.nvmexpress:uuid:95d13919-162e-48e4-8f00-da8a15ba7522.
> [? 359.298784] nvmet: creating controller 1 for subsystem testnqn for
> NQN nqn.2014-08.org.nvmexpress:uuid:95d13919-162e-48e4-8f00-da8a15ba7522.
> [? 374.408151] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
> [? 374.416204] nvmet: ctrl 1 fatal error occurred!
> Host:
> [? 390.853777] nvme nvme0: creating 40 I/O queues.
> [? 393.660763] nvme nvme0: Property Set error: 7, offset 0x14
> [? 393.713160] nvme nvme0: creating 40 I/O queues.
> [? 615.429133] INFO: task kworker/u82:0:21 blocked for more than 120
> seconds.
> [? 615.437464]?????? Not tainted 4.17.0-rc4.test+ #1
> [? 615.443243] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [? 615.452517] kworker/u82:0?? D??? 0??? 21????? 2 0x80000000
> [? 615.459184] Workqueue: nvme-reset-wq nvme_rdma_reset_ctrl_work
> [nvme_rdma]
> [? 615.467406] Call Trace:
> [? 615.470682]? ? __schedule+0x290/0x870
> [? 615.475306]? ? select_task_rq_fair+0x9dd/0xce0
> [? 615.480812]? schedule+0x32/0x80
> [? 615.484858]? schedule_timeout+0x1d8/0x300
> [? 615.489872]? ? blk_finish_plug+0x27/0x40
> [? 615.494787]? wait_for_completion+0x123/0x190
> [? 615.500091]? ? wake_up_q+0x70/0x70
> [? 615.504426]? flush_work+0x11a/0x1a0
> [? 615.508849]? ? wake_up_worker+0x30/0x30
> [? 615.513664]? nvme_stop_ctrl+0x37/0x60 [nvme_core]
> [? 615.519442]? nvme_rdma_reset_ctrl_work+0x22/0xb0 [nvme_rdma]
> [? 615.526294]? process_one_work+0x158/0x360
> [? 615.531299]? worker_thread+0x1cf/0x3e0
> [? 615.536005]? kthread+0xf8/0x130
> [? 615.540036]? ? max_active_store+0x80/0x80
> [? 615.545035]? ? kthread_bind+0x10/0x10
> [? 615.549647]? ret_from_fork+0x35/0x40
> [? 615.554174] INFO: task kworker/u81:8:673 blocked for more than 120
> seconds.
> [? 615.562478]?????? Not tainted 4.17.0-rc4.test+ #1
> [? 615.568261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [? 615.577554] kworker/u81:8?? D??? 0?? 673????? 2 0x80000000
> [? 615.584234] Workqueue: nvme-wq nvme_scan_work [nvme_core]
> [? 615.590806] Call Trace:
> [? 615.594087]? ? __schedule+0x290/0x870
> [? 615.598725]? schedule+0x32/0x80
> [? 615.602777]? schedule_preempt_disabled+0xa/0x10
> [? 615.608384]? __mutex_lock.isra.7+0x1cd/0x510
> [? 615.613698]? ? bdget+0x3f/0x140
> [? 615.617745]? revalidate_disk+0x47/0x80
> [? 615.622474]? nvme_validate_ns+0x49/0x7c0 [nvme_core]
> [? 615.628563]? ? wake_up_q+0x70/0x70
> [? 615.632903]? ? __blk_mq_free_request+0x6d/0x80
> [? 615.638407]? nvme_scan_work+0x2f4/0x3b0 [nvme_core]
> [? 615.644400]? process_one_work+0x158/0x360
> [? 615.649422]? worker_thread+0x47/0x3e0
> [? 615.654041]? kthread+0xf8/0x130
> [? 615.658086]? ? max_active_store+0x80/0x80
> [? 615.663096]? ? kthread_bind+0x10/0x10
> [? 615.667717]? ret_from_fork+0x35/0x40
> [? 615.672237] INFO: task kworker/u81:9:674 blocked for more than 120
> seconds.
> [? 615.680553]?????? Not tainted 4.17.0-rc4.test+ #1
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer
2018-06-07 8:27 ` Sagi Grimberg
@ 2018-06-07 11:02 ` Yi Zhang
0 siblings, 0 replies; 10+ messages in thread
From: Yi Zhang @ 2018-06-07 11:02 UTC (permalink / raw)
On 06/07/2018 04:27 PM, Sagi Grimberg wrote:
>
>> Still can reproduce the target " keep-alive timer (15 seconds)
>> expired" and host IO blocked issue.
>
> Looks like the revalidate_disk is stuck on mutex via ns scan work which
> prevents from the reset to start as we try to flush that work.
>
> It appears that it's bdev->bd_mutex that is blocked. Still trying to
> understand how it can get blocked.
>
> Yi, would it be possible to try and reproduce this with lockdep to
> help us?
>
Yes, here is the log on target/host side after enable lockdep:
Target:
[? 408.978728] nvmet: creating controller 1 for subsystem testnqn for
NQN nqn.2014-08.org.nvmexpress:uuid:ca06e830-e1cf-47d4-bb2c-41d68ec6ac09.
[? 412.220261] nvmet: creating controller 1 for subsystem testnqn for
NQN nqn.2014-08.org.nvmexpress:uuid:ca06e830-e1cf-47d4-bb2c-41d68ec6ac09.
[? 427.452231] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[? 427.478042] nvmet: ctrl 1 fatal error occurred!
Host:
[? 400.172594] nvme nvme0: creating 40 I/O queues.
[? 403.363435] nvme nvme0: Property Set error: 7, offset 0x14
[? 403.413914] nvme nvme0: creating 40 I/O queues.
[? 614.996955] INFO: task kworker/u81:8:672 blocked for more than 120
seconds.
[? 615.005534]?????? Not tainted 4.17.0-rc4.test2_lockdep+ #4
[? 615.012276] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 615.021648] kworker/u81:8?? D??? 0?? 672????? 2 0x80000000
[? 615.028392] Workqueue: nvme-wq nvme_scan_work [nvme_core]
[? 615.035050] Call Trace:
[? 615.038398]? ? __schedule+0x403/0xbd0
[? 615.043106]? ? __mutex_lock+0x2aa/0x900
[? 615.048004]? schedule+0x32/0x80
[? 615.052112]? schedule_preempt_disabled+0xa/0x10
[? 615.057778]? __mutex_lock+0x2af/0x900
[? 615.062475]? ? __mutex_lock+0xb4/0x900
[? 615.067253]? ? I_BDEV+0x10/0x10
[? 615.071356]? ? bdev_test+0x20/0x20
[? 615.075742]? ? revalidate_disk+0x49/0x90
[? 615.080726]? revalidate_disk+0x49/0x90
[? 615.085502]? nvme_validate_ns+0x49/0x8b0 [nvme_core]
[? 615.091646]? ? blk_queue_exit+0x67/0x110
[? 615.096615]? ? blk_queue_exit+0x7f/0x110
[? 615.101574]? ? blk_queue_exit+0x5/0x110
[? 615.106423]? nvme_scan_work+0x2fa/0x3c0 [nvme_core]
[? 615.112459]? process_one_work+0x1f9/0x5c0
[? 615.117506]? ? process_one_work+0x173/0x5c0
[? 615.122748]? worker_thread+0x44/0x3e0
[? 615.127408]? kthread+0xfb/0x130
[? 615.131466]? ? process_one_work+0x5c0/0x5c0
[? 615.136697]? ? kthread_bind+0x10/0x10
[? 615.141348]? ret_from_fork+0x3a/0x50
[? 615.145901] INFO: task kworker/u81:9:673 blocked for more than 120
seconds.
[? 615.154251]?????? Not tainted 4.17.0-rc4.test2_lockdep+ #4
[? 615.160947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 615.170276] kworker/u81:9?? D??? 0?? 673????? 2 0x80000000
[? 615.176992] Workqueue: nvme-reset-wq nvme_rdma_reset_ctrl_work
[nvme_rdma]
[? 615.185275] Call Trace:
[? 615.188600]? ? __schedule+0x403/0xbd0
[? 615.193289]? schedule+0x32/0x80
[? 615.197383]? schedule_timeout+0x21e/0x460
[? 615.202453]? ? find_held_lock+0x38/0xa0
[? 615.207322]? ? mark_held_locks+0x6b/0x90
[? 615.212283]? ? _raw_spin_unlock_irq+0x29/0x40
[? 615.217727]? wait_for_completion+0x110/0x170
[? 615.223075]? ? wake_up_q+0x70/0x70
[? 615.227446]? flush_work+0x197/0x220
[? 615.231918]? ? flush_work+0x14e/0x220
[? 615.236569]? ? wake_up_worker+0x30/0x30
[? 615.241420]? ? wait_for_completion+0x36/0x170
[? 615.246851]? nvme_stop_ctrl+0x37/0x60 [nvme_core]
[? 615.252665]? nvme_rdma_reset_ctrl_work+0x22/0xb0 [nvme_rdma]
[? 615.259551]? process_one_work+0x1f9/0x5c0
[? 615.264574]? ? process_one_work+0x173/0x5c0
[? 615.269789]? worker_thread+0x1c3/0x3e0
[? 615.274497]? kthread+0xfb/0x130
[? 615.278521]? ? process_one_work+0x5c0/0x5c0
[? 615.283704]? ? kthread_bind+0x10/0x10
[? 615.288311]? ret_from_fork+0x3a/0x50
[? 615.292830] INFO: task systemd-udevd:996 blocked for more than 120
seconds.
[? 615.301144]?????? Not tainted 4.17.0-rc4.test2_lockdep+ #4
[? 615.307806] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 615.317117] systemd-udevd?? D??? 0?? 996????? 1 0x00000000
[? 615.323805] Call Trace:
[? 615.327096]? ? __schedule+0x403/0xbd0
[? 615.331743]? ? blk_flush_plug_list+0xe7/0x240
[? 615.337181]? schedule+0x32/0x80
[? 615.341238]? io_schedule+0x12/0x40
[? 615.345598]? blk_mq_get_tag+0x15c/0x270
[? 615.350437]? ? remove_wait_queue+0x60/0x60
[? 615.355581]? blk_mq_get_request+0x13c/0x470
[? 615.360822]? ? blk_mq_get_request+0x5/0x470
[? 615.366062]? blk_mq_make_request+0x13d/0x700
[? 615.371398]? generic_make_request+0x338/0x570
[? 615.376834]? submit_bio+0x6e/0x130
[? 615.381193]? ? guard_bio_eod+0xa3/0x180
[? 615.386042]? ? guard_bio_eod+0x21/0x180
[? 615.390883]? submit_bh_wbc+0x15a/0x190
[? 615.395630]? __block_write_full_page+0x14b/0x400
[? 615.401352]? __writepage+0x1c/0x60
[? 615.405703]? write_cache_pages+0x243/0x530
[? 615.410819]? ? __wb_update_bandwidth+0x200/0x200
[? 615.416526]? generic_writepages+0x54/0x80
[? 615.421531]? do_writepages+0x1a/0x70
[? 615.426032]? __filemap_fdatawrite_range+0xc5/0x100
[? 615.431899]? filemap_write_and_wait+0x2c/0x70
[? 615.437283]? invalidate_partition+0x1e/0x40
[? 615.442458]? rescan_partitions+0x4f/0x370
[? 615.447460]? blkdev_reread_part+0x20/0x30
[? 615.452454]? blkdev_ioctl+0x4d6/0x9e0
[? 615.457067]? ? selinux_file_ioctl+0x13d/0x1f0
[? 615.462451]? block_ioctl+0x43/0x50
[? 615.466757]? do_vfs_ioctl+0xa9/0x6c0
[? 615.471266]? ksys_ioctl+0x60/0x90
[? 615.475484]? __x64_sys_ioctl+0x16/0x20
[? 615.480186]? do_syscall_64+0x60/0x210
[? 615.484794]? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[? 615.490957] RIP: 0033:0x7fb829fc4567
[? 615.495467] RSP: 002b:00007ffcf18dd888 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[? 615.504462] RAX: ffffffffffffffda RBX: 000055975a9ad030 RCX:
00007fb829fc4567
[? 615.512980] RDX: 0000000000000000 RSI: 000000000000125f RDI:
000000000000000c
[? 615.521500] RBP: 000055975aa08d50 R08: 0000559758ee1142 R09:
0000000000000003
[? 615.530024] R10: fffffffffffffd43 R11: 0000000000000246 R12:
00007ffcf18dda80
[? 615.538550] R13: 0000000000000003 R14: 000000000000000c R15:
0000000000000000
[? 615.547117] INFO: task fio:3715 blocked for more than 120 seconds.
[? 615.554585]?????? Not tainted 4.17.0-rc4.test2_lockdep+ #4
[? 615.561283] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 615.570608] fio???????????? D??? 0? 3715?? 3671 0x00000080
[? 615.577315] Call Trace:
[? 615.580602]? ? __schedule+0x403/0xbd0
[? 615.585242]? ? __mutex_lock+0x2aa/0x900
[? 615.590070]? schedule+0x32/0x80
[? 615.594114]? schedule_preempt_disabled+0xa/0x10
[? 615.599719]? __mutex_lock+0x2af/0x900
[? 615.604355]? ? __mutex_lock+0xb4/0x900
[? 615.609079]? ? find_held_lock+0x38/0xa0
[? 615.613922]? ? blkdev_put+0x1e/0x100
[? 615.618464]? blkdev_put+0x1e/0x100
[? 615.622803]? blkdev_close+0x24/0x30
[? 615.627242]? __fput+0xde/0x220
[? 615.631191]? task_work_run+0x89/0xa0
[? 615.635724]? exit_to_usermode_loop+0x7e/0xb1
[? 615.641032]? do_syscall_64+0x1e5/0x210
[? 615.645762]? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[? 615.651955] RIP: 0033:0x14a7e810c80d
[? 615.656492] RSP: 002b:000014a7c9150b90 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[? 615.665510] RAX: 0000000000000000 RBX: 000014a7e4b3f6d0 RCX:
000014a7e810c80d
[? 615.674059] RDX: 000014a7ee411000 RSI: 0000000000000080 RDI:
0000000000000032
[? 615.682608] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[? 615.691160] R10: 0000000000000000 R11: 0000000000000293 R12:
0000000000000000
[? 615.699702] R13: 0000000000000000 R14: 0000000000000000 R15:
000014a7cdd87398
[? 615.708261] INFO: task fio:3716 blocked for more than 120 seconds.
[? 615.715738]?????? Not tainted 4.17.0-rc4.test2_lockdep+ #4
[? 615.722445] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 615.731786] fio???????????? D??? 0? 3716?? 3671 0x00000080
[? 615.738535] Call Trace:
[? 615.741864]? ? __schedule+0x403/0xbd0
[? 615.746554]? ? __mutex_lock+0x2aa/0x900
[? 615.751438]? schedule+0x32/0x80
[? 615.755541]? schedule_preempt_disabled+0xa/0x10
[? 615.761200]? __mutex_lock+0x2af/0x900
[? 615.765897]? ? __mutex_lock+0xb4/0x900
[? 615.770682]? ? find_held_lock+0x38/0xa0
[? 615.775568]? ? blkdev_put+0x1e/0x100
[? 615.780163]? blkdev_put+0x1e/0x100
[? 615.784554]? blkdev_close+0x24/0x30
[? 615.789026]? __fput+0xde/0x220
[? 615.793007]? task_work_run+0x89/0xa0
[? 615.797560]? exit_to_usermode_loop+0x7e/0xb1
[? 615.802892]? do_syscall_64+0x1e5/0x210
[? 615.807630]? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[? 615.813832] RIP: 0033:0x14a7e810c80d
[? 615.818367] RSP: 002b:000014a7c9351b90 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[? 615.827391] RAX: 0000000000000000 RBX: 000014a7e4b3f890 RCX:
000014a7e810c80d
[? 615.835922] RDX: 000014a7ee411000 RSI: 0000000000000080 RDI:
0000000000000020
[? 615.844446] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[? 615.852964] R10: 0000000000000000 R11: 0000000000000293 R12:
0000000000000000
[? 615.861483] R13: 0000000000000000 R14: 0000000000000000 R15:
000014a7cdd96ee0
[? 615.870010] INFO: task fio:3717 blocked for more than 120 seconds.
[? 615.877468]?????? Not tainted 4.17.0-rc4.test2_lockdep+ #4
[? 615.884152] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 615.893477] fio???????????? D??? 0? 3717?? 3671 0x00000080
[? 615.900186] Call Trace:
[? 615.903499]? ? __schedule+0x403/0xbd0
[? 615.908171]? ? __mutex_lock+0x2aa/0x900
[? 615.913038]? schedule+0x32/0x80
[? 615.917123]? schedule_preempt_disabled+0xa/0x10
[? 615.922772]? __mutex_lock+0x2af/0x900
[? 615.927447]? ? __mutex_lock+0xb4/0x900
[? 615.932223]? ? find_held_lock+0x38/0xa0
[? 615.937090]? ? blkdev_put+0x1e/0x100
[? 615.941671]? blkdev_put+0x1e/0x100
[? 615.946037]? blkdev_close+0x24/0x30
[? 615.950498]? __fput+0xde/0x220
[? 615.954462]? task_work_run+0x89/0xa0
[? 615.959002]? exit_to_usermode_loop+0x7e/0xb1
[? 615.964322]? do_syscall_64+0x1e5/0x210
[? 615.969060]? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[? 615.975257] RIP: 0033:0x14a7e810c80d
[? 615.979797] RSP: 002b:000014a7c9552b90 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[? 615.988816] RAX: 0000000000000000 RBX: 000014a7e4b3fa50 RCX:
000014a7e810c80d
[? 615.997346] RDX: 000014a7ee411000 RSI: 0000000000000080 RDI:
0000000000000012
[? 616.005867] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[? 616.014392] R10: 0000000000000000 R11: 0000000000000293 R12:
0000000000000000
[? 616.022906] R13: 0000000000000000 R14: 0000000000000000 R15:
000014a7cdda6a28
[? 616.031435] INFO: task fio:3719 blocked for more than 120 seconds.
[? 616.038896]?????? Not tainted 4.17.0-rc4.test2_lockdep+ #4
[? 616.045597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 616.054927] fio???????????? D??? 0? 3719?? 3671 0x00000080
[? 616.061640] Call Trace:
[? 616.064947]? ? __schedule+0x403/0xbd0
[? 616.069619]? ? __mutex_lock+0x2aa/0x900
[? 616.074486]? schedule+0x32/0x80
[? 616.078578]? schedule_preempt_disabled+0xa/0x10
[? 616.084224]? __mutex_lock+0x2af/0x900
[? 616.088902]? ? __mutex_lock+0xb4/0x900
[? 616.093661]? ? find_held_lock+0x38/0xa0
[? 616.098532]? ? blkdev_put+0x1e/0x100
[? 616.103106]? blkdev_put+0x1e/0x100
[? 616.107482]? blkdev_close+0x24/0x30
[? 616.111939]? __fput+0xde/0x220
[? 616.115903]? task_work_run+0x89/0xa0
[? 616.120443]? exit_to_usermode_loop+0x7e/0xb1
[? 616.125766]? do_syscall_64+0x1e5/0x210
[? 616.130500]? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[? 616.136702] RIP: 0033:0x14a7e810c80d
[? 616.141239] RSP: 002b:000014a7c9954b90 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[? 616.150262] RAX: 0000000000000000 RBX: 000014a7e4b3fdd0 RCX:
000014a7e810c80d
[? 616.158793] RDX: 000014a7ee411000 RSI: 0000000000000080 RDI:
000000000000000f
[? 616.167317] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[? 616.175837] R10: 0000000000000000 R11: 0000000000000293 R12:
0000000000000000
[? 616.184359] R13: 0000000000000000 R14: 0000000000000000 R15:
000014a7cddc60b8
[? 616.192887] INFO: task fio:3720 blocked for more than 120 seconds.
[? 616.200349]?????? Not tainted 4.17.0-rc4.test2_lockdep+ #4
[? 616.207032] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 616.216362] fio???????????? D??? 0? 3720?? 3671 0x00000080
[? 616.223060] Call Trace:
[? 616.226373]? ? __schedule+0x403/0xbd0
[? 616.231041]? ? __mutex_lock+0x2aa/0x900
[? 616.235910]? schedule+0x32/0x80
[? 616.239999]? schedule_preempt_disabled+0xa/0x10
[? 616.245646]? __mutex_lock+0x2af/0x900
[? 616.250322]? ? __mutex_lock+0xb4/0x900
[? 616.255087]? ? find_held_lock+0x38/0xa0
[? 616.259953]? ? blkdev_put+0x1e/0x100
[? 616.264536]? blkdev_put+0x1e/0x100
[? 616.268910]? blkdev_close+0x24/0x30
[? 616.273369]? __fput+0xde/0x220
[? 616.277325]? task_work_run+0x89/0xa0
[? 616.281857]? exit_to_usermode_loop+0x7e/0xb1
[? 616.287177]? do_syscall_64+0x1e5/0x210
[? 616.291915]? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[? 616.298129] RIP: 0033:0x14a7e810c80d
[? 616.302668] RSP: 002b:000014a7c9b55b90 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[? 616.311692] RAX: 0000000000000000 RBX: 000014a7e4b3ff90 RCX:
000014a7e810c80d
[? 616.320223] RDX: 000014a7ee411000 RSI: 0000000000000080 RDI:
000000000000001e
[? 616.328751] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[? 616.337272] R10: 0000000000000000 R11: 0000000000000293 R12:
0000000000000000
[? 616.345792] R13: 0000000000000000 R14: 0000000000000000 R15:
000014a7cddd5c00
[? 616.354322] INFO: task fio:3721 blocked for more than 120 seconds.
[? 616.361779]?????? Not tainted 4.17.0-rc4.test2_lockdep+ #4
[? 616.368464] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 616.377788] fio???????????? D??? 0? 3721?? 3671 0x00000080
[? 616.384501] Call Trace:
[? 616.387810]? ? __schedule+0x403/0xbd0
[? 616.392478]? ? __mutex_lock+0x2aa/0x900
[? 616.397347]? schedule+0x32/0x80
[? 616.401433]? schedule_preempt_disabled+0xa/0x10
[? 616.407079]? __mutex_lock+0x2af/0x900
[? 616.411750]? ? __mutex_lock+0xb4/0x900
[? 616.416524]? ? find_held_lock+0x38/0xa0
[? 616.421391]? ? blkdev_put+0x1e/0x100
[? 616.425976]? blkdev_put+0x1e/0x100
[? 616.430350]? blkdev_close+0x24/0x30
[? 616.434812]? __fput+0xde/0x220
[? 616.438774]? task_work_run+0x89/0xa0
[? 616.443313]? exit_to_usermode_loop+0x7e/0xb1
[? 616.448632]? do_syscall_64+0x1e5/0x210
[? 616.453373]? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[? 616.459565] RIP: 0033:0x14a7e810c80d
[? 616.464111] RSP: 002b:000014a7c9d56b90 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[? 616.473132] RAX: 0000000000000000 RBX: 000014a7e4b40150 RCX:
000014a7e810c80d
[? 616.481662] RDX: 000014a7ee411000 RSI: 0000000000000080 RDI:
0000000000000017
[? 616.490178] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[? 616.498703] R10: 0000000000000000 R11: 0000000000000293 R12:
0000000000000000
[? 616.507217] R13: 0000000000000000 R14: 0000000000000000 R15:
000014a7cdde5748
[? 616.515745] INFO: task fio:3722 blocked for more than 120 seconds.
[? 616.523197]?????? Not tainted 4.17.0-rc4.test2_lockdep+ #4
[? 616.529881] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[? 616.539202] fio???????????? D??? 0? 3722?? 3671 0x00000080
[? 616.545912] Call Trace:
[? 616.549218]? ? __schedule+0x403/0xbd0
[? 616.553888]? ? __mutex_lock+0x2aa/0x900
[? 616.558756]? schedule+0x32/0x80
[? 616.562843]? schedule_preempt_disabled+0xa/0x10
[? 616.568489]? __mutex_lock+0x2af/0x900
[? 616.573164]? ? __mutex_lock+0xb4/0x900
[? 616.577936]? ? find_held_lock+0x38/0xa0
[? 616.582805]? ? blkdev_put+0x1e/0x100
[? 616.587380]? blkdev_put+0x1e/0x100
[? 616.591754]? blkdev_close+0x24/0x30
[? 616.596211]? __fput+0xde/0x220
[? 616.600176]? task_work_run+0x89/0xa0
[? 616.604716]? exit_to_usermode_loop+0x7e/0xb1
[? 616.610041]? do_syscall_64+0x1e5/0x210
[? 616.614778]? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[? 616.620979] RIP: 0033:0x14a7e810c80d
[? 616.625514] RSP: 002b:000014a7c9f57b90 EFLAGS: 00000293 ORIG_RAX:
0000000000000003
[? 616.634540] RAX: 0000000000000000 RBX: 000014a7e4b40310 RCX:
000014a7e810c80d
[? 616.643067] RDX: 000014a7ee411000 RSI: 0000000000000080 RDI:
0000000000000024
[? 616.651593] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[? 616.660115] R10: 0000000000000000 R11: 0000000000000293 R12:
0000000000000000
[? 616.668635] R13: 0000000000000000 R14: 0000000000000000 R15:
000014a7cddf5290
[? 616.677171]
[? 616.677171] Showing all locks held in the system:
[? 616.685157] 1 lock held by khungtaskd/257:
[? 616.690338]? #0: 000000000c4c8765 (tasklist_lock){.+.+}, at:
debug_show_all_locks+0x39/0x1b0
[? 616.700367] 3 locks held by kworker/u81:8/672:
[? 616.705904]? #0: 00000000f7d83d28 ((wq_completion)"nvme-wq"){+.+.},
at: process_one_work+0x173/0x5c0
[? 616.716728]? #1: 00000000d602fcc3
((work_completion)(&ctrl->scan_work)){+.+.}, at:
process_one_work+0x173/0x5c0
[? 616.728635]? #2: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
revalidate_disk+0x49/0x90
[? 616.738326] 2 locks held by kworker/u81:9/673:
[? 616.743930]? #0: 00000000c6ce30ed
((wq_completion)"nvme-reset-wq"){+.+.}, at: process_one_work+0x173/0x5c0
[? 616.755402]? #1: 000000002440c8c3
((work_completion)(&ctrl->ctrl.reset_work)){+.+.}, at:
process_one_work+0x173/0x5c0
[? 616.767958] 1 lock held by systemd-udevd/996:
[? 616.773511]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_reread_part+0x18/0x30
[? 616.783562] 2 locks held by agetty/1548:
[? 616.788649]? #0: 00000000dc70cd5c (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x24/0x50
[? 616.798714]? #1: 000000003ab79ae8 (&ldata->atomic_read_lock){+.+.},
at: n_tty_read+0xdb/0x8d0
[? 616.808973] 2 locks held by agetty/1549:
[? 616.814066]? #0: 00000000435ac06a (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x24/0x50
[? 616.824152]? #1: 000000002a9a768f (&ldata->atomic_read_lock){+.+.},
at: n_tty_read+0xdb/0x8d0
[? 616.834450] 1 lock held by fio/3715:
[? 616.839195]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.848624] 1 lock held by fio/3716:
[? 616.853373]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.862809] 1 lock held by fio/3717:
[? 616.867557]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.876996] 1 lock held by fio/3719:
[? 616.881761]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.891220] 1 lock held by fio/3720:
[? 616.896010]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.905492] 1 lock held by fio/3721:
[? 616.910305]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.919805] 1 lock held by fio/3722:
[? 616.924636]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.934161] 1 lock held by fio/3723:
[? 616.939012]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.948560] 1 lock held by fio/3724:
[? 616.953432]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.963000] 1 lock held by fio/3725:
[? 616.967899]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.977487] 1 lock held by fio/3726:
[? 616.982409]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 616.992015] 1 lock held by fio/3727:
[? 616.996939]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.006547] 1 lock held by fio/3728:
[? 617.011471]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.021094] 1 lock held by fio/3729:
[? 617.026051]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.035700] 1 lock held by fio/3730:
[? 617.040672]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.050333] 1 lock held by fio/3731:
[? 617.055318]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.064983] 1 lock held by fio/3732:
[? 617.069968]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.079628] 1 lock held by fio/3733:
[? 617.084608]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.094264] 1 lock held by fio/3734:
[? 617.099235]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.108883] 1 lock held by fio/3735:
[? 617.113841]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.123474] 1 lock held by fio/3736:
[? 617.128431]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.138062] 1 lock held by fio/3737:
[? 617.143009]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.152642] 1 lock held by fio/3738:
[? 617.157588]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.167211] 1 lock held by fio/3739:
[? 617.172157]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.181779] 1 lock held by fio/3740:
[? 617.186725]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.196350] 1 lock held by fio/3741:
[? 617.201291]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.210919] 1 lock held by fio/3742:
[? 617.215863]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.225485] 1 lock held by fio/3743:
[? 617.230429]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.240054] 1 lock held by fio/3744:
[? 617.244988]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.254616] 1 lock held by fio/3745:
[? 617.259557]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.269183] 1 lock held by fio/3746:
[? 617.274132]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.283757] 1 lock held by fio/3747:
[? 617.288696]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.298323] 1 lock held by fio/3748:
[? 617.303266]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.312894] 1 lock held by fio/3749:
[? 617.317843]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.327469] 1 lock held by fio/3751:
[? 617.332407]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.342032] 1 lock held by fio/3753:
[? 617.346970]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.356597] 1 lock held by fio/3754:
[? 617.361540]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.371163] 1 lock held by fio/3755:
[? 617.376108]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.385732] 1 lock held by fio/3756:
[? 617.390678]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.400304] 1 lock held by fio/3757:
[? 617.405237]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.414864] 1 lock held by fio/3758:
[? 617.419808]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.429431] 1 lock held by fio/3759:
[? 617.434377]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.444005] 1 lock held by fio/3760:
[? 617.448949]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.458576] 1 lock held by fio/3761:
[? 617.463518]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.473142] 1 lock held by fio/3762:
[? 617.478090]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.487714] 1 lock held by fio/3763:
[? 617.492653]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.502279] 1 lock held by fio/3764:
[? 617.507223]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.516848] 1 lock held by fio/3765:
[? 617.521790]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.531417] 1 lock held by fio/3766:
[? 617.536366]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.545993] 1 lock held by fio/3767:
[? 617.550943]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.560570] 1 lock held by fio/3768:
[? 617.565512]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.575136] 1 lock held by fio/3769:
[? 617.580081]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.589703] 1 lock held by fio/3770:
[? 617.594650]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.604275] 1 lock held by fio/3771:
[? 617.609213]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.618841] 1 lock held by fio/3772:
[? 617.623785]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.633412] 1 lock held by fio/3773:
[? 617.638361]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.647988] 1 lock held by fio/3774:
[? 617.652938]? #0: 00000000b47b635f (&bdev->bd_mutex){+.+.}, at:
blkdev_put+0x1e/0x100
[? 617.662566] 3 locks held by kworker/u82:3/3784:
[? 617.668585]? #0: 000000004f345383 ((wq_completion)"writeback"){+.+.},
at: process_one_work+0x173/0x5c0
[? 617.679966]? #1: 0000000009b8ab6b
((work_completion)(&(&wb->dwork)->work)){+.+.}, at:
process_one_work+0x173/0x5c0
[? 617.692524]? #2: 00000000f5c92b1e (&type->s_umount_key#33){.+.+}, at:
trylock_super+0x16/0x50
[? 617.703034] 3 locks held by main.sh/4525:
[? 617.708474]? #0: 00000000e900c5b8 (sb_writers#4){.+.+}, at:
vfs_write+0x17e/0x1b0
[? 617.717821]? #1: 0000000012a57c73 (&of->mutex){+.+.}, at:
kernfs_fop_write+0xe3/0x1a0
[? 617.727558]? #2: 00000000efd0bad4 (kn->count#240){++++}, at:
kernfs_fop_write+0xeb/0x1a0
[? 617.737585]
[? 617.740204] =============================================
[? 617.740204]
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2018-06-07 11:02 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1119455866.5604170.1527936593726.JavaMail.zimbra@redhat.com>
2018-06-02 11:25 ` [BUG REPORT] reset_controller stress operation lead to kernel NULL pointer Yi Zhang
2018-06-03 12:20 ` Sagi Grimberg
2018-06-03 12:59 ` Max Gurtovoy
2018-06-03 16:46 ` Yi Zhang
2018-06-05 8:56 ` Ming Lei
2018-06-06 8:32 ` Yi Zhang
2018-06-06 9:48 ` Max Gurtovoy
2018-06-07 3:20 ` Yi Zhang
2018-06-07 8:27 ` Sagi Grimberg
2018-06-07 11:02 ` Yi Zhang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.