From: Krishnamraju Eraparaju <krishna2@chelsio.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-rdma@vger.kernel.org, bharat@chelsio.com,
linux-nvme@lists.infradead.org
Subject: Re: Hang at NVME Host caused by Controller reset
Date: Tue, 28 Jul 2020 17:29:07 +0530
Message-ID: <20200728115904.GA5508@chelsio.com>
In-Reply-To: <9b8dae53-1fcc-3c03-5fcd-cfb55cd8cc80@grimberg.me>
Sagi,
With the given patch, I no longer see the freeze_queue_wait hang,
but I am now seeing a different hang:
dmesg:
[Jul28 11:01] igb 0000:03:00.0 enp3s0f0: igb: enp3s0f0 NIC Link is Up
1000 Mbps Full Duplex, Flow Control: RX
[ +0.000137] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0f0: link becomes
ready
[Jul28 11:17] cxgb4 0000:02:00.4 enp2s0f4: passive DA module inserted
[ +0.579450] cxgb4 0000:02:00.4 enp2s0f4: link up, 40Gbps, full-duplex,
Tx/Rx PAUSE
[ +0.000683] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes
ready
[Jul28 11:19] nvme nvme0: Please enable CONFIG_NVME_MULTIPATH for full
support of multi-port devices.
[ +0.000159] nvme nvme0: creating 1 I/O queues.
[ +0.000350] nvme nvme0: mapped 1/0/0 default/read/poll queues.
[ +0.001316] nvme nvme0: new ctrl: NQN "nvme-ram0", addr 102.1.1.6:4420
[Jul28 11:20] DEBUG: cpu: 3: blk_queue_enter:448 process is "nvme" (pid
4011)
q->mq_freeze_depth: 1
(pm || (blk_pm_request_resume(q),!blk_queue_pm_only(q)))): 1
blk_queue_dying(q): 0
[ +21.511514] cxgb4 0000:02:00.4: Port 0 link down, reason: Link Down
[ +0.560355] cxgb4 0000:02:00.4 enp2s0f4: link up, 40Gbps, full-duplex,
Tx/Rx PAUSE
[ +0.000941] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes
ready
[Jul28 11:21] cxgb4 0000:02:00.4: Port 0 link down, reason: Link Down
[ +0.552934] cxgb4 0000:02:00.4 enp2s0f4: link up, 40Gbps, full-duplex,
Tx/Rx PAUSE
[ +0.001076] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes
ready
[Jul28 11:22] cxgb4 0000:02:00.4: Port 0 link down, reason: Link Down
[ +0.615365] cxgb4 0000:02:00.4 enp2s0f4: link up, 40Gbps, full-duplex,
Tx/Rx PAUSE
[ +0.000886] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes
ready
[Jul28 11:23] cxgb4 0000:02:00.4: Port 0 link down, reason: Link Down
[ +0.556661] cxgb4 0000:02:00.4 enp2s0f4: link up, 40Gbps, full-duplex,
Tx/Rx PAUSE
[ +0.000837] IPv6: ADDRCONF(NETDEV_CHANGE): enp2s0f4: link becomes
ready
[ +3.765550] INFO: task bash:3014 blocked for more than 122 seconds.
[ +0.000067] Not tainted 5.8.0-rc7ekr+ #2
[ +0.000057] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ +0.000064] bash D14272 3014 2417 0x00000000
[ +0.000066] Call Trace:
[ +0.000064] __schedule+0x32b/0x670
[ +0.000060] schedule+0x45/0xb0
[ +0.000059] schedule_timeout+0x216/0x330
[ +0.000060] ? enqueue_task_fair+0x196/0x7e0
[ +0.000059] wait_for_completion+0x81/0xe0
[ +0.000061] __flush_work+0x114/0x1c0
[ +0.000058] ? flush_workqueue_prep_pwqs+0x130/0x130
[ +0.000066] nvme_reset_ctrl_sync+0x25/0x40 [nvme_core]
[ +0.000125] nvme_sysfs_reset+0xd/0x20 [nvme_core]
[ +0.000137] kernfs_fop_write+0xbc/0x1a0
[ +0.000114] vfs_write+0xc2/0x1f0
[ +0.000120] ksys_write+0x5a/0xd0
[ +0.000106] do_syscall_64+0x3e/0x70
[ +0.000122] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000115] RIP: 0033:0x7f8124b93317
[ +0.000110] Code: Bad RIP value.
[ +0.000109] RSP: 002b:00007ffdbbbff1c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ +0.000182] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
00007f8124b93317
[ +0.000138] RDX: 0000000000000002 RSI: 0000559345c156d0 RDI:
0000000000000001
[ +0.000125] RBP: 0000559345c156d0 R08: 000000000000000a R09:
0000000000000001
[ +0.000117] R10: 00005593453d1471 R11: 0000000000000246 R12:
0000000000000002
[ +0.000116] R13: 00007f8124c6d6a0 R14: 00007f8124c6e4a0 R15:
00007f8124c6d8a0
[ +0.000121] INFO: task nvme:4011 blocked for more than 122 seconds.
[ +0.000118] Not tainted 5.8.0-rc7ekr+ #2
[ +0.000114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ +0.000190] nvme D14392 4011 2326 0x00004000
[ +0.000132] Call Trace:
[ +0.000117] __schedule+0x32b/0x670
[ +0.000109] schedule+0x45/0xb0
[ +0.000108] blk_queue_enter+0x1e9/0x250
[ +0.000109] ? wait_woken+0x70/0x70
[ +0.000108] blk_mq_alloc_request+0x53/0xc0
[ +0.000112] nvme_alloc_request+0x61/0x70 [nvme_core]
[ +0.000118] nvme_submit_user_cmd+0x50/0x310 [nvme_core]
[ +0.000126] nvme_user_cmd+0x12e/0x1c0 [nvme_core]
[ +0.000124] ? _copy_to_user+0x22/0x30
[ +0.000108] blkdev_ioctl+0x100/0x250
[ +0.000109] block_ioctl+0x34/0x40
[ +0.000110] ksys_ioctl+0x82/0xc0
[ +0.000106] __x64_sys_ioctl+0x11/0x20
[ +0.000126] do_syscall_64+0x3e/0x70
[ +0.000113] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ +0.000132] RIP: 0033:0x7fed0bd2967b
[ +0.000134] Code: Bad RIP value.
[ +0.000107] RSP: 002b:00007fff55b568a8 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ +0.000172] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
00007fed0bd2967b
[ +0.000112] RDX: 00007fff55b568b0 RSI: 00000000c0484e43 RDI:
0000000000000003
[ +0.000113] RBP: 0000000000000000 R08: 0000000000000001 R09:
0000000000000000
[ +0.000130] R10: 0000000000000000 R11: 0000000000000246 R12:
00007fff55b5878a
[ +0.000119] R13: 0000000000000006 R14: 00007fff55b56f60 R15:
00005595f54554a0
[ +0.000135] Kernel panic - not syncing: hung_task: blocked tasks
[ +0.000141] CPU: 8 PID: 520 Comm: khungtaskd Not tainted 5.8.0-rc7ekr+
#2
Testcase:
while [ 1 ]; do nvme write-zeroes /dev/nvme0n1 -s 1 -c 1; done
while [ 1 ]; do echo 1 > /sys/block/nvme0n1/device/reset_controller; done
while [ 1 ]; do ifconfig enp2s0f4 down; sleep 24; ifconfig enp2s0f4 up; sleep 28; done
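The three loops above are presumably run at the same time, each in its own shell. A minimal sketch of a wrapper that starts them from a single shell, assuming the same device and interface names as in the test case; the variable names, the `run` helper, the `*_step` functions, and the `DRY_RUN` switch are illustrative and not part of the original setup:

```shell
#!/bin/bash
# Hypothetical wrapper around the three test loops; all names are assumptions.
NVME_DEV=${NVME_DEV:-nvme0n1}   # namespace block device from the report
IFACE=${IFACE:-enp2s0f4}        # cxgb4 port from the report
DRY_RUN=${DRY_RUN:-1}           # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One iteration of each loop from the test case.
io_step()    { run nvme write-zeroes "/dev/$NVME_DEV" -s 1 -c 1; }
reset_step() { run sh -c "echo 1 > /sys/block/$NVME_DEV/device/reset_controller"; }
link_step()  {
    run ifconfig "$IFACE" down; run sleep 24
    run ifconfig "$IFACE" up;   run sleep 28
}

# Start all three loops in the background and wait; Ctrl-C stops them all.
start_all() {
    trap 'kill 0' INT TERM
    while :; do io_step; done &
    while :; do reset_step; done &
    while :; do link_step; done &
    wait
}
```

With `DRY_RUN=0` and root privileges, calling `start_all` would run the loops concurrently. In dry-run mode, calling the individual `*_step` functions is enough to check which commands would be issued.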