From: Yi Zhang <yi.zhang@redhat.com>
To: linux-nvme@lists.infradead.org
Cc: sagi@grimberg.me
Subject: NVMeoF RDMA IB: I/O timeout and NULL pointer observed during rescan_controller/reset_controller with fio background
Date: Wed, 18 Sep 2019 05:13:36 -0400 (EDT)
Message-ID: <1437535598.446597.1568798016422.JavaMail.zimbra@redhat.com>
In-Reply-To: <1823328454.445263.1568796846850.JavaMail.zimbra@redhat.com>
Hello
I observed the I/O timeout and NULL pointer dereference below on 5.3.0. Please help check it, and let me know if you need more info or want me to test a patch. Thanks.
reproducer:
1. Run fio in the background
2. Stress the rescan_controller/reset_controller operations:
echo 1 > /sys/block/nvme2n1/device/nvme2/rescan_controller
echo 1 > /sys/block/nvme2n1/device/nvme2/reset_controller
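The two writes above can be repeated in a loop to stress the controller. The following is a minimal sketch only: the function name `stress_ctrl`, the iteration count, and the delay are assumptions for illustration and are not taken from the report, which does not say how often or how fast the writes were issued. fio is assumed to be already running against the namespace.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): repeatedly trigger a rescan
# and a reset on one controller by writing 1 to its sysfs attributes.
stress_ctrl() {
    ctrl_dir=$1        # e.g. /sys/block/nvme2n1/device/nvme2
    iterations=$2      # assumed loop count; the report gives none
    delay=${3:-1}      # assumed pause in seconds between rounds
    i=0
    while [ "$i" -lt "$iterations" ]; do
        echo 1 > "$ctrl_dir/rescan_controller" || return 1
        echo 1 > "$ctrl_dir/reset_controller" || return 1
        i=$((i + 1))
        sleep "$delay"
    done
    echo "completed $i iterations"
}
```

For example, `stress_ctrl /sys/block/nvme2n1/device/nvme2 100 1` would issue 100 rescan/reset pairs one second apart. The function stops at the first failed write.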
kernel log:
[ 384.865550] nvme nvme2: creating 48 I/O queues.
[ 386.784069] nvme nvme2: creating 48 I/O queues.
[ 387.771002] nvme_ns_head_make_request: 159989 callbacks suppressed
[ 387.771004] block nvme2n1: no usable path - requeuing I/O
[ 387.771012] block nvme2n1: no usable path - requeuing I/O
[ 387.771051] block nvme2n1: no usable path - requeuing I/O
[ 387.771061] block nvme2n1: no usable path - requeuing I/O
[ 387.771065] block nvme2n1: no usable path - requeuing I/O
[ 387.771070] block nvme2n1: no usable path - requeuing I/O
[ 387.771077] block nvme2n1: no usable path - requeuing I/O
[ 387.771124] block nvme2n1: no usable path - requeuing I/O
[ 387.771146] block nvme2n1: no usable path - requeuing I/O
[ 387.771155] block nvme2n1: no usable path - requeuing I/O
[ 449.670780] nvme nvme2: I/O 0 QID 0 timeout
[ 449.691674] nvme nvme2: Connect command failed, error wo/DNR bit: 7
[ 449.697974] BUG: kernel NULL pointer dereference, address: 00000000000000c8
[ 449.704945] #PF: supervisor read access in kernel mode
[ 449.710082] #PF: error_code(0x0000) - not-present page
[ 449.715221] PGD 0 P4D 0
[ 449.717761] Oops: 0000 [#1] SMP PTI
[ 449.721254] CPU: 45 PID: 1145 Comm: kworker/u98:2 Not tainted 5.3.0 #12
[ 449.727866] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.2.11 06/13/2019
[ 449.735448] Workqueue: nvme-reset-wq nvme_rdma_reset_ctrl_work [nvme_rdma]
[ 449.742325] RIP: 0010:rdma_disconnect+0x2e/0x90 [rdma_cm]
[ 449.747722] Code: 00 55 53 48 89 fb 48 8b bf a8 02 00 00 48 85 ff 74 65 48 8b 0b 0f b6 83 c0 01 00 00 48 69 c0 b8 00 00 00 48 03 81 80 04 00 00 <8b> 40 10 a8 04 75 0d a8 08 74 42 5b 31 f6 5d e9 fe 72 b2 ff 48 89
[ 449.766466] RSP: 0018:ffffb01f87323de0 EFLAGS: 00010206
[ 449.771693] RAX: 00000000000000b8 RBX: ffff9e4d5a474c00 RCX: ffff9e4d5a475c00
[ 449.778825] RDX: 0000000000000819 RSI: ffff9e4dffb96b88 RDI: ffff9e41af404e00
[ 449.785956] RBP: 0000000000000000 R08: 00000000000008e7 R09: 000000000000002d
[ 449.793098] R10: ffffb01f87323df8 R11: ffffb01f87323ac0 R12: 0000000000000007
[ 449.800229] R13: ffff9e4db1332000 R14: ffff9e42045e2540 R15: ffff9e4db1332000
[ 449.807361] FS: 0000000000000000(0000) GS:ffff9e4dffb80000(0000) knlGS:0000000000000000
[ 449.815456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 449.821195] CR2: 00000000000000c8 CR3: 000000183e60a003 CR4: 00000000007606e0
[ 449.828327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 449.835457] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 449.842581] PKRU: 55555554
[ 449.845287] Call Trace:
[ 449.847758] nvme_rdma_start_queue+0x8f/0xc0 [nvme_rdma]
[ 449.853070] nvme_rdma_setup_ctrl+0x4ef/0x6a0 [nvme_rdma]
[ 449.858469] nvme_rdma_reset_ctrl_work+0x4e/0x70 [nvme_rdma]
[ 449.864132] process_one_work+0x1a1/0x360
[ 449.868140] worker_thread+0x1c9/0x380
[ 449.871894] ? process_one_work+0x360/0x360
[ 449.876081] kthread+0x10c/0x130
[ 449.879310] ? kthread_create_on_node+0x60/0x60
[ 449.883846] ret_from_fork+0x35/0x40
[ 449.887422] Modules linked in: nvme_rdma nvme_fabrics nvmet_rdma nvmet 8021q garp mrp stp llc ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp iw_cxgb4 libcxgb mlx5_ib vfat fat opa_vnic ib_umad ib_ipoib intel_rapl_msr intel_rapl_common isst_if_common rpcrdma sunrpc skx_edac nfit rdma_ucm libnvdimm ib_iser rdma_cm x86_pkg_temp_thermal intel_powerclamp iw_cm ib_cm coretemp libiscsi kvm_intel scsi_transport_iscsi hfi1 kvm iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif irqbypass rdmavt bnxt_re crct10dif_pclmul crc32_pclmul ib_uverbs ghash_clmulni_intel intel_cstate intel_uncore ib_core dell_smbios mei_me ipmi_si intel_rapl_perf wmi_bmof dell_wmi_descriptor pcspkr sg mei i2c_i801 lpc_ich ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_vram_helper mlx5_core ttm drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops nvme libahci csiostor drm cxgb4 bnxt_en crc32c_intel nvme_core libata megaraid_sas scsi_transport_fc
[ 449.887456] mlxfw tg3 wmi dm_mirror dm_region_hash dm_log dm_mod
[ 449.980847] CR2: 00000000000000c8
[ 449.984182] ---[ end trace aeab63ac2e6510db ]---
[ 450.046884] RIP: 0010:rdma_disconnect+0x2e/0x90 [rdma_cm]
[ 450.052282] Code: 00 55 53 48 89 fb 48 8b bf a8 02 00 00 48 85 ff 74 65 48 8b 0b 0f b6 83 c0 01 00 00 48 69 c0 b8 00 00 00 48 03 81 80 04 00 00 <8b> 40 10 a8 04 75 0d a8 08 74 42 5b 31 f6 5d e9 fe 72 b2 ff 48 89
[ 450.071027] RSP: 0018:ffffb01f87323de0 EFLAGS: 00010206
[ 450.076255] RAX: 00000000000000b8 RBX: ffff9e4d5a474c00 RCX: ffff9e4d5a475c00
[ 450.083387] RDX: 0000000000000819 RSI: ffff9e4dffb96b88 RDI: ffff9e41af404e00
[ 450.090517] RBP: 0000000000000000 R08: 00000000000008e7 R09: 000000000000002d
[ 450.097642] R10: ffffb01f87323df8 R11: ffffb01f87323ac0 R12: 0000000000000007
[ 450.104774] R13: ffff9e4db1332000 R14: ffff9e42045e2540 R15: ffff9e4db1332000
[ 450.111899] FS: 0000000000000000(0000) GS:ffff9e4dffb80000(0000) knlGS:0000000000000000
[ 450.119985] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 450.125729] CR2: 00000000000000c8 CR3: 000000183e60a003 CR4: 00000000007606e0
[ 450.132853] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 450.139987] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 450.147118] PKRU: 55555554
[ 450.149831] Kernel panic - not syncing: Fatal exception
[ 450.155135] Kernel Offset: 0x17200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 450.224883] ---[ end Kernel panic - not syncing: Fatal exception ]---
# gdb /lib/modules/5.3.0/kernel/drivers/nvme/host/nvme-rdma.ko
Reading symbols from /lib/modules/5.3.0/kernel/drivers/nvme/host/nvme-rdma.ko...done.
(gdb) l *(nvme_rdma_start_queue+0x8f)
0x65f is in nvme_rdma_start_queue (drivers/nvme/host/rdma.c:568).
563     }
564
565     static void __nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
566     {
567             rdma_disconnect(queue->cm_id);
568             ib_drain_qp(queue->qp);
569     }
570
571     static void nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
572     {
(gdb)
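One way to cross-check the oops, offered as a reading of the dump rather than a confirmed diagnosis: the marked bytes `<8b> 40 10` in the Code: line appear to decode to a 32-bit load at offset 0x10 from RAX, and the register dump shows RAX as 0xb8. The sketch below only verifies that this sum equals the faulting address reported in CR2. The variable names are illustrative, and the instruction decoding is an assumption.

```shell
#!/bin/sh
# Values copied from the register dump in the kernel log.
rax=0xb8     # RAX at the time of the fault
cr2=0xc8     # CR2, the faulting address

# Assumption: "<8b> 40 10" decodes to "mov 0x10(%rax),%eax".
disp=0x10

fault_addr=$(( $rax + $disp ))
printf 'computed fault address: 0x%x\n' "$fault_addr"

if [ "$fault_addr" -eq "$(( $cr2 ))" ]; then
    echo "matches CR2"
fi
```

An address this close to zero is consistent with a structure field being read through a base pointer that was NULL inside rdma_disconnect(), but which pointer that was cannot be established from this log alone.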
# lspci | grep -i mel
3b:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Best Regards,
Yi Zhang