From: Yi Zhang <yi.zhang@redhat.com>
To: linux-nvme@lists.infradead.org
Cc: sagi@grimberg.me
Subject: NVMeoF RDMA IB: I/O timeout and NULL pointer observed during rescan_controller/reset_controller with fio background
Date: Wed, 18 Sep 2019 05:13:36 -0400 (EDT)
Message-ID: <1437535598.446597.1568798016422.JavaMail.zimbra@redhat.com>
In-Reply-To: <1823328454.445263.1568796846850.JavaMail.zimbra@redhat.com>
Hello,
I observed the I/O timeout and NULL pointer dereference below on 5.3.0. Could you please help check it? Let me know if you need more info or want me to test a patch. Thanks.
reproducer:
1. Run fio I/O in the background
2. Repeatedly trigger rescan_controller and reset_controller:
echo 1 > /sys/block/nvme2n1/device/nvme2/rescan_controller
echo 1 > /sys/block/nvme2n1/device/nvme2/reset_controller
kernel log:
[ 384.865550] nvme nvme2: creating 48 I/O queues.
[ 386.784069] nvme nvme2: creating 48 I/O queues.
[ 387.771002] nvme_ns_head_make_request: 159989 callbacks suppressed
[ 387.771004] block nvme2n1: no usable path - requeuing I/O
[ 387.771012] block nvme2n1: no usable path - requeuing I/O
[ 387.771051] block nvme2n1: no usable path - requeuing I/O
[ 387.771061] block nvme2n1: no usable path - requeuing I/O
[ 387.771065] block nvme2n1: no usable path - requeuing I/O
[ 387.771070] block nvme2n1: no usable path - requeuing I/O
[ 387.771077] block nvme2n1: no usable path - requeuing I/O
[ 387.771124] block nvme2n1: no usable path - requeuing I/O
[ 387.771146] block nvme2n1: no usable path - requeuing I/O
[ 387.771155] block nvme2n1: no usable path - requeuing I/O
[ 449.670780] nvme nvme2: I/O 0 QID 0 timeout
[ 449.691674] nvme nvme2: Connect command failed, error wo/DNR bit: 7
[ 449.697974] BUG: kernel NULL pointer dereference, address: 00000000000000c8
[ 449.704945] #PF: supervisor read access in kernel mode
[ 449.710082] #PF: error_code(0x0000) - not-present page
[ 449.715221] PGD 0 P4D 0
[ 449.717761] Oops: 0000 [#1] SMP PTI
[ 449.721254] CPU: 45 PID: 1145 Comm: kworker/u98:2 Not tainted 5.3.0 #12
[ 449.727866] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.2.11 06/13/2019
[ 449.735448] Workqueue: nvme-reset-wq nvme_rdma_reset_ctrl_work [nvme_rdma]
[ 449.742325] RIP: 0010:rdma_disconnect+0x2e/0x90 [rdma_cm]
[ 449.747722] Code: 00 55 53 48 89 fb 48 8b bf a8 02 00 00 48 85 ff 74 65 48 8b 0b 0f b6 83 c0 01 00 00 48 69 c0 b8 00 00 00 48 03 81 80 04 00 00 <8b> 40 10 a8 04 75 0d a8 08 74 42 5b 31 f6 5d e9 fe 72 b2 ff 48 89
[ 449.766466] RSP: 0018:ffffb01f87323de0 EFLAGS: 00010206
[ 449.771693] RAX: 00000000000000b8 RBX: ffff9e4d5a474c00 RCX: ffff9e4d5a475c00
[ 449.778825] RDX: 0000000000000819 RSI: ffff9e4dffb96b88 RDI: ffff9e41af404e00
[ 449.785956] RBP: 0000000000000000 R08: 00000000000008e7 R09: 000000000000002d
[ 449.793098] R10: ffffb01f87323df8 R11: ffffb01f87323ac0 R12: 0000000000000007
[ 449.800229] R13: ffff9e4db1332000 R14: ffff9e42045e2540 R15: ffff9e4db1332000
[ 449.807361] FS: 0000000000000000(0000) GS:ffff9e4dffb80000(0000) knlGS:0000000000000000
[ 449.815456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 449.821195] CR2: 00000000000000c8 CR3: 000000183e60a003 CR4: 00000000007606e0
[ 449.828327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 449.835457] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 449.842581] PKRU: 55555554
[ 449.845287] Call Trace:
[ 449.847758] nvme_rdma_start_queue+0x8f/0xc0 [nvme_rdma]
[ 449.853070] nvme_rdma_setup_ctrl+0x4ef/0x6a0 [nvme_rdma]
[ 449.858469] nvme_rdma_reset_ctrl_work+0x4e/0x70 [nvme_rdma]
[ 449.864132] process_one_work+0x1a1/0x360
[ 449.868140] worker_thread+0x1c9/0x380
[ 449.871894] ? process_one_work+0x360/0x360
[ 449.876081] kthread+0x10c/0x130
[ 449.879310] ? kthread_create_on_node+0x60/0x60
[ 449.883846] ret_from_fork+0x35/0x40
[ 449.887422] Modules linked in: nvme_rdma nvme_fabrics nvmet_rdma nvmet 8021q garp mrp stp llc ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp iw_cxgb4 libcxgb mlx5_ib vfat fat opa_vnic ib_umad ib_ipoib intel_rapl_msr intel_rapl_common isst_if_common rpcrdma sunrpc skx_edac nfit rdma_ucm libnvdimm ib_iser rdma_cm x86_pkg_temp_thermal intel_powerclamp iw_cm ib_cm coretemp libiscsi kvm_intel scsi_transport_iscsi hfi1 kvm iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif irqbypass rdmavt bnxt_re crct10dif_pclmul crc32_pclmul ib_uverbs ghash_clmulni_intel intel_cstate intel_uncore ib_core dell_smbios mei_me ipmi_si intel_rapl_perf wmi_bmof dell_wmi_descriptor pcspkr sg mei i2c_i801 lpc_ich ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_vram_helper mlx5_core ttm drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops nvme libahci csiostor drm cxgb4 bnxt_en crc32c_intel nvme_core libata megaraid_sas scsi_transport_fc
[ 449.887456] mlxfw tg3 wmi dm_mirror dm_region_hash dm_log dm_mod
[ 449.980847] CR2: 00000000000000c8
[ 449.984182] ---[ end trace aeab63ac2e6510db ]---
[ 450.046884] RIP: 0010:rdma_disconnect+0x2e/0x90 [rdma_cm]
[ 450.052282] Code: 00 55 53 48 89 fb 48 8b bf a8 02 00 00 48 85 ff 74 65 48 8b 0b 0f b6 83 c0 01 00 00 48 69 c0 b8 00 00 00 48 03 81 80 04 00 00 <8b> 40 10 a8 04 75 0d a8 08 74 42 5b 31 f6 5d e9 fe 72 b2 ff 48 89
[ 450.071027] RSP: 0018:ffffb01f87323de0 EFLAGS: 00010206
[ 450.076255] RAX: 00000000000000b8 RBX: ffff9e4d5a474c00 RCX: ffff9e4d5a475c00
[ 450.083387] RDX: 0000000000000819 RSI: ffff9e4dffb96b88 RDI: ffff9e41af404e00
[ 450.090517] RBP: 0000000000000000 R08: 00000000000008e7 R09: 000000000000002d
[ 450.097642] R10: ffffb01f87323df8 R11: ffffb01f87323ac0 R12: 0000000000000007
[ 450.104774] R13: ffff9e4db1332000 R14: ffff9e42045e2540 R15: ffff9e4db1332000
[ 450.111899] FS: 0000000000000000(0000) GS:ffff9e4dffb80000(0000) knlGS:0000000000000000
[ 450.119985] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 450.125729] CR2: 00000000000000c8 CR3: 000000183e60a003 CR4: 00000000007606e0
[ 450.132853] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 450.139987] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 450.147118] PKRU: 55555554
[ 450.149831] Kernel panic - not syncing: Fatal exception
[ 450.155135] Kernel Offset: 0x17200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 450.224883] ---[ end Kernel panic - not syncing: Fatal exception ]---
# gdb /lib/modules/5.3.0/kernel/drivers/nvme/host/nvme-rdma.ko
Reading symbols from /lib/modules/5.3.0/kernel/drivers/nvme/host/nvme-rdma.ko...done.
(gdb) l *(nvme_rdma_start_queue+0x8f)
0x65f is in nvme_rdma_start_queue (drivers/nvme/host/rdma.c:568).
563 }
564
565 static void __nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
566 {
567 rdma_disconnect(queue->cm_id);
568 ib_drain_qp(queue->qp);
569 }
570
571 static void nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
572 {
(gdb)
# lspci | grep -i mel
3b:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Best Regards,
Yi Zhang
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme