Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Yi Zhang <yi.zhang@redhat.com>
To: linux-nvme@lists.infradead.org
Cc: sagi@grimberg.me
Subject: NVMeoF RDMA IB:  I/O timeout and NULL pointer observed during rescan_controller/reset_controller with fio background
Date: Wed, 18 Sep 2019 05:13:36 -0400 (EDT)	[thread overview]
Message-ID: <1437535598.446597.1568798016422.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <1823328454.445263.1568796846850.JavaMail.zimbra@redhat.com>

Hello 
I observed bellow I/O timeout and NULL pointer on 5.3.0, pls help check it, let me know if you need more info or test patch, thanks

reproducer:
1. do fio background testing
2. stress rescan/reset_controller operation
echo 1 > /sys/block/nvme2n1/device/nvme2/rescan_controller
echo 1 > /sys/block/nvme2n1/device/nvme2/reset_controller

kernel log:
[  384.865550] nvme nvme2: creating 48 I/O queues.
[  386.784069] nvme nvme2: creating 48 I/O queues.
[  387.771002] nvme_ns_head_make_request: 159989 callbacks suppressed
[  387.771004] block nvme2n1: no usable path - requeuing I/O
[  387.771012] block nvme2n1: no usable path - requeuing I/O
[  387.771051] block nvme2n1: no usable path - requeuing I/O
[  387.771061] block nvme2n1: no usable path - requeuing I/O
[  387.771065] block nvme2n1: no usable path - requeuing I/O
[  387.771070] block nvme2n1: no usable path - requeuing I/O
[  387.771077] block nvme2n1: no usable path - requeuing I/O
[  387.771124] block nvme2n1: no usable path - requeuing I/O
[  387.771146] block nvme2n1: no usable path - requeuing I/O
[  387.771155] block nvme2n1: no usable path - requeuing I/O
[  449.670780] nvme nvme2: I/O 0 QID 0 timeout
[  449.691674] nvme nvme2: Connect command failed, error wo/DNR bit: 7
[  449.697974] BUG: kernel NULL pointer dereference, address: 00000000000000c8
[  449.704945] #PF: supervisor read access in kernel mode
[  449.710082] #PF: error_code(0x0000) - not-present page
[  449.715221] PGD 0 P4D 0 
[  449.717761] Oops: 0000 [#1] SMP PTI
[  449.721254] CPU: 45 PID: 1145 Comm: kworker/u98:2 Not tainted 5.3.0 #12
[  449.727866] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.2.11 06/13/2019
[  449.735448] Workqueue: nvme-reset-wq nvme_rdma_reset_ctrl_work [nvme_rdma]
[  449.742325] RIP: 0010:rdma_disconnect+0x2e/0x90 [rdma_cm]
[  449.747722] Code: 00 55 53 48 89 fb 48 8b bf a8 02 00 00 48 85 ff 74 65 48 8b 0b 0f b6 83 c0 01 00 00 48 69 c0 b8 00 00 00 48 03 81 80 04 00 00 <8b> 40 10 a8 04 75 0d a8 08 74 42 5b 31 f6 5d e9 fe 72 b2 ff 48 89
[  449.766466] RSP: 0018:ffffb01f87323de0 EFLAGS: 00010206
[  449.771693] RAX: 00000000000000b8 RBX: ffff9e4d5a474c00 RCX: ffff9e4d5a475c00
[  449.778825] RDX: 0000000000000819 RSI: ffff9e4dffb96b88 RDI: ffff9e41af404e00
[  449.785956] RBP: 0000000000000000 R08: 00000000000008e7 R09: 000000000000002d
[  449.793098] R10: ffffb01f87323df8 R11: ffffb01f87323ac0 R12: 0000000000000007
[  449.800229] R13: ffff9e4db1332000 R14: ffff9e42045e2540 R15: ffff9e4db1332000
[  449.807361] FS:  0000000000000000(0000) GS:ffff9e4dffb80000(0000) knlGS:0000000000000000
[  449.815456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  449.821195] CR2: 00000000000000c8 CR3: 000000183e60a003 CR4: 00000000007606e0
[  449.828327] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  449.835457] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  449.842581] PKRU: 55555554
[  449.845287] Call Trace:
[  449.847758]  nvme_rdma_start_queue+0x8f/0xc0 [nvme_rdma]
[  449.853070]  nvme_rdma_setup_ctrl+0x4ef/0x6a0 [nvme_rdma]
[  449.858469]  nvme_rdma_reset_ctrl_work+0x4e/0x70 [nvme_rdma]
[  449.864132]  process_one_work+0x1a1/0x360
[  449.868140]  worker_thread+0x1c9/0x380
[  449.871894]  ? process_one_work+0x360/0x360
[  449.876081]  kthread+0x10c/0x130
[  449.879310]  ? kthread_create_on_node+0x60/0x60
[  449.883846]  ret_from_fork+0x35/0x40
[  449.887422] Modules linked in: nvme_rdma nvme_fabrics nvmet_rdma nvmet 8021q garp mrp stp llc ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp iw_cxgb4 libcxgb mlx5_ib vfat fat opa_vnic ib_umad ib_ipoib intel_rapl_msr intel_rapl_common isst_if_common rpcrdma sunrpc skx_edac nfit rdma_ucm libnvdimm ib_iser rdma_cm x86_pkg_temp_thermal intel_powerclamp iw_cm ib_cm coretemp libiscsi kvm_intel scsi_transport_iscsi hfi1 kvm iTCO_wdt iTCO_vendor_support dcdbas ipmi_ssif irqbypass rdmavt bnxt_re crct10dif_pclmul crc32_pclmul ib_uverbs ghash_clmulni_intel intel_cstate intel_uncore ib_core dell_smbios mei_me ipmi_si intel_rapl_perf wmi_bmof dell_wmi_descriptor pcspkr sg mei i2c_i801 lpc_ich ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_vram_helper mlx5_core ttm drm_kms_helper syscopyarea sysfillrect sysimgblt ahci fb_sys_fops nvme libahci csiostor drm cxgb4 bnxt_en crc32c_intel nvme_core libata megaraid_sas scsi_trans
 port_fc
[  449.887456]  mlxfw tg3 wmi dm_mirror dm_region_hash dm_log dm_mod
[  449.980847] CR2: 00000000000000c8
[  449.984182] ---[ end trace aeab63ac2e6510db ]---
[  450.046884] RIP: 0010:rdma_disconnect+0x2e/0x90 [rdma_cm]
[  450.052282] Code: 00 55 53 48 89 fb 48 8b bf a8 02 00 00 48 85 ff 74 65 48 8b 0b 0f b6 83 c0 01 00 00 48 69 c0 b8 00 00 00 48 03 81 80 04 00 00 <8b> 40 10 a8 04 75 0d a8 08 74 42 5b 31 f6 5d e9 fe 72 b2 ff 48 89
[  450.071027] RSP: 0018:ffffb01f87323de0 EFLAGS: 00010206
[  450.076255] RAX: 00000000000000b8 RBX: ffff9e4d5a474c00 RCX: ffff9e4d5a475c00
[  450.083387] RDX: 0000000000000819 RSI: ffff9e4dffb96b88 RDI: ffff9e41af404e00
[  450.090517] RBP: 0000000000000000 R08: 00000000000008e7 R09: 000000000000002d
[  450.097642] R10: ffffb01f87323df8 R11: ffffb01f87323ac0 R12: 0000000000000007
[  450.104774] R13: ffff9e4db1332000 R14: ffff9e42045e2540 R15: ffff9e4db1332000
[  450.111899] FS:  0000000000000000(0000) GS:ffff9e4dffb80000(0000) knlGS:0000000000000000
[  450.119985] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  450.125729] CR2: 00000000000000c8 CR3: 000000183e60a003 CR4: 00000000007606e0
[  450.132853] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  450.139987] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  450.147118] PKRU: 55555554
[  450.149831] Kernel panic - not syncing: Fatal exception
[  450.155135] Kernel Offset: 0x17200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  450.224883] ---[ end Kernel panic - not syncing: Fatal exception ]---

# gdb /lib/modules/5.3.0/kernel/drivers/nvme/host/nvme-rdma.ko 
Reading symbols from /lib/modules/5.3.0/kernel/drivers/nvme/host/nvme-rdma.ko...done.

(gdb) l *(nvme_rdma_start_queue+0x8f)
0x65f is in nvme_rdma_start_queue (drivers/nvme/host/rdma.c:568).
563	}
564	
565	static void __nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
566	{
567		rdma_disconnect(queue->cm_id);
568		ib_drain_qp(queue->qp);
569	}
570	
571	static void nvme_rdma_stop_queue(struct nvme_rdma_queue *queue)
572	{
(gdb) 

# lspci | grep -i mel
3b:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]



Best Regards,
  Yi Zhang



_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

       reply	other threads:[~2019-09-18  9:13 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1823328454.445263.1568796846850.JavaMail.zimbra@redhat.com>
2019-09-18  9:13 ` Yi Zhang [this message]
2019-09-18 14:21   ` NVMeoF RDMA IB: I/O timeout and NULL pointer observed during rescan_controller/reset_controller with fio background Max Gurtovoy
2019-09-20  3:37     ` Yi Zhang
2019-09-20 16:58       ` Sagi Grimberg
2019-09-23 15:25         ` Max Gurtovoy
2019-09-24  4:52           ` Yi Zhang
2019-09-27  8:17           ` Yi Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1437535598.446597.1568798016422.JavaMail.zimbra@redhat.com \
    --to=yi.zhang@redhat.com \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox