From: Yi Zhang <yi.zhang@redhat.com>
To: linux-nvme@lists.infradead.org
Cc: selvin.xavier@broadcom.com, sagi@grimberg.me
Subject: Bug report: NVMeoF RDMA bnxt_roce: I/O 1 QID 0 timeout
Date: Mon, 11 Nov 2019 22:18:15 -0500 (EST)
Message-ID: <661008349.11145565.1573528695548.JavaMail.zimbra@redhat.com>
In-Reply-To: <1269174553.11144711.1573527781970.JavaMail.zimbra@redhat.com>

Hello

I would like to report an NVMeoF RDMA I/O timeout issue. The reproducer and kernel logs are below; let me know if you need more information.

Reproducer:
1. Set up an NVMeoF RDMA RoCE environment
   target: qedr RoCE
   client: bnxt RoCE

2. client: connect to the target
3. client: run an fio stress test

HW info:
target:
# lspci | grep -i ql
19:00.0 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.1 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.2 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
19:00.3 Ethernet controller: QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller (rev 02)
client:
# lspci  | grep -i bro
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
18:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] (rev 02)
19:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
19:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)


Kernel log:
client:
[   65.974085] nvme nvme2: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 172.31.45.186:4420
[   65.974592] nvme nvme2: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
[   66.027510] nvme nvme2: creating 1 I/O queues.
[   66.053178] nvme nvme2: mapped 1/0/0 default/read/poll queues.
[   66.053802] nvme nvme2: new ctrl: NQN "testnqn", addr 172.31.45.186:4420
[   66.054465] nvme2n1: detected capacity change from 0 to 1600321314816
[   81.926738] nvme nvme2: I/O 1 QID 0 timeout
[   81.930943] nvme nvme2: starting error recovery
[   88.070471] nvme nvme2: I/O 1 QID 0 timeout
[   93.702284] nvme nvme2: I/O 1 QID 0 timeout
[   98.310013] nvme nvme2: I/O 49 QID 1 timeout
[   98.314297] nvme nvme2: I/O 50 QID 1 timeout
[   98.318575] nvme nvme2: I/O 51 QID 1 timeout
[   98.322855] nvme nvme2: I/O 52 QID 1 timeout
[   98.327135] nvme nvme2: I/O 54 QID 1 timeout
[   98.331419] nvme nvme2: I/O 55 QID 1 timeout
[   98.335699] nvme nvme2: I/O 56 QID 1 timeout
[   98.339984] nvme nvme2: I/O 57 QID 1 timeout
[   98.344259] nvme nvme2: I/O 65 QID 1 timeout
[   98.348531] nvme nvme2: I/O 66 QID 1 timeout
[   98.352805] nvme nvme2: I/O 67 QID 1 timeout
[   98.357086] nvme nvme2: I/O 68 QID 1 timeout
[   98.361369] nvme nvme2: I/O 69 QID 1 timeout
[   98.365648] nvme nvme2: I/O 70 QID 1 timeout
[   98.369928] nvme nvme2: I/O 71 QID 1 timeout
[   98.374203] nvme nvme2: I/O 72 QID 1 timeout
[   98.378484] nvme nvme2: I/O 73 QID 1 timeout
[   98.382764] nvme nvme2: I/O 74 QID 1 timeout
[   98.387035] nvme nvme2: I/O 75 QID 1 timeout
[   98.391310] nvme nvme2: I/O 76 QID 1 timeout
[   98.395590] nvme nvme2: I/O 77 QID 1 timeout
[   98.399864] nvme nvme2: I/O 78 QID 1 timeout
[   98.404141] nvme nvme2: I/O 79 QID 1 timeout
[   98.408415] nvme nvme2: I/O 80 QID 1 timeout
[   98.412686] nvme nvme2: I/O 81 QID 1 timeout
[   98.416961] nvme nvme2: I/O 82 QID 1 timeout
[   98.421232] nvme nvme2: I/O 86 QID 1 timeout
[   98.425503] nvme nvme2: I/O 87 QID 1 timeout
[   98.429776] nvme nvme2: I/O 94 QID 1 timeout
[   98.434050] nvme nvme2: I/O 95 QID 1 timeout
[   98.438323] nvme nvme2: I/O 96 QID 1 timeout
[   98.442596] nvme nvme2: I/O 97 QID 1 timeout
[   98.446867] nvme nvme2: I/O 98 QID 1 timeout
[   98.451141] nvme nvme2: I/O 99 QID 1 timeout
[   98.455412] nvme nvme2: I/O 100 QID 1 timeout
[   98.459772] nvme nvme2: I/O 101 QID 1 timeout
[   98.464131] nvme nvme2: I/O 102 QID 1 timeout
[   98.468490] nvme nvme2: I/O 103 QID 1 timeout
[   98.472849] nvme nvme2: I/O 104 QID 1 timeout
[   98.477208] nvme nvme2: I/O 105 QID 1 timeout
[   98.481568] nvme nvme2: I/O 106 QID 1 timeout
[   98.485926] nvme nvme2: I/O 109 QID 1 timeout
[   98.490286] nvme nvme2: I/O 111 QID 1 timeout
[   98.494644] nvme nvme2: I/O 112 QID 1 timeout
[   98.499004] nvme nvme2: I/O 113 QID 1 timeout
[   98.503363] nvme nvme2: I/O 115 QID 1 timeout
[   98.507721] nvme nvme2: I/O 116 QID 1 timeout
[   98.512080] nvme nvme2: I/O 117 QID 1 timeout
[   98.516441] nvme nvme2: I/O 118 QID 1 timeout
[   98.520807] nvme nvme2: I/O 119 QID 1 timeout
[   98.525167] nvme nvme2: I/O 120 QID 1 timeout
[   98.529526] nvme nvme2: I/O 121 QID 1 timeout
[   98.533885] nvme nvme2: I/O 122 QID 1 timeout
[   98.538245] nvme nvme2: I/O 123 QID 1 timeout
[   98.542603] nvme nvme2: I/O 124 QID 1 timeout
[   98.546963] nvme nvme2: I/O 126 QID 1 timeout
[   99.846073] nvme nvme2: I/O 1 QID 0 timeout
[  102.405967] bnxt_en 0000:19:00.0: QPLIB: cmdq[0x41d]=0x3 timedout (20000)msec
[  102.413114] infiniband bnxt_re0: Failed to modify HW QP
[  102.418367] bnxt_en 0000:19:00.0: QPLIB: cmdq[0x0]=0x3 send failed
[  102.424551] infiniband bnxt_re0: Failed to modify HW QP
[  102.429779] ------------[ cut here ]------------
[  102.434399] failed to drain send queue: -110
[  102.438730] WARNING: CPU: 43 PID: 780 at drivers/infiniband/core/verbs.c:2656 __ib_drain_sq+0x143/0x190 [ib_core]
[  102.448985] Modules linked in: nvme_rdma nvme_fabrics nvmet_rdma nvmet 8021q garp mrp stp llc ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp vfat fat mlx5_ib opa_vnic ib_umad ib_ipoib rpcrdma intel_rapl_msr sunrpc intel_rapl_common rdma_ucm ib_iser libiscsi isst_if_common scsi_transport_iscsi iTCO_wdt iTCO_vendor_support skx_edac dcdbas nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif iw_cxgb4 kvm irqbypass rdma_cm iw_cm ib_cm libcxgb crct10dif_pclmul crc32_pclmul hfi1 bnxt_re rdmavt ghash_clmulni_intel intel_cstate ib_uverbs ib_core intel_uncore dell_smbios intel_rapl_perf wmi_bmof dell_wmi_descriptor sg pcspkr mei_me ipmi_si i2c_i801 lpc_ich mei ipmi_devintf ipmi_msghandler acpi_power_meter xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops i2c_algo_bit drm_vram_helper ttm mlx5_core drm ahci csiostor cxgb4 libahci crc32c_intel nvme mlxfw bnxt_en nvme_core libata megaraid_sas
[  102.449018]  scsi_transport_fc tg3 pci_hyperv_intf wmi dm_mirror dm_region_hash dm_log dm_mod
[  102.543833] CPU: 43 PID: 780 Comm: kworker/u98:3 Kdump: loaded Not tainted 5.4.0-0.rc6.2.elrdy.x86_64 #1
[  102.553303] Hardware name: Dell Inc. PowerEdge R740/00WGD1, BIOS 2.2.11 06/13/2019
[  102.560872] Workqueue: nvme-wq nvme_rdma_error_recovery_work [nvme_rdma]
[  102.567572] RIP: 0010:__ib_drain_sq+0x143/0x190 [ib_core]
[  102.572968] Code: 00 00 00 48 89 df e8 8c 28 7f f4 48 85 c0 74 e1 e9 69 ff ff ff 89 c6 48 c7 c7 48 ce 51 c1 c6 05 cd 0c 04 00 01 e8 56 2f fe f3 <0f> 0b e9 4d ff ff ff 80 3d b9 0c 04 00 00 0f 85 40 ff ff ff 89 c6
[  102.591711] RSP: 0018:ffffbb79c7883cf8 EFLAGS: 00010282
[  102.596940] RAX: 0000000000000000 RBX: ffff9ed9bdac3818 RCX: 0000000000000000
[  102.604072] RDX: ffff9ee60fb66900 RSI: ffff9ee60fb56b88 RDI: ffff9ee60fb56b88
[  102.611204] RBP: ffff9ee60b1d8400 R08: 000000000000079a R09: 000000000000002b
[  102.618334] R10: 0000000000000000 R11: ffffbb79c7883ba0 R12: ffffbb79c7883d28
[  102.625458] R13: ffffbb79c7883d00 R14: ffff9eda44b0fbc0 R15: ffff9ee5ff4000b0
[  102.632584] FS:  0000000000000000(0000) GS:ffff9ee60fb40000(0000) knlGS:0000000000000000
[  102.640669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  102.646415] CR2: 00007faba43bcec8 CR3: 0000000f4fa0a004 CR4: 00000000007606e0
[  102.653547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  102.660680] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  102.667812] PKRU: 55555554
[  102.670525] Call Trace:
[  102.672987]  ? wb_over_bg_thresh+0x20c/0x220
[  102.677265]  ib_drain_qp+0xe/0x20 [ib_core]
[  102.681448]  nvme_rdma_teardown_io_queues.part.35+0x4e/0xa0 [nvme_rdma]
[  102.688057]  nvme_rdma_error_recovery_work+0x35/0x90 [nvme_rdma]
[  102.694064]  process_one_work+0x1a1/0x360
[  102.698074]  worker_thread+0x30/0x380
[  102.701742]  ? pwq_unbound_release_workfn+0xd0/0xd0
[  102.706622]  kthread+0x112/0x130
[  102.709851]  ? __kthread_parkme+0x70/0x70
[  102.713867]  ret_from_fork+0x35/0x40
[  102.717444] ---[ end trace b9af855765aeac25 ]---
[  102.722074] bnxt_en 0000:19:00.0: QPLIB: cmdq[0x0]=0x3 send failed
[  102.728271] infiniband bnxt_re0: Failed to modify HW QP
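The client log above shows timeouts first on QID 0 (the admin queue) and then a burst on QID 1. To make that timeline easier to read, here is a minimal Python sketch that counts the "I/O <tag> QID <n> timeout" lines per queue and reports the first and last timestamp for each. The function name summarize_timeouts and the regular expression are illustrative assumptions, not part of any kernel or nvme-cli tooling; the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches lines such as "[   98.310013] nvme nvme2: I/O 49 QID 1 timeout".
# The pattern is an assumption based on the log format shown in this report.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nvme (?P<ctrl>\S+): "
    r"I/O (?P<tag>\d+) QID (?P<qid>\d+) timeout"
)


def summarize_timeouts(log_text):
    """Return {qid: (count, first_ts, last_ts)} for every timeout line found."""
    stamps = defaultdict(list)
    for line in log_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            stamps[int(m.group("qid"))].append(float(m.group("ts")))
    return {qid: (len(ts), min(ts), max(ts)) for qid, ts in stamps.items()}


# A few lines copied from the client log; feed the full dmesg output instead.
sample = """\
[   81.926738] nvme nvme2: I/O 1 QID 0 timeout
[   81.930943] nvme nvme2: starting error recovery
[   88.070471] nvme nvme2: I/O 1 QID 0 timeout
[   98.310013] nvme nvme2: I/O 49 QID 1 timeout
[   98.314297] nvme nvme2: I/O 50 QID 1 timeout
"""

for qid, (count, first, last) in sorted(summarize_timeouts(sample).items()):
    print(f"QID {qid}: {count} timeouts between {first:.6f}s and {last:.6f}s")
```

Running it over the complete dmesg output instead of the short sample gives a per-queue count to compare against the target-side log.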

target:
[   44.362119] nvmet: adding nsid 1 to subsystem testnqn
[   44.362662] nvmet_rdma: enabling port 2 (172.31.45.186:4420)
[   61.722470] SubnSet(OPA_PortInfo) smlid 0x4
[   62.127400] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:07212bc2-57e0-4729-9377-4608892fed05.
[   62.180752] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:07212bc2-57e0-4729-9377-4608892fed05.
[   80.914135] [qedr_poll_cq_req:3816(qedr0)]Error: POLL CQ with ROCE_CQE_REQ_STS_TRANSPORT_RETRY_CNT_ERR. CQ icid=0x1, QP icid=0x2
[   92.839414] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[   92.845445] nvmet: ctrl 1 fatal error occurred!

Best Regards,
  Yi Zhang



_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
