From: Vladislav Bolkhovitin
Subject: Re: [SPDK] SPDK errors
Date: Tue, 29 Aug 2017 19:33:39 -0700
Message-ID: <59A62403.2010700@vlnb.net>
In-Reply-To: CAKJZjix9m4h01u2-nz+ces+N-jKnGZZMoTRLXdmRgH89nLJV3A@mail.gmail.com
To: spdk@lists.01.org

Santhebachalli Ganesh wrote on 08/29/2017 10:14 AM:
> Folks,
> My name is Ganesh, and I am working on NVMe-oF performance metrics using SPDK (and kernel).
> I would appreciate your expert insights.
>
> I am observing errors most of the time when the QD in perf is increased to >=64, and sometimes even for <=16.
> The errors are not consistent.
>
> Attached are some details.
>
> Please let me know if you have any additional questions.
>
> Thanks.
> -Ganesh
>
> SPDK errors 1.txt
>
>
> Setup details:
> -- Some info on setup
> Same HW/SW on target and initiator.
>
> adminuser(a)dell730-80:~> hostnamectl
> Static hostname: dell730-80
> Icon name: computer-server
> Chassis: server
> Machine ID: b5abb0fe67afd04c59521c40599b3115
> Boot ID: f825aa6338194338a6f80125caa836c7
> Operating System: openSUSE Leap 42.3
> CPE OS Name: cpe:/o:opensuse:leap:42.3
> Kernel: Linux 4.12.8-1.g4d7933a-default
> Architecture: x86-64
>
> adminuser(a)dell730-80:~> lscpu | grep -i socket
> Core(s) per socket: 12
> Socket(s): 2
>
> 2MB and/or 1GB huge pages set.
>
> Latest SPDK/DPDK from their respective Git repositories.
>
> Compiled with the RDMA flag.
>
> nvmf.conf file: (have played around with the values)
> reactor mask 0x5555
> AcceptorCore 2
> 1 - 3 Subsystems on cores 4,8,10
>
> adminuser(a)dell730-80:~> sudo gits/spdk/app/nvmf_tgt/nvmf_tgt -c gits/spdk/etc/spdk/nvmf.conf -p 6
>
> PCI NVMe cards (16GB):
> adminuser(a)dell730-80:~> sudo lspci | grep -i pmc
> 04:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 06:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
> 85:00.0 Non-Volatile memory controller: PMC-Sierra Inc. Device f117 (rev 06)
>
> Network cards: (latest associated FW from vendor)
> adminuser(a)dell730-80:~> sudo lspci | grep -i connect
> 05:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> --- initiator cmd line
> sudo ./perf -q 32 -s 512 -w randread -t 30 -r 'trtype:RDMA adrfam:IPv4 traddr:1.1.1.80 trsvcid:4420 subnqn:nqn.2016-06.io.spdk:cnode1' -c 0x2
>
> --- errors on stdout on target
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:201 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:198 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_pcie.c:1910:nvme_pcie_qpair_process_completions: *ERROR*: cpl does not map to outstanding cmd
> Aug 24 17:14:09 dell730-80 nvmf[38006]: nvme_qpair.c: 284:nvme_qpair_print_completion: *NOTICE*: SUCCESS (00/00) sqid:1 cid:222 cdw0:0 sqhd:0094 p:0 m:0 dnr:0
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:09 dell730-80 nvmf[38006]: bdev_nvme.c:1248:bdev_nvme_queue_cmd: *ERROR*: readv failed: rc = -12
> Aug 24 17:14:13 dell730-80 nvmf[38006]: rdma.c:1622:spdk_nvmf_rdma_poll: *ERROR*: CQ error on CQ 0x7f8a3803cae0, Request 0x140231622050400 (12): transport retry counter exceeded
>
> --- errors seen on client
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408520 (13): RNR retry counter exceeded
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 33408016 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 14 (5): Work Request Flushed Error
> nvme_rdma.c:1470:nvme_rdma_qpair_process_completions: *ERROR*: CQ error on Queue Pair 0x1fdb580, Response Index 15 (5): Work Request Flushed Error

These might actually be HW errors, because retries are supposed to be engaged only on packet loss or corruption. The cause might be bad or not fully seated cables.

Vlad