From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [dpdk-dev] [Bug 334] ConnectX-4/mlx5 crashes under high load in rxq_cq_decompress_v()
Date: Mon, 22 Jul 2019 01:56:43 +0000
Message-ID: <bug-334-3@http.bugs.dpdk.org/>
https://bugs.dpdk.org/show_bug.cgi?id=334
Bug ID: 334
Summary: ConnectX-4/mlx5 crashes under high load in rxq_cq_decompress_v()
Product: DPDK
Version: 18.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: yasu@nttv6.jp
Target Milestone: ---
I'm writing my own DPDK application, and it crashes inside the
mlx5 driver, in rxq_cq_decompress_v().
It doesn't crash under 10Gbps load but does under 50Gbps load
or higher (90Gbps was also tested and resulted in a similar crash).
Both loads were applied to a single 100GbE port.
4 cores (4 rxqs, one rxq per core) and 48 txqs were assigned to the port.
The port's device is:
Mellanox Technologies MT27700 Family [ConnectX-4]
MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64
in Ubuntu 18.04.2 LTS 4.15.0-50-generic
$ sudo mstflint -d 86:00.0 q
Image type: FS3
FW Version: 12.17.2020
FW Release Date: 22.11.2016
Description: UID GuidsNumber
Base GUID: N/A 4
Base MAC: 00900b65b390 4
Orig Base MAC: N/A 4
Image VSD: N/A
Device VSD: N/A
PSID: LNR3270110033
Security Attributes: N/A
(I couldn't update the firmware because of the PSID.)
The backtrace of the crash:
Thread 11 "lcore-slave-8" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff1517700 (LWP 30617)]
0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
721 *__P = __B;
(gdb) bt
#0 0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
#1 rxq_cq_decompress_v (rxq=0x1c0da7480, cq=0x1c0c8cb80, elts=0x1c0da7a70)
at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:438
#2 0x0000555555f05b82 in rxq_burst_v (rxq=0x1c0da7480, pkts=0x7ffff1514a40,
pkts_n=32, err=0x7ffff1505978)
at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:956
#3 0x0000555555f0662a in mlx5_rx_burst_vec (dpdk_rxq=0x1c0da7480,
pkts=0x7ffff1514a40, pkts_n=32)
at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec.c:238
#4 0x000055555563304d in rte_eth_rx_burst (port_id=0, queue_id=0,
rx_pkts=0x7ffff1514a40, nb_pkts=32)
at /usr/local/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:3879
(Our DPDK application's functions follow.)
The crash reproduces every time. The same crash also occurs with DPDK 19.05.0.
At the time of the crash, frame 1 (rxq_cq_decompress_v()) shows:
(gdb) p t_pkt->data_len
$1 = 124
(gdb) p mcqe_n
$2 = 124
(gdb) p pos
$3 = 116
(gdb) p elts[pos + 3]
$10 = (struct rte_mbuf *) 0x0
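One possible reading of the values above: the __P=0x10 in frame 0 is what a store through a NULL mbuf pointer would produce if the target member sat 0x10 bytes into the struct. A minimal sketch of that arithmetic follows. struct fake_mbuf, its field names, and store_target() are hypothetical and exist only for illustration; this is not the real struct rte_mbuf layout or driver code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the start of an mbuf-like struct.
 * NOT the real struct rte_mbuf: the fields are assumptions chosen so
 * that the store target sits at offset 0x10. */
struct fake_mbuf {
	uint64_t field_a;    /* offset 0x00 */
	uint64_t field_b;    /* offset 0x08 */
	uint8_t  target[16]; /* offset 0x10: destination of a 16-byte store */
};

/* Address a 16-byte store would hit for a given base pointer value.
 * Computed on integers, so nothing is actually dereferenced. */
static uintptr_t store_target(uintptr_t mbuf_base)
{
	return mbuf_base + offsetof(struct fake_mbuf, target);
}
```

With a base of 0 (the value gdb printed for elts[pos + 3]), store_target(0) evaluates to 0x10, which matches the faulting __P in the backtrace under this assumed layout.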
It seems that struct rte_mbuf *elts[] is sometimes not initialized
correctly, since it contains NULL entries:
(gdb) p/x (void*[124])elts[0]
$4 = {0x1e0106fc0, 0x1e00a8880, 0x1de8b4580, 0x1de716340, 0x1e00ad600,
0x1dfcc04c0, 0x1decf89c0, 0x1df656440, 0x1e02fc500, 0x1df7303c0,
0x1df7876c0, 0x1df0adfc0, 0x1dc44a8c0, 0x1dfb55040, 0x1df4b3480,
0x1ded87800, 0x1e07a64c0, 0x1dec066c0, 0x1dc59d9c0, 0x1de3ae540,
0x1debaf3c0, 0x1dfd69d40, 0x1dfd36f80, 0x1df073dc0, 0x1dffb3ec0,
0x1df0d7280, 0x1e0235b80, 0x1de4b3e40, 0x1df925900, 0x1df421f80,
0x1df021840, 0x1dfab7980, 0x1dfe572c0, 0x1dea3cb00, 0x1dbf5a540,
0x1de10aa00, 0x1dded8c00, 0x1df87c080, 0x1dee80f40, 0x1df596f00,
0x1dff20300, 0x1e05a4dc0, 0x1e0182800, 0x1e0257a00, 0x1e0323100,
0x1e0f3f100, 0x1df5ff140, 0x1dfbe17c0, 0x1de2c3680, 0x1dfd54080,
0x1de18afc0, 0x1dd81ecc0, 0x1de1f7f80, 0x1ded09900, 0x1df35b600,
0x1de57f540, 0x1df9e4e40, 0x1e0747d80, 0x1e024df00, 0x1ddf2d840,
0x1df95ad80, 0x1dedf47c0, 0x1de1ebdc0, 0x1e00e9ec0, 0x1e02febc0,
0x1dae22840, 0x1e051d3c0, 0x1df46f780, 0x1e0353800, 0x1e0ceb480,
0x1dfe9fd40, 0x1db58d440, 0x1e0526ec0, 0x1d61ebe40, 0x1dfe85300,
0x1df3b4fc0, 0x1ddbc0cc0, 0x1e04823c0, 0x1df724200, 0x1df9cf180,
0x1dfeb0c80, 0x1df4fe5c0, 0x1dff0f3c0, 0x1e051fa80, 0x1dd81c600,
0x1ddcef880, 0x1de30e7c0, 0x1ded803c0, 0x1de51e740, 0x1deffac40,
0x1df533a40, 0x1dd399240, 0x1deccf700, 0x1dfefbdc0, 0x1de9ab600,
0x1e0502980, 0x1dfb52980, 0x1dedabd40, 0x1e07e5440, 0x1dea91740,
0x1dd749ac0, 0x1e0d1e240, 0x1df86b140, 0x1df9013c0, 0x1dfc31680,
0x1dfa15540, 0x1e03694c0, 0x1e06dfb40, 0x1dfdf8b80, 0x1ddd8cf40,
0x1e03086c0, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380,
0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x0, 0x0, 0x0, 0x0,
0x7ffff7ff487c}
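As a cross-check on the dump above, the first NULL entry can be located mechanically. The sketch below copies entries 110 through 123 of the dump (the 110 entries before them are all non-NULL) and scans for the first zero. elts_tail, TAIL_BASE, first_null(), and first_null_in_dump() are names made up for this illustration, not driver symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Index in the full dump of the first entry copied below. */
#define TAIL_BASE 110

/* Entries 110..123 of the gdb dump, copied verbatim. */
static const uint64_t elts_tail[] = {
	0x1e03086c0ULL,
	0x1c0da2380ULL, 0x1c0da2380ULL, 0x1c0da2380ULL, 0x1c0da2380ULL,
	0x1c0da2380ULL, 0x1c0da2380ULL, 0x1c0da2380ULL, 0x1c0da2380ULL,
	0x0ULL, 0x0ULL, 0x0ULL, 0x0ULL,
	0x7ffff7ff487cULL,
};

/* Return the position of the first zero entry, or -1 if there is none. */
static long first_null(const uint64_t *elts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (elts[i] == 0)
			return (long)i;
	}
	return -1;
}

/* Index of the first NULL entry, expressed in full-dump coordinates. */
static long first_null_in_dump(void)
{
	size_t n = sizeof(elts_tail) / sizeof(elts_tail[0]);
	long off = first_null(elts_tail, n);

	return off < 0 ? -1 : TAIL_BASE + off;
}
```

For this data the first NULL is at index 119, which equals pos + 3 for the pos = 116 printed earlier. The eight entries before it (indices 111 to 118) all hold the same pointer, 0x1c0da2380.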
FYI:
The same DPDK application works fine with DPDK 18.11.2, even under high load,
on a Mellanox Technologies MT27800 Family [ConnectX-5] with the firmware below:
MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64.tgz
on Ubuntu 18.04.2 LTS 4.15.0-54-generic.
# mstflint -d 3b:00.0 q
Image type: FS4
FW Version: 16.24.1000
FW Release Date: 26.11.2018
Product Version: 16.24.1000
Rom Info: type=UEFI version=14.17.11 cpu=AMD64
type=PXE version=3.5.603 cpu=AMD64
Description: UID GuidsNumber
Base GUID: 506b4b0300086c56 8
Base MAC: 506b4b086c56 8
Image VSD: N/A
Device VSD: N/A
PSID: MT_0000000008
Security Attributes: N/A
--
You are receiving this mail because:
You are the assignee for the bug.