From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [dpdk-dev] [Bug 334] ConnectX-4/mlx5 crashes under high load in rxq_cq_decompress_v()
Date: Mon, 22 Jul 2019 01:56:43 +0000
Message-ID: <bug-334-3@http.bugs.dpdk.org/>

https://bugs.dpdk.org/show_bug.cgi?id=334

            Bug ID: 334
           Summary: ConnectX-4/mlx5 crashes under high load in
                    rxq_cq_decompress_v()
           Product: DPDK
           Version: 18.11
          Hardware: x86
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: yasu@nttv6.jp
  Target Milestone: ---

I am writing my own DPDK application, and it crashes in an mlx5 driver
function.

It does not crash under a 10 Gbps load, but it does under a 50 Gbps load
or higher (90 Gbps was also tested and resulted in a similar crash).
Both loads were applied to a 100GbE port.
4 cores were assigned to the port, each polling its own rxq (4 rxqs, 1-to-1).
48 txqs were assigned to the port.

The port's device and driver stack:
Mellanox Technologies MT27700 Family [ConnectX-4]
MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64
on Ubuntu 18.04.2 LTS, kernel 4.15.0-50-generic

$ sudo mstflint -d 86:00.0 q
Image type:            FS3
FW Version:            12.17.2020
FW Release Date:       22.11.2016
Description:           UID                GuidsNumber
Base GUID:             N/A                     4
Base MAC:              00900b65b390            4
Orig Base MAC:         N/A                     4
Image VSD:             N/A
Device VSD:            N/A
PSID:                  LNR3270110033
Security Attributes:   N/A
(The firmware could not be updated because of the PSID.)

The backtrace of the crash:

Thread 11 "lcore-slave-8" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff1517700 (LWP 30617)]
0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
    at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
721       *__P = __B;
(gdb) bt
#0  0x0000555555f0230a in _mm_storeu_si128 (__B=..., __P=0x10)
    at /usr/lib/gcc/x86_64-linux-gnu/7/include/emmintrin.h:721
#1  rxq_cq_decompress_v (rxq=0x1c0da7480, cq=0x1c0c8cb80, elts=0x1c0da7a70)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:438
#2  0x0000555555f05b82 in rxq_burst_v (rxq=0x1c0da7480, pkts=0x7ffff1514a40,
    pkts_n=32, err=0x7ffff1505978)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec_sse.h:956
#3  0x0000555555f0662a in mlx5_rx_burst_vec (dpdk_rxq=0x1c0da7480,
    pkts=0x7ffff1514a40, pkts_n=32)
    at /usr/local/dpdk-stable-18.11.2/drivers/net/mlx5/mlx5_rxtx_vec.c:238
#4  0x000055555563304d in rte_eth_rx_burst (port_id=0, queue_id=0,
    rx_pkts=0x7ffff1514a40, nb_pkts=32)
    at /usr/local/dpdk-18.11/x86_64-native-linuxapp-gcc/include/rte_ethdev.h:3879
(Frames from our DPDK application follow.)

It always reproduces. The same crash also occurs with DPDK 19.05.0.

When the crash occurs, frame 1 (rxq_cq_decompress_v()) shows:
(gdb) p t_pkt->data_len
$1 = 124
(gdb) p mcqe_n
$2 = 124
(gdb) p pos
$3 = 116
(gdb) p elts[pos + 3]
$10 = (struct rte_mbuf *) 0x0
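The faulting address in frame 0 (__P=0x10) fits the NULL entry above. A minimal sketch, assuming an LP64 x86_64 build and the 18.11 layout in which struct rte_mbuf begins with the 8-byte buf_addr and buf_iova members: the hypothetical struct mbuf_head below only mimics that prefix (it is not the real DPDK definition), and the hypothetical data_off_store_addr() computes, without dereferencing, where a store to the field at offset 16 of a NULL mbuf would land.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: LP64 build (x86_64 Linux), as in the report. */
static_assert(sizeof(void *) == 8, "LP64 assumed");

/* Hypothetical stand-in for the first bytes of struct rte_mbuf in 18.11;
 * NOT the real DPDK definition. */
struct mbuf_head {
	void *buf_addr;     /* offset 0 */
	uint64_t buf_iova;  /* offset 8 */
	uint16_t data_off;  /* offset 16 */
	uint16_t refcnt;    /* offset 18 */
	uint16_t nb_segs;   /* offset 20 */
	uint16_t port;      /* offset 22 */
	uint64_t ol_flags;  /* offset 24 */
};

/* Address a store to &m->data_off would target. The pointer is only
 * converted to an integer, never dereferenced, so passing NULL is safe. */
static uintptr_t data_off_store_addr(const struct mbuf_head *m)
{
	return (uintptr_t)m + offsetof(struct mbuf_head, data_off);
}
```

With m = NULL (as elts[pos + 3] is above), data_off_store_addr() returns 0x10 on x86_64 Linux, matching __P in frame 0. This only shows the fault is consistent with a write through the NULL entry; it does not identify which field line 438 actually writes.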

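It seems that struct rte_mbuf *elts[] is sometimes initialized incorrectly.
In the dump below, entries 111-118 all hold the same pointer (0x1c0da2380),
and entries 119-122, including elts[pos + 3], are NULL.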

(gdb) p/x (void*[124])elts[0]
$4 = {0x1e0106fc0, 0x1e00a8880, 0x1de8b4580, 0x1de716340, 0x1e00ad600, 
  0x1dfcc04c0, 0x1decf89c0, 0x1df656440, 0x1e02fc500, 0x1df7303c0, 
  0x1df7876c0, 0x1df0adfc0, 0x1dc44a8c0, 0x1dfb55040, 0x1df4b3480, 
  0x1ded87800, 0x1e07a64c0, 0x1dec066c0, 0x1dc59d9c0, 0x1de3ae540, 
  0x1debaf3c0, 0x1dfd69d40, 0x1dfd36f80, 0x1df073dc0, 0x1dffb3ec0, 
  0x1df0d7280, 0x1e0235b80, 0x1de4b3e40, 0x1df925900, 0x1df421f80, 
  0x1df021840, 0x1dfab7980, 0x1dfe572c0, 0x1dea3cb00, 0x1dbf5a540, 
  0x1de10aa00, 0x1dded8c00, 0x1df87c080, 0x1dee80f40, 0x1df596f00, 
  0x1dff20300, 0x1e05a4dc0, 0x1e0182800, 0x1e0257a00, 0x1e0323100, 
  0x1e0f3f100, 0x1df5ff140, 0x1dfbe17c0, 0x1de2c3680, 0x1dfd54080, 
  0x1de18afc0, 0x1dd81ecc0, 0x1de1f7f80, 0x1ded09900, 0x1df35b600, 
  0x1de57f540, 0x1df9e4e40, 0x1e0747d80, 0x1e024df00, 0x1ddf2d840, 
  0x1df95ad80, 0x1dedf47c0, 0x1de1ebdc0, 0x1e00e9ec0, 0x1e02febc0, 
  0x1dae22840, 0x1e051d3c0, 0x1df46f780, 0x1e0353800, 0x1e0ceb480, 
  0x1dfe9fd40, 0x1db58d440, 0x1e0526ec0, 0x1d61ebe40, 0x1dfe85300, 
  0x1df3b4fc0, 0x1ddbc0cc0, 0x1e04823c0, 0x1df724200, 0x1df9cf180, 
  0x1dfeb0c80, 0x1df4fe5c0, 0x1dff0f3c0, 0x1e051fa80, 0x1dd81c600, 
  0x1ddcef880, 0x1de30e7c0, 0x1ded803c0, 0x1de51e740, 0x1deffac40, 
  0x1df533a40, 0x1dd399240, 0x1deccf700, 0x1dfefbdc0, 0x1de9ab600, 
  0x1e0502980, 0x1dfb52980, 0x1dedabd40, 0x1e07e5440, 0x1dea91740, 
  0x1dd749ac0, 0x1e0d1e240, 0x1df86b140, 0x1df9013c0, 0x1dfc31680, 
  0x1dfa15540, 0x1e03694c0, 0x1e06dfb40, 0x1dfdf8b80, 0x1ddd8cf40, 
  0x1e03086c0, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 
  0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x1c0da2380, 0x0, 0x0, 0x0, 0x0, 
  0x7ffff7ff487c}
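To check a captured elts[] for this pattern without reading the dump by eye, a small debug helper can report the first NULL entry and count adjacent repeated pointers. scan_elts() and struct elts_report are hypothetical names, not part of DPDK or the mlx5 PMD; the sketch works on a plain array of pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical debug summary of an array of mbuf pointers. */
struct elts_report {
	size_t first_null;  /* index of the first NULL entry, or n if none */
	size_t null_count;  /* number of NULL entries */
	size_t dup_count;   /* non-NULL entries equal to the previous entry */
};

static struct elts_report scan_elts(void *const *elts, size_t n)
{
	struct elts_report r = { .first_null = n, .null_count = 0, .dup_count = 0 };

	assert(elts != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (elts[i] == NULL) {
			if (r.first_null == n)
				r.first_null = i;  /* remember only the first NULL */
			r.null_count++;
		} else if (i > 0 && elts[i] == elts[i - 1]) {
			r.dup_count++;             /* same pointer as its neighbour */
		}
	}
	return r;
}
```

Applied to the 124 entries above, it would report index 119 (pos + 3) as the first NULL and four NULL entries, and dup_count would include the seven adjacent repeats from the 0x1c0da2380 run at indices 111-118.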

FYI:
The same DPDK application works fine with DPDK 18.11.2
on a Mellanox Technologies MT27800 Family [ConnectX-5],
even under high load, with the software stack and firmware below:
MLNX_OFED_LINUX-4.5-1.0.1.0-ubuntu18.04-x86_64.tgz
on Ubuntu 18.04.2 LTS, kernel 4.15.0-54-generic.

# mstflint -d 3b:00.0 q
Image type:            FS4
FW Version:            16.24.1000
FW Release Date:       26.11.2018
Product Version:       16.24.1000
Rom Info:              type=UEFI version=14.17.11 cpu=AMD64
                       type=PXE version=3.5.603 cpu=AMD64
Description:           UID                GuidsNumber
Base GUID:             506b4b0300086c56        8
Base MAC:              506b4b086c56            8
Image VSD:             N/A
Device VSD:            N/A
PSID:                  MT_0000000008
Security Attributes:   N/A

-- 
You are receiving this mail because:
You are the assignee for the bug.
