From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [Bug 97] rte_memcpy() moves data incorrectly on Ubuntu 18.04 on Intel Skylake
Date: Tue, 23 Oct 2018 17:48:09 +0000
Message-ID: <bug-97-3@http.bugs.dpdk.org/>

https://bugs.dpdk.org/show_bug.cgi?id=97

            Bug ID: 97
           Summary: rte_memcpy() moves data incorrectly on Ubuntu 18.04 on
                    Intel Skylake
           Product: DPDK
           Version: 18.08
          Hardware: x86
                OS: Linux
            Status: CONFIRMED
          Severity: critical
          Priority: Normal
         Component: core
          Assignee: dev@dpdk.org
          Reporter: yskoh@mellanox.com
  Target Milestone: ---

Reported by:
        https://mails.dpdk.org/archives/dev/2018-September/111522.html

We've recently encountered a strange issue with Ubuntu 18.04 on a Skylake
server. I can always reproduce the crash and I have narrowed it down. I suspect
it could be a GCC issue.


[1] How to reproduce
- ConnectX-4Lx/ConnectX-5 with mlx5 PMD in DPDK 18.02/18.05/18.08
- Ubuntu 18.04 on Intel Skylake server
- gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
- Testpmd crashes when it starts to forward traffic. Easy to reproduce.
- Only happens on the Skylake server.


[2] Failure point

The attached patch gives insight into why it crashes. The following is the
output produced by the patch and by the GDB commands it prints.

In summary, rte_memcpy() doesn't work as expected. In __mempool_generic_put(),
rte_memcpy() is called to move the array of objects to the lcore cache. If I
run memcmp() right after rte_memcpy(dst, src, n), the data in dst differs from
the data in src. It looks like some of the data got shifted by a few bytes, as
you can see below.

        [GDB command]
        $dst = 0x7ffff4e09ea8
        $src = 0x7fffce3fb970
        $n = 256
        x/32gx 0x7ffff4e09ea8
        x/32gx 0x7fffce3fb970
        testpmd: /home/mlnxtest/dpdk/build/include/rte_mempool.h:1140: __mempool_generic_put: Assertion `0' failed.

        Thread 4 "lcore-slave-1" received signal SIGABRT, Aborted.
        [Switching to Thread 0x7fffce3ff700 (LWP 69913)]
        (gdb) x/32gx 0x7ffff4e09ea8
        0x7ffff4e09ea8: 0x00007fffaac38ec0      0x00007fffaac38500
        0x7ffff4e09eb8: 0x00007fffaac37b40      0x00007fffaac37180
        0x7ffff4e09ec8: 0x850000007fffaac3      0x7b4000007fffaac3
        0x7ffff4e09ed8: 0x00007fffaac35440      0x00007fffaac34a80
        0x7ffff4e09ee8: 0xaac3850000007fff      0xaac37b4000007fff
        0x7ffff4e09ef8: 0x00007fffaac32d40      0x00007fffaac32380
        0x7ffff4e09f08: 0x7fffaac385000000      0x7fffaac37b400000
        0x7ffff4e09f18: 0x00007fffaac30640      0x00007fffaac2fc80
        0x7ffff4e09f28: 0x00007fffaac2f2c0      0x00007fffaac2e900
        0x7ffff4e09f38: 0x00007fffaac2df40      0x00007fffaac2d580
        0x7ffff4e09f48: 0x00007fffaac2cbc0      0x00007fffaac2c200
        0x7ffff4e09f58: 0x00007fffaac2b840      0x00007fffaac2ae80
        0x7ffff4e09f68: 0x00007fffaac2a4c0      0x00007fffaac29b00
        0x7ffff4e09f78: 0x00007fffaac29140      0x00007fffaac28780
        0x7ffff4e09f88: 0x00007fffaac27dc0      0x00007fffaac27400
        0x7ffff4e09f98: 0x00007fffaac26a40      0x00007fffaac26080
        (gdb) x/32gx 0x7fffce3fb970
        0x7fffce3fb970: 0x00007fffaac38ec0      0x00007fffaac38500
        0x7fffce3fb980: 0x00007fffaac37b40      0x00007fffaac37180
        0x7fffce3fb990: 0x00007fffaac367c0      0x00007fffaac35e00
        0x7fffce3fb9a0: 0x00007fffaac35440      0x00007fffaac34a80
        0x7fffce3fb9b0: 0x00007fffaac340c0      0x00007fffaac33700
        0x7fffce3fb9c0: 0x00007fffaac32d40      0x00007fffaac32380
        0x7fffce3fb9d0: 0x00007fffaac319c0      0x00007fffaac31000
        0x7fffce3fb9e0: 0x00007fffaac30640      0x00007fffaac2fc80
        0x7fffce3fb9f0: 0x00007fffaac2f2c0      0x00007fffaac2e900
        0x7fffce3fba00: 0x00007fffaac2df40      0x00007fffaac2d580
        0x7fffce3fba10: 0x00007fffaac2cbc0      0x00007fffaac2c200
        0x7fffce3fba20: 0x00007fffaac2b840      0x00007fffaac2ae80
        0x7fffce3fba30: 0x00007fffaac2a4c0      0x00007fffaac29b00
        0x7fffce3fba40: 0x00007fffaac29140      0x00007fffaac28780
        0x7fffce3fba50: 0x00007fffaac27dc0      0x00007fffaac27400
        0x7fffce3fba60: 0x00007fffaac26a40      0x00007fffaac26080
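
To make the differences between the two dumps easier to spot, here is a
minimal sketch that compares the first 16 quadwords of each dump (the last 16
are identical in both) and checks whether a corrupted quadword is a
byte-rotated copy of some source quadword. The helper names (ror64,
count_mismatches, find_rotated_source) are made up for illustration and are
not part of DPDK; the data is copied from the GDB output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 16 quadwords at $dst, copied from the GDB dump above. */
static const uint64_t dump_dst[16] = {
    0x00007fffaac38ec0ULL, 0x00007fffaac38500ULL,
    0x00007fffaac37b40ULL, 0x00007fffaac37180ULL,
    0x850000007fffaac3ULL, 0x7b4000007fffaac3ULL,
    0x00007fffaac35440ULL, 0x00007fffaac34a80ULL,
    0xaac3850000007fffULL, 0xaac37b4000007fffULL,
    0x00007fffaac32d40ULL, 0x00007fffaac32380ULL,
    0x7fffaac385000000ULL, 0x7fffaac37b400000ULL,
    0x00007fffaac30640ULL, 0x00007fffaac2fc80ULL,
};

/* First 16 quadwords at $src, copied from the GDB dump above. */
static const uint64_t dump_src[16] = {
    0x00007fffaac38ec0ULL, 0x00007fffaac38500ULL,
    0x00007fffaac37b40ULL, 0x00007fffaac37180ULL,
    0x00007fffaac367c0ULL, 0x00007fffaac35e00ULL,
    0x00007fffaac35440ULL, 0x00007fffaac34a80ULL,
    0x00007fffaac340c0ULL, 0x00007fffaac33700ULL,
    0x00007fffaac32d40ULL, 0x00007fffaac32380ULL,
    0x00007fffaac319c0ULL, 0x00007fffaac31000ULL,
    0x00007fffaac30640ULL, 0x00007fffaac2fc80ULL,
};

/* Rotate a 64-bit value right by `bits`, where 0 < bits < 64. */
static uint64_t ror64(uint64_t v, unsigned bits)
{
    assert(bits > 0 && bits < 64);
    return (v >> bits) | (v << (64 - bits));
}

/* Count differing quadwords; *first gets the first differing index (n if none). */
static size_t count_mismatches(const uint64_t *dst, const uint64_t *src,
                               size_t n, size_t *first)
{
    size_t count = 0;

    *first = n;
    for (size_t i = 0; i < n; i++) {
        if (dst[i] != src[i]) {
            if (count == 0)
                *first = i;
            count++;
        }
    }
    return count;
}

/*
 * If `value` equals some src[j] rotated right by 1..7 whole bytes, return j
 * and store the rotation (in bytes) in *bytes. Return -1 if nothing matches.
 */
static int find_rotated_source(uint64_t value, const uint64_t *src,
                               size_t n, unsigned *bytes)
{
    for (size_t j = 0; j < n; j++) {
        for (unsigned b = 1; b < 8; b++) {
            if (ror64(src[j], 8 * b) == value) {
                *bytes = b;
                return (int)j;
            }
        }
    }
    return -1;
}
```

For this dump, count_mismatches() flags quadwords 4, 5, 8, 9, 12 and 13, and
find_rotated_source() shows that each of them equals src[1] or src[2] rotated
by 2, 4 or 6 bytes. That is only an observation about this one dump, not a
diagnosis of the root cause.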


AFAIK, AVX512F support is disabled by default in DPDK as it is still
experimental (CONFIG_RTE_ENABLE_AVX512=n). But with gcc optimization, the AVX2
version of rte_memcpy() seems to be compiled into 512-bit instructions. If I
disable that by adding EXTRA_CFLAGS="-mno-avx512f", then it works fine and
doesn't crash.
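
One quick way to check which instruction set the compiler is allowed to use
for a given set of flags is to test GCC's predefined macros. The sketch below
relies only on __AVX512F__ and __AVX2__, which GCC defines when the
corresponding extensions are enabled (for example via -march=native on a
Skylake server, and not when -mno-avx512f is added). The function name
compiled_simd_level is hypothetical, and this says nothing about which
rte_memcpy() variant DPDK itself selects.

```c
#include <stdio.h>

/*
 * Report the widest x86 SIMD extension (among those checked here) that the
 * compiler was allowed to use when building this translation unit.
 */
static const char *compiled_simd_level(void)
{
#if defined(__AVX512F__)
    return "AVX512F";
#elif defined(__AVX2__)
    return "AVX2";
#else
    return "baseline";
#endif
}

/* Print the result so it can be compared across different CFLAGS. */
static void print_simd_level(void)
{
    printf("compiled for: %s\n", compiled_simd_level());
}
```

Building this once with the default flags and once with -mno-avx512f should
show whether the compiler is free to emit 512-bit instructions in the first
case.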

Do you have any idea regarding this issue or are you already aware of it?


Thanks,
Yongseok


$ git diff
diff --git a/config/common_base b/config/common_base
index ad03cf433..f512b5a88 100644
--- a/config/common_base
+++ b/config/common_base
@@ -275,8 +275,8 @@ CONFIG_RTE_LIBRTE_MLX4_TX_MP_CACHE=8
 #
 # Compile burst-oriented Mellanox ConnectX-4 & ConnectX-5 (MLX5) PMD
 #
-CONFIG_RTE_LIBRTE_MLX5_PMD=n
-CONFIG_RTE_LIBRTE_MLX5_DEBUG=n
+CONFIG_RTE_LIBRTE_MLX5_PMD=y
+CONFIG_RTE_LIBRTE_MLX5_DEBUG=y
 CONFIG_RTE_LIBRTE_MLX5_DLOPEN_DEPS=n
 CONFIG_RTE_LIBRTE_MLX5_TX_MP_CACHE=8

@@ -597,7 +597,7 @@ CONFIG_RTE_RING_USE_C11_MEM_MODEL=n
 #
 CONFIG_RTE_LIBRTE_MEMPOOL=y
 CONFIG_RTE_MEMPOOL_CACHE_MAX_SIZE=512
-CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
+CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=y

 #
 # Compile Mempool drivers
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 8b1b7f7ed..9f48028d9 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -39,6 +39,7 @@
 #include <errno.h>
 #include <inttypes.h>
 #include <sys/queue.h>
+#include <assert.h>

 #include <rte_config.h>
 #include <rte_spinlock.h>
@@ -1123,6 +1124,22 @@ __mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
        /* Add elements back into the cache */
        rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);

+       if(memcmp(&cache_objs[0], obj_table, sizeof(void *) * n)) {
+               printf("[GDB command] \n"
+                      "$dst = %p\n"
+                      "$src = %p\n"
+                      "$n = %zu\n"
+                      "x/%zugx %p\n"
+                      "x/%zugx %p\n",
+                      (void *)&cache_objs[0],
+                      (const void *)obj_table,
+                      sizeof(void *) * n,
+                      sizeof(void *) * n / 8, (void *)&cache_objs[0],
+                      sizeof(void *) * n / 8, (const void *)obj_table
+                      );
+               assert(0);
+       }
+
        cache->len += n;

        if (cache->len >= cache->flushthresh) {

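The check added by the patch can also be written as a standalone helper that
reports where the first difference is, which makes it easier to test a copy
routine outside of testpmd. This is a minimal sketch; first_mismatch,
copy_fn_t and copy_and_verify are hypothetical names, not DPDK APIs. Note that
calling an inline routine such as rte_memcpy() through a function pointer may
change how the compiler optimizes it, so a failure seen inline might not
reproduce this way.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first byte where dst and src differ, or n if equal. */
static size_t first_mismatch(const void *dst, const void *src, size_t n)
{
    const unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++) {
        if (d[i] != s[i])
            return i;
    }
    return n;
}

/* Signature shared by memcpy() and memcpy-like routines. */
typedef void *(*copy_fn_t)(void *dst, const void *src, size_t n);

/*
 * Copy n bytes with `copy`, then verify the result byte by byte.
 * Returns n on success, otherwise the offset of the first bad byte.
 */
static size_t copy_and_verify(copy_fn_t copy, void *dst, const void *src,
                              size_t n)
{
    copy(dst, src, n);
    return first_mismatch(dst, src, n);
}
```

A return value smaller than n gives the byte offset to inspect in GDB, instead
of only a yes/no answer from memcmp().
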