[DPDK/ethdev Bug 1392] mlx5: random superfluous setting mbuf:hash.fdir.hi

All of lore.kernel.org
 help / color / mirror / Atom feed

From: bugzilla@dpdk.org
To: dev@dpdk.org
Subject: [DPDK/ethdev Bug 1392] mlx5: random superfluous setting mbuf:hash.fdir.hi
Date: Fri, 01 Mar 2024 13:40:14 +0000	[thread overview]
Message-ID: <bug-1392-3@http.bugs.dpdk.org/> (raw)

[-- Attachment #1: Type: text/plain, Size: 6683 bytes --]

https://bugs.dpdk.org/show_bug.cgi?id=1392

            Bug ID: 1392
           Summary: mlx5: random superfluous setting mbuf:hash.fdir.hi
           Product: DPDK
           Version: unspecified
          Hardware: x86
                OS: All
            Status: UNCONFIRMED
          Severity: minor
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: konstantin.v.ananyev@yandex.ru
  Target Milestone: ---

Reproducible on latest version of DPDK main branch (24.03 rc1).

While mocking with DPDK+mlx5 device(MT27800 Family [ConnectX-5] 1017)
noticed that mlx5_rx_burst_vec() on x86 can sometimes store
junk values inside  mbuf:hash.fdir.hi, even though rte_flow 'MARK'
action was not requested and RTE_MBUF_F_RX_FDIR is not set in the
mbuf:ol_flags.
It can be easily reproduced with testpmd too,
to catch it I added the following code into it:

--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -872,6 +872,10 @@ common_fwd_stream_receive(struct fwd_stream *fs, struct
rte_mbuf **burst,
        if (record_burst_stats)
                fs->rx_burst_stats.pkt_burst_spread[nb_rx]++;
        fs->rx_packets += nb_rx;
+
+       for (uint16_t i = 0; i != nb_rx; i++)
+               RTE_VERIFY(burst[i]->hash.fdir.hi == 0 ||
+                       (burst[i]->ol_flags & RTE_MBUF_F_RX_FDIR) != 0);
        return nb_rx;
 }

Then:
./dpdk-testpmd --lcores='49,51' -n 6 -a ca:00.0 -a ca:00.1 -a cb:00.0 -a
cb:00.1 --  --rxd=1024 --txd=1024 --rss-ip --rxq=2 --txq=2
...
EAL: PANIC in common_fwd_stream_receive():
line 877        assert "burst[i]->hash.fdir.hi == 0 || (burst[i]->ol_flags &
RTE_MBUF_F_RX_FDIR) != 0" failed
0: /home/kananyev/dpdk.org/x84_64-default-linuxapp-gcc10-dbg/app/dpdk-testpmd
(rte_dump_stack+0x1f) [1781b0d]
...
(gdb) print/x burst[i]->hash.fdir
$5 = {{{hash = 0xc176, id = 0xd46c}, lo = 0xd46cc176}, hi = 0x76c16c}
(gdb) print/x burst[i]->ol_flags
$6 = 0x182 // RTE_MBUF_F_RX_RSS_HASH |  RTE_MBUF_F_RX_IP_CKSUM_GOOD |
RTE_MBUF_F_RX_L4_CKSUM_GOOD

A bit of analysis (to put it upfront - I am not that familiar with mlx5 PMD,
so would need mlx5 maintainer to have a proper look):

first, hash.fdir.hi are being updated here:
#0  _mm_storeu_si128 (__B=..., __P=0x17eb98aa4)
    at /usr/lib64/gcc/x86_64-suse-linux/10/include/emmintrin.h:728
#1  rxq_cq_process_v (rxq=0x1727b1f80, cq=0x11f3c0a800, elts=0x1727b3240,
    pkts=0x7ffff2f23ea0, pkts_n=32, err=0x7ffff2f23d98, comp=0x7ffff2f21038)
    at ../drivers/net/mlx5/mlx5_rxtx_vec_sse.h:697

It just copies cq[..].sop_drop_qpn value here:
(gdb) print/x cq[pos + p3].sop_drop_qpn
$30 = 0x2b1fc59d

What is interesting, it copies it unconditionally:
const __m128i flow_mark_adj = _mm_set_epi32(rxq->mark * (-1), 0, 0, 0);
...
pkt_mb3 = _mm_shuffle_epi8(cqes[3], shuf_mask);
...
pkt_mb3 = _mm_add_epi32(pkt_mb3, flow_mark_adj);
...
_mm_storeu_si128((void *)&pkts[pos + 3]->pkt_len, pkt_mb3);

While scalar version of RX updates hash.fdir.hi conditionally, i.e. only when
rxq->mark is set.
Then later vector version realizes that corresponding cqe[] entry was a
compressed one, so
it continues with   rxq_cq_decompress_v() for the same mbuf.
But rxq_cq_decompress_v() updates hash.fdir.hi (and corresponding ol_flags)
*only* when rxq->mark is set.
Otherwise, it doesn't touch hash.fdir.hi at all.
So, as rxq->mark==0,  we ended up with 'mbuf.hash.fdir.hi' set to some junk
value,
while (ol_flags & RTE_MBUF_F_RX_FDIR) == 0.  

Note that by itself, it is probably not a big issue, as that happens when
RTE_MBUF_F_RX_FDIR is not set.
So majority of well behaving app will run just fine, and this small problem
will be un-noticed.
Though event (and probably sched) framework silently and un-conditionally
re-uses mbuf::hash.fdir.hi for its own purposes:
struct {
  uint32_t reserved1;
  uint16_t reserved2;
  uint16_t txq;
  /**< The event eth Tx adapter uses this field
   * to store Tx queue id.
   * @see rte_event_eth_tx_adapter_txq_set()
   */
} txadapter; /**< Eventdev ethdev Tx adapter */

So some of libevent functions expects hash.txadapter.txq to be set into
particular 
value, and such superfluous update of mbuf:hash.fdir.hi can cause a crash.
That's how I hit that issue first: https://bugs.dpdk.org/show_bug.cgi?id=1391

Plus it still looks like sort of inconsistency and undefined bheaviour.
So I decided to report it.
Possible fix (again, I am not sure that it the best way, but at least it
works):
inside rxq_cq_process_v(),  just zero-out hash.fdir.hi when rxq->mark is not
set:    

diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
index 2bdd1f676d..8ab5dcc19e 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h
@@ -600,6 +600,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct
mlx5_cqe *cq,
                              0,
                              rxq->crc_present * RTE_ETHER_CRC_LEN);
        const __m128i flow_mark_adj = _mm_set_epi32(rxq->mark * (-1), 0, 0, 0);
+       const __m128i flow_mark_msk =
+               _mm_set_epi32(rxq->mark * (-1), -1, -1, -1);
        /*
         * A. load first Qword (8bytes) in one loop.
         * B. copy 4 mbuf pointers from elts ring to returning pkts.
@@ -693,6 +695,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct
mlx5_cqe *cq,
                /* C.4 adjust flow mark. */
                pkt_mb3 = _mm_add_epi32(pkt_mb3, flow_mark_adj);
                pkt_mb2 = _mm_add_epi32(pkt_mb2, flow_mark_adj);
+               pkt_mb3 = _mm_and_si128(pkt_mb3, flow_mark_msk);
+               pkt_mb2 = _mm_and_si128(pkt_mb2, flow_mark_msk);
                /* D.1 fill in mbuf - rx_descriptor_fields1. */
                _mm_storeu_si128((void *)&pkts[pos + 3]->pkt_len, pkt_mb3);
                _mm_storeu_si128((void *)&pkts[pos + 2]->pkt_len, pkt_mb2);
@@ -720,6 +724,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct
mlx5_cqe *cq,
                /* C.4 adjust flow mark. */
                pkt_mb1 = _mm_add_epi32(pkt_mb1, flow_mark_adj);
                pkt_mb0 = _mm_add_epi32(pkt_mb0, flow_mark_adj);
+               pkt_mb1 = _mm_and_si128(pkt_mb1, flow_mark_msk);
+               pkt_mb0 = _mm_and_si128(pkt_mb0, flow_mark_msk);
                /* E.1 extract op_own byte. */
                op_own_tmp1 = _mm_unpacklo_epi32(cqes[0], cqes[1]);
                op_own = _mm_unpackhi_epi64(op_own_tmp1, op_own_tmp2);

-- 
You are receiving this mail because:
You are the assignee for the bug.

[-- Attachment #2: Type: text/html, Size: 8911 bytes --]

                 reply	other threads:[~2024-03-01 13:40 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:2bdd1f676 dfblob:8ab5dcc19 )
 OR (
bs:"[DPDK/ethdev Bug 1392] mlx5: random superfluous setting mbuf:hash.fdir.hi" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-1392-3@http.bugs.dpdk.org/ \
    --to=bugzilla@dpdk.org \
    --cc=dev@dpdk.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.