From: Stephen Hemminger <stephen@networkplumber.org>
To: Thomas Monjalon <thomas@monjalon.net>
Cc: dev@dpdk.org, Gregory Etelson <getelson@nvidia.com>,
	Dariusz Sosnowski <dsosnowski@nvidia.com>,
	Viacheslav Ovsiienko <viacheslavo@nvidia.com>,
	Bing Zhao <bingz@nvidia.com>, Ori Kam <orika@nvidia.com>,
	Suanming Mou <suanmingm@nvidia.com>,
	Matan Azrad <matan@nvidia.com>
Subject: Re: [PATCH v4 06/10] net/mlx5: support selective Rx
Date: Tue, 2 Jun 2026 06:53:32 -0700
Message-ID: <20260602065332.1d9e82fb@phoenix.local>
In-Reply-To: <20260529133522.2646044-7-thomas@monjalon.net>

On Fri, 29 May 2026 15:34:00 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:
> From: Gregory Etelson <getelson@nvidia.com>
>
> Selective Rx may save some PCI bandwidth.
> Implement selective Rx in the (quite slow) scalar SPRQ Rx path
> mlx5_rx_burst() where the performance impact
> of the added condition branches is acceptable.
> Other Rx functions do not support this feature.
> When using selective Rx, mlx5_rx_burst will be selected.
>
> A null Memory Region (MR) is always allocated
> at shared device context initialization.
> The selective Rx capability is not advertised
> if this special MR allocation fails.
>
> For each Rx segment configured with a NULL mempool,
> a "null mbuf" is created.
> It is a fake mbuf allocated outside any mempool,
> used as a placeholder in the Rx ring.
> The null MR lkey is used in the WQE for these segments
> so the NIC writes received data to a discard buffer.
> The mbuf data room size is resolved from the first segment having a pool.
> For null segments, the buffer length is from the last seen pool,
> so that the WQE stride size remains consistent.
>
> In mlx5_rx_burst, discarded segments are not chained
> into the packet mbuf list, NB_SEGS is decremented accordingly,
> and no replacement buffer is allocated.
> A separate data_seg_len accumulator tracks the total length
> of delivered segments only.
> The packet length is adjusted to reflect only the data
> actually delivered to the application.
>
> Signed-off-by: Gregory Etelson <getelson@nvidia.com>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
> ---

AI review with Opus 4.8 and High setting found one issue:

Patch 6: net/mlx5: support selective Rx

Error: NULL pointer dereference when the first configured Rx segment is a
discard segment (mp == NULL).

In mlx5_rx_burst() the head mbuf and the chain tail are tracked like this:

	if (pkt) {
		if (rep->pool)
			NEXT(tail) = rep;
		else
			--NB_SEGS(pkt);
	}
	...
	if (seg->pool) {
		tail = seg;
		...
	}

tail is only ever assigned inside "if (seg->pool)", and pkt is set to the
first processed segment unconditionally (pkt = seg in the !pkt block, no
pool guard). So if the first segment of a packet is a discard segment:

  - pkt becomes the null_mbuf (pool == NULL), tail stays NULL;
  - on the next (real) segment, rep->pool is set, so NEXT(tail) = rep
    executes with tail == NULL -> write through NULL.

Even without the crash, returning the pool-less null_mbuf as the packet
head is wrong: the application later frees it back to a NULL pool.

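The head/tail reasoning above can be checked with a small standalone model.
This is only a sketch under stated assumptions: struct seg is a hypothetical
stand-in for struct rte_mbuf, find_null_tail_write() is an illustrative name
that does not exist in the driver, and the replacement-buffer handling of the
real loop is left out. Instead of dereferencing a NULL tail, the model reports
the segment index at which the driver would do so.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_mbuf: only what the argument needs. */
struct seg {
	void *pool;       /* NULL marks a discard ("null mbuf") segment */
	struct seg *next; /* models NEXT() */
};

/*
 * Walk the segments of one packet with the head/tail rules quoted above,
 * but report the hazard instead of dereferencing a NULL tail.
 * Returns the index of the first segment whose linking would execute
 * NEXT(tail) with tail == NULL, or -1 if the walk never does so.
 */
static int
find_null_tail_write(struct seg *segs, int n)
{
	struct seg *pkt = NULL;
	struct seg *tail = NULL;
	int i;

	assert(segs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct seg *seg = &segs[i];

		/* Linking step: only real segments are chained. */
		if (pkt != NULL && seg->pool != NULL) {
			if (tail == NULL)
				return i; /* the driver would write through NULL here */
			tail->next = seg;
		}
		/* Head selection has no pool guard, as in the patch. */
		if (pkt == NULL)
			pkt = seg;
		/* The tail only ever follows segments that have a pool. */
		if (seg->pool != NULL)
			tail = seg;
	}
	return -1;
}
```

For a discard, real, discard layout the function returns 1 (the real segment
is linked through a NULL tail); for a real, discard layout it returns -1.
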
This is reachable, not theoretical. testpmd (patch 3) inserts a leading
mp==NULL segment whenever the first offset is > 0 (seg_offset > next_offset
with next_offset starting at 0), ethdev check_split (patch 2) now permits a
leading NULL mp, and mlx5_rxq_new() accepts it (first_mp is just the first
non-NULL pool; there is no requirement that rxseg[0].mp != NULL). The DTS
cases selective_rx_payload_only (rxoffs=[34]) and selective_rx_two_segments
(rxoffs=[14,...]) in patch 10 configure exactly this layout, and
mlx5_selective_rx_enabled() forces the scalar mlx5_rx_burst path, so the
buggy path is the one that runs.
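The gap-filling rule described above (a discard segment is inserted whenever
seg_offset > next_offset, with next_offset starting at 0) can be modelled as
follows. This is a sketch, not testpmd code: struct layout_seg and
build_layout() are hypothetical names, overlapping requests are simply
rejected, and the trailing discard segment up to the maximum packet size is
not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one configured Rx segment. */
struct layout_seg {
	unsigned int offset;
	unsigned int length;
	int has_pool; /* 0 models mp == NULL (discard segment) */
};

/*
 * Build the segment layout for n requested (offset, length) pairs,
 * inserting a discard segment in front of any request that starts
 * after the end of the previous one.
 * Returns the number of layout entries written, or -1 on overlap
 * or if out[] is too small.
 */
static int
build_layout(const unsigned int *offs, const unsigned int *lens, int n,
	     struct layout_seg *out, int max_out)
{
	unsigned int next_offset = 0;
	int count = 0;
	int i;

	assert(offs != NULL && lens != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		if (offs[i] < next_offset)
			return -1; /* overlapping request, not modelled */
		if (offs[i] > next_offset) {
			/* Gap before the requested data: discard segment. */
			if (count == max_out)
				return -1;
			out[count].offset = next_offset;
			out[count].length = offs[i] - next_offset;
			out[count].has_pool = 0;
			count++;
		}
		if (count == max_out)
			return -1;
		out[count].offset = offs[i];
		out[count].length = lens[i];
		out[count].has_pool = 1;
		count++;
		next_offset = offs[i] + lens[i];
	}
	return count;
}
```

With a single request at offset 34 and length 256, the first layout entry is
a 34-byte discard segment, so the queue starts with mp == NULL. A request at
offset 0 produces no leading discard segment.
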
Trace for rxoffs=34 / rxpkts=payload (segments: discard[0,34) real[34,290)
discard[290,max)):

  iter0 (discard head): pkt == NULL, seg->pool == NULL -> pkt = null_mbuf,
         tail not set; len(290) > DATA_LEN(34) -> ++NB_SEGS, continue.
  iter1 (real seg): pkt set, rep->pool != NULL -> NEXT(tail == NULL) = rep.

Suggested fix: a discard segment must not become the packet head/tail.
Either reject rxseg[0].mp == NULL in mlx5_rxq_new() (cleanest, matches the
"deliver last N bytes" case being unsupported here), or make the data path
skip leading discard segments without assigning them to pkt and only set
pkt/tail on the first segment with a pool. If leading discard is intended
to be supported, the head selection and NEXT(tail) linking both need to
account for tail == NULL.
The same head/tail assumption also means a packet that falls entirely
within a leading discard segment would be returned with a NULL-pool head;
fixing the above covers that too.
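Both suggested directions can be sketched in a standalone model. The names
below (struct seg, check_first_seg_has_pool(), chain_real_segments()) are
hypothetical and stand in for the driver's structures; this is not the
mlx5 code and not a proposed patch. The model counts only delivered
segments, rather than incrementing and then decrementing NB_SEGS as the
patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rte_mbuf. */
struct seg {
	void *pool;           /* NULL marks a discard segment */
	struct seg *next;     /* models NEXT() */
	unsigned int nb_segs; /* models NB_SEGS(), meaningful on the head */
};

/* Option 1: reject a leading discard segment at queue setup time. */
static int
check_first_seg_has_pool(void *const *pools, int n)
{
	if (n <= 0 || pools == NULL || pools[0] == NULL)
		return -EINVAL;
	return 0;
}

/*
 * Option 2: in the data path, never let a discard segment become the
 * head or the tail. Returns the packet head, or NULL when every
 * segment of the packet was discarded.
 */
static struct seg *
chain_real_segments(struct seg *segs, int n)
{
	struct seg *pkt = NULL;
	struct seg *tail = NULL;
	int i;

	for (i = 0; i < n; i++) {
		struct seg *seg = &segs[i];

		if (seg->pool == NULL)
			continue; /* discard: not head, not linked */
		seg->next = NULL;
		if (pkt == NULL) {
			pkt = seg; /* first segment with a pool is the head */
			pkt->nb_segs = 1;
		} else {
			tail->next = seg; /* tail is non-NULL once pkt is set */
			pkt->nb_segs++;
		}
		tail = seg;
	}
	return pkt;
}
```

In this model a discard, real, discard layout yields the real segment as a
single-segment head, and a packet that lies entirely within discard segments
yields NULL, which the caller would have to treat as "nothing to deliver".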