* [PATCH] net/af_packet: fix parsing of numeric device args
From: Stephen Hemminger @ 2026-06-03 18:13 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, Denis Sergeev, stable, John W. Linville
In-Reply-To: <20260603170812.212262-1-denserg.edu@gmail.com>
This driver has several numeric arguments but it was using
atoi() which allows garbage and negative values.
Convert to a helper using strtoul() with upper bound.
First found by Linux Verification Center (linuxtesting.org) with SVACE.
Reported-by: Denis Sergeev <denserg.edu@gmail.com>
Fixes: 364e08f2bbc0 ("af_packet: add PMD for AF_PACKET-based virtual devices")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
drivers/net/af_packet/rte_eth_af_packet.c | 58 +++++++++++++++++++----
1 file changed, 48 insertions(+), 10 deletions(-)
diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index 8303ff5ca9..b0ff22ea55 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -15,7 +15,9 @@
#include <rte_kvargs.h>
#include <bus_vdev_driver.h>
+#include <ctype.h>
#include <errno.h>
+#include <limits.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <arpa/inet.h>
@@ -1138,6 +1140,42 @@ rte_pmd_init_internals(struct rte_vdev_device *dev,
return -1;
}
+/* Parse an unsigned integer device argument. */
+static int
+parse_uint(const char *key, const char *value,
+ unsigned int *out, unsigned long limit)
+{
+ unsigned long val;
+ char *endptr;
+
+ if (value == NULL) {
+ PMD_LOG(ERR, "no value for argument \"%s\"", key);
+ return -1;
+ }
+
+ /* Skip leading whitespace so a leading sign can be detected. */
+ while (isspace((unsigned char)*value))
+ value++;
+
+ /* strtoul() silently accepts and negates a leading '-'. */
+ if (*value == '\0' || *value == '-') {
+ PMD_LOG(ERR, "invalid value \"%s\" for argument \"%s\"",
+ value, key);
+ return -1;
+ }
+
+ errno = 0;
+ val = strtoul(value, &endptr, 10);
+ if (errno != 0 || *endptr != '\0' || val > limit) {
+ PMD_LOG(ERR, "invalid value \"%s\" for argument \"%s\"",
+ value, key);
+ return -1;
+ }
+
+ *out = (unsigned int)val;
+ return 0;
+}
+
static int
rte_eth_from_packet(struct rte_vdev_device *dev,
int const *sockfd,
@@ -1168,7 +1206,9 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
pair = &kvlist->pairs[k_idx];
if (strstr(pair->key, ETH_AF_PACKET_NUM_Q_ARG) != NULL) {
- qpairs = atoi(pair->value);
+ if (parse_uint(pair->key, pair->value,
+ &qpairs, RTE_MAX_QUEUES_PER_PORT) < 0)
+ return -1;
if (qpairs < 1) {
PMD_LOG(ERR,
"%s: invalid qpairs value",
@@ -1178,7 +1218,8 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
continue;
}
if (strstr(pair->key, ETH_AF_PACKET_BLOCKSIZE_ARG) != NULL) {
- blocksize = atoi(pair->value);
+ if (parse_uint(pair->key, pair->value, &blocksize, UINT_MAX) < 0)
+ return -1;
if (!blocksize) {
PMD_LOG(ERR,
"%s: invalid blocksize value",
@@ -1188,7 +1229,8 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
continue;
}
if (strstr(pair->key, ETH_AF_PACKET_FRAMESIZE_ARG) != NULL) {
- framesize = atoi(pair->value);
+ if (parse_uint(pair->key, pair->value, &framesize, UINT_MAX) < 0)
+ return -1;
if (!framesize) {
PMD_LOG(ERR,
"%s: invalid framesize value",
@@ -1198,7 +1240,8 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
continue;
}
if (strstr(pair->key, ETH_AF_PACKET_FRAMECOUNT_ARG) != NULL) {
- framecount = atoi(pair->value);
+ if (parse_uint(pair->key, pair->value, &framecount, UINT_MAX) < 0)
+ return -1;
if (!framecount) {
PMD_LOG(ERR,
"%s: invalid framecount value",
@@ -1208,13 +1251,8 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
continue;
}
if (strstr(pair->key, ETH_AF_PACKET_QDISC_BYPASS_ARG) != NULL) {
- qdisc_bypass = atoi(pair->value);
- if (qdisc_bypass > 1) {
- PMD_LOG(ERR,
- "%s: invalid bypass value",
- name);
+ if (parse_uint(pair->key, pair->value, &qdisc_bypass, 1) < 0)
return -1;
- }
continue;
}
if (strstr(pair->key, ETH_AF_PACKET_FANOUT_MODE_ARG) != NULL) {
--
2.53.0
^ permalink raw reply related
* [PATCH 4/4] net/bnxt: fix RSS hash mode configuration for VFs
From: Mohammad Shuab Siddique @ 2026-06-03 17:51 UTC (permalink / raw)
To: dev; +Cc: kishore.padmanabha, stable, Mohammad Shuab Siddique
In-Reply-To: <20260603175137.1990204-1-Mohammad-Shuab.Siddique@broadcom.com>
From: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
Fixed VFs attempting global RSS configuration which is not
permitted by firmware. VFs (including trusted VFs) must use
per-VNIC RSS configuration with actual vnic_id and rss_ctx_idx
values.
Cc: stable@dpdk.org
Signed-off-by: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
---
drivers/net/bnxt/bnxt_hwrm.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bnxt/bnxt_hwrm.c b/drivers/net/bnxt/bnxt_hwrm.c
index 0c82935de9..afc948ac29 100644
--- a/drivers/net/bnxt/bnxt_hwrm.c
+++ b/drivers/net/bnxt/bnxt_hwrm.c
@@ -2970,8 +2970,22 @@ bnxt_hwrm_vnic_rss_cfg_hash_mode_p5(struct bnxt *bp, struct bnxt_vnic_info *vnic
req.hash_mode_flags = BNXT_HASH_MODE_INNERMOST;
else
req.hash_mode_flags = vnic->hash_mode;
- req.vnic_id = rte_cpu_to_le_16(BNXT_DFLT_VNIC_ID_INVALID);
- req.rss_ctx_idx = rte_cpu_to_le_16(BNXT_RSS_CTX_IDX_INVALID);
+
+ /* VFs must use actual vnic_id for per-VNIC configuration.
+ * PFs can use INVALID vnic_id for global configuration.
+ * This is because VFs don't have permission to configure
+ * global hash mode, even if they're trusted.
+ */
+ if (BNXT_VF(bp)) {
+ req.vnic_id = rte_cpu_to_le_16(vnic->fw_vnic_id);
+ req.rss_ctx_idx = rte_cpu_to_le_16(vnic->fw_grp_ids[0]);
+ PMD_DRV_LOG_LINE(DEBUG, "VF using per-VNIC RSS config (vnic_id=%u)",
+ vnic->fw_vnic_id);
+ } else {
+ req.vnic_id = rte_cpu_to_le_16(BNXT_DFLT_VNIC_ID_INVALID);
+ req.rss_ctx_idx = rte_cpu_to_le_16(BNXT_RSS_CTX_IDX_INVALID);
+ PMD_DRV_LOG_LINE(DEBUG, "PF using global RSS config");
+ }
PMD_DRV_LOG_LINE(DEBUG, "RSS CFG: Hash level %d", req.hash_mode_flags);
rc = bnxt_hwrm_send_message(bp, &req, sizeof(req),
--
2.47.3
^ permalink raw reply related
* [PATCH 3/4] net/bnxt: remove implicit integer sign-extension
From: Mohammad Shuab Siddique @ 2026-06-03 17:51 UTC (permalink / raw)
To: dev; +Cc: kishore.padmanabha, stable, Zoe Cheimets, Mohammad Shuab Siddique
In-Reply-To: <20260603175137.1990204-1-Mohammad-Shuab.Siddique@broadcom.com>
From: Zoe Cheimets <zoe.cheimets@broadcom.com>
In bnxt_ring.c, the result on line 389 was auto-sign extended by
the compiler because the arithmetic result is an int, but the
dpi_offset is uint64_t. Fix by casting the result to uint64_t
before the multiplication forces extension. To ensure that a
negative integer is not being cast to uint64_t, add a check in
the if-statement.
Fixes: 7a1f9c782b50 ("net/bnxt: add multi-doorbell support")
Cc: stable@dpdk.org
Signed-off-by: Zoe Cheimets <zoe.cheimets@broadcom.com>
Signed-off-by: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
---
drivers/net/bnxt/bnxt_ring.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bnxt/bnxt_ring.c b/drivers/net/bnxt/bnxt_ring.c
index ccca779b97..579b73d2ce 100644
--- a/drivers/net/bnxt/bnxt_ring.c
+++ b/drivers/net/bnxt/bnxt_ring.c
@@ -385,9 +385,10 @@ void bnxt_set_db(struct bnxt *bp,
db->doorbell = (char *)bp->doorbell_base + db_offset;
if (bp->fw_cap & BNXT_FW_CAP_MULTI_DB &&
- dpi != BNXT_PRIVILEGED_DPI) {
- dpi_offset = (dpi - bp->nq_dpi_start) *
- bp->db_page_size;
+ dpi != BNXT_PRIVILEGED_DPI &&
+ dpi >= bp->nq_dpi_start) {
+ dpi_offset = (uint64_t)(dpi - bp->nq_dpi_start) *
+ bp->db_page_size;
db->doorbell = (char *)db->doorbell + dpi_offset;
}
db->db_key64 |= (uint64_t)fid << DBR_XID_SFT;
--
2.47.3
^ permalink raw reply related
* [PATCH 2/4] net/bnxt: fix QP resource count in backing store config
From: Mohammad Shuab Siddique @ 2026-06-03 17:51 UTC (permalink / raw)
To: dev; +Cc: kishore.padmanabha, stable, Ajit Khaparde,
Mohammad Shuab Siddique
In-Reply-To: <20260603175137.1990204-1-Mohammad-Shuab.Siddique@broadcom.com>
From: Ajit Khaparde <ajit.khaparde@broadcom.com>
The driver is not passing the QP1 count while
configuring backing store for QP type.
This can result in a smaller set of entries allocated by the
driver with the firmware.
The number of entries is provided by the firmware as a
part of backing store qcaps v2 HWRM command.
Fixes: 5b5d398434f9 ("net/bnxt: add support for backing store v2")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
---
drivers/net/bnxt/bnxt_ethdev.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index b677f9491d..7d70d0f3ec 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -5404,11 +5404,11 @@ int bnxt_alloc_ctx_pg_tbls(struct bnxt *bp)
if (ctxm->type == HWRM_FUNC_BACKING_STORE_CFG_V2_INPUT_TYPE_CQ)
entries = ctxm->cq_l2_entries;
else if (ctxm->type == HWRM_FUNC_BACKING_STORE_CFG_V2_INPUT_TYPE_QP)
- entries = ctxm->qp_l2_entries;
+ entries = ctxm->qp_l2_entries + ctxm->qp_qp1_entries;
else if (ctxm->type == HWRM_FUNC_BACKING_STORE_CFG_V2_INPUT_TYPE_MRAV)
entries = ctxm->mrav_av_entries;
else if (ctxm->type == HWRM_FUNC_BACKING_STORE_CFG_V2_INPUT_TYPE_TIM)
- entries = ctx2->qp_l2_entries;
+ entries = ctx2->qp_l2_entries + ctx2->qp_qp1_entries;
entries = clamp_t(uint32_t, entries, ctxm->min_entries,
ctxm->max_entries);
ctx_pg[i].entries = entries;
--
2.47.3
^ permalink raw reply related
* [PATCH 1/4] net/bnxt: modify check for short Tx BDs
From: Mohammad Shuab Siddique @ 2026-06-03 17:51 UTC (permalink / raw)
To: dev; +Cc: kishore.padmanabha, stable, Ajit Khaparde,
Mohammad Shuab Siddique
In-Reply-To: <20260603175137.1990204-1-Mohammad-Shuab.Siddique@broadcom.com>
From: Ajit Khaparde <ajit.khaparde@broadcom.com>
There is no need to use the long BDs for transmits
where only checksum offload is needed.
Modify the check for long BD and use long BDs only in cases
where TSO and other offloads are requested.
Fixes: 527b10089cc5 ("net/bnxt: optimize Tx completion handling")
Cc: stable@dpdk.org
Signed-off-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Signed-off-by: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
---
drivers/net/bnxt/bnxt_txr.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/bnxt/bnxt_txr.c b/drivers/net/bnxt/bnxt_txr.c
index 27758898b0..7ef5b15ae8 100644
--- a/drivers/net/bnxt/bnxt_txr.c
+++ b/drivers/net/bnxt/bnxt_txr.c
@@ -111,8 +111,7 @@ int bnxt_init_tx_ring_struct(struct bnxt_tx_queue *txq, unsigned int socket_id)
static bool
bnxt_xmit_need_long_bd(struct rte_mbuf *tx_pkt, struct bnxt_tx_queue *txq)
{
- if (tx_pkt->ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_TCP_CKSUM |
- RTE_MBUF_F_TX_UDP_CKSUM | RTE_MBUF_F_TX_IP_CKSUM |
+ if (tx_pkt->ol_flags & (RTE_MBUF_F_TX_TCP_SEG |
RTE_MBUF_F_TX_VLAN | RTE_MBUF_F_TX_OUTER_IP_CKSUM |
RTE_MBUF_F_TX_TUNNEL_GRE | RTE_MBUF_F_TX_TUNNEL_VXLAN |
RTE_MBUF_F_TX_TUNNEL_GENEVE | RTE_MBUF_F_TX_IEEE1588_TMST |
--
2.47.3
^ permalink raw reply related
* [PATCH 0/4] net/bnxt: miscellaneous bug fixes
From: Mohammad Shuab Siddique @ 2026-06-03 17:51 UTC (permalink / raw)
To: dev; +Cc: kishore.padmanabha, stable, Mohammad Shuab Siddique
From: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
This series collects four independent bug fixes for the bnxt PMD:
- Eliminate unnecessary long TX BDs when only checksum offload is needed
- Pass QP1 resource count correctly when configuring backing store
- Fix implicit integer sign-extension in the doorbell calculation
- Prevent VFs from attempting global RSS configuration
All patches carry Fixes: tags and Cc: stable@dpdk.org.
Ajit Khaparde (2):
net/bnxt: modify check for short Tx BDs
net/bnxt: fix QP resource count in backing store config
Mohammad Shuab Siddique (1):
net/bnxt: fix RSS hash mode configuration for VFs
Zoe Cheimets (1):
net/bnxt: remove implicit integer sign-extension
drivers/net/bnxt/bnxt_ethdev.c | 4 ++--
drivers/net/bnxt/bnxt_hwrm.c | 18 ++++++++++++++++--
drivers/net/bnxt/bnxt_ring.c | 7 ++++---
drivers/net/bnxt/bnxt_txr.c | 3 +--
4 files changed, 23 insertions(+), 9 deletions(-)
--
2.47.3
^ permalink raw reply
* Re: [PATCH v6 00/10] selective Rx
From: Stephen Hemminger @ 2026-06-03 17:31 UTC (permalink / raw)
To: Thomas Monjalon; +Cc: dev
In-Reply-To: <20260602215022.3698662-1-thomas@monjalon.net>
On Tue, 2 Jun 2026 23:49:12 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:
> This is a new feature in ethdev with tests and mlx5 implementation.
> Selective Rx allows to receive partial data,
> saving some hardware bandwidth.
>
> Note 1: mlx5 support patch is not correctly indented
> to make review easier. An indent patch follows to be squashed.
>
> Note 2: DTS patch is an attempt to test the feature on day 1,
> it is not mandatory if it is blocking the merge.
>
> v2: rework after Gregory
> v3: fix bugs found with AI by Stephen
> v4: fix packet type in DTS test
> v5: fix mlx5 Rx to handle discarding first segment
> v6: fix reindent patch
>
>
> Gregory Etelson (4):
> ethdev: introduce selective Rx
> app/testpmd: support selective Rx
> common/mlx5: add null MR functions
> net/mlx5: support selective Rx
>
> Thomas Monjalon (6):
> app/testpmd: print Rx split capabilities
> net/mlx5: fix Rx split segment counter type
> net/mlx5: reindent previous changes
> common/mlx5: remove callbacks for MR registration
> dts: fix topology capability comparison
> dts: add selective Rx tests
>
> app/test-pmd/config.c | 17 ++
> app/test-pmd/testpmd.c | 71 ++++-
> devtools/libabigail.abignore | 7 +
> doc/guides/nics/features.rst | 14 +
> doc/guides/nics/features/default.ini | 1 +
> doc/guides/nics/features/mlx5.ini | 1 +
> doc/guides/nics/mlx5.rst | 86 ++++--
> doc/guides/rel_notes/release_26_07.rst | 11 +
> doc/guides/testpmd_app_ug/run_app.rst | 20 ++
> drivers/common/mlx5/linux/mlx5_common_verbs.c | 53 ++--
> drivers/common/mlx5/mlx5_common.c | 6 +-
> drivers/common/mlx5/mlx5_common_mr.c | 37 ++-
> drivers/common/mlx5/mlx5_common_mr.h | 29 +-
> drivers/common/mlx5/windows/mlx5_common_os.c | 31 ++-
> drivers/compress/mlx5/mlx5_compress.c | 4 +-
> drivers/crypto/mlx5/mlx5_crypto.h | 2 -
> drivers/crypto/mlx5/mlx5_crypto_gcm.c | 6 +-
> drivers/net/mlx5/mlx5.c | 7 +
> drivers/net/mlx5/mlx5.h | 4 +-
> drivers/net/mlx5/mlx5_ethdev.c | 25 ++
> drivers/net/mlx5/mlx5_flow_aso.c | 21 +-
> drivers/net/mlx5/mlx5_flow_hw.c | 11 +-
> drivers/net/mlx5/mlx5_flow_quota.c | 6 +-
> drivers/net/mlx5/mlx5_hws_cnt.c | 19 +-
> drivers/net/mlx5/mlx5_rx.c | 186 ++++++++-----
> drivers/net/mlx5/mlx5_rx.h | 5 +-
> drivers/net/mlx5/mlx5_rxq.c | 75 +++--
> drivers/net/mlx5/mlx5_trigger.c | 64 ++++-
> dts/api/capabilities.py | 2 +
> dts/api/testpmd/__init__.py | 17 ++
> dts/api/testpmd/types.py | 6 +
> dts/framework/testbed_model/capability.py | 10 +-
> dts/tests/TestSuite_rx_split.py | 262 ++++++++++++++++++
> lib/ethdev/rte_ethdev.c | 24 +-
> lib/ethdev/rte_ethdev.h | 14 +-
> 35 files changed, 880 insertions(+), 274 deletions(-)
> create mode 100644 dts/tests/TestSuite_rx_split.py
>
Still has some issues:
Patch 6: net/mlx5: support selective Rx
Error: after a non-critical Rx error CQE, the next packet is processed with
a stale length and stale CQE; the queue effectively wedges.
To avoid re-polling the CQE on each leading discard segment of one packet,
v6 wraps the poll in "if (len == 0)" and resets len to 0 at the points where
a packet finishes. Two of the three exits reset it (normal completion and
the "no real segment found" skip both do "len = 0;"), but the non-critical
error-CQE path does not:
if (unlikely(len & MLX5_ERROR_CQE_MASK)) {
if (seg->pool)
rte_mbuf_raw_free(rep);
if (len == MLX5_CRITICAL_ERROR_CQE_RET) {
rq_ci = rxq->rq_ci << sges_n;
break;
}
rq_ci >>= sges_n;
rq_ci += skip_cnt;
rq_ci <<= sges_n;
MLX5_ASSERT(!pkt);
continue; /* len still == MLX5_ERROR_CQE_MASK (0x40000000) */
}
MLX5_ERROR_CQE_MASK is 0x40000000, so len is non-zero on this continue. The
next iteration hits "if (len == 0)" == false and skips mlx5_rx_poll_len()
entirely. The following real segment then sets pkt with PKT_LEN(pkt) ==
0x40000000 and rxq_cq_to_mbuf() reads the stale cqe/mcqe. Because
0x40000000 > DATA_LEN(seg), the "more data" branch keeps consuming
descriptors as one bogus giant packet, walking the whole ring and emitting
nothing. Pre-v6 this worked because the poll was unconditional in the !pkt
block.
Suggested fix: reset len before the error-skip continue, matching the other
two exits:
rq_ci >>= sges_n;
rq_ci += skip_cnt;
rq_ci <<= sges_n;
MLX5_ASSERT(!pkt);
len = 0;
continue;
Minor: "tail = seg;" is now set in both the "first real segment" block and
the "real segment: replenish WQE" block; the first is redundant since the
second always runs for the same segment. Harmless, but the duplicate can be
dropped.
^ permalink raw reply
* Re: [PATCH] net/ark: validate IPv4 octets in address parser
From: Stephen Hemminger @ 2026-06-03 17:29 UTC (permalink / raw)
To: Denis Sergeev; +Cc: dev, shepard.siegel, ed.czeck, john.miller, sdl.dpdk
In-Reply-To: <20260603054742.120101-1-denserg.edu@gmail.com>
On Wed, 3 Jun 2026 08:47:38 +0300
Denis Sergeev <denserg.edu@gmail.com> wrote:
> The IPv4 parsing helper used by pktgen and pktchkr reads each octet
> with "%u". This allows values above 255 to be accepted from the
> configuration file and encoded into unintended device register values.
>
> Reject parsed octets outside the IPv4 byte range before assembling
> the 32-bit address.
>
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
>
> Fixes: 9c7188a68d7b ("net/ark: provide API for hardware modules pktchkr and pktgen")
>
> Signed-off-by: Denis Sergeev <denserg.edu@gmail.com>
> ---
Have to ask, why a driver is rolling it own IP address parser, that is a bad idea.
But then the whole builtin vendor pktgen in a driver is bad idea.
Claude scan of the driver found lots more issues.
Hate to think what Mythos would find here...
^ permalink raw reply
* [DPDK/ethdev Bug 1950] Lots of problems found from review of ARK PMD
From: bugzilla @ 2026-06-03 17:24 UTC (permalink / raw)
To: dev
http://bugs.dpdk.org/show_bug.cgi?id=1950
Bug ID: 1950
Summary: Lots of problems found from review of ARK PMD
Product: DPDK
Version: 22.03
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: major
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: stephen@networkplumber.org
Target Milestone: ---
In addition to the problems identified by LF scan, using Claude Opus found a
lot more in this driver:
Component: drivers/net/ark
Version: main
Manual review of the full ark driver tree (ark_ethdev.c, ark_ethdev_rx.c,
ark_ethdev_tx.c, ark_pktgen.{c,h}, ark_pktchkr.{c,h}, ark_mpu.c, ark_pktdir.c)
turned up the issues below, in addition to existing scanner reports (strncat
length-arg misuse and non-NUL-terminating strncpy in process_file_args;
unchecked rte_zmalloc_socket before memcpy in ark_dev_init).
== SECURITY (suggest separate tracking item) ==
S1. Unvalidated dlopen() driver overload in check_for_ext() (ark_ethdev.c)
Loads an arbitrary .so from the ARK_EXT_PATH environment variable
(dlopen(path, RTLD_LOCAL | RTLD_LAZY)) and resolves ~12 rte_pmd_ark_*
symbols that are cast to function pointers and called on the control and
data paths. No path validation, allow-list, ownership/permission check, or
integrity check. DPDK apps commonly run privileged, so any process able to
influence the environment gets arbitrary in-process code execution -- a
local code-injection / privesc vector. RTLD_LAZY defeats up-front symbol
validation; unchecked dlsym casts are UB on ABI mismatch.
Recommendation: gate behind a build option off by default; refuse to load
when privileged or when the file is not root-owned / is world-writable;
document as debug-only.
== HIGH - crashes / memory safety ==
1. NULL deref in ark_pktgen_setup() (ark_pktgen.c): pktgen toptions[] has no
"port" entry (only pktchkr's does), but the run branch calls
options("port")->v.INT; options() returns NULL on miss. Crashes whenever
Pkt_gen config sets run=1.
2. ark_tx_queue_release() missing NULL guard (ark_ethdev_tx.c): ark_dev_close()
calls it for every tx_queues[] slot and ark_dev_stop() loops over
ark_tx_queue_stop() without checks; both deref queue->ddm / cons_index. RX
side guards (queue == 0); TX side does not -> NULL deref on close/stop with
any un-setup TX slot.
3. ark_dev_rx_queue_count() missing NULL guard (ark_ethdev_rx.c): dereferences
queue with no queue == 0 check, unlike every other entry point in the file.
4. mac_addrs alloc failure not handled (ark_ethdev.c ark_dev_init, ~L381):
failure is logged but execution falls through with mac_addrs == NULL; the
secondary-port path (~L461) uses goto error, the primary does not.
5. error: path leaks secondary ports (ark_ethdev.c ark_dev_init, ~L480): only
the primary mac_addrs is freed; already-probed secondary eth_devs and their
dev_private/mac_addrs leak.
== HIGH - silent data corruption ==
6. OTLONG written through wrong union member (ark_pktgen.c pmd_set_arg,
ark_pktchkr.c set_arg): "case OTLONG: o->v.INT = atoll(val);" truncates to
32 bits and leaves stale upper bits in .LONG. Affects num_pkts,
src_mac_addr, dst_mac_addr. Should be o->v.LONG.
7. src_mac_addr / num_pkts read via .v.INT in setup (both *_setup()): these are
OTLONG; reading .INT drops upper MAC bytes (default 0xdC3cF6425060).
Adjacent
dst_mac_addr correctly uses .LONG.
== MEDIUM ==
8. parse_ipv4_string() return type (ark_pktgen.c, ark_pktchkr.c): declared
int32_t but builds values up to 0xFFFFFFFF via unsigned math; default
169.254.10.240 goes negative on conversion. Should be uint32_t. "return 0"
on parse failure also aliases a valid 0.0.0.0.
9. Divide-by-zero risk in ark_api_num_queues_per_port() (ark_mpu.c):
num_queues / ark_ports; ark_ports derives from untrusted
user_ext.dev_get_port_count(), unchecked for 0.
10. pkt_size_min > pkt_size_max defaults: 2006/2005 vs 1514 in both option
tables; likely a typo.
11. dst_port / src_port default 65536: out of range for a 16-bit port.
== LOW / cleanup ==
12. Misplaced log in ark_pktchkr_wait_done(): "pktgen done" prints every loop
iteration instead of once after completion.
13. ark_pktchkr_get_pkts_sent() returns int for a uint32_t register
(signedness); pktgen version returns uint32_t.
14. *_parse() tokenizer/doc mismatch: comments document comma-separated args;
toks[] omits ','.
15. inst->dev_info never set/used in ark_pkt_gen_inst / ark_pkt_chkr_inst
(allocated via rte_malloc, not zmalloc).
16. check_for_ext() "found" is dead: never reassigned; only the -1
dlopen-failure path is meaningful.
17. ark_mpu_configure() logs ring_size (uint32_t) with %d.
== PROCESS / TOOLING ==
P1. Non-inclusive naming not caught by review tooling: en_slaved_start
(ark_pktgen.c toptions[] and the ark_pktgen_set_pkt_ctrl() parameter) uses
terminology DPDK has been retiring. The substantive point is that
checkpatches.sh / the checkpatch master|slave test should have flagged this
and did not -- indicating the file was either grandfathered in or the check
is not being run against this tree. Coverage gap in the tooling, not just a
naming nit. Suggest renaming (e.g. en_gated_start) and reviewing why the
automated check missed it.
--
You are receiving this mail because:
You are the assignee for the bug.
^ permalink raw reply
* [PATCH v2] net/af_packet: fix qpairs argument upper bound check
From: Denis Sergeev @ 2026-06-03 17:08 UTC (permalink / raw)
To: dev; +Cc: stephen, Denis Sergeev
In-Reply-To: <20260603044228.117357-1-denserg.edu@gmail.com>
The qpairs vdev argument was parsed with atoi(), which does no
validation: trailing garbage is ignored and a negative input such
as "-1" wraps to UINT_MAX when stored in the unsigned qpairs field,
passing the existing "< 1" check and reaching rte_pmd_init_internals()
as nb_queues. This causes excessive socket and memory allocation in
the per-queue loop.
Parse the value with strtoul() and reject non-numeric input, trailing
characters, negative values and values outside the
[1, RTE_MAX_QUEUES_PER_PORT] range.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: ccd37d341e8d ("net/af_packet: remove queue number limitation")
Signed-off-by: Denis Sergeev <denserg.edu@gmail.com>
---
v2:
* Replace atoi() with strtoul() and validate the parsed value
drivers/net/af_packet/rte_eth_af_packet.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
index 8303ff5ca9..b7758a5c75 100644
--- a/drivers/net/af_packet/rte_eth_af_packet.c
+++ b/drivers/net/af_packet/rte_eth_af_packet.c
@@ -1168,13 +1168,20 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
for (k_idx = 0; k_idx < kvlist->count; k_idx++) {
pair = &kvlist->pairs[k_idx];
if (strstr(pair->key, ETH_AF_PACKET_NUM_Q_ARG) != NULL) {
- qpairs = atoi(pair->value);
- if (qpairs < 1) {
+ char *endptr;
+ unsigned long num;
+
+ errno = 0;
+ num = strtoul(pair->value, &endptr, 10);
+ if (errno != 0 || endptr == pair->value ||
+ *endptr != '\0' || pair->value[0] == '-' ||
+ num < 1 || num > RTE_MAX_QUEUES_PER_PORT) {
PMD_LOG(ERR,
"%s: invalid qpairs value",
name);
return -1;
}
+ qpairs = num;
continue;
}
if (strstr(pair->key, ETH_AF_PACKET_BLOCKSIZE_ARG) != NULL) {
--
2.50.1
^ permalink raw reply related
* [PATCH v3] net/ark: fix null dereference on allocation failure
From: Denis Sergeev @ 2026-06-03 16:32 UTC (permalink / raw)
To: dev; +Cc: shepard.siegel, ed.czeck, john.miller, stephen, stable,
Denis Sergeev
In-Reply-To: <20260603053214.119296-1-denserg.edu@gmail.com>
rte_zmalloc_socket() can return NULL when memory is exhausted.
The result is passed directly to memcpy() without a NULL check,
leading to undefined behavior. Add a NULL check consistent with
the existing check for mac_addrs allocation in the same function.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: bf73ee28f47b ("net/ark: support single function with multiple port")
Cc: stable@dpdk.org
Signed-off-by: Denis Sergeev <denserg.edu@gmail.com>
---
v3:
* Use a single-line error message without "!" and "Exiting"
v2:
* Fix Fixes: tag to use 12-char sha1 (checkpatch BAD_FIXES_TAG)
drivers/net/ark/ark_ethdev.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
index 8b25ed948f..546a44704f 100644
--- a/drivers/net/ark/ark_ethdev.c
+++ b/drivers/net/ark/ark_ethdev.c
@@ -445,6 +445,10 @@ ark_dev_init(struct rte_eth_dev *dev)
sizeof(struct ark_adapter),
RTE_CACHE_LINE_SIZE,
rte_socket_id());
+ if (eth_dev->data->dev_private == NULL) {
+ ARK_PMD_LOG(ERR, "Memory allocation for dev_private failed\n");
+ goto error;
+ }
memcpy(eth_dev->data->dev_private, ark,
sizeof(struct ark_adapter));
--
2.50.1
^ permalink raw reply related
* Re: [PATCH v2] eal: fix data race in hugepage prefault
From: Thomas Monjalon @ 2026-06-03 15:57 UTC (permalink / raw)
To: Stephen Hemminger
Cc: dev, stable, Michal Sieron, Anatoly Burakov, Bruce Richardson
In-Reply-To: <20260601160019.116067-1-stephen@networkplumber.org>
01/06/2026 18:00, Stephen Hemminger:
> The prefault step in alloc_seg() reads a value from the hugepage and
> writes it back unchanged to force the kernel to commit the backing
> page. The read and write were not atomic, which races with concurrent
> access to the same physical page from a secondary process attaching
> to the hugetlbfs-backed mapping during rte_eal_init().
>
> Replace the non-atomic load+store with a single atomic fetch-or of
> zero. This touches the page with an atomic read-modify-write without
> changing its contents, eliminating the race while preserving the
> original intent of forcing a write fault.
>
> Fixes: 0f1631be24bd ("mem: fix page fault trigger")
> Cc: stable@dpdk.org
>
> Reported-by: Michal Sieron <michal.sieron@nokia.com>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH] eal: fix function versioning with LTO
From: Stephen Hemminger @ 2026-06-03 15:53 UTC (permalink / raw)
To: David Marchand; +Cc: dev, stable, Thomas Monjalon
In-Reply-To: <CAJFAV8w+bUJGTBSOgqp8xOHXm5av2xdFMo8TNwr=kP87R30AYg@mail.gmail.com>
On Wed, 3 Jun 2026 12:01:48 +0200
David Marchand <david.marchand@redhat.com> wrote:
> Hello,
>
> On Wed, 3 Jun 2026 at 00:57, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > When using function versioning and building with Link Time Optimization,
> > the compiler does not see the __asm__ annotation of symbols and
> > therefore thinks there are two versions of the same symbol.
> >
> > The fix is to use compiler symver attribute on the function which
> > was added in GCC 10. Keep the older method for backward compatibility
> > with older compilers.
> >
> > Bugzilla ID: 1949
> > Fixes: e30e194c4d06 ("eal: rework function versioning macros")
>
> We never used the symver stuff, so it seems unlikely the issue was
> introduced with this rework.
>
> The fact that clang does not support this attribute is a concern.
Clang doesn't have this problem. It works as is.
> > Cc: stable@dpdk.org
>
> Why do we need to backport?
Well LTO has worked for a long time, it is not experimental just
not commonly done since it takes so long to build.
We were doing it years ago at MSFT.
>
> LTO is kind of experimental, so it seems good enough to reply "not
> expected to work in older LTS" if someone reported an issue.
>
> And in practice, no LTS release call the versioning macros, since a
> LTS drops all compatibility.
>
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> I would like to reproduce, but I can't build main with LTO.
> What patches did you apply locally to avoid warnings on the hash library?
I use Debian testing and GCC 15 but shows up on older versions as well
I get no warnings building main
$ git am ~/Downloads/v2-ethdev-add-buffer-size-parameter-to-rte_eth_dev_get_name_by_port.patch
$ meson setup build-lto -Db_lto=true
$ ninja -C build-lto
ninja: Entering directory `build-lto'
[620/3781] Linking target lib/librte_ethdev.so.26.2
FAILED: [code=1] lib/librte_ethdev.so.26.2
cc -o lib/librte_ethdev.so.26.2 lib/librte_ethdev.so.26.2.p/ethdev_ethdev_driver.c.o lib/librte_ethdev.so.26.2.p/ethdev_ethdev_private.c.o lib/librte_ethdev.so.26.2.p/ethdev_ethdev_profile.c.o lib/librte_ethdev.so.26.2.p/ethdev_ethdev_trace_points.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_class_eth.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_ethdev.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_ethdev_cman.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_ethdev_telemetry.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_flow.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_mtr.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_tm.c.o lib/librte_ethdev.so.26.2.p/ethdev_sff_telemetry.c.o lib/librte_ethdev.so.26.2.p/ethdev_sff_common.c.o lib/librte_ethdev.so.26.2.p/ethdev_sff_8079.c.o lib/librte_ethdev.so.26.2.p/ethdev_sff_8472.c.o lib/librte_ethdev.so.26.2.p/ethdev_sff_8636.c.o lib/librte_ethdev.so.26.2.p/ethdev_ethdev_linux_ethtool.c.o -flto=auto -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared -fPIC -Wl,-soname,librte_ethdev.so.26 -Wl,--no-as-needed -Wl,--undefined-version -pthread -Wl,--start-group -lm -ldl -lnuma -lfdt '-Wl,-rpath,$ORIGIN/' lib/librte_eal.so.26.2 lib/librte_kvargs.so.26.2 lib/librte_log.so.26.2 lib/librte_telemetry.so.26.2 lib/librte_argparse.so.26.2 lib/librte_net.so.26.2 lib/librte_mbuf.so.26.2 lib/librte_mempool.so.26.2 lib/librte_ring.so.26.2 lib/librte_meter.so.26.2 -Wl,--version-script=/home/shemminger/DPDK/lto/build-lto/lib/ethdev_exports.map /usr/lib/x86_64-linux-gnu/libbsd.so /usr/lib/x86_64-linux-gnu/libarchive.so -Wl,--end-group
/tmp/cc3RQyqL.s: Assembler messages:
/tmp/cc3RQyqL.s: Error: invalid attempt to declare external version name as default in symbol `rte_eth_dev_get_name_by_port@@DPDK_27'
make: *** [/tmp/ccVzgiZ2.mk:2: /tmp/ccTlGfA9.ltrans0.ltrans.o] Error 1
make: *** Waiting for unfinished jobs....
lto-wrapper: fatal error: make returned 2 exit status
compilation terminated.
^ permalink raw reply
* Re: [PATCH v4 0/2] net/intel: optimize for fast-free hint
From: Bruce Richardson @ 2026-06-03 15:56 UTC (permalink / raw)
To: Morten Brørup; +Cc: dev, ciara.loftus
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35F658DD@smartserver.smartshare.dk>
On Tue, Jun 02, 2026 at 06:26:13PM +0200, Morten Brørup wrote:
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Tuesday, 2 June 2026 17.45
> >
> > When the fast-free hint is provided to the driver we know that the
> > mbufs
> > have refcnt of 1 and are from the same mempool. Therefore, we can
> > optimize a bit for this case even in the scalar path of our drivers.
> >
> > ---
> > v4:
> > * add precursor patch to adjust mbuf pointers so that the DD bit
> > is written to a descriptor with a valid mbuf pointer associated
> > with it.
> >
> > v3:
> > * used mbuf_raw_free_bulk rather than mempool function directly
> > * check for fast_free via mp pointer rather than flags
> > * remove unnecessary prefetches
> >
> > V2: Fix issues with original submission:
> > * missed check for NULL mbufs
> > * fixed issue with freeing directly from sw_ring in scalar path which
> > doesn't work as thats not a flag array of pointers
> > * fixed missing null assignment in case of large segments for TSO
> >
> >
> > Bruce Richardson (2):
> > net/intel: write mbuf for last Tx desc of segment
> > net/intel: optimize for fast-free hint
> >
> > drivers/net/intel/common/tx.h | 21 ++++--
> > drivers/net/intel/common/tx_scalar.h | 98 +++++++++++++++++++++-------
> > 2 files changed, 90 insertions(+), 29 deletions(-)
> >
> > --
> > 2.53.0
>
> Good catch by Ciara, and good solution to it.
> Series-Acked-by: Morten Brørup <mb@smartsharesystems.com>
>
Applied to dpdk-next-net-intel.
Thanks for all the reviews folks. Quick tests on next-net-intel tree with
scalar driver show a nice perf bump from this change.
/Bruce
^ permalink raw reply
* Re: [PATCH] net/ark: fix unsafe env variable in extension loading
From: Stephen Hemminger @ 2026-06-03 15:30 UTC (permalink / raw)
To: Denis Sergeev
Cc: dev, shepard.siegel, ed.czeck, john.miller, stable, sdl.dpdk
In-Reply-To: <20260603052604.118850-1-denserg.edu@gmail.com>
On Wed, 3 Jun 2026 08:26:00 +0300
Denis Sergeev <denserg.edu@gmail.com> wrote:
> diff --git a/drivers/net/ark/ark_ethdev.c b/drivers/net/ark/ark_ethdev.c
> index 8b25ed948f..e25478103b 100644
> --- a/drivers/net/ark/ark_ethdev.c
> +++ b/drivers/net/ark/ark_ethdev.c
> @@ -211,9 +211,19 @@ static int
> check_for_ext(struct ark_adapter *ark)
> {
> int found = 0;
> + const char *dllpath;
> +
> + /*
> + * A basic security check is necessary before trusting
> + * ARK_EXT_PATH environment variable.
> + */
> + if (geteuid() != getuid() || getegid() != getgid()) {
> + ARK_PMD_LOG(DEBUG, "EXT ignoring ARK_EXT_PATH under setuid/setgid\n");
> + return 0;
> + }
>
DPDK may be run in containers. This would break that.
The whole dlopen extension stuff in this driver is rubbish and should not have been allowed in.
It creates testing and security nightmares.
^ permalink raw reply
* Re: [PATCH] net/ark: fix null dereference on allocation failure
From: Stephen Hemminger @ 2026-06-03 15:31 UTC (permalink / raw)
To: Denis Sergeev
Cc: dev, shepard.siegel, ed.czeck, john.miller, stable, sdl.dpdk
In-Reply-To: <20260603052207.118688-1-denserg.edu@gmail.com>
On Wed, 3 Jun 2026 08:21:54 +0300
Denis Sergeev <denserg.edu@gmail.com> wrote:
> + if (eth_dev->data->dev_private == NULL) {
> + ARK_PMD_LOG(ERR,
> + "Memory allocation for dev_private failed!"
> + " Exiting.\n");
Do not split error messages across lines. And no need for ! and the Exiting part.
^ permalink raw reply
* Re: [PATCH] net/af_packet: fix qpairs argument upper bound check
From: Stephen Hemminger @ 2026-06-03 15:27 UTC (permalink / raw)
To: Denis Sergeev; +Cc: dev, sdl.dpdk
In-Reply-To: <20260603044228.117357-1-denserg.edu@gmail.com>
On Wed, 3 Jun 2026 07:42:18 +0300
Denis Sergeev <denserg.edu@gmail.com> wrote:
> diff --git a/drivers/net/af_packet/rte_eth_af_packet.c b/drivers/net/af_packet/rte_eth_af_packet.c
> index 0ee94e71ea..ebf015dd9a 100644
> --- a/drivers/net/af_packet/rte_eth_af_packet.c
> +++ b/drivers/net/af_packet/rte_eth_af_packet.c
> @@ -1169,7 +1169,7 @@ rte_eth_from_packet(struct rte_vdev_device *dev,
> pair = &kvlist->pairs[k_idx];
> if (strstr(pair->key, ETH_AF_PACKET_NUM_Q_ARG) != NULL) {
> qpairs = atoi(pair->value);
> - if (qpairs < 1) {
> + if (qpairs < 1 || qpairs > RTE_MAX_QUEUES_PER_PORT) {
> PMD_LOG(ERR,
> "%s: invalid qpairs value",
> name);
> --
I would rather see atoi() not used in DPDK, it accepts too much and does not do
enough validation. Replace with strtoul() instead.
^ permalink raw reply
* Re: [PATCH v7] mempool: improve cache behaviour and performance
From: Thomas Monjalon @ 2026-06-03 15:44 UTC (permalink / raw)
To: Andrew Rybchenko, Bruce Richardson, Hemant Agrawal,
Morten Brørup, Konstantin Ananyev, David Marchand,
Robin Jarry, Jerin Jacob
Cc: dev, Jingjing Wu, Praveen Shetty, Sachin Saxena
In-Reply-To: <20260601164007.252063-1-mb@smartsharesystems.com>
01/06/2026 18:40, Morten Brørup:
> This patch refactors the mempool cache to eliminate some unexpected
> behaviour and reduce the mempool cache miss rate.
I feel we need more opinions about this behaviour change.
Bruce was suggesting some user-tuning. What do you think?
Also the decision may be easier if we have some performance benchmarks.
Who could help to show some numbers before/after in various cases please?
^ permalink raw reply
* Re: [PATCH] test/latency: fix intermittent failure on slow platforms
From: Thomas Monjalon @ 2026-06-03 15:37 UTC (permalink / raw)
To: Stephen Hemminger
Cc: dev, stable, Reshma Pattan, Bruce Richardson, Luca Boccassi
In-Reply-To: <CAMw=ZnRB+wv7ZN6c5YkF=LfTB328uYcBmf=xCqjTU8hSjzuVZQ@mail.gmail.com>
01/06/2026 12:23, Luca Boccassi:
> On Sun, 31 May 2026 at 19:01, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > The forwarding loop was bounded by a fixed interval of 0.5ms
> > but on slow or emulated platforms with a low-frequency timebase
> > (e.g. RISC-V rdtime) this fails because the loop only ran once.
> > The test needs two iterations to get any samples.
> >
> > Rearrange the forwarding loop so that a minimum number of iterations
> > are required. The loop still has an upper bound on packets and time
> > interval which is expanded to 10 ms.
> >
> > If no samples are collected, mark the test as skipped.
> > Refactor the forwarding loop test so that cleanup happens on
> > failure.
> >
> > Reported-by: Luca Boccassi <bluca@debian.org>
> > Fixes: b34508b9cbcd ("test/latency: update with more checks")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> > app/test/test_latencystats.c | 75 ++++++++++++++++++++++--------------
> > 1 file changed, 46 insertions(+), 29 deletions(-)
>
> Thanks, this has been failing consistently in riscv64 since at least
> 25.11, hopefully this makes it stable.
>
> Acked-by: Luca Boccassi <luca.boccassi@gmail.com>
Applied, thanks.
^ permalink raw reply
* RE: [EXTERNAL] Re: [PATCH v5 1/1] net/mana: add device reset support
From: Wei Hu @ 2026-06-03 15:27 UTC (permalink / raw)
To: Stephen Hemminger, Wei Hu; +Cc: dev@dpdk.org, Long Li
In-Reply-To: <20260601095802.36cec727@phoenix.local>
Thanks for the review on v5, Stephen.
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, June 2, 2026 12:58 AM
> To: Wei Hu <weh@linux.microsoft.com>
> Cc: dev@dpdk.org; Long Li <longli@microsoft.com>; Wei Hu
> <weh@microsoft.com>
> Subject: [EXTERNAL] Re: [PATCH v5 1/1] net/mana: add device reset support
>
> The RCU use is not really RCU. thread_online/offline are called on every rx/tx
> burst, and the "thread" token is the queue index, not a thread. So it is a per-
> queue in-use flag paid for on the hottest path, plus a new library dependency,
> to express "wait until no queue is mid-burst" -- which the driver already half
> does by swapping the burst function to mana_*_burst_removed and checking
> dev_state. Please drop the rcu use. If you must drain readers, a per-queue
> atomic flag is lighter and local; the fast path already has the dev_state acquire-
> load it needs.
>
I have tested just swapping to mana_*_burst_removed doesn't stop
the application from entering the burst functions. I will add per-queue atomic
flags and spin poll all queue flags in the slower reset path. I think the actual
fast path cost is nearly identical. However, it does remove the rcu library
dependency.
> The data path only needs the atomic, and that part is fine. The lock is
> legitimate for serializing the teardown/rebuild against control ops, but the way
> it is used is the problem: it is acquired in mana_intr_handler, released in
> mana_reset_enter, re-acquired in mana_reset_thread, and released in
> mana_reset_exit_delay. That cross-function, cross-thread handoff is exactly
> why every function needs __rte_no_thread_safety_analysis.
> Acquire and release the lock in the same function and the annotations all go
> away. Turning off thread-safety analysis needs strong justification and this
> does not have it.
I will remove the __rte_no_thread_safety_analysis from the code. It was added
when I used spin lock and it caused clang build warnings. Now I have
changed it to pthread mutex and clang build is clean without this.
> - thread_online is taken at the top of the burst functions but
> thread_offline is only on some return paths. Any early return that
> misses it leaves a token non-quiescent and rte_rcu_qsbr_check() in
> mana_reset_enter spins forever. This is the kind of breakage the per-burst
> bracketing invites -- another reason to drop it.
>
I have checked mana_rx_burst and mana_tx_burst. There's no actual bug
now -- every return path has thread-offline. Changing rcu to per-queue flags
would still need to reset the flag in every return path. I agree it is kind of
fragile. However, the reset path needs it to make sure there is no thread still
in the fast path before it can start to tear down the resources.
I will send out a new revision.
Thanks,
Wei
^ permalink raw reply
* Re: [PATCH] eal: fix function versioning with LTO
From: Stephen Hemminger @ 2026-06-03 15:24 UTC (permalink / raw)
To: David Marchand; +Cc: dev, stable, Thomas Monjalon
In-Reply-To: <CAJFAV8w+bUJGTBSOgqp8xOHXm5av2xdFMo8TNwr=kP87R30AYg@mail.gmail.com>
On Wed, 3 Jun 2026 12:01:48 +0200
David Marchand <david.marchand@redhat.com> wrote:
> Hello,
>
> On Wed, 3 Jun 2026 at 00:57, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > When using function versioning and building with Link Time Optimization,
> > the compiler does not see the __asm__ annotation of symbols and
> > therefore thinks there are two versions of the same symbol.
> >
> > The fix is to use compiler symver attribute on the function which
> > was added in GCC 10. Keep the older method for backward compatibility
> > with older compilers.
> >
> > Bugzilla ID: 1949
> > Fixes: e30e194c4d06 ("eal: rework function versioning macros")
>
> We never used the symver stuff, so it seems unlikely the issue was
> introduced with this rework.
>
> The fact that clang does not support this attribute is a concern.
>
>
> > Cc: stable@dpdk.org
>
> Why do we need to backport?
>
> LTO is kind of experimental, so it seems good enough to reply "not
> expected to work in older LTS" if someone reported an issue.
>
> And in practice, no LTS release call the versioning macros, since a
> LTS drops all compatibility.
>
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>
> I would like to reproduce, but I can't build main with LTO.
> What patches did you apply locally to avoid warnings on the hash library?
I get no warnings from current main with GCC 15
I was doing test build of net-next with this patch:
https://patchwork.dpdk.org/project/dpdk/patch/20260529000748.275863-1-stephen@networkplumber.org/
The LTO build is done by:
$ meson setup build-lto -Db_lto=true
The result was:
cc -o lib/librte_ethdev.so.26.2 lib/librte_ethdev.so.26.2.p/ethdev_ethdev_driver.c.o lib/librte_ethdev.so.26.2.p/ethdev_ethdev_private.c.o lib/librte_ethdev.so.26.2.p/ethdev_ethdev_profile.c.o lib/librte_ethdev.so.26.2.p/ethdev_ethdev_trace_points.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_class_eth.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_ethdev.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_ethdev_cman.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_ethdev_telemetry.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_flow.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_mtr.c.o lib/librte_ethdev.so.26.2.p/ethdev_rte_tm.c.o lib/librte_ethdev.so.26.2.p/ethdev_sff_telemetry.c.o lib/librte_ethdev.so.26.2.p/ethdev_sff_common.c.o lib/librte_ethdev.so.26.2.p/ethdev_sff_8079.c.o lib/librte_ethdev.so.26.2.p/ethdev_sff_8472.c.o lib/librte_ethdev.so.26.2.p/ethdev_sff_8636.c.o lib/librte_ethdev.so.26.2.p/ethdev_ethdev_linux_ethtool.c.o -flto=auto -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared -fPIC -Wl,-soname,librte_ethdev.so.26 -Wl,--no-as-needed -Wl,--undefined-version -pthread -Wl,--start-group -lm -ldl -lnuma -lfdt '-Wl,-rpath,$ORIGIN/' lib/librte_eal.so.26.2 lib/librte_kvargs.so.26.2 lib/librte_log.so.26.2 lib/librte_telemetry.so.26.2 lib/librte_argparse.so.26.2 lib/librte_net.so.26.2 lib/librte_mbuf.so.26.2 lib/librte_mempool.so.26.2 lib/librte_ring.so.26.2 lib/librte_meter.so.26.2 -Wl,--version-script=/home/shemminger/DPDK/lto/build-lto/lib/ethdev_exports.map /usr/lib/x86_64-linux-gnu/libbsd.so /usr/lib/x86_64-linux-gnu/libarchive.so -Wl,--end-group
/tmp/cc3RQyqL.s: Assembler messages:
/tmp
Fed the result into Claude, and it said "yeah, I see the same problem on
other projects, the answer is ...". With a little searching found an example
in Gentoo https://github.com/InBetweenNames/gentooLTO/pull/458
With this change symbol versioning with LTO works on both GCC >= 10 and Clang.
Clang LTO doesn't have the attribute, but it also doesn't have the same LTO issue.
^ permalink raw reply
* Re: [PATCH v2] mempool: introduce statistics reset function
From: Thomas Monjalon @ 2026-06-03 15:14 UTC (permalink / raw)
To: Morten Brørup; +Cc: Stephen Hemminger, dev, Andrew Rybchenko
In-Reply-To: <b2fa41c0-aed1-420f-ad89-775fe0024dce@oktetlabs.ru>
24/02/2026 10:57, Andrew Rybchenko:
> On 2/24/26 12:28 PM, Morten Brørup wrote:
> > Populating a mempool with objects is accounted for in the statistics.
> > When analyzing mempool cache statistics, this may distort the data.
> > In order to simplify mempool cache statistics analysis, a mempool
> > statistics reset function was added.
> >
> > Furthermore, details about average burst sizes and mempool cache miss
> > rates were added to the statistics shown when dumping a mempool.
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
>
> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Applied with version changed to 26.07, thanks.
^ permalink raw reply
* [PATCH 2/2] net/i40e: fix blocking link wait on device start
From: Ciara Loftus @ 2026-06-03 14:34 UTC (permalink / raw)
To: dev; +Cc: Ciara Loftus, stable
In-Reply-To: <20260603143407.1108527-1-ciara.loftus@intel.com>
Currently, device start performs a synchronous link status update,
blocking for up to one second if the link is not yet up. This causes
unnecessary startup delay in scenarios where the link partner is slow to
come up or unavailable.
The wait was introduced alongside the maximum frame size MAC config
command. Some devices require the link to be up before issuing that
command, so the solution was to block at device start until link was
established.
To address the issue, remove the unconditional blocking wait. Take note
if the link was down at the time of the MAC config command during device
start and if it was, re-issue the command upon notification of the first
link-up event.
In the case where all interrupt vectors are consumed by Rx queues, no
interrupt or alarm handler is available to process link-up events. The
blocking wait is retained for this narrow case to preserve correctness.
Fixes: 82fcf20d039c ("net/i40e: fix maximum frame size configuration")
Cc: stable@dpdk.org
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
drivers/net/intel/i40e/i40e_ethdev.c | 25 +++++++++++++++++++++----
drivers/net/intel/i40e/i40e_ethdev.h | 2 ++
2 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/drivers/net/intel/i40e/i40e_ethdev.c b/drivers/net/intel/i40e/i40e_ethdev.c
index e818b6fc7c..a7c00033e5 100644
--- a/drivers/net/intel/i40e/i40e_ethdev.c
+++ b/drivers/net/intel/i40e/i40e_ethdev.c
@@ -2565,15 +2565,19 @@ i40e_dev_start(struct rte_eth_dev *dev)
I40E_AQ_EVENT_MEDIA_NA), NULL);
if (ret != I40E_SUCCESS)
PMD_DRV_LOG(WARNING, "Fail to set phy mask");
-
- /* Call get_link_info aq command to enable/disable LSE */
- i40e_dev_link_update(dev, 1);
}
if (dev->data->dev_conf.intr_conf.rxq == 0) {
+ i40e_dev_link_update(dev, 0);
+ pf->mac_config_on_link_up = !dev->data->dev_link.link_status;
rte_eal_alarm_set(I40E_ALARM_INTERVAL,
i40e_dev_alarm_handler, dev);
} else {
+ /* Block if no interrupt handler can process link-up events, to help
+ * ensure that MAC config is applied when the link is up.
+ */
+ i40e_dev_link_update(dev, !rte_intr_allow_others(intr_handle));
+ pf->mac_config_on_link_up = !dev->data->dev_link.link_status;
/* enable uio intr after callback register */
rte_intr_enable(intr_handle);
}
@@ -6867,6 +6871,7 @@ static void
i40e_dev_handle_aq_msg(struct rte_eth_dev *dev)
{
struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+ struct i40e_pf *pf = I40E_DEV_PRIVATE_TO_PF(dev->data->dev_private);
struct i40e_arq_event_info info;
uint16_t pending, opcode;
uint8_t msg_buf[I40E_AQ_BUF_SZ] = {0};
@@ -6899,9 +6904,21 @@ i40e_dev_handle_aq_msg(struct rte_eth_dev *dev)
break;
case i40e_aqc_opc_get_link_status:
ret = i40e_dev_link_update(dev, 0);
- if (!ret)
+ /* Some devices require MAC config to be applied when the link comes up. */
+ if (!ret) {
+ if (pf->mac_config_on_link_up && dev->data->dev_link.link_status) {
+ uint16_t max_frame_size;
+
+ max_frame_size = dev->data->mtu ?
+ dev->data->mtu + I40E_ETH_OVERHEAD :
+ I40E_FRAME_SIZE_MAX;
+ i40e_aq_set_mac_config(hw, max_frame_size, TRUE,
+ false, 0, NULL);
+ pf->mac_config_on_link_up = false;
+ }
rte_eth_dev_callback_process(dev,
RTE_ETH_EVENT_INTR_LSC, NULL);
+ }
break;
default:
PMD_DRV_LOG(DEBUG, "Request %u is not supported yet",
diff --git a/drivers/net/intel/i40e/i40e_ethdev.h b/drivers/net/intel/i40e/i40e_ethdev.h
index d57c53f661..7524defdb7 100644
--- a/drivers/net/intel/i40e/i40e_ethdev.h
+++ b/drivers/net/intel/i40e/i40e_ethdev.h
@@ -1191,6 +1191,8 @@ struct i40e_pf {
/* When firmware > 8.3, the enable flag for outer VLAN processing */
bool fw8_3gt;
+ /* MAC config needs re-applying when link first comes up */
+ bool mac_config_on_link_up;
struct i40e_vf_msg_cfg vf_msg_cfg;
uint64_t prev_rx_bytes;
--
2.43.0
^ permalink raw reply related
* [PATCH 1/2] net/ice: revert fix link up when starting device
From: Ciara Loftus @ 2026-06-03 14:34 UTC (permalink / raw)
To: dev; +Cc: Ciara Loftus
In-Reply-To: <20260603143407.1108527-1-ciara.loftus@intel.com>
This reverts commit 6c76b76dc64183eb2f24a52b90d4ff9feb4872f4.
The reverted commit worked around a potential timing issue where the
link could be reported down immediately after the link was enabled
during device start. The commit introduced a blocking wait which gave
the driver a better chance to read the correct link state before
returning from device start. However, since the auto link update flag is
set when setting the link up in device start, an adminq notification
should arrive once the link is up, which will be handled and correctly
set the link status. This means we can remove the delay from device
start as it does not need to guarantee link up before returning.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
---
drivers/net/intel/ice/ice_ethdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/intel/ice/ice_ethdev.c b/drivers/net/intel/ice/ice_ethdev.c
index d2734b6688..da570d03cb 100644
--- a/drivers/net/intel/ice/ice_ethdev.c
+++ b/drivers/net/intel/ice/ice_ethdev.c
@@ -4464,7 +4464,7 @@ ice_dev_start(struct rte_eth_dev *dev)
ice_dev_set_link_up(dev);
/* Call get_link_info aq command to enable/disable LSE */
- ice_link_update(dev, 1);
+ ice_link_update(dev, 0);
pf->adapter_stopped = false;
--
2.43.0
^ permalink raw reply related
* [PATCH 0/2] net/intel: fix blocking link wait on device start
From: Ciara Loftus @ 2026-06-03 14:34 UTC (permalink / raw)
To: dev; +Cc: Ciara Loftus
Device start in both ice and i40e has historically blocked for up to
two seconds waiting for link to come up before returning. This causes
noticeable startup delay when a link partner is slow to come up or
absent entirely. This series reverts the wait in ice, and removes the
wait in i40e while ensuring that the MAC config is applied when the
link is up which is a requirement for some devices.
Ciara Loftus (2):
net/ice: revert fix link up when starting device
net/i40e: fix blocking link wait on device start
drivers/net/intel/i40e/i40e_ethdev.c | 25 +++++++++++++++++++++----
drivers/net/intel/i40e/i40e_ethdev.h | 2 ++
drivers/net/intel/ice/ice_ethdev.c | 2 +-
3 files changed, 24 insertions(+), 5 deletions(-)
--
2.43.0
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox