DPDK-dev Archive on lore.kernel.org

* [PATCH 02/17] common/cnxk: add API of SA valid for cn20k platform
From: Rahul Bhansali @ 2026-06-11  7:32 UTC
  To: dev, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: jerinj, Rahul Bhansali
In-Reply-To: <20260611073311.3129711-1-rbhansali@marvell.com>

Add APIs to check whether inbound and outbound SAs are valid on the cn20k platform.

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
---
 drivers/common/cnxk/cnxk_security.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/drivers/common/cnxk/cnxk_security.c b/drivers/common/cnxk/cnxk_security.c
index 6b51055100..6f46ad3276 100644
--- a/drivers/common/cnxk/cnxk_security.c
+++ b/drivers/common/cnxk/cnxk_security.c
@@ -606,6 +606,20 @@ cnxk_ot_ipsec_outb_sa_valid(struct roc_ot_ipsec_outb_sa *sa)
 	return !!sa->w2.s.valid;
 }
 
+RTE_EXPORT_INTERNAL_SYMBOL(cnxk_ow_ipsec_inb_sa_valid)
+bool
+cnxk_ow_ipsec_inb_sa_valid(struct roc_ow_ipsec_inb_sa *sa)
+{
+	return !!sa->w2.s.valid;
+}
+
+RTE_EXPORT_INTERNAL_SYMBOL(cnxk_ow_ipsec_outb_sa_valid)
+bool
+cnxk_ow_ipsec_outb_sa_valid(struct roc_ow_ipsec_outb_sa *sa)
+{
+	return !!sa->w2.s.valid;
+}
+
 RTE_EXPORT_INTERNAL_SYMBOL(cnxk_ipsec_ivlen_get)
 uint8_t
 cnxk_ipsec_ivlen_get(enum rte_crypto_cipher_algorithm c_algo,
-- 
2.34.1

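The two helpers added by the patch above only test the `valid` bit in word 2 of the SA. The sketch below shows the same pattern in isolation. `struct demo_ow_sa` and its bit layout are hypothetical stand-ins for `struct roc_ow_ipsec_inb_sa` / `struct roc_ow_ipsec_outb_sa`, whose real definitions are not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word 2 of a cn20k ("ow") SA: only the 'valid' bit is
 * modeled; the real roc_ow_ipsec_*_sa layouts contain many more fields. */
union demo_sa_w2 {
	uint64_t u64;
	struct {
		uint64_t valid : 1;
		uint64_t rsvd : 63;
	} s;
};

/* Hypothetical SA with the valid bit in its third 64-bit word. */
struct demo_ow_sa {
	uint64_t w0;
	uint64_t w1;
	union demo_sa_w2 w2;
};

/* Same shape as cnxk_ow_ipsec_{inb,outb}_sa_valid(): '!!' turns the
 * one-bit field into a plain bool. */
static bool
demo_ow_sa_valid(const struct demo_ow_sa *sa)
{
	assert(sa != NULL);
	return !!sa->w2.s.valid;
}
```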

* [PATCH 01/17] net/cnxk: update mbuf next field for multi segment
From: Rahul Bhansali @ 2026-06-11  7:32 UTC
  To: dev, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra
  Cc: jerinj, Rahul Bhansali

rte_mbuf_raw_reset_bulk() requires the mbuf 'next' and 'nb_segs'
fields to be reset. Reset these fields for multi-segment mbufs on
the cn9k platform.

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
---
 drivers/net/cnxk/cn9k_rx.h |  8 --------
 drivers/net/cnxk/cn9k_tx.h | 42 ++++++++++++++++++--------------------
 2 files changed, 20 insertions(+), 30 deletions(-)

diff --git a/drivers/net/cnxk/cn9k_rx.h b/drivers/net/cnxk/cn9k_rx.h
index 79b56fe160..5ccdc5dee1 100644
--- a/drivers/net/cnxk/cn9k_rx.h
+++ b/drivers/net/cnxk/cn9k_rx.h
@@ -410,8 +410,6 @@ cn9k_nix_cqe_to_mbuf(const struct nix_cqe_hdr_s *cq, const uint32_t tag,
 		 * Hence, flag argument is not required.
 		 */
 		nix_cqe_xtract_mseg(rx, mbuf, val, 0);
-	else
-		mbuf->next = NULL;
 }
 
 static inline uint16_t
@@ -826,12 +824,6 @@ cn9k_nix_recv_pkts_vector(void *rx_queue, struct rte_mbuf **rx_pkts,
 			nix_cqe_xtract_mseg((union nix_rx_parse_u *)
 						(cq0 + CQE_SZ(3) + 8), mbuf3,
 					    mbuf_initializer, flags);
-		} else {
-			/* Update that no more segments */
-			mbuf0->next = NULL;
-			mbuf1->next = NULL;
-			mbuf2->next = NULL;
-			mbuf3->next = NULL;
 		}
 
 		/* Store the mbufs to rx_pkts */
diff --git a/drivers/net/cnxk/cn9k_tx.h b/drivers/net/cnxk/cn9k_tx.h
index 32665d2050..0ec448e36c 100644
--- a/drivers/net/cnxk/cn9k_tx.h
+++ b/drivers/net/cnxk/cn9k_tx.h
@@ -665,14 +665,14 @@ cn9k_nix_prepare_mseg(struct cn9k_eth_txq *txq, struct rte_mbuf *m, struct rte_m
 #else
 	RTE_SET_USED(cookie);
 #endif
-#ifdef RTE_ENABLE_ASSERT
-	m->next = NULL;
-	m->nb_segs = 1;
-#endif
-	m = m_next;
-	if (!m)
+	if (likely(!m_next))
 		goto done;
 
+	if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F)) {
+		m->next = NULL;
+		m->nb_segs = 1;
+	}
+	m = m_next;
 	/* Fill mbuf segments */
 	do {
 		m_next = m->next;
@@ -704,12 +704,13 @@ cn9k_nix_prepare_mseg(struct cn9k_eth_txq *txq, struct rte_mbuf *m, struct rte_m
 			sg_u = sg->u;
 			slist++;
 		}
-#ifdef RTE_ENABLE_ASSERT
-		m->next = NULL;
-#endif
+		if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+			m->next = NULL;
 		m = m_next;
 	} while (nb_segs);
 
+	if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+		rte_io_wmb();
 done:
 	sg->u = sg_u;
 	sg->segs = i;
@@ -720,9 +721,6 @@ cn9k_nix_prepare_mseg(struct cn9k_eth_txq *txq, struct rte_mbuf *m, struct rte_m
 	segdw += (off >> 1) + 1 + !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
 	send_hdr->w0.sizem1 = segdw - 1;
 
-#ifdef RTE_ENABLE_ASSERT
-	rte_io_wmb();
-#endif
 	return segdw;
 }
 
@@ -950,10 +948,10 @@ cn9k_nix_prepare_mseg_vec_list(struct cn9k_eth_txq *txq,
 	RTE_SET_USED(cookie);
 #endif
 
-#ifdef RTE_ENABLE_ASSERT
-	m->next = NULL;
-	m->nb_segs = 1;
-#endif
+	if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F)) {
+		m->next = NULL;
+		m->nb_segs = 1;
+	}
 	m = m_next;
 	/* Fill mbuf segments */
 	do {
@@ -984,9 +982,8 @@ cn9k_nix_prepare_mseg_vec_list(struct cn9k_eth_txq *txq,
 			sg_u = sg->u;
 			slist++;
 		}
-#ifdef RTE_ENABLE_ASSERT
-		m->next = NULL;
-#endif
+		if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+			m->next = NULL;
 		m = m_next;
 	} while (nb_segs);
 
@@ -1002,9 +999,6 @@ cn9k_nix_prepare_mseg_vec_list(struct cn9k_eth_txq *txq,
 		 !!(flags & NIX_TX_OFFLOAD_TSTAMP_F);
 	send_hdr->w0.sizem1 = segdw - 1;
 
-#ifdef RTE_ENABLE_ASSERT
-	rte_io_wmb();
-#endif
 	return segdw;
 }
 
@@ -1089,6 +1083,10 @@ cn9k_nix_xmit_pkts_mseg_vector(uint64x2_t *cmd0, uint64x2_t *cmd1,
 		}
 	}
 
+	/* Multi segment mbufs */
+	if (!(flags & NIX_TX_OFFLOAD_MBUF_NOFF_F))
+		rte_io_wmb();
+
 	for (j = 0; j < NIX_DESCS_PER_LOOP;) {
 		/* Fit consecutive packets in same LMTLINE. */
 		if ((segdw[j] + segdw[j + 1]) <= 8) {
-- 
2.34.1

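The patch above ensures that every segment of a transmitted chain ends up with `next == NULL` and `nb_segs == 1`, which is what rte_mbuf_raw_reset_bulk() requires once the mbufs are back in the pool. The sketch below shows the unlinking loop under stated assumptions. `struct demo_mbuf` is a hypothetical stand-in for `struct rte_mbuf`. The bool `noff` parameter stands in for the `NIX_TX_OFFLOAD_MBUF_NOFF_F` check. The `rte_io_wmb()` barrier that the driver issues afterwards is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal stand-in for struct rte_mbuf. */
struct demo_mbuf {
	struct demo_mbuf *next;
	uint16_t nb_segs;
};

/* Walk a segment chain as the cn9k Tx path does and, unless 'noff' is
 * set, reset 'next' and 'nb_segs' on every segment. 'next' is read before
 * it is cleared so that the walk can continue. Returns the number of
 * segments visited. */
static unsigned int
demo_unlink_chain(struct demo_mbuf *m, bool noff)
{
	unsigned int nsegs = 0;

	assert(m != NULL);
	while (m != NULL) {
		struct demo_mbuf *m_next = m->next;

		if (!noff) {
			m->next = NULL;
			m->nb_segs = 1;
		}
		nsegs++;
		m = m_next;
	}
	/* The driver follows this with rte_io_wmb(); not needed here. */
	return nsegs;
}
```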

* Re: [PATCH] eal: fix core_index for non-EAL registered threads
From: Maxime Peim @ 2026-06-10 13:45 UTC
  To: David Marchand; +Cc: dev
In-Reply-To: <CAJFAV8zsDEW6vU69NHmZiqUyB4xz+qDPQsw-cBBeScGeuE_Fiw@mail.gmail.com>

Hi David

I am not sure this works either when the lcores are set manually with
a gap, e.g. `--lcores=0,7` (see `eal_parse_lcores`):
- lcore 0 gets core_index = 0
- lcore 7 gets core_index = 1

When `rte_thread_register` is called, lcore 1 is the first unassigned
lcore. With the proposed hunk, only lcore 0 is counted before it, so it
also gets core_index = 1, duplicating lcore 7.

One solution could be to keep a bitmap of the core_index values
currently in use in the global config.

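Below is a minimal sketch of the bitmap idea. It assumes at most 64 indices, so a single `uint64_t` is enough; the real `RTE_MAX_LCORE` is usually larger and would need an array of words. All names are hypothetical, and the caller is assumed to hold `lcore_lock`. With `--lcores=0,7`, indices 0 and 1 are taken at init, so a newly registered thread receives 2 instead of a duplicate 1. An index released by an unregistering thread can also be reused.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_INDEX 64

/* Bit i set means core_index i is in use. The caller is assumed to hold
 * the lock that also protects lcore_role[] (lcore_lock in EAL). */
static uint64_t demo_used_index;

/* Return the lowest free core_index and mark it used, or -1 if full. */
static int
demo_core_index_alloc(void)
{
	for (int i = 0; i < DEMO_MAX_INDEX; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(demo_used_index & bit)) {
			demo_used_index |= bit;
			return i;
		}
	}
	return -1;
}

/* Release an index on thread unregistration or on error rollback. */
static void
demo_core_index_free(int idx)
{
	assert(idx >= 0 && idx < DEMO_MAX_INDEX);
	demo_used_index &= ~(UINT64_C(1) << idx);
}
```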
Please let me know what you think about that.

Maxime Peim

On Mon, Jun 8, 2026 at 6:35 PM David Marchand <david.marchand@redhat.com> wrote:

> On Mon, 8 Jun 2026 at 18:10, David Marchand <david.marchand@redhat.com> wrote:
> >
> > On Wed, 22 Apr 2026 at 09:54, Maxime Peim <maxime.peim@gmail.com> wrote:
> > >
> > > Threads registered via rte_thread_register() are assigned a valid
> > > lcore_id by eal_lcore_non_eal_allocate(), but their core_index in
> > > lcore_config is left at -1. This value was set during rte_eal_cpu_init()
> > > for lcores with ROLE_OFF (undetected CPUs) and is never updated when the
> > > lcore is later allocated to a non-EAL thread.
> > >
> > > As a result, rte_lcore_index() returns -1 for registered non-EAL
> > > threads. Libraries that use rte_lcore_index() to select per-lcore
> > > caches fall back to a shared global path when it returns -1, causing
> > > severe contention under concurrent access from multiple registered
> > > threads.
> > >
> > > A concrete example is the mlx5 indexed memory pool (mlx5_ipool), which
> > > uses rte_lcore_index() in mlx5_ipool_malloc_cache() to select a per-core
> > > cache slot. When core_index is -1, all registered threads are funneled
> > > into a single shared slot protected by a spinlock. In testing with VPP
> > > (which registers worker threads via rte_thread_register()), this caused
> > > async flow rule insertion throughput to drop from ~6.4M rules/sec to
> > > ~1.2M rules/sec with 4 workers -- a 5x regression attributable entirely
> > > to spinlock contention in the ipool allocator.
> > >
> > > Fix by setting core_index to the next sequential index (cfg->lcore_count)
> > > in eal_lcore_non_eal_allocate() before incrementing the count. Also reset
> > > core_index back to -1 on the error rollback path and in
> > > eal_lcore_non_eal_release() for correctness.
> > >
> > > Fixes: 5c307ba2a5b1 ("eal: register non-EAL threads as lcores")
> > Cc: stable@dpdk.org
> >
> > > Signed-off-by: Maxime Peim <maxime.peim@gmail.com>
> > Acked-by: David Marchand <david.marchand@redhat.com>
> >
>
> Hum, I did not push the change.
> Re-reading this code, we have an issue if some external thread
> unregisters in the middle.
>
> What do you think of the additional hunk:
>
> $ git diff
> diff --git a/lib/eal/common/eal_common_lcore.c b/lib/eal/common/eal_common_lcore.c
> index ae085d73e4..6f53f20d90 100644
> --- a/lib/eal/common/eal_common_lcore.c
> +++ b/lib/eal/common/eal_common_lcore.c
> @@ -372,13 +372,16 @@ eal_lcore_non_eal_allocate(void)
>         struct rte_config *cfg = rte_eal_get_configuration();
>         struct lcore_callback *callback;
>         struct lcore_callback *prev;
> +       unsigned int index = 0;
>         unsigned int lcore_id;
>
>         rte_rwlock_write_lock(&lcore_lock);
>         for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> -               if (cfg->lcore_role[lcore_id] != ROLE_OFF)
> +               if (cfg->lcore_role[lcore_id] != ROLE_OFF) {
> +                       index++;
>                         continue;
> -               lcore_config[lcore_id].core_index = cfg->lcore_count;
> +               }
> +               lcore_config[lcore_id].core_index = index;
>                 cfg->lcore_role[lcore_id] = ROLE_NON_EAL;
>                 cfg->lcore_count++;
>                 break;
>
>
> --
> David Marchand
>
>

* [PATCH v3] net/mlx5: fix counter TAILQ race between free and query callback
From: Linhu Li @ 2026-06-10  6:34 UTC
  To: dev; +Cc: stable, dsosnowski, Linhu Li
In-Reply-To: <20260604101112.72177-1-lilinhu618@gmail.com>

flow_dv_counter_free() inserts counters into
pool->counters[pool->query_gen] under pool->csl. Meanwhile,
mlx5_flow_async_pool_query_handle() moves counters from
pool->counters[query_gen ^ 1] to the global free list via
TAILQ_CONCAT while holding only cmng->csl, not pool->csl.

The comment in flow_dv_counter_free() claims the lock is not needed
because the query callback and the release function operate on different lists.
That holds only if the free path always observes the up-to-date query_gen. It
can be violated:

1. A counter free thread (non-PMD, e.g. OVS offload thread) reads
   pool->query_gen == 0 and is about to insert into counters[0].
2. The free thread is preempted by the OS scheduler; it is a regular
   pthread, not pinned to a core.
3. The eal-intr-thread alarm fires: query_gen++ (now 1) and the async
   query is sent.
4. Hardware completes the query and the callback runs TAILQ_CONCAT on
   counters[0] (= query_gen ^ 1).
5. The free thread resumes and runs TAILQ_INSERT_TAIL on counters[0]
   concurrently with step 4 on another core.

Because the two paths take different locks, TAILQ_INSERT_TAIL and
TAILQ_CONCAT run concurrently on the same list with no synchronization
and corrupt it: the pool-local list ends up with a NULL head but a
dangling tqh_last, and the global free list tail no longer points to
the real tail. The just-freed counter and every counter inserted
afterwards become unreachable and are leaked.

Non-PMD threads can be preempted for hundreds of microseconds under
CPU pressure, which is well within the async query round-trip time, so the
window is reachable in practice.

Fix it by taking pool->csl in the query completion callback before
operating on pool->counters[query_gen], serializing the CONCAT with
any concurrent INSERT. The lock is taken once per pool per query
completion in the eal-intr-thread context, not on the datapath, so the
cost is negligible. Lock order is pool->csl then cmng->csl, matching
all other sites.

Also handle the error path: previously the counters accumulated in
pool->counters[query_gen] were abandoned when a query failed. Move
them back to the global free list to avoid a leak on persistent query failures.

Fixes: ac79183dc6f7 ("net/mlx5: optimize free counter lookup")
Cc: stable@dpdk.org

Signed-off-by: Linhu Li <lilinhu618@gmail.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 doc/guides/rel_notes/release_26_07.rst | 21 +++++++++++++++++
 drivers/net/mlx5/mlx5_flow.c           | 31 ++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index b8a3e2ced9..30a9564884 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -153,6 +153,27 @@ ABI Changes
 * No ABI change that would break compatibility with 25.11.

+Fixed Issues
+------------
+
+.. This section should contain fixed issues in this release. Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description of the fix in the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+* **net/mlx5: Fixed counter TAILQ race between free and query callback.**
+
+  Fixed a race condition where concurrent counter free operations and async
+  query completions could corrupt the counter free list, causing counter leaks.
+  The issue occurred when non-PMD threads were preempted between reading
+  ``query_gen`` and inserting into the counter list.
+
+
 Known Issues
 ------------

diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 915ea29a5a..2f785d58ec 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9893,6 +9893,13 @@ void
 mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 				  uint64_t async_id, int status)
 {
+	/*
+	 * Handle async counter pool query completion.
+	 * query_gen is flipped each round: freed counters go into [query_gen],
+	 * while this callback moves [query_gen ^ 1] to the global free list.
+	 * pool->csl must be held when operating on pool->counters[] to serialize
+	 * with concurrent free-path insertions.
+	 */
 	struct mlx5_flow_counter_pool *pool =
 		(struct mlx5_flow_counter_pool *)(uintptr_t)async_id;
 	struct mlx5_counter_stats_raw *raw_to_free;
@@ -9904,6 +9911,21 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,

 	if (unlikely(status)) {
 		raw_to_free = pool->raw_hw;
+		/*
+		 * The query failed, so the freed counters accumulated
+		 * in the old-gen list would otherwise be stranded.
+		 * Move them back to the global free list. This is safe
+		 * for both transient and persistent failures: the
+		 * counters are still valid and can be reused.
+		 */
+		if (!TAILQ_EMPTY(&pool->counters[query_gen])) {
+			rte_spinlock_lock(&pool->csl);
+			rte_spinlock_lock(&cmng->csl[cnt_type]);
+			TAILQ_CONCAT(&cmng->counters[cnt_type],
+				     &pool->counters[query_gen], next);
+			rte_spinlock_unlock(&cmng->csl[cnt_type]);
+			rte_spinlock_unlock(&pool->csl);
+		}
 	} else {
 		raw_to_free = pool->raw;
 		if (pool->is_aged)
@@ -9913,11 +9935,20 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
 		rte_spinlock_unlock(&pool->sl);
 		/* Be sure the new raw counters data is updated in memory. */
 		rte_io_wmb();
+		/*
+		 * A counter free thread may have read a stale query_gen
+		 * before the generation was flipped and could still be
+		 * inserting into this same old-gen list. Hold pool->csl to
+		 * serialize TAILQ_CONCAT with that TAILQ_INSERT_TAIL and
+		 * avoid corrupting the list.
+		 */
 		if (!TAILQ_EMPTY(&pool->counters[query_gen])) {
+			rte_spinlock_lock(&pool->csl);
 			rte_spinlock_lock(&cmng->csl[cnt_type]);
 			TAILQ_CONCAT(&cmng->counters[cnt_type],
 				     &pool->counters[query_gen], next);
 			rte_spinlock_unlock(&cmng->csl[cnt_type]);
+			rte_spinlock_unlock(&pool->csl);
 		}
 	}
 	LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws, raw_to_free, next);
-- 
2.39.3 (Apple Git-146)

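The locking rule of the fix can be sketched outside the driver, with pthread mutexes standing in for `rte_spinlock_t`. `struct demo_pool`, `demo_free()` and `demo_query_done()` are hypothetical simplifications of the counter pool, `flow_dv_counter_free()` and `mlx5_flow_async_pool_query_handle()`. In this sketch, both the insert and the concat on a per-pool list run under the pool lock. The global lock is always taken second. The same move serves both the success path and the error path.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

#ifndef TAILQ_CONCAT /* some <sys/queue.h> versions lack it */
#define TAILQ_CONCAT(head1, head2, field) do {                          \
	if (!TAILQ_EMPTY(head2)) {                                      \
		*(head1)->tqh_last = (head2)->tqh_first;                \
		(head2)->tqh_first->field.tqe_prev = (head1)->tqh_last; \
		(head1)->tqh_last = (head2)->tqh_last;                  \
		TAILQ_INIT(head2);                                      \
	}                                                               \
} while (0)
#endif

struct demo_cnt {
	TAILQ_ENTRY(demo_cnt) next;
};
TAILQ_HEAD(demo_cnt_list, demo_cnt);

struct demo_pool {
	pthread_mutex_t lock;             /* role of pool->csl */
	unsigned int query_gen;           /* flipped once per query round */
	struct demo_cnt_list counters[2];
};

struct demo_global {
	pthread_mutex_t lock;             /* role of cmng->csl[cnt_type] */
	struct demo_cnt_list free_list;
};

/* Free path: insert into the current generation under the pool lock. */
static void
demo_free(struct demo_pool *pool, struct demo_cnt *cnt)
{
	pthread_mutex_lock(&pool->lock);
	TAILQ_INSERT_TAIL(&pool->counters[pool->query_gen], cnt, next);
	pthread_mutex_unlock(&pool->lock);
}

/* Query completion: move the previous generation to the global free
 * list. Taking the pool lock first serializes the concat with any
 * in-flight insert; lock order is pool, then global. */
static void
demo_query_done(struct demo_pool *pool, struct demo_global *g)
{
	unsigned int old_gen;

	pthread_mutex_lock(&pool->lock);
	old_gen = pool->query_gen ^ 1;
	pthread_mutex_lock(&g->lock);
	TAILQ_CONCAT(&g->free_list, &pool->counters[old_gen], next);
	pthread_mutex_unlock(&g->lock);
	pthread_mutex_unlock(&pool->lock);
}
```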

* Re: Re: [PATCH] gpu/metax: add new driver for Metax GPU
From: 许玲燕 @ 2026-06-11  7:10 UTC
  To: Thomas Monjalon; +Cc: dev, eagostini, 王冬冬
In-Reply-To: <J2c1Ke4gRXqOUlr2bqZFPg@monjalon.net>

Hi,
Regarding your question about whether the lib and module are already upstream, I would like to clarify their current status:
Both libmcruntime.so and the corresponding gdrapi libraries are proprietary user-space libraries provided by Metax. They have not been upstreamed to the DPDK mainline repository.
However, please rest assured that the current patch loads them at runtime via standard dlopen (dynamic loading). We do not link directly against them or require them as hard build-time dependencies. Therefore, this approach will not introduce any additional compilation dependencies or licensing issues to the DPDK main tree.
------------------------------------------------------------------
From: Thomas Monjalon <thomas@monjalon.net>
Sent: Tuesday, June 9, 2026 18:44
To: "许玲燕"<lingyan.xu@metax-tech.com>
Cc: dev<dev@dpdk.org>; eagostini<eagostini@nvidia.com>
Subject: Re: Re: [PATCH] gpu/metax: add new driver for Metax GPU
Thank you for the detailed answer and your understanding.
One more question: are the lib and module upstreamed already?
09/06/2026 12:22, 许玲燕:
> Hi,
> Thank you for the detailed feedback and for reviewing the proposal for the Metax GPU driver.
> Based on the questions raised and the analysis of the code implementation, here are the clarifications and my action plan:
> 1. Regarding GPU Access Method
> The driver interfaces with the Metax GPU hardware through a combination of the vendor-provided MC Runtime (Metax Compute Runtime) library and GDRCopy (GPU Direct RDMA) technology.
> 
> * User-space Library: As seen in the maca.c code, the driver dynamically loads (dlopen) the libmcruntime.so library. It uses mc_runtime_api.h to manage GPU contexts, memory allocation, and device attributes.
> * Kernel Module: The driver relies on the underlying Metax kernel driver (for PCI probing and basic device access) and the gdrapi (GDRCopy) kernel module to facilitate zero-copy data transfer between CPU and GPU memory.
> * Dependency: The build log confirms the detection of headers like mc_runtime_api.h and gdrapi.h, which are essential for this integration.
> 
> 2. Clarification on "Rendering" Functionality
> I apologize for the confusion caused by the term "Rendering" in the initial description. Upon reviewing the code and your feedback, I realize this was an inaccurate choice of words.
> * Correction: The intended functionality is purely "Compute/Data Processing" and "Memory Management".
> * Explanation: The driver's core logic (as shown in the patch) focuses on memory registration, allocation, and CPU/GPU data synchronization (via maca_mem_cpu_map and gdrcopy_pin), which are essential for network data processing acceleration rather than graphical rendering. I will correct this terminology in the documentation to avoid further confusion.
> 
> 3. Action Plan: Following the Contribution Guide
> I have reviewed the "Adding a New Driver" guide you linked.
> * Patch Splitting: I understand that the current monolithic patch is not suitable. I will rework the submission and split it into a logical patch series:
>   * Patch 1: Add the basic infrastructure (Meson files, maintainers, configuration).
>   * Patch 2: Implement core device functionality (PCI probing, initialization, context management).
>   * Patch 3: Add memory management and data path features (allocation, registration, and CPU mapping).
> Thank you again for your guidance. I will resubmit the revised patch series shortly.
> Best regards,
> Lingyan Xu
> ------------------------------------------------------------------
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Tuesday, June 2, 2026 18:01
> To: "许玲燕"<lingyan.xu@metax-tech.com>
> Cc: dev<dev@dpdk.org>; eagostini<eagostini@nvidia.com>
> Subject: Re: [PATCH] gpu/metax: add new driver for Metax GPU
> Hello,
> 01/06/2026 07:47, 许玲燕:
> > I am writing to propose a new driver for the Metax GPU,
> How do you access the GPU?
> Are you using a specific library or kernel module?
> > which I believe will significantly enhance our support
> > and performance for this hardware.
> > The patch attached includes the initial implementation of the driver,
> > with key features such as:
> > 
> > * Basic initialization and configuration 
> > * Memory management and allocation 
> > * Core functionality for rendering and compute tasks 
> I am familiar with connecting compute tasks of a GPU
> with DPDK networking, but I'm surprised by the rendering functionality.
> Do you mean graphical rendering of data coming from the network?
> > Please review the code and let me know if you have any feedback or suggestions.
> > I am more than happy to make any necessary adjustments and improvements.
> Thank you for working on this.
> I recommend following this guide to introduce a new driver:
> https://doc.dpdk.org/guides/contributing/new_driver.html
> 
> 
> Oversized attachment: dpdk-build-test-log.txt [48KB]
> Download page: https://qiye.aliyun.com/alimail/openLinks/downloadMimeMetaDiskBigAttach?id=netdiskid%3Av001%3Afile%3ADzzzzzzNqZx%3BJYiJwCficINAoHh55iyjKdydQzW5hDE%2FGjddF2Xp4ghl2ujmlGlWdfhgNCLOb5s3BZAHvDXTdZhtzGA3q8HJ%2Fv%2FPGnrPJfO1Xc%2BWnHr%2FKRwIkHzWFe5Iwm1IZrurr9hW
> 

* [PATCH] test/security: increase wait time for reassembly test
From: Rahul Bhansali @ 2026-06-11  6:18 UTC
  To: dev, Akhil Goyal, Anoob Joseph; +Cc: Rahul Bhansali

In the multi-segment inline IPsec reassembly burst test, each packet
has 4 fragments and each fragment is a multi-segment mbuf of ~11k
bytes. Occasionally, a few reassemblies out of 33 such bursts fail.

A 1 ms delay after the burst Tx is not sufficient in this case, so
increase it to 10 ms in multi-segment mode to avoid random reassembly
failures in functional tests.

Signed-off-by: Rahul Bhansali <rbhansali@marvell.com>
---
 app/test/test_security_inline_proto.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/app/test/test_security_inline_proto.c b/app/test/test_security_inline_proto.c
index b0cce5ebd9..55d81041df 100644
--- a/app/test/test_security_inline_proto.c
+++ b/app/test/test_security_inline_proto.c
@@ -1107,6 +1107,7 @@ test_ipsec_with_reassembly(struct reassembly_vector *vector,
 	void *ctx;
 	unsigned int i, nb_rx = 0, j;
 	uint32_t ol_flags;
+	uint32_t delay_ms;
 	bool outer_ipv4;
 	int ret = 0;
 
@@ -1214,7 +1215,9 @@ test_ipsec_with_reassembly(struct reassembly_vector *vector,
 		goto out;
 	}
 
-	rte_delay_ms(1);
+	/* Multi-segment fragments requires more delay for burst Tx and reassembly in Rx path. */
+	delay_ms = sg_mode ? 10 : 1;
+	rte_delay_ms(delay_ms);
 
 	/* Retry few times before giving up */
 	nb_rx = 0;
-- 
2.34.1


* Re: [PATCH v3 01/10] eal: add interface to check if lcore is EAL managed
From: lihuisong (C) @ 2026-06-11  6:16 UTC
  To: Thomas Monjalon
  Cc: anatoly.burakov, sivaprasad.tummala, dev, stephen, fengchengwen,
	yangxingui, zhanjie9, lihuisong
In-Reply-To: <vMlZoeDqTD6UrHS2wuWBKw@monjalon.net>

Hi Thomas,

Thanks for your review.


On 6/11/2026 7:28 AM, Thomas Monjalon wrote:
> 22/05/2026 06:11, Huisong Li:
>> Add a new helper function rte_lcore_is_eal_managed() to determine
>> if a logical core is managed by EAL.
>>
>> This interface returns true if the lcore role is either ROLE_RTE
>> (standard worker/main cores) or ROLE_SERVICE (service cores).
> [...]
>> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_lcore_is_eal_managed, 26.07)
>> +int rte_lcore_is_eal_managed(unsigned int lcore_id)
>> +{
>> +	struct rte_config *cfg = rte_eal_get_configuration();
>> +
>> +	if (lcore_id >= RTE_MAX_LCORE)
>> +		return 0;
>> +	return cfg->lcore_role[lcore_id] == ROLE_RTE ||
>> +		cfg->lcore_role[lcore_id] == ROLE_SERVICE;
>> +}
> I'm not sure about adding this function in the API.
> We already have rte_eal_lcore_role()
> and I feel having this explicit ROLE_RTE || ROLE_SERVICE
> in the code where needed may be less confusing.

Ack.

>
> Note: we should prefix these constants with RTE_LCORE_

Agreed, that is a good idea.

It will break the API, so we can do it in 26.11.

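Thomas's suggestion amounts to spelling out the role test at each call site. The sketch below is self-contained. `enum demo_role`, the `demo_roles[]` table and `demo_lcore_role()` are hypothetical stand-ins for `enum rte_lcore_role_t`, the EAL configuration and `rte_eal_lcore_role()`, so it builds without DPDK. As in the proposed helper, out-of-range IDs are treated as not EAL managed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the EAL lcore roles discussed above. */
enum demo_role {
	DEMO_ROLE_RTE,
	DEMO_ROLE_OFF,
	DEMO_ROLE_SERVICE,
	DEMO_ROLE_NON_EAL,
};

#define DEMO_MAX_LCORE 4

/* Hypothetical per-lcore role table. */
static const enum demo_role demo_roles[DEMO_MAX_LCORE] = {
	DEMO_ROLE_RTE, DEMO_ROLE_SERVICE, DEMO_ROLE_NON_EAL, DEMO_ROLE_OFF,
};

/* Stand-in for rte_eal_lcore_role(); out-of-range IDs report OFF here. */
static enum demo_role
demo_lcore_role(unsigned int lcore_id)
{
	return lcore_id < DEMO_MAX_LCORE ? demo_roles[lcore_id] : DEMO_ROLE_OFF;
}

/* The explicit check written where needed, instead of a new API. */
static bool
demo_lcore_eal_managed(unsigned int lcore_id)
{
	enum demo_role role = demo_lcore_role(lcore_id);

	return role == DEMO_ROLE_RTE || role == DEMO_ROLE_SERVICE;
}
```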

* RE: [PATCH v5] ethdev: support inline calculating masked item value
From: Bing Zhao @ 2026-06-11  4:56 UTC
  To: Slava Ovsiienko, dev@dpdk.org, Raslan Darawsheh,
	stephen@networkplumber.org
  Cc: Ori Kam, Dariusz Sosnowski, Suanming Mou, Matan Azrad,
	NBU-Contact-Thomas Monjalon (EXTERNAL)
In-Reply-To: <BN9PR12MB5338F6E48AC491C180292DB8D01B2@BN9PR12MB5338.namprd12.prod.outlook.com>

Oh, I see: I squashed the changes into the following mlx5 driver fix instead of the proper patch, so this patch still contains the old code. My fault.

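Conceptually, `RTE_FLOW_CONV_OP_PATTERN_MASKED` copies each pattern item and stores `spec & mask` and `last & mask` in the copy. This is what the unit test in the quoted patch checks: for example, destination byte 2 of `spec` becomes `0x03 & 0x00 = 0x00`. Below is a minimal sketch of that per-byte step. `demo_apply_mask()` is hypothetical, and it ignores the per-item size handling (RAW, GENEVE_OPT, items with a `desc_fn`) discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store src[i] & mask[i] into dst for 'size' bytes, i.e. the masked
 * value of one item (its spec or its last). */
static void
demo_apply_mask(uint8_t *dst, const uint8_t *src, const uint8_t *mask,
		size_t size)
{
	assert(dst != NULL && src != NULL && mask != NULL);
	for (size_t i = 0; i < size; i++)
		dst[i] = src[i] & mask[i];
}
```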
> -----Original Message-----
> From: Bing Zhao
> Sent: Thursday, June 11, 2026 12:55 PM
> To: 'Bing Zhao' <bingz@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; dev@dpdk.org; Raslan Darawsheh
> <rasland@nvidia.com>; stephen@networkplumber.org
> Cc: Ori Kam <orika@nvidia.com>; Dariusz Sosnowski <dsosnowski@nvidia.com>;
> Suanming Mou <suanmingm@nvidia.com>; Matan Azrad <matan@nvidia.com>; NBU-
> Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>
> Subject: RE: [PATCH v5] ethdev: support inline calculating masked item
> value
> 
> Hi,
> 
> In my local code,
> 
> diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
> index 7cf9f6f6f3..7a2721af00 100644
> --- a/lib/ethdev/rte_flow.c
> +++ b/lib/ethdev/rte_flow.c
> @@ -181,9 +181,18 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
>  static inline size_t
>  rte_flow_conv_item_mask_size(const struct rte_flow_item *item)
>  {
> -       if ((int)item->type >= 0)
> +       if ((int)item->type < 0)
> +               return sizeof(void *);
> +       switch (item->type) {
> +       case RTE_FLOW_ITEM_TYPE_RAW:
> +               return offsetof(struct rte_flow_item_raw, pattern);
> +       case RTE_FLOW_ITEM_TYPE_GENEVE_OPT:
> +               return offsetof(struct rte_flow_item_geneve_opt, data);
> +       default:
> +               if (rte_flow_desc_item[item->type].desc_fn != NULL)
> +                       return 0;
>                 return rte_flow_desc_item[item->type].size;
> -       return sizeof(void *);
> +       }
>  }
> 
> // This is the code before my latest change.
> > +static inline size_t
> > +rte_flow_conv_item_mask_size(const struct rte_flow_item *item)
> > +{
> > +       if ((int)item->type >= 0)
> > +               return rte_flow_desc_item[item->type].size;
> > +       return sizeof(void *);
> > +}
> > +
> 
> 
> I didn't understand why the patch I sent is still using the old code.
> 
> > -----Original Message-----
> > From: Bing Zhao <bingz@nvidia.com>
> > Sent: Wednesday, June 10, 2026 1:27 PM
> > To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org; Raslan
> > Darawsheh <rasland@nvidia.com>; stephen@networkplumber.org
> > Cc: Ori Kam <orika@nvidia.com>; Dariusz Sosnowski
> > <dsosnowski@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan
> > Azrad <matan@nvidia.com>; NBU- Contact-Thomas Monjalon (EXTERNAL)
> > <thomas@monjalon.net>
> > Subject: [PATCH v5] ethdev: support inline calculating masked item
> > value
> >
> > In the asynchronous API definition and in some drivers, the masked
> > rte_flow_item spec value may not be calculated by the driver, in order
> > to reach the speed-of-light rule insertion rate, and sometimes the
> > input parameters are copied and modified internally.
> >
> > After copying, spec and last are still protected by the const keyword
> > and cannot be modified in place. The driver also needs extra memory for
> > the calculation and extra conditions to determine the length of each
> > item spec, which is not efficient.
> >
> > To solve this and to support the fix that follows, introduce a new
> > conversion operation that calculates the spec and last values with
> > the mask applied inline.
> >
> > Signed-off-by: Bing Zhao <bingz@nvidia.com>
> > Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> > ---
> > v3:
> >   - add test code
> >   - fix the issue found by AI
> > v4: rebase on top of main
> > v5: handle some items separately and add test for them
> > ---
> >  app/test/test_ethdev_api.c             | 76 ++++++++++++++++++++++++++
> >  doc/guides/rel_notes/release_26_07.rst |  6 ++
> >  lib/ethdev/rte_flow.c                  | 46 ++++++++++++++--
> >  lib/ethdev/rte_flow.h                  | 13 +++++
> >  4 files changed, 135 insertions(+), 6 deletions(-)
> >
> > diff --git a/app/test/test_ethdev_api.c b/app/test/test_ethdev_api.c
> > index 76afd0345c..5cae1cdc1d 100644
> > --- a/app/test/test_ethdev_api.c
> > +++ b/app/test/test_ethdev_api.c
> > @@ -4,6 +4,7 @@
> >
> >  #include <rte_log.h>
> >  #include <rte_ethdev.h>
> > +#include <rte_flow.h>
> >
> >  #include <rte_test.h>
> >  #include "test.h"
> > @@ -15,6 +16,80 @@
> >  #define NUM_MBUF 1024
> >  #define MBUF_CACHE_SIZE 256
> >
> > +static int32_t
> > +ethdev_api_flow_conv_pattern_masked(void)
> > +{
> > +       const struct rte_flow_item_eth spec = {
> > +               .hdr.dst_addr.addr_bytes = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 },
> > +               .hdr.src_addr.addr_bytes = { 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f },
> > +               .hdr.ether_type = RTE_BE16(0x1234),
> > +       };
> > +       const struct rte_flow_item_eth last = {
> > +               .hdr.dst_addr.addr_bytes = { 0x11, 0x12, 0x13, 0x14, 0x15, 0x16 },
> > +               .hdr.src_addr.addr_bytes = { 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f },
> > +               .hdr.ether_type = RTE_BE16(0x5678),
> > +       };
> > +       const struct rte_flow_item_eth mask = {
> > +               .hdr.dst_addr.addr_bytes = { 0xff, 0xff, 0x00, 0x00, 0xff, 0xff },
> > +               .hdr.src_addr.addr_bytes = { 0xff, 0x00, 0xff, 0x00, 0xff, 0x00 },
> > +               .hdr.ether_type = RTE_BE16(0xffff),
> > +       };
> > +       const struct rte_flow_item pattern[] = {
> > +               {
> > +                       .type = RTE_FLOW_ITEM_TYPE_ETH,
> > +                       .spec = &spec,
> > +                       .last = &last,
> > +                       .mask = &mask,
> > +               },
> > +               { .type = RTE_FLOW_ITEM_TYPE_END },
> > +       };
> > +       union {
> > +               struct rte_flow_item item;
> > +               struct rte_flow_item_eth eth;
> > +               double align;
> > +               uint8_t raw[256];
> > +       } dst;
> > +       const struct rte_flow_item *item;
> > +       const struct rte_flow_item_eth *conv_spec;
> > +       const struct rte_flow_item_eth *conv_last;
> > +       int ret;
> > +
> > +       ret = rte_flow_conv(RTE_FLOW_CONV_OP_PATTERN_MASKED, NULL, 0, pattern, NULL);
> > +       TEST_ASSERT(ret > 0, "Masked pattern conversion size query failed");
> > +       TEST_ASSERT((size_t)ret <= sizeof(dst.raw),
> > +                   "Masked pattern conversion needs too much storage");
> > +
> > +       memset(&dst, 0, sizeof(dst));
> > +       ret = rte_flow_conv(RTE_FLOW_CONV_OP_PATTERN_MASKED, dst.raw,
> > +                           sizeof(dst.raw), pattern, NULL);
> > +       TEST_ASSERT(ret > 0, "Masked pattern conversion failed");
> > +
> > +       item = (const struct rte_flow_item *)dst.raw;
> > +       conv_spec = item[0].spec;
> > +       conv_last = item[0].last;
> > +       TEST_ASSERT_NOT_NULL(conv_spec, "Converted spec must be set");
> > +       TEST_ASSERT_NOT_NULL(conv_last, "Converted last must be set");
> > +
> > +       TEST_ASSERT_EQUAL(conv_spec->hdr.dst_addr.addr_bytes[0], 0x01,
> > +                         "Masked spec dst byte 0 mismatch");
> > +       TEST_ASSERT_EQUAL(conv_spec->hdr.dst_addr.addr_bytes[2], 0x00,
> > +                         "Masked spec dst byte 2 mismatch");
> > +       TEST_ASSERT_EQUAL(conv_spec->hdr.src_addr.addr_bytes[1], 0x00,
> > +                         "Masked spec src byte 1 mismatch");
> > +       TEST_ASSERT_EQUAL(conv_spec->hdr.ether_type, RTE_BE16(0x1234),
> > +                         "Masked spec ether type mismatch");
> > +       TEST_ASSERT_EQUAL(conv_last->hdr.dst_addr.addr_bytes[0], 0x11,
> > +                         "Masked last dst byte 0 mismatch");
> > +       TEST_ASSERT_EQUAL(conv_last->hdr.dst_addr.addr_bytes[2], 0x00,
> > +                         "Masked last dst byte 2 mismatch");
> > +       TEST_ASSERT_EQUAL(conv_last->hdr.src_addr.addr_bytes[1], 0x00,
> > +                         "Masked last src byte 1 mismatch");
> > +       TEST_ASSERT_EQUAL(conv_last->hdr.ether_type, RTE_BE16(0x5678),
> > +                         "Masked last ether type mismatch");
> > +
> > +       return TEST_SUCCESS;
> > +}
> > +
> >  static int32_t
> >  ethdev_api_queue_status(void)
> >  {
> > @@ -167,6 +242,7 @@ static struct unit_test_suite ethdev_api_testsuite = {
> >         .setup = NULL,
> >         .teardown = NULL,
> >         .unit_test_cases = {
> > +               TEST_CASE(ethdev_api_flow_conv_pattern_masked),
> >                 TEST_CASE(ethdev_api_queue_status),
> >                 /* TODO: Add deferred_start queue status test */
> >                 TEST_CASES_END() /**< NULL terminate unit test array */
> > diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
> > index b5285af5fe..4f5d21d576 100644
> > --- a/doc/guides/rel_notes/release_26_07.rst
> > +++ b/doc/guides/rel_notes/release_26_07.rst
> > @@ -190,6 +190,12 @@ API Changes
> >    - ``rte_pmd_mlx5_enable_steering``
> >    - ``rte_pmd_mlx5_disable_steering``
> >
> > +* ethdev: Added masked pattern conversion.
> > +
> > +  Added ``RTE_FLOW_CONV_OP_PATTERN_MASKED`` to ``rte_flow_conv()``
> > +  to copy an entire pattern while applying each item's mask to its
> > +  ``spec`` and ``last`` fields.
> > +
> >
> >  ABI Changes
> >  -----------
> > diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
> > index ec0fe08355..c7a94a1194 100644
> > --- a/lib/ethdev/rte_flow.c
> > +++ b/lib/ethdev/rte_flow.c
> > @@ -178,6 +178,14 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
> >         MK_FLOW_ITEM(COMPARE, sizeof(struct rte_flow_item_compare)),
> > };
> >
> > +static inline size_t
> > +rte_flow_conv_item_mask_size(const struct rte_flow_item *item)
> > +{
> > +       if ((int)item->type >= 0)
> > +               return rte_flow_desc_item[item->type].size;
> > +       return sizeof(void *);
> > +}
> > +
> >  /** Generate flow_action[] entry. */
> >  #define MK_FLOW_ACTION(t, s) \
> >         [RTE_FLOW_ACTION_TYPE_ ## t] = { \
> > @@ -835,6 +843,8 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
> >   *   RTE_FLOW_ITEM_TYPE_END is encountered.
> >   * @param[out] error
> >   *   Perform verbose error reporting if not NULL.
> > + * @param[in] with_mask
> > + *   If true, @p src mask will be applied to spec and last.
> >   *
> >   * @return
> >   *   A positive value representing the number of bytes needed to store
> > @@ -847,12 +857,13 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
> >                       const size_t size,
> >                       const struct rte_flow_item *src,
> >                       unsigned int num,
> > +                     bool with_mask,
> >                       struct rte_flow_error *error)
> >  {
> >         uintptr_t data = (uintptr_t)dst;
> >         size_t off;
> >         size_t ret;
> > -       unsigned int i;
> > +       unsigned int i, j;
> >
> >         for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
> >                 /**
> > @@ -876,15 +887,27 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
> >         src -= num;
> >         dst -= num;
> >         do {
> > +               uint8_t *c_spec = NULL, *c_last = NULL;
> > +               const uint8_t *mask = src->mask;
> > +               size_t item_mask_size = mask ? rte_flow_conv_item_mask_size(src) : 0;
> > +
> >                 if (src->spec) {
> >                         off = RTE_ALIGN_CEIL(off, sizeof(double));
> >                         ret = rte_flow_conv_item_spec
> >                                 ((void *)(data + off),
> >                                  size > off ? size - off : 0, src,
> >                                  RTE_FLOW_CONV_ITEM_SPEC);
> > -                       if (size && size >= off + ret)
> > +                       if (size && size >= off + ret) {
> >                                 dst->spec = (void *)(data + off);
> > +                               c_spec = (uint8_t *)(data + off);
> > +                       }
> >                         off += ret;
> > +                       if (with_mask && c_spec && mask) {
> > +                               size_t mask_size = RTE_MIN(ret, item_mask_size);
> > +
> > +                               for (j = 0; j < mask_size; j++)
> > +                                       c_spec[j] &= mask[j];
> > +                       }
> >
> >                 }
> >                 if (src->last) {
> > @@ -893,9 +916,17 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
> >                                 ((void *)(data + off),
> >                                  size > off ? size - off : 0, src,
> >                                  RTE_FLOW_CONV_ITEM_LAST);
> > -                       if (size && size >= off + ret)
> > +                       if (size && size >= off + ret) {
> >                                 dst->last = (void *)(data + off);
> > +                               c_last = (uint8_t *)(data + off);
> > +                       }
> >                         off += ret;
> > +                       if (with_mask && c_last && mask) {
> > +                               size_t mask_size = RTE_MIN(ret, item_mask_size);
> > +
> > +                               for (j = 0; j < mask_size; j++)
> > +                                       c_last[j] &= mask[j];
> > +                       }
> >                 }
> >                 if (src->mask) {
> >                         off = RTE_ALIGN_CEIL(off, sizeof(double));
> > @@ -1042,7 +1073,7 @@ rte_flow_conv_rule(struct rte_flow_conv_rule *dst,
> >                 off = RTE_ALIGN_CEIL(off, sizeof(double));
> >                 ret = rte_flow_conv_pattern((void *)((uintptr_t)dst + off),
> >                                             size > off ? size - off : 0,
> > -                                           src->pattern_ro, 0, error);
> > +                                           src->pattern_ro, 0, false, error);
> >                 if (ret < 0)
> >                         return ret;
> >                 if (size && size >= off + (size_t)ret)
> > @@ -1143,7 +1174,7 @@ rte_flow_conv(enum rte_flow_conv_op op,
> >                 ret = sizeof(*attr);
> >                 break;
> >         case RTE_FLOW_CONV_OP_ITEM:
> > -               ret = rte_flow_conv_pattern(dst, size, src, 1, error);
> > +               ret = rte_flow_conv_pattern(dst, size, src, 1, false, error);
> >                 break;
> >         case RTE_FLOW_CONV_OP_ITEM_MASK:
> >                 item = src;
> > @@ -1158,7 +1189,7 @@ rte_flow_conv(enum rte_flow_conv_op op,
> >                 ret = rte_flow_conv_actions(dst, size, src, 1, error);
> >                 break;
> >         case RTE_FLOW_CONV_OP_PATTERN:
> > -               ret = rte_flow_conv_pattern(dst, size, src, 0, error);
> > +               ret = rte_flow_conv_pattern(dst, size, src, 0, false, error);
> >                 break;
> >         case RTE_FLOW_CONV_OP_ACTIONS:
> >                 ret = rte_flow_conv_actions(dst, size, src, 0, error);
> > @@ -1178,6 +1209,9 @@ rte_flow_conv(enum rte_flow_conv_op op,
> >         case RTE_FLOW_CONV_OP_ACTION_NAME_PTR:
> >                 ret = rte_flow_conv_name(1, 1, dst, size, src, error);
> >                 break;
> > +       case RTE_FLOW_CONV_OP_PATTERN_MASKED:
> > +               ret = rte_flow_conv_pattern(dst, size, src, 0, true, error);
> > +               break;
> >         default:
> >                 ret = rte_flow_error_set
> >                 (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> > diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> > index b495409406..959a2f903b 100644
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -4556,6 +4556,19 @@ enum rte_flow_conv_op {
> >          *   @code const char ** @endcode
> >          */
> >         RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
> > +
> > +       /**
> > +        * Convert an entire pattern.
> > +        *
> > +        * Duplicates all pattern items at once, applying @p mask to @p spec
> > +        * and @p last.
> > +        *
> > +        * - @p src type:
> > +        *   @code const struct rte_flow_item * @endcode
> > +        * - @p dst type:
> > +        *   @code struct rte_flow_item * @endcode
> > +        */
> > +       RTE_FLOW_CONV_OP_PATTERN_MASKED,
> >  };
> >
> >  /**
> > --
> > 2.34.1


^ permalink raw reply

* RE: [PATCH v5] ethdev: support inline calculating masked item value
From: Bing Zhao @ 2026-06-11  4:55 UTC (permalink / raw)
  To: Bing Zhao, Slava Ovsiienko, dev@dpdk.org, Raslan Darawsheh,
	stephen@networkplumber.org
  Cc: Ori Kam, Dariusz Sosnowski, Suanming Mou, Matan Azrad,
	NBU-Contact-Thomas Monjalon (EXTERNAL)
In-Reply-To: <20260610052729.5637-1-bingz@nvidia.com>

Hi,

In my local code,

diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index 7cf9f6f6f3..7a2721af00 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -181,9 +181,18 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
 static inline size_t
 rte_flow_conv_item_mask_size(const struct rte_flow_item *item)
 {
-       if ((int)item->type >= 0)
+       if ((int)item->type < 0)
+               return sizeof(void *);
+       switch (item->type) {
+       case RTE_FLOW_ITEM_TYPE_RAW:
+               return offsetof(struct rte_flow_item_raw, pattern);
+       case RTE_FLOW_ITEM_TYPE_GENEVE_OPT:
+               return offsetof(struct rte_flow_item_geneve_opt, data);
+       default:
+               if (rte_flow_desc_item[item->type].desc_fn != NULL)
+                       return 0;
                return rte_flow_desc_item[item->type].size;
-       return sizeof(void *);
+       }
 }

// This is the code before my latest change.
> +static inline size_t
> +rte_flow_conv_item_mask_size(const struct rte_flow_item *item) {
> +       if ((int)item->type >= 0)
> +               return rte_flow_desc_item[item->type].size;
> +       return sizeof(void *);
> +}
> +


I didn't understand why the patch I sent is still using the old code.
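For context on the RAW and GENEVE_OPT cases in the local version: `struct rte_flow_item_raw` ends with the `pattern` pointer and `struct rte_flow_item_geneve_opt` ends with the `data` pointer, so ANDing the full `rte_flow_desc_item[].size` bytes of a converted spec would also AND the bytes of that pointer with the mask's pointer. Stopping at `offsetof()` masks only the fixed-size fields. The standalone sketch below illustrates the effect; `struct demo_raw_item`, `demo_bytes`, and the `demo_*` helpers are hypothetical, only loosely modeled on the RAW item, and are not DPDK code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical item type (not the DPDK struct): fixed-size fields
 * followed by a pointer to out-of-line, variable-length data. */
struct demo_raw_item {
	int32_t offset;
	uint16_t limit;
	uint16_t length;
	const uint8_t *pattern;
};

static const uint8_t demo_bytes[] = { 0xde, 0xad, 0xbe, 0xef };

/* AND the first n bytes of dst with mask, in place. */
static void
demo_apply_mask(void *dst, const void *mask, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *m = mask;

	for (size_t i = 0; i < n; i++)
		d[i] &= m[i];
}

/* Copy a const spec, then mask the first n bytes of the copy. */
static struct demo_raw_item
demo_masked_copy(size_t n)
{
	const struct demo_raw_item spec = {
		.offset = 0x12345678,
		.limit = 0xffff,
		.length = sizeof(demo_bytes),
		.pattern = demo_bytes,
	};
	/* A mask whose pointer member is NULL, as a caller might pass. */
	const struct demo_raw_item mask = {
		.offset = 0x0000ffff,
		.limit = 0,
		.length = 0xffff,
		.pattern = NULL,
	};
	struct demo_raw_item copy;

	assert(n <= sizeof(copy));
	memcpy(&copy, &spec, sizeof(copy));
	demo_apply_mask(&copy, &mask, n);
	return copy;
}
```

Masking `offsetof(struct demo_raw_item, pattern)` bytes yields `offset == 0x5678` and leaves `pattern` pointing at `demo_bytes`, while masking `sizeof(struct demo_raw_item)` bytes also clears the pointer.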

> -----Original Message-----
> From: Bing Zhao <bingz@nvidia.com>
> Sent: Wednesday, June 10, 2026 1:27 PM
> To: Slava Ovsiienko <viacheslavo@nvidia.com>; dev@dpdk.org; Raslan
> Darawsheh <rasland@nvidia.com>; stephen@networkplumber.org
> Cc: Ori Kam <orika@nvidia.com>; Dariusz Sosnowski <dsosnowski@nvidia.com>;
> Suanming Mou <suanmingm@nvidia.com>; Matan Azrad <matan@nvidia.com>; NBU-
> Contact-Thomas Monjalon (EXTERNAL) <thomas@monjalon.net>
> Subject: [PATCH v5] ethdev: support inline calculating masked item value
> 
> External email: Use caution opening links or attachments
> 
> 
> In the asynchronous API definition and in some drivers, the rte_flow_item
> spec value may not be calculated by the driver, in order to sustain a
> very high rule insertion rate, and sometimes the input parameters are
> copied and changed internally.
> 
> After copying, the spec and last are still protected by the const
> qualifier and cannot be changed in place. The driver also needs extra
> memory to do the calculation and extra logic to determine the length
> of each item spec, which is not efficient.
> 
> To solve this issue and to support the follow-up fix, a new OP is
> introduced to calculate the spec and last values with the mask applied
> inline.
> 
> Signed-off-by: Bing Zhao <bingz@nvidia.com>
> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> ---
> v3:
>   - add test code
>   - fix the issue found by AI
> v4: rebase on top of main
> v5: handle some items separately and add test for them
> ---
>  app/test/test_ethdev_api.c             | 76 ++++++++++++++++++++++++++
>  doc/guides/rel_notes/release_26_07.rst |  6 ++
>  lib/ethdev/rte_flow.c                  | 46 ++++++++++++++--
>  lib/ethdev/rte_flow.h                  | 13 +++++
>  4 files changed, 135 insertions(+), 6 deletions(-)
> 
> diff --git a/app/test/test_ethdev_api.c b/app/test/test_ethdev_api.c
> index 76afd0345c..5cae1cdc1d 100644
> --- a/app/test/test_ethdev_api.c
> +++ b/app/test/test_ethdev_api.c
> @@ -4,6 +4,7 @@
> 
>  #include <rte_log.h>
>  #include <rte_ethdev.h>
> +#include <rte_flow.h>
> 
>  #include <rte_test.h>
>  #include "test.h"
> @@ -15,6 +16,80 @@
>  #define NUM_MBUF 1024
>  #define MBUF_CACHE_SIZE 256
> 
> +static int32_t
> +ethdev_api_flow_conv_pattern_masked(void)
> +{
> +       const struct rte_flow_item_eth spec = {
> +               .hdr.dst_addr.addr_bytes = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 },
> +               .hdr.src_addr.addr_bytes = { 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f },
> +               .hdr.ether_type = RTE_BE16(0x1234),
> +       };
> +       const struct rte_flow_item_eth last = {
> +               .hdr.dst_addr.addr_bytes = { 0x11, 0x12, 0x13, 0x14, 0x15, 0x16 },
> +               .hdr.src_addr.addr_bytes = { 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f },
> +               .hdr.ether_type = RTE_BE16(0x5678),
> +       };
> +       const struct rte_flow_item_eth mask = {
> +               .hdr.dst_addr.addr_bytes = { 0xff, 0xff, 0x00, 0x00, 0xff, 0xff },
> +               .hdr.src_addr.addr_bytes = { 0xff, 0x00, 0xff, 0x00, 0xff, 0x00 },
> +               .hdr.ether_type = RTE_BE16(0xffff),
> +       };
> +       const struct rte_flow_item pattern[] = {
> +               {
> +                       .type = RTE_FLOW_ITEM_TYPE_ETH,
> +                       .spec = &spec,
> +                       .last = &last,
> +                       .mask = &mask,
> +               },
> +               { .type = RTE_FLOW_ITEM_TYPE_END },
> +       };
> +       union {
> +               struct rte_flow_item item;
> +               struct rte_flow_item_eth eth;
> +               double align;
> +               uint8_t raw[256];
> +       } dst;
> +       const struct rte_flow_item *item;
> +       const struct rte_flow_item_eth *conv_spec;
> +       const struct rte_flow_item_eth *conv_last;
> +       int ret;
> +
> +       ret = rte_flow_conv(RTE_FLOW_CONV_OP_PATTERN_MASKED, NULL, 0, pattern, NULL);
> +       TEST_ASSERT(ret > 0, "Masked pattern conversion size query failed");
> +       TEST_ASSERT((size_t)ret <= sizeof(dst.raw),
> +                   "Masked pattern conversion needs too much storage");
> +
> +       memset(&dst, 0, sizeof(dst));
> +       ret = rte_flow_conv(RTE_FLOW_CONV_OP_PATTERN_MASKED, dst.raw,
> +                           sizeof(dst.raw), pattern, NULL);
> +       TEST_ASSERT(ret > 0, "Masked pattern conversion failed");
> +
> +       item = (const struct rte_flow_item *)dst.raw;
> +       conv_spec = item[0].spec;
> +       conv_last = item[0].last;
> +       TEST_ASSERT_NOT_NULL(conv_spec, "Converted spec must be set");
> +       TEST_ASSERT_NOT_NULL(conv_last, "Converted last must be set");
> +
> +       TEST_ASSERT_EQUAL(conv_spec->hdr.dst_addr.addr_bytes[0], 0x01,
> +                         "Masked spec dst byte 0 mismatch");
> +       TEST_ASSERT_EQUAL(conv_spec->hdr.dst_addr.addr_bytes[2], 0x00,
> +                         "Masked spec dst byte 2 mismatch");
> +       TEST_ASSERT_EQUAL(conv_spec->hdr.src_addr.addr_bytes[1], 0x00,
> +                         "Masked spec src byte 1 mismatch");
> +       TEST_ASSERT_EQUAL(conv_spec->hdr.ether_type, RTE_BE16(0x1234),
> +                         "Masked spec ether type mismatch");
> +       TEST_ASSERT_EQUAL(conv_last->hdr.dst_addr.addr_bytes[0], 0x11,
> +                         "Masked last dst byte 0 mismatch");
> +       TEST_ASSERT_EQUAL(conv_last->hdr.dst_addr.addr_bytes[2], 0x00,
> +                         "Masked last dst byte 2 mismatch");
> +       TEST_ASSERT_EQUAL(conv_last->hdr.src_addr.addr_bytes[1], 0x00,
> +                         "Masked last src byte 1 mismatch");
> +       TEST_ASSERT_EQUAL(conv_last->hdr.ether_type, RTE_BE16(0x5678),
> +                         "Masked last ether type mismatch");
> +
> +       return TEST_SUCCESS;
> +}
> +
>  static int32_t
>  ethdev_api_queue_status(void)
>  {
> @@ -167,6 +242,7 @@ static struct unit_test_suite ethdev_api_testsuite = {
>         .setup = NULL,
>         .teardown = NULL,
>         .unit_test_cases = {
> +               TEST_CASE(ethdev_api_flow_conv_pattern_masked),
>                 TEST_CASE(ethdev_api_queue_status),
>                 /* TODO: Add deferred_start queue status test */
>                 TEST_CASES_END() /**< NULL terminate unit test array */
> diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
> index b5285af5fe..4f5d21d576 100644
> --- a/doc/guides/rel_notes/release_26_07.rst
> +++ b/doc/guides/rel_notes/release_26_07.rst
> @@ -190,6 +190,12 @@ API Changes
>    - ``rte_pmd_mlx5_enable_steering``
>    - ``rte_pmd_mlx5_disable_steering``
> 
> +* ethdev: Added masked pattern conversion.
> +
> +  Added ``RTE_FLOW_CONV_OP_PATTERN_MASKED`` to ``rte_flow_conv()``
> +  to copy an entire pattern while applying each item's mask to its
> +  ``spec`` and ``last`` fields.
> +
> 
>  ABI Changes
>  -----------
> diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
> index ec0fe08355..c7a94a1194 100644
> --- a/lib/ethdev/rte_flow.c
> +++ b/lib/ethdev/rte_flow.c
> @@ -178,6 +178,14 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
>         MK_FLOW_ITEM(COMPARE, sizeof(struct rte_flow_item_compare)),
>  };
> 
> +static inline size_t
> +rte_flow_conv_item_mask_size(const struct rte_flow_item *item)
> +{
> +       if ((int)item->type >= 0)
> +               return rte_flow_desc_item[item->type].size;
> +       return sizeof(void *);
> +}
> +
>  /** Generate flow_action[] entry. */
>  #define MK_FLOW_ACTION(t, s) \
>         [RTE_FLOW_ACTION_TYPE_ ## t] = { \
> @@ -835,6 +843,8 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
>   *   RTE_FLOW_ITEM_TYPE_END is encountered.
>   * @param[out] error
>   *   Perform verbose error reporting if not NULL.
> + * @param[in] with_mask
> + *   If true, @p src mask will be applied to spec and last.
>   *
>   * @return
>   *   A positive value representing the number of bytes needed to store
> @@ -847,12 +857,13 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
>                       const size_t size,
>                       const struct rte_flow_item *src,
>                       unsigned int num,
> +                     bool with_mask,
>                       struct rte_flow_error *error)
>  {
>         uintptr_t data = (uintptr_t)dst;
>         size_t off;
>         size_t ret;
> -       unsigned int i;
> +       unsigned int i, j;
> 
>         for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
>                 /**
> @@ -876,15 +887,27 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
>         src -= num;
>         dst -= num;
>         do {
> +               uint8_t *c_spec = NULL, *c_last = NULL;
> +               const uint8_t *mask = src->mask;
> +               size_t item_mask_size = mask ? rte_flow_conv_item_mask_size(src) : 0;
> +
>                 if (src->spec) {
>                         off = RTE_ALIGN_CEIL(off, sizeof(double));
>                         ret = rte_flow_conv_item_spec
>                                 ((void *)(data + off),
>                                  size > off ? size - off : 0, src,
>                                  RTE_FLOW_CONV_ITEM_SPEC);
> -                       if (size && size >= off + ret)
> +                       if (size && size >= off + ret) {
>                                 dst->spec = (void *)(data + off);
> +                               c_spec = (uint8_t *)(data + off);
> +                       }
>                         off += ret;
> +                       if (with_mask && c_spec && mask) {
> +                               size_t mask_size = RTE_MIN(ret, item_mask_size);
> +
> +                               for (j = 0; j < mask_size; j++)
> +                                       c_spec[j] &= mask[j];
> +                       }
> 
>                 }
>                 if (src->last) {
> @@ -893,9 +916,17 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
>                                 ((void *)(data + off),
>                                  size > off ? size - off : 0, src,
>                                  RTE_FLOW_CONV_ITEM_LAST);
> -                       if (size && size >= off + ret)
> +                       if (size && size >= off + ret) {
>                                 dst->last = (void *)(data + off);
> +                               c_last = (uint8_t *)(data + off);
> +                       }
>                         off += ret;
> +                       if (with_mask && c_last && mask) {
> +                               size_t mask_size = RTE_MIN(ret, item_mask_size);
> +
> +                               for (j = 0; j < mask_size; j++)
> +                                       c_last[j] &= mask[j];
> +                       }
>                 }
>                 if (src->mask) {
>                         off = RTE_ALIGN_CEIL(off, sizeof(double));
> @@ -1042,7 +1073,7 @@ rte_flow_conv_rule(struct rte_flow_conv_rule *dst,
>                 off = RTE_ALIGN_CEIL(off, sizeof(double));
>                 ret = rte_flow_conv_pattern((void *)((uintptr_t)dst + off),
>                                             size > off ? size - off : 0,
> -                                           src->pattern_ro, 0, error);
> +                                           src->pattern_ro, 0, false, error);
>                 if (ret < 0)
>                         return ret;
>                 if (size && size >= off + (size_t)ret)
> @@ -1143,7 +1174,7 @@ rte_flow_conv(enum rte_flow_conv_op op,
>                 ret = sizeof(*attr);
>                 break;
>         case RTE_FLOW_CONV_OP_ITEM:
> -               ret = rte_flow_conv_pattern(dst, size, src, 1, error);
> +               ret = rte_flow_conv_pattern(dst, size, src, 1, false, error);
>                 break;
>         case RTE_FLOW_CONV_OP_ITEM_MASK:
>                 item = src;
> @@ -1158,7 +1189,7 @@ rte_flow_conv(enum rte_flow_conv_op op,
>                 ret = rte_flow_conv_actions(dst, size, src, 1, error);
>                 break;
>         case RTE_FLOW_CONV_OP_PATTERN:
> -               ret = rte_flow_conv_pattern(dst, size, src, 0, error);
> +               ret = rte_flow_conv_pattern(dst, size, src, 0, false, error);
>                 break;
>         case RTE_FLOW_CONV_OP_ACTIONS:
>                 ret = rte_flow_conv_actions(dst, size, src, 0, error);
> @@ -1178,6 +1209,9 @@ rte_flow_conv(enum rte_flow_conv_op op,
>         case RTE_FLOW_CONV_OP_ACTION_NAME_PTR:
>                 ret = rte_flow_conv_name(1, 1, dst, size, src, error);
>                 break;
> +       case RTE_FLOW_CONV_OP_PATTERN_MASKED:
> +               ret = rte_flow_conv_pattern(dst, size, src, 0, true, error);
> +               break;
>         default:
>                 ret = rte_flow_error_set
>                 (error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index b495409406..959a2f903b 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -4556,6 +4556,19 @@ enum rte_flow_conv_op {
>          *   @code const char ** @endcode
>          */
>         RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
> +
> +       /**
> +        * Convert an entire pattern.
> +        *
> +        * Duplicates all pattern items at once, applying @p mask to @p spec
> +        * and @p last.
> +        *
> +        * - @p src type:
> +        *   @code const struct rte_flow_item * @endcode
> +        * - @p dst type:
> +        *   @code struct rte_flow_item * @endcode
> +        */
> +       RTE_FLOW_CONV_OP_PATTERN_MASKED,
>  };
> 
>  /**
> --
> 2.34.1


^ permalink raw reply related

* Re: [PATCH v3 01/10] eal: add interface to check if lcore is EAL managed
From: Thomas Monjalon @ 2026-06-10 23:28 UTC (permalink / raw)
  To: Huisong Li
  Cc: anatoly.burakov, sivaprasad.tummala, dev, stephen, fengchengwen,
	yangxingui, zhanjie9, lihuisong
In-Reply-To: <20260522041110.2023062-2-lihuisong@huawei.com>

22/05/2026 06:11, Huisong Li:
> Add a new helper function rte_lcore_is_eal_managed() to determine
> if a logical core is managed by EAL.
> 
> This interface returns true if the lcore role is either ROLE_RTE
> (standard worker/main cores) or ROLE_SERVICE (service cores).
[...]
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_lcore_is_eal_managed, 26.07)
> +int rte_lcore_is_eal_managed(unsigned int lcore_id)
> +{
> +	struct rte_config *cfg = rte_eal_get_configuration();
> +
> +	if (lcore_id >= RTE_MAX_LCORE)
> +		return 0;
> +	return cfg->lcore_role[lcore_id] == ROLE_RTE ||
> +		cfg->lcore_role[lcore_id] == ROLE_SERVICE;
> +}

I'm not sure about adding this function in the API.
We already have rte_eal_lcore_role()
and I feel having this explicit ROLE_RTE || ROLE_SERVICE
in the code where needed may be less confusing.

Note: we should prefix these constants with RTE_LCORE_



^ permalink raw reply

* Re: [PATCH] power/intel_uncore: reduce log level for dependency missing
From: Thomas Monjalon @ 2026-06-10 23:21 UTC (permalink / raw)
  To: anatoly.burakov, sivaprasad.tummala
  Cc: dev, stephen, fengchengwen, yangxingui, zhanjie9, Huisong Li
In-Reply-To: <20260512013047.375535-1-lihuisong@huawei.com>

12/05/2026 03:30, Huisong Li:
> When running dpdk-l3fwd with '-u' on a non-x86 platform, the user
> sees a noisy log message:
> "POWER: Uncore frequency management not supported/enabled on this
> kernel. Please enable CONFIG_INTEL_UNCORE_FREQ_CONTROL if on Intel
> x86 with linux kernel >= 5.6".
> 
> The root cause is that the intel_uncore driver's .init() is called
> on every platform in automatic detection mode, and on non-x86
> platforms it prints the log above.
> 
> However, the existing uncore core cannot solve this problem without
> breaking the ABI to add a new callback. So reduce the log level to
> avoid this misleading message.

Any comment please?
What would be the right solution?



^ permalink raw reply

* Re: [PATCH 0/3] power: some cleancode for cpufreq library
From: Thomas Monjalon @ 2026-06-10 23:14 UTC (permalink / raw)
  To: Huisong Li
  Cc: anatoly.burakov, sivaprasad.tummala, dev, stephen, yangxingui,
	zhanjie9, fengchengwen
In-Reply-To: <5a7c0fff-515c-425d-bd25-538cccabfb09@huawei.com>

11/05/2026 03:10, fengchengwen:
> Series-acked-by: Chengwen Feng <fengchengwen@huawei.com>
> 
> On 5/9/2026 4:45 PM, Huisong Li wrote:
> > Move some common definition to common header file.
> > 
> > Huisong Li (3):
> >   power: move power state structure to power cpufreq header
> >   power: unify decimal format macro for strtoul
> >   power: use common decimal macro definition

Applied, thanks.



^ permalink raw reply

* Re: [PATCH] power: fix duplicated typedef for setting uncore freq
From: Thomas Monjalon @ 2026-06-10 22:46 UTC (permalink / raw)
  To: Huisong Li
  Cc: anatoly.burakov, sivaprasad.tummala, stephen, dev, fengchengwen,
	yangxingui, zhanjie9
In-Reply-To: <20260507112754.3418377-1-lihuisong@huawei.com>

07/05/2026 13:27, Huisong Li:
> Remove a duplicated rte_power_set_uncore_freq_t definition.
> Also, this op is intended to set any available uncore frequency,
> not only the minimum or maximum one.
> 
> Fixes: ebe99d351a3f ("power: refactor uncore power management")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Huisong Li <lihuisong@huawei.com>

Applied, thanks.




^ permalink raw reply

* Re: [PATCH] power: fix off-by-one in uncore env bounds check
From: Thomas Monjalon @ 2026-06-10 22:36 UTC (permalink / raw)
  To: Denis Sergeev; +Cc: dev, stable, anatoly.burakov, sivaprasad.tummala, sdl.dpdk
In-Reply-To: <20260603042205.116191-1-denserg.edu@gmail.com>

03/06/2026 06:21, Denis Sergeev:
> The condition in rte_power_set_uncore_env() uses '<=' instead of '<'
> when comparing the env argument against the size of uncore_env_str[].
> Since RTE_DIM(uncore_env_str) equals 4 and valid indices are 0..3,
> a caller passing env=4 bypasses the guard and causes an out-of-bounds
> read of uncore_env_str[4] at two sites within the same block.
> 
> Fix by replacing '<=' with '<', consistent with the correct pattern
> already used in rte_power_uncore_init() in the same file.
> 
> Found by Linux Verification Center (linuxtesting.org) with SVACE.
> 
> Fixes: ac1edcb6621a ("power: refactor uncore power management API")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Denis Sergeev <denserg.edu@gmail.com>

Applied, thanks.
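The corrected bound can be illustrated with a small standalone sketch. `demo_env_str`, its four placeholder entries, `DEMO_DIM()`, and `demo_env_name()` are hypothetical stand-ins for `uncore_env_str[]` and `RTE_DIM()`, sized so that the dimension is 4 as described in the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical four-entry table: valid indices are 0..3. */
static const char *const demo_env_str[] = {
	"env0", "env1", "env2", "env3",
};
static_assert(DEMO_DIM(demo_env_str) == 4, "table must have 4 entries");

/* Return the name for env, or NULL if env is out of range.
 * Using '<=' here would let env == 4 read past the end of the table. */
static const char *
demo_env_name(unsigned int env)
{
	if (env < DEMO_DIM(demo_env_str))
		return demo_env_str[env];
	return NULL;
}
```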



^ permalink raw reply

* Re: [PATCH] power/amd_pstate: fix frequency matching for continuous scaling
From: Thomas Monjalon @ 2026-06-10 22:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, stable, Anatoly Burakov, Sivaprasad Tummala
In-Reply-To: <20260328193419.106100-1-stephen@networkplumber.org>

28/03/2026 20:34, Stephen Hemminger:
> The power_init_for_setting_freq() function fails on systems using the
> amd-pstate-epp driver because the current CPU frequency read from
> scaling_setspeed does not exactly match any of the synthesized
> frequency buckets. Unlike acpi_cpufreq which provides a discrete list
> of frequencies, amd-pstate operates with continuously variable
> frequencies, so an exact match will rarely succeed.
> 
> For example, on a Ryzen 9 7945HX the sysfs file reports 2797172
> which rounds to 2797000, but this value does not appear in the
> generated frequency table.
> 
> Replace the exact match lookup with a nearest-frequency search.
> 
[...]
> -	freq = strtoul(buf, NULL, POWER_CONVERT_TO_DECIMAL);
> +	errno = 0;
> +	freq = strtoul(buf, &endptr, POWER_CONVERT_TO_DECIMAL);
> +	if (errno != 0 || endptr == buf || freq == 0) {
> +		POWER_LOG(ERR, "Failed to parse frequency '%s' for lcore %u",
> +				buf, pi->lcore_id);
> +		goto err;
> +	}
>  
>  	/* convert the frequency to nearest 1000 value
>  	 * Ex: if freq=1396789 then freq_conv=1397000
>  	 * Ex: if freq=800030 then freq_conv=800000
>  	 */
> -	unsigned int freq_conv = 0;
> -	freq_conv = (freq + FREQ_ROUNDING_DELTA)
> -				/ ROUND_FREQ_TO_N_1000;
> +	freq_conv = (freq + FREQ_ROUNDING_DELTA) / ROUND_FREQ_TO_N_1000;
>  	freq_conv = freq_conv * ROUND_FREQ_TO_N_1000;
>  
> -	for (i = 0; i < pi->nb_freqs; i++) {
> -		if (freq_conv == pi->freqs[i]) {
> -			pi->curr_idx = i;
> -			pi->f = f;
> -			return 0;
> +	/* Find the nearest frequency in the table.
> +	 * With amd-pstate the CPU runs at continuously variable
> +	 * frequencies so the current frequency will not exactly
> +	 * match one of the synthesized frequency buckets.
> +	 */
> +	best_idx = 0;
> +	best_diff = abs_diff(freq_conv, pi->freqs[0]);
> +
> +	for (i = 1; i < pi->nb_freqs; i++) {
> +		diff = abs_diff(freq_conv, pi->freqs[i]);
> +		if (diff < best_diff) {
> +			best_diff = diff;
> +			best_idx = i;
>  		}
>  	}
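The parse, round and nearest-bucket steps quoted above can be sketched as standalone C. This is a minimal model, not the patch. The helper names `parse_freq()`, `round_freq()`, `abs_diff_ul()` and `nearest_freq_idx()` are hypothetical. The values 500 and 1000 for FREQ_ROUNDING_DELTA and ROUND_FREQ_TO_N_1000 are assumptions inferred from the examples in the quoted comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumed values; the excerpt only shows the macro names. */
#define FREQ_ROUNDING_DELTA  500UL
#define ROUND_FREQ_TO_N_1000 1000UL

/* Parse a decimal frequency string; returns 0 on any parse failure. */
static unsigned long
parse_freq(const char *buf)
{
	char *endptr;
	unsigned long freq;

	errno = 0;
	freq = strtoul(buf, &endptr, 10);
	if (errno != 0 || endptr == buf)
		return 0;	/* 0 doubles as the error value, as in the patch */
	return freq;
}

/* Round to the nearest multiple of 1000, e.g. 1396789 -> 1397000. */
static unsigned long
round_freq(unsigned long freq)
{
	return (freq + FREQ_ROUNDING_DELTA) / ROUND_FREQ_TO_N_1000 *
		ROUND_FREQ_TO_N_1000;
}

/* Absolute difference of two unsigned values without wraparound. */
static unsigned long
abs_diff_ul(unsigned long a, unsigned long b)
{
	return a > b ? a - b : b - a;
}

/* Index of the table entry closest to freq; ties keep the lower index. */
static unsigned int
nearest_freq_idx(const unsigned long *freqs, unsigned int nb_freqs,
		unsigned long freq)
{
	unsigned int i, best_idx = 0;
	unsigned long diff, best_diff;

	assert(nb_freqs > 0);
	best_diff = abs_diff_ul(freq, freqs[0]);
	for (i = 1; i < nb_freqs; i++) {
		diff = abs_diff_ul(freq, freqs[i]);
		if (diff < best_diff) {
			best_diff = diff;
			best_idx = i;
		}
	}
	return best_idx;
}
```

With a hypothetical table {5200000, 4000000, 2800000, 1600000}, a reading of 2797172 rounds to 2797000 and maps to index 2, where an exact-match loop finds nothing.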

GPT found this problem:

power_init_for_setting_freq() now assigns pi->curr_idx = best_idx
after finding the nearest synthesized frequency bucket.
However, set_freq_internal() skips the sysfs write
whenever idx == pi->curr_idx.

This means that if the current scaling_setspeed value is merely close
to a bucket but not equal to it, a later request to set that bucket
will return success without actually writing the requested frequency.
This can happen during init too: power_amd_pstate_cpufreq_init()
calls freq_max() after initialization, but if the current frequency
is nearest to the max bucket, freq_max() will be skipped even when
the actual sysfs value is not the synthesized max.
The nearest-bucket match should not be treated as an exact programmed
frequency, or the next explicit set to that bucket should be forced.
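One possible implementation of the second option is to remember whether curr_idx came from an exact match, and to force the next write when it did not. The sketch below models only that decision. `struct freq_state`, its `curr_exact` flag and `write_freq()` (a stand-in for the sysfs write) are hypothetical and do not reflect the actual set_freq_internal() code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the relevant per-lcore state. */
struct freq_state {
	unsigned int curr_idx;	/* bucket believed to be programmed */
	bool curr_exact;	/* false if curr_idx came from a nearest match */
	unsigned int nb_freqs;
	unsigned int writes;	/* counts simulated sysfs writes */
};

/* Stand-in for writing scaling_setspeed; always succeeds here. */
static int
write_freq(struct freq_state *st, unsigned int idx)
{
	(void)idx;
	st->writes++;
	return 0;
}

/* Init: record the nearest bucket and whether it matched exactly. */
static void
init_from_nearest(struct freq_state *st, unsigned int nearest_idx,
		bool exact_match)
{
	st->curr_idx = nearest_idx;
	st->curr_exact = exact_match;
}

/*
 * Skip the write only when the bucket is known to be programmed.
 * Returns 1 if a write happened, 0 if skipped, -1 on write error.
 */
static int
set_freq(struct freq_state *st, unsigned int idx)
{
	assert(idx < st->nb_freqs);
	if (idx == st->curr_idx && st->curr_exact)
		return 0;
	if (write_freq(st, idx) < 0)
		return -1;
	st->curr_idx = idx;
	st->curr_exact = true;
	return 1;
}
```

After an approximate init, the first request for the nearest bucket still writes; a repeated request for the same bucket is then skipped.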



^ permalink raw reply

* Re: [PATCH v3 0/2] ring: replace use of rte_atomic
From: Thomas Monjalon @ 2026-06-10 21:38 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev
In-Reply-To: <20260610184701.657769-1-stephen@networkplumber.org>

10/06/2026 20:43, Stephen Hemminger:
> v3:
>   - rebase and squash patches
>   - keep original code for x86 single thread case
> 
> Stephen Hemminger (2):
>   ring: split single thread vs multi-thread cases
>   ring: replace rte_atomic32 with __sync builtin

Applied, thanks.




^ permalink raw reply

* Re: [PATCH 0/8] telemetry: thread-safe and bounded parameter parsing
From: Thomas Monjalon @ 2026-06-10 20:42 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Bruce Richardson
In-Reply-To: <aiZ1jB3_MaDP4OTK@bricha3-mobl1.ger.corp.intel.com>

> > Stephen Hemminger (8):
> >   telemetry: fix thread-unsafe command parsing
> >   ethdev: make telemetry parameter parsing thread-safe
> >   dmadev: validate telemetry parameters
> >   security: harden telemetry parameter parsing
> >   eventdev: remove strtok from telemetry handlers
> >   eventdev/eth_rx: fix thread-unsafe telemetry parsing
> >   eventdev/eth_rx: reject out-of-range telemetry adapter ID
> >   eventdev/timer: reject out-of-range ID
> > 
> Series-Acked-by: Bruce Richardson <bruce.richardson@intel.com>

I ran the series through the automatic AI review in Codex,
and its findings do not seem relevant.

Applied, thanks.




^ permalink raw reply

* [PATCH v3 2/2] ring: replace rte_atomic32 with __sync builtin
From: Stephen Hemminger @ 2026-06-10 18:43 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Wathsala Vithanage
In-Reply-To: <20260610184701.657769-1-stephen@networkplumber.org>

Replace the deprecated rte_atomic32 operations with GCC builtin
atomic operations on x86. The C11 version used on other architectures
is unchanged.

Although it would be preferable to use C11 on all architectures,
doing so costs measurable performance on x86.

ring_perf results on an x86 i9-13900H with GCC 14.2, MP/MC on two
physical cores (cycles/elem):

  n      asm    sync     c11
  8    72.86   72.12   89.01
  32   18.74   18.80   24.62
  64   10.07    9.86   12.41
  128   6.99    6.74    9.01
  256   6.38    6.20    7.34
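The relative cost of the pure C11 build can be recomputed from the table. This small helper only encodes the numbers above and the percentage formula; the array and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* cycles/elem from the ring_perf table above (MP/MC, two cores). */
static const double asm_cyc[]  = { 72.86, 18.74, 10.07, 6.99, 6.38 };
static const double sync_cyc[] = { 72.12, 18.80,  9.86, 6.74, 6.20 };
static const double c11_cyc[]  = { 89.01, 24.62, 12.41, 9.01, 7.34 };
#define NB_ROWS (sizeof(c11_cyc) / sizeof(c11_cyc[0]))

/* Percentage by which 'slow' exceeds 'base'. */
static double
regression_pct(double slow, double base)
{
	assert(base > 0.0);
	return (slow / base - 1.0) * 100.0;
}
```

Against the asm column this gives roughly 15% to 31%; against the __sync column, roughly 18% to 34%.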

Pure C11 regresses by 15-30% due to the failure-writeback
semantics of __atomic_compare_exchange_n().
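The failure-writeback difference can be seen in isolation with the two GCC builtins. On failure, `__atomic_compare_exchange_n()` overwrites `*expected` with the value it observed, while `__sync_bool_compare_and_swap()` only returns false. The wrapper names are illustrative, and this shows semantics only, not generated code or cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* C11-style CAS: on failure, *expected receives the current value. */
static bool
cas_c11(uint32_t *p, uint32_t *expected, uint32_t desired)
{
	return __atomic_compare_exchange_n(p, expected, desired, false,
			__ATOMIC_RELEASE, __ATOMIC_ACQUIRE);
}

/* Legacy __sync CAS: reports success only; nothing is written back. */
static bool
cas_sync(uint32_t *p, uint32_t expected, uint32_t desired)
{
	return __sync_bool_compare_and_swap(p, expected, desired);
}
```

In a retry loop, the C11 form hands back the fresh head for the next attempt, whereas the __sync form requires the caller to reload it, as the GCC variant in this patch does at the top of its loop.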

Also drop the now-unused enqueue argument from the tail update helpers.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
 lib/ring/meson.build                          |   2 +-
 lib/ring/rte_ring_c11_pvt.h                   |  25 ----
 lib/ring/rte_ring_elem_pvt.h                  |  37 +++--
 ..._ring_generic_pvt.h => rte_ring_gcc_pvt.h} | 141 ++++++++----------
 lib/ring/rte_ring_hts_elem_pvt.h              |   8 +-
 lib/ring/soring.c                             |  10 +-
 6 files changed, 99 insertions(+), 124 deletions(-)
 rename lib/ring/{rte_ring_generic_pvt.h => rte_ring_gcc_pvt.h} (81%)

diff --git a/lib/ring/meson.build b/lib/ring/meson.build
index 21f2c12989..2ba160b178 100644
--- a/lib/ring/meson.build
+++ b/lib/ring/meson.build
@@ -9,7 +9,7 @@ indirect_headers += files (
         'rte_ring_elem.h',
         'rte_ring_elem_pvt.h',
         'rte_ring_c11_pvt.h',
-        'rte_ring_generic_pvt.h',
+        'rte_ring_gcc_pvt.h',
         'rte_ring_hts.h',
         'rte_ring_hts_elem_pvt.h',
         'rte_ring_peek.h',
diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index 5afc14dec9..a6c14921d3 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -19,31 +19,6 @@
  * For more information please refer to <rte_ring.h>.
  */
 
-/**
- * @internal This function updates tail values.
- */
-static __rte_always_inline void
-__rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
-		uint32_t new_val, uint32_t single, uint32_t enqueue)
-{
-	RTE_SET_USED(enqueue);
-
-	/*
-	 * If there are other enqueues/dequeues in progress that preceded us,
-	 * we need to wait for them to complete
-	 */
-	if (!single)
-		rte_wait_until_equal_32((uint32_t *)(uintptr_t)&ht->tail, old_val,
-			rte_memory_order_relaxed);
-
-	/*
-	 * R0: Establishes a synchronizing edge with load-acquire of tail at A1.
-	 * Ensures that memory effects by this thread on ring elements array
-	 * is observed by a different thread of the other type.
-	 */
-	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
-}
-
 /**
  * @internal This is a helper function that moves the producer/consumer head
  *    optimized for single threaded case
diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
index a0fdec9812..29758d0bb8 100644
--- a/lib/ring/rte_ring_elem_pvt.h
+++ b/lib/ring/rte_ring_elem_pvt.h
@@ -299,17 +299,36 @@ __rte_ring_dequeue_elems(struct rte_ring *r, uint32_t cons_head,
 			cons_head & r->mask, esize, num);
 }
 
-/* Between load and load. there might be cpu reorder in weak model
- * (powerpc/arm).
- * There are 2 choices for the users
- * 1.use rmb() memory barrier
- * 2.use one-direction load_acquire/store_release barrier
- * It depends on performance test results.
+static __rte_always_inline void
+__rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
+		       uint32_t new_val, uint32_t single)
+{
+	/*
+	 * If there are other enqueues/dequeues in progress that preceded us,
+	 * we need to wait for them to complete
+	 */
+	if (!single)
+		rte_wait_until_equal_32((uint32_t *)(uintptr_t)&ht->tail, old_val,
+			rte_memory_order_relaxed);
+
+	/*
+	 * R0: Establishes a synchronizing edge with load-acquire of tail at A1.
+	 * Ensures that memory effects by this thread on ring elements array
+	 * is observed by a different thread of the other type.
+	 */
+	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
+}
+
+/*
+ * The functions __rte_ring_headtail_move_head_{mt,st} have two versions
+ * based on what is most efficient on a given architecture.
+ *
+ * The C11 version is preferred, but on x86 with GCC it has a 10% performance drop.
  */
 #ifdef RTE_USE_C11_MEM_MODEL
 #include "rte_ring_c11_pvt.h"
 #else
-#include "rte_ring_generic_pvt.h"
+#include "rte_ring_gcc_pvt.h"
 #endif
 
 /**
@@ -426,7 +445,7 @@ __rte_ring_do_enqueue_elem(struct rte_ring *r, const void *obj_table,
 
 	__rte_ring_enqueue_elems(r, prod_head, obj_table, esize, n);
 
-	__rte_ring_update_tail(&r->prod, prod_head, prod_next, is_sp, 1);
+	__rte_ring_update_tail(&r->prod, prod_head, prod_next, is_sp);
 end:
 	if (free_space != NULL)
 		*free_space = free_entries - n;
@@ -473,7 +492,7 @@ __rte_ring_do_dequeue_elem(struct rte_ring *r, void *obj_table,
 
 	__rte_ring_dequeue_elems(r, cons_head, obj_table, esize, n);
 
-	__rte_ring_update_tail(&r->cons, cons_head, cons_next, is_sc, 0);
+	__rte_ring_update_tail(&r->cons, cons_head, cons_next, is_sc);
 
 end:
 	if (available != NULL)
diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_gcc_pvt.h
similarity index 81%
rename from lib/ring/rte_ring_generic_pvt.h
rename to lib/ring/rte_ring_gcc_pvt.h
index c044b0824f..340ece28c7 100644
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_gcc_pvt.h
@@ -7,42 +7,21 @@
  * Used as BSD-3 Licensed with permission from Kip Macy.
  */
 
-#ifndef _RTE_RING_GENERIC_PVT_H_
-#define _RTE_RING_GENERIC_PVT_H_
+#ifndef _RTE_RING_GCC_PVT_H_
+#define _RTE_RING_GCC_PVT_H_
 
 /**
- * @file rte_ring_generic_pvt.h
+ * @file rte_ring_gcc_pvt.h
  * It is not recommended to include this file directly,
  * include <rte_ring.h> instead.
  * Contains internal helper functions for MP/SP and MC/SC ring modes.
  * For more information please refer to <rte_ring.h>.
  */
 
-/**
- * @internal This function updates tail values.
- */
-static __rte_always_inline void
-__rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
-		uint32_t new_val, uint32_t single, uint32_t enqueue)
-{
-	if (enqueue)
-		rte_smp_wmb();
-	else
-		rte_smp_rmb();
-	/*
-	 * If there are other enqueues/dequeues in progress that preceded us,
-	 * we need to wait for them to complete
-	 */
-	if (!single)
-		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
-			rte_memory_order_relaxed);
-
-	ht->tail = new_val;
-}
 
 /**
  * @internal This is a helper function that moves the producer/consumer head
- *    for use in multi-thread safe path
+ *    optimized for single threaded case
  *
  * @param d
  *   A pointer to the headtail structure with head value to be moved
@@ -67,52 +46,43 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
  *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only
  */
 static __rte_always_inline unsigned int
-__rte_ring_headtail_move_head_mt(struct rte_ring_headtail *d,
+__rte_ring_headtail_move_head_st(struct rte_ring_headtail *d,
 		const struct rte_ring_headtail *s, uint32_t capacity,
-		unsigned int n, enum rte_ring_queue_behavior behavior,
+		unsigned int n,
+		enum rte_ring_queue_behavior behavior,
 		uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
 {
-	unsigned int max = n;
-	int success;
-
-	do {
-		/* Reset n to the initial burst count */
-		n = max;
 
-		*old_head = d->head;
+	*old_head = d->head;
 
-		/* add rmb barrier to avoid load/load reorder in weak
-		 * memory model. It is noop on x86
-		 */
-		rte_smp_rmb();
+	/* add rmb barrier to avoid load/load reorder in weak
+	 * memory model. It is noop on x86
+	 */
+	rte_smp_rmb();
 
-		/*
-		 *  The subtraction is done between two unsigned 32bits value
-		 * (the result is always modulo 32 bits even if we have
-		 * *old_head > s->tail). So 'entries' is always between 0
-		 * and capacity (which is < size).
-		 */
-		*entries = (capacity + s->tail - *old_head);
+	/*
+	 *  The subtraction is done between two unsigned 32bits value
+	 * (the result is always modulo 32 bits even if we have
+	 * *old_head > s->tail). So 'entries' is always between 0
+	 * and capacity (which is < size).
+	 */
+	*entries = capacity + s->tail - *old_head;
 
-		/* check that we have enough room in ring */
-		if (unlikely(n > *entries))
-			n = (behavior == RTE_RING_QUEUE_FIXED) ?
-					0 : *entries;
+	/* check that we have enough room in ring */
+	if (unlikely(n > *entries))
+		n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
 
-		if (n == 0)
-			return 0;
+	if (n == 0)
+		return 0;
 
-		*new_head = *old_head + n;
-		success = rte_atomic32_cmpset(
-				(uint32_t *)(uintptr_t)&d->head,
-				*old_head, *new_head);
-	} while (unlikely(success == 0));
+	*new_head = *old_head + n;
+	d->head = *new_head;
 	return n;
 }
 
 /**
  * @internal This is a helper function that moves the producer/consumer head
- *    optimized for single threaded case
+ *    for use in multi-thread safe path
  *
  * @param d
  *   A pointer to the headtail structure with head value to be moved
@@ -137,36 +107,49 @@ __rte_ring_headtail_move_head_mt(struct rte_ring_headtail *d,
  *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only
  */
 static __rte_always_inline unsigned int
-__rte_ring_headtail_move_head_st(struct rte_ring_headtail *d,
+__rte_ring_headtail_move_head_mt(struct rte_ring_headtail *d,
 		const struct rte_ring_headtail *s, uint32_t capacity,
-		unsigned int n,
-		enum rte_ring_queue_behavior behavior,
+		unsigned int n, enum rte_ring_queue_behavior behavior,
 		uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
 {
-	*old_head = d->head;
+	unsigned int max = n;
+	bool success;
 
-	/* add rmb barrier to avoid load/load reorder in weak
-	 * memory model. It is noop on x86
-	 */
-	rte_smp_rmb();
+	do {
+		/* Reset n to the initial burst count */
+		n = max;
 
-	/*
-	 *  The subtraction is done between two unsigned 32bits value
-	 * (the result is always modulo 32 bits even if we have
-	 * *old_head > s->tail). So 'entries' is always between 0
-	 * and capacity (which is < size).
-	 */
-	*entries = (capacity + s->tail - *old_head);
+		*old_head = d->head;
 
-	/* check that we have enough room in ring */
-	if (unlikely(n > *entries))
-		n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+		/* add fence to avoid load/load reorder in weak
+		 * memory model. It is noop on x86
+		 */
+		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+
+		/*
+		 *  The subtraction is done between two unsigned 32bits value
+		 * (the result is always modulo 32 bits even if we have
+		 * *old_head > s->tail). So 'entries' is always between 0
+		 * and capacity (which is < size).
+		 */
+		*entries = (capacity + s->tail - *old_head);
+
+		/* check that we have enough room in ring */
+		if (unlikely(n > *entries))
+			n = (behavior == RTE_RING_QUEUE_FIXED) ?
+					0 : *entries;
+
+		if (n == 0)
+			return 0;
 
-	if (likely(n > 0)) {
 		*new_head = *old_head + n;
-		d->head = *new_head;
-	}
+
+		success = __sync_bool_compare_and_swap(
+				(uint32_t *)(uintptr_t)&d->head,
+				*old_head, *new_head);
+	} while (unlikely(!success));
+
 	return n;
 }
 
-#endif /* _RTE_RING_GENERIC_PVT_H_ */
+#endif /* _RTE_RING_GCC_PVT_H_ */
diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
index a01089d15d..97ae240e2e 100644
--- a/lib/ring/rte_ring_hts_elem_pvt.h
+++ b/lib/ring/rte_ring_hts_elem_pvt.h
@@ -25,12 +25,10 @@
  */
 static __rte_always_inline void
 __rte_ring_hts_update_tail(struct rte_ring_hts_headtail *ht, uint32_t old_tail,
-	uint32_t num, uint32_t enqueue)
+			   uint32_t num)
 {
 	uint32_t tail;
 
-	RTE_SET_USED(enqueue);
-
 	tail = old_tail + num;
 
 	/*
@@ -217,7 +215,7 @@ __rte_ring_do_hts_enqueue_elem(struct rte_ring *r, const void *obj_table,
 
 	if (n != 0) {
 		__rte_ring_enqueue_elems(r, head, obj_table, esize, n);
-		__rte_ring_hts_update_tail(&r->hts_prod, head, n, 1);
+		__rte_ring_hts_update_tail(&r->hts_prod, head, n);
 	}
 
 	if (free_space != NULL)
@@ -258,7 +256,7 @@ __rte_ring_do_hts_dequeue_elem(struct rte_ring *r, void *obj_table,
 
 	if (n != 0) {
 		__rte_ring_dequeue_elems(r, head, obj_table, esize, n);
-		__rte_ring_hts_update_tail(&r->hts_cons, head, n, 0);
+		__rte_ring_hts_update_tail(&r->hts_cons, head, n);
 	}
 
 	if (available != NULL)
diff --git a/lib/ring/soring.c b/lib/ring/soring.c
index 22f9c60e9c..45292c0f78 100644
--- a/lib/ring/soring.c
+++ b/lib/ring/soring.c
@@ -202,21 +202,21 @@ __rte_soring_move_cons_head(struct rte_soring *r, uint32_t stage, uint32_t num,
 
 static __rte_always_inline void
 __rte_soring_update_tail(struct __rte_ring_headtail *rht,
-	enum rte_ring_sync_type st, uint32_t head, uint32_t next, uint32_t enq)
+		 enum rte_ring_sync_type st, uint32_t head, uint32_t next)
 {
 	uint32_t n;
 
 	switch (st) {
 	case RTE_RING_SYNC_ST:
 	case RTE_RING_SYNC_MT:
-		__rte_ring_update_tail(&rht->ht, head, next, st, enq);
+		__rte_ring_update_tail(&rht->ht, head, next, st);
 		break;
 	case RTE_RING_SYNC_MT_RTS:
 		__rte_ring_rts_update_tail(&rht->rts);
 		break;
 	case RTE_RING_SYNC_MT_HTS:
 		n = next - head;
-		__rte_ring_hts_update_tail(&rht->hts, head, n, enq);
+		__rte_ring_hts_update_tail(&rht->hts, head, n);
 		break;
 	default:
 		/* unsupported mode, shouldn't be here */
@@ -295,7 +295,7 @@ soring_enqueue(struct rte_soring *r, const void *objs,
 			&prod_head, &prod_next, &nb_free);
 	if (n != 0) {
 		__enqueue_elems(r, objs, meta, prod_head, n);
-		__rte_soring_update_tail(&r->prod, st, prod_head, prod_next, 1);
+		__rte_soring_update_tail(&r->prod, st, prod_head, prod_next);
 	}
 
 	if (free_space != NULL)
@@ -401,7 +401,7 @@ soring_dequeue(struct rte_soring *r, void *objs, void *meta,
 	/* we have some elems to consume */
 	if (n != 0) {
 		__dequeue_elems(r, objs, meta, cons_head, n);
-		__rte_soring_update_tail(&r->cons, st, cons_head, cons_next, 0);
+		__rte_soring_update_tail(&r->cons, st, cons_head, cons_next);
 	}
 
 	if (available != NULL)
-- 
2.53.0


^ permalink raw reply related

* [PATCH v3 1/2] ring: split single thread vs multi-thread cases
From: Stephen Hemminger @ 2026-06-10 18:43 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Konstantin Ananyev, Wathsala Vithanage
In-Reply-To: <20260610184701.657769-1-stephen@networkplumber.org>

The move head function has an optimization for the single-threaded
ring case. The code is cleaner if the two cases are split into
separate functions.
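The split can be modeled with C11 `<stdatomic.h>` on a simplified head/tail pair. This is a sketch under assumptions. `struct toy_headtail`, `toy_move_head_st()` and `toy_move_head_mt()` are hypothetical, and `fixed` stands in for RTE_RING_QUEUE_FIXED. The memory orders follow the acquire/release pattern of the C11 variant in simplified form.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rte_ring_headtail. */
struct toy_headtail {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

static unsigned int
toy_move_head_st(struct toy_headtail *d, struct toy_headtail *s,
		uint32_t capacity, unsigned int n, bool fixed,
		uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
{
	uint32_t stail;

	/* Only this thread writes d->head, so a relaxed load is enough. */
	*old_head = atomic_load_explicit(&d->head, memory_order_relaxed);
	/* Acquire pairs with the other side's release store of its tail. */
	stail = atomic_load_explicit(&s->tail, memory_order_acquire);

	/* Modulo-2^32 arithmetic keeps entries in [0, capacity] on wraparound. */
	*entries = capacity + stail - *old_head;
	if (n > *entries)
		n = fixed ? 0 : *entries;
	if (n > 0) {
		*new_head = *old_head + n;
		atomic_store_explicit(&d->head, *new_head, memory_order_relaxed);
	}
	return n;
}

static unsigned int
toy_move_head_mt(struct toy_headtail *d, struct toy_headtail *s,
		uint32_t capacity, unsigned int n, bool fixed,
		uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
{
	const unsigned int max = n;
	uint32_t stail;
	bool success;

	/* Acquire pairs with the release CAS of the previous head mover. */
	*old_head = atomic_load_explicit(&d->head, memory_order_acquire);
	do {
		n = max;	/* reset the burst size on every retry */
		stail = atomic_load_explicit(&s->tail, memory_order_acquire);
		*entries = capacity + stail - *old_head;
		if (n > *entries)
			n = fixed ? 0 : *entries;
		if (n == 0)
			return 0;
		*new_head = *old_head + n;
		/* On failure, *old_head is refreshed with the current head. */
		success = atomic_compare_exchange_strong_explicit(&d->head,
				old_head, *new_head,
				memory_order_release, memory_order_acquire);
	} while (!success);
	return n;
}
```

The single-thread path needs no retry loop because no other thread can move d->head between its load and its store.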

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
Tested-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
 lib/ring/rte_ring_c11_pvt.h     | 100 +++++++++++++++++++++++++-------
 lib/ring/rte_ring_elem_pvt.h    |  16 +++--
 lib/ring/rte_ring_generic_pvt.h |  77 ++++++++++++++++++++----
 lib/ring/soring.c               |  24 +++++---
 4 files changed, 171 insertions(+), 46 deletions(-)

diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index 07b6efc416..5afc14dec9 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -46,6 +46,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
 
 /**
  * @internal This is a helper function that moves the producer/consumer head
+ *    optimized for single threaded case
  *
  * @param d
  *   A pointer to the headtail structure with head value to be moved
@@ -54,8 +55,6 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
  *   function only reads tail value from it
  * @param capacity
  *   Either ring capacity value (for producer), or zero (for consumer)
- * @param is_st
- *   Indicates whether multi-thread safe path is needed or not
  * @param n
  *   The number of elements we want to move head value on
  * @param behavior
@@ -72,14 +71,77 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
  *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only
  */
 static __rte_always_inline unsigned int
-__rte_ring_headtail_move_head(struct rte_ring_headtail *d,
+__rte_ring_headtail_move_head_st(struct rte_ring_headtail *d,
 		const struct rte_ring_headtail *s, uint32_t capacity,
-		unsigned int is_st, unsigned int n,
+		unsigned int n,
 		enum rte_ring_queue_behavior behavior,
 		uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
 {
 	uint32_t stail;
-	int success;
+
+	/* Single producer/consumer: only this thread writes d->head,
+	 * so a relaxed load is sufficient.
+	 */
+	*old_head = rte_atomic_load_explicit(&d->head, rte_memory_order_relaxed);
+
+	/* Acquire pairs with the other side's release-store of tail in
+	 * __rte_ring_update_tail, ensuring its ring-element accesses are
+	 * complete before we observe the updated tail.
+	 */
+	stail = rte_atomic_load_explicit(&s->tail, rte_memory_order_acquire);
+
+	/* Unsigned subtraction is modulo 2^32, so entries is always in
+	 * [0, capacity] even if old_head > stail.
+	 */
+	*entries = capacity + stail - *old_head;
+
+	/* check that we have enough room in ring */
+	if (unlikely(n > *entries))
+		n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+	if (n > 0) {
+		*new_head = *old_head + n;
+		rte_atomic_store_explicit(&d->head, *new_head, rte_memory_order_relaxed);
+	}
+
+	return n;
+}
+
+/**
+ * @internal This is a helper function that moves the producer/consumer head
+ *    for use in multi-thread safe path
+ *
+ * @param d
+ *   A pointer to the headtail structure with head value to be moved
+ * @param s
+ *   A pointer to the counter-part headtail structure. Note that this
+ *   function only reads tail value from it
+ * @param capacity
+ *   Either ring capacity value (for producer), or zero (for consumer)
+ * @param n
+ *   The number of elements we want to move head value on
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Move on a fixed number of items
+ *   RTE_RING_QUEUE_VARIABLE: Move on as many items as possible
+ * @param old_head
+ *   Returns head value as it was before the move
+ * @param new_head
+ *   Returns the new head value
+ * @param entries
+ *   Returns the number of ring entries available BEFORE head was moved
+ * @return
+ *   Actual number of objects the head was moved on
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only
+ */
+static __rte_always_inline unsigned int
+__rte_ring_headtail_move_head_mt(struct rte_ring_headtail *d,
+		const struct rte_ring_headtail *s, uint32_t capacity,
+		unsigned int n,
+		enum rte_ring_queue_behavior behavior,
+		uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
+{
+	uint32_t stail;
+	bool success;
 	unsigned int max = n;
 
 	/*
@@ -120,25 +182,21 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_st) {
-			d->head = *new_head;
-			success = 1;
-		} else
-			/* on failure, *old_head is updated */
-			/*
-			 * R1/A2.
-			 * R1: Establishes a synchronizing edge with A0 of a
-			 * different thread.
-			 * A2: Establishes a synchronizing edge with R1 of a
-			 * different thread to observe same value for stail
-			 * observed by that thread on CAS failure (to retry
-			 * with an updated *old_head).
-			 */
-			success = rte_atomic_compare_exchange_strong_explicit(
+		/* on failure, *old_head is updated */
+		/*
+		 * R1/A2.
+		 * R1: Establishes a synchronizing edge with A0 of a
+		 * different thread.
+		 * A2: Establishes a synchronizing edge with R1 of a
+		 * different thread to observe same value for stail
+		 * observed by that thread on CAS failure (to retry
+		 * with an updated *old_head).
+		 */
+		success = rte_atomic_compare_exchange_strong_explicit(
 					&d->head, old_head, *new_head,
 					rte_memory_order_release,
 					rte_memory_order_acquire);
-	} while (unlikely(success == 0));
+	} while (unlikely(!success));
 	return n;
 }
 
diff --git a/lib/ring/rte_ring_elem_pvt.h b/lib/ring/rte_ring_elem_pvt.h
index 6eafae121f..a0fdec9812 100644
--- a/lib/ring/rte_ring_elem_pvt.h
+++ b/lib/ring/rte_ring_elem_pvt.h
@@ -341,8 +341,12 @@ __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
 		uint32_t *old_head, uint32_t *new_head,
 		uint32_t *free_entries)
 {
-	return __rte_ring_headtail_move_head(&r->prod, &r->cons, r->capacity,
-			is_sp, n, behavior, old_head, new_head, free_entries);
+	if (is_sp)
+		return __rte_ring_headtail_move_head_st(&r->prod, &r->cons, r->capacity,
+				n, behavior, old_head, new_head, free_entries);
+	else
+		return __rte_ring_headtail_move_head_mt(&r->prod, &r->cons, r->capacity,
+				n, behavior, old_head, new_head, free_entries);
 }
 
 /**
@@ -374,8 +378,12 @@ __rte_ring_move_cons_head(struct rte_ring *r, unsigned int is_sc,
 		uint32_t *old_head, uint32_t *new_head,
 		uint32_t *entries)
 {
-	return __rte_ring_headtail_move_head(&r->cons, &r->prod, 0,
-			is_sc, n, behavior, old_head, new_head, entries);
+	if (is_sc)
+		return __rte_ring_headtail_move_head_st(&r->cons, &r->prod, 0,
+				n, behavior, old_head, new_head, entries);
+	else
+		return __rte_ring_headtail_move_head_mt(&r->cons, &r->prod, 0,
+				n, behavior, old_head, new_head, entries);
 }
 
 /**
diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
index affd2d5ba7..c044b0824f 100644
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_generic_pvt.h
@@ -42,6 +42,7 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
 
 /**
  * @internal This is a helper function that moves the producer/consumer head
+ *    for use in multi-thread safe path
  *
  * @param d
  *   A pointer to the headtail structure with head value to be moved
@@ -50,8 +51,6 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
  *   function only reads tail value from it
  * @param capacity
  *   Either ring capacity value (for producer), or zero (for consumer)
- * @param is_st
- *   Indicates whether multi-thread safe path is needed or not
  * @param n
  *   The number of elements we want to move head value on
  * @param behavior
@@ -68,10 +67,9 @@ __rte_ring_update_tail(struct rte_ring_headtail *ht, uint32_t old_val,
  *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only
  */
 static __rte_always_inline unsigned int
-__rte_ring_headtail_move_head(struct rte_ring_headtail *d,
+__rte_ring_headtail_move_head_mt(struct rte_ring_headtail *d,
 		const struct rte_ring_headtail *s, uint32_t capacity,
-		unsigned int is_st, unsigned int n,
-		enum rte_ring_queue_behavior behavior,
+		unsigned int n, enum rte_ring_queue_behavior behavior,
 		uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
 {
 	unsigned int max = n;
@@ -105,15 +103,70 @@ __rte_ring_headtail_move_head(struct rte_ring_headtail *d,
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_st) {
-			d->head = *new_head;
-			success = 1;
-		} else
-			success = rte_atomic32_cmpset(
-					(uint32_t *)(uintptr_t)&d->head,
-					*old_head, *new_head);
+		success = rte_atomic32_cmpset(
+				(uint32_t *)(uintptr_t)&d->head,
+				*old_head, *new_head);
 	} while (unlikely(success == 0));
 	return n;
 }
 
+/**
+ * @internal This is a helper function that moves the producer/consumer head
+ *    optimized for single threaded case
+ *
+ * @param d
+ *   A pointer to the headtail structure with head value to be moved
+ * @param s
+ *   A pointer to the counter-part headtail structure. Note that this
+ *   function only reads tail value from it
+ * @param capacity
+ *   Either ring capacity value (for producer), or zero (for consumer)
+ * @param n
+ *   The number of elements we want to move head value on
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Move on a fixed number of items
+ *   RTE_RING_QUEUE_VARIABLE: Move on as many items as possible
+ * @param old_head
+ *   Returns head value as it was before the move
+ * @param new_head
+ *   Returns the new head value
+ * @param entries
+ *   Returns the number of ring entries available BEFORE head was moved
+ * @return
+ *   Actual number of objects the head was moved on
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only
+ */
+static __rte_always_inline unsigned int
+__rte_ring_headtail_move_head_st(struct rte_ring_headtail *d,
+		const struct rte_ring_headtail *s, uint32_t capacity,
+		unsigned int n,
+		enum rte_ring_queue_behavior behavior,
+		uint32_t *old_head, uint32_t *new_head, uint32_t *entries)
+{
+	*old_head = d->head;
+
+	/* add rmb barrier to avoid load/load reorder in weak
+	 * memory model. It is noop on x86
+	 */
+	rte_smp_rmb();
+
+	/*
+	 *  The subtraction is done between two unsigned 32bits value
+	 * (the result is always modulo 32 bits even if we have
+	 * *old_head > s->tail). So 'entries' is always between 0
+	 * and capacity (which is < size).
+	 */
+	*entries = (capacity + s->tail - *old_head);
+
+	/* check that we have enough room in ring */
+	if (unlikely(n > *entries))
+		n = (behavior == RTE_RING_QUEUE_FIXED) ? 0 : *entries;
+
+	if (likely(n > 0)) {
+		*new_head = *old_head + n;
+		d->head = *new_head;
+	}
+	return n;
+}
+
 #endif /* _RTE_RING_GENERIC_PVT_H_ */
diff --git a/lib/ring/soring.c b/lib/ring/soring.c
index e9c75619fe..22f9c60e9c 100644
--- a/lib/ring/soring.c
+++ b/lib/ring/soring.c
@@ -135,9 +135,12 @@ __rte_soring_move_prod_head(struct rte_soring *r, uint32_t num,
 
 	switch (st) {
 	case RTE_RING_SYNC_ST:
+		n = __rte_ring_headtail_move_head_st(&r->prod.ht, &r->cons.ht,
+			r->capacity, num, behavior, head, next, free);
+		break;
 	case RTE_RING_SYNC_MT:
-		n = __rte_ring_headtail_move_head(&r->prod.ht, &r->cons.ht,
-			r->capacity, st, num, behavior, head, next, free);
+		n = __rte_ring_headtail_move_head_mt(&r->prod.ht, &r->cons.ht,
+			r->capacity, num, behavior, head, next, free);
 		break;
 	case RTE_RING_SYNC_MT_RTS:
 		n = __rte_ring_rts_move_head(&r->prod.rts, &r->cons.ht,
@@ -168,9 +171,13 @@ __rte_soring_move_cons_head(struct rte_soring *r, uint32_t stage, uint32_t num,
 
 	switch (st) {
 	case RTE_RING_SYNC_ST:
+		n = __rte_ring_headtail_move_head_st(&r->cons.ht,
+			&r->stage[stage].ht, 0, num, behavior,
+			head, next, avail);
+		break;
 	case RTE_RING_SYNC_MT:
-		n = __rte_ring_headtail_move_head(&r->cons.ht,
-			&r->stage[stage].ht, 0, st, num, behavior,
+		n = __rte_ring_headtail_move_head_mt(&r->cons.ht,
+			&r->stage[stage].ht, 0, num, behavior,
 			head, next, avail);
 		break;
 	case RTE_RING_SYNC_MT_RTS:
@@ -309,9 +316,8 @@ soring_enqueue_start(struct rte_soring *r, uint32_t num,
 
 	switch (st) {
 	case RTE_RING_SYNC_ST:
-		n = __rte_ring_headtail_move_head(&r->prod.ht, &r->cons.ht,
-			r->capacity, RTE_RING_SYNC_ST, num, behavior,
-			&head, &next, &free);
+		n = __rte_ring_headtail_move_head_st(&r->prod.ht, &r->cons.ht,
+			r->capacity, num, behavior, &head, &next, &free);
 		break;
 	case RTE_RING_SYNC_MT_HTS:
 		n = __rte_ring_hts_move_head(&r->prod.hts, &r->cons.ht,
@@ -419,8 +425,8 @@ soring_dequeue_start(struct rte_soring *r, void *objs, void *meta,
 
 	switch (st) {
 	case RTE_RING_SYNC_ST:
-		n = __rte_ring_headtail_move_head(&r->cons.ht, &r->stage[ns].ht,
-			0, RTE_RING_SYNC_ST, num, behavior, &head, &next,
+		n = __rte_ring_headtail_move_head_st(&r->cons.ht, &r->stage[ns].ht,
+			0, num, behavior, &head, &next,
 			&avail);
 		break;
 	case RTE_RING_SYNC_MT_HTS:
-- 
2.53.0


^ permalink raw reply related

* [PATCH v3 0/2] ring: replace use of rte_atomic
From: Stephen Hemminger @ 2026-06-10 18:43 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger
In-Reply-To: <20260602171552.686349-1-stephen@networkplumber.org>

This is part of the broader rte_atomic32 deprecation work, sent
separately because it is the most complex part and benefits from
independent review.

Convert lib/ring off rte_atomic32 and onto the C11 memory model,
except for the ring head compare-and-swap, which needs a special
case: on x86 with GCC, C11 atomics produce measurably worse code.

After this series, only the __rte_ring_headtail_move_head_st/_mt
helpers have separate C11 and GCC-builtin implementations; everything
else uses the same code on all architectures. The default
RTE_USE_C11_MEM_MODEL selection per architecture is unchanged.
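The tail update that is now common to all architectures (`__rte_ring_update_tail()` in rte_ring_elem_pvt.h) follows a wait-then-release pattern. Below is a minimal C11 model of it; `struct toy_ht` and `toy_update_tail()` are hypothetical simplifications, not DPDK code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified head/tail pair, not DPDK's struct. */
struct toy_ht {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

/*
 * Publish [old_val, new_val) to the other side. In multi-thread mode,
 * wait until every earlier reservation has been published, so the
 * tail advances in reservation order.
 */
static void
toy_update_tail(struct toy_ht *ht, uint32_t old_val, uint32_t new_val,
		bool single)
{
	if (!single)
		while (atomic_load_explicit(&ht->tail, memory_order_relaxed) !=
				old_val)
			;	/* spin; DPDK uses rte_wait_until_equal_32() */

	/* Release: element accesses above become visible before the tail. */
	atomic_store_explicit(&ht->tail, new_val, memory_order_release);
}
```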

v3:
  - rebase and squash patches
  - keep original code for x86 single thread case

Stephen Hemminger (2):
  ring: split single thread vs multi-thread cases
  ring: replace rte_atomic32 with __sync builtin

 lib/ring/meson.build             |   2 +-
 lib/ring/rte_ring_c11_pvt.h      | 107 +++++++++++++--------
 lib/ring/rte_ring_elem_pvt.h     |  53 ++++++++---
 lib/ring/rte_ring_gcc_pvt.h      | 155 +++++++++++++++++++++++++++++++
 lib/ring/rte_ring_generic_pvt.h  | 119 ------------------------
 lib/ring/rte_ring_hts_elem_pvt.h |   8 +-
 lib/ring/soring.c                |  34 ++++---
 7 files changed, 289 insertions(+), 189 deletions(-)
 create mode 100644 lib/ring/rte_ring_gcc_pvt.h
 delete mode 100644 lib/ring/rte_ring_generic_pvt.h

-- 
2.53.0

^ permalink raw reply

* [PATCH] dts: avoid Scapy MAC resolution in Rx split test
From: Thomas Monjalon @ 2026-06-10 18:32 UTC (permalink / raw)
  To: dev; +Cc: Luca Vizzarro, Patrick Robb

The test gets the Ethernet header length from Scapy with len(Ether()).

When building DTS API documentation, Sphinx imports the test module
and shows this warning:
WARNING: MAC address to reach destination not found. Using broadcast.

Use a dummy MAC address so Scapy no longer performs
destination resolution during import.

Fixes: 01c70544cffd ("dts: add selective Rx tests")

Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
 dts/tests/TestSuite_rx_split.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/dts/tests/TestSuite_rx_split.py b/dts/tests/TestSuite_rx_split.py
index 0c7913bbd8..5f5a2e6187 100644
--- a/dts/tests/TestSuite_rx_split.py
+++ b/dts/tests/TestSuite_rx_split.py
@@ -27,7 +27,7 @@
 from framework.test_suite import TestSuite, func_test
 
 PAYLOAD = bytes(range(256))
-ETHER_HDR_LEN = len(Ether())
+ETHER_HDR_LEN = len(Ether(dst="00:00:00:00:00:00"))
 IP_HDR_LEN = len(IP())
 ETHER_IP_HDR_LEN = ETHER_HDR_LEN + IP_HDR_LEN
 ETHER_MIN_FRAME_LEN = 60
-- 
2.54.0


* Re: [PATCH 00/10] net/bnxt: vector mode V3 implementation and AVX2 improvements
From: Kishore Padmanabha @ 2026-06-10 18:18 UTC (permalink / raw)
  To: Mohammad Shuab Siddique; +Cc: dev, stable
In-Reply-To: <20260604031851.2267548-1-Mohammad-Shuab.Siddique@broadcom.com>


On Wed, Jun 3, 2026 at 11:17 PM Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com> wrote:

> From: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
>
> This series adds vector mode support for BCM5760X (Thor2 / V3 packets)
> and fixes several AVX2 path issues:
>
>  - Implement AVX2 vector RX for V3 packet completions with VLAN TCI reporting
>  - Fix stale nr_bds values that could cause the producer to lag the consumer
>  - Fix incorrect advertisement of LRO offload capability
>  - Fix scalar RX path not checking rxcmp flags before setting the PTP mbuf flag
>  - Fix missing timestamps for non-PTP traffic when promiscuous timestamping is on
>  - Fix Tx ring corruption and burst truncation after an invalid Tx descriptor
>  - Optimise the AVX2 RX paths (dead code removal, register reduction for V3)
>  - Fix VLAN strip ol_flag being set per-port instead of per-packet for V3
>  - Add burst mode info entry for V3 in bnxt_rx_burst_info
>  - Fix V3 vector mode defaulting to cksum-good instead of cksum-unknown
>
> Most patches carry Fixes: tags. New functionality (V3 vector mode, AVX2
> optimisation) is targeted at 26.07.
>
> Note: this series depends on series "net/bnxt: stability fixes".
>
> Chenna Arnoori (1):
>   net/bnxt: fix RX timestamping for non-PTP packets
>
> Damodharam Ammepalli (1):
>   net/bnxt: fix advertising RX LRO offload capability
>
> Keegan Freyhof (6):
>   net/bnxt: vector mode implementation for V3 packets
>   net/bnxt: stale values in nr_bds are cleared
>   net/bnxt: optimization of the AVX2 RX paths
>   net/bnxt: fix for VLAN stripping being set incorrectly
>   net/bnxt: add vector AVX2 burst mode indicator for v3
>   net/bnxt: fix v3 vector mode not selecting cksum unknown
>
> Mohammad Shuab Siddique (1):
>   net/bnxt: scalar rx path disregarded rxcmp flags for setting ptp mbuf
>     flag
>
> Zoe Cheimets (1):
>   net/bnxt: fix packet burst truncation after invalid Tx descriptor
>
>  .gitignore                              |   1 +
>  drivers/net/bnxt/bnxt.h                 |   1 +
>  drivers/net/bnxt/bnxt_ethdev.c          |   6 +-
>  drivers/net/bnxt/bnxt_hwrm.c            |   7 +-
>  drivers/net/bnxt/bnxt_rxq.c             |   3 +-
>  drivers/net/bnxt/bnxt_rxr.c             |  25 +-
>  drivers/net/bnxt/bnxt_rxr.h             |  14 +-
>  drivers/net/bnxt/bnxt_rxtx_vec_avx2.c   | 444 +++++++++++++++++++++++-
>  drivers/net/bnxt/bnxt_rxtx_vec_common.h |  37 ++
>  drivers/net/bnxt/bnxt_stats.c           |   3 +
>  drivers/net/bnxt/bnxt_txr.c             | 170 ++++++++-
>  11 files changed, 677 insertions(+), 34 deletions(-)
>
Patches merged into dpdk-next-net-brcm.
Thanks

> --
> 2.47.3
>
>

* Re: [PATCH v2 0/5] net/bnxt: interrupt handling, external mbuf and stability fixes
From: Kishore Padmanabha @ 2026-06-10 18:17 UTC (permalink / raw)
  To: Mohammad Shuab Siddique; +Cc: dev, stable
In-Reply-To: <20260605005016.2290160-1-Mohammad-Shuab.Siddique@broadcom.com>


On Thu, Jun 4, 2026 at 8:48 PM Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com> wrote:

> From: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
>
> This series addresses interrupt handling, external memory, and crash bugs:
>
>  - Fix incorrect completion validation for NQEs and RX completions causing excess interrupts
>  - Use buf_addr instead of IOVA for mbufs from external memory pools
>  - Skip IOVA range check for external mbuf head nodes to avoid false failures
>  - Add null checks to prevent segfaults when accessing uninitialized structures
>  - Fix segfault on exit when bonded ports are present, by checking whether ethdev has already freed the RX/TX queue arrays
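Completion validation in NIC drivers commonly relies on a valid (phase) bit
that the hardware flips on each pass around the ring, so the consumer can tell
a freshly written entry from a stale one without clearing memory. The sketch
below models that generic scheme with hypothetical names (struct cmpl,
hw_post, cmpl_valid) and a power-of-two ring; it illustrates the general
technique under that assumption and is not the bnxt NQ/CQ code that the
series fixes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4u /* must be a power of two */

/* Hypothetical completion entry; real descriptors keep the bit in a flags word. */
struct cmpl {
	uint32_t data;
	uint8_t valid;
};

/*
 * Model of hardware posting entry number raw_prod (a free-running counter):
 * the valid bit is 1 on passes 0, 2, 4, ... and 0 on passes 1, 3, 5, ...
 */
static void
hw_post(struct cmpl *ring, uint32_t raw_prod, uint32_t data)
{
	struct cmpl *e = &ring[raw_prod & (RING_SIZE - 1)];

	e->data = data;
	e->valid = !(raw_prod & RING_SIZE);
}

/*
 * Consumer check: raw_cons & RING_SIZE flips once per pass around the ring,
 * so an entry is new only if its valid bit matches the current pass. A stale
 * entry left over from the previous pass carries the opposite bit.
 */
static bool
cmpl_valid(const struct cmpl *ring, uint32_t raw_cons)
{
	return ring[raw_cons & (RING_SIZE - 1)].valid == !(raw_cons & RING_SIZE);
}
```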
>
> All patches carry Fixes: tags and Cc: stable@dpdk.org.
>
> Note: this series depends on series "net/bnxt: ULP stats timer and PTP".
>
> Changes in v2:
>  - Patch 1/5: replace printf() with PMD_DRV_LOG_LINE() (DPDK logging standard)
>  - Patch 2/5: replace custom bnxt_mbuf_buf_addr() with rte_pktmbuf_mtod_offset()
>
> Ajit Khaparde (2):
>   net/bnxt: use buf address for external mbuf
>   net/bnxt: prevent a potential segfault
>
> Keegan Freyhof (2):
>   net/bnxt: fix NQ/CQ processing for interrupt handling
>   net/bnxt: fix for segmentation fault that would occur on exit
>
> Mohammad Shuab Siddique (1):
>   net/bnxt: fix IOVA range check for external mbuf head node
>
>  drivers/net/bnxt/bnxt.h        |  2 +
>  drivers/net/bnxt/bnxt_cpr.c    | 100 ++++++++++++++++++++++++++++++++++
>  drivers/net/bnxt/bnxt_cpr.h    |  34 +++++++++++-
>  drivers/net/bnxt/bnxt_ethdev.c |   3 ++
>  drivers/net/bnxt/bnxt_hwrm.c   |   3 ++
>  drivers/net/bnxt/bnxt_ring.c   |  11 +++-
>  drivers/net/bnxt/bnxt_rxq.c    |  47 +++++++++++++++-
>  drivers/net/bnxt/bnxt_rxr.c    |   2 +-
>  drivers/net/bnxt/bnxt_stats.c  |  17 +++---
>  drivers/net/bnxt/bnxt_txr.c    |  19 +++++--
>  10 files changed, 223 insertions(+), 15 deletions(-)
>
Patches merged into dpdk-next-net-brcm.
Thanks

> --
> 2.47.3
>
>

* Re: [PATCH v2 0/4] net/bnxt: miscellaneous bug fixes
From: Kishore Padmanabha @ 2026-06-10 18:17 UTC (permalink / raw)
  To: Mohammad Shuab Siddique; +Cc: dev, stable
In-Reply-To: <20260604225622.2285191-1-Mohammad-Shuab.Siddique@broadcom.com>


On Thu, Jun 4, 2026 at 6:54 PM Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com> wrote:

> From: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
>
> This series collects four independent bug fixes for the bnxt PMD:
>
>  - Eliminate unnecessary long TX BDs when only checksum offload is needed
>  - Pass QP1 resource count correctly when configuring backing store
>  - Fix implicit integer sign-extension in the doorbell calculation
>  - Prevent VFs from attempting global RSS configuration
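The doorbell change itself is not shown in this thread, so the following is
only a generic illustration of how implicit sign extension arises from C
integer promotion. ring_delta_buggy and ring_delta_fixed are hypothetical
names, not bnxt functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy: both uint16_t operands are promoted to int, so when prod < cons
 * the difference is a negative int, and converting it to uint64_t
 * sign-extends it into a huge value.
 */
static uint64_t
ring_delta_buggy(uint16_t prod, uint16_t cons)
{
	return prod - cons;
}

/* Fixed: truncate back to the 16-bit index width before widening. */
static uint64_t
ring_delta_fixed(uint16_t prod, uint16_t cons)
{
	return (uint16_t)(prod - cons);
}
```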
>
> All patches carry Fixes: tags and Cc: stable@dpdk.org.
>
> Changes in v2:
>  - Patch 4/4: add missing Fixes: tag for RSS hash mode fix
>
> Ajit Khaparde (2):
>   net/bnxt: modify check for short Tx BDs
>   net/bnxt: fix QP resource count in backing store config
>
> Mohammad Shuab Siddique (1):
>   net/bnxt: fix RSS hash mode configuration for VFs
>
> Zoe Cheimets (1):
>   net/bnxt: remove implicit integer sign-extension
>
>  drivers/net/bnxt/bnxt_ethdev.c |  4 ++--
>  drivers/net/bnxt/bnxt_hwrm.c   | 18 ++++++++++++++++--
>  drivers/net/bnxt/bnxt_ring.c   |  7 ++++---
>  drivers/net/bnxt/bnxt_txr.c    |  3 +--
>  4 files changed, 23 insertions(+), 9 deletions(-)
>
Patches merged into dpdk-next-net-brcm.
Thanks

> --
> 2.47.3
>
>

* Re: [PATCH v1 19/20] drivers: add testpmd commands for private features
From: Stephen Hemminger @ 2026-06-10 17:22 UTC (permalink / raw)
  To: liujie5; +Cc: dev
In-Reply-To: <20260610013936.3634968-20-liujie5@linkdatatechnology.com>

On Wed, 10 Jun 2026 09:39:35 +0800
liujie5@linkdatatechnology.com wrote:

> From: Jie Liu <liujie5@linkdatatechnology.com>
> 
> Introduce private testpmd commands and implementation files to enable
> debugging and testing of sxe2-specific hardware features (such as
> packet scheduling reset, UDP tunnel configuration, and IPsec ingress/
> egress offloads) directly within the testpmd application.
> 
> The parameters are parsed using the standard 'rte_kvargs' library during
> the PCI/vdev probing phase. Documentation for these parameters is also
> updated.
> 
> During memory hotplug events, the SXE2 driver needs to track memory
> segment layout changes to maintain internal DMA mappings. However,
> existing memseg walk functions (rte_memseg_walk) acquire memory locks
> and cannot be called from within memory event callbacks, leading to
> potential deadlocks.
> 
> This commit introduces sxe2_memseg_walk_cb() as a helper that walks
> memory segments using the thread-unsafe variant
> rte_memseg_walk_thread_unsafe(), which is safe to call from
> memory-related callbacks.
> 
> The implementation follows the standard rte_memseg_walk_t prototype,
> processing each memseg to update driver-specific data structures.
> 
> Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
> ---
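The locking problem described in the quoted commit message can be modeled
without DPDK. In this toy sketch a single non-recursive pthread mutex stands in
for EAL's memory hotplug lock, seg_walk() plays the role of rte_memseg_walk(),
and seg_walk_unlocked() that of rte_memseg_walk_thread_unsafe(). All names are
hypothetical, and the real EAL locking is more involved.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NSEGS 4

/* Toy segment table; NOT DPDK's struct rte_memseg. */
struct seg {
	size_t len;
	int in_use;
};

static struct seg segs[NSEGS];
/* Stands in for EAL's memory hotplug lock (simplified to a plain mutex). */
static pthread_mutex_t mem_lock = PTHREAD_MUTEX_INITIALIZER;

typedef int (*seg_walk_t)(const struct seg *s, void *arg);

/* Walk without locking: caller must already hold mem_lock. */
static int
seg_walk_unlocked(seg_walk_t fn, void *arg)
{
	for (int i = 0; i < NSEGS; i++) {
		if (!segs[i].in_use)
			continue;
		int ret = fn(&segs[i], arg);
		if (ret != 0)
			return ret; /* non-zero stops the walk early */
	}
	return 0;
}

/* Locking walk for ordinary callers. */
static int
seg_walk(seg_walk_t fn, void *arg)
{
	pthread_mutex_lock(&mem_lock);
	int ret = seg_walk_unlocked(fn, arg);
	pthread_mutex_unlock(&mem_lock);
	return ret;
}

static int
sum_len(const struct seg *s, void *arg)
{
	*(size_t *)arg += s->len;
	return 0;
}

/* Driver's view of mapped memory, rebuilt on every memory event. */
static size_t mapped_total;

/*
 * Event callback: runs with mem_lock already held, so it must use the
 * unlocked walk. Calling seg_walk() here would self-deadlock (formally
 * undefined behavior for a default POSIX mutex).
 */
static void
on_mem_event(void)
{
	mapped_total = 0;
	seg_walk_unlocked(sum_len, &mapped_total);
}

/* "Allocator": updates the table and notifies callbacks under the lock. */
static void
alloc_seg(int idx, size_t len)
{
	assert(idx >= 0 && idx < NSEGS);
	pthread_mutex_lock(&mem_lock);
	segs[idx].len = len;
	segs[idx].in_use = 1;
	on_mem_event();
	pthread_mutex_unlock(&mem_lock);
}
```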

This memory stuff looks problematic and needs more review.
At a minimum, I see a pattern of not handling out-of-range values
returned by strtoul().
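One way to close the strtoul() gap is to check errno, the end pointer, and an
explicit range for every parsed devarg value. The helper below is a
hypothetical sketch that assumes values must fit in uint32_t; parse_u32_range
is not an existing DPDK or sxe2 function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse str as an unsigned integer (base 0 accepts
 * decimal, 0x-prefixed hex and 0-prefixed octal) and require
 * min <= value <= max. Returns 0 on success, -EINVAL for malformed input,
 * -ERANGE when the value overflows or falls outside [min, max].
 */
static int
parse_u32_range(const char *str, uint32_t min, uint32_t max, uint32_t *out)
{
	char *end;
	unsigned long val;

	assert(out != NULL);
	if (str == NULL || *str == '\0')
		return -EINVAL;
	/* strtoul() accepts "-1" and silently returns ULONG_MAX; reject signs. */
	if (strchr(str, '-') != NULL)
		return -EINVAL;

	errno = 0;
	val = strtoul(str, &end, 0);
	if (end == str || *end != '\0')
		return -EINVAL; /* no digits, or trailing garbage */
	if (errno == ERANGE || val < min || val > max)
		return -ERANGE; /* overflowed unsigned long, or out of bounds */

	*out = (uint32_t)val;
	return 0;
}
```

strtoul() never treats a leading minus sign as an error, which is why the
sketch rejects '-' explicitly before parsing.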

I asked an AI tool for a more detailed review, and it found the following:
[PATCH 19/20] drivers: add testpmd commands for private features

There is concern about the amount of driver-private testpmd plumbing and
devargs this patch adds. The raw command count (7) is within precedent
(i40e has 29, mlx5 13, ixgbe 11), but the mechanism and content are not.

Error: the command logic is placed in sxe2_testpmd_lib.c, compiled into the
driver library, and exposed through 14 new RTE_EXPORT_EXPERIMENTAL_SYMBOL
entries (sxe2_ipsec_egress_create, sxe2_ipsec_conf_set, sxe2_flow_rule_dump,
sxe2_udp_tunnel_operations, sxe2_stats_info_show, sxe2_testpmd_sched_reset,
etc.). No upstream driver exports symbols for its testpmd commands; all six
existing drivers with testpmd integration compile their *_testpmd.c into
testpmd via testpmd_sources and use internal access. These exports are
vendor public API that any application can link against. The driver .so also
gains application state for the commands: g_tx_session[][], g_rx_session[][],
g_esp_header_offset[], g_sess_pool. SA-manager bookkeeping does not belong
in a PMD. Move the logic into sxe2_testpmd.c and drop all 14 exports; at
most RTE_EXPORT_INTERNAL_SYMBOL is appropriate here.

Error: three commands duplicate standard testpmd functionality the driver
already supports. "sxe2 flow rule dump" exists because the driver does not
implement the rte_flow dev_dump op; implement the op and the standard
"flow dump <port> all" works for every application. "sxe2 <port>
udp_tunnel_port add|rm" duplicates "port config <port> udp_tunnel_port
add|rm", which calls the udp_tunnel ops added in patch 12. "sxe2 show stats"
duplicates "show port xstats"; the driver already implements xstats, and
anything missing from xstats should be added there, not shown by a private
formatter.

Warning: the 9-subcommand ipsec suite (egress/ingress add/rm/show,
session-id and esp-hdr-offset set/get, flush, stats) is an SA management
application embedded in the driver. Inline crypto is exercised with
examples/ipsec-secgw, as done for other inline-crypto PMDs. If interactive
SA management in testpmd is needed, propose it as generic testpmd commands
over rte_security so all drivers benefit.

Warning: seven private devargs are added (flow-duplicate-pattern,
function-flow-direct, fnav-stat-type, drv-sw-stats, high-performance-mode,
sched-layer-mode, rx-low-latency) with no documentation: no Runtime
Configuration section in sxe2.rst and no RTE_PMD_REGISTER_PARAM_STRING, so
they are undiscoverable. Beyond documentation: flow-duplicate-pattern makes
rte_flow duplicate-rule semantics vary per boot option, which is not
acceptable for a standard API; fnav-stat-type and drv-sw-stats select stats
sources and belong in xstats; sched-layer-mode configures TM topology that
the rte_tm hierarchy built by the application should determine;
high-performance-mode accepts only the value 1 and is undocumented; if the
mode is safe, make it the default, otherwise document the trade-off. Each
surviving devarg needs documentation and a rationale for why no standard
API covers it.
