DPDK-dev Archive on lore.kernel.org
* Re: [PATCH 0/5] add versioned symbols for recently stabilized APIs
From: David Marchand @ 2026-06-23 13:50 UTC (permalink / raw)
  To: Dariusz Sosnowski, Thomas Monjalon, dpdk-techboard
  Cc: Bruce Richardson, Andrew Rybchenko, Viacheslav Ovsiienko,
	Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad, dev
In-Reply-To: <20260623113752.1100072-1-dsosnowski@nvidia.com>

Hello Dariusz,

On Tue, 23 Jun 2026 at 13:38, Dariusz Sosnowski <dsosnowski@nvidia.com> wrote:
>
> Main goal of this patchset is to address https://bugs.dpdk.org/show_bug.cgi?id=1957

It is expected that experimental symbols may disappear overnight, and
this bug could also be closed as NOTABUG.

On the other hand, we do state in the doc that compatibility could be
provided when stabilising an experimental API, so ok.. let's try.

> but it also handles other recently stabilized symbols and has some minor fixes:
>
> - Patch 1 - Fix RTE_VERSION_EXPERIMENTAL_SYMBOL macro on clang.

Ouch... /me hides.


> - Patch 2 - Allow function versioning inside drivers.
> - Patch 3 - Version the function symbols stabilized in
>   https://git.dpdk.org/dpdk/commit/?id=e8cab133645f5466ef75e511629add43b68a5027
> - Patch 4 - Introduce versioning macros for global variable symbols.
> - Patch 5 - Version the function and variable symbols stabilized in
>   https://git.dpdk.org/dpdk/commit/?id=4ee2f5c1cedf9ee7f39afa667f71b07f4004ba5c
>
> Issue is still not fully fixed for stabilized global variables:
> rte_flow_dynf_metadata_offs and rte_flow_dynf_metadata_mask.

Well, symbol versioning is not something for variables.
Exposing global variables was a mistake from the start...
Those were exported for "performance" reasons as those are accessed
via inline helpers (but I am not sure there were benchmarks showing
the benefits).

I am for forbidding exports of global variables from now on, unless some
really good performance benchmark is provided (@techboard for info).


Now, in practice for your issue, rather than reintroducing symbol
aliases (technical solution that I dropped when refactoring the
macros), I think we can do with some middle ground approach:
- leaving the inline helpers as "stable" (not __rte_experimental),
- restoring the EXPERIMENTAL version on the global variables, this
will restore the location of those symbols from the previous ABI pov,
and the checks won't catch this discrepancy anyway,
- during 26.11, drop the EXPERIMENTAL version on those variables.


In other words, stopping at your patch 3 of the series, then adding:

$ git diff
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index ec0fe08355..8bd21ccd31 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -23,11 +23,11 @@
 #define FLOW_LOG RTE_ETHDEV_LOG_LINE

 /* Mbuf dynamic field name for metadata. */
-RTE_EXPORT_SYMBOL(rte_flow_dynf_metadata_offs)
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_flow_dynf_metadata_offs, 19.11)
 int32_t rte_flow_dynf_metadata_offs = -1;

 /* Mbuf dynamic field flag bit number for metadata. */
-RTE_EXPORT_SYMBOL(rte_flow_dynf_metadata_mask)
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_flow_dynf_metadata_mask, 19.11)
 uint64_t rte_flow_dynf_metadata_mask;

 /**

> Patch 4 and 5 address the bug for these global variables,
> by providing a single storage for both EXPERIMENTAL and
> DPDK_26 variable symbol versions.
> This is achieved through symbol aliasing.
> But this solution is limited only to executables compiled with clang.
>
> clang and gcc have a different default behavior regarding relocations
> of global variables exposed by shared libraries.
>

Yeah... not even thinking about adding MSVC in the list...


-- 
David Marchand



* Re: [PATCH 1/5] eal: fix macro for versioned experimental symbol
From: Stephen Hemminger @ 2026-06-23 13:50 UTC (permalink / raw)
  To: Dariusz Sosnowski; +Cc: David Marchand, dev, Bruce Richardson
In-Reply-To: <20260623113752.1100072-2-dsosnowski@nvidia.com>

On Tue, 23 Jun 2026 13:37:47 +0200
Dariusz Sosnowski <dsosnowski@nvidia.com> wrote:

> Add a missing semicolon after __asm__ block in
> RTE_VERSION_EXPERIMENTAL_SYMBOL macro.
> Its absence triggers the following compilation error with clang:
> 
>     ../lib/ethdev/rte_flow.c:320:1: error: expected ';' after top-level asm block
>       320 | RTE_VERSION_EXPERIMENTAL_SYMBOL(int, rte_flow_dynf_metadata_register, (void))
>           | ^
>     ../lib/eal/common/eal_export.h:75:74: note: expanded from macro 'RTE_VERSION_EXPERIMENTAL_SYMBOL'
>        75 | __asm__(".symver " RTE_STR(name) "_exp, " RTE_STR(name) "@EXPERIMENTAL") \
>           |                                                                          ^
>     ../lib/eal/include/rte_common.h:237:20: note: expanded from macro '\
>     __rte_used'
>       237 | #define __rte_used __attribute__((used))
>           |                    ^
> 
> Fixes: e30e194c4d06 ("eal: rework function versioning macros")
> Cc: david.marchand@redhat.com
> 
> Signed-

I didn't see this because clang doesn't have symver support.
Which version of clang is this?


* Re: [PATCH 0/5] add versioned symbols for recently stabilized APIs
From: Stephen Hemminger @ 2026-06-23 13:48 UTC (permalink / raw)
  To: Dariusz Sosnowski
  Cc: Thomas Monjalon, David Marchand, Bruce Richardson,
	Andrew Rybchenko, Viacheslav Ovsiienko, Bing Zhao, Ori Kam,
	Suanming Mou, Matan Azrad, dev
In-Reply-To: <20260623113752.1100072-1-dsosnowski@nvidia.com>

On Tue, 23 Jun 2026 13:37:46 +0200
Dariusz Sosnowski <dsosnowski@nvidia.com> wrote:

> Main goal of this patchset is to address https://bugs.dpdk.org/show_bug.cgi?id=1957
> but it also handles other recently stabilized symbols and has some minor fixes:
> 
> - Patch 1 - Fix RTE_VERSION_EXPERIMENTAL_SYMBOL macro on clang.
> - Patch 2 - Allow function versioning inside drivers.
> - Patch 3 - Version the function symbols stabilized in
>   https://git.dpdk.org/dpdk/commit/?id=e8cab133645f5466ef75e511629add43b68a5027
> - Patch 4 - Introduce versioning macros for global variable symbols.
> - Patch 5 - Version the function and variable symbols stabilized in
>   https://git.dpdk.org/dpdk/commit/?id=4ee2f5c1cedf9ee7f39afa667f71b07f4004ba5c
> 
> Issue is still not fully fixed for stabilized global variables:
> rte_flow_dynf_metadata_offs and rte_flow_dynf_metadata_mask.
> Patch 4 and 5 address the bug for these global variables,
> by providing a single storage for both EXPERIMENTAL and
> DPDK_26 variable symbol versions.
> This is achieved through symbol aliasing.
> But this solution is limited only to executables compiled with clang.
> 
> clang and gcc have a different default behavior regarding relocations
> of global variables exposed by shared libraries.
> 
> With clang, R_X86_64_GLOB_DAT relocations are generated for executables:
> 
>    $ readelf -sW build-26.07/lib/librte_ethdev.so | grep rte_flow_dynf_metadata_offs
>        113: 00000000000ea4c0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@@DPDK_26
>        116: 00000000000ea4c0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@EXPERIMENTAL
>        970: 00000000000ea4c0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_impl
>       1212: 00000000000ea4c0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_v26
>       1325: 00000000000ea4c0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_exp
>       1415: 00000000000ea4c0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@@DPDK_26
>       1705: 00000000000ea4c0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@EXPERIMENTAL
> 
>     $ readelf -rW build-26.07/drivers/librte_net_mlx5.so | grep rte_flow_dynf_metadata_offs
>     0000000003ed5f18  0000001600000006 R_X86_64_GLOB_DAT      0000000000000000 rte_flow_dynf_metadata_offs@DPDK_26 + 0
> 
>     $ readelf -rW build-25.11/app/dpdk-testpmd | grep rte_flow_dynf_metadata_offs
> --> 000000000028ef70  0000011300000006 R_X86_64_GLOB_DAT      0000000000000000 rte_flow_dynf_metadata_offs@EXPERIMENTAL + 0  
> 
> With gcc, R_X86_64_COPY relocations are generated:
> 
>     $ readelf -sW build-26.07/lib/librte_ethdev.so | grep rte_flow_dynf_metadata_offs
>        113: 00000000000e74e0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@@DPDK_26
>        116: 00000000000e74e0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@EXPERIMENTAL
>       1471: 00000000000e74e0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_impl
>       2134: 00000000000e74e0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_v26
>       2247: 00000000000e74e0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_exp
>       2337: 00000000000e74e0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@@DPDK_26
>       2627: 00000000000e74e0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@EXPERIMENTAL
> 
>     $ readelf -rW build-26.07/drivers/librte_net_mlx5.so | grep rte_flow_dynf_metadata_offs
>     00000000046dbef0  0000001600000006 R_X86_64_GLOB_DAT      0000000000000000 rte_flow_dynf_metadata_offs@DPDK_26 + 0
> 
>     $ readelf -rW build-25.11/app/dpdk-testpmd | grep rte_flow_dynf_metadata_offs
> --> 000000000029b540  000001d200000005 R_X86_64_COPY          000000000029b540 rte_flow_dynf_metadata_offs@EXPERIMENTAL + 0  
> 
> With copy relocations (testpmd linked through gcc) the following happens:
> 
> - When variable symbol (with EXPERIMENTAL version) gets resolved inside executable,
>   global variable gets copied from read-only data to executable's BSS section.
>   Executable will access this variable through BSS.
> - When variable symbol (with DPDK_26 version) gets resolved inside a library,
>   global variable is accessed indirectly through GOT.
>   It is stored inside BSS section of the shared library.
> 
> So executable and libraries refer to different storage,
> eventually leading to inconsistent runtime behavior.
> The problem only appears when the executable and the library require
> different versions of the global variable symbol.
> If testpmd from 26.07 is used with libraries from 26.07,
> GOT entry for these variables will point to copied variable.
> 
> Without copy relocations (testpmd linked through clang) both
> executable and libraries access the global variable indirectly through GOT.
> Runtime behavior is consistent, regardless of the mix of variable symbol versions.
> 
> The only other solution I could find was to use dlsym() inside libraries
> to dynamically resolve the location of rte_flow_dynf_metadata_offs and rte_flow_dynf_metadata_mask,
> but this solution sounds like overkill.
> Essentially this would require moving to getter/setter functions for these variables
> inside the library.
> 
> I would appreciate any feedback or suggestions if anybody had encountered a similar issue before.
> 
> Dariusz Sosnowski (5):
>   eal: fix macro for versioned experimental symbol
>   drivers: support function versioning
>   net/mlx5: fix stabilized function versions
>   eal: support aliases for versioned variable symbols
>   ethdev: fix promoted flow metadata symbols
> 
>  buildtools/gen-version-map.py        | 11 ++++++++++
>  drivers/meson.build                  |  8 +++++++
>  drivers/net/mlx5/meson.build         |  2 ++
>  drivers/net/mlx5/mlx5_driver_event.c | 22 ++++++++++++++-----
>  drivers/net/mlx5/mlx5_flow.c         | 18 ++++++++++-----
>  lib/eal/common/eal_export.h          | 24 +++++++++++++++++++-
>  lib/ethdev/meson.build               |  2 ++
>  lib/ethdev/rte_flow.c                | 33 ++++++++++++++++++----------
>  8 files changed, 96 insertions(+), 24 deletions(-)
> 
> --
> 2.47.3
> 

The bugfix is good, but not sure the rest is needed right now.
It is getting late to add more stuff for 26.07 and in 26.11 function versioning
will not be needed.
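
For reference, the getter/setter alternative mentioned at the end of the quoted cover letter could look roughly like the sketch below. This is only an illustration under assumptions: `flow_meta_offs`, `flow_meta_offs_get()` and `flow_meta_offs_set()` are hypothetical names, not existing DPDK symbols. The point is that the storage stays private to the library and only function symbols are exported, so no copy relocation of a data symbol can occur.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for rte_flow_dynf_metadata_offs.
 * The variable is static, so the library exports no data symbol for it.
 */
static int32_t flow_meta_offs = -1;

/* Hypothetical accessor: read the offset (-1 means "not registered"). */
int32_t
flow_meta_offs_get(void)
{
	return flow_meta_offs;
}

/* Hypothetical accessor: store the offset chosen at registration time. */
void
flow_meta_offs_set(int32_t offs)
{
	assert(offs >= -1);
	flow_meta_offs = offs;
}
```

The trade-off is a function call in place of an inline load, which is the performance concern raised elsewhere in this thread.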


* [PATCH v4] pcapng: add user-supplied timestamp support
From: Dawid Wesierski @ 2026-06-23 14:10 UTC (permalink / raw)
  To: dev; +Cc: dawid.wesierski, stephen, mb, Marek Kasiewicz
In-Reply-To: <20260618143819.310046-1-dawid.wesierski@intel.com>

From: "Wesierski, Dawid" <dawid.wesierski@intel.com>

Introduce rte_pcapng_copy_ts() alongside the existing rte_pcapng_copy()
so that callers with a hardware PTP or pre-captured timestamp can inject
an exact epoch-ns value directly into the packet record.

Timestamp handling in rte_pcapng_copy_ts():
 - ts != 0: caller-supplied nanoseconds since the Unix epoch, stored as-is.
 - ts == 0: TSC captured at copy time with bit 63 set as a sentinel.
   rte_pcapng_write_packets() detects the sentinel and converts the TSC to
   epoch ns using the file's calibrated clock.  The TSC will not reach
   bit 63 for centuries, and epoch-ns values stay below bit 63 until 2554,
   so the bit is safe to use as a disambiguation flag.

rte_pcapng_copy() is retained as a real exported function (not an inline
wrapper) so the stable ABI symbol is preserved.  It simply calls
rte_pcapng_copy_ts(..., 0) to capture the current TSC.

rte_pcapng_tsc_to_ns() is added as a new experimental helper (addressing
review requests from Stephen Hemminger and Morten Brørup).  It exposes the
same calibrated, drift-compensated, divide-free TSC-to-epoch-ns conversion
used internally by rte_pcapng_write_packets(), allowing callers to convert
a TSC captured at packet arrival time before passing it to
rte_pcapng_copy_ts().

Signed-off-by: Marek Kasiewicz <marek.kasiewicz@intel.com>
Signed-off-by: Dawid Wesierski <dawid.wesierski@intel.com>
---
Hi Stephen, Morten,
Thank you very much for your review and comments.

I have prepared a v4 patch.

ABI failure: I have restored rte_pcapng_copy() as a real exported function instead of a static inline wrapper.
This should fix the iol-abi-testing failure. It now simply calls rte_pcapng_copy_ts(..., 0) internally.

As suggested, I've added a new experimental function: uint64_t rte_pcapng_tsc_to_ns(const rte_pcapng_t *self, uint64_t tsc);
It exposes the calibrated clock state that pcapng maintains internally.
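
The bit-63 sentinel described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the patch code: `TS_TSC_FLAG`, `ts_encode()`, `ts_is_tsc()` and `ts_raw_tsc()` are hypothetical names, and the TSC value is passed in as a parameter instead of being read with rte_get_tsc_cycles().

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel bit: when set, the stored value is a raw TSC, not epoch ns. */
#define TS_TSC_FLAG (UINT64_C(1) << 63)

/* Copy-time choice: keep a caller-supplied epoch-ns value, else tag the TSC. */
static inline uint64_t
ts_encode(uint64_t user_ns, uint64_t tsc_now)
{
	return user_ns != 0 ? user_ns : (tsc_now | TS_TSC_FLAG);
}

/* Write-time check: does the stored value still need TSC -> ns conversion? */
static inline int
ts_is_tsc(uint64_t stored)
{
	return (stored & TS_TSC_FLAG) != 0;
}

/* Strip the sentinel to recover the raw TSC before converting it. */
static inline uint64_t
ts_raw_tsc(uint64_t stored)
{
	return stored & ~TS_TSC_FLAG;
}
```

One consequence of this encoding is that a caller cannot pass an explicit timestamp of exactly 0, since that value selects the TSC path.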

Regards,
Dawid Węsierski.

 .mailmap                |  2 ++
 lib/pcapng/rte_pcapng.c | 71 +++++++++++++++++++++++++++++++++--------
 lib/pcapng/rte_pcapng.h | 64 +++++++++++++++++++++++++++++++++++++
 3 files changed, 124 insertions(+), 13 deletions(-)

diff --git a/.mailmap b/.mailmap
index 4001e5fb0e..a7d97a631e 100644
--- a/.mailmap
+++ b/.mailmap
@@ -366,6 +366,7 @@ David Zeng <zengxhsh@cn.ibm.com>
 Davide Caratti <dcaratti@redhat.com>
 Dawid Gorecki <dgr@semihalf.com>
 Dawid Jurczak <dawid_jurek@vp.pl>
+Dawid Wesierski <dawid.wesierski@intel.com> Wesierski, Dawid <dawid.wesierski@intel.com>
 Dawid Zielinski <dawid.zielinski@intel.com>
 Dawid Łukwiński <dawid.lukwinski@intel.com>
 Daxue Gao <daxuex.gao@intel.com>
@@ -1014,6 +1015,7 @@ Marcin Wilk <marcin.wilk@caviumnetworks.com>
 Marcin Wojtas <mw@semihalf.com>
 Marcin Zapolski <marcinx.a.zapolski@intel.com>
 Marco Varlese <mvarlese@suse.de>
+Marek Kasiewicz <marek.kasiewicz@intel.com>
 Marek Mical <marekx.mical@intel.com>
 Marek Zalfresso-jundzillo <marekx.zalfresso-jundzillo@intel.com>
 Maria Lingemark <maria.lingemark@ericsson.com>
diff --git a/lib/pcapng/rte_pcapng.c b/lib/pcapng/rte_pcapng.c
index b5d1026891..f583fae995 100644
--- a/lib/pcapng/rte_pcapng.c
+++ b/lib/pcapng/rte_pcapng.c
@@ -546,14 +546,14 @@ pcapng_vlan_insert(struct rte_mbuf *m, uint16_t ether_type, uint16_t tci)
  */
 
 /* Make a copy of original mbuf with pcapng header and options */
-RTE_EXPORT_SYMBOL(rte_pcapng_copy)
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pcapng_copy_ts, 26.07)
 struct rte_mbuf *
-rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+rte_pcapng_copy_ts(uint16_t port_id, uint32_t queue,
 		const struct rte_mbuf *md,
 		struct rte_mempool *mp,
 		uint32_t length,
 		enum rte_pcapng_direction direction,
-		const char *comment)
+		const char *comment, uint64_t ts)
 {
 	struct pcapng_enhance_packet_block *epb;
 	uint32_t orig_len, pkt_len, padding, flags;
@@ -690,8 +690,20 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 	/* Interface index is filled in later during write */
 	mc->port = port_id;
 
-	/* Put timestamp in cycles here - adjust in packet write */
-	timestamp = rte_get_tsc_cycles();
+	/*
+	 * Timestamp handling:
+	 *  - If the caller supplied an explicit timestamp (ts != 0), it is
+	 *    already in nanoseconds since the Unix epoch, so store it as-is.
+	 *  - If the caller did not (ts == 0), store the current TSC and set
+	 *    the high bit as a sentinel so rte_pcapng_write_packets() knows
+	 *    it must convert TSC -> epoch ns at write time. The TSC counter
+	 *    will not reach bit 63 for centuries, and epoch-ns values stay
+	 *    below bit 63 until the year 2554, so the bit is safe to use.
+	 */
+	if (ts != 0)
+		timestamp = ts;
+	else
+		timestamp = rte_get_tsc_cycles() | (UINT64_C(1) << 63);
 	epb->timestamp_hi = timestamp >> 32;
 	epb->timestamp_lo = (uint32_t)timestamp;
 	epb->capture_length = pkt_len;
@@ -707,6 +719,35 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 	return NULL;
 }
 
+/*
+ * Compatibility wrapper: captures current TSC (converted at write time).
+ * Equivalent to rte_pcapng_copy_ts(..., 0).
+ */
+RTE_EXPORT_SYMBOL(rte_pcapng_copy)
+struct rte_mbuf *
+rte_pcapng_copy(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *md,
+		struct rte_mempool *mp,
+		uint32_t length,
+		enum rte_pcapng_direction direction,
+		const char *comment)
+{
+	return rte_pcapng_copy_ts(port_id, queue, md, mp, length, direction,
+				  comment, 0);
+}
+
+/*
+ * Convert a TSC value to nanoseconds since the Unix epoch using the
+ * calibrated clock of the capture file. Uses the same pre-computed
+ * reciprocal multiplier as the internal write path (no integer division).
+ */
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pcapng_tsc_to_ns, 26.07)
+uint64_t
+rte_pcapng_tsc_to_ns(const rte_pcapng_t *self, uint64_t tsc)
+{
+	return tsc_to_ns_epoch(&self->clock, tsc);
+}
+
 /* Write pre-formatted packets to file. */
 RTE_EXPORT_SYMBOL(rte_pcapng_write_packets)
 ssize_t
@@ -720,7 +761,7 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
 	for (i = 0; i < nb_pkts; i++) {
 		struct rte_mbuf *m = pkts[i];
 		struct pcapng_enhance_packet_block *epb;
-		uint64_t cycles, timestamp;
+		uint64_t timestamp;
 
 		/* sanity check that is really a pcapng mbuf */
 		epb = rte_pktmbuf_mtod(m, struct pcapng_enhance_packet_block *);
@@ -738,14 +779,18 @@ rte_pcapng_write_packets(rte_pcapng_t *self,
 		}
 
 		/*
-		 * When data is captured by pcapng_copy the current TSC is stored.
-		 * Adjust the value recorded in file to PCAP epoch units.
+		 * If rte_pcapng_copy[_ts]() stored a TSC value (high bit set
+		 * as sentinel), convert it to nanoseconds since the Unix epoch
+		 * using the per-file clock. Otherwise the timestamp is already
+		 * in epoch ns and is written unchanged.
 		 */
-		cycles = (uint64_t)epb->timestamp_hi << 32;
-		cycles += epb->timestamp_lo;
-		timestamp = tsc_to_ns_epoch(&self->clock, cycles);
-		epb->timestamp_hi = timestamp >> 32;
-		epb->timestamp_lo = (uint32_t)timestamp;
+		timestamp = ((uint64_t)epb->timestamp_hi << 32) | epb->timestamp_lo;
+		if (timestamp & (UINT64_C(1) << 63)) {
+			timestamp &= ~(UINT64_C(1) << 63);
+			timestamp = tsc_to_ns_epoch(&self->clock, timestamp);
+			epb->timestamp_hi = timestamp >> 32;
+			epb->timestamp_lo = (uint32_t)timestamp;
+		}
 
 		/*
 		 * Handle case of highly fragmented and large burst size
diff --git a/lib/pcapng/rte_pcapng.h b/lib/pcapng/rte_pcapng.h
index d8d328f710..6eeaeada05 100644
--- a/lib/pcapng/rte_pcapng.h
+++ b/lib/pcapng/rte_pcapng.h
@@ -108,9 +108,50 @@ enum rte_pcapng_direction {
 	RTE_PCAPNG_DIRECTION_OUT = 2,
 };
 
+/**
+ * Format an mbuf with a caller-supplied timestamp for writing to file.
+ *
+ * @param port_id
+ *   The Ethernet port on which packet was received
+ *   or is going to be transmitted.
+ * @param queue
+ *   The queue on the Ethernet port where packet was received
+ *   or is going to be transmitted.
+ * @param mp
+ *   The mempool from which the "clone" mbufs are allocated.
+ * @param m
+ *   The mbuf to copy
+ * @param length
+ *   The upper limit on bytes to copy.  Passing UINT32_MAX
+ *   means all data (after offset).
+ * @param direction
+ *   The direction of the packet: receive, transmit or unknown.
+ * @param comment
+ *   Optional per packet comment.
+ *   Truncated to UINT16_MAX characters.
+ * @param ts
+ *   Packet timestamp in nanoseconds since the Unix epoch. If zero, the
+ *   current TSC is captured and converted to epoch ns by
+ *   rte_pcapng_write_packets() when the packet is written.
+ *
+ * @return
+ *   - The pointer to the new mbuf formatted for pcapng_write
+ *   - NULL on error such as invalid port or out of memory.
+ */
+__rte_experimental
+struct rte_mbuf *
+rte_pcapng_copy_ts(uint16_t port_id, uint32_t queue,
+		const struct rte_mbuf *m, struct rte_mempool *mp,
+		uint32_t length,
+		enum rte_pcapng_direction direction, const char *comment,
+		uint64_t ts);
+
 /**
  * Format an mbuf for writing to file.
  *
+ * Equivalent to rte_pcapng_copy_ts() with ts=0: the current TSC is
+ * captured at copy time and converted to epoch ns at write time.
+ *
  * @param port_id
  *   The Ethernet port on which packet was received
  *   or is going to be transmitted.
@@ -153,6 +194,29 @@ rte_pcapng_copy(uint16_t port_id, uint32_t queue,
 uint32_t
 rte_pcapng_mbuf_size(uint32_t length);
 
+/**
+ * Convert a TSC value to nanoseconds since the Unix epoch.
+ *
+ * Uses the same calibrated clock reference as the capture file so that
+ * the result is consistent with timestamps written by
+ * rte_pcapng_write_packets(). The conversion is drift-compensated and
+ * uses a pre-computed reciprocal multiplier (no integer division).
+ *
+ * Typical use: convert a TSC timestamp captured close to packet arrival
+ * (e.g., from a PMD or hardware register) to an epoch-ns value before
+ * passing it to rte_pcapng_copy_ts().
+ *
+ * @param self
+ *   The handle to the packet capture file.
+ * @param tsc
+ *   TSC value to convert.
+ * @return
+ *   Nanoseconds since the Unix epoch corresponding to @p tsc.
+ */
+__rte_experimental
+uint64_t
+rte_pcapng_tsc_to_ns(const rte_pcapng_t *self, uint64_t tsc);
+
 /**
  * Write packets to the capture file.
  *
-- 
2.47.3



* Re: [PATCH] common/cnxk: fix inline dev null dereference
From: Jerin Jacob @ 2026-06-23 13:35 UTC (permalink / raw)
  To: Aarnav JP
  Cc: dev, Nithin Dabilpuram, Kiran Kumar K, Sunil Kumar Kori,
	Satha Rao, Harman Kalra, Rakesh Kudurumalla, jerinj, rbhansali,
	stable
In-Reply-To: <20260623085433.3190541-1-ajp@marvell.com>

On Tue, Jun 23, 2026 at 2:31 PM Aarnav JP <ajp@marvell.com> wrote:
>
> inl_dev is initialized to NULL and only assigned within the
> if (idev && idev->nix_inl_dev) block.
> Move inl_dev->res_addr_offset and inl_dev->cpt_cq_ena
> accesses inside this null-guarded block in
> nix_inl_inb_ipsec_sa_tbl_setup() and nix_inl_reass_inb_sa_tbl_setup()
> to avoid dereferencing a null pointer.
>
> Fixes: 3fdf3e53f3c4 ("common/cnxk: enable CPT CQ for inline IPsec inbound")
> Cc: stable@dpdk.org
>
> Signed-off-by: Aarnav JP <ajp@marvell.com>


Applied to dpdk-next-net-mrvl/for-main. Thanks


* Re: [PATCH] net/mlx5: fix double free in vectorized Rx recovery
From: Dariusz Sosnowski @ 2026-06-23 12:50 UTC (permalink / raw)
  To: Borys Tsyrulnikov
  Cc: Thomas Monjalon, Viacheslav Ovsiienko, Bing Zhao, Ori Kam,
	Suanming Mou, Matan Azrad, Alexander Kozyrev, dev, stable
In-Reply-To: <20260617134301.798213-1-tsyrulnikov.borys@gmail.com>

On Wed, Jun 17, 2026 at 04:43:01PM +0300, Borys Tsyrulnikov wrote:
> During Rx queue error recovery, the vectorized path in
> mlx5_rx_err_handle() reallocates an mbuf for every queue element. When
> rte_mbuf_raw_alloc() fails (for example, the mempool is exhausted), the
> rollback loop frees the mbufs allocated so far, but masks the element
> ring index with "& elts_n" instead of "& (elts_n - 1)".
> 
> elts_n is a power-of-two element count, so "x & elts_n" isolates a
> single bit and can only evaluate to 0 or elts_n, regardless of the loop
> counter. The rollback therefore never frees the mbufs just allocated in
> this pass (they are leaked); instead it repeatedly frees elts[0], a live
> mbuf still posted to the NIC (use-after-free / double free), and
> elts[elts_n], the fake_mbuf padding entry used by the vector datapath.
> 
> Mask with the existing e_mask (elts_n - 1), as already done in the
> matching forward allocation loop just above.
> 
> Fixes: 0f20acbf5eda ("net/mlx5: implement vectorized MPRQ burst")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Borys Tsyrulnikov <tsyrulnikov.borys@gmail.com>

Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
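
The masking mistake described in the quoted commit message can be reproduced with a small standalone sketch. The helper names `ring_idx_buggy()` and `ring_idx()` are hypothetical and are not taken from the mlx5 driver; only the two mask expressions come from the description above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy form: for a power-of-two elts_n, (i & elts_n) keeps a single bit,
 * so the result can only be 0 or elts_n, whatever the loop counter is.
 */
static inline uint32_t
ring_idx_buggy(uint32_t i, uint32_t elts_n)
{
	return i & elts_n;
}

/* Correct form: (elts_n - 1) has all low bits set, giving i modulo elts_n. */
static inline uint32_t
ring_idx(uint32_t i, uint32_t elts_n)
{
	assert(elts_n != 0 && (elts_n & (elts_n - 1)) == 0);
	return i & (elts_n - 1);
}
```

With elts_n = 8, the buggy helper only ever yields index 0 or 8, which matches the elts[0] and elts[elts_n] entries named in the commit message.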


* Re: [PATCH v3 05/25] bpf/validate: introduce debugging interface
From: Thomas Monjalon @ 2026-06-23 12:29 UTC (permalink / raw)
  To: Marat Khalili; +Cc: Konstantin Ananyev, dev@dpdk.org
In-Reply-To: <84ce7f7669404239864c61819267d9b6@huawei.com>

23/06/2026 12:29, Marat Khalili:
> > -----Original Message-----
> > From: Thomas Monjalon <thomas@monjalon.net>
> > Sent: Tuesday 23 June 2026 11:19
> > To: Marat Khalili <marat.khalili@huawei.com>
> > Cc: Konstantin Ananyev <konstantin.ananyev@huawei.com>; dev@dpdk.org
> > Subject: Re: [PATCH v3 05/25] bpf/validate: introduce debugging interface
> > 
> > 12/06/2026 12:47, Marat Khalili:
> > > +#ifndef LIST_FOREACH_SAFE
> > > +/* We need this macro which neither Linux nor EAL for Linux include yet. */
> > > +#define        LIST_FOREACH_SAFE(var, head, field, tvar)                       \
> > > +       for ((var) = LIST_FIRST((head));                                \
> > > +           (var) && ((tvar) = LIST_NEXT((var), field), 1);             \
> > > +           (var) = (tvar))
> > > +#else
> > > +#ifdef RTE_EXEC_ENV_LINUX
> > > +#error "Don't need LIST_FOREACH_SAFE in this version of DPDK anymore, remove it."
> > > +#endif
> > > +#endif
> > 
> > It fails on Alpine Linux.
> > > Why add this #error?
> > 
> 
> This is interesting. My mental model was that Linux is never going to have
> LIST_FOREACH_SAFE, but DPDK will eventually gain its own polyfill. I was
> actually expecting it to happen before my patch is published, so this was a
> reminder to remove my own definition since it clearly belongs to some common
> library. Turns out I was wrong on both counts: there are Linux distributions that define
> LIST_FOREACH_SAFE, and I managed to submit faster. Apart from these
> organizational issues the whole else branch can be safely removed. Do you want
> me to submit an updated version?

Yes, that would be nice, so that we get a full CI run on it
now that the dependency is merged in main.
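
For readers unfamiliar with the macro under discussion, here is a minimal sketch of why a "safe" iteration variant is needed: it saves the next pointer before the loop body runs, so the current element can be removed and freed. The fallback definition is the one quoted above; `struct item`, `item_list_push()` and `item_list_drain()` are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Fallback for C libraries whose <sys/queue.h> lacks the macro. */
#ifndef LIST_FOREACH_SAFE
#define LIST_FOREACH_SAFE(var, head, field, tvar)		\
	for ((var) = LIST_FIRST((head));			\
	    (var) && ((tvar) = LIST_NEXT((var), field), 1);	\
	    (var) = (tvar))
#endif

struct item {
	int value;
	LIST_ENTRY(item) link;
};
LIST_HEAD(item_list, item);

/* Allocate an element and insert it at the head; returns 0 on success. */
static int
item_list_push(struct item_list *head, int value)
{
	struct item *it = malloc(sizeof(*it));

	if (it == NULL)
		return -1;
	it->value = value;
	LIST_INSERT_HEAD(head, it, link);
	return 0;
}

/* Remove and free every element; returns how many were freed. */
static int
item_list_drain(struct item_list *head)
{
	struct item *it, *next;
	int n = 0;

	/* 'next' is read before 'it' is freed, so iteration stays valid. */
	LIST_FOREACH_SAFE(it, head, link, next) {
		LIST_REMOVE(it, link);
		free(it);
		n++;
	}
	return n;
}
```

Using plain LIST_FOREACH in item_list_drain() would read the link field of an element that has already been freed.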




* RE: [PATCH v5] graph: add optional profiling stats
From: Morten Brørup @ 2026-06-23 12:04 UTC (permalink / raw)
  To: saeed bishara
  Cc: Jerin Jacob, dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram,
	Zhirun Yan
In-Reply-To: <CAHfVqdWKoDqb0uD_HrF8e=GqadThPhZj0vZnRYDW=KMPei0mXQ@mail.gmail.com>

> From: saeed bishara [mailto:saeed.bishara.os@gmail.com]
> Sent: Tuesday, 23 June 2026 10.34
> 
> > > > > +               /** Fast path area cache line 3. */
> > > > > +#ifdef RTE_GRAPH_PROFILE
> > > > > +               struct {
> > > > > +                       uint64_t calls;     /**< Calls
> processing
> > > > resp. 0 or 1 objects. */
> > > > > +                       uint64_t cycles;    /**< Cycles spent
> > > > processing resp. 0 or 1 objects. */
> > > > > +               } usage_stats[2];       /**< Usage when this
> node
> > > > processed 0 or 1 objects. */
> > > > > +               uint64_t full_burst_calls;  /**< Calls
> processing a
> > > > full burst of objects. */
> > > > > +               uint64_t full_burst_cycles; /**< Cycles spent
> > > > processing a full burst of objects. */
> > > > > +               uint64_t half_burst_calls;  /**< Calls
> processing a
> > > > half burst of objects. */
> > > > > +               uint64_t half_burst_cycles; /**< Cycles spent
> > > > processing a half burst of objects. */
> > > > > +               /** Fast path area cache line 4. */
> > > > > +#endif
> > > >
> > > > Is it an ABI breakage?
> Can you consider one array for all cases?

Ack.

> also, instead of adding cacheline for this profiling data, can we
> share with line 1 that used solely for xstats?

This profiling data is 4 indexes * 2 values * 8-byte fields = 64 bytes, so it fills one cache line by itself.
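
The size arithmetic, combined with the "one array for all cases" suggestion, can be checked with a small sketch. Everything here is an assumption for illustration: the type and enumerator names are hypothetical, the layout is not the one from the patch, and a 64-byte cache line is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* One calls/cycles pair per burst class. */
struct burst_usage {
	uint64_t calls;
	uint64_t cycles;
};

/* Hypothetical index for a single array covering all four cases. */
enum burst_class {
	BURST_ZERO,
	BURST_ONE,
	BURST_HALF,
	BURST_FULL,
	BURST_CLASS_MAX
};

struct profile_stats {
	struct burst_usage usage[BURST_CLASS_MAX];
};

/* 4 indexes * 2 values * 8 bytes = 64 bytes, i.e. exactly one cache line. */
_Static_assert(sizeof(struct profile_stats) == CACHE_LINE_SIZE,
	       "profiling stats should fill exactly one cache line");

/* Account one node invocation of the given class. */
static inline void
profile_record(struct profile_stats *s, enum burst_class c, uint64_t cycles)
{
	s->usage[c].calls++;
	s->usage[c].cycles += cycles;
}
```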



* Re: [PATCH v1 0/5] prefix lcore role enum values
From: lihuisong (C) @ 2026-06-23 11:52 UTC (permalink / raw)
  To: David Marchand
  Cc: Stephen Hemminger, Morten Brørup, thomas, andrew.rybchenko,
	dev, zhanjie9
In-Reply-To: <CAJFAV8yNsZ_SLcG-ukzmDTQXRXDsGVtf-9szwSc6T2GM+fhE_Q@mail.gmail.com>


On 6/22/2026 4:18 PM, David Marchand wrote:
> Hello all,
>
> On Mon, 22 Jun 2026 at 03:23, lihuisong (C) <lihuisong@huawei.com> wrote:
>> On 6/19/2026 10:03 AM, Stephen Hemminger wrote:
>>> On Wed, 17 Jun 2026 13:48:37 +0200
>>> Morten Brørup <mb@smartsharesystems.com> wrote:
>>>
>>>>> From: Huisong Li [mailto:lihuisong@huawei.com]
>>>>> Sent: Wednesday, 17 June 2026 12.28
>>>>>
>>>>> Add the RTE_LCORE_ prefix to the lcore role enum values in
>>>>> rte_lcore_role_t
>>>>> to follow DPDK naming conventions.
>>>>>
>>>>> - ROLE_RTE      -> RTE_LCORE_ROLE_RTE
>>>>> - ROLE_OFF      -> RTE_LCORE_ROLE_OFF
>>>>> - ROLE_SERVICE  -> RTE_LCORE_ROLE_SERVICE
>>>>> - ROLE_NON_EAL  -> RTE_LCORE_ROLE_NON_EAL
>>>>>
>>>>> Old names are kept as macros aliasing to the new names to preserve
>>>>> backward compatibility.
>>>>>
>>>> Series-Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>>>
>>> The problem with this patch is that it causes build failures now with abi diff.
>>>
>>> Example build log...
>>>
>>>
>>> 2 functions with some indirect sub-type change:
>>>
>>>    [C] 'function rte_lcore_role_t rte_eal_lcore_role(unsigned int)' at eal_common_lcore.c:74:1 has some indirect sub-type changes:
>>>    return type changed:
>>>    type size hasn't changed
>>>    4 enumerator deletions:
>>>    'rte_lcore_role_t::ROLE_RTE' value '0'
>>>    'rte_lcore_role_t::ROLE_OFF' value '1'
>>>    'rte_lcore_role_t::ROLE_SERVICE' value '2'
>>>    'rte_lcore_role_t::ROLE_NON_EAL' value '3'
>>>    4 enumerator insertions:
>>>    'rte_lcore_role_t::RTE_LCORE_ROLE_RTE' value '0'
>>>    'rte_lcore_role_t::RTE_LCORE_ROLE_OFF' value '1'
>>>    'rte_lcore_role_t::RTE_LCORE_ROLE_SERVICE' value '2'
>>>    'rte_lcore_role_t::RTE_LCORE_ROLE_NON_EAL' value '3'
>>>
>>>    [C] 'function int rte_lcore_has_role(unsigned int, rte_lcore_role_t)' at eal_common_lcore.c:85:1 has some indirect sub-type changes:
>>>    parameter 2 of type 'enum rte_lcore_role_t' has sub-type changes:
>>>    enum type 'enum rte_lcore_role_t' changed at rte_lcore.h:33:1, as reported earlier
>>>
>>> Error: ABI issue reported for abidiff --suppr /home/runner/work/dpdk/dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1 reference/usr/local/include --headers-dir2 install/usr/local/include reference/usr/local/lib/librte_eal.so.26.1 install/usr/local/lib/librte_eal.so.26.2
>> We just came back from the Dragon Boat Festival.
>> I also received this ABI change warning. But I didn't have any good
>> ideas yet.
>> Thanks for helping to handle this.
>> Sorry for the inconvenience.
> There is nothing broken from an ABI pov.
> This is a limitation in earlier versions of libabigail.
> I can't reproduce with libabigail 2.9 (update in progress as I see
> 2.10 is available now).
>
> I think it was solved in libabigail 2.8
> (https://sourceware.org/git/?p=libabigail.git;a=commit;h=6f5f91564bdd).
This seems to solve the problem.
>
> If we want to go with the enum renaming before 26.11, bumping
> libabigail to 2.10 in the CI is an option (latest upstream version,
> and this is the version in f43 and f44).
> I tried it in GHA:
> https://github.com/david-marchand/dpdk/actions/runs/27937595115/job/82662953500
I also tested it with libabigail 2.9.0. No warnings.
-->
abidiff build-ref/lib/librte_eal.so build-new/lib/librte_eal.so
Functions changes summary: 0 Removed, 0 Changed (2 filtered out), 0 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable
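
For context, the compatibility scheme from the quoted commit message (old
names kept as macros aliasing the new enumerators) can be sketched as below.
This is only an illustration: the demo_/DEMO_ names are hypothetical and are
not the DPDK definitions. The numeric values stay the same, which is
consistent with nothing being broken from an ABI point of view.

```c
#include <assert.h>

/* Hypothetical enum mirroring the renaming scheme; not the DPDK definition. */
enum demo_lcore_role {
	DEMO_LCORE_ROLE_RTE,
	DEMO_LCORE_ROLE_OFF,
	DEMO_LCORE_ROLE_SERVICE,
	DEMO_LCORE_ROLE_NON_EAL,
};

/* Old names kept as macros that alias the new enumerators. */
#define DEMO_ROLE_RTE      DEMO_LCORE_ROLE_RTE
#define DEMO_ROLE_OFF      DEMO_LCORE_ROLE_OFF
#define DEMO_ROLE_SERVICE  DEMO_LCORE_ROLE_SERVICE
#define DEMO_ROLE_NON_EAL  DEMO_LCORE_ROLE_NON_EAL

/* Code written against the old names still compiles unchanged. */
int
demo_is_service(enum demo_lcore_role role)
{
	return role == DEMO_ROLE_SERVICE;
}

/* Old and new spellings must evaluate to the same values. */
void
demo_role_selfcheck(void)
{
	assert(DEMO_ROLE_RTE == DEMO_LCORE_ROLE_RTE);
	assert(DEMO_ROLE_NON_EAL == DEMO_LCORE_ROLE_NON_EAL);
}
```

Since macros are expanded by the preprocessor, only the new enumerator names
end up in the type description, which is presumably why the older abidiff
reported four deletions and four insertions.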

>
>

^ permalink raw reply

* [PATCH 5/5] ethdev: fix promoted flow metadata symbols
From: Dariusz Sosnowski @ 2026-06-23 11:37 UTC (permalink / raw)
  To: Thomas Monjalon, Andrew Rybchenko, Ori Kam
  Cc: dev, David Marchand, Bruce Richardson, Yu Jiang
In-Reply-To: <20260623113752.1100072-1-dsosnowski@nvidia.com>

Offending patch stabilized the following symbols:

- 1 function symbol:
    - rte_flow_dynf_metadata_register
- 2 global variable symbols:
    - rte_flow_dynf_metadata_offs
    - rte_flow_dynf_metadata_mask

Any application that uses these flow metadata symbols
and was linked dynamically against the 25.11 version of the ethdev
library would fail on symbol resolution when run with the current
version of the library, because the EXPERIMENTAL versions
were not exported.
Specifically, a variable symbol lookup error happens
on application startup:

/tmp/dpdk-25.11/usr/local/bin/dpdk-testpmd:
  symbol lookup error: /tmp/dpdk-25.11/usr/local/bin/dpdk-testpmd:
    undefined symbol: rte_flow_dynf_metadata_offs, version EXPERIMENTAL

This error occurs because symbol lookup for global variables
happens on application startup.

This patch addresses that by adding versioned aliases
for the following variable symbols:

- rte_flow_dynf_metadata_offs
- rte_flow_dynf_metadata_mask

Versioned function symbols are also added
for rte_flow_dynf_metadata_register().

Bugzilla ID: 1957
Fixes: 4ee2f5c1cedf ("ethdev: promote flow metadata API to stable")

Reported-by: Yu Jiang <yux.jiang@intel.com>
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 lib/ethdev/meson.build |  2 ++
 lib/ethdev/rte_flow.c  | 33 ++++++++++++++++++++++-----------
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/lib/ethdev/meson.build b/lib/ethdev/meson.build
index 8ba6c708a2..63fd866af9 100644
--- a/lib/ethdev/meson.build
+++ b/lib/ethdev/meson.build
@@ -1,6 +1,8 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017 Intel Corporation
 
+use_function_versioning = true
+
 sources = files(
         'ethdev_driver.c',
         'ethdev_private.c',
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index ec0fe08355..a8c01ffe8a 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -23,12 +23,20 @@
 #define FLOW_LOG RTE_ETHDEV_LOG_LINE
 
 /* Mbuf dynamic field name for metadata. */
-RTE_EXPORT_SYMBOL(rte_flow_dynf_metadata_offs)
-int32_t rte_flow_dynf_metadata_offs = -1;
+static int32_t rte_flow_dynf_metadata_offs_impl = -1;
+
+RTE_DEFAULT_SYMBOL_ALIAS(26, int32_t, rte_flow_dynf_metadata_offs,
+			 rte_flow_dynf_metadata_offs_impl);
+RTE_VERSION_EXPERIMENTAL_SYMBOL_ALIAS(int32_t, rte_flow_dynf_metadata_offs,
+				      rte_flow_dynf_metadata_offs_impl);
 
 /* Mbuf dynamic field flag bit number for metadata. */
-RTE_EXPORT_SYMBOL(rte_flow_dynf_metadata_mask)
-uint64_t rte_flow_dynf_metadata_mask;
+static uint64_t rte_flow_dynf_metadata_mask_impl = 0;
+
+RTE_DEFAULT_SYMBOL_ALIAS(26, uint64_t, rte_flow_dynf_metadata_mask,
+			 rte_flow_dynf_metadata_mask_impl);
+RTE_VERSION_EXPERIMENTAL_SYMBOL_ALIAS(uint64_t, rte_flow_dynf_metadata_mask,
+				      rte_flow_dynf_metadata_mask_impl);
 
 /**
  * Flow elements description tables.
@@ -281,9 +289,7 @@ static const struct rte_flow_desc_data rte_flow_desc_action[] = {
 	MK_FLOW_ACTION(JUMP_TO_TABLE_INDEX, sizeof(struct rte_flow_action_jump_to_table_index)),
 };
 
-RTE_EXPORT_SYMBOL(rte_flow_dynf_metadata_register)
-int
-rte_flow_dynf_metadata_register(void)
+RTE_DEFAULT_SYMBOL(26, int, rte_flow_dynf_metadata_register, (void))
 {
 	int offset;
 	int flag;
@@ -303,19 +309,24 @@ rte_flow_dynf_metadata_register(void)
 	flag = rte_mbuf_dynflag_register(&desc_flag);
 	if (flag < 0)
 		goto error;
-	rte_flow_dynf_metadata_offs = offset;
-	rte_flow_dynf_metadata_mask = RTE_BIT64(flag);
+	rte_flow_dynf_metadata_offs_impl = offset;
+	rte_flow_dynf_metadata_mask_impl = RTE_BIT64(flag);
 
 	rte_flow_trace_dynf_metadata_register(offset, RTE_BIT64(flag));
 
 	return 0;
 
 error:
-	rte_flow_dynf_metadata_offs = -1;
-	rte_flow_dynf_metadata_mask = UINT64_C(0);
+	rte_flow_dynf_metadata_offs_impl = -1;
+	rte_flow_dynf_metadata_mask_impl = UINT64_C(0);
 	return -rte_errno;
 }
 
+RTE_VERSION_EXPERIMENTAL_SYMBOL(int, rte_flow_dynf_metadata_register, (void))
+{
+	return rte_flow_dynf_metadata_register();
+}
+
 static inline void
 fts_enter(struct rte_eth_dev *dev)
 {
-- 
2.47.3


^ permalink raw reply related

* [PATCH 4/5] eal: support aliases for versioned variable symbols
From: Dariusz Sosnowski @ 2026-06-23 11:37 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev, David Marchand
In-Reply-To: <20260623113752.1100072-1-dsosnowski@nvidia.com>

Existing symbol versioning macros are not suitable for versioning
exported global variables.

Specifically, if the existing macros were used to version
a global variable symbol promoted from experimental to stable,
the result would be multiple variables, each with separate storage.
If an application was linked against an older DPDK and had copy
relocations, this would yield inconsistent behavior:

- Application would use experimental symbol version,
  with storage set up in BSS section in application.
- Library would use latest symbol version,
  with storage set up in BSS section of shared object.

This patch adds versioning macros which utilize symbol aliasing.
Specifically, a new variable (with version suffix) is defined
as an alias to private (static) variable inside the library.
Variable symbol versions are attached to these alias variables.

Following macros are added:

- RTE_VERSION_EXPERIMENTAL_SYMBOL_ALIAS
- RTE_DEFAULT_SYMBOL_ALIAS
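
A minimal sketch of the aliasing idea, leaving out the symbol version
attachment: one private variable and two extra names that refer to the same
storage. The demo_ names are hypothetical. It assumes gcc or clang on an ELF
target, since the alias attribute is not available everywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Private storage, playing the role of the "_impl" variable. */
static int32_t demo_offs_impl = -1;

/* Two additional names for the same object (alias attribute, ELF only). */
extern int32_t demo_offs_v26 __attribute__((alias("demo_offs_impl")));
extern int32_t demo_offs_exp __attribute__((alias("demo_offs_impl")));

/* The library writes through the private name only. */
void
demo_offs_set(int32_t value)
{
	demo_offs_impl = value;
}

/* Both aliases observe the value written through the private name. */
void
demo_offs_selfcheck(void)
{
	demo_offs_set(42);
	assert(demo_offs_v26 == 42);
	assert(demo_offs_exp == 42);
}
```

In the macros added by this patch, each alias additionally carries a symbol
version, so both versions of the exported name resolve to the same address
inside the shared object.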

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 buildtools/gen-version-map.py | 11 +++++++++++
 lib/eal/common/eal_export.h   | 22 ++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/buildtools/gen-version-map.py b/buildtools/gen-version-map.py
index 57e08a8c0f..aa88e69179 100755
--- a/buildtools/gen-version-map.py
+++ b/buildtools/gen-version-map.py
@@ -14,8 +14,12 @@
 export_int_sym_regexp = re.compile(r"^RTE_EXPORT_INTERNAL_SYMBOL\(([^)]+)\)")
 export_sym_regexp = re.compile(r"^RTE_EXPORT_SYMBOL\(([^)]+)\)")
 ver_sym_regexp = re.compile(r"^RTE_VERSION_SYMBOL\(([^,]+), [^,]+, ([^,]+),")
+
 ver_exp_sym_regexp = re.compile(r"^RTE_VERSION_EXPERIMENTAL_SYMBOL\([^,]+, ([^,]+),")
+ver_exp_sym_alias_regexp = re.compile(r"^RTE_VERSION_EXPERIMENTAL_SYMBOL_ALIAS\([^,]+, ([^,]+),")
+
 default_sym_regexp = re.compile(r"^RTE_DEFAULT_SYMBOL\(([^,]+), [^,]+, ([^,]+),")
+default_sym_alias_regexp = re.compile(r"^RTE_DEFAULT_SYMBOL_ALIAS\(([^,]+), [^,]+, ([^,]+),")
 
 parser = argparse.ArgumentParser(
     description=__doc__,
@@ -73,10 +77,17 @@
         elif ver_exp_sym_regexp.match(ln):
             node = "EXPERIMENTAL"
             symbol = ver_exp_sym_regexp.match(ln).group(1)
+        elif ver_exp_sym_alias_regexp.match(ln):
+            node = "EXPERIMENTAL"
+            symbol = ver_exp_sym_alias_regexp.match(ln).group(1)
         elif default_sym_regexp.match(ln):
             abi = default_sym_regexp.match(ln).group(1)
             node = f"DPDK_{abi}"
             symbol = default_sym_regexp.match(ln).group(2)
+        elif default_sym_alias_regexp.match(ln):
+            abi = default_sym_alias_regexp.match(ln).group(1)
+            node = f"DPDK_{abi}"
+            symbol = default_sym_alias_regexp.match(ln).group(2)
 
         if not symbol:
             continue
diff --git a/lib/eal/common/eal_export.h b/lib/eal/common/eal_export.h
index 7971bf8d7a..5b458f81c6 100644
--- a/lib/eal/common/eal_export.h
+++ b/lib/eal/common/eal_export.h
@@ -63,6 +63,14 @@ __attribute__((__symver__(RTE_STR(name) "@@DPDK_" RTE_STR(ver)))) \
 type name ## _v ## ver args; \
 type name ## _v ## ver args
 
+#define RTE_VERSION_EXPERIMENTAL_SYMBOL_ALIAS(type, name, orig) VERSIONING_WARN \
+extern type name ## _exp __attribute((alias(RTE_STR(orig)), \
+				      __symver__(RTE_STR(name) "@EXPERIMENTAL")))
+
+#define RTE_DEFAULT_SYMBOL_ALIAS(ver, type, name, orig) VERSIONING_WARN \
+extern type name ## _v ## ver __attribute((alias(RTE_STR(orig)), \
+					   __symver__(RTE_STR(name) "@@DPDK_" RTE_STR(ver))))
+
 #else /* !__has_attribute(symver) */
 
 /* Use asm tag to create symbol table entry */
@@ -81,6 +89,14 @@ __asm__(".symver " RTE_STR(name) "_v" RTE_STR(ver) ", " RTE_STR(name) "@@DPDK_"
 __rte_used type name ## _v ## ver args; \
 type name ## _v ## ver args
 
+#define RTE_DEFAULT_SYMBOL_ALIAS(ver, type, name, orig) VERSIONING_WARN \
+extern type name ## _v ## ver __attribute__((alias(RTE_STR(orig)))); \
+__asm__(".symver " RTE_STR(name) "_v" RTE_STR(ver) ", " RTE_STR(name) "@@DPDK_" RTE_STR(ver));
+
+#define RTE_VERSION_EXPERIMENTAL_SYMBOL_ALIAS(type, name, orig) VERSIONING_WARN \
+extern type name ## _exp __attribute__((alias(RTE_STR(orig)))); \
+__asm__(".symver " RTE_STR(name) "_exp, " RTE_STR(name) "@EXPERIMENTAL");
+
 #endif /* __has_attribute(symver) */
 
 #else /* !RTE_BUILD_SHARED_LIB */
@@ -97,6 +113,12 @@ type name ## _exp args
 #define RTE_DEFAULT_SYMBOL(ver, type, name, args) VERSIONING_WARN \
 type name args
 
+#define RTE_VERSION_EXPERIMENTAL_SYMBOL_ALIAS(type, name, orig) VERSIONING_WARN \
+extern type name ## _exp __attribute__((alias(RTE_STR(orig))));
+
+#define RTE_DEFAULT_SYMBOL_ALIAS(ver, type, name, orig) VERSIONING_WARN \
+extern type name __attribute__((alias(RTE_STR(orig))));
+
 #endif /* RTE_BUILD_SHARED_LIB */
 
 #endif /* EAL_EXPORT_H */
-- 
2.47.3


^ permalink raw reply related

* [PATCH 3/5] net/mlx5: fix stabilized function versions
From: Dariusz Sosnowski @ 2026-06-23 11:37 UTC (permalink / raw)
  To: Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
	Matan Azrad
  Cc: dev, David Marchand, Bruce Richardson
In-Reply-To: <20260623113752.1100072-1-dsosnowski@nvidia.com>

Offending patch stabilized the following function symbols:

- rte_pmd_mlx5_driver_event_cb_register
- rte_pmd_mlx5_driver_event_cb_unregister
- rte_pmd_mlx5_enable_steering
- rte_pmd_mlx5_disable_steering

These function symbols were introduced in 25.11.
Any application using these functions, linked against 25.11 version,
would fail when used with 26.07 libraries, because only DPDK_26 versions
of these symbols were exported.

This patch fixes that by adding proper function symbol versioning
to these symbols.

Fixes: e8cab133645f ("net/mlx5: promote some private API to stable")

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/net/mlx5/meson.build         |  2 ++
 drivers/net/mlx5/mlx5_driver_event.c | 22 ++++++++++++++++------
 drivers/net/mlx5/mlx5_flow.c         | 18 ++++++++++++------
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/drivers/net/mlx5/meson.build b/drivers/net/mlx5/meson.build
index 82a7dfe782..0fa6322779 100644
--- a/drivers/net/mlx5/meson.build
+++ b/drivers/net/mlx5/meson.build
@@ -2,6 +2,8 @@
 # Copyright 2018 6WIND S.A.
 # Copyright 2018 Mellanox Technologies, Ltd
 
+use_function_versioning = true
+
 if not (is_linux or is_windows)
     build = false
     reason = 'only supported on Linux and Windows'
diff --git a/drivers/net/mlx5/mlx5_driver_event.c b/drivers/net/mlx5/mlx5_driver_event.c
index 89e49331c8..d0e22d6151 100644
--- a/drivers/net/mlx5/mlx5_driver_event.c
+++ b/drivers/net/mlx5/mlx5_driver_event.c
@@ -236,9 +236,8 @@ notify_existing_devices(rte_pmd_mlx5_driver_event_callback_t cb, void *opaque)
 		notify_existing_queues(port_id, cb, opaque);
 }
 
-RTE_EXPORT_SYMBOL(rte_pmd_mlx5_driver_event_cb_register)
-int
-rte_pmd_mlx5_driver_event_cb_register(rte_pmd_mlx5_driver_event_callback_t cb, void *opaque)
+RTE_DEFAULT_SYMBOL(26, int, rte_pmd_mlx5_driver_event_cb_register,
+		   (rte_pmd_mlx5_driver_event_callback_t cb, void *opaque))
 {
 	struct registered_cb *r;
 
@@ -264,9 +263,14 @@ rte_pmd_mlx5_driver_event_cb_register(rte_pmd_mlx5_driver_event_callback_t cb, v
 	return 0;
 }
 
-RTE_EXPORT_SYMBOL(rte_pmd_mlx5_driver_event_cb_unregister)
-int
-rte_pmd_mlx5_driver_event_cb_unregister(rte_pmd_mlx5_driver_event_callback_t cb)
+RTE_VERSION_EXPERIMENTAL_SYMBOL(int, rte_pmd_mlx5_driver_event_cb_register,
+				(rte_pmd_mlx5_driver_event_callback_t cb, void *opaque))
+{
+	return rte_pmd_mlx5_driver_event_cb_register(cb, opaque);
+}
+
+RTE_DEFAULT_SYMBOL(26, int, rte_pmd_mlx5_driver_event_cb_unregister,
+		   (rte_pmd_mlx5_driver_event_callback_t cb))
 {
 	struct registered_cb *r;
 	bool found = false;
@@ -289,6 +293,12 @@ rte_pmd_mlx5_driver_event_cb_unregister(rte_pmd_mlx5_driver_event_callback_t cb)
 	return 0;
 }
 
+RTE_VERSION_EXPERIMENTAL_SYMBOL(int, rte_pmd_mlx5_driver_event_cb_unregister,
+				(rte_pmd_mlx5_driver_event_callback_t cb))
+{
+	return rte_pmd_mlx5_driver_event_cb_unregister(cb);
+}
+
 RTE_FINI(rte_pmd_mlx5_driver_event_cb_cleanup) {
 	struct registered_cb *r;
 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index a95dd9dc94..4b984df892 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -12506,9 +12506,7 @@ flow_disable_steering_run_on_related(struct rte_eth_dev *dev,
 	}
 }
 
-RTE_EXPORT_SYMBOL(rte_pmd_mlx5_disable_steering)
-void
-rte_pmd_mlx5_disable_steering(void)
+RTE_DEFAULT_SYMBOL(26, void, rte_pmd_mlx5_disable_steering, (void))
 {
 	uint16_t port_id;
 
@@ -12532,9 +12530,12 @@ rte_pmd_mlx5_disable_steering(void)
 	mlx5_steering_disabled = true;
 }
 
-RTE_EXPORT_SYMBOL(rte_pmd_mlx5_enable_steering)
-int
-rte_pmd_mlx5_enable_steering(void)
+RTE_VERSION_EXPERIMENTAL_SYMBOL(void, rte_pmd_mlx5_disable_steering, (void))
+{
+	rte_pmd_mlx5_disable_steering();
+}
+
+RTE_DEFAULT_SYMBOL(26, int, rte_pmd_mlx5_enable_steering, (void))
 {
 	uint16_t port_id;
 
@@ -12551,6 +12552,11 @@ rte_pmd_mlx5_enable_steering(void)
 	return 0;
 }
 
+RTE_VERSION_EXPERIMENTAL_SYMBOL(int, rte_pmd_mlx5_enable_steering, (void))
+{
+	return rte_pmd_mlx5_enable_steering();
+}
+
 bool
 mlx5_vport_rx_metadata_passing_enabled(const struct mlx5_dev_ctx_shared *sh)
 {
-- 
2.47.3


^ permalink raw reply related

* [PATCH 2/5] drivers: support function versioning
From: Dariusz Sosnowski @ 2026-06-23 11:37 UTC (permalink / raw)
  To: David Marchand, Bruce Richardson; +Cc: dev
In-Reply-To: <20260623113752.1100072-1-dsosnowski@nvidia.com>

Add support for enabling function versioning
(through use_function_versioning meson variable) for drivers,
similar to libraries.

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 drivers/meson.build | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/meson.build b/drivers/meson.build
index 4d95604ecd..a63d93372a 100644
--- a/drivers/meson.build
+++ b/drivers/meson.build
@@ -171,6 +171,7 @@ foreach subpath:subdirs
         pkgconfig_extra_libs = []
         testpmd_sources = []
         require_iova_in_mbuf = true
+        use_function_versioning = false
         # for handling base code files which may need extra cflags
         base_sources = []
         base_cflags = []
@@ -273,6 +274,13 @@ foreach subpath:subdirs
         endif
         dpdk_conf.set(lib_name.to_upper(), 1)

+        if developer_mode and is_windows and use_function_versioning
+            message('@0@: Function versioning is not supported by Windows.'.format(name))
+        endif
+        if use_function_versioning
+            cflags += '-DRTE_USE_FUNCTION_VERSIONING'
+        endif
+
         dpdk_extra_ldflags += pkgconfig_extra_libs

         dpdk_headers += headers
--
2.47.3


^ permalink raw reply related

* [PATCH 0/5] add versioned symbols for recently stabilized APIs
From: Dariusz Sosnowski @ 2026-06-23 11:37 UTC (permalink / raw)
  To: Thomas Monjalon, David Marchand, Bruce Richardson,
	Andrew Rybchenko, Viacheslav Ovsiienko, Bing Zhao, Ori Kam,
	Suanming Mou, Matan Azrad
  Cc: dev

Main goal of this patchset is to address https://bugs.dpdk.org/show_bug.cgi?id=1957
but it also handles other recently stabilized symbols and has some minor fixes:

- Patch 1 - Fix RTE_VERSION_EXPERIMENTAL_SYMBOL macro on clang.
- Patch 2 - Allow function versioning inside drivers.
- Patch 3 - Version the function symbols stabilized in
  https://git.dpdk.org/dpdk/commit/?id=e8cab133645f5466ef75e511629add43b68a5027
- Patch 4 - Introduce versioning macros for global variable symbols.
- Patch 5 - Version the function and variable symbols stabilized in
  https://git.dpdk.org/dpdk/commit/?id=4ee2f5c1cedf9ee7f39afa667f71b07f4004ba5c

Issue is still not fully fixed for stabilized global variables:
rte_flow_dynf_metadata_offs and rte_flow_dynf_metadata_mask.
Patches 4 and 5 address the bug for these global variables
by providing a single storage for both the EXPERIMENTAL and
DPDK_26 variable symbol versions.
This is achieved through symbol aliasing.
But this solution only works for executables compiled with clang.

clang and gcc have a different default behavior regarding relocations
of global variables exposed by shared libraries.

With clang, R_X86_64_GLOB_DAT relocations are generated for executables:

   $ readelf -sW build-26.07/lib/librte_ethdev.so | grep rte_flow_dynf_metadata_offs
       113: 00000000000ea4c0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@@DPDK_26
       116: 00000000000ea4c0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@EXPERIMENTAL
       970: 00000000000ea4c0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_impl
      1212: 00000000000ea4c0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_v26
      1325: 00000000000ea4c0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_exp
      1415: 00000000000ea4c0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@@DPDK_26
      1705: 00000000000ea4c0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@EXPERIMENTAL

    $ readelf -rW build-26.07/drivers/librte_net_mlx5.so | grep rte_flow_dynf_metadata_offs
    0000000003ed5f18  0000001600000006 R_X86_64_GLOB_DAT      0000000000000000 rte_flow_dynf_metadata_offs@DPDK_26 + 0

    $ readelf -rW build-25.11/app/dpdk-testpmd | grep rte_flow_dynf_metadata_offs
--> 000000000028ef70  0000011300000006 R_X86_64_GLOB_DAT      0000000000000000 rte_flow_dynf_metadata_offs@EXPERIMENTAL + 0

With gcc, R_X86_64_COPY relocations are generated:

    $ readelf -sW build-26.07/lib/librte_ethdev.so | grep rte_flow_dynf_metadata_offs
       113: 00000000000e74e0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@@DPDK_26
       116: 00000000000e74e0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@EXPERIMENTAL
      1471: 00000000000e74e0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_impl
      2134: 00000000000e74e0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_v26
      2247: 00000000000e74e0     4 OBJECT  LOCAL  DEFAULT   24 rte_flow_dynf_metadata_offs_exp
      2337: 00000000000e74e0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@@DPDK_26
      2627: 00000000000e74e0     4 OBJECT  GLOBAL DEFAULT   24 rte_flow_dynf_metadata_offs@EXPERIMENTAL

    $ readelf -rW build-26.07/drivers/librte_net_mlx5.so | grep rte_flow_dynf_metadata_offs
    00000000046dbef0  0000001600000006 R_X86_64_GLOB_DAT      0000000000000000 rte_flow_dynf_metadata_offs@DPDK_26 + 0

    $ readelf -rW build-25.11/app/dpdk-testpmd | grep rte_flow_dynf_metadata_offs
--> 000000000029b540  000001d200000005 R_X86_64_COPY          000000000029b540 rte_flow_dynf_metadata_offs@EXPERIMENTAL + 0

With copy relocations (testpmd linked through gcc) the following happens:

- When the variable symbol (with EXPERIMENTAL version) gets resolved inside the executable,
  the variable's initial value gets copied from the shared library into the executable's BSS section.
  The executable will access this variable through BSS.
- When the variable symbol (with DPDK_26 version) gets resolved inside a library,
  the global variable is accessed indirectly through the GOT.
  It is stored inside the BSS section of the shared library.

So the executable and the libraries refer to different storage,
eventually leading to inconsistent runtime behavior.
The problem only appears when the executable and the library require
different versions of the global variable symbol.
If testpmd from 26.07 is used with libraries from 26.07,
the GOT entry for these variables will point to the copied variable.

Without copy relocations (testpmd linked through clang) both
executable and libraries access the global variable indirectly through GOT.
Runtime behavior is consistent, regardless of the mix of variable symbol versions.

The only other solution I could find was to use dlsym() inside libraries
to dynamically resolve the location of rte_flow_dynf_metadata_offs and rte_flow_dynf_metadata_mask,
but this solution sounds like overkill.
Essentially this would require moving to getter/setter functions for these variables
inside the library.
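
To make the getter/setter alternative concrete, here is a rough sketch under
my own assumptions. All demo_ names are hypothetical and not an existing or
proposed DPDK API. The storage stays private to the library and is only
reached through functions, so no copy relocation can be created for it.

```c
#include <assert.h>
#include <stdint.h>

/* Private storage: never exported, so no variable relocation is involved. */
static int32_t demo_metadata_offs = -1;
static uint64_t demo_metadata_mask;

/* Exported accessors; only function symbols would need versioning. */
int32_t
demo_metadata_offs_get(void)
{
	return demo_metadata_offs;
}

uint64_t
demo_metadata_mask_get(void)
{
	return demo_metadata_mask;
}

/* Hypothetical registration step that fills in both values. */
int
demo_metadata_register(int32_t offset, unsigned int flag_bit)
{
	if (offset < 0 || flag_bit >= 64) {
		demo_metadata_offs = -1;
		demo_metadata_mask = 0;
		return -1;
	}
	demo_metadata_offs = offset;
	demo_metadata_mask = UINT64_C(1) << flag_bit;
	return 0;
}

/* Registration with an invalid flag bit leaves the "unset" state. */
void
demo_metadata_selfcheck(void)
{
	assert(demo_metadata_register(8, 64) == -1);
	assert(demo_metadata_offs_get() == -1);
	assert(demo_metadata_mask_get() == 0);
}
```

The cost is a function call on every access instead of a direct load from an
inline helper, which is the performance trade-off that led to exporting the
variables in the first place.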

I would appreciate any feedback or suggestions if anybody has encountered a similar issue before.

Dariusz Sosnowski (5):
  eal: fix macro for versioned experimental symbol
  drivers: support function versioning
  net/mlx5: fix stabilized function versions
  eal: support aliases for versioned variable symbols
  ethdev: fix promoted flow metadata symbols

 buildtools/gen-version-map.py        | 11 ++++++++++
 drivers/meson.build                  |  8 +++++++
 drivers/net/mlx5/meson.build         |  2 ++
 drivers/net/mlx5/mlx5_driver_event.c | 22 ++++++++++++++-----
 drivers/net/mlx5/mlx5_flow.c         | 18 ++++++++++-----
 lib/eal/common/eal_export.h          | 24 +++++++++++++++++++-
 lib/ethdev/meson.build               |  2 ++
 lib/ethdev/rte_flow.c                | 33 ++++++++++++++++++----------
 8 files changed, 96 insertions(+), 24 deletions(-)

--
2.47.3


^ permalink raw reply

* [PATCH 1/5] eal: fix macro for versioned experimental symbol
From: Dariusz Sosnowski @ 2026-06-23 11:37 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Bruce Richardson
In-Reply-To: <20260623113752.1100072-1-dsosnowski@nvidia.com>

Add a missing semicolon after the __asm__ block in the
RTE_VERSION_EXPERIMENTAL_SYMBOL macro.
Its absence triggers the following compilation error with clang:

    ../lib/ethdev/rte_flow.c:320:1: error: expected ';' after top-level asm block
      320 | RTE_VERSION_EXPERIMENTAL_SYMBOL(int, rte_flow_dynf_metadata_register, (void))
          | ^
    ../lib/eal/common/eal_export.h:75:74: note: expanded from macro 'RTE_VERSION_EXPERIMENTAL_SYMBOL'
       75 | __asm__(".symver " RTE_STR(name) "_exp, " RTE_STR(name) "@EXPERIMENTAL") \
          |                                                                          ^
    ../lib/eal/include/rte_common.h:237:20: note: expanded from macro '\
    __rte_used'
      237 | #define __rte_used __attribute__((used))
          |                    ^

Fixes: e30e194c4d06 ("eal: rework function versioning macros")
Cc: david.marchand@redhat.com

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
 lib/eal/common/eal_export.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/eal/common/eal_export.h b/lib/eal/common/eal_export.h
index 888fd9f9ed..7971bf8d7a 100644
--- a/lib/eal/common/eal_export.h
+++ b/lib/eal/common/eal_export.h
@@ -72,7 +72,7 @@ __rte_used type name ## _v ## ver args; \
 type name ## _v ## ver args
 
 #define RTE_VERSION_EXPERIMENTAL_SYMBOL(type, name, args) VERSIONING_WARN \
-__asm__(".symver " RTE_STR(name) "_exp, " RTE_STR(name) "@EXPERIMENTAL") \
+__asm__(".symver " RTE_STR(name) "_exp, " RTE_STR(name) "@EXPERIMENTAL"); \
 __rte_used type name ## _exp args; \
 type name ## _exp args
 
-- 
2.47.3


^ permalink raw reply related

* [PATCH v3 4/4] net/txgbe: add VF support for Amber-Lite 40G NIC
From: Zaiyu Wang @ 2026-06-23 11:38 UTC (permalink / raw)
  To: dev; +Cc: Zaiyu Wang, Jiawen Wu
In-Reply-To: <20260623113805.16464-1-zaiyuwang@trustnetic.com>

VF support for the 40G NIC was previously omitted; only the 25G VF was
added. Now add 40G VF support based on the existing 25G VF implementation,
with no major changes beyond device ID adaptation.

Also, drop the redundant mac type check in txgbe_check_mac_link_vf(),
as the function now handles all VF types uniformly.

Signed-off-by: Zaiyu Wang <zaiyuwang@trustnetic.com>
---
 drivers/net/txgbe/base/txgbe_devids.h | 2 ++
 drivers/net/txgbe/base/txgbe_hw.c     | 7 +++++++
 drivers/net/txgbe/base/txgbe_regs.h   | 7 +++++--
 drivers/net/txgbe/base/txgbe_type.h   | 1 +
 drivers/net/txgbe/base/txgbe_vf.c     | 6 +++---
 drivers/net/txgbe/txgbe_ethdev.c      | 1 +
 drivers/net/txgbe/txgbe_ethdev_vf.c   | 2 ++
 7 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/net/txgbe/base/txgbe_devids.h b/drivers/net/txgbe/base/txgbe_devids.h
index b7133c7d54..f5454ffbb1 100644
--- a/drivers/net/txgbe/base/txgbe_devids.h
+++ b/drivers/net/txgbe/base/txgbe_devids.h
@@ -28,6 +28,8 @@
 #define TXGBE_DEV_ID_AML_VF			0x5001
 #define TXGBE_DEV_ID_AML5024_VF			0x5024
 #define TXGBE_DEV_ID_AML5124_VF			0x5124
+#define TXGBE_DEV_ID_AML503F_VF			0x503f
+#define TXGBE_DEV_ID_AML513F_VF			0x513f
 
 /*
  * Subsystem IDs
diff --git a/drivers/net/txgbe/base/txgbe_hw.c b/drivers/net/txgbe/base/txgbe_hw.c
index 0f3db3a1ad..21465d68ff 100644
--- a/drivers/net/txgbe/base/txgbe_hw.c
+++ b/drivers/net/txgbe/base/txgbe_hw.c
@@ -2543,6 +2543,7 @@ s32 txgbe_init_shared_code(struct txgbe_hw *hw)
 		break;
 	case txgbe_mac_sp_vf:
 	case txgbe_mac_aml_vf:
+	case txgbe_mac_aml40_vf:
 		status = txgbe_init_ops_vf(hw);
 		break;
 	default:
@@ -2573,6 +2574,7 @@ bool txgbe_is_vf(struct txgbe_hw *hw)
 	switch (hw->mac.type) {
 	case txgbe_mac_sp_vf:
 	case txgbe_mac_aml_vf:
+	case txgbe_mac_aml40_vf:
 		return true;
 	default:
 		return false;
@@ -2620,6 +2622,11 @@ s32 txgbe_set_mac_type(struct txgbe_hw *hw)
 		hw->phy.media_type = txgbe_media_type_virtual;
 		hw->mac.type = txgbe_mac_aml_vf;
 		break;
+	case TXGBE_DEV_ID_AML503F_VF:
+	case TXGBE_DEV_ID_AML513F_VF:
+		hw->phy.media_type = txgbe_media_type_virtual;
+		hw->mac.type = txgbe_mac_aml40_vf;
+		break;
 	default:
 		err = TXGBE_ERR_DEVICE_NOT_SUPPORTED;
 		DEBUGOUT("Unsupported device id: %x", hw->device_id);
diff --git a/drivers/net/txgbe/base/txgbe_regs.h b/drivers/net/txgbe/base/txgbe_regs.h
index 95c585a025..5eb92c54b6 100644
--- a/drivers/net/txgbe/base/txgbe_regs.h
+++ b/drivers/net/txgbe/base/txgbe_regs.h
@@ -1824,12 +1824,14 @@ txgbe_map_reg(struct txgbe_hw *hw, u32 reg)
 	switch (reg) {
 	case TXGBE_REG_RSSTBL:
 		if (hw->mac.type == txgbe_mac_sp_vf ||
-		    hw->mac.type == txgbe_mac_aml_vf)
+		    hw->mac.type == txgbe_mac_aml_vf ||
+		    hw->mac.type == txgbe_mac_aml40_vf)
 			reg = TXGBE_VFRSSTBL(0);
 		break;
 	case TXGBE_REG_RSSKEY:
 		if (hw->mac.type == txgbe_mac_sp_vf ||
-		    hw->mac.type == txgbe_mac_aml_vf)
+		    hw->mac.type == txgbe_mac_aml_vf ||
+		    hw->mac.type == txgbe_mac_aml40_vf)
 			reg = TXGBE_VFRSSKEY(0);
 		break;
 	default:
@@ -2012,6 +2014,7 @@ static inline void txgbe_flush(struct txgbe_hw *hw)
 		break;
 	case txgbe_mac_sp_vf:
 	case txgbe_mac_aml_vf:
+	case txgbe_mac_aml40_vf:
 		rd32(hw, TXGBE_VFSTATUS);
 		break;
 	default:
diff --git a/drivers/net/txgbe/base/txgbe_type.h b/drivers/net/txgbe/base/txgbe_type.h
index 956080c702..132d5c4eff 100644
--- a/drivers/net/txgbe/base/txgbe_type.h
+++ b/drivers/net/txgbe/base/txgbe_type.h
@@ -171,6 +171,7 @@ enum txgbe_mac_type {
 	txgbe_mac_aml40,
 	txgbe_mac_sp_vf,
 	txgbe_mac_aml_vf,
+	txgbe_mac_aml40_vf,
 	txgbe_num_macs
 };
 
diff --git a/drivers/net/txgbe/base/txgbe_vf.c b/drivers/net/txgbe/base/txgbe_vf.c
index 1a8a20f104..47d9bd16ee 100644
--- a/drivers/net/txgbe/base/txgbe_vf.c
+++ b/drivers/net/txgbe/base/txgbe_vf.c
@@ -134,7 +134,8 @@ s32 txgbe_reset_hw_vf(struct txgbe_hw *hw)
 	}
 
 	/* amlite: bme */
-	if (hw->mac.type == txgbe_mac_aml_vf)
+	if (hw->mac.type == txgbe_mac_aml_vf ||
+	    hw->mac.type == txgbe_mac_aml40_vf)
 		wr32(hw, TXGBE_BME_AML, 0x1);
 
 	if (!timeout)
@@ -493,8 +494,7 @@ s32 txgbe_check_mac_link_vf(struct txgbe_hw *hw, u32 *speed,
 	/* for SFP+ modules and DA cables it can take up to 500usecs
 	 * before the link status is correct
 	 */
-	if ((mac->type == txgbe_mac_sp_vf ||
-	     mac->type == txgbe_mac_aml_vf) && wait_to_complete) {
+	if (wait_to_complete) {
 		if (po32m(hw, TXGBE_VFSTATUS, TXGBE_VFSTATUS_UP,
 			0, NULL, 5, 100))
 			goto out;
diff --git a/drivers/net/txgbe/txgbe_ethdev.c b/drivers/net/txgbe/txgbe_ethdev.c
index 003a24141c..63b967d71a 100644
--- a/drivers/net/txgbe/txgbe_ethdev.c
+++ b/drivers/net/txgbe/txgbe_ethdev.c
@@ -5228,6 +5228,7 @@ txgbe_rss_update(enum txgbe_mac_type mac_type)
 	case txgbe_mac_aml:
 	case txgbe_mac_aml40:
 	case txgbe_mac_aml_vf:
+	case txgbe_mac_aml40_vf:
 		return 1;
 	default:
 		return 0;
diff --git a/drivers/net/txgbe/txgbe_ethdev_vf.c b/drivers/net/txgbe/txgbe_ethdev_vf.c
index e3832c0173..14cc49ece1 100644
--- a/drivers/net/txgbe/txgbe_ethdev_vf.c
+++ b/drivers/net/txgbe/txgbe_ethdev_vf.c
@@ -77,6 +77,8 @@ static const struct rte_pci_id pci_id_txgbevf_map[] = {
 	{ RTE_PCI_DEVICE(PCI_VENDOR_ID_WANGXUN, TXGBE_DEV_ID_AML_VF) },
 	{ RTE_PCI_DEVICE(PCI_VENDOR_ID_WANGXUN, TXGBE_DEV_ID_AML5024_VF) },
 	{ RTE_PCI_DEVICE(PCI_VENDOR_ID_WANGXUN, TXGBE_DEV_ID_AML5124_VF) },
+	{ RTE_PCI_DEVICE(PCI_VENDOR_ID_WANGXUN, TXGBE_DEV_ID_AML503F_VF) },
+	{ RTE_PCI_DEVICE(PCI_VENDOR_ID_WANGXUN, TXGBE_DEV_ID_AML513F_VF) },
 	{ .vendor_id = 0, /* sentinel */ },
 };
 
-- 
2.21.0.windows.1


^ permalink raw reply related

* [PATCH v3 3/4] net/txgbe: add support for VF sensing PF down
From: Zaiyu Wang @ 2026-06-23 11:38 UTC (permalink / raw)
  To: dev; +Cc: Zaiyu Wang, Jiawen Wu
In-Reply-To: <20260623113805.16464-1-zaiyuwang@trustnetic.com>

VFs should continue normal packet Rx/Tx after PF ifconfig down/up.

To achieve this, cooperate with mailbox commands added in our Linux
kernel driver txgbe-2.2.0. When mailbox messages lack the
TXGBE_VT_MSGTYPE_CTS flag, the PF is considered down. In this state,
the VF reports link down and stops transmitting. Upon detecting the
loss of CTS, the VF sends a reset request to the PF. If the request
succeeds (indicating PF recovery), the VF triggers an
RTE_ETH_EVENT_INTR_RESET event to notify the application or users to
reset the VF.

Additionally, hw->rx_loaded and hw->offset_loaded must be reset when
the PF is brought down with ifconfig; otherwise, because hardware counter registers are
cleared during PF reset, the VF's software counters will overflow to
0xFFFFFFFF.
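
The arithmetic behind this can be illustrated with a small standalone sketch.
It assumes a software counter that accumulates the 32-bit difference between
two register reads; the struct and function names are hypothetical and do not
reflect the actual driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-counter state; field names are illustrative only. */
struct demo_vf_counter {
	bool loaded;     /* whether a baseline register value was taken */
	uint32_t last;   /* previous raw 32-bit register value */
	uint64_t total;  /* accumulated software counter */
};

/* Accumulate the difference between two raw reads, modulo 2^32. */
void
demo_counter_update(struct demo_vf_counter *c, uint32_t raw)
{
	if (!c->loaded) {
		/* First read after a (re)load only records the baseline. */
		c->last = raw;
		c->loaded = true;
		return;
	}
	c->total += (uint32_t)(raw - c->last);
	c->last = raw;
}

/* Drop the baseline, as needed once the hardware register was cleared. */
void
demo_counter_rebase(struct demo_vf_counter *c)
{
	c->loaded = false;
}

/* Register cleared by a PF reset, baseline kept: the delta wraps around. */
uint64_t
demo_total_without_rebase(void)
{
	struct demo_vf_counter c = { .loaded = false };

	demo_counter_update(&c, 1000);
	demo_counter_update(&c, 1500);
	demo_counter_update(&c, 10);   /* register restarted from zero */
	assert(c.total > UINT32_C(0xFFFFF000));
	return c.total;
}

/* Same sequence, but the baseline is dropped when the PF goes down. */
uint64_t
demo_total_with_rebase(void)
{
	struct demo_vf_counter c = { .loaded = false };

	demo_counter_update(&c, 1000);
	demo_counter_update(&c, 1500);
	demo_counter_rebase(&c);
	demo_counter_update(&c, 10);
	demo_counter_update(&c, 30);
	return c.total;
}
```

Without the rebase, the read after the reset yields a delta of
(10 - 1500) mod 2^32, so the accumulated value jumps close to 2^32.
With the rebase, only the 500 + 20 real increments are counted.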

Signed-off-by: Zaiyu Wang <zaiyuwang@trustnetic.com>
---
 drivers/net/txgbe/base/txgbe_type.h |  1 +
 drivers/net/txgbe/txgbe_ethdev.c    |  3 +-
 drivers/net/txgbe/txgbe_ethdev_vf.c | 60 +++++++++++++++++++++++++----
 3 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/drivers/net/txgbe/base/txgbe_type.h b/drivers/net/txgbe/base/txgbe_type.h
index ede780321f..956080c702 100644
--- a/drivers/net/txgbe/base/txgbe_type.h
+++ b/drivers/net/txgbe/base/txgbe_type.h
@@ -883,6 +883,7 @@ struct txgbe_hw {
 	rte_atomic32_t swfw_busy;
 	u32 fec_mode;
 	u32 cur_fec_link;
+	bool pf_running;
 };
 
 struct txgbe_backplane_ability {
diff --git a/drivers/net/txgbe/txgbe_ethdev.c b/drivers/net/txgbe/txgbe_ethdev.c
index 0f484dfe91..003a24141c 100644
--- a/drivers/net/txgbe/txgbe_ethdev.c
+++ b/drivers/net/txgbe/txgbe_ethdev.c
@@ -3150,7 +3150,8 @@ txgbe_dev_link_update_share(struct rte_eth_dev *dev,
 
 	hw->mac.get_link_status = true;
 
-	if (intr->flags & TXGBE_FLAG_NEED_LINK_CONFIG)
+	if (intr->flags & TXGBE_FLAG_NEED_LINK_CONFIG ||
+	    (txgbe_is_vf(hw) && !hw->pf_running))
 		return rte_eth_linkstatus_set(dev, &link);
 
 	/* check if it needs to wait to complete, if lsc interrupt is enabled */
diff --git a/drivers/net/txgbe/txgbe_ethdev_vf.c b/drivers/net/txgbe/txgbe_ethdev_vf.c
index 7a50c7a855..e3832c0173 100644
--- a/drivers/net/txgbe/txgbe_ethdev_vf.c
+++ b/drivers/net/txgbe/txgbe_ethdev_vf.c
@@ -281,6 +281,7 @@ eth_txgbevf_dev_init(struct rte_eth_dev *eth_dev)
 	hw->subsystem_device_id = pci_dev->id.subsystem_device_id;
 	hw->subsystem_vendor_id = pci_dev->id.subsystem_vendor_id;
 	hw->hw_addr = (void *)pci_dev->mem_resource[0].addr;
+	hw->pf_running = true;
 
 	/* initialize the vfta */
 	memset(shadow_vfta, 0, sizeof(*shadow_vfta));
@@ -1405,10 +1406,20 @@ static s32 txgbevf_get_pf_link_status(struct rte_eth_dev *dev)
 	if (retval)
 		return 0;
 
+	if (!(msgbuf[0] & TXGBE_NOFITY_VF_LINK_STATUS))
+		return 0;
+
 	rte_eth_linkstatus_get(dev, &link);
 
+	if (!hw->pf_running) {
+		link.link_status =  RTE_ETH_LINK_DOWN;
+		link.link_speed = RTE_ETH_SPEED_NUM_NONE;
+		link.link_duplex = RTE_ETH_LINK_HALF_DUPLEX;
+		return rte_eth_linkstatus_set(dev, &link);
+	}
+
 	link_up = msgbuf[1] & TXGBE_VFSTATUS_UP;
-	link_speed = (msgbuf[1] & 0xFFF0) >> 1;
+	link_speed = (msgbuf[1] & 0x1FFFFE) >> 1;
 
 	if (link_up == link.link_status && link_speed == link.link_speed)
 		return 0;
@@ -1434,10 +1445,22 @@ static s32 txgbevf_get_pf_link_status(struct rte_eth_dev *dev)
 static void txgbevf_check_link_for_intr(struct rte_eth_dev *dev)
 {
 	struct rte_eth_link orig_link, new_link;
+	struct txgbe_hw *hw = TXGBE_DEV_HW(dev);
 
 	rte_eth_linkstatus_get(dev, &orig_link);
-	txgbevf_dev_link_update(dev, 0);
-	rte_eth_linkstatus_get(dev, &new_link);
+
+	if (hw->pf_running) {
+		txgbevf_dev_link_update(dev, 0);
+		rte_eth_linkstatus_get(dev, &new_link);
+	} else {
+		DEBUGOUT("PF ifconfig down, so VF link down");
+		new_link.link_status = RTE_ETH_LINK_DOWN;
+		new_link.link_speed = RTE_ETH_SPEED_NUM_NONE;
+		new_link.link_duplex = RTE_ETH_LINK_HALF_DUPLEX;
+		new_link.link_autoneg = !(dev->data->dev_conf.link_speeds &
+					  RTE_ETH_LINK_SPEED_FIXED);
+		rte_eth_linkstatus_set(dev, &new_link);
+	}
 
 	PMD_DRV_LOG(INFO, "orig_link: %d, new_link: %d",
 		    orig_link.link_status, new_link.link_status);
@@ -1450,6 +1473,8 @@ static void txgbevf_check_link_for_intr(struct rte_eth_dev *dev)
 static void txgbevf_mbx_process(struct rte_eth_dev *dev)
 {
 	struct txgbe_hw *hw = TXGBE_DEV_HW(dev);
+	struct txgbe_mbx_info *mbx = &hw->mbx;
+	u32 msgbuf = 0;
 	u32 in_msg = 0;
 
 	/* peek the message first */
@@ -1457,14 +1482,33 @@ static void txgbevf_mbx_process(struct rte_eth_dev *dev)
 
 	/* PF reset VF event */
 	if (in_msg & TXGBE_PF_CONTROL_MSG) {
-		if (in_msg & TXGBE_NOFITY_VF_LINK_STATUS) {
+		/* msg is not CTS, we need to do reset */
+		if (!(in_msg & TXGBE_VT_MSGTYPE_CTS)) {
+			/* send reset to PF to reconfig CTS flag */
+			int err = 0;
+
+			msgbuf = TXGBE_VF_RESET;
+			err = mbx->write_posted(hw, &msgbuf, 1, 0);
+			if (err) {
+				hw->pf_running = false;
+				txgbevf_check_link_for_intr(dev);
+			} else {
+				hw->pf_running = true;
+				rte_eth_dev_callback_process(dev, RTE_ETH_EVENT_INTR_RESET,
+							     NULL);
+			}
+		}
+
+		if (in_msg & TXGBE_NOFITY_VF_LINK_STATUS)
 			txgbevf_get_pf_link_status(dev);
-		} else {
-			/* dummy mbx read to ack pf */
-			txgbe_read_mbx(hw, &in_msg, 1, 0);
+		else
 			/* check link status if pf ping vf */
 			txgbevf_check_link_for_intr(dev);
-		}
+	}
+
+	if (!hw->pf_running) {
+		hw->rx_loaded = true;
+		hw->offset_loaded = true;
 	}
 }
 
-- 
2.21.0.windows.1




* [PATCH v3 2/4] net/txgbe: implement USO support
From: Zaiyu Wang @ 2026-06-23 11:38 UTC (permalink / raw)
  To: dev; +Cc: Zaiyu Wang, stable, Jiawen Wu, Ferruh Yigit
In-Reply-To: <20260623113805.16464-1-zaiyuwang@trustnetic.com>

USO (UDP Segmentation Offload), also known as UFO (UDP Fragmentation
Offload), is a hardware offload rarely seen in DPDK. Its implementation
is similar to TSO (TCP Segmentation Offload), so the driver enables
USO based on existing TSO support.

The driver has advertised RTE_ETH_TX_OFFLOAD_UDP_TSO in tx_offload_capa
since its initial integration, but the data path never implemented the
actual segmentation support. This commit fills that gap by enabling USO
in the transmit path, making the advertised capability fully functional.

Note:
USO segments UDP packets, requiring hardware to recalculate both IP
and UDP checksums due to length change. Thus, USO implicitly requires
IP and UDP checksum offloads, same as TSO.
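
The core of the change is that every place testing the TCP segmentation
flag now tests either flag. A minimal sketch of that check, using
hypothetical SKETCH_* bit positions rather than the real
RTE_MBUF_F_TX_* values:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SKETCH_TX_TCP_SEG (UINT64_C(1) << 0)
#define SKETCH_TX_UDP_SEG (UINT64_C(1) << 1)
#define SKETCH_TX_VLAN    (UINT64_C(1) << 2)

/* True when the packet requests TCP or UDP segmentation offload. */
static bool
sketch_needs_segmentation(uint64_t ol_flags)
{
	return (ol_flags & (SKETCH_TX_TCP_SEG | SKETCH_TX_UDP_SEG)) != 0;
}
```

Testing against the OR of both bits keeps the TSO and USO paths on the
same descriptor setup, which matches the "based on existing TSO
support" approach described above.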

Fixes: 86d8adc7702c ("net/txgbe: support getting device info")
Cc: stable@dpdk.org

Signed-off-by: Zaiyu Wang <zaiyuwang@trustnetic.com>
---
 drivers/net/txgbe/txgbe_rxtx.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/txgbe/txgbe_rxtx.c b/drivers/net/txgbe/txgbe_rxtx.c
index e2cd9b8841..c4cbdbc2b4 100644
--- a/drivers/net/txgbe/txgbe_rxtx.c
+++ b/drivers/net/txgbe/txgbe_rxtx.c
@@ -58,6 +58,7 @@ static const u64 TXGBE_TX_OFFLOAD_MASK = (RTE_MBUF_F_TX_IP_CKSUM |
 		RTE_MBUF_F_TX_VLAN |
 		RTE_MBUF_F_TX_L4_MASK |
 		RTE_MBUF_F_TX_TCP_SEG |
+		RTE_MBUF_F_TX_UDP_SEG |
 		RTE_MBUF_F_TX_TUNNEL_MASK |
 		RTE_MBUF_F_TX_OUTER_IP_CKSUM |
 		RTE_MBUF_F_TX_OUTER_UDP_CKSUM |
@@ -367,7 +368,7 @@ txgbe_set_xmit_ctx(struct txgbe_tx_queue *txq,
 	type_tucmd_mlhl |= TXGBE_TXD_PTID(tx_offload.ptid);
 
 	/* check if TCP segmentation required for this packet */
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 		tx_offload_mask.l2_len |= ~0;
 		tx_offload_mask.l3_len |= ~0;
 		tx_offload_mask.l4_len |= ~0;
@@ -517,7 +518,7 @@ tx_desc_cksum_flags_to_olinfo(uint64_t ol_flags)
 		tmp |= TXGBE_TXD_CC;
 		tmp |= TXGBE_TXD_EIPCS;
 	}
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 		tmp |= TXGBE_TXD_CC;
 		/* implies IPv4 cksum */
 		if (ol_flags & RTE_MBUF_F_TX_IPV4)
@@ -537,7 +538,7 @@ tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 
 	if (ol_flags & RTE_MBUF_F_TX_VLAN)
 		cmdtype |= TXGBE_TXD_VLE;
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
 		cmdtype |= TXGBE_TXD_TSE;
 	if (ol_flags & RTE_MBUF_F_TX_MACSEC)
 		cmdtype |= TXGBE_TXD_LINKSEC;
@@ -587,6 +588,8 @@ tx_desc_ol_flags_to_ptype(uint64_t oflags)
 
 	if (oflags & RTE_MBUF_F_TX_TCP_SEG)
 		ptype |= (tun ? RTE_PTYPE_INNER_L4_TCP : RTE_PTYPE_L4_TCP);
+	else if (oflags & RTE_MBUF_F_TX_UDP_SEG)
+		ptype |= (tun ? RTE_PTYPE_INNER_L4_UDP : RTE_PTYPE_L4_UDP);
 
 	/* Tunnel */
 	switch (oflags & RTE_MBUF_F_TX_TUNNEL_MASK) {
@@ -1071,7 +1074,7 @@ txgbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		olinfo_status = 0;
 		if (tx_ol_req) {
-			if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 				/* when TSO is on, paylen in descriptor is the
 				 * not the packet len but the tcp payload len
 				 */
@@ -2389,7 +2392,7 @@ txgbe_get_tx_port_offloads(struct rte_eth_dev *dev)
 		RTE_ETH_TX_OFFLOAD_TCP_CKSUM   |
 		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM  |
 		RTE_ETH_TX_OFFLOAD_TCP_TSO     |
-		RTE_ETH_TX_OFFLOAD_UDP_TSO	   |
+		RTE_ETH_TX_OFFLOAD_UDP_TSO     |
 		RTE_ETH_TX_OFFLOAD_UDP_TNL_TSO	|
 		RTE_ETH_TX_OFFLOAD_IP_TNL_TSO	|
 		RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO	|
-- 
2.21.0.windows.1



* [PATCH v3 1/4] net/ngbe: implement USO support
From: Zaiyu Wang @ 2026-06-23 11:38 UTC (permalink / raw)
  To: dev; +Cc: Zaiyu Wang, stable, Jiawen Wu
In-Reply-To: <20260623113805.16464-1-zaiyuwang@trustnetic.com>

USO (UDP Segmentation Offload), also known as UFO (UDP Fragmentation
Offload), is a hardware offload rarely seen in DPDK. Its implementation
is similar to TSO (TCP Segmentation Offload), so the driver enables
USO based on existing TSO support.

The driver has advertised RTE_ETH_TX_OFFLOAD_UDP_TSO in tx_offload_capa
since its initial integration, but the data path never implemented the
actual segmentation support. This commit fills that gap by enabling USO
in the transmit path, making the advertised capability fully functional.

Note:
USO segments UDP packets, requiring hardware to recalculate both IP
and UDP checksums due to length change. Thus, USO implicitly requires
IP and UDP checksum offloads, same as TSO.
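
As a worked example of the length change, the following sketch computes
the L4 payload length and the resulting number of segments. The helper
names and the seg_size parameter are assumptions made for illustration;
the exact hardware behaviour is not described here.

```c
#include <stdint.h>

/* Payload carried after the L2, L3 and L4 headers. */
static uint32_t
sketch_l4_payload_len(uint32_t pkt_len, uint32_t l2_len,
		      uint32_t l3_len, uint32_t l4_len)
{
	return pkt_len - (l2_len + l3_len + l4_len);
}

/* Number of segments needed for the payload (rounded up). */
static uint32_t
sketch_segment_count(uint32_t payload_len, uint32_t seg_size)
{
	return (payload_len + seg_size - 1) / seg_size;
}
```

For a 9042-byte packet with Ethernet (14), IPv4 (20) and UDP (8)
headers, the payload is 9000 bytes; with a segment size of 1472 bytes
that gives 7 segments, each needing its own IP and UDP length and
checksum fields.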

Fixes: 9f3206140274 ("net/ngbe: support TSO")
Cc: stable@dpdk.org

Signed-off-by: Zaiyu Wang <zaiyuwang@trustnetic.com>
---
 drivers/net/ngbe/ngbe_rxtx.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ngbe/ngbe_rxtx.c b/drivers/net/ngbe/ngbe_rxtx.c
index 91e215694c..a1389de9c0 100644
--- a/drivers/net/ngbe/ngbe_rxtx.c
+++ b/drivers/net/ngbe/ngbe_rxtx.c
@@ -30,6 +30,7 @@ static const u64 NGBE_TX_OFFLOAD_MASK = (RTE_MBUF_F_TX_IP_CKSUM |
 		RTE_MBUF_F_TX_VLAN |
 		RTE_MBUF_F_TX_L4_MASK |
 		RTE_MBUF_F_TX_TCP_SEG |
+		RTE_MBUF_F_TX_UDP_SEG |
 		NGBE_TX_IEEE1588_TMST);
 
 #define NGBE_TX_OFFLOAD_NOTSUP_MASK \
@@ -317,7 +318,7 @@ ngbe_set_xmit_ctx(struct ngbe_tx_queue *txq,
 	type_tucmd_mlhl |= NGBE_TXD_PTID(tx_offload.ptid);
 
 	/* check if TCP segmentation required for this packet */
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 		tx_offload_mask.l2_len |= ~0;
 		tx_offload_mask.l3_len |= ~0;
 		tx_offload_mask.l4_len |= ~0;
@@ -427,7 +428,7 @@ tx_desc_cksum_flags_to_olinfo(uint64_t ol_flags)
 		tmp |= NGBE_TXD_CC;
 		tmp |= NGBE_TXD_EIPCS;
 	}
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 		tmp |= NGBE_TXD_CC;
 		/* implies IPv4 cksum */
 		if (ol_flags & RTE_MBUF_F_TX_IPV4)
@@ -447,7 +448,7 @@ tx_desc_ol_flags_to_cmdtype(uint64_t ol_flags)
 
 	if (ol_flags & RTE_MBUF_F_TX_VLAN)
 		cmdtype |= NGBE_TXD_VLE;
-	if (ol_flags & RTE_MBUF_F_TX_TCP_SEG)
+	if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
 		cmdtype |= NGBE_TXD_TSE;
 	return cmdtype;
 }
@@ -483,6 +484,8 @@ tx_desc_ol_flags_to_ptype(uint64_t oflags)
 
 	if (oflags & RTE_MBUF_F_TX_TCP_SEG)
 		ptype |= RTE_PTYPE_L4_TCP;
+	else if (oflags & RTE_MBUF_F_TX_UDP_SEG)
+		ptype |= RTE_PTYPE_L4_UDP;
 
 	return ptype;
 }
@@ -764,7 +767,7 @@ ngbe_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts,
 
 		olinfo_status = 0;
 		if (tx_ol_req) {
-			if (ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
+			if (ol_flags & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG)) {
 				/* when TSO is on, paylen in descriptor is the
 				 * not the packet len but the tcp payload len
 				 */
@@ -1991,7 +1994,7 @@ ngbe_get_tx_port_offloads(struct rte_eth_dev *dev)
 		RTE_ETH_TX_OFFLOAD_TCP_CKSUM   |
 		RTE_ETH_TX_OFFLOAD_SCTP_CKSUM  |
 		RTE_ETH_TX_OFFLOAD_TCP_TSO     |
-		RTE_ETH_TX_OFFLOAD_UDP_TSO	   |
+		RTE_ETH_TX_OFFLOAD_UDP_TSO     |
 		RTE_ETH_TX_OFFLOAD_MULTI_SEGS;
 
 	if (hw->is_pf)
-- 
2.21.0.windows.1



* [PATCH v3 0/4] Wangxun fixes and new features
From: Zaiyu Wang @ 2026-06-23 11:38 UTC (permalink / raw)
  To: dev; +Cc: Zaiyu Wang
In-Reply-To: <20260617105959.10764-1-zaiyuwang@trustnetic.com>

This patchset introduces three new features and critical fixes for our
recent release cycle.

Patches 1-2 add support for UDP Segmentation Offload (USO) to improve
large-packet transmission performance for UDP workloads.

Patch 3 enables VFs to sense PF ifconfig down/up events, allowing
better fault tolerance and fast recovery in virtualized environments.

Patch 4 adds the missing VF support for the Amber-Lite 40G NICs, which
was omitted in the initial integration.
---
v3:
- Patches 1-2: change from new feature to bug fix.
- Patch 3: fix link status update in txgbevf_get_pf_link_status();
           extend speed mask from 0xFFF0 to 0x1FFFFE for 40G speed;
           reduce msgbuf array to a single u32 variable;
           correct commit message.
- Patch 4: add a cleanup note in commit message for dropping the
           redundant mac type check in txgbevf_check_mac_link_vf();
           remove a redundant blank line in txgbe_reset_hw_vf().
---
v2:
- Rebased on top of commit 72fdcb7bd19d to resolve conflict in
  drivers/net/txgbe/base/txgbe_type.h.
- No code changes compared to v1.
---

Zaiyu Wang (4):
  net/ngbe: implement USO support
  net/txgbe: implement USO support
  net/txgbe: add support for VF sensing PF down
  net/txgbe: add VF support for Amber-Lite 40G NIC

 drivers/net/ngbe/ngbe_rxtx.c          | 13 +++---
 drivers/net/txgbe/base/txgbe_devids.h |  2 +
 drivers/net/txgbe/base/txgbe_hw.c     |  7 +++
 drivers/net/txgbe/base/txgbe_regs.h   |  7 ++-
 drivers/net/txgbe/base/txgbe_type.h   |  2 +
 drivers/net/txgbe/base/txgbe_vf.c     |  6 +--
 drivers/net/txgbe/txgbe_ethdev.c      |  4 +-
 drivers/net/txgbe/txgbe_ethdev_vf.c   | 62 +++++++++++++++++++++++----
 drivers/net/txgbe/txgbe_rxtx.c        | 13 +++---
 9 files changed, 92 insertions(+), 24 deletions(-)

-- 
2.21.0.windows.1



* Re: [PATCH v3 07/11] bus/ifpga: allocate interrupt during probing
From: Bruce Richardson @ 2026-06-23 11:25 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, thomas, stephen, fengchengwen, longli, hemant.agrawal,
	Rosen Xu
In-Reply-To: <20260623105439.2144694-8-david.marchand@redhat.com>

On Tue, Jun 23, 2026 at 12:54:34PM +0200, David Marchand wrote:
> Allocating the interrupt handle is a waste of memory if no device is
> probed later (for example, if an allowlist is passed).
> Instead, allocate this handle at the time probe_device is called.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>


* [PATCH] net/intel: fix use of non-recommended string functions
From: Bruce Richardson @ 2026-06-23 11:22 UTC (permalink / raw)
  To: dev
  Cc: Praveen Shetty, Anatoly Burakov, Vladimir Medvedkin, Shaiq Wani,
	Ciara Loftus, Bruce Richardson, stable

Replace use of the strncpy and strcpy functions with the safer strlcpy
alternative, which both bounds-checks and guarantees null termination.
In the process also replace instances of strcat with strlcat where
appropriate.
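
To illustrate the semantics relied on here, below is a minimal
stand-in with strlcpy-like behaviour. sketch_strlcpy() is a
hypothetical helper written for this example, not the function used by
the patch: it never writes more than size bytes, always terminates the
destination when size is non-zero, and returns the source length so
truncation can be detected.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, writing at most size bytes including the
 * terminating NUL. Returns strlen(src); a return value >= size means
 * the copy was truncated.
 */
static size_t
sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size != 0) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}
```

By contrast, strncpy(dst, src, size) leaves dst unterminated whenever
strlen(src) >= size, and strcpy performs no bounds check at all.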

Fixes: 2d823ecd671c ("net/cpfl: support device initialization")
Fixes: c4c59ae62793 ("net/cpfl: refactor flow parser")
Fixes: c10881d3ee74 ("net/cpfl: support flow prog action")
Fixes: 9481b0902efe ("net/ice: send driver version to firmware")
Fixes: 7f7cbf80bdb7 ("net/ice: factorize firmware loading")
Fixes: 549343c25db8 ("net/idpf: support device initialization")
Fixes: 484f8e407a94 ("net/igb: support xstats by ID")
Fixes: fca82a8accf9 ("net/ixgbe: support xstats by ID")
Fixes: e163c18a15b0 ("net/i40e: update ptype and pctype info")
Cc: stable@dpdk.org

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/intel/cpfl/cpfl_ethdev.c      |  2 +-
 drivers/net/intel/cpfl/cpfl_flow_parser.c | 16 +++++++---------
 drivers/net/intel/e1000/igb_ethdev.c      |  5 +++--
 drivers/net/intel/i40e/i40e_ethdev.c      |  3 +--
 drivers/net/intel/ice/ice_ethdev.c        | 16 +++++++---------
 drivers/net/intel/idpf/idpf_ethdev.c      |  2 +-
 drivers/net/intel/ixgbe/ixgbe_ethdev.c    |  5 +++--
 7 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/drivers/net/intel/cpfl/cpfl_ethdev.c b/drivers/net/intel/cpfl/cpfl_ethdev.c
index 7ac8797490..4315adb68c 100644
--- a/drivers/net/intel/cpfl/cpfl_ethdev.c
+++ b/drivers/net/intel/cpfl/cpfl_ethdev.c
@@ -2534,7 +2534,7 @@ cpfl_adapter_ext_init(struct rte_pci_device *pci_dev, struct cpfl_adapter_ext *a
 	hw->subsystem_vendor_id = pci_dev->id.subsystem_vendor_id;
 	adapter->host_id = get_running_host_id();
 
-	strncpy(adapter->name, pci_dev->device.name, PCI_PRI_STR_SIZE);
+	strlcpy(adapter->name, pci_dev->device.name, sizeof(adapter->name));
 
 	memcpy(&base->caps, &req_caps, sizeof(struct virtchnl2_get_capabilities));
 
diff --git a/drivers/net/intel/cpfl/cpfl_flow_parser.c b/drivers/net/intel/cpfl/cpfl_flow_parser.c
index dfaddc9ec5..b1d06725e9 100644
--- a/drivers/net/intel/cpfl/cpfl_flow_parser.c
+++ b/drivers/net/intel/cpfl/cpfl_flow_parser.c
@@ -211,7 +211,7 @@ cpfl_flow_js_pattern_key_proto_field(json_t *ob_fields,
 			PMD_DRV_LOG(ERR, "The 'name' is too long.");
 			goto err;
 		}
-		strncpy(js_field->fields[i].name, name, CPFL_JS_STR_SIZE - 1);
+		strlcpy(js_field->fields[i].name, name, CPFL_JS_STR_SIZE);
 
 		if (js_field->type == RTE_FLOW_ITEM_TYPE_ETH ||
 		    js_field->type == RTE_FLOW_ITEM_TYPE_IPV4) {
@@ -716,8 +716,7 @@ cpfl_flow_js_mr_key(json_t *ob_mr_keys, struct cpfl_flow_js_mr_key *js_mr_key)
 					PMD_DRV_LOG(ERR, "The 'name' is too long.");
 					goto err;
 				}
-				strncpy(js_mr_key->actions[i].prog.name, name,
-					CPFL_JS_STR_SIZE - 1);
+				strlcpy(js_mr_key->actions[i].prog.name, name, CPFL_JS_STR_SIZE);
 			}
 
 			ob_param = json_object_get(object, "parameters");
@@ -742,8 +741,8 @@ cpfl_flow_js_mr_key(json_t *ob_mr_keys, struct cpfl_flow_js_mr_key *js_mr_key)
 						PMD_DRV_LOG(ERR, "The 'name' is too long.");
 						goto err;
 					}
-					strncpy(js_mr_key->actions[i].prog.params[j].name, name,
-						CPFL_JS_STR_SIZE - 1);
+					strlcpy(js_mr_key->actions[i].prog.params[j].name, name,
+						CPFL_JS_STR_SIZE);
 				}
 				ret = cpfl_json_t_to_uint16(subobject, "size", &value);
 				if (ret < 0) {
@@ -810,7 +809,7 @@ cpfl_flow_js_mr_layout(json_t *ob_layouts, struct cpfl_flow_js_mr_action_mod *js
 			PMD_DRV_LOG(ERR, "The 'hint' is too long.");
 			goto err;
 		}
-		strncpy(js_mod->layout[i].hint, hint, CPFL_JS_STR_SIZE - 1);
+		strlcpy(js_mod->layout[i].hint, hint, CPFL_JS_STR_SIZE);
 	}
 
 	return 0;
@@ -856,7 +855,7 @@ cpfl_flow_js_mr_content(json_t *ob_content, struct cpfl_flow_js_mr_action_mod *j
 			PMD_DRV_LOG(ERR, "The 'type' is too long.");
 			goto err;
 		}
-		strncpy(js_mod->content.fields[i].type, type, CPFL_JS_STR_SIZE - 1);
+		strlcpy(js_mod->content.fields[i].type, type, CPFL_JS_STR_SIZE);
 		ret = cpfl_json_t_to_uint16(object, "start", &start);
 		if (ret < 0) {
 			PMD_DRV_LOG(ERR, "Can not parse 'start'.");
@@ -1806,8 +1805,7 @@ cpfl_parse_check_prog_action(struct cpfl_flow_js_mr_key_action *key_act,
 			return -EINVAL;
 		if (param->has_name) {
 			mr_key_prog->has_name = TRUE;
-			strncpy(mr_key_prog->name[param->index], param->name,
-				CPFL_JS_STR_SIZE - 1);
+			strlcpy(mr_key_prog->name[param->index], param->name, CPFL_JS_STR_SIZE);
 		}
 	}
 
diff --git a/drivers/net/intel/e1000/igb_ethdev.c b/drivers/net/intel/e1000/igb_ethdev.c
index a4370fe32b..524c030be6 100644
--- a/drivers/net/intel/e1000/igb_ethdev.c
+++ b/drivers/net/intel/e1000/igb_ethdev.c
@@ -2047,8 +2047,9 @@ static int eth_igb_xstats_get_names_by_id(struct rte_eth_dev *dev,
 				PMD_INIT_LOG(ERR, "id value isn't valid");
 				return -1;
 			}
-			strcpy(xstats_names[i].name,
-					xstats_names_copy[ids[i]].name);
+			strlcpy(xstats_names[i].name,
+					xstats_names_copy[ids[i]].name,
+					sizeof(xstats_names[i].name));
 		}
 		return limit;
 	}
diff --git a/drivers/net/intel/i40e/i40e_ethdev.c b/drivers/net/intel/i40e/i40e_ethdev.c
index 1370db68f3..b2694cd33a 100644
--- a/drivers/net/intel/i40e/i40e_ethdev.c
+++ b/drivers/net/intel/i40e/i40e_ethdev.c
@@ -11916,8 +11916,7 @@ i40e_update_customized_ptype(struct rte_eth_dev *dev, uint8_t *pkg,
 			for (n = 0; n < proto_num; n++) {
 				if (proto[n].proto_id != proto_id)
 					continue;
-				memset(name, 0, sizeof(name));
-				strcpy(name, proto[n].name);
+				strlcpy(name, proto[n].name, sizeof(name));
 				PMD_DRV_LOG(INFO, "name = %s", name);
 				if (!strncasecmp(name, "PPPOE", 5))
 					ptype_mapping[i].sw_ptype |=
diff --git a/drivers/net/intel/ice/ice_ethdev.c b/drivers/net/intel/ice/ice_ethdev.c
index ad9c49b339..99305b604b 100644
--- a/drivers/net/intel/ice/ice_ethdev.c
+++ b/drivers/net/intel/ice/ice_ethdev.c
@@ -1897,7 +1897,7 @@ ice_send_driver_ver(struct ice_hw *hw)
 	dv.minor_ver = 0;
 	dv.build_ver = 0;
 	dv.subbuild_ver = 0;
-	strncpy((char *)dv.driver_string, "dpdk", sizeof(dv.driver_string));
+	strlcpy((char *)dv.driver_string, "dpdk", sizeof(dv.driver_string));
 
 	return ice_aq_send_driver_ver(hw, &dv, NULL);
 }
@@ -2054,24 +2054,22 @@ int ice_load_pkg(struct ice_adapter *adapter, bool use_dsn, uint64_t dsn)
 	if (!use_dsn)
 		goto no_dsn;
 
-	strncpy(pkg_file, ICE_PKG_FILE_SEARCH_PATH_UPDATES,
-		ICE_MAX_PKG_FILENAME_SIZE);
-	strcat(pkg_file, opt_ddp_filename);
+	strlcpy(pkg_file, ICE_PKG_FILE_SEARCH_PATH_UPDATES, ICE_MAX_PKG_FILENAME_SIZE);
+	strlcat(pkg_file, opt_ddp_filename, ICE_MAX_PKG_FILENAME_SIZE);
 	if (ice_firmware_read(pkg_file, &buf, &bufsz) == 0)
 		goto load_fw;
 
-	strncpy(pkg_file, ICE_PKG_FILE_SEARCH_PATH_DEFAULT,
-		ICE_MAX_PKG_FILENAME_SIZE);
-	strcat(pkg_file, opt_ddp_filename);
+	strlcpy(pkg_file, ICE_PKG_FILE_SEARCH_PATH_DEFAULT, ICE_MAX_PKG_FILENAME_SIZE);
+	strlcat(pkg_file, opt_ddp_filename, ICE_MAX_PKG_FILENAME_SIZE);
 	if (ice_firmware_read(pkg_file, &buf, &bufsz) == 0)
 		goto load_fw;
 
 no_dsn:
-	strncpy(pkg_file, ICE_PKG_FILE_UPDATES, ICE_MAX_PKG_FILENAME_SIZE);
+	strlcpy(pkg_file, ICE_PKG_FILE_UPDATES, ICE_MAX_PKG_FILENAME_SIZE);
 	if (ice_firmware_read(pkg_file, &buf, &bufsz) == 0)
 		goto load_fw;
 
-	strncpy(pkg_file, ICE_PKG_FILE_DEFAULT, ICE_MAX_PKG_FILENAME_SIZE);
+	strlcpy(pkg_file, ICE_PKG_FILE_DEFAULT, ICE_MAX_PKG_FILENAME_SIZE);
 	if (ice_firmware_read(pkg_file, &buf, &bufsz) < 0) {
 		PMD_INIT_LOG(ERR, "Failed to load default DDP package " ICE_PKG_FILE_DEFAULT);
 		return -1;
diff --git a/drivers/net/intel/idpf/idpf_ethdev.c b/drivers/net/intel/idpf/idpf_ethdev.c
index fc761c6094..c13505416a 100644
--- a/drivers/net/intel/idpf/idpf_ethdev.c
+++ b/drivers/net/intel/idpf/idpf_ethdev.c
@@ -1497,7 +1497,7 @@ idpf_adapter_ext_init(struct rte_pci_device *pci_dev, struct idpf_adapter_ext *a
 	hw->device_id = pci_dev->id.device_id;
 	hw->subsystem_vendor_id = pci_dev->id.subsystem_vendor_id;
 
-	strncpy(adapter->name, pci_dev->device.name, PCI_PRI_STR_SIZE);
+	strlcpy(adapter->name, pci_dev->device.name, sizeof(adapter->name));
 
 	memcpy(&base->caps, &req_caps, sizeof(struct virtchnl2_get_capabilities));
 
diff --git a/drivers/net/intel/ixgbe/ixgbe_ethdev.c b/drivers/net/intel/ixgbe/ixgbe_ethdev.c
index f9de95e4fc..b36867d18d 100644
--- a/drivers/net/intel/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/intel/ixgbe/ixgbe_ethdev.c
@@ -3635,8 +3635,9 @@ static int ixgbe_dev_xstats_get_names_by_id(
 			PMD_INIT_LOG(ERR, "id value isn't valid");
 			return -1;
 		}
-		strcpy(xstats_names[i].name,
-				xstats_names_copy[ids[i]].name);
+		strlcpy(xstats_names[i].name,
+				xstats_names_copy[ids[i]].name,
+				sizeof(xstats_names[i].name));
 	}
 	return limit;
 }
-- 
2.53.0



* [PATCH v3 11/11] bus/vmbus: support unplug
From: David Marchand @ 2026-06-23 10:54 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, bruce.richardson, fengchengwen, longli,
	hemant.agrawal, Wei Hu
In-Reply-To: <20260623105439.2144694-1-david.marchand@redhat.com>

Add .unplug callback to handle driver removal, device unmapping, and
interrupt cleanup. This enables use of the generic bus cleanup helper.

The cleanup function was already performing these operations, so it
seems safe to expose them through the unplug operation.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 doc/guides/rel_notes/release_26_07.rst |  4 +++
 drivers/bus/vmbus/vmbus_common.c       | 41 ++++++++++++--------------
 2 files changed, 23 insertions(+), 22 deletions(-)

diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index 5d7aa8d1bf..55d3b44527 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -114,6 +114,10 @@ New Features
 
   Added no-IOMMU mode for devices without or not enabling IOMMU/SVA.
 
+* **Added unplug operation support to VMBUS bus.**
+
+  Implemented device unplug operation to allow runtime removal of VMBUS devices.
+
 * **Added selective Rx in ethdev API.**
 
   Some parts of packets may be discarded in Rx
diff --git a/drivers/bus/vmbus/vmbus_common.c b/drivers/bus/vmbus/vmbus_common.c
index a6e3a24a7c..cd6e851e4c 100644
--- a/drivers/bus/vmbus/vmbus_common.c
+++ b/drivers/bus/vmbus/vmbus_common.c
@@ -144,34 +144,29 @@ rte_vmbus_probe(void)
 }
 
 static int
-rte_vmbus_cleanup(struct rte_bus *bus)
+vmbus_unplug_device(struct rte_device *rte_dev)
 {
-	struct rte_vmbus_device *dev;
-	int error = 0;
-
-	RTE_BUS_FOREACH_DEV(dev, bus) {
-		const struct rte_vmbus_driver *drv;
-		int ret;
-
-		if (!rte_dev_is_probed(&dev->device))
-			continue;
-		drv = RTE_BUS_DRIVER(dev->device.driver, *drv);
-		if (drv->remove == NULL)
-			continue;
+	const struct rte_vmbus_driver *drv = RTE_BUS_DRIVER(rte_dev->driver, *drv);
+	struct rte_vmbus_device *dev = RTE_BUS_DEVICE(rte_dev, *dev);
+	int ret = 0;
 
+	if (drv->remove != NULL) {
 		ret = drv->remove(dev);
 		if (ret < 0)
-			error = -1;
+			return ret;
+	}
 
-		rte_vmbus_unmap_device(dev);
-		rte_intr_instance_free(dev->intr_handle);
+	rte_vmbus_unmap_device(dev);
+	rte_intr_instance_free(dev->intr_handle);
+	dev->intr_handle = NULL;
 
-		dev->device.driver = NULL;
-		rte_bus_remove_device(bus, &dev->device);
-		free(dev);
-	}
+	return 0;
+}
 
-	return error;
+static void
+vmbus_free_device(struct rte_device *dev)
+{
+	free(RTE_BUS_DEVICE(dev, struct rte_vmbus_device));
 }
 
 static int
@@ -222,10 +217,12 @@ rte_vmbus_unregister(struct rte_vmbus_driver *driver)
 struct rte_bus rte_vmbus_bus = {
 	.scan = rte_vmbus_scan,
 	.probe = rte_bus_generic_probe,
-	.cleanup = rte_vmbus_cleanup,
+	.free_device = vmbus_free_device,
+	.cleanup = rte_bus_generic_cleanup,
 	.find_device = rte_bus_generic_find_device,
 	.match = vmbus_bus_match,
 	.probe_device = vmbus_probe_device,
+	.unplug_device = vmbus_unplug_device,
 	.parse = vmbus_parse,
 	.dev_compare = vmbus_dev_compare,
 };
-- 
2.54.0



* [PATCH v3 10/11] bus/vmbus: store name in bus specific device
From: David Marchand @ 2026-06-23 10:54 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, bruce.richardson, fengchengwen, longli,
	hemant.agrawal, Wei Hu
In-Reply-To: <20260623105439.2144694-1-david.marchand@redhat.com>

The device name is allocated with strdup() during scan and freed in
several places. However, when this bus cleanup is converted to use the
EAL generic helper, freeing the device object will require a custom
helper to also free the device name (and for this, a cast will be
needed).

Instead, add an embedded name array to rte_vmbus_device structure
(char name[RTE_DEV_NAME_MAX_LEN]) which is sufficient for all VMBUS
device names (UUID format: 36 characters, or shorter legacy format).

This simplifies the device freeing to a simple free() call.
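
A simplified sketch of the idea, with hypothetical types (struct
sketch_device, SKETCH_NAME_MAX) standing in for the real ones and
snprintf() used as the bounded copy:

```c
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_NAME_MAX 64 /* hypothetical; stands in for RTE_DEV_NAME_MAX_LEN */

struct sketch_device {
	const char *name;               /* generic view of the name */
	char name_buf[SKETCH_NAME_MAX]; /* storage owned by the device */
};

/* Allocate a device whose name lives inside the same allocation. */
static struct sketch_device *
sketch_device_new(const char *name)
{
	struct sketch_device *dev = calloc(1, sizeof(*dev));
	int n;

	if (dev == NULL)
		return NULL;
	n = snprintf(dev->name_buf, sizeof(dev->name_buf), "%s", name);
	if (n < 0 || (size_t)n >= sizeof(dev->name_buf)) {
		free(dev); /* name too long: reject rather than truncate */
		return NULL;
	}
	dev->name = dev->name_buf;
	return dev;
}
```

Because the name shares the device allocation, every error and cleanup
path releases both with one free(dev) and there is no separate pointer
to track.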

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/bus/vmbus/bus_vmbus_driver.h |  1 +
 drivers/bus/vmbus/linux/vmbus_bus.c  | 10 +++-------
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/bus/vmbus/bus_vmbus_driver.h b/drivers/bus/vmbus/bus_vmbus_driver.h
index 888d856141..706ff1fcf5 100644
--- a/drivers/bus/vmbus/bus_vmbus_driver.h
+++ b/drivers/bus/vmbus/bus_vmbus_driver.h
@@ -38,6 +38,7 @@ enum hv_uio_map {
  */
 struct rte_vmbus_device {
 	struct rte_device device;              /**< Inherit core device */
+	char name[RTE_DEV_NAME_MAX_LEN];       /**< VMBUS device name */
 	rte_uuid_t device_id;		       /**< VMBUS device id */
 	rte_uuid_t class_id;		       /**< VMBUS device type */
 	uint32_t relid;			       /**< id for primary */
diff --git a/drivers/bus/vmbus/linux/vmbus_bus.c b/drivers/bus/vmbus/linux/vmbus_bus.c
index 77d904ad6d..779ea50b92 100644
--- a/drivers/bus/vmbus/linux/vmbus_bus.c
+++ b/drivers/bus/vmbus/linux/vmbus_bus.c
@@ -280,15 +280,14 @@ vmbus_scan_one(const char *name)
 	char filename[PATH_MAX];
 	char dirname[PATH_MAX];
 	unsigned long tmp;
-	char *dev_name;
 
 	dev = calloc(1, sizeof(*dev));
 	if (dev == NULL)
 		return -1;
 
-	dev->device.name = dev_name = strdup(name);
-	if (!dev->device.name)
+	if (rte_strscpy(dev->name, name, sizeof(dev->name)) < 0)
 		goto error;
+	dev->device.name = dev->name;
 
 	/* sysfs base directory
 	 *   /sys/bus/vmbus/devices/7a08391f-f5a0-4ac0-9802-d13fd964f8df
@@ -305,7 +304,6 @@ vmbus_scan_one(const char *name)
 
 	/* skip non-network devices */
 	if (rte_uuid_compare(dev->class_id, vmbus_nic_uuid) != 0) {
-		free(dev_name);
 		free(dev);
 		return 0;
 	}
@@ -330,7 +328,7 @@ vmbus_scan_one(const char *name)
 		dev->monitor_id = UINT8_MAX;
 	}
 
-	dev->device.devargs = rte_bus_find_devargs(&rte_vmbus_bus, dev_name);
+	dev->device.devargs = rte_bus_find_devargs(&rte_vmbus_bus, dev->name);
 
 	dev->device.numa_node = SOCKET_ID_ANY;
 	if (vmbus_use_numa(dev)) {
@@ -360,7 +358,6 @@ vmbus_scan_one(const char *name)
 		} else { /* already registered */
 			VMBUS_LOG(NOTICE,
 				"%s already registered", name);
-			free(dev_name);
 			free(dev);
 		}
 		return 0;
@@ -371,7 +368,6 @@ vmbus_scan_one(const char *name)
 error:
 	VMBUS_LOG(DEBUG, "failed");
 
-	free(dev_name);
 	free(dev);
 	return -1;
 }
-- 
2.54.0



* [PATCH v3 09/11] bus: implement cleanup in EAL
From: David Marchand @ 2026-06-23 10:54 UTC (permalink / raw)
  To: dev
  Cc: thomas, stephen, bruce.richardson, fengchengwen, longli,
	hemant.agrawal, Parav Pandit, Xueming Li, Sachin Saxena, Rosen Xu,
	Chenbo Xia, Nipun Gupta, Tomasz Duszynski, Wei Hu
In-Reply-To: <20260623105439.2144694-1-david.marchand@redhat.com>

Introduce a generic cleanup helper rte_bus_generic_cleanup() that
eliminates code duplication across bus cleanup implementations:
unplug probed devices, remove devargs, remove from bus list,
and free device structures.

Add .free_device operation to struct rte_bus to allow buses to specify
how to free their device structures.
Update all buses for the new .cleanup and RTE_REGISTER_BUS prototypes.

Convert to rte_bus_generic_cleanup() the buses that have both a .cleanup
and .unplug_device: this requires implementing .free_device for them.

Untouched buses are:
- dma/idxd which has no unplug support,
- bus/cdx which has unplug support, but no cleanup was implemented so
  far,
- NXP buses:
  - bus/dpaa and bus/fslmc have many issues on interrupt
    allocation/setup/freeing or VFIO setup/release,
  - bus/fslmc cleanup callback is actually implemented in its internal
    VFIO layer and requires too much refactoring.
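
The shape of such a generic helper can be sketched as below. This is a
simplified model with hypothetical types and names (struct sketch_bus,
sketch_bus_generic_cleanup() and the two callbacks); it omits devargs
removal and is not the EAL implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_device {
	struct sketch_device *next;
	bool probed;
};

struct sketch_bus {
	struct sketch_device *devices;
	int (*unplug_device)(struct sketch_device *dev);
	void (*free_device)(struct sketch_device *dev);
};

/* Example callbacks a bus could provide. */
static int sketch_unplug_count;

static int
sketch_unplug(struct sketch_device *dev)
{
	dev->probed = false;
	sketch_unplug_count++;
	return 0;
}

static void
sketch_free(struct sketch_device *dev)
{
	free(dev);
}

/* Unplug probed devices, unlink each device, then free it. */
static int
sketch_bus_generic_cleanup(struct sketch_bus *bus)
{
	int error = 0;

	while (bus->devices != NULL) {
		struct sketch_device *dev = bus->devices;

		if (dev->probed && bus->unplug_device(dev) < 0)
			error = -1; /* remember the failure, keep cleaning */

		bus->devices = dev->next; /* unlink from the bus list */
		bus->free_device(dev);    /* bus-specific release */
	}
	return error;
}
```

In this model the bus only supplies the two callbacks; the loop, the
error accumulation and the list handling live in one place instead of
being repeated per bus.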

Signed-off-by: David Marchand <david.marchand@redhat.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
Changes since v1:
- dropped hack on using free() and the check in RTE_REGISTER_BUS.

---
 drivers/bus/auxiliary/auxiliary_common.c | 28 ++++---------------
 drivers/bus/dpaa/dpaa_bus.c              |  4 +--
 drivers/bus/fslmc/fslmc_bus.c            |  2 +-
 drivers/bus/ifpga/ifpga_bus.c            | 32 ++++------------------
 drivers/bus/pci/pci_common.c             | 29 +++++---------------
 drivers/bus/platform/platform.c          | 20 ++++----------
 drivers/bus/uacce/uacce.c                | 28 ++++---------------
 drivers/bus/vdev/vdev.c                  | 26 +++++++-----------
 drivers/bus/vmbus/vmbus_common.c         |  6 ++---
 lib/eal/common/eal_common_bus.c          | 33 ++++++++++++++++++++++-
 lib/eal/include/bus_driver.h             | 34 +++++++++++++++++++++++-
 11 files changed, 107 insertions(+), 135 deletions(-)

diff --git a/drivers/bus/auxiliary/auxiliary_common.c b/drivers/bus/auxiliary/auxiliary_common.c
index 10f466e57a..80b90a4961 100644
--- a/drivers/bus/auxiliary/auxiliary_common.c
+++ b/drivers/bus/auxiliary/auxiliary_common.c
@@ -179,29 +179,10 @@ rte_auxiliary_unregister(struct rte_auxiliary_driver *driver)
 	rte_bus_remove_driver(&auxiliary_bus, &driver->driver);
 }
 
-static int
-auxiliary_cleanup(void)
+static void
+auxiliary_free_device(struct rte_device *dev)
 {
-	struct rte_auxiliary_device *dev;
-	int error = 0;
-
-	RTE_BUS_FOREACH_DEV(dev, &auxiliary_bus) {
-		int ret;
-
-		if (rte_dev_is_probed(&dev->device)) {
-			ret = auxiliary_unplug_device(&dev->device);
-			if (ret < 0) {
-				rte_errno = errno;
-				error = -1;
-			}
-		}
-
-		rte_devargs_remove(dev->device.devargs);
-		rte_bus_remove_device(&auxiliary_bus, &dev->device);
-		free(dev);
-	}
-
-	return error;
+	free(RTE_BUS_DEVICE(dev, struct rte_auxiliary_device));
 }
 
 static int
@@ -247,7 +228,8 @@ auxiliary_get_iommu_class(void)
 struct rte_bus auxiliary_bus = {
 	.scan = auxiliary_scan,
 	.probe = rte_bus_generic_probe,
-	.cleanup = auxiliary_cleanup,
+	.free_device = auxiliary_free_device,
+	.cleanup = rte_bus_generic_cleanup,
 	.find_device = rte_bus_generic_find_device,
 	.match = auxiliary_bus_match,
 	.probe_device = auxiliary_probe_device,
diff --git a/drivers/bus/dpaa/dpaa_bus.c b/drivers/bus/dpaa/dpaa_bus.c
index ee467b94d5..54779f82f7 100644
--- a/drivers/bus/dpaa/dpaa_bus.c
+++ b/drivers/bus/dpaa/dpaa_bus.c
@@ -807,12 +807,12 @@ dpaa_bus_probe_device(struct rte_driver *drv, struct rte_device *dev)
 }
 
 static int
-dpaa_bus_cleanup(void)
+dpaa_bus_cleanup(struct rte_bus *bus)
 {
 	struct rte_dpaa_device *dev;
 
 	BUS_INIT_FUNC_TRACE();
-	RTE_BUS_FOREACH_DEV(dev, &rte_dpaa_bus) {
+	RTE_BUS_FOREACH_DEV(dev, bus) {
 		const struct rte_dpaa_driver *drv;
 		int ret = 0;
 
diff --git a/drivers/bus/fslmc/fslmc_bus.c b/drivers/bus/fslmc/fslmc_bus.c
index dca4c5b182..1a0eca30b4 100644
--- a/drivers/bus/fslmc/fslmc_bus.c
+++ b/drivers/bus/fslmc/fslmc_bus.c
@@ -436,7 +436,7 @@ fslmc_bus_match(const struct rte_driver *drv, const struct rte_device *dev)
 }
 
 static int
-rte_fslmc_close(void)
+rte_fslmc_close(struct rte_bus *bus __rte_unused)
 {
 	int ret = 0;
 
diff --git a/drivers/bus/ifpga/ifpga_bus.c b/drivers/bus/ifpga/ifpga_bus.c
index 7e2e2efce0..79d1c3778f 100644
--- a/drivers/bus/ifpga/ifpga_bus.c
+++ b/drivers/bus/ifpga/ifpga_bus.c
@@ -298,33 +298,10 @@ ifpga_unplug_device(struct rte_device *dev)
 	return 0;
 }
 
-/*
- * Cleanup the content of the Intel FPGA bus, and call the remove() function
- * for all registered devices.
- */
-static int
-ifpga_cleanup(void)
+static void
+ifpga_free_device(struct rte_device *dev)
 {
-	struct rte_afu_device *afu_dev;
-	int error = 0;
-
-	RTE_BUS_FOREACH_DEV(afu_dev, &rte_ifpga_bus) {
-		int ret = 0;
-
-		if (rte_dev_is_probed(&afu_dev->device)) {
-			ret = ifpga_unplug_device(&afu_dev->device);
-			if (ret < 0) {
-				rte_errno = errno;
-				error = -1;
-			}
-		}
-
-		rte_devargs_remove(afu_dev->device.devargs);
-		rte_bus_remove_device(&rte_ifpga_bus, &afu_dev->device);
-		free(afu_dev);
-	}
-
-	return error;
+	free(RTE_BUS_DEVICE(dev, struct rte_afu_device));
 }
 
 static int
@@ -374,7 +351,8 @@ ifpga_parse(const char *name, void *addr)
 static struct rte_bus rte_ifpga_bus = {
 	.scan        = ifpga_scan,
 	.probe       = rte_bus_generic_probe,
-	.cleanup     = ifpga_cleanup,
+	.free_device = ifpga_free_device,
+	.cleanup     = rte_bus_generic_cleanup,
 	.find_device = rte_bus_generic_find_device,
 	.match       = ifpga_bus_match,
 	.probe_device = ifpga_probe_device,
diff --git a/drivers/bus/pci/pci_common.c b/drivers/bus/pci/pci_common.c
index bf4822f7ec..0f635e1537 100644
--- a/drivers/bus/pci/pci_common.c
+++ b/drivers/bus/pci/pci_common.c
@@ -317,29 +317,11 @@ pci_unplug_device(struct rte_device *rte_dev)
 	return 0;
 }
 
-static int
-pci_cleanup(void)
+static void
+pci_free_device(struct rte_device *dev)
 {
-	struct rte_pci_device *dev;
-	int error = 0;
-
-	RTE_BUS_FOREACH_DEV(dev, &rte_pci_bus) {
-		int ret = 0;
-
-		if (rte_dev_is_probed(&dev->device)) {
-			ret = pci_unplug_device(&dev->device);
-			if (ret < 0) {
-				rte_errno = errno;
-				error = -1;
-			}
-		}
-
-		rte_devargs_remove(dev->device.devargs);
-		rte_bus_remove_device(&rte_pci_bus, &dev->device);
-		pci_free(RTE_PCI_DEVICE_INTERNAL(dev));
-	}
-
-	return error;
+	struct rte_pci_device *pdev = RTE_BUS_DEVICE(dev, *pdev);
+	pci_free(RTE_PCI_DEVICE_INTERNAL(pdev));
 }
 
 /* dump one device */
@@ -743,7 +725,8 @@ struct rte_bus rte_pci_bus = {
 	.allow_multi_probe = true,
 	.scan = rte_pci_scan,
 	.probe = rte_bus_generic_probe,
-	.cleanup = pci_cleanup,
+	.free_device = pci_free_device,
+	.cleanup = rte_bus_generic_cleanup,
 	.find_device = rte_bus_generic_find_device,
 	.match = pci_bus_match,
 	.probe_device = pci_probe_device,
diff --git a/drivers/bus/platform/platform.c b/drivers/bus/platform/platform.c
index 5b3c78a505..90d865a8df 100644
--- a/drivers/bus/platform/platform.c
+++ b/drivers/bus/platform/platform.c
@@ -491,26 +491,17 @@ platform_bus_get_iommu_class(void)
 	return RTE_IOVA_DC;
 }
 
-static int
-platform_bus_cleanup(void)
+static void
+platform_free_device(struct rte_device *dev)
 {
-	struct rte_platform_device *pdev;
-
-	RTE_BUS_FOREACH_DEV(pdev, &platform_bus) {
-		if (rte_dev_is_probed(&pdev->device))
-			platform_bus_unplug_device(&pdev->device);
-
-		rte_devargs_remove(pdev->device.devargs);
-		rte_bus_remove_device(&platform_bus, &pdev->device);
-		free(pdev);
-	}
-
-	return 0;
+	free(RTE_BUS_DEVICE(dev, struct rte_platform_device));
 }
 
 static struct rte_bus platform_bus = {
 	.scan = platform_bus_scan,
 	.probe = rte_bus_generic_probe,
+	.free_device = platform_free_device,
+	.cleanup = rte_bus_generic_cleanup,
 	.find_device = rte_bus_generic_find_device,
 	.match = platform_bus_match,
 	.probe_device = platform_bus_probe_device,
@@ -520,7 +511,6 @@ static struct rte_bus platform_bus = {
 	.dma_unmap = platform_bus_dma_unmap,
 	.get_iommu_class = platform_bus_get_iommu_class,
 	.dev_iterate = rte_bus_generic_dev_iterate,
-	.cleanup = platform_bus_cleanup,
 };
 
 RTE_REGISTER_BUS(platform, platform_bus);
diff --git a/drivers/bus/uacce/uacce.c b/drivers/bus/uacce/uacce.c
index bfe1f26557..99a6fb314d 100644
--- a/drivers/bus/uacce/uacce.c
+++ b/drivers/bus/uacce/uacce.c
@@ -402,29 +402,10 @@ uacce_unplug_device(struct rte_device *rte_dev)
 	return 0;
 }
 
-static int
-uacce_cleanup(void)
+static void
+uacce_free_device(struct rte_device *dev)
 {
-	struct rte_uacce_device *dev;
-	int error = 0;
-
-	RTE_BUS_FOREACH_DEV(dev, &uacce_bus) {
-		int ret = 0;
-
-		if (rte_dev_is_probed(&dev->device)) {
-			ret = uacce_unplug_device(&dev->device);
-			if (ret < 0) {
-				rte_errno = errno;
-				error = -1;
-			}
-		}
-
-		rte_devargs_remove(dev->device.devargs);
-		rte_bus_remove_device(&uacce_bus, &dev->device);
-		free(dev);
-	}
-
-	return error;
+	free(RTE_BUS_DEVICE(dev, struct rte_uacce_device));
 }
 
 static int
@@ -551,7 +532,8 @@ rte_uacce_unregister(struct rte_uacce_driver *driver)
 static struct rte_bus uacce_bus = {
 	.scan = uacce_scan,
 	.probe = rte_bus_generic_probe,
-	.cleanup = uacce_cleanup,
+	.free_device = uacce_free_device,
+	.cleanup = rte_bus_generic_cleanup,
 	.match = uacce_bus_match,
 	.probe_device = uacce_probe_device,
 	.unplug_device = uacce_unplug_device,
diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c
index 7e94f86e28..02d719a44d 100644
--- a/drivers/bus/vdev/vdev.c
+++ b/drivers/bus/vdev/vdev.c
@@ -548,26 +548,19 @@ vdev_scan(void)
 	return 0;
 }
 
+static void
+vdev_free_device(struct rte_device *dev)
+{
+	free(RTE_BUS_DEVICE(dev, struct rte_vdev_device));
+}
+
 static int
-vdev_cleanup(void)
+vdev_cleanup(struct rte_bus *bus)
 {
-	struct rte_vdev_device *dev;
-	int error = 0;
+	int error;
 
 	rte_spinlock_recursive_lock(&vdev_device_list_lock);
-	RTE_BUS_FOREACH_DEV(dev, &rte_vdev_bus) {
-		int ret;
-
-		if (rte_dev_is_probed(&dev->device)) {
-			ret = vdev_unplug_device(&dev->device);
-			if (ret < 0)
-				error = -1;
-		}
-
-		rte_devargs_remove(dev->device.devargs);
-		rte_bus_remove_device(&rte_vdev_bus, &dev->device);
-		free(dev);
-	}
+	error = rte_bus_generic_cleanup(bus);
 	rte_spinlock_recursive_unlock(&vdev_device_list_lock);
 
 	return error;
@@ -608,6 +601,7 @@ vdev_get_iommu_class(void)
 static struct rte_bus rte_vdev_bus = {
 	.scan = vdev_scan,
 	.probe = rte_bus_generic_probe,
+	.free_device = vdev_free_device,
 	.cleanup = vdev_cleanup,
 	.find_device = vdev_find_device,
 	.match = vdev_bus_match,
diff --git a/drivers/bus/vmbus/vmbus_common.c b/drivers/bus/vmbus/vmbus_common.c
index bfb45e963c..a6e3a24a7c 100644
--- a/drivers/bus/vmbus/vmbus_common.c
+++ b/drivers/bus/vmbus/vmbus_common.c
@@ -144,12 +144,12 @@ rte_vmbus_probe(void)
 }
 
 static int
-rte_vmbus_cleanup(void)
+rte_vmbus_cleanup(struct rte_bus *bus)
 {
 	struct rte_vmbus_device *dev;
 	int error = 0;
 
-	RTE_BUS_FOREACH_DEV(dev, &rte_vmbus_bus) {
+	RTE_BUS_FOREACH_DEV(dev, bus) {
 		const struct rte_vmbus_driver *drv;
 		int ret;
 
@@ -167,7 +167,7 @@ rte_vmbus_cleanup(void)
 		rte_intr_instance_free(dev->intr_handle);
 
 		dev->device.driver = NULL;
-		rte_bus_remove_device(&rte_vmbus_bus, &dev->device);
+		rte_bus_remove_device(bus, &dev->device);
 		free(dev);
 	}
 
diff --git a/lib/eal/common/eal_common_bus.c b/lib/eal/common/eal_common_bus.c
index ca13ccce5b..9ba23516ee 100644
--- a/lib/eal/common/eal_common_bus.c
+++ b/lib/eal/common/eal_common_bus.c
@@ -124,6 +124,37 @@ rte_bus_generic_probe(struct rte_bus *bus)
 	return (probed && probed == failed) ? -1 : 0;
 }
 
+/*
+ * Generic cleanup function for buses.
+ * Iterates through all devices on the bus, unplugs probed devices,
+ * removes devargs, removes devices from the bus list, and frees device structures.
+ */
+RTE_EXPORT_INTERNAL_SYMBOL(rte_bus_generic_cleanup)
+int
+rte_bus_generic_cleanup(struct rte_bus *bus)
+{
+	struct rte_device *dev;
+	int error = 0;
+
+	RTE_VERIFY(bus->free_device);
+	RTE_VERIFY(bus->unplug_device);
+
+	while ((dev = TAILQ_FIRST(&bus->device_list)) != NULL) {
+		if (rte_dev_is_probed(dev)) {
+			if (bus->unplug_device && bus->unplug_device(dev) < 0) {
+				rte_errno = errno;
+				error = -1;
+			}
+		}
+
+		rte_devargs_remove(dev->devargs);
+		rte_bus_remove_device(bus, dev);
+		bus->free_device(dev);
+	}
+
+	return error;
+}
+
 /* Probe all devices of all buses */
 RTE_EXPORT_SYMBOL(rte_bus_probe)
 int
@@ -164,7 +195,7 @@ eal_bus_cleanup(void)
 	TAILQ_FOREACH(bus, &rte_bus_list, next) {
 		if (bus->cleanup == NULL)
 			continue;
-		if (bus->cleanup() != 0)
+		if (bus->cleanup(bus) != 0)
 			ret = -1;
 	}
 
diff --git a/lib/eal/include/bus_driver.h b/lib/eal/include/bus_driver.h
index fde55ff06d..4f6521c87f 100644
--- a/lib/eal/include/bus_driver.h
+++ b/lib/eal/include/bus_driver.h
@@ -226,17 +226,31 @@ typedef int (*rte_bus_hot_unplug_handler_t)(struct rte_device *dev);
  */
 typedef int (*rte_bus_sigbus_handler_t)(const void *failure_addr);
 
+/**
+ * Free a bus-specific device structure.
+ *
+ * @param dev
+ *	Device pointer.
+ */
+typedef void (*rte_bus_free_device_t)(struct rte_device *dev);
+
 /**
  * Implementation specific cleanup function which is responsible for cleaning up
  * devices on that bus with applicable drivers.
  *
+ * The cleanup operation is the counterpart to scan, removing all devices added
+ * during scan.
+ *
  * This is called while iterating over each registered bus.
  *
+ * @param bus
+ *   Pointer to the bus to cleanup.
+ *
  * @return
  * 0 for successful cleanup
  * !0 for any error during cleanup
  */
-typedef int (*rte_bus_cleanup_t)(void);
+typedef int (*rte_bus_cleanup_t)(struct rte_bus *bus);
 
 /**
  * Check if a driver matches a device.
@@ -336,6 +350,7 @@ struct rte_bus {
 				/**< handle hot-unplug failure on the bus */
 	rte_bus_sigbus_handler_t sigbus_handler;
 					/**< handle sigbus error on the bus */
+	rte_bus_free_device_t free_device; /**< Free bus-specific device */
 	rte_bus_cleanup_t cleanup;   /**< Cleanup devices on bus */
 	RTE_TAILQ_HEAD(, rte_device) device_list; /**< List of devices on the bus */
 	RTE_TAILQ_HEAD(, rte_driver) driver_list; /**< List of drivers on the bus */
@@ -624,6 +639,23 @@ struct rte_driver *rte_bus_find_driver(const struct rte_bus *bus, const struct r
 __rte_internal
 int rte_bus_generic_probe(struct rte_bus *bus);
 
+/**
+ * Generic cleanup function for buses.
+ *
+ * Iterates through all devices on the bus, unplugs probed devices,
+ * removes devargs, removes devices from the bus list, and frees device structures.
+ *
+ * This function can be used by buses that don't require special cleanup
+ * logic and just need the standard device cleanup sequence.
+ *
+ * @param bus
+ *   Pointer to the bus to cleanup.
+ * @return
+ *   0 on success, -1 if any errors occurred during cleanup.
+ */
+__rte_internal
+int rte_bus_generic_cleanup(struct rte_bus *bus);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.54.0

