* RE: [PATCH v7] net/idpf: update for new mempool cache algorithm
From: Morten Brørup @ 2026-06-10 11:31 UTC
To: dev
In-Reply-To: <20260601183621.252920-1-mb@smartsharesystems.com>
Recheck-request: iol-unit-arm64-testing
Unrelated CI failure.
CI log:
==== 20 line log output for Ubuntu 20.04 (lpm_autotest): ====
[7/407] Compiling C object lib/librte_log.a.p/log_log_journal.c.o
[8/407] Linking static target lib/librte_log.a
[9/407] Generating rte_argparse_map with a custom command
[10/407] Compiling C object lib/librte_kvargs.a.p/kvargs_rte_kvargs.c.o
[11/407] Linking static target lib/librte_kvargs.a
[12/407] Compiling C object lib/librte_argparse.a.p/argparse_rte_argparse.c.o
[13/407] Generating kvargs.sym_chk with a custom command (wrapped by meson to capture output)
[14/407] Linking static target lib/librte_argparse.a
[15/407] Generating log.sym_chk with a custom command (wrapped by meson to capture output)
[16/407] Linking target lib/librte_log.so.26.2
[17/407] Compiling C object lib/librte_telemetry.a.p/telemetry_telemetry_data.c.o
[18/407] Compiling C object lib/librte_telemetry.a.p/telemetry_telemetry.c.o
[19/407] Generating rte_telemetry_map with a custom command
FAILED: lib/telemetry_exports.map
/usr/bin/python3 ../buildtools/gen-version-map.py --linker gnu --abi-version ../ABI_VERSION --output lib/telemetry_exports.map --source ../lib/telemetry/telemetry.c ../lib/telemetry/telemetry_data.c ../lib/telemetry/telemetry_legacy.c
Segmentation fault (core dumped)
* Re: [PATCH v7] net/idpf: update for new mempool cache algorithm
From: Bruce Richardson @ 2026-06-10 11:31 UTC
To: Morten Brørup; +Cc: Jingjing Wu, Praveen Shetty, dev
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35F65902@smartserver.smartshare.dk>
On Wed, Jun 10, 2026 at 01:21:38PM +0200, Morten Brørup wrote:
> Intel idpf maintainers,
>
> PING for review.
>
> The mempool library has been improved [1], so the idpf PMD - which bypasses the mempool API - must be updated to match the library implementation. This patch does that.
>
> [1]: https://git.dpdk.org/dpdk/commit/?id=f5e1310f16e0909e7e7f71807123644c63b23cba
>
>
> Venlig hilsen / Kind regards,
> -Morten Brørup
>
Yep. I was waiting to see what happened to the mempool patch before
considering this for next-net-intel.
/Bruce
* RE: [PATCH v7] net/idpf: update for new mempool cache algorithm
From: Morten Brørup @ 2026-06-10 11:21 UTC
To: Jingjing Wu, Praveen Shetty, Bruce Richardson; +Cc: dev
In-Reply-To: <20260601183621.252920-1-mb@smartsharesystems.com>
Intel idpf maintainers,
PING for review.
The mempool library has been improved [1], so the idpf PMD - which bypasses the mempool API - must be updated to match the library implementation. This patch does that.
[1]: https://git.dpdk.org/dpdk/commit/?id=f5e1310f16e0909e7e7f71807123644c63b23cba
Venlig hilsen / Kind regards,
-Morten Brørup
> -----Original Message-----
> From: Morten Brørup [mailto:mb@smartsharesystems.com]
> Sent: Monday, 1 June 2026 20.36
> To: dev@dpdk.org; Andrew Rybchenko; Bruce Richardson; Jingjing Wu;
> Praveen Shetty; Hemant Agrawal; Sachin Saxena
> Cc: Morten Brørup
> Subject: [PATCH v7] net/idpf: update for new mempool cache algorithm
>
> As a consequence of the improved mempool cache algorithm, the PMD was
> updated regarding how much to backfill the mempool cache in the AVX512
> code path.
>
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
> v7:
> * Rebased.
> v6:
> * Moved driver changes out as separate patches, for easier review.
> (Bruce)
> ---
> Depends-on: patch-164745 ("mempool: improve cache behaviour and performance")
> ---
> .../net/intel/idpf/idpf_common_rxtx_avx512.c | 52 +++++++++++++++----
> 1 file changed, 42 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> index 8db4c64106..5788a009ab 100644
> --- a/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> +++ b/drivers/net/intel/idpf/idpf_common_rxtx_avx512.c
> @@ -148,15 +148,31 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq)
> /* Can this be satisfied from the cache? */
> if (cache->len < IDPF_RXQ_REARM_THRESH) {
> /* No. Backfill the cache first, and then fill from it */
> - uint32_t req = IDPF_RXQ_REARM_THRESH + (cache->size -
> - cache->len);
>
> - /* How many do we require i.e. number to fill the cache + the request */
> + /* Backfill would exceed the cache bounce buffer limit? */
> + __rte_assume(cache->size / 2 <= RTE_MEMPOOL_CACHE_MAX_SIZE / 2);
> + if (unlikely(cache->size / 2 < IDPF_RXQ_REARM_THRESH)) {
> + idpf_singleq_rearm_common(rxq);
> + return;
> + }
> +
> + /*
> + * Backfill the cache from the backend;
> + * move up the hot objects in the cache to the top half of the cache,
> + * and fetch (size / 2) objects to the bottom of the cache.
> + */
> + __rte_assume(cache->len < cache->size / 2);
> + rte_memcpy(&cache->objs[cache->size / 2], &cache->objs[0],
> + sizeof(void *) * cache->len);
> int ret = rte_mempool_ops_dequeue_bulk
> - (rxq->mp, &cache->objs[cache->len], req);
> + (rxq->mp, &cache->objs[0], cache->size / 2);
> if (ret == 0) {
> - cache->len += req;
> + cache->len += cache->size / 2;
> } else {
> + /*
> + * No further action is required for roll back, as the objects moved
> + * in the cache were actually copied, and the cache remains intact.
> + */
> if (rxq->rxrearm_nb + IDPF_RXQ_REARM_THRESH >=
> rxq->nb_rx_desc) {
> __m128i dma_addr0;
> @@ -565,15 +581,31 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq)
> /* Can this be satisfied from the cache? */
> if (cache->len < IDPF_RXQ_REARM_THRESH) {
> /* No. Backfill the cache first, and then fill from it */
> - uint32_t req = IDPF_RXQ_REARM_THRESH + (cache->size -
> - cache->len);
>
> - /* How many do we require i.e. number to fill the cache + the request */
> + /* Backfill would exceed the cache bounce buffer limit? */
> + __rte_assume(cache->size / 2 <= RTE_MEMPOOL_CACHE_MAX_SIZE / 2);
> + if (unlikely(cache->size / 2 < IDPF_RXQ_REARM_THRESH)) {
> + idpf_splitq_rearm_common(rx_bufq);
> + return;
> + }
> +
> + /*
> + * Backfill the cache from the backend;
> + * move up the hot objects in the cache to the top half of the cache,
> + * and fetch (size / 2) objects to the bottom of the cache.
> + */
> + __rte_assume(cache->len < cache->size / 2);
> + rte_memcpy(&cache->objs[cache->size / 2], &cache->objs[0],
> + sizeof(void *) * cache->len);
> int ret = rte_mempool_ops_dequeue_bulk
> - (rx_bufq->mp, &cache->objs[cache->len], req);
> + (rx_bufq->mp, &cache->objs[0], cache->size / 2);
> if (ret == 0) {
> - cache->len += req;
> + cache->len += cache->size / 2;
> } else {
> + /*
> + * No further action is required for roll back, as the objects moved
> + * in the cache were actually copied, and the cache remains intact.
> + */
> if (rx_bufq->rxrearm_nb + IDPF_RXQ_REARM_THRESH >=
> rx_bufq->nb_rx_desc) {
> __m128i dma_addr0;
> --
> 2.43.0
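A minimal standalone sketch of the backfill arithmetic in the hunks above, for illustration only. `struct toy_cache`, `toy_backend_dequeue()`, `CACHE_SIZE` and `REARM_THRESH` are hypothetical stand-ins for `struct rte_mempool_cache`, `rte_mempool_ops_dequeue_bulk()`, `cache->size` and `IDPF_RXQ_REARM_THRESH`, and `assert()` plays the role of `__rte_assume()`. It mirrors the patch: the `len` hot objects are copied up to start at index `size / 2`, then `size / 2` objects are fetched into the bottom, so the cache stays contiguous with the hot objects on top.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SIZE   16 /* hypothetical cache->size */
#define REARM_THRESH 8  /* hypothetical stand-in for IDPF_RXQ_REARM_THRESH */

/* Hypothetical stand-in for struct rte_mempool_cache: a LIFO of object pointers. */
struct toy_cache {
	uint32_t size;
	uint32_t len;
	void *objs[CACHE_SIZE];
};

/* Hypothetical backend that hands out consecutive fake object pointers. */
static uintptr_t toy_next_obj = 1000;

static int
toy_backend_dequeue(void **objs, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++)
		objs[i] = (void *)toy_next_obj++;
	return 0; /* never fails in this sketch */
}

/*
 * Returns 0 when at least REARM_THRESH objects are cached afterwards,
 * or -1 when the caller should take its fallback path instead.
 */
static int
toy_backfill(struct toy_cache *cache)
{
	uint32_t half = cache->size / 2;

	if (cache->len >= REARM_THRESH)
		return 0; /* can already be satisfied from the cache */
	if (half < REARM_THRESH)
		return -1; /* a half-size backfill would still be too small */

	/* Holds because len < REARM_THRESH <= half, so the regions cannot overlap. */
	assert(cache->len < half);
	/* Copy the hot objects up so they start at index size / 2. */
	memcpy(&cache->objs[half], &cache->objs[0], sizeof(void *) * cache->len);
	/* Fetch size / 2 objects into the bottom of the cache. */
	if (toy_backend_dequeue(&cache->objs[0], half) != 0)
		return -1; /* per the patch comment, the copy left the cache intact */
	cache->len += half;
	return 0;
}
```

Starting from three cached objects 1, 2 and 3 with `size` 16, one call leaves `len` at 11, with the fetched objects in `objs[0..7]` and object 3 still on top at `objs[len - 1]`. With `size` 8, half the cache is smaller than the threshold, so the sketch returns -1, matching the branch where the patch falls back to `idpf_singleq_rearm_common()`.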
* Re: [PATCH v8] mempool: improve cache behaviour and performance
From: Thomas Monjalon @ 2026-06-10 11:06 UTC
To: Morten Brørup; +Cc: dev, Andrew Rybchenko
In-Reply-To: <20260604114851.12586-1-mb@smartsharesystems.com>
04/06/2026 13:48, Morten Brørup:
> This patch refactors the mempool cache to eliminate some unexpected
> behaviour and reduce the mempool cache miss rate.
Applied, thanks.
* Re: [PATCH v2] eal: fix function versioning with LTO
From: David Marchand @ 2026-06-10 10:54 UTC
To: Stephen Hemminger; +Cc: dev
In-Reply-To: <20260609135532.80396-1-stephen@networkplumber.org>
On Tue, 9 Jun 2026 at 15:56, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> When using function versioning and building with LTO,
> GCC gets confused by the symbol versioning using __asm__.
> There are no uses of function versioning in the upstream repo.
> This was found when adding an additional parameter to
> rte_eth_dev_get_name_by_port.
>
> Assembler messages:
> Error: invalid attempt to declare external version name as default in symbol `rte_eth_dev_get_name_by_port@@DPDK_27'
>
> The workaround GCC 10 introduced was an additional function attribute;
> clang doesn't have or need this attribute. No need to backport this to
> LTS since there is no function versioning in those releases.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Marchand <david.marchand@redhat.com>
Applied, thanks.
--
David Marchand
* Re: [EXTERNAL] [PATCH dpdk] graph: replace circular buffer with priority-based bitmap scheduling
From: Kiran Kumar Kokkilagadda @ 2026-06-10 10:51 UTC
To: Robin Jarry, dev@dpdk.org, Jerin Jacob, Nithin Kumar Dabilpuram,
Zhirun Yan
Cc: Vladimir Medvedkin, Christophe Fontaine, David Marchand,
Konstantin Ananyev, Maxime Leroy
In-Reply-To: <20260519101232.541102-2-rjarry@redhat.com>
From: Robin Jarry <rjarry@redhat.com>
Date: Tuesday, 19 May 2026 at 3:43 PM
To: dev@dpdk.org <dev@dpdk.org>; Jerin Jacob <jerinj@marvell.com>; Kiran Kumar Kokkilagadda <kirankumark@marvell.com>; Nithin Kumar Dabilpuram <ndabilpuram@marvell.com>; Zhirun Yan <yanzhirun_163@163.com>
Cc: Vladimir Medvedkin <vladimir.medvedkin@intel.com>; Christophe Fontaine <cfontain@redhat.com>; David Marchand <david.marchand@redhat.com>; Konstantin Ananyev <konstantin.ananyev@huawei.com>; Maxime Leroy <maxime@leroys.fr>
Subject: [EXTERNAL] [PATCH dpdk] graph: replace circular buffer with priority-based bitmap scheduling
Replace the FIFO circular buffer used to track pending nodes with
a bitmap and a priority-sorted schedule table. Each node can now have
a scheduling priority (int16_t, default 0, lower value means visited
first). Source nodes are forced to INT16_MIN so they always run first.
At graph creation time, nodes are sorted by (priority, id) and assigned
a bit position (sched_idx). During the walk, a bitmap tracks which nodes
have pending objects. Scanning from the lowest bit naturally visits
nodes in priority order.
This improves batching in fan-out-then-converge topologies. When
eth_input classifies packets to both mpls_input and ipv4_input, the old
FIFO order could process ipv4_input before mpls_input, causing
ipv4_input to be visited twice (once before and once after the MPLS
label is popped). With mpls_input at a higher priority (lower value), it
runs first and its output accumulates in ipv4_input which is then
visited only once with all packets.
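A minimal standalone sketch of the scheme described above, for illustration only: `struct toy_node`, `struct toy_graph`, `toy_graph_build()`, `toy_process()` and `toy_graph_walk()` are hypothetical names, not the DPDK API, and the GCC/Clang builtin `__builtin_ctzll()` stands in for `rte_ctz64()`. Nodes are sorted by (effective priority, id), each receives a bit position, source nodes are seeded from a separate bitmap, and the walk repeatedly clears and processes the lowest set bit, rescanning from word 0 each time. The round-robin forwarding in `toy_process()` is an assumed stand-in for real classification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_NODES 128 /* hypothetical limit for this sketch */

/* Hypothetical node model; not DPDK's struct rte_node. */
struct toy_node {
	uint32_t id;
	int16_t priority;   /* lower value = visited first (default 0) */
	int is_source;
	int nb_next;
	int next[4];        /* successors, as indices into the nodes[] array */
	int objs;           /* objects waiting to be processed */
	uint32_t sched_idx; /* bit position assigned at build time */
	int visits;         /* number of times the node was processed */
	int max_batch;      /* largest batch seen in a single visit */
};

struct toy_graph {
	struct toy_node *nodes;
	uint16_t nwords;
	uint32_t sched_table[TOY_MAX_NODES]; /* sched_idx -> index in nodes[] */
	uint64_t pending[TOY_MAX_NODES / 64];
	uint64_t src_pending[TOY_MAX_NODES / 64];
};

/* Sort key: (effective priority, id); source nodes are forced to INT16_MIN. */
static int
toy_prio_cmp(const void *a, const void *b)
{
	const struct toy_node *na = *(const struct toy_node *const *)a;
	const struct toy_node *nb = *(const struct toy_node *const *)b;
	int pa = na->is_source ? INT16_MIN : na->priority;
	int pb = nb->is_source ? INT16_MIN : nb->priority;

	if (pa != pb)
		return pa - pb;
	return (na->id > nb->id) - (na->id < nb->id);
}

static void
toy_graph_build(struct toy_graph *g, struct toy_node *nodes, uint32_t n)
{
	struct toy_node *sorted[TOY_MAX_NODES];

	assert(n > 0 && n <= TOY_MAX_NODES);
	*g = (struct toy_graph){ 0 };
	g->nodes = nodes;
	g->nwords = (uint16_t)((n + 63) / 64);
	for (uint32_t i = 0; i < n; i++)
		sorted[i] = &nodes[i];
	qsort(sorted, n, sizeof(sorted[0]), toy_prio_cmp);
	for (uint32_t i = 0; i < n; i++) {
		sorted[i]->sched_idx = i;
		g->sched_table[i] = (uint32_t)(sorted[i] - nodes);
		if (sorted[i]->is_source)
			g->src_pending[i / 64] |= 1ULL << (i % 64);
	}
}

/* Round-robin forwarding: an assumed stand-in for packet classification. */
static void
toy_process(struct toy_graph *g, struct toy_node *n)
{
	n->visits++;
	if (n->objs > n->max_batch)
		n->max_batch = n->objs;
	for (int i = 0; n->nb_next > 0 && i < n->objs; i++) {
		struct toy_node *dst = &g->nodes[n->next[i % n->nb_next]];

		dst->objs++;
		g->pending[dst->sched_idx / 64] |= 1ULL << (dst->sched_idx % 64);
	}
	n->objs = 0;
}

static void
toy_graph_walk(struct toy_graph *g)
{
	for (uint16_t w = 0; w < g->nwords; w++)
		g->pending[w] |= g->src_pending[w];

	for (;;) {
		uint16_t w = 0;
		/* Rescan from word 0: processing may set bits in any word. */
		while (w < g->nwords && g->pending[w] == 0)
			w++;
		if (w == g->nwords)
			break; /* no more pending nodes */
		unsigned int bit = (unsigned int)__builtin_ctzll(g->pending[w]);
		g->pending[w] &= ~(1ULL << bit);
		toy_process(g, &g->nodes[g->sched_table[w * 64u + bit]]);
	}
}
```

With `eth_input` (source) sending one object each to `mpls_input` (priority -10) and `ipv4_input` (priority 0), and `mpls_input` forwarding to `ipv4_input`, the walk visits `ipv4_input` once with two objects. Leaving `mpls_input` at priority 0 with a higher id than `ipv4_input` makes the same walk visit `ipv4_input` twice, one object each time.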
The bitmap set operation is idempotent (OR on an already-set bit is
a no-op) which removes the need for the idx == 0 guards that the
circular buffer required to avoid duplicate enqueue.
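The idempotence argument can be illustrated with a small sketch. `struct toy_fifo`, `toy_fifo_push()`, `toy_fifo_enqueue_guarded()` and `toy_bitmap_mark()` are hypothetical; the guarded variant only mimics the `idx == 0` check mentioned above, and `__builtin_popcountll()` is a GCC/Clang builtin used for counting. Pushing the same node twice leaves two FIFO entries unless the guard is used, while setting the same bit twice leaves a single pending entry.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FIFO_CAP 8 /* hypothetical capacity */

/* Hypothetical FIFO of node ids, standing in for the old circular buffer. */
struct toy_fifo {
	uint32_t buf[TOY_FIFO_CAP];
	uint32_t head, tail;
};

static void
toy_fifo_push(struct toy_fifo *f, uint32_t node)
{
	assert(f->tail - f->head < TOY_FIFO_CAP); /* no overflow in this sketch */
	f->buf[f->tail++ % TOY_FIFO_CAP] = node;
}

static uint32_t
toy_fifo_count(const struct toy_fifo *f)
{
	return f->tail - f->head;
}

/* Old-style guard: enqueue only when the node had no objects yet. */
static void
toy_fifo_enqueue_guarded(struct toy_fifo *f, uint32_t *idx, uint32_t node,
			 uint32_t nb_objs)
{
	if (idx[node] == 0)
		toy_fifo_push(f, node);
	idx[node] += nb_objs;
}

/* Bitmap variant: OR-ing an already-set bit is a no-op, so no guard is needed. */
static void
toy_bitmap_mark(uint64_t *pending, uint32_t sched_idx)
{
	pending[sched_idx / 64] |= 1ULL << (sched_idx % 64);
}

static unsigned int
toy_bitmap_count(const uint64_t *pending, unsigned int nwords)
{
	unsigned int c = 0;

	for (unsigned int w = 0; w < nwords; w++)
		c += (unsigned int)__builtin_popcountll(pending[w]);
	return c;
}
```

The trade-off is that the bitmap walk has to search for set bits, whereas the FIFO simply pops its next entry.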
Suggested-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Cc: Christophe Fontaine <cfontain@redhat.com>
Cc: David Marchand <david.marchand@redhat.com>
Cc: Konstantin Ananyev <konstantin.ananyev@huawei.com>
Cc: Maxime Leroy <maxime@leroys.fr>
---
doc/guides/prog_guide/graph_lib.rst | 37 +-
.../prog_guide/img/graph_mem_layout.svg | 1823 +++++++----------
lib/graph/graph.c | 19 +-
lib/graph/graph_debug.c | 12 +-
lib/graph/graph_populate.c | 117 +-
lib/graph/graph_private.h | 27 +-
lib/graph/node.c | 2 +
lib/graph/rte_graph.h | 1 +
lib/graph/rte_graph_model_mcore_dispatch.h | 34 +-
lib/graph/rte_graph_model_rtc.h | 65 +-
lib/graph/rte_graph_worker.h | 2 +-
lib/graph/rte_graph_worker_common.h | 81 +-
12 files changed, 984 insertions(+), 1236 deletions(-)
diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 8409e7666e85..9c6d8679b686 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -117,13 +117,22 @@ next_node[]:
The dynamic array to store the downstream nodes connected to this node. Downstream
node should not be current node itself or a source node.
+priority:
+^^^^^^^^^
+
+The scheduling priority of the node (``int16_t``, default 0). Nodes with lower
+priority values are visited first during the graph walk. This allows control
+over the order in which pending nodes are processed, which can improve packet
+batching in topologies where multiple paths converge on the same node.
+
Source node:
^^^^^^^^^^^^
Source nodes are static nodes created using ``RTE_NODE_REGISTER`` by passing
``flags`` as ``RTE_NODE_SOURCE_F``.
-While performing the graph walk, the ``process()`` function of all the source
-nodes will be called first. So that these nodes can be used as input nodes for a graph.
+Source nodes are automatically assigned the lowest possible priority value
+(``INT16_MIN``) so that their ``process()`` function is always called first
+during the graph walk. This ensures they act as input nodes for a graph.
nb_xstats:
^^^^^^^^^^
@@ -396,12 +405,26 @@ Graph object memory layout
Understanding the memory layout helps to debug the graph library and
improve the performance if needed.
-Graph object consists of a header, circular buffer to store the pending stream
-when walking over the graph, variable-length memory to store the ``rte_node`` objects,
-and variable-length memory to store the xstat reported by each ``rte_node``.
+A graph object consists of a header, a scheduling table mapping bit positions to
+node offsets, pending and source bitmaps for tracking which nodes need
+processing, variable-length memory to store the ``rte_node`` objects, and
+variable-length memory to store the xstat reported by each ``rte_node``.
-The graph_nodes_mem_create() creates and populate this memory. The functions
-such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this memory
+Nodes are sorted by ``(priority, node_id)`` at graph creation time and each
+node is assigned a bit position in the pending bitmap. During the graph walk,
+the bitmap is scanned from the lowest bit, so nodes with lower priority values
+are visited first. Source nodes are always assigned the lowest priority value
+(``INT16_MIN``) to ensure they run before any processing node.
+
+This priority-based ordering improves batching in fan-out-then-converge
+topologies. For example, if ``eth_input`` classifies packets to both
+``mpls_input`` and ``ipv4_input``, giving ``mpls_input`` a lower priority value
+ensures it runs first. Its output accumulates in ``ipv4_input`` which is then
+visited only once with all packets, instead of being visited twice (before and
+after MPLS label popping).
+
+The ``graph_fp_mem_create()`` function creates and populates this memory.
+Functions such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this memory
to enable fastpath services.
diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index 6911ea8abeed..6dc1402e6bd0 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -334,20 +334,6 @@ graph_mem_fixup_secondary(struct rte_graph *graph)
return graph_mem_fixup_node_ctx(graph);
}
-static bool
-graph_src_node_avail(struct graph *graph)
-{
- struct graph_node *graph_node;
-
- STAILQ_FOREACH(graph_node, &graph->node_list, next)
- if ((graph_node->node->flags & RTE_NODE_SOURCE_F) &&
- (graph_node->node->lcore_id == RTE_MAX_LCORE ||
- graph->lcore_id == graph_node->node->lcore_id))
- return true;
-
- return false;
-}
-
RTE_EXPORT_SYMBOL(rte_graph_model_mcore_dispatch_core_bind)
int
rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id, int lcore)
@@ -375,9 +361,8 @@ rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id, int lcore)
graph->graph->dispatch.lcore_id = graph->lcore_id;
graph->socket = rte_lcore_to_socket_id(lcore);
- /* check the availability of source node */
- if (!graph_src_node_avail(graph))
- graph->graph->head = 0;
+ /* Rebuild source bitmap with only source nodes bound to this lcore */
+ graph_src_bitmap_rebuild(graph);
return 0;
diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c
index e3b8cccdc1f0..8e99fa1b0fb8 100644
--- a/lib/graph/graph_debug.c
+++ b/lib/graph/graph_debug.c
@@ -15,8 +15,8 @@ graph_dump(FILE *f, struct graph *g)
fprintf(f, "graph <%s>\n", g->name);
fprintf(f, " id=%" PRIu32 "\n", g->id);
- fprintf(f, " cir_start=%" PRIu32 "\n", g->cir_start);
- fprintf(f, " cir_mask=%" PRIu32 "\n", g->cir_mask);
+ fprintf(f, " sched_table_off=%" PRIu32 "\n", g->sched_table_off);
+ fprintf(f, " nb_sched_words=%" PRIu16 "\n", g->nb_sched_words);
fprintf(f, " addr=%p\n", g);
fprintf(f, " graph=%p\n", g->graph);
fprintf(f, " mem_sz=%zu\n", g->mem_sz);
@@ -63,14 +63,14 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all)
fprintf(f, "graph <%s> @ %p\n", g->name, g);
fprintf(f, " id=%" PRIu32 "\n", g->id);
- fprintf(f, " head=%" PRId32 "\n", (int32_t)g->head);
- fprintf(f, " tail=%" PRId32 "\n", (int32_t)g->tail);
- fprintf(f, " cir_mask=0x%" PRIx32 "\n", g->cir_mask);
fprintf(f, " nb_nodes=%" PRId32 "\n", g->nb_nodes);
+ fprintf(f, " nb_sched_words=%" PRIu16 "\n", g->nb_sched_words);
fprintf(f, " socket=%d\n", g->socket);
fprintf(f, " fence=0x%" PRIx64 "\n", g->fence);
fprintf(f, " nodes_start=0x%" PRIx32 "\n", g->nodes_start);
- fprintf(f, " cir_start=%p\n", g->cir_start);
+ fprintf(f, " sched_table=%p\n", g->sched_table);
+ fprintf(f, " pending=%p\n", g->pending);
+ fprintf(f, " src_pending=%p\n", g->src_pending);
rte_graph_foreach_node(count, off, g, n) {
if (!all && n->idx == 0)
diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c
index 026daecb2122..45bc7704fede 100644
--- a/lib/graph/graph_populate.c
+++ b/lib/graph/graph_populate.c
@@ -3,6 +3,8 @@
*/
+#include <stdlib.h>
+
#include <rte_common.h>
#include <rte_errno.h>
#include <rte_malloc.h>
@@ -15,19 +17,27 @@ static size_t
graph_fp_mem_calc_size(struct graph *graph)
{
struct graph_node *graph_node;
- rte_node_t val;
+ uint16_t nwords;
size_t sz;
/* Graph header */
sz = sizeof(struct rte_graph);
- /* Source nodes list */
- sz += sizeof(rte_graph_off_t) * graph->src_node_count;
- /* Circular buffer for pending streams of size number of nodes */
- val = rte_align32pow2(graph->node_count * sizeof(rte_graph_off_t));
- sz = RTE_ALIGN(sz, val);
- graph->cir_start = sz;
- graph->cir_mask = rte_align32pow2(graph->node_count) - 1;
- sz += val;
+
+ /* Schedule table: node offset indexed by sched_idx */
+ sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+ graph->sched_table_off = sz;
+ sz += sizeof(rte_graph_off_t) * graph->node_count;
+
+ /* Pending and source pending bitmaps */
+ nwords = (graph->node_count + 63) / 64;
+ graph->nb_sched_words = nwords;
+ sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+ graph->pending_off = sz;
+ sz += sizeof(uint64_t) * nwords;
+ sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
+ graph->src_pending_off = sz;
+ sz += sizeof(uint64_t) * nwords;
+
/* Fence */
sz += sizeof(RTE_GRAPH_FENCE);
sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
@@ -54,20 +64,44 @@ graph_fp_mem_calc_size(struct graph *graph)
}
static void
-graph_header_popluate(struct graph *_graph)
+graph_header_populate(struct graph *_graph)
{
struct rte_graph *graph = _graph->graph;
- graph->tail = 0;
- graph->head = (int32_t)-_graph->src_node_count;
- graph->cir_mask = _graph->cir_mask;
graph->nb_nodes = _graph->node_count;
- graph->cir_start = RTE_PTR_ADD(graph, _graph->cir_start);
+ graph->nb_sched_words = _graph->nb_sched_words;
+ graph->sched_table = RTE_PTR_ADD(graph, _graph->sched_table_off);
+ graph->pending = RTE_PTR_ADD(graph, _graph->pending_off);
+ graph->src_pending = RTE_PTR_ADD(graph, _graph->src_pending_off);
graph->nodes_start = _graph->nodes_start;
graph->socket = _graph->socket;
graph->id = _graph->id;
memcpy(graph->name, _graph->name, RTE_GRAPH_NAMESIZE);
graph->fence = RTE_GRAPH_FENCE;
+
+ memset(graph->pending, 0, sizeof(uint64_t) * _graph->nb_sched_words);
+ memset(graph->src_pending, 0, sizeof(uint64_t) * _graph->nb_sched_words);
+}
+
+static int16_t
+graph_node_effective_priority(const struct graph_node *gn)
+{
+ if (gn->node->flags & RTE_NODE_SOURCE_F)
+ return INT16_MIN;
+ return gn->node->priority;
+}
+
+static int
+graph_node_priority_cmp(const void *a, const void *b)
+{
+ const struct graph_node *const *na = a;
+ const struct graph_node *const *nb = b;
+ int16_t pa = graph_node_effective_priority(*na);
+ int16_t pb = graph_node_effective_priority(*nb);
+
+ if (pa != pb)
+ return (int)pa - (int)pb;
+ return (int)(*na)->node->id - (int)(*nb)->node->id;
}
static void
@@ -76,15 +110,26 @@ graph_nodes_populate(struct graph *_graph)
rte_graph_off_t xstat_off = _graph->xstats_start;
rte_graph_off_t off = _graph->nodes_start;
struct rte_graph *graph = _graph->graph;
- struct graph_node *graph_node;
+ struct graph_node **sorted, *graph_node;
rte_edge_t count, nb_edges;
rte_node_t pid;
+ uint32_t n;
- STAILQ_FOREACH(graph_node, &_graph->node_list, next) {
+ /* Build a sorted array of graph_node pointers by (priority, id) */
+ sorted = calloc(_graph->node_count, sizeof(*sorted));
+ RTE_VERIFY(sorted != NULL);
+ n = 0;
+ STAILQ_FOREACH(graph_node, &_graph->node_list, next)
+ sorted[n++] = graph_node;
+ qsort(sorted, n, sizeof(*sorted), graph_node_priority_cmp);
+
+ for (n = 0; n < _graph->node_count; n++) {
+ graph_node = sorted[n];
struct rte_node *node = RTE_PTR_ADD(graph, off);
memset(node, 0, sizeof(*node));
node->fence = RTE_GRAPH_FENCE;
node->off = off;
+ node->sched_idx = n;
if (graph_pcap_is_enable()) {
node->process = graph_pcap_dispatch;
node->original_process = graph_node->node->process;
@@ -123,8 +168,14 @@ graph_nodes_populate(struct graph *_graph)
off += sizeof(struct rte_node *) * nb_edges;
off = RTE_ALIGN(off, RTE_CACHE_LINE_SIZE);
node->next = off;
+
+ /* Fill the schedule table */
+ graph->sched_table[n] = node->off;
+
__rte_node_stream_alloc(graph, node);
}
+
+ free(sorted);
}
struct rte_node *
@@ -179,12 +230,11 @@ graph_node_nexts_populate(struct graph *_graph)
}
static int
-graph_src_nodes_offset_populate(struct graph *_graph)
+graph_src_bitmap_populate(struct graph *_graph)
{
struct rte_graph *graph = _graph->graph;
struct graph_node *graph_node;
struct rte_node *node;
- int32_t head = -1;
const char *name;
STAILQ_FOREACH(graph_node, &_graph->node_list, next) {
@@ -195,7 +245,7 @@ graph_src_nodes_offset_populate(struct graph *_graph)
SET_ERR_JMP(EINVAL, fail, "%s not found", name);
__rte_node_stream_alloc(graph, node);
- graph->cir_start[head--] = node->off;
+ __rte_node_pending_set(graph->src_pending, node);
}
}
@@ -204,17 +254,42 @@ graph_src_nodes_offset_populate(struct graph *_graph)
return -rte_errno;
}
+void
+graph_src_bitmap_rebuild(struct graph *_graph)
+{
+ struct rte_graph *graph = _graph->graph;
+ struct graph_node *graph_node;
+ struct rte_node *node;
+ const char *name;
+
+ memset(graph->src_pending, 0,
+ sizeof(uint64_t) * graph->nb_sched_words);
+
+ STAILQ_FOREACH(graph_node, &_graph->node_list, next) {
+ if (!(graph_node->node->flags & RTE_NODE_SOURCE_F))
+ continue;
+ if (graph_node->node->lcore_id != RTE_MAX_LCORE &&
+ graph_node->node->lcore_id != _graph->lcore_id)
+ continue;
+ name = graph_node->node->name;
+ node = graph_node_name_to_ptr(graph, name);
+ if (node == NULL)
+ continue;
+ __rte_node_pending_set(graph->src_pending, node);
+ }
+}
+
static int
graph_fp_mem_populate(struct graph *graph)
{
int rc;
- graph_header_popluate(graph);
+ graph_header_populate(graph);
if (graph_pcap_is_enable())
graph_pcap_init(graph);
graph_nodes_populate(graph);
rc = graph_node_nexts_populate(graph);
- rc |= graph_src_nodes_offset_populate(graph);
+ rc |= graph_src_bitmap_populate(graph);
return rc;
}
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 26cdc6637192..df6f83b20261 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -49,6 +49,7 @@ struct node {
STAILQ_ENTRY(node) next; /**< Next node in the list. */
char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
uint64_t flags; /**< Node configuration flag. */
+ int16_t priority; /**< Scheduling priority. */
unsigned int lcore_id;
/**< Node runs on the Lcore ID used for mcore dispatch model. */
rte_node_process_t process; /**< Node process function. */
@@ -98,19 +99,23 @@ struct graph {
const struct rte_memzone *mz;
/**< Memzone to store graph data. */
rte_graph_off_t nodes_start;
- /**< Node memory start offset in graph reel. */
+ /**< Node memory start offset in graph memory. */
rte_graph_off_t xstats_start;
- /**< Node xstats memory start offset in graph reel. */
+ /**< Node xstats memory start offset in graph memory. */
rte_node_t src_node_count;
/**< Number of source nodes in a graph. */
struct rte_graph *graph;
/**< Pointer to graph data. */
rte_node_t node_count;
/**< Total number of nodes. */
- uint32_t cir_start;
- /**< Circular buffer start offset in graph reel. */
- uint32_t cir_mask;
- /**< Circular buffer mask for wrap around. */
+ uint32_t sched_table_off;
+ /**< Schedule table start offset in graph memory. */
+ uint32_t pending_off;
+ /**< Pending bitmap start offset in graph memory. */
+ uint32_t src_pending_off;
+ /**< Source pending bitmap start offset in graph memory. */
+ uint16_t nb_sched_words;
+ /**< Number of uint64_t words in pending bitmaps. */
rte_graph_t id;
/**< Graph identifier. */
rte_graph_t parent_id;
@@ -378,6 +383,16 @@ int graph_fp_mem_create(struct graph *graph);
*/
int graph_fp_mem_destroy(struct graph *graph);
+/**
+ * @internal
+ *
+ * Rebuild the source pending bitmap based on lcore affinity.
+ *
+ * @param graph
+ * Pointer to the internal graph object.
+ */
+void graph_src_bitmap_rebuild(struct graph *graph);
+
/* Lookup functions */
/**
* @internal
diff --git a/lib/graph/node.c b/lib/graph/node.c
index e3359fe490a5..b5599143b37b 100644
--- a/lib/graph/node.c
+++ b/lib/graph/node.c
@@ -153,6 +153,7 @@ __rte_node_register(const struct rte_node_register *reg)
if (rte_strscpy(node->name, reg->name, RTE_NODE_NAMESIZE) < 0)
goto free_xstat;
node->flags = reg->flags;
+ node->priority = reg->priority;
node->process = reg->process;
node->init = reg->init;
node->fini = reg->fini;
@@ -216,6 +217,7 @@ node_clone(struct node *node, const char *name)
/* Clone the source node */
reg->flags = node->flags;
+ reg->priority = node->priority;
reg->process = node->process;
reg->init = node->init;
reg->fini = node->fini;
diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h
index 7e433f466129..6cd32ec22284 100644
--- a/lib/graph/rte_graph.h
+++ b/lib/graph/rte_graph.h
@@ -496,6 +496,7 @@ struct rte_node_register {
char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */
uint64_t flags; /**< Node configuration flag. */
#define RTE_NODE_SOURCE_F (1ULL << 0) /**< Node type is source. */
+ int16_t priority; /**< Scheduling priority (lower = visited first, default 0). */
This will break the ABI. Please run the ABI check and fix what it reports.
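To illustrate the point above, a minimal sketch with hypothetical `toy_reg_old` / `toy_reg_new` structs that copy only the leading members of `struct rte_node_register` (the name size of 64 is an assumption) shows why inserting `priority` mid-struct is an ABI break: every member after it moves, so an application compiled against the old header would place `process` at an offset the new library no longer reads it from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NODE_NAMESIZE 64 /* assumed value of RTE_NODE_NAMESIZE */

/* Hypothetical process callback type; only its pointer size matters here. */
typedef uint16_t (*toy_process_t)(void *graph, void *node, void **objs,
				  uint16_t nb_objs);

/* Leading members of rte_node_register before the patch (simplified copy). */
struct toy_reg_old {
	char name[TOY_NODE_NAMESIZE];
	uint64_t flags;
	toy_process_t process;
};

/* The same members with priority inserted after flags, as in the patch. */
struct toy_reg_new {
	char name[TOY_NODE_NAMESIZE];
	uint64_t flags;
	int16_t priority;
	toy_process_t process;
};

/* A binary built against the old header stores process at the old offset. */
_Static_assert(offsetof(struct toy_reg_old, process) !=
	       offsetof(struct toy_reg_new, process),
	       "inserting priority moves every member that follows it");
```

Because applications define their own `struct rte_node_register` instances through `RTE_NODE_REGISTER()`, the application and the library must agree on this layout, which is what the libabigail-based ABI check is meant to catch.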
rte_node_process_t process; /**< Node process function. */
rte_node_init_t init; /**< Node init function. */
rte_node_fini_t fini; /**< Node fini function. */
diff --git a/lib/graph/rte_graph_model_mcore_dispatch.h b/lib/graph/rte_graph_model_mcore_dispatch.h
index f9ff3daa88ec..50a473564b56 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.h
+++ b/lib/graph/rte_graph_model_mcore_dispatch.h
@@ -77,9 +77,13 @@ int rte_graph_model_mcore_dispatch_node_lcore_affinity_set(const char *name,
unsigned int lcore_id);
/**
- * Perform graph walk on the circular buffer and invoke the process function
+ * Perform graph walk on the pending bitmap and invoke the process function
* of the nodes and collect the stats.
*
+ * Nodes are visited in scheduling order (lowest priority value first).
+ * Source nodes are seeded into the pending bitmap at the start of each walk.
+ * Nodes with different lcore affinity are dispatched to their target lcore.
+ *
* @param graph
* Graph pointer returned from rte_graph_lookup function.
*
@@ -88,20 +92,28 @@ int rte_graph_model_mcore_dispatch_node_lcore_affinity_set(const char *name,
static inline void
rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
{
- const rte_graph_off_t *cir_start = graph->cir_start;
- const rte_node_t mask = graph->cir_mask;
- uint32_t head = graph->head;
+ const uint16_t nwords = graph->nb_sched_words;
struct rte_node *node;
+ uint16_t word, bit;
if (graph->dispatch.wq != NULL)
__rte_graph_mcore_dispatch_sched_wq_process(graph);
- while (likely(head != graph->tail)) {
- node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+ /* Seed pending bitmap with source nodes bound to this lcore */
+ for (word = 0; word < nwords; word++)
+ graph->pending[word] |= graph->src_pending[word];
- /* skip the src nodes which not bind with current worker */
- if ((int32_t)head < 1 && node->dispatch.lcore_id != graph->dispatch.lcore_id)
- continue;
+ for (;;) {
+ /* find first word with any pending bit */
+ for (word = 0; word < nwords; word++)
+ if (graph->pending[word])
+ break;
+ if (word == nwords)
+ break; /* no more pending nodes */
+
+ bit = rte_ctz64(graph->pending[word]);
+ graph->pending[word] &= ~(1ULL << bit);
+ node = __rte_graph_pending_node(graph, word, bit);
/* Schedule the node until all task/objs are done */
if (node->dispatch.lcore_id != RTE_MAX_LCORE &&
@@ -111,11 +123,7 @@ rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
continue;
__rte_node_process(graph, node);
-
- head = likely((int32_t)head > 0) ? head & mask : head;
}
-
- graph->tail = 0;
}
#ifdef __cplusplus
diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h
index 4b6236e301e3..38feb3e1ca88 100644
--- a/lib/graph/rte_graph_model_rtc.h
+++ b/lib/graph/rte_graph_model_rtc.h
@@ -6,9 +6,12 @@
#include "rte_graph_worker_common.h"
/**
- * Perform graph walk on the circular buffer and invoke the process function
+ * Perform graph walk on the pending bitmap and invoke the process function
* of the nodes and collect the stats.
*
+ * Nodes are visited in scheduling order (lowest priority value first).
+ * Source nodes are seeded into the pending bitmap at the start of each walk.
+ *
* @param graph
* Graph pointer returned from rte_graph_lookup function.
*
@@ -17,30 +20,52 @@
static inline void
rte_graph_walk_rtc(struct rte_graph *graph)
{
- const rte_graph_off_t *cir_start = graph->cir_start;
- const rte_node_t mask = graph->cir_mask;
- uint32_t head = graph->head;
+ const uint16_t nwords = graph->nb_sched_words;
struct rte_node *node;
+ uint16_t word, bit;
/*
- * Walk on the source node(s) ((cir_start - head) -> cir_start) and then
- * on the pending streams (cir_start -> (cir_start + mask) -> cir_start)
- * in a circular buffer fashion.
+ * Nodes are assigned a bit position (sched_idx) sorted by (priority,
+ * node_id) at graph creation time. Source nodes are forced to INT16_MIN
+ * priority so they always come first.
*
- * +-----+ <= cir_start - head [number of source nodes]
- * | |
- * | ... | <= source nodes
- * | |
- * +-----+ <= cir_start [head = 0] [tail = 0]
- * | |
- * | ... | <= pending streams
- * | |
- * +-----+ <= cir_start + mask
+ * sched_table[] maps bit positions to node offsets:
+ *
+ * pending[] sched_table[]
+ * +----------+ +------------------+
+ * | word 0 | ---> | src_node_0 | bit 0 (prio=INT16_MIN)
+ * | 1100...1 | | src_node_1 | bit 1 (prio=INT16_MIN)
+ * | | | mpls_input | bit 2 (prio=-10)
+ * | | | ipv4_input | bit 3 (prio=0)
+ * | | | ... |
+ * +----------+ +------------------+
+ * | word 1 | ---> | ip4_rewrite | bit 64 (prio=10)
+ * | ... | | ... |
+ * +----------+ +------------------+
+ *
+ * Walk: find the first non-zero word, take its lowest set bit
+ * (rte_ctz64), clear that bit, then process the node it maps to.
+ *
+ * After each node is processed, restart scanning from word 0 since
+ * processing may set bits in any word, including earlier ones.
*/
- while (likely(head != graph->tail)) {
- node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+ /* Seed pending bitmap with source nodes */
+ for (word = 0; word < nwords; word++)
+ graph->pending[word] |= graph->src_pending[word];
+
+ for (;;) {
+ /* find first word with any pending bit */
+ for (word = 0; word < nwords; word++)
+ if (graph->pending[word])
+ break;
+ if (word == nwords)
+ break; /* no more pending nodes */
+
+ bit = rte_ctz64(graph->pending[word]);
+ graph->pending[word] &= ~(1ULL << bit);
+ node = __rte_graph_pending_node(graph, word, bit);
__rte_node_process(graph, node);
- head = likely((int32_t)head > 0) ? head & mask : head;
}
- graph->tail = 0;
}
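The scheduling loop in the hunk above can be tried outside DPDK with a small standalone model. This is a sketch, not the patch's code: the two-word bitmap, the node indices and the `process()` callback are hypothetical, and GCC/Clang's `__builtin_ctzll()` stands in for `rte_ctz64()`. It shows two properties the patch relies on: marking an already-pending node is a no-op, and because scanning restarts at word 0 after every node, a node in an earlier word that becomes pending late (node 3, marked by node 65) is still found.

```c
#include <assert.h>
#include <stdint.h>

#define NB_WORDS 2            /* hypothetical: room for 128 nodes */
#define BITS_PER_WORD 64

static uint64_t pending[NB_WORDS];
static uint16_t order[16];    /* records the processing order */
static unsigned int nb_order;

/* Mark node 'idx' pending; setting an already-set bit is a no-op. */
static void set_pending(uint16_t idx)
{
	assert(idx < NB_WORDS * BITS_PER_WORD);
	pending[idx / BITS_PER_WORD] |= UINT64_C(1) << (idx % BITS_PER_WORD);
}

/* Hypothetical node body: node 0 feeds node 65, node 65 feeds node 3. */
static void process(uint16_t idx)
{
	order[nb_order++] = idx;
	if (idx == 0)
		set_pending(65);
	else if (idx == 65)
		set_pending(3);
}

/* Always pick the lowest pending index, rescanning from word 0 each time. */
static void walk(void)
{
	for (;;) {
		uint16_t word;

		for (word = 0; word < NB_WORDS; word++)
			if (pending[word] != 0)
				break;
		if (word == NB_WORDS)
			break; /* nothing left to do */

		uint16_t bit = (uint16_t)__builtin_ctzll(pending[word]);
		/* Clear first, so the node may re-mark itself while running. */
		pending[word] &= ~(UINT64_C(1) << bit);
		process((uint16_t)(word * BITS_PER_WORD + bit));
	}
}
```

Marking nodes 0 and 1 (node 1 twice) and calling `walk()` processes 0, 1, 65 and 3, each exactly once.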
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index b0f952a82cc9..e513d7a655d9 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -14,7 +14,7 @@ extern "C" {
#endif
/**
- * Perform graph walk on the circular buffer and invoke the process function
+ * Perform graph walk on the pending bitmap and invoke the process function
* of the nodes and collect the stats.
*
* @param graph
diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 4ab53a533e4c..e52a37ce5e84 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -49,15 +49,14 @@ SLIST_HEAD(rte_graph_rq_head, rte_graph);
*/
struct __rte_cache_aligned rte_graph {
/* Fast path area. */
- uint32_t tail; /**< Tail of circular buffer. */
- uint32_t head; /**< Head of circular buffer. */
- uint32_t cir_mask; /**< Circular buffer wrap around mask. */
rte_node_t nb_nodes; /**< Number of nodes in the graph. */
- rte_graph_off_t *cir_start; /**< Pointer to circular buffer. */
rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */
+ rte_graph_off_t *sched_table; /**< Node offset indexed by sched_idx. */
+ uint64_t *pending; /**< Bitmap of pending nodes. */
+ uint64_t *src_pending; /**< Bitmap of source nodes (constant). */
+ uint16_t nb_sched_words; /**< Number of uint64_t words in pending bitmaps. */
uint8_t model; /**< graph model */
- uint8_t reserved1; /**< Reserved for future use. */
- uint16_t reserved2; /**< Reserved for future use. */
+ /* 26 bytes padding */
union {
/* Fast schedule area for mcore dispatch model */
struct {
@@ -98,6 +97,7 @@ struct __rte_cache_aligned rte_node {
rte_node_t id; /**< Node identifier. */
rte_node_t parent_id; /**< Parent Node identifier. */
rte_edge_t nb_edges; /**< Number of edges from this node. */
+ uint16_t sched_idx; /**< Bit position in pending bitmap. */
uint32_t realloc_count; /**< Number of times realloced. */
char parent[RTE_NODE_NAMESIZE]; /**< Parent node name. */
@@ -132,7 +132,7 @@ struct __rte_cache_aligned rte_node {
}; /**< Node Context. */
uint16_t size; /**< Total number of objects available. */
uint16_t idx; /**< Number of objects used. */
- rte_graph_off_t off; /**< Offset of node in the graph reel. */
+ rte_graph_off_t off; /**< Offset of node in the graph memory. */
uint64_t total_cycles; /**< Cycles spent in this node. */
uint64_t total_calls; /**< Calls done to this node. */
uint64_t total_objs; /**< Objects processed by this node. */
@@ -187,12 +187,12 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph,
/**
* @internal
*
- * Enqueue a given node to the tail of the graph reel.
+ * Process a node's pending objects and collect stats.
*
* @param graph
* Pointer Graph object.
* @param node
- * Pointer to node object to be enqueued.
+ * Pointer to node object to be processed.
*/
static __rte_always_inline void
__rte_node_process(struct rte_graph *graph, struct rte_node *node)
@@ -220,21 +220,42 @@ __rte_node_process(struct rte_graph *graph, struct rte_node *node)
/**
* @internal
*
- * Enqueue a given node to the tail of the graph reel.
+ * Get a pointer to a node from the scheduling table.
*
* @param graph
* Pointer Graph object.
+ * @param word
+ * Index of the 64-bit word in the pending bitmap.
+ * @param bit
+ * Bit number within that word.
+ *
+ * @return
+ * Pointer to the node.
+ */
+static __rte_always_inline struct rte_node *
+__rte_graph_pending_node(struct rte_graph *graph, uint16_t word, uint16_t bit)
+{
+ const uint16_t index = (word * sizeof(*graph->pending) * CHAR_BIT) + bit;
+ const rte_graph_off_t node_offset = graph->sched_table[index];
+ return RTE_PTR_ADD(graph, node_offset);
+}
+
+/**
+ * @internal
+ *
+ * Mark a node as pending in the graph scheduling bitmap.
+ *
+ * @param bitmap
+ * Either graph->pending or graph->src_pending.
* @param node
- * Pointer to node object to be enqueued.
+ * Pointer to node object to be marked pending.
*/
static __rte_always_inline void
-__rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node)
+__rte_node_pending_set(uint64_t *bitmap, struct rte_node *node)
{
- uint32_t tail;
-
- tail = graph->tail;
- graph->cir_start[tail++] = node->off;
- graph->tail = tail & graph->cir_mask;
+ const uint16_t word = node->sched_idx / (sizeof(*bitmap) * CHAR_BIT);
+ const uint16_t bit = node->sched_idx % (sizeof(*bitmap) * CHAR_BIT);
+ bitmap[word] |= 1ULL << bit;
}
/**
@@ -242,8 +263,8 @@ __rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *node)
*
* Enqueue sequence prologue function.
*
- * Updates the node to tail of graph reel and resizes the number of objects
- * available in the stream as needed.
+ * Marks the node as pending in the scheduling bitmap and resizes the number
+ * of objects available in the stream as needed.
*
* @param graph
* Pointer to the graph object.
@@ -259,9 +280,7 @@ __rte_node_enqueue_prologue(struct rte_graph *graph, struct rte_node *node,
const uint16_t idx, const uint16_t space)
{
- /* Add to the pending stream list if the node is new */
- if (idx == 0)
- __rte_node_enqueue_tail_update(graph, node);
+ __rte_node_pending_set(graph->pending, node);
if (unlikely(node->size < (idx + space)))
__rte_node_stream_alloc_size(graph, node, node->size + space);
@@ -293,7 +312,7 @@ __rte_node_next_node_get(struct rte_node *node, rte_edge_t next)
/**
* Enqueue the objs to next node for further processing and set
- * the next node to pending state in the circular buffer.
+ * the next node to pending state in the scheduling bitmap.
*
* @param graph
* Graph pointer returned from rte_graph_lookup().
@@ -321,7 +340,7 @@ rte_node_enqueue(struct rte_graph *graph, struct rte_node *node,
/**
* Enqueue only one obj to next node for further processing and
- * set the next node to pending state in the circular buffer.
+ * set the next node to pending state in the scheduling bitmap.
*
* @param graph
* Graph pointer returned from rte_graph_lookup().
@@ -347,7 +366,7 @@ rte_node_enqueue_x1(struct rte_graph *graph, struct rte_node *node,
/**
* Enqueue only two objs to next node for further processing and
- * set the next node to pending state in the circular buffer.
+ * set the next node to pending state in the scheduling bitmap.
* Same as rte_node_enqueue_x1 but enqueue two objs.
*
* @param graph
@@ -377,7 +396,7 @@ rte_node_enqueue_x2(struct rte_graph *graph, struct rte_node *node,
/**
* Enqueue only four objs to next node for further processing and
- * set the next node to pending state in the circular buffer.
+ * set the next node to pending state in the scheduling bitmap.
* Same as rte_node_enqueue_x1 but enqueue four objs.
*
* @param graph
@@ -414,7 +433,7 @@ rte_node_enqueue_x4(struct rte_graph *graph, struct rte_node *node,
/**
* Enqueue objs to multiple next nodes for further processing and
- * set the next nodes to pending state in the circular buffer.
+ * set the next nodes to pending state in the scheduling bitmap.
* objs[i] will be enqueued to nexts[i].
*
* @param graph
@@ -472,7 +491,7 @@ rte_node_next_stream_get(struct rte_graph *graph, struct rte_node *node,
}
/**
- * Put the next stream to pending state in the circular buffer
+ * Put the next stream to pending state in the scheduling bitmap
* for further processing. Should be invoked after rte_node_next_stream_get().
*
* @param graph
@@ -495,9 +514,7 @@ rte_node_next_stream_put(struct rte_graph *graph, struct rte_node *node,
return;
node = __rte_node_next_node_get(node, next);
- if (node->idx == 0)
- __rte_node_enqueue_tail_update(graph, node);
-
+ __rte_node_pending_set(graph->pending, node);
node->idx += idx;
}
@@ -530,7 +547,7 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
src->objs = dobjs;
src->size = dsz;
dst->idx = src->idx;
- __rte_node_enqueue_tail_update(graph, dst);
+ __rte_node_pending_set(graph->pending, dst);
} else { /* Move the objects from src node to dst node */
rte_node_enqueue(graph, src, next, src->objs, src->idx);
}
--
2.54.0
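The index arithmetic in `__rte_graph_pending_node()` and `__rte_node_pending_set()` splits `sched_idx` into a 64-bit word index and a bit within that word. The following minimal sketch uses hypothetical helper names (`sched_idx_split()`, `sched_idx_join()`, `pending_set()`, `pending_count()`) and GCC/Clang's `__builtin_popcountll()`. It shows the round trip, and it also shows why the `idx == 0` guard that the circular-buffer version needed before enqueuing can be dropped: setting the same bit twice still leaves a single pending entry.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define BITS_PER_WORD (sizeof(uint64_t) * CHAR_BIT) /* 64 */

/* Split a scheduling index into (word, bit), as __rte_node_pending_set() does. */
static void sched_idx_split(uint16_t idx, uint16_t *word, uint16_t *bit)
{
	*word = (uint16_t)(idx / BITS_PER_WORD);
	*bit = (uint16_t)(idx % BITS_PER_WORD);
}

/* Rebuild the index, as __rte_graph_pending_node() does before the table lookup. */
static uint16_t sched_idx_join(uint16_t word, uint16_t bit)
{
	return (uint16_t)(word * BITS_PER_WORD + bit);
}

/* Mark an index pending; idempotent, unlike appending to a circular buffer. */
static void pending_set(uint64_t *bitmap, uint16_t idx)
{
	uint16_t word, bit;

	sched_idx_split(idx, &word, &bit);
	bitmap[word] |= UINT64_C(1) << bit;
}

/* Count pending entries across a bitmap of nwords words. */
static unsigned int pending_count(const uint64_t *bitmap, uint16_t nwords)
{
	unsigned int n = 0;

	for (uint16_t w = 0; w < nwords; w++)
		n += (unsigned int)__builtin_popcountll(bitmap[w]);
	return n;
}
```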
* RE: [PATCH v5] net/iavf: fix duplicate VF reset during PF reset recovery
From: Loftus, Ciara @ 2026-06-10 10:50 UTC (permalink / raw)
To: Mandal, Anurag, dev@dpdk.org
Cc: Richardson, Bruce, Medvedkin, Vladimir, stable@dpdk.org
In-Reply-To: <20260610100704.366722-1-anurag.mandal@intel.com>
> Subject: [PATCH v5] net/iavf: fix duplicate VF reset during PF reset recovery
>
> During PF-initiated reset recovery, iavf_dev_close() sends
> an extra VIRTCHNL_OP_RESET_VF while recovery is already in progress.
> That second reset can leave PF/VF virtchnl state inconsistent and
> cause VIRTCHNL_OP_CONFIG_VSI_QUEUES to fail with ERR_PARAM after
> ToR link flap/power-cycle, leaving the VF unable to recover.
> This results in connection loss.
>
> This patch introduces a new flag, 'pf_reset_in_progress', which is
> set only when iavf_handle_hw_reset() is entered with
> vf_initiated_reset as false and is cleared on exit.
> While a PF-triggered reset recovery is in progress, the close-time
> VF reset and related close-time virtchnl operations are skipped.
> This avoids a duplicate VF reset while keeping normal behavior
> for application-driven close or VF-initiated reinit.
>
> Fixes: 675a104e2e94 ("net/iavf: fix abnormal disable HW interrupt")
> Fixes: b34fe66ea893 ("net/iavf: delay VF reset command")
> Fixes: 5e03e316c753 ("net/iavf: handle virtchnl event message without interrupt")
> Cc: stable@dpdk.org
>
> Signed-off-by: Anurag Mandal <anurag.mandal@intel.com>
Acked-by: Ciara Loftus <ciara.loftus@intel.com>
I think you may need to respin due to a patch application failure.
I have some suggestions for improving the comments/release notes
that you could include in the next version. Code looks good to me.
> ---
> V5: Addressed Ciara Loftus's comments
> - added separate flag for PF initiated reset recovery
> V4: Addressed Ciara Loftus's comments
> - split VF reset from other code changes
> V3: Addressed latest ai-code-review comments
> V2: Addressed ai-code-review comments
>
> doc/guides/rel_notes/release_26_07.rst | 3 ++
> drivers/net/intel/iavf/iavf.h | 7 +++++
> drivers/net/intel/iavf/iavf_ethdev.c | 40 +++++++++++++++-----------
> drivers/net/intel/iavf/iavf_vchnl.c | 18 ++++++++++--
> 4 files changed, 49 insertions(+), 19 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
> index d2563ac503..f6899a78c3 100644
> --- a/doc/guides/rel_notes/release_26_07.rst
> +++ b/doc/guides/rel_notes/release_26_07.rst
> @@ -95,6 +95,9 @@ New Features
>
> * Added support for transmitting LLDP packets based on mbuf packet type.
> * Implemented AVX2 context descriptor transmit paths.
> + * Prevented duplicate 'VIRTCHNL_OP_RESET_VF' during a PF-initiated
> + reset recovery, which earlier caused virtchnl state corruption
> + and connection loss after a top-of-rack (ToR) link flap/power-cycle.
I think something more concise here would be better,
e.g. "Fixed duplicate send of 'VIRTCHNL_OP_RESET_VF' during PF reset
recovery, which could cause virtchnl state corruption."
>
> * **Updated PCAP ethernet driver.**
>
> diff --git a/drivers/net/intel/iavf/iavf.h b/drivers/net/intel/iavf/iavf.h
> index 2615b6f034..67aacbe7a6 100644
> --- a/drivers/net/intel/iavf/iavf.h
> +++ b/drivers/net/intel/iavf/iavf.h
> @@ -292,6 +292,13 @@ struct iavf_info {
>
> bool in_reset_recovery;
>
> + /*
> + * Set only while iavf_handle_hw_reset()
> + * is processing a PF-initiated reset
> + * (vf_initiated_reset == false).
> + */
I don't think a comment is warranted here, the variable name is
self-explanatory.
> + bool pf_reset_in_progress;
> +
> uint32_t ptp_caps;
> rte_spinlock_t phc_time_aq_lock;
> };
> diff --git a/drivers/net/intel/iavf/iavf_ethdev.c b/drivers/net/intel/iavf/iavf_ethdev.c
> index a8031e23a5..2b6f4daa99 100644
> --- a/drivers/net/intel/iavf/iavf_ethdev.c
> +++ b/drivers/net/intel/iavf/iavf_ethdev.c
> @@ -3166,23 +3166,27 @@ iavf_dev_close(struct rte_eth_dev *dev)
>
> ret = iavf_dev_stop(dev);
>
> - /*
> - * Release redundant queue resource when close the dev
> - * so that other vfs can re-use the queues.
> - */
> - if (vf->lv_enabled) {
> - ret = iavf_request_queues(dev, IAVF_MAX_NUM_QUEUES_DFLT);
> - if (ret)
> - PMD_DRV_LOG(ERR, "Reset the num of queues failed");
> + /* Skip RESET_VF on a PF-initiated reset */
Regarding the comment above: here we're not skipping RESET_VF, but rather
preventing virtchnl messages from being sent to the adminq during the
PF-initiated reset. I suggest rewording the comment to reflect that.
> + if (!vf->pf_reset_in_progress) {
>
> - vf->max_rss_qregion = IAVF_MAX_NUM_QUEUES_DFLT;
> - }
> + /*
> + * Release redundant queue resource when close the dev
> + * so that other vfs can re-use the queues.
> + */
> + if (vf->lv_enabled) {
> + ret = iavf_request_queues(dev, IAVF_MAX_NUM_QUEUES_DFLT);
> + if (ret)
> + PMD_DRV_LOG(ERR, "Reset the num of queues failed");
> + vf->max_rss_qregion =
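The control flow the commit message describes can be modelled in a few lines. The flag is set only for a PF-initiated reset and cleared on exit, and close-time virtchnl traffic is skipped while it is set. This is a reduced sketch, not driver code: `struct vf_state`, `vf_close()`, `handle_hw_reset()` and the `nb_reset_vf_sent` counter are hypothetical stand-ins for `struct iavf_info`, `iavf_dev_close()`, `iavf_handle_hw_reset()` and the adminq message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced VF state. */
struct vf_state {
	bool pf_reset_in_progress;
	unsigned int nb_reset_vf_sent; /* simulated VIRTCHNL_OP_RESET_VF count */
};

/* Stand-in for the close path: no virtchnl traffic during PF-initiated recovery. */
static void vf_close(struct vf_state *vf)
{
	if (!vf->pf_reset_in_progress)
		vf->nb_reset_vf_sent++; /* would send VIRTCHNL_OP_RESET_VF here */
}

/* Stand-in for the reset handler: the flag covers only PF-initiated resets. */
static void handle_hw_reset(struct vf_state *vf, bool vf_initiated_reset)
{
	if (!vf_initiated_reset)
		vf->pf_reset_in_progress = true;

	vf_close(vf); /* teardown before reinitialization */

	vf->pf_reset_in_progress = false; /* cleared on exit */
}
```

In this model a PF-initiated reset leaves the counter at zero, while a VF-initiated reset and an application-driven close each still send one reset message, matching the "keep normal behavior" goal.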
* [RFC] memtank: add memtank library
From: Konstantin Ananyev @ 2026-06-10 10:39 UTC (permalink / raw)
To: dev; +Cc: hofors, mb
Introduce memtank, a highly customizable fixed-size object allocator
for DPDK applications. It offers close to mempool-level performance
on the fast path.
The main difference from mempool is the ability to grow/shrink at runtime
with user-provided grow/shrink threshold values, plus some extra
features for greater flexibility.
Key properties:
- relies on the user to provide callbacks for actual memory reservation.
The user is free to choose whatever is most suitable for their scenario,
e.g. malloc/rte_malloc/mmap/some custom memory allocator.
- user-defined constructor callback for newly allocated objects.
- bulk alloc and free APIs.
- different alloc/free policies (specified by the user via the flags parameter):
* as lightweight as possible, but can fail
* more robust, but heavyweight: causes a call to the user-provided backing
memory allocator.
- backing memory grows/shrinks on demand, with special API extensions
that let the user control grow/shrink size, frequency, and when/where
it happens (DP, CP, both, etc.).
- ability to pre-allocate all objects at memtank creation time
(mempool-like behavior).
- custom object size and alignment.
- per-object runtime statistics and sanity checks (boundary violation,
double free, etc.) can be enabled/disabled at memtank creation time.
Known limitations (subject to further improvement):
- scalability:
beyond 8 lcores, the conventional mempool (with FIFO) starts to outperform
memtank (which uses LIFO internally).
- mempool_cache integration is not part of the library and currently
has to be implemented by the user manually on top of the memtank API.
Envisioned usage scenarios within DPDK-based apps:
various flow/session control structures (TCP PCB, CT, NAT sessions, etc.)
that need to be allocated/freed on the data path.
It can also be used for 'semi-fastpath' allocations:
TBL-8 blocks for LPM, hash buckets, etc.
The initial idea is inspired by the Linux/Solaris SLAB allocators.
It also reuses some ideas from my previous work on the TLDK project:
https://github.com/FDio/tldk
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
app/test/meson.build | 2 +
app/test/test_memtank.c | 160 +++
app/test/test_memtank_stress.c | 1075 +++++++++++++++++
doc/api/doxy-api-index.md | 1 +
doc/api/doxy-api.conf.in | 1 +
.../prog_guide/img/memtank_internal.svg | 1 +
doc/guides/prog_guide/index.rst | 1 +
doc/guides/prog_guide/memtank_lib.rst | 231 ++++
lib/memtank/memtank.c | 630 ++++++++++
lib/memtank/memtank.h | 110 ++
lib/memtank/meson.build | 18 +
lib/memtank/misc.c | 375 ++++++
lib/memtank/rte_memtank.h | 303 +++++
lib/meson.build | 1 +
14 files changed, 2909 insertions(+)
create mode 100644 app/test/test_memtank.c
create mode 100644 app/test/test_memtank_stress.c
create mode 100644 doc/guides/prog_guide/img/memtank_internal.svg
create mode 100644 doc/guides/prog_guide/memtank_lib.rst
create mode 100644 lib/memtank/memtank.c
create mode 100644 lib/memtank/memtank.h
create mode 100644 lib/memtank/meson.build
create mode 100644 lib/memtank/misc.c
create mode 100644 lib/memtank/rte_memtank.h
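The following small standalone sketch of a fixed-size object pool makes the LIFO, chunk and user-callback properties listed above concrete. It is not the `rte_memtank` API: `struct lifo_pool`, `pool_grow()`, `pool_alloc()`, `pool_free()` and `pool_destroy()` are hypothetical, and it leaves out shrinking, alignment handling, constructors, statistics and thread safety. The `may_grow` argument loosely mirrors the lightweight-versus-robust allocation policies.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* User-supplied backing-memory callbacks with an opaque user pointer. */
typedef void *(*backing_alloc_t)(size_t sz, void *udata);
typedef void (*backing_free_t)(void *buf, void *udata);

#define MAX_CHUNKS 16   /* hypothetical hard limits keep the sketch simple */
#define MAX_OBJS 256

/* Hypothetical pool: objects carved out of chunks, free list kept as a stack. */
struct lifo_pool {
	size_t obj_size;
	uint32_t nb_obj_chunk;          /* objects per backing chunk */
	backing_alloc_t alloc_cb;
	backing_free_t free_cb;
	void *udata;
	void *chunks[MAX_CHUNKS];
	uint32_t nb_chunks;
	void *free_objs[MAX_OBJS];      /* top of stack: free_objs[nb_free - 1] */
	uint32_t nb_free;
};

/* Get one more chunk from the user callback and push all its objects. */
static int pool_grow(struct lifo_pool *p)
{
	if (p->nb_obj_chunk == 0 || p->nb_chunks == MAX_CHUNKS ||
	    (p->nb_chunks + 1) * p->nb_obj_chunk > MAX_OBJS)
		return -1;

	char *buf = p->alloc_cb(p->obj_size * p->nb_obj_chunk, p->udata);
	if (buf == NULL)
		return -1;

	p->chunks[p->nb_chunks++] = buf;
	for (uint32_t i = 0; i != p->nb_obj_chunk; i++)
		p->free_objs[p->nb_free++] = buf + i * p->obj_size;
	return 0;
}

/* Bulk alloc: pop up to n objects, growing only when may_grow is set. */
static uint32_t pool_alloc(struct lifo_pool *p, void *obj[], uint32_t n,
			   int may_grow)
{
	uint32_t k;

	for (k = 0; k != n; k++) {
		if (p->nb_free == 0 && (!may_grow || pool_grow(p) != 0))
			break;
		obj[k] = p->free_objs[--p->nb_free];
	}
	return k; /* may be fewer than n */
}

/* Bulk free: push objects back; the last one freed is reused first. */
static void pool_free(struct lifo_pool *p, void *obj[], uint32_t n)
{
	for (uint32_t k = 0; k != n; k++) {
		assert(p->nb_free < MAX_OBJS);
		p->free_objs[p->nb_free++] = obj[k];
	}
}

/* Hand every chunk back to the user callback. */
static void pool_destroy(struct lifo_pool *p)
{
	while (p->nb_chunks != 0)
		p->free_cb(p->chunks[--p->nb_chunks], p->udata);
	p->nb_free = 0;
}

/* Example callbacks backed by malloc()/free(); udata is unused here. */
static void *sys_alloc(size_t sz, void *udata)
{
	(void)udata;
	return malloc(sz);
}

static void sys_free(void *buf, void *udata)
{
	(void)udata;
	free(buf);
}
```

Because the free list is a stack, the most recently freed object (which tends to be cache-warm) is handed out first. An allocation without `may_grow` fails fast when the stack is empty, while one with it falls back to the user callback.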
diff --git a/app/test/meson.build b/app/test/meson.build
index 61024125a7..d1b6203cf5 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -129,6 +129,8 @@ source_file_deps = {
'test_memory.c': [],
'test_mempool.c': [],
'test_mempool_perf.c': [],
+ 'test_memtank.c': ['memtank'],
+ 'test_memtank_stress.c': ['memtank'],
'test_memzone.c': [],
'test_meter.c': ['meter'],
'test_metrics.c': ['metrics'],
diff --git a/app/test/test_memtank.c b/app/test/test_memtank.c
new file mode 100644
index 0000000000..1cf55d65e6
--- /dev/null
+++ b/app/test/test_memtank.c
@@ -0,0 +1,160 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2018 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include <rte_memory.h>
+#include <rte_memtank.h>
+#include <rte_errno.h>
+#include "test.h"
+
+/* TEST SUITE */
+
+static int
+memtank_test_setup(void)
+{
+ return 0;
+}
+
+static void
+memtank_test_teardown(void)
+{
+}
+
+
+
+static void *
+test_alloc(size_t sz, void *p)
+{
+ RTE_SET_USED(p);
+ return malloc(sz);
+}
+
+static void
+test_free(void *buf, void *p)
+{
+ RTE_SET_USED(p);
+ free(buf);
+}
+
+static int
+test_memtank_create_invalid(void)
+{
+ struct rte_memtank_prm prm;
+ struct rte_memtank *mt;
+
+ memset(&prm, 0, sizeof(prm));
+
+ rte_errno = 0;
+ mt = rte_memtank_create(&prm);
+ RTE_TEST_ASSERT_EQUAL(mt, NULL, "memtank create");
+ RTE_TEST_ASSERT_EQUAL(rte_errno, EINVAL, "errno EINVAL");
+
+ prm.alloc = test_alloc;
+ rte_errno = 0;
+ mt = rte_memtank_create(&prm);
+ RTE_TEST_ASSERT_EQUAL(mt, NULL, "memtank create");
+ RTE_TEST_ASSERT_EQUAL(rte_errno, EINVAL, "errno EINVAL");
+
+ prm.free = test_free;
+ rte_errno = 0;
+ mt = rte_memtank_create(&prm);
+ RTE_TEST_ASSERT_EQUAL(mt, NULL, "memtank create");
+ RTE_TEST_ASSERT_EQUAL(rte_errno, EINVAL, "errno EINVAL");
+
+ prm.obj_align = 2;
+ rte_errno = 0;
+ mt = rte_memtank_create(&prm);
+ RTE_TEST_ASSERT_EQUAL(mt, NULL, "memtank create");
+ RTE_TEST_ASSERT_EQUAL(rte_errno, EINVAL, "errno EINVAL");
+
+ prm.min_free = 2;
+ rte_errno = 0;
+ mt = rte_memtank_create(&prm);
+ RTE_TEST_ASSERT_EQUAL(mt, NULL, "memtank create");
+ RTE_TEST_ASSERT_EQUAL(rte_errno, EINVAL, "errno EINVAL");
+
+ prm.max_free = 2;
+ rte_errno = 0;
+ mt = rte_memtank_create(&prm);
+ RTE_TEST_ASSERT_EQUAL(mt, NULL, "memtank create");
+ RTE_TEST_ASSERT_EQUAL(rte_errno, EINVAL, "errno EINVAL");
+
+ prm.max_obj = 2;
+ rte_errno = 0;
+ mt = rte_memtank_create(&prm);
+ RTE_TEST_ASSERT_EQUAL(mt, NULL, "memtank create");
+ RTE_TEST_ASSERT_EQUAL(rte_errno, EINVAL, "errno EINVAL");
+
+ prm.nb_obj_chunk = 2;
+ rte_errno = 0;
+ mt = rte_memtank_create(&prm);
+ RTE_TEST_ASSERT_NOT_EQUAL(mt, NULL, "memtank create");
+
+ rte_memtank_destroy(mt);
+ return TEST_SUCCESS;
+}
+
+
+static int
+test_memtank_alloc(void)
+{
+ struct rte_memtank_prm prm;
+ struct rte_memtank *mt;
+
+ memset(&prm, 0, sizeof(prm));
+ prm.alloc = test_alloc;
+ prm.free = test_free;
+ prm.obj_align = 2;
+ prm.nb_obj_chunk = 2;
+ prm.min_free = 2;
+ prm.max_free = 2;
+ prm.max_obj = 10;
+
+ mt = rte_memtank_create(&prm);
+ RTE_TEST_ASSERT_NOT_EQUAL(mt, NULL, "memtank create");
+
+ void *obj[3] = { NULL };
+ uint32_t rc;
+
+ /* min_obj is 0 so this is expected to fail */
+ rc = rte_memtank_alloc(mt, obj, 1, RTE_MTANK_ALLOC_CHUNK);
+ RTE_TEST_ASSERT_EQUAL(rc, 0, "memtank alloc chunk 0 (%u)", rc);
+
+ rc = rte_memtank_alloc(mt, obj, 1, RTE_MTANK_ALLOC_GROW);
+ RTE_TEST_ASSERT_EQUAL(rc, 1, "memtank alloc 1 (%u)", rc);
+ RTE_TEST_ASSERT_NOT_EQUAL(obj[0], NULL, "alloc obj");
+
+ rc = rte_memtank_alloc(mt, obj, 3, RTE_MTANK_ALLOC_CHUNK);
+ RTE_TEST_ASSERT_EQUAL(rc, 3, "memtank alloc 3 (%u)", rc);
+
+ /* will fail - out of free objs */
+ rc = rte_memtank_alloc(mt, obj, 1, RTE_MTANK_ALLOC_CHUNK);
+ RTE_TEST_ASSERT_EQUAL(rc, 0, "memtank alloc chunk 0 (%u)", rc);
+
+ rte_memtank_destroy(mt);
+ return TEST_SUCCESS;
+}
+
+static struct unit_test_suite memtank_testsuite = {
+ .suite_name = "memtank library test suite",
+ .setup = memtank_test_setup,
+ .teardown = memtank_test_teardown,
+ .unit_test_cases = {
+ TEST_CASE(test_memtank_alloc),
+ TEST_CASE(test_memtank_create_invalid),
+ TEST_CASES_END(), /**< NULL terminate unit test array */
+ },
+};
+
+static int
+test_memtank(void)
+{
+ return unit_test_suite_runner(&memtank_testsuite);
+}
+
+REGISTER_FAST_TEST(memtank_autotest, NOHUGE_OK, ASAN_OK, test_memtank);
diff --git a/app/test/test_memtank_stress.c b/app/test/test_memtank_stress.c
new file mode 100644
index 0000000000..67b10c6611
--- /dev/null
+++ b/app/test/test_memtank_stress.c
@@ -0,0 +1,1075 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ * Copyright(c) 2025 Huawei Technologies Co., Ltd
+ */
+
+#include <string.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <unistd.h>
+
+#include <rte_common.h>
+#include <rte_log.h>
+#include <rte_errno.h>
+#include <rte_launch.h>
+#include <rte_cycles.h>
+#include <rte_eal.h>
+#include <rte_ring.h>
+#include <rte_per_lcore.h>
+#include <rte_lcore.h>
+#include <rte_random.h>
+#include <rte_hexdump.h>
+#include <rte_malloc.h>
+#include <rte_memtank.h>
+
+#include "test.h"
+
+struct memstat {
+ struct {
+ rte_atomic64_t nb_call;
+ rte_atomic64_t nb_fail;
+ rte_atomic64_t sz;
+ } alloc;
+ struct {
+ rte_atomic64_t nb_call;
+ rte_atomic64_t nb_fail;
+ } free;
+ uint64_t nb_alloc_obj;
+};
+
+struct memtank_stat {
+ uint64_t nb_cycle;
+ struct {
+ uint64_t nb_call;
+ uint64_t nb_req;
+ uint64_t nb_alloc;
+ uint64_t nb_cycle;
+ uint64_t max_cycle;
+ uint64_t min_cycle;
+ } alloc;
+ struct {
+ uint64_t nb_call;
+ uint64_t nb_free;
+ uint64_t nb_cycle;
+ uint64_t max_cycle;
+ uint64_t min_cycle;
+ } free;
+ struct {
+ uint64_t nb_call;
+ uint64_t nb_chunk;
+ uint64_t nb_cycle;
+ uint64_t max_cycle;
+ uint64_t min_cycle;
+ } grow;
+ struct {
+ uint64_t nb_call;
+ uint64_t nb_chunk;
+ uint64_t nb_cycle;
+ uint64_t max_cycle;
+ uint64_t min_cycle;
+ } shrink;
+};
+
+struct master_args {
+ uint64_t run_cycles;
+ uint32_t delay_us;
+ uint32_t flags;
+};
+
+struct worker_args {
+ uint32_t max_obj;
+ uint32_t obj_size;
+ uint32_t alloc_flags;
+ uint32_t free_flags;
+ struct rte_ring *rng;
+};
+
+struct __rte_cache_aligned memtank_arg {
+ struct rte_memtank *mt;
+ union {
+ struct master_args master;
+ struct worker_args worker;
+ };
+ struct memtank_stat stats;
+};
+
+#define BULK_NUM 32
+
+#define OBJ_SZ_MIN 1
+#define OBJ_SZ_MAX 0x100000
+#define OBJ_SZ_DEF (4 * RTE_CACHE_LINE_SIZE + 1)
+
+#define TEST_TIME 10
+#define CLEANUP_TIME 3
+
+#define FREE_THRSH_MIN 0
+#define FREE_THRSH_MAX 100
+
+enum {
+ WRK_CMD_STOP,
+ WRK_CMD_RUN,
+};
+
+enum {
+ MASTER_FLAG_GROW = 1,
+ MASTER_FLAG_SHRINK = 2,
+};
+
+enum {
+ MEM_FUNC_SYS,
+ MEM_FUNC_RTE,
+};
+
+static uint32_t wrk_cmd __rte_cache_aligned;
+
+static struct rte_memtank_prm mtnk_prm = {
+ .min_free = 4 * BULK_NUM,
+ .max_free = 32 * BULK_NUM,
+ .obj_size = OBJ_SZ_DEF,
+ .obj_align = RTE_CACHE_LINE_SIZE,
+ .nb_obj_chunk = BULK_NUM,
+ .flags = RTE_MTANK_OBJ_DBG,
+};
+
+static struct {
+ uint32_t run_time; /* test run-time in seconds */
+ uint32_t wrk_max_obj; /* max alloced objects per worker */
+ uint32_t wrk_fill_thrsh; /* wrk fill thresh % (0-100) */
+ uint32_t wrk_free_thrsh; /* wrk free thresh % (0-100) */
+ int32_t mem_func; /* memory subsystem to use for alloc/free */
+ int32_t verbose; /* verbose: print stat for each worker */
+} global_cfg = {
+ .run_time = TEST_TIME,
+ .wrk_max_obj = 2 * BULK_NUM,
+ .wrk_fill_thrsh = FREE_THRSH_MAX,
+ .wrk_free_thrsh = FREE_THRSH_MIN,
+ .mem_func = MEM_FUNC_SYS,
+ .verbose = 0,
+};
+
+static void *
+alloc_func(size_t sz)
+{
+ switch (global_cfg.mem_func) {
+ case MEM_FUNC_SYS:
+ return malloc(sz);
+ case MEM_FUNC_RTE:
+ return rte_malloc(NULL, sz, 0);
+ }
+
+ return NULL;
+}
+
+static void
+free_func(void *p)
+{
+ switch (global_cfg.mem_func) {
+ case MEM_FUNC_SYS:
+ free(p);
+ break;
+ case MEM_FUNC_RTE:
+ rte_free(p);
+ break;
+ }
+}
+
+static void *
+test_alloc1(size_t sz, void *p)
+{
+ struct memstat *ms;
+ void *buf;
+
+ ms = p;
+ buf = alloc_func(sz);
+ rte_atomic64_inc(&ms->alloc.nb_call);
+ if (buf != NULL) {
+ memset(buf, 0, sz);
+ rte_atomic64_add(&ms->alloc.sz, sz);
+ } else
+ rte_atomic64_inc(&ms->alloc.nb_fail);
+
+ return buf;
+}
+
+static void
+test_free1(void *buf, void *p)
+{
+ struct memstat *ms;
+
+ ms = p;
+
+ free_func(buf);
+ rte_atomic64_inc(&ms->free.nb_call);
+ if (buf == NULL)
+ rte_atomic64_inc(&ms->free.nb_fail);
+}
+
+static void
+global_cfg_dump(FILE *f)
+{
+ fprintf(f, "%s={\n", __func__);
+ fprintf(f, "\t.run_time=%u\n", global_cfg.run_time);
+ fprintf(f, "\t.wrk_max_obj=%u\n", global_cfg.wrk_max_obj);
+ fprintf(f, "\t.wrk_fill_thrsh=%u\n", global_cfg.wrk_fill_thrsh);
+ fprintf(f, "\t.wrk_free_thrsh=%u\n", global_cfg.wrk_free_thrsh);
+ fprintf(f, "\t.mem_func=%d\n", global_cfg.mem_func);
+ fprintf(f, "\t.verbose=%d\n", global_cfg.verbose);
+ fprintf(f, "};\n");
+}
+
+static void
+memstat_dump(FILE *f, struct memstat *ms)
+{
+ uint64_t alloc_sz, nb_alloc;
+ long double muc, mut;
+
+ nb_alloc = rte_atomic64_read(&ms->alloc.nb_call) -
+ rte_atomic64_read(&ms->alloc.nb_fail);
+ alloc_sz = (nb_alloc == 0) ? 0 :
+ rte_atomic64_read(&ms->alloc.sz) / nb_alloc;
+ nb_alloc -= rte_atomic64_read(&ms->free.nb_call) -
+ rte_atomic64_read(&ms->free.nb_fail);
+ alloc_sz *= nb_alloc;
+ mut = (alloc_sz == 0) ? 1 :
+ (long double)ms->nb_alloc_obj * mtnk_prm.obj_size / alloc_sz;
+ muc = (alloc_sz == 0) ? 1 :
+ (long double)(ms->nb_alloc_obj + mtnk_prm.max_free) *
+ mtnk_prm.obj_size / alloc_sz;
+
+ fprintf(f, "%s(%p)={\n", __func__, ms);
+ fprintf(f, "\talloc={\n");
+ fprintf(f, "\t\tnb_call=%" PRIu64 ",\n",
+ rte_atomic64_read(&ms->alloc.nb_call));
+ fprintf(f, "\t\tnb_fail=%" PRIu64 ",\n",
+ rte_atomic64_read(&ms->alloc.nb_fail));
+ fprintf(f, "\t\tsz=%" PRIu64 ",\n",
+ rte_atomic64_read(&ms->alloc.sz));
+ fprintf(f, "\t},\n");
+ fprintf(f, "\tfree={\n");
+ fprintf(f, "\t\tnb_call=%" PRIu64 ",\n",
+ rte_atomic64_read(&ms->free.nb_call));
+ fprintf(f, "\t\tnb_fail=%" PRIu64 ",\n",
+ rte_atomic64_read(&ms->free.nb_fail));
+ fprintf(f, "\t},\n");
+ fprintf(f, "\tnb_alloc_obj=%" PRIu64 ",\n", ms->nb_alloc_obj);
+ fprintf(f, "\tnb_alloc_chunk=%" PRIu64 ",\n", nb_alloc);
+ fprintf(f, "\talloc_sz=%" PRIu64 ",\n", alloc_sz);
+ fprintf(f, "\tmem_util(total)=%.2Lf %%,\n", mut * 100);
+ fprintf(f, "\tmem_util(cached)=%.2Lf %%,\n", muc * 100);
+ fprintf(f, "};\n");
+
+}
+
+static void
+memtank_stat_reset(struct memtank_stat *ms)
+{
+ static const struct memtank_stat init_stat = {
+ .alloc.min_cycle = UINT64_MAX,
+ .free.min_cycle = UINT64_MAX,
+ .grow.min_cycle = UINT64_MAX,
+ .shrink.min_cycle = UINT64_MAX,
+ };
+
+ *ms = init_stat;
+}
+
+static void
+memtank_stat_aggr(struct memtank_stat *as, const struct memtank_stat *ms)
+{
+ if (ms->alloc.nb_call != 0) {
+ as->alloc.nb_call += ms->alloc.nb_call;
+ as->alloc.nb_req += ms->alloc.nb_req;
+ as->alloc.nb_alloc += ms->alloc.nb_alloc;
+ as->alloc.nb_cycle += ms->alloc.nb_cycle;
+ as->alloc.max_cycle = RTE_MAX(as->alloc.max_cycle,
+ ms->alloc.max_cycle);
+ as->alloc.min_cycle = RTE_MIN(as->alloc.min_cycle,
+ ms->alloc.min_cycle);
+ }
+ if (ms->free.nb_call != 0) {
+ as->free.nb_call += ms->free.nb_call;
+ as->free.nb_free += ms->free.nb_free;
+ as->free.nb_cycle += ms->free.nb_cycle;
+ as->free.max_cycle = RTE_MAX(as->free.max_cycle,
+ ms->free.max_cycle);
+ as->free.min_cycle = RTE_MIN(as->free.min_cycle,
+ ms->free.min_cycle);
+ }
+ if (ms->grow.nb_call != 0) {
+ as->grow.nb_call += ms->grow.nb_call;
+ as->grow.nb_chunk += ms->grow.nb_chunk;
+ as->grow.nb_cycle += ms->grow.nb_cycle;
+ as->grow.max_cycle = RTE_MAX(as->grow.max_cycle,
+ ms->grow.max_cycle);
+ as->grow.min_cycle = RTE_MIN(as->grow.min_cycle,
+ ms->grow.min_cycle);
+ }
+ if (ms->shrink.nb_call != 0) {
+ as->shrink.nb_call += ms->shrink.nb_call;
+ as->shrink.nb_chunk += ms->shrink.nb_chunk;
+ as->shrink.nb_cycle += ms->shrink.nb_cycle;
+ as->shrink.max_cycle = RTE_MAX(as->shrink.max_cycle,
+ ms->shrink.max_cycle);
+ as->shrink.min_cycle = RTE_MIN(as->shrink.min_cycle,
+ ms->shrink.min_cycle);
+ }
+}
+
+static void
+memtank_stat_dump(FILE *f, uint32_t lc, const struct memtank_stat *ms)
+{
+ uint64_t t;
+ long double st;
+
+ st = (long double)rte_get_timer_hz() / US_PER_S;
+
+ if (lc == UINT32_MAX)
+ fprintf(f, "%s(AGGREGATE)={\n", __func__);
+ else
+ fprintf(f, "%s(lc=%u)={\n", __func__, lc);
+
+ fprintf(f, "\tnb_cycle=%" PRIu64 ",\n", ms->nb_cycle);
+ if (ms->alloc.nb_call != 0) {
+ fprintf(f, "\talloc={\n");
+ fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ms->alloc.nb_call);
+ fprintf(f, "\t\tnb_req=%" PRIu64 ",\n", ms->alloc.nb_req);
+ fprintf(f, "\t\tnb_alloc=%" PRIu64 ",\n", ms->alloc.nb_alloc);
+ fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ms->alloc.nb_cycle);
+
+ t = ms->alloc.nb_req - ms->alloc.nb_alloc;
+ fprintf(f, "\t\tfailed req: %"PRIu64 "(%.2Lf %%)\n",
+ t, (long double)t * 100 / ms->alloc.nb_req);
+ fprintf(f, "\t\tcycles/alloc: %.2Lf\n",
+ (long double)ms->alloc.nb_cycle / ms->alloc.nb_alloc);
+ fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
+ (long double)ms->alloc.nb_alloc / ms->alloc.nb_call);
+
+ fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+ ms->alloc.max_cycle,
+ (long double)ms->alloc.max_cycle / st);
+ fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+ ms->alloc.min_cycle,
+ (long double)ms->alloc.min_cycle / st);
+
+ fprintf(f, "\t},\n");
+ }
+ if (ms->free.nb_call != 0) {
+ fprintf(f, "\tfree={\n");
+ fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ms->free.nb_call);
+ fprintf(f, "\t\tnb_free=%" PRIu64 ",\n", ms->free.nb_free);
+ fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ms->free.nb_cycle);
+
+ fprintf(f, "\t\tcycles/free: %.2Lf\n",
+ (long double)ms->free.nb_cycle / ms->free.nb_free);
+ fprintf(f, "\t\tobj/call(avg): %.2Lf\n",
+ (long double)ms->free.nb_free / ms->free.nb_call);
+
+ fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+ ms->free.max_cycle,
+ (long double)ms->free.max_cycle / st);
+ fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+ ms->free.min_cycle,
+ (long double)ms->free.min_cycle / st);
+
+ fprintf(f, "\t},\n");
+ }
+ if (ms->grow.nb_call != 0) {
+ fprintf(f, "\tgrow={\n");
+ fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ms->grow.nb_call);
+ fprintf(f, "\t\tnb_chunk=%" PRIu64 ",\n", ms->grow.nb_chunk);
+ fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ms->grow.nb_cycle);
+
+ fprintf(f, "\t\tcycles/chunk: %.2Lf\n",
+ (long double)ms->grow.nb_cycle / ms->grow.nb_chunk);
+ fprintf(f, "\t\tchunk/call(avg): %.2Lf\n",
+ (long double)ms->grow.nb_chunk / ms->grow.nb_call);
+
+ fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+ ms->grow.max_cycle,
+ (long double)ms->grow.max_cycle / st);
+ fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+ ms->grow.min_cycle,
+ (long double)ms->grow.min_cycle / st);
+
+ fprintf(f, "\t},\n");
+ }
+ if (ms->shrink.nb_call != 0) {
+ fprintf(f, "\tshrink={\n");
+ fprintf(f, "\t\tnb_call=%" PRIu64 ",\n", ms->shrink.nb_call);
+ fprintf(f, "\t\tnb_chunk=%" PRIu64 ",\n", ms->shrink.nb_chunk);
+ fprintf(f, "\t\tnb_cycle=%" PRIu64 ",\n", ms->shrink.nb_cycle);
+
+ fprintf(f, "\t\tcycles/chunk: %.2Lf\n",
+ (long double)ms->shrink.nb_cycle / ms->shrink.nb_chunk);
+ fprintf(f, "\t\tchunk/call(avg): %.2Lf\n",
+ (long double)ms->shrink.nb_chunk / ms->shrink.nb_call);
+
+ fprintf(f, "\t\tmax cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+ ms->shrink.max_cycle,
+ (long double)ms->shrink.max_cycle / st);
+ fprintf(f, "\t\tmin cycles/call=%" PRIu64 "(%.2Lf usec),\n",
+ ms->shrink.min_cycle,
+ (long double)ms->shrink.min_cycle / st);
+
+ fprintf(f, "\t},\n");
+ }
+ fprintf(f, "};\n");
+}
+
+static int32_t
+check_fill_objs(void *obj[], uint32_t sz, uint32_t num,
+ uint8_t check, uint8_t fill)
+{
+ uint32_t i;
+ uint8_t buf[sz];
+
+ static rte_spinlock_t dump_lock;
+
+ memset(buf, check, sz);
+
+ for (i = 0; i != num; i++) {
+ if (memcmp(buf, obj[i], sz) != 0) {
+ rte_spinlock_lock(&dump_lock);
+ printf("%s(%u, %u, %hu, %hu) failed at %u-th iter, "
+ "offending object: %p\n",
+ __func__, sz, num, check, fill, i, obj[i]);
+ rte_memdump(stdout, "expected", buf, sz);
+ rte_memdump(stdout, "result", obj[i], sz);
+ rte_spinlock_unlock(&dump_lock);
+ return -EINVAL;
+ }
+ memset(obj[i], fill, sz);
+ }
+ return 0;
+}
+
+static void
+destroy_worker_ring(struct worker_args *wa)
+{
+ free(wa->rng);
+ wa->rng = NULL;
+}
+
+static int
+create_worker_ring(struct worker_args *wa, uint32_t lc)
+{
+ int32_t rc;
+ size_t sz;
+ struct rte_ring *ring;
+
+ sz = rte_ring_get_memsize(wa->max_obj);
+ ring = aligned_alloc(alignof(typeof(*ring)), sz);
+ if (ring == NULL) {
+ printf("%s(%u): alloc(%zu) for FIFO with %u elems failed\n",
+ __func__, lc, sz, wa->max_obj);
+ return -ENOMEM;
+ }
+ rc = rte_ring_init(ring, "", wa->max_obj,
+ RING_F_SP_ENQ | RING_F_SC_DEQ);
+ if (rc != 0) {
+ printf("%s(%u): rte_ring_init(%p, %u) failed, error: %d(%s)\n",
+ __func__, lc, ring, wa->max_obj,
+ rc, strerror(-rc));
+ free(ring);
+ return rc;
+ }
+
+ wa->rng = ring;
+ return rc;
+}
+
+static int
+test_worker_cleanup(void *arg)
+{
+ void *obj[BULK_NUM];
+ int32_t rc;
+ uint32_t lc, n, num;
+ struct memtank_arg *ma;
+ struct rte_ring *ring;
+
+ ma = arg;
+ ring = ma->worker.rng;
+ lc = rte_lcore_id();
+
+ rc = 0;
+ for (n = rte_ring_count(ring); rc == 0 && n != 0; n -= num) {
+
+ num = rte_rand() % RTE_DIM(obj);
+ num = RTE_MIN(num, n);
+
+ if (num != 0) {
+ /* retrieve objects to free */
+ rte_ring_dequeue_bulk(ring, obj, num, NULL);
+
+ /* check and fill contents of freeing objects */
+ rc = check_fill_objs(obj, ma->worker.obj_size, num,
+ lc, 0);
+ if (rc == 0) {
+ rte_memtank_free(ma->mt, obj, num,
+ ma->worker.free_flags);
+ ma->stats.free.nb_free += num;
+ }
+ }
+ }
+
+ return rc;
+}
+
+static int
+test_memtank_worker(void *arg)
+{
+ int32_t rc = 0;
+ uint32_t lc, n, num, tfl, tfr;
+ uint64_t cl, tm0, tm1;
+ struct memtank_arg *ma;
+ struct rte_ring *ring;
+ void *obj[BULK_NUM];
+
+ ma = arg;
+ lc = rte_lcore_id();
+
+ /* calculate fill threshold */
+ tfl = (FREE_THRSH_MAX - global_cfg.wrk_fill_thrsh) *
+ ma->worker.max_obj / FREE_THRSH_MAX;
+
+ /* calculate free threshold */
+ tfr = ma->worker.max_obj * global_cfg.wrk_free_thrsh / FREE_THRSH_MAX;
+
+ ring = ma->worker.rng;
+
+ while (wrk_cmd != WRK_CMD_RUN) {
+ rte_smp_rmb();
+ rte_pause();
+ }
+
+ cl = rte_rdtsc_precise();
+
+ do {
+ num = rte_rand() % RTE_DIM(obj);
+ n = rte_ring_free_count(ring);
+ num = (n >= tfl) ? RTE_MIN(num, n) : 0;
+
+ /* perform alloc*/
+ if (num != 0) {
+ tm0 = rte_rdtsc_precise();
+ n = rte_memtank_alloc(ma->mt, obj, num,
+ ma->worker.alloc_flags);
+ tm1 = rte_rdtsc_precise();
+
+ /* check and fill contents of allocated objects */
+ rc = check_fill_objs(obj, ma->worker.obj_size, n,
+ 0, lc);
+ if (rc != 0)
+ break;
+
+ tm1 = tm1 - tm0;
+
+ /* collect alloc stat */
+ ma->stats.alloc.nb_call++;
+ ma->stats.alloc.nb_req += num;
+ ma->stats.alloc.nb_alloc += n;
+ ma->stats.alloc.nb_cycle += tm1;
+ ma->stats.alloc.max_cycle =
+ RTE_MAX(ma->stats.alloc.max_cycle, tm1);
+ ma->stats.alloc.min_cycle =
+ RTE_MIN(ma->stats.alloc.min_cycle, tm1);
+
+ /* store allocated objects */
+ rte_ring_enqueue_bulk(ring, obj, n, NULL);
+ }
+
+ /* get some objects to free */
+ num = rte_rand() % RTE_DIM(obj);
+ n = rte_ring_count(ring);
+ num = (n >= tfr) ? RTE_MIN(num, n) : 0;
+
+ /* perform free*/
+ if (num != 0) {
+
+ /* retrieve objects to free */
+ rte_ring_dequeue_bulk(ring, obj, num, NULL);
+
+ /* check and fill contents of freeing objects */
+ rc = check_fill_objs(obj, ma->worker.obj_size, num,
+ lc, 0);
+ if (rc != 0)
+ break;
+
+ tm0 = rte_rdtsc_precise();
+ rte_memtank_free(ma->mt, obj, num,
+ ma->worker.free_flags);
+ tm1 = rte_rdtsc_precise();
+
+ tm1 = tm1 - tm0;
+
+ /* collect free stat */
+ ma->stats.free.nb_call++;
+ ma->stats.free.nb_free += num;
+ ma->stats.free.nb_cycle += tm1;
+ ma->stats.free.max_cycle =
+ RTE_MAX(ma->stats.free.max_cycle, tm1);
+ ma->stats.free.min_cycle =
+ RTE_MIN(ma->stats.free.min_cycle, tm1);
+ }
+
+ rte_smp_mb();
+ } while (wrk_cmd == WRK_CMD_RUN);
+
+ ma->stats.nb_cycle = rte_rdtsc_precise() - cl;
+
+ return rc;
+}
+
+static int
+test_memtank_master(void *arg)
+{
+ struct memtank_arg *ma;
+ uint64_t cl, tm0, tm1, tm2;
+ uint32_t i, n;
+
+ ma = (struct memtank_arg *)arg;
+
+ for (cl = 0, i = 0; cl < ma->master.run_cycles;
+ cl += tm2 - tm0, i++) {
+
+ tm0 = rte_rdtsc_precise();
+
+ if (ma->master.flags & MASTER_FLAG_SHRINK) {
+
+ n = rte_memtank_shrink(ma->mt);
+ tm1 = rte_rdtsc_precise();
+ ma->stats.shrink.nb_call++;
+ ma->stats.shrink.nb_chunk += n;
+ tm1 = tm1 - tm0;
+
+ if (n != 0) {
+ ma->stats.shrink.nb_cycle += tm1;
+ ma->stats.shrink.max_cycle =
+ RTE_MAX(ma->stats.shrink.max_cycle,
+ tm1);
+ ma->stats.shrink.min_cycle =
+ RTE_MIN(ma->stats.shrink.min_cycle,
+ tm1);
+ }
+ }
+
+ if (ma->master.flags & MASTER_FLAG_GROW) {
+
+ tm1 = rte_rdtsc_precise();
+ n = rte_memtank_grow(ma->mt);
+ tm2 = rte_rdtsc_precise();
+ ma->stats.grow.nb_call++;
+ ma->stats.grow.nb_chunk += n;
+ tm2 = tm2 - tm1;
+
+ if (n != 0) {
+ ma->stats.grow.nb_cycle += tm2;
+ ma->stats.grow.max_cycle =
+ RTE_MAX(ma->stats.grow.max_cycle,
+ tm2);
+ ma->stats.grow.min_cycle =
+ RTE_MIN(ma->stats.grow.min_cycle,
+ tm2);
+ }
+ }
+
+ wrk_cmd = WRK_CMD_RUN;
+ rte_smp_mb();
+
+ rte_delay_us(ma->master.delay_us);
+ tm2 = rte_rdtsc_precise();
+ }
+
+ ma->stats.nb_cycle = cl;
+
+ rte_smp_mb();
+ wrk_cmd = WRK_CMD_STOP;
+
+ return 0;
+}
+
+static int
+fill_worker_args(struct worker_args *wa, uint32_t alloc_flags,
+ uint32_t free_flags, uint32_t lc)
+{
+ wa->max_obj = global_cfg.wrk_max_obj;
+ wa->obj_size = mtnk_prm.obj_size;
+ wa->alloc_flags = alloc_flags;
+ wa->free_flags = free_flags;
+
+ return create_worker_ring(wa, lc);
+}
+
+static void
+fill_master_args(struct master_args *ma, uint32_t flags)
+{
+ uint64_t tm;
+
+ tm = global_cfg.run_time * rte_get_timer_hz();
+
+ ma->run_cycles = tm;
+ ma->delay_us = US_PER_S / MS_PER_S;
+ ma->flags = flags;
+}
+
+static int
+test_memtank_cleanup(struct rte_memtank *mt, struct memstat *ms,
+ struct memtank_arg arg[], const char *tname)
+{
+ int32_t rc;
+ uint32_t lc;
+
+ printf("%s(%s)\n", __func__, tname);
+
+ RTE_LCORE_FOREACH_WORKER(lc)
+ rte_eal_remote_launch(test_worker_cleanup, &arg[lc], lc);
+
+ /* launch on master */
+ lc = rte_lcore_id();
+ arg[lc].master.run_cycles = CLEANUP_TIME * rte_get_timer_hz();
+ test_memtank_master(&arg[lc]);
+
+ rc = 0;
+ ms->nb_alloc_obj = 0;
+ RTE_LCORE_FOREACH_WORKER(lc) {
+ rc |= rte_eal_wait_lcore(lc);
+ ms->nb_alloc_obj += arg[lc].stats.alloc.nb_alloc -
+ arg[lc].stats.free.nb_free;
+ }
+
+ rte_memtank_dump(stdout, mt, RTE_MTANK_DUMP_STAT);
+
+ memstat_dump(stdout, ms);
+ rc |= rte_memtank_sanity_check(mt, 0);
+
+ return rc;
+}
+
+/*
+ * alloc/free by worker threads.
+ * grow/shrink by master
+ */
+static int
+test_memtank_mt(const char *tname, uint32_t alloc_flags, uint32_t free_flags)
+{
+ int32_t rc;
+ uint32_t lc;
+ struct rte_memtank *mt;
+ struct rte_memtank_prm prm;
+ struct memstat ms;
+ struct memtank_stat wrk_stats;
+ struct memtank_arg arg[RTE_MAX_LCORE];
+
+ printf("%s(%s) start\n", __func__, tname);
+
+ memset(&prm, 0, sizeof(prm));
+ memset(&ms, 0, sizeof(ms));
+
+ prm = mtnk_prm;
+ prm.alloc = test_alloc1;
+ prm.free = test_free1;
+ prm.udata = &ms;
+
+ mt = rte_memtank_create(&prm);
+ if (mt == NULL) {
+ printf("%s(%s): memtank_create() failed\n", __func__, tname);
+ return -ENOMEM;
+ }
+
+ /* dump initial memory stats */
+ memstat_dump(stdout, &ms);
+
+ rc = 0;
+ memset(arg, 0, sizeof(arg));
+
+ /* prepare args on all worker lcores */
+ RTE_LCORE_FOREACH_WORKER(lc) {
+ arg[lc].mt = mt;
+ rc = fill_worker_args(&arg[lc].worker, alloc_flags,
+ free_flags, lc);
+ if (rc != 0)
+ break;
+ memtank_stat_reset(&arg[lc].stats);
+ }
+
+ if (rc != 0) {
+ RTE_LCORE_FOREACH_WORKER(lc)
+ destroy_worker_ring(&arg[lc].worker);
+ rte_memtank_destroy(mt);
+ return rc;
+ }
+
+ /* launch on all worker lcores */
+ RTE_LCORE_FOREACH_WORKER(lc)
+ rte_eal_remote_launch(test_memtank_worker, &arg[lc], lc);
+
+ /* launch on master */
+ lc = rte_lcore_id();
+ arg[lc].mt = mt;
+ fill_master_args(&arg[lc].master,
+ MASTER_FLAG_GROW | MASTER_FLAG_SHRINK);
+ test_memtank_master(&arg[lc]);
+
+ /* wait for worker lcores and collect stats */
+
+ memtank_stat_reset(&wrk_stats);
+
+ rc = 0;
+ RTE_LCORE_FOREACH_WORKER(lc) {
+ rc |= rte_eal_wait_lcore(lc);
+ if (global_cfg.verbose != 0)
+ memtank_stat_dump(stdout, lc, &arg[lc].stats);
+ memtank_stat_aggr(&wrk_stats, &arg[lc].stats);
+ ms.nb_alloc_obj += arg[lc].stats.alloc.nb_alloc -
+ arg[lc].stats.free.nb_free;
+ }
+
+ memtank_stat_dump(stdout, UINT32_MAX, &wrk_stats);
+
+ lc = rte_lcore_id();
+ memtank_stat_dump(stdout, lc, &arg[lc].stats);
+ rte_memtank_dump(stdout, mt, RTE_MTANK_DUMP_STAT);
+
+ memstat_dump(stdout, &ms);
+ rc |= rte_memtank_sanity_check(mt, 0);
+
+ /* run cleanup on all worker lcores */
+ if (rc == 0)
+ rc = test_memtank_cleanup(mt, &ms, arg, tname);
+
+ RTE_LCORE_FOREACH_WORKER(lc)
+ destroy_worker_ring(&arg[lc].worker);
+
+ rte_memtank_destroy(mt);
+ return rc;
+}
+
+/*
+ * alloc/free by worker threads.
+ * grow/shrink by master
+ */
+static int
+test_memtank_mt1(const char *tname)
+{
+ return test_memtank_mt(tname, 0, 0);
+}
+
+/*
+ * alloc/free with grow/shrink by worker threads.
+ * master does nothing
+ */
+static int
+test_memtank_mt2(const char *tname)
+{
+ const uint32_t alloc_flags = RTE_MTANK_ALLOC_CHUNK |
+ RTE_MTANK_ALLOC_GROW;
+ const uint32_t free_flags = RTE_MTANK_FREE_SHRINK;
+
+ return test_memtank_mt(tname, alloc_flags, free_flags);
+}
+
+static int
+parse_uint_val(const char *str, uint32_t *val, uint32_t min, uint32_t max)
+{
+ unsigned long v;
+ char *end;
+
+ errno = 0;
+ v = strtoul(str, &end, 0);
+ if (errno != 0 || end == str || end[0] != 0 || v < min || v > max)
+ return -EINVAL;
+
+ val[0] = v;
+ return 0;
+}
+
+static int
+parse_mem_str(const char *str)
+{
+ uint32_t i;
+
+ static const struct {
+ const char *name;
+ int32_t val;
+ } name2val[] = {
+ {
+ .name = "sys",
+ .val = MEM_FUNC_SYS,
+ },
+ {
+ .name = "rte",
+ .val = MEM_FUNC_RTE,
+ },
+ };
+
+ for (i = 0; i != RTE_DIM(name2val); i++) {
+ if (strcmp(str, name2val[i].name) == 0)
+ return name2val[i].val;
+ }
+ return -EINVAL;
+}
+
+/* update global values based on provided user input */
+static int
+update_global_cfg(void)
+{
+ mtnk_prm.max_obj = global_cfg.wrk_max_obj * rte_lcore_count();
+ return 0;
+}
+
+static int
+parse_opt(int argc, char * const argv[])
+{
+ int32_t opt, rc;
+ uint32_t v;
+
+ rc = 0;
+ optind = 0;
+ optarg = NULL;
+
+ while (rc >= 0 && (opt = getopt(argc, argv, "a:F:f:M:m:s:t:w:v")) != EOF) {
+ switch (opt) {
+ case 'a':
+ rc = parse_mem_str(optarg);
+ if (rc >= 0)
+ global_cfg.mem_func = rc;
+ break;
+ case 'F':
+ rc = parse_uint_val(optarg, &v, FREE_THRSH_MIN,
+ FREE_THRSH_MAX);
+ if (rc == 0)
+ global_cfg.wrk_fill_thrsh = v;
+ break;
+ case 'f':
+ rc = parse_uint_val(optarg, &v, FREE_THRSH_MIN,
+ FREE_THRSH_MAX);
+ if (rc == 0)
+ global_cfg.wrk_free_thrsh = v;
+ break;
+ case 'M':
+ rc = parse_uint_val(optarg, &v, 0, UINT32_MAX);
+ if (rc == 0)
+ mtnk_prm.max_free = v;
+ break;
+ case 'm':
+ rc = parse_uint_val(optarg, &v, 0, UINT32_MAX);
+ if (rc == 0)
+ mtnk_prm.min_free = v;
+ break;
+ case 's':
+ rc = parse_uint_val(optarg, &v, OBJ_SZ_MIN,
+ OBJ_SZ_MAX);
+ if (rc == 0)
+ mtnk_prm.obj_size = v;
+ break;
+ case 't':
+ rc = parse_uint_val(optarg, &v, 0, UINT32_MAX);
+ if (rc == 0)
+ global_cfg.run_time = v;
+ break;
+ case 'w':
+ rc = parse_uint_val(optarg, &v, 0, UINT32_MAX);
+ if (rc == 0)
+ global_cfg.wrk_max_obj = v;
+ break;
+ case 'v':
+ global_cfg.verbose = 1;
+ break;
+ default:
+ rc = -EINVAL;
+ }
+ }
+
+ if (rc < 0)
+ printf("%s: invalid value: \"%s\" for option: \'%c\'\n",
+ __func__, (optarg != NULL) ? optarg : "", opt);
+
+ return rc;
+}
+
+static int
+run_memtank_stress(int argc, char *argv[])
+{
+ int32_t rc;
+ uint32_t i, k;
+
+ const struct {
+ const char *name;
+ int (*func)(const char *tname);
+ } tests[] = {
+ {
+ .name = "MT1-WRK_ALLOC_FREE-MST_GROW_SHRINK",
+ .func = test_memtank_mt1,
+ },
+ {
+ .name = "MT2-WRK_ALLOC+GROW_FREE+SHRINK",
+ .func = test_memtank_mt2,
+ },
+ };
+
+ rc = parse_opt(argc, argv);
+ if (rc < 0) {
+ printf("%s: parse_opt failed with error code: %d\n",
+ __func__, rc);
+ return rc;
+ }
+
+ /* update global values based on provided user input */
+ rc = update_global_cfg();
+ if (rc < 0)
+ return rc;
+
+ global_cfg_dump(stdout);
+
+ for (i = 0, k = 0; i != RTE_DIM(tests); i++) {
+
+ printf("TEST %s START\n", tests[i].name);
+
+ rc = tests[i].func(tests[i].name);
+ k += (rc == 0);
+
+ if (rc != 0)
+ printf("TEST %s FAILED\n", tests[i].name);
+ else
+ printf("TEST %s OK\n", tests[i].name);
+ }
+
+ printf("Number of tests:\t%u\nSuccess:\t%u\nFailed:\t%u\n",
+ i, k, i - k);
+ return (k != i);
+}
+
+static int
+test_memtank_stress(void)
+{
+ int32_t rc;
+ uint32_t i;
+ const char *val;
+ char *sp, *str, *argv[16];
+ char buf[0x100];
+
+ static const char *dlm = " \t\n";
+ static const char *evar = "DPDK_MEMTANK_STRESS_TEST_PARAM";
+ static const char *eval = "";
+
+ val = getenv(evar);
+ if (val == NULL)
+ val = eval;
+
+ snprintf(buf, sizeof(buf), "%s", val);
+
+ for (i = 0, str = buf; i != RTE_DIM(argv); str = NULL, i++) {
+ argv[i] = strtok_r(str, dlm, &sp);
+ if (argv[i] == NULL)
+ break;
+ }
+
+ if (i == RTE_DIM(argv)) {
+ printf("invalid value: \"%s\" for $(%s)\n", val, evar);
+ return -EINVAL;
+ }
+
+ rc = run_memtank_stress(i, argv);
+ return rc;
+}
+
+REGISTER_STRESS_TEST(memtank_stress_autotest, test_memtank_stress);
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 9296042119..68203b9d7b 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -71,6 +71,7 @@ The public API headers are grouped by topics:
[mempool](@ref rte_mempool.h),
[malloc](@ref rte_malloc.h),
- [memcpy](@ref rte_memcpy.h)
+ [memcpy](@ref rte_memcpy.h),
+ [memtank](@ref rte_memtank.h)
- **timers**:
[cycles](@ref rte_cycles.h),
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index bedd944681..042482fe16 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -59,6 +59,7 @@ INPUT = @TOPDIR@/doc/api/doxy-api-index.md \
@TOPDIR@/lib/mbuf \
@TOPDIR@/lib/member \
@TOPDIR@/lib/mempool \
+ @TOPDIR@/lib/memtank \
@TOPDIR@/lib/meter \
@TOPDIR@/lib/metrics \
@TOPDIR@/lib/mldev \
diff --git a/doc/guides/prog_guide/img/memtank_internal.svg b/doc/guides/prog_guide/img/memtank_internal.svg
new file mode 100644
index 0000000000..0f13242ed5
--- /dev/null
+++ b/doc/guides/prog_guide/img/memtank_internal.svg
@@ -0,0 +1 @@
+<svg width="1280" height="720" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" overflow="hidden"><defs><clipPath id="clip0"><rect x="0" y="0" width="1280" height="720"/></clipPath></defs><g clip-path="url(#clip0)"><rect x="0" y="0" width="1280" height="720" fill="#FFFFFF"/><text font-family="Calibri Light,Calibri Light_MSFontService,sans-serif" font-weight="300" font-size="59" transform="translate(110.933 111)">memtank<tspan font-size="59" x="236.9" y="0">(internal structure)</tspan></text><rect x="392.5" y="170.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="392.5" y="186.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="392.5" y="202.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="392.5" y="218.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="392.5" y="233.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><rect x="392.5" y="249.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><rect x="392.5" y="265.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><rect x="392.5" y="281.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="16" transform="translate(205.267 215)">min_free<tspan font-size="16" x="62.62" y="0">(grow thresh) </tspan><tspan font-family="Wingdings,Wingdings_MSFontService,sans-serif" font-size="16" x="154" y="0"></tspan><tspan font-size="16" x="-3.66667" y="81">max_free</tspan><tspan font-size="16" x="61.4733" y="81">(shrink thresh) </tspan><tspan 
font-family="Wingdings,Wingdings_MSFontService,sans-serif" font-size="16" x="159.593" y="81"></tspan><tspan font-style="italic" font-size="24" x="370.867" y="-3">memtank_alloc</tspan><tspan font-style="italic" font-size="24" x="520.973" y="-3">(</tspan><tspan font-style="italic" font-size="24" x="528.307" y="-3">mtnk</tspan><tspan font-style="italic" font-size="24" x="578.24" y="-3">, </tspan><tspan font-style="italic" font-size="24" x="589.74" y="-3">obj</tspan><tspan font-style="italic" font-size="24" x="620.073" y="-3">[], </tspan><tspan font-style="italic" font-size="24" x="646.24" y="-3">num</tspan><tspan font-style="italic" font-size="24" x="689.907" y="-3">, flags) </tspan></text><path d="M488.5 207.167 561.833 207.167 561.833 207.833 488.5 207.833ZM560.5 203.5 568.5 207.5 560.5 211.5Z" fill="#203864"/><text font-family="Calibri,Calibri_MSFontService,sans-serif" font-style="italic" font-weight="400" font-size="24" transform="translate(577.2 274)">memtank_free(<tspan font-size="24" x="150.273" y="0">mtnk</tspan><tspan font-size="24" x="200.207" y="0">, </tspan>obj<tspan font-size="24" x="242.04" y="0">[], </tspan><tspan font-size="24" x="268.207" y="0">num</tspan><tspan font-size="24" x="311.873" y="0">, flags) </tspan></text><path d="M496.167 269.167 569.5 269.167 569.5 269.833 496.167 269.833ZM497.5 273.5 489.5 269.5 497.5 265.5Z" fill="#5B9BD5"/><rect x="90" y="353" width="162" height="29" fill="#E7E6E6"/><text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="16" transform="translate(99.9518 373)">mchunks<tspan font-size="16" x="58.1867" y="0">[USED]</tspan></text><path d="M264.5 365.5 876.43 365.5 876.43 403.05 877.055 403.05" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M847.5 383.5 910.397 383.5" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M865.5 402.5 888.672 402.5" stroke="#000000" 
stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M858.5 393.5 898.224 393.5" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><rect x="325.5" y="385.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="325.5" y="401.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><rect x="325.5" y="417.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><rect x="325.5" y="433.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><rect x="505.5" y="383.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="505.5" y="399.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="505.5" y="415.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="505.5" y="431.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><rect x="686.5" y="385.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><rect x="686.5" y="401.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><rect x="686.5" y="417.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><rect x="686.5" y="433.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#FFFFFF"/><path d="M368.5 365.5 368.5 385.369" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M548.5 365.5 548.5 384.022" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" 
fill-rule="evenodd"/><path d="M728.5 365.5 728.917 385.369" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><rect x="90" y="478" width="162" height="29" fill="#E7E6E6"/><text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="16" transform="translate(99.5345 498)">mchunks<tspan font-size="16" x="58.1867" y="0">[FULL]</tspan></text><path d="M264.5 490.5 729.18 490.5 729.18 526.369 729.059 526.369" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M699.5 507.5 762.397 507.5" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M718.5 526.5 741.672 526.5" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M710.5 517.5 750.224 517.5" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><rect x="325.5" y="510.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="325.5" y="526.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="325.5" y="542.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="325.5" y="558.5" width="85" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="505.5" y="508.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="505.5" y="524.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="505.5" y="540.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><rect x="505.5" y="556.5" width="86" height="16" stroke="#2F528F" stroke-width="1.33333" stroke-miterlimit="8" fill="#D6DCE5"/><path 
d="M367.5 490.5 367.5 510.369" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><path d="M548.5 490.5 548.5 509.022" stroke="#000000" stroke-width="0.666667" stroke-miterlimit="8" fill="none" fill-rule="evenodd"/><text font-family="Calibri,Calibri_MSFontService,sans-serif" font-style="italic" font-weight="400" font-size="24" transform="translate(178.705 644)">memtank_grow<tspan font-size="24" x="154.32" y="0">(</tspan><tspan font-size="24" x="161.653" y="0">mtnk</tspan><tspan font-size="24" x="211.587" y="0">) </tspan><tspan font-size="24" x="303.585" y="0">memtank_shrink</tspan>(<tspan font-size="24" x="473.858" y="0">mtnk</tspan><tspan font-size="24" x="523.792" y="0">) </tspan></text><path d="M0.333333-8.08095e-07 0.333422 36.6397-0.333245 36.6397-0.333333 8.08095e-07ZM4.00009 35.3063 0.000104987 43.3064-3.99991 35.3064Z" fill="#203864" transform="matrix(1 0 0 -1 367.5 620.806)"/><path d="M548.833 577.5 548.833 615.486 548.167 615.486 548.167 577.5ZM552.5 614.153 548.5 622.153 544.5 614.153Z" fill="#5B9BD5"/><path d="M416.833 310.167 416.833 360.426 416.167 360.426 416.167 310.167ZM412.5 311.5 416.5 303.5 420.5 311.5Z" fill="#203864"/><path d="M0.333346 6.66666 0.333438 56.926-0.333228 56.926-0.333321 6.66666ZM-3.99999 8.00001 0 0 4.00001 7.99999Z" fill="#5B9BD5" transform="matrix(1 0 0 -1 450.5 360.426)"/><text fill="#203864" font-family="Calibri,Calibri_MSFontService,sans-serif" font-style="italic" font-weight="400" font-size="16" transform="translate(313.161 342)">ALLOC_CHUNK <tspan font-size="16" x="-51.8536" y="264">ALLOC_GROW </tspan><tspan fill="#4472C4" font-size="16" x="244.219" y="260">FREE_SHRINK </tspan></text><rect x="90" y="161" width="162" height="29" fill="#E7E6E6"/><text font-family="Calibri,Calibri_MSFontService,sans-serif" font-weight="400" font-size="16" transform="translate(99.5345 181)">LIFO queue </text></g></svg>
\ No newline at end of file
diff --git a/doc/guides/prog_guide/index.rst b/doc/guides/prog_guide/index.rst
index e6f24945b0..834266f8de 100644
--- a/doc/guides/prog_guide/index.rst
+++ b/doc/guides/prog_guide/index.rst
@@ -26,6 +26,7 @@ Memory Management
lcore_var
mempool_lib
+ memtank_lib
mbuf_lib
multi_proc_support
diff --git a/doc/guides/prog_guide/memtank_lib.rst b/doc/guides/prog_guide/memtank_lib.rst
new file mode 100644
index 0000000000..51fe822572
--- /dev/null
+++ b/doc/guides/prog_guide/memtank_lib.rst
@@ -0,0 +1,231 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+ Copyright(c) 2026 Huawei Technologies Co., Ltd
+
+Memtank Library
+===============
+
+The memtank library is a fixed-size object allocator for DPDK applications.
+Like mempool, it allows allocating and freeing bulks of fixed-size objects in
+a lightweight manner.
+In addition, it can grow and shrink dynamically and provides extra API for
+greater flexibility:
+
+* manual grow()/shrink() functions
+
+* different alloc/free policies
+  (can be specified by the user via the flags parameter):
+
+  * as lightweight as possible, but can fail
+
+  * more robust, but heavier: might call the user-provided backing
+    memory allocator.
+
+* user-provided callbacks for the actual memory reservations from the system.
+  The user is free to choose whatever method is most suitable for their
+  scenario, e.g. malloc/rte_malloc/mmap or some custom allocator.
+
+* user-defined constructor callback for newly allocated objects.
+
+* built-in per-object runtime verification (boundary violation, double free,
+  etc.), controlled by flags at memtank creation time.
+
+
+Memtank usage scenarios
+-----------------------
+
+Use memtank when:
+
+* Relatively small objects of the same size are allocated and freed on the
+ data path with low, predictable latency requirements.
+
+* Pre-allocating memory for the maximum possible number of objects is not
+  feasible.
+
+
+Internals
+---------
+
+Internally each memtank consists of:
+
+* A fixed-size LIFO queue that serves as a pool of free objects for fast
+  allocation/deallocation. Its internals and behavior are very similar
+  to the current mempool LIFO driver.
+
+* Two lists (USED, FREE) of memchunks. A memchunk is an analog of a SLAB:
+  for performance reasons, memtank tries to allocate memory in relatively
+  big chunks (memchunks) and then splits each memchunk into dozens (or
+  hundreds) of objects. Objects from memchunks are used to populate the pool
+  of free objects (see above).
+
+* Each memchunk consists of some metadata plus an array of free objects (LIFO)
+  that belong to that chunk. As soon as the number of free objects in the
+  chunk becomes equal to the total number of objects in it, the chunk is
+  considered ``FREE`` and can be shrunk, i.e. released back to the memory
+  subsystem.
+
+* Each object within a memtank is a properly aligned and initialized data
+  buffer that is provided to the user, followed by metadata that is used
+  to determine which memchunk it belongs to, plus some extra fields used for
+  statistics collection and runtime verification. The total size of the
+  metadata for each object is 32 bytes.
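+
+The per-chunk bookkeeping described above can be pictured with the following
+simplified model. It is only an illustrative sketch, not the library's
+implementation: ``struct model_chunk`` and its helpers are hypothetical names,
+and the number of objects per chunk is fixed at compile time for brevity.
+
+.. code-block:: c
+
+    #include <stdbool.h>
+    #include <stddef.h>
+    #include <stdint.h>
+
+    /* hypothetical model, not a memtank internal structure */
+    #define MODEL_CHUNK_OBJS 4
+
+    struct model_chunk {
+        uint32_t nb_obj;                /* objects carved from this chunk */
+        uint32_t nb_free;               /* objects currently free in it */
+        void *free[MODEL_CHUNK_OBJS];   /* LIFO of free objects */
+    };
+
+    /* split a memory region into objects; initially all of them are free */
+    static void
+    model_chunk_init(struct model_chunk *ch, void *mem, uint32_t obj_size)
+    {
+        uint32_t i;
+
+        ch->nb_obj = MODEL_CHUNK_OBJS;
+        ch->nb_free = MODEL_CHUNK_OBJS;
+        for (i = 0; i != MODEL_CHUNK_OBJS; i++)
+            ch->free[i] = (uint8_t *)mem + (size_t)i * obj_size;
+    }
+
+    /* take the most recently freed object, or NULL if none is free */
+    static void *
+    model_chunk_get(struct model_chunk *ch)
+    {
+        return (ch->nb_free == 0) ? NULL : ch->free[--ch->nb_free];
+    }
+
+    /* give back an object that belongs to this chunk */
+    static void
+    model_chunk_put(struct model_chunk *ch, void *obj)
+    {
+        ch->free[ch->nb_free++] = obj;
+    }
+
+    /* a chunk whose objects are all free is FREE and may be shrunk */
+    static bool
+    model_chunk_is_free(const struct model_chunk *ch)
+    {
+        return ch->nb_free == ch->nb_obj;
+    }
+
+A single object handed out by ``model_chunk_get()`` is enough to keep the
+chunk from being considered ``FREE`` until it is put back.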
+
+There are two user-defined thresholds that control memtank grow/shrink
+behavior:
+
+* ``min_free``: grow threshold. This value controls two things: when it is
+  time to request more memory from the underlying memory subsystem, and how
+  much memory has to be requested/released in one go.
+
+* ``max_free``: shrink threshold. This value determines when memtank may
+  start releasing unused memory back to the underlying memory subsystem.
+
+The user can also define a grow limit, ``max_obj``: the maximum number of
+objects that a given memtank can contain. By setting all three of these
+parameters to the same value, memtank behaves like a mempool with the LIFO
+driver.
+
+.. _figure_memtank-internals:
+
+.. figure:: img/memtank_internal.svg
+
+
+Brief API description
+---------------------
+
+* ``rte_memtank_create()``/``rte_memtank_destroy()`` are responsible for
+  creating/destroying the memtank.
+
+* ``rte_memtank_alloc()``/``rte_memtank_free()`` perform object
+  allocation/deallocation from/to the memtank. Note that both of them
+  operate on bulks and accept an extra flags parameter that lets the user
+  specify the exact behavior.
+
+* ``rte_memtank_chunk_alloc()``/``rte_memtank_chunk_free()`` also perform
+  allocation/deallocation from/to the memtank. However, these functions
+  bypass the pool of free objects and allocate/free objects directly
+  from/to the memchunks.
+
+* ``rte_memtank_grow()``/``rte_memtank_shrink()`` are intended to
+  explicitly reserve/release memory from/to the underlying memory subsystem
+  and add/remove objects to/from the tank. Possible callers are a
+  housekeeping task, or even a data-path thread during idle periods.
+
+* ``rte_memtank_dump()``/``rte_memtank_sanity_check()`` are miscellaneous
+  API for dumping statistics/internal state and for sanity checking.
+
+
+All public API functions except ``rte_memtank_destroy()`` are MT-safe and
+can be called concurrently from different threads.
+
+Object allocation
+~~~~~~~~~~~~~~~~~
+
+By default, ``rte_memtank_alloc()`` first tries to get objects from the pool
+of free objects. If there are not enough free objects in the pool, the
+behavior depends on the flags provided by the user:
+
+* no flags: alloc() simply returns to the user the objects obtained from the
+  pool, which may be fewer than requested.
+
+* ``RTE_MTANK_ALLOC_CHUNK``: alloc() tries to get the remaining objects
+  from the already allocated memchunks.
+
+* If the already allocated memchunks also don't contain enough
+  free objects and ``RTE_MTANK_ALLOC_GROW`` is specified, alloc() tries
+  to perform a ``grow`` operation by allocating extra memory from the
+  underlying memory subsystem and creating new memchunks to satisfy the user
+  request.
+
+In the last two cases, it also tries to refill the free pool up to the
+``min_free`` threshold.
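+
+The fallback order above can be summarized by the following counting model.
+It is a sketch only: ``struct alloc_model`` and the ``MODEL_ALLOC_*`` flags
+are hypothetical stand-ins for the memtank internals and for
+``RTE_MTANK_ALLOC_CHUNK``/``RTE_MTANK_ALLOC_GROW``. The model assumes that
+each flag enables only its own step, that ``grow`` adds whole memchunks up to
+the ``max_obj`` limit, and it omits refilling the free pool up to
+``min_free``.
+
+.. code-block:: c
+
+    #include <stdint.h>
+
+    /* hypothetical stand-ins for RTE_MTANK_ALLOC_CHUNK/RTE_MTANK_ALLOC_GROW */
+    #define MODEL_ALLOC_CHUNK 0x1u
+    #define MODEL_ALLOC_GROW  0x2u
+
+    struct alloc_model {
+        uint32_t pool_free;   /* objects in the pool of free objects */
+        uint32_t chunk_free;  /* free objects still held by memchunks */
+        uint32_t nb_obj;      /* objects currently owned by the tank */
+        uint32_t max_obj;     /* grow limit */
+        uint32_t chunk_objs;  /* objects per new memchunk */
+    };
+
+    static uint32_t
+    min_u32(uint32_t a, uint32_t b)
+    {
+        return (a < b) ? a : b;
+    }
+
+    /* returns how many of the num requested objects were obtained */
+    static uint32_t
+    model_alloc(struct alloc_model *t, uint32_t num, uint32_t flags)
+    {
+        uint32_t n, k;
+
+        /* 1. always start with the pool of free objects */
+        n = min_u32(num, t->pool_free);
+        t->pool_free -= n;
+
+        /* 2. ALLOC_CHUNK: use free objects of existing memchunks */
+        if (n != num && (flags & MODEL_ALLOC_CHUNK) != 0) {
+            k = min_u32(num - n, t->chunk_free);
+            t->chunk_free -= k;
+            n += k;
+        }
+
+        /* 3. ALLOC_GROW: add new memchunks while staying within max_obj */
+        while (n != num && (flags & MODEL_ALLOC_GROW) != 0 &&
+                t->chunk_objs != 0 &&
+                t->nb_obj + t->chunk_objs <= t->max_obj) {
+            t->nb_obj += t->chunk_objs;
+            k = min_u32(num - n, t->chunk_objs);
+            t->chunk_free += t->chunk_objs - k;
+            n += k;
+        }
+
+        return n;
+    }
+
+With no flags the call is cheapest but may return fewer objects than
+requested; each additional flag trades latency for a better chance of
+satisfying the whole request.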
+
+Object de-allocation
+~~~~~~~~~~~~~~~~~~~~
+
+Conversely, ``rte_memtank_free()`` first tries to put objects back into
+the free pool. If there is not enough room, it returns the remaining
+objects to the memchunks they belong to. After that, if
+``RTE_MTANK_FREE_SHRINK`` is specified, it starts a ``shrink`` operation
+to return unused memchunks back to the memory subsystem.
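+
+The de-allocation path can be sketched with a similar model, which also
+shows the effect listed under known limitations: objects parked in the free
+pool keep their memchunks allocated. ``struct free_model`` and
+``MODEL_FREE_SHRINK`` are hypothetical; the model assumes a fixed number of
+memchunks and does not track which memchunk the pooled objects belong to.
+
+.. code-block:: c
+
+    #include <stdbool.h>
+    #include <stdint.h>
+
+    /* hypothetical stand-in for RTE_MTANK_FREE_SHRINK */
+    #define MODEL_FREE_SHRINK 0x1u
+    #define MODEL_NB_CHUNK    4
+
+    struct free_model {
+        uint32_t pool_free;                  /* objects in the free pool */
+        uint32_t pool_size;                  /* capacity of the LIFO pool */
+        uint32_t max_free;                   /* shrink threshold */
+        uint32_t chunk_objs;                 /* objects per memchunk */
+        uint32_t chunk_free[MODEL_NB_CHUNK]; /* free objects per memchunk */
+        bool chunk_live[MODEL_NB_CHUNK];     /* memchunk still allocated */
+    };
+
+    /* free objects in the pool plus those held by live memchunks */
+    static uint32_t
+    model_total_free(const struct free_model *t)
+    {
+        uint32_t i, n;
+
+        n = t->pool_free;
+        for (i = 0; i != MODEL_NB_CHUNK; i++)
+            if (t->chunk_live[i])
+                n += t->chunk_free[i];
+        return n;
+    }
+
+    /* chunk_id[i] is the index of the memchunk that object i belongs to */
+    static void
+    model_free(struct free_model *t, const uint32_t chunk_id[], uint32_t num,
+            uint32_t flags)
+    {
+        uint32_t i, room;
+
+        /* 1. put as many objects as fit back into the free pool */
+        room = t->pool_size - t->pool_free;
+        i = (num < room) ? num : room;
+        t->pool_free += i;
+
+        /* 2. return the rest to the memchunks they came from */
+        for (; i != num; i++)
+            t->chunk_free[chunk_id[i]]++;
+
+        /* 3. FREE_SHRINK: release FREE memchunks while above max_free */
+        if ((flags & MODEL_FREE_SHRINK) == 0)
+            return;
+        for (i = 0; i != MODEL_NB_CHUNK &&
+                model_total_free(t) > t->max_free; i++) {
+            if (t->chunk_live[i] && t->chunk_free[i] == t->chunk_objs) {
+                t->chunk_live[i] = false;
+                t->chunk_free[i] = 0;
+            }
+        }
+    }
+
+For example, freeing four objects from memchunks 0, 0, 1 and 1 into a pool
+with room for two leaves memchunk 0 pinned by the pool, while memchunk 1
+becomes ``FREE`` and is released.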
+
+
+Grow/Shrink
+~~~~~~~~~~~
+
+Apart from invoking ``grow``/``shrink`` implicitly (via alloc/free flags),
+there is an API for explicit invocation:
+
+* ``rte_memtank_grow(struct rte_memtank *)``: if the number of objects in
+  the free pool drops below the ``min_free`` threshold, it requests the next
+  memory region from the underlying memory subsystem, creates new memchunks
+  from it and populates the pool.
+
+* ``rte_memtank_shrink(struct rte_memtank *)``: if the total number of free
+  objects in the tank exceeds the ``max_free`` threshold, it de-allocates
+  unused memchunks back to the underlying memory subsystem.
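+
+The two conditions above can be expressed as a small counting model. It is
+only a sketch: ``struct gs_model`` is hypothetical, it assumes that ``grow``
+adds whole memchunks (putting all their objects into the pool) until
+``min_free`` is reached or ``max_obj`` would be exceeded, and that ``shrink``
+releases only memchunks whose objects are all free. Each model function
+returns the number of memchunks it added or released.
+
+.. code-block:: c
+
+    #include <stdint.h>
+
+    struct gs_model {
+        uint32_t pool_free;   /* objects in the free pool */
+        uint32_t chunk_free;  /* free objects held by memchunks */
+        uint32_t idle_chunks; /* memchunks whose objects are all free */
+        uint32_t nb_obj;      /* objects owned by the tank */
+        uint32_t chunk_objs;  /* objects per memchunk */
+        uint32_t min_free;    /* grow threshold */
+        uint32_t max_free;    /* shrink threshold */
+        uint32_t max_obj;     /* grow limit */
+    };
+
+    /* returns the number of memchunks added */
+    static uint32_t
+    model_grow(struct gs_model *t)
+    {
+        uint32_t n = 0;
+
+        while (t->pool_free < t->min_free && t->chunk_objs != 0 &&
+                t->nb_obj + t->chunk_objs <= t->max_obj) {
+            t->nb_obj += t->chunk_objs;
+            t->pool_free += t->chunk_objs;
+            n++;
+        }
+        return n;
+    }
+
+    /* returns the number of memchunks released */
+    static uint32_t
+    model_shrink(struct gs_model *t)
+    {
+        uint32_t n = 0;
+
+        while (t->pool_free + t->chunk_free > t->max_free &&
+                t->idle_chunks != 0) {
+            t->idle_chunks--;
+            t->chunk_free -= t->chunk_objs;
+            t->nb_obj -= t->chunk_objs;
+            n++;
+        }
+        return n;
+    }
+
+In this model, keeping ``max_free`` well above ``min_free`` leaves a gap
+between the two thresholds, so a tank under steady load does not keep
+alternating between growing and shrinking.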
+
+
+Create/Destroy
+~~~~~~~~~~~~~~
+
+.. code-block:: c
+
+ sruct user_defined_type;
+
+ /*
+ * User defined callbacks to reserve/release memory from/to backing
+ * memory subsystem.
+ */
+
+ static void *
+ user_defined_alloc(size_t sz, void *udata)
+ {
+ RTE_SET_USED(udata);
+ return rte_malloc(NULL, sz, 0);
+ }
+
+ static void
+ user_defined_free(void *buf, void *udata)
+ {
+ RTE_SET_USED(udata);
+ rte_free(buf);
+ }
+
+ /*
+ * When the user needs a new memtank, they fill in the memtank parameter
+ * structure and call rte_memtank_create():
+ */
+ static struct rte_memtank_prm prm = {
+ /* min/max number of free objs in the pool (grow/shrink thresholds) */
+ .min_free = 1024,
+ .max_free = 1024 * 1024,
+ /* max total number of objs (grow limit), must be >= max_free */
+ .max_obj = 4 * 1024 * 1024,
+ .obj_size = sizeof(struct user_defined_type),
+ .obj_align = alignof(struct user_defined_type),
+ .nb_obj_chunk = 2 * 1024,
+ .flags = RTE_MTANK_OBJ_DBG, /* runtime obj verification and stats */
+ /* user defined callbacks to reserve/release actual memory */
+ .alloc = user_defined_alloc,
+ .free = user_defined_free,
+ };
+
+ struct rte_memtank *mt = rte_memtank_create(&prm);
+
+ /* ... rte_memtank_alloc()/rte_memtank_free() calls ... */
+
+ /* no more objects from the memtank are in use */
+ rte_memtank_destroy(mt);
+
+
+Known limitations (subject to further improvement):
+---------------------------------------------------
+
+* scalability:
+ with 8 or more lcores, the conventional mempool (with FIFO) starts to
+ outperform memtank (which uses LIFO internally by default).
+
+* mempool_cache integration is not part of the library and right now
+ has to be implemented by the user manually on top of the memtank API.
+
+* As the pool of free objects might contain objects from different memchunks,
+ it can prevent some memchunks from being deallocated and returned back to
+ the memory subsystem.
+
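As a standalone illustration of the allocation policy described in the documentation above, the following sketch models how one rte_memtank_alloc() request is split between the free pool, already allocated memchunks and newly grown memchunks. It is not code from the patch: toy_alloc_plan(), struct toy_plan and the TOY_ALLOC_* values are hypothetical names (the values merely mirror RTE_MTANK_ALLOC_CHUNK/RTE_MTANK_ALLOC_GROW), and the model deliberately ignores locking, the max_obj limit and the free pool refill.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values mirroring RTE_MTANK_ALLOC_CHUNK/_GROW. */
enum { TOY_ALLOC_CHUNK = 1, TOY_ALLOC_GROW = 2 };

struct toy_plan {
	uint32_t from_pool;   /* objects taken from the free pool */
	uint32_t from_chunks; /* objects taken from existing memchunks */
	uint32_t new_chunks;  /* memchunks a grow operation would add */
};

static uint32_t
min_u32(uint32_t a, uint32_t b)
{
	return (a < b) ? a : b;
}

/*
 * Split a request for @num objects, given @pool_free objects in the free
 * pool and @chunk_free free objects left in already allocated memchunks.
 * Locking, the max_obj limit and the free pool refill are ignored.
 */
static struct toy_plan
toy_alloc_plan(uint32_t num, uint32_t pool_free, uint32_t chunk_free,
	uint32_t nb_obj_chunk, uint32_t flags)
{
	struct toy_plan p = {0};
	uint32_t left;

	assert(nb_obj_chunk != 0);

	/* the free pool is always tried first */
	p.from_pool = min_u32(num, pool_free);
	left = num - p.from_pool;

	/* request satisfied, or no flags: return what the pool had */
	if (left == 0 || flags == 0)
		return p;

	/* any flag: take remaining objects from existing memchunks */
	p.from_chunks = min_u32(left, chunk_free);
	left -= p.from_chunks;

	/* GROW: round the shortfall up to whole memchunks */
	if (left != 0 && (flags & TOY_ALLOC_GROW) != 0)
		p.new_chunks = (left + nb_obj_chunk - 1) / nb_obj_chunk;

	return p;
}
```

For instance, with 4 objects in the pool, 3 free objects in existing memchunks and 8 objects per memchunk, a request for 30 objects with both flags set takes 4 + 3 objects and needs 3 new memchunks for the remaining 23.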
diff --git a/lib/memtank/memtank.c b/lib/memtank/memtank.c
new file mode 100644
index 0000000000..2c6ec948a5
--- /dev/null
+++ b/lib/memtank/memtank.c
@@ -0,0 +1,630 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ * Copyright(c) 2025 Huawei Technologies Co., Ltd
+ */
+
+#include "memtank.h"
+#include <rte_bitops.h>
+#include <rte_errno.h>
+#include <eal_export.h>
+
+#define MEMTANK_OBJ_BULK 0x100
+#define MEMTANK_CHUNK_BULK 0x100
+
+#define ALIGN_MUL_CEIL(v, mul) \
+ ((typeof(v))(((uint64_t)(v) + (mul) - 1) / (mul)))
+
+
+static inline size_t
+memtank_meta_size(uint32_t nb_free)
+{
+ size_t sz;
+ static const struct rte_memtank *mt;
+
+ sz = sizeof(*mt) + nb_free * sizeof(mt->mtf.free[0]);
+ sz = RTE_ALIGN_CEIL(sz, alignof(typeof(*mt)));
+ return sz;
+}
+
+static inline size_t
+memchunk_meta_size(uint32_t nb_obj)
+{
+ size_t sz;
+ static const struct memchunk *ch;
+
+ sz = sizeof(*ch) + nb_obj * sizeof(ch->free[0]);
+ sz = RTE_ALIGN_CEIL(sz, alignof(typeof(*ch)));
+ return sz;
+}
+
+static inline size_t
+memobj_size(uint32_t obj_size, uint32_t obj_align)
+{
+ size_t sz;
+ static const struct memobj *obj;
+
+ sz = sizeof(*obj) + obj_size;
+ sz = RTE_ALIGN_CEIL(sz, obj_align);
+ return sz;
+}
+
+static inline size_t
+memchunk_size(uint32_t nb_obj, uint32_t obj_size, uint32_t obj_align)
+{
+ size_t algn, sz;
+ static const struct memchunk *ch;
+
+ algn = RTE_MAX(alignof(typeof(*ch)), obj_align);
+ sz = memchunk_meta_size(nb_obj);
+ sz += nb_obj * memobj_size(obj_size, obj_align);
+ sz = RTE_ALIGN_CEIL(sz + algn - 1, algn);
+ return sz;
+}
+
+static void
+init_chunk(struct rte_memtank *mt, struct memchunk *ch)
+{
+ uint32_t i, n, sz;
+ uintptr_t p;
+ struct memobj *obj;
+
+ const struct memobj cobj = {
+ .red_zone1 = RED_ZONE_V1,
+ .chunk = ch,
+ .red_zone2 = RED_ZONE_V2,
+ };
+
+ n = mt->prm.nb_obj_chunk;
+ sz = mt->obj_size;
+
+ /* get start of memobj array */
+ p = (uintptr_t)ch + memchunk_meta_size(n);
+ p = RTE_ALIGN_CEIL(p, mt->prm.obj_align);
+
+ for (i = 0; i != n; i++) {
+ obj = obj_pub_full(p, sz);
+ obj[0] = cobj;
+ ch->free[i] = (void *)p;
+ p += sz;
+ }
+
+ ch->nb_total = n;
+ ch->nb_free = n;
+
+ if (mt->prm.init != NULL)
+ mt->prm.init(ch->free, n, mt->prm.udata);
+}
+
+static __rte_always_inline void
+copy_objs(void *dst[], void * const src[], uint32_t num)
+{
+ memcpy(dst, src, num * sizeof(dst[0]));
+}
+
+static inline uint32_t
+get_free(struct memtank_free *t, void *obj[], uint32_t num)
+{
+ uint32_t len, n;
+
+ rte_spinlock_lock(&t->lock);
+
+ len = t->nb_free;
+ n = RTE_MIN(num, len);
+ len -= n;
+ copy_objs(obj, t->free + len, n);
+ t->nb_free = len;
+
+ rte_spinlock_unlock(&t->lock);
+ return n;
+}
+
+static inline uint32_t
+put_free(struct memtank_free *t, void * const obj[], uint32_t num)
+{
+ uint32_t len, n;
+
+ rte_spinlock_lock(&t->lock);
+
+ len = t->nb_free;
+ n = t->max_free - len;
+ n = RTE_MIN(num, n);
+ copy_objs(t->free + len, obj, n);
+ t->nb_free = len + n;
+
+ rte_spinlock_unlock(&t->lock);
+ return n;
+}
+
+static inline void
+fill_free(struct rte_memtank *mt, uint32_t num, uint32_t flags)
+{
+ uint32_t i, l, k, n;
+ void *free[MEMTANK_OBJ_BULK];
+
+ for (i = 0; i != num; i += n) {
+ /* how many objects we need to add into @free */
+ n = RTE_MIN(num - i, RTE_DIM(free));
+ k = rte_memtank_chunk_alloc(mt, free, n, flags);
+ l = put_free(&mt->mtf, free, k);
+
+ /* @free is full, return allocated objects back to chunks */
+ if (l != k)
+ rte_memtank_chunk_free(mt, free + l, k - l, 0);
+
+ /* either free is full, or chunks are empty */
+ if (l != n)
+ break;
+ }
+}
+
+static void
+put_chunk(struct rte_memtank *mt, struct memchunk *ch, void * const obj[],
+ uint32_t num)
+{
+ uint32_t k, n;
+ struct mchunk_list *ls;
+
+ /* chunk should be in the *used* list */
+ k = MC_USED;
+ ls = &mt->chl[k];
+ rte_spinlock_lock(&ls->lock);
+
+ n = ch->nb_free;
+ RTE_ASSERT(n + num <= ch->nb_total);
+
+ copy_objs(ch->free + n, obj, num);
+ ch->nb_free = n + num;
+
+ /* chunk is full now */
+ if (ch->nb_free == ch->nb_total) {
+ TAILQ_REMOVE(&ls->chunk, ch, link);
+ k = MC_FULL;
+ /* chunk is not empty anymore, move it to the head */
+ } else if (n == 0) {
+ TAILQ_REMOVE(&ls->chunk, ch, link);
+ TAILQ_INSERT_HEAD(&ls->chunk, ch, link);
+ }
+
+ rte_spinlock_unlock(&ls->lock);
+
+ /* insert this chunk into the *full* list */
+ if (k == MC_FULL) {
+ ls = &mt->chl[k];
+ rte_spinlock_lock(&ls->lock);
+ TAILQ_INSERT_HEAD(&ls->chunk, ch, link);
+ rte_spinlock_unlock(&ls->lock);
+ }
+}
+
+static inline uint32_t
+_shrink_chunk(struct rte_memtank *mt, struct memchunk *ch[MEMTANK_CHUNK_BULK],
+ uint32_t num)
+{
+ uint32_t i, k;
+ struct mchunk_list *ls;
+
+ ls = &mt->chl[MC_FULL];
+ rte_spinlock_lock(&ls->lock);
+
+ for (k = 0; k != num; k++) {
+ ch[k] = TAILQ_LAST(&ls->chunk, mchunk_head);
+ if (ch[k] == NULL)
+ break;
+ TAILQ_REMOVE(&ls->chunk, ch[k], link);
+ }
+
+ rte_spinlock_unlock(&ls->lock);
+
+ rte_atomic_fetch_sub_explicit(&mt->nb_chunks, k,
+ rte_memory_order_acq_rel);
+
+ for (i = 0; i != k; i++)
+ mt->prm.free(ch[i]->raw, mt->prm.udata);
+
+ return k;
+}
+
+
+static uint32_t
+shrink_chunk(struct rte_memtank *mt, uint32_t num)
+{
+ uint32_t i, k, n;
+ struct memchunk *ch[MEMTANK_CHUNK_BULK];
+
+ k = 0;
+ n = 0;
+ for (i = 0; i != num && n == k; i += k) {
+ n = RTE_MIN(num - i, RTE_DIM(ch));
+ k = _shrink_chunk(mt, ch, n);
+ }
+
+ return i;
+}
+
+static struct memchunk *
+alloc_chunk(struct rte_memtank *mt)
+{
+ void *p;
+ struct memchunk *ch;
+
+ p = mt->prm.alloc(mt->chunk_size, mt->prm.udata);
+ if (p == NULL)
+ return NULL;
+ ch = RTE_PTR_ALIGN_CEIL(p, alignof(typeof(*ch)));
+ ch->raw = p;
+ return ch;
+}
+
+/* Determine by how many chunks we can actually grow */
+static inline uint32_t
+grow_num(struct rte_memtank *mt, uint32_t num)
+{
+ uint32_t k, n, max;
+
+ max = mt->max_chunk;
+ n = num + rte_atomic_fetch_add_explicit(&mt->nb_chunks, num,
+ rte_memory_order_acq_rel);
+
+ if (n <= max)
+ return num;
+
+ k = n - max;
+ return (k >= num) ? 0 : num - k;
+}
+
+static uint32_t
+grow_chunk(struct rte_memtank *mt, uint32_t num)
+{
+ uint32_t k, n;
+ struct mchunk_list *fls;
+ struct mchunk_head ls;
+ struct memchunk *ch;
+
+ /* check can we grow further */
+ k = grow_num(mt, num);
+
+ TAILQ_INIT(&ls);
+
+ for (n = 0; n != k; n++) {
+ ch = alloc_chunk(mt);
+ if (ch == NULL)
+ break;
+ init_chunk(mt, ch);
+ TAILQ_INSERT_HEAD(&ls, ch, link);
+ }
+
+ if (n != 0) {
+ fls = &mt->chl[MC_FULL];
+ rte_spinlock_lock(&fls->lock);
+ TAILQ_CONCAT(&fls->chunk, &ls, link);
+ rte_spinlock_unlock(&fls->lock);
+ }
+
+ if (n != num)
+ rte_atomic_fetch_sub_explicit(&mt->nb_chunks, num - n,
+ rte_memory_order_acq_rel);
+
+ return n;
+}
+
+static void
+obj_dbg_alloc(struct rte_memtank *mt, void * const obj[], uint32_t nb_obj)
+{
+ uint32_t i, sz;
+ struct memobj *po;
+
+ sz = mt->obj_size;
+ for (i = 0; i != nb_obj; i++) {
+ po = obj_pub_full((uintptr_t)obj[i], sz);
+ RTE_VERIFY(memobj_verify(po, 0) == 0);
+ po->dbg.nb_alloc++;
+ }
+}
+
+static void
+obj_dbg_free(struct rte_memtank *mt, void * const obj[], uint32_t nb_obj)
+{
+ uint32_t i, sz;
+ struct memobj *po;
+
+ sz = mt->obj_size;
+ for (i = 0; i != nb_obj; i++) {
+ po = obj_pub_full((uintptr_t)obj[i], sz);
+ RTE_VERIFY(memobj_verify(po, 1) == 0);
+ po->dbg.nb_free++;
+ }
+}
+
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memtank_chunk_free, 26.11)
+void
+rte_memtank_chunk_free(struct rte_memtank *mt, void * const obj[],
+ uint32_t nb_obj, uint32_t flags)
+{
+ size_t csz;
+ uint32_t i, j, k, osz;
+ struct memobj *mo;
+ struct memchunk *ch;
+
+ csz = mt->chunk_size;
+ osz = mt->obj_size;
+
+ if (mt->flags & RTE_MTANK_OBJ_DBG)
+ obj_dbg_free(mt, obj, nb_obj);
+
+ k = 0;
+ for (i = 0; i != nb_obj; i = j) {
+
+ mo = obj_pub_full((uintptr_t)obj[i], osz);
+ ch = mo->chunk;
+
+ /* find number of consecutive objs from the same chunk */
+ for (j = i + 1; j != nb_obj; j++) {
+ if (obj_check_chunk((uintptr_t)obj[j], osz,
+ (uintptr_t)ch, csz) != 0)
+ break;
+ RTE_ASSERT(ch ==
+ obj_pub_full((uintptr_t)obj[j], osz)->chunk);
+ }
+
+ put_chunk(mt, ch, obj + i, j - i);
+ k++;
+ }
+
+ if (flags & RTE_MTANK_FREE_SHRINK)
+ shrink_chunk(mt, k);
+}
+
+static uint32_t
+get_chunk(struct mchunk_list *ls, struct mchunk_head *els,
+ struct mchunk_head *uls, void *obj[], uint32_t nb_obj)
+{
+ uint32_t l, k, n;
+ struct memchunk *ch, *nch;
+
+ rte_spinlock_lock(&ls->lock);
+
+ n = 0;
+ for (ch = TAILQ_FIRST(&ls->chunk);
+ n != nb_obj && ch != NULL && ch->nb_free != 0;
+ ch = nch, n += k) {
+
+ k = RTE_MIN(nb_obj - n, ch->nb_free);
+ l = ch->nb_free - k;
+ copy_objs(obj + n, ch->free + l, k);
+ ch->nb_free = l;
+
+ nch = TAILQ_NEXT(ch, link);
+
+ /* chunk is empty now */
+ if (l == 0) {
+ TAILQ_REMOVE(&ls->chunk, ch, link);
+ TAILQ_INSERT_TAIL(els, ch, link);
+ } else if (uls != NULL) {
+ TAILQ_REMOVE(&ls->chunk, ch, link);
+ TAILQ_INSERT_HEAD(uls, ch, link);
+ }
+ }
+
+ rte_spinlock_unlock(&ls->lock);
+ return n;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memtank_chunk_alloc, 26.11)
+uint32_t
+rte_memtank_chunk_alloc(struct rte_memtank *mt, void *obj[], uint32_t nb_obj,
+ uint32_t flags)
+{
+ uint32_t k, n;
+ struct mchunk_head els, uls;
+
+ /* walk through the *used* list first */
+ n = get_chunk(&mt->chl[MC_USED], &mt->chl[MC_USED].chunk, NULL,
+ obj, nb_obj);
+
+ if (n != nb_obj) {
+
+ TAILQ_INIT(&els);
+ TAILQ_INIT(&uls);
+
+ /* walk through the *full* list */
+ n += get_chunk(&mt->chl[MC_FULL], &els, &uls,
+ obj + n, nb_obj - n);
+
+ if (n != nb_obj && (flags & RTE_MTANK_ALLOC_GROW) != 0) {
+
+ /*
+ * try to allocate extra memchunks.
+ * note that in rare situations with really high load,
+ * when the number of allocated chunks is close to the
+ * max allowed limit and multiple threads are
+ * trying to do grow_chunk() simultaneously, it
+ * can fail for some of them, leading to a failure
+ * to allocate new elements.
+ */
+ k = ALIGN_MUL_CEIL(nb_obj - n,
+ mt->prm.nb_obj_chunk);
+ k = grow_chunk(mt, k);
+
+ /* walk through the *full* list again */
+ if (k != 0)
+ n += get_chunk(&mt->chl[MC_FULL], &els, &uls,
+ obj + n, nb_obj - n);
+ }
+
+ /* concatenate with *used* list our temporary lists */
+ rte_spinlock_lock(&mt->chl[MC_USED].lock);
+
+ /* put new non-empty elems at head of the *used* list */
+ TAILQ_CONCAT(&uls, &mt->chl[MC_USED].chunk, link);
+ TAILQ_CONCAT(&mt->chl[MC_USED].chunk, &uls, link);
+
+ /* put new empty elems at tail of the *used* list */
+ TAILQ_CONCAT(&mt->chl[MC_USED].chunk, &els, link);
+
+ rte_spinlock_unlock(&mt->chl[MC_USED].lock);
+ }
+
+ if (mt->flags & RTE_MTANK_OBJ_DBG)
+ obj_dbg_alloc(mt, obj, n);
+
+ return n;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memtank_grow, 26.11)
+int
+rte_memtank_grow(struct rte_memtank *mt)
+{
+ uint32_t k, n, num;
+ struct memtank_free *t;
+
+ t = &mt->mtf;
+
+ /* how many objects the *free* pool is short of */
+ k = t->min_free - t->nb_free;
+ if ((int32_t)k <= 0)
+ return 0;
+
+ num = ALIGN_MUL_CEIL(k, mt->prm.nb_obj_chunk);
+
+ /* try to grow and refill the *free* */
+ n = grow_chunk(mt, num);
+ if (n != 0)
+ fill_free(mt, k, 0);
+
+ return n;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memtank_shrink, 26.11)
+int
+rte_memtank_shrink(struct rte_memtank *mt)
+{
+ uint32_t n;
+ struct memtank_free *t;
+
+ t = &mt->mtf;
+
+ /* shrink only when the *free* pool is full */
+ if (t->nb_free < t->max_free)
+ return 0;
+
+ /* number of chunks needed to hold min_free objects */
+ n = ALIGN_MUL_CEIL(t->min_free, mt->prm.nb_obj_chunk);
+
+ /* free up to *n* chunks from the *full* list */
+ return shrink_chunk(mt, n);
+}
+
+static int
+check_param(const struct rte_memtank_prm *prm)
+{
+ if (prm->alloc == NULL || prm->free == NULL ||
+ prm->min_free > prm->max_free ||
+ prm->max_free > prm->max_obj ||
+ rte_is_power_of_2(prm->obj_align) == 0 ||
+ prm->min_free == 0 ||
+ prm->nb_obj_chunk == 0)
+ return -EINVAL;
+ return 0;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memtank_create, 26.11)
+struct rte_memtank *
+rte_memtank_create(const struct rte_memtank_prm *prm)
+{
+ int32_t rc;
+ size_t sz;
+ void *p;
+ struct rte_memtank *mt;
+
+ rc = check_param(prm);
+ if (rc != 0) {
+ rte_errno = -rc;
+ return NULL;
+ }
+
+ sz = memtank_meta_size(prm->max_free) + alignof(typeof(*mt)) - 1;
+ p = prm->alloc(sz, prm->udata);
+ if (p == NULL) {
+ rte_errno = ENOMEM;
+ return NULL;
+ }
+
+ mt = RTE_PTR_ALIGN_CEIL(p, alignof(typeof(*mt)));
+
+ memset(mt, 0, sizeof(*mt));
+ mt->prm = *prm;
+
+ mt->raw = p;
+ mt->chunk_size = memchunk_size(prm->nb_obj_chunk, prm->obj_size,
+ prm->obj_align);
+ mt->obj_size = memobj_size(prm->obj_size, prm->obj_align);
+ mt->max_chunk = ALIGN_MUL_CEIL(prm->max_obj, prm->nb_obj_chunk);
+ mt->flags = prm->flags;
+
+ mt->mtf.min_free = prm->min_free;
+ mt->mtf.max_free = prm->max_free;
+
+ TAILQ_INIT(&mt->chl[MC_FULL].chunk);
+ TAILQ_INIT(&mt->chl[MC_USED].chunk);
+
+ return mt;
+}
+
+static void
+free_mchunk_list(struct rte_memtank *mt, struct mchunk_list *ls)
+{
+ struct memchunk *ch;
+
+ for (ch = TAILQ_FIRST(&ls->chunk); ch != NULL;
+ ch = TAILQ_FIRST(&ls->chunk)) {
+ TAILQ_REMOVE(&ls->chunk, ch, link);
+ mt->prm.free(ch->raw, mt->prm.udata);
+ }
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memtank_destroy, 26.11)
+void
+rte_memtank_destroy(struct rte_memtank *mt)
+{
+ if (mt != NULL) {
+ free_mchunk_list(mt, &mt->chl[MC_FULL]);
+ free_mchunk_list(mt, &mt->chl[MC_USED]);
+ mt->prm.free(mt->raw, mt->prm.udata);
+ }
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memtank_alloc, 26.11)
+uint32_t
+rte_memtank_alloc(struct rte_memtank *mt, void *obj[], uint32_t num,
+ uint32_t flags)
+{
+ uint32_t n;
+ struct memtank_free *t;
+
+ t = &mt->mtf;
+ n = get_free(t, obj, num);
+
+ /* not enough free objects, try to allocate via memchunks */
+ if (n != num && flags != 0) {
+ n += rte_memtank_chunk_alloc(mt, obj + n, num - n, flags);
+
+ /* refill *free* tank */
+ if (n == num)
+ fill_free(mt, t->min_free, flags);
+ }
+
+ return n;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memtank_free, 26.11)
+void
+rte_memtank_free(struct rte_memtank *t, void * const obj[], uint32_t num,
+ uint32_t flags)
+{
+ uint32_t n;
+
+ n = put_free(&t->mtf, obj, num);
+ if (n != num)
+ rte_memtank_chunk_free(t, obj + n, num - n, flags);
+}
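grow_num() and grow_chunk() above reserve chunks with a single atomic fetch-add on nb_chunks, trim the reservation to max_chunk, and later subtract whatever was not actually allocated. The standalone sketch below isolates that reservation scheme using C11 <stdatomic.h> instead of the rte_atomic wrappers; toy_reserve_chunks() and toy_release_unused() are hypothetical names, and memory allocation itself is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Optimistically add @num to the shared chunk counter, then report how
 * many of those chunks actually fit under @max (same arithmetic as
 * grow_num()). The counter may briefly exceed @max.
 */
static uint32_t
toy_reserve_chunks(_Atomic uint32_t *nb_chunks, uint32_t max, uint32_t num)
{
	uint32_t k, n;

	n = num + atomic_fetch_add_explicit(nb_chunks, num,
		memory_order_acq_rel);
	if (n <= max)
		return num;

	k = n - max; /* how far this request overshot the limit */
	return (k >= num) ? 0 : num - k;
}

/*
 * Return the part of a reservation of @num chunks that was not used,
 * as grow_chunk() does when fewer than @num chunks were allocated.
 */
static void
toy_release_unused(_Atomic uint32_t *nb_chunks, uint32_t num, uint32_t used)
{
	assert(used <= num);
	if (used != num)
		atomic_fetch_sub_explicit(nb_chunks, num - used,
			memory_order_acq_rel);
}
```

With 6 chunks already allocated and a limit of 8, a request for 4 chunks is trimmed to 2, and the counter reads 10 until the unused 2 are given back. During that window a concurrent grower sees the inflated value and may be refused, which is the rare failure mode the comment in rte_memtank_chunk_alloc() describes.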
diff --git a/lib/memtank/memtank.h b/lib/memtank/memtank.h
new file mode 100644
index 0000000000..872f2f7def
--- /dev/null
+++ b/lib/memtank/memtank.h
@@ -0,0 +1,110 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ * Copyright(c) 2025 Huawei Technologies Co., Ltd
+ */
+
+#ifndef _MEMTANK_H_
+#define _MEMTANK_H_
+
+#include <rte_memtank.h>
+#include <rte_atomic.h>
+#include <rte_spinlock.h>
+#include <rte_log.h>
+#include <stdalign.h>
+#include <errno.h>
+
+extern int memtank_logtype;
+#define RTE_LOGTYPE_MTANK memtank_logtype
+#define MTANK_LOG(level, ...) \
+ RTE_LOG_LINE(level, MTANK, "" __VA_ARGS__)
+
+struct memobj {
+ uint64_t red_zone1;
+ struct memchunk *chunk; /* ptr to the chunk it belongs to */
+ struct {
+ uint32_t nb_alloc;
+ uint32_t nb_free;
+ } dbg;
+ uint64_t red_zone2;
+};
+
+#define RED_ZONE_V1 UINT64_C(0xBADECAFEBADECAFE)
+#define RED_ZONE_V2 UINT64_C(0xDEADBEEFDEADBEEF)
+
+struct memchunk {
+ TAILQ_ENTRY(memchunk) link; /* link to the next chunk in the tank */
+ void *raw; /* un-aligned ptr returned by alloc() */
+ uint32_t nb_total; /* total number of objects in the chunk */
+ uint32_t nb_free; /* number of free objects in the chunk */
+ void *free[]; /* array of free objects */
+} __rte_cache_aligned;
+
+
+TAILQ_HEAD(mchunk_head, memchunk);
+
+struct mchunk_list {
+ rte_spinlock_t lock;
+ struct mchunk_head chunk; /* list of chunks */
+} __rte_cache_aligned;
+
+enum {
+ MC_FULL, /* all memchunk objs are free */
+ MC_USED, /* some of memchunk objs are allocated */
+ MC_NUM,
+};
+
+struct memtank_free {
+ rte_spinlock_t lock;
+ uint32_t min_free;
+ uint32_t max_free;
+ uint32_t nb_free;
+ void *free[];
+} __rte_cache_aligned;
+
+struct rte_memtank {
+ /* user provided data */
+ struct rte_memtank_prm prm;
+
+ /* run-time data */
+ void *raw; /* un-aligned ptr returned by alloc() */
+ size_t chunk_size; /* full size of each memchunk */
+ uint32_t obj_size; /* full size of each memobj */
+ uint32_t max_chunk; /* max allowed number of chunks */
+ uint32_t flags; /* behavior flags */
+ RTE_ATOMIC(uint32_t) nb_chunks; /* number of allocated chunks */
+ struct mchunk_list chl[MC_NUM]; /* lists of memchunks */
+ struct memtank_free mtf; /* cached free objects */
+};
+
+/**
+ * Obtain pointer to internal memobj struct from public one
+ */
+static inline struct memobj *
+obj_pub_full(uintptr_t p, uint32_t obj_sz)
+{
+ uintptr_t v;
+
+ v = p + obj_sz - sizeof(struct memobj);
+ return (struct memobj *)v;
+}
+
+/**
+ * Fast check: does the given object belong to that memchunk?
+ * Returns zero, if object is within the chunk, non-zero value otherwise.
+ */
+static inline int
+obj_check_chunk(uintptr_t obj, size_t obj_sz, uintptr_t chn, size_t chn_sz)
+{
+ return (obj <= chn || obj + obj_sz > chn + chn_sz);
+}
+
+static inline int
+memobj_verify(const struct memobj *mo, uint32_t finc)
+{
+ if (mo->red_zone1 != RED_ZONE_V1 || mo->red_zone2 != RED_ZONE_V2 ||
+ mo->dbg.nb_alloc != mo->dbg.nb_free + finc)
+ return -EINVAL;
+ return 0;
+}
+
+#endif /* _MEMTANK_H_ */
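Each memtank object slot stores the user payload first and the struct memobj metadata at the end of the slot, so obj_pub_full() finds the metadata by adding the full slot size and subtracting the header size, and obj_check_chunk() is a plain address-range test. The standalone sketch below reproduces that arithmetic; struct toy_hdr and the toy_* functions are hypothetical stand-ins for struct memobj, memobj_size(), obj_pub_full() and obj_check_chunk().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct memobj: a trailer after the payload. */
struct toy_hdr {
	uint64_t red_zone1;
	void *chunk;
	uint64_t red_zone2;
};

/* Slot size as in memobj_size(): payload plus trailer, rounded up. */
static size_t
toy_slot_size(uint32_t obj_size, uint32_t obj_align)
{
	size_t sz = sizeof(struct toy_hdr) + obj_size;

	/* obj_align must be a power of two, as check_param() requires */
	assert(obj_align != 0 && (obj_align & (obj_align - 1)) == 0);
	return (sz + obj_align - 1) & ~((size_t)obj_align - 1);
}

/* Same arithmetic as obj_pub_full(): the trailer occupies the slot's end. */
static struct toy_hdr *
toy_hdr_of(uintptr_t obj, size_t slot_sz)
{
	return (struct toy_hdr *)(obj + slot_sz - sizeof(struct toy_hdr));
}

/* Same test as obj_check_chunk(): returns 0 if the object lies in the chunk. */
static int
toy_outside_chunk(uintptr_t obj, size_t obj_sz, uintptr_t chn, size_t chn_sz)
{
	return (obj <= chn || obj + obj_sz > chn + chn_sz);
}
```

Because the metadata sits behind the payload, the pointer handed to the user is the slot start itself, and no per-object offset has to be stored.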
diff --git a/lib/memtank/meson.build b/lib/memtank/meson.build
new file mode 100644
index 0000000000..a4c54c09bd
--- /dev/null
+++ b/lib/memtank/meson.build
@@ -0,0 +1,18 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2017 Intel Corporation
+
+extra_flags = []
+
+foreach flag: extra_flags
+ if cc.has_argument(flag)
+ cflags += flag
+ endif
+endforeach
+
+sources = files('memtank.c',
+ 'misc.c',
+)
+headers = files(
+ 'rte_memtank.h',
+)
+deps += ['ring', 'telemetry']
diff --git a/lib/memtank/misc.c b/lib/memtank/misc.c
new file mode 100644
index 0000000000..526bbbbcf1
--- /dev/null
+++ b/lib/memtank/misc.c
@@ -0,0 +1,375 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ * Copyright(c) 2025 Huawei Technologies Co., Ltd
+ */
+
+#include "memtank.h"
+#include <inttypes.h>
+#include <stdlib.h>
+#include <eal_export.h>
+
+#define CHUNK_OBJ_LT_NUM 4
+
+struct mchunk_stat {
+ uint32_t nb_empty;
+ uint32_t nb_full;
+ struct {
+ uint32_t nb_chunk;
+ uint32_t nb_obj;
+ struct {
+ uint32_t val;
+ uint32_t num;
+ } chunk_obj_lt[CHUNK_OBJ_LT_NUM];
+ } used;
+};
+
+struct mfree_stat {
+ uint32_t nb_chunk;
+ struct mchunk_stat chunk;
+};
+
+RTE_LOG_REGISTER_DEFAULT(memtank_logtype, INFO);
+
+static void
+mchunk_stat_dump(FILE *f, const struct mchunk_stat *st)
+{
+ uint32_t i;
+
+ fprintf(f, "\t\tstat={\n");
+ fprintf(f, "\t\t\tnb_empty=%u,\n", st->nb_empty);
+ fprintf(f, "\t\t\tnb_full=%u,\n", st->nb_full);
+ fprintf(f, "\t\t\tused={\n");
+ fprintf(f, "\t\t\t\tnb_chunk=%u,\n", st->used.nb_chunk);
+ fprintf(f, "\t\t\t\tnb_obj=%u,\n", st->used.nb_obj);
+
+ for (i = 0; i != RTE_DIM(st->used.chunk_obj_lt); i++) {
+ if (st->used.chunk_obj_lt[i].num != 0)
+ fprintf(f, "\t\t\t\tnb_chunk_obj_lt_%u=%u,\n",
+ st->used.chunk_obj_lt[i].val,
+ st->used.chunk_obj_lt[i].num);
+ }
+
+ fprintf(f, "\t\t\t},\n");
+ fprintf(f, "\t\t},\n");
+}
+
+static void
+mchunk_stat_init(struct mchunk_stat *st, uint32_t nb_obj_chunk)
+{
+ uint32_t i;
+
+ memset(st, 0, sizeof(*st));
+ for (i = 0; i != RTE_DIM(st->used.chunk_obj_lt); i++) {
+ st->used.chunk_obj_lt[i].val = (i + 1) * nb_obj_chunk /
+ RTE_DIM(st->used.chunk_obj_lt);
+ }
+}
+
+static void
+mchunk_stat_collect(struct mchunk_stat *st, const struct memchunk *ch)
+{
+ uint32_t i, n;
+
+ n = ch->nb_total - ch->nb_free;
+
+ if (ch->nb_free == 0)
+ st->nb_empty++;
+ else if (n == 0)
+ st->nb_full++;
+ else {
+ st->used.nb_chunk++;
+ st->used.nb_obj += n;
+
+ for (i = 0; i != RTE_DIM(st->used.chunk_obj_lt); i++) {
+ if (n < st->used.chunk_obj_lt[i].val) {
+ st->used.chunk_obj_lt[i].num++;
+ break;
+ }
+ }
+ }
+}
+
+static void
+mchunk_list_dump(FILE *f, struct rte_memtank *mt, uint32_t idx, uint32_t flags)
+{
+ struct mchunk_list *ls;
+ const struct memchunk *ch;
+ struct mchunk_stat mcs;
+
+ ls = &mt->chl[idx];
+ mchunk_stat_init(&mcs, mt->prm.nb_obj_chunk);
+
+ rte_spinlock_lock(&ls->lock);
+
+ for (ch = TAILQ_FIRST(&ls->chunk); ch != NULL;
+ ch = TAILQ_NEXT(ch, link)) {
+
+ /* collect chunk stats */
+ if (flags & RTE_MTANK_DUMP_CHUNK_STAT)
+ mchunk_stat_collect(&mcs, ch);
+
+ /* dump chunk metadata */
+ if (flags & RTE_MTANK_DUMP_CHUNK) {
+ fprintf(f, "\t\tmemchunk@%p={\n", ch);
+ fprintf(f, "\t\t\traw=%p,\n", ch->raw);
+ fprintf(f, "\t\t\tnb_total=%u,\n", ch->nb_total);
+ fprintf(f, "\t\t\tnb_free=%u,\n", ch->nb_free);
+ fprintf(f, "\t\t},\n");
+ }
+ }
+
+ rte_spinlock_unlock(&ls->lock);
+
+ /* print chunk stats */
+ if (flags & RTE_MTANK_DUMP_CHUNK_STAT)
+ mchunk_stat_dump(f, &mcs);
+}
+
+static void
+mfree_stat_init(struct mfree_stat *st, uint32_t nb_obj_chunk)
+{
+ st->nb_chunk = 0;
+ mchunk_stat_init(&st->chunk, nb_obj_chunk);
+}
+
+static int
+ptr_cmp(const void *p1, const void *p2)
+{
+ uintptr_t v1, v2;
+
+ v1 = *(const uintptr_t *)p1;
+ v2 = *(const uintptr_t *)p2;
+ /* three-way compare without relying on unsigned wrap-around */
+ return (v1 > v2) - (v1 < v2);
+}
+
+static void
+mfree_stat_collect(struct mfree_stat *st, struct rte_memtank *mt)
+{
+ uint32_t i, j, n, sz;
+ uintptr_t *p;
+ const struct memobj *mo;
+
+ sz = mt->obj_size;
+
+ p = malloc(mt->mtf.max_free * sizeof(*p));
+ if (p == NULL)
+ return;
+
+ /**
+ * grab free lock and keep it till we analyze related memchunks,
+ * to make sure none of these memchunks will be freed until
+ * we are finished.
+ */
+ rte_spinlock_lock(&mt->mtf.lock);
+
+ /* collect chunks for all objects in free[] */
+ n = mt->mtf.nb_free;
+ memcpy(p, mt->mtf.free, n * sizeof(*p));
+ for (i = 0; i != n; i++) {
+ mo = obj_pub_full(p[i], sz);
+ p[i] = (uintptr_t)mo->chunk;
+ }
+
+ /* sort chunk pointers */
+ qsort(p, n, sizeof(*p), ptr_cmp);
+
+ /* for each chunk collect stats */
+ for (i = 0; i != n; i = j) {
+
+ RTE_ASSERT(st->nb_chunk < mt->max_chunk);
+ st->nb_chunk++;
+ mchunk_stat_collect(&st->chunk, (const struct memchunk *)p[i]);
+ for (j = i + 1; j != n && p[i] == p[j]; j++)
+ ;
+ }
+
+ rte_spinlock_unlock(&mt->mtf.lock);
+ free(p);
+}
+
+static void
+mfree_stat_dump(FILE *f, const struct mfree_stat *st)
+{
+ fprintf(f, "\tfree_stat={\n");
+ fprintf(f, "\t\tnb_chunk=%u,\n", st->nb_chunk);
+ mchunk_stat_dump(f, &st->chunk);
+ fprintf(f, "\t},\n");
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memtank_dump, 26.11)
+void
+rte_memtank_dump(FILE *f, struct rte_memtank *mt, uint32_t flags)
+{
+ uint32_t n;
+
+ if (f == NULL || mt == NULL)
+ return;
+
+ fprintf(f, "rte_memtank@%p={\n", mt);
+ fprintf(f, "\tmin_free=%u,\n", mt->mtf.min_free);
+ fprintf(f, "\tmax_free=%u,\n", mt->mtf.max_free);
+ fprintf(f, "\tnb_free=%u,\n", mt->mtf.nb_free);
+ fprintf(f, "\tchunk_size=%zu,\n", mt->chunk_size);
+ fprintf(f, "\tobj_size=%u,\n", mt->obj_size);
+ fprintf(f, "\tmax_chunk=%u,\n", mt->max_chunk);
+ fprintf(f, "\tflags=%#x,\n", mt->flags);
+ n = rte_atomic_load_explicit(&mt->nb_chunks, rte_memory_order_relaxed);
+ fprintf(f, "\tnb_chunks=%u,\n", n);
+
+ if (flags & RTE_MTANK_DUMP_FREE_STAT) {
+ struct mfree_stat mfs;
+ mfree_stat_init(&mfs, mt->prm.nb_obj_chunk);
+ mfree_stat_collect(&mfs, mt);
+ mfree_stat_dump(f, &mfs);
+ }
+
+ if (flags & (RTE_MTANK_DUMP_CHUNK | RTE_MTANK_DUMP_CHUNK_STAT)) {
+
+ fprintf(f, "\t[FULL]={\n");
+ mchunk_list_dump(f, mt, MC_FULL, flags);
+ fprintf(f, "\t},\n");
+
+ fprintf(f, "\t[USED]={\n");
+ mchunk_list_dump(f, mt, MC_USED, flags);
+ fprintf(f, "\t},\n");
+ }
+ fprintf(f, "};\n");
+}
+
+static int
+mobj_bulk_check(const char *fname, const struct rte_memtank *mt,
+ const uintptr_t p[], uint32_t num, uint32_t fmsk)
+{
+ int32_t ret;
+ uintptr_t align;
+ uint32_t i, k, sz;
+ const struct memobj *mo;
+
+ k = ((mt->flags & RTE_MTANK_OBJ_DBG) != 0) & fmsk;
+ sz = mt->obj_size;
+ align = mt->prm.obj_align - 1;
+
+ ret = 0;
+ for (i = 0; i != num; i++) {
+
+ if (p[i] == (uintptr_t)NULL) {
+ ret--;
+ MTANK_LOG(ERR,
+ "%s(mt=%p, %p[%u]): NULL object",
+ fname, mt, p, i);
+ } else if ((p[i] & align) != 0) {
+ ret--;
+ MTANK_LOG(ERR,
+ "%s(mt=%p, %p[%u]): object %#" PRIxPTR " violates "
+ "expected alignment %#" PRIxPTR,
+ fname, mt, p, i, p[i], align + 1);
+ } else {
+ mo = obj_pub_full(p[i], sz);
+ if (memobj_verify(mo, k) != 0) {
+ ret--;
+ MTANK_LOG(ERR,
+ "%s(mt=%p, %p[%u]): "
+ "invalid object header @%#" PRIxPTR "={"
+ "red_zone1=%#" PRIx64 ","
+ "dbg={nb_alloc=%u,nb_free=%u},"
+ "red_zone2=%#" PRIx64
+ "}",
+ fname, mt, p, i, p[i],
+ mo->red_zone1,
+ mo->dbg.nb_alloc, mo->dbg.nb_free,
+ mo->red_zone2);
+ }
+ }
+ }
+
+ return ret;
+}
+
+/* grab free lock and check objects in free[] */
+static int
+mfree_check(struct rte_memtank *mt)
+{
+ int32_t rc;
+
+ rte_spinlock_lock(&mt->mtf.lock);
+ rc = mobj_bulk_check(__func__, mt, (const uintptr_t *)mt->mtf.free,
+ mt->mtf.nb_free, 1);
+ rte_spinlock_unlock(&mt->mtf.lock);
+ return rc;
+}
+
+static int
+mchunk_check(const struct rte_memtank *mt, const struct memchunk *mc,
+ uint32_t tc)
+{
+ int32_t n, rc;
+
+ rc = 0;
+ n = mc->nb_total - mc->nb_free;
+
+ rc -= (mc->nb_total != mt->prm.nb_obj_chunk);
+ rc -= (tc == MC_FULL) ? (n != 0) : (n <= 0);
+ rc -= (RTE_PTR_ALIGN_CEIL(mc->raw, alignof(typeof(*mc))) != mc);
+
+ if (rc != 0)
+ MTANK_LOG(ERR, "%s(mt=%p, tc=%u): invalid memchunk @%p={"
+ "raw=%p, nb_total=%u, nb_free=%u}",
+ __func__, mt, tc, mc,
+ mc->raw, mc->nb_total, mc->nb_free);
+
+ rc += mobj_bulk_check(__func__, mt, (const uintptr_t *)mc->free,
+ mc->nb_free, 0);
+ return rc;
+}
+
+static int
+mchunk_list_check(struct rte_memtank *mt, uint32_t tc, uint32_t *nb_chunk)
+{
+ int32_t rc;
+ uint32_t n;
+ struct mchunk_list *ls;
+ const struct memchunk *ch;
+
+ ls = &mt->chl[tc];
+ rte_spinlock_lock(&ls->lock);
+
+ rc = 0;
+ for (n = 0, ch = TAILQ_FIRST(&ls->chunk); ch != NULL;
+ ch = TAILQ_NEXT(ch, link), n++)
+ rc += mchunk_check(mt, ch, tc);
+
+ rte_spinlock_unlock(&ls->lock);
+
+ *nb_chunk = n;
+ return rc;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_memtank_sanity_check, 26.11)
+int
+rte_memtank_sanity_check(struct rte_memtank *mt, int32_t ct)
+{
+ int32_t rc;
+ uint32_t n, nf, nu;
+
+ rc = mfree_check(mt);
+
+ nf = 0;
+ nu = 0;
+ rc += mchunk_list_check(mt, MC_FULL, &nf);
+ rc += mchunk_list_check(mt, MC_USED, &nu);
+
+ /*
+ * if some other threads concurrently do alloc/free/grow/shrink,
+ * these numbers might not match.
+ */
+ n = rte_atomic_load_explicit(&mt->nb_chunks, rte_memory_order_relaxed);
+ if (nf + nu != n && ct == 0) {
+ MTANK_LOG(ERR,
+ "%s(mt=%p) nb_chunks: expected=%u, full=%u, used=%u",
+ __func__, mt, n, nf, nu);
+ rc--;
+ }
+
+ return rc;
+}
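mfree_stat_collect() counts how many memchunks the cached free objects come from by mapping each object to its chunk pointer, sorting those pointers with qsort() and skipping runs of equal values. The standalone sketch below isolates that step; toy_count_distinct() is a hypothetical name, and toy_ptr_cmp() is a hypothetical comparator equivalent to ptr_cmp() but written with the common (v1 > v2) - (v1 < v2) idiom.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for uintptr_t values: returns -1, 0 or 1. */
static int
toy_ptr_cmp(const void *p1, const void *p2)
{
	uintptr_t v1 = *(const uintptr_t *)p1;
	uintptr_t v2 = *(const uintptr_t *)p2;

	return (v1 > v2) - (v1 < v2);
}

/*
 * Count distinct values in p[0..n): sort, then skip each run of equal
 * entries. The array is reordered in place.
 */
static uint32_t
toy_count_distinct(uintptr_t p[], uint32_t n)
{
	uint32_t i, j, nb;

	assert(p != NULL || n == 0);
	qsort(p, n, sizeof(p[0]), toy_ptr_cmp);

	nb = 0;
	for (i = 0; i != n; i = j) {
		nb++;
		for (j = i + 1; j != n && p[i] == p[j]; j++)
			;
	}
	return nb;
}
```

Sorting costs O(n log n) on a copy taken under the free-pool lock, which keeps the stats dump independent of the chunk list order.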
diff --git a/lib/memtank/rte_memtank.h b/lib/memtank/rte_memtank.h
new file mode 100644
index 0000000000..7359b39840
--- /dev/null
+++ b/lib/memtank/rte_memtank.h
@@ -0,0 +1,303 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2019 Intel Corporation
+ * Copyright(c) 2025 Huawei Technologies Co., Ltd
+ */
+
+#ifndef _RTE_MEMTANK_H_
+#define _RTE_MEMTANK_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#include <rte_common.h>
+#include <rte_compat.h>
+#include <stdio.h>
+
+/**
+ * @file
+ * RTE memtank
+ *
+ * Like mempool, it allows allocating and freeing objects of fixed size
+ * in a lightweight manner (probably not as lightweight as mempool,
+ * but hopefully close enough).
+ * In addition, it can grow/shrink dynamically and provides an extra
+ * API for higher flexibility:
+ * - manual grow()/shrink() functions
+ * - different alloc/free policies
+ * (can be specified by user via flags parameter).
+ *
+ * Internally it consists of:
+ * - LIFO queue (fast allocator/deallocator)
+ * - lists of memchunks (FULL, USED).
+ *
+ * For performance reasons memtank tries to allocate memory in
+ * relatively big chunks (memchunks) and then split each memchunk
+ * into dozens (or hundreds) of objects.
+ * There are two thresholds:
+ * - min_free (grow threshold)
+ * - max_free (shrink threshold)
+ */
+
+struct rte_memtank;
+
+/** generic memtank behavior flags */
+enum {
+ /** Enable obj debugging */
+ RTE_MTANK_OBJ_DBG = 1,
+};
+
+struct rte_memtank_prm {
+ /** min number of free objs in the free pool (grow threshold). */
+ uint32_t min_free;
+ uint32_t max_free; /**< max number of free objs (shrink threshold) */
+ uint32_t max_obj; /**< max number of objs (grow limit) */
+ uint32_t obj_size; /**< size of each mem object */
+ uint32_t obj_align; /**< alignment of each mem object */
+ uint32_t nb_obj_chunk; /**< number of objects per chunk */
+ uint32_t flags; /**< behavior flags */
+ /** user provided function to alloc chunk of memory */
+ void * (*alloc)(size_t len, void *udata);
+ /** user provided function to free chunk of memory */
+ void (*free)(void *mem, void *udata);
+ /** user provided function to initialize an object */
+ void (*init)(void *obj[], uint32_t num, void *udata);
+ void *udata; /**< opaque user data for alloc/free/init */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate and initialize a new memtank instance, based on the
+ * parameters provided. Note that it uses the user-provided *alloc()* function
+ * to allocate space for the memtank metadata.
+ * @param prm
+ * Parameters used to create and initialize the new memtank.
+ * @return
+ * - Pointer to the new memtank instance, if the operation completed
+ * successfully.
+ * - NULL on error with rte_errno set appropriately.
+ */
+__rte_experimental
+struct rte_memtank *
+rte_memtank_create(const struct rte_memtank_prm *prm);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Destroy the memtank and free all memory referenced by the memtank.
+ * The objects must not be used by other cores as they will be freed.
+ *
+ * @param t
+ * A pointer to the memtank instance.
+ */
+__rte_experimental
+void
+rte_memtank_destroy(struct rte_memtank *t);
+
+
+/** alloc flags */
+enum {
+ RTE_MTANK_ALLOC_CHUNK = 1, /**< Take remaining objects from memchunks */
+ /** Allocate extra memchunks if needed */
+ RTE_MTANK_ALLOC_GROW = 2,
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate up to requested number of objects from the memtank.
+ * Note that depending on *alloc* behavior (flags) some new memory chunks
+ * can be allocated from the underlying memory subsystem.
+ *
+ * @param t
+ * A pointer to the memtank instance.
+ * @param obj
+ * An array of void * pointers (objects) that will be filled.
+ * @param num
+ * Number of objects to allocate from the memtank.
+ * @param flags
+ * Flags that control allocation behavior.
+ * @return
+ * Number of allocated objects.
+ */
+__rte_experimental
+uint32_t
+rte_memtank_alloc(struct rte_memtank *t, void *obj[], uint32_t num,
+ uint32_t flags);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate up to the requested number of objects from the memtank.
+ * Note that this function bypasses the *free* cache(s) and tries to allocate
+ * objects straight from the memory chunks.
+ * Note that, depending on *alloc* behavior (flags), some new memory chunks
+ * may be allocated from the underlying memory subsystem.
+ *
+ * @param t
+ * A pointer to the memtank instance.
+ * @param obj
+ * An array of void * pointers (objects) that will be filled.
+ * @param nb_obj
+ * Number of objects to allocate from the memtank.
+ * @param flags
+ * Flags that control allocation behavior.
+ * @return
+ * Number of allocated objects.
+ */
+__rte_experimental
+uint32_t
+rte_memtank_chunk_alloc(struct rte_memtank *t, void *obj[], uint32_t nb_obj,
+ uint32_t flags);
+
+/** free flags */
+enum {
+ /** Free unneeded chunk of memory */
+ RTE_MTANK_FREE_SHRINK = 1,
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Free (put) provided objects back to the memtank.
+ * Note that depending on *free* behavior (flags) some memory chunks can be
+ * returned (freed) to the underlying memory subsystem.
+ *
+ * @param t
+ * A pointer to the memtank instance.
+ * @param obj
+ * An array of object pointers to be freed.
+ * @param num
+ * Number of objects to free.
+ * @param flags
+ * Flags that control free behavior.
+ */
+__rte_experimental
+void
+rte_memtank_free(struct rte_memtank *t, void * const obj[], uint32_t num,
+ uint32_t flags);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Free (put) provided objects back to the memtank.
+ * Note that this function bypasses *free* cache(s) and tries to put
+ * objects straight to the memory chunks.
+ * Note that depending on *free* behavior (flags) some memory chunks can be
+ * returned (freed) to the underlying memory subsystem.
+ *
+ * @param t
+ * A pointer to the memtank instance.
+ * @param obj
+ * An array of object pointers to be freed.
+ * @param nb_obj
+ * Number of objects to free.
+ * @param flags
+ * Flags that control free behavior.
+ */
+__rte_experimental
+void
+rte_memtank_chunk_free(struct rte_memtank *t, void * const obj[],
+ uint32_t nb_obj, uint32_t flags);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check whether the number of objects in the *free* cache is below the
+ * memtank grow threshold (min_free). If so, try to allocate memory for new
+ * objects from the underlying memory subsystem.
+ *
+ * @param t
+ * A pointer to the memtank instance.
+ * @return
+ * Number of newly allocated memory chunks.
+ */
+__rte_experimental
+int
+rte_memtank_grow(struct rte_memtank *t);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check whether the number of objects in the *free* cache has reached the
+ * memtank shrink threshold (max_free). If so, try to return excess memory
+ * to the underlying memory subsystem.
+ *
+ * @param t
+ * A pointer to the memtank instance.
+ * @return
+ * Number of freed memory chunks.
+ */
+__rte_experimental
+int
+rte_memtank_shrink(struct rte_memtank *t);
+
+/** dump flags */
+enum {
+ RTE_MTANK_DUMP_FREE_STAT = 1,
+ RTE_MTANK_DUMP_CHUNK_STAT = 2,
+ RTE_MTANK_DUMP_CHUNK = 4,
+ /* first unused power of two */
+ RTE_MTANK_DUMP_END = 8,
+
+ /** dump all stats */
+ RTE_MTANK_DUMP_STAT =
+ (RTE_MTANK_DUMP_FREE_STAT | RTE_MTANK_DUMP_CHUNK_STAT),
+ /** dump everything */
+ RTE_MTANK_DUMP_ALL = RTE_MTANK_DUMP_END - 1,
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Dump information about the memtank to the file.
+ * Note that, depending on the *flags* value, it might grab some internal
+ * locks and affect the performance of other threads that
+ * concurrently use the same memtank.
+ *
+ * @param f
+ * A pointer to the file.
+ * @param t
+ * A pointer to the memtank instance.
+ * @param flags
+ * Flags that control dump behavior.
+ */
+__rte_experimental
+void
+rte_memtank_dump(FILE *f, struct rte_memtank *t, uint32_t flags);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Check the consistency of the given memtank instance.
+ * Dumps error messages to the RTE log subsystem, if some inconsistency
+ * is detected.
+ *
+ * @param t
+ * A pointer to the memtank instance.
+ * @param ct
+ * Value greater than zero if other threads concurrently use
+ * that memtank.
+ * @return
+ * Zero on success, or negative value otherwise.
+ */
+__rte_experimental
+int
+rte_memtank_sanity_check(struct rte_memtank *t, int32_t ct);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_MEMTANK_H_ */
diff --git a/lib/meson.build b/lib/meson.build
index af5c160cb8..3b13dfee6c 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -19,6 +19,7 @@ libraries = [
'ring',
'rcu', # rcu depends on ring
'mempool',
+ 'memtank',
'mbuf',
'net',
'meter',
--
2.51.0
^ permalink raw reply related
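To make the grow/shrink thresholds described in the header concrete, here is a minimal single-threaded sketch. It is not the memtank implementation: `struct toy_tank`, `toy_grow()`, `toy_shrink()` and the counting callbacks are hypothetical; only the callback signatures are taken from `struct rte_memtank_prm`, and the model treats free objects as interchangeable whole chunks, which the real library cannot assume.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Callback signatures as declared in struct rte_memtank_prm. */
typedef void *(*toy_alloc_fn)(size_t len, void *udata);
typedef void (*toy_free_fn)(void *mem, void *udata);

#define TOY_MAX_CHUNKS 16

/* Hypothetical toy tank: tracks counts only, treats free objects as
 * interchangeable and always releases the most recently added chunk. */
struct toy_tank {
	uint32_t nb_free;        /* objects sitting in the free cache */
	uint32_t min_free;       /* grow threshold */
	uint32_t max_free;       /* shrink threshold */
	uint32_t objs_per_chunk;
	size_t chunk_size;
	void *chunks[TOY_MAX_CHUNKS];
	uint32_t nb_chunks;
	toy_alloc_fn alloc_cb;
	toy_free_fn free_cb;
	void *udata;
};

/* Example user callbacks: udata points at a live-allocation counter. */
static void *
counting_alloc(size_t len, void *udata)
{
	void *p = malloc(len);

	if (p != NULL)
		(*(int *)udata)++;
	return p;
}

static void
counting_free(void *mem, void *udata)
{
	free(mem);
	(*(int *)udata)--;
}

/* Grow: add chunks while the free cache is below min_free.
 * Returns the number of newly allocated chunks. */
static int
toy_grow(struct toy_tank *t)
{
	int n = 0;

	while (t->nb_free < t->min_free && t->nb_chunks < TOY_MAX_CHUNKS) {
		void *c = t->alloc_cb(t->chunk_size, t->udata);

		if (c == NULL)
			break;
		t->chunks[t->nb_chunks++] = c;
		t->nb_free += t->objs_per_chunk;
		n++;
	}
	return n;
}

/* Shrink: release chunks while the free cache exceeds max_free, without
 * dropping below min_free. Returns the number of freed chunks. */
static int
toy_shrink(struct toy_tank *t)
{
	int n = 0;

	while (t->nb_free > t->max_free && t->nb_chunks > 0 &&
			t->nb_free >= t->objs_per_chunk + t->min_free) {
		t->free_cb(t->chunks[--t->nb_chunks], t->udata);
		t->nb_free -= t->objs_per_chunk;
		n++;
	}
	return n;
}
```

Keeping min_free well below max_free gives the tank hysteresis: a workload hovering around one threshold does not repeatedly allocate and free the same chunk.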
* [PATCH v5] net/iavf: fix duplicate VF reset during PF reset recovery
From: Anurag Mandal @ 2026-06-10 10:07 UTC (permalink / raw)
To: dev
Cc: bruce.richardson, vladimir.medvedkin, ciara.loftus, Anurag Mandal,
stable
In-Reply-To: <20260605202911.314359-1-anurag.mandal@intel.com>
During PF-initiated reset recovery, iavf_dev_close() sends
an extra VIRTCHNL_OP_RESET_VF while recovery is already in progress.
That second reset can leave the PF/VF virtchnl state inconsistent and
cause VIRTCHNL_OP_CONFIG_VSI_QUEUES to fail with ERR_PARAM after a
ToR link flap/power-cycle, leaving the VF unable to recover
and resulting in connection loss.
This patch introduces a new flag, 'pf_reset_in_progress', which is
set only when iavf_handle_hw_reset() is entered with
vf_initiated_reset set to false, and is cleared on exit.
Close-time VF reset and related close-time virtchnl
operations are skipped while PF-triggered reset recovery is in progress.
This avoids a duplicate VF reset while keeping normal
behavior for application-driven close or VF-initiated reinit.
Fixes: 675a104e2e94 ("net/iavf: fix abnormal disable HW interrupt")
Fixes: b34fe66ea893 ("net/iavf: delay VF reset command")
Fixes: 5e03e316c753 ("net/iavf: handle virtchnl event message without interrupt")
Cc: stable@dpdk.org
Signed-off-by: Anurag Mandal <anurag.mandal@intel.com>
---
V5: Addressed Ciara Loftus's comments
- added separate flag for PF initiated reset recovery
V4: Addressed Ciara Loftus's comments
- split VF reset from other code changes
V3: Addressed latest ai-code-review comments
V2: Addressed ai-code-review comments
doc/guides/rel_notes/release_26_07.rst | 3 ++
drivers/net/intel/iavf/iavf.h | 7 +++++
drivers/net/intel/iavf/iavf_ethdev.c | 40 +++++++++++++++-----------
drivers/net/intel/iavf/iavf_vchnl.c | 18 ++++++++++--
4 files changed, 49 insertions(+), 19 deletions(-)
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index d2563ac503..f6899a78c3 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -95,6 +95,9 @@ New Features
* Added support for transmitting LLDP packets based on mbuf packet type.
* Implemented AVX2 context descriptor transmit paths.
+ * Prevented duplicate 'VIRTCHNL_OP_RESET_VF' during a PF-initiated
+ reset recovery, which earlier caused virtchnl state corruption
+ and connection loss after a top-of-rack (ToR) link flap/power-cycle.
* **Updated PCAP ethernet driver.**
diff --git a/drivers/net/intel/iavf/iavf.h b/drivers/net/intel/iavf/iavf.h
index 2615b6f034..67aacbe7a6 100644
--- a/drivers/net/intel/iavf/iavf.h
+++ b/drivers/net/intel/iavf/iavf.h
@@ -292,6 +292,13 @@ struct iavf_info {
bool in_reset_recovery;
+ /*
+ * Set only while iavf_handle_hw_reset()
+ * is processing a PF-initiated reset
+ * (vf_initiated_reset == false).
+ */
+ bool pf_reset_in_progress;
+
uint32_t ptp_caps;
rte_spinlock_t phc_time_aq_lock;
};
diff --git a/drivers/net/intel/iavf/iavf_ethdev.c b/drivers/net/intel/iavf/iavf_ethdev.c
index a8031e23a5..2b6f4daa99 100644
--- a/drivers/net/intel/iavf/iavf_ethdev.c
+++ b/drivers/net/intel/iavf/iavf_ethdev.c
@@ -3166,23 +3166,27 @@ iavf_dev_close(struct rte_eth_dev *dev)
ret = iavf_dev_stop(dev);
- /*
- * Release redundant queue resource when close the dev
- * so that other vfs can re-use the queues.
- */
- if (vf->lv_enabled) {
- ret = iavf_request_queues(dev, IAVF_MAX_NUM_QUEUES_DFLT);
- if (ret)
- PMD_DRV_LOG(ERR, "Reset the num of queues failed");
+ /* Skip RESET_VF on a PF-initiated reset */
+ if (!vf->pf_reset_in_progress) {
- vf->max_rss_qregion = IAVF_MAX_NUM_QUEUES_DFLT;
- }
+ /*
+ * Release redundant queue resources when closing the device
+ * so that other VFs can reuse the queues.
+ */
+ if (vf->lv_enabled) {
+ ret = iavf_request_queues(dev, IAVF_MAX_NUM_QUEUES_DFLT);
+ if (ret)
+ PMD_DRV_LOG(ERR, "Reset the num of queues failed");
+ vf->max_rss_qregion = IAVF_MAX_NUM_QUEUES_DFLT;
+ }
- /* Disable promiscuous mode before resetting the VF. This is to avoid
- * potential issues when the PF is bound to the kernel driver.
- */
- if (vf->promisc_unicast_enabled || vf->promisc_multicast_enabled)
- iavf_config_promisc(adapter, false, false);
+ /*
+ * Disable promiscuous mode before resetting the VF. This is to avoid
+ * potential issues when the PF is bound to the kernel driver.
+ */
+ if (vf->promisc_unicast_enabled || vf->promisc_multicast_enabled)
+ iavf_config_promisc(adapter, false, false);
+ }
adapter->closed = true;
@@ -3195,7 +3199,9 @@ iavf_dev_close(struct rte_eth_dev *dev)
iavf_flow_flush(dev, NULL);
iavf_flow_uninit(adapter);
- iavf_vf_reset(hw);
+ /* Skip RESET_VF on a PF-initiated reset */
+ if (!vf->pf_reset_in_progress)
+ iavf_vf_reset(hw);
vf->aq_intr_enabled = false;
iavf_shutdown_adminq(hw);
if (vf->vf_res->vf_cap_flags & VIRTCHNL_VF_OFFLOAD_WB_ON_ITR) {
@@ -3368,6 +3374,7 @@ iavf_handle_hw_reset(struct rte_eth_dev *dev, bool vf_initiated_reset)
}
vf->in_reset_recovery = true;
+ vf->pf_reset_in_progress = !vf_initiated_reset;
iavf_set_no_poll(adapter, false);
/* Call the pre reset callback */
@@ -3418,6 +3425,7 @@ iavf_handle_hw_reset(struct rte_eth_dev *dev, bool vf_initiated_reset)
vf->post_reset_cb(dev->data->port_id, ret, vf->post_reset_cb_arg);
vf->in_reset_recovery = false;
+ vf->pf_reset_in_progress = false;
iavf_set_no_poll(adapter, false);
return;
diff --git a/drivers/net/intel/iavf/iavf_vchnl.c b/drivers/net/intel/iavf/iavf_vchnl.c
index 94ccfb5d6e..cf3513ef94 100644
--- a/drivers/net/intel/iavf/iavf_vchnl.c
+++ b/drivers/net/intel/iavf/iavf_vchnl.c
@@ -283,9 +283,21 @@ iavf_read_msg_from_pf(struct iavf_adapter *adapter, uint16_t buf_len,
vf->link_up ? "up" : "down");
break;
case VIRTCHNL_EVENT_RESET_IMPENDING:
- vf->vf_reset = true;
- iavf_set_no_poll(adapter, false);
- PMD_DRV_LOG(INFO, "VF is resetting");
+ /*
+ * Force link down on impending reset to drop
+ * the cached link-up state; a fresh LSC up
+ * event will be re-issued by the PF once the
+ * VF is reinitialised.
+ */
+ vf->link_up = false;
+ if (!vf->vf_reset) {
+ vf->vf_reset = true;
+ iavf_set_no_poll(adapter, false);
+ iavf_dev_event_post(vf->eth_dev,
+ RTE_ETH_EVENT_INTR_RESET,
+ NULL, 0);
+ }
+ PMD_DRV_LOG(DEBUG, "VF is resetting");
break;
case VIRTCHNL_EVENT_PF_DRIVER_CLOSE:
vf->dev_closed = true;
--
2.34.1
^ permalink raw reply related
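The flag lifecycle in this patch can be modelled in a few lines. This is a simplified sketch, not driver code: `struct toy_vf` and the `toy_*` functions are hypothetical stand-ins for `struct iavf_info`, `iavf_dev_close()`, `iavf_handle_hw_reset()` and the `VIRTCHNL_EVENT_RESET_IMPENDING` handler, and plain counters replace the actual virtchnl messages and ethdev events.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few struct iavf_info fields involved. */
struct toy_vf {
	bool in_reset_recovery;
	bool pf_reset_in_progress;
	bool vf_reset;
	int reset_vf_sent;   /* counts VIRTCHNL_OP_RESET_VF messages */
	int reset_events;    /* counts RTE_ETH_EVENT_INTR_RESET posts */
};

/* Models VIRTCHNL_EVENT_RESET_IMPENDING: post the reset event only once,
 * even if the PF repeats the notification. */
static void
toy_reset_impending(struct toy_vf *vf)
{
	if (!vf->vf_reset) {
		vf->vf_reset = true;
		vf->reset_events++;
	}
}

/* Models iavf_dev_close(): skip RESET_VF during PF-initiated recovery. */
static void
toy_dev_close(struct toy_vf *vf)
{
	if (!vf->pf_reset_in_progress)
		vf->reset_vf_sent++;
}

/* Models iavf_handle_hw_reset(): flag set on entry, cleared on exit.
 * The real function also re-initialises the port between the two. */
static void
toy_handle_hw_reset(struct toy_vf *vf, bool vf_initiated_reset)
{
	vf->in_reset_recovery = true;
	vf->pf_reset_in_progress = !vf_initiated_reset;
	toy_dev_close(vf);
	vf->in_reset_recovery = false;
	vf->pf_reset_in_progress = false;
}
```

In this model a PF-initiated recovery sends no extra RESET_VF, while VF-initiated recovery and an ordinary application close each still send exactly one.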
* RE: [PATCH v2] net/iavf: fix to consolidate link change event handling
From: Loftus, Ciara @ 2026-06-10 9:43 UTC (permalink / raw)
To: Mandal, Anurag, dev@dpdk.org
Cc: Richardson, Bruce, Medvedkin, Vladimir, stable@dpdk.org
In-Reply-To: <20260609173822.364452-1-anurag.mandal@intel.com>
> Subject: [PATCH v2] net/iavf: fix to consolidate link change event handling
>
[snip]
> +
> /* Read data in admin queue to get msg from pf driver */
> static enum iavf_aq_result
> iavf_read_msg_from_pf(struct iavf_adapter *adapter, uint16_t buf_len,
> @@ -249,38 +310,15 @@ iavf_read_msg_from_pf(struct iavf_adapter *adapter, uint16_t buf_len,
> if (opcode == VIRTCHNL_OP_EVENT) {
> struct virtchnl_pf_event *vpe =
> (struct virtchnl_pf_event *)event.msg_buf;
> + if (vpe == NULL) {
> + PMD_DRV_LOG(ERR, "Invalid PF event message");
> + return IAVF_MSG_ERR;
> + }
This check can be removed.
iavf_read_msg_from_pf() is called from iavf_wait_for_msg(), which performs a
NULL check on the same location (args->out_buffer) before passing it to
iavf_read_msg_from_pf().
>
> result = IAVF_MSG_SYS;
> switch (vpe->event) {
> case VIRTCHNL_EVENT_LINK_CHANGE:
> - vf->link_up =
> - vpe->event_data.link_event.link_status;
> - if (vf->vf_res != NULL &&
> - vf->vf_res->vf_cap_flags & VIRTCHNL_VF_CAP_ADV_LINK_SPEED) {
> - vf->link_speed =
> - vpe->event_data.link_event_adv.link_speed;
> - } else {
> - enum virtchnl_link_speed speed;
> - speed = vpe->event_data.link_event.link_speed;
> - vf->link_speed = iavf_convert_link_speed(speed);
> - }
> - iavf_dev_link_update(vf->eth_dev, 0);
> - iavf_dev_event_post(vf->eth_dev, RTE_ETH_EVENT_INTR_LSC, NULL, 0);
> - if (vf->link_up && !vf->vf_reset) {
> - iavf_dev_watchdog_disable(adapter);
> - } else {
> - if (!vf->link_up)
> - iavf_dev_watchdog_enable(adapter);
> - }
> - if (adapter->devargs.no_poll_on_link_down) {
> - iavf_set_no_poll(adapter, true);
> - if (adapter->no_poll)
> - PMD_DRV_LOG(DEBUG, "VF no poll turned on");
> - else
> - PMD_DRV_LOG(DEBUG, "VF no poll turned off");
> - }
> - PMD_DRV_LOG(INFO, "Link status update:%s",
> - vf->link_up ? "up" : "down");
> + iavf_handle_link_change_event(vf->eth_dev, vpe);
> break;
> case VIRTCHNL_EVENT_RESET_IMPENDING:
> vf->vf_reset = true;
> @@ -505,6 +543,12 @@ iavf_handle_pf_event_msg(struct rte_eth_dev *dev, uint8_t *msg,
> PMD_DRV_LOG(DEBUG, "Error event");
> return;
> }
> +
> + if (pf_msg == NULL) {
> + PMD_DRV_LOG(ERR, "Invalid PF event message");
> + return;
> + }
This too can be removed.
pf_msg resolves to vf->aq_resp which is a fixed buffer allocated at
driver init time. It cannot be NULL here.
With those two changes I think the patch will be good to go.
> +
> switch (pf_msg->event) {
> case VIRTCHNL_EVENT_RESET_IMPENDING:
> PMD_DRV_LOG(DEBUG,
> "VIRTCHNL_EVENT_RESET_IMPENDING event");
> @@ -518,30 +562,7 @@ iavf_handle_pf_event_msg(struct rte_eth_dev *dev, uint8_t *msg,
> break;
> case VIRTCHNL_EVENT_LINK_CHANGE:
> PMD_DRV_LOG(DEBUG, "VIRTCHNL_EVENT_LINK_CHANGE event");
> - vf->link_up = pf_msg->event_data.link_event.link_status;
> - if (vf->vf_res->vf_cap_flags & VIRTCHNL_VF_CAP_ADV_LINK_SPEED) {
> - vf->link_speed =
> - pf_msg->event_data.link_event_adv.link_speed;
> - } else {
> - enum virtchnl_link_speed speed;
> - speed = pf_msg->event_data.link_event.link_speed;
> - vf->link_speed = iavf_convert_link_speed(speed);
> - }
> - iavf_dev_link_update(dev, 0);
> - if (vf->link_up && !vf->vf_reset) {
> - iavf_dev_watchdog_disable(adapter);
> - } else {
> - if (!vf->link_up)
> - iavf_dev_watchdog_enable(adapter);
> - }
> - if (adapter->devargs.no_poll_on_link_down) {
> - iavf_set_no_poll(adapter, true);
> - if (adapter->no_poll)
> - PMD_DRV_LOG(DEBUG, "VF no poll turned on");
> - else
> - PMD_DRV_LOG(DEBUG, "VF no poll turned off");
> - }
> - iavf_dev_event_post(dev, RTE_ETH_EVENT_INTR_LSC, NULL, 0);
> + iavf_handle_link_change_event(dev, pf_msg);
> break;
> case VIRTCHNL_EVENT_PF_DRIVER_CLOSE:
> PMD_DRV_LOG(DEBUG,
> "VIRTCHNL_EVENT_PF_DRIVER_CLOSE event");
> --
> 2.34.1
^ permalink raw reply
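A minimal sketch of what a consolidated link-change handler looks like, assuming hypothetical `toy_*` types in place of the virtchnl structures (the real `event_data` is a union, and the real `iavf_handle_link_change_event()` body is snipped from the quote above). One detail worth preserving when merging the two copies: only the `iavf_read_msg_from_pf()` copy checked `vf->vf_res != NULL` before reading the capability flags, and this sketch keeps that check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VIRTCHNL_VF_CAP_ADV_LINK_SPEED. */
#define TOY_CAP_ADV_LINK_SPEED 0x1u

enum toy_legacy_speed { TOY_SPEED_UNKNOWN, TOY_SPEED_10GB, TOY_SPEED_25GB };

/* Not a union here, unlike the real virtchnl event_data. */
struct toy_event {
	bool link_status;
	enum toy_legacy_speed legacy_speed; /* like link_event.link_speed */
	uint32_t adv_speed_mbps;            /* like link_event_adv.link_speed */
};

struct toy_vf_res {
	uint32_t vf_cap_flags;
};

struct toy_link_vf {
	const struct toy_vf_res *vf_res; /* may still be NULL early in init */
	bool link_up;
	uint32_t link_speed;             /* Mbps */
	int lsc_events;
};

/* Hypothetical legacy-enum to Mbps conversion. */
static uint32_t
toy_convert_link_speed(enum toy_legacy_speed s)
{
	switch (s) {
	case TOY_SPEED_10GB:
		return 10000;
	case TOY_SPEED_25GB:
		return 25000;
	default:
		return 0;
	}
}

/* Single helper shared by both call sites. */
static void
toy_handle_link_change(struct toy_link_vf *vf, const struct toy_event *ev)
{
	vf->link_up = ev->link_status;
	if (vf->vf_res != NULL &&
	    (vf->vf_res->vf_cap_flags & TOY_CAP_ADV_LINK_SPEED))
		vf->link_speed = ev->adv_speed_mbps;
	else
		vf->link_speed = toy_convert_link_speed(ev->legacy_speed);
	vf->lsc_events++; /* stands in for posting RTE_ETH_EVENT_INTR_LSC */
}
```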
* [PATCH 1/2] eal/pflock: add API to downgrade from wr to rd lock
From: Eimear Morrissey @ 2026-06-10 9:11 UTC (permalink / raw)
To: dev; +Cc: Konstantin Ananyev
In-Reply-To: <20260610091147.88412-1-eimear.morrissey@huawei.com>
From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
Add a new API that allows the caller to downgrade from a write lock
to a read lock. Note that the caller is expected to hold the write lock
before calling this function.
Signed-off-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
lib/eal/include/rte_pflock.h | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/lib/eal/include/rte_pflock.h b/lib/eal/include/rte_pflock.h
index 6797ce5920..ed5255b3b5 100644
--- a/lib/eal/include/rte_pflock.h
+++ b/lib/eal/include/rte_pflock.h
@@ -179,6 +179,27 @@ rte_pflock_write_unlock(rte_pflock_t *pf)
rte_atomic_fetch_add_explicit(&pf->wr.out, 1, rte_memory_order_release);
}
+/**
+ * Release a pflock held for writing while keeping it held for reading.
+ *
+ * @param pf
+ * A pointer to a pflock structure.
+ */
+static inline void
+rte_pflock_write_downgrade(rte_pflock_t *pf)
+{
+ /* Migrate from write phase to read phase. */
+ rte_atomic_fetch_add_explicit(&pf->rd.in, RTE_PFLOCK_RINC,
+ rte_memory_order_acq_rel);
+ rte_atomic_fetch_and_explicit(&pf->rd.in, RTE_PFLOCK_LSB,
+ rte_memory_order_release);
+
+ /* Allow other writers to continue. */
+ rte_atomic_fetch_add_explicit(&pf->wr.out, 1,
+ rte_memory_order_release);
+}
+
+
#ifdef __cplusplus
}
#endif
--
2.51.0
^ permalink raw reply related
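The effect of `rte_pflock_write_downgrade()` on the reader counters can be checked with plain arithmetic. The sketch below is a single-threaded, non-atomic model, not a lock: the constant values are assumed to match those in `rte_pflock.h`, and the hypothetical `struct toy_rd` holds only the `rd.in`/`rd.out` pair. Adding `RTE_PFLOCK_RINC` registers the caller as a reader, masking with `RTE_PFLOCK_LSB` clears the writer bits so that readers spinning on them can proceed, and only then does the `wr.out` increment admit the next writer, which will wait for the downgraded reader to call `rte_pflock_read_unlock()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match rte_pflock.h; treat these values as an assumption. */
#define TOY_PFLOCK_PRES  0x2u    /* writer present */
#define TOY_PFLOCK_WBITS 0x3u    /* writer-present and phase-id bits */
#define TOY_PFLOCK_LSB   0xFFF0u /* mask that clears the writer bits */
#define TOY_PFLOCK_RINC  0x100u  /* one reader increment */

/* Non-atomic copy of the reader-side counters only. */
struct toy_rd {
	uint16_t in;
	uint16_t out;
};

/* Readers that have entered (or been admitted) but not yet left. */
static unsigned int
toy_readers_inside(const struct toy_rd *rd)
{
	return (uint16_t)((rd->in & TOY_PFLOCK_LSB) - rd->out) / TOY_PFLOCK_RINC;
}

/* Mirrors the two rd.in updates in rte_pflock_write_downgrade(). */
static void
toy_downgrade(struct toy_rd *rd)
{
	rd->in += TOY_PFLOCK_RINC; /* register the caller as a reader */
	rd->in &= TOY_PFLOCK_LSB;  /* clear writer bits: readers may enter */
}
```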
* [PATCH 2/2] app/test: add stress tests for rwlock and pflock
From: Eimear Morrissey @ 2026-06-10 9:11 UTC (permalink / raw)
To: dev
In-Reply-To: <20260610091147.88412-1-eimear.morrissey@huawei.com>
Add stress tests for pflock. Since the logic is generic enough to
apply to rwlock, run them against rwlock too.
Signed-off-by: Eimear Morrissey <eimear.morrissey@huawei.com>
---
app/test/meson.build | 2 +
app/test/test_pflock_stress.c | 76 ++++++
app/test/test_rwlock_stress.c | 59 +++++
app/test/test_rwlock_stress_impl.h | 393 +++++++++++++++++++++++++++++
4 files changed, 530 insertions(+)
create mode 100644 app/test/test_pflock_stress.c
create mode 100644 app/test/test_rwlock_stress.c
create mode 100644 app/test/test_rwlock_stress_impl.h
diff --git a/app/test/meson.build b/app/test/meson.build
index 61024125a7..f85ad617ce 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -140,6 +140,7 @@ source_file_deps = {
'test_pdump.c': ['pdump'] + sample_packet_forward_deps,
'test_per_lcore.c': [],
'test_pflock.c': [],
+ 'test_pflock_stress.c': [],
'test_pie.c': ['sched'],
'test_pmd_af_packet.c': ['net_af_packet', 'ethdev', 'bus_vdev'],
'test_pmd_pcap.c': ['net_pcap', 'ethdev', 'bus_vdev'] + packet_burst_generator_deps,
@@ -178,6 +179,7 @@ source_file_deps = {
'test_ring_st_peek_stress_zc.c': ['ptr_compress'],
'test_ring_stress.c': ['ptr_compress'],
'test_rwlock.c': [],
+ 'test_rwlock_stress.c': [],
'test_sched.c': ['net', 'sched'],
'test_security.c': ['net', 'security'],
'test_security_inline_macsec.c': ['ethdev', 'security'],
diff --git a/app/test/test_pflock_stress.c b/app/test/test_pflock_stress.c
new file mode 100644
index 0000000000..cafc5defba
--- /dev/null
+++ b/app/test/test_pflock_stress.c
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Huawei Technologies Co., Ltd
+ */
+
+#include "test_rwlock_stress_impl.h"
+
+/* Pflock operation implementations */
+static void
+pflock_init_fn(struct rwlock_stress_lock *lock)
+{
+ rte_pflock_init(&lock->lock.pflock);
+}
+
+static void
+pflock_read_lock_fn(struct rwlock_stress_lock *lock)
+{
+ rte_pflock_read_lock(&lock->lock.pflock);
+}
+
+static void
+pflock_read_unlock_fn(struct rwlock_stress_lock *lock)
+{
+ rte_pflock_read_unlock(&lock->lock.pflock);
+}
+
+static void
+pflock_write_lock_fn(struct rwlock_stress_lock *lock)
+{
+ rte_pflock_write_lock(&lock->lock.pflock);
+}
+
+static void
+pflock_write_unlock_fn(struct rwlock_stress_lock *lock)
+{
+ rte_pflock_write_unlock(&lock->lock.pflock);
+}
+
+static void
+pflock_write_downgrade_fn(struct rwlock_stress_lock *lock)
+{
+ rte_pflock_write_downgrade(&lock->lock.pflock);
+}
+
+/* Pflock operations table */
+static const struct rwlock_ops pflock_ops = {
+ .name = "pflock",
+ .init = pflock_init_fn,
+ .read_lock = pflock_read_lock_fn,
+ .read_unlock = pflock_read_unlock_fn,
+ .write_lock = pflock_write_lock_fn,
+ .write_unlock = pflock_write_unlock_fn,
+ .write_downgrade = pflock_write_downgrade_fn,
+};
+
+static const struct test_descriptor pflock_specific_tests[] = {
+ {
+ .name = "write_downgrade",
+ .num_readers_pct = 50,
+ .reader_delay_us = 0,
+ .writer_delay_us = 0,
+ .flags = DOWNGRADE_TEST,
+ },
+};
+
+static int
+run_pflock_tests(void)
+{
+ int ret = 0;
+ ret |= run_test_suite("PFLOCK Common Stress Tests", &pflock_ops,
+ tests, RTE_DIM(tests));
+ ret |= run_test_suite("PFLOCK Specific Stress Tests", &pflock_ops,
+ pflock_specific_tests, RTE_DIM(pflock_specific_tests));
+ return ret ? -1 : 0;
+}
+
+REGISTER_STRESS_TEST(pflock_stress_autotest, run_pflock_tests);
diff --git a/app/test/test_rwlock_stress.c b/app/test/test_rwlock_stress.c
new file mode 100644
index 0000000000..5d151f3f8f
--- /dev/null
+++ b/app/test/test_rwlock_stress.c
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Huawei Technologies Co., Ltd
+ */
+
+#include "test_rwlock_stress_impl.h"
+
+/* RWLock operation implementations */
+static void
+rwlock_init_fn(struct rwlock_stress_lock *lock)
+{
+ rte_rwlock_init(&lock->lock.rwlock);
+}
+
+static void
+rwlock_read_lock_fn(struct rwlock_stress_lock *lock)
+{
+ rte_rwlock_read_lock(&lock->lock.rwlock);
+}
+
+static void
+rwlock_read_unlock_fn(struct rwlock_stress_lock *lock)
+{
+ rte_rwlock_read_unlock(&lock->lock.rwlock);
+}
+
+static void
+rwlock_write_lock_fn(struct rwlock_stress_lock *lock)
+{
+ rte_rwlock_write_lock(&lock->lock.rwlock);
+}
+
+static void
+rwlock_write_unlock_fn(struct rwlock_stress_lock *lock)
+{
+ rte_rwlock_write_unlock(&lock->lock.rwlock);
+}
+
+/* RWLock operations table */
+static const struct rwlock_ops rwlock_ops = {
+ .name = "rwlock",
+ .init = rwlock_init_fn,
+ .read_lock = rwlock_read_lock_fn,
+ .read_unlock = rwlock_read_unlock_fn,
+ .write_lock = rwlock_write_lock_fn,
+ .write_unlock = rwlock_write_unlock_fn,
+};
+
+static int
+run_rwlock_tests(void)
+{
+ int ret = 0;
+
+ ret |= run_test_suite("RWLOCK Stress Tests", &rwlock_ops, tests,
+ RTE_DIM(tests));
+
+ return ret ? -1 : 0;
+}
+
+REGISTER_STRESS_TEST(rwlock_stress_autotest, run_rwlock_tests);
diff --git a/app/test/test_rwlock_stress_impl.h b/app/test/test_rwlock_stress_impl.h
new file mode 100644
index 0000000000..d28ccd76e0
--- /dev/null
+++ b/app/test/test_rwlock_stress_impl.h
@@ -0,0 +1,393 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026 Huawei Technologies Co., Ltd
+ */
+
+#ifndef _TEST_RWLOCK_STRESS_H_
+#define _TEST_RWLOCK_STRESS_H_
+
+/**
+ * Generic reader-writer lock stress test.
+ *
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdbool.h>
+
+#include <rte_lcore.h>
+#include <rte_cycles.h>
+#include <rte_atomic.h>
+#include <rte_launch.h>
+#include <rte_per_lcore.h>
+#include <rte_malloc.h>
+#include <rte_pflock.h>
+#include <rte_random.h>
+#include <rte_rwlock.h>
+
+#include "test.h"
+
+#define TEST_DURATION_SEC 5
+#define COUNTER_ARRAY_SIZE 1024
+#define DOWNGRADE_TEST 0x1 /* Will attempt to downgrade from write to read lock */
+#define DYNAMIC_ROLES 0x2 /* Threads can switch between reader/writer roles */
+
+struct rwlock_stress_lock;
+
+/**
+ * Lock operations interface.
+ */
+struct rwlock_ops {
+ const char *name;
+
+ void (*init)(struct rwlock_stress_lock *lock);
+ void (*read_lock)(struct rwlock_stress_lock *lock);
+ void (*read_unlock)(struct rwlock_stress_lock *lock);
+ void (*write_lock)(struct rwlock_stress_lock *lock);
+ void (*write_unlock)(struct rwlock_stress_lock *lock);
+ void (*write_downgrade)(struct rwlock_stress_lock *lock);
+};
+
+/**
+ * Generic lock structure.
+ */
+struct rwlock_stress_lock {
+ const struct rwlock_ops *ops;
+
+ union {
+ struct rte_pflock pflock;
+ rte_rwlock_t rwlock;
+ } lock;
+};
+
+/**
+ * Per-lcore statistics
+ */
+struct lcore_stats {
+ uint64_t reader_ops;
+ uint64_t writer_ops;
+ uint64_t local_counter;
+ uint64_t reader_errors;
+ uint64_t writer_errors;
+ uint64_t acquire_time;
+} __rte_cache_aligned;
+
+/**
+ * Test controls
+ */
+struct test_descriptor {
+ const char *name;
+ uint32_t num_readers_pct; /* Percentage of workers as readers (0-100) */
+ uint32_t reader_delay_us; /* Microseconds to delay in reader */
+ uint32_t writer_delay_us; /* Microseconds to delay in writer */
+ uint32_t flags; /* Special test behaviour (DOWNGRADE_TEST, DYNAMIC_ROLES) */
+};
+
+/**
+ * Shared test state.
+ */
+struct rwlock_test_shared {
+ struct rwlock_stress_lock lock;
+ volatile uint64_t counter;
+ volatile uint64_t counter_array[COUNTER_ARRAY_SIZE];
+ volatile bool stop;
+ uint32_t num_readers;
+ uint32_t num_writers;
+ const struct test_descriptor *test;
+ struct lcore_stats stats[RTE_MAX_LCORE];
+} __rte_cache_aligned;
+
+/* Test descriptors array */
+static const struct test_descriptor tests[] = {
+ {
+ .name = "basic_reader_writer",
+ .num_readers_pct = 75,
+ .reader_delay_us = 0,
+ .writer_delay_us = 0,
+ },
+ {
+ .name = "long_hold",
+ .num_readers_pct = 67,
+ .reader_delay_us = 100,
+ .writer_delay_us = 100,
+ },
+ {
+ .name = "rapid_acquire_release",
+ .num_readers_pct = 67,
+ .reader_delay_us = 0,
+ .writer_delay_us = 0,
+ },
+ {
+ .name = "dynamic_roles",
+ .num_readers_pct = 75,
+ .reader_delay_us = 0,
+ .writer_delay_us = 0,
+ .flags = DYNAMIC_ROLES,
+ },
+};
+
+static inline bool
+should_be_writer(uint32_t num_readers, uint32_t flags)
+{
+ uint32_t total_lcores = rte_lcore_count();
+ if (total_lcores <= 1)
+ return true;
+
+ if (flags & DYNAMIC_ROLES) {
+ uint32_t readers_pct = (num_readers * 100) / (total_lcores - 1);
+ return (rte_rand_max(100) >= readers_pct);
+ }
+
+ unsigned int idx = rte_lcore_index(rte_lcore_id()) - 1;
+ return idx >= num_readers;
+}
+
+static void
+handle_error(struct rwlock_test_shared *s, unsigned int lcore_id,
+ bool write_lock, const char *func, int line)
+{
+ s->stop = true;
+ if (write_lock) {
+ s->stats[lcore_id].writer_errors++;
+ s->lock.ops->write_unlock(&s->lock);
+ } else {
+ s->stats[lcore_id].reader_errors++;
+ s->lock.ops->read_unlock(&s->lock); /* callers still hold the read lock */
+ }
+ printf("ERROR: lcore:%u: %s:%d early termination\n", lcore_id, func, line);
+}
+
+static int
+handle_writer_work(struct rwlock_test_shared *s, unsigned int lcore_id,
+ const struct test_descriptor *test, uint64_t delta)
+{
+ s->lock.ops->write_lock(&s->lock);
+ uint64_t old_val = s->counter;
+ s->counter += delta;
+ s->stats[lcore_id].local_counter += delta;
+
+ /* Verify increment was atomic */
+ if (s->counter != old_val + delta) {
+ handle_error(s, lcore_id, true, __func__, __LINE__);
+ return -1;
+ }
+
+ /* Update all array elements */
+ for (uint32_t i = 0; i < COUNTER_ARRAY_SIZE; i++) {
+ s->counter_array[i] += delta;
+ if (s->counter_array[i] != s->counter) {
+ handle_error(s, lcore_id, true, __func__, __LINE__);
+ return -1;
+ }
+ }
+
+ if (test->flags & DOWNGRADE_TEST) {
+ /* Downgrade to read lock */
+ if (s->lock.ops->write_downgrade) {
+ s->lock.ops->write_downgrade(&s->lock);
+ /* Verify array consistency under read lock */
+ for (uint32_t i = 0; i < COUNTER_ARRAY_SIZE; i++) {
+ if (s->counter_array[i] != s->counter) {
+ handle_error(s, lcore_id, false, __func__, __LINE__);
+ return -1;
+ }
+ }
+ s->lock.ops->read_unlock(&s->lock);
+ }
+ } else {
+ if (test->writer_delay_us > 0)
+ rte_delay_us_sleep(test->writer_delay_us);
+ s->lock.ops->write_unlock(&s->lock);
+ }
+ s->stats[lcore_id].writer_ops++;
+ return 0;
+}
+
+static int
+handle_reader_work(struct rwlock_test_shared *s, unsigned int lcore_id,
+ const struct test_descriptor *test)
+{
+ uint64_t local_counter;
+
+ s->lock.ops->read_lock(&s->lock);
+ local_counter = s->counter;
+
+ /* Verify array consistency */
+ for (uint32_t i = 0; i < COUNTER_ARRAY_SIZE; i++) {
+ if (s->counter_array[i] != local_counter) {
+ handle_error(s, lcore_id, false, __func__, __LINE__);
+ return -1;
+ }
+ }
+
+ if (test->reader_delay_us > 0)
+ rte_delay_us_sleep(test->reader_delay_us);
+
+ /* Verify counter didn't change during read */
+ if (s->counter != local_counter) {
+ handle_error(s, lcore_id, false, __func__, __LINE__);
+ return -1;
+ }
+
+ s->lock.ops->read_unlock(&s->lock);
+ s->stats[lcore_id].reader_ops++;
+ return 0;
+}
+
+static int
+lcore_function(void *arg)
+{
+ struct rwlock_test_shared *s = arg;
+ unsigned int lcore_id = rte_lcore_id();
+ bool is_writer = should_be_writer(s->num_readers, s->test->flags);
+ const struct test_descriptor *test = s->test;
+
+ while (!s->stop) {
+ uint64_t start = rte_get_timer_cycles();
+ uint64_t delta = (rte_rand() % 64) + 1;
+ int ret;
+
+ if (is_writer)
+ ret = handle_writer_work(s, lcore_id, test, delta);
+ else
+ ret = handle_reader_work(s, lcore_id, test);
+
+ if (ret < 0)
+ continue;
+
+ /* Record max acquire time */
+ uint64_t wait_time = rte_get_timer_cycles() - start;
+ if (wait_time > s->stats[lcore_id].acquire_time)
+ s->stats[lcore_id].acquire_time = wait_time;
+ }
+
+ return 0;
+}
+
+static int
+verify(struct rwlock_test_shared *s)
+{
+ int ret = 0;
+ unsigned int lcore_id;
+ uint64_t total_reader_errors = 0;
+ uint64_t total_writer_errors = 0;
+ uint64_t sum_local_counters = 0;
+
+ /* Calculate errors and counters */
+ RTE_LCORE_FOREACH_WORKER(lcore_id) {
+ total_reader_errors += s->stats[lcore_id].reader_errors;
+ total_writer_errors += s->stats[lcore_id].writer_errors;
+ sum_local_counters += s->stats[lcore_id].local_counter;
+ }
+
+ /* Verify sum of per-lcore counters matches the shared counter */
+ if (s->counter != sum_local_counters) {
+ printf(" FAILED: shared counter=%" PRIu64
+ " sum of local counters=%" PRIu64 "\n",
+ s->counter, sum_local_counters);
+ ret = -1;
+ }
+
+ if (total_reader_errors) {
+ printf(" FAILED: reader errors=%" PRIu64 "\n",
+ total_reader_errors);
+ ret = -1;
+ }
+
+ if (total_writer_errors) {
+ printf(" FAILED: writer errors=%" PRIu64 "\n",
+ total_writer_errors);
+ ret = -1;
+ }
+
+ /* Verify array consistency */
+ for (uint32_t i = 0; i < COUNTER_ARRAY_SIZE; i++) {
+ if (s->counter_array[i] != s->counter) {
+ printf(" FAILED: counter_array[%u]=%" PRIu64 " counter=%" PRIu64 "\n",
+ i, s->counter_array[i], s->counter);
+ ret = -1;
+ break;
+ }
+ }
+
+ return ret;
+}
+
+static int
+test_rwlock_stress_impl(const struct rwlock_ops *ops,
+ const struct test_descriptor *ind_test)
+{
+ struct rwlock_test_shared shared = {0};
+ uint64_t start_time, end_time;
+ uint64_t total_reader_ops = 0;
+ uint64_t total_writer_ops = 0;
+ uint64_t max_acquire_time = 0;
+ unsigned int lcore_id;
+ int ret = 0;
+
+ shared.lock.ops = ops;
+ shared.lock.ops->init(&shared.lock);
+ shared.test = ind_test;
+ shared.num_readers = (ind_test->num_readers_pct * (rte_lcore_count() - 1)) / 100;
+ shared.num_writers = (rte_lcore_count() - 1) - shared.num_readers;
+
+ printf(" %u readers, %u writers\n", shared.num_readers, shared.num_writers);
+
+ /* Launch workers */
+ RTE_LCORE_FOREACH_WORKER(lcore_id) {
+ rte_eal_remote_launch(lcore_function, &shared, lcore_id);
+ }
+
+ /* Run test for duration */
+ start_time = rte_get_timer_cycles();
+ rte_delay_ms(TEST_DURATION_SEC * 1000);
+
+ /* Stop workers and collect stats */
+ shared.stop = true;
+ RTE_LCORE_FOREACH_WORKER(lcore_id) {
+ rte_eal_wait_lcore(lcore_id);
+ if (shared.stats[lcore_id].acquire_time > max_acquire_time)
+ max_acquire_time = shared.stats[lcore_id].acquire_time;
+ total_reader_ops += shared.stats[lcore_id].reader_ops;
+ total_writer_ops += shared.stats[lcore_id].writer_ops;
+ }
+ end_time = rte_get_timer_cycles();
+
+ printf(" %"PRIu64" reader ops, %"PRIu64" writer ops, "
+ "total time: %.2f seconds\n",
+ total_reader_ops, total_writer_ops,
+ (double)(end_time - start_time) / rte_get_timer_hz());
+
+ ret = verify(&shared);
+ if (ret == 0) {
+ uint64_t hz = rte_get_timer_hz();
+ printf(" PASSED: All checks passed (max wait: %.2f us)\n",
+ (double)max_acquire_time * 1000000 / hz);
+ }
+ return ret;
+}
+
+/**
+ * Run a test suite with the given title and tests
+ */
+static int
+run_test_suite(const char *title, const struct rwlock_ops *ops,
+ const struct test_descriptor suite[], uint32_t count)
+{
+ uint32_t failed = 0;
+
+ printf("%s\n===================\n\n", title);
+ for (uint32_t i = 0; i < count; i++) {
+ printf("Test %u/%u: %s\n", i + 1, count, suite[i].name);
+ if (test_rwlock_stress_impl(ops, &suite[i]) < 0)
+ failed++;
+ printf("\n");
+ }
+ printf("===================\n");
+ printf("Results: %u/%u passed, %u failed\n", count - failed, count, failed);
+
+ return failed ? -1 : 0;
+}
+
+#endif /* _TEST_RWLOCK_STRESS_H_ */
--
2.51.0
* [PATCH 0/2] Pflock downgrade & stress tests for pflock/rwlock libraries
From: Eimear Morrissey @ 2026-06-10 9:11 UTC
To: dev
Add a new downgrade option for pflock. Add stress tests for it and,
by extension, for the rest of the pflock/rwlock libraries.
Eimear Morrissey (1):
app/test: add stress tests for rwlock and pflock
Konstantin Ananyev (1):
eal/pflock: add API to downgrade from wr to rd lock
app/test/meson.build | 2 +
app/test/test_pflock_stress.c | 76 ++++++
app/test/test_rwlock_stress.c | 59 +++++
app/test/test_rwlock_stress_impl.h | 393 +++++++++++++++++++++++++++++
lib/eal/include/rte_pflock.h | 21 ++
5 files changed, 551 insertions(+)
create mode 100644 app/test/test_pflock_stress.c
create mode 100644 app/test/test_rwlock_stress.c
create mode 100644 app/test/test_rwlock_stress_impl.h
--
2.51.0
* Re: [PATCH 1/3] net/iavf: downgrade opcode 0 ARQ log to debug
From: Bruce Richardson @ 2026-06-10 8:26 UTC
To: Loftus, Ciara; +Cc: dev@dpdk.org, Talluri, ChaitanyababuX
In-Reply-To: <IA4PR11MB927828E81A5F1EC7C81BABF48E1A2@IA4PR11MB9278.namprd11.prod.outlook.com>
On Wed, Jun 10, 2026 at 09:13:21AM +0100, Loftus, Ciara wrote:
> > Subject: Re: [PATCH 1/3] net/iavf: downgrade opcode 0 ARQ log to debug
> >
> > On Mon, Jun 08, 2026 at 02:55:16PM +0000, Ciara Loftus wrote:
> > > From: Talluri Chaitanyababu <chaitanyababux.talluri@intel.com>
> > >
> > > After admin queue reinitialisation, completions from uninitialised
> > > ARQ ring descriptor memory may arrive before any real PF response.
> > > These carry opcode 0 (`VIRTCHNL_OP_UNKNOWN`) and trigger a WARNING
> > > log on every poll iteration, flooding the log during reset recovery.
> > >
> > > Treat opcode 0 as a distinct case and log it at DEBUG level, while
> > > retaining WARNING for genuine opcode mismatches.
> > >
> > > Signed-off-by: Talluri Chaitanyababu <chaitanyababux.talluri@intel.com>
> > > ---
> > > drivers/net/intel/iavf/iavf_vchnl.c | 11 +++++++++--
> > > 1 file changed, 9 insertions(+), 2 deletions(-)
> > >
> > Should this be backported as a bugfix?
>
> The issue has been present since the driver was added, but it was only
> triggered in practice more recently with the addition of the reset
> logic.
> I think these tags can be added:
>
> Fixes: 2d23ed74c079 ("net/iavf: enable iavf PMD")
> Cc: stable@dpdk.org
>
> Let me know if you need a respin.
Sorry, I didn't wait for your reply before applying the patch, and it has
already been pulled to main. Since the issue only appeared recently, we are
probably OK without a backport. If you think it needs one, I suggest sending
an extra copy of this patch to stable, so that it can be added to the list there.
/Bruce
* RE: [PATCH 1/3] net/iavf: downgrade opcode 0 ARQ log to debug
From: Loftus, Ciara @ 2026-06-10 8:13 UTC
To: Richardson, Bruce; +Cc: dev@dpdk.org, Talluri, ChaitanyababuX
In-Reply-To: <aigjEm-RskU5s45v@bricha3-mobl1.ger.corp.intel.com>
> Subject: Re: [PATCH 1/3] net/iavf: downgrade opcode 0 ARQ log to debug
>
> On Mon, Jun 08, 2026 at 02:55:16PM +0000, Ciara Loftus wrote:
> > From: Talluri Chaitanyababu <chaitanyababux.talluri@intel.com>
> >
> > After admin queue reinitialisation, completions from uninitialised
> > ARQ ring descriptor memory may arrive before any real PF response.
> > These carry opcode 0 (`VIRTCHNL_OP_UNKNOWN`) and trigger a WARNING
> > log on every poll iteration, flooding the log during reset recovery.
> >
> > Treat opcode 0 as a distinct case and log it at DEBUG level, while
> > retaining WARNING for genuine opcode mismatches.
> >
> > Signed-off-by: Talluri Chaitanyababu <chaitanyababux.talluri@intel.com>
> > ---
> > drivers/net/intel/iavf/iavf_vchnl.c | 11 +++++++++--
> > 1 file changed, 9 insertions(+), 2 deletions(-)
> >
> Should this be backported as a bugfix?
The issue has been present since the driver was added, but it was only
triggered in practice more recently with the addition of the reset
logic.
I think these tags can be added:
Fixes: 2d23ed74c079 ("net/iavf: enable iavf PMD")
Cc: stable@dpdk.org
Let me know if you need a respin.
* [PATCH v8 1/1] net/mana: add device reset support
From: Wei Hu @ 2026-06-10 7:21 UTC
To: dev, stephen; +Cc: longli, weh
In-Reply-To: <cover.1781017284.git.weh@linux.microsoft.com>
From: Wei Hu <weh@microsoft.com>
Add support for handling hardware reset events in the MANA driver.
When the MANA kernel driver receives a hardware service event, it
initiates a device reset and notifies userspace via
IBV_EVENT_DEVICE_FATAL. The DPDK driver handles this by performing
an automatic teardown and recovery sequence.
The interrupt handler sets the device state, blocks new data path
bursts, waits for in-flight bursts to drain using per-queue atomic
flags, and spawns a control thread. The control thread performs
teardown immediately (dev_stop, secondary IPC, dev_close, MR cache
free) before waiting for the hardware recovery timer to fire. This
avoids blocking the EAL interrupt thread on multi-second IPC
timeouts and ibverbs calls. After the recovery delay, the thread
unregisters the interrupt handler, re-probes the PCI device,
reinitializes MR caches, and restarts queues. Each function owns
its own lock scope with no lock hand-off between threads.
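A minimal sketch of the interruptible recovery wait, written in plain C11/POSIX instead of DPDK APIs. The struct, state constants and function names are hypothetical. Only the pattern follows `mana_reset_thread()`: the thread waits with `pthread_cond_timedwait()` on a `CLOCK_REALTIME` deadline, and another thread (for example, a PCI removal callback) can cut the wait short by changing the state and signalling the condvar.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical states mirroring MANA_DEV_RESET_EXIT / MANA_DEV_RESET_FAILED. */
enum { ST_RESET_EXIT = 2, ST_RESET_FAILED = 3 };

struct recovery_wait {
	pthread_mutex_t mtx;
	pthread_cond_t cond;
	_Atomic int state;
};

/* Sleep up to timeout_sec seconds. Returns true if the state is still
 * RESET_EXIT afterwards, meaning the caller should run the exit
 * (re-probe) phase; returns false if the wait was aborted. */
static bool
wait_for_recovery(struct recovery_wait *w, time_t timeout_sec)
{
	struct timespec deadline;
	int rc = 0;

	/* Default condvars measure the deadline against CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_sec;

	pthread_mutex_lock(&w->mtx);
	/* rc == 0 covers both a real signal and a spurious wakeup; the state
	 * check tells them apart. Any error (e.g. ETIMEDOUT) ends the wait. */
	while (rc == 0 && atomic_load(&w->state) == ST_RESET_EXIT)
		rc = pthread_cond_timedwait(&w->cond, &w->mtx, &deadline);
	pthread_mutex_unlock(&w->mtx);

	return atomic_load(&w->state) == ST_RESET_EXIT;
}

/* Called from another thread (e.g. a device removal callback) to abort
 * the recovery wait early. */
static void
abort_recovery(struct recovery_wait *w)
{
	pthread_mutex_lock(&w->mtx);
	atomic_store(&w->state, ST_RESET_FAILED);
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->mtx);
}
```

This sketch differs from the patch in one respect. It re-checks the state in a loop, so a spurious wakeup does not end the wait early. By contrast, `mana_reset_thread()` calls `pthread_cond_timedwait()` once and then proceeds if the state is still `MANA_DEV_RESET_EXIT`.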
Each queue has an atomic burst_state variable where bit 0 is the
in-burst flag and bit 1 is a blocked flag. The data path uses a
single compare-and-swap (0 to 1) to enter a burst, which fails
immediately if the blocked bit is set. The reset path sets the
blocked bit via atomic fetch-or and polls bit 0 to wait for
in-flight bursts to drain. This single-variable design avoids the
need for sequential consistency ordering.
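The single-word burst gate can be sketched with C11 atomics as follows. This is an illustration under stated assumptions, not the driver's code. The names are hypothetical, and the burst exit path is also an assumption, because the rx.c/tx.c changes are not quoted here. In the sketch, the exit path clears only bit 0 with a fetch-and, so that a blocked bit set concurrently by the reset path survives.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two bits described above. */
#define BURST_IN_PROGRESS 1u /* bit 0: a burst call is running */
#define BURST_BLOCKED     2u /* bit 1: the reset path has blocked new bursts */

struct queue_gate {
	_Atomic uint32_t burst_state;
};

/* Data path: enter a burst only if the word is exactly 0
 * (idle and not blocked). One CAS checks both bits at once. */
static bool
gate_enter(struct queue_gate *q)
{
	uint32_t expected = 0;

	return atomic_compare_exchange_strong_explicit(&q->burst_state,
			&expected, BURST_IN_PROGRESS,
			memory_order_acquire, memory_order_relaxed);
}

/* Data path: leave the burst by clearing bit 0 only (assumed exit path). */
static void
gate_exit(struct queue_gate *q)
{
	atomic_fetch_and_explicit(&q->burst_state, ~BURST_IN_PROGRESS,
			memory_order_release);
}

/* Reset path: block new bursts, then spin until any in-flight burst
 * clears bit 0. A DPDK driver would call rte_pause() in the loop. */
static void
gate_block_and_drain(struct queue_gate *q)
{
	atomic_fetch_or_explicit(&q->burst_state, BURST_BLOCKED,
			memory_order_acq_rel);
	while (atomic_load_explicit(&q->burst_state, memory_order_acquire) &
			BURST_IN_PROGRESS)
		;
}

/* Recovery path: reopen the gate, as mana_clear_burst_state() does. */
static void
gate_clear(struct queue_gate *q)
{
	atomic_store_explicit(&q->burst_state, 0, memory_order_release);
}
```

Both flags live in one word, so the data path CAS from 0 fails whenever either bit is set. As a result, there is no ordering between two separate variables to get right.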
A per-device mutex serializes the reset path with ethdev
operations. The mutex uses PTHREAD_PROCESS_SHARED for multi-process
support and is held across blocking IB verbs calls. A trylock
helper encapsulates the lock acquisition and device state check
for all ethdev operation wrappers. Operations that cannot wait
(configure, queue setup) return -EBUSY during reset, while
dev_stop and dev_close join the reset thread before acquiring
the lock to ensure proper sequencing. A CAS-based helper prevents
double-join of the reset thread.
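The trylock helper follows a common pattern. A hypothetical, self-contained version using a POSIX mutex and an atomic state field might look like the sketch below. `dev_priv`, the `DEV_*` states and `mtu_set_locked()` are illustrative stand-ins for the driver's `mana_priv`, `MANA_DEV_*` states and `mana_mtu_set_lock()`.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical states mirroring enum mana_device_state. */
enum dev_state { DEV_ACTIVE, DEV_RESET_ENTER, DEV_RESET_EXIT, DEV_RESET_FAILED };

struct dev_priv {
	pthread_mutex_t ops_lock; /* serializes ops with the reset path */
	_Atomic int state;
	int mtu;
};

/* Returns 0 with ops_lock held if the device is ACTIVE; otherwise
 * returns -EBUSY without holding the lock. */
static int
reset_trylock(struct dev_priv *p)
{
	if (pthread_mutex_trylock(&p->ops_lock))
		return -EBUSY;
	if (atomic_load_explicit(&p->state, memory_order_acquire) != DEV_ACTIVE) {
		pthread_mutex_unlock(&p->ops_lock);
		return -EBUSY;
	}
	return 0;
}

/* Example wrapper for an op that must not wait for a reset to finish. */
static int
mtu_set_locked(struct dev_priv *p, int mtu)
{
	if (reset_trylock(p))
		return -EBUSY;
	p->mtu = mtu; /* stands in for the real operation */
	pthread_mutex_unlock(&p->ops_lock);
	return 0;
}
```

A caller that receives `-EBUSY` is expected to retry after recovery. In the patch, only `dev_stop` and `dev_close` take the blocking `pthread_mutex_lock()` path instead.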
Multi-process support is included: secondary processes unmap and
remap doorbell pages via IPC during the reset enter and exit
phases. Data path functions in both primary and secondary
processes check the device state atomically and return early when
the device is not active.
The driver emits RTE_ETH_EVENT_ERR_RECOVERING before entering the
reset path so that upper layers (e.g. netvsc) can switch their
data path before queues are stopped. The event is emitted outside
the reset lock to avoid deadlock if the callback calls dev_stop or
dev_close. On completion, the driver emits RECOVERY_SUCCESS or
RECOVERY_FAILED after releasing the lock and clearing the
reset_thread_active flag, preventing self-join deadlock if the
callback calls dev_stop or dev_close. If the enter phase fails
internally, RECOVERY_FAILED is sent immediately so the application
receives a terminal event. A PCI device removal event callback
distinguishes hot-remove from service reset.
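The single-join guard (`mana_join_reset_thread()` in the patch) reduces to a compare-and-swap on a flag. The sketch below uses hypothetical names, and it uses `pthread_create()`/`pthread_join()` in place of DPDK's `rte_thread_*` wrappers. Only the caller whose CAS flips the flag from true to false performs the join. A second caller therefore gets a no-op, and so does a callback running on the reset thread after that thread has cleared the flag itself. The sketch omits the state change and condvar signal that the driver's helper also performs.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct reset_ctl {
	pthread_t thread;
	atomic_bool active; /* true while a joinable reset thread exists */
};

/* Hypothetical stand-in for the reset thread body. */
static void *
idle_worker(void *arg)
{
	return arg;
}

/* Start the worker, then publish it as joinable only after creation
 * succeeds (the same order as mana_reset_enter()). */
static int
start_reset_thread(struct reset_ctl *c)
{
	int ret = pthread_create(&c->thread, NULL, idle_worker, NULL);

	if (ret == 0)
		atomic_store_explicit(&c->active, true, memory_order_release);
	return ret;
}

/* Join the reset thread at most once. Returns true if this call joined. */
static bool
join_reset_once(struct reset_ctl *c)
{
	bool expected = true;

	if (!atomic_compare_exchange_strong_explicit(&c->active, &expected,
			false, memory_order_acq_rel, memory_order_acquire))
		return false; /* nothing to join, or another caller already did */
	pthread_join(c->thread, NULL);
	return true;
}
```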
Documentation for the device reset feature is added in the MANA
NIC guide and the 26.07 release notes.
Signed-off-by: Wei Hu <weh@microsoft.com>
---
doc/guides/nics/mana.rst | 40 +
doc/guides/rel_notes/release_26_07.rst | 8 +
drivers/net/mana/mana.c | 1076 ++++++++++++++++++++++--
drivers/net/mana/mana.h | 52 +-
drivers/net/mana/mp.c | 89 +-
drivers/net/mana/mr.c | 6 +-
drivers/net/mana/rx.c | 23 +-
drivers/net/mana/tx.c | 44 +-
8 files changed, 1230 insertions(+), 108 deletions(-)
diff --git a/doc/guides/nics/mana.rst b/doc/guides/nics/mana.rst
index 0fcab6e2f6..08e345ea61 100644
--- a/doc/guides/nics/mana.rst
+++ b/doc/guides/nics/mana.rst
@@ -71,3 +71,43 @@ The user can specify below argument in devargs.
The default value is not set,
meaning all the NICs will be probed and loaded.
User can specify multiple mac=xx:xx:xx:xx:xx:xx arguments for up to 8 NICs.
+
+Device Reset Support
+--------------------
+
+The MANA PMD supports automatic recovery from hardware service reset events.
+When the MANA kernel driver receives a hardware service event,
+it initiates a device reset and notifies userspace
+via ``IBV_EVENT_DEVICE_FATAL``.
+
+The driver handles this transparently through a two-phase reset flow:
+
+* **Enter phase**: The interrupt handler blocks new data path bursts
+ and waits for all in-flight burst calls to drain
+ using per-queue atomic flags,
+ then spawns a control thread for the remaining work.
+
+* **Teardown and exit phase**: The control thread tears down
+ IB resources and queues, unmaps secondary process doorbell pages,
+ and closes the device. After a delay for hardware recovery,
+ it re-probes the PCI device,
+ reinstalls the interrupt handler,
+ reinitializes resources, and restarts queues.
+
+The driver emits the following ethdev recovery events
+to notify upper layers (e.g. netvsc) of the reset lifecycle:
+
+``RTE_ETH_EVENT_ERR_RECOVERING``
+ Reset has started.
+
+``RTE_ETH_EVENT_RECOVERY_SUCCESS``
+ Device has recovered successfully.
+
+``RTE_ETH_EVENT_RECOVERY_FAILED``
+ Recovery failed.
+
+To distinguish a PCI hot-remove from a service reset,
+the driver registers for PCI device removal events.
+This requires the application to call ``rte_dev_event_monitor_start()``
+for removal events to be delivered
+(e.g. testpmd ``--hot-plug-handling`` option).
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index bd0cec2709..58e8c2422e 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -122,6 +122,14 @@ New Features
Added AGENTS.md file for AI review
and supporting scripts to review patches and documentation.
+* **Added device reset support to the MANA PMD.**
+
+ Added automatic recovery from hardware service reset events
+ in the MANA poll mode driver. The driver uses ethdev recovery events
+ (``RTE_ETH_EVENT_ERR_RECOVERING``, ``RTE_ETH_EVENT_RECOVERY_SUCCESS``,
+ ``RTE_ETH_EVENT_RECOVERY_FAILED``) to notify upper layers of the
+ reset lifecycle.
+
Removed Items
-------------
diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 67396cda1f..96990190b1 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -103,6 +103,8 @@ mana_dev_configure(struct rte_eth_dev *dev)
RTE_ETH_RX_OFFLOAD_VLAN_STRIP);
priv->num_queues = dev->data->nb_rx_queues;
+ DRV_LOG(DEBUG, "priv %p, port %u, dev port %u, num_queues: %u",
+ priv, priv->port_id, priv->dev_port, priv->num_queues);
manadv_set_context_attr(priv->ib_ctx, MANADV_CTX_ATTR_BUF_ALLOCATORS,
(void *)((uintptr_t)&(struct manadv_ctx_allocators){
@@ -214,8 +216,8 @@ mana_dev_start(struct rte_eth_dev *dev)
DRV_LOG(INFO, "TX/RX queues have started");
- /* Enable datapath for secondary processes */
- mana_mp_req_on_rxtx(dev, MANA_MP_REQ_START_RXTX);
+ /* Intentionally ignore errors — secondary may not be running */
+ (void)mana_mp_req_on_rxtx(dev, MANA_MP_REQ_START_RXTX);
ret = rxq_intr_enable(priv);
if (ret) {
@@ -242,26 +244,33 @@ mana_dev_stop(struct rte_eth_dev *dev)
{
int ret;
struct mana_priv *priv = dev->data->dev_private;
-
- rxq_intr_disable(priv);
+ enum mana_device_state state;
+
+ state = rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire);
+ if (state == MANA_DEV_ACTIVE ||
+ state == MANA_DEV_RESET_FAILED) {
+ rxq_intr_disable(priv);
+ DRV_LOG(DEBUG, "rxq_intr_disable called");
+ }
dev->tx_pkt_burst = mana_tx_burst_removed;
dev->rx_pkt_burst = mana_rx_burst_removed;
- /* Stop datapath on secondary processes */
- mana_mp_req_on_rxtx(dev, MANA_MP_REQ_STOP_RXTX);
+ /* Intentionally ignore errors — secondary may not be running */
+ (void)mana_mp_req_on_rxtx(dev, MANA_MP_REQ_STOP_RXTX);
rte_wmb();
ret = mana_stop_tx_queues(dev);
if (ret) {
- DRV_LOG(ERR, "failed to stop tx queues");
+ DRV_LOG(ERR, "failed to stop tx queues, ret %d", ret);
return ret;
}
ret = mana_stop_rx_queues(dev);
if (ret) {
- DRV_LOG(ERR, "failed to stop tx queues");
+ DRV_LOG(ERR, "failed to stop rx queues, ret %d", ret);
return ret;
}
@@ -275,36 +284,66 @@ mana_dev_close(struct rte_eth_dev *dev)
{
struct mana_priv *priv = dev->data->dev_private;
int ret;
+ enum mana_device_state state;
+ DRV_LOG(DEBUG, "Free MR for priv %p", priv);
mana_remove_all_mr(priv);
- ret = mana_intr_uninstall(priv);
- if (ret)
- return ret;
+ state = rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire);
+ if (state == MANA_DEV_ACTIVE ||
+ state == MANA_DEV_RESET_FAILED) {
+ ret = mana_intr_uninstall(priv);
+ if (ret)
+ return ret;
+ }
if (priv->ib_parent_pd) {
- int err = ibv_dealloc_pd(priv->ib_parent_pd);
- if (err)
- DRV_LOG(ERR, "Failed to deallocate parent PD: %d", err);
+ ret = ibv_dealloc_pd(priv->ib_parent_pd);
+ if (ret)
+ DRV_LOG(ERR,
+ "Failed to deallocate parent PD: %d", ret);
priv->ib_parent_pd = NULL;
}
if (priv->ib_pd) {
- int err = ibv_dealloc_pd(priv->ib_pd);
- if (err)
- DRV_LOG(ERR, "Failed to deallocate PD: %d", err);
+ ret = ibv_dealloc_pd(priv->ib_pd);
+ if (ret)
+ DRV_LOG(ERR, "Failed to deallocate PD: %d", ret);
priv->ib_pd = NULL;
}
- ret = ibv_close_device(priv->ib_ctx);
- if (ret) {
- ret = errno;
- return ret;
+ state = rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire);
+ if (state == MANA_DEV_ACTIVE ||
+ state == MANA_DEV_RESET_FAILED) {
+ if (priv->ib_ctx) {
+ ret = ibv_close_device(priv->ib_ctx);
+ if (ret) {
+ ret = errno;
+ return ret;
+ }
+ priv->ib_ctx = NULL;
+ }
}
return 0;
}
+/*
+ * Called from mana_pci_remove to free resources allocated
+ * during probe that are not freed by dev_close.
+ */
+static void
+mana_dev_free_resources(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+
+ pthread_mutex_destroy(&priv->reset_ops_lock);
+ pthread_mutex_destroy(&priv->reset_cond_mutex);
+ pthread_cond_destroy(&priv->reset_cond);
+}
+
static int
mana_dev_info_get(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info)
@@ -391,6 +430,39 @@ mana_dev_info_get(struct rte_eth_dev *dev,
return 0;
}
+/*
+ * Try to acquire the reset lock and verify the device is active.
+ * Returns 0 with lock held on success, or -EBUSY if the lock
+ * could not be acquired or the device is not in ACTIVE state.
+ */
+static int
+mana_reset_trylock(struct mana_priv *priv)
+{
+ if (pthread_mutex_trylock(&priv->reset_ops_lock))
+ return -EBUSY;
+
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) != MANA_DEV_ACTIVE) {
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return -EBUSY;
+ }
+ return 0;
+}
+
+static int
+mana_dev_info_get_lock(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_dev_info_get(dev, dev_info);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
static void
mana_dev_tx_queue_info(struct rte_eth_dev *dev, uint16_t queue_id,
struct rte_eth_txq_info *qinfo)
@@ -552,6 +624,22 @@ mana_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
return ret;
}
+static int
+mana_dev_tx_queue_setup_lock(struct rte_eth_dev *dev, uint16_t queue_idx,
+ uint16_t nb_desc, unsigned int socket_id,
+ const struct rte_eth_txconf *tx_conf)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_dev_tx_queue_setup(dev, queue_idx,
+ nb_desc, socket_id, tx_conf);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
static void
mana_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
{
@@ -629,6 +717,23 @@ mana_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
return ret;
}
+static int
+mana_dev_rx_queue_setup_lock(struct rte_eth_dev *dev, uint16_t queue_idx,
+ uint16_t nb_desc, unsigned int socket_id,
+ const struct rte_eth_rxconf *rx_conf,
+ struct rte_mempool *mp)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_dev_rx_queue_setup(dev, queue_idx, nb_desc,
+ socket_id, rx_conf, mp);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
static void
mana_dev_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
{
@@ -820,33 +925,253 @@ mana_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
return mana_ifreq(priv, SIOCSIFMTU, &request);
}
+static int
+mana_dev_configure_lock(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_dev_configure(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_dev_start_lock(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_dev_start(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+/*
+ * Join the reset thread if it is active. Uses CAS on
+ * reset_thread_active to ensure only one caller joins.
+ */
+static void
+mana_join_reset_thread(struct mana_priv *priv)
+{
+ bool expected = true;
+
+ if (rte_atomic_compare_exchange_strong_explicit(
+ &priv->reset_thread_active, &expected, false,
+ rte_memory_order_acq_rel,
+ rte_memory_order_acquire)) {
+ pthread_mutex_lock(&priv->reset_cond_mutex);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_ACTIVE, rte_memory_order_release);
+ pthread_cond_signal(&priv->reset_cond);
+ pthread_mutex_unlock(&priv->reset_cond_mutex);
+ rte_thread_join(priv->reset_thread, NULL);
+ }
+}
+
+/*
+ * Clear per-queue burst_state so the data path CAS can succeed again.
+ * Must be called under reset_ops_lock when transitioning back to ACTIVE
+ * after a failed or aborted reset.
+ */
+static void
+mana_clear_burst_state(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int i;
+
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ if (rxq)
+ rte_atomic_store_explicit(&rxq->burst_state, 0,
+ rte_memory_order_release);
+ if (txq)
+ rte_atomic_store_explicit(&txq->burst_state, 0,
+ rte_memory_order_release);
+ }
+}
+
+/*
+ * Custom lock wrappers for dev_stop and dev_close.
+ * These join any active reset thread and use a blocking lock (not
+ * trylock) so they wait for any in-progress reset processing to
+ * finish, rather than returning -EBUSY. When the device is not in
+ * MANA_DEV_ACTIVE state, they transition state to MANA_DEV_ACTIVE.
+ */
+static int
+mana_dev_stop_lock(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ mana_join_reset_thread(priv);
+
+ pthread_mutex_lock(&priv->reset_ops_lock);
+
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) != MANA_DEV_ACTIVE) {
+ mana_clear_burst_state(dev);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_ACTIVE, rte_memory_order_release);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return 0;
+ }
+
+ ret = mana_dev_stop(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_dev_close_lock(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ mana_join_reset_thread(priv);
+
+ pthread_mutex_lock(&priv->reset_ops_lock);
+
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) != MANA_DEV_ACTIVE) {
+ mana_clear_burst_state(dev);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_ACTIVE, rte_memory_order_release);
+ }
+
+ ret = mana_dev_close(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_rss_hash_update_lock(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_rss_hash_update(dev, rss_conf);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_rss_hash_conf_get_lock(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_rss_hash_conf_get(dev, rss_conf);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static void
+mana_dev_tx_queue_release_lock(struct rte_eth_dev *dev, uint16_t qid)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+
+ if (mana_reset_trylock(priv)) {
+ DRV_LOG(ERR, "Device reset in progress, "
+ "mana_dev_tx_queue_release not called");
+ return;
+ }
+ mana_dev_tx_queue_release(dev, qid);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+}
+
+static void
+mana_dev_rx_queue_release_lock(struct rte_eth_dev *dev, uint16_t qid)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+
+ if (mana_reset_trylock(priv)) {
+ DRV_LOG(ERR, "Device reset in progress, "
+ "mana_dev_rx_queue_release not called");
+ return;
+ }
+ mana_dev_rx_queue_release(dev, qid);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+}
+
+static int
+mana_rx_intr_enable_lock(struct rte_eth_dev *dev, uint16_t rx_queue_id)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_rx_intr_enable(dev, rx_queue_id);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_rx_intr_disable_lock(struct rte_eth_dev *dev, uint16_t rx_queue_id)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_rx_intr_disable(dev, rx_queue_id);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_mtu_set_lock(struct rte_eth_dev *dev, uint16_t mtu)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_mtu_set(dev, mtu);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
static const struct eth_dev_ops mana_dev_ops = {
- .dev_configure = mana_dev_configure,
- .dev_start = mana_dev_start,
- .dev_stop = mana_dev_stop,
- .dev_close = mana_dev_close,
- .dev_infos_get = mana_dev_info_get,
+ .dev_configure = mana_dev_configure_lock,
+ .dev_start = mana_dev_start_lock,
+ .dev_stop = mana_dev_stop_lock,
+ .dev_close = mana_dev_close_lock,
+ .dev_infos_get = mana_dev_info_get_lock,
.txq_info_get = mana_dev_tx_queue_info,
.rxq_info_get = mana_dev_rx_queue_info,
.dev_supported_ptypes_get = mana_supported_ptypes,
- .rss_hash_update = mana_rss_hash_update,
- .rss_hash_conf_get = mana_rss_hash_conf_get,
- .tx_queue_setup = mana_dev_tx_queue_setup,
- .tx_queue_release = mana_dev_tx_queue_release,
- .rx_queue_setup = mana_dev_rx_queue_setup,
- .rx_queue_release = mana_dev_rx_queue_release,
- .rx_queue_intr_enable = mana_rx_intr_enable,
- .rx_queue_intr_disable = mana_rx_intr_disable,
+ .rss_hash_update = mana_rss_hash_update_lock,
+ .rss_hash_conf_get = mana_rss_hash_conf_get_lock,
+ .tx_queue_setup = mana_dev_tx_queue_setup_lock,
+ .tx_queue_release = mana_dev_tx_queue_release_lock,
+ .rx_queue_setup = mana_dev_rx_queue_setup_lock,
+ .rx_queue_release = mana_dev_rx_queue_release_lock,
+ .rx_queue_intr_enable = mana_rx_intr_enable_lock,
+ .rx_queue_intr_disable = mana_rx_intr_disable_lock,
.link_update = mana_dev_link_update,
.stats_get = mana_dev_stats_get,
.stats_reset = mana_dev_stats_reset,
- .mtu_set = mana_mtu_set,
+ .mtu_set = mana_mtu_set_lock,
};
static const struct eth_dev_ops mana_dev_secondary_ops = {
.stats_get = mana_dev_stats_get,
.stats_reset = mana_dev_stats_reset,
- .dev_infos_get = mana_dev_info_get,
+ .dev_infos_get = mana_dev_info_get_lock,
};
uint16_t
@@ -1031,28 +1356,517 @@ mana_ibv_device_to_pci_addr(const struct ibv_device *device,
return 0;
}
+static int mana_pci_probe(struct rte_pci_driver *pci_drv,
+ struct rte_pci_device *pci_dev);
+static void mana_intr_handler(void *arg);
+static void mana_reset_exit(struct mana_priv *priv);
+
+/* Delay before initiating reset exit after reset enter completes */
+#define MANA_RESET_TIMER_US (15 * 1000000ULL) /* 15 seconds */
+
/*
- * Interrupt handler from IB layer to notify this device is being removed.
+ * Callback for PCI device removal events from EAL.
+ * If the device is in reset (RESET_EXIT state), this means the PCI
+ * device was hot-removed rather than a service reset. Wake the reset
+ * thread via condvar and notify netvsc via RTE_ETH_EVENT_INTR_RMV.
+ */
+static void
+mana_pci_remove_event_cb(const char *device_name,
+ enum rte_dev_event_type event, void *cb_arg)
+{
+ struct mana_priv *priv = cb_arg;
+ struct rte_eth_dev *dev;
+
+ if (event != RTE_DEV_EVENT_REMOVE)
+ return;
+
+ DRV_LOG(INFO, "PCI device %s removed", device_name);
+
+ /* Wake the reset thread immediately */
+ pthread_mutex_lock(&priv->reset_cond_mutex);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_RESET_FAILED, rte_memory_order_release);
+ pthread_cond_signal(&priv->reset_cond);
+ pthread_mutex_unlock(&priv->reset_cond_mutex);
+
+ pthread_mutex_lock(&priv->reset_ops_lock);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+
+ dev = &rte_eth_devices[priv->port_id];
+ DRV_LOG(INFO, "Sending RTE_ETH_EVENT_INTR_RMV for port %u",
+ priv->port_id);
+ rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_INTR_RMV, NULL);
+}
+
+/*
+ * Reset thread: performs teardown immediately, waits for the
+ * recovery timer, then re-probes and restarts the device.
+ * Runs on a control thread so it can call blocking IPC, ibv
+ * teardown, and rte_intr_callback_unregister (which all must
+ * not run on the EAL interrupt thread).
+ */
+static uint32_t
+mana_reset_thread(void *arg)
+{
+ struct mana_priv *priv = (struct mana_priv *)arg;
+ struct rte_eth_dev *dev = &rte_eth_devices[priv->port_id];
+ struct timespec ts;
+ int ret;
+ int i;
+
+ DRV_LOG(INFO, "Reset thread started");
+
+ pthread_mutex_lock(&priv->reset_ops_lock);
+
+ /* Teardown: stop data path, unmap secondary doorbells, close device,
+ * free MR caches. Must happen immediately — hardware may be gone.
+ */
+ ret = mana_dev_stop(dev);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to stop mana dev ret %d", ret);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_RESET_FAILED, rte_memory_order_release);
+ goto reset_failed;
+ }
+
+ ret = mana_mp_req_on_rxtx(dev, MANA_MP_REQ_RESET_ENTER);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to reset secondary processes ret = %d",
+ ret);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_RESET_FAILED, rte_memory_order_release);
+ goto reset_failed;
+ }
+
+ ret = mana_dev_close(dev);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to close mana dev ret %d", ret);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_RESET_FAILED, rte_memory_order_release);
+ goto reset_failed;
+ }
+
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ DRV_LOG(DEBUG, "Free MR for priv = %p, rxq %u, txq %u",
+ priv, rxq->rxq_idx, txq->txq_idx);
+ mana_mr_btree_free(&rxq->mr_btree);
+ mana_mr_btree_free(&txq->mr_btree);
+ }
+
+ DRV_LOG(DEBUG, "Teardown complete");
+
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_EXIT,
+ rte_memory_order_release);
+
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+
+ /* Wait for the recovery timer before re-probing.
+ * Can be woken early by PCI remove via condvar signal.
+ */
+ DRV_LOG(INFO, "Waiting %us for hardware recovery",
+ (unsigned int)(MANA_RESET_TIMER_US / 1000000));
+
+ clock_gettime(CLOCK_REALTIME, &ts);
+ ts.tv_sec += MANA_RESET_TIMER_US / 1000000;
+
+ pthread_mutex_lock(&priv->reset_cond_mutex);
+ pthread_cond_timedwait(&priv->reset_cond, &priv->reset_cond_mutex, &ts);
+ pthread_mutex_unlock(&priv->reset_cond_mutex);
+
+ pthread_mutex_lock(&priv->reset_ops_lock);
+
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) != MANA_DEV_RESET_EXIT) {
+ DRV_LOG(INFO, "Reset thread: dev_state=%d, skipping exit",
+ (int)rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire));
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return 0;
+ }
+
+ DRV_LOG(INFO, "Reset thread: initiating reset exit");
+ mana_reset_exit(priv);
+ /* Lock is released by mana_reset_exit_delay.
+ * reset_thread_active is cleared there before emitting
+ * the recovery event callback.
+ */
+ return 0;
+
+reset_failed:
+ mana_clear_burst_state(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+
+ /* Clear before emitting callback — if the callback calls
+ * dev_stop/dev_close, mana_join_reset_thread must be a no-op
+ * to avoid self-join deadlock on the current thread.
+ */
+ rte_atomic_store_explicit(&priv->reset_thread_active,
+ false, rte_memory_order_release);
+
+ DRV_LOG(INFO, "Sending RTE_ETH_EVENT_RECOVERY_FAILED for port %u",
+ priv->port_id);
+ rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_RECOVERY_FAILED, NULL);
+ return 0;
+}
+
+static void
+mana_reset_enter(struct mana_priv *priv)
+{
+ int ret;
+ int i;
+ struct rte_eth_dev *dev = &rte_eth_devices[priv->port_id];
+
+ /*
+ * Lock ownership: mana_intr_handler acquires reset_ops_lock,
+ * mana_reset_enter sets state/drains/spawns thread and releases it.
+ * The reset thread independently acquires/releases the lock for
+ * teardown and for the exit (re-probe) phase.
+ */
+
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_ENTER,
+ rte_memory_order_release);
+
+ DRV_LOG(DEBUG, "Entering into device reset state");
+ DRV_LOG(DEBUG, "Resetting dev = %p, priv = %p", dev, priv);
+
+ /* Set the blocked bit on each queue's burst_state so new bursts
+ * are rejected, then wait for any in-flight burst (bit 0) to finish.
+ */
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ if (rxq)
+ rte_atomic_fetch_or_explicit(&rxq->burst_state,
+ MANA_BURST_BLOCKED,
+ rte_memory_order_release);
+ if (txq)
+ rte_atomic_fetch_or_explicit(&txq->burst_state,
+ MANA_BURST_BLOCKED,
+ rte_memory_order_release);
+ }
+
+ /* Wait for all in-flight burst calls to finish (bit 0 to clear) */
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ if (rxq)
+ while (rte_atomic_load_explicit(&rxq->burst_state,
+ rte_memory_order_acquire) & 1)
+ rte_pause();
+ if (txq)
+ while (rte_atomic_load_explicit(&txq->burst_state,
+ rte_memory_order_acquire) & 1)
+ rte_pause();
+ }
+
+ DRV_LOG(DEBUG, "All data path threads drained");
+
+ /* Join previous reset thread if it completed but was not joined.
+ * Use CAS to avoid double-join if another path joined first.
+ * Don't use mana_join_reset_thread() here — we are already in
+ * RESET_ENTER state and must not change dev_state to ACTIVE.
+ */
+ {
+ bool expected = true;
+
+ if (rte_atomic_compare_exchange_strong_explicit(
+ &priv->reset_thread_active, &expected, false,
+ rte_memory_order_acq_rel,
+ rte_memory_order_acquire))
+ rte_thread_join(priv->reset_thread, NULL);
+ }
+
+ ret = rte_thread_create_internal_control(&priv->reset_thread,
+ "mana-reset",
+ mana_reset_thread, priv);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to create reset thread ret %d", ret);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+ goto reset_failed;
+ }
+ rte_atomic_store_explicit(&priv->reset_thread_active,
+ true, rte_memory_order_release);
+
+ DRV_LOG(DEBUG, "Reset thread started");
+
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return;
+
+reset_failed:
+ mana_clear_burst_state(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+}
+
+static uint32_t
+mana_reset_exit_delay(void *arg)
+{
+ struct mana_priv *priv = (struct mana_priv *)arg;
+ uint32_t ret = 0;
+ int i;
+ struct rte_eth_dev *dev;
+ struct rte_pci_device *pci_dev;
+
+ DRV_LOG(DEBUG, "Delayed mana device reset complete processing");
+
+ /* If the app called dev_stop/dev_close during the timer window,
+ * state is no longer RESET_EXIT. Nothing to do.
+ */
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) != MANA_DEV_RESET_EXIT) {
+ DRV_LOG(DEBUG, "State is not RESET_EXIT, skipping");
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+ }
+
+ dev = &rte_eth_devices[priv->port_id];
+ pci_dev = RTE_CLASS_TO_BUS_DEVICE(dev, *pci_dev);
+
+ DRV_LOG(DEBUG, "Resetting dev = %p, priv = %p", dev, priv);
+
+ ret = ibv_close_device(priv->ib_ctx);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to close ibv device %d", ret);
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+ goto out;
+ }
+ priv->ib_ctx = NULL;
+
+ ret = mana_pci_probe(NULL, pci_dev);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to probe mana pci dev ret %d", ret);
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+ goto out;
+ }
+
+ /*
+ * Init the local MR caches.
+ */
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ ret = mana_mr_btree_init(&rxq->mr_btree,
+ MANA_MR_BTREE_PER_QUEUE_N,
+ rxq->socket);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to init RXQ %d MR btree "
+ "on socket %u, ret %d", i, rxq->socket, ret);
+ goto mr_init_failed_rxq;
+ }
+
+ ret = mana_mr_btree_init(&txq->mr_btree,
+ MANA_MR_BTREE_PER_QUEUE_N,
+ txq->socket);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to init TXQ %d MR btree "
+ "on socket %u, ret %d", i, txq->socket, ret);
+ goto mr_init_failed_txq;
+ }
+ }
+ DRV_LOG(DEBUG, "priv %p, num_queues %u", priv, priv->num_queues);
+
+ /* Start secondaries */
+ ret = mana_mp_req_on_rxtx(dev, MANA_MP_REQ_RESET_EXIT);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to start secondary processes ret = %d",
+ ret);
+ goto mr_init_failed_all;
+ }
+
+ ret = mana_dev_start(dev);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to start mana dev ret %d", ret);
+ goto mr_init_failed_all;
+ }
+
+ /* Clear per-queue burst_state before marking device active so
+ * data path CAS can succeed again.
+ */
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ if (rxq)
+ rte_atomic_store_explicit(&rxq->burst_state, 0,
+ rte_memory_order_release);
+ if (txq)
+ rte_atomic_store_explicit(&txq->burst_state, 0,
+ rte_memory_order_release);
+ }
+
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_ACTIVE,
+ rte_memory_order_release);
+
+ DRV_LOG(DEBUG, "Exiting the reset complete processing");
+ goto out;
+
+mr_init_failed_all:
+ i = priv->num_queues;
+ goto mr_init_failed_rxq;
+
+mr_init_failed_txq:
+ /* RXQ btree at index i was initialized, free it */
+ mana_mr_btree_free(&((struct mana_rxq *)
+ dev->data->rx_queues[i])->mr_btree);
+
+mr_init_failed_rxq:
+ /* Free all fully initialized btrees for indices < i */
+ for (int j = 0; j < i; j++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[j];
+ struct mana_txq *txq = dev->data->tx_queues[j];
+
+ mana_mr_btree_free(&rxq->mr_btree);
+ mana_mr_btree_free(&txq->mr_btree);
+ }
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+
+out:
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+
+ /* Clear before emitting callback — if the callback calls
+ * dev_stop/dev_close, mana_join_reset_thread must be a no-op
+ * to avoid self-join deadlock on the current thread.
+ */
+ rte_atomic_store_explicit(&priv->reset_thread_active,
+ false, rte_memory_order_release);
+
+ if (!ret) {
+ DRV_LOG(INFO, "Sending RTE_ETH_EVENT_RECOVERY_SUCCESS for port %u",
+ priv->port_id);
+ rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_RECOVERY_SUCCESS, NULL);
+ } else {
+ DRV_LOG(INFO, "Sending RTE_ETH_EVENT_RECOVERY_FAILED for port %u",
+ priv->port_id);
+ rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_RECOVERY_FAILED, NULL);
+ }
+ return ret;
+}
+
+static void
+mana_reset_exit(struct mana_priv *priv)
+{
+ int ret;
+
+ if (!priv) {
+ DRV_LOG(ERR, "Private structure invalid");
+ return;
+ }
+ DRV_LOG(DEBUG, "Entering into device reset complete processing");
+
+ rxq_intr_disable(priv);
+
+ /* Unregister the interrupt handler. Since mana_reset_exit is always
+ * called from mana_reset_thread (a non-interrupt thread), the
+ * interrupt source is inactive and rte_intr_callback_unregister
+ * succeeds directly.
+ */
+ if (priv->intr_handle) {
+ ret = rte_intr_callback_unregister(priv->intr_handle,
+ mana_intr_handler, priv);
+ if (ret < 0)
+ DRV_LOG(ERR, "Failed to unregister intr callback ret %d",
+ ret);
+ else
+ DRV_LOG(DEBUG, "%d intr callback(s) removed", ret);
+
+ rte_intr_instance_free(priv->intr_handle);
+ priv->intr_handle = NULL;
+ }
+
+ /* Proceed directly to reset exit delay (re-probe and restart).
+ * No need for a separate thread - we are already on
+ * mana_reset_thread which is a non-interrupt control thread.
+ */
+ mana_reset_exit_delay(priv);
+}
+
+/*
+ * Interrupt handler from IB layer to notify this device is
+ * being removed or reset.
*/
static void
mana_intr_handler(void *arg)
{
struct mana_priv *priv = arg;
struct ibv_context *ctx = priv->ib_ctx;
- struct ibv_async_event event;
+ struct ibv_async_event event = { 0 };
+ struct rte_eth_dev *dev;
/* Read and ack all messages from IB device */
while (true) {
if (ibv_get_async_event(ctx, &event))
break;
- if (event.event_type == IBV_EVENT_DEVICE_FATAL) {
- struct rte_eth_dev *dev;
-
- dev = &rte_eth_devices[priv->port_id];
- if (dev->data->dev_conf.intr_conf.rmv)
+ switch (event.event_type) {
+ case IBV_EVENT_DEVICE_FATAL:
+ DRV_LOG(INFO, "IBV_EVENT_DEVICE_FATAL received, dev_state=%d",
+ (int)rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire));
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) == MANA_DEV_ACTIVE) {
+ /* Notify upper layers (e.g. netvsc) before
+ * acquiring the lock so they can switch data
+ * path before mana stops queues. Emitting
+ * outside the lock avoids deadlock if the
+ * callback calls dev_stop/dev_close.
+ */
+ dev = &rte_eth_devices[priv->port_id];
+ DRV_LOG(INFO,
+ "Sending RTE_ETH_EVENT_ERR_RECOVERING for port %u",
+ priv->port_id);
rte_eth_dev_callback_process(dev,
- RTE_ETH_EVENT_INTR_RMV, NULL);
+ RTE_ETH_EVENT_ERR_RECOVERING,
+ NULL);
+
+ pthread_mutex_lock(&priv->reset_ops_lock);
+
+ /* Re-check after lock to avoid racing with
+ * mana_pci_remove_event_cb which may have
+ * set RESET_FAILED while we waited.
+ */
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) !=
+ MANA_DEV_ACTIVE) {
+ pthread_mutex_unlock(
+ &priv->reset_ops_lock);
+ break;
+ }
+
+ mana_reset_enter(priv);
+
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) ==
+ MANA_DEV_RESET_FAILED) {
+ DRV_LOG(INFO,
+ "Sending RTE_ETH_EVENT_RECOVERY_FAILED for port %u",
+ priv->port_id);
+ rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_RECOVERY_FAILED,
+ NULL);
+ }
+ } else {
+ DRV_LOG(ERR, "Already in reset handling, dev_state=%d",
+ (int)rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire));
+ }
+ break;
+
+ default:
+ break;
}
ibv_ack_async_event(&event);
@@ -1063,6 +1877,23 @@ static int
mana_intr_uninstall(struct mana_priv *priv)
{
int ret;
+ struct rte_eth_dev *dev;
+
+ if (!priv->intr_handle)
+ return 0;
+
+ /* Unregister PCI device removal event callback.
+ * Do not retry on -EAGAIN to avoid deadlock: the callback
+ * may be blocked waiting for reset_ops_lock which we hold.
+ */
+ dev = &rte_eth_devices[priv->port_id];
+ if (dev->device) {
+ ret = rte_dev_event_callback_unregister(dev->device->name,
+ mana_pci_remove_event_cb, priv);
+ if (ret < 0 && ret != -ENOENT)
+ DRV_LOG(WARNING, "Failed to unregister PCI remove cb ret %d",
+ ret);
+ }
ret = rte_intr_callback_unregister(priv->intr_handle,
mana_intr_handler, priv);
@@ -1072,6 +1903,7 @@ mana_intr_uninstall(struct mana_priv *priv)
}
rte_intr_instance_free(priv->intr_handle);
+ priv->intr_handle = NULL;
return 0;
}
@@ -1127,6 +1959,16 @@ mana_intr_install(struct rte_eth_dev *eth_dev, struct mana_priv *priv)
goto free_intr;
}
+ /* Register for PCI device removal events to distinguish
+ * PCI hot-remove from service reset. This requires the
+ * application to call rte_dev_event_monitor_start() for
+ * events to be delivered (e.g. testpmd --hot-plug-handling).
+ */
+ ret = rte_dev_event_callback_register(eth_dev->device->name,
+ mana_pci_remove_event_cb, priv);
+ if (ret)
+ DRV_LOG(WARNING, "Failed to register PCI remove event callback");
+
eth_dev->intr_handle = priv->intr_handle;
return 0;
@@ -1156,7 +1998,7 @@ mana_proc_priv_init(struct rte_eth_dev *dev)
/*
* Map the doorbell page for the secondary process through IB device handle.
*/
-static int
+int
mana_map_doorbell_secondary(struct rte_eth_dev *eth_dev, int fd)
{
struct mana_process_priv *priv = eth_dev->process_private;
@@ -1294,17 +2136,29 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
char name[RTE_ETH_NAME_MAX_LEN];
int ret;
struct ibv_context *ctx = NULL;
+ bool is_reset = false;
+ pthread_mutexattr_t mattr;
+ pthread_condattr_t cattr;
rte_ether_format_addr(address, sizeof(address), addr);
- DRV_LOG(INFO, "device located port %u address %s", port, address);
- priv = rte_zmalloc_socket(NULL, sizeof(*priv), RTE_CACHE_LINE_SIZE,
- SOCKET_ID_ANY);
- if (!priv)
- return -ENOMEM;
+ DRV_LOG(DEBUG, "device located port %u address %s", port, address);
snprintf(name, sizeof(name), "%s_port%d", pci_dev->device.name, port);
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev) {
+ is_reset = true;
+ priv = eth_dev->data->dev_private;
+ DRV_LOG(DEBUG, "Device reset for eth_dev %p priv %p",
+ eth_dev, priv);
+ } else {
+ priv = rte_zmalloc_socket(NULL, sizeof(*priv), RTE_CACHE_LINE_SIZE,
+ SOCKET_ID_ANY);
+ if (!priv)
+ return -ENOMEM;
+ }
+
if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
int fd;
@@ -1317,6 +2171,7 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
eth_dev->device = &pci_dev->device;
eth_dev->dev_ops = &mana_dev_secondary_ops;
+
ret = mana_proc_priv_init(eth_dev);
if (ret)
goto failed;
@@ -1336,7 +2191,7 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
goto failed;
}
- /* fd is no not used after mapping doorbell */
+ /* fd is not used after mapping doorbell */
close(fd);
eth_dev->tx_pkt_burst = mana_tx_burst;
@@ -1355,22 +2210,6 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
goto failed;
}
- eth_dev = rte_eth_dev_allocate(name);
- if (!eth_dev) {
- ret = -ENOMEM;
- goto failed;
- }
-
- eth_dev->data->mac_addrs =
- rte_calloc("mana_mac", 1,
- sizeof(struct rte_ether_addr), 0);
- if (!eth_dev->data->mac_addrs) {
- ret = -ENOMEM;
- goto failed;
- }
-
- rte_ether_addr_copy(addr, eth_dev->data->mac_addrs);
-
priv->ib_pd = ibv_alloc_pd(ctx);
if (!priv->ib_pd) {
DRV_LOG(ERR, "ibv_alloc_pd failed port %d", port);
@@ -1390,10 +2229,6 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
}
priv->ib_ctx = ctx;
- priv->port_id = eth_dev->data->port_id;
- priv->dev_port = port;
- eth_dev->data->dev_private = priv;
- priv->dev_data = eth_dev->data;
priv->max_rx_queues = dev_attr->orig_attr.max_qp;
priv->max_tx_queues = dev_attr->orig_attr.max_qp;
@@ -1415,23 +2250,73 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
name, priv->max_rx_queues, priv->max_rx_desc,
priv->max_send_sge, priv->max_mr_size);
- rte_eth_copy_pci_info(eth_dev, pci_dev);
+ if (!is_reset) {
+ eth_dev = rte_eth_dev_allocate(name);
+ if (!eth_dev) {
+ ret = -ENOMEM;
+ goto failed;
+ }
- /* Create async interrupt handler */
- ret = mana_intr_install(eth_dev, priv);
- if (ret) {
- DRV_LOG(ERR, "Failed to install intr handler");
- goto failed;
+ eth_dev->data->mac_addrs =
+ rte_calloc("mana_mac", 1,
+ sizeof(struct rte_ether_addr), 0);
+ if (!eth_dev->data->mac_addrs) {
+ ret = -ENOMEM;
+ goto failed;
+ }
+
+ rte_ether_addr_copy(addr, eth_dev->data->mac_addrs);
+ } else {
+ /*
+ * Reset path.
+ */
+ rte_ether_format_addr(address, RTE_ETHER_ADDR_FMT_SIZE,
+ eth_dev->data->mac_addrs);
+ DRV_LOG(DEBUG, "Found existing eth_dev %p with mac addr %s",
+ eth_dev, address);
+ DRV_LOG(DEBUG, "ib_ctx = %p", priv->ib_ctx);
+ goto out;
}
- eth_dev->device = &pci_dev->device;
+ priv->port_id = eth_dev->data->port_id;
+ priv->dev_port = port;
+ eth_dev->data->dev_private = priv;
+ priv->dev_data = eth_dev->data;
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_ACTIVE,
+ rte_memory_order_release);
+
+ rte_eth_copy_pci_info(eth_dev, pci_dev);
- DRV_LOG(INFO, "device %s at port %u", name, eth_dev->data->port_id);
+ pthread_mutexattr_init(&mattr);
+ pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
+ pthread_mutex_init(&priv->reset_ops_lock, &mattr);
+ pthread_mutex_init(&priv->reset_cond_mutex, &mattr);
+ pthread_mutexattr_destroy(&mattr);
+
+ pthread_condattr_init(&cattr);
+ pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
+ pthread_cond_init(&priv->reset_cond, &cattr);
+ pthread_condattr_destroy(&cattr);
+
+ eth_dev->device = &pci_dev->device;
eth_dev->rx_pkt_burst = mana_rx_burst_removed;
eth_dev->tx_pkt_burst = mana_tx_burst_removed;
eth_dev->dev_ops = &mana_dev_ops;
+out:
+ /* Create async interrupt handler */
+ ret = mana_intr_install(eth_dev, priv);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to install intr handler, ret %d", ret);
+ goto failed;
+ } else {
+ DRV_LOG(INFO, "mana_intr_install succeeded");
+ }
+
+ DRV_LOG(INFO, "device %s priv %p dev port %d at port %u",
+ name, priv, priv->dev_port, eth_dev->data->port_id);
+
rte_eth_dev_probing_finish(eth_dev);
return 0;
@@ -1439,20 +2324,29 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
failed:
/* Free the resource for the port failed */
if (priv) {
- if (priv->ib_parent_pd)
+ if (priv->ib_parent_pd) {
ibv_dealloc_pd(priv->ib_parent_pd);
+ priv->ib_parent_pd = NULL;
+ }
- if (priv->ib_pd)
+ if (priv->ib_pd) {
ibv_dealloc_pd(priv->ib_pd);
+ priv->ib_pd = NULL;
+ }
}
- if (eth_dev)
- rte_eth_dev_release_port(eth_dev);
+ if (!is_reset) {
+ if (eth_dev)
+ rte_eth_dev_release_port(eth_dev);
- rte_free(priv);
+ rte_free(priv);
+ }
- if (ctx)
+ if (ctx) {
ibv_close_device(ctx);
+ if (is_reset && priv)
+ priv->ib_ctx = NULL;
+ }
return ret;
}
@@ -1617,7 +2511,17 @@ mana_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
static int
mana_dev_uninit(struct rte_eth_dev *dev)
{
- return mana_dev_close(dev);
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ /* Join reset thread before teardown to ensure it has exited
+ * before we destroy the condvar/mutex in free_resources.
+ */
+ mana_join_reset_thread(priv);
+
+ ret = mana_dev_close(dev);
+ mana_dev_free_resources(dev);
+ return ret;
}
/*
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 79cc47b6ab..a7b301484a 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -5,6 +5,8 @@
#ifndef __MANA_H__
#define __MANA_H__
+#include <pthread.h>
+
#define PCI_VENDOR_ID_MICROSOFT 0x1414
#define PCI_DEVICE_ID_MICROSOFT_MANA_PF 0x00b9
#define PCI_DEVICE_ID_MICROSOFT_MANA 0x00ba
@@ -337,6 +339,26 @@ struct mana_process_priv {
void *db_page;
};
+enum mana_device_state {
+ /* Normal running */
+ MANA_DEV_ACTIVE = 0,
+ /* In reset enter processing */
+ MANA_DEV_RESET_ENTER = 1,
+ /*
+ * Reset enter processing completed.
+ * Waiting for reset exit or in reset exit processing.
+ */
+ MANA_DEV_RESET_EXIT = 2,
+ /* Reset failed */
+ MANA_DEV_RESET_FAILED = 3,
+};
+
+/* burst_state bit layout:
+ * Bit 0: in-burst (set by data path CAS 0→1, cleared on exit).
+ * Bit 1: blocked (set by reset path to reject new bursts).
+ */
+#define MANA_BURST_BLOCKED 2
+
struct mana_priv {
struct rte_eth_dev_data *dev_data;
struct mana_process_priv *process_priv;
@@ -368,6 +390,15 @@ struct mana_priv {
uint64_t max_mr_size;
struct mana_mr_btree mr_btree;
rte_spinlock_t mr_btree_lock;
+ RTE_ATOMIC(enum mana_device_state) dev_state;
+ /* mutex for synchronizing mana reset and some mana_dev_ops callbacks */
+ pthread_mutex_t reset_ops_lock;
+ /* Reset thread ID, valid when reset_thread_active is true */
+ rte_thread_t reset_thread;
+ RTE_ATOMIC(bool) reset_thread_active;
+ /* Condvar to wake reset thread early on PCI remove */
+ pthread_mutex_t reset_cond_mutex;
+ pthread_cond_t reset_cond;
};
struct mana_txq_desc {
@@ -427,6 +458,14 @@ struct mana_txq {
struct mana_mr_btree mr_btree;
struct mana_stats stats;
unsigned int socket;
+ unsigned int txq_idx;
+
+ /*
+ * Bit 0: in-burst flag (set by data path, cleared on exit).
+ * Bit 1: blocked flag (set by reset path via fetch_or).
+ * Data path CAS 0→1 to enter; fails if blocked bit is set.
+ */
+ RTE_ATOMIC(uint32_t) burst_state;
};
struct mana_rxq {
@@ -462,6 +501,14 @@ struct mana_rxq {
struct mana_mr_btree mr_btree;
unsigned int socket;
+ unsigned int rxq_idx;
+
+ /*
+ * Bit 0: in-burst flag (set by data path, cleared on exit).
+ * Bit 1: blocked flag (set by reset path via fetch_or).
+ * Data path CAS 0→1 to enter; fails if blocked bit is set.
+ */
+ RTE_ATOMIC(uint32_t) burst_state;
};
extern int mana_logtype_driver;
@@ -543,6 +590,8 @@ enum mana_mp_req_type {
MANA_MP_REQ_CREATE_MR,
MANA_MP_REQ_START_RXTX,
MANA_MP_REQ_STOP_RXTX,
+ MANA_MP_REQ_RESET_ENTER,
+ MANA_MP_REQ_RESET_EXIT,
};
/* Pameters for IPC. */
@@ -563,8 +612,9 @@ void mana_mp_uninit_primary(void);
void mana_mp_uninit_secondary(void);
int mana_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
int mana_mp_req_mr_create(struct mana_priv *priv, uintptr_t addr, uint32_t len);
+int mana_map_doorbell_secondary(struct rte_eth_dev *eth_dev, int fd);
-void mana_mp_req_on_rxtx(struct rte_eth_dev *dev, enum mana_mp_req_type type);
+int mana_mp_req_on_rxtx(struct rte_eth_dev *dev, enum mana_mp_req_type type);
void *mana_alloc_verbs_buf(size_t size, void *data);
void mana_free_verbs_buf(void *ptr, void *data __rte_unused);
diff --git a/drivers/net/mana/mp.c b/drivers/net/mana/mp.c
index 72417fc0c7..1161ebd71c 100644
--- a/drivers/net/mana/mp.c
+++ b/drivers/net/mana/mp.c
@@ -2,10 +2,13 @@
* Copyright 2022 Microsoft Corporation
*/
+#include <sys/mman.h>
#include <rte_malloc.h>
#include <ethdev_driver.h>
#include <rte_log.h>
+#include <rte_eal_paging.h>
#include <stdlib.h>
+#include <unistd.h>
#include <infiniband/verbs.h>
@@ -119,6 +122,23 @@ mana_mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
return ret;
}
+static int
+mana_mp_reset_enter(struct rte_eth_dev *dev)
+{
+ struct mana_process_priv *proc_priv = dev->process_private;
+
+ void *addr = proc_priv->db_page;
+
+ /* Reset the db_page to NULL */
+ proc_priv->db_page = NULL;
+
+ if (addr)
+ (void)munmap(addr, rte_mem_page_size());
+
+ DRV_LOG(DEBUG, "Secondary doorbell pages unmapped");
+ return 0;
+}
+
static int
mana_mp_secondary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
{
@@ -171,6 +191,49 @@ mana_mp_secondary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
ret = rte_mp_reply(&mp_res, peer);
break;
+ case MANA_MP_REQ_RESET_ENTER:
+ DRV_LOG(INFO, "Port %u reset enter", dev->data->port_id);
+ res->result = mana_mp_reset_enter(dev);
+
+ ret = rte_mp_reply(&mp_res, peer);
+ break;
+
+ case MANA_MP_REQ_RESET_EXIT:
+ DRV_LOG(INFO, "Port %u reset exit", dev->data->port_id);
+ {
+ struct mana_process_priv *proc_priv =
+ dev->process_private;
+
+ if (proc_priv->db_page != NULL) {
+ DRV_LOG(DEBUG,
+ "Secondary doorbell already "
+ "mapped to %p",
+ proc_priv->db_page);
+ res->result = 0;
+ } else if (mp_msg->num_fds < 1) {
+ DRV_LOG(ERR,
+ "No FD in RESET_EXIT message");
+ res->result = -EINVAL;
+ } else {
+ int fd = mp_msg->fds[0];
+
+ ret = mana_map_doorbell_secondary(dev,
+ fd);
+ if (ret) {
+ DRV_LOG(ERR,
+ "Failed to map secondary "
+ "doorbell, fd %d ret %d",
+ fd, ret);
+ res->result = -ENODEV;
+ } else {
+ res->result = 0;
+ }
+ close(fd);
+ }
+ }
+ ret = rte_mp_reply(&mp_res, peer);
+ break;
+
default:
DRV_LOG(ERR, "Port %u unknown secondary MP type %u",
param->port_id, param->type);
@@ -254,7 +317,7 @@ mana_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev)
}
ret = mp_res->fds[0];
- DRV_LOG(ERR, "port %u command FD from primary is %d",
+ DRV_LOG(DEBUG, "port %u command FD from primary is %d",
dev->data->port_id, ret);
exit:
free(mp_rep.msgs);
@@ -298,27 +361,36 @@ mana_mp_req_mr_create(struct mana_priv *priv, uintptr_t addr, uint32_t len)
return ret;
}
-void
+int
mana_mp_req_on_rxtx(struct rte_eth_dev *dev, enum mana_mp_req_type type)
{
struct rte_mp_msg mp_req = { 0 };
struct rte_mp_msg *mp_res;
- struct rte_mp_reply mp_rep;
+ struct rte_mp_reply mp_rep = { 0 };
struct mana_mp_param *res;
struct timespec ts = {.tv_sec = MANA_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
- int i, ret;
+ int i, ret = 0;
- if (type != MANA_MP_REQ_START_RXTX && type != MANA_MP_REQ_STOP_RXTX) {
+ if (type != MANA_MP_REQ_START_RXTX && type != MANA_MP_REQ_STOP_RXTX &&
+ type != MANA_MP_REQ_RESET_ENTER && type != MANA_MP_REQ_RESET_EXIT) {
DRV_LOG(ERR, "port %u unknown request (req_type %d)",
dev->data->port_id, type);
- return;
+ return -EINVAL;
}
if (rte_atomic_load_explicit(&mana_shared_data->secondary_cnt, rte_memory_order_relaxed) == 0)
- return;
+ return 0;
mp_init_msg(&mp_req, type, dev->data->port_id);
+ /* Include IB cmd FD for secondary doorbell remap */
+ if (type == MANA_MP_REQ_RESET_EXIT) {
+ struct mana_priv *priv = dev->data->dev_private;
+
+ mp_req.num_fds = 1;
+ mp_req.fds[0] = priv->ib_ctx->cmd_fd;
+ }
+
ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
if (ret) {
if (rte_errno != ENOTSUP)
@@ -329,6 +401,7 @@ mana_mp_req_on_rxtx(struct rte_eth_dev *dev, enum mana_mp_req_type type)
if (mp_rep.nb_sent != mp_rep.nb_received) {
DRV_LOG(ERR, "port %u not all secondaries responded (%d)",
dev->data->port_id, type);
+ ret = -ETIMEDOUT;
goto exit;
}
for (i = 0; i < mp_rep.nb_received; i++) {
@@ -337,9 +410,11 @@ mana_mp_req_on_rxtx(struct rte_eth_dev *dev, enum mana_mp_req_type type)
if (res->result) {
DRV_LOG(ERR, "port %u request failed on secondary %d",
dev->data->port_id, i);
+ ret = res->result;
goto exit;
}
}
exit:
free(mp_rep.msgs);
+ return ret;
}
diff --git a/drivers/net/mana/mr.c b/drivers/net/mana/mr.c
index c4045141bc..8914f4cf04 100644
--- a/drivers/net/mana/mr.c
+++ b/drivers/net/mana/mr.c
@@ -314,8 +314,10 @@ mana_mr_btree_init(struct mana_mr_btree *bt, int n, int socket)
void
mana_mr_btree_free(struct mana_mr_btree *bt)
{
- rte_free(bt->table);
- memset(bt, 0, sizeof(*bt));
+ if (bt && bt->table) {
+ rte_free(bt->table);
+ memset(bt, 0, sizeof(*bt));
+ }
}
int
diff --git a/drivers/net/mana/rx.c b/drivers/net/mana/rx.c
index 1b8ba1f3a9..aedb05d46f 100644
--- a/drivers/net/mana/rx.c
+++ b/drivers/net/mana/rx.c
@@ -36,6 +36,11 @@ mana_rq_ring_doorbell(struct mana_rxq *rxq)
db_page = process_priv->db_page;
}
+ if (!db_page) {
+ DP_LOG(ERR, "db_page is NULL, cannot ring RX doorbell");
+ return -EINVAL;
+ }
+
/* Hardware Spec specifies that software client should set 0 for
* wqe_cnt for Receive Queues.
*/
@@ -172,7 +177,7 @@ mana_stop_rx_queues(struct rte_eth_dev *dev)
for (i = 0; i < priv->num_queues; i++)
if (dev->data->rx_queue_state[i] == RTE_ETH_QUEUE_STATE_STOPPED)
- return -EINVAL;
+ return 0;
if (priv->rwq_qp) {
ret = ibv_destroy_qp(priv->rwq_qp);
@@ -256,6 +261,9 @@ mana_start_rx_queues(struct rte_eth_dev *dev)
struct mana_rxq *rxq = dev->data->rx_queues[i];
struct ibv_wq_init_attr wq_attr = {};
+ rxq->rxq_idx = i;
+ DRV_LOG(DEBUG, "assigning rxq_idx to %d", i);
+
manadv_set_context_attr(priv->ib_ctx,
MANADV_CTX_ATTR_BUF_ALLOCATORS,
(void *)((uintptr_t)&(struct manadv_ctx_allocators){
@@ -451,6 +459,16 @@ mana_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
uint32_t pkt_len;
uint32_t i;
int polled = 0;
+ uint32_t expected = 0;
+
+ /* Single atomic CAS: enter burst only if device is active (0→1).
+ * Fails immediately if reset path has set the blocked bit.
+ */
+ if (unlikely(!rte_atomic_compare_exchange_strong_explicit(
+ &rxq->burst_state, &expected, 1,
+ rte_memory_order_acquire,
+ rte_memory_order_relaxed)))
+ return 0;
repoll:
/* Polling on new completions if we have no backlog */
@@ -592,6 +610,9 @@ mana_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
wqe_consumed, ret);
}
+ rte_atomic_fetch_and_explicit(&rxq->burst_state, ~(uint32_t)1,
+ rte_memory_order_release);
+
return pkt_received;
}
diff --git a/drivers/net/mana/tx.c b/drivers/net/mana/tx.c
index 57dbbc3651..10f2212b5d 100644
--- a/drivers/net/mana/tx.c
+++ b/drivers/net/mana/tx.c
@@ -17,7 +17,7 @@ mana_stop_tx_queues(struct rte_eth_dev *dev)
for (i = 0; i < priv->num_queues; i++)
if (dev->data->tx_queue_state[i] == RTE_ETH_QUEUE_STATE_STOPPED)
- return -EINVAL;
+ return 0;
for (i = 0; i < priv->num_queues; i++) {
struct mana_txq *txq = dev->data->tx_queues[i];
@@ -83,6 +83,9 @@ mana_start_tx_queues(struct rte_eth_dev *dev)
txq = dev->data->tx_queues[i];
+ txq->txq_idx = i;
+ DRV_LOG(DEBUG, "assigning txq_idx to %u", txq->txq_idx);
+
manadv_set_context_attr(priv->ib_ctx,
MANADV_CTX_ATTR_BUF_ALLOCATORS,
(void *)((uintptr_t)&(struct manadv_ctx_allocators){
@@ -190,10 +193,34 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
void *db_page;
uint16_t pkt_sent = 0;
uint32_t num_comp, i;
+ uint32_t expected = 0;
#ifdef RTE_ARCH_32
uint32_t wqe_count = 0;
#endif
+ db_page = priv->db_page;
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ struct rte_eth_dev *dev =
+ &rte_eth_devices[priv->dev_data->port_id];
+ struct mana_process_priv *process_priv = dev->process_private;
+
+ db_page = process_priv->db_page;
+ }
+
+ /* Single atomic CAS: enter burst only if device is active (0→1).
+ * Fails immediately if reset path has set the blocked bit.
+ */
+ if (unlikely(!rte_atomic_compare_exchange_strong_explicit(
+ &txq->burst_state, &expected, 1,
+ rte_memory_order_acquire,
+ rte_memory_order_relaxed) || !db_page)) {
+ if (!expected) /* CAS succeeded but db_page NULL — undo */
+ rte_atomic_fetch_and_explicit(&txq->burst_state,
+ ~(uint32_t)1,
+ rte_memory_order_release);
+ return 0;
+ }
+
/* Process send completions from GDMA */
num_comp = gdma_poll_completion_queue(&txq->gdma_cq,
txq->gdma_comp_buf, txq->num_desc);
@@ -216,7 +243,8 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
}
if (!desc->pkt) {
- DP_LOG(ERR, "mana_txq_desc has a NULL pkt");
+ DP_LOG(ERR, "mana_txq_desc has a NULL pkt, priv %p, "
+ "txq %u", priv, txq->txq_idx);
} else {
txq->stats.bytes += desc->pkt->pkt_len;
rte_pktmbuf_free(desc->pkt);
@@ -474,15 +502,6 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
}
/* Ring hardware door bell */
- db_page = priv->db_page;
- if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
- struct rte_eth_dev *dev =
- &rte_eth_devices[priv->dev_data->port_id];
- struct mana_process_priv *process_priv = dev->process_private;
-
- db_page = process_priv->db_page;
- }
-
if (pkt_sent) {
#ifdef RTE_ARCH_32
ret = mana_ring_short_doorbell(db_page, GDMA_QUEUE_SEND,
@@ -501,5 +520,8 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
DP_LOG(ERR, "mana_ring_doorbell failed ret %d", ret);
}
+ rte_atomic_fetch_and_explicit(&txq->burst_state, ~(uint32_t)1,
+ rte_memory_order_release);
+
return pkt_sent;
}
--
2.34.1
* [PATCH v8 0/1] net/mana: add device reset support
From: Wei Hu @ 2026-06-10 7:21 UTC (permalink / raw)
To: dev, stephen; +Cc: longli, weh
From: Wei Hu <weh@microsoft.com>
Add support for handling hardware service reset events in the
MANA driver. When the MANA kernel driver receives a hardware
service event, it initiates a device reset and notifies userspace
via IBV_EVENT_DEVICE_FATAL. The MANA PMD handles this by
performing an automatic teardown and recovery sequence.
The driver uses ethdev recovery events (ERR_RECOVERING,
RECOVERY_SUCCESS, RECOVERY_FAILED) to notify upper layers of
the reset lifecycle, and a PCI device removal event callback
to distinguish hot-remove from service reset.
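The reset lifecycle described above can be modeled as a small state machine. The sketch below is a hypothetical, simplified model, not driver code. The state names mirror enum mana_device_state from the patch. The input and event enums (reset_input, app_event) and reset_step() are made up for illustration, and locking, threads and IPC are left out.

```c
#include <assert.h>

/* Mirrors enum mana_device_state in mana.h. */
enum dev_state {
	DEV_ACTIVE,
	DEV_RESET_ENTER,
	DEV_RESET_EXIT,
	DEV_RESET_FAILED,
};

/* Hypothetical model inputs; these are not driver identifiers. */
enum reset_input {
	IN_DEVICE_FATAL,	/* IBV_EVENT_DEVICE_FATAL from the IB layer */
	IN_ENTER_DONE,		/* teardown finished, waiting to re-probe */
	IN_ENTER_FAILED,	/* e.g. the reset thread could not be created */
	IN_EXIT_DONE,		/* re-probe and restart succeeded */
	IN_EXIT_FAILED,		/* re-probe or restart failed */
};

/* Ethdev event the application would see, if any. */
enum app_event {
	EV_NONE,
	EV_ERR_RECOVERING,
	EV_RECOVERY_SUCCESS,
	EV_RECOVERY_FAILED,
};

/* Apply one input to the state; return the event reported to the app. */
static enum app_event
reset_step(enum dev_state *st, enum reset_input in)
{
	switch (in) {
	case IN_DEVICE_FATAL:
		if (*st != DEV_ACTIVE)
			return EV_NONE;		/* already in reset handling */
		*st = DEV_RESET_ENTER;
		return EV_ERR_RECOVERING;
	case IN_ENTER_DONE:
		if (*st != DEV_RESET_ENTER)
			return EV_NONE;
		*st = DEV_RESET_EXIT;
		return EV_NONE;
	case IN_ENTER_FAILED:
		if (*st != DEV_RESET_ENTER)
			return EV_NONE;
		*st = DEV_RESET_FAILED;
		return EV_RECOVERY_FAILED;
	case IN_EXIT_DONE:
		if (*st != DEV_RESET_EXIT)
			return EV_NONE;		/* state changed meanwhile, e.g. dev_stop */
		*st = DEV_ACTIVE;
		return EV_RECOVERY_SUCCESS;
	case IN_EXIT_FAILED:
		if (*st != DEV_RESET_EXIT)
			return EV_NONE;
		*st = DEV_RESET_FAILED;
		return EV_RECOVERY_FAILED;
	}
	return EV_NONE;
}
```

In this model every reset that starts with ERR_RECOVERING ends in exactly one terminal event, either RECOVERY_SUCCESS or RECOVERY_FAILED. A second fatal event during a reset is ignored.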
Changes since v7:
- Moved heavy teardown (dev_stop, IPC to secondaries, dev_close,
MR btree free) from mana_reset_enter (EAL interrupt thread)
to mana_reset_thread (control thread). The interrupt handler
now only sets state, drains in-flight bursts, and spawns the
thread. Teardown runs immediately in the control thread before
the recovery timer wait, avoiding blocking the interrupt thread
on multi-second IPC timeouts and ibverbs calls. Each function
now owns its own lock scope with no lock hand-off between
threads.
- Fixed self-join deadlock: clear reset_thread_active before
emitting RECOVERY_SUCCESS/FAILED callbacks from the reset
thread. Without this, if the callback calls dev_stop/dev_close,
mana_join_reset_thread attempts to join the current thread.
- Simplified burst_state from encoding device state in bits 1+
to a single blocked flag (bit 1). Only one value was ever
stored, so the multi-state encoding was misleading. Added
MANA_BURST_BLOCKED constant.
- Updated mana.rst to reflect that teardown runs on the control
thread, not the interrupt handler.
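The self-join fix can be sketched in plain C11 atomics and POSIX threads instead of the rte_* wrappers. join_reset_thread() uses a CAS on the active flag to elect a single joiner. The reset thread clears that flag itself before running the application callback, so a dev_stop/dev_close issued from the callback finds nothing to join. All names are hypothetical.

One assumption goes beyond the changelog. When the thread wins the CAS, the sketch has it detach itself so it is still reclaimed even though nobody joins it. How the driver itself reclaims the thread is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reset fields of struct mana_priv. */
struct reset_ctx {
	pthread_t thread;
	atomic_bool thread_active;	/* true while a join is still owed */
	atomic_bool thread_done;	/* demo only: set as the thread finishes */
	int join_in_callback;		/* demo only: what the callback observed */
};

/*
 * Join the reset thread at most once. The CAS true -> false elects a
 * single joiner; every other caller, including code running on the
 * reset thread itself, sees false and returns without joining.
 * Returns 1 if this call joined the thread, 0 if it was a no-op.
 */
static int
join_reset_thread(struct reset_ctx *ctx)
{
	bool expected = true;

	if (!atomic_compare_exchange_strong(&ctx->thread_active,
					    &expected, false))
		return 0;
	pthread_join(ctx->thread, NULL);
	return 1;
}

/* Hypothetical application callback that tears the port down. */
static void
app_recovery_callback(struct reset_ctx *ctx)
{
	ctx->join_in_callback = join_reset_thread(ctx);
}

static void *
reset_thread_fn(void *arg)
{
	struct reset_ctx *ctx = arg;
	bool expected = true;

	/* ... teardown, recovery wait and re-probe would run here ... */

	/*
	 * Clear the flag before emitting the event. If this CAS wins, no
	 * one will ever join this thread, so detach it to reclaim its
	 * resources; if it loses, a joiner has already committed.
	 */
	if (atomic_compare_exchange_strong(&ctx->thread_active,
					   &expected, false))
		pthread_detach(pthread_self());

	app_recovery_callback(ctx);	/* join is a no-op from here on */
	atomic_store(&ctx->thread_done, true);
	return NULL;
}

/* Start the reset thread, wait for it, return what the callback saw. */
static int
run_self_join_demo(void)
{
	static struct reset_ctx ctx;	/* static: outlives the detached thread */

	ctx.join_in_callback = -1;
	/* Mark active before creation so the thread can never miss it. */
	atomic_store(&ctx.thread_active, true);
	atomic_store(&ctx.thread_done, false);
	if (pthread_create(&ctx.thread, NULL, reset_thread_fn, &ctx) != 0)
		return -1;
	while (!atomic_load(&ctx.thread_done))
		sched_yield();
	return ctx.join_in_callback;
}
```

run_self_join_demo() returns 0. That is the value the callback got from join_reset_thread(), meaning the join was skipped instead of deadlocking on the current thread.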
Changes since v6:
- Rebased onto latest upstream for-main
- Replaced removed RTE_ETH_DEV_TO_PCI macro with
RTE_CLASS_TO_BUS_DEVICE (upstream commit 4757b8df04
removed the old bus-specific ethdev convenience macros)
Changes since v5:
- Replaced RCU QSBR with per-queue atomic burst_state using a
single-variable CAS design: bit 0 is the in-burst flag, bit 1
is the blocked flag. The data path uses CAS(0→1) to enter
burst and fetch_and(~1) to exit. The reset path uses fetch_or
to set the blocked bit and polls bit 0 to drain in-flight
bursts. This eliminates the two-variable Dekker pattern and the
need for sequential consistency (seq_cst) ordering.
- Removed librte_rcu dependency
- Removed __rte_no_thread_safety_analysis annotations (no longer
needed after mutex conversion)
- Moved ERR_RECOVERING event emission before acquiring
reset_ops_lock and before mana_reset_enter, so upper layers
(e.g. netvsc) can switch data path before mana stops queues.
Emitting outside the lock avoids deadlock if the callback
calls dev_stop or dev_close.
- Replaced MANA_OPS_*_LOCK macros with mana_reset_trylock()
helper function and explicit per-operation wrappers
- Removed unused rte_alarm.h and rte_lock_annotations.h includes
- Added RECOVERY_FAILED event when mana_reset_enter fails
internally, so the application always receives a terminal event
- Added mana_clear_burst_state() helper to clear per-queue
burst_state on failure paths (reset_failed, dev_stop_lock,
dev_close_lock) preventing permanent silent packet drop after
a failed reset
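The single-word burst gate can be sketched with C11 atomics. The sketch assumes the bit layout stated above: bit 0 is the in-burst flag and bit 1 is the blocked flag, which the patch calls MANA_BURST_BLOCKED. The helper names are hypothetical, sched_yield() stands in for rte_pause(), and plain pthreads stand in for lcores.

The CAS, fetch_and and fetch_or all modify the same object, so they are totally ordered. A burst either entered before the blocked bit was set, in which case the drain loop waits for it to exit, or it sees the bit and backs off. That is why no seq_cst ordering is needed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define BURST_IN_PROGRESS 1u	/* bit 0: owned by the data path */
#define BURST_BLOCKED     2u	/* bit 1: owned by the reset path */

/* Data path: enter only if the word is exactly 0 (idle, not blocked). */
static bool
burst_enter(_Atomic uint32_t *state)
{
	uint32_t expected = 0;

	return atomic_compare_exchange_strong_explicit(state, &expected,
			BURST_IN_PROGRESS, memory_order_acquire,
			memory_order_relaxed);
}

/* Data path: leave; release publishes the burst's work to the drainer. */
static void
burst_exit(_Atomic uint32_t *state)
{
	atomic_fetch_and_explicit(state, ~BURST_IN_PROGRESS,
				  memory_order_release);
}

/* Reset path: refuse new bursts, then wait for the one in flight. */
static void
reset_block_and_drain(_Atomic uint32_t *state)
{
	atomic_fetch_or_explicit(state, BURST_BLOCKED, memory_order_release);
	while (atomic_load_explicit(state, memory_order_acquire) &
	       BURST_IN_PROGRESS)
		sched_yield();	/* the driver spins with rte_pause() */
}

/* Reset path: reopen the queue once the device is usable again. */
static void
reset_unblock(_Atomic uint32_t *state)
{
	atomic_store_explicit(state, 0, memory_order_release);
}

static _Atomic uint32_t demo_state;
static atomic_bool demo_stop;

/* Stand-in for an lcore calling rx/tx burst in a loop. */
static void *
data_path_loop(void *arg)
{
	(void)arg;
	while (!atomic_load(&demo_stop)) {
		if (burst_enter(&demo_state))
			burst_exit(&demo_state);	/* ring work would go here */
	}
	return NULL;
}

/* Block and drain while a data path thread runs; true on success. */
static bool
run_drain_demo(void)
{
	pthread_t tid;
	bool ok;

	atomic_store(&demo_state, 0);
	atomic_store(&demo_stop, false);
	if (pthread_create(&tid, NULL, data_path_loop, NULL) != 0)
		return false;
	reset_block_and_drain(&demo_state);
	/* No burst is in flight now, and none can start. */
	ok = atomic_load(&demo_state) == BURST_BLOCKED;
	atomic_store(&demo_stop, true);
	pthread_join(tid, NULL);
	reset_unblock(&demo_state);
	return ok;
}
```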
Changes since v4:
- Fixed stale rte_spinlock_unlock call in mana_intr_handler that
was missed during the spinlock-to-mutex conversion, causing a
-Wincompatible-pointer-types warning
Changes since v3:
- Converted reset_ops_lock from rte_spinlock_t to pthread_mutex_t
with PTHREAD_PROCESS_SHARED, since the lock is held across
blocking IB verbs calls and IPC with 5s timeout
- Removed rte_dev_event_callback_unregister retry loop to avoid
deadlock: the callback itself blocks on reset_ops_lock, so
retrying on -EAGAIN while holding the lock is a deadlock
- Introduced mana_join_reset_thread() helper using CAS on
reset_thread_active to prevent double-join undefined behavior
- Added reset thread join in mana_dev_uninit to prevent thread
leak on device removal
- Fixed ibv handle leak: priv->ib_ctx is now only set to NULL
after ibv_close_device succeeds
- Fixed misleading "All secondary threads are quiescent" log in
mana_mp_reset_enter — changed to "Secondary doorbell pages
unmapped" since actual quiescence is enforced by the primary's
per-queue atomic flag check before IPC is sent
- Changed event list in mana.rst to RST definition list style
- Squashed documentation into the feature patch per convention
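A join-at-most-once guard in the spirit of mana_join_reset_thread() can be sketched with a CAS on an atomic flag: only the caller that flips the flag from true to false calls pthread_join(), so a second join of the same thread (undefined behavior) cannot happen. All names are hypothetical, and the sketch assumes the caller serializes thread start with join (for example under a lock).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reset-thread bookkeeping. */
struct reset_ctx {
	pthread_t tid;
	atomic_bool thread_active; /* set after create, cleared only by the joiner */
};

static void reset_ctx_init(struct reset_ctx *c)
{
	atomic_init(&c->thread_active, false);
}

static void *reset_worker(void *arg)
{
	(void)arg; /* the reset work would run here */
	return NULL;
}

/* Assumes the caller serializes start and join (e.g. under a lock). */
static int reset_thread_start(struct reset_ctx *c)
{
	int ret = pthread_create(&c->tid, NULL, reset_worker, c);

	if (ret == 0)
		atomic_store(&c->thread_active, true);
	return ret;
}

/* Join the thread at most once; returns true if this call did the join. */
static bool reset_thread_join_once(struct reset_ctx *c)
{
	bool expected = true;

	if (!atomic_compare_exchange_strong(&c->thread_active, &expected, false))
		return false; /* never started, or already joined elsewhere */
	pthread_join(c->tid, NULL);
	return true;
}
```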
Changes since v2:
- Fixed dev_state_qsv memory leak on device removal
- Fixed reset thread TCB/stack leak: reset_thread_active is now
only cleared by the joiner, not the thread itself
- Fixed second reset crash: removed reset thread join logic from
mana_dev_close (inner function) to avoid corrupting dev_state
when called from mana_reset_enter
- Made reset_thread_active RTE_ATOMIC(bool) with explicit ordering
- Added retry loop for rte_dev_event_callback_unregister on -EAGAIN
- Initialized condvar/mutex with PTHREAD_PROCESS_SHARED since priv
is in hugepage shared memory
- Added re-check of dev_state after lock acquisition in
mana_intr_handler to prevent racing with pci_remove_event_cb
- Replaced (void *)0 with NULL in mp.c
- Added lock ownership comment block at mana_reset_enter
- Documented rte_dev_event_monitor_start() requirement
- Added mana.rst documentation and release note
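A mutex and condition variable that live in memory shared between primary and secondary processes need the process-shared attribute at init time. A minimal POSIX sketch; struct shared_priv and shared_priv_locks_init() are hypothetical names, and in a real driver the struct would be placed in shared (e.g. hugepage) memory rather than ordinary memory.

```c
#include <pthread.h>

/* Hypothetical private data that would live in memory shared by processes. */
struct shared_priv {
	pthread_mutex_t reset_ops_lock;
	pthread_cond_t reset_cond;
};

/* Initialize a process-shared mutex and condvar; returns 0 or an errno value. */
static int shared_priv_locks_init(struct shared_priv *p)
{
	pthread_mutexattr_t ma;
	pthread_condattr_t ca;
	int ret;

	ret = pthread_mutexattr_init(&ma);
	if (ret != 0)
		return ret;
	ret = pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
	if (ret == 0)
		ret = pthread_mutex_init(&p->reset_ops_lock, &ma);
	pthread_mutexattr_destroy(&ma);
	if (ret != 0)
		return ret;

	ret = pthread_condattr_init(&ca);
	if (ret != 0)
		goto err_mutex;
	ret = pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
	if (ret == 0)
		ret = pthread_cond_init(&p->reset_cond, &ca);
	pthread_condattr_destroy(&ca);
	if (ret != 0)
		goto err_mutex;
	return 0;

err_mutex:
	pthread_mutex_destroy(&p->reset_ops_lock);
	return ret;
}
```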
Changes since v1:
- Removed net/netvsc patch from this series
- Simplified reset exit: mana_reset_exit calls
mana_reset_exit_delay directly instead of spawning a thread
- Added __rte_no_thread_safety_analysis annotations for clang
- Switched to rte_thread_create_internal_control
- Fixed declaration-after-statement style issues
- Removed unnecessary blank lines and stale comments
Wei Hu (1):
net/mana: add device reset support
doc/guides/nics/mana.rst | 40 +
doc/guides/rel_notes/release_26_07.rst | 8 +
drivers/net/mana/mana.c | 1076 ++++++++++++++++++++++--
drivers/net/mana/mana.h | 52 +-
drivers/net/mana/mp.c | 89 +-
drivers/net/mana/mr.c | 6 +-
drivers/net/mana/rx.c | 23 +-
drivers/net/mana/tx.c | 44 +-
8 files changed, 1230 insertions(+), 108 deletions(-)
--
2.34.1
* [PATCH v5] ethdev: support inline calculating masked item value
From: Bing Zhao @ 2026-06-10 5:27 UTC
To: viacheslavo, dev, rasland, stephen
Cc: orika, dsosnowski, suanmingm, matan, thomas
In-Reply-To: <20260603092805.9837-1-bingz@nvidia.com>
In the asynchronous flow API and in some drivers, the rte_flow_item
spec value may not be masked by the driver, in order to keep the rule
insertion rate as high as possible, and the input parameters are
sometimes copied and modified internally.
After copying, spec and last are const-qualified and cannot be changed
in place. The driver also needs extra memory for the calculation and
extra conditions to determine the length of each item spec, which is
not efficient.
To solve this, and to support the follow-up fix, a new operation is
introduced to calculate the spec and last values with the mask applied
inline.
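The masking step itself is a byte-wise AND of each copied spec or last buffer with the item mask, bounded by the shorter of the two sizes, as in the loop added to rte_flow_conv_pattern(). A minimal standalone sketch; apply_item_mask() is a hypothetical helper, not a DPDK API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: AND a copied spec or last buffer with the item mask,
 * over the shorter of the two lengths. */
static void apply_item_mask(uint8_t *copy, size_t copy_len,
			    const uint8_t *mask, size_t mask_len)
{
	size_t n = copy_len < mask_len ? copy_len : mask_len;

	for (size_t j = 0; j < n; j++)
		copy[j] &= mask[j];
}
```

With the destination MAC from the unit test below, 01:02:03:04:05:06 masked by ff:ff:00:00:ff:ff becomes 01:02:00:00:05:06.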
Signed-off-by: Bing Zhao <bingz@nvidia.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
v3:
- add test code
- fix the issue found by AI
v4: rebase on top of main
v5: handle some items separately and add test for them
---
app/test/test_ethdev_api.c | 76 ++++++++++++++++++++++++++
doc/guides/rel_notes/release_26_07.rst | 6 ++
lib/ethdev/rte_flow.c | 46 ++++++++++++++--
lib/ethdev/rte_flow.h | 13 +++++
4 files changed, 135 insertions(+), 6 deletions(-)
diff --git a/app/test/test_ethdev_api.c b/app/test/test_ethdev_api.c
index 76afd0345c..5cae1cdc1d 100644
--- a/app/test/test_ethdev_api.c
+++ b/app/test/test_ethdev_api.c
@@ -4,6 +4,7 @@
#include <rte_log.h>
#include <rte_ethdev.h>
+#include <rte_flow.h>
#include <rte_test.h>
#include "test.h"
@@ -15,6 +16,80 @@
#define NUM_MBUF 1024
#define MBUF_CACHE_SIZE 256
+static int32_t
+ethdev_api_flow_conv_pattern_masked(void)
+{
+ const struct rte_flow_item_eth spec = {
+ .hdr.dst_addr.addr_bytes = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06 },
+ .hdr.src_addr.addr_bytes = { 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f },
+ .hdr.ether_type = RTE_BE16(0x1234),
+ };
+ const struct rte_flow_item_eth last = {
+ .hdr.dst_addr.addr_bytes = { 0x11, 0x12, 0x13, 0x14, 0x15, 0x16 },
+ .hdr.src_addr.addr_bytes = { 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f },
+ .hdr.ether_type = RTE_BE16(0x5678),
+ };
+ const struct rte_flow_item_eth mask = {
+ .hdr.dst_addr.addr_bytes = { 0xff, 0xff, 0x00, 0x00, 0xff, 0xff },
+ .hdr.src_addr.addr_bytes = { 0xff, 0x00, 0xff, 0x00, 0xff, 0x00 },
+ .hdr.ether_type = RTE_BE16(0xffff),
+ };
+ const struct rte_flow_item pattern[] = {
+ {
+ .type = RTE_FLOW_ITEM_TYPE_ETH,
+ .spec = &spec,
+ .last = &last,
+ .mask = &mask,
+ },
+ { .type = RTE_FLOW_ITEM_TYPE_END },
+ };
+ union {
+ struct rte_flow_item item;
+ struct rte_flow_item_eth eth;
+ double align;
+ uint8_t raw[256];
+ } dst;
+ const struct rte_flow_item *item;
+ const struct rte_flow_item_eth *conv_spec;
+ const struct rte_flow_item_eth *conv_last;
+ int ret;
+
+ ret = rte_flow_conv(RTE_FLOW_CONV_OP_PATTERN_MASKED, NULL, 0, pattern, NULL);
+ TEST_ASSERT(ret > 0, "Masked pattern conversion size query failed");
+ TEST_ASSERT((size_t)ret <= sizeof(dst.raw),
+ "Masked pattern conversion needs too much storage");
+
+ memset(&dst, 0, sizeof(dst));
+ ret = rte_flow_conv(RTE_FLOW_CONV_OP_PATTERN_MASKED, dst.raw,
+ sizeof(dst.raw), pattern, NULL);
+ TEST_ASSERT(ret > 0, "Masked pattern conversion failed");
+
+ item = (const struct rte_flow_item *)dst.raw;
+ conv_spec = item[0].spec;
+ conv_last = item[0].last;
+ TEST_ASSERT_NOT_NULL(conv_spec, "Converted spec must be set");
+ TEST_ASSERT_NOT_NULL(conv_last, "Converted last must be set");
+
+ TEST_ASSERT_EQUAL(conv_spec->hdr.dst_addr.addr_bytes[0], 0x01,
+ "Masked spec dst byte 0 mismatch");
+ TEST_ASSERT_EQUAL(conv_spec->hdr.dst_addr.addr_bytes[2], 0x00,
+ "Masked spec dst byte 2 mismatch");
+ TEST_ASSERT_EQUAL(conv_spec->hdr.src_addr.addr_bytes[1], 0x00,
+ "Masked spec src byte 1 mismatch");
+ TEST_ASSERT_EQUAL(conv_spec->hdr.ether_type, RTE_BE16(0x1234),
+ "Masked spec ether type mismatch");
+ TEST_ASSERT_EQUAL(conv_last->hdr.dst_addr.addr_bytes[0], 0x11,
+ "Masked last dst byte 0 mismatch");
+ TEST_ASSERT_EQUAL(conv_last->hdr.dst_addr.addr_bytes[2], 0x00,
+ "Masked last dst byte 2 mismatch");
+ TEST_ASSERT_EQUAL(conv_last->hdr.src_addr.addr_bytes[1], 0x00,
+ "Masked last src byte 1 mismatch");
+ TEST_ASSERT_EQUAL(conv_last->hdr.ether_type, RTE_BE16(0x5678),
+ "Masked last ether type mismatch");
+
+ return TEST_SUCCESS;
+}
+
static int32_t
ethdev_api_queue_status(void)
{
@@ -167,6 +242,7 @@ static struct unit_test_suite ethdev_api_testsuite = {
.setup = NULL,
.teardown = NULL,
.unit_test_cases = {
+ TEST_CASE(ethdev_api_flow_conv_pattern_masked),
TEST_CASE(ethdev_api_queue_status),
/* TODO: Add deferred_start queue status test */
TEST_CASES_END() /**< NULL terminate unit test array */
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index b5285af5fe..4f5d21d576 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -190,6 +190,12 @@ API Changes
- ``rte_pmd_mlx5_enable_steering``
- ``rte_pmd_mlx5_disable_steering``
+* ethdev: Added masked pattern conversion.
+
+ Added ``RTE_FLOW_CONV_OP_PATTERN_MASKED`` to ``rte_flow_conv()``
+ to copy an entire pattern while applying each item's mask to its
+ ``spec`` and ``last`` fields.
+
ABI Changes
-----------
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index ec0fe08355..c7a94a1194 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -178,6 +178,14 @@ static const struct rte_flow_desc_data rte_flow_desc_item[] = {
MK_FLOW_ITEM(COMPARE, sizeof(struct rte_flow_item_compare)),
};
+static inline size_t
+rte_flow_conv_item_mask_size(const struct rte_flow_item *item)
+{
+ if ((int)item->type >= 0)
+ return rte_flow_desc_item[item->type].size;
+ return sizeof(void *);
+}
+
/** Generate flow_action[] entry. */
#define MK_FLOW_ACTION(t, s) \
[RTE_FLOW_ACTION_TYPE_ ## t] = { \
@@ -835,6 +843,8 @@ rte_flow_conv_action_conf(void *buf, const size_t size,
* RTE_FLOW_ITEM_TYPE_END is encountered.
* @param[out] error
* Perform verbose error reporting if not NULL.
+ * @param[in] with_mask
+ * If true, apply each @p src item's mask to its copied spec and last.
*
* @return
* A positive value representing the number of bytes needed to store
@@ -847,12 +857,13 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
const size_t size,
const struct rte_flow_item *src,
unsigned int num,
+ bool with_mask,
struct rte_flow_error *error)
{
uintptr_t data = (uintptr_t)dst;
size_t off;
size_t ret;
- unsigned int i;
+ unsigned int i, j;
for (i = 0, off = 0; !num || i != num; ++i, ++src, ++dst) {
/**
@@ -876,15 +887,27 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
src -= num;
dst -= num;
do {
+ uint8_t *c_spec = NULL, *c_last = NULL;
+ const uint8_t *mask = src->mask;
+ size_t item_mask_size = mask ? rte_flow_conv_item_mask_size(src) : 0;
+
if (src->spec) {
off = RTE_ALIGN_CEIL(off, sizeof(double));
ret = rte_flow_conv_item_spec
((void *)(data + off),
size > off ? size - off : 0, src,
RTE_FLOW_CONV_ITEM_SPEC);
- if (size && size >= off + ret)
+ if (size && size >= off + ret) {
dst->spec = (void *)(data + off);
+ c_spec = (uint8_t *)(data + off);
+ }
off += ret;
+ if (with_mask && c_spec && mask) {
+ size_t mask_size = RTE_MIN(ret, item_mask_size);
+
+ for (j = 0; j < mask_size; j++)
+ c_spec[j] &= mask[j];
+ }
}
if (src->last) {
@@ -893,9 +916,17 @@ rte_flow_conv_pattern(struct rte_flow_item *dst,
((void *)(data + off),
size > off ? size - off : 0, src,
RTE_FLOW_CONV_ITEM_LAST);
- if (size && size >= off + ret)
+ if (size && size >= off + ret) {
dst->last = (void *)(data + off);
+ c_last = (uint8_t *)(data + off);
+ }
off += ret;
+ if (with_mask && c_last && mask) {
+ size_t mask_size = RTE_MIN(ret, item_mask_size);
+
+ for (j = 0; j < mask_size; j++)
+ c_last[j] &= mask[j];
+ }
}
if (src->mask) {
off = RTE_ALIGN_CEIL(off, sizeof(double));
@@ -1042,7 +1073,7 @@ rte_flow_conv_rule(struct rte_flow_conv_rule *dst,
off = RTE_ALIGN_CEIL(off, sizeof(double));
ret = rte_flow_conv_pattern((void *)((uintptr_t)dst + off),
size > off ? size - off : 0,
- src->pattern_ro, 0, error);
+ src->pattern_ro, 0, false, error);
if (ret < 0)
return ret;
if (size && size >= off + (size_t)ret)
@@ -1143,7 +1174,7 @@ rte_flow_conv(enum rte_flow_conv_op op,
ret = sizeof(*attr);
break;
case RTE_FLOW_CONV_OP_ITEM:
- ret = rte_flow_conv_pattern(dst, size, src, 1, error);
+ ret = rte_flow_conv_pattern(dst, size, src, 1, false, error);
break;
case RTE_FLOW_CONV_OP_ITEM_MASK:
item = src;
@@ -1158,7 +1189,7 @@ rte_flow_conv(enum rte_flow_conv_op op,
ret = rte_flow_conv_actions(dst, size, src, 1, error);
break;
case RTE_FLOW_CONV_OP_PATTERN:
- ret = rte_flow_conv_pattern(dst, size, src, 0, error);
+ ret = rte_flow_conv_pattern(dst, size, src, 0, false, error);
break;
case RTE_FLOW_CONV_OP_ACTIONS:
ret = rte_flow_conv_actions(dst, size, src, 0, error);
@@ -1178,6 +1209,9 @@ rte_flow_conv(enum rte_flow_conv_op op,
case RTE_FLOW_CONV_OP_ACTION_NAME_PTR:
ret = rte_flow_conv_name(1, 1, dst, size, src, error);
break;
+ case RTE_FLOW_CONV_OP_PATTERN_MASKED:
+ ret = rte_flow_conv_pattern(dst, size, src, 0, true, error);
+ break;
default:
ret = rte_flow_error_set
(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index b495409406..959a2f903b 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -4556,6 +4556,19 @@ enum rte_flow_conv_op {
* @code const char ** @endcode
*/
RTE_FLOW_CONV_OP_ACTION_NAME_PTR,
+
+ /**
+ * Convert an entire pattern.
+ *
+ * Duplicates all pattern items at once, applying @p mask to @p spec
+ * and @p last.
+ *
+ * - @p src type:
+ * @code const struct rte_flow_item * @endcode
+ * - @p dst type:
+ * @code struct rte_flow_item * @endcode
+ */
+ RTE_FLOW_CONV_OP_PATTERN_MASKED,
};
/**
--
2.34.1
* [PATCH v1 19/20] drivers: add testpmd commands for private features
From: liujie5 @ 2026-06-10 1:39 UTC
To: stephen; +Cc: dev, Jie Liu
In-Reply-To: <20260610013936.3634968-1-liujie5@linkdatatechnology.com>
From: Jie Liu <liujie5@linkdatatechnology.com>
Introduce private testpmd commands and implementation files to enable
debugging and testing of sxe2-specific hardware features (such as
packet scheduling reset, UDP tunnel configuration, and IPsec ingress/
egress offloads) directly within the testpmd application.
New device arguments (devargs) are added as well; they are parsed with
the standard rte_kvargs library during the PCI/vdev probing phase.
Documentation for these parameters is also updated.
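A minimal sketch of parsing a bounded integer devarg value such as fnav-stat-type. parse_u8_range() is a hypothetical helper, not part of the driver or of rte_kvargs; it checks the range on the full strtoul() result before narrowing to uint8_t, so an input such as "257" is rejected instead of wrapping to 1.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal value in [min, max] into *out.
 * Returns 0, -EINVAL for malformed input, or -ERANGE if out of bounds. */
static int parse_u8_range(const char *value, uint8_t min, uint8_t max,
			  uint8_t *out)
{
	char *end = NULL;
	unsigned long v;

	if (value == NULL || out == NULL)
		return -EINVAL;
	errno = 0;
	v = strtoul(value, &end, 10);
	if (errno != 0 || end == value || *end != '\0')
		return -EINVAL;
	/* Range check before narrowing; "-1" becomes ULONG_MAX and fails here. */
	if (v < min || v > max)
		return -ERANGE;
	*out = (uint8_t)v;
	return 0;
}
```

A kvargs handler could call it with the bounds each devarg needs, for example 1..3 for the fnav statistics type or 0..1 for boolean switches.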
During memory hotplug events, the SXE2 driver needs to track memory
segment layout changes to maintain internal DMA mappings. However,
existing memseg walk functions (rte_memseg_walk) acquire memory locks
and cannot be called from within memory event callbacks, leading to
potential deadlocks.
This commit introduces sxe2_memseg_walk_cb(), a callback following the
standard rte_memseg_walk_t prototype that processes each memseg to
update driver-specific data structures. It is passed to the
thread-unsafe variant rte_memseg_walk_thread_unsafe(), which does not
take the memory lock itself and is therefore safe to call from
memory-related callbacks; at device setup the driver holds
rte_mcfg_mem_read_lock() around the walk.
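The range walk done in sxe2_dma_mem_map() (page-alignment check, then stepping through contiguous segments and skipping those with an invalid IOVA) can be sketched with simplified stand-in types. struct seg, BAD_IOVA and count_mappable() are hypothetical; a real driver would use struct rte_memseg, RTE_BAD_IOVA and its own DMA map call.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BAD_IOVA UINT64_MAX /* stand-in for RTE_BAD_IOVA */

/* Simplified, hypothetical stand-in for struct rte_memseg. */
struct seg {
	uint64_t iova;
	size_t len; /* assumed non-zero */
};

/* True if v is a multiple of align; align must be a power of two. */
static int is_aligned(uintptr_t v, size_t align)
{
	return (v & (align - 1)) == 0;
}

/*
 * Walk the contiguous segments backing [addr, addr + len) and count those
 * that would be DMA-mapped; segments with an invalid IOVA are skipped.
 */
static int count_mappable(const struct seg *segs, uintptr_t addr, size_t len,
			  size_t page_sz, size_t *mapped)
{
	size_t cur = 0;

	*mapped = 0;
	if (!is_aligned(addr, page_sz) || !is_aligned(len, page_sz))
		return -EINVAL;
	for (const struct seg *ms = segs; cur < len; ms++) {
		if (ms->iova != BAD_IOVA)
			(*mapped)++; /* a driver would issue its DMA map call here */
		cur += ms->len;
	}
	return 0;
}
```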
Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
---
drivers/common/sxe2/sxe2_common.c | 110 +++
drivers/common/sxe2/sxe2_common.h | 2 +
drivers/common/sxe2/sxe2_ioctl_chnl.c | 2 +-
drivers/net/sxe2/meson.build | 5 +-
drivers/net/sxe2/sxe2_cmd_chnl.c | 21 +
drivers/net/sxe2/sxe2_cmd_chnl.h | 3 +
drivers/net/sxe2/sxe2_drv_cmd.h | 17 +
drivers/net/sxe2/sxe2_dump.c | 15 +
drivers/net/sxe2/sxe2_ethdev.c | 287 +++++++-
drivers/net/sxe2/sxe2_ethdev.h | 8 +
drivers/net/sxe2/sxe2_irq.c | 34 +-
drivers/net/sxe2/sxe2_rx.c | 12 +
drivers/net/sxe2/sxe2_testpmd.c | 733 +++++++++++++++++++
drivers/net/sxe2/sxe2_testpmd_lib.c | 969 ++++++++++++++++++++++++++
drivers/net/sxe2/sxe2_testpmd_lib.h | 142 ++++
drivers/net/sxe2/sxe2_tm.c | 18 +
drivers/net/sxe2/sxe2_tm.h | 2 +
17 files changed, 2374 insertions(+), 6 deletions(-)
create mode 100644 drivers/net/sxe2/sxe2_testpmd.c
create mode 100644 drivers/net/sxe2/sxe2_testpmd_lib.c
create mode 100644 drivers/net/sxe2/sxe2_testpmd_lib.h
diff --git a/drivers/common/sxe2/sxe2_common.c b/drivers/common/sxe2/sxe2_common.c
index c000a55cd0..5c5db85f29 100644
--- a/drivers/common/sxe2/sxe2_common.c
+++ b/drivers/common/sxe2/sxe2_common.c
@@ -196,6 +196,102 @@ static int32_t sxe2_parse_representor(const char *key, const char *value, void *
PMD_LOG_INFO(COM, "representor arg %s: \"%s\".", key, value);
+l_end:
+ return ret;
+}
+static int32_t sxe2_dma_mem_map(struct sxe2_common_device *cdev,
+ const void *addr, size_t len, bool do_map)
+{
+ struct rte_memseg_list *msl;
+ struct rte_memseg *ms;
+ size_t cur_len = 0;
+ int32_t ret = 0;
+
+ msl = rte_mem_virt2memseg_list(addr);
+ if (msl == NULL) {
+ ret = -EINVAL;
+ PMD_LOG_ERR(COM, "Invalid virt addr=%p.", addr);
+ goto l_end;
+ }
+
+ if ((uintptr_t)addr != RTE_ALIGN((uintptr_t)addr, msl->page_sz) ||
+ (len != RTE_ALIGN(len, msl->page_sz))) {
+ ret = -EINVAL;
+ PMD_LOG_ERR(COM, "Addr=%p and len=%zu not align page size=%" PRIu64 ".",
+ addr, len, msl->page_sz);
+ goto l_end;
+ }
+
+ /* memsegs are contiguous in memory */
+ ms = rte_mem_virt2memseg(addr, msl);
+ while (cur_len < len) {
+ /* some memory segments may have invalid IOVA */
+ if (ms->iova == RTE_BAD_IOVA) {
+ PMD_LOG_WARN(COM, "Memory segment at %p has bad IOVA, skipping.",
+ ms->addr);
+ goto next;
+ }
+ if (do_map)
+ sxe2_drv_dev_dma_map(cdev, ms->addr_64,
+ ms->iova, ms->len);
+ else
+ sxe2_drv_dev_dma_unmap(cdev, ms->iova);
+
+next:
+ cur_len += ms->len;
+ ++ms;
+ }
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_INTERNAL_SYMBOL(sxe2_common_mem_event_cb)
+void
+sxe2_common_mem_event_cb(enum rte_mem_event type,
+ const void *addr, size_t size, void *arg __rte_unused)
+{
+ struct sxe2_common_device *cdev = NULL;
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+ goto l_end;
+
+ pthread_mutex_lock(&sxe2_common_devices_list_lock);
+ switch (type) {
+ case RTE_MEM_EVENT_FREE:
+ TAILQ_FOREACH(cdev, &sxe2_common_devices_list, next)
+ (void)sxe2_dma_mem_map(cdev, addr, size, 0);
+ break;
+ case RTE_MEM_EVENT_ALLOC:
+ TAILQ_FOREACH(cdev, &sxe2_common_devices_list, next)
+ (void)sxe2_dma_mem_map(cdev, addr, size, 1);
+ break;
+ default:
+ break;
+ }
+ pthread_mutex_unlock(&sxe2_common_devices_list_lock);
+l_end:
+ return;
+}
+
+static int32_t sxe2_memseg_walk_cb(const struct rte_memseg_list *msl,
+ const struct rte_memseg *ms, void *arg)
+{
+ struct sxe2_common_device *cdev = arg;
+ int32_t ret = 0;
+
+ if (msl->external && !msl->heap)
+ goto l_end;
+
+ if (ms->iova == RTE_BAD_IOVA)
+ goto l_end;
+
+ ret = sxe2_drv_dev_dma_map(cdev, ms->addr_64, ms->iova, ms->len);
+ if (ret != 0) {
+ PMD_LOG_ERR(COM, "Failed to DMA map memseg.");
+ goto l_end;
+ }
+
l_end:
return ret;
}
@@ -220,6 +316,18 @@ static int32_t sxe2_common_device_setup(struct sxe2_common_device *cdev)
goto l_close_dev;
}
+ rte_mcfg_mem_read_lock();
+ ret = rte_memseg_walk_thread_unsafe(sxe2_memseg_walk_cb, cdev);
+ if (ret) {
+ PMD_LOG_ERR(COM, "Failed to walk memsegs, ret=%d", ret);
+ rte_mcfg_mem_read_unlock();
+ goto l_close_dev;
+ }
+ rte_mcfg_mem_read_unlock();
+
+ (void)rte_mem_event_callback_register("SXE2_MEM_EVENT_CB",
+ sxe2_common_mem_event_cb, NULL);
+
goto l_end;
l_close_dev:
@@ -251,6 +359,7 @@ static struct sxe2_common_device *sxe2_common_device_alloc(
}
cdev->dev = rte_dev;
cdev->class_type = class_type;
+ cdev->config.cmd_fd = SXE2_CMD_FD_INVALID;
cdev->config.kernel_reset = false;
pthread_mutex_init(&cdev->config.lock, NULL);
@@ -631,6 +740,7 @@ static int32_t sxe2_common_pci_id_table_update(const struct rte_pci_id *id_table
updated_table = calloc(num_ids, sizeof(*updated_table));
if (!updated_table) {
+ ret = -ENOMEM;
PMD_LOG_ERR(COM, "Failed to allocate memory for PCI ID table");
goto l_end;
}
diff --git a/drivers/common/sxe2/sxe2_common.h b/drivers/common/sxe2/sxe2_common.h
index b02b6317da..efc8d3585a 100644
--- a/drivers/common/sxe2/sxe2_common.h
+++ b/drivers/common/sxe2/sxe2_common.h
@@ -14,6 +14,8 @@
#define SXE2_COMMON_PCI_DRIVER_NAME "sxe2_pci"
+#define SXE2_CMD_FD_INVALID (-1)
+
#define SXE2_CDEV_TO_CMD_FD(cdev) \
((cdev)->config.cmd_fd)
diff --git a/drivers/common/sxe2/sxe2_ioctl_chnl.c b/drivers/common/sxe2/sxe2_ioctl_chnl.c
index 173d8d57ae..a233a78136 100644
--- a/drivers/common/sxe2/sxe2_ioctl_chnl.c
+++ b/drivers/common/sxe2/sxe2_ioctl_chnl.c
@@ -110,7 +110,7 @@ sxe2_drv_dev_close(struct sxe2_common_device *cdev)
if (fd >= 0)
close(fd);
PMD_LOG_INFO(COM, "closed device fd=%d", fd);
- SXE2_CDEV_TO_CMD_FD(cdev) = -1;
+ SXE2_CDEV_TO_CMD_FD(cdev) = SXE2_CMD_FD_INVALID;
}
RTE_EXPORT_INTERNAL_SYMBOL(sxe2_drv_dev_handshake)
diff --git a/drivers/net/sxe2/meson.build b/drivers/net/sxe2/meson.build
index 4fb2333926..04369402b7 100644
--- a/drivers/net/sxe2/meson.build
+++ b/drivers/net/sxe2/meson.build
@@ -9,9 +9,10 @@ endif
cflags += ['-g']
-deps += ['common_sxe2', 'hash','cryptodev','security']
+deps += ['common_sxe2', 'hash', 'cryptodev', 'security', 'cmdline']
includes += include_directories('../../common/sxe2')
+testpmd_sources = files('sxe2_testpmd.c')
if arch_subdir == 'x86'
sources += files('sxe2_txrx_vec_sse.c')
@@ -79,5 +80,5 @@ sources += files(
'sxe2_flow_parse_engine.c',
'sxe2_dump.c',
'sxe2_txrx_check_mbuf.c',
-
+ 'sxe2_testpmd_lib.c',
)
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.c b/drivers/net/sxe2/sxe2_cmd_chnl.c
index 43e8c59487..b09989fe50 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.c
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.c
@@ -99,6 +99,27 @@ int32_t sxe2_drv_dev_info_get(struct sxe2_adapter *adapter,
return ret;
}
+int32_t sxe2_drv_fc_state_get(struct sxe2_adapter *adapter,
+ struct sxe2_drv_vsi_fc_get_resp *dev_fc_state_resp)
+{
+ int32_t ret = 0;
+ struct sxe2_common_device *cdev = adapter->cdev;
+ struct sxe2_drv_cmd_params param = {0};
+ struct sxe2_drv_vsi_fc_get_req req = {0};
+
+ req.vsi_id = adapter->vsi_ctxt.main_vsi->vsi_id;
+ sxe2_drv_cmd_params_fill(adapter, &param, SXE2_DRV_CMD_VSI_FC_GET,
+ &req, sizeof(req),
+ dev_fc_state_resp,
+ sizeof(*dev_fc_state_resp));
+ ret = sxe2_drv_cmd_exec(cdev, &param);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "get fc state failed, ret=%d", ret);
+ ret = -EIO;
+ }
+ return ret;
+}
+
int32_t sxe2_drv_dev_fw_info_get(struct sxe2_adapter *adapter,
struct sxe2_drv_dev_fw_info_resp *dev_fw_info_resp)
{
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.h b/drivers/net/sxe2/sxe2_cmd_chnl.h
index 988d4b458b..d63caad526 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.h
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.h
@@ -99,6 +99,9 @@ int32_t sxe2_drv_vsi_stats_reset(struct sxe2_adapter *adapter);
int32_t sxe2_drv_queue_info_get_update(struct sxe2_adapter *adapter,
struct eth_queue_stats *qstats);
+int32_t sxe2_drv_fc_state_get(struct sxe2_adapter *adapter,
+ struct sxe2_drv_vsi_fc_get_resp *dev_fc_state_resp);
+
int32_t sxe2_drv_rxq_mapping_set(struct rte_eth_dev *eth_dev, uint16_t queue_id, uint8_t pool_idx);
int32_t sxe2_drv_txq_mapping_set(struct rte_eth_dev *eth_dev, uint16_t queue_id, uint8_t pool_idx);
diff --git a/drivers/net/sxe2/sxe2_drv_cmd.h b/drivers/net/sxe2/sxe2_drv_cmd.h
index 09b2f7d125..59a8aa6f13 100644
--- a/drivers/net/sxe2/sxe2_drv_cmd.h
+++ b/drivers/net/sxe2/sxe2_drv_cmd.h
@@ -651,6 +651,23 @@ struct __rte_aligned(4) __rte_packed_begin sxe2_drv_sfp_resp {
uint8_t data[];
} __rte_packed_end;
+enum sxe2_fc_type {
+ SXE2_FC_T_DIS = 0,
+ SXE2_FC_T_LFC,
+ SXE2_FC_T_PFC,
+ SXE2_FC_T_UNKNOWN = 255,
+};
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_vsi_fc_get_req {
+ uint16_t vsi_id;
+ uint8_t rsv[2];
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_vsi_fc_get_resp {
+ uint8_t fc_enable;
+ uint8_t rsv[3];
+} __rte_packed_end;
+
enum sxe2_drv_cmd_module {
SXE2_DRV_CMD_MODULE_HANDSHAKE = 0,
SXE2_DRV_CMD_MODULE_DEV = 1,
diff --git a/drivers/net/sxe2/sxe2_dump.c b/drivers/net/sxe2/sxe2_dump.c
index 1753eccf99..fd0a99d6fd 100644
--- a/drivers/net/sxe2/sxe2_dump.c
+++ b/drivers/net/sxe2/sxe2_dump.c
@@ -188,6 +188,20 @@ static void sxe2_dump_filter_info(FILE *file, struct rte_eth_dev *dev)
return;
}
+static void sxe2_dump_fc_state(FILE *file, struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+
+ if (!(adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_FC_STATE))
+ goto l_end;
+
+ fprintf(file, " -- fc state:\n"
+ "\t -- curr_state: %u\n",
+ adapter->fc_state_ctx.curr_state);
+l_end:
+ return;
+}
+
static const char *sxe2_vsi_id_str(uint16_t vsi_id, char *buf, size_t len)
{
if (vsi_id == SXE2_INVALID_VSI_ID)
@@ -274,6 +288,7 @@ int32_t sxe2_eth_dev_priv_dump(struct rte_eth_dev *dev, FILE *file)
sxe2_dump_dev_args_info(str, dev);
sxe2_dump_filter_info(str, dev);
sxe2_dump_switchdev_info(str, dev);
+ sxe2_dump_fc_state(str, dev);
(void)fflush(str);
diff --git a/drivers/net/sxe2/sxe2_ethdev.c b/drivers/net/sxe2/sxe2_ethdev.c
index 3c86424c15..b768a780b4 100644
--- a/drivers/net/sxe2/sxe2_ethdev.c
+++ b/drivers/net/sxe2/sxe2_ethdev.c
@@ -68,7 +68,14 @@ static const struct rte_pci_id pci_id_sxe2_tbl[] = {
{ RTE_PCI_DEVICE(SXE2_PCI_VENDOR_ID_206F, SXE2_PCI_DEVICE_ID_VF_1)},
{ .vendor_id = 0, },
};
-
+#define SXE2_TXSCH_NODE_ADJ_LVL_MAX 3
+#define SXE2_DEVARG_FLOW_DULP_PATTERN_MODE "flow-duplicate-pattern"
+#define SXE2_DEVARG_FUNC_FLOW_DIRCT "function-flow-direct"
+#define SXE2_DEVARG_FNAV_STAT_TYPE "fnav-stat-type"
+#define SXE2_DEVARG_SW_STATS "drv-sw-stats"
+#define SXE2_DEVARG_HIGH_PERFORMANCE_MODE "high-performance-mode"
+#define SXE2_DEVARG_SCHED_LAYER_MODE "sched-layer-mode"
+#define SXE2_DEVARG_RX_LOW_LATENCY "rx-low-latency"
static struct sxe2_pci_map_addr_info sxe2_net_map_addr_info_pf[SXE2_PCI_MAP_RES_MAX_COUNT] = {
[SXE2_PCI_MAP_RES_INVALID] = {.addr_base = 0,
.bar_idx = 0,
@@ -980,6 +987,149 @@ static inline void sxe2_init_ptype_tbl(struct rte_eth_dev *dev)
sxe2_init_ptype_list(ptype);
}
+static int32_t sxe2_parse_fnav_stat_type(const char *key, const char *value, void *args)
+{
+ int32_t ret = -EINVAL;
+ uint8_t *num = (uint8_t *)args;
+ uint8_t fnav_stat_type = 0;
+ char *endptr = NULL;
+
+ if (value == NULL || args == NULL) {
+ ret = 0;
+ goto l_end;
+ }
+ errno = 0;
+ fnav_stat_type = (uint8_t)strtoul(value, &endptr, 10);
+ if (errno != 0 || *endptr != '\0') {
+ PMD_LOG_WARN(INIT, "%s: \"%s\" is not a valid int value.",
+ key, value);
+ goto l_end;
+ }
+ if (fnav_stat_type > SXE2_FNAV_STAT_ENA_ALL ||
+ fnav_stat_type == SXE2_FNAV_STAT_ENA_NONE) {
+ PMD_LOG_ERR(INIT, "%s: \"%s\" out of range [1-3].",
+ key, value);
+ goto l_end;
+ }
+ *num = fnav_stat_type;
+ ret = 0;
+l_end:
+ return ret;
+}
+static int32_t sxe2_parse_sched_layer_mode(const char *key, const char *value, void *args)
+{
+ int32_t ret = -EINVAL;
+ uint8_t *num = (uint8_t *)args;
+ uint8_t sched_layer_mode;
+ char *endptr = NULL;
+
+ if (value == NULL || args == NULL) {
+ ret = 0;
+ goto l_end;
+ }
+ errno = 0;
+ sched_layer_mode = (uint8_t)strtoul(value, &endptr, 10);
+ if (errno != 0 || *endptr != '\0') {
+ PMD_LOG_WARN(INIT, "%s: \"%s\" is not a valid int value.",
+ key, value);
+ goto l_end;
+ }
+ if (sched_layer_mode > SXE2_TXSCH_NODE_ADJ_LVL_MAX) {
+ PMD_LOG_ERR(INIT, "%s: \"%s\" > 3.",
+ key, value);
+ goto l_end;
+ }
+ *num = sched_layer_mode;
+ ret = 0;
+l_end:
+ return ret;
+}
+static int32_t sxe2_parse_high_performance_mode(const char *key, const char *value, void *args)
+{
+ int32_t ret = -EINVAL;
+ uint8_t *num = (uint8_t *)args;
+ uint8_t high_performance_mode;
+ char *endptr = NULL;
+
+ if (value == NULL || args == NULL) {
+ ret = 0;
+ goto l_end;
+ }
+ errno = 0;
+ high_performance_mode = (uint8_t)strtoul(value, &endptr, 10);
+ if (errno != 0 || *endptr != '\0') {
+ PMD_LOG_WARN(INIT, "%s: \"%s\" is not a valid int value.",
+ key, value);
+ goto l_end;
+ }
+ if (high_performance_mode != 1) {
+ PMD_LOG_ERR(INIT, "%s: \"%s\" != 1.",
+ key, value);
+ goto l_end;
+ }
+ *num = high_performance_mode;
+ ret = 0;
+l_end:
+ return ret;
+}
+static int32_t sxe2_parse_u8(const char *key, const char *value, void *args)
+{
+ uint8_t *num = (uint8_t *)args;
+ char *end;
+ unsigned long val;
+ int32_t ret = -EINVAL;
+
+ if (value == NULL || args == NULL) {
+ ret = 0;
+ goto l_end;
+ }
+ errno = 0;
+ val = strtoul(value, &end, 10);
+ if (errno != 0 || end == value || *end != '\0') {
+ PMD_LOG_ERR(INIT, "Invalid 8-bit integer value for key %s: %s", key, value);
+ return -EINVAL;
+ }
+
+ if (val > UINT8_MAX) {
+ PMD_LOG_ERR(INIT, "%s: \"%s\" out of range [0-255].",
+ key, value);
+ return -ERANGE;
+ }
+
+ *num = val;
+ ret = 0;
+l_end:
+ return ret;
+}
+static int32_t sxe2_parse_bool(const char *key, const char *value, void *args)
+{
+ int32_t ret = -EINVAL;
+ uint8_t *num = (uint8_t *)args;
+ uint8_t bool_val = 0;
+ char *endptr = NULL;
+
+ if (value == NULL || args == NULL) {
+ ret = 0;
+ goto l_end;
+ }
+ errno = 0;
+ bool_val = (uint8_t)strtoul(value, &endptr, 10);
+ if (errno != 0 || *endptr != '\0') {
+ PMD_LOG_WARN(INIT, "%s: \"%s\" is not a valid int value.",
+ key, value);
+ goto l_end;
+ }
+ if (bool_val != 0 && bool_val != 1) {
+ PMD_LOG_ERR(INIT, "%s: \"%s\" out of range [0|1].",
+ key, value);
+ goto l_end;
+ }
+ *num = bool_val;
+ ret = 0;
+l_end:
+ return ret;
+}
+
struct sxe2_pci_map_bar_info *sxe2_dev_get_bar_info(struct sxe2_adapter *adapter,
enum sxe2_pci_map_resource res_type)
{
@@ -1047,6 +1197,69 @@ void *sxe2_pci_map_addr_get(struct sxe2_adapter *adapter,
return addr;
}
+static int32_t sxe2_args_parse(struct rte_eth_dev *dev, struct sxe2_dev_kvargs_info *kvargs)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ int32_t ret = 0;
+ PMD_INIT_FUNC_TRACE();
+
+ if (kvargs == NULL)
+ goto l_end;
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_FNAV_STAT_TYPE,
+ &sxe2_parse_fnav_stat_type,
+ &adapter->devargs.fnav_stat_type);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse fnav stat type, ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_SW_STATS,
+ &sxe2_parse_bool,
+ &adapter->devargs.sw_stats_en);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse sw stats enable, ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_HIGH_PERFORMANCE_MODE,
+ &sxe2_parse_high_performance_mode,
+ &adapter->devargs.high_performance_mode);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse high performance, ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_SCHED_LAYER_MODE,
+ &sxe2_parse_sched_layer_mode,
+ &adapter->devargs.sched_layer_mode);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse sched layer mode, ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_FLOW_DULP_PATTERN_MODE,
+ &sxe2_parse_u8,
+ &adapter->devargs.flow_dup_pattern_mode);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse switch duplicate flow pattern mode, "
+ "ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_FUNC_FLOW_DIRCT,
+ &sxe2_parse_bool,
+ &adapter->devargs.func_flow_direct_en);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse function flow rule enable, "
+ "ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_RX_LOW_LATENCY,
+ &sxe2_parse_bool,
+ &adapter->devargs.rx_low_latency);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse rx low latency, ret:%d", ret);
+ goto l_end;
+ }
+l_end:
+ return ret;
+}
+
static int32_t sxe2_eth_init(struct rte_eth_dev *dev)
{
int32_t ret = 0;
@@ -1599,6 +1812,37 @@ void sxe2_dev_pci_map_uinit(struct rte_eth_dev *dev)
adapter->dev_info.dev_data = NULL;
}
+static int32_t sxe2_fc_state_init(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter =
+ SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_drv_vsi_fc_get_resp fc_resp = {0};
+ int32_t ret;
+
+ if (!(adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_FC_STATE)) {
+ adapter->fc_state_ctx.cfg_state = 0;
+ adapter->fc_state_ctx.curr_state = 0;
+ ret = 0;
+ goto l_end;
+ }
+ ret = sxe2_drv_fc_state_get(adapter, &fc_resp);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to get fc state, ret=[%d]", ret);
+ goto l_end;
+ }
+ adapter->fc_state_ctx.cfg_state = fc_resp.fc_enable;
+ adapter->fc_state_ctx.curr_state = 0;
+l_end:
+ return ret;
+}
+static void sxe2_fc_state_uinit(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter =
+ SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ adapter->fc_state_ctx.cfg_state = 0;
+ adapter->fc_state_ctx.curr_state = 0;
+}
+
uint32_t sxe2_sched_mode_get(struct sxe2_adapter *adapter)
{
uint32_t ret_mode = SXE2_SCHED_MODE_INVALID;
@@ -1661,6 +1905,32 @@ static int32_t sxe2_sched_uinit(struct rte_eth_dev *dev)
return ret;
}
+int32_t sxe2_sched_reset(struct rte_eth_dev *dev)
+{
+ int32_t ret = 0;
+
+ if (dev->data->dev_started) {
+ PMD_LOG_ERR(DRV, "Device must be stopped before sched reset.");
+ ret = -EPERM;
+ goto l_end;
+ }
+
+ ret = sxe2_tm_conf_reset(dev);
+ if (ret)
+ goto l_end;
+
+ ret = sxe2_sched_uinit(dev);
+ if (ret)
+ goto l_end;
+
+ ret = sxe2_sched_init(dev);
+ if (ret)
+ goto l_end;
+
+l_end:
+ return ret;
+}
+
static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
struct sxe2_dev_kvargs_info *kvargs __rte_unused)
{
@@ -1683,6 +1953,12 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
sxe2_init_ptype_tbl(dev);
+ ret = sxe2_args_parse(dev, kvargs);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to parse devargs, ret=%d", ret);
+ goto l_end;
+ }
+
ret = sxe2_hw_init(dev);
if (ret) {
PMD_LOG_ERR(INIT, "Failed to initialize hw, ret=[%d]", ret);
@@ -1749,6 +2025,12 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
goto init_flow_err;
}
+ ret = sxe2_fc_state_init(dev);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to init fc state, ret=%d", ret);
+ goto init_fc_state_err;
+ }
+
ret = sxe2_sched_init(dev);
if (ret) {
PMD_LOG_ERR(INIT, "Failed to init sched, ret=%d", ret);
@@ -1772,6 +2054,8 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
init_xstats_err:
(void)sxe2_sched_uinit(dev);
init_sched_err:
+ sxe2_fc_state_uinit(dev);
+init_fc_state_err:
(void)sxe2_flow_uninit(dev);
init_flow_err:
init_rss_err:
@@ -1817,6 +2101,7 @@ static int32_t sxe2_dev_close(struct rte_eth_dev *dev)
sxe2_eth_uinit(dev);
sxe2_dev_pci_map_uinit(dev);
sxe2_free_repr_info(dev);
+ sxe2_fc_state_uinit(dev);
l_end:
return 0;
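The new fc-state step slots into the existing goto unwind ladder in sxe2_dev_init(): a failure in sched init has to undo fc state and then flow, while a failure in fc-state init only undoes flow. The sketch below is a minimal, self-contained model of that ordering for checking the label placement. The step names and the `fail_at` switch are hypothetical stand-ins for the driver calls, not driver code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for flow, fc-state and sched setup; every call
 * appends its name to call_log so the order can be checked. */
static char call_log[128];
static int fail_at = -1; /* index of the step forced to fail, -1 = none */

static void log_call(const char *name)
{
	strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
	strncat(call_log, " ", sizeof(call_log) - strlen(call_log) - 1);
}

static int step_init(int idx, const char *name)
{
	log_call(name);
	return idx == fail_at ? -1 : 0;
}

static int demo_dev_init(void)
{
	int ret;

	ret = step_init(0, "flow_init");
	if (ret)
		goto init_flow_err;
	ret = step_init(1, "fc_init");
	if (ret)
		goto init_fc_state_err;
	ret = step_init(2, "sched_init");
	if (ret)
		goto init_sched_err;
	return 0;

	/* Labels fall through, so each one undoes every step that had
	 * already completed, in reverse order. */
init_sched_err:
	log_call("fc_uninit");
init_fc_state_err:
	log_call("flow_uninit");
init_flow_err:
	return ret;
}
```

With `fail_at = 2` the log reads `flow_init fc_init sched_init fc_uninit flow_uninit`, which is the order the patch produces for a sched init failure.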
diff --git a/drivers/net/sxe2/sxe2_ethdev.h b/drivers/net/sxe2/sxe2_ethdev.h
index 510f5139c3..14ad26b439 100644
--- a/drivers/net/sxe2/sxe2_ethdev.h
+++ b/drivers/net/sxe2/sxe2_ethdev.h
@@ -311,6 +311,11 @@ struct sxe2_filter_context {
bool cur_l2_config;
};
+struct sxe2_fc_state_ctxt {
+ uint8_t curr_state;
+ uint8_t cfg_state;
+};
+
struct sxe2_adapter {
struct sxe2_common_device *cdev;
struct sxe2_dev_info dev_info;
@@ -332,6 +337,7 @@ struct sxe2_adapter {
struct sxe2_security_ctx security_ctx;
struct sxe2_repr_context repr_ctxt;
struct sxe2_switchdev_info switchdev_info;
+ struct sxe2_fc_state_ctxt fc_state_ctx;
bool rule_started;
bool flow_isolated;
bool flow_isolate_cfg;
@@ -359,6 +365,8 @@ bool sxe2_ethdev_check(struct rte_eth_dev *dev);
uint32_t sxe2_sched_mode_get(struct sxe2_adapter *adapter);
+int32_t sxe2_sched_reset(struct rte_eth_dev *dev);
+
struct sxe2_pci_map_bar_info *sxe2_dev_get_bar_info(struct sxe2_adapter *adapter,
enum sxe2_pci_map_resource res_type);
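sxe2_sched_reset(), now exported through sxe2_ethdev.h, refuses to run on a started port and otherwise runs TM conf reset, sched uninit and sched init, returning the first error. A minimal model of that contract, assuming a hypothetical `struct demo_dev` that carries only what the reset path inspects:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model: only the fields the reset path looks at. */
struct demo_dev {
	bool started;
	int fail_step;   /* 1..3 forces that step to fail, 0 = all succeed */
	int steps_run;   /* how many steps were attempted */
};

static int demo_step(struct demo_dev *dev, int step)
{
	dev->steps_run++;
	return dev->fail_step == step ? -EIO : 0;
}

/* Same order as sxe2_sched_reset(): TM conf reset, sched uninit,
 * sched init; the first error is returned unchanged. */
static int demo_sched_reset(struct demo_dev *dev)
{
	int ret;

	if (dev->started)
		return -EPERM;            /* port must be stopped first */

	ret = demo_step(dev, 1);          /* sxe2_tm_conf_reset() */
	if (ret)
		return ret;
	ret = demo_step(dev, 2);          /* sxe2_sched_uinit() */
	if (ret)
		return ret;
	return demo_step(dev, 3);         /* sxe2_sched_init() */
}
```

One consequence worth keeping in mind: a failure after the uninit step leaves the port without a scheduler tree, since nothing re-runs init on that path.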
diff --git a/drivers/net/sxe2/sxe2_irq.c b/drivers/net/sxe2/sxe2_irq.c
index c26098ef3a..3306504761 100644
--- a/drivers/net/sxe2/sxe2_irq.c
+++ b/drivers/net/sxe2/sxe2_irq.c
@@ -10,6 +10,7 @@
#include <rte_alarm.h>
#include <fcntl.h>
#include <rte_stdatomic.h>
+#include <rte_common.h>
#include "sxe2_ethdev.h"
#include "sxe2_irq.h"
@@ -47,6 +48,31 @@ static struct sxe2_event_handler event_handler = {
static RTE_ATOMIC(uint32_t)event_thread_run;
+static int32_t sxe2_fc_state_callback(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_drv_vsi_fc_get_resp fc_resp = {0};
+ int32_t ret;
+
+ if (!(adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_FC_STATE)) {
+ ret = 0;
+ goto l_end;
+ }
+ ret = sxe2_drv_fc_state_get(adapter, &fc_resp);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to get fc state, ret=[%d]", ret);
+ goto l_end;
+ }
+ adapter->fc_state_ctx.cfg_state = fc_resp.fc_enable;
+ if (dev->data->dev_started) {
+ PMD_LOG_NOTICE(DRV, "Interrupt event: FC status changed. "
+ "cfg_state:%u curr_state:%u",
+ adapter->fc_state_ctx.cfg_state,
+ adapter->fc_state_ctx.curr_state);
+ }
+l_end:
+ return ret;
+}
static void sxe2_event_irq_common_handler(struct sxe2_adapter *adapter, uint64_t oicr)
{
@@ -68,6 +94,10 @@ static void sxe2_event_irq_common_handler(struct sxe2_adapter *adapter, uint64_t
PMD_DEV_LOG_INFO(adapter, DRV, "event notify legacy");
(void)sxe2_switchdev_notify_callback(adapter, false);
}
+ if (oicr & RTE_BIT32(SXE2_COM_FC_ST_CHANGE)) {
+ PMD_DEV_LOG_INFO(adapter, DRV, "fc state change event notify");
+ (void)sxe2_fc_state_callback(dev);
+ }
}
static uint32_t sxe2_event_intr_handle(void *param __rte_unused)
@@ -436,7 +466,7 @@ int32_t sxe2_intr_init(struct rte_eth_dev *dev)
{
struct sxe2_adapter *adapter =
SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
- struct rte_pci_device *pci_dev = SXE2_DEV_TO_PCI(dev);
+ struct rte_pci_device *pci_dev = container_of(dev->device, struct rte_pci_device, device);
struct rte_intr_handle *reset_handle = NULL;
int32_t ofd = -1;
int32_t rfd = -1;
@@ -518,7 +548,7 @@ void sxe2_intr_uninit(struct rte_eth_dev *dev)
{
struct sxe2_adapter *adapter =
SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
- struct rte_pci_device *pci_dev = SXE2_DEV_TO_PCI(dev);
+ struct rte_pci_device *pci_dev = container_of(dev->device, struct rte_pci_device, device);
sxe2_reset_intr_unregister(dev);
sxe2_intr_handler_destroy(adapter->irq_ctxt.reset_handle,
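The common event handler tests each OICR bit independently, so one interrupt can dispatch several callbacks. `oicr` is a `uint64_t`, while RTE_BIT32() builds a 32-bit mask; that is only safe while SXE2_COM_FC_ST_CHANGE stays below 32 (its value is not visible in this patch), and RTE_BIT64() from rte_bitops.h avoids the question. Also, the visible part of sxe2_event_irq_common_handler() only receives `adapter`, so the new `sxe2_fc_state_callback(dev)` call relies on a `dev` local in the elided lines. A self-contained sketch of the dispatch, with hypothetical bit numbers and a local copy of the 64-bit macro:

```c
#include <stdint.h>

/* Local equivalent of DPDK's RTE_BIT64(); bit numbers are hypothetical. */
#define DEMO_BIT64(nr)          (UINT64_C(1) << (nr))
#define DEMO_EV_SWITCHDEV       3
#define DEMO_EV_FC_ST_CHANGE    5

struct demo_counts {
	unsigned int switchdev;
	unsigned int fc_state;
};

/* Each event bit is checked on its own; several may be set at once. */
static void demo_oicr_dispatch(uint64_t oicr, struct demo_counts *c)
{
	if (oicr & DEMO_BIT64(DEMO_EV_SWITCHDEV))
		c->switchdev++;
	if (oicr & DEMO_BIT64(DEMO_EV_FC_ST_CHANGE))
		c->fc_state++;
}
```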
diff --git a/drivers/net/sxe2/sxe2_rx.c b/drivers/net/sxe2/sxe2_rx.c
index 79e65cfbf1..b5dd9950f0 100644
--- a/drivers/net/sxe2/sxe2_rx.c
+++ b/drivers/net/sxe2/sxe2_rx.c
@@ -467,12 +467,24 @@ int32_t __rte_cold sxe2_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queu
int32_t __rte_cold sxe2_rxqs_all_start(struct rte_eth_dev *dev)
{
struct rte_eth_dev_data *data = dev->data;
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_drv_vsi_fc_get_resp fc_resp = {0};
struct sxe2_rx_queue *rxq;
uint16_t nb_rxq;
uint16_t nb_started_rxq;
int32_t ret;
PMD_INIT_FUNC_TRACE();
+ if (adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_FC_STATE) {
+ ret = sxe2_drv_fc_state_get(adapter, &fc_resp);
+ if (ret) {
+ PMD_LOG_ERR(RX, "Failed to get fc state, ret=[%d]", ret);
+ goto l_end;
+ }
+ adapter->fc_state_ctx.cfg_state = fc_resp.fc_enable;
+ adapter->fc_state_ctx.curr_state = adapter->fc_state_ctx.cfg_state;
+ }
+
for (nb_rxq = 0; nb_rxq < data->nb_rx_queues; nb_rxq++) {
rxq = dev->data->rx_queues[nb_rxq];
if (!rxq || rxq->rx_deferred_start)
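Taken together, the fc-state hunks give the two fields different roles: `cfg_state` always tracks the latest firmware value (at init, on the FC-change interrupt and at Rx start), while `curr_state` is latched only when the Rx queues are started. The model below sketches that lifecycle. `demo_fc_event()` returning "running state differs from config" is a hypothetical helper added for illustration; the driver itself only logs a notice when the port is started.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same shape as struct sxe2_fc_state_ctxt in the patch. */
struct demo_fc_state {
	uint8_t curr_state;  /* value latched when Rx queues were started */
	uint8_t cfg_state;   /* latest value reported by firmware */
};

/* Device init: record the firmware config, nothing active yet. */
static void demo_fc_init(struct demo_fc_state *s, uint8_t fw_fc_enable)
{
	s->cfg_state = fw_fc_enable;
	s->curr_state = 0;
}

/* FC-change interrupt: only cfg_state follows firmware. Returns true
 * when a started port now runs with a state that differs from config. */
static bool demo_fc_event(struct demo_fc_state *s, uint8_t fw_fc_enable,
			  bool started)
{
	s->cfg_state = fw_fc_enable;
	return started && s->cfg_state != s->curr_state;
}

/* Rx start: re-read firmware and latch it as the active state. */
static void demo_fc_start(struct demo_fc_state *s, uint8_t fw_fc_enable)
{
	s->cfg_state = fw_fc_enable;
	s->curr_state = s->cfg_state;
}
```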
diff --git a/drivers/net/sxe2/sxe2_testpmd.c b/drivers/net/sxe2/sxe2_testpmd.c
new file mode 100644
index 0000000000..5792058212
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_testpmd.c
@@ -0,0 +1,733 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#ifndef SXE2_TEST
+#include <cmdline_parse_num.h>
+#include <cmdline_parse_string.h>
+#include <arpa/inet.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+#include <testpmd.h>
+
+#include "sxe2_common_log.h"
+#include "sxe2_testpmd_lib.h"
+
+#define SXE2_SWITCH_BUFF_SIZE (4 * 1024 * 1024)
+
+struct cmd_stats_info_show_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t show;
+ cmdline_fixed_string_t stats;
+ portid_t port_id;
+};
+cmdline_parse_token_string_t cmd_stats_info_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_stats_info_show_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_stats_info_show =
+ TOKEN_STRING_INITIALIZER(struct cmd_stats_info_show_result, show, "show");
+cmdline_parse_token_string_t cmd_stats_info_stats =
+ TOKEN_STRING_INITIALIZER(struct cmd_stats_info_show_result, stats, "stats");
+cmdline_parse_token_num_t cmd_stats_info_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_stats_info_show_result, port_id, RTE_UINT16);
+
+struct cmd_flow_rule_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t flow;
+ cmdline_fixed_string_t rule;
+ cmdline_fixed_string_t dump;
+ portid_t port_id;
+};
+cmdline_parse_token_string_t cmd_flow_rule_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_flow_rule_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_flow_rule_flow =
+ TOKEN_STRING_INITIALIZER(struct cmd_flow_rule_result, flow, "flow");
+cmdline_parse_token_string_t cmd_flow_rule_rule =
+ TOKEN_STRING_INITIALIZER(struct cmd_flow_rule_result, rule, "rule");
+cmdline_parse_token_string_t cmd_flow_rule_dmp =
+ TOKEN_STRING_INITIALIZER(struct cmd_flow_rule_result, dump, "dump");
+cmdline_parse_token_num_t cmd_flow_rule_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_flow_rule_result, port_id, RTE_UINT16);
+
+struct cmd_udp_tunnel {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t tunnel_type;
+ cmdline_fixed_string_t action;
+ cmdline_fixed_string_t udp_tunnel_port;
+ uint16_t udp_port;
+ portid_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_udp_tunnel_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_udp_tunnel, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_udp_tunnel_action =
+ TOKEN_STRING_INITIALIZER(struct cmd_udp_tunnel, action, "add#rm#show");
+cmdline_parse_token_string_t cmd_udp_tunnel_udp_tunnel_port =
+ TOKEN_STRING_INITIALIZER(struct cmd_udp_tunnel, udp_tunnel_port, "udp_tunnel_port");
+cmdline_parse_token_string_t cmd_udp_tunnel_tunnel_type =
+ TOKEN_STRING_INITIALIZER(struct cmd_udp_tunnel,
+ tunnel_type, "vxlan#vxlan-gpe#geneve#gtp-c#gtp-u#pfcp#ecpri#mpls#nvgre#l2tp#teredo");
+cmdline_parse_token_num_t cmd_udp_tunnel_udp_port =
+ TOKEN_NUM_INITIALIZER(struct cmd_udp_tunnel, udp_port, RTE_UINT16);
+cmdline_parse_token_num_t cmd_udp_tunnel_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_udp_tunnel, port_id, RTE_UINT16);
+
+struct cmd_sched_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t sched;
+ cmdline_fixed_string_t reset;
+ portid_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_sched_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_sched_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_sched_sched =
+ TOKEN_STRING_INITIALIZER(struct cmd_sched_result, sched, "sched");
+cmdline_parse_token_string_t cmd_sched_reset =
+ TOKEN_STRING_INITIALIZER(struct cmd_sched_result, reset, "reset");
+cmdline_parse_token_num_t cmd_sched_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_sched_result, port_id, RTE_UINT16);
+
+struct cmd_ipsec_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t engin;
+ cmdline_fixed_string_t dir;
+ cmdline_fixed_string_t op;
+ portid_t port_id;
+ uint16_t session_id;
+ cmdline_fixed_string_t encrypt_algo;
+ cmdline_fixed_string_t encrypt_key;
+ cmdline_fixed_string_t auth_algo;
+ cmdline_fixed_string_t auth_key;
+ cmdline_fixed_string_t dst_ip;
+ uint16_t sport;
+ uint16_t dport;
+ uint32_t spi;
+};
+cmdline_parse_token_string_t cmd_ipsec_mgt_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_ipsec_mgt_module =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, engin, "ipsec");
+cmdline_parse_token_string_t cmd_ipsec_mgt_dir =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, dir, "egress#ingress");
+cmdline_parse_token_string_t cmd_ipsec_mgt_op =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, op, "add#rm#show");
+cmdline_parse_token_num_t cmd_ipsec_mgt_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_result, port_id, RTE_UINT16);
+cmdline_parse_token_num_t cmd_ipsec_mgt_session_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_result, session_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_ipsec_mgt_encrypt_algo =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, encrypt_algo, "aes-cbc#sm4-cbc#null");
+cmdline_parse_token_string_t cmd_ipsec_mgt_encrypt_key =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, encrypt_key, NULL);
+cmdline_parse_token_string_t cmd_ipsec_mgt_auth_algo =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, auth_algo, "sha-hmac#sm3-hmac#null");
+cmdline_parse_token_string_t cmd_ipsec_mgt_auth_key =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, auth_key, NULL);
+cmdline_parse_token_string_t cmd_ipsec_mgt_dst_ip =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, dst_ip, NULL);
+cmdline_parse_token_num_t cmd_ipsec_mgt_sport =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_result, sport, RTE_UINT16);
+cmdline_parse_token_num_t cmd_ipsec_mgt_dport =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_result, dport, RTE_UINT16);
+cmdline_parse_token_num_t cmd_ipsec_mgt_spi =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_result, spi, RTE_UINT32);
+
+struct cmd_ipsec_set_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t engin;
+ cmdline_fixed_string_t op;
+ cmdline_fixed_string_t type;
+ portid_t port_id;
+ uint16_t conf_value;
+};
+cmdline_parse_token_string_t cmd_ipsec_set_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_set_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_ipsec_set_module =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_set_result, engin, "ipsec");
+cmdline_parse_token_string_t cmd_ipsec_set_op =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_set_result, op, "set#get");
+cmdline_parse_token_string_t cmd_ipsec_set_type =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_set_result, type, "session-id#esp-hdr-offset");
+cmdline_parse_token_num_t cmd_ipsec_set_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_set_result, port_id, RTE_UINT16);
+cmdline_parse_token_num_t cmd_ipsec_set_value =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_set_result, conf_value, RTE_UINT16);
+
+struct cmd_ipsec_flush_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t engin;
+ cmdline_fixed_string_t op;
+ portid_t port_id;
+};
+cmdline_parse_token_string_t cmd_ipsec_flush_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_flush_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_ipsec_flush_module =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_flush_result, engin, "ipsec");
+cmdline_parse_token_string_t cmd_ipsec_flush_op =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_flush_result, op, "flush");
+cmdline_parse_token_num_t cmd_ipsec_flush_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_flush_result, port_id, RTE_UINT16);
+
+struct cmd_inject_irq {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t inject;
+ cmdline_fixed_string_t irq;
+ portid_t port_id;
+ cmdline_fixed_string_t type;
+};
+cmdline_parse_token_string_t cmd_inject_irq_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_inject_irq, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_inject_irq_inject =
+ TOKEN_STRING_INITIALIZER(struct cmd_inject_irq, inject, "inject");
+cmdline_parse_token_string_t cmd_inject_irq_irq =
+ TOKEN_STRING_INITIALIZER(struct cmd_inject_irq, irq, "irq");
+cmdline_parse_token_num_t cmd_inject_irq_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_inject_irq, port_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_inject_irq_type =
+ TOKEN_STRING_INITIALIZER(struct cmd_inject_irq, type, "reset#lsc");
+
+static void cmd_dump_flow_rule_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_flow_rule_result *res = parsed_result;
+ int ret = -1;
+
+ ret = sxe2_flow_rule_dump(res->port_id, cl);
+ switch (ret) {
+ case 0:
+ break;
+ case -EINVAL:
+ cmdline_printf(cl, "Invalid parameters.\n");
+ break;
+ case -ENODEV:
+ cmdline_printf(cl, "Device not supported.\n");
+ break;
+ default:
+ cmdline_printf(cl,
+ "Failed to dump flow rules,"
+ " error: (%s)\n",
+ strerror(-ret));
+ }
+}
+
+static void cmd_udp_tunnel_set_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_udp_tunnel *res = parsed_result;
+ int32_t ret = -1;
+ uint8_t action;
+ const char *action_str[SXE2_TESTPMD_CMD_UDP_TUNNEL_MAX] = {
+ [SXE2_TESTPMD_CMD_UDP_TUNNEL_ADD] = "add",
+ [SXE2_TESTPMD_CMD_UDP_TUNNEL_DEL] = "rm",
+ [SXE2_TESTPMD_CMD_UDP_TUNNEL_GET] = "show"};
+
+ for (action = 0; action < SXE2_TESTPMD_CMD_UDP_TUNNEL_MAX; action++)
+ if (!strcmp(res->action, action_str[action]))
+ break;
+
+ if (action >= SXE2_TESTPMD_CMD_UDP_TUNNEL_MAX) {
+ cmdline_printf(cl, "Invalid action!\n");
+ return;
+ }
+
+ ret = sxe2_udp_tunnel_operations(res->port_id, cl, action,
+ res->udp_port,
+ res->tunnel_type);
+ if (ret)
+ cmdline_printf(cl, "%s udp tunnel port failed, ret = %d\n",
+ action_str[action], ret);
+}
+
+static void cmd_dump_stats_info_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_stats_info_show_result *res = parsed_result;
+ int ret = -1;
+
+ ret = sxe2_stats_info_show(res->port_id);
+ switch (ret) {
+ case 0:
+ break;
+ case -EINVAL:
+ cmdline_printf(cl, "Invalid parameters.\n");
+ break;
+ case -ENODEV:
+ cmdline_printf(cl, "Device not supported.\n");
+ break;
+ default:
+ cmdline_printf(cl,
+ "Failed to show stats info,"
+ " error: (%s)\n", strerror(-ret));
+ }
+}
+
+static uint8_t cmd_ipsec_op_get(char *op)
+{
+ uint8_t i;
+ const char *op_type[SXE2_TESTPMD_CMD_IPSEC_OP_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_OP_ADD] = "add",
+ [SXE2_TESTPMD_CMD_IPSEC_OP_RM] = "rm",
+ [SXE2_TESTPMD_CMD_IPSEC_OP_SHOW] = "show",
+ };
+
+ for (i = 0; i < SXE2_TESTPMD_CMD_IPSEC_OP_MAX; i++) {
+ if (!strcmp(op, op_type[i]))
+ break;
+ }
+
+ return i;
+}
+
+static uint8_t cmd_ipsec_dir_get(char *dir)
+{
+ uint8_t i;
+ const char *dir_type[SXE2_TESTPMD_CMD_IPSEC_DIR_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS] = "egress",
+ [SXE2_TESTPMD_CMD_IPSEC_DIR_INGRESS] = "ingress"
+ };
+
+ for (i = 0; i < SXE2_TESTPMD_CMD_IPSEC_DIR_MAX; i++) {
+ if (!strcmp(dir, dir_type[i]))
+ break;
+ }
+
+ return i;
+}
+
+static int sxe2_hex_to_val(char c)
+{
+ int val = 0;
+
+ if (c >= '0' && c <= '9')
+ val = c - '0';
+ if (c >= 'A' && c <= 'F')
+ val = 10 + c - 'A';
+ if (c >= 'a' && c <= 'f')
+ val = 10 + c - 'a';
+ return val;
+}
+
+static void sxe2_hex_to_bytes(uint8_t *enc_key, char *hex_str, uint8_t len)
+{
+ uint8_t i;
+ int high = 0;
+ int low = 0;
+
+ for (i = 0; i < len; i++) {
+ high = sxe2_hex_to_val(hex_str[2 * i]);
+ low = sxe2_hex_to_val(hex_str[2 * i + 1]);
+ enc_key[i] = (high << 4) | low;
+ }
+}
+
+static int32_t cmd_ipsec_add_param_fill(struct sxe2_ipsec_conf_param *param,
+ struct cmdline *cl,
+ struct cmd_ipsec_result *res)
+{
+ uint8_t i;
+ uint8_t j;
+ int32_t ret = -1;
+ const char *encrypt_algo[SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC] = "aes-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_SM4_CBC] = "sm4-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL] = "null"
+ };
+
+ const char *auth_algo[SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC] = "sha-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SM3_HMAC] = "sm3-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL] = "null"
+ };
+
+ for (i = 0; i < SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX; i++)
+ if (!strcmp(res->encrypt_algo, encrypt_algo[i]))
+ break;
+
+ if (i >= SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX) {
+ cmdline_printf(cl, "Invalid ipsec encrypt algo: %s!\n", res->encrypt_algo);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ for (j = 0; j < SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX; j++) {
+ if (!strcmp(res->auth_algo, auth_algo[j]))
+ break;
+ }
+
+ if (j >= SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX) {
+ cmdline_printf(cl, "Invalid ipsec auth algo: %s!\n", res->auth_algo);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ param->encrypt_algo = i;
+ param->auth_algo = j;
+ if (param->encrypt_algo == SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_SM4_CBC)
+ param->enc_len = 16;
+ else
+ param->enc_len = 32;
+
+ sxe2_hex_to_bytes(param->enc_key, res->encrypt_key, param->enc_len);
+ if (param->auth_algo != SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL) {
+ param->auth_len = 32;
+ sxe2_hex_to_bytes(param->auth_key, res->auth_key, param->auth_len);
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t cmd_ipsec_egress_op_parsed(struct sxe2_ipsec_conf_param *param,
+ struct cmdline *cl,
+ struct cmd_ipsec_result *res)
+{
+ int32_t ret = -1;
+
+ switch (param->op) {
+ case SXE2_TESTPMD_CMD_IPSEC_OP_ADD:
+ ret = cmd_ipsec_add_param_fill(param, cl, res);
+ if (ret)
+ goto l_end;
+ ret = sxe2_ipsec_egress_create(param, cl);
+ break;
+ case SXE2_TESTPMD_CMD_IPSEC_OP_RM:
+ param->session_id = res->session_id;
+ ret = sxe2_ipsec_egress_destroy(param, cl);
+ break;
+ case SXE2_TESTPMD_CMD_IPSEC_OP_SHOW:
+ ret = sxe2_ipsec_egress_show(param, cl);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+l_end:
+ return ret;
+}
+
+static int32_t cmd_ipsec_ip_addr_parsed(struct sxe2_ipsec_conf_param *param,
+ struct cmdline *cl,
+ struct cmd_ipsec_result *res)
+{
+ int32_t ret = -1;
+ struct in_addr addr4;
+ struct in6_addr addr6;
+
+ if (inet_pton(AF_INET, res->dst_ip, &addr4) == 1) {
+ param->ip_addr.type = RTE_SECURITY_IPSEC_TUNNEL_IPV4;
+ param->ip_addr.dst_ipv4 = addr4.s_addr;
+ ret = 0;
+ } else if (inet_pton(AF_INET6, res->dst_ip, &addr6) == 1) {
+ param->ip_addr.type = RTE_SECURITY_IPSEC_TUNNEL_IPV6;
+ memcpy(&param->ip_addr.dst_ipv6, &addr6, sizeof(param->ip_addr.dst_ipv6));
+ ret = 0;
+ } else {
+ cmdline_printf(cl, "Invalid ip address: %s!\n", res->dst_ip);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+static int32_t cmd_ipsec_ingress_op_parsed(struct sxe2_ipsec_conf_param *param,
+ struct cmdline *cl,
+ struct cmd_ipsec_result *res)
+{
+ int32_t ret = -1;
+
+ switch (param->op) {
+ case SXE2_TESTPMD_CMD_IPSEC_OP_ADD:
+ ret = cmd_ipsec_add_param_fill(param, cl, res);
+ if (ret)
+ goto l_end;
+ param->sport = htons(res->sport);
+ param->dport = htons(res->dport);
+ param->spi = htonl(res->spi);
+ ret = cmd_ipsec_ip_addr_parsed(param, cl, res);
+ if (ret)
+ goto l_end;
+ ret = sxe2_ipsec_ingress_create(param, cl);
+ break;
+ case SXE2_TESTPMD_CMD_IPSEC_OP_RM:
+ param->session_id = res->session_id;
+ ret = sxe2_ipsec_ingress_destroy(param, cl);
+ break;
+ case SXE2_TESTPMD_CMD_IPSEC_OP_SHOW:
+ ret = sxe2_ipsec_ingress_show(param, cl);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+l_end:
+ return ret;
+}
+
+static int32_t cmd_ipsec_dir_parsed(struct sxe2_ipsec_conf_param *param,
+ struct cmdline *cl,
+ struct cmd_ipsec_result *res)
+{
+ int32_t ret = -1;
+
+ switch (param->dir) {
+ case SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS:
+ ret = cmd_ipsec_egress_op_parsed(param, cl, res);
+ break;
+ case SXE2_TESTPMD_CMD_IPSEC_DIR_INGRESS:
+ ret = cmd_ipsec_ingress_op_parsed(param, cl, res);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static void cmd_ipsec_mgt_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_ipsec_result *res = parsed_result;
+ struct sxe2_ipsec_conf_param param;
+ int32_t ret = -1;
+ uint8_t dir = 0;
+ uint8_t op = 0;
+
+ dir = cmd_ipsec_dir_get(res->dir);
+ if (dir >= SXE2_TESTPMD_CMD_IPSEC_DIR_MAX) {
+ cmdline_printf(cl, "Invalid ipsec direction: %s!\n", res->dir);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ op = cmd_ipsec_op_get(res->op);
+ if (op >= SXE2_TESTPMD_CMD_IPSEC_OP_MAX) {
+ cmdline_printf(cl, "Invalid ipsec operation: %s!\n", res->op);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ memset(&param, 0, sizeof(struct sxe2_ipsec_conf_param));
+ param.dir = dir;
+ param.op = op;
+ param.port_id = res->port_id;
+ ret = cmd_ipsec_dir_parsed(&param, cl, res);
+
+ if (ret)
+ cmdline_printf(cl, "Command execution failed, ret = %d\n", ret);
+
+l_end:
+ return;
+}
+
+static void cmd_ipsec_set_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_ipsec_set_result *res = parsed_result;
+ int32_t ret = -1;
+
+ if (!strcmp(res->op, "set"))
+ ret = sxe2_ipsec_conf_set(res->port_id, cl, res->type, res->conf_value);
+ else if (!strcmp(res->op, "get"))
+ ret = sxe2_ipsec_conf_get(res->port_id, cl, res->type);
+ else
+ cmdline_printf(cl, "Invalid op: %s\n", res->op);
+
+ if (ret)
+ cmdline_printf(cl, "Command execution failed, ret = %d\n", ret);
+}
+
+static void cmd_ipsec_flush_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_ipsec_flush_result *res = parsed_result;
+ int32_t ret = -1;
+
+ ret = sxe2_ipsec_flush(res->port_id, cl);
+
+ if (ret)
+ cmdline_printf(cl, "Command execution failed, ret = %d\n", ret);
+}
+
+cmdline_parse_inst_t cmd_flow_rule_dump = {
+ .f = cmd_dump_flow_rule_parsed,
+ .data = NULL,
+ .help_str = "sxe2 flow rule dump <port_id>",
+ .tokens = {
+ (void *)&cmd_flow_rule_sxe2,
+ (void *)&cmd_flow_rule_flow,
+ (void *)&cmd_flow_rule_rule,
+ (void *)&cmd_flow_rule_dmp,
+ (void *)&cmd_flow_rule_port_id,
+ NULL,
+ },
+};
+
+cmdline_parse_inst_t cmd_udp_tunnel_set = {
+ .f = cmd_udp_tunnel_set_parsed,
+ .data = NULL,
+ .help_str = "sxe2 <port_id> udp_tunnel_port add|rm|show "
+ "vxlan|vxlan-gpe|geneve|gtp-c|gtp-u|pfcp|ecpri|mpls|nvgre|l2tp|teredo <udp_port>",
+ .tokens = {
+ (void *)&cmd_udp_tunnel_sxe2,
+ (void *)&cmd_udp_tunnel_port_id,
+ (void *)&cmd_udp_tunnel_udp_tunnel_port,
+ (void *)&cmd_udp_tunnel_action,
+ (void *)&cmd_udp_tunnel_tunnel_type,
+ (void *)&cmd_udp_tunnel_udp_port,
+ NULL,
+ },
+};
+
+cmdline_parse_inst_t cmd_stats_mgt = {
+ .f = cmd_dump_stats_info_parsed,
+ .data = NULL,
+ .help_str = "sxe2 show stats <port_id>",
+ .tokens = {
+ (void *)&cmd_stats_info_sxe2,
+ (void *)&cmd_stats_info_show,
+ (void *)&cmd_stats_info_stats,
+ (void *)&cmd_stats_info_port_id,
+ NULL,
+ },
+};
+
+static void cmd_sched_reset_cfg(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_sched_result *res = parsed_result;
+ int32_t ret = -1;
+
+ ret = sxe2_testpmd_sched_reset(res->port_id);
+ switch (ret) {
+ case 0:
+ break;
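+ case -EINVAL:
+ cmdline_printf(cl, "invalid sched ops\n");
+ break;
+ case -ENODEV:
+ cmdline_printf(cl, "port %u is not a valid sxe2 port\n", res->port_id);
+ break;
+ case -EPERM:
+ cmdline_printf(cl, "port %u must be stopped first\n", res->port_id);
+ break;
+ case -ENOTSUP:
+ cmdline_printf(cl, "function not implemented\n");
+ break;
+ default:
+ cmdline_printf(cl, "programming error: (%s)\n",
+ strerror(-ret));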
+ }
+}
+
+cmdline_parse_inst_t cmd_sched_reset_cmd = {
+ .f = cmd_sched_reset_cfg,
+ .data = NULL,
+ .help_str = "sxe2 sched reset <port_id>",
+ .tokens = {
+ (void *)&cmd_sched_sxe2,
+ (void *)&cmd_sched_sched,
+ (void *)&cmd_sched_reset,
+ (void *)&cmd_sched_port_id,
+ NULL,
+ },
+};
+
+cmdline_parse_inst_t cmd_ipsec_mgt = {
+ .f = cmd_ipsec_mgt_parsed,
+ .data = NULL,
+ .help_str = "sxe2 ipsec egress|ingress add|rm|show "
+ "<port_id> <session_id> aes-cbc|sm4-cbc|null <encrypt_key> sha-hmac|sm3-hmac|null "
+ "<auth_key> <dst_ip> <sport> <dport> <spi>",
+ .tokens = {
+ (void *)&cmd_ipsec_mgt_sxe2,
+ (void *)&cmd_ipsec_mgt_module,
+ (void *)&cmd_ipsec_mgt_dir,
+ (void *)&cmd_ipsec_mgt_op,
+ (void *)&cmd_ipsec_mgt_port_id,
+ (void *)&cmd_ipsec_mgt_session_id,
+ (void *)&cmd_ipsec_mgt_encrypt_algo,
+ (void *)&cmd_ipsec_mgt_encrypt_key,
+ (void *)&cmd_ipsec_mgt_auth_algo,
+ (void *)&cmd_ipsec_mgt_auth_key,
+ (void *)&cmd_ipsec_mgt_dst_ip,
+ (void *)&cmd_ipsec_mgt_sport,
+ (void *)&cmd_ipsec_mgt_dport,
+ (void *)&cmd_ipsec_mgt_spi,
+ NULL,
+ },
+};
+
+cmdline_parse_inst_t cmd_ipsec_set = {
+ .f = cmd_ipsec_set_parsed,
+ .data = NULL,
+ .help_str = "sxe2 ipsec set|get esp-hdr-offset|session-id <port_id> <value>",
+ .tokens = {
+ (void *)&cmd_ipsec_set_sxe2,
+ (void *)&cmd_ipsec_set_module,
+ (void *)&cmd_ipsec_set_op,
+ (void *)&cmd_ipsec_set_type,
+ (void *)&cmd_ipsec_set_port_id,
+ (void *)&cmd_ipsec_set_value,
+ NULL,
+ },
+};
+
+cmdline_parse_inst_t cmd_ipsec_flush = {
+ .f = cmd_ipsec_flush_parsed,
+ .data = NULL,
+ .help_str = "sxe2 ipsec flush <port_id>",
+ .tokens = {
+ (void *)&cmd_ipsec_flush_sxe2,
+ (void *)&cmd_ipsec_flush_module,
+ (void *)&cmd_ipsec_flush_op,
+ (void *)&cmd_ipsec_flush_port_id,
+ NULL,
+ },
+};
+
+static struct testpmd_driver_commands sxe2_cmds = {
+ .commands = {
+ {
+ &cmd_udp_tunnel_set,
+ "sxe2 <port_id> udp_tunnel_port add|rm|show <tunnel_type> <udp_port>.\n"
+ "Add, remove or show a custom UDP port for a specific tunnel protocol\n\n",
+ },
+ {
+ &cmd_sched_reset_cmd,
+ "sxe2 sched reset <port_id>.\n"
+ "Reset sched node on the port\n\n",
+ },
+ {
+ &cmd_stats_mgt,
+ "sxe2 show stats <port_id>.\n"
+ "Dump runtime sxe2 device stats for a port\n\n",
+ },
+ {
+ &cmd_ipsec_mgt,
+ "sxe2 ipsec <dir> <op> <port_id> <session_id> <encrypt_algo> <encrypt_key> "
+ "<auth_algo> <auth_key> <dst_ip> <sport> <dport> <spi>.\n"
+ "Create, query or remove an IPsec security session\n\n",
+ },
+ {
+ &cmd_ipsec_set,
+ "sxe2 ipsec set|get session-id|esp-hdr-offset <port_id> <value>.\n"
+ "Set or get the enabled Tx session ID or the ESP header offset.\n\n",
+ },
+ {
+ &cmd_ipsec_flush,
+ "sxe2 ipsec flush <port_id>.\n"
+ "Flush all IPsec configurations on the port\n\n",
+ },
+ { NULL, NULL},
+ },
+};
+TESTPMD_ADD_DRIVER_COMMANDS(sxe2_cmds)
+#endif
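cmd_ipsec_add_param_fill() decodes `<encrypt_key>` and `<auth_key>` with sxe2_hex_to_bytes(), which maps any non-hex character to 0 and never checks that the string holds 2 * enc_len (or 2 * auth_len) digits, so a short or mistyped key is accepted silently. A validating decoder could look like the sketch below; the helper names are hypothetical, and returning an error lets the caller report -EINVAL instead of programming a wrong key.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the nibble value of c, or -1 if c is not a hex digit. */
static int demo_hex_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return 10 + c - 'a';
	if (c >= 'A' && c <= 'F')
		return 10 + c - 'A';
	return -1;
}

/* Decodes exactly len bytes from hex_str into out. Returns 0 on success,
 * -1 if hex_str is not exactly 2 * len characters long or contains a
 * non-hex character; out may be partially written on failure. */
static int demo_hex_to_bytes(uint8_t *out, const char *hex_str, size_t len)
{
	size_t i;

	if (strlen(hex_str) != 2 * len)
		return -1;
	for (i = 0; i < len; i++) {
		int hi = demo_hex_nibble(hex_str[2 * i]);
		int lo = demo_hex_nibble(hex_str[2 * i + 1]);

		if (hi < 0 || lo < 0)
			return -1;
		out[i] = (uint8_t)((hi << 4) | lo);
	}
	return 0;
}
```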
diff --git a/drivers/net/sxe2/sxe2_testpmd_lib.c b/drivers/net/sxe2/sxe2_testpmd_lib.c
new file mode 100644
index 0000000000..ab2530ffe6
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_testpmd_lib.c
@@ -0,0 +1,969 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#include <rte_bus.h>
+#include <eal_export.h>
+
+#include "sxe2_common_log.h"
+#include "sxe2_ethdev.h"
+#include "sxe2_stats.h"
+#include "sxe2_testpmd_lib.h"
+
+struct rte_mempool *g_sess_pool;
+
+bool g_sxe2_ipsec_mgt_init;
+struct sxe2_ipsec_session_mgt g_tx_session[SXE2_IPSEC_PORT_MAX][SXE2_IPSEC_SESSION_MAX];
+struct sxe2_ipsec_session_mgt g_rx_session[SXE2_IPSEC_PORT_MAX][SXE2_IPSEC_SESSION_MAX];
+uint16_t g_tx_sess_id[SXE2_IPSEC_PORT_MAX] = {0};
+uint16_t g_esp_header_offset[SXE2_IPSEC_PORT_MAX] = {0};
+
+static bool sxe2_is_supported(struct rte_eth_dev *dev)
+{
+ return sxe2_ethdev_check(dev);
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_testpmd_sched_reset, 26.07)
+int32_t
+sxe2_testpmd_sched_reset(uint16_t port_id)
+{
+ struct rte_eth_dev *dev = NULL;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ return -ENODEV;
+ }
+
+ return sxe2_sched_reset(dev);
+}
+
+extern const char *sxe2_flow_type_name[SXE2_FLOW_TYPE_MAX];
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_flow_rule_dump, 26.07)
+int32_t
+sxe2_flow_rule_dump(uint16_t port_id, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct sxe2_adapter *adapter = NULL;
+ int32_t ret = -1;
+ struct rte_flow_list_t *flow_list = NULL;
+ struct rte_flow *flow = NULL;
+ uint32_t index = 0;
+ struct sxe2_flow *hw_flow = NULL;
+ uint8_t i = 0;
+
+ const char *sxe2_flow_engine_name[SXE2_FLOW_ENGINE_MAX] = {
+ [SXE2_FLOW_ENGINE_ACL] = "acl",
+ [SXE2_FLOW_ENGINE_RSS] = "rss",
+ [SXE2_FLOW_ENGINE_SWITCH] = "switch",
+ [SXE2_FLOW_ENGINE_FNAV] = "fnav",
+ };
+ const char *sxe2_flow_action_name[SXE2_FLOW_ACTION_MAX] = {
+ [SXE2_FLOW_ACTION_DROP] = "drop",
+ [SXE2_FLOW_ACTION_TC_REDIRECT] = "tc_redirect",
+ [SXE2_FLOW_ACTION_TO_VSI] = "to_vsi",
+ [SXE2_FLOW_ACTION_TO_VSI_LIST] = "to_vsi_list",
+ [SXE2_FLOW_ACTION_PASSTHRU] = "passthru",
+ [SXE2_FLOW_ACTION_QUEUE] = "queue",
+ [SXE2_FLOW_ACTION_Q_REGION] = "q_region",
+ [SXE2_FLOW_ACTION_MARK] = "mark",
+ [SXE2_FLOW_ACTION_COUNT] = "count",
+ [SXE2_FLOW_ACTION_RSS] = "rss",
+ };
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev");
+ ret = -ENODEV;
+ goto l_end;
+ }
+ adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ flow_list = &adapter->flow_ctxt.rte_flow_list;
+ cmdline_printf(cl, "Dump sxe2 flow rule:\n");
+ TAILQ_FOREACH(flow, flow_list, next) {
+ cmdline_printf(cl, "rule index: %u\n", index++);
+ TAILQ_FOREACH(hw_flow, &flow->sxe2_flow_list, next) {
+ cmdline_printf(cl, "\thw flow id: %d\n", hw_flow->flow_id);
+ cmdline_printf(cl, "\t\ttype: %s\n",
+ sxe2_flow_type_name[hw_flow->meta.flow_type]);
+ cmdline_printf(cl, "\t\tprio: %d\n", hw_flow->meta.flow_prio);
+ cmdline_printf(cl, "\t\tsrc vsi: %d, rule vsi: %d\n",
+ hw_flow->meta.flow_src_vsi, hw_flow->meta.flow_rule_vsi);
+ cmdline_printf(cl, "\t\tengine type: %s\n",
+ sxe2_flow_engine_name[hw_flow->engine_type]);
+ cmdline_printf(cl, "\t\taction:");
+ for (i = 0; i < SXE2_FLOW_ACTION_MAX; i++) {
+ if (sxe2_test_bit(i, hw_flow->action.act_types))
+ cmdline_printf(cl, "%s ", sxe2_flow_action_name[i]);
+ }
+ cmdline_printf(cl, "\n");
+ }
+ }
+ cmdline_printf(cl, "Dump sxe2 flow rule end.\n");
+ ret = 0;
+l_end:
+ return ret;
+}
+
+static const char *tunnel_type_list[SXE2_UDP_TUNNEL_MAX] = {
+ [SXE2_UDP_TUNNEL_PROTOCOL_VXLAN] = "vxlan",
+ [SXE2_UDP_TUNNEL_PROTOCOL_VXLAN_GPE] = "vxlan-gpe",
+ [SXE2_UDP_TUNNEL_PROTOCOL_GENEVE] = "geneve",
+ [SXE2_UDP_TUNNEL_PROTOCOL_GTP_C] = "gtp-c",
+ [SXE2_UDP_TUNNEL_PROTOCOL_GTP_U] = "gtp-u",
+ [SXE2_UDP_TUNNEL_PROTOCOL_PFCP] = "pfcp",
+ [SXE2_UDP_TUNNEL_PROTOCOL_ECPRI] = "ecpri",
+ [SXE2_UDP_TUNNEL_PROTOCOL_MPLS] = "mpls",
+ [SXE2_UDP_TUNNEL_PROTOCOL_NVGRE] = "nvgre",
+ [SXE2_UDP_TUNNEL_PROTOCOL_L2TP] = "l2tp",
+ [SXE2_UDP_TUNNEL_PROTOCOL_TEREDO] = "teredo"
+};
+
+static enum sxe2_udp_tunnel_protocol sxe2_udp_tunnel_type_str2proto(const char *tunnel_type)
+{
+ enum sxe2_udp_tunnel_protocol proto;
+
+ for (proto = 0; proto < SXE2_UDP_TUNNEL_MAX; proto++) {
+ if (tunnel_type_list[proto] != NULL &&
+ strcmp(tunnel_type_list[proto], tunnel_type) == 0) {
+ break;
+ }
+ }
+
+ return proto;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_udp_tunnel_operations, 26.07)
+int32_t
+sxe2_udp_tunnel_operations(uint16_t port_id, struct cmdline *cl, uint8_t action,
+ uint16_t udp_port, const char *tunnel_type)
+{
+ enum sxe2_udp_tunnel_protocol proto = sxe2_udp_tunnel_type_str2proto(tunnel_type);
+ struct rte_eth_dev *dev = NULL;
+ struct sxe2_adapter *adapter = NULL;
+ struct sxe2_udp_tunnel_cfg tunnel_config = { 0 };
+ int32_t ret = -1;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (proto >= SXE2_UDP_TUNNEL_MAX) {
+ cmdline_printf(cl, "Invalid tunnel type!\n");
+ ret = -EINVAL;
+ goto l_end;
+ }
+ adapter = dev->data->dev_private;
+ switch (action) {
+ case SXE2_TESTPMD_CMD_UDP_TUNNEL_ADD:
+ ret = sxe2_udp_tunnel_port_add_common(adapter, proto, udp_port);
+ break;
+ case SXE2_TESTPMD_CMD_UDP_TUNNEL_DEL:
+ ret = sxe2_udp_tunnel_port_del_common(adapter, proto, udp_port);
+ break;
+ case SXE2_TESTPMD_CMD_UDP_TUNNEL_GET:
+ tunnel_config.protocol = proto;
+ ret = sxe2_udp_tunnel_port_get_common(adapter, &tunnel_config);
+ if (!ret) {
+ cmdline_printf(cl, "Firmware UDP tunnel config: [proto:%s, port:%d, "
+ "enable:%d, src/dst:%d/%d, used:%d]\n",
+ tunnel_type_list[proto], tunnel_config.fw_port,
+ tunnel_config.fw_status, tunnel_config.fw_src_en,
+ tunnel_config.fw_dst_en, tunnel_config.fw_used);
+ }
+ break;
+ default:
+ break;
+ }
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_stats_info_show, 26.07)
+int32_t
+sxe2_stats_info_show(uint16_t port_id)
+{
+ struct rte_eth_dev *dev = NULL;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
+static int32_t sxe2_ipsec_init_mempools(void *sec_ctx)
+{
+ uint16_t nb_sess = 8192;
+ uint32_t sess_sz;
+ char s[64];
+ int32_t ret = -1;
+
+ sess_sz = rte_security_session_get_size(sec_ctx);
+ if (g_sess_pool == NULL) {
+ snprintf(s, sizeof(s), "sess_pool");
+ g_sess_pool = rte_mempool_create(s, nb_sess, sess_sz,
+ MEMPOOL_CACHE_SIZE, 0,
+ NULL, NULL, NULL, NULL,
+ SOCKET_ID_ANY, 0);
+ if (g_sess_pool == NULL) {
+ ret = -ENOMEM;
+ PMD_LOG_ERR(DRV, "Failed to create session mempool.");
+ goto l_end;
+ }
+ }
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static void sxe2_ipsec_init_session_mgt(void)
+{
+ uint16_t i;
+ uint8_t port_id;
+
+ if (g_sxe2_ipsec_mgt_init)
+ return;
+
+ for (port_id = 0; port_id < SXE2_IPSEC_PORT_MAX; port_id++) {
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ g_tx_session[port_id][i].session = NULL;
+ g_tx_session[port_id][i].encrypt_algo = SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL;
+ g_tx_session[port_id][i].auth_algo = SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL;
+ g_tx_session[port_id][i].session_id = i;
+ g_tx_session[port_id][i].status = 0;
+ }
+ }
+
+ for (port_id = 0; port_id < SXE2_IPSEC_PORT_MAX; port_id++) {
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ g_rx_session[port_id][i].session = NULL;
+ g_rx_session[port_id][i].encrypt_algo = SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL;
+ g_rx_session[port_id][i].auth_algo = SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL;
+ g_rx_session[port_id][i].session_id = i;
+ g_rx_session[port_id][i].status = 0;
+ }
+ }
+
+ g_sxe2_ipsec_mgt_init = true;
+}
+
+static uint16_t sxe2_ipsec_session_mgt_alloc(enum sxe2_testpmd_ipsec_dir dir, uint16_t port_id)
+{
+ uint16_t i;
+ uint16_t index = 0XFFFF;
+ struct sxe2_ipsec_session_mgt *mgt = NULL;
+
+ if (dir == SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS)
+ mgt = g_tx_session[port_id];
+ else
+ mgt = g_rx_session[port_id];
+
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ if (mgt[i].status == 0) {
+ index = i;
+ mgt[i].status = 1;
+ break;
+ }
+ }
+
+ return index;
+}
+
+static void sxe2_ipsec_session_mgt_free(enum sxe2_testpmd_ipsec_dir dir,
+ uint16_t index, uint16_t port_id)
+{
+ struct sxe2_ipsec_session_mgt *mgt = NULL;
+
+ if (dir == SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS)
+ mgt = g_tx_session[port_id];
+ else
+ mgt = g_rx_session[port_id];
+
+ mgt[index].session = NULL;
+ mgt[index].status = 0;
+}
+
+static int32_t sxe2_ipsec_egress_construct(struct cmdline *cl,
+ struct rte_crypto_sym_xform **xform,
+ struct sxe2_ipsec_conf_param *param)
+{
+ struct rte_crypto_sym_xform *cur_xform = NULL;
+ struct rte_crypto_sym_xform *next_xform = NULL;
+ int32_t ret = -1;
+
+ cur_xform = rte_zmalloc("current xform",
+ sizeof(struct rte_crypto_sym_xform), 0);
+ if (cur_xform == NULL) {
+ ret = -ENOMEM;
+ cmdline_printf(cl, "Failed to malloc memory!\n");
+ goto l_end;
+ }
+ cur_xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+ cur_xform->cipher.op = RTE_CRYPTO_CIPHER_OP_ENCRYPT;
+ if (param->encrypt_algo == SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC)
+ cur_xform->cipher.algo = SXE2_RTE_CRYPTO_CIPHER_AES_CBC;
+ else
+ cur_xform->cipher.algo = SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC;
+ cur_xform->cipher.key.length = param->enc_len;
+ cur_xform->cipher.key.data = param->enc_key;
+
+ if (param->auth_algo == SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL) {
+ ret = 0;
+ goto l_end;
+ }
+
+ next_xform = rte_zmalloc("next xform",
+ sizeof(struct rte_crypto_sym_xform), 0);
+ if (next_xform == NULL) {
+ rte_free(cur_xform);
+ ret = -ENOMEM;
+ cmdline_printf(cl, "Failed to malloc memory!\n");
+ goto l_end;
+ }
+ next_xform->type = RTE_CRYPTO_SYM_XFORM_AUTH;
+ next_xform->auth.op = RTE_CRYPTO_AUTH_OP_GENERATE;
+ if (param->auth_algo == SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC)
+ next_xform->auth.algo = SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC;
+ else
+ next_xform->auth.algo = SXE2_RTE_CRYPTO_AUTH_SM3_HMAC;
+ next_xform->auth.key.length = param->auth_len;
+ next_xform->auth.key.data = param->auth_key;
+ cur_xform->next = next_xform;
+ ret = 0;
+
+l_end:
+ *xform = cur_xform;
+ return ret;
+}
+
+static int32_t sxe2_ipsec_ingress_construct(struct cmdline *cl,
+ struct rte_crypto_sym_xform **xform,
+ struct sxe2_ipsec_conf_param *param)
+{
+ struct rte_crypto_sym_xform *cur_xform = NULL;
+ struct rte_crypto_sym_xform *next_xform = NULL;
+ int32_t ret = -1;
+
+ cur_xform = rte_zmalloc("current xform",
+ sizeof(struct rte_crypto_sym_xform), 0);
+ if (cur_xform == NULL) {
+ ret = -ENOMEM;
+ cmdline_printf(cl, "Failed to malloc memory!\n");
+ goto l_end;
+ }
+
+ if (param->auth_algo == SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL) {
+ cur_xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+ cur_xform->cipher.op = RTE_CRYPTO_CIPHER_OP_DECRYPT;
+ if (param->encrypt_algo == SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC)
+ cur_xform->cipher.algo = SXE2_RTE_CRYPTO_CIPHER_AES_CBC;
+ else
+ cur_xform->cipher.algo = SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC;
+ cur_xform->cipher.key.length = param->enc_len;
+ cur_xform->cipher.key.data = param->enc_key;
+ ret = 0;
+ goto l_end;
+ }
+
+ cur_xform->type = RTE_CRYPTO_SYM_XFORM_AUTH;
+ cur_xform->auth.op = RTE_CRYPTO_AUTH_OP_VERIFY;
+ if (param->auth_algo == SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC)
+ cur_xform->auth.algo = SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC;
+ else
+ cur_xform->auth.algo = SXE2_RTE_CRYPTO_AUTH_SM3_HMAC;
+
+ cur_xform->auth.key.length = param->auth_len;
+ cur_xform->auth.key.data = param->auth_key;
+
+ next_xform = rte_zmalloc("next xform",
+ sizeof(struct rte_crypto_sym_xform), 0);
+ if (next_xform == NULL) {
+ rte_free(cur_xform);
+ ret = -ENOMEM;
+ cmdline_printf(cl, "Failed to malloc memory!\n");
+ goto l_end;
+ }
+
+ next_xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+ next_xform->cipher.op = RTE_CRYPTO_CIPHER_OP_DECRYPT;
+ if (param->encrypt_algo == SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC)
+ next_xform->cipher.algo = SXE2_RTE_CRYPTO_CIPHER_AES_CBC;
+ else
+ next_xform->cipher.algo = SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC;
+ next_xform->cipher.key.length = param->enc_len;
+ next_xform->cipher.key.data = param->enc_key;
+ cur_xform->next = next_xform;
+ ret = 0;
+
+l_end:
+ *xform = cur_xform;
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_ingress_create, 26.07)
+int32_t
+sxe2_ipsec_ingress_create(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct rte_security_session_conf conf;
+ struct rte_crypto_sym_xform *encrypt_xform = NULL;
+ void *session = NULL;
+ struct rte_security_ctx *p_ctx = NULL;
+ int32_t ret = -1;
+ uint16_t index;
+ uint8_t i;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (dev->data->dev_started != 0) {
+ cmdline_printf(cl, "port %d must be stopped.\n", dev->data->port_id);
+ ret = 0;
+ goto l_end;
+ }
+
+ p_ctx = rte_eth_dev_get_sec_ctx(param->port_id);
+
+ if (g_sess_pool == NULL) {
+ ret = sxe2_ipsec_init_mempools(p_ctx);
+ if (ret)
+ goto l_end;
+ }
+
+ sxe2_ipsec_init_session_mgt();
+
+ memset(&conf, 0, sizeof(conf));
+ conf.protocol = RTE_SECURITY_PROTOCOL_IPSEC;
+ conf.action_type = RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO;
+ conf.ipsec.mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL;
+ conf.ipsec.proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP;
+ conf.ipsec.direction = RTE_SECURITY_IPSEC_SA_DIR_INGRESS;
+ conf.ipsec.spi = param->spi;
+ conf.ipsec.udp.sport = param->sport;
+ conf.ipsec.udp.dport = param->dport;
+ conf.ipsec.tunnel.type = param->ip_addr.type;
+ if (param->sport || param->dport)
+ conf.ipsec.options.udp_encap = true;
+ if (param->ip_addr.type == RTE_SECURITY_IPSEC_TUNNEL_IPV4)
+ conf.ipsec.tunnel.ipv4.dst_ip.s_addr = param->ip_addr.dst_ipv4;
+ else
+ memcpy(&conf.ipsec.tunnel.ipv6.dst_addr,
+ &param->ip_addr.dst_ipv6,
+ sizeof(param->ip_addr.dst_ipv6));
+
+ ret = sxe2_ipsec_ingress_construct(cl, &encrypt_xform, param);
+ if (ret)
+ goto l_end;
+ conf.crypto_xform = encrypt_xform;
+
+ session = rte_security_session_create(p_ctx, &conf, g_sess_pool);
+ if (session == NULL) {
+ ret = -1;
+ goto l_free;
+ }
+
+ index = sxe2_ipsec_session_mgt_alloc(SXE2_TESTPMD_CMD_IPSEC_DIR_INGRESS, param->port_id);
+ if (index == 0XFFFF) {
+ ret = -1;
+ goto l_free;
+ }
+
+ g_rx_session[param->port_id][index].session = session;
+ g_rx_session[param->port_id][index].encrypt_algo = param->encrypt_algo;
+ g_rx_session[param->port_id][index].auth_algo = param->auth_algo;
+ for (i = 0; i < 32; i++) {
+ g_rx_session[param->port_id][index].enc_key[i] = param->enc_key[i];
+ g_rx_session[param->port_id][index].auth_key[i] = param->auth_key[i];
+ }
+ g_rx_session[param->port_id][index].sport = ntohs(param->sport);
+ g_rx_session[param->port_id][index].dport = ntohs(param->dport);
+ g_rx_session[param->port_id][index].spi = ntohl(param->spi);
+ memcpy(&g_rx_session[param->port_id][index].ip_addr,
+ &param->ip_addr,
+ sizeof(struct sxe2_ipsec_ip_param));
+
+ ret = 0;
+
+l_free:
+ if (encrypt_xform) {
+ rte_free(encrypt_xform->next);
+ rte_free(encrypt_xform);
+ }
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_ingress_destroy, 26.07)
+int32_t
+sxe2_ipsec_ingress_destroy(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct rte_security_ctx *p_ctx = NULL;
+ struct rte_security_session *session = NULL;
+ int32_t ret = -1;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ cmdline_printf(cl, "Invalid dev.\n");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (dev->data->dev_started != 0) {
+ cmdline_printf(cl, "port %d must be stopped.\n", dev->data->port_id);
+ ret = 0;
+ goto l_end;
+ }
+
+ if (param->session_id >= SXE2_IPSEC_SESSION_MAX) {
+ PMD_LOG_ERR(DRV, "Invalid session id.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (!g_rx_session[param->port_id][param->session_id].status) {
+ PMD_LOG_ERR(DRV, "Invalid session status.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (g_rx_session[param->port_id][param->session_id].session == NULL) {
+ PMD_LOG_ERR(DRV, "Invalid session data.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ p_ctx = rte_eth_dev_get_sec_ctx(param->port_id);
+
+ session = g_rx_session[param->port_id][param->session_id].session;
+ ret = rte_security_session_destroy(p_ctx, session);
+ if (ret)
+ goto l_end;
+ sxe2_ipsec_session_mgt_free(SXE2_TESTPMD_CMD_IPSEC_DIR_INGRESS, param->session_id, param->port_id);
+
+ ret = 0;
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_ingress_show, 26.07)
+int32_t
+sxe2_ipsec_ingress_show(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ int32_t ret = -1;
+ uint16_t i;
+ uint8_t j;
+ char encrypt_key[65];
+ char auth_key[65];
+ const char *encrypt_algo[SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC] = "aes-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_SM4_CBC] = "sm4-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL] = "null"
+ };
+
+ const char *auth_algo[SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC] = "sha-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SM3_HMAC] = "sm3-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL] = "null"
+ };
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ if (g_rx_session[param->port_id][i].status &&
+ g_rx_session[param->port_id][i].session) {
+ memset(encrypt_key, '\0', sizeof(encrypt_key));
+ memset(auth_key, '\0', sizeof(auth_key));
+ for (j = 0; j < 32; j++) {
+ sprintf(encrypt_key + 2 * j, "%02x",
+ g_rx_session[param->port_id][i].enc_key[j]);
+ }
+
+ if (g_rx_session[param->port_id][i].auth_algo !=
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL) {
+ for (j = 0; j < 32; j++) {
+ sprintf(auth_key + 2 * j, "%02x",
+ g_rx_session[param->port_id][i].auth_key[j]);
+ }
+ }
+
+ cmdline_printf(cl, "session_id:%u, direction:rx, "
+ "encrypt_algo:%s, encrypt_key:0x%s, "
+ "auth_algo:%s, auth_key:0x%s, sport:%u, dport:%u, spi:%u\n",
+ i,
+ encrypt_algo[g_rx_session[param->port_id][i].encrypt_algo],
+ encrypt_key,
+ auth_algo[g_rx_session[param->port_id][i].auth_algo],
+ auth_key,
+ g_rx_session[param->port_id][i].sport,
+ g_rx_session[param->port_id][i].dport,
+ g_rx_session[param->port_id][i].spi);
+ }
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_egress_create, 26.07)
+int32_t
+sxe2_ipsec_egress_create(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct rte_security_session_conf conf;
+ struct rte_crypto_sym_xform *encrypt_xform = NULL;
+ void *session = NULL;
+ struct rte_security_ctx *p_ctx = NULL;
+ int32_t ret = -1;
+ uint16_t index;
+ uint8_t i;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (dev->data->dev_started != 0) {
+ cmdline_printf(cl, "port %d must be stopped.\n", dev->data->port_id);
+ ret = 0;
+ goto l_end;
+ }
+
+ p_ctx = rte_eth_dev_get_sec_ctx(param->port_id);
+
+ if (g_sess_pool == NULL) {
+ ret = sxe2_ipsec_init_mempools(p_ctx);
+ if (ret)
+ goto l_end;
+ }
+
+ sxe2_ipsec_init_session_mgt();
+
+ memset(&conf, 0, sizeof(conf));
+ conf.protocol = RTE_SECURITY_PROTOCOL_IPSEC;
+ conf.action_type = RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO;
+ conf.ipsec.mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL;
+ conf.ipsec.proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP;
+ conf.ipsec.direction = RTE_SECURITY_IPSEC_SA_DIR_EGRESS;
+
+ ret = sxe2_ipsec_egress_construct(cl, &encrypt_xform, param);
+ if (ret)
+ goto l_end;
+ conf.crypto_xform = encrypt_xform;
+
+ session = rte_security_session_create(p_ctx, &conf, g_sess_pool);
+ if (session == NULL) {
+ ret = -1;
+ goto l_free;
+ }
+
+ index = sxe2_ipsec_session_mgt_alloc(SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS, param->port_id);
+ if (index == 0XFFFF) {
+ ret = -1;
+ goto l_free;
+ }
+
+ g_tx_session[param->port_id][index].session = session;
+ g_tx_session[param->port_id][index].encrypt_algo = param->encrypt_algo;
+ g_tx_session[param->port_id][index].auth_algo = param->auth_algo;
+ for (i = 0; i < 32; i++) {
+ g_tx_session[param->port_id][index].enc_key[i] = param->enc_key[i];
+ g_tx_session[param->port_id][index].auth_key[i] = param->auth_key[i];
+ }
+ ret = 0;
+
+l_free:
+ if (encrypt_xform) {
+ rte_free(encrypt_xform->next);
+ rte_free(encrypt_xform);
+ }
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_egress_destroy, 26.07)
+int32_t
+sxe2_ipsec_egress_destroy(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct rte_security_ctx *p_ctx = NULL;
+ struct rte_security_session *session = NULL;
+ int32_t ret = -1;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (dev->data->dev_started != 0) {
+ cmdline_printf(cl, "port %d must be stopped.\n", dev->data->port_id);
+ ret = 0;
+ goto l_end;
+ }
+
+ if (param->session_id >= SXE2_IPSEC_SESSION_MAX) {
+ PMD_LOG_ERR(DRV, "Invalid session id.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (!g_tx_session[param->port_id][param->session_id].status) {
+ PMD_LOG_ERR(DRV, "Invalid session status.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (g_tx_session[param->port_id][param->session_id].session == NULL) {
+ PMD_LOG_ERR(DRV, "Invalid session data.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ p_ctx = rte_eth_dev_get_sec_ctx(param->port_id);
+
+ session = g_tx_session[param->port_id][param->session_id].session;
+ ret = rte_security_session_destroy(p_ctx, session);
+ if (ret)
+ goto l_end;
+ sxe2_ipsec_session_mgt_free(SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS, param->session_id, param->port_id);
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_egress_show, 26.07)
+int32_t
+sxe2_ipsec_egress_show(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ int32_t ret = -1;
+ uint16_t i;
+ uint8_t j;
+ char encrypt_key[65];
+ char auth_key[65];
+ const char *encrypt_algo[SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC] = "aes-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_SM4_CBC] = "sm4-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL] = "null"
+ };
+
+ const char *auth_algo[SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC] = "sha-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SM3_HMAC] = "sm3-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL] = "null"
+ };
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ if (g_tx_session[param->port_id][i].status &&
+ g_tx_session[param->port_id][i].session) {
+ memset(encrypt_key, '\0', sizeof(encrypt_key));
+ memset(auth_key, '\0', sizeof(auth_key));
+ for (j = 0; j < 32; j++)
+ sprintf(encrypt_key + 2 * j, "%02x",
+ g_tx_session[param->port_id][i].enc_key[j]);
+ if (g_tx_session[param->port_id][i].auth_algo !=
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL)
+ for (j = 0; j < 32; j++)
+ sprintf(auth_key + 2 * j, "%02x",
+ g_tx_session[param->port_id][i].auth_key[j]);
+
+ cmdline_printf(cl, "session_id:%u, direction:tx, encrypt_algo:%s, "
+ "encrypt_key:0x%s, auth_algo:%s, auth_key:0x%s\n",
+ i,
+ encrypt_algo[g_tx_session[param->port_id][i].encrypt_algo],
+ encrypt_key,
+ auth_algo[g_tx_session[param->port_id][i].auth_algo],
+ auth_key);
+ }
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_conf_get, 26.07)
+int32_t
+sxe2_ipsec_conf_get(uint16_t port_id, struct cmdline *cl, char type[])
+{
+ struct rte_eth_dev *dev = NULL;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ return -ENODEV;
+ }
+ if (!strcmp(type, "session-id"))
+ cmdline_printf(cl, "session-id: %u\n",
+ g_tx_sess_id[port_id]);
+ else if (!strcmp(type, "esp-hdr-offset"))
+ cmdline_printf(cl, "esp-hdr-offset: %u\n",
+ g_esp_header_offset[port_id]);
+ else
+ cmdline_printf(cl, "Invalid type: %s\n", type);
+
+ return 0;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_conf_set, 26.07)
+int32_t
+sxe2_ipsec_conf_set(uint16_t port_id, struct cmdline *cl, char type[], uint16_t value)
+{
+ struct rte_eth_dev *dev = NULL;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ return -ENODEV;
+ }
+ if (!strcmp(type, "session-id")) {
+ if (value >= SXE2_IPSEC_SESSION_MAX || !g_tx_session[port_id][value].status) {
+ cmdline_printf(cl, "Invalid session-id: %u "
+ "(must be 0-4095 and refer to an active session).\n", value);
+ return -EINVAL;
+ }
+ g_tx_sess_id[port_id] = value;
+ cmdline_printf(cl, "session-id: %u\n", g_tx_sess_id[port_id]);
+ } else if (!strcmp(type, "esp-hdr-offset")) {
+ if (value < 34 || value > 512) {
+ cmdline_printf(cl, "Invalid esp-hdr-offset: %u "
+ "(must be 34-512).\n", value);
+ return -EINVAL;
+ }
+ g_esp_header_offset[port_id] = value;
+ cmdline_printf(cl, "esp-hdr-offset: %u\n",
+ g_esp_header_offset[port_id]);
+ } else {
+ cmdline_printf(cl, "Invalid type: %s\n", type);
+ }
+
+ return 0;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_stats_show, 26.07)
+int32_t
+sxe2_ipsec_stats_show(uint16_t port_id)
+{
+ (void)port_id;
+ return 0;
+}
+
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_flush, 26.07)
+int32_t
+sxe2_ipsec_flush(uint16_t port_id, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct rte_security_ctx *p_ctx = NULL;
+ struct rte_security_session *session = NULL;
+ int32_t ret = -1;
+ uint16_t i;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ cmdline_printf(cl, "Invalid dev.\n");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (dev->data->dev_started != 0) {
+ cmdline_printf(cl, "port %d must be stopped.\n", dev->data->port_id);
+ ret = 0;
+ goto l_end;
+ }
+
+ p_ctx = rte_eth_dev_get_sec_ctx(port_id);
+
+ g_esp_header_offset[port_id] = 0;
+ g_tx_sess_id[port_id] = 0;
+
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ session = g_tx_session[port_id][i].session;
+ if (g_tx_session[port_id][i].status && session) {
+ ret = rte_security_session_destroy(p_ctx, session);
+ if (ret)
+ cmdline_printf(cl, "failed to destroy tx session: %d.\n", i);
+ else
+ sxe2_ipsec_session_mgt_free(SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS,
+ i, port_id);
+ }
+ }
+
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ session = g_rx_session[port_id][i].session;
+ if (g_rx_session[port_id][i].status && session) {
+ ret = rte_security_session_destroy(p_ctx, session);
+ if (ret)
+ cmdline_printf(cl, "failed to destroy rx session: %d.\n", i);
+ else
+ sxe2_ipsec_session_mgt_free(SXE2_TESTPMD_CMD_IPSEC_DIR_INGRESS,
+ i, port_id);
+ }
+ }
+
+ g_sxe2_ipsec_mgt_init = false;
+ ret = 0;
+
+l_end:
+ return ret;
+}
diff --git a/drivers/net/sxe2/sxe2_testpmd_lib.h b/drivers/net/sxe2/sxe2_testpmd_lib.h
new file mode 100644
index 0000000000..3d2659ef00
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_testpmd_lib.h
@@ -0,0 +1,142 @@
+
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#ifndef __SXE2_TESTPMD_LIB_H__
+#define __SXE2_TESTPMD_LIB_H__
+#include <cmdline.h>
+#include "sxe2_ipsec.h"
+
+#define SXE2_IPSEC_SESSION_MAX (4096)
+#define SXE2_IPSEC_PORT_MAX RTE_MAX_ETHPORTS
+#define MEMPOOL_CACHE_SIZE (512 / 2)
+
+enum {
+ SXE2_TESTPMD_CMD_UDP_TUNNEL_ADD = 0,
+ SXE2_TESTPMD_CMD_UDP_TUNNEL_DEL = 1,
+ SXE2_TESTPMD_CMD_UDP_TUNNEL_GET = 2,
+ SXE2_TESTPMD_CMD_UDP_TUNNEL_MAX,
+};
+
+enum sxe2_testpmd_ipsec_op {
+ SXE2_TESTPMD_CMD_IPSEC_OP_ADD = 0,
+ SXE2_TESTPMD_CMD_IPSEC_OP_RM = 1,
+ SXE2_TESTPMD_CMD_IPSEC_OP_SHOW = 2,
+ SXE2_TESTPMD_CMD_IPSEC_OP_MAX,
+};
+
+enum sxe2_testpmd_ipsec_dir {
+ SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS = 0,
+ SXE2_TESTPMD_CMD_IPSEC_DIR_INGRESS = 1,
+ SXE2_TESTPMD_CMD_IPSEC_DIR_MAX,
+};
+
+enum sxe2_testpmd_ipsec_encrypt_algo {
+ SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC = 0,
+ SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_SM4_CBC = 1,
+ SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL = 2,
+ SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX,
+};
+
+enum sxe2_testpmd_ipsec_auth_algo {
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC = 0,
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SM3_HMAC = 1,
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL = 2,
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX,
+};
+
+struct sxe2_ipsec_conf_param {
+ enum sxe2_testpmd_ipsec_dir dir;
+ enum sxe2_testpmd_ipsec_op op;
+ enum sxe2_testpmd_ipsec_encrypt_algo encrypt_algo;
+ enum sxe2_testpmd_ipsec_auth_algo auth_algo;
+ struct sxe2_ipsec_ip_param ip_addr;
+ uint32_t spi;
+ uint16_t port_id;
+ uint16_t session_id;
+ uint16_t sport;
+ uint16_t dport;
+ uint8_t enc_key[32];
+ uint8_t enc_len;
+ uint8_t auth_key[32];
+ uint8_t auth_len;
+};
+
+struct sxe2_ipsec_session_mgt {
+ void *session;
+ enum sxe2_testpmd_ipsec_encrypt_algo encrypt_algo;
+ enum sxe2_testpmd_ipsec_auth_algo auth_algo;
+ struct sxe2_ipsec_ip_param ip_addr;
+ uint32_t spi;
+ uint16_t session_id;
+ uint16_t sport;
+ uint16_t dport;
+ uint8_t enc_key[32];
+ uint8_t auth_key[32];
+ uint8_t status;
+};
+
+__rte_experimental
+int32_t
+sxe2_testpmd_sched_reset(uint16_t port_id);
+
+__rte_experimental
+int32_t
+sxe2_flow_rule_dump(uint16_t port_id, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_udp_tunnel_operations(uint16_t port_id, struct cmdline *cl, uint8_t action,
+ uint16_t udp_port, const char *tunnel_type);
+
+__rte_experimental
+int32_t
+sxe2_stats_info_show(uint16_t port_id);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_ingress_create(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_ingress_destroy(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_ingress_show(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_egress_create(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_egress_destroy(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_egress_show(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_conf_get(uint16_t port_id, struct cmdline *cl, char type[]);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_conf_set(uint16_t port_id, struct cmdline *cl, char type[], uint16_t value);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_stats_show(uint16_t port_id);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_flush(uint16_t port_id, struct cmdline *cl);
+
+extern struct sxe2_ipsec_session_mgt g_tx_session[SXE2_IPSEC_PORT_MAX][SXE2_IPSEC_SESSION_MAX];
+extern uint16_t g_tx_sess_id[SXE2_IPSEC_PORT_MAX];
+extern uint16_t g_esp_header_offset[SXE2_IPSEC_PORT_MAX];
+extern struct rte_mempool *g_sess_pool;
+
+#endif /* __SXE2_TESTPMD_LIB_H__ */
diff --git a/drivers/net/sxe2/sxe2_tm.c b/drivers/net/sxe2/sxe2_tm.c
index 4c4f793cd5..5de9b5d3b7 100644
--- a/drivers/net/sxe2/sxe2_tm.c
+++ b/drivers/net/sxe2/sxe2_tm.c
@@ -982,6 +982,24 @@ int32_t sxe2_tm_init(struct rte_eth_dev *dev)
return ret;
}
+int32_t sxe2_tm_conf_reset(struct rte_eth_dev *dev)
+{
+ int32_t ret;
+
+ ret = sxe2_tm_uninit(dev);
+ if (ret)
+ goto l_end;
+
+ ret = sxe2_tm_init(dev);
+ if (ret)
+ goto l_end;
+
+ PMD_LOG_DEBUG(DRV, "TM config reset succeeded.");
+
+l_end:
+ return ret;
+}
+
static int32_t sxe2_tm_chk_all_leaf(struct rte_eth_dev *dev)
{
int32_t ret = 0;
diff --git a/drivers/net/sxe2/sxe2_tm.h b/drivers/net/sxe2/sxe2_tm.h
index c4f8da6a8e..b0bfc2091d 100644
--- a/drivers/net/sxe2/sxe2_tm.h
+++ b/drivers/net/sxe2/sxe2_tm.h
@@ -73,4 +73,6 @@ int32_t sxe2_tm_init(struct rte_eth_dev *dev);
int32_t sxe2_tm_uninit(struct rte_eth_dev *dev);
+int32_t sxe2_tm_conf_reset(struct rte_eth_dev *dev);
+
#endif /* __SXE2_TM_H__ */
--
2.52.0
^ permalink raw reply related
* Re: [PATCH v3 0/3] extend interactive telemetry script
From: fengchengwen @ 2026-06-10 1:39 UTC (permalink / raw)
To: Bruce Richardson, dev
In-Reply-To: <20260609161400.3661268-1-bruce.richardson@intel.com>
Series-acked-by: Chengwen Feng <fengchengwen@huawei.com>
On 6/10/2026 12:13 AM, Bruce Richardson wrote:
> To simplify general use of the interactive telemetry script, i.e. not from
> other scripts, we can add two new features to it:
>
> 1. Support for FOREACH to allow gathering a set of output values across
> a list of ports or devices, e.g. ethdevs or rawdevs.
> 2. Support having predefined aliases in a file in the user's home
> directory to simplify the use of more complicated FOREACH commands.
>
> Putting these together, we can create new commands such as "eth_names".
>
> bruce@host:$ cat ~/.dpdk_telemetry_aliases
> eth_names=FOREACH index /ethdev/list /ethdev/info,$index .name
>
> bruce@host:$ echo eth_names | ./usertools/dpdk-telemetry.py | jq
> [
> {
> "index": 0,
> "name": "0000:16:00.0"
> },
> {
> "index": 1,
> "name": "0000:16:00.1"
> }
> ]
>
> ---
> v3: updated based on review feedback from Chengwen:
> - added arg to override alias file
> - printed one-line summary of alias count loaded
> - improved doc for "help" command
> - added "help alias" to list aliases.
> v2: added third patch with "help" command giving more details on
> how to use the various commands.
>
> Bruce Richardson (3):
> usertools/telemetry: add a FOREACH command
> usertools/telemetry: support using aliases for long commands
> usertools/telemetry: add help support
>
> doc/guides/howto/telemetry.rst | 106 ++++++++++++-
> usertools/dpdk-telemetry.py | 278 ++++++++++++++++++++++++++++++++-
> 2 files changed, 373 insertions(+), 11 deletions(-)
>
> --
> 2.53.0
>
>
^ permalink raw reply
* [PATCH v1 20/20] net/sxe2: update sxe2 feature matrix docs
From: liujie5 @ 2026-06-10 1:39 UTC (permalink / raw)
To: stephen; +Cc: dev, Jie Liu
In-Reply-To: <20260610013936.3634968-1-liujie5@linkdatatechnology.com>
From: Jie Liu <liujie5@linkdatatechnology.com>
Update the sxe2.ini feature sheet to accurately reflect the recently
implemented hardware capabilities in the sxe2 PMD.
Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
---
doc/guides/nics/features/sxe2.ini | 56 +++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/doc/guides/nics/features/sxe2.ini b/doc/guides/nics/features/sxe2.ini
index 09ba2f558c..3c1e6a8a39 100644
--- a/doc/guides/nics/features/sxe2.ini
+++ b/doc/guides/nics/features/sxe2.ini
@@ -7,17 +7,73 @@
; is selected.
;
[Features]
+Speed capabilities = Y
+Link status = Y
+Link status event = Y
+Rx interrupt = Y
Fast mbuf free = P
Free Tx mbuf on demand = Y
Burst mode info = Y
Queue start/stop = Y
+Power mgmt address monitor = Y
Buffer split on Rx = P
Scattered Rx = Y
+Traffic manager = Y
CRC offload = Y
+VLAN offload = Y
+QinQ offload = P
L3 checksum offload = Y
L4 checksum offload = Y
+Timestamp offload = P
+Inner L3 checksum = P
+Inner L4 checksum = P
Rx descriptor status = Y
Tx descriptor status = Y
+MTU update = Y
+TSO = P
+Promiscuous mode = Y
+Allmulticast mode = Y
+Unicast MAC filter = Y
+RSS hash = Y
+RSS key update = Y
+RSS reta update = Y
+VLAN filter = Y
+Inline crypto = Y
+Packet type parsing = Y
+Timesync = Y
+Basic stats = Y
+Extended stats = Y
+FW version = Y
+Module EEPROM dump = Y
+Multiprocess aware = Y
Linux = Y
x86-32 = Y
x86-64 = Y
+
+[rte_flow items]
+eth = P
+geneve = Y
+gre = Y
+gtpu = Y
+ipv4 = Y
+ipv6 = Y
+ipv6_frag_ext = Y
+nvgre = Y
+sctp = Y
+tcp = Y
+udp = Y
+vlan = P
+vxlan = Y
+vxlan_gpe = Y
+
+[rte_flow actions]
+count = Y
+drop = Y
+mark = Y
+passthru = Y
+port_representor = Y
+queue = Y
+represented_port = Y
+rss = Y
+send_to_kernel = Y
+port_id = Y
--
2.52.0
^ permalink raw reply related
* [PATCH v1 15/20] common/sxe2: add shared SFP module definitions
From: liujie5 @ 2026-06-10 1:39 UTC (permalink / raw)
To: stephen; +Cc: dev, Jie Liu
In-Reply-To: <20260610013936.3634968-1-liujie5@linkdatatechnology.com>
From: Jie Liu <liujie5@linkdatatechnology.com>
This patch adds a new shared header file 'sxe2_msg.h' which
contains definitions for SFP/SFP+ and QSFP+/QSFP28 modules. The file is
shared across the firmware, the kernel driver, and the DPDK PMD to ensure
consistent protocol handling.
The header includes:
- Module type identifier values.
- SFP EEPROM I2C bus addresses, page numbers, and offset/length limits.
- Request/response message layouts for EEPROM access.
- Firmware status field masks.
By using this shared header, the PMD can correctly identify module
capabilities and report diagnostic information consistently with the
underlying firmware logic.
Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
---
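The header stores the 8-bit I2C addresses 0xA0 and 0xA2 (R/W bit position included) and derives the 7-bit forms with `>> 1`, while the `sfp_type_identifier` values appear to match the SFF-8024 identifier byte (byte 0 of the module EEPROM). A minimal standalone sketch of both conversions follows; `sfp_i2c_7bit()` and `sfp_id_name()` are hypothetical helpers, not functions from the driver.

```c
#include <stdint.h>

/* Constants mirror sxe2_msg.h (8-bit addresses, R/W bit included). */
#define SXE2_SFP_EEP_I2C_ADDR0 0xA0 /* SFF-8472 serial ID page */
#define SXE2_SFP_EEP_I2C_ADDR1 0xA2 /* SFF-8472 diagnostics page */

/* Hypothetical helper: most I2C APIs expect the 7-bit address. */
static inline uint8_t sfp_i2c_7bit(uint8_t addr8)
{
	return (uint8_t)(addr8 >> 1);
}

/* Hypothetical helper: name for the identifier byte (byte 0). */
static const char *sfp_id_name(uint8_t id)
{
	switch (id) {
	case 0x03: return "SFP/SFP+";  /* SXE2_SFP_TYPE_SFP */
	case 0x0D: return "QSFP+";     /* SXE2_SFP_TYPE_QSFP_PLUS */
	case 0x11: return "QSFP28";    /* SXE2_SFP_TYPE_QSFP28 */
	default:   return "unknown";
	}
}
```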
drivers/common/sxe2/sxe2_msg.h | 118 +++++++++++++++++++++++++++++++++
1 file changed, 118 insertions(+)
create mode 100644 drivers/common/sxe2/sxe2_msg.h
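The firmware state values in `sxe2_msg.h` pack two fields into one word. Bits 16-23 (`SXE2_FW_STATUS_MAIN_MASK`) hold the main state, already shifted so that it compares directly with `SXE2_FW_STATE_MAIN_*`, and bits 0-15 (`SXE2_FW_STATUS_SUB_MASK`) hold the sub-state. The sketch below splits such a word into its two fields. It assumes the raw status is available as a `uint32_t`, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Constants copied from sxe2_msg.h. */
#define SXE2_FW_STATUS_MAIN_SHIF (16)
#define SXE2_FW_STATUS_MAIN_MASK (0xFF0000)
#define SXE2_FW_STATUS_SUB_MASK  (0xFFFF)

/* Main state kept in place, e.g. 0x20000 for SXE2_FW_STATE_MAIN_RUN. */
static inline uint32_t fw_state_main(uint32_t status)
{
	return status & SXE2_FW_STATUS_MAIN_MASK;
}

/* Main state as a small index: 1 = INIT, 2 = RUN, 3 = abnormal. */
static inline uint32_t fw_state_main_idx(uint32_t status)
{
	return (status & SXE2_FW_STATUS_MAIN_MASK) >> SXE2_FW_STATUS_MAIN_SHIF;
}

/* Sub-state within the main state, e.g. 0x20 for SCAN_DEVICE. */
static inline uint32_t fw_state_sub(uint32_t status)
{
	return status & SXE2_FW_STATUS_SUB_MASK;
}
```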
diff --git a/drivers/common/sxe2/sxe2_msg.h b/drivers/common/sxe2/sxe2_msg.h
new file mode 100644
index 0000000000..f08944f7c9
--- /dev/null
+++ b/drivers/common/sxe2/sxe2_msg.h
@@ -0,0 +1,118 @@
+
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#ifndef __SXE2_MSG_H__
+#define __SXE2_MSG_H__
+
+enum sfp_type_identifier {
+ SXE2_SFP_TYPE_UNKNOWN = 0x00,
+ SXE2_SFP_TYPE_SFP = 0x03,
+
+ SXE2_SFP_TYPE_QSFP_PLUS = 0x0D,
+ SXE2_SFP_TYPE_QSFP28 = 0x11,
+
+ SXE2_SFP_TYPE_MAX = 0xFF,
+};
+
+#ifndef SFP_DEFINE
+#define SFP_DEFINE
+
+#define SXE2_SFP_EEP_WR 0x1
+#define SXE2_SFP_EEP_QSFP 0x1
+
+enum sfp_bus_addr {
+ SXE2_SFP_EEP_I2C_ADDR0 = 0xA0,
+ SXE2_SFP_EEP_I2C_ADDR1 = 0xA2,
+ SXE2_SFP_EEP_I2C_ADDR_NR = 0xFFFF,
+};
+
+struct sxe2_sfp_req {
+ uint8_t is_wr;
+ uint8_t is_qsfp;
+ uint16_t bus_addr;
+ uint16_t page_cnt;
+ uint16_t offset;
+ uint16_t data_len;
+ uint16_t rvd;
+ uint8_t data[];
+};
+
+struct sxe2_sfp_resp {
+ uint8_t is_wr;
+ uint8_t is_qsfp;
+ uint16_t data_len;
+ uint8_t data[];
+};
+
+enum sfp_page_cnt {
+ SXE2_SFP_EEP_PAGE_CNT0 = 0,
+ SXE2_SFP_EEP_PAGE_CNT1,
+ SXE2_SFP_EEP_PAGE_CNT2,
+ SXE2_SFP_EEP_PAGE_CNT3,
+ SXE2_SFP_EEP_PAGE_CNT20 = 20,
+ SXE2_SFP_EEP_PAGE_CNT21 = 21,
+
+ SXE2_SFP_EEP_PAGE_CNT_NR = 0xFFFF,
+};
+
+#define SXE2_SFP_E2P_I2C_7BIT_ADDR0 (SXE2_SFP_EEP_I2C_ADDR0 >> 1)
+#define SXE2_SFP_E2P_I2C_7BIT_ADDR1 (SXE2_SFP_EEP_I2C_ADDR1 >> 1)
+
+#define SXE2_QSFP_PAGE_OFST_START 128
+#define SXE2_SFP_EEP_OFST_MAX 255
+#define SXE2_SFP_EEP_LEN_MAX 256
+#endif
+
+#ifndef FW_STATE_DEFINE
+#define FW_STATE_DEFINE
+
+#define SXE2_FW_STATUS_MAIN_SHIF (16)
+#define SXE2_FW_STATUS_MAIN_MASK (0xFF0000)
+#define SXE2_FW_STATUS_SUB_MASK (0xFFFF)
+
+enum Sxe2FwStateMain {
+ SXE2_FW_STATE_MAIN_UNDEFINED = 0x00,
+ SXE2_FW_STATE_MAIN_INIT = 0x10000,
+ SXE2_FW_STATE_MAIN_RUN = 0x20000,
+ SXE2_FW_STATE_MAIN_ABNOMAL = 0x30000,
+};
+
+enum Sxe2FwState {
+ SXE2_FW_START_STATE_UNDEFINED = SXE2_FW_STATE_MAIN_UNDEFINED,
+ SXE2_FW_START_STATE_INIT_BASE = (SXE2_FW_STATE_MAIN_INIT + 0x1),
+ SXE2_FW_START_STATE_SCAN_DEVICE = (SXE2_FW_STATE_MAIN_INIT + 0x20),
+ SXE2_FW_START_STATE_FINISHED = (SXE2_FW_STATE_MAIN_RUN + 0x0),
+ SXE2_FW_START_STATE_UPGRADE = (SXE2_FW_STATE_MAIN_RUN + 0x1),
+ SXE2_FW_START_STATE_SYNC = (SXE2_FW_STATE_MAIN_RUN + 0x2),
+ SXE2_FW_RUNNING_STATE_ABNOMAL = (SXE2_FW_STATE_MAIN_ABNOMAL + 0x1),
+ SXE2_FW_RUNNING_STATE_ABNOMAL_CORE1 = (SXE2_FW_STATE_MAIN_ABNOMAL + 0x2),
+ SXE2_FW_RUNNING_STATE_ABNOMAL_HEART = (SXE2_FW_STATE_MAIN_ABNOMAL + 0x3),
+ SXE2_FW_START_STATE_MASK = (SXE2_FW_STATUS_MAIN_MASK | SXE2_FW_STATUS_SUB_MASK),
+};
+#endif
+
+#ifndef LED_DEFINE
+#define LED_DEFINE
+
+enum sxe2_led_mode {
+ SXE2_IDENTIFY_LED_BLINK_ON = 0,
+ SXE2_IDENTIFY_LED_BLINK_OFF,
+ SXE2_IDENTIFY_LED_ON,
+ SXE2_IDENTIFY_LED_OFF,
+ SXE2_IDENTIFY_LED_RESET,
+};
+
+
+typedef struct sxe2_led_ctrl {
+ uint32_t mode;
+ uint32_t duration;
+} sxe2_led_ctrl_s;
+
+typedef struct sxe2_led_ctrl_resp {
+ uint32_t ack;
+} sxe2_led_ctrl_resp_s;
+#endif
+
+#endif /* __SXE2_MSG_H__ */
--
2.52.0
^ permalink raw reply related
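`struct sxe2_sfp_req` in `sxe2_msg.h` ends with a flexible array member, so a request is allocated as the fixed header plus its payload. The sketch below makes several assumptions that the patch does not state:

- The payload is only needed for writes.
- `page_cnt` carries the EEPROM page number, as the `SXE2_SFP_EEP_PAGE_CNT*` values suggest.
- `is_qsfp` can be left at zero.
- Any little-endian conversion the command channel may need is omitted.

`sfp_req_alloc()` is a hypothetical helper.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Layout copied from sxe2_msg.h. */
struct sxe2_sfp_req {
	uint8_t is_wr;
	uint8_t is_qsfp;
	uint16_t bus_addr;
	uint16_t page_cnt;
	uint16_t offset;
	uint16_t data_len;
	uint16_t rvd;
	uint8_t data[];
};

#define SXE2_SFP_EEP_LEN_MAX 256

/*
 * Hypothetical helper: build a read (wr_data == NULL) or write request
 * for len bytes at offset. Returns NULL if len is zero, if the range
 * runs past the 256-byte EEPROM window, or if allocation fails.
 */
static struct sxe2_sfp_req *
sfp_req_alloc(uint16_t bus_addr, uint16_t page, uint16_t offset,
	      uint16_t len, const uint8_t *wr_data)
{
	size_t payload = wr_data ? len : 0;
	struct sxe2_sfp_req *req;

	if (len == 0 || (uint32_t)offset + len > SXE2_SFP_EEP_LEN_MAX)
		return NULL;

	req = calloc(1, sizeof(*req) + payload);
	if (req == NULL)
		return NULL;

	req->is_wr = wr_data != NULL;
	req->bus_addr = bus_addr;
	req->page_cnt = page;
	req->offset = offset;
	req->data_len = len;
	if (wr_data != NULL)
		memcpy(req->data, wr_data, len);
	return req;
}
```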
* [PATCH v1 07/20] net/sxe2: support IPsec inline protocol offload
From: liujie5 @ 2026-06-10 1:39 UTC (permalink / raw)
To: stephen; +Cc: dev, Jie Liu
In-Reply-To: <20260610013936.3634968-1-liujie5@linkdatatechnology.com>
From: Jie Liu <liujie5@linkdatatechnology.com>
This patch adds support for IPsec inline protocol offload for both
inbound and outbound traffic.
- Implement rte_security_ops: session_create, session_destroy.
- Add hardware SA table management.
- Update Rx/Tx data path to handle security offload flags.
The hardware performs ESP encapsulation/decapsulation and the
associated cryptographic processing.
Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
---
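The transform chains accepted by `sxe2_ipsec_valid_xform()` can be summarized as a small predicate. This is a simplified model, not the driver code. The enums stand in for the `rte_crypto_sym_xform` types, and the checks on per-xform operation, key size, and algorithm pairing are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the rte_security / rte_crypto types. */
enum sa_dir { SA_EGRESS, SA_INGRESS };
enum xf_type { XF_CIPHER, XF_AUTH };

struct xform {
	enum xf_type type;
	const struct xform *next;
};

/*
 * Chains accepted by the patch:
 *   egress:  cipher(encrypt) [-> auth(generate)]
 *   ingress: cipher(decrypt)            (next is not inspected)
 *   ingress: auth(verify) -> cipher(decrypt)
 */
static bool xform_chain_ok(enum sa_dir dir, const struct xform *x)
{
	if (x == NULL)
		return false;

	if (x->type == XF_CIPHER) {
		if (dir == SA_INGRESS)
			return true;
		/* Egress: an optional second xform must be auth. */
		return x->next == NULL || x->next->type == XF_AUTH;
	}

	/* Auth first is only valid on ingress, followed by a cipher. */
	return dir == SA_INGRESS && x->next != NULL &&
	       x->next->type == XF_CIPHER;
}
```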
drivers/net/sxe2/meson.build | 2 +
drivers/net/sxe2/sxe2_cmd_chnl.c | 197 ++++
drivers/net/sxe2/sxe2_cmd_chnl.h | 20 +
drivers/net/sxe2/sxe2_drv_cmd.h | 61 ++
drivers/net/sxe2/sxe2_ethdev.c | 14 +
drivers/net/sxe2/sxe2_ethdev.h | 3 +
drivers/net/sxe2/sxe2_ipsec.c | 1565 +++++++++++++++++++++++++++++
drivers/net/sxe2/sxe2_ipsec.h | 254 +++++
drivers/net/sxe2/sxe2_rx.c | 5 +
drivers/net/sxe2/sxe2_security.c | 335 ++++++
drivers/net/sxe2/sxe2_security.h | 77 ++
drivers/net/sxe2/sxe2_tx.c | 8 +
drivers/net/sxe2/sxe2_txrx_poll.c | 55 +
13 files changed, 2596 insertions(+)
create mode 100644 drivers/net/sxe2/sxe2_ipsec.c
create mode 100644 drivers/net/sxe2/sxe2_ipsec.h
create mode 100644 drivers/net/sxe2/sxe2_security.c
create mode 100644 drivers/net/sxe2/sxe2_security.h
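`sxe2_security_valid_key()` checks key lengths against three rules:

- The length must fall inside the capability's `[min, max]` range.
- It must be reachable from `min` in steps of `increment`. Any in-range length passes when `increment` is 0.
- It must not exceed an extra absolute cap.

The standalone sketch below applies the same rules. The 32-byte cap is an assumption: `SXE2_IPSEC_MAX_KEY_LEN` is not shown in this excerpt, and 32 matches `SXE2_IPSEC_KEY_LEN`. Note also that the argument order differs from the driver function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cap; the driver uses SXE2_IPSEC_MAX_KEY_LEN (not shown). */
#define MAX_KEY_LEN 32

/* Same rules as sxe2_security_valid_key(), argument order changed. */
static bool key_len_valid(uint16_t len, uint16_t min, uint16_t max,
			  uint16_t increment)
{
	if (len > MAX_KEY_LEN || len < min || len > max)
		return false;
	if (increment == 0)
		return true;
	return (uint16_t)(len - min) % increment == 0;
}
```

Take an AES-CBC-style capability with min 16, max 32, and increment 8. Lengths 16, 24, and 32 pass, while 20 is rejected.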
diff --git a/drivers/net/sxe2/meson.build b/drivers/net/sxe2/meson.build
index f03ea15356..86973edc99 100644
--- a/drivers/net/sxe2/meson.build
+++ b/drivers/net/sxe2/meson.build
@@ -64,4 +64,6 @@ sources += files(
'sxe2_filter.c',
'sxe2_rss.c',
'sxe2_tm.c',
+ 'sxe2_ipsec.c',
+ 'sxe2_security.c',
)
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.c b/drivers/net/sxe2/sxe2_cmd_chnl.c
index 19323ffcc4..7711e8e57d 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.c
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.c
@@ -877,3 +877,200 @@ int32_t sxe2_drv_tm_commit(struct sxe2_adapter *adapter)
l_end:
return ret;
}
+
+
+int32_t sxe2_drv_ipsec_get_capa(struct sxe2_adapter *adapter)
+{
+ int32_t ret = -1;
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_drv_ipsec_capa_resq resp = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_CAP_GET,
+ NULL, 0,
+ &resp, sizeof(resp));
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to get ipsec capabilities, ret=%d", ret);
+ goto l_end;
+ }
+
+ adapter->security_ctx.ipsec_ctx.max_tx_sa = rte_le_to_cpu_16(resp.tx_sa_cnt);
+ adapter->security_ctx.ipsec_ctx.max_rx_sa = rte_le_to_cpu_16(resp.rx_sa_cnt);
+ adapter->security_ctx.ipsec_ctx.max_tcam = rte_le_to_cpu_16(resp.ip_id_cnt);
+ adapter->security_ctx.ipsec_ctx.max_udp_group = rte_le_to_cpu_16(resp.udp_group_cnt);
+
+ PMD_DEV_LOG_INFO(adapter, DRV, "Max tx sa:%u, max rx sa:%u, max tcam:%u, udp group:%u.",
+ rte_le_to_cpu_16(resp.tx_sa_cnt),
+ rte_le_to_cpu_16(resp.rx_sa_cnt),
+ rte_le_to_cpu_16(resp.ip_id_cnt),
+ rte_le_to_cpu_16(resp.udp_group_cnt));
+
+l_end:
+ return ret;
+}
+
+int32_t sxe2_drv_ipsec_resource_clear(struct sxe2_adapter *adapter)
+{
+ int32_t ret = -1;
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_RESOURCE_CLEAR,
+ NULL, 0,
+ NULL, 0);
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to clear ipsec resource, ret=%d", ret);
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+int32_t sxe2_drv_ipsec_txsa_add(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_tx_sa *tx_sa)
+{
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_drv_ipsec_txsa_add_req req = { 0 };
+ struct sxe2_drv_ipsec_txsa_add_resp resp = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+ int32_t ret = -1;
+ uint32_t mode = 0;
+ uint32_t i = 0;
+
+ if (tx_sa->algo == SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC)
+ mode |= IPSEC_TX_ENGINE_SM4;
+ if (tx_sa->mode == SXE2_IPSEC_MODE_ENC_AND_AUTH)
+ mode |= IPSEC_TX_ENCRYPT;
+ req.mode = rte_cpu_to_le_32(mode);
+ for (i = 0; i < SXE2_IPSEC_KEY_LEN; i++) {
+ req.encrypt_keys[i] = tx_sa->enc_key[i];
+ req.auth_keys[i] = tx_sa->auth_key[i];
+ }
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_TXSA_ADD,
+ &req, sizeof(req),
+ &resp, sizeof(resp));
+
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to add tx sa, ret=%d", ret);
+ goto l_end;
+ }
+ tx_sa->hw_sa_id = rte_le_to_cpu_16(resp.index);
+
+l_end:
+ return ret;
+}
+
+int32_t sxe2_drv_ipsec_rxsa_add(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_rx_sa *rx_sa,
+ struct sxe2_ipsec_rx_tcam *rx_tcam,
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group)
+{
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_drv_ipsec_rxsa_add_req req = { 0 };
+ struct sxe2_drv_ipsec_rxsa_add_resp resp = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+ int32_t ret = -1;
+ uint32_t mode = 0;
+ uint32_t i = 0;
+
+ if (rx_sa->algo == SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC)
+ mode |= IPSEC_RX_ENGINE_SM4;
+ if (rx_sa->mode == SXE2_IPSEC_MODE_ENC_AND_AUTH)
+ mode |= IPSEC_RX_DECRYPT;
+ if (rx_tcam->ip_addr.type == RTE_SECURITY_IPSEC_TUNNEL_IPV6) {
+ mode |= IPSEC_RX_IPV6;
+ memcpy(req.ipaddr, rx_tcam->ip_addr.dst_ipv6, sizeof(req.ipaddr));
+ } else {
+ req.ipaddr[0] = rx_tcam->ip_addr.dst_ipv4;
+ }
+ req.mode = rte_cpu_to_le_32(mode);
+ req.spi = rte_cpu_to_le_32(rx_sa->spi);
+ if (rx_udp_group != NULL) {
+ req.udp_port = rte_cpu_to_le_32((uint32_t)rx_udp_group->udp_port);
+ req.sport_en = rx_udp_group->sport_en;
+ req.dport_en = rx_udp_group->dport_en;
+ }
+
+ PMD_DEV_LOG_INFO(adapter, DRV, "Add rx sa, mode: 0x%x, spi: 0x%x, udp_port: %u, "
+ "sport_en: %u, dport_en: %u.",
+ req.mode, req.spi, req.udp_port, req.sport_en, req.dport_en);
+
+ /* encrypt and auth keys */
+ for (i = 0; i < SXE2_IPSEC_KEY_LEN; i++) {
+ req.encrypt_keys[i] = rx_sa->enc_key[i];
+ req.auth_keys[i] = rx_sa->auth_key[i];
+ }
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_RXSA_ADD,
+ &req, sizeof(req),
+ &resp, sizeof(resp));
+
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to add rx sa, ret=%d", ret);
+ goto l_end;
+ }
+ rx_sa->hw_sa_id = rte_le_to_cpu_16(resp.sa_idx);
+ rx_sa->hw_ip_id = resp.ip_id;
+ rx_tcam->hw_ip_id = resp.ip_id;
+ rx_sa->hw_udp_group_id = resp.udp_group_id;
+ if (rx_udp_group != NULL)
+ rx_udp_group->hw_group_id = resp.udp_group_id;
+
+l_end:
+ return ret;
+}
+
+int32_t sxe2_drv_ipsec_rxsa_delete(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_rx_sa *rx_sa)
+{
+ struct sxe2_drv_ipsec_rxsa_del_req req = { 0 };
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+ int32_t ret = -1;
+
+ req.sa_idx = rte_cpu_to_le_16(rx_sa->hw_sa_id);
+ req.spi = rte_cpu_to_le_32(rx_sa->spi);
+ req.ip_id = rx_sa->hw_ip_id;
+ req.group_id = rx_sa->hw_udp_group_id;
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_RXSA_DEL,
+ &req, sizeof(req),
+ NULL, 0);
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret)
+ PMD_DEV_LOG_ERR(adapter, DRV,
+ "Failed to delete rx sa, sa id: %u, spi: %u, "
+ "ip id: %u, udp group id: %u, ret: %d.",
+ rx_sa->hw_sa_id, rx_sa->spi, rx_sa->hw_ip_id,
+ rx_sa->hw_udp_group_id, ret);
+
+ return ret;
+}
+
+int32_t sxe2_drv_ipsec_txsa_delete(struct sxe2_adapter *adapter,
+ uint16_t sa_id)
+{
+ struct sxe2_drv_ipsec_txsa_del_req req = { 0 };
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+ int32_t ret = -1;
+
+ req.sa_idx = rte_cpu_to_le_16(sa_id);
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_TXSA_DEL,
+ &req, sizeof(req),
+ NULL, 0);
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret)
+ PMD_DEV_LOG_ERR(adapter, DRV,
+ "Failed to delete tx sa, sa id: %u, ret: %d.",
+ sa_id, ret);
+
+ return ret;
+}
+
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.h b/drivers/net/sxe2/sxe2_cmd_chnl.h
index 77e689abcd..dac487fe7d 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.h
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.h
@@ -44,6 +44,26 @@ int32_t sxe2_drv_root_tree_alloc(struct rte_eth_dev *dev);
int32_t sxe2_drv_tm_commit(struct sxe2_adapter *adapter);
+int32_t sxe2_drv_ipsec_resource_clear(struct sxe2_adapter *adapter);
+
+int32_t sxe2_drv_ipsec_get_capa(struct sxe2_adapter *adapter);
+
+int32_t sxe2_drv_ipsec_rxsa_add(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_rx_sa *rx_sa,
+ struct sxe2_ipsec_rx_tcam *rx_tcam,
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group);
+
+int32_t sxe2_drv_ipsec_txsa_add(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_tx_sa *tx_sa);
+
+int32_t sxe2_drv_ipsec_rxsa_delete(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_rx_sa *rx_sa);
+
+int32_t sxe2_drv_ipsec_txsa_delete(struct sxe2_adapter *adapter,
+ uint16_t sa_id);
+
+int32_t sxe2_drv_promisc_config(struct sxe2_adapter *adapter, bool set);
+
int32_t sxe2_drv_allmulti_config(struct sxe2_adapter *adapter, bool set);
int32_t sxe2_drv_uc_config(struct sxe2_adapter *adapter, struct rte_ether_addr *addr, bool add);
diff --git a/drivers/net/sxe2/sxe2_drv_cmd.h b/drivers/net/sxe2/sxe2_drv_cmd.h
index 67c6885cae..39a108d76a 100644
--- a/drivers/net/sxe2/sxe2_drv_cmd.h
+++ b/drivers/net/sxe2/sxe2_drv_cmd.h
@@ -375,6 +375,67 @@ struct __rte_aligned(4) __rte_packed_begin sxe2_tm_add_queue_msg {
struct sxe2_tm_info info;
} __rte_packed_end;
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_capa_resq {
+ uint16_t tx_sa_cnt;
+ uint16_t rx_sa_cnt;
+ uint16_t ip_id_cnt;
+ uint16_t udp_group_cnt;
+} __rte_packed_end;
+
+#define SXE2_IPSEC_KEY_LEN (32)
+#define SXE2_IPV6_ADDR_LEN (4)
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_txsa_add_req {
+ uint32_t mode;
+ uint8_t encrypt_keys[SXE2_IPSEC_KEY_LEN];
+ uint8_t auth_keys[SXE2_IPSEC_KEY_LEN];
+ bool func_type;
+ uint8_t func_id;
+ uint8_t drv_id;
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_txsa_add_resp {
+ uint16_t index;
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_rxsa_add_req {
+ uint32_t mode;
+ uint32_t spi;
+ uint32_t ipaddr[SXE2_IPV6_ADDR_LEN];
+ uint32_t udp_port;
+ uint8_t sport_en;
+ uint8_t dport_en;
+ uint8_t is_over_sdn;
+ uint8_t sdn_group_id;
+ uint8_t encrypt_keys[SXE2_IPSEC_KEY_LEN];
+ uint8_t auth_keys[SXE2_IPSEC_KEY_LEN];
+ bool func_type;
+ uint8_t func_id;
+ uint8_t drv_id;
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_rxsa_add_resp {
+ uint8_t ip_id;
+ uint8_t udp_group_id;
+ uint16_t sa_idx;
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_txsa_del_req {
+ uint16_t sa_idx;
+ bool func_type;
+ uint8_t func_id;
+ uint8_t drv_id;
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_rxsa_del_req {
+ uint8_t ip_id;
+ uint8_t group_id;
+ uint16_t sa_idx;
+ uint32_t spi;
+ bool func_type;
+ uint8_t func_id;
+ uint8_t drv_id;
+} __rte_packed_end;
+
enum sxe2_drv_cmd_module {
SXE2_DRV_CMD_MODULE_HANDSHAKE = 0,
SXE2_DRV_CMD_MODULE_DEV = 1,
diff --git a/drivers/net/sxe2/sxe2_ethdev.c b/drivers/net/sxe2/sxe2_ethdev.c
index f98cf367f1..9c9d98782b 100644
--- a/drivers/net/sxe2/sxe2_ethdev.c
+++ b/drivers/net/sxe2/sxe2_ethdev.c
@@ -298,6 +298,11 @@ static int32_t sxe2_dev_infos_get(struct rte_eth_dev *dev,
if (adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_PTP)
dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+ if (sxe2_ipsec_supported(adapter)) {
+ dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_SECURITY;
+ dev_info->tx_offload_capa |= RTE_ETH_TX_OFFLOAD_SECURITY;
+ }
+
if (adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_RSS) {
dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
dev_info->flow_type_rss_offloads |= SXE2_RSS_HF_SUPPORT_ALL;
@@ -1053,6 +1058,12 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
goto init_eth_err;
}
+ ret = sxe2_security_init(dev);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to initialize security, ret=%d", ret);
+ goto init_security_err;
+ }
+
ret = sxe2_rss_disable(dev);
if (ret) {
PMD_LOG_ERR(INIT, "Failed to disable rss, ret=%d", ret);
@@ -1067,6 +1078,8 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
goto l_end;
+init_security_err:
+ sxe2_eth_uinit(dev);
init_sched_err:
init_rss_err:
init_eth_err:
@@ -1085,6 +1098,7 @@ static int32_t sxe2_dev_close(struct rte_eth_dev *dev)
(void)sxe2_rss_disable(dev);
(void)sxe2_sched_uinit(dev);
sxe2_vsi_uninit(dev);
+ sxe2_security_uinit(dev);
sxe2_dev_pci_map_uinit(dev);
sxe2_eth_uinit(dev);
diff --git a/drivers/net/sxe2/sxe2_ethdev.h b/drivers/net/sxe2/sxe2_ethdev.h
index 95594fcbde..fed8ce37d9 100644
--- a/drivers/net/sxe2/sxe2_ethdev.h
+++ b/drivers/net/sxe2/sxe2_ethdev.h
@@ -20,6 +20,8 @@
#include "sxe2_queue.h"
#include "sxe2_mac.h"
#include "sxe2_osal.h"
+#include "sxe2_security.h"
+#include "sxe2_ipsec.h"
#include "sxe2_tm.h"
#include "sxe2_filter.h"
@@ -313,6 +315,7 @@ struct sxe2_adapter {
struct sxe2_sched_hw_cap sched_ctxt;
struct sxe2_tm_context tm_ctxt;
struct sxe2_devargs devargs;
+ struct sxe2_security_ctx security_ctx;
struct sxe2_switchdev_info switchdev_info;
bool rule_started;
bool flow_isolated;
diff --git a/drivers/net/sxe2/sxe2_ipsec.c b/drivers/net/sxe2/sxe2_ipsec.c
new file mode 100644
index 0000000000..e783a51b85
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_ipsec.c
@@ -0,0 +1,1565 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#include <rte_malloc.h>
+#include <rte_bitmap.h>
+
+#include "sxe2_ethdev.h"
+#include "sxe2_security.h"
+#include "sxe2_ipsec.h"
+#include "sxe2_cmd_chnl.h"
+#include "sxe2_common_log.h"
+
+bool sxe2_ipsec_supported(struct sxe2_adapter *adapter)
+{
+ uint64_t cap = adapter->cap_flags;
+
+ return !!(cap & SXE2_DEV_CAPS_OFFLOAD_IPSEC);
+}
+
+bool sxe2_ipsec_valid_tx_offloads(uint64_t offloads)
+{
+ bool ret = true;
+ uint64_t tso_features = 0;
+ uint64_t cksum_features = 0;
+
+ if (offloads & RTE_ETH_TX_OFFLOAD_SECURITY) {
+ tso_features = RTE_ETH_TX_OFFLOAD_TCP_TSO |
+ RTE_ETH_TX_OFFLOAD_UDP_TSO |
+ RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO |
+ RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO |
+ RTE_ETH_TX_OFFLOAD_IPIP_TNL_TSO |
+ RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO;
+ if (offloads & tso_features) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with TSO offload.");
+ ret = false;
+ goto l_end;
+ }
+
+ cksum_features = RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
+ RTE_ETH_TX_OFFLOAD_UDP_CKSUM |
+ RTE_ETH_TX_OFFLOAD_TCP_CKSUM |
+ RTE_ETH_TX_OFFLOAD_SCTP_CKSUM |
+ RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM |
+ RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM;
+ if (offloads & cksum_features) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with checksum offload.");
+ ret = false;
+ goto l_end;
+ }
+
+ if (offloads & (RTE_ETH_TX_OFFLOAD_VLAN_INSERT | RTE_ETH_TX_OFFLOAD_QINQ_INSERT)) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with vlan offload.");
+ ret = false;
+ goto l_end;
+ }
+ }
+
+l_end:
+ return ret;
+}
+
+bool sxe2_ipsec_valid_rx_offloads(uint64_t offloads)
+{
+ bool ret = true;
+
+ if (offloads & RTE_ETH_RX_OFFLOAD_SECURITY) {
+ if (offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with LRO offload.");
+ ret = false;
+ goto l_end;
+ }
+
+ if (offloads & RTE_ETH_RX_OFFLOAD_CHECKSUM) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with checksum offload.");
+ ret = false;
+ goto l_end;
+ }
+
+ if (offloads & RTE_ETH_RX_OFFLOAD_KEEP_CRC) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with keep CRC offload.");
+ ret = false;
+ goto l_end;
+ }
+
+ if (offloads & RTE_ETH_RX_OFFLOAD_VLAN) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with vlan offload.");
+ ret = false;
+ goto l_end;
+ }
+ }
+
+l_end:
+ return ret;
+}
+
+static int32_t sxe2_ipsec_bitmap_mem_init(struct rte_bitmap **d_bmp, void **d_mem, uint32_t bits)
+{
+ struct rte_bitmap *bmp = NULL;
+ uint32_t bmp_size = 0;
+ void *mem = NULL;
+ int32_t ret = -1;
+
+ bmp_size = rte_bitmap_get_memory_footprint(bits);
+
+ mem = rte_zmalloc("ipsec bitmap", bmp_size, RTE_CACHE_LINE_SIZE);
+ if (mem == NULL) {
+ PMD_LOG_ERR(DRV, "Alloc ipsec bitmap memory failed.");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ bmp = rte_bitmap_init(bits, mem, bmp_size);
+ if (bmp == NULL) {
+ PMD_LOG_ERR(DRV, "Failed to init ipsec bitmap.");
+ rte_free(mem);
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ *d_bmp = bmp;
+ *d_mem = mem;
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t sxe2_ipsec_bitmap_init(struct sxe2_security_ctx *sxe2_sctx)
+{
+ int32_t ret = -1;
+
+ ret = sxe2_ipsec_bitmap_mem_init(&sxe2_sctx->ipsec_ctx.bmp.tx_sa_bmp,
+ &sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem, sxe2_sctx->ipsec_ctx.max_tx_sa);
+ if (ret)
+ goto l_end;
+
+ ret = sxe2_ipsec_bitmap_mem_init(&sxe2_sctx->ipsec_ctx.bmp.rx_sa_bmp,
+ &sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem, sxe2_sctx->ipsec_ctx.max_rx_sa);
+ if (ret) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem = NULL;
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_bitmap_mem_init(&sxe2_sctx->ipsec_ctx.bmp.rx_tcam_bmp,
+ &sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem, sxe2_sctx->ipsec_ctx.max_tcam);
+ if (ret) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem);
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem = NULL;
+ sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem = NULL;
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_bitmap_mem_init(&sxe2_sctx->ipsec_ctx.bmp.rx_udp_bmp,
+ &sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem, sxe2_sctx->ipsec_ctx.max_udp_group);
+ if (ret) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem);
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem);
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem);
+ sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem = NULL;
+ sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem = NULL;
+ sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem = NULL;
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+static uint16_t sxe2_ipsec_id_alloc(struct rte_bitmap *bmp, uint16_t bits)
+{
+ uint16_t i = 0;
+ uint16_t index = 0xFFFF;
+
+ for (i = 0; i < bits; i++) {
+ if (!rte_bitmap_get(bmp, i)) {
+ index = i;
+ rte_bitmap_set(bmp, i);
+ break;
+ }
+ }
+
+ return index;
+}
+
+static void sxe2_ipsec_id_free(struct rte_bitmap *bmp, uint16_t pos)
+{
+ rte_bitmap_clear(bmp, pos);
+}
+
+static struct rte_cryptodev_symmetric_capability *
+sxe2_ipsec_cipher_cap_get(struct rte_cryptodev_capabilities *crypto_cap,
+ enum rte_crypto_cipher_algorithm algo)
+{
+ struct rte_cryptodev_symmetric_capability *capability = NULL;
+ uint8_t index = 0;
+
+ for (index = 0; index < SXE2_IPSEC_CAP_MAX; index++) {
+ if (crypto_cap[index].sym.xform_type == RTE_CRYPTO_SYM_XFORM_CIPHER &&
+ crypto_cap[index].sym.cipher.algo == algo) {
+ capability = &crypto_cap[index].sym;
+ goto l_end;
+ }
+ }
+
+l_end:
+ return capability;
+}
+
+static struct rte_cryptodev_symmetric_capability *
+sxe2_ipsec_auth_cap_get(struct rte_cryptodev_capabilities *crypto_cap,
+ enum rte_crypto_auth_algorithm algo)
+{
+ struct rte_cryptodev_symmetric_capability *capability = NULL;
+ uint8_t index = 0;
+
+ for (index = 0; index < SXE2_IPSEC_CAP_MAX; index++) {
+ if (crypto_cap[index].sym.xform_type == RTE_CRYPTO_SYM_XFORM_AUTH &&
+ crypto_cap[index].sym.auth.algo == algo) {
+ capability = &crypto_cap[index].sym;
+ goto l_end;
+ }
+ }
+
+l_end:
+ return capability;
+}
+
+static bool sxe2_security_valid_key(uint16_t src_key, uint16_t max_key,
+ uint16_t min_key, uint16_t increment)
+{
+ bool is_valid = false;
+
+ if (src_key > SXE2_IPSEC_MAX_KEY_LEN) {
+ is_valid = false;
+ goto l_end;
+ }
+
+ if (src_key < min_key || src_key > max_key) {
+ is_valid = false;
+ goto l_end;
+ }
+
+ if (increment == 0) {
+ is_valid = true;
+ goto l_end;
+ }
+
+ if ((uint16_t)(src_key - min_key) % increment) {
+ is_valid = false;
+ goto l_end;
+ }
+
+ is_valid = true;
+
+l_end:
+ return is_valid;
+}
+
+static int32_t
+sxe2_ipsec_valid_cipher(enum rte_crypto_cipher_operation cipher_op,
+ struct rte_cryptodev_capabilities *crypto_cap,
+ struct rte_crypto_sym_xform *xform)
+{
+ const struct rte_cryptodev_symmetric_capability *capability = NULL;
+ uint16_t src_key = 0;
+ uint16_t max_key = 0;
+ uint16_t min_key = 0;
+ uint16_t increment = 0;
+ int32_t ret = -1;
+
+ if (xform->cipher.op != cipher_op) {
+ PMD_LOG_ERR(DRV, "Invalid cipher direction specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ capability = sxe2_ipsec_cipher_cap_get(crypto_cap, xform->cipher.algo);
+ if (!capability) {
+ PMD_LOG_ERR(DRV, "Invalid cipher algo specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ src_key = xform->cipher.key.length;
+ min_key = capability->cipher.key_size.min;
+ max_key = capability->cipher.key_size.max;
+ increment = capability->cipher.key_size.increment;
+ if (!sxe2_security_valid_key(src_key, max_key, min_key, increment)) {
+ PMD_LOG_ERR(DRV, "Invalid cipher key size specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_valid_auth(enum rte_crypto_auth_operation auth_op,
+ struct rte_cryptodev_capabilities *crypto_cap,
+ struct rte_crypto_sym_xform *xform)
+{
+ const struct rte_cryptodev_symmetric_capability *capability = NULL;
+ uint16_t src_key = 0;
+ uint16_t max_key = 0;
+ uint16_t min_key = 0;
+ uint16_t increment = 0;
+ int32_t ret = -1;
+
+ if (xform->auth.op != auth_op) {
+ PMD_LOG_ERR(DRV, "Invalid auth direction specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ capability = sxe2_ipsec_auth_cap_get(crypto_cap, xform->auth.algo);
+ if (!capability) {
+ PMD_LOG_ERR(DRV, "Invalid auth algo specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ src_key = xform->auth.key.length;
+ min_key = capability->auth.key_size.min;
+ max_key = capability->auth.key_size.max;
+ increment = capability->auth.key_size.increment;
+ if (!sxe2_security_valid_key(src_key, max_key, min_key, increment)) {
+ PMD_LOG_ERR(DRV, "Invalid auth key size specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static bool
+sxe2_ipsec_valid_algo(enum rte_crypto_auth_algorithm auth_algo,
+ enum rte_crypto_cipher_algorithm cipher_algo)
+{
+ bool ret = false;
+
+ if ((cipher_algo == SXE2_RTE_CRYPTO_CIPHER_AES_CBC &&
+ auth_algo == SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC) ||
+ (cipher_algo == SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC &&
+ auth_algo == SXE2_RTE_CRYPTO_AUTH_SM3_HMAC)) {
+ ret = true;
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+static enum sxe2_ipsec_algorithm
+sxe2_ipsec_algo_gen(enum rte_crypto_cipher_algorithm cipher_algo)
+{
+ enum sxe2_ipsec_algorithm algo = SXE2_IPSEC_ALGO_INVALID;
+
+ if (cipher_algo == SXE2_RTE_CRYPTO_CIPHER_AES_CBC)
+ algo = SXE2_IPSEC_ALGO_AES_CBC_AND_SHA256_128_HMAC;
+ else if (cipher_algo == SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC)
+ algo = SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC;
+
+ return algo;
+}
+
+static int32_t
+sxe2_ipsec_valid_xform(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf)
+{
+ struct rte_crypto_sym_xform *xform = NULL;
+ struct rte_cryptodev_capabilities *crypto_cap =
+ sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC].crypto_capabilities;
+ enum rte_crypto_auth_algorithm auth_algo = RTE_CRYPTO_AUTH_NULL;
+ enum rte_crypto_cipher_algorithm cipher_algo = RTE_CRYPTO_CIPHER_NULL;
+ int32_t ret = -1;
+
+ if (conf->ipsec.direction == RTE_SECURITY_IPSEC_SA_DIR_EGRESS &&
+ conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+ xform = conf->crypto_xform;
+ cipher_algo = xform->cipher.algo;
+ ret = sxe2_ipsec_valid_cipher(RTE_CRYPTO_CIPHER_OP_ENCRYPT,
+ crypto_cap, xform);
+ if (ret)
+ goto l_end;
+
+ if (conf->crypto_xform->next) {
+ if (conf->crypto_xform->next->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
+ auth_algo = conf->crypto_xform->next->auth.algo;
+ if (!sxe2_ipsec_valid_algo(auth_algo, cipher_algo)) {
+ PMD_LOG_ERR(DRV, "Invalid algo group.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+ xform = conf->crypto_xform->next;
+ ret = sxe2_ipsec_valid_auth(RTE_CRYPTO_AUTH_OP_GENERATE,
+ crypto_cap, xform);
+ if (ret)
+ goto l_end;
+ } else {
+ PMD_LOG_ERR(DRV, "Egress direction: the xform following the cipher must be an auth xform.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+ }
+ } else if (conf->ipsec.direction == RTE_SECURITY_IPSEC_SA_DIR_INGRESS &&
+ conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+ xform = conf->crypto_xform;
+ ret = sxe2_ipsec_valid_cipher(RTE_CRYPTO_CIPHER_OP_DECRYPT,
+ crypto_cap, xform);
+ if (ret)
+ goto l_end;
+
+ } else if (conf->ipsec.direction == RTE_SECURITY_IPSEC_SA_DIR_INGRESS &&
+ conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
+ xform = conf->crypto_xform;
+ ret = sxe2_ipsec_valid_auth(RTE_CRYPTO_AUTH_OP_VERIFY, crypto_cap, xform);
+ if (ret)
+ goto l_end;
+
+ if (conf->crypto_xform->next &&
+ conf->crypto_xform->next->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+ auth_algo = conf->crypto_xform->auth.algo;
+ cipher_algo = conf->crypto_xform->next->cipher.algo;
+ if (!sxe2_ipsec_valid_algo(auth_algo, cipher_algo)) {
+ PMD_LOG_ERR(DRV, "Invalid algo group.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+ xform = conf->crypto_xform->next;
+ ret = sxe2_ipsec_valid_cipher(RTE_CRYPTO_CIPHER_OP_DECRYPT,
+ crypto_cap, xform);
+ if (ret)
+ goto l_end;
+ } else {
+ PMD_LOG_ERR(DRV, "Ingress direction: an auth xform must be followed by a cipher (decrypt) xform.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+ } else {
+ PMD_LOG_ERR(DRV, "Encrypt/decrypt xform invalid.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_valid_udp(struct rte_security_session_conf *conf)
+{
+ int32_t ret = -1;
+ uint16_t sport = conf->ipsec.udp.sport;
+ uint16_t dport = conf->ipsec.udp.dport;
+
+ if (conf->ipsec.options.udp_encap == 0) {
+ ret = 0;
+ goto l_end;
+ }
+
+ if (sport == 0 && dport == 0) {
+ PMD_LOG_ERR(DRV, "Invalid UDP port: sport and dport cannot both be zero.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (sport != 0 && dport != 0 && sport != dport) {
+ PMD_LOG_ERR(DRV, "Invalid UDP port: when sport and dport are both non-zero, they must be equal.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_session_conf_valid(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf)
+{
+ int32_t ret = -1;
+
+ if (sxe2_sctx == NULL) {
+ PMD_LOG_ERR(DRV, "Invalid security ctx.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->action_type !=
+ sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC].action) {
+ PMD_LOG_ERR(DRV, "Invalid action specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->ipsec.mode !=
+ sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC].ipsec.mode) {
+ PMD_LOG_ERR(DRV, "Invalid IPsec mode specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->ipsec.proto !=
+ sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC].ipsec.proto) {
+ PMD_LOG_ERR(DRV, "Invalid IPsec protocol specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->ipsec.options.esn) {
+ PMD_LOG_ERR(DRV, "ESN is not supported.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->ipsec.direction == RTE_SECURITY_IPSEC_SA_DIR_INGRESS &&
+ conf->ipsec.spi == 0) {
+ PMD_LOG_ERR(DRV, "SPI cannot be zero for an ingress SA.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->crypto_xform == NULL) {
+ PMD_LOG_ERR(DRV, "Invalid ipsec xform specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_valid_udp(conf);
+ if (ret)
+ goto l_end;
+
+ ret = sxe2_ipsec_valid_xform(sxe2_sctx, conf);
+ if (ret)
+ goto l_end;
+
+l_end:
+ return ret;
+}
+
+static void
+sxe2_ipsec_session_save(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess, uint16_t sa_id, uint16_t index)
+{
+ enum rte_crypto_cipher_algorithm cipher_algo = RTE_CRYPTO_CIPHER_NULL;
+
+ sxe2_sess->adapter = sxe2_sctx->adapter;
+ sxe2_sess->direction = conf->ipsec.direction;
+ sxe2_sess->protocol = conf->protocol;
+ sxe2_sess->mode = conf->ipsec.mode;
+ sxe2_sess->sa_proto = conf->ipsec.proto;
+ sxe2_sess->sa.spi = conf->ipsec.spi;
+ sxe2_sess->sa.hw_idx = sa_id;
+ sxe2_sess->sa.sw_idx = index;
+
+ if (conf->ipsec.options.esn) {
+ sxe2_sess->esn.enabled = true;
+ sxe2_sess->esn.value = conf->ipsec.esn.value;
+ }
+
+ if (sxe2_sess->mode == RTE_SECURITY_IPSEC_SA_MODE_TUNNEL)
+ sxe2_sess->type = conf->ipsec.tunnel.type;
+
+ if (conf->ipsec.options.udp_encap) {
+ sxe2_sess->udp_cap.enabled = true;
+ memcpy(&sxe2_sess->udp_cap.value, &conf->ipsec.udp,
+ sizeof(struct rte_security_ipsec_udp_param));
+ }
+
+ sxe2_sess->pkt_metadata_template.sa_idx = sa_id;
+ sxe2_sess->pkt_metadata_template.ol_flags |= SXE2_IPSEC_OL_FLAGS_IS_TUN;
+ sxe2_sess->pkt_metadata_template.ol_flags |= SXE2_IPSEC_OL_FLAGS_IS_ESP;
+
+ if (conf->ipsec.direction == RTE_SECURITY_IPSEC_SA_DIR_EGRESS &&
+ conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+ cipher_algo = conf->crypto_xform->cipher.algo;
+ sxe2_sess->pkt_metadata_template.algo = sxe2_ipsec_algo_gen(cipher_algo);
+ if (conf->crypto_xform->next)
+ sxe2_sess->pkt_metadata_template.mode = SXE2_IPSEC_MODE_ENC_AND_AUTH;
+ else
+ sxe2_sess->pkt_metadata_template.mode = SXE2_IPSEC_MODE_ONLY_ENCRYPT;
+ }
+
+ PMD_LOG_INFO(DRV,
+ "Save security info to session ctx, sa_id:%u, spi:%u, mode:%u, algo:%u",
+ sa_id, sxe2_sess->sa.spi,
+ sxe2_sess->pkt_metadata_template.mode,
+ sxe2_sess->pkt_metadata_template.algo);
+}
+
+static void
+sxe2_ipsec_tx_sa_fill(struct sxe2_ipsec_tx_sa *tx_sa,
+ struct rte_security_session_conf *conf)
+{
+ uint8_t *dst = NULL;
+ uint8_t len = 0;
+
+ memcpy(&tx_sa->xform, &conf->ipsec, sizeof(struct rte_security_ipsec_xform));
+
+ if (conf->crypto_xform->next)
+ tx_sa->mode = SXE2_IPSEC_MODE_ENC_AND_AUTH;
+ else
+ tx_sa->mode = SXE2_IPSEC_MODE_ONLY_ENCRYPT;
+
+ if (conf->crypto_xform->cipher.algo == SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC)
+ tx_sa->algo = SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC;
+ else
+ tx_sa->algo = SXE2_IPSEC_ALGO_AES_CBC_AND_SHA256_128_HMAC;
+
+ dst = tx_sa->enc_key;
+ len = conf->crypto_xform->cipher.key.length;
+ memcpy(dst, conf->crypto_xform->cipher.key.data, len);
+
+ if (conf->crypto_xform->next) {
+ dst = tx_sa->auth_key;
+ len = conf->crypto_xform->next->auth.key.length;
+ memcpy(dst, conf->crypto_xform->next->auth.key.data, len);
+ }
+}
+
+static int32_t
+sxe2_ipsec_tx_sa_add(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess)
+{
+ struct sxe2_ipsec_tx_sa *tx_sa = NULL;
+ struct rte_bitmap *bmp = sxe2_sctx->ipsec_ctx.bmp.tx_sa_bmp;
+ uint16_t bits = sxe2_sctx->ipsec_ctx.max_tx_sa;
+ uint16_t index = 0xFFFF;
+ int32_t ret = -1;
+
+ rte_spinlock_lock(&sxe2_sctx->security_lock);
+ index = sxe2_ipsec_id_alloc(bmp, bits);
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+ if (index == 0xFFFF) {
+ PMD_LOG_ERR(DRV, "Failed to allocate ipsec tx sa index.");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ tx_sa = &sxe2_sctx->ipsec_ctx.tx_sa[index];
+
+ sxe2_ipsec_tx_sa_fill(tx_sa, conf);
+
+ ret = sxe2_drv_ipsec_txsa_add(sxe2_sctx->adapter, tx_sa);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to add tx sa.");
+ ret = -EIO;
+ rte_spinlock_lock(&sxe2_sctx->security_lock);
+ sxe2_ipsec_id_free(bmp, index);
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+ goto l_end;
+ }
+
+ sxe2_ipsec_session_save(sxe2_sctx, conf, sxe2_sess, tx_sa->hw_sa_id, tx_sa->id);
+
+ PMD_LOG_INFO(DRV, "Add tx sa success, tx sa id: %u, index: %u.",
+ tx_sa->hw_sa_id, tx_sa->id);
+
+l_end:
+ return ret;
+}
+
+static uint16_t
+sxe2_ipsec_tcam_id_find(struct sxe2_ipsec_rx_tcam *rx_tcam,
+ struct rte_security_ipsec_tunnel_param tunnel, uint16_t len)
+{
+ struct sxe2_ipsec_rx_tcam *per = NULL;
+ uint16_t tcam_id = 0XFFFF;
+ uint16_t i = 0;
+
+ for (i = 0; i < len; i++) {
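+ per = &rx_tcam[i];
+ /* Freed entries keep their stale address; never match them. */
+ if (per->ref_cnt == 0)
+ continue;
+ if (per->ip_addr.type == tunnel.type) {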
+ if (tunnel.type == RTE_SECURITY_IPSEC_TUNNEL_IPV4 &&
+ per->ip_addr.dst_ipv4 == (uint32_t)tunnel.ipv4.dst_ip.s_addr) {
+ tcam_id = i;
+ goto l_end;
+ }
+ if (tunnel.type == RTE_SECURITY_IPSEC_TUNNEL_IPV6) {
+ if (!memcmp(&tunnel.ipv6.dst_addr, per->ip_addr.dst_ipv6,
+ sizeof(per->ip_addr.dst_ipv6))) {
+ tcam_id = i;
+ goto l_end;
+ }
+ }
+ }
+ }
+
+l_end:
+ return tcam_id;
+}
+
+static uint16_t
+sxe2_ipsec_group_id_find(struct sxe2_ipsec_rx_udp_group *rx_udp_group,
+ uint16_t udp_port, uint8_t sport_en, uint8_t dport_en, uint16_t len)
+{
+ struct sxe2_ipsec_rx_udp_group *per = NULL;
+ uint16_t group_id = 0XFFFF;
+ uint16_t i;
+
+ for (i = 0; i < len; i++) {
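+ per = &rx_udp_group[i];
+ /* Skip free entries: zeroed or stale fields must not match. */
+ if (per->ref_cnt == 0)
+ continue;
+ if (per->udp_port == udp_port && per->sport_en == sport_en &&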
+ per->dport_en == dport_en) {
+ group_id = i;
+ goto l_end;
+ }
+ }
+
+l_end:
+ return group_id;
+}
+
+static void
+sxe2_ipsec_rx_sa_fill(struct sxe2_ipsec_rx_sa *rx_sa,
+ struct rte_security_session_conf *conf)
+{
+ const struct rte_crypto_sym_xform *cipher = NULL;
+ const struct rte_crypto_sym_xform *auth = NULL;
+
+ memcpy(&rx_sa->xform, &conf->ipsec, sizeof(struct rte_security_ipsec_xform));
+
+ if (conf->crypto_xform->next)
+ rx_sa->mode = SXE2_IPSEC_MODE_ENC_AND_AUTH;
+ else
+ rx_sa->mode = SXE2_IPSEC_MODE_ONLY_ENCRYPT;
+
+ if (conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+ if (conf->crypto_xform->cipher.algo == SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC)
+ rx_sa->algo = SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC;
+ else
+ rx_sa->algo = SXE2_IPSEC_ALGO_AES_CBC_AND_SHA256_128_HMAC;
+ } else {
+ if (conf->crypto_xform->auth.algo == SXE2_RTE_CRYPTO_AUTH_SM3_HMAC)
+ rx_sa->algo = SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC;
+ else
+ rx_sa->algo = SXE2_IPSEC_ALGO_AES_CBC_AND_SHA256_128_HMAC;
+ }
+
+ /* The chain may be AUTH->CIPHER or CIPHER->AUTH; locate each xform by type. */
+ if (conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
+ auth = conf->crypto_xform;
+ cipher = conf->crypto_xform->next;
+ } else {
+ cipher = conf->crypto_xform;
+ auth = conf->crypto_xform->next;
+ }
+
+ if (auth != NULL)
+ memcpy(rx_sa->auth_key, auth->auth.key.data, auth->auth.key.length);
+ if (cipher != NULL)
+ memcpy(rx_sa->enc_key, cipher->cipher.key.data, cipher->cipher.key.length);
+
+ rx_sa->spi = conf->ipsec.spi;
+}
+
+static int32_t
+sxe2_ipsec_rx_tcam_fill(struct sxe2_security_ctx *sxe2_sctx, uint16_t *tcam_id,
+ struct rte_security_session_conf *conf)
+{
+ int32_t ret = -1;
+ uint16_t len = sxe2_sctx->ipsec_ctx.max_tcam;
+ struct sxe2_ipsec_rx_tcam *rx_tcam = NULL;
+
+ *tcam_id = sxe2_ipsec_tcam_id_find(sxe2_sctx->ipsec_ctx.rx_tcam,
+ conf->ipsec.tunnel, len);
+ if (*tcam_id == 0XFFFF) {
+ *tcam_id = sxe2_ipsec_id_alloc(sxe2_sctx->ipsec_ctx.bmp.rx_tcam_bmp, len);
+ if (*tcam_id == 0xFFFF) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ rx_tcam = &sxe2_sctx->ipsec_ctx.rx_tcam[*tcam_id];
+
+ rx_tcam->ip_addr.type = conf->ipsec.tunnel.type;
+ if (rx_tcam->ip_addr.type == RTE_SECURITY_IPSEC_TUNNEL_IPV4) {
+ rx_tcam->ip_addr.dst_ipv4 = (uint32_t)conf->ipsec.tunnel.ipv4.dst_ip.s_addr;
+ } else {
+ memcpy(&rx_tcam->ip_addr.dst_ipv6, &conf->ipsec.tunnel.ipv6.dst_addr,
+ sizeof(rx_tcam->ip_addr.dst_ipv6));
+ }
+ } else {
+ rx_tcam = &sxe2_sctx->ipsec_ctx.rx_tcam[*tcam_id];
+ }
+ rx_tcam->ref_cnt++;
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_rx_udp_group_fill(struct sxe2_security_ctx *sxe2_sctx, uint16_t *udp_group_id,
+ struct rte_security_session_conf *conf)
+{
+ int32_t ret = -1;
+ uint16_t len = sxe2_sctx->ipsec_ctx.max_udp_group;
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group = NULL;
+ uint8_t sport_en = 0;
+ uint8_t dport_en = 0;
+ uint16_t udp_port = 0;
+
+ if (!conf->ipsec.options.udp_encap) {
+ ret = 0;
+ goto l_end;
+ }
+
+ if (conf->ipsec.udp.sport) {
+ sport_en = 1;
+ udp_port = conf->ipsec.udp.sport;
+ } else {
+ sport_en = 0;
+ }
+ if (conf->ipsec.udp.dport) {
+ dport_en = 1;
+ udp_port = conf->ipsec.udp.dport;
+ } else {
+ dport_en = 0;
+ }
+
+ *udp_group_id = sxe2_ipsec_group_id_find(sxe2_sctx->ipsec_ctx.rx_udp_group,
+ udp_port, sport_en, dport_en, len);
+ if (*udp_group_id == 0XFFFF) {
+ *udp_group_id = sxe2_ipsec_id_alloc(sxe2_sctx->ipsec_ctx.bmp.rx_udp_bmp, len);
+ if (*udp_group_id == 0xFFFF) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ rx_udp_group = &sxe2_sctx->ipsec_ctx.rx_udp_group[*udp_group_id];
+ rx_udp_group->sport_en = sport_en;
+ rx_udp_group->dport_en = dport_en;
+ rx_udp_group->udp_port = udp_port;
+ } else {
+ rx_udp_group = &sxe2_sctx->ipsec_ctx.rx_udp_group[*udp_group_id];
+ }
+ rx_udp_group->ref_cnt++;
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_rx_sa_add(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess)
+{
+ struct sxe2_ipsec_rx_tcam *rx_tcam = NULL;
+ struct sxe2_ipsec_rx_sa *rx_sa = NULL;
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group = NULL;
+ struct rte_bitmap *rx_sa_bmp = sxe2_sctx->ipsec_ctx.bmp.rx_sa_bmp;
+ struct rte_bitmap *rx_tcam_bmp = sxe2_sctx->ipsec_ctx.bmp.rx_tcam_bmp;
+ uint16_t sa_bits = sxe2_sctx->ipsec_ctx.max_rx_sa;
+ uint16_t sa_id = 0xFFFF;
+ uint16_t tcam_id = 0xFFFF;
+ uint16_t udp_group_id = 0xFFFF;
+ int32_t ret = -1;
+
+ rte_spinlock_lock(&sxe2_sctx->security_lock);
+ sa_id = sxe2_ipsec_id_alloc(rx_sa_bmp, sa_bits);
+ if (sa_id == 0xFFFF) {
+ PMD_LOG_ERR(DRV, "Failed to allocate ipsec rx sa index.");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ rx_sa = &sxe2_sctx->ipsec_ctx.rx_sa[sa_id];
+ sxe2_ipsec_rx_sa_fill(rx_sa, conf);
+
+ ret = sxe2_ipsec_rx_tcam_fill(sxe2_sctx, &tcam_id, conf);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to allocate ipsec rx tcam index.");
+ sxe2_ipsec_id_free(rx_sa_bmp, sa_id);
+ goto l_end;
+ }
+ rx_sa->tcam_id = tcam_id;
+ rx_tcam = &sxe2_sctx->ipsec_ctx.rx_tcam[tcam_id];
+
+ ret = sxe2_ipsec_rx_udp_group_fill(sxe2_sctx, &udp_group_id, conf);
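+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to allocate ipsec rx udp group index.");
+ sxe2_ipsec_id_free(rx_sa_bmp, sa_id);
+ /* The TCAM entry may be shared with other SAs; drop only our reference. */
+ rx_tcam->ref_cnt--;
+ if (rx_tcam->ref_cnt == 0)
+ sxe2_ipsec_id_free(rx_tcam_bmp, tcam_id);
+ goto l_end;
+ }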
+
+ if (udp_group_id != 0XFFFF) {
+ rx_sa->udp_group_id = (uint8_t)udp_group_id;
+ rx_udp_group = &sxe2_sctx->ipsec_ctx.rx_udp_group[udp_group_id];
+ } else {
+ rx_sa->udp_group_id = 0XFF;
+ }
+
+ ret = sxe2_drv_ipsec_rxsa_add(sxe2_sctx->adapter, rx_sa, rx_tcam, rx_udp_group);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to add rx sa.");
+ sxe2_ipsec_id_free(rx_sa_bmp, sa_id);
+ rx_tcam->ref_cnt--;
+ if (rx_tcam->ref_cnt == 0)
+ sxe2_ipsec_id_free(rx_tcam_bmp, tcam_id);
+
+ if (rx_udp_group != NULL) {
+ rx_udp_group->ref_cnt--;
+ if (rx_udp_group->ref_cnt == 0)
+ sxe2_ipsec_id_free(sxe2_sctx->ipsec_ctx.bmp.rx_udp_bmp,
+ udp_group_id);
+ }
+
+ ret = -EIO;
+ goto l_end;
+ }
+
+ sxe2_ipsec_session_save(sxe2_sctx, conf, sxe2_sess, rx_sa->hw_sa_id, rx_sa->id);
+
+ PMD_LOG_INFO(DRV, "Add rx sa success, rx sa id: %u, rx ip id: %u, group id: %u, index: %u.",
+ rx_sa->hw_sa_id, rx_sa->hw_ip_id, rx_sa->udp_group_id, rx_sa->id);
+
+l_end:
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_hw_table_add(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess)
+{
+ int32_t ret = -1;
+
+ switch (conf->ipsec.direction) {
+ case RTE_SECURITY_IPSEC_SA_DIR_EGRESS:
+ ret = sxe2_ipsec_tx_sa_add(sxe2_sctx, conf, sxe2_sess);
+ break;
+ case RTE_SECURITY_IPSEC_SA_DIR_INGRESS:
+ ret = sxe2_ipsec_rx_sa_add(sxe2_sctx, conf, sxe2_sess);
+ break;
+ default:
+ PMD_LOG_ERR(DRV, "Invalid sa direction.");
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+int sxe2_ipsec_session_create(void *device,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess)
+{
+ struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(eth_dev);
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ int32_t ret = -1;
+
+ ret = sxe2_ipsec_session_conf_valid(sxe2_sctx, conf);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Input ipsec session conf invalid.");
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_hw_table_add(sxe2_sctx, conf, sxe2_sess);
+ if (ret)
+ goto l_end;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_tx_sa_delete(struct sxe2_security_ctx *sxe2_sctx,
+ struct sxe2_security_session *sxe2_sess)
+{
+ struct sxe2_ipsec_tx_sa *tx_sa = NULL;
+ uint16_t sa_id = sxe2_sess->sa.hw_idx;
+ uint16_t sw_sa_id = sxe2_sess->sa.sw_idx;
+ int32_t ret = -1;
+
+ if (sw_sa_id >= sxe2_sctx->ipsec_ctx.max_tx_sa) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "invalid sw sa id: %u.", sw_sa_id);
+ goto l_end;
+ }
+
+ if (!rte_bitmap_get(sxe2_sctx->ipsec_ctx.bmp.tx_sa_bmp, sw_sa_id)) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "bitmap not set, index: %u.", sw_sa_id);
+ goto l_end;
+ }
+
+ tx_sa = &sxe2_sctx->ipsec_ctx.tx_sa[sw_sa_id];
+
+ if (tx_sa->hw_sa_id != sa_id) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "invalid hw sa id: %u != %u.", sa_id, tx_sa->hw_sa_id);
+ goto l_end;
+ }
+
+ ret = sxe2_drv_ipsec_txsa_delete(sxe2_sctx->adapter, sa_id);
+ if (ret)
+ goto l_end;
+
+ rte_spinlock_lock(&sxe2_sctx->security_lock);
+ sxe2_ipsec_id_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_bmp, sw_sa_id);
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_rx_sa_delete(struct sxe2_security_ctx *sxe2_sctx,
+ struct sxe2_security_session *sxe2_sess)
+{
+ struct sxe2_ipsec_rx_udp_group *rx_udp = NULL;
+ struct sxe2_ipsec_rx_tcam *rx_tcam = NULL;
+ struct sxe2_ipsec_rx_sa *rx_sa = NULL;
+ uint16_t sa_id = sxe2_sess->sa.hw_idx;
+ uint16_t sw_sa_id = sxe2_sess->sa.sw_idx;
+ int32_t ret = -1;
+
+ if (sw_sa_id >= sxe2_sctx->ipsec_ctx.max_rx_sa) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "invalid sw sa id: %u.", sw_sa_id);
+ goto l_end;
+ }
+
+ if (!rte_bitmap_get(sxe2_sctx->ipsec_ctx.bmp.rx_sa_bmp, sw_sa_id)) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "bitmap not set, index: %u.", sw_sa_id);
+ goto l_end;
+ }
+
+ rx_sa = &sxe2_sctx->ipsec_ctx.rx_sa[sw_sa_id];
+
+ if (rx_sa->hw_sa_id != sa_id) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "invalid hw sa id: %u != %u.", sa_id, rx_sa->hw_sa_id);
+ goto l_end;
+ }
+
+ ret = sxe2_drv_ipsec_rxsa_delete(sxe2_sctx->adapter, rx_sa);
+ if (ret)
+ goto l_end;
+
+ rte_spinlock_lock(&sxe2_sctx->security_lock);
+ sxe2_ipsec_id_free(sxe2_sctx->ipsec_ctx.bmp.rx_sa_bmp, sw_sa_id);
+
+ rx_tcam = &sxe2_sctx->ipsec_ctx.rx_tcam[rx_sa->tcam_id];
+ rx_tcam->ref_cnt--;
+ if (rx_tcam->ref_cnt == 0)
+ sxe2_ipsec_id_free(sxe2_sctx->ipsec_ctx.bmp.rx_tcam_bmp, rx_sa->tcam_id);
+
+ if (rx_sa->udp_group_id == 0xFF) {
+ PMD_LOG_INFO(DRV, "No UDP group resource to release.");
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+ goto l_end;
+ }
+ rx_udp = &sxe2_sctx->ipsec_ctx.rx_udp_group[rx_sa->udp_group_id];
+ rx_udp->ref_cnt--;
+ if (rx_udp->ref_cnt == 0)
+ sxe2_ipsec_id_free(sxe2_sctx->ipsec_ctx.bmp.rx_udp_bmp, rx_sa->udp_group_id);
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_hw_table_delete(struct sxe2_security_ctx *sxe2_sctx,
+ struct sxe2_security_session *sxe2_sess)
+{
+ int32_t ret = -1;
+
+ switch (sxe2_sess->direction) {
+ case RTE_SECURITY_IPSEC_SA_DIR_EGRESS:
+ ret = sxe2_ipsec_tx_sa_delete(sxe2_sctx, sxe2_sess);
+ break;
+ case RTE_SECURITY_IPSEC_SA_DIR_INGRESS:
+ ret = sxe2_ipsec_rx_sa_delete(sxe2_sctx, sxe2_sess);
+ break;
+ default:
+ PMD_LOG_ERR(DRV, "Invalid sa direction.");
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+int sxe2_ipsec_session_destroy(void *device, struct rte_security_session *session)
+{
+ struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(eth_dev);
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ struct sxe2_security_session *sxe2_sess = SECURITY_GET_SESS_PRIV(session);
+ int32_t ret = -1;
+
+ if (unlikely(sxe2_sess == NULL || sxe2_sess->adapter != adapter)) {
+ PMD_LOG_ERR(DRV, "Invalid device adapter.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_hw_table_delete(sxe2_sctx, sxe2_sess);
+ if (ret) {
+ ret = -EIO;
+ PMD_LOG_ERR(DRV, "Failed to delete ipsec hw tables.");
+ goto l_end;
+ }
+
+ PMD_LOG_INFO(DRV, "Delete ipsec session success, sa_id: %u, spi: %u.",
+ sxe2_sess->sa.hw_idx, sxe2_sess->sa.spi);
+
+ memset(sxe2_sess, 0, sizeof(struct sxe2_security_session));
+
+l_end:
+ return ret;
+}
+
+int sxe2_ipsec_pkt_metadata_set(void *device, struct rte_security_session *session,
+ struct rte_mbuf *m, void *params)
+{
+ struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(eth_dev);
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ struct sxe2_security_session *sxe2_sess = NULL;
+ struct sxe2_ipsec_pkt_metadata *md = NULL;
+ uint16_t offset = 0;
+ int32_t ret = -1;
+
+ sxe2_sess = SECURITY_GET_SESS_PRIV(session);
+ if (unlikely(sxe2_sess == NULL || sxe2_sess->adapter != adapter || params == NULL)) {
+ PMD_LOG_ERR(DRV, "Invalid parameters.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ offset = ((struct sxe2_ipsec_metadata_params *)params)->esp_header_offset;
+ if (offset <= IPSEC_ESP_OFFSET_MIN || offset >= IPSEC_ESP_OFFSET_MAX) {
+ PMD_LOG_ERR(DRV, "Invalid esp header offset.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ md = RTE_MBUF_DYNFIELD(m, sxe2_sctx->ipsec_ctx.md_offset, struct sxe2_ipsec_pkt_metadata *);
+
+ memcpy(md, &sxe2_sess->pkt_metadata_template, sizeof(struct sxe2_ipsec_pkt_metadata));
+ md->esp_head_offset = offset;
+
+ PMD_LOG_INFO(DRV, "ipsec metadata set, offset:%u, sa_idx:%u, mode:%u, algo:%u.", offset,
+ sxe2_sess->pkt_metadata_template.sa_idx, sxe2_sess->pkt_metadata_template.mode,
+ sxe2_sess->pkt_metadata_template.algo);
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+int sxe2_ipsec_pkt_md_offset_get(struct sxe2_adapter *adapter)
+{
+ return adapter->security_ctx.ipsec_ctx.md_offset;
+}
+
+static void sxe2_ipsec_enc_aes_cbc_fill(struct rte_cryptodev_capabilities *cap)
+{
+ cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+
+ cap->sym.cipher.algo = SXE2_RTE_CRYPTO_CIPHER_AES_CBC;
+
+ cap->sym.cipher.block_size = SXE2_SECURITY_BLOCK_SIZE_16;
+
+ cap->sym.cipher.key_size.min = SXE2_IPSEC_AES_KEY_MIN;
+ cap->sym.cipher.key_size.max = SXE2_IPSEC_AES_KEY_MAX;
+ cap->sym.cipher.key_size.increment = SXE2_IPSEC_AES_KEY_INC;
+
+ cap->sym.cipher.iv_size.min = SXE2_IPSEC_AES_IV_MIN;
+ cap->sym.cipher.iv_size.max = SXE2_IPSEC_AES_IV_MAX;
+ cap->sym.cipher.iv_size.increment = SXE2_IPSEC_AES_IV_INC;
+
+ cap->sym.cipher.dataunit_set |= RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES;
+}
+
+static void sxe2_ipsec_enc_sm4_cbc_fill(struct rte_cryptodev_capabilities *cap)
+{
+ cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+
+ cap->sym.cipher.algo = SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC;
+
+ cap->sym.cipher.block_size = SXE2_SECURITY_BLOCK_SIZE_16;
+
+ cap->sym.cipher.key_size.min = SXE2_IPSEC_SM4_KEY_MIN;
+ cap->sym.cipher.key_size.max = SXE2_IPSEC_SM4_KEY_MAX;
+ cap->sym.cipher.key_size.increment = SXE2_IPSEC_SM4_KEY_INC;
+
+ cap->sym.cipher.iv_size.min = SXE2_IPSEC_SM4_IV_MIN;
+ cap->sym.cipher.iv_size.max = SXE2_IPSEC_SM4_IV_MAX;
+ cap->sym.cipher.iv_size.increment = SXE2_IPSEC_SM4_IV_INC;
+
+ cap->sym.cipher.dataunit_set |= RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES;
+}
+
+static void sxe2_ipsec_auth_sha_hmac_fill(struct rte_cryptodev_capabilities *cap)
+{
+ cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_AUTH;
+
+ cap->sym.auth.algo = SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC;
+
+ cap->sym.auth.block_size = SXE2_SECURITY_BLOCK_SIZE_64;
+
+ cap->sym.auth.key_size.min = SXE2_IPSEC_SHA_KEY_MIN;
+ cap->sym.auth.key_size.max = SXE2_IPSEC_SHA_KEY_MAX;
+ cap->sym.auth.key_size.increment = SXE2_IPSEC_SHA_KEY_INC;
+
+ cap->sym.auth.iv_size.min = SXE2_IPSEC_SHA_IV_MIN;
+ cap->sym.auth.iv_size.max = SXE2_IPSEC_SHA_IV_MAX;
+ cap->sym.auth.iv_size.increment = SXE2_IPSEC_SHA_IV_INC;
+
+ cap->sym.auth.digest_size.min = SXE2_IPSEC_SHA_DIGEST_MIN;
+ cap->sym.auth.digest_size.max = SXE2_IPSEC_SHA_DIGEST_MAX;
+ cap->sym.auth.digest_size.increment = SXE2_IPSEC_SHA_DIGEST_INC;
+
+ cap->sym.auth.aad_size.min = SXE2_IPSEC_AAD_MIN;
+ cap->sym.auth.aad_size.max = SXE2_IPSEC_AAD_MAX;
+ cap->sym.auth.aad_size.increment = SXE2_IPSEC_AAD_INC;
+}
+
+static void sxe2_ipsec_auth_sm3_hmac_fill(struct rte_cryptodev_capabilities *cap)
+{
+ cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_AUTH;
+
+ cap->sym.auth.algo = SXE2_RTE_CRYPTO_AUTH_SM3_HMAC;
+
+ cap->sym.auth.block_size = SXE2_SECURITY_BLOCK_SIZE_64;
+
+ cap->sym.auth.key_size.min = SXE2_IPSEC_SM3_KEY_MIN;
+ cap->sym.auth.key_size.max = SXE2_IPSEC_SM3_KEY_MAX;
+ cap->sym.auth.key_size.increment = SXE2_IPSEC_SM3_KEY_INC;
+
+ cap->sym.auth.iv_size.min = SXE2_IPSEC_SM3_IV_MIN;
+ cap->sym.auth.iv_size.max = SXE2_IPSEC_SM3_IV_MAX;
+ cap->sym.auth.iv_size.increment = SXE2_IPSEC_SM3_IV_INC;
+
+ cap->sym.auth.digest_size.min = SXE2_IPSEC_SM3_DIGEST_MIN;
+ cap->sym.auth.digest_size.max = SXE2_IPSEC_SM3_DIGEST_MAX;
+ cap->sym.auth.digest_size.increment = SXE2_IPSEC_SM3_DIGEST_INC;
+
+ cap->sym.auth.aad_size.min = SXE2_IPSEC_AAD_MIN;
+ cap->sym.auth.aad_size.max = SXE2_IPSEC_AAD_MAX;
+ cap->sym.auth.aad_size.increment = SXE2_IPSEC_AAD_INC;
+}
+
+static int32_t
+sxe2_ipsec_capabilities_init(struct sxe2_security_ctx *sxe2_sctx)
+{
+ struct rte_cryptodev_capabilities *capabilities = NULL;
+ struct sxe2_security_capabilities *sxe2_cap =
+ &sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC];
+ int32_t ret = -1;
+ uint8_t index = 0;
+
+ sxe2_cap->action = RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO;
+ sxe2_cap->ipsec.proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP;
+ sxe2_cap->ipsec.mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL;
+ sxe2_cap->ipsec.options.stats = 1;
+
+ /* One extra zeroed entry terminates the list (op == RTE_CRYPTO_OP_TYPE_UNDEFINED). */
+ capabilities = rte_zmalloc("security_caps",
+ sizeof(struct rte_cryptodev_capabilities) * (SXE2_IPSEC_CAP_MAX + 1), 0);
+ if (capabilities == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ for (index = 0; index < SXE2_IPSEC_CAP_MAX; index++) {
+ capabilities[index].op = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
+ switch (index) {
+ case SXE2_IPSEC_CAP_ENC_AES_CBC:
+ sxe2_ipsec_enc_aes_cbc_fill(&capabilities[index]);
+ break;
+ case SXE2_IPSEC_CAP_ENC_SM4_CBC:
+ sxe2_ipsec_enc_sm4_cbc_fill(&capabilities[index]);
+ break;
+ case SXE2_IPSEC_CAP_AUTH_SHA256_HMAC:
+ sxe2_ipsec_auth_sha_hmac_fill(&capabilities[index]);
+ break;
+ case SXE2_IPSEC_CAP_AUTH_SM3_HMAC:
+ sxe2_ipsec_auth_sm3_hmac_fill(&capabilities[index]);
+ break;
+ default:
+ break;
+ }
+ }
+
+ sxe2_cap->crypto_capabilities = capabilities;
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static void
+sxe2_ipsec_tx_sa_init(struct sxe2_ipsec_tx_sa *tx_sa, uint16_t len)
+{
+ struct sxe2_ipsec_tx_sa *per = NULL;
+ uint16_t i;
+
+ memset(tx_sa, 0, sizeof(struct sxe2_ipsec_tx_sa) * len);
+ for (i = 0; i < len; i++) {
+ per = &tx_sa[i];
+ per->id = i;
+ }
+}
+
+static void
+sxe2_ipsec_rx_sa_init(struct sxe2_ipsec_rx_sa *rx_sa, uint16_t len)
+{
+ struct sxe2_ipsec_rx_sa *per = NULL;
+ uint16_t i;
+
+ memset(rx_sa, 0, sizeof(struct sxe2_ipsec_rx_sa) * len);
+ for (i = 0; i < len; i++) {
+ per = &rx_sa[i];
+ per->id = i;
+ }
+}
+
+static void
+sxe2_ipsec_rx_tcam_init(struct sxe2_ipsec_rx_tcam *rx_tcam, uint16_t len)
+{
+ struct sxe2_ipsec_rx_tcam *per = NULL;
+ uint16_t i;
+
+ memset(rx_tcam, 0, sizeof(struct sxe2_ipsec_rx_tcam) * len);
+ for (i = 0; i < len; i++) {
+ per = &rx_tcam[i];
+ per->id = i;
+ }
+}
+
+static void
+sxe2_ipsec_rx_udp_group_init(struct sxe2_ipsec_rx_udp_group *rx_udp_group, uint16_t len)
+{
+ struct sxe2_ipsec_rx_udp_group *per = NULL;
+ uint16_t i;
+
+ memset(rx_udp_group, 0, sizeof(struct sxe2_ipsec_rx_udp_group) * len);
+ for (i = 0; i < len; i++) {
+ per = &rx_udp_group[i];
+ per->id = i;
+ }
+}
+
+static int32_t
+sxe2_ipsec_hw_table_init(struct sxe2_security_ctx *sxe2_sctx)
+{
+ struct sxe2_ipsec_tx_sa *tx_sa = NULL;
+ struct sxe2_ipsec_rx_sa *rx_sa = NULL;
+ struct sxe2_ipsec_rx_tcam *rx_tcam = NULL;
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group = NULL;
+ uint16_t max_tx_sa = sxe2_sctx->ipsec_ctx.max_tx_sa;
+ uint16_t max_rx_sa = sxe2_sctx->ipsec_ctx.max_rx_sa;
+ uint16_t max_tcam = sxe2_sctx->ipsec_ctx.max_tcam;
+ uint16_t max_udp_group = sxe2_sctx->ipsec_ctx.max_udp_group;
+ int32_t ret = -1;
+
+ tx_sa = rte_zmalloc("sxe2_ipsec_tx_sa", sizeof(struct sxe2_ipsec_tx_sa) * max_tx_sa, 0);
+ if (tx_sa == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ sxe2_ipsec_tx_sa_init(tx_sa, max_tx_sa);
+ sxe2_sctx->ipsec_ctx.tx_sa = tx_sa;
+
+ rx_sa = rte_zmalloc("sxe2_ipsec_rx_sa", sizeof(struct sxe2_ipsec_rx_sa) * max_rx_sa, 0);
+ if (rx_sa == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ sxe2_ipsec_rx_sa_init(rx_sa, max_rx_sa);
+ sxe2_sctx->ipsec_ctx.rx_sa = rx_sa;
+
+ rx_tcam = rte_zmalloc("sxe2_ipsec_rx_tcam",
+ sizeof(struct sxe2_ipsec_rx_tcam) * max_tcam, 0);
+ if (rx_tcam == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ sxe2_ipsec_rx_tcam_init(rx_tcam, max_tcam);
+ sxe2_sctx->ipsec_ctx.rx_tcam = rx_tcam;
+
+ rx_udp_group = rte_zmalloc("sxe2_ipsec_rx_udp_group",
+ sizeof(struct sxe2_ipsec_rx_udp_group) * max_udp_group, 0);
+ if (rx_udp_group == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ sxe2_ipsec_rx_udp_group_init(rx_udp_group, max_udp_group);
+ sxe2_sctx->ipsec_ctx.rx_udp_group = rx_udp_group;
+
+ ret = 0;
+
+l_end:
+ if (ret) {
+ if (tx_sa != NULL) {
+ rte_free(tx_sa);
+ sxe2_sctx->ipsec_ctx.tx_sa = NULL;
+ }
+ if (rx_sa != NULL) {
+ rte_free(rx_sa);
+ sxe2_sctx->ipsec_ctx.rx_sa = NULL;
+ }
+ if (rx_tcam != NULL) {
+ rte_free(rx_tcam);
+ sxe2_sctx->ipsec_ctx.rx_tcam = NULL;
+ }
+ if (rx_udp_group != NULL) {
+ rte_free(rx_udp_group);
+ sxe2_sctx->ipsec_ctx.rx_udp_group = NULL;
+ }
+ }
+ return ret;
+}
+
+int32_t sxe2_ipsec_init(struct sxe2_adapter *adapter)
+{
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ struct sxe2_security_capabilities *sxe2_cap = NULL;
+ int32_t ret = -1;
+ struct rte_mbuf_dynfield pkt_md_dynfield = {
+ .name = "sxe2_ipsec_pkt_metadata",
+ .size = sizeof(struct sxe2_ipsec_pkt_metadata),
+ .align = alignof(struct sxe2_ipsec_pkt_metadata)
+ };
+
+ PMD_LOG_INFO(INIT, "Init ipsec.");
+
+ sxe2_sctx->ipsec_ctx.md_offset = rte_mbuf_dynfield_register(&pkt_md_dynfield);
+ if (sxe2_sctx->ipsec_ctx.md_offset < 0) {
+ PMD_LOG_ERR(INIT, "Failed to register ipsec mbuf dynamic field.");
+ ret = -EIO;
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_capabilities_init(sxe2_sctx);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to init ipsec capabilities.");
+ goto l_end;
+ }
+
+ ret = sxe2_drv_ipsec_get_capa(adapter);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to get ipsec capabilities.");
+ goto l_caps_free;
+ }
+
+ ret = sxe2_ipsec_bitmap_init(sxe2_sctx);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to init ipsec bitmap.");
+ goto l_caps_free;
+ }
+
+ ret = sxe2_ipsec_hw_table_init(sxe2_sctx);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to init ipsec hw table.");
+ goto l_bitmap_free;
+ }
+
+ goto l_end;
+
+l_bitmap_free:
+
+ if (sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem = NULL;
+ }
+l_caps_free:
+ sxe2_cap = &sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC];
+ if (sxe2_cap->crypto_capabilities != NULL) {
+ rte_free(sxe2_cap->crypto_capabilities);
+ sxe2_cap->crypto_capabilities = NULL;
+ }
+l_end:
+ return ret;
+}
+
+void sxe2_ipsec_uinit(struct sxe2_adapter *adapter)
+{
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ struct sxe2_security_capabilities *sxe2_cap =
+ &sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC];
+ struct sxe2_ipsec_tx_sa *tx_sa = sxe2_sctx->ipsec_ctx.tx_sa;
+ struct sxe2_ipsec_rx_sa *rx_sa = sxe2_sctx->ipsec_ctx.rx_sa;
+ struct sxe2_ipsec_rx_tcam *rx_tcam = sxe2_sctx->ipsec_ctx.rx_tcam;
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group = sxe2_sctx->ipsec_ctx.rx_udp_group;
+
+ PMD_LOG_INFO(INIT, "Uninit ipsec.");
+
+ (void)sxe2_drv_ipsec_resource_clear(adapter);
+
+ if (sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem = NULL;
+ }
+
+ if (tx_sa != NULL) {
+ rte_free(tx_sa);
+ sxe2_sctx->ipsec_ctx.tx_sa = NULL;
+ }
+ if (rx_sa != NULL) {
+ rte_free(rx_sa);
+ sxe2_sctx->ipsec_ctx.rx_sa = NULL;
+ }
+ if (rx_tcam != NULL) {
+ rte_free(rx_tcam);
+ sxe2_sctx->ipsec_ctx.rx_tcam = NULL;
+ }
+ if (rx_udp_group != NULL) {
+ rte_free(rx_udp_group);
+ sxe2_sctx->ipsec_ctx.rx_udp_group = NULL;
+ }
+
+ if (sxe2_cap->crypto_capabilities != NULL) {
+ rte_free(sxe2_cap->crypto_capabilities);
+ sxe2_cap->crypto_capabilities = NULL;
+ }
+}
diff --git a/drivers/net/sxe2/sxe2_ipsec.h b/drivers/net/sxe2/sxe2_ipsec.h
new file mode 100644
index 0000000000..02930ddb4f
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_ipsec.h
@@ -0,0 +1,254 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+#ifndef __SXE2_IPSEC_H__
+#define __SXE2_IPSEC_H__
+
+#include <rte_security.h>
+#include <rte_security_driver.h>
+
+struct sxe2_adapter;
+struct sxe2_security_session;
+
+#define SXE2_IPSEC_AES_KEY_MIN (32)
+#define SXE2_IPSEC_AES_KEY_MAX (32)
+#define SXE2_IPSEC_AES_KEY_INC (0)
+
+#define SXE2_IPSEC_SM4_KEY_MIN (16)
+#define SXE2_IPSEC_SM4_KEY_MAX (16)
+#define SXE2_IPSEC_SM4_KEY_INC (0)
+
+#define SXE2_IPSEC_SHA_KEY_MIN (32)
+#define SXE2_IPSEC_SHA_KEY_MAX (32)
+#define SXE2_IPSEC_SHA_KEY_INC (0)
+
+#define SXE2_IPSEC_SM3_KEY_MIN (32)
+#define SXE2_IPSEC_SM3_KEY_MAX (32)
+#define SXE2_IPSEC_SM3_KEY_INC (0)
+
+#define SXE2_IPSEC_AES_IV_MIN (16)
+#define SXE2_IPSEC_AES_IV_MAX (16)
+#define SXE2_IPSEC_AES_IV_INC (0)
+
+#define SXE2_IPSEC_SM4_IV_MIN (16)
+#define SXE2_IPSEC_SM4_IV_MAX (16)
+#define SXE2_IPSEC_SM4_IV_INC (0)
+
+#define SXE2_IPSEC_SHA_IV_MIN (0)
+#define SXE2_IPSEC_SHA_IV_MAX (32)
+#define SXE2_IPSEC_SHA_IV_INC (16)
+
+#define SXE2_IPSEC_SM3_IV_MIN (0)
+#define SXE2_IPSEC_SM3_IV_MAX (32)
+#define SXE2_IPSEC_SM3_IV_INC (16)
+
+#define SXE2_IPSEC_SHA_DIGEST_MIN (32)
+#define SXE2_IPSEC_SHA_DIGEST_MAX (32)
+#define SXE2_IPSEC_SHA_DIGEST_INC (0)
+
+#define SXE2_IPSEC_SM3_DIGEST_MIN (32)
+#define SXE2_IPSEC_SM3_DIGEST_MAX (32)
+#define SXE2_IPSEC_SM3_DIGEST_INC (0)
+
+#define SXE2_IPSEC_AAD_MIN (0)
+#define SXE2_IPSEC_AAD_MAX (0)
+#define SXE2_IPSEC_AAD_INC (0)
+
+#define SXE2_IPSEC_MAX_KEY_LEN (32)
+#define SXE2_IPSEC_MIN_KEY_LEN (0)
+
+#define SXE2_IPSEC_OL_FLAGS_IS_TUN (0x1 << 0)
+#define SXE2_IPSEC_OL_FLAGS_IS_ESP (0x1 << 1)
+
+#define SXE2_IPSEC_DEFAULT_SA_OFFSET (0)
+#define SXE2_IPSEC_DEFAULT_SA_LEN (1024)
+
+#define IPSEC_TX_ENCRYPT (RTE_BIT32(0))
+#define IPSEC_TX_ENGINE_SM4 (RTE_BIT32(1))
+
+#define IPSEC_RX_VALID (RTE_BIT32(0))
+#define IPSEC_RX_IPV6 (RTE_BIT32(2))
+#define IPSEC_RX_DECRYPT (RTE_BIT32(3))
+#define IPSEC_RX_ENGINE_SM4 (RTE_BIT32(4))
+
+#define IPSEC_IPV6_LEN (4)
+#define IPSEC_ESP_OFFSET_MIN (16)
+#define IPSEC_ESP_OFFSET_MAX (256)
+
+enum sxe2_ipsec_cap {
+ SXE2_IPSEC_CAP_ENC_AES_CBC = 0,
+ SXE2_IPSEC_CAP_ENC_SM4_CBC = 1,
+ SXE2_IPSEC_CAP_AUTH_SHA256_HMAC = 2,
+ SXE2_IPSEC_CAP_AUTH_SM3_HMAC = 3,
+ SXE2_IPSEC_CAP_MAX = 4,
+};
+
+enum sxe2_ipsec_icv_len {
+ SXE2_IPSEC_ICV_0_BYTES = 0,
+ SXE2_IPSEC_ICV_12_BYTES,
+ SXE2_IPSEC_ICV_16_BYTES,
+ SXE2_IPSEC_ICV_INVALID,
+};
+
+enum sxe2_ipsec_bypass_dir {
+ SXE2_IPSEC_BYPASS_DIR_RX = 0,
+ SXE2_IPSEC_BYPASS_DIR_TX,
+ SXE2_IPSEC_BYPASS_DIR_INVALID,
+};
+
+enum sxe2_ipsec_bypass_status {
+ SXE2_IPSEC_BYPASS_STATUS_DISABLE = 0,
+ SXE2_IPSEC_BYPASS_STATUS_ENABLE,
+ SXE2_IPSEC_BYPASS_STATUS_INVALID,
+};
+
+enum sxe2_ipsec_status {
+ SXE2_IPSEC_ENC_BYPASS = 0,
+ SXE2_IPSEC_ENC_ENABLE,
+ SXE2_IPSEC_ENC_INVALID,
+};
+
+enum sxe2_ipsec_mode {
+ SXE2_IPSEC_MODE_ENC_AND_AUTH = 0,
+ SXE2_IPSEC_MODE_ONLY_ENCRYPT,
+ SXE2_IPSEC_MODE_INVALID,
+};
+
+struct sxe2_ipsec_ip_param {
+ enum rte_security_ipsec_tunnel_type type;
+ union {
+ uint32_t dst_ipv4;
+ uint32_t dst_ipv6[IPSEC_IPV6_LEN];
+ };
+};
+
+enum sxe2_ipsec_algorithm {
+ SXE2_IPSEC_ALGO_AES_CBC_AND_SHA256_128_HMAC = 0,
+ SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC,
+ SXE2_IPSEC_ALGO_INVALID,
+};
+
+struct sxe2_ipsec_pkt_metadata {
+ uint16_t sa_idx;
+ uint16_t esp_head_offset;
+ uint8_t ol_flags;
+ uint8_t mode;
+ uint8_t algo;
+};
+
+struct sxe2_ipsec_bitmap {
+ struct rte_bitmap *tx_sa_bmp;
+ struct rte_bitmap *rx_sa_bmp;
+ struct rte_bitmap *rx_tcam_bmp;
+ struct rte_bitmap *rx_udp_bmp;
+ void *tx_sa_mem;
+ void *rx_sa_mem;
+ void *rx_tcam_mem;
+ void *rx_udp_mem;
+};
+
+struct sxe2_ipsec_security_sa {
+ uint32_t spi;
+ uint16_t hw_idx;
+ uint16_t sw_idx;
+};
+
+struct sxe2_ipsec_esn {
+ union {
+ uint64_t value;
+ struct {
+ uint32_t hi;
+ uint32_t low;
+ };
+ };
+ uint8_t enabled;
+};
+
+struct sxe2_ipsec_udp {
+ struct rte_security_ipsec_udp_param value;
+ uint8_t enabled;
+};
+
+struct sxe2_ipsec_tx_sa {
+ struct rte_security_ipsec_xform xform;
+ uint16_t id;
+ uint16_t hw_sa_id;
+ enum sxe2_ipsec_mode mode;
+ enum sxe2_ipsec_algorithm algo;
+ uint8_t enc_key[SXE2_IPSEC_MAX_KEY_LEN];
+ uint8_t auth_key[SXE2_IPSEC_MAX_KEY_LEN];
+};
+
+struct sxe2_ipsec_rx_sa {
+ struct rte_security_ipsec_xform xform;
+ uint32_t spi;
+ uint16_t id;
+ uint16_t hw_sa_id;
+ uint8_t hw_ip_id;
+ uint8_t hw_udp_group_id;
+ uint8_t tcam_id;
+ uint8_t udp_group_id;
+ uint8_t sdn_group_id;
+ enum sxe2_ipsec_mode mode;
+ enum sxe2_ipsec_algorithm algo;
+ uint8_t enc_key[SXE2_IPSEC_MAX_KEY_LEN];
+ uint8_t auth_key[SXE2_IPSEC_MAX_KEY_LEN];
+};
+
+struct sxe2_ipsec_rx_tcam {
+ struct sxe2_ipsec_ip_param ip_addr;
+ uint16_t id;
+ uint8_t hw_ip_id;
+ uint8_t ref_cnt;
+};
+
+struct sxe2_ipsec_rx_udp_group {
+ uint16_t udp_port;
+ uint8_t sport_en;
+ uint8_t dport_en;
+ uint8_t id;
+ uint8_t hw_group_id;
+ uint8_t ref_cnt;
+};
+
+struct sxe2_ipsec_ctx {
+ struct sxe2_ipsec_tx_sa *tx_sa;
+ struct sxe2_ipsec_rx_sa *rx_sa;
+ struct sxe2_ipsec_rx_tcam *rx_tcam;
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group;
+ struct sxe2_ipsec_bitmap bmp;
+ int md_offset;
+ uint16_t max_tx_sa;
+ uint16_t max_rx_sa;
+ uint16_t max_tcam;
+ uint8_t max_udp_group;
+};
+
+struct sxe2_ipsec_metadata_params {
+ uint16_t esp_header_offset;
+ uint16_t reserved;
+};
+
+bool sxe2_ipsec_supported(struct sxe2_adapter *adapter);
+
+bool sxe2_ipsec_valid_tx_offloads(uint64_t offloads);
+
+bool sxe2_ipsec_valid_rx_offloads(uint64_t offloads);
+
+int sxe2_ipsec_pkt_md_offset_get(struct sxe2_adapter *adapter);
+
+int sxe2_ipsec_session_create(void *device,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess);
+
+int sxe2_ipsec_session_destroy(void *device,
+ struct rte_security_session *session);
+
+int sxe2_ipsec_pkt_metadata_set(void *device, struct rte_security_session *session,
+ struct rte_mbuf *m, void *params);
+
+int32_t sxe2_ipsec_init(struct sxe2_adapter *adapter);
+
+void sxe2_ipsec_uinit(struct sxe2_adapter *adapter);
+
+#endif /* __SXE2_IPSEC_H__ */
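Review note on `struct sxe2_ipsec_esn`: the union overlays the 64-bit `value` with a `{ hi, low }` pair. On a little-endian host the first member (`hi`) aliases the low-order 32 bits of `value`. If `hi` is meant to carry the upper half of the ESN, a shift-based split gives the same answer on any host. This is a minimal sketch; `esn_hi()`, `esn_low()` and `esn_join()` are hypothetical helpers and are not part of this patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: split and join a 64-bit ESN without relying on
 * the in-memory layout of a union, so the result is identical on
 * little- and big-endian hosts. */
static inline uint32_t esn_hi(uint64_t esn)
{
	return (uint32_t)(esn >> 32);
}

static inline uint32_t esn_low(uint64_t esn)
{
	return (uint32_t)(esn & UINT32_MAX);
}

static inline uint64_t esn_join(uint32_t hi, uint32_t low)
{
	uint64_t esn = ((uint64_t)hi << 32) | low;

	/* Round-trip sanity check (compiled out with NDEBUG). */
	assert(esn_hi(esn) == hi && esn_low(esn) == low);
	return esn;
}
```

The capabilities in this patch advertise `.esn = 0`, so this only becomes relevant once ESN support is enabled.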
diff --git a/drivers/net/sxe2/sxe2_rx.c b/drivers/net/sxe2/sxe2_rx.c
index 28832d5f71..007192c7d8 100644
--- a/drivers/net/sxe2/sxe2_rx.c
+++ b/drivers/net/sxe2/sxe2_rx.c
@@ -294,6 +294,11 @@ int32_t __rte_cold sxe2_rx_queue_setup(struct rte_eth_dev *dev,
goto l_end;
}
+ if (!sxe2_ipsec_valid_rx_offloads(offloads)) {
+ ret = -EINVAL;
+ goto l_end;
+ }
+
rxq = sxe2_rx_queue_alloc(dev, queue_idx, nb_desc, socket_id);
if (rxq == NULL) {
PMD_LOG_ERR(RX, "rx queue[%d] resource alloc failed", queue_idx);
diff --git a/drivers/net/sxe2/sxe2_security.c b/drivers/net/sxe2/sxe2_security.c
new file mode 100644
index 0000000000..bc59d1b880
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_security.c
@@ -0,0 +1,335 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#include <rte_malloc.h>
+
+#include "sxe2_ethdev.h"
+#include "sxe2_security.h"
+#include "sxe2_ipsec.h"
+#include "sxe2_common_log.h"
+
+static unsigned int
+sxe2_security_session_size_get(void *device __rte_unused)
+{
+ return sizeof(struct sxe2_security_session);
+}
+
+static int
+sxe2_security_session_create(void *device,
+ struct rte_security_session_conf *conf,
+ struct rte_security_session *session)
+{
+ int32_t ret = -1;
+ struct sxe2_security_session *sxe2_sess = NULL;
+ sxe2_sess = SECURITY_GET_SESS_PRIV(session);
+
+ switch (conf->protocol) {
+ case RTE_SECURITY_PROTOCOL_IPSEC:
+ ret = sxe2_ipsec_session_create(device, conf, sxe2_sess);
+ break;
+ default:
+ PMD_LOG_ERR(DRV, "Invalid security protocol.");
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static int
+sxe2_security_session_destroy(void *device, struct rte_security_session *session)
+{
+ int32_t ret = -1;
+ struct sxe2_security_session *sxe2_sess = NULL;
+ sxe2_sess = SECURITY_GET_SESS_PRIV(session);
+
+ switch (sxe2_sess->protocol) {
+ case RTE_SECURITY_PROTOCOL_IPSEC:
+ ret = sxe2_ipsec_session_destroy(device, session);
+ break;
+ default:
+ PMD_LOG_ERR(DRV, "Invalid security protocol.");
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
+
+static int
+sxe2_security_pkt_metadata_set(void *device,
+ struct rte_security_session *session,
+ struct rte_mbuf *m, void *params)
+{
+ int32_t ret = -1;
+ struct sxe2_security_session *sxe2_sess = NULL;
+ sxe2_sess = SECURITY_GET_SESS_PRIV(session);
+
+ switch (sxe2_sess->protocol) {
+ case RTE_SECURITY_PROTOCOL_IPSEC:
+ ret = sxe2_ipsec_pkt_metadata_set(device, session, m, params);
+ break;
+ default:
+ PMD_LOG_ERR(DRV, "Invalid security protocol.");
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static const struct rte_security_capability *
+sxe2_security_capabilities_get(void *device __rte_unused)
+{
+ static const struct rte_cryptodev_capabilities
+ ipsec_crypto_capabilities[] = {
+ {
+ .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+ {.sym = {
+ .xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
+ {.cipher = {
+ .algo = SXE2_RTE_CRYPTO_CIPHER_AES_CBC,
+ .block_size = SXE2_SECURITY_BLOCK_SIZE_16,
+ .key_size = {
+ .min = SXE2_IPSEC_AES_KEY_MIN,
+ .max = SXE2_IPSEC_AES_KEY_MAX,
+ .increment = SXE2_IPSEC_AES_KEY_INC
+ },
+ .iv_size = {
+ .min = SXE2_IPSEC_AES_IV_MIN,
+ .max = SXE2_IPSEC_AES_IV_MAX,
+ .increment = SXE2_IPSEC_AES_IV_INC
+ },
+ .dataunit_set = RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES,
+ }, }
+ }, }
+ },
+ {
+ .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+ {.sym = {
+ .xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
+ {.cipher = {
+ .algo = SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC,
+ .block_size = SXE2_SECURITY_BLOCK_SIZE_16,
+ .key_size = {
+ .min = SXE2_IPSEC_SM4_KEY_MIN,
+ .max = SXE2_IPSEC_SM4_KEY_MAX,
+ .increment = SXE2_IPSEC_SM4_KEY_INC
+ },
+ .iv_size = {
+ .min = SXE2_IPSEC_SM4_IV_MIN,
+ .max = SXE2_IPSEC_SM4_IV_MAX,
+ .increment = SXE2_IPSEC_SM4_IV_INC
+ },
+ .dataunit_set = RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES,
+ }, }
+ }, }
+ },
+ {
+ .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+ {.sym = {
+ .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+ {.auth = {
+ .algo = SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC,
+ .block_size = SXE2_SECURITY_BLOCK_SIZE_64,
+ .key_size = {
+ .min = SXE2_IPSEC_SHA_KEY_MIN,
+ .max = SXE2_IPSEC_SHA_KEY_MAX,
+ .increment = SXE2_IPSEC_SHA_KEY_INC
+ },
+ .digest_size = {
+ .min = SXE2_IPSEC_SHA_DIGEST_MIN,
+ .max = SXE2_IPSEC_SHA_DIGEST_MAX,
+ .increment = SXE2_IPSEC_SHA_DIGEST_INC
+ },
+ .iv_size = {
+ .min = SXE2_IPSEC_SHA_IV_MIN,
+ .max = SXE2_IPSEC_SHA_IV_MAX,
+ .increment = SXE2_IPSEC_SHA_IV_INC
+ },
+ .aad_size = {
+ .min = SXE2_IPSEC_AAD_MIN,
+ .max = SXE2_IPSEC_AAD_MAX,
+ .increment = SXE2_IPSEC_AAD_INC
+ }
+ }, }
+ }, }
+ },
+ {
+ .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+ {.sym = {
+ .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+ {.auth = {
+ .algo = SXE2_RTE_CRYPTO_AUTH_SM3_HMAC,
+ .block_size = SXE2_SECURITY_BLOCK_SIZE_64,
+ .key_size = {
+ .min = SXE2_IPSEC_SM3_KEY_MIN,
+ .max = SXE2_IPSEC_SM3_KEY_MAX,
+ .increment = SXE2_IPSEC_SM3_KEY_INC
+ },
+ .digest_size = {
+ .min = SXE2_IPSEC_SM3_DIGEST_MIN,
+ .max = SXE2_IPSEC_SM3_DIGEST_MAX,
+ .increment = SXE2_IPSEC_SM3_DIGEST_INC
+ },
+ .iv_size = {
+ .min = SXE2_IPSEC_SM3_IV_MIN,
+ .max = SXE2_IPSEC_SM3_IV_MAX,
+ .increment = SXE2_IPSEC_SM3_IV_INC
+ },
+ .aad_size = {
+ .min = SXE2_IPSEC_AAD_MIN,
+ .max = SXE2_IPSEC_AAD_MAX,
+ .increment = SXE2_IPSEC_AAD_INC
+ }
+ }, }
+ }, }
+ },
+ {
+ .op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
+ {.sym = {
+ .xform_type = RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED
+ }, }
+ }
+ };
+
+ static const struct rte_security_capability
+ sxe2_security_capabilities[] = {
+ {
+ .action = RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO,
+ .protocol = RTE_SECURITY_PROTOCOL_IPSEC,
+ {.ipsec = {
+ .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP,
+ .mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL,
+ .direction = RTE_SECURITY_IPSEC_SA_DIR_EGRESS,
+ .options = {
+ .esn = 0,
+ .udp_encap = 1,
+ .copy_dscp = 0,
+ .copy_flabel = 0,
+ .copy_df = 0,
+ .dec_ttl = 0,
+ .ecn = 0,
+ .stats = 1,
+ .iv_gen_disable = 0,
+ .tunnel_hdr_verify = 1,
+ .udp_ports_verify = 1,
+ .ip_csum_enable = 0,
+ .l4_csum_enable = 0,
+ .ip_reassembly_en = 0,
+ .ingress_oop = 0
+ } } },
+ .crypto_capabilities = ipsec_crypto_capabilities,
+ .ol_flags = RTE_SECURITY_TX_OLOAD_NEED_MDATA
+ },
+ {
+ .action = RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO,
+ .protocol = RTE_SECURITY_PROTOCOL_IPSEC,
+ {.ipsec = {
+ .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP,
+ .mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL,
+ .direction = RTE_SECURITY_IPSEC_SA_DIR_INGRESS,
+ .options = {
+ .esn = 0,
+ .udp_encap = 1,
+ .copy_dscp = 0,
+ .copy_flabel = 0,
+ .copy_df = 0,
+ .dec_ttl = 0,
+ .ecn = 0,
+ .stats = 1,
+ .iv_gen_disable = 0,
+ .tunnel_hdr_verify = 1,
+ .udp_ports_verify = 1,
+ .ip_csum_enable = 0,
+ .l4_csum_enable = 0,
+ .ip_reassembly_en = 0,
+ .ingress_oop = 0
+ } } },
+ .crypto_capabilities = ipsec_crypto_capabilities,
+ .ol_flags = 0
+ },
+ {
+ .action = RTE_SECURITY_ACTION_TYPE_NONE
+ }
+ };
+
+ return sxe2_security_capabilities;
+}
+
+static struct rte_security_ops sxe2_security_ops = {
+ .session_get_size = sxe2_security_session_size_get,
+ .session_create = sxe2_security_session_create,
+ .session_destroy = sxe2_security_session_destroy,
+ .set_pkt_metadata = sxe2_security_pkt_metadata_set,
+ .capabilities_get = sxe2_security_capabilities_get,
+};
+
+int32_t sxe2_security_init(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct rte_security_ctx *sctx = NULL;
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ int32_t ret = -1;
+
+ if (!sxe2_ipsec_supported(adapter)) {
+ ret = 0;
+ PMD_LOG_INFO(INIT, "Security feature not supported.");
+ goto l_end;
+ }
+
+ PMD_LOG_INFO(INIT, "Init security feature.");
+
+ sctx = rte_zmalloc("security_ctx", sizeof(struct rte_security_ctx), 0);
+ if (sctx == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ sctx->device = dev;
+ sctx->ops = &sxe2_security_ops;
+ sctx->sess_cnt = 0;
+ sctx->flags = 0;
+ dev->security_ctx = (void *)sctx;
+
+ rte_spinlock_init(&sxe2_sctx->security_lock);
+ sxe2_sctx->adapter = adapter;
+
+ ret = sxe2_ipsec_init(adapter);
+ if (ret) {
+ rte_free(sctx);
+ dev->security_ctx = NULL;
+ goto l_end;
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+void sxe2_security_uinit(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct rte_security_ctx *sctx = dev->security_ctx;
+
+ if (!sxe2_ipsec_supported(adapter)) {
+ PMD_LOG_INFO(INIT, "Security feature not supported.");
+ goto l_end;
+ }
+
+ PMD_LOG_INFO(INIT, "Uninit security feature.");
+
+ if (sctx != NULL) {
+ rte_free(sctx);
+ dev->security_ctx = NULL;
+ }
+
+ sxe2_ipsec_uinit(adapter);
+
+l_end:
+ return;
+}
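`sxe2_security_capabilities_get()` returns sentinel-terminated tables. The security capability list ends with `RTE_SECURITY_ACTION_TYPE_NONE` and the crypto list ends with `RTE_CRYPTO_OP_TYPE_UNDEFINED`, so consumers scan each table until they reach the sentinel. The sketch below models that scan with simplified stand-in types; `struct cap_entry` and `struct key_range` are hypothetical and are not the DPDK definitions. The key-size test follows my understanding of how DPDK's cryptodev capability checks interpret `{min, max, increment}`: the size must lie in `[min, max]`, and when `increment` is non-zero it must be reachable from `min` in `increment`-sized steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for rte_crypto_param_range and a capability
 * entry; these are NOT the DPDK definitions. */
struct key_range {
	uint16_t min;
	uint16_t max;
	uint16_t increment;
};

enum cap_kind { CAP_END = 0, CAP_CIPHER, CAP_AUTH };

struct cap_entry {
	enum cap_kind kind; /* CAP_END plays the role of the sentinel */
	int algo;
	struct key_range key;
};

/* Accept size if it lies in [min, max] and, when increment != 0,
 * is reachable from min in increment-sized steps. */
static int key_size_ok(uint16_t size, const struct key_range *r)
{
	if (size < r->min || size > r->max)
		return 0;
	if (r->increment == 0)
		return 1;
	return ((size - r->min) % r->increment) == 0;
}

/* Walk the table until the sentinel; return the match or NULL. */
static const struct cap_entry *
cap_find(const struct cap_entry *caps, enum cap_kind kind, int algo,
	 uint16_t key_len)
{
	assert(caps != NULL);
	for (; caps->kind != CAP_END; caps++) {
		if (caps->kind == kind && caps->algo == algo &&
		    key_size_ok(key_len, &caps->key))
			return caps;
	}
	return NULL;
}
```

With an illustrative entry `{ CAP_CIPHER, 1, { 16, 32, 8 } }`, key lengths 16, 24 and 32 match and 20 does not. These values are examples only; the patch's `SXE2_IPSEC_AES_KEY_*` values are not shown in this excerpt.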
diff --git a/drivers/net/sxe2/sxe2_security.h b/drivers/net/sxe2/sxe2_security.h
new file mode 100644
index 0000000000..366c0614bd
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_security.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#ifndef __SXE2_SECURITY_H__
+#define __SXE2_SECURITY_H__
+
+#include <rte_security.h>
+#include <rte_cryptodev.h>
+#include <rte_security_driver.h>
+
+#include "sxe2_ipsec.h"
+
+#define SXE2_DEV_TO_SECURITY(eth) \
+ ((struct rte_security_ctx *)(((struct rte_eth_dev *)eth)->security_ctx))
+
+#define SXE2_RTE_CRYPTO_CIPHER_AES_CBC (RTE_CRYPTO_CIPHER_AES_CBC)
+
+#define SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC (RTE_CRYPTO_CIPHER_SM4_CBC)
+
+#define SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC (RTE_CRYPTO_AUTH_SHA256_HMAC)
+
+#define SXE2_RTE_CRYPTO_AUTH_SM3_HMAC (RTE_CRYPTO_AUTH_SM3_HMAC)
+
+enum sxe2_security_protocol {
+ SXE2_SECURITY_PROTOCOL_IPSEC = 0,
+ SXE2_SECURITY_PROTOCOL_MAX = 1,
+};
+
+enum sxe2_security_xform {
+ SXE2_SECURITY_IPSEC_EN = 0,
+ SXE2_SECURITY_IPSEC_DE = 1,
+ SXE2_SECURITY_NUM_MAX = 2,
+};
+
+enum sxe2_security_block_size {
+ SXE2_SECURITY_BLOCK_SIZE_16 = 16,
+ SXE2_SECURITY_BLOCK_SIZE_64 = 64,
+};
+
+struct sxe2_security_ipsec_caps {
+ enum rte_security_ipsec_sa_protocol proto;
+ enum rte_security_ipsec_sa_mode mode;
+ struct rte_security_ipsec_sa_options options;
+};
+
+struct sxe2_security_capabilities {
+ struct rte_cryptodev_capabilities *crypto_capabilities;
+ enum rte_security_session_action_type action;
+ struct sxe2_security_ipsec_caps ipsec;
+};
+
+struct sxe2_security_session {
+ struct sxe2_adapter *adapter;
+ struct sxe2_ipsec_pkt_metadata pkt_metadata_template;
+ struct sxe2_ipsec_security_sa sa;
+ struct sxe2_ipsec_esn esn;
+ struct sxe2_ipsec_udp udp_cap;
+ enum rte_security_session_protocol protocol;
+ enum rte_security_ipsec_sa_direction direction;
+ enum rte_security_ipsec_sa_mode mode;
+ enum rte_security_ipsec_sa_protocol sa_proto;
+ enum rte_security_ipsec_tunnel_type type;
+};
+
+struct sxe2_security_ctx {
+ struct sxe2_adapter *adapter;
+ struct sxe2_security_capabilities sxe2_capabilities[SXE2_SECURITY_PROTOCOL_MAX];
+ struct sxe2_ipsec_ctx ipsec_ctx;
+ rte_spinlock_t security_lock;
+};
+
+int32_t sxe2_security_init(struct rte_eth_dev *dev);
+
+void sxe2_security_uinit(struct rte_eth_dev *dev);
+
+#endif /* __SXE2_SECURITY_H__ */
diff --git a/drivers/net/sxe2/sxe2_tx.c b/drivers/net/sxe2/sxe2_tx.c
index a280edc9c5..f49238ceef 100644
--- a/drivers/net/sxe2/sxe2_tx.c
+++ b/drivers/net/sxe2/sxe2_tx.c
@@ -304,6 +304,11 @@ int32_t __rte_cold sxe2_tx_queue_setup(struct rte_eth_dev *dev,
}
offloads = tx_conf->offloads | dev->data->dev_conf.txmode.offloads;
+ if (!sxe2_ipsec_valid_tx_offloads(offloads)) {
+ ret = -EINVAL;
+ goto end;
+ }
+
txq = sxe2_tx_queue_alloc(dev, queue_idx, nb_desc, socket_id);
if (txq == NULL) {
PMD_LOG_ERR(TX, "failed to alloc sxe2vf tx queue:%u resource", queue_idx);
@@ -327,6 +332,9 @@ int32_t __rte_cold sxe2_tx_queue_setup(struct rte_eth_dev *dev,
txq->ops = sxe2_tx_default_ops_get();
txq->ops.queue_reset(txq);
+ if (sxe2_ipsec_supported(adapter) && (txq->offloads & RTE_ETH_TX_OFFLOAD_SECURITY))
+ txq->ipsec_pkt_md_offset = sxe2_ipsec_pkt_md_offset_get(adapter);
+
dev->data->tx_queues[queue_idx] = txq;
ret = 0;
diff --git a/drivers/net/sxe2/sxe2_txrx_poll.c b/drivers/net/sxe2/sxe2_txrx_poll.c
index 3c6fe37404..8b6e585c36 100644
--- a/drivers/net/sxe2/sxe2_txrx_poll.c
+++ b/drivers/net/sxe2/sxe2_txrx_poll.c
@@ -307,6 +307,25 @@ static __rte_always_inline void sxe2_desc_tso_fill(struct rte_mbuf *tx_pkt,
return;
}
+static __rte_always_inline void sxe2_desc_ipsec_fill(struct rte_mbuf *tx_pkt,
+ struct sxe2_tx_queue *txq, uint16_t *ipsec_offset,
+ uint64_t *desc_type_cmd_tso_mss)
+{
+ struct sxe2_ipsec_pkt_metadata *md = NULL;
+ uint16_t ipsec_pkt_md_offset = txq->ipsec_pkt_md_offset;
+
+ md = RTE_MBUF_DYNFIELD(tx_pkt, ipsec_pkt_md_offset, struct sxe2_ipsec_pkt_metadata *);
+ *ipsec_offset = md->esp_head_offset;
+ *desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_IPSEC_EN;
+ if (md->mode == SXE2_IPSEC_MODE_ONLY_ENCRYPT)
+ *desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_IPSEC_MODE;
+
+ if (md->algo == SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC)
+ *desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_IPSEC_ENGINE;
+
+ *desc_type_cmd_tso_mss |= (uint64_t)(md->sa_idx) << SXE2_TX_CTXT_DESC_IPSEC_SA_SHIFT;
+}
+
static __rte_always_inline uint64_t
sxe2_tx_data_desc_build_cobt(uint32_t cmd, uint32_t offset, uint16_t buf_size, uint16_t l2tag)
{
@@ -426,6 +445,11 @@ uint16_t sxe2_tx_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkt
else if (offloads & RTE_MBUF_F_TX_IEEE1588_TMST)
desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_CMD_TSYN_MASK;
+ if (offloads & RTE_MBUF_F_TX_SEC_OFFLOAD) {
+ sxe2_desc_ipsec_fill(tx_pkt, txq, &ipsec_offset,
+ &desc_type_cmd_tso_mss);
+ }
+
if (offloads & RTE_MBUF_F_TX_QINQ) {
desc_l2tag2 = tx_pkt->vlan_tci_outer;
desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_CMD_IL2TAG2_MASK;
@@ -786,6 +810,36 @@ static inline void sxe2_rx_desc_ptp_para_fill(struct sxe2_rx_queue *rxq,
rxq->ts_low);
}
}
+
+static inline void sxe2_rx_desc_ipsec_para_fill(struct sxe2_rx_queue *rxq __rte_unused,
+ struct rte_mbuf *mbuf, union sxe2_rx_desc *desc)
+{
+ uint32_t status_lrocnt_fdpf_id = rte_le_to_cpu_32(desc->wb.status_lrocnt_fdpf_id);
+ enum sxe2_rx_desc_ipsec_status ipsec_status;
+
+ if (status_lrocnt_fdpf_id & SXE2_RX_DESC_IPSEC_PKT_MASK) {
+ mbuf->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD;
+ ipsec_status = SXE2_RX_DESC_IPSEC_STATUS_VAL_GET(status_lrocnt_fdpf_id);
+ switch (ipsec_status) {
+ case SXE2_RX_DESC_IPSEC_STATUS_SUCCESS:
+ break;
+ case SXE2_RX_DESC_IPSEC_STATUS_PKG_OVER_2K:
+ case SXE2_RX_DESC_IPSEC_STATUS_SPI_IP_INVALID:
+ case SXE2_RX_DESC_IPSEC_STATUS_SA_INVALID:
+ case SXE2_RX_DESC_IPSEC_STATUS_NOT_ALIGN:
+ case SXE2_RX_DESC_IPSEC_STATUS_ICV_ERROR:
+ case SXE2_RX_DESC_IPSEC_STATUS_BY_PASSH:
+ case SXE2_RX_DESC_IPSEC_STATUS_MAC_BY_PASSH:
+ PMD_LOG_INFO(RX, "IPsec status error:%d", ipsec_status);
+ mbuf->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED;
+ break;
+ default:
+ PMD_LOG_INFO(RX, "Invalid IPsec status:%d", ipsec_status);
+ mbuf->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED;
+ break;
+ }
+ }
+}
#endif
static __rte_always_inline void
@@ -803,6 +857,7 @@ sxe2_rx_mbuf_common_fields_fill(struct sxe2_rx_queue *rxq, struct rte_mbuf *mbuf
sxe2_rx_desc_vlan_para_fill(mbuf, rxd);
sxe2_rx_desc_filter_para_fill(rxq, mbuf, rxd);
#ifndef RTE_LIBRTE_SXE2_16BYTE_RX_DESC
+ sxe2_rx_desc_ipsec_para_fill(rxq, mbuf, rxd);
sxe2_rx_desc_ptp_para_fill(rxq, mbuf, rxd);
#endif
--
2.52.0
* [PATCH v1 16/20] net/sxe2: support SFP module info and EEPROM access
From: liujie5 @ 2026-06-10 1:39 UTC (permalink / raw)
To: stephen; +Cc: dev, Jie Liu
In-Reply-To: <20260610013936.3634968-1-liujie5@linkdatatechnology.com>
From: Jie Liu <liujie5@linkdatatechnology.com>
This patch implements 'get_module_info' and 'get_module_eeprom'
ops for the sxe2 PMD. These interfaces allow applications to retrieve
the type of the plugged-in optical module and read its internal
EEPROM data.
The module type is taken from the SFF-8024 identifier byte. For SFP
modules, the SFF-8472 compliance and addressing-mode bytes select
between the SFF-8079 (256-byte) and SFF-8472 (A0h + A2h) layouts; for
QSFP+/QSFP28 modules, the revision byte selects between SFF-8436 and
SFF-8636. EEPROM contents are read through hardware-specific access
commands. SFP reads that cross the A0h boundary continue at A2h, and
QSFP reads continue in the following upper pages.
Key features:
- Identify module types (SFP/SFP+/QSFP+/QSFP28).
- Support standard EEPROM data retrieval for diagnostic tools.
- Validate offset and length before issuing I2C reads.
Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
---
drivers/net/sxe2/sxe2_cmd_chnl.c | 46 +++++
drivers/net/sxe2/sxe2_cmd_chnl.h | 3 +
drivers/net/sxe2/sxe2_drv_cmd.h | 18 ++
drivers/net/sxe2/sxe2_ethdev.c | 298 +++++++++++++++++++++++++++++++
drivers/net/sxe2/sxe2_ethdev.h | 9 +
5 files changed, 374 insertions(+)
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.c b/drivers/net/sxe2/sxe2_cmd_chnl.c
index 926eaee062..43e8c59487 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.c
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.c
@@ -1833,3 +1833,49 @@ int32_t sxe2_drv_srcvsi_prune_config(struct sxe2_adapter *adapter,
return ret;
}
+
+int32_t sxe2_drv_sfp_eeprom_read(struct sxe2_adapter *adapter, struct sxe2_sfp_read_info *sfp_info)
+{
+ int32_t ret = -1;
+ struct sxe2_drv_sfp_req req = {0};
+ struct sxe2_drv_sfp_resp *resp = NULL;
+ struct sxe2_drv_cmd_params cmd = {0};
+
+ resp = rte_zmalloc("read sfp data", sizeof(*resp) + sfp_info->len, 0);
+ if (!resp) {
+ PMD_LOG_ERR(DRV, "Alloc memory failed");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ req.is_wr = false;
+ req.is_qsfp = sfp_info->is_qsfp;
+ req.page_cnt = rte_cpu_to_le_16(sfp_info->page_cnt);
+ req.offset = rte_cpu_to_le_16(sfp_info->offset);
+ req.data_len = rte_cpu_to_le_16(sfp_info->len);
+ req.bus_addr = rte_cpu_to_le_16(sfp_info->bus_addr);
+
+ PMD_DEV_LOG_INFO(adapter, DRV, "is_qsfp=%u, page_cnt=%u, offset=%u, datalen=%u, "
+ "bus_addr=%u", sfp_info->is_qsfp, sfp_info->page_cnt, sfp_info->offset,
+ sfp_info->len, sfp_info->bus_addr);
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_OPT_EEP_GET,
+ &req, sizeof(req),
+ resp, sizeof(*resp) + sfp_info->len);
+ ret = sxe2_drv_cmd_exec(adapter->cdev, &cmd);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to read sfp, ret=%d", ret);
+ goto l_end;
+ }
+
+ ret = 0;
+ rte_memcpy(sfp_info->data, resp->data, sfp_info->len);
+
+l_end:
+ if (resp) {
+ rte_free(resp);
+ resp = NULL;
+ }
+
+ return ret;
+}
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.h b/drivers/net/sxe2/sxe2_cmd_chnl.h
index 97007c7cfa..988d4b458b 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.h
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.h
@@ -167,4 +167,7 @@ int32_t sxe2_drv_flow_fnav_query_stat(struct sxe2_adapter *adapter,
int32_t sxe2_drv_srcvsi_prune_config(struct sxe2_adapter *adapter,
uint16_t *vsi_list, uint16_t vsi_cnt, bool set);
+int32_t sxe2_drv_sfp_eeprom_read(struct sxe2_adapter *adapter,
+ struct sxe2_sfp_read_info *sfp_info);
+
#endif /* SXE2_CMD_CHNL_H */
diff --git a/drivers/net/sxe2/sxe2_drv_cmd.h b/drivers/net/sxe2/sxe2_drv_cmd.h
index f7acd20642..09b2f7d125 100644
--- a/drivers/net/sxe2/sxe2_drv_cmd.h
+++ b/drivers/net/sxe2/sxe2_drv_cmd.h
@@ -633,6 +633,24 @@ struct __rte_aligned(4) __rte_packed_begin sxe2_drv_udp_tunnel_resp {
uint8_t rsv;
} __rte_packed_end;
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_sfp_req {
+ uint8_t is_wr;
+ uint8_t is_qsfp;
+ uint16_t bus_addr;
+ uint16_t page_cnt;
+ uint16_t offset;
+ uint16_t data_len;
+ uint16_t rvd;
+ uint8_t data[];
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_sfp_resp {
+ uint8_t is_wr;
+ uint8_t is_qsfp;
+ uint16_t data_len;
+ uint8_t data[];
+} __rte_packed_end;
+
enum sxe2_drv_cmd_module {
SXE2_DRV_CMD_MODULE_HANDSHAKE = 0,
SXE2_DRV_CMD_MODULE_DEV = 1,
diff --git a/drivers/net/sxe2/sxe2_ethdev.c b/drivers/net/sxe2/sxe2_ethdev.c
index d1ced7d427..d2b884c6cd 100644
--- a/drivers/net/sxe2/sxe2_ethdev.c
+++ b/drivers/net/sxe2/sxe2_ethdev.c
@@ -41,6 +41,7 @@
#include "sxe2_ethdev_repr.h"
#include "sxe2vf_regs.h"
#include "sxe2_switchdev.h"
+#include "sxe2_msg.h"
#define SXE2_PCI_VENDOR_ID_1 0x1ff2
#define SXE2_PCI_DEVICE_ID_PF_1 0x10b1
@@ -122,6 +123,10 @@ static int32_t sxe2_udp_tunnel_port_del(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *tunnel_udp);
static int32_t sxe2_fw_version_string_get(struct rte_eth_dev *dev,
char *fw_version, size_t fw_size);
+static int32_t sxe2_get_module_info(struct rte_eth_dev *dev,
+ struct rte_eth_dev_module_info *info);
+static int32_t sxe2_get_module_eeprom(struct rte_eth_dev *dev,
+ struct rte_dev_eeprom_info *info);
static const struct eth_dev_ops sxe2_eth_dev_ops = {
.dev_configure = sxe2_dev_configure,
@@ -186,6 +191,9 @@ static const struct eth_dev_ops sxe2_eth_dev_ops = {
.fw_version_get = sxe2_fw_version_string_get,
.get_monitor_addr = sxe2_get_monitor_addr,
+
+ .get_module_info = sxe2_get_module_info,
+ .get_module_eeprom = sxe2_get_module_eeprom,
};
static int32_t sxe2_dev_configure(struct rte_eth_dev *dev)
@@ -291,6 +299,296 @@ static int32_t sxe2_dev_start(struct rte_eth_dev *dev)
return ret;
}
+static int32_t sxe2_sfp_type_get(struct sxe2_adapter *adapter, uint8_t *type)
+{
+ int32_t ret = -1;
+ struct sxe2_sfp_read_info sfp_info;
+
+ memset(&sfp_info, 0, sizeof(sfp_info));
+ sfp_info.bus_addr = SXE2_SFP_E2P_I2C_7BIT_ADDR0;
+ sfp_info.len = 1;
+ sfp_info.data = type;
+ sfp_info.offset = 0;
+ sfp_info.page_cnt = 0;
+ sfp_info.is_qsfp = false;
+
+ ret = sxe2_drv_sfp_eeprom_read(adapter, &sfp_info);
+ if (ret)
+ goto l_end;
+
+ ret = 0;
+ PMD_LOG_INFO(DRV, "Get sfp type success, type=%u", *type);
+
+l_end:
+ return ret;
+}
+
+static int32_t sxe2_sfp_module_info_get(struct sxe2_adapter *adapter,
+ struct rte_eth_dev_module_info *info)
+{
+ int32_t ret = -1;
+ bool page_swap = false;
+ uint8_t sff8472_rev = 0;
+ uint8_t addr_mode = 0;
+ struct sxe2_sfp_read_info sfp_info;
+
+ memset(&sfp_info, 0, sizeof(sfp_info));
+ sfp_info.bus_addr = SXE2_SFP_E2P_I2C_7BIT_ADDR0;
+ sfp_info.is_qsfp = false;
+ sfp_info.len = 1;
+ sfp_info.data = &sff8472_rev;
+ sfp_info.offset = SXE2_MODULE_SFF_8472_COMP;
+ sfp_info.page_cnt = 0;
+
+ ret = sxe2_drv_sfp_eeprom_read(adapter, &sfp_info);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to read SFF-8472 compliance byte, ret=%d", ret);
+ ret = -EIO;
+ goto l_end;
+ }
+
+ sfp_info.data = &addr_mode;
+ sfp_info.offset = SXE2_MODULE_SFF_8472_SWAP;
+
+ ret = sxe2_drv_sfp_eeprom_read(adapter, &sfp_info);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to read A2 page, ret=%d", ret);
+ ret = -EIO;
+ goto l_end;
+ }
+
+ if (addr_mode & SXE2_MODULE_SFF_ADDR_MODE) {
+ PMD_LOG_ERR(DRV, "Address change required to access page 0xA2, "
+ "but not supported. Please report the module "
+ "type to the driver maintainers.");
+ page_swap = true;
+ }
+
+ PMD_LOG_INFO(DRV, "Read sfp module info, sff_8472=%u, a2_page=%u, swap_page=%d",
+ sff8472_rev, addr_mode, page_swap);
+
+ if (sff8472_rev == SXE2_MODULE_SFF_8472_UNSUP ||
+ page_swap ||
+ !(addr_mode & SXE2_MODULE_SFF_DDM_IMPLEMENTED)) {
+ info->type = SXE2_MODULE_SFF_8079;
+ info->eeprom_len = SXE2_MODULE_SFF_8079_LEN;
+ } else {
+ info->type = SXE2_MODULE_SFF_8472;
+ info->eeprom_len = SXE2_MODULE_SFF_8472_LEN;
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_qsfp_module_info_get(struct sxe2_adapter *adapter, struct rte_eth_dev_module_info *info)
+{
+ int32_t ret = -1;
+ uint8_t sff8636_rev = 0;
+ struct sxe2_sfp_read_info sfp_info;
+
+ memset(&sfp_info, 0, sizeof(sfp_info));
+ sfp_info.bus_addr = SXE2_SFP_E2P_I2C_7BIT_ADDR0;
+ sfp_info.is_qsfp = true;
+ sfp_info.len = 1;
+ sfp_info.data = &sff8636_rev;
+ sfp_info.offset = SXE2_MODULE_REVISION_ADDR;
+ sfp_info.page_cnt = 0;
+
+ ret = sxe2_drv_sfp_eeprom_read(adapter, &sfp_info);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to read SFF-8636 revision byte, ret=%d", ret);
+ ret = -EIO;
+ goto l_end;
+ }
+
+ if (sff8636_rev > 0x02) {
+ info->type = SXE2_MODULE_SFF_8636;
+ info->eeprom_len = SXE2_MODULE_SFF_8636_MAX_LEN;
+ } else {
+ info->type = SXE2_MODULE_SFF_8436;
+ info->eeprom_len = SXE2_MODULE_SFF_8436_MAX_LEN;
+ }
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_get_module_info(struct rte_eth_dev *dev, struct rte_eth_dev_module_info *info)
+{
+ int32_t ret = -1;
+ uint8_t type = 0;
+ struct sxe2_adapter *adapter = dev->data->dev_private;
+
+ ret = sxe2_sfp_type_get(adapter, &type);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to read sfp type, ret=%d", ret);
+ ret = -EIO;
+ goto l_end;
+ }
+
+ switch (type) {
+ case SXE2_MODULE_SFF_SFP_TYPE:
+ ret = sxe2_sfp_module_info_get(adapter, info);
+ if (ret)
+ goto l_end;
+ break;
+ case SXE2_MODULE_TYPE_QSFP_PLUS:
+ case SXE2_MODULE_TYPE_QSFP28:
+ ret = sxe2_qsfp_module_info_get(adapter, info);
+ if (ret)
+ goto l_end;
+ break;
+ default:
+ ret = -ENXIO;
+ PMD_LOG_ERR(DRV, "Invalid sfp type, type=%d.", type);
+ goto l_end;
+ }
+
+ PMD_LOG_INFO(DRV, "sfp eeprom type=%x, eeprom len=%u.", info->type, info->eeprom_len);
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_get_sfp_eeprom(struct sxe2_adapter *adapter, struct sxe2_sfp_read_info *sfp_info)
+{
+ int32_t ret = -1;
+ uint16_t ori_len = sfp_info->len;
+ uint16_t ori_offset = sfp_info->offset;
+
+ if ((ori_len + ori_offset) > SXE2_SFP_EEP_LEN_MAX) {
+ sfp_info->len = (uint16_t)(SXE2_SFP_EEP_LEN_MAX - ori_offset);
+ ret = sxe2_drv_sfp_eeprom_read(adapter, sfp_info);
+ if (ret)
+ goto l_end;
+ sfp_info->bus_addr = SXE2_SFP_E2P_I2C_7BIT_ADDR1;
+ sfp_info->len = (uint16_t)(ori_len - (SXE2_SFP_EEP_LEN_MAX - ori_offset));
+ sfp_info->data = (uint8_t *)(sfp_info->data) + (SXE2_SFP_EEP_LEN_MAX - ori_offset);
+ sfp_info->offset = 0;
+ sfp_info->page_cnt = 0;
+ ret = sxe2_drv_sfp_eeprom_read(adapter, sfp_info);
+ } else {
+ ret = sxe2_drv_sfp_eeprom_read(adapter, sfp_info);
+ }
+
+l_end:
+ if (ret)
+ PMD_LOG_ERR(DRV, "Failed to read sfp.");
+ return ret;
+}
+
+static int32_t
+sxe2_get_qsfp_eeprom(struct sxe2_adapter *adapter,
+ struct sxe2_sfp_read_info *sfp_info)
+{
+ int32_t ret = -1;
+ uint16_t ori_len = sfp_info->len;
+ uint16_t ori_offset = sfp_info->offset;
+ uint16_t read_len = 0;
+ uint16_t remain_len = 0;
+
+ if ((ori_len + ori_offset) > SXE2_SFP_EEP_LEN_MAX) {
+ sfp_info->len = (uint16_t)(SXE2_SFP_EEP_LEN_MAX - ori_offset);
+ ret = sxe2_drv_sfp_eeprom_read(adapter, sfp_info);
+ if (ret)
+ goto l_end;
+
+ do {
+ read_len = read_len + sfp_info->len;
+ sfp_info->data = (uint8_t *)(sfp_info->data) + sfp_info->len;
+ sfp_info->offset = SXE2_QSFP_PAGE_OFST_START;
+ sfp_info->page_cnt++;
+ remain_len = (uint16_t)(ori_len - read_len);
+ sfp_info->len = (remain_len > SXE2_QSFP_PAGE_OFST_START) ?
+ SXE2_QSFP_PAGE_OFST_START : remain_len;
+ ret = sxe2_drv_sfp_eeprom_read(adapter, sfp_info);
+ if (ret)
+ goto l_end;
+ } while (remain_len > SXE2_QSFP_PAGE_OFST_START);
+ } else {
+ ret = sxe2_drv_sfp_eeprom_read(adapter, sfp_info);
+ }
+
+l_end:
+ if (ret)
+ PMD_LOG_ERR(DRV, "Failed to read sfp.");
+ return ret;
+}
+
+static int32_t
+sxe2_get_module_eeprom(struct rte_eth_dev *dev, struct rte_dev_eeprom_info *info)
+{
+ int32_t ret = -1;
+ uint8_t type = 0;
+ struct sxe2_adapter *adapter = dev->data->dev_private;
+ struct sxe2_sfp_read_info sfp_info;
+
+ memset(&sfp_info, 0, sizeof(sfp_info));
+
+ if (!info || !info->length || !info->data ||
+ info->offset >= SXE2_SFP_EEP_LEN_MAX) {
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ PMD_LOG_INFO(DRV, "Dump sfp eeprom info offset=0x%x, len=0x%x.",
+ info->offset, info->length);
+
+ ret = sxe2_sfp_type_get(adapter, &type);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to read sfp type, ret=%d", ret);
+ ret = -EIO;
+ goto l_end;
+ }
+
+ sfp_info.bus_addr = SXE2_SFP_E2P_I2C_7BIT_ADDR0;
+ sfp_info.len = info->length;
+ sfp_info.data = info->data;
+ sfp_info.offset = info->offset;
+ sfp_info.page_cnt = 0;
+
+ switch (type) {
+ case SXE2_MODULE_SFF_SFP_TYPE:
+ if (info->length > SXE2_SFP_EEP_LEN_MAX * 2 - info->offset) {
+ ret = -EINVAL;
+ PMD_LOG_ERR(DRV, "sfp read range[0x%x + 0x%x] > eeprom max size[%d], ret=%d",
+ info->offset, info->length, SXE2_SFP_EEP_LEN_MAX * 2, ret);
+ goto l_end;
+ }
+ sfp_info.is_qsfp = false;
+ ret = sxe2_get_sfp_eeprom(adapter, &sfp_info);
+ if (ret)
+ goto l_end;
+ break;
+ case SXE2_MODULE_TYPE_QSFP_PLUS:
+ case SXE2_MODULE_TYPE_QSFP28:
+ if (info->length > SXE2_MODULE_SFF_8636_MAX_LEN - info->offset) {
+ ret = -EINVAL;
+ PMD_LOG_ERR(DRV, "qsfp read range[0x%x + 0x%x] > eeprom max size[%d], ret=%d",
+ info->offset, info->length, SXE2_MODULE_SFF_8636_MAX_LEN, ret);
+ goto l_end;
+ }
+ sfp_info.is_qsfp = true;
+ ret = sxe2_get_qsfp_eeprom(adapter, &sfp_info);
+ if (ret)
+ goto l_end;
+ break;
+ default:
+ ret = -ENXIO;
+ PMD_LOG_ERR(DRV, "Invalid sfp type, type=%d.", type);
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
static enum sxe2_udp_tunnel_protocol
sxe2_udp_tunnel_type_rte_to_sxe2(enum rte_eth_tunnel_type rte_type)
{
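`sxe2_get_sfp_eeprom()` treats SFP memory as a flat 512-byte view. Bytes 0-255 sit at I2C address A0h and bytes 256-511 at A2h, and any request that crosses the boundary is split in two. The sketch below models that split with a mock two-page EEPROM; `mem`, `page_read()` and `sfp_read()` are hypothetical names. It bounds offset + length together, written as a subtraction so the sum cannot wrap around for a huge `length`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_LEN 256 /* bytes per I2C address (A0h or A2h) */

/* Mock EEPROM: mem[0] is the A0h page, mem[1] the A2h page. */
static uint8_t mem[2][PAGE_LEN];

/* Stand-in for one hardware read, which must stay inside one page. */
static int page_read(int page, uint32_t off, uint32_t len, uint8_t *out)
{
	if (off + len > PAGE_LEN)
		return -1;
	memcpy(out, &mem[page][off], len);
	return 0;
}

/* Read len bytes at off in the flat 512-byte SFF-8472 view,
 * splitting the request at the A0h/A2h boundary. */
static int sfp_read(uint32_t off, uint32_t len, uint8_t *out)
{
	assert(out != NULL);
	/* off < PAGE_LEN, so the subtraction cannot underflow. */
	if (off >= PAGE_LEN || len == 0 || len > 2 * PAGE_LEN - off)
		return -1;
	if (off + len <= PAGE_LEN)
		return page_read(0, off, len, out);

	uint32_t first = PAGE_LEN - off; /* bytes left on A0h */

	if (page_read(0, off, first, out))
		return -1;
	return page_read(1, 0, len - first, out + first);
}
```

For example, a 10-byte read at offset 250 returns bytes 250-255 of A0h followed by bytes 0-3 of A2h.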
diff --git a/drivers/net/sxe2/sxe2_ethdev.h b/drivers/net/sxe2/sxe2_ethdev.h
index 34f9aa46b2..510f5139c3 100644
--- a/drivers/net/sxe2/sxe2_ethdev.h
+++ b/drivers/net/sxe2/sxe2_ethdev.h
@@ -275,6 +275,15 @@ struct sxe2_sched_hw_cap {
uint8_t adj_lvl;
};
+struct sxe2_sfp_read_info {
+ uint8_t *data;
+ uint16_t offset;
+ uint16_t len;
+ uint16_t bus_addr;
+ uint16_t page_cnt;
+ bool is_qsfp;
+};
+
struct sxe2_link_context {
rte_spinlock_t link_lock;
bool link_up;
--
2.52.0
* [PATCH v1 13/20] net/sxe2: support firmware version reading
From: liujie5 @ 2026-06-10 1:39 UTC (permalink / raw)
To: stephen; +Cc: dev, Jie Liu
In-Reply-To: <20260610013936.3634968-1-liujie5@linkdatatechnology.com>
From: Jie Liu <liujie5@linkdatatechnology.com>
This patch implements the fw_version_get ethdev op for the sxe2 PMD.
It formats the firmware version and build ID held in the adapter's
device information as "main.sub.fix.build", so applications can read
it through rte_eth_dev_fw_version_get().
Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
---
drivers/net/sxe2/sxe2_ethdev.c | 35 +++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
diff --git a/drivers/net/sxe2/sxe2_ethdev.c b/drivers/net/sxe2/sxe2_ethdev.c
index 58b56428f9..74b841c2ef 100644
--- a/drivers/net/sxe2/sxe2_ethdev.c
+++ b/drivers/net/sxe2/sxe2_ethdev.c
@@ -120,7 +120,8 @@ static int32_t sxe2_udp_tunnel_port_add(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *tunnel_udp);
static int32_t sxe2_udp_tunnel_port_del(struct rte_eth_dev *dev,
struct rte_eth_udp_tunnel *tunnel_udp);
-
+static int32_t sxe2_fw_version_string_get(struct rte_eth_dev *dev,
+ char *fw_version, size_t fw_size);
static const struct eth_dev_ops sxe2_eth_dev_ops = {
.dev_configure = sxe2_dev_configure,
@@ -181,6 +182,8 @@ static const struct eth_dev_ops sxe2_eth_dev_ops = {
.xstats_reset = sxe2_stats_info_reset,
.queue_stats_mapping_set = sxe2_queue_stats_mapping_set,
+
+ .fw_version_get = sxe2_fw_version_string_get,
};
static int32_t sxe2_dev_configure(struct rte_eth_dev *dev)
@@ -1575,6 +1578,36 @@ static int32_t sxe2_eth_pmd_remove(struct sxe2_common_device *cdev)
return ret;
}
+static int32_t sxe2_fw_version_string_get(struct rte_eth_dev *dev, char *fw_version, size_t fw_size)
+{
+ struct sxe2_adapter *adapter =
+ SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_fw_info *fw_info = &adapter->dev_info.fw;
+ int32_t ret_len;
+ int32_t ret;
+
+ ret_len = snprintf(fw_version, fw_size,
+ "%u.%u.%u.%u",
+ fw_info->main_version_id,
+ fw_info->sub_version_id,
+ fw_info->fix_version_id,
+ fw_info->build_id);
+
+ if (ret_len < 0) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ret_len += 1;
+ if (fw_size < (size_t)ret_len)
+ ret = ret_len;
+ else
+ ret = 0;
+
+out:
+ return ret;
+}
+
static uint16_t sxe2_switchdev_repr_id_encode_get(struct sxe2_switchdev_info *switchdev_info)
{
enum rte_eth_representor_type type;
--
2.52.0
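The ethdev layer documents a specific return convention for rte_eth_dev_fw_version_get(). It returns 0 on success. When fw_size is too small, it returns a positive value giving the size of the untruncated string, including the terminating NUL. The following is a minimal standalone sketch of the version formatting under that convention. struct sketch_fw_info and sketch_fw_version_get() are hypothetical stand-ins for the driver's struct sxe2_fw_info and sxe2_fw_version_string_get(). The field types are assumed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mirror of the four fields the patch reads from
 * adapter->dev_info.fw; the real layout is not shown in the patch. */
struct sketch_fw_info {
	unsigned int main_version_id;
	unsigned int sub_version_id;
	unsigned int fix_version_id;
	unsigned int build_id;
};

/* Format "main.sub.fix.build" into buf. Returns 0 if it fit, the required
 * buffer size (including the NUL) if size was too small, or -EINVAL on an
 * snprintf encoding error. */
static int sketch_fw_version_get(const struct sketch_fw_info *fw,
				 char *buf, size_t size)
{
	int len = snprintf(buf, size, "%u.%u.%u.%u",
			   fw->main_version_id, fw->sub_version_id,
			   fw->fix_version_id, fw->build_id);

	if (len < 0)
		return -EINVAL;
	len += 1; /* account for the terminating NUL */
	return (size < (size_t)len) ? len : 0;
}
```

Returning the required size lets the caller allocate exactly that many bytes and then retry the call.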