* [PATCH v13 05/20] drivers: support RSS feature
From: liujie5 @ 2026-06-08 7:42 UTC
To: stephen; +Cc: dev, Jie Liu
In-Reply-To: <20260608074257.3043531-1-liujie5@linkdatatechnology.com>
From: Jie Liu <liujie5@linkdatatechnology.com>
Add support for Receive Side Scaling (RSS) to distribute incoming
traffic across multiple receive queues.
- Implement rss_hash_update and rss_hash_conf_get.
- Implement reta_update and reta_query.
- Support RSS hash configuration for IPv4, IPv6, TCP and UDP.
- Initialize the default hash key during port start.
Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
---
drivers/common/sxe2/sxe2_flow_public.h | 633 +++++++++++++++++++++++++
drivers/net/sxe2/meson.build | 1 +
drivers/net/sxe2/sxe2_cmd_chnl.c | 173 +++++++
drivers/net/sxe2/sxe2_cmd_chnl.h | 16 +
drivers/net/sxe2/sxe2_drv_cmd.h | 29 ++
drivers/net/sxe2/sxe2_ethdev.c | 37 ++
drivers/net/sxe2/sxe2_ethdev.h | 8 +
drivers/net/sxe2/sxe2_flow_define.h | 143 ++++++
drivers/net/sxe2/sxe2_queue.c | 11 +
| 584 +++++++++++++++++++++++
| 81 ++++
drivers/net/sxe2/sxe2_txrx.h | 4 +
drivers/net/sxe2/sxe2_txrx_poll.c | 85 +++-
13 files changed, 1804 insertions(+), 1 deletion(-)
create mode 100644 drivers/common/sxe2/sxe2_flow_public.h
create mode 100644 drivers/net/sxe2/sxe2_flow_define.h
create mode 100644 drivers/net/sxe2/sxe2_rss.c
create mode 100644 drivers/net/sxe2/sxe2_rss.h
diff --git a/drivers/common/sxe2/sxe2_flow_public.h b/drivers/common/sxe2/sxe2_flow_public.h
new file mode 100644
index 0000000000..32ab2a9713
--- /dev/null
+++ b/drivers/common/sxe2/sxe2_flow_public.h
@@ -0,0 +1,633 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#ifndef __SXE2_FLOW_PUBLIC_H__
+#define __SXE2_FLOW_PUBLIC_H__
+#include "sxe2_osal.h"
+
+enum sxe2_flow_type {
+ SXE2_FLOW_TYPE_NONE = 0,
+ SXE2_FLOW_MAC_PAY = 1,
+ SXE2_FLOW_MAC_IPV4_FRAG_PAY = 22,
+ SXE2_FLOW_MAC_IPV4_PAY = 23,
+ SXE2_FLOW_MAC_IPV4_UDP_PAY = 24,
+ SXE2_FLOW_MAC_IPV4_TCP_PAY = 26,
+ SXE2_FLOW_MAC_IPV4_SCTP_PAY = 27,
+ SXE2_FLOW_MAC_IPV4_IPV4_FRAG_PAY = 29,
+ SXE2_FLOW_MAC_IPV4_IPV4_PAY = 30,
+ SXE2_FLOW_MAC_IPV4_IPV4_UDP_PAY = 31,
+ SXE2_FLOW_MAC_IPV4_IPV4_TCP_PAY = 33,
+ SXE2_FLOW_MAC_IPV4_IPV4_SCTP_PAY = 34,
+ SXE2_FLOW_MAC_IPV4_IPV6_FRAG_PAY = 36,
+ SXE2_FLOW_MAC_IPV4_IPV6_PAY = 37,
+ SXE2_FLOW_MAC_IPV4_IPV6_UDP_PAY = 38,
+ SXE2_FLOW_MAC_IPV4_IPV6_TCP_PAY = 40,
+ SXE2_FLOW_MAC_IPV4_IPV6_SCTP_PAY = 41,
+ SXE2_FLOW_MAC_IPV4_GRE_PAY = 43,
+ SXE2_FLOW_MAC_IPV4_GRE_IPV4_FRAG_PAY = 44,
+ SXE2_FLOW_MAC_IPV4_GRE_IPV4_PAY = 45,
+ SXE2_FLOW_MAC_IPV4_GRE_IPV4_UDP_PAY = 46,
+ SXE2_FLOW_MAC_IPV4_GRE_IPV4_TCP_PAY = 48,
+ SXE2_FLOW_MAC_IPV4_GRE_IPV4_SCTP_PAY = 49,
+ SXE2_FLOW_MAC_IPV4_GRE_IPV6_FRAG_PAY = 51,
+ SXE2_FLOW_MAC_IPV4_GRE_IPV6_PAY = 52,
+ SXE2_FLOW_MAC_IPV4_GRE_IPV6_UDP_PAY = 53,
+ SXE2_FLOW_MAC_IPV4_GRE_IPV6_TCP_PAY = 55,
+ SXE2_FLOW_MAC_IPV4_GRE_IPV6_SCTP_PAY = 56,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_PAY = 58,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_IPV4_FRAG_PAY = 59,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_IPV4_PAY = 60,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_IPV4_UDP_PAY = 61,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_IPV4_TCP_PAY = 63,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_IPV4_SCTP_PAY = 64,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_IPV6_FRAG_PAY = 66,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_IPV6_PAY = 67,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_IPV6_UDP_PAY = 68,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_IPV6_TCP_PAY = 70,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_IPV6_SCTP_PAY = 71,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_PAY = 73,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_IPV4_FRAG_PAY = 74,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_IPV4_PAY = 75,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_IPV4_UDP_PAY = 76,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_IPV4_TCP_PAY = 78,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_IPV4_SCTP_PAY = 79,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_IPV6_FRAG_PAY = 81,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_IPV6_PAY = 82,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_IPV6_UDP_PAY = 83,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_IPV6_TCP_PAY = 85,
+ SXE2_FLOW_MAC_IPV4_GRE_MAC_VLAN_IPV6_SCTP_PAY = 86,
+ SXE2_FLOW_MAC_IPV6_FRAG_PAY = 88,
+ SXE2_FLOW_MAC_IPV6_PAY = 89,
+ SXE2_FLOW_MAC_IPV6_UDP_PAY = 90,
+ SXE2_FLOW_MAC_IPV6_TCP_PAY = 92,
+ SXE2_FLOW_MAC_IPV6_SCTP_PAY = 93,
+ SXE2_FLOW_MAC_IPV6_IPV4_FRAG_PAY = 95,
+ SXE2_FLOW_MAC_IPV6_IPV4_PAY = 96,
+ SXE2_FLOW_MAC_IPV6_IPV4_UDP_PAY = 97,
+ SXE2_FLOW_MAC_IPV6_IPV4_TCP_PAY = 99,
+ SXE2_FLOW_MAC_IPV6_IPV4_SCTP_PAY = 100,
+ SXE2_FLOW_MAC_IPV6_IPV6_FRAG_PAY = 102,
+ SXE2_FLOW_MAC_IPV6_IPV6_PAY = 103,
+ SXE2_FLOW_MAC_IPV6_IPV6_UDP_PAY = 104,
+ SXE2_FLOW_MAC_IPV6_IPV6_TCP_PAY = 106,
+ SXE2_FLOW_MAC_IPV6_IPV6_SCTP_PAY = 107,
+ SXE2_FLOW_MAC_IPV6_GRE_PAY = 109,
+ SXE2_FLOW_MAC_IPV6_GRE_IPV4_FRAG_PAY = 110,
+ SXE2_FLOW_MAC_IPV6_GRE_IPV4_PAY = 111,
+ SXE2_FLOW_MAC_IPV6_GRE_IPV4_UDP_PAY = 112,
+ SXE2_FLOW_MAC_IPV6_GRE_IPV4_TCP_PAY = 114,
+ SXE2_FLOW_MAC_IPV6_GRE_IPV4_SCTP_PAY = 115,
+ SXE2_FLOW_MAC_IPV6_GRE_IPV6_FRAG_PAY = 117,
+ SXE2_FLOW_MAC_IPV6_GRE_IPV6_PAY = 118,
+ SXE2_FLOW_MAC_IPV6_GRE_IPV6_UDP_PAY = 119,
+ SXE2_FLOW_MAC_IPV6_GRE_IPV6_TCP_PAY = 121,
+ SXE2_FLOW_MAC_IPV6_GRE_IPV6_SCTP_PAY = 122,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_PAY = 124,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_IPV4_FRAG_PAY = 125,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_IPV4_PAY = 126,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_IPV4_UDP_PAY = 127,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_IPV4_TCP_PAY = 129,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_IPV4_SCTP_PAY = 130,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_IPV6_FRAG_PAY = 132,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_IPV6_PAY = 133,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_IPV6_UDP_PAY = 134,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_IPV6_TCP_PAY = 136,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_IPV6_SCTP_PAY = 137,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_PAY = 139,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_IPV4_FRAG_PAY = 140,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_IPV4_PAY = 141,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_IPV4_UDP_PAY = 142,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_IPV4_TCP_PAY = 144,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_IPV4_SCTP_PAY = 145,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_IPV6_FRAG_PAY = 147,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_IPV6_PAY = 148,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_IPV6_UDP_PAY = 149,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_IPV6_TCP_PAY = 151,
+ SXE2_FLOW_MAC_IPV6_GRE_MAC_VLAN_IPV6_SCTP_PAY = 152,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_PAY = 329,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_PAY = 330,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_IPV4_FRAG_PAY = 331,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_IPV4_PAY = 332,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_IPV4_UDP_PAY = 333,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_IPV4_TCP_PAY = 334,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_IPV4_SCTP_PAY = 335,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_IPV4_FRAG_PAY = 336,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_IPV4_PAY = 337,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_IPV4_UDP_PAY = 338,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_IPV4_TCP_PAY = 339,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_IPV4_SCTP_PAY = 340,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_IPV6_FRAG_PAY = 341,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_IPV6_PAY = 342,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_IPV6_UDP_PAY = 343,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_IPV6_TCP_PAY = 344,
+ SXE2_FLOW_MAC_IPV4_UDP_GTPU_IPV6_SCTP_PAY = 345,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_IPV6_FRAG_PAY = 346,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_IPV6_PAY = 347,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_IPV6_UDP_PAY = 348,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_IPV6_TCP_PAY = 349,
+ SXE2_FLOW_MAC_IPV6_UDP_GTPU_IPV6_SCTP_PAY = 350,
+ SXE2_FLOW_MAC_IPV6_MAC_PAY = 820,
+ SXE2_FLOW_MAC_IPV6_MAC_IPV4_FRAG_PAY = 821,
+ SXE2_FLOW_MAC_IPV6_MAC_IPV4_PAY = 822,
+ SXE2_FLOW_MAC_IPV6_MAC_IPV4_UDP_PAY = 823,
+ SXE2_FLOW_MAC_IPV6_MAC_IPV4_TCP_PAY = 824,
+ SXE2_FLOW_MAC_IPV6_MAC_IPV4_SCTP_PAY = 825,
+ SXE2_FLOW_MAC_IPV6_MAC_IPV6_FRAG_PAY = 827,
+ SXE2_FLOW_MAC_IPV6_MAC_IPV6_PAY = 828,
+ SXE2_FLOW_MAC_IPV6_MAC_IPV6_UDP_PAY = 829,
+ SXE2_FLOW_MAC_IPV6_MAC_IPV6_TCP_PAY = 830,
+ SXE2_FLOW_MAC_IPV6_MAC_IPV6_SCTP_PAY = 831,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_PAY = 835,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_IPV4_FRAG_PAY = 836,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_IPV4_PAY = 837,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_IPV4_UDP_PAY = 838,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_IPV4_TCP_PAY = 839,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_IPV4_SCTP_PAY = 840,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_IPV6_FRAG_PAY = 842,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_IPV6_PAY = 843,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_IPV6_UDP_PAY = 844,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_IPV6_TCP_PAY = 845,
+ SXE2_FLOW_MAC_IPV6_MAC_VLAN_IPV6_SCTP_PAY = 846,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_PAY = 878,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_IPV4_FRAG_PAY = 877,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_IPV4_PAY = 876,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_IPV4_UDP_PAY = 879,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_IPV4_TCP_PAY = 880,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_IPV4_SCTP_PAY = 875,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_IPV6_FRAG_PAY = 871,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_IPV6_PAY = 870,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_IPV6_UDP_PAY = 872,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_IPV6_TCP_PAY = 873,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_IPV6_SCTP_PAY = 869,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_PAY = 891,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_IPV4_FRAG_PAY = 890,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_IPV4_PAY = 889,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_IPV4_UDP_PAY = 892,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_IPV4_TCP_PAY = 893,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_IPV4_SCTP_PAY = 888,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_IPV6_FRAG_PAY = 884,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_IPV6_PAY = 883,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_IPV6_UDP_PAY = 885,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_IPV6_TCP_PAY = 886,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_IPV6_SCTP_PAY = 882,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_PAY = 904,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_IPV4_FRAG_PAY = 903,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_IPV4_PAY = 902,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_IPV4_UDP_PAY = 905,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_IPV4_TCP_PAY = 906,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_IPV4_SCTP_PAY = 901,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_IPV6_FRAG_PAY = 897,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_IPV6_PAY = 896,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_IPV6_UDP_PAY = 898,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_IPV6_TCP_PAY = 899,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_IPV6_SCTP_PAY = 895,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_PAY = 917,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_IPV4_FRAG_PAY = 916,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_IPV4_PAY = 915,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_IPV4_UDP_PAY = 918,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_IPV4_TCP_PAY = 919,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_IPV4_SCTP_PAY = 914,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_IPV6_FRAG_PAY = 910,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_IPV6_PAY = 909,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_IPV6_UDP_PAY = 911,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_IPV6_TCP_PAY = 912,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_IPV6_SCTP_PAY = 908,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_PAY = 930,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_IPV4_FRAG_PAY = 929,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_IPV4_PAY = 928,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_IPV4_UDP_PAY = 931,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_IPV4_TCP_PAY = 932,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_IPV4_SCTP_PAY = 927,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_IPV6_FRAG_PAY = 923,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_IPV6_PAY = 922,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_IPV6_UDP_PAY = 924,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_IPV6_TCP_PAY = 925,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_VLAN_IPV6_SCTP_PAY = 921,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_PAY = 943,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_IPV4_FRAG_PAY = 942,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_IPV4_PAY = 941,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_IPV4_UDP_PAY = 944,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_IPV4_TCP_PAY = 945,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_IPV4_SCTP_PAY = 940,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_IPV6_FRAG_PAY = 936,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_IPV6_PAY = 935,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_IPV6_UDP_PAY = 937,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_IPV6_TCP_PAY = 938,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_VLAN_IPV6_SCTP_PAY = 934,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_PAY = 956,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_IPV4_FRAG_PAY = 955,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_IPV4_PAY = 954,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_IPV4_UDP_PAY = 957,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_IPV4_TCP_PAY = 958,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_IPV4_SCTP_PAY = 953,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_IPV6_FRAG_PAY = 949,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_IPV6_PAY = 948,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_IPV6_UDP_PAY = 950,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_IPV6_TCP_PAY = 951,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_VLAN_IPV6_SCTP_PAY = 947,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_PAY = 969,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_IPV4_FRAG_PAY = 968,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_IPV4_PAY = 967,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_IPV4_UDP_PAY = 970,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_IPV4_TCP_PAY = 971,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_IPV4_SCTP_PAY = 966,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_IPV6_FRAG_PAY = 962,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_IPV6_PAY = 961,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_IPV6_UDP_PAY = 963,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_IPV6_TCP_PAY = 964,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_VLAN_IPV6_SCTP_PAY = 960,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_PAY = 982,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_IPV4_FRAG_PAY = 981,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_IPV4_PAY = 980,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_IPV4_UDP_PAY = 983,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_IPV4_TCP_PAY = 984,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_IPV4_SCTP_PAY = 979,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_IPV6_FRAG_PAY = 975,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_IPV6_PAY = 974,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_IPV6_UDP_PAY = 976,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_IPV6_TCP_PAY = 977,
+ SXE2_FLOW_MAC_IPV6_UDP_VXGEN_MAC_IPV6_SCTP_PAY = 973,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_PAY = 995,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_IPV4_FRAG_PAY = 994,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_IPV4_PAY = 993,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_IPV4_UDP_PAY = 996,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_IPV4_TCP_PAY = 997,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_IPV4_SCTP_PAY = 992,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_IPV6_FRAG_PAY = 988,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_IPV6_PAY = 987,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_IPV6_UDP_PAY = 989,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_IPV6_TCP_PAY = 990,
+ SXE2_FLOW_MAC_IPV4_UDP_VXGEN_MAC_IPV6_SCTP_PAY = 986,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_PAY = 1008,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_IPV4_FRAG_PAY = 1007,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_IPV4_PAY = 1006,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_IPV4_UDP_PAY = 1009,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_IPV4_TCP_PAY = 1010,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_IPV4_SCTP_PAY = 1005,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_IPV6_FRAG_PAY = 1001,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_IPV6_PAY = 1000,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_IPV6_UDP_PAY = 1002,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_IPV6_TCP_PAY = 1003,
+ SXE2_FLOW_MAC_IPV6_UDP_GRE_MAC_IPV6_SCTP_PAY = 999,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_PAY = 1021,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_IPV4_FRAG_PAY = 1020,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_IPV4_PAY = 1019,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_IPV4_UDP_PAY = 1022,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_IPV4_TCP_PAY = 1023,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_IPV4_SCTP_PAY = 1018,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_IPV6_FRAG_PAY = 1014,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_IPV6_PAY = 1013,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_IPV6_UDP_PAY = 1015,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_IPV6_TCP_PAY = 1016,
+ SXE2_FLOW_MAC_IPV4_UDP_GRE_MAC_IPV6_SCTP_PAY = 1012,
+ SXE2_FLOW_TYPE_MAX = 2048,
+};
+
+enum sxe2_rss_cfg_hdr_type {
+ SXE2_RSS_OUTER_HEADERS,
+ SXE2_RSS_INNER_HEADERS,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV4,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV6,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV4_GRE,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV6_GRE,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV4_UDP_GRE,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV6_UDP_GRE,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV4_UDP_VXLAN,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV6_UDP_VXLAN,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV4_UDP_GENEVE,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV6_UDP_GENEVE,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV4_UDP_GTPU,
+
+ SXE2_RSS_INNER_HEADERS_WITH_OUTER_IPV6_UDP_GTPU,
+ SXE2_RSS_ANY_HEADERS
+};
+
+enum sxe2_flow_hdr {
+ SXE2_FLOW_HDR_ETH = 0,
+ SXE2_FLOW_HDR_VLAN,
+ SXE2_FLOW_HDR_QINQ,
+ SXE2_FLOW_HDR_IPV4,
+ SXE2_FLOW_HDR_IPV6,
+ SXE2_FLOW_HDR_ICMP = 5,
+ SXE2_FLOW_HDR_TCP,
+ SXE2_FLOW_HDR_UDP,
+ SXE2_FLOW_HDR_SCTP,
+ SXE2_FLOW_HDR_GRE,
+ SXE2_FLOW_HDR_VXLAN = 10,
+ SXE2_FLOW_HDR_GENEVE,
+ SXE2_FLOW_HDR_GTPU,
+
+ SXE2_FLOW_HDR_IPV_FRAG,
+
+ SXE2_FLOW_HDR_IPV_OTHER,
+
+ SXE2_FLOW_HDR_ETH_NON_IP = 15,
+ SXE2_FLOW_HDR_MAX = 128,
+};
+
+enum sxe2_flow_fld_id {
+ SXE2_FLOW_FLD_ID_ETH_DA = 0,
+ SXE2_FLOW_FLD_ID_ETH_SA,
+ SXE2_FLOW_FLD_ID_S_TCI,
+ SXE2_FLOW_FLD_ID_C_TCI,
+ SXE2_FLOW_FLD_ID_S_TPID,
+ SXE2_FLOW_FLD_ID_C_TPID = 5,
+ SXE2_FLOW_FLD_ID_S_VID,
+ SXE2_FLOW_FLD_ID_C_VID,
+ SXE2_FLOW_FLD_ID_ETH_TYPE,
+
+ SXE2_FLOW_FLD_ID_IPV4_TOS,
+ SXE2_FLOW_FLD_ID_IPV6_DSCP = 10,
+ SXE2_FLOW_FLD_ID_IPV4_TTL,
+ SXE2_FLOW_FLD_ID_IPV4_PROT,
+ SXE2_FLOW_FLD_ID_IPV6_TTL,
+ SXE2_FLOW_FLD_ID_IPV6_PROT,
+ SXE2_FLOW_FLD_ID_IPV4_SA = 15,
+ SXE2_FLOW_FLD_ID_IPV4_DA,
+ SXE2_FLOW_FLD_ID_IPV6_SA,
+ SXE2_FLOW_FLD_ID_IPV6_DA,
+ SXE2_FLOW_FLD_ID_IPV4_CHKSUM,
+ SXE2_FLOW_FLD_ID_IPV4_ID = 20,
+ SXE2_FLOW_FLD_ID_IPV6_ID,
+ SXE2_FLOW_FLD_ID_IPV6_PRE32_SA,
+ SXE2_FLOW_FLD_ID_IPV6_PRE32_DA,
+ SXE2_FLOW_FLD_ID_IPV6_PRE48_SA,
+ SXE2_FLOW_FLD_ID_IPV6_PRE48_DA = 25,
+ SXE2_FLOW_FLD_ID_IPV6_PRE64_SA,
+ SXE2_FLOW_FLD_ID_IPV6_PRE64_DA,
+
+ SXE2_FLOW_FLD_ID_TCP_SRC_PORT,
+ SXE2_FLOW_FLD_ID_TCP_DST_PORT,
+ SXE2_FLOW_FLD_ID_UDP_SRC_PORT = 30,
+ SXE2_FLOW_FLD_ID_UDP_DST_PORT,
+ SXE2_FLOW_FLD_ID_SCTP_SRC_PORT,
+ SXE2_FLOW_FLD_ID_SCTP_DST_PORT,
+ SXE2_FLOW_FLD_ID_TCP_FLAGS,
+ SXE2_FLOW_FLD_ID_TCP_CHKSUM = 35,
+ SXE2_FLOW_FLD_ID_UDP_CHKSUM,
+ SXE2_FLOW_FLD_ID_SCTP_CHKSUM,
+
+ SXE2_FLOW_FLD_ID_VXLAN_VNI,
+
+ SXE2_FLOW_FLD_ID_GENEVE_VNI,
+
+ SXE2_FLOW_FLD_ID_GTPU_TEID = 40,
+
+ SXE2_FLOW_FLD_ID_NVGRE_TNI,
+
+ SXE2_FLOW_FLD_ID_MAX = 128,
+};
+
+struct sxe2_ether_hdr {
+ uint8_t dst_addr[SXE2_ETH_ALEN];
+ uint8_t src_addr[SXE2_ETH_ALEN];
+ uint16_t ether_type;
+};
+
+struct sxe2_vlan_hdr {
+ uint16_t type;
+ uint16_t vlan;
+};
+
+struct sxe2_ipv4_hdr {
+ uint8_t ver_ihl;
+ uint8_t tos;
+ uint16_t tot_len;
+ uint16_t id;
+ uint16_t frag_off;
+ uint8_t ttl;
+ uint8_t protocol;
+ uint16_t check;
+ uint32_t saddr;
+ uint32_t daddr;
+};
+#define SXE2_IPV6_ADDR_LENGTH (16)
+#define SXE2_IPV6_TC_SHIFT (20)
+#define SXE2_IPV6_TC_MASK (0xFF)
+
+struct sxe2_ipv6_hdr {
+ uint32_t pri_ver_flow;
+ uint16_t payload_len;
+ uint8_t nexthdr;
+ uint8_t hop_limit;
+ union {
+ uint8_t saddr[16];
+ uint16_t saddr16[8];
+ uint32_t saddr32[4];
+ };
+ union {
+ uint8_t daddr[16];
+ uint16_t daddr16[8];
+ uint32_t daddr32[4];
+ };
+};
+
+struct sxe2_tcp_hdr {
+ uint16_t source;
+ uint16_t dest;
+ uint32_t seq;
+ uint32_t ack_seq;
+ uint16_t flag;
+ uint16_t window;
+ uint16_t check;
+ uint16_t urg_ptr;
+};
+
+struct sxe2_udp_hdr {
+ uint16_t source;
+ uint16_t dest;
+ uint16_t len;
+ uint16_t check;
+};
+
+struct sxe2_sctp_hdr {
+ uint16_t src_port;
+ uint16_t dst_port;
+};
+
+struct sxe2_nvgre_hdr {
+ uint16_t flags;
+ uint16_t protocol;
+ uint32_t tni;
+};
+struct sxe2_geneve_hdr {
+ uint16_t flags;
+ uint16_t protocol;
+ uint32_t vni;
+};
+struct sxe2_gtpu_hdr {
+ uint8_t flag;
+ uint8_t msg_type;
+ uint16_t msg_len;
+ uint32_t teid;
+};
+struct sxe2_vxlan_hdr {
+ uint8_t flag;
+ uint8_t resvd0;
+ uint8_t resvd1;
+ uint8_t protocol;
+ uint32_t vni;
+};
+
+enum sxe2_flow_act_type {
+ SXE2_FLOW_ACTION_DROP = 0,
+ SXE2_FLOW_ACTION_TC_REDIRECT,
+ SXE2_FLOW_ACTION_TO_VSI,
+ SXE2_FLOW_ACTION_TO_VSI_LIST,
+ SXE2_FLOW_ACTION_PASSTHRU,
+ SXE2_FLOW_ACTION_QUEUE,
+ SXE2_FLOW_ACTION_Q_REGION,
+ SXE2_FLOW_ACTION_MARK,
+ SXE2_FLOW_ACTION_COUNT,
+ SXE2_FLOW_ACTION_RSS,
+ SXE2_FLOW_ACTION_MAX = 32,
+};
+
+enum sxe2_rss_hash_key_func {
+ SXE2_RSS_HASH_FUNC_TOEPLITZ = 0,
+ SXE2_RSS_HASH_FUNC_SYM_TOEPLITZ = 1,
+ SXE2_RSS_HASH_FUNC_XOR = 2,
+ SXE2_RSS_HASH_FUNC_JEKINS = 3
+};
+
+struct sxe2_flow_action_rss {
+ DECLARE_BITMAP(hdr_out, SXE2_FLOW_HDR_MAX);
+ DECLARE_BITMAP(hdr_in, SXE2_FLOW_HDR_MAX);
+ DECLARE_BITMAP(fld, SXE2_FLOW_FLD_ID_MAX);
+ uint8_t is_inner;
+ uint8_t func;
+ uint8_t hdr_type;
+};
+
+struct sxe2_flow_action_queue {
+ uint16_t vsi_index;
+ uint16_t q_index;
+};
+
+struct sxe2_flow_action_queue_region {
+ uint16_t vsi_index;
+ uint16_t q_index;
+ uint8_t region;
+};
+
+struct sxe2_flow_action_passthru {
+ uint16_t vsi_index;
+};
+
+struct sxe2_flow_action_mark {
+ uint32_t mark_id;
+};
+
+#define SXE2_VSI_MAX (2048)
+struct sxe2_flow_action_vsi {
+ uint16_t vsi_index;
+};
+
+struct sxe2_flow_action_vsi_list {
+ DECLARE_BITMAP(vsi_list_map, SXE2_VSI_MAX);
+ uint16_t vsi_cnt;
+};
+
+enum sxe2_fnav_stat_ctrl_type {
+ SXE2_FNAV_STAT_ENA_NONE = 0,
+ SXE2_FNAV_STAT_ENA_PKTS,
+ SXE2_FNAV_STAT_ENA_BYTES,
+ SXE2_FNAV_STAT_ENA_ALL,
+};
+
+struct sxe2_flow_action_count {
+ uint32_t user_id;
+ uint32_t driver_id;
+ uint32_t stat_index;
+ uint32_t stat_ctrl;
+};
+
+enum sxe2_flow_engine_type {
+ SXE2_FLOW_ENGINE_ACL,
+ SXE2_FLOW_ENGINE_SWITCH,
+ SXE2_FLOW_ENGINE_FNAV,
+ SXE2_FLOW_ENGINE_RSS,
+ SXE2_FLOW_ENGINE_MAX,
+};
+
+struct sxe2_flow_item {
+ struct sxe2_ether_hdr eth;
+ struct sxe2_vlan_hdr vlan;
+ struct sxe2_vlan_hdr qinq;
+ struct sxe2_ipv4_hdr ipv4;
+ struct sxe2_ipv6_hdr ipv6;
+ struct sxe2_udp_hdr udp;
+ struct sxe2_tcp_hdr tcp;
+ struct sxe2_sctp_hdr sctp;
+ struct sxe2_gtpu_hdr gtpu;
+ struct sxe2_vxlan_hdr vxlan;
+ struct sxe2_nvgre_hdr nvgre;
+ struct sxe2_geneve_hdr geneve;
+};
+
+enum sxe2_flow_sw_direct_type {
+ SXE2_FLOW_SW_DIRECT_TX,
+ SXE2_FLOW_SW_DIRECT_RX,
+ SXE2_FLOW_SW_DIRECT_MAX,
+};
+enum sxe2_flow_sw_pattern_type {
+ SXE2_FLOW_SW_PATTERN_ONLY,
+ SXE2_FLOW_SW_PATTERN_LAST,
+ SXE2_FLOW_SW_PATTERN_FIRST,
+ SXE2_FLOW_SW_PATTERN_MAX,
+};
+
+enum sxe2_flow_tunnel_type {
+ SXE2_FLOW_TUNNEL_TYPE_NONE,
+ SXE2_FLOW_TUNNEL_TYPE_PARENT,
+ SXE2_FLOW_TUNNEL_TYPE_VXLAN,
+ SXE2_FLOW_TUNNEL_TYPE_GTPU,
+ SXE2_FLOW_TUNNEL_TYPE_GENEVE,
+ SXE2_FLOW_TUNNEL_TYPE_GRE,
+ SXE2_FLOW_TUNNEL_TYPE_IPIP,
+};
+
+struct sxe2_flow_meta {
+ uint8_t switch_pattern_dup_allow;
+ uint8_t switch_src_direct;
+ uint16_t flow_src_vsi;
+ uint16_t flow_rule_vsi;
+ uint32_t flow_prio;
+ uint16_t flow_type;
+ uint8_t tunnel_type;
+ uint8_t rsv;
+};
+
+struct sxe2_flow_pattern {
+ DECLARE_BITMAP(hdrs, SXE2_FLOW_HDR_MAX);
+ DECLARE_BITMAP(map_spec, SXE2_FLOW_FLD_ID_MAX);
+ DECLARE_BITMAP(map_mask, SXE2_FLOW_FLD_ID_MAX);
+ struct sxe2_flow_item item_spec;
+ struct sxe2_flow_item item_mask;
+ uint64_t rss_type_allow;
+};
+
+struct sxe2_flow_action {
+ DECLARE_BITMAP(act_types, SXE2_FLOW_ACTION_MAX);
+ struct sxe2_flow_action_rss rss;
+ struct sxe2_flow_action_queue queue;
+ struct sxe2_flow_action_queue_region q_region;
+ struct sxe2_flow_action_passthru passthru;
+ struct sxe2_flow_action_vsi vsi;
+ struct sxe2_flow_action_vsi_list vsi_list;
+ struct sxe2_flow_action_mark mark;
+ struct sxe2_flow_action_count count;
+};
+#endif /* __SXE2_FLOW_PUBLIC_H__ */
diff --git a/drivers/net/sxe2/meson.build b/drivers/net/sxe2/meson.build
index b661e3cbf4..da7a690063 100644
--- a/drivers/net/sxe2/meson.build
+++ b/drivers/net/sxe2/meson.build
@@ -62,4 +62,5 @@ sources += files(
'sxe2_txrx_vec.c',
'sxe2_mac.c',
'sxe2_filter.c',
+ 'sxe2_rss.c',
)
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.c b/drivers/net/sxe2/sxe2_cmd_chnl.c
index 1fa9ad718e..b997e7b044 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.c
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.c
@@ -541,3 +541,176 @@ int32_t sxe2_drv_vlan_filter_switch(struct sxe2_adapter *adapter, bool on)
return ret;
}
+
+int32_t sxe2_drv_rss_key_set(struct sxe2_adapter *adapter, uint8_t *key, uint16_t key_size)
+{
+ struct sxe2_common_device *cdev = adapter->cdev;
+ struct sxe2_drv_cmd_params param = {0};
+ struct sxe2_rss_key_req *req = NULL;
+ int32_t ret = 0;
+ uint16_t buf_size = sizeof(*req) + key_size;
+
+ req = rte_zmalloc("drv_cmd_rss_key", buf_size, 0);
+ if (!req) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to alloc rss key");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ req->vsi_id = rte_cpu_to_le_16(adapter->vsi_ctxt.dpdk_vsi_id);
+ req->key_size = rte_cpu_to_le_16(key_size);
+ rte_memcpy(req->key, key, key_size);
+
+ sxe2_drv_cmd_params_fill(adapter, &param, SXE2_DRV_CMD_RSS_KEY_SET,
+ req, buf_size, NULL, 0);
+
+ ret = sxe2_drv_cmd_exec(cdev, &param);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to cmd set rss key, ret=%d", ret);
+ goto l_end;
+ }
+
+l_end:
+ if (req) {
+ rte_free(req);
+ req = NULL;
+ }
+ return ret;
+}
+
+int32_t sxe2_drv_rss_lut_set(struct sxe2_adapter *adapter, uint8_t *lut, uint16_t lut_size)
+{
+ struct sxe2_common_device *cdev = adapter->cdev;
+ struct sxe2_drv_cmd_params param = {0};
+ struct sxe2_rss_lut_req *req = NULL;
+ int32_t ret = 0;
+ uint16_t buf_size = sizeof(struct sxe2_rss_lut_req) + lut_size;
+
+ req = rte_zmalloc("drv_cmd_rss_lut", buf_size, 0);
+ if (!req) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to alloc rss lut");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ req->vsi_id = rte_cpu_to_le_16(adapter->vsi_ctxt.dpdk_vsi_id);
+ req->lut_size = rte_cpu_to_le_16(lut_size);
+ rte_memcpy(req->lut, lut, lut_size);
+
+ sxe2_drv_cmd_params_fill(adapter, &param, SXE2_DRV_CMD_RSS_LUT_SET,
+ req, buf_size, NULL, 0);
+
+ ret = sxe2_drv_cmd_exec(cdev, &param);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to cmd set rss lut, ret=%d", ret);
+ goto l_end;
+ }
+
+l_end:
+ if (req) {
+ rte_free(req);
+ req = NULL;
+ }
+ return ret;
+}
+
+int32_t sxe2_drv_rss_hash_ctrl_func(struct sxe2_adapter *adapter, enum sxe2_rss_hash_key_func func)
+{
+ struct sxe2_common_device *cdev = adapter->cdev;
+ struct sxe2_drv_cmd_params param = {0};
+ struct sxe2_rss_func_req req = {0};
+ int32_t ret = 0;
+
+ req.vsi_id = rte_cpu_to_le_16(adapter->vsi_ctxt.dpdk_vsi_id);
+ req.func = func;
+
+ sxe2_drv_cmd_params_fill(adapter, &param, SXE2_DRV_CMD_RSS_FUNC_SET,
+ &req, sizeof(req), NULL, 0);
+
+ ret = sxe2_drv_cmd_exec(cdev, &param);
+ if (ret)
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to cmd set rss func, ret=%d", ret);
+ return ret;
+}
+
+static void sxe2_drv_flow_bitmap_fill(uint32_t *bitmap, uint16_t *list)
+{
+ uint16_t index = 0;
+ uint16_t i = 0;
+ uint16_t map_size = sizeof(*bitmap) * SXE2_BITS_PER_BYTE;
+
+ while (list[i] != SXE2_FLOW_END) {
+ index = list[i] / map_size;
+ bitmap[index] |= (1UL << (list[i] % map_size));
+ i++;
+ }
+}
+
+int32_t sxe2_drv_rss_hf_add(struct sxe2_adapter *adapter,
+ struct sxe2_rss_hf_config *rss_conf)
+{
+ struct sxe2_common_device *cdev = adapter->cdev;
+ struct sxe2_drv_cmd_params param = {0};
+ struct sxe2_rss_hf_req req = {0};
+ int32_t ret = 0;
+
+ req.vsi_id = rte_cpu_to_le_16(adapter->vsi_ctxt.dpdk_vsi_id);
+ req.symm = rss_conf->symm;
+ req.hdr_type = rte_cpu_to_le_32(SXE2_RSS_OUTER_HEADERS);
+ sxe2_drv_flow_bitmap_fill(req.headers, rss_conf->hdrs);
+ sxe2_drv_flow_bitmap_fill(req.hash_flds, rss_conf->flds);
+
+ sxe2_drv_cmd_params_fill(adapter, &param, SXE2_DRV_CMD_RSS_HF_ADD,
+ &req, sizeof(req), NULL, 0);
+
+ ret = sxe2_drv_cmd_exec(cdev, &param);
+ if (ret)
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to cmd add rss hf, ret=%d", ret);
+ return ret;
+}
+
+int32_t sxe2_drv_rss_hf_del(struct sxe2_adapter *adapter,
+ struct sxe2_rss_hf_config *rss_conf)
+{
+ struct sxe2_common_device *cdev = adapter->cdev;
+ struct sxe2_drv_cmd_params param = {0};
+ struct sxe2_rss_hf_req req = {0};
+ int32_t ret = 0;
+
+ req.vsi_id = rte_cpu_to_le_16(adapter->vsi_ctxt.dpdk_vsi_id);
+ req.symm = rss_conf->symm;
+ req.hdr_type = rte_cpu_to_le_32(SXE2_RSS_OUTER_HEADERS);
+ sxe2_drv_flow_bitmap_fill(req.headers, rss_conf->hdrs);
+ sxe2_drv_flow_bitmap_fill(req.hash_flds, rss_conf->flds);
+
+ sxe2_drv_cmd_params_fill(adapter, &param, SXE2_DRV_CMD_RSS_HF_DEL,
+ &req, sizeof(req), NULL, 0);
+
+ ret = sxe2_drv_cmd_exec(cdev, &param);
+ if (ret)
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to cmd del rss hf, ret=%d", ret);
+ return ret;
+}
+
+int32_t sxe2_drv_rss_hf_clear(struct sxe2_adapter *adapter)
+{
+ struct sxe2_common_device *cdev = adapter->cdev;
+ struct sxe2_drv_cmd_params param = {0};
+ int32_t ret = 0;
+
+ sxe2_drv_cmd_params_fill(adapter, &param, SXE2_DRV_CMD_RSS_HF_CLEAR,
+ NULL, 0, NULL, 0);
+
+ ret = sxe2_drv_cmd_exec(cdev, &param);
+ if (ret)
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to cmd clear rss hf, ret=%d", ret);
+
+ return ret;
+}
+
+int32_t sxe2_drv_ptp_gettime(struct sxe2_adapter *adapter, struct sxe2_rx_queue *rxq)
+{
+ (void)adapter;
+ (void)rxq;
+ return 0;
+}
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.h b/drivers/net/sxe2/sxe2_cmd_chnl.h
index c93bc2b0c9..2546c65a6c 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.h
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.h
@@ -53,4 +53,20 @@ int32_t sxe2_drv_vlan_insert_strip_cfg(struct sxe2_adapter *adapter);
int32_t sxe2_drv_vlan_filter_switch(struct sxe2_adapter *adapter, bool on);
+int32_t sxe2_drv_rss_key_set(struct sxe2_adapter *adapter, uint8_t *key, uint16_t key_size);
+
+int32_t sxe2_drv_rss_lut_set(struct sxe2_adapter *adapter, uint8_t *lut, uint16_t lut_size);
+
+int32_t sxe2_drv_rss_hash_ctrl_func(struct sxe2_adapter *adapter, enum sxe2_rss_hash_key_func func);
+
+int32_t sxe2_drv_rss_hf_add(struct sxe2_adapter *adapter,
+ struct sxe2_rss_hf_config *rss_conf);
+
+int32_t sxe2_drv_rss_hf_del(struct sxe2_adapter *adapter,
+ struct sxe2_rss_hf_config *rss_conf);
+
+int32_t sxe2_drv_rss_hf_clear(struct sxe2_adapter *adapter);
+
+int32_t sxe2_drv_ptp_gettime(struct sxe2_adapter *adapter, struct sxe2_rx_queue *rxq);
+
#endif /* SXE2_CMD_CHNL_H */
diff --git a/drivers/net/sxe2/sxe2_drv_cmd.h b/drivers/net/sxe2/sxe2_drv_cmd.h
index d69d650148..9998f241f0 100644
--- a/drivers/net/sxe2/sxe2_drv_cmd.h
+++ b/drivers/net/sxe2/sxe2_drv_cmd.h
@@ -6,6 +6,7 @@
#define SXE2_DRV_CMD_H
#include "sxe2_osal.h"
+#include "sxe2_flow_public.h"
#define SXE2_DRV_CMD_MODULE_S (16)
#define SXE2_MK_DRV_CMD(module, cmd) (((module) << SXE2_DRV_CMD_MODULE_S) | ((cmd) & 0xFFFF))
@@ -320,6 +321,34 @@ struct __rte_aligned(4) __rte_packed_begin sxe2_vlan_filter_switch_req {
uint8_t rsv;
} __rte_packed_end;
+struct __rte_aligned(4) __rte_packed_begin sxe2_rss_key_req {
+ uint16_t vsi_id;
+ uint16_t key_size;
+ uint8_t key[];
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_rss_lut_req {
+ uint16_t vsi_id;
+ uint16_t lut_size;
+ uint8_t lut[];
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_rss_func_req {
+ uint16_t vsi_id;
+ uint8_t func;
+ uint8_t rsv[1];
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_rss_hf_req {
+ uint16_t vsi_id;
+ uint8_t rsv[2];
+ uint32_t headers[BITS_TO_U32(SXE2_FLOW_HDR_MAX)];
+ uint32_t hash_flds[BITS_TO_U32(SXE2_FLOW_FLD_ID_MAX)];
+ uint32_t hdr_type;
+ uint8_t symm;
+ uint8_t rsv1[3];
+} __rte_packed_end;
+
enum sxe2_drv_cmd_module {
SXE2_DRV_CMD_MODULE_HANDSHAKE = 0,
SXE2_DRV_CMD_MODULE_DEV = 1,
diff --git a/drivers/net/sxe2/sxe2_ethdev.c b/drivers/net/sxe2/sxe2_ethdev.c
index 9b117f097e..d48841b8e4 100644
--- a/drivers/net/sxe2/sxe2_ethdev.c
+++ b/drivers/net/sxe2/sxe2_ethdev.c
@@ -125,6 +125,11 @@ static const struct eth_dev_ops sxe2_eth_dev_ops = {
.vlan_filter_set = sxe2_dev_vlan_filter_set,
.vlan_offload_set = sxe2_dev_vlan_offload_set,
+
+ .reta_update = sxe2_dev_rss_reta_update,
+ .reta_query = sxe2_dev_rss_reta_query,
+ .rss_hash_update = sxe2_dev_rss_hash_update,
+ .rss_hash_conf_get = sxe2_dev_rss_hash_conf_get,
};
static int32_t sxe2_dev_configure(struct rte_eth_dev *dev)
@@ -141,6 +146,12 @@ static int32_t sxe2_dev_configure(struct rte_eth_dev *dev)
goto end;
}
+ ret = sxe2_rss_init(dev);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to init rss, ret=%d", ret);
+ goto end;
+ }
+
end:
return ret;
}
@@ -281,6 +292,22 @@ static int32_t sxe2_dev_infos_get(struct rte_eth_dev *dev,
RTE_ETH_TX_OFFLOAD_IPIP_TNL_TSO |
RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO;
+
+ if (adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_PTP)
+ dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+
+ if (adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_RSS) {
+ dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
+ dev_info->flow_type_rss_offloads |= SXE2_RSS_HF_SUPPORT_ALL;
+ dev_info->reta_size = adapter->rss_ctxt.rss_lut_size;
+ dev_info->hash_key_size = adapter->rss_ctxt.rss_key_size;
+ dev_info->rss_algo_capa =
+ RTE_ETH_HASH_ALGO_TO_CAPA(RTE_ETH_HASH_FUNCTION_DEFAULT) |
+ RTE_ETH_HASH_ALGO_TO_CAPA(RTE_ETH_HASH_FUNCTION_TOEPLITZ) |
+ RTE_ETH_HASH_ALGO_TO_CAPA(RTE_ETH_HASH_FUNCTION_SYMMETRIC_TOEPLITZ) |
+ RTE_ETH_HASH_ALGO_TO_CAPA(RTE_ETH_HASH_FUNCTION_SIMPLE_XOR);
+ }
+
dev_info->default_rxconf = (struct rte_eth_rxconf) {
.rx_thresh = {
.pthresh = SXE2_DEFAULT_RX_PTHRESH,
@@ -563,6 +590,8 @@ static int32_t sxe2_func_caps_get(struct sxe2_adapter *adapter)
sxe2_sw_queue_ctx_hw_cap_set(adapter, &dev_caps.queue_caps);
+ sxe2_sw_rss_ctx_hw_cap_set(adapter, &dev_caps.rss_hash_caps);
+
sxe2_sw_vsi_ctx_hw_cap_set(adapter, &dev_caps.vsi_caps);
l_end:
@@ -950,8 +979,15 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
goto init_eth_err;
}
+ ret = sxe2_rss_disable(dev);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to disable rss, ret=%d", ret);
+ goto init_rss_err;
+ }
+
goto l_end;
+init_rss_err:
init_eth_err:
init_dev_info_err:
sxe2_vsi_uninit(dev);
@@ -965,6 +1001,7 @@ static int32_t sxe2_dev_close(struct rte_eth_dev *dev)
{
(void)sxe2_dev_stop(dev);
(void)sxe2_queues_release(dev);
+ (void)sxe2_rss_disable(dev);
sxe2_vsi_uninit(dev);
sxe2_dev_pci_map_uinit(dev);
sxe2_eth_uinit(dev);
diff --git a/drivers/net/sxe2/sxe2_ethdev.h b/drivers/net/sxe2/sxe2_ethdev.h
index 34a4a45e4f..3955788634 100644
--- a/drivers/net/sxe2/sxe2_ethdev.h
+++ b/drivers/net/sxe2/sxe2_ethdev.h
@@ -15,6 +15,7 @@
#include "sxe2_common.h"
#include "sxe2_vsi.h"
+#include "sxe2_rss.h"
#include "sxe2_irq.h"
#include "sxe2_queue.h"
#include "sxe2_mac.h"
@@ -122,6 +123,11 @@ enum {
SXE2_FLAGS_NBITS
};
+struct sxe2_ptp_context {
+ uint64_t mbuf_rx_ts_flag;
+ int32_t mbuf_rx_ts_offset;
+};
+
struct sxe2_devargs {
uint8_t flow_dup_pattern_mode;
uint8_t func_flow_direct_en;
@@ -300,7 +306,9 @@ struct sxe2_adapter {
struct sxe2_queue_context q_ctxt;
struct sxe2_vsi_context vsi_ctxt;
struct sxe2_filter_context filter_ctxt;
+ struct sxe2_rss_context rss_ctxt;
struct sxe2_link_context link_ctxt;
+ struct sxe2_ptp_context ptp_ctxt;
struct sxe2_devargs devargs;
struct sxe2_switchdev_info switchdev_info;
bool rule_started;
diff --git a/drivers/net/sxe2/sxe2_flow_define.h b/drivers/net/sxe2/sxe2_flow_define.h
new file mode 100644
index 0000000000..d2f6000efa
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_flow_define.h
@@ -0,0 +1,143 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#ifndef __SXE2_FLOW_DEFINE_H__
+#define __SXE2_FLOW_DEFINE_H__
+#include <rte_tailq.h>
+#include <rte_eal.h>
+#include <rte_flow_driver.h>
+
+#include "sxe2_osal.h"
+#include "sxe2_flow_public.h"
+
+#define SXE2_FLOW_ETH_TYPE_MIN (1500)
+
+enum sxe2_expansion {
+ SXE2_EXPANSION_ERROR = 0,
+ SXE2_EXPANSION_OUTER_ETH,
+ SXE2_EXPANSION_OUTER_VLAN,
+ SXE2_EXPANSION_OUTER_QINQ,
+ SXE2_EXPANSION_OUTER_IPV4,
+ SXE2_EXPANSION_OUTER_IPV4_FRAG_EXT,
+ SXE2_EXPANSION_OUTER_IPV6,
+ SXE2_EXPANSION_OUTER_IPV6_FRAG_EXT,
+ SXE2_EXPANSION_OUTER_UDP,
+ SXE2_EXPANSION_OUTER_TCP,
+ SXE2_EXPANSION_OUTER_SCTP,
+
+ SXE2_EXPANSION_VXLAN,
+ SXE2_EXPANSION_VXLAN_GPE,
+ SXE2_EXPANSION_GRE,
+ SXE2_EXPANSION_NVGRE,
+ SXE2_EXPANSION_GENEVE,
+ SXE2_EXPANSION_GTPU,
+ SXE2_EXPANSION_IPIP,
+ SXE2_EXPANSION_OUTER_END,
+
+ SXE2_EXPANSION_ETH,
+ SXE2_EXPANSION_VLAN,
+ SXE2_EXPANSION_IPV4,
+ SXE2_EXPANSION_IPV4_FRAG_EXT,
+ SXE2_EXPANSION_IPV6,
+ SXE2_EXPANSION_IPV6_FRAG_EXT,
+ SXE2_EXPANSION_UDP,
+ SXE2_EXPANSION_TCP,
+ SXE2_EXPANSION_SCTP,
+
+ SXE2_EXPANSION_END,
+ SXE2_EXPANSION_MAX,
+};
+
+enum sxe2_flow_udp_tunnel_protocol {
+ SXE2_FLOW_UDP_TUNNEL_PROTOCOL_VXLAN,
+ SXE2_FLOW_UDP_TUNNEL_PROTOCOL_VXLAN_GPE,
+ SXE2_FLOW_UDP_TUNNEL_PROTOCOL_GENEVE,
+ SXE2_FLOW_UDP_TUNNEL_PROTOCOL_GTP_U,
+ SXE2_FLOW_UDP_TUNNEL_PROTOCOL_NVGRE,
+ SXE2_FLOW_UDP_TUNNEL_MAX,
+};
+
+enum {
+ SXE2_FLOW_ETH_TYPE_IPV4 = 0x0800,
+ SXE2_FLOW_ETH_TYPE_IPV6 = 0x86DD,
+ SXE2_FLOW_IP_PROTOCOL_GRE = 0x2F,
+ SXE2_FLOW_IP_PROTOCOL_IPV4 = 0x04,
+ SXE2_FLOW_IP_PROTOCOL_IPV6 = 0x29,
+ SXE2_FLOW_IP_PROTOCOL_ETH = 0x3B,
+ SXE2_FLOW_IP_PROTOCOL_UDP = 0x11,
+ SXE2_FLOW_IP_PROTOCOL_TCP = 0x06,
+ SXE2_FLOW_IP_PROTOCOL_SCTP = 0x84,
+};
+
+union sxe2_flow_item_raw {
+ struct sxe2_flow_item item;
+ uint8_t raw[sizeof(struct sxe2_flow_item)];
+};
+
+struct sxe2_flow {
+ TAILQ_ENTRY(sxe2_flow) next;
+ enum sxe2_flow_engine_type engine_type;
+ struct sxe2_flow_pattern pattern_outer;
+ struct sxe2_flow_pattern pattern_inner;
+ uint8_t has_mask;
+ uint8_t has_spec;
+ uint8_t has_hdr;
+ struct sxe2_flow_meta meta;
+ struct sxe2_flow_action action;
+ uint32_t flow_id;
+ int32_t create_err;
+ DECLARE_BITMAP(flow_type, SXE2_EXPANSION_MAX);
+};
+
+TAILQ_HEAD(sxe2_flow_list_t, sxe2_flow);
+
+struct rte_flow {
+ TAILQ_ENTRY(rte_flow) next;
+ struct sxe2_flow_list_t sxe2_flow_list;
+};
+TAILQ_HEAD(rte_flow_list_t, rte_flow);
+
+struct sxe2_fnav_cid_mgr {
+ TAILQ_ENTRY(sxe2_fnav_cid_mgr) next;
+ uint16_t stat_index;
+ uint32_t user_id;
+ uint32_t driver_id;
+ uint32_t count_type;
+ uint64_t hits;
+ uint64_t bytes;
+};
+TAILQ_HEAD(sxe2_fnav_cid_mgr_list_t, sxe2_fnav_cid_mgr);
+
+struct sxe2_fnav_count_resource {
+ uint32_t count_type;
+ uint32_t global_index;
+ struct sxe2_fnav_cid_mgr_list_t fnav_cid_mgr_list;
+};
+
+struct sxe2_flow_context {
+ struct rte_flow_list_t rte_flow_list;
+ rte_spinlock_t flow_list_lock;
+ struct sxe2_fnav_count_resource hw_res;
+ uint32_t fnav_inited;
+};
+#define SXE2_INVALID_RSS_ATTR \
+ (RTE_ETH_RSS_L3_PRE40 | RTE_ETH_RSS_L3_PRE56 | RTE_ETH_RSS_L3_PRE96)
+#define SXE2_VALID_RSS_IPV4_L4 \
+ (RTE_ETH_RSS_NONFRAG_IPV4_UDP | RTE_ETH_RSS_NONFRAG_IPV4_TCP | \
+ RTE_ETH_RSS_NONFRAG_IPV4_SCTP)
+
+#define SXE2_VALID_RSS_IPV6_L4 \
+ (RTE_ETH_RSS_NONFRAG_IPV6_UDP | RTE_ETH_RSS_NONFRAG_IPV6_TCP | \
+ RTE_ETH_RSS_NONFRAG_IPV6_SCTP)
+#define SXE2_VALID_RSS_IPV4 \
+ (RTE_ETH_RSS_IPV4 | RTE_ETH_RSS_FRAG_IPV4 | \
+ RTE_ETH_RSS_NONFRAG_IPV4_OTHER | SXE2_VALID_RSS_IPV4_L4)
+#define SXE2_VALID_RSS_IPV6 \
+ (RTE_ETH_RSS_IPV6 | RTE_ETH_RSS_FRAG_IPV6 | \
+ RTE_ETH_RSS_NONFRAG_IPV6_OTHER | SXE2_VALID_RSS_IPV6_L4)
+
+#define SXE2_VALID_RSS_L3 (SXE2_VALID_RSS_IPV4 | SXE2_VALID_RSS_IPV6)
+#define SXE2_VALID_RSS_L4 (SXE2_VALID_RSS_IPV4_L4 | SXE2_VALID_RSS_IPV6_L4)
+
+#endif /* __SXE2_FLOW_DEFINE_H__ */
diff --git a/drivers/net/sxe2/sxe2_queue.c b/drivers/net/sxe2/sxe2_queue.c
index 1786d6ea4f..220cab6fce 100644
--- a/drivers/net/sxe2/sxe2_queue.c
+++ b/drivers/net/sxe2/sxe2_queue.c
@@ -17,6 +17,7 @@ void sxe2_sw_queue_ctx_hw_cap_set(struct sxe2_adapter *adapter,
int32_t sxe2_queues_init(struct rte_eth_dev *dev)
{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
int32_t ret = 0;
uint16_t buf_size;
uint16_t frame_size;
@@ -36,6 +37,16 @@ int32_t sxe2_queues_init(struct rte_eth_dev *dev)
dev->data->scattered_rx = 1;
}
+ adapter->ptp_ctxt.mbuf_rx_ts_offset = -1;
+ adapter->ptp_ctxt.mbuf_rx_ts_flag = 0;
+ if (dev->data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) {
+ ret = rte_mbuf_dyn_rx_timestamp_register
+ (&adapter->ptp_ctxt.mbuf_rx_ts_offset,
+ (uint64_t *)&adapter->ptp_ctxt.mbuf_rx_ts_flag);
+ if (ret)
+ PMD_LOG_ERR(INIT, "Failed to enable timestamp offloads, ret=%d", ret);
+ }
+
return ret;
}
diff --git a/drivers/net/sxe2/sxe2_rss.c b/drivers/net/sxe2/sxe2_rss.c
new file mode 100644
index 0000000000..1d56613043
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_rss.c
@@ -0,0 +1,584 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#include "sxe2_rss.h"
+#include "sxe2_common_log.h"
+#include "sxe2_ethdev.h"
+#include "sxe2_cmd_chnl.h"
+
+void sxe2_sw_rss_ctx_hw_cap_set(struct sxe2_adapter *adapter,
+ struct sxe2_drv_rss_hash_caps *rss_caps)
+{
+ adapter->rss_ctxt.rss_key_size = rss_caps->hash_key_size;
+ adapter->rss_ctxt.rss_lut_size = rss_caps->lut_key_size;
+}
+
+int32_t sxe2_rss_hash_key_init(struct rte_eth_dev *dev)
+{
+ struct rte_eth_rss_conf *rss_conf = &dev->data->dev_conf.rx_adv_conf.rss_conf;
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_rss_context *rss_ctxt = &adapter->rss_ctxt;
+ int32_t ret = 0;
+ uint16_t i = 0;
+
+ if (rss_ctxt->rss_key == NULL) {
+ rss_ctxt->rss_key = (uint8_t *)rte_zmalloc("rss_key", rss_ctxt->rss_key_size, 0);
+ if (rss_ctxt->rss_key == NULL) {
+ PMD_LOG_ERR(INIT, "Failed to allocate rss key");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ }
+
+ if (!rss_conf->rss_key) {
+ for (i = 0; i < rss_ctxt->rss_key_size; i++)
+ rss_ctxt->rss_key[i] = (uint8_t)rte_rand();
+ } else {
+ rte_memcpy(rss_ctxt->rss_key, rss_conf->rss_key,
+ RTE_MIN(rss_conf->rss_key_len, rss_ctxt->rss_key_size));
+ }
+
+ ret = sxe2_drv_rss_key_set(adapter, rss_ctxt->rss_key,
+ rss_ctxt->rss_key_size);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to set rss key, ret:%d", ret);
+ rte_free(rss_ctxt->rss_key);
+ rss_ctxt->rss_key = NULL;
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+void sxe2_rss_hash_key_uninit(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_rss_context *rss_ctxt = &adapter->rss_ctxt;
+
+ if (rss_ctxt->rss_key) {
+ rte_free(rss_ctxt->rss_key);
+ rss_ctxt->rss_key = NULL;
+ }
+}
+
+int32_t sxe2_rss_lut_init(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_rss_context *rss_ctxt = &adapter->rss_ctxt;
+ int32_t ret = 0;
+ uint16_t i;
+
+ if (rss_ctxt->rss_lut == NULL) {
+ rss_ctxt->rss_lut = (uint8_t *)rte_zmalloc("rss_lut", rss_ctxt->rss_lut_size, 0);
+ if (rss_ctxt->rss_lut == NULL) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to allocate rss lut");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ }
+
+ for (i = 0; i < rss_ctxt->rss_lut_size; i++)
+ rss_ctxt->rss_lut[i] = (uint8_t)(i % dev->data->nb_rx_queues);
+
+ ret = sxe2_drv_rss_lut_set(adapter, rss_ctxt->rss_lut, rss_ctxt->rss_lut_size);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to set rss lut, ret:%d", ret);
+ rte_free(rss_ctxt->rss_lut);
+ rss_ctxt->rss_lut = NULL;
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+void sxe2_rss_lut_uninit(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_rss_context *rss_ctxt = &adapter->rss_ctxt;
+
+ if (rss_ctxt->rss_lut) {
+ rte_free(rss_ctxt->rss_lut);
+ rss_ctxt->rss_lut = NULL;
+ }
+}
+
+static struct sxe2_rss_hf_config sxe2_rss_default_hf_config[] = {
+ {
+ .rss_hf = RTE_ETH_RSS_L2_PAYLOAD,
+ .hdrs = {SXE2_FLOW_HDR_ETH,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_ETH_TYPE,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_IPV4,
+ .hdrs = {SXE2_FLOW_HDR_IPV4,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV4_SA,
+ SXE2_FLOW_FLD_ID_IPV4_DA,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_IPV6,
+ .hdrs = {SXE2_FLOW_HDR_IPV6,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV6_SA,
+ SXE2_FLOW_FLD_ID_IPV6_DA,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_FRAG_IPV4,
+ .hdrs = {SXE2_FLOW_HDR_IPV4,
+ SXE2_FLOW_HDR_IPV_FRAG,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV4_SA,
+ SXE2_FLOW_FLD_ID_IPV4_DA,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_FRAG_IPV6,
+ .hdrs = {SXE2_FLOW_HDR_IPV6,
+ SXE2_FLOW_HDR_IPV_FRAG,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV6_SA,
+ SXE2_FLOW_FLD_ID_IPV6_DA,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_NONFRAG_IPV4_OTHER,
+ .hdrs = {SXE2_FLOW_HDR_IPV4,
+ SXE2_FLOW_HDR_IPV_OTHER,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV4_SA,
+ SXE2_FLOW_FLD_ID_IPV4_DA,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_NONFRAG_IPV6_OTHER,
+ .hdrs = {SXE2_FLOW_HDR_IPV6,
+ SXE2_FLOW_HDR_IPV_OTHER,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV6_SA,
+ SXE2_FLOW_FLD_ID_IPV6_DA,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_NONFRAG_IPV4_UDP,
+ .hdrs = {SXE2_FLOW_HDR_IPV4,
+ SXE2_FLOW_HDR_UDP,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV4_SA,
+ SXE2_FLOW_FLD_ID_IPV4_DA,
+ SXE2_FLOW_FLD_ID_UDP_SRC_PORT,
+ SXE2_FLOW_FLD_ID_UDP_DST_PORT,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_NONFRAG_IPV6_UDP,
+ .hdrs = {SXE2_FLOW_HDR_IPV6,
+ SXE2_FLOW_HDR_UDP,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV6_SA,
+ SXE2_FLOW_FLD_ID_IPV6_DA,
+ SXE2_FLOW_FLD_ID_UDP_SRC_PORT,
+ SXE2_FLOW_FLD_ID_UDP_DST_PORT,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_NONFRAG_IPV4_TCP,
+ .hdrs = {SXE2_FLOW_HDR_IPV4,
+ SXE2_FLOW_HDR_TCP,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV4_SA,
+ SXE2_FLOW_FLD_ID_IPV4_DA,
+ SXE2_FLOW_FLD_ID_TCP_SRC_PORT,
+ SXE2_FLOW_FLD_ID_TCP_DST_PORT,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_NONFRAG_IPV6_TCP,
+ .hdrs = {SXE2_FLOW_HDR_IPV6,
+ SXE2_FLOW_HDR_TCP,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV6_SA,
+ SXE2_FLOW_FLD_ID_IPV6_DA,
+ SXE2_FLOW_FLD_ID_TCP_SRC_PORT,
+ SXE2_FLOW_FLD_ID_TCP_DST_PORT,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_NONFRAG_IPV4_SCTP,
+ .hdrs = {SXE2_FLOW_HDR_IPV4,
+ SXE2_FLOW_HDR_SCTP,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV4_SA,
+ SXE2_FLOW_FLD_ID_IPV4_DA,
+ SXE2_FLOW_FLD_ID_SCTP_SRC_PORT,
+ SXE2_FLOW_FLD_ID_SCTP_DST_PORT,
+ SXE2_FLOW_END},
+ },
+ {
+ .rss_hf = RTE_ETH_RSS_NONFRAG_IPV6_SCTP,
+ .hdrs = {SXE2_FLOW_HDR_IPV6,
+ SXE2_FLOW_HDR_SCTP,
+ SXE2_FLOW_END},
+ .flds = {SXE2_FLOW_FLD_ID_IPV6_SA,
+ SXE2_FLOW_FLD_ID_IPV6_DA,
+ SXE2_FLOW_FLD_ID_SCTP_SRC_PORT,
+ SXE2_FLOW_FLD_ID_SCTP_DST_PORT,
+ SXE2_FLOW_END},
+ },
+};
+
+int32_t sxe2_rss_hf_type_set(struct rte_eth_dev *dev, uint64_t rss_hf)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_rss_context *rss_ctxt = &adapter->rss_ctxt;
+ int32_t ret = 0;
+ uint32_t i;
+ uint8_t symm = 0;
+
+ if (0 == (rss_hf & SXE2_RSS_HF_SUPPORT_ALL) && rss_hf != 0) {
+ PMD_DEV_LOG_ERR(adapter, DRV,
+ "Failed to set unsupported rss_hf:0x%016" PRIx64,
+ rss_hf);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ for (i = 0; i < RTE_DIM(sxe2_rss_default_hf_config); i++) {
+ if (rss_ctxt->rss_hf & sxe2_rss_default_hf_config[i].rss_hf) {
+ sxe2_rss_default_hf_config[i].symm = rss_ctxt->symm;
+ ret = sxe2_drv_rss_hf_del(adapter, &sxe2_rss_default_hf_config[i]);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT,
+ "Failed to del rss hf cfg[%d], ret:%d", i, ret);
+ goto l_end;
+ }
+ }
+ }
+
+ if (rss_ctxt->hash_func == RTE_ETH_HASH_FUNCTION_SYMMETRIC_TOEPLITZ)
+ symm = 1;
+
+ for (i = 0; i < RTE_DIM(sxe2_rss_default_hf_config); i++) {
+ if (rss_hf & sxe2_rss_default_hf_config[i].rss_hf) {
+ sxe2_rss_default_hf_config[i].symm = symm;
+ ret = sxe2_drv_rss_hf_add(adapter, &sxe2_rss_default_hf_config[i]);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT,
+ "Failed to add rss hf cfg[%d], ret:%d", i, ret);
+ goto l_end;
+ }
+ }
+ }
+
+ rss_ctxt->rss_hf = rss_hf & SXE2_RSS_HF_SUPPORT_ALL;
+ rss_ctxt->symm = symm;
+l_end:
+ return ret;
+}
+
+int32_t sxe2_rss_hash_function_set(struct rte_eth_dev *dev, enum rte_eth_hash_function func)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ enum sxe2_rss_hash_key_func hash_func = SXE2_RSS_HASH_FUNC_SYM_TOEPLITZ;
+ int32_t ret = 0;
+
+ switch (func) {
+ case RTE_ETH_HASH_FUNCTION_DEFAULT:
+ case RTE_ETH_HASH_FUNCTION_TOEPLITZ:
+ case RTE_ETH_HASH_FUNCTION_SYMMETRIC_TOEPLITZ:
+ hash_func = SXE2_RSS_HASH_FUNC_SYM_TOEPLITZ;
+ break;
+ case RTE_ETH_HASH_FUNCTION_SIMPLE_XOR:
+ hash_func = SXE2_RSS_HASH_FUNC_XOR;
+ break;
+ default:
+ PMD_DEV_LOG_ERR(adapter, DRV, "RSS hash function[%d] not supported.", func);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = sxe2_drv_rss_hash_ctrl_func(adapter, hash_func);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to set rss hash function, ret=[%d]", ret);
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+int32_t sxe2_rss_init(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct rte_eth_rss_conf *rss_conf = &dev->data->dev_conf.rx_adv_conf.rss_conf;
+ enum rte_eth_hash_function rss_func = RTE_ETH_HASH_FUNCTION_SYMMETRIC_TOEPLITZ;
+ int32_t ret = 0;
+
+ adapter->rss_ctxt.inited = false;
+
+ if (dev->data->nb_rx_queues <= 1) {
+ PMD_DEV_LOG_DEBUG(adapter, INIT, "No need to init rss, rx queues %d.",
+ dev->data->nb_rx_queues);
+ goto l_end;
+ }
+
+ if ((adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_RSS) == 0) {
+ PMD_DEV_LOG_WARN(adapter, INIT, "RSS not supported");
+ goto l_end;
+ }
+
+ ret = sxe2_rss_hash_key_init(dev);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to init rss key");
+ goto l_end;
+ }
+
+ ret = sxe2_rss_lut_init(dev);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to init rss lut");
+ goto l_err_key;
+ }
+
+ rss_func = rss_conf->algorithm;
+ ret = sxe2_rss_hash_function_set(dev, rss_func);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to init rss hash function");
+ goto l_err_lut;
+ }
+ ret = sxe2_rss_hf_type_set(dev, rss_conf->rss_hf);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to set rss hf type");
+ goto l_err_lut;
+ }
+ adapter->rss_ctxt.inited = true;
+ goto l_end;
+
+l_err_lut:
+ sxe2_rss_lut_uninit(dev);
+l_err_key:
+ sxe2_rss_hash_key_uninit(dev);
+l_end:
+ return ret;
+}
+
+int32_t sxe2_rss_disable(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ int32_t ret = 0;
+
+ PMD_INIT_FUNC_TRACE();
+
+ if ((adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_RSS) == 0)
+ goto l_end;
+
+ ret = sxe2_drv_rss_hf_clear(adapter);
+ if (ret)
+ PMD_LOG_ERR(INIT, "Failed to clear rss hf");
+
+ sxe2_rss_hash_key_uninit(dev);
+
+ sxe2_rss_lut_uninit(dev);
+
+l_end:
+ return ret;
+}
+
+int32_t sxe2_dev_rss_reta_update(struct rte_eth_dev *dev,
+ struct rte_eth_rss_reta_entry64 *reta_conf,
+ uint16_t reta_size)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_rss_context *rss_ctxt = &adapter->rss_ctxt;
+ uint8_t *lut_tmp = NULL;
+ int32_t ret = 0;
+ uint16_t i;
+ uint16_t shift;
+ uint16_t idx;
+
+ if (!adapter->rss_ctxt.inited) {
+ PMD_DEV_LOG_INFO(adapter, DRV, "RSS not inited.");
+ ret = -ENOTSUP;
+ goto l_end;
+ }
+
+ if (reta_size != rss_ctxt->rss_lut_size) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "The configured hash lookup table size "
+ "(%d) doesn't match the size the hardware "
+ "supports (%d)", reta_size, rss_ctxt->rss_lut_size);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ lut_tmp = rte_zmalloc("rss_lut_temp", reta_size, 0);
+ if (!lut_tmp) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "No memory can be allocated");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ rte_memcpy(lut_tmp, rss_ctxt->rss_lut, reta_size);
+
+ for (i = 0; i < reta_size; i++) {
+ idx = i / RTE_ETH_RETA_GROUP_SIZE;
+ shift = i % RTE_ETH_RETA_GROUP_SIZE;
+ if (reta_conf[idx].mask & (1ULL << shift))
+ lut_tmp[i] = reta_conf[idx].reta[shift];
+ }
+
+ ret = sxe2_drv_rss_lut_set(adapter, lut_tmp, reta_size);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to set rss lut");
+ goto l_end;
+ }
+
+ rte_memcpy(rss_ctxt->rss_lut, lut_tmp, reta_size);
+
+l_end:
+ if (lut_tmp)
+ rte_free(lut_tmp);
+ return ret;
+}
+
+int32_t sxe2_dev_rss_reta_query(struct rte_eth_dev *dev,
+ struct rte_eth_rss_reta_entry64 *reta_conf,
+ uint16_t reta_size)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_rss_context *rss_ctxt = &adapter->rss_ctxt;
+ int32_t ret = 0;
+ uint16_t i;
+ uint16_t shift;
+ uint16_t idx;
+
+ if (!adapter->rss_ctxt.inited) {
+ PMD_DEV_LOG_INFO(adapter, DRV, "RSS not inited.");
+ ret = -ENOTSUP;
+ goto l_end;
+ }
+
+ if (reta_size != rss_ctxt->rss_lut_size) {
+ PMD_LOG_ERR(INIT, "The configured hash lookup table size "
+ "(%d) doesn't match the size the hardware "
+ "supports (%d)", reta_size, rss_ctxt->rss_lut_size);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ for (i = 0; i < reta_size; i++) {
+ idx = i / RTE_ETH_RETA_GROUP_SIZE;
+ shift = i % RTE_ETH_RETA_GROUP_SIZE;
+ if (reta_conf[idx].mask & (1ULL << shift))
+ reta_conf[idx].reta[shift] = rss_ctxt->rss_lut[i];
+ }
+
+l_end:
+ return ret;
+}
+
+static int32_t sxe2_rss_hash_key_update(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_rss_context *rss_ctxt = &adapter->rss_ctxt;
+ int32_t ret = 0;
+ uint16_t i;
+
+ if (rss_conf->rss_key_len == 0 || rss_conf->rss_key == NULL)
+ goto l_end;
+
+ if (rss_conf->rss_key_len != rss_ctxt->rss_key_size) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "The configured hash key size "
+ "(%d) doesn't match the size the hardware "
+ "supports (%d)", rss_conf->rss_key_len,
+ rss_ctxt->rss_key_size);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ for (i = 0; i < rss_conf->rss_key_len; i++) {
+ if (rss_conf->rss_key[i] != rss_ctxt->rss_key[i])
+ break;
+ }
+ if (i == rss_conf->rss_key_len)
+ goto l_end;
+
+ ret = sxe2_drv_rss_key_set(adapter, rss_conf->rss_key,
+ rss_conf->rss_key_len);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to set rss key");
+ goto l_end;
+ }
+
+ rte_memcpy(rss_ctxt->rss_key, rss_conf->rss_key, rss_conf->rss_key_len);
+l_end:
+ return ret;
+}
+
+int32_t sxe2_dev_rss_hash_update(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_rss_context *rss_ctxt = &adapter->rss_ctxt;
+ int32_t ret = -1;
+
+ if (!adapter->rss_ctxt.inited) {
+ PMD_DEV_LOG_INFO(adapter, DRV, "RSS not inited.");
+ ret = -ENOTSUP;
+ goto l_end;
+ }
+
+ ret = sxe2_rss_hash_key_update(dev, rss_conf);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to set rss hash key");
+ goto l_end;
+ }
+
+ if (rss_conf->algorithm != rss_ctxt->hash_func) {
+ ret = sxe2_rss_hash_function_set(dev, rss_conf->algorithm);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to set rss hash function");
+ goto l_end;
+ }
+ rss_ctxt->hash_func = rss_conf->algorithm;
+ }
+
+ if ((rss_conf->rss_hf & SXE2_RSS_HF_SUPPORT_ALL)
+ != rss_ctxt->rss_hf) {
+ ret = sxe2_rss_hf_type_set(dev, rss_conf->rss_hf);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to set rss hf type");
+ goto l_end;
+ }
+ }
+ ret = 0;
+l_end:
+ return ret;
+}
+
+int32_t sxe2_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_rss_context *rss_ctxt = &adapter->rss_ctxt;
+ int32_t ret = 0;
+
+ if (adapter->rss_ctxt.inited == 0) {
+ PMD_DEV_LOG_INFO(adapter, DRV, "RSS not inited.");
+ ret = -ENOTSUP;
+ goto l_end;
+ }
+
+ if (rss_conf->rss_key) {
+ rss_conf->rss_key_len = rss_ctxt->rss_key_size;
+ rte_memcpy(rss_conf->rss_key, rss_ctxt->rss_key, rss_ctxt->rss_key_size);
+ }
+ rss_conf->rss_hf = rss_ctxt->rss_hf;
+ rss_conf->algorithm = rss_ctxt->hash_func;
+l_end:
+ return ret;
+}
diff --git a/drivers/net/sxe2/sxe2_rss.h b/drivers/net/sxe2/sxe2_rss.h
new file mode 100644
index 0000000000..2a454ac1b3
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_rss.h
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#ifndef __SXE2_RSS_H__
+#define __SXE2_RSS_H__
+#include <ethdev_driver.h>
+#include "sxe2_osal.h"
+#include "sxe2_drv_cmd.h"
+#include "sxe2_flow_define.h"
+
+#define SXE2_FLOW_END (0xFFFF)
+
+struct sxe2_rss_context {
+ enum rte_eth_hash_function hash_func;
+ uint16_t rss_key_size;
+ uint16_t rss_lut_size;
+ uint8_t *rss_key;
+ uint8_t *rss_lut;
+ uint64_t rss_hf;
+ uint8_t symm;
+ bool inited;
+};
+
+struct sxe2_rss_hf_config {
+ uint64_t rss_hf;
+ uint16_t hdrs[SXE2_FLOW_HDR_MAX];
+ uint16_t flds[SXE2_FLOW_FLD_ID_MAX];
+ uint8_t symm;
+};
+
+#define SXE2_RSS_HF_SUPPORT_ALL ( \
+ RTE_ETH_RSS_IPV4 | \
+ RTE_ETH_RSS_FRAG_IPV4 | \
+ RTE_ETH_RSS_NONFRAG_IPV4_TCP | \
+ RTE_ETH_RSS_NONFRAG_IPV4_UDP | \
+ RTE_ETH_RSS_NONFRAG_IPV4_SCTP | \
+ RTE_ETH_RSS_NONFRAG_IPV4_OTHER | \
+ RTE_ETH_RSS_IPV6 | \
+ RTE_ETH_RSS_FRAG_IPV6 | \
+ RTE_ETH_RSS_NONFRAG_IPV6_TCP | \
+ RTE_ETH_RSS_NONFRAG_IPV6_UDP | \
+ RTE_ETH_RSS_NONFRAG_IPV6_SCTP | \
+ RTE_ETH_RSS_NONFRAG_IPV6_OTHER | \
+ RTE_ETH_RSS_L2_PAYLOAD)
+
+struct sxe2_adapter;
+
+void sxe2_sw_rss_ctx_hw_cap_set(struct sxe2_adapter *adapter,
+ struct sxe2_drv_rss_hash_caps *rss_caps);
+
+int32_t sxe2_rss_hash_key_init(struct rte_eth_dev *dev);
+
+void sxe2_rss_hash_key_uninit(struct rte_eth_dev *dev);
+
+int32_t sxe2_rss_lut_init(struct rte_eth_dev *dev);
+
+void sxe2_rss_lut_uninit(struct rte_eth_dev *dev);
+
+int32_t sxe2_rss_init(struct rte_eth_dev *dev);
+
+int32_t sxe2_rss_hash_function_set(struct rte_eth_dev *dev, enum rte_eth_hash_function func);
+
+int32_t sxe2_rss_hf_type_set(struct rte_eth_dev *dev, uint64_t rss_hf);
+
+int32_t sxe2_rss_disable(struct rte_eth_dev *dev);
+
+int32_t sxe2_dev_rss_reta_update(struct rte_eth_dev *dev,
+ struct rte_eth_rss_reta_entry64 *reta_conf,
+ uint16_t reta_size);
+
+int32_t sxe2_dev_rss_reta_query(struct rte_eth_dev *dev,
+ struct rte_eth_rss_reta_entry64 *reta_conf,
+ uint16_t reta_size);
+
+int32_t sxe2_dev_rss_hash_update(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf);
+
+int32_t sxe2_dev_rss_hash_conf_get(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf);
+#endif /* __SXE2_RSS_H__ */
diff --git a/drivers/net/sxe2/sxe2_txrx.h b/drivers/net/sxe2/sxe2_txrx.h
index 6f6ff3e3d1..2ff5fee7c7 100644
--- a/drivers/net/sxe2/sxe2_txrx.h
+++ b/drivers/net/sxe2/sxe2_txrx.h
@@ -23,4 +23,8 @@ int32_t sxe2_tx_burst_mode_get(struct rte_eth_dev *dev,
int32_t sxe2_rx_burst_mode_get(struct rte_eth_dev *dev,
__rte_unused uint16_t queue_id, struct rte_eth_burst_mode *mode);
+#ifndef RTE_LIBRTE_SXE2_16BYTE_RX_DESC
+int32_t sxe2_rx_update_ptp_time(struct sxe2_rx_queue *rxq);
+#endif
+
#endif /* SXE2_TXRX_H */
diff --git a/drivers/net/sxe2/sxe2_txrx_poll.c b/drivers/net/sxe2/sxe2_txrx_poll.c
index 21d5c38725..3c6fe37404 100644
--- a/drivers/net/sxe2/sxe2_txrx_poll.c
+++ b/drivers/net/sxe2/sxe2_txrx_poll.c
@@ -17,6 +17,7 @@
#include "sxe2_queue.h"
#include "sxe2_ethdev.h"
#include "sxe2_common_log.h"
+#include "sxe2_cmd_chnl.h"
static __rte_always_inline int32_t
sxe2_tx_bufs_free(struct sxe2_tx_queue *txq)
@@ -282,6 +283,30 @@ sxe2_tx_desc_checksum_fill(uint64_t offloads, uint32_t *desc_cmd, uint32_t *desc
return;
}
+static __rte_always_inline void sxe2_desc_tso_fill(struct rte_mbuf *tx_pkt,
+ uint64_t *desc_type_cmd_tso_mss,
+ union sxe2_tx_offload_info ol_info)
+{
+ uint32_t hdr_len;
+ uint32_t tso_len;
+
+ if (!ol_info.l4_len) {
+ PMD_LOG_DEBUG(TX, "TSO ERROR: L4 length is 0");
+ goto l_end;
+ }
+ hdr_len = ol_info.l2_len + ol_info.l3_len + ol_info.l4_len;
+ if (tx_pkt->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK)
+ hdr_len += ol_info.outer_l2_len + ol_info.outer_l3_len;
+
+ tso_len = tx_pkt->pkt_len - hdr_len;
+ *desc_type_cmd_tso_mss |=
+ ((uint64_t)SXE2_TX_CTXT_DESC_CMD_TSO << SXE2_TX_CTXT_DESC_CMD_SHIFT) |
+ ((uint64_t)tso_len << SXE2_TX_CTXT_DESC_TSO_LEN_SHIFT) |
+ ((uint64_t)tx_pkt->tso_segsz << SXE2_TX_CTXT_DESC_MSS_SHIFT);
+l_end:
+ return;
+}
+
static __rte_always_inline uint64_t
sxe2_tx_data_desc_build_cobt(uint32_t cmd, uint32_t offset, uint16_t buf_size, uint16_t l2tag)
{
@@ -395,6 +420,11 @@ uint16_t sxe2_tx_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkt
rte_pktmbuf_free_seg(buffer->mbuf);
buffer->mbuf = NULL;
}
+ if (offloads & (RTE_MBUF_F_TX_TCP_SEG | RTE_MBUF_F_TX_UDP_SEG))
+ sxe2_desc_tso_fill(tx_pkt,
+ &desc_type_cmd_tso_mss, ol_info);
+ else if (offloads & RTE_MBUF_F_TX_IEEE1588_TMST)
+ desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_CMD_TSYN_MASK;
if (offloads & RTE_MBUF_F_TX_QINQ) {
desc_l2tag2 = tx_pkt->vlan_tci_outer;
@@ -707,6 +737,57 @@ sxe2_rx_desc_filter_para_fill(struct sxe2_rx_queue *rxq __rte_unused,
#endif
}
+#ifndef RTE_LIBRTE_SXE2_16BYTE_RX_DESC
+int32_t sxe2_rx_update_ptp_time(struct sxe2_rx_queue *rxq)
+{
+ struct sxe2_adapter *adapter;
+ uint64_t cur_time_ms;
+ int32_t ret = 0;
+ cur_time_ms = rte_get_timer_cycles() / (rte_get_timer_hz() / 1000);
+
+ if (likely((cur_time_ms - rxq->update_time) < SXE2_RX_PKTS_TS_TIMEOUT_VAL))
+ goto l_end;
+ rxq->update_time = cur_time_ms;
+ adapter = rxq->vsi->adapter;
+ rxq->ts_need_update = true;
+ ret = sxe2_drv_ptp_gettime(adapter, rxq);
+ if (rxq->desc_ts < rxq->ts_low)
+ rxq->ts_need_update = false;
+
+ PMD_LOG_INFO(RX, "rxq update time ret=%d, cur time=%" PRIu64 ", rxqh=%" PRIu64 ", rxql=%d",
+ ret, cur_time_ms, rxq->ts_high, rxq->ts_low);
+l_end:
+ return ret;
+}
+
+static inline void sxe2_rx_desc_ptp_para_fill(struct sxe2_rx_queue *rxq,
+ struct rte_mbuf *mbuf,
+ union sxe2_rx_desc *desc)
+{
+ struct sxe2_adapter *adapter = rxq->vsi->adapter;
+ uint64_t ts_ns;
+
+ if (adapter->ptp_ctxt.mbuf_rx_ts_flag != 0 &&
+ (rxq->offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP) &&
+ SXE2_RX_DESC_RXDID_VAL_GET(desc->wb.rxdid_src) == SXE2_RX_DESC_RXDID_1588) {
+ rxq->desc_ts = rte_le_to_cpu_32(desc->wb_ts.ts_h);
+ (void)sxe2_rx_update_ptp_time(rxq);
+ if (rxq->ts_need_update && rxq->desc_ts < rxq->ts_low)
+ rxq->ts_high += 1;
+
+ rxq->ts_need_update = true;
+ rxq->ts_low = rxq->desc_ts;
+ rxq->update_time = rte_get_timer_cycles() /
+ (rte_get_timer_hz() / 1000);
+ ts_ns = rxq->ts_high * NSEC_PER_SEC + rxq->ts_low;
+ *RTE_MBUF_DYNFIELD(mbuf, adapter->ptp_ctxt.mbuf_rx_ts_offset, uint64_t *) = ts_ns;
+ mbuf->ol_flags |= adapter->ptp_ctxt.mbuf_rx_ts_flag;
+ PMD_LOG_INFO(RX, "receive ptp pkt,ts_s=%" PRIu64 ", ts_ns=%d", rxq->ts_high,
+ rxq->ts_low);
+ }
+}
+#endif
+
static __rte_always_inline void
sxe2_rx_mbuf_common_fields_fill(struct sxe2_rx_queue *rxq, struct rte_mbuf *mbuf,
union sxe2_rx_desc *rxd)
@@ -718,10 +799,12 @@ sxe2_rx_mbuf_common_fields_fill(struct sxe2_rx_queue *rxq, struct rte_mbuf *mbuf
mbuf->ol_flags = 0;
mbuf->packet_type = ptype_tbl[SXE2_RX_DESC_PTYPE_VAL_GET(qword1)];
-
pkt_flags = sxe2_rx_desc_error_para(rxq, rxd);
sxe2_rx_desc_vlan_para_fill(mbuf, rxd);
sxe2_rx_desc_filter_para_fill(rxq, mbuf, rxd);
+#ifndef RTE_LIBRTE_SXE2_16BYTE_RX_DESC
+ sxe2_rx_desc_ptp_para_fill(rxq, mbuf, rxd);
+#endif
mbuf->ol_flags |= pkt_flags;
}
--
2.52.0
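
The masked RETA walk used by sxe2_dev_rss_reta_update() and
sxe2_dev_rss_reta_query() in the patch above can be sketched in isolation.
This is only an illustration, not the driver code: struct reta_entry64 and
RETA_GROUP_SIZE are stand-ins assumed to mirror DPDK's
struct rte_eth_rss_reta_entry64 and RTE_ETH_RETA_GROUP_SIZE, and the helper
names lut_default_fill() and lut_apply_reta() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for RTE_ETH_RETA_GROUP_SIZE (assumed to be 64, as in DPDK). */
#define RETA_GROUP_SIZE 64

/* Stand-in assumed to mirror struct rte_eth_rss_reta_entry64. */
struct reta_entry64 {
	uint64_t mask;                  /* bit n set => reta[n] is valid */
	uint16_t reta[RETA_GROUP_SIZE]; /* queue index for each table slot */
};

/* Fill the LUT round-robin over nb_queues, as sxe2_rss_lut_init() does. */
static void lut_default_fill(uint8_t *lut, uint16_t lut_size,
			     uint16_t nb_queues)
{
	assert(nb_queues > 0); /* avoid a modulo by zero */
	for (uint16_t i = 0; i < lut_size; i++)
		lut[i] = (uint8_t)(i % nb_queues);
}

/* Copy only the entries selected by each group's mask into the LUT. */
static void lut_apply_reta(uint8_t *lut, uint16_t lut_size,
			   const struct reta_entry64 *conf)
{
	for (uint16_t i = 0; i < lut_size; i++) {
		uint16_t idx = i / RETA_GROUP_SIZE;   /* which 64-entry group */
		uint16_t shift = i % RETA_GROUP_SIZE; /* slot inside the group */

		if (conf[idx].mask & (1ULL << shift))
			lut[i] = (uint8_t)conf[idx].reta[shift];
	}
}
```

Entries whose mask bit is clear keep their previous value, which is why the
driver copies the current table into a temporary buffer before applying the
request. Because the table is stored as uint8_t in this sketch (as in the
patch), queue indices above 255 would be truncated.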
* [PATCH v13 20/20] net/sxe2: update sxe2 feature matrix docs
From: liujie5 @ 2026-06-08 7:42 UTC (permalink / raw)
To: stephen; +Cc: dev, Jie Liu
In-Reply-To: <20260608074257.3043531-1-liujie5@linkdatatechnology.com>
From: Jie Liu <liujie5@linkdatatechnology.com>
Update the sxe2.ini feature sheet to accurately reflect the recently
implemented hardware capabilities in the sxe2 PMD.
Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
---
doc/guides/nics/features/sxe2.ini | 56 +++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/doc/guides/nics/features/sxe2.ini b/doc/guides/nics/features/sxe2.ini
index 09ba2f558c..3c1e6a8a39 100644
--- a/doc/guides/nics/features/sxe2.ini
+++ b/doc/guides/nics/features/sxe2.ini
@@ -7,17 +7,73 @@
; is selected.
;
[Features]
+Speed capabilities = Y
+Link status = Y
+Link status event = Y
+Rx interrupt = Y
Fast mbuf free = P
Free Tx mbuf on demand = Y
Burst mode info = Y
Queue start/stop = Y
+Power mgmt address monitor = Y
Buffer split on Rx = P
Scattered Rx = Y
+Traffic manager = Y
CRC offload = Y
+VLAN offload = Y
+QinQ offload = P
L3 checksum offload = Y
L4 checksum offload = Y
+Timestamp offload = P
+Inner L3 checksum = P
+Inner L4 checksum = P
Rx descriptor status = Y
Tx descriptor status = Y
+MTU update = Y
+TSO = P
+Promiscuous mode = Y
+Allmulticast mode = Y
+Unicast MAC filter = Y
+RSS hash = Y
+RSS key update = Y
+RSS reta update = Y
+VLAN filter = Y
+Inline crypto = Y
+Packet type parsing = Y
+Timesync = Y
+Basic stats = Y
+Extended stats = Y
+FW version = Y
+Module EEPROM dump = Y
+Multiprocess aware = Y
Linux = Y
x86-32 = Y
x86-64 = Y
+
+[rte_flow items]
+eth = P
+geneve = Y
+gre = Y
+gtpu = Y
+ipv4 = Y
+ipv6 = Y
+ipv6_frag_ext = Y
+nvgre = Y
+sctp = Y
+tcp = Y
+udp = Y
+vlan = P
+vxlan = Y
+vxlan_gpe = Y
+
+[rte_flow actions]
+count = Y
+drop = Y
+mark = Y
+passthru = Y
+port_representor = Y
+queue = Y
+represented_port = Y
+rss = Y
+send_to_kernel = Y
+port_id = Y
--
2.52.0
* [PATCH v13 19/20] drivers: add testpmd commands for private features
From: liujie5 @ 2026-06-08 7:42 UTC (permalink / raw)
To: stephen; +Cc: dev, Jie Liu
In-Reply-To: <20260608074257.3043531-1-liujie5@linkdatatechnology.com>
From: Jie Liu <liujie5@linkdatatechnology.com>
Introduce private testpmd commands and implementation files to enable
debugging and testing of sxe2-specific hardware features (such as
packet scheduling reset, UDP tunnel configuration, and IPsec ingress/
egress offloads) directly within the testpmd application.
Device arguments (devargs) are parsed using the standard 'rte_kvargs' library during
the PCI/vdev probing phase. Documentation for these parameters is also
updated.
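
A minimal sketch of the kind of value handler such a devarg needs is given
here. The helper name parse_u8_range() and its signature are hypothetical
and are not part of this patch. It only illustrates checking the range on
the unsigned long result before narrowing to uint8_t, so that an input such
as 257 is rejected rather than wrapped to 1:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse a decimal devarg
 * value into a uint8_t within [min, max].
 * Returns 0 on success, -EINVAL for malformed input, -ERANGE when the
 * value lies outside the allowed range.
 */
static int parse_u8_range(const char *value, uint8_t min, uint8_t max,
			  uint8_t *out)
{
	unsigned long val;
	char *end = NULL;

	assert(out != NULL);
	if (value == NULL)
		return -EINVAL;

	errno = 0;
	val = strtoul(value, &end, 10);
	/* Reject empty strings, trailing characters and overflow. */
	if (errno != 0 || end == value || *end != '\0')
		return -EINVAL;

	/* Compare while still unsigned long, before any narrowing. */
	if (val < min || val > max)
		return -ERANGE;

	*out = (uint8_t)val;
	return 0;
}
```

A handler registered with rte_kvargs could wrap such a helper and pass the
devarg-specific bounds, for example 0 and 1 for a boolean switch.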
During memory hotplug events, the SXE2 driver needs to track memory
segment layout changes to maintain internal DMA mappings. However,
existing memseg walk functions (rte_memseg_walk) acquire memory locks
and cannot be called from within memory event callbacks, leading to
potential deadlocks.
This commit introduces sxe2_memseg_walk_cb() as a helper that walks
memory segments using the thread-unsafe variant
rte_memseg_walk_thread_unsafe(), which is safe to call from
memory-related callbacks.
The implementation follows the standard rte_memseg_walk_t prototype,
processing each memseg to update driver-specific data structures.
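
The callback contract described above can be sketched without DPDK as
follows. All names here (struct seg, seg_walk(), count_mappable_cb(),
BAD_IOVA) are illustrative stand-ins, not DPDK or sxe2 symbols. The sketch
assumes only the convention that a callback returning 0 continues the walk
and a nonzero return stops it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BAD_IOVA UINT64_MAX /* stand-in for RTE_BAD_IOVA */

/* Stand-in for struct rte_memseg; only the fields the sketch needs. */
struct seg {
	uint64_t iova;
	size_t len;
};

/* Modelled on rte_memseg_walk_t: 0 continues, nonzero stops the walk. */
typedef int (*seg_walk_t)(const struct seg *s, void *arg);

/* Visit every segment in order and stop at the first nonzero return. */
static int seg_walk(const struct seg *segs, size_t n, seg_walk_t cb,
		    void *arg)
{
	size_t i;
	int ret;

	assert(cb != NULL);
	for (i = 0; i < n; i++) {
		ret = cb(&segs[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Sum the bytes that would be DMA-mapped, skipping bad-IOVA segments. */
static int count_mappable_cb(const struct seg *s, void *arg)
{
	size_t *total = arg;

	if (s->iova == BAD_IOVA)
		return 0;
	*total += s->len;
	return 0;
}
```

In the driver the per-segment action is a DMA map call rather than a byte
count, and the walk itself is provided by EAL.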
Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
---
drivers/common/sxe2/sxe2_common.c | 110 +++
drivers/common/sxe2/sxe2_common.h | 2 +
drivers/common/sxe2/sxe2_ioctl_chnl.c | 2 +-
drivers/net/sxe2/meson.build | 5 +-
drivers/net/sxe2/sxe2_cmd_chnl.c | 21 +
drivers/net/sxe2/sxe2_cmd_chnl.h | 3 +
drivers/net/sxe2/sxe2_drv_cmd.h | 17 +
drivers/net/sxe2/sxe2_dump.c | 15 +
drivers/net/sxe2/sxe2_ethdev.c | 287 +++++++-
drivers/net/sxe2/sxe2_ethdev.h | 11 +-
drivers/net/sxe2/sxe2_irq.c | 34 +-
drivers/net/sxe2/sxe2_rx.c | 12 +
drivers/net/sxe2/sxe2_testpmd.c | 733 +++++++++++++++++++
drivers/net/sxe2/sxe2_testpmd_lib.c | 969 ++++++++++++++++++++++++++
drivers/net/sxe2/sxe2_testpmd_lib.h | 142 ++++
drivers/net/sxe2/sxe2_tm.c | 18 +
drivers/net/sxe2/sxe2_tm.h | 2 +
17 files changed, 2374 insertions(+), 9 deletions(-)
create mode 100644 drivers/net/sxe2/sxe2_testpmd.c
create mode 100644 drivers/net/sxe2/sxe2_testpmd_lib.c
create mode 100644 drivers/net/sxe2/sxe2_testpmd_lib.h
diff --git a/drivers/common/sxe2/sxe2_common.c b/drivers/common/sxe2/sxe2_common.c
index a5d36998e1..5e6e13dd19 100644
--- a/drivers/common/sxe2/sxe2_common.c
+++ b/drivers/common/sxe2/sxe2_common.c
@@ -196,6 +196,102 @@ static int32_t sxe2_parse_representor(const char *key, const char *value, void *
PMD_LOG_INFO(COM, "representor arg %s: \"%s\".", key, value);
+l_end:
+ return ret;
+}
+static int32_t sxe2_dma_mem_map(struct sxe2_common_device *cdev,
+ const void *addr, size_t len, bool do_map)
+{
+ struct rte_memseg_list *msl;
+ struct rte_memseg *ms;
+ size_t cur_len = 0;
+ int32_t ret = 0;
+
+ msl = rte_mem_virt2memseg_list(addr);
+ if (msl == NULL) {
+ ret = -EINVAL;
+ PMD_LOG_ERR(COM, "Invalid virt addr=%p.", addr);
+ goto l_end;
+ }
+
+ if ((uintptr_t)addr != RTE_ALIGN((uintptr_t)addr, msl->page_sz) ||
+ (len != RTE_ALIGN(len, msl->page_sz))) {
+ ret = -EINVAL;
+ PMD_LOG_ERR(COM, "Addr=%p and len=%zu not aligned to page size=%" PRIu64 ".",
+ addr, len, msl->page_sz);
+ goto l_end;
+ }
+
+ /* memsegs are contiguous in memory */
+ ms = rte_mem_virt2memseg(addr, msl);
+ while (cur_len < len) {
+ /* some memory segments may have invalid IOVA */
+ if (ms->iova == RTE_BAD_IOVA) {
+ PMD_LOG_WARN(COM, "Memory segment at %p has bad IOVA, skipping.",
+ ms->addr);
+ goto next;
+ }
+ if (do_map)
+ sxe2_drv_dev_dma_map(cdev, ms->addr_64,
+ ms->iova, ms->len);
+ else
+ sxe2_drv_dev_dma_unmap(cdev, ms->iova);
+
+next:
+ cur_len += ms->len;
+ ++ms;
+ }
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_INTERNAL_SYMBOL(sxe2_common_mem_event_cb)
+void
+sxe2_common_mem_event_cb(enum rte_mem_event type,
+ const void *addr, size_t size, void *arg __rte_unused)
+{
+ struct sxe2_common_device *cdev = NULL;
+
+ if (rte_eal_process_type() != RTE_PROC_PRIMARY)
+ goto l_end;
+
+ pthread_mutex_lock(&sxe2_common_devices_list_lock);
+ switch (type) {
+ case RTE_MEM_EVENT_FREE:
+ TAILQ_FOREACH(cdev, &sxe2_common_devices_list, next)
+ (void)sxe2_dma_mem_map(cdev, addr, size, 0);
+ break;
+ case RTE_MEM_EVENT_ALLOC:
+ TAILQ_FOREACH(cdev, &sxe2_common_devices_list, next)
+ (void)sxe2_dma_mem_map(cdev, addr, size, 1);
+ break;
+ default:
+ break;
+ }
+ pthread_mutex_unlock(&sxe2_common_devices_list_lock);
+l_end:
+ return;
+}
+
+static int32_t sxe2_memseg_walk_cb(const struct rte_memseg_list *msl,
+ const struct rte_memseg *ms, void *arg)
+{
+ struct sxe2_common_device *cdev = arg;
+ int32_t ret = 0;
+
+ if (msl->external && !msl->heap)
+ goto l_end;
+
+ if (ms->iova == RTE_BAD_IOVA)
+ goto l_end;
+
+ ret = sxe2_drv_dev_dma_map(cdev, ms->addr_64, ms->iova, ms->len);
+ if (ret != 0) {
+ PMD_LOG_ERR(COM, "Failed to DMA map memseg.");
+ goto l_end;
+ }
+
l_end:
return ret;
}
@@ -220,6 +316,18 @@ static int32_t sxe2_common_device_setup(struct sxe2_common_device *cdev)
goto l_close_dev;
}
+ rte_mcfg_mem_read_lock();
+ ret = rte_memseg_walk_thread_unsafe(sxe2_memseg_walk_cb, cdev);
+ if (ret) {
+ PMD_LOG_ERR(COM, "Failed to walk memseg, ret=%d", ret);
+ rte_mcfg_mem_read_unlock();
+ goto l_close_dev;
+ }
+ rte_mcfg_mem_read_unlock();
+
+ (void)rte_mem_event_callback_register("SXE2_MEM_EVENT_CB",
+ sxe2_common_mem_event_cb, NULL);
+
goto l_end;
l_close_dev:
@@ -251,6 +359,7 @@ static struct sxe2_common_device *sxe2_common_device_alloc(
}
cdev->dev = rte_dev;
cdev->class_type = class_type;
+ cdev->config.cmd_fd = SXE2_CMD_FD_INVALID;
cdev->config.kernel_reset = false;
pthread_mutex_init(&cdev->config.lock, NULL);
@@ -631,6 +740,7 @@ static int32_t sxe2_common_pci_id_table_update(const struct rte_pci_id *id_table
updated_table = calloc(num_ids, sizeof(*updated_table));
if (!updated_table) {
+ ret = -ENOMEM;
PMD_LOG_ERR(COM, "Failed to allocate memory for PCI ID table");
goto l_end;
}
diff --git a/drivers/common/sxe2/sxe2_common.h b/drivers/common/sxe2/sxe2_common.h
index b02b6317da..efc8d3585a 100644
--- a/drivers/common/sxe2/sxe2_common.h
+++ b/drivers/common/sxe2/sxe2_common.h
@@ -14,6 +14,8 @@
#define SXE2_COMMON_PCI_DRIVER_NAME "sxe2_pci"
+#define SXE2_CMD_FD_INVALID (-1)
+
#define SXE2_CDEV_TO_CMD_FD(cdev) \
((cdev)->config.cmd_fd)
diff --git a/drivers/common/sxe2/sxe2_ioctl_chnl.c b/drivers/common/sxe2/sxe2_ioctl_chnl.c
index 173d8d57ae..a233a78136 100644
--- a/drivers/common/sxe2/sxe2_ioctl_chnl.c
+++ b/drivers/common/sxe2/sxe2_ioctl_chnl.c
@@ -110,7 +110,7 @@ sxe2_drv_dev_close(struct sxe2_common_device *cdev)
if (fd >= 0)
close(fd);
PMD_LOG_INFO(COM, "closed device fd=%d", fd);
- SXE2_CDEV_TO_CMD_FD(cdev) = -1;
+ SXE2_CDEV_TO_CMD_FD(cdev) = SXE2_CMD_FD_INVALID;
}
RTE_EXPORT_INTERNAL_SYMBOL(sxe2_drv_dev_handshake)
diff --git a/drivers/net/sxe2/meson.build b/drivers/net/sxe2/meson.build
index 4fb2333926..04369402b7 100644
--- a/drivers/net/sxe2/meson.build
+++ b/drivers/net/sxe2/meson.build
@@ -9,9 +9,10 @@ endif
cflags += ['-g']
-deps += ['common_sxe2', 'hash','cryptodev','security']
+deps += ['common_sxe2', 'hash', 'cryptodev', 'security', 'cmdline']
includes += include_directories('../../common/sxe2')
+testpmd_sources = files('sxe2_testpmd.c')
if arch_subdir == 'x86'
sources += files('sxe2_txrx_vec_sse.c')
@@ -79,5 +80,5 @@ sources += files(
'sxe2_flow_parse_engine.c',
'sxe2_dump.c',
'sxe2_txrx_check_mbuf.c',
-
+ 'sxe2_testpmd_lib.c',
)
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.c b/drivers/net/sxe2/sxe2_cmd_chnl.c
index 43e8c59487..b09989fe50 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.c
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.c
@@ -99,6 +99,27 @@ int32_t sxe2_drv_dev_info_get(struct sxe2_adapter *adapter,
return ret;
}
+int32_t sxe2_drv_fc_state_get(struct sxe2_adapter *adapter,
+ struct sxe2_drv_vsi_fc_get_resp *dev_fc_state_resp)
+{
+ int32_t ret = 0;
+ struct sxe2_common_device *cdev = adapter->cdev;
+ struct sxe2_drv_cmd_params param = {0};
+ struct sxe2_drv_vsi_fc_get_req req = {0};
+
+ req.vsi_id = adapter->vsi_ctxt.main_vsi->vsi_id;
+ sxe2_drv_cmd_params_fill(adapter, &param, SXE2_DRV_CMD_VSI_FC_GET,
+ &req, sizeof(req),
+ dev_fc_state_resp,
+ sizeof(*dev_fc_state_resp));
+ ret = sxe2_drv_cmd_exec(cdev, &param);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "get fc state failed, ret=%d", ret);
+ ret = -EIO;
+ }
+ return ret;
+}
+
int32_t sxe2_drv_dev_fw_info_get(struct sxe2_adapter *adapter,
struct sxe2_drv_dev_fw_info_resp *dev_fw_info_resp)
{
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.h b/drivers/net/sxe2/sxe2_cmd_chnl.h
index 988d4b458b..d63caad526 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.h
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.h
@@ -99,6 +99,9 @@ int32_t sxe2_drv_vsi_stats_reset(struct sxe2_adapter *adapter);
int32_t sxe2_drv_queue_info_get_update(struct sxe2_adapter *adapter,
struct eth_queue_stats *qstats);
+int32_t sxe2_drv_fc_state_get(struct sxe2_adapter *adapter,
+ struct sxe2_drv_vsi_fc_get_resp *dev_fc_state_resp);
+
int32_t sxe2_drv_rxq_mapping_set(struct rte_eth_dev *eth_dev, uint16_t queue_id, uint8_t pool_idx);
int32_t sxe2_drv_txq_mapping_set(struct rte_eth_dev *eth_dev, uint16_t queue_id, uint8_t pool_idx);
diff --git a/drivers/net/sxe2/sxe2_drv_cmd.h b/drivers/net/sxe2/sxe2_drv_cmd.h
index 09b2f7d125..59a8aa6f13 100644
--- a/drivers/net/sxe2/sxe2_drv_cmd.h
+++ b/drivers/net/sxe2/sxe2_drv_cmd.h
@@ -651,6 +651,23 @@ struct __rte_aligned(4) __rte_packed_begin sxe2_drv_sfp_resp {
uint8_t data[];
} __rte_packed_end;
+enum sxe2_fc_type {
+ SXE2_FC_T_DIS = 0,
+ SXE2_FC_T_LFC,
+ SXE2_FC_T_PFC,
+ SXE2_FC_T_UNKNOWN = 255,
+};
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_vsi_fc_get_req {
+ uint16_t vsi_id;
+ uint8_t rsv[2];
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_vsi_fc_get_resp {
+ uint8_t fc_enable;
+ uint8_t rsv[3];
+} __rte_packed_end;
+
enum sxe2_drv_cmd_module {
SXE2_DRV_CMD_MODULE_HANDSHAKE = 0,
SXE2_DRV_CMD_MODULE_DEV = 1,
diff --git a/drivers/net/sxe2/sxe2_dump.c b/drivers/net/sxe2/sxe2_dump.c
index 1753eccf99..fd0a99d6fd 100644
--- a/drivers/net/sxe2/sxe2_dump.c
+++ b/drivers/net/sxe2/sxe2_dump.c
@@ -188,6 +188,20 @@ static void sxe2_dump_filter_info(FILE *file, struct rte_eth_dev *dev)
return;
}
+static void sxe2_dump_fc_state(FILE *file, struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+
+ if (!(adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_FC_STATE))
+ goto l_end;
+
+ fprintf(file, " -- fc state:\n"
+ "\t -- curr_state: %u\n",
+ adapter->fc_state_ctx.curr_state);
+l_end:
+ return;
+}
+
static const char *sxe2_vsi_id_str(uint16_t vsi_id, char *buf, size_t len)
{
if (vsi_id == SXE2_INVALID_VSI_ID)
@@ -274,6 +288,7 @@ int32_t sxe2_eth_dev_priv_dump(struct rte_eth_dev *dev, FILE *file)
sxe2_dump_dev_args_info(str, dev);
sxe2_dump_filter_info(str, dev);
sxe2_dump_switchdev_info(str, dev);
+ sxe2_dump_fc_state(str, dev);
(void)fflush(str);
diff --git a/drivers/net/sxe2/sxe2_ethdev.c b/drivers/net/sxe2/sxe2_ethdev.c
index b5adc31d69..b70b3e4162 100644
--- a/drivers/net/sxe2/sxe2_ethdev.c
+++ b/drivers/net/sxe2/sxe2_ethdev.c
@@ -68,7 +68,14 @@ static const struct rte_pci_id pci_id_sxe2_tbl[] = {
{ RTE_PCI_DEVICE(SXE2_PCI_VENDOR_ID_206F, SXE2_PCI_DEVICE_ID_VF_1)},
{ .vendor_id = 0, },
};
-
+#define SXE2_TXSCH_NODE_ADJ_LVL_MAX 3
+#define SXE2_DEVARG_FLOW_DULP_PATTERN_MODE "flow-duplicate-pattern"
+#define SXE2_DEVARG_FUNC_FLOW_DIRCT "function-flow-direct"
+#define SXE2_DEVARG_FNAV_STAT_TYPE "fnav-stat-type"
+#define SXE2_DEVARG_SW_STATS "drv-sw-stats"
+#define SXE2_DEVARG_HIGH_PERFORMANCE_MODE "high-performance-mode"
+#define SXE2_DEVARG_SCHED_LAYER_MODE "sched-layer-mode"
+#define SXE2_DEVARG_RX_LOW_LATENCY "rx-low-latency"
static struct sxe2_pci_map_addr_info sxe2_net_map_addr_info_pf[SXE2_PCI_MAP_RES_MAX_COUNT] = {
[SXE2_PCI_MAP_RES_INVALID] = {.addr_base = 0,
.bar_idx = 0,
@@ -980,6 +987,149 @@ static inline void sxe2_init_ptype_tbl(struct rte_eth_dev *dev)
sxe2_init_ptype_list(ptype);
}
+static int32_t sxe2_parse_fnav_stat_type(const char *key, const char *value, void *args)
+{
+ int32_t ret = -EINVAL;
+ uint8_t *num = (uint8_t *)args;
+ unsigned long fnav_stat_type = 0;
+ char *endptr = NULL;
+
+ if (value == NULL || args == NULL) {
+ ret = 0;
+ goto l_end;
+ }
+ errno = 0;
+ fnav_stat_type = strtoul(value, &endptr, 10);
+ if (errno != 0 || *endptr != '\0') {
+ PMD_LOG_WARN(INIT, "%s: \"%s\" is not a valid int value.",
+ key, value);
+ goto l_end;
+ }
+ if (fnav_stat_type > SXE2_FNAV_STAT_ENA_ALL ||
+ fnav_stat_type == SXE2_FNAV_STAT_ENA_NONE) {
+ PMD_LOG_ERR(INIT, "%s: \"%s\" out of range [1-3].",
+ key, value);
+ goto l_end;
+ }
+ *num = fnav_stat_type;
+ ret = 0;
+l_end:
+ return ret;
+}
+static int32_t sxe2_parse_sched_layer_mode(const char *key, const char *value, void *args)
+{
+ int32_t ret = -EINVAL;
+ uint8_t *num = (uint8_t *)args;
+ unsigned long sched_layer_mode;
+ char *endptr = NULL;
+
+ if (value == NULL || args == NULL) {
+ ret = 0;
+ goto l_end;
+ }
+ errno = 0;
+ sched_layer_mode = strtoul(value, &endptr, 10);
+ if (errno != 0 || *endptr != '\0') {
+ PMD_LOG_WARN(INIT, "%s: \"%s\" is not a valid int value.",
+ key, value);
+ goto l_end;
+ }
+ if (sched_layer_mode > SXE2_TXSCH_NODE_ADJ_LVL_MAX) {
+ PMD_LOG_ERR(INIT, "%s: \"%s\" > 3.",
+ key, value);
+ goto l_end;
+ }
+ *num = sched_layer_mode;
+ ret = 0;
+l_end:
+ return ret;
+}
+static int32_t sxe2_parse_high_performance_mode(const char *key, const char *value, void *args)
+{
+ int32_t ret = -EINVAL;
+ uint8_t *num = (uint8_t *)args;
+ unsigned long high_performance_mode;
+ char *endptr = NULL;
+
+ if (value == NULL || args == NULL) {
+ ret = 0;
+ goto l_end;
+ }
+ errno = 0;
+ high_performance_mode = strtoul(value, &endptr, 10);
+ if (errno != 0 || *endptr != '\0') {
+ PMD_LOG_WARN(INIT, "%s: \"%s\" is not a valid int value.",
+ key, value);
+ goto l_end;
+ }
+ if (high_performance_mode != 1) {
+ PMD_LOG_ERR(INIT, "%s: \"%s\" != 1.",
+ key, value);
+ goto l_end;
+ }
+ *num = high_performance_mode;
+ ret = 0;
+l_end:
+ return ret;
+}
+static int32_t sxe2_parse_u8(const char *key, const char *value, void *args)
+{
+ uint8_t *num = (uint8_t *)args;
+ char *end;
+ unsigned long val;
+ int32_t ret = -EINVAL;
+
+ if (value == NULL || args == NULL) {
+ ret = 0;
+ goto l_end;
+ }
+ errno = 0;
+ val = strtoul(value, &end, 10);
+ if (errno != 0 || end == value || *end != '\0') {
+ PMD_LOG_ERR(INIT, "Invalid 8-bit integer value for key %s: %s", key, value);
+ return -EINVAL;
+ }
+
+ if (val > UINT8_MAX) {
+ PMD_LOG_ERR(INIT, "%s: \"%s\" out of range [0-255].",
+ key, value);
+ return -ERANGE;
+ }
+
+ *num = val;
+ ret = 0;
+l_end:
+ return ret;
+}
+static int32_t sxe2_parse_bool(const char *key, const char *value, void *args)
+{
+ int32_t ret = -EINVAL;
+ uint8_t *num = (uint8_t *)args;
+ unsigned long bool_val = 0;
+ char *endptr = NULL;
+
+ if (value == NULL || args == NULL) {
+ ret = 0;
+ goto l_end;
+ }
+ errno = 0;
+ bool_val = strtoul(value, &endptr, 10);
+ if (errno != 0 || *endptr != '\0') {
+ PMD_LOG_WARN(INIT, "%s: \"%s\" is not a valid int value.",
+ key, value);
+ goto l_end;
+ }
+ if (bool_val != 0 && bool_val != 1) {
+ PMD_LOG_ERR(INIT, "%s: \"%s\" out of range [0|1].",
+ key, value);
+ goto l_end;
+ }
+ *num = bool_val;
+ ret = 0;
+l_end:
+ return ret;
+}
+
struct sxe2_pci_map_bar_info *sxe2_dev_get_bar_info(struct sxe2_adapter *adapter,
enum sxe2_pci_map_resource res_type)
{
@@ -1047,6 +1197,69 @@ void *sxe2_pci_map_addr_get(struct sxe2_adapter *adapter,
return addr;
}
+static int32_t sxe2_args_parse(struct rte_eth_dev *dev, struct sxe2_dev_kvargs_info *kvargs)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ int32_t ret = 0;
+ PMD_INIT_FUNC_TRACE();
+
+ if (kvargs == NULL)
+ goto l_end;
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_FNAV_STAT_TYPE,
+ &sxe2_parse_fnav_stat_type,
+ &adapter->devargs.fnav_stat_type);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse fnav stat type, ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_SW_STATS,
+ &sxe2_parse_bool,
+ &adapter->devargs.sw_stats_en);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse sw stats enable, ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_HIGH_PERFORMANCE_MODE,
+ &sxe2_parse_high_performance_mode,
+ &adapter->devargs.high_performance_mode);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse high performance, ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_SCHED_LAYER_MODE,
+ &sxe2_parse_sched_layer_mode,
+ &adapter->devargs.sched_layer_mode);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse sched layer mode, ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_FLOW_DULP_PATTERN_MODE,
+ &sxe2_parse_u8,
+ &adapter->devargs.flow_dup_pattern_mode);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse switch duplicate flow pattern mode, "
+ "ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_FUNC_FLOW_DIRCT,
+ &sxe2_parse_bool,
+ &adapter->devargs.func_flow_direct_en);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse function flow rule enable, "
+ "ret:%d", ret);
+ goto l_end;
+ }
+ ret = sxe2_kvargs_process(kvargs, SXE2_DEVARG_RX_LOW_LATENCY,
+ &sxe2_parse_bool,
+ &adapter->devargs.rx_low_latency);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, INIT, "Failed to parse rx low latency, ret:%d", ret);
+ goto l_end;
+ }
+l_end:
+ return ret;
+}
+
static int32_t sxe2_eth_init(struct rte_eth_dev *dev)
{
int32_t ret = 0;
@@ -1599,6 +1812,37 @@ void sxe2_dev_pci_map_uinit(struct rte_eth_dev *dev)
adapter->dev_info.dev_data = NULL;
}
+static int32_t sxe2_fc_state_init(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter =
+ SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_drv_vsi_fc_get_resp fc_resp = {0};
+ int32_t ret;
+
+ if (!(adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_FC_STATE)) {
+ adapter->fc_state_ctx.cfg_state = 0;
+ adapter->fc_state_ctx.curr_state = 0;
+ ret = 0;
+ goto l_end;
+ }
+ ret = sxe2_drv_fc_state_get(adapter, &fc_resp);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to get fc state, ret=[%d]", ret);
+ goto l_end;
+ }
+ adapter->fc_state_ctx.cfg_state = fc_resp.fc_enable;
+ adapter->fc_state_ctx.curr_state = 0;
+l_end:
+ return ret;
+}
+static void sxe2_fc_state_uinit(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter =
+ SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ adapter->fc_state_ctx.cfg_state = 0;
+ adapter->fc_state_ctx.curr_state = 0;
+}
+
uint32_t sxe2_sched_mode_get(struct sxe2_adapter *adapter)
{
uint32_t ret_mode = SXE2_SCHED_MODE_INVALID;
@@ -1661,6 +1905,32 @@ static int32_t sxe2_sched_uinit(struct rte_eth_dev *dev)
return ret;
}
+int32_t sxe2_sched_reset(struct rte_eth_dev *dev)
+{
+ int32_t ret = 0;
+
+ if (dev->data->dev_started) {
+ PMD_LOG_ERR(DRV, "Device must be stopped before sched reset.");
+ ret = -EPERM;
+ goto l_end;
+ }
+
+ ret = sxe2_tm_conf_reset(dev);
+ if (ret)
+ goto l_end;
+
+ ret = sxe2_sched_uinit(dev);
+ if (ret)
+ goto l_end;
+
+ ret = sxe2_sched_init(dev);
+ if (ret)
+ goto l_end;
+
+l_end:
+ return ret;
+}
+
static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
struct sxe2_dev_kvargs_info *kvargs __rte_unused)
{
@@ -1683,6 +1953,12 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
sxe2_init_ptype_tbl(dev);
+ ret = sxe2_args_parse(dev, kvargs);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to parse devargs, ret=%d", ret);
+ goto l_end;
+ }
+
ret = sxe2_hw_init(dev);
if (ret) {
PMD_LOG_ERR(INIT, "Failed to initialize hw, ret=[%d]", ret);
@@ -1749,6 +2025,12 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
goto init_flow_err;
}
+ ret = sxe2_fc_state_init(dev);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to init fc state, ret=%d", ret);
+ goto init_fc_state_err;
+ }
+
ret = sxe2_sched_init(dev);
if (ret) {
PMD_LOG_ERR(INIT, "Failed to init sched, ret=%d", ret);
@@ -1772,6 +2054,8 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
init_xstats_err:
(void)sxe2_sched_uinit(dev);
init_sched_err:
+ sxe2_fc_state_uinit(dev);
+init_fc_state_err:
(void)sxe2_flow_uninit(dev);
init_flow_err:
init_rss_err:
@@ -1817,6 +2101,7 @@ static int32_t sxe2_dev_close(struct rte_eth_dev *dev)
sxe2_eth_uinit(dev);
sxe2_dev_pci_map_uinit(dev);
sxe2_free_repr_info(dev);
+ sxe2_fc_state_uinit(dev);
l_end:
return 0;
diff --git a/drivers/net/sxe2/sxe2_ethdev.h b/drivers/net/sxe2/sxe2_ethdev.h
index b103679c78..14ad26b439 100644
--- a/drivers/net/sxe2/sxe2_ethdev.h
+++ b/drivers/net/sxe2/sxe2_ethdev.h
@@ -311,6 +311,11 @@ struct sxe2_filter_context {
bool cur_l2_config;
};
+struct sxe2_fc_state_ctxt {
+ uint8_t curr_state;
+ uint8_t cfg_state;
+};
+
struct sxe2_adapter {
struct sxe2_common_device *cdev;
struct sxe2_dev_info dev_info;
@@ -332,6 +337,7 @@ struct sxe2_adapter {
struct sxe2_security_ctx security_ctx;
struct sxe2_repr_context repr_ctxt;
struct sxe2_switchdev_info switchdev_info;
+ struct sxe2_fc_state_ctxt fc_state_ctx;
bool rule_started;
bool flow_isolated;
bool flow_isolate_cfg;
@@ -351,9 +357,6 @@ struct sxe2_adapter {
#define SXE2_DEV_PRIVATE_TO_ADAPTER(dev) \
((struct sxe2_adapter *)(dev)->data->dev_private)
-#define SXE2_DEV_TO_PCI(eth_dev) \
- RTE_DEV_TO_PCI((eth_dev)->device)
-
void *sxe2_pci_map_addr_get(struct sxe2_adapter *adapter,
enum sxe2_pci_map_resource res_type,
uint16_t idx_in_func);
@@ -362,6 +365,8 @@ bool sxe2_ethdev_check(struct rte_eth_dev *dev);
uint32_t sxe2_sched_mode_get(struct sxe2_adapter *adapter);
+int32_t sxe2_sched_reset(struct rte_eth_dev *dev);
+
struct sxe2_pci_map_bar_info *sxe2_dev_get_bar_info(struct sxe2_adapter *adapter,
enum sxe2_pci_map_resource res_type);
diff --git a/drivers/net/sxe2/sxe2_irq.c b/drivers/net/sxe2/sxe2_irq.c
index c26098ef3a..3306504761 100644
--- a/drivers/net/sxe2/sxe2_irq.c
+++ b/drivers/net/sxe2/sxe2_irq.c
@@ -10,6 +10,7 @@
#include <rte_alarm.h>
#include <fcntl.h>
#include <rte_stdatomic.h>
+#include <rte_common.h>
#include "sxe2_ethdev.h"
#include "sxe2_irq.h"
@@ -47,6 +48,31 @@ static struct sxe2_event_handler event_handler = {
static RTE_ATOMIC(uint32_t)event_thread_run;
+static int32_t sxe2_fc_state_callback(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_drv_vsi_fc_get_resp fc_resp = {0};
+ int32_t ret;
+
+ if (!(adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_FC_STATE)) {
+ ret = 0;
+ goto l_end;
+ }
+ ret = sxe2_drv_fc_state_get(adapter, &fc_resp);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to get fc state, ret=[%d]", ret);
+ goto l_end;
+ }
+ adapter->fc_state_ctx.cfg_state = fc_resp.fc_enable;
+ if (dev->data->dev_started) {
+ PMD_LOG_NOTICE(DRV, "Interrupt event: FC status changed. "
+ "cfg_state:%u curr_state:%u",
+ adapter->fc_state_ctx.cfg_state,
+ adapter->fc_state_ctx.curr_state);
+ }
+l_end:
+ return ret;
+}
static void sxe2_event_irq_common_handler(struct sxe2_adapter *adapter, uint64_t oicr)
{
@@ -68,6 +94,10 @@ static void sxe2_event_irq_common_handler(struct sxe2_adapter *adapter, uint64_t
PMD_DEV_LOG_INFO(adapter, DRV, "event notify legacy");
(void)sxe2_switchdev_notify_callback(adapter, false);
}
+ if (oicr & RTE_BIT32(SXE2_COM_FC_ST_CHANGE)) {
+ PMD_DEV_LOG_INFO(adapter, DRV, "fc event notify legacy");
+ (void)sxe2_fc_state_callback(dev);
+ }
}
static uint32_t sxe2_event_intr_handle(void *param __rte_unused)
@@ -436,7 +466,7 @@ int32_t sxe2_intr_init(struct rte_eth_dev *dev)
{
struct sxe2_adapter *adapter =
SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
- struct rte_pci_device *pci_dev = SXE2_DEV_TO_PCI(dev);
+ struct rte_pci_device *pci_dev = container_of(dev->device, struct rte_pci_device, device);
struct rte_intr_handle *reset_handle = NULL;
int32_t ofd = -1;
int32_t rfd = -1;
@@ -518,7 +548,7 @@ void sxe2_intr_uninit(struct rte_eth_dev *dev)
{
struct sxe2_adapter *adapter =
SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
- struct rte_pci_device *pci_dev = SXE2_DEV_TO_PCI(dev);
+ struct rte_pci_device *pci_dev = container_of(dev->device, struct rte_pci_device, device);
sxe2_reset_intr_unregister(dev);
sxe2_intr_handler_destroy(adapter->irq_ctxt.reset_handle,
diff --git a/drivers/net/sxe2/sxe2_rx.c b/drivers/net/sxe2/sxe2_rx.c
index 79e65cfbf1..b5dd9950f0 100644
--- a/drivers/net/sxe2/sxe2_rx.c
+++ b/drivers/net/sxe2/sxe2_rx.c
@@ -467,12 +467,24 @@ int32_t __rte_cold sxe2_rx_queue_start(struct rte_eth_dev *dev, uint16_t rx_queu
int32_t __rte_cold sxe2_rxqs_all_start(struct rte_eth_dev *dev)
{
struct rte_eth_dev_data *data = dev->data;
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct sxe2_drv_vsi_fc_get_resp fc_resp = {0};
struct sxe2_rx_queue *rxq;
uint16_t nb_rxq;
uint16_t nb_started_rxq;
int32_t ret;
PMD_INIT_FUNC_TRACE();
+ if (adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_FC_STATE) {
+ ret = sxe2_drv_fc_state_get(adapter, &fc_resp);
+ if (ret) {
+ PMD_LOG_ERR(RX, "Failed to get fc state, ret=[%d]", ret);
+ goto l_end;
+ }
+ adapter->fc_state_ctx.cfg_state = fc_resp.fc_enable;
+ adapter->fc_state_ctx.curr_state = adapter->fc_state_ctx.cfg_state;
+ }
+
for (nb_rxq = 0; nb_rxq < data->nb_rx_queues; nb_rxq++) {
rxq = dev->data->rx_queues[nb_rxq];
if (!rxq || rxq->rx_deferred_start)
diff --git a/drivers/net/sxe2/sxe2_testpmd.c b/drivers/net/sxe2/sxe2_testpmd.c
new file mode 100644
index 0000000000..5792058212
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_testpmd.c
@@ -0,0 +1,733 @@
+
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#ifndef SXE2_TEST
+#include <cmdline_parse_num.h>
+#include <cmdline_parse_string.h>
+#include <stdlib.h>
+#include <testpmd.h>
+
+#include "sxe2_common_log.h"
+#include "sxe2_testpmd_lib.h"
+
+#define SXE2_SWITCH_BUFF_SIZE (4 * 1024 * 1024)
+
+struct cmd_stats_info_show_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t show;
+ cmdline_fixed_string_t stats;
+ portid_t port_id;
+};
+cmdline_parse_token_string_t cmd_stats_info_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_stats_info_show_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_stats_info_show =
+ TOKEN_STRING_INITIALIZER(struct cmd_stats_info_show_result, show, "show");
+cmdline_parse_token_string_t cmd_stats_info_stats =
+ TOKEN_STRING_INITIALIZER(struct cmd_stats_info_show_result, stats, "stats");
+cmdline_parse_token_num_t cmd_stats_info_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_stats_info_show_result, port_id, RTE_UINT16);
+
+struct cmd_flow_rule_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t flow;
+ cmdline_fixed_string_t rule;
+ cmdline_fixed_string_t dump;
+ portid_t port_id;
+};
+cmdline_parse_token_string_t cmd_flow_rule_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_flow_rule_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_flow_rule_flow =
+ TOKEN_STRING_INITIALIZER(struct cmd_flow_rule_result, flow, "flow");
+cmdline_parse_token_string_t cmd_flow_rule_rule =
+ TOKEN_STRING_INITIALIZER(struct cmd_flow_rule_result, rule, "rule");
+cmdline_parse_token_string_t cmd_flow_rule_dmp =
+ TOKEN_STRING_INITIALIZER(struct cmd_flow_rule_result, dump, "dump");
+cmdline_parse_token_num_t cmd_flow_rule_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_flow_rule_result, port_id, RTE_UINT16);
+
+struct cmd_udp_tunnel {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t tunnel_type;
+ cmdline_fixed_string_t action;
+ cmdline_fixed_string_t udp_tunnel_port;
+ uint16_t udp_port;
+ portid_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_udp_tunnel_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_udp_tunnel, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_udp_tunnel_action =
+ TOKEN_STRING_INITIALIZER(struct cmd_udp_tunnel, action, "add#rm#show");
+cmdline_parse_token_string_t cmd_udp_tunnel_udp_tunnel_port =
+ TOKEN_STRING_INITIALIZER(struct cmd_udp_tunnel, udp_tunnel_port, "udp_tunnel_port");
+cmdline_parse_token_string_t cmd_udp_tunnel_tunnel_type =
+ TOKEN_STRING_INITIALIZER(struct cmd_udp_tunnel,
+ tunnel_type, "vxlan#vxlan-gpe#geneve#gtp-c#gtp-u#pfcp#ecpri#mpls#nvgre#l2tp#teredo");
+cmdline_parse_token_num_t cmd_udp_tunnel_udp_port =
+ TOKEN_NUM_INITIALIZER(struct cmd_udp_tunnel, udp_port, RTE_UINT16);
+cmdline_parse_token_num_t cmd_udp_tunnel_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_udp_tunnel, port_id, RTE_UINT16);
+
+struct cmd_sched_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t sched;
+ cmdline_fixed_string_t reset;
+ portid_t port_id;
+};
+
+cmdline_parse_token_string_t cmd_sched_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_sched_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_sched_sched =
+ TOKEN_STRING_INITIALIZER(struct cmd_sched_result, sched, "sched");
+cmdline_parse_token_string_t cmd_sched_reset =
+ TOKEN_STRING_INITIALIZER(struct cmd_sched_result, reset, "reset");
+cmdline_parse_token_num_t cmd_sched_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_sched_result, port_id, RTE_UINT16);
+
+struct cmd_ipsec_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t engin;
+ cmdline_fixed_string_t dir;
+ cmdline_fixed_string_t op;
+ portid_t port_id;
+ uint16_t session_id;
+ cmdline_fixed_string_t encrypt_algo;
+ cmdline_fixed_string_t encrypt_key;
+ cmdline_fixed_string_t auth_algo;
+ cmdline_fixed_string_t auth_key;
+ cmdline_fixed_string_t dst_ip;
+ uint16_t sport;
+ uint16_t dport;
+ uint32_t spi;
+};
+cmdline_parse_token_string_t cmd_ipsec_mgt_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_ipsec_mgt_module =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, engin, "ipsec");
+cmdline_parse_token_string_t cmd_ipsec_mgt_dir =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, dir, "egress#ingress");
+cmdline_parse_token_string_t cmd_ipsec_mgt_op =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, op, "add#rm#show");
+cmdline_parse_token_num_t cmd_ipsec_mgt_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_result, port_id, RTE_UINT16);
+cmdline_parse_token_num_t cmd_ipsec_mgt_session_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_result, session_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_ipsec_mgt_encrypt_algo =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, encrypt_algo, "aes-cbc#sm4-cbc#null");
+cmdline_parse_token_string_t cmd_ipsec_mgt_encrypt_key =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, encrypt_key, NULL);
+cmdline_parse_token_string_t cmd_ipsec_mgt_auth_algo =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, auth_algo, "sha-hmac#sm3-hmac#null");
+cmdline_parse_token_string_t cmd_ipsec_mgt_auth_key =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, auth_key, NULL);
+cmdline_parse_token_string_t cmd_ipsec_mgt_dst_ip =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_result, dst_ip, NULL);
+cmdline_parse_token_num_t cmd_ipsec_mgt_sport =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_result, sport, RTE_UINT16);
+cmdline_parse_token_num_t cmd_ipsec_mgt_dport =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_result, dport, RTE_UINT16);
+cmdline_parse_token_num_t cmd_ipsec_mgt_spi =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_result, spi, RTE_UINT32);
+
+struct cmd_ipsec_set_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t engin;
+ cmdline_fixed_string_t op;
+ cmdline_fixed_string_t type;
+ portid_t port_id;
+ uint16_t conf_value;
+};
+cmdline_parse_token_string_t cmd_ipsec_set_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_set_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_ipsec_set_module =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_set_result, engin, "ipsec");
+cmdline_parse_token_string_t cmd_ipsec_set_op =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_set_result, op, "set#get");
+cmdline_parse_token_string_t cmd_ipsec_set_type =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_set_result, type, "session-id#esp-hdr-offset");
+cmdline_parse_token_num_t cmd_ipsec_set_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_set_result, port_id, RTE_UINT16);
+cmdline_parse_token_num_t cmd_ipsec_set_value =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_set_result, conf_value, RTE_UINT16);
+
+struct cmd_ipsec_flush_result {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t engin;
+ cmdline_fixed_string_t op;
+ portid_t port_id;
+};
+cmdline_parse_token_string_t cmd_ipsec_flush_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_flush_result, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_ipsec_flush_module =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_flush_result, engin, "ipsec");
+cmdline_parse_token_string_t cmd_ipsec_flush_op =
+ TOKEN_STRING_INITIALIZER(struct cmd_ipsec_flush_result, op, "flush");
+cmdline_parse_token_num_t cmd_ipsec_flush_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_ipsec_flush_result, port_id, RTE_UINT16);
+
+struct cmd_inject_irq {
+ cmdline_fixed_string_t sxe2;
+ cmdline_fixed_string_t inject;
+ cmdline_fixed_string_t irq;
+ portid_t port_id;
+ cmdline_fixed_string_t type;
+};
+cmdline_parse_token_string_t cmd_inject_irq_sxe2 =
+ TOKEN_STRING_INITIALIZER(struct cmd_inject_irq, sxe2, "sxe2");
+cmdline_parse_token_string_t cmd_inject_irq_inject =
+ TOKEN_STRING_INITIALIZER(struct cmd_inject_irq, inject, "inject");
+cmdline_parse_token_string_t cmd_inject_irq_irq =
+ TOKEN_STRING_INITIALIZER(struct cmd_inject_irq, irq, "irq");
+cmdline_parse_token_num_t cmd_inject_irq_port_id =
+ TOKEN_NUM_INITIALIZER(struct cmd_inject_irq, port_id, RTE_UINT16);
+cmdline_parse_token_string_t cmd_inject_irq_type =
+ TOKEN_STRING_INITIALIZER(struct cmd_inject_irq, type, "reset#lsc");
+
+static void cmd_dump_flow_rule_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_flow_rule_result *res = parsed_result;
+ int ret = -1;
+
+ ret = sxe2_flow_rule_dump(res->port_id, cl);
+ switch (ret) {
+ case 0:
+ break;
+ case -EINVAL:
+ cmdline_printf(cl, "Invalid parameters.\n");
+ break;
+ case -ENODEV:
+ cmdline_printf(cl, "Device not supported.\n");
+ break;
+ default:
+ cmdline_printf(cl,
+ "Failed to dump flow rules,"
+ " error: (%s)\n",
+ strerror(-ret));
+ }
+}
+
+static void cmd_udp_tunnel_set_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_udp_tunnel *res = parsed_result;
+ int32_t ret = -1;
+ uint8_t action;
+ const char *action_str[SXE2_TESTPMD_CMD_UDP_TUNNEL_MAX] = {
+ [SXE2_TESTPMD_CMD_UDP_TUNNEL_ADD] = "add",
+ [SXE2_TESTPMD_CMD_UDP_TUNNEL_DEL] = "rm",
+ [SXE2_TESTPMD_CMD_UDP_TUNNEL_GET] = "show"};
+
+ for (action = 0; action < SXE2_TESTPMD_CMD_UDP_TUNNEL_MAX; action++)
+ if (!strcmp(res->action, action_str[action]))
+ break;
+
+ if (action >= SXE2_TESTPMD_CMD_UDP_TUNNEL_MAX) {
+ cmdline_printf(cl, "Invalid action!\n");
+ return;
+ }
+
+ ret = sxe2_udp_tunnel_operations(res->port_id, cl, action,
+ res->udp_port,
+ res->tunnel_type);
+ if (ret)
+ cmdline_printf(cl, "%s udp tunnel port failed, ret = %d\n",
+ action_str[action], ret);
+}
+
+static void cmd_dump_stats_info_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_stats_info_show_result *res = parsed_result;
+ int ret = -1;
+
+ ret = sxe2_stats_info_show(res->port_id);
+ switch (ret) {
+ case 0:
+ break;
+ case -EINVAL:
+ cmdline_printf(cl, "Invalid parameters.\n");
+ break;
+ case -ENODEV:
+ cmdline_printf(cl, "Device not supported.\n");
+ break;
+ default:
+ cmdline_printf(cl,
+ "Failed to show stats info,"
+ " error: (%s)\n", strerror(-ret));
+ }
+}
+
+static uint8_t cmd_ipsec_op_get(char *op)
+{
+ uint8_t i;
+ const char *op_type[SXE2_TESTPMD_CMD_IPSEC_OP_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_OP_ADD] = "add",
+ [SXE2_TESTPMD_CMD_IPSEC_OP_RM] = "rm",
+ [SXE2_TESTPMD_CMD_IPSEC_OP_SHOW] = "show",
+ };
+
+ for (i = 0; i < SXE2_TESTPMD_CMD_IPSEC_OP_MAX; i++) {
+ if (!strcmp(op, op_type[i]))
+ break;
+ }
+
+ return i;
+}
+
+static uint8_t cmd_ipsec_dir_get(char *dir)
+{
+ uint8_t i;
+ const char *dir_type[SXE2_TESTPMD_CMD_IPSEC_DIR_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS] = "egress",
+ [SXE2_TESTPMD_CMD_IPSEC_DIR_INGRESS] = "ingress"
+ };
+
+ for (i = 0; i < SXE2_TESTPMD_CMD_IPSEC_DIR_MAX; i++) {
+ if (!strcmp(dir, dir_type[i]))
+ break;
+ }
+
+ return i;
+}
+
+static int sxe2_hex_to_val(char c)
+{
+ int val = 0;
+
+ if (c >= '0' && c <= '9')
+ val = c - '0';
+ if (c >= 'A' && c <= 'F')
+ val = 10 + c - 'A';
+ if (c >= 'a' && c <= 'f')
+ val = 10 + c - 'a';
+ return val;
+}
+
+static void sxe2_hex_to_bytes(uint8_t *enc_key, char *hex_str, uint8_t len)
+{
+ uint8_t i;
+ int high = 0;
+ int low = 0;
+
+ for (i = 0; i < len; i++) {
+ high = sxe2_hex_to_val(hex_str[2 * i]);
+ low = sxe2_hex_to_val(hex_str[2 * i + 1]);
+ enc_key[i] = (high << 4) | low;
+ }
+}
+
+static int32_t cmd_ipsec_add_param_fill(struct sxe2_ipsec_conf_param *param,
+ struct cmdline *cl,
+ struct cmd_ipsec_result *res)
+{
+ uint8_t i;
+ uint8_t j;
+ int32_t ret = -1;
+ const char *encrypt_algo[SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC] = "aes-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_SM4_CBC] = "sm4-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL] = "null"
+ };
+
+ const char *auth_algo[SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC] = "sha-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SM3_HMAC] = "sm3-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL] = "null"
+ };
+
+ for (i = 0; i < SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX; i++)
+ if (!strcmp(res->encrypt_algo, encrypt_algo[i]))
+ break;
+
+ if (i >= SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX) {
+ cmdline_printf(cl, "Invalid ipsec encrypt algo: %s!\n", res->encrypt_algo);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ for (j = 0; j < SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX; j++) {
+ if (!strcmp(res->auth_algo, auth_algo[j]))
+ break;
+ }
+
+
+ if (j >= SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX) {
+ cmdline_printf(cl, "Invalid ipsec auth algo: %s!\n", res->auth_algo);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ param->encrypt_algo = i;
+ param->auth_algo = j;
+ if (param->encrypt_algo == SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_SM4_CBC)
+ param->enc_len = 16;
+ else
+ param->enc_len = 32;
+
+ sxe2_hex_to_bytes(param->enc_key, res->encrypt_key, param->enc_len);
+ if (param->auth_algo != SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL) {
+ param->auth_len = 32;
+ sxe2_hex_to_bytes(param->auth_key, res->auth_key, param->auth_len);
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t cmd_ipsec_egress_op_parsed(struct sxe2_ipsec_conf_param *param,
+ struct cmdline *cl,
+ struct cmd_ipsec_result *res)
+{
+ int32_t ret = -1;
+
+ switch (param->op) {
+ case SXE2_TESTPMD_CMD_IPSEC_OP_ADD:
+ ret = cmd_ipsec_add_param_fill(param, cl, res);
+ if (ret)
+ goto l_end;
+ ret = sxe2_ipsec_egress_create(param, cl);
+ break;
+ case SXE2_TESTPMD_CMD_IPSEC_OP_RM:
+ param->session_id = res->session_id;
+ ret = sxe2_ipsec_egress_destroy(param, cl);
+ break;
+ case SXE2_TESTPMD_CMD_IPSEC_OP_SHOW:
+ ret = sxe2_ipsec_egress_show(param, cl);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+l_end:
+ return ret;
+}
+
+static int32_t cmd_ipsec_ip_addr_parsed(struct sxe2_ipsec_conf_param *param,
+ struct cmdline *cl,
+ struct cmd_ipsec_result *res)
+{
+ int32_t ret = -1;
+ struct in_addr addr4;
+ struct in6_addr addr6;
+
+ if (inet_pton(AF_INET, res->dst_ip, &addr4) == 1) {
+ param->ip_addr.type = RTE_SECURITY_IPSEC_TUNNEL_IPV4;
+ param->ip_addr.dst_ipv4 = addr4.s_addr;
+ ret = 0;
+ } else if (inet_pton(AF_INET6, res->dst_ip, &addr6) == 1) {
+ param->ip_addr.type = RTE_SECURITY_IPSEC_TUNNEL_IPV6;
+ memcpy(&param->ip_addr.dst_ipv6, &addr6, sizeof(param->ip_addr.dst_ipv6));
+ ret = 0;
+ } else {
+ cmdline_printf(cl, "Invalid ip address: %s!\n", res->dst_ip);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+static int32_t cmd_ipsec_ingress_op_parsed(struct sxe2_ipsec_conf_param *param,
+ struct cmdline *cl,
+ struct cmd_ipsec_result *res)
+{
+ int32_t ret = -1;
+
+ switch (param->op) {
+ case SXE2_TESTPMD_CMD_IPSEC_OP_ADD:
+ ret = cmd_ipsec_add_param_fill(param, cl, res);
+ if (ret)
+ goto l_end;
+ param->sport = htons(res->sport);
+ param->dport = htons(res->dport);
+ param->spi = htonl(res->spi);
+ ret = cmd_ipsec_ip_addr_parsed(param, cl, res);
+ if (ret)
+ goto l_end;
+ ret = sxe2_ipsec_ingress_create(param, cl);
+ break;
+ case SXE2_TESTPMD_CMD_IPSEC_OP_RM:
+ param->session_id = res->session_id;
+ ret = sxe2_ipsec_ingress_destroy(param, cl);
+ break;
+ case SXE2_TESTPMD_CMD_IPSEC_OP_SHOW:
+ ret = sxe2_ipsec_ingress_show(param, cl);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+l_end:
+ return ret;
+}
+
+static int32_t cmd_ipsec_dir_parsed(struct sxe2_ipsec_conf_param *param,
+ struct cmdline *cl,
+ struct cmd_ipsec_result *res)
+{
+ int32_t ret = -1;
+
+ switch (param->dir) {
+ case SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS:
+ ret = cmd_ipsec_egress_op_parsed(param, cl, res);
+ break;
+ case SXE2_TESTPMD_CMD_IPSEC_DIR_INGRESS:
+ ret = cmd_ipsec_ingress_op_parsed(param, cl, res);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static void cmd_ipsec_mgt_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_ipsec_result *res = parsed_result;
+ struct sxe2_ipsec_conf_param param;
+ int32_t ret = -1;
+ uint8_t dir = 0;
+ uint8_t op = 0;
+
+ dir = cmd_ipsec_dir_get(res->dir);
+ if (dir >= SXE2_TESTPMD_CMD_IPSEC_DIR_MAX) {
+ cmdline_printf(cl, "Invalid ipsec direction: %s!\n", res->dir);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ op = cmd_ipsec_op_get(res->op);
+ if (op >= SXE2_TESTPMD_CMD_IPSEC_OP_MAX) {
+ cmdline_printf(cl, "Invalid ipsec operation: %s!\n", res->op);
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ memset(&param, 0, sizeof(struct sxe2_ipsec_conf_param));
+ param.dir = dir;
+ param.op = op;
+ param.port_id = res->port_id;
+ ret = cmd_ipsec_dir_parsed(&param, cl, res);
+
+ if (ret)
+ cmdline_printf(cl, "Command execution failed, ret = %d\n", ret);
+
+l_end:
+ return;
+}
+
+static void cmd_ipsec_set_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_ipsec_set_result *res = parsed_result;
+ int32_t ret = -1;
+
+ if (!strcmp(res->op, "set"))
+ ret = sxe2_ipsec_conf_set(res->port_id, cl, res->type, res->conf_value);
+ else if (!strcmp(res->op, "get"))
+ ret = sxe2_ipsec_conf_get(res->port_id, cl, res->type);
+ else
+ cmdline_printf(cl, "Invalid op: %s\n", res->op);
+
+ if (ret)
+ cmdline_printf(cl, "Command execution failed, ret = %d\n", ret);
+}
+
+static void cmd_ipsec_flush_parsed(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_ipsec_flush_result *res = parsed_result;
+ int32_t ret = -1;
+
+ ret = sxe2_ipsec_flush(res->port_id, cl);
+
+ if (ret)
+ cmdline_printf(cl, "Command execution failed, ret = %d\n", ret);
+}
+
+cmdline_parse_inst_t cmd_flow_rule_dump = {
+ .f = cmd_dump_flow_rule_parsed,
+ .data = NULL,
+ .help_str = "sxe2 flow rule dump <port_id>",
+ .tokens = {
+ (void *)&cmd_flow_rule_sxe2,
+ (void *)&cmd_flow_rule_flow,
+ (void *)&cmd_flow_rule_rule,
+ (void *)&cmd_flow_rule_dmp,
+ (void *)&cmd_flow_rule_port_id,
+ NULL,
+ },
+};
+
+cmdline_parse_inst_t cmd_udp_tunnel_set = {
+ .f = cmd_udp_tunnel_set_parsed,
+ .data = NULL,
+ .help_str = "sxe2 <port_id> udp_tunnel_port add|rm|show "
+ "vxlan|vxlan-gpe|geneve|gtp-c|gtp-u|pfcp|ecpri|mpls|nvgre|l2tp|teredo <udp_port>",
+ .tokens = {
+ (void *)&cmd_udp_tunnel_sxe2,
+ (void *)&cmd_udp_tunnel_port_id,
+ (void *)&cmd_udp_tunnel_udp_tunnel_port,
+ (void *)&cmd_udp_tunnel_action,
+ (void *)&cmd_udp_tunnel_tunnel_type,
+ (void *)&cmd_udp_tunnel_udp_port,
+ NULL,
+ },
+};
+
+cmdline_parse_inst_t cmd_stats_mgt = {
+ .f = cmd_dump_stats_info_parsed,
+ .data = NULL,
+ .help_str = "sxe2 show stats <port_id>",
+ .tokens = {
+ (void *)&cmd_stats_info_sxe2,
+ (void *)&cmd_stats_info_show,
+ (void *)&cmd_stats_info_stats,
+ (void *)&cmd_stats_info_port_id,
+ NULL,
+ },
+};
+
+static void cmd_sched_reset_cfg(void *parsed_result,
+ struct cmdline *cl,
+ __rte_unused void *data)
+{
+ struct cmd_sched_result *res = parsed_result;
+ int32_t ret = -1;
+
+ ret = sxe2_testpmd_sched_reset(res->port_id);
+ switch (ret) {
+ case 0:
+ break;
+ case -EINVAL:
+ cmdline_printf(cl, "invalid sched ops\n");
+ break;
+ case -ENOTSUP:
+ cmdline_printf(cl, "function not implemented\n");
+ break;
+ default:
+ cmdline_printf(cl, "programming error: (%s)\n",
+ strerror(-ret));
+ }
+}
+
+cmdline_parse_inst_t cmd_sched_reset_cmd = {
+ .f = cmd_sched_reset_cfg,
+ .data = NULL,
+ .help_str = "sxe2 sched reset <port_id>",
+ .tokens = {
+ (void *)&cmd_sched_sxe2,
+ (void *)&cmd_sched_sched,
+ (void *)&cmd_sched_reset,
+ (void *)&cmd_sched_port_id,
+ NULL,
+ },
+};
+
+cmdline_parse_inst_t cmd_ipsec_mgt = {
+ .f = cmd_ipsec_mgt_parsed,
+ .data = NULL,
+ .help_str = "sxe2 ipsec egress|ingress add|rm|show "
+ "<port_id> <session_id> aes-cbc|sm4-cbc|null <encrypt_key> sha-hmac|sm3-hmac|null "
+ "<auth_key> <dst_ip> <sport> <dport> <spi>",
+ .tokens = {
+ (void *)&cmd_ipsec_mgt_sxe2,
+ (void *)&cmd_ipsec_mgt_module,
+ (void *)&cmd_ipsec_mgt_dir,
+ (void *)&cmd_ipsec_mgt_op,
+ (void *)&cmd_ipsec_mgt_port_id,
+ (void *)&cmd_ipsec_mgt_session_id,
+ (void *)&cmd_ipsec_mgt_encrypt_algo,
+ (void *)&cmd_ipsec_mgt_encrypt_key,
+ (void *)&cmd_ipsec_mgt_auth_algo,
+ (void *)&cmd_ipsec_mgt_auth_key,
+ (void *)&cmd_ipsec_mgt_dst_ip,
+ (void *)&cmd_ipsec_mgt_sport,
+ (void *)&cmd_ipsec_mgt_dport,
+ (void *)&cmd_ipsec_mgt_spi,
+ NULL,
+ },
+};
+
+cmdline_parse_inst_t cmd_ipsec_set = {
+ .f = cmd_ipsec_set_parsed,
+ .data = NULL,
+ .help_str = "sxe2 ipsec set|get esp-hdr-offset|session-id <port_id> <value>",
+ .tokens = {
+ (void *)&cmd_ipsec_set_sxe2,
+ (void *)&cmd_ipsec_set_module,
+ (void *)&cmd_ipsec_set_op,
+ (void *)&cmd_ipsec_set_type,
+ (void *)&cmd_ipsec_set_port_id,
+ (void *)&cmd_ipsec_set_value,
+ NULL,
+ },
+};
+
+cmdline_parse_inst_t cmd_ipsec_flush = {
+ .f = cmd_ipsec_flush_parsed,
+ .data = NULL,
+ .help_str = "sxe2 ipsec flush <port_id>",
+ .tokens = {
+ (void *)&cmd_ipsec_flush_sxe2,
+ (void *)&cmd_ipsec_flush_module,
+ (void *)&cmd_ipsec_flush_op,
+ (void *)&cmd_ipsec_flush_port_id,
+ NULL,
+ },
+};
+
+static struct testpmd_driver_commands sxe2_cmds = {
+ .commands = {
+ {
+ &cmd_udp_tunnel_set,
+ "sxe2 <port_id> udp_tunnel_port add|rm|show <tunnel_type> <udp_port>.\n"
+ "Add, remove or show a custom UDP port for a specific tunnel protocol\n\n",
+ },
+ {
+ &cmd_sched_reset_cmd,
+ "sxe2 sched reset <port_id>.\n"
+ "Reset sched node on the port\n\n",
+ },
+ {
+ &cmd_stats_mgt,
+ "sxe2 show stats <port_id>.\n"
+ "Dump runtime sxe2 device stats for a port\n\n",
+ },
+ {
+ &cmd_ipsec_mgt,
+ "sxe2 ipsec <dir> <op> <port_id> <session_id> <encrypt_algo> <encrypt_key> "
+ "<auth_algo> <auth_key> <dst_ip> <sport> <dport> <spi>.\n"
+ "Create/query/remove ipsec security session\n\n",
+ },
+ {
+ &cmd_ipsec_set,
+ "sxe2 ipsec set|get esp-hdr-offset|session-id <port_id> <value>.\n"
+ "Set or get the enabled Tx session id or the ESP header offset.\n\n",
+ },
+ {
+ &cmd_ipsec_flush,
+ "sxe2 ipsec flush <port_id>.\n"
+ "Flush all ipsec configurations\n\n",
+ },
+ { NULL, NULL},
+ },
+};
+TESTPMD_ADD_DRIVER_COMMANDS(sxe2_cmds)
+#endif
diff --git a/drivers/net/sxe2/sxe2_testpmd_lib.c b/drivers/net/sxe2/sxe2_testpmd_lib.c
new file mode 100644
index 0000000000..ab2530ffe6
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_testpmd_lib.c
@@ -0,0 +1,969 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#include <rte_bus.h>
+#include <eal_export.h>
+
+#include "sxe2_common_log.h"
+#include "sxe2_ethdev.h"
+#include "sxe2_stats.h"
+#include "sxe2_testpmd_lib.h"
+
+struct rte_mempool *g_sess_pool;
+
+bool g_sxe2_ipsec_mgt_init;
+struct sxe2_ipsec_session_mgt g_tx_session[SXE2_IPSEC_PORT_MAX][SXE2_IPSEC_SESSION_MAX];
+struct sxe2_ipsec_session_mgt g_rx_session[SXE2_IPSEC_PORT_MAX][SXE2_IPSEC_SESSION_MAX];
+uint16_t g_tx_sess_id[SXE2_IPSEC_PORT_MAX] = {0};
+uint16_t g_esp_header_offset[SXE2_IPSEC_PORT_MAX] = {0};
+
+static bool sxe2_is_supported(struct rte_eth_dev *dev)
+{
+ return sxe2_ethdev_check(dev);
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_testpmd_sched_reset, 26.07)
+int32_t
+sxe2_testpmd_sched_reset(uint16_t port_id)
+{
+ struct rte_eth_dev *dev = NULL;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ return -ENODEV;
+ }
+
+ return sxe2_sched_reset(dev);
+}
+
+extern const char *sxe2_flow_type_name[SXE2_FLOW_TYPE_MAX];
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_flow_rule_dump, 26.07)
+int32_t
+sxe2_flow_rule_dump(uint16_t port_id, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct sxe2_adapter *adapter = NULL;
+ int32_t ret = -1;
+ struct rte_flow_list_t *flow_list = NULL;
+ struct rte_flow *flow = NULL;
+ uint32_t index = 0;
+ struct sxe2_flow *hw_flow = NULL;
+ uint8_t i = 0;
+
+ const char *sxe2_flow_engine_name[SXE2_FLOW_ENGINE_MAX] = {
+ [SXE2_FLOW_ENGINE_ACL] = "acl",
+ [SXE2_FLOW_ENGINE_RSS] = "rss",
+ [SXE2_FLOW_ENGINE_SWITCH] = "switch",
+ [SXE2_FLOW_ENGINE_FNAV] = "fnav",
+ };
+ const char *sxe2_flow_action_name[SXE2_FLOW_ACTION_MAX] = {
+ [SXE2_FLOW_ACTION_DROP] = "drop",
+ [SXE2_FLOW_ACTION_TC_REDIRECT] = "tc_redirect",
+ [SXE2_FLOW_ACTION_TO_VSI] = "to_vsi",
+ [SXE2_FLOW_ACTION_TO_VSI_LIST] = "to_vsi_list",
+ [SXE2_FLOW_ACTION_PASSTHRU] = "passthru",
+ [SXE2_FLOW_ACTION_QUEUE] = "queue",
+ [SXE2_FLOW_ACTION_Q_REGION] = "q_region",
+ [SXE2_FLOW_ACTION_MARK] = "mark",
+ [SXE2_FLOW_ACTION_COUNT] = "count",
+ [SXE2_FLOW_ACTION_RSS] = "rss",
+ };
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev");
+ ret = -ENODEV;
+ goto l_end;
+ }
+ adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ flow_list = &adapter->flow_ctxt.rte_flow_list;
+ cmdline_printf(cl, "Dump sxe2 flow rule:\n");
+ TAILQ_FOREACH(flow, flow_list, next) {
+ cmdline_printf(cl, "rule index: %u\n", index++);
+ TAILQ_FOREACH(hw_flow, &flow->sxe2_flow_list, next) {
+ cmdline_printf(cl, "\thw flow id: %d\n", hw_flow->flow_id);
+ cmdline_printf(cl, "\t\ttype: %s\n",
+ sxe2_flow_type_name[hw_flow->meta.flow_type]);
+ cmdline_printf(cl, "\t\tprio: %d\n", hw_flow->meta.flow_prio);
+ cmdline_printf(cl, "\t\tsrc vsi: %d, rule vsi: %d\n",
+ hw_flow->meta.flow_src_vsi, hw_flow->meta.flow_rule_vsi);
+ cmdline_printf(cl, "\t\tengine type: %s\n",
+ sxe2_flow_engine_name[hw_flow->engine_type]);
+ cmdline_printf(cl, "\t\taction:");
+ for (i = 0; i < SXE2_FLOW_ACTION_MAX; i++) {
+ if (sxe2_test_bit(i, hw_flow->action.act_types))
+ cmdline_printf(cl, "%s ", sxe2_flow_action_name[i]);
+ }
+ cmdline_printf(cl, "\n");
+ }
+ }
+ cmdline_printf(cl, "Dump sxe2 flow rule end.\n");
+ ret = 0;
+l_end:
+ return ret;
+}
+
+static const char *tunnel_type_list[SXE2_UDP_TUNNEL_MAX] = {
+ [SXE2_UDP_TUNNEL_PROTOCOL_VXLAN] = "vxlan",
+ [SXE2_UDP_TUNNEL_PROTOCOL_VXLAN_GPE] = "vxlan-gpe",
+ [SXE2_UDP_TUNNEL_PROTOCOL_GENEVE] = "geneve",
+ [SXE2_UDP_TUNNEL_PROTOCOL_GTP_C] = "gtp-c",
+ [SXE2_UDP_TUNNEL_PROTOCOL_GTP_U] = "gtp-u",
+ [SXE2_UDP_TUNNEL_PROTOCOL_PFCP] = "pfcp",
+ [SXE2_UDP_TUNNEL_PROTOCOL_ECPRI] = "ecpri",
+ [SXE2_UDP_TUNNEL_PROTOCOL_MPLS] = "mpls",
+ [SXE2_UDP_TUNNEL_PROTOCOL_NVGRE] = "nvgre",
+ [SXE2_UDP_TUNNEL_PROTOCOL_L2TP] = "l2tp",
+ [SXE2_UDP_TUNNEL_PROTOCOL_TEREDO] = "teredo"
+};
+
+static enum sxe2_udp_tunnel_protocol sxe2_udp_tunnel_type_str2proto(const char *tunnel_type)
+{
+ enum sxe2_udp_tunnel_protocol proto;
+
+ for (proto = 0; proto < SXE2_UDP_TUNNEL_MAX; proto++) {
+ if (tunnel_type_list[proto] != NULL &&
+ strcmp(tunnel_type_list[proto], tunnel_type) == 0) {
+ break;
+ }
+ }
+
+ return proto;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_udp_tunnel_operations, 26.07)
+int32_t
+sxe2_udp_tunnel_operations(uint16_t port_id, struct cmdline *cl, uint8_t action,
+ uint16_t udp_port, const char *tunnel_type)
+{
+ enum sxe2_udp_tunnel_protocol proto = sxe2_udp_tunnel_type_str2proto(tunnel_type);
+ struct rte_eth_dev *dev = NULL;
+ struct sxe2_adapter *adapter = NULL;
+ struct sxe2_udp_tunnel_cfg tunnel_config = { 0 };
+ int32_t ret = -1;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (proto >= SXE2_UDP_TUNNEL_MAX) {
+ cmdline_printf(cl, "Invalid tunnel type!\n");
+ goto l_end;
+ }
+ adapter = dev->data->dev_private;
+ switch (action) {
+ case SXE2_TESTPMD_CMD_UDP_TUNNEL_ADD:
+ ret = sxe2_udp_tunnel_port_add_common(adapter, proto, udp_port);
+ break;
+ case SXE2_TESTPMD_CMD_UDP_TUNNEL_DEL:
+ ret = sxe2_udp_tunnel_port_del_common(adapter, proto, udp_port);
+ break;
+ case SXE2_TESTPMD_CMD_UDP_TUNNEL_GET:
+ tunnel_config.protocol = proto;
+ ret = sxe2_udp_tunnel_port_get_common(adapter, &tunnel_config);
+ if (!ret) {
+ cmdline_printf(cl, "Dump firmware udp tunnel config: [proto:%s, port:%d, "
+ "enable:%d, src/dst:%d/%d, used:%d]\n",
+ tunnel_type_list[proto], tunnel_config.fw_port,
+ tunnel_config.fw_status, tunnel_config.fw_src_en,
+ tunnel_config.fw_dst_en, tunnel_config.fw_used);
+ }
+ break;
+ default:
+ break;
+ }
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_stats_info_show, 26.07)
+int32_t
+sxe2_stats_info_show(uint16_t port_id)
+{
+ struct rte_eth_dev *dev = NULL;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
+static int32_t sxe2_ipsec_init_mempools(void *sec_ctx)
+{
+ uint16_t nb_sess = 8192;
+ uint32_t sess_sz;
+ char s[64];
+ int32_t ret = -1;
+
+ sess_sz = rte_security_session_get_size(sec_ctx);
+ if (g_sess_pool == NULL) {
+ snprintf(s, sizeof(s), "sess_pool");
+ g_sess_pool = rte_mempool_create(s, nb_sess, sess_sz,
+ MEMPOOL_CACHE_SIZE, 0,
+ NULL, NULL, NULL, NULL,
+ SOCKET_ID_ANY, 0);
+ if (g_sess_pool == NULL) {
+ ret = -ENOMEM;
+ PMD_LOG_ERR(DRV, "Failed to create session mempool.");
+ goto l_end;
+ }
+ }
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static void sxe2_ipsec_init_session_mgt(void)
+{
+ uint16_t i;
+ uint8_t port_id;
+
+ if (g_sxe2_ipsec_mgt_init)
+ return;
+
+ for (port_id = 0; port_id < SXE2_IPSEC_PORT_MAX; port_id++) {
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ g_tx_session[port_id][i].session = NULL;
+ g_tx_session[port_id][i].encrypt_algo = SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL;
+ g_tx_session[port_id][i].auth_algo = SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL;
+ g_tx_session[port_id][i].session_id = i;
+ g_tx_session[port_id][i].status = 0;
+ }
+ }
+
+ for (port_id = 0; port_id < SXE2_IPSEC_PORT_MAX; port_id++) {
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ g_rx_session[port_id][i].session = NULL;
+ g_rx_session[port_id][i].encrypt_algo = SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL;
+ g_rx_session[port_id][i].auth_algo = SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL;
+ g_rx_session[port_id][i].session_id = i;
+ g_rx_session[port_id][i].status = 0;
+ }
+ }
+
+ g_sxe2_ipsec_mgt_init = true;
+}
+
+static uint16_t sxe2_ipsec_session_mgt_alloc(enum sxe2_testpmd_ipsec_dir dir, uint16_t port_id)
+{
+ uint16_t i;
+ uint16_t index = 0xFFFF;
+ struct sxe2_ipsec_session_mgt *mgt = NULL;
+
+ if (dir == SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS)
+ mgt = g_tx_session[port_id];
+ else
+ mgt = g_rx_session[port_id];
+
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ if (mgt[i].status == 0) {
+ index = i;
+ mgt[i].status = 1;
+ break;
+ }
+ }
+
+ return index;
+}
+
+static void sxe2_ipsec_session_mgt_free(enum sxe2_testpmd_ipsec_dir dir,
+ uint16_t index, uint16_t port_id)
+{
+ struct sxe2_ipsec_session_mgt *mgt = NULL;
+
+ if (dir == SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS)
+ mgt = g_tx_session[port_id];
+ else
+ mgt = g_rx_session[port_id];
+
+ mgt[index].session = NULL;
+ mgt[index].status = 0;
+}
+
+static int32_t sxe2_ipsec_egress_construct(struct cmdline *cl,
+ struct rte_crypto_sym_xform **xform,
+ struct sxe2_ipsec_conf_param *param)
+{
+ struct rte_crypto_sym_xform *cur_xform = NULL;
+ struct rte_crypto_sym_xform *next_xform = NULL;
+ int32_t ret = -1;
+
+ cur_xform = rte_zmalloc("current xform",
+ sizeof(struct rte_crypto_sym_xform), 0);
+ if (cur_xform == NULL) {
+ ret = -ENOMEM;
+ cmdline_printf(cl, "Failed to malloc memory!\n");
+ goto l_end;
+ }
+ cur_xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+ cur_xform->cipher.op = RTE_CRYPTO_CIPHER_OP_ENCRYPT;
+ if (param->encrypt_algo == SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC)
+ cur_xform->cipher.algo = SXE2_RTE_CRYPTO_CIPHER_AES_CBC;
+ else
+ cur_xform->cipher.algo = SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC;
+ cur_xform->cipher.key.length = param->enc_len;
+ cur_xform->cipher.key.data = param->enc_key;
+
+ if (param->auth_algo == SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL) {
+ ret = 0;
+ goto l_end;
+ }
+
+ next_xform = rte_zmalloc("next xform",
+ sizeof(struct rte_crypto_sym_xform), 0);
+ if (next_xform == NULL) {
+ rte_free(cur_xform);
+ ret = -ENOMEM;
+ cmdline_printf(cl, "Failed to malloc memory!\n");
+ goto l_end;
+ }
+ next_xform->type = RTE_CRYPTO_SYM_XFORM_AUTH;
+ next_xform->auth.op = RTE_CRYPTO_AUTH_OP_GENERATE;
+ if (param->auth_algo == SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC)
+ next_xform->auth.algo = SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC;
+ else
+ next_xform->auth.algo = SXE2_RTE_CRYPTO_AUTH_SM3_HMAC;
+ next_xform->auth.key.length = param->auth_len;
+ next_xform->auth.key.data = param->auth_key;
+ cur_xform->next = next_xform;
+ ret = 0;
+
+l_end:
+ *xform = cur_xform;
+ return ret;
+}
+
+static int32_t sxe2_ipsec_ingress_construct(struct cmdline *cl,
+ struct rte_crypto_sym_xform **xform,
+ struct sxe2_ipsec_conf_param *param)
+{
+ struct rte_crypto_sym_xform *cur_xform = NULL;
+ struct rte_crypto_sym_xform *next_xform = NULL;
+ int32_t ret = -1;
+
+ cur_xform = rte_zmalloc("current xform",
+ sizeof(struct rte_crypto_sym_xform), 0);
+ if (cur_xform == NULL) {
+ ret = -ENOMEM;
+ cmdline_printf(cl, "Failed to malloc memory!\n");
+ goto l_end;
+ }
+
+ if (param->auth_algo == SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL) {
+ cur_xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+ cur_xform->cipher.op = RTE_CRYPTO_CIPHER_OP_DECRYPT;
+ if (param->encrypt_algo == SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC)
+ cur_xform->cipher.algo = SXE2_RTE_CRYPTO_CIPHER_AES_CBC;
+ else
+ cur_xform->cipher.algo = SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC;
+ cur_xform->cipher.key.length = param->enc_len;
+ cur_xform->cipher.key.data = param->enc_key;
+ ret = 0;
+ goto l_end;
+ }
+
+ cur_xform->type = RTE_CRYPTO_SYM_XFORM_AUTH;
+ cur_xform->auth.op = RTE_CRYPTO_AUTH_OP_VERIFY;
+ if (param->auth_algo == SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC)
+ cur_xform->auth.algo = SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC;
+ else
+ cur_xform->auth.algo = SXE2_RTE_CRYPTO_AUTH_SM3_HMAC;
+
+ cur_xform->auth.key.length = param->auth_len;
+ cur_xform->auth.key.data = param->auth_key;
+
+ next_xform = rte_zmalloc("next xform",
+ sizeof(struct rte_crypto_sym_xform), 0);
+ if (next_xform == NULL) {
+ rte_free(cur_xform);
+ ret = -ENOMEM;
+ cmdline_printf(cl, "Failed to malloc memory!\n");
+ goto l_end;
+ }
+
+ next_xform->type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+ next_xform->cipher.op = RTE_CRYPTO_CIPHER_OP_DECRYPT;
+ if (param->encrypt_algo == SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC)
+ next_xform->cipher.algo = SXE2_RTE_CRYPTO_CIPHER_AES_CBC;
+ else
+ next_xform->cipher.algo = SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC;
+ next_xform->cipher.key.length = param->enc_len;
+ next_xform->cipher.key.data = param->enc_key;
+ cur_xform->next = next_xform;
+ ret = 0;
+
+l_end:
+ *xform = cur_xform;
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_ingress_create, 26.07)
+int32_t
+sxe2_ipsec_ingress_create(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct rte_security_session_conf conf;
+ struct rte_crypto_sym_xform *encrypt_xform = NULL;
+ void *session = NULL;
+ struct rte_security_ctx *p_ctx = NULL;
+ int32_t ret = -1;
+ uint16_t index;
+ uint8_t i;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (dev->data->dev_started != 0) {
+ cmdline_printf(cl, "port %d must be stopped.\n", dev->data->port_id);
+ ret = -EBUSY;
+ goto l_end;
+ }
+
+ p_ctx = rte_eth_dev_get_sec_ctx(param->port_id);
+
+ if (g_sess_pool == NULL) {
+ ret = sxe2_ipsec_init_mempools(p_ctx);
+ if (ret)
+ goto l_end;
+ }
+
+ sxe2_ipsec_init_session_mgt();
+
+ memset(&conf, 0, sizeof(conf));
+ conf.protocol = RTE_SECURITY_PROTOCOL_IPSEC;
+ conf.action_type = RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO;
+ conf.ipsec.mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL;
+ conf.ipsec.proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP;
+ conf.ipsec.direction = RTE_SECURITY_IPSEC_SA_DIR_INGRESS;
+ conf.ipsec.spi = param->spi;
+ conf.ipsec.udp.sport = param->sport;
+ conf.ipsec.udp.dport = param->dport;
+ conf.ipsec.tunnel.type = param->ip_addr.type;
+ if (param->sport || param->dport)
+ conf.ipsec.options.udp_encap = true;
+ if (param->ip_addr.type == RTE_SECURITY_IPSEC_TUNNEL_IPV4)
+ conf.ipsec.tunnel.ipv4.dst_ip.s_addr = param->ip_addr.dst_ipv4;
+ else
+ memcpy(&conf.ipsec.tunnel.ipv6.dst_addr,
+ &param->ip_addr.dst_ipv6,
+ sizeof(param->ip_addr.dst_ipv6));
+
+ ret = sxe2_ipsec_ingress_construct(cl, &encrypt_xform, param);
+ if (ret)
+ goto l_end;
+ conf.crypto_xform = encrypt_xform;
+
+ session = rte_security_session_create(p_ctx, &conf, g_sess_pool);
+ if (session == NULL) {
+ ret = -1;
+ goto l_free;
+ }
+
+ index = sxe2_ipsec_session_mgt_alloc(param->dir, param->port_id);
+ if (index == 0xFFFF) {
+ ret = -1;
+ goto l_free;
+ }
+
+ g_rx_session[param->port_id][index].session = session;
+ g_rx_session[param->port_id][index].encrypt_algo = param->encrypt_algo;
+ g_rx_session[param->port_id][index].auth_algo = param->auth_algo;
+ for (i = 0; i < 32; i++) {
+ g_rx_session[param->port_id][index].enc_key[i] = param->enc_key[i];
+ g_rx_session[param->port_id][index].auth_key[i] = param->auth_key[i];
+ }
+ g_rx_session[param->port_id][index].sport = ntohs(param->sport);
+ g_rx_session[param->port_id][index].dport = ntohs(param->dport);
+ g_rx_session[param->port_id][index].spi = ntohl(param->spi);
+ memcpy(&g_rx_session[param->port_id][index].ip_addr,
+ &param->ip_addr,
+ sizeof(struct sxe2_ipsec_ip_param));
+
+ ret = 0;
+
+l_free:
+ if (encrypt_xform) {
+ rte_free(encrypt_xform->next);
+ rte_free(encrypt_xform);
+ }
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_ingress_destroy, 26.07)
+int32_t
+sxe2_ipsec_ingress_destroy(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct rte_security_ctx *p_ctx = NULL;
+ struct rte_security_session *session = NULL;
+ int32_t ret = -1;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ cmdline_printf(cl, "Invalid dev.\n");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (dev->data->dev_started != 0) {
+ cmdline_printf(cl, "port %d must be stopped.\n", dev->data->port_id);
+ ret = -EBUSY;
+ goto l_end;
+ }
+
+ if (param->session_id >= SXE2_IPSEC_SESSION_MAX) {
+ PMD_LOG_ERR(DRV, "Invalid session id.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (!g_rx_session[param->port_id][param->session_id].status) {
+ PMD_LOG_ERR(DRV, "Invalid session status.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (g_rx_session[param->port_id][param->session_id].session == NULL) {
+ PMD_LOG_ERR(DRV, "Invalid session data.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ p_ctx = rte_eth_dev_get_sec_ctx(param->port_id);
+
+ session = g_rx_session[param->port_id][param->session_id].session;
+ ret = rte_security_session_destroy(p_ctx, session);
+ if (ret)
+ goto l_end;
+ sxe2_ipsec_session_mgt_free(param->dir, param->session_id, param->port_id);
+
+ ret = 0;
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_ingress_show, 26.07)
+int32_t
+sxe2_ipsec_ingress_show(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ int32_t ret = -1;
+ uint16_t i;
+ uint8_t j;
+ char encrypt_key[65];
+ char auth_key[65];
+ const char *encrypt_algo[SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC] = "aes-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_SM4_CBC] = "sm4-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL] = "null"
+ };
+
+ const char *auth_algo[SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC] = "sha-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SM3_HMAC] = "sm3-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL] = "null"
+ };
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ if (g_rx_session[param->port_id][i].status &&
+ g_rx_session[param->port_id][i].session) {
+ memset(encrypt_key, '\0', sizeof(encrypt_key));
+ memset(auth_key, '\0', sizeof(auth_key));
+ for (j = 0; j < 32; j++) {
+ sprintf(encrypt_key + 2 * j, "%02x",
+ g_rx_session[param->port_id][i].enc_key[j]);
+ }
+
+ if (g_rx_session[param->port_id][i].auth_algo !=
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL) {
+ for (j = 0; j < 32; j++) {
+ sprintf(auth_key + 2 * j, "%02x",
+ g_rx_session[param->port_id][i].auth_key[j]);
+ }
+ }
+
+ cmdline_printf(cl, "session_id:%u, direction:rx, "
+ "encrypt_algo:%s, encrypt_key:0x%s, "
+ "auth_algo:%s, auth_key:0x%s, sport:%u, dport:%u, spi:%u\n",
+ i,
+ encrypt_algo[g_rx_session[param->port_id][i].encrypt_algo],
+ encrypt_key,
+ auth_algo[g_rx_session[param->port_id][i].auth_algo],
+ auth_key,
+ g_rx_session[param->port_id][i].sport,
+ g_rx_session[param->port_id][i].dport,
+ g_rx_session[param->port_id][i].spi);
+ }
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_egress_create, 26.07)
+int32_t
+sxe2_ipsec_egress_create(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct rte_security_session_conf conf;
+ struct rte_crypto_sym_xform *encrypt_xform = NULL;
+ void *session = NULL;
+ struct rte_security_ctx *p_ctx = NULL;
+ int32_t ret = -1;
+ uint16_t index;
+ uint8_t i;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (dev->data->dev_started != 0) {
+ cmdline_printf(cl, "port %d must be stopped.\n", dev->data->port_id);
+ ret = 0;
+ goto l_end;
+ }
+
+ p_ctx = rte_eth_dev_get_sec_ctx(param->port_id);
+
+ if (g_sess_pool == NULL) {
+ ret = sxe2_ipsec_init_mempools(p_ctx);
+ if (ret)
+ goto l_end;
+ }
+
+ sxe2_ipsec_init_session_mgt();
+
+ memset(&conf, 0, sizeof(conf));
+ conf.protocol = RTE_SECURITY_PROTOCOL_IPSEC;
+ conf.action_type = RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO;
+ conf.ipsec.mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL;
+ conf.ipsec.proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP;
+ conf.ipsec.direction = RTE_SECURITY_IPSEC_SA_DIR_EGRESS;
+
+ ret = sxe2_ipsec_egress_construct(cl, &encrypt_xform, param);
+ if (ret)
+ goto l_end;
+ conf.crypto_xform = encrypt_xform;
+
+ session = rte_security_session_create(p_ctx, &conf, g_sess_pool);
+ if (session == NULL) {
+ ret = -1;
+ goto l_free;
+ }
+
+ index = sxe2_ipsec_session_mgt_alloc(param->dir, param->port_id);
+ if (index == 0xFFFF) {
+ (void)rte_security_session_destroy(p_ctx, session);
+ ret = -1;
+ goto l_free;
+ }
+ g_tx_session[param->port_id][index].session = session;
+ g_tx_session[param->port_id][index].encrypt_algo = param->encrypt_algo;
+ g_tx_session[param->port_id][index].auth_algo = param->auth_algo;
+ for (i = 0; i < 32; i++) {
+ g_tx_session[param->port_id][index].enc_key[i] = param->enc_key[i];
+ g_tx_session[param->port_id][index].auth_key[i] = param->auth_key[i];
+ }
+ ret = 0;
+
+l_free:
+ if (encrypt_xform) {
+ rte_free(encrypt_xform->next);
+ rte_free(encrypt_xform);
+ }
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_egress_destroy, 26.07)
+int32_t
+sxe2_ipsec_egress_destroy(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct rte_security_ctx *p_ctx = NULL;
+ struct rte_security_session *session = NULL;
+ int32_t ret = -1;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (dev->data->dev_started != 0) {
+ cmdline_printf(cl, "port %d must be stopped.\n", dev->data->port_id);
+ ret = 0;
+ goto l_end;
+ }
+
+ if (param->session_id >= SXE2_IPSEC_SESSION_MAX) {
+ PMD_LOG_ERR(DRV, "Invalid session id.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (!g_tx_session[param->port_id][param->session_id].status) {
+ PMD_LOG_ERR(DRV, "Invalid session status.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (g_tx_session[param->port_id][param->session_id].session == NULL) {
+ PMD_LOG_ERR(DRV, "Invalid session data.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ p_ctx = rte_eth_dev_get_sec_ctx(param->port_id);
+
+ session = g_tx_session[param->port_id][param->session_id].session;
+ ret = rte_security_session_destroy(p_ctx, session);
+ if (ret)
+ goto l_end;
+ sxe2_ipsec_session_mgt_free(param->dir, param->session_id, param->port_id);
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_egress_show, 26.07)
+int32_t
+sxe2_ipsec_egress_show(struct sxe2_ipsec_conf_param *param, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ int32_t ret = -1;
+ uint16_t i;
+ uint8_t j;
+ char encrypt_key[65];
+ char auth_key[65];
+ const char *encrypt_algo[SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC] = "aes-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_SM4_CBC] = "sm4-cbc",
+ [SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL] = "null"
+ };
+
+ const char *auth_algo[SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX] = {
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC] = "sha-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SM3_HMAC] = "sm3-hmac",
+ [SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL] = "null"
+ };
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(param->port_id, -ENODEV);
+
+ dev = &rte_eth_devices[param->port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ if (g_tx_session[param->port_id][i].status &&
+ g_tx_session[param->port_id][i].session) {
+ memset(encrypt_key, '\0', sizeof(encrypt_key));
+ memset(auth_key, '\0', sizeof(auth_key));
+ for (j = 0; j < 32; j++)
+ sprintf(encrypt_key + 2 * j, "%02x",
+ g_tx_session[param->port_id][i].enc_key[j]);
+ if (g_tx_session[param->port_id][i].auth_algo !=
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL)
+ for (j = 0; j < 32; j++)
+ sprintf(auth_key + 2 * j, "%02x",
+ g_tx_session[param->port_id][i].auth_key[j]);
+
+ cmdline_printf(cl, "id:%u, tx, encrypt_algo:%s, "
+ "encrypt_key:0x%s, auth_algo:%s, auth_key:0x%s.\n",
+ i,
+ encrypt_algo[g_tx_session[param->port_id][i].encrypt_algo],
+ encrypt_key,
+ auth_algo[g_tx_session[param->port_id][i].auth_algo],
+ auth_key);
+ }
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_conf_get, 26.07)
+int32_t
+sxe2_ipsec_conf_get(uint16_t port_id, struct cmdline *cl, char type[])
+{
+ struct rte_eth_dev *dev = NULL;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ return -ENODEV;
+ }
+ if (!strcmp(type, "session-id"))
+ cmdline_printf(cl, "session-id: %u\n",
+ g_tx_sess_id[port_id]);
+ else if (!strcmp(type, "esp-hdr-offset"))
+ cmdline_printf(cl, "esp-hdr-offset: %u\n",
+ g_esp_header_offset[port_id]);
+ else
+ cmdline_printf(cl, "Invalid type: %s\n", type);
+
+ return 0;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_conf_set, 26.07)
+int32_t
+sxe2_ipsec_conf_set(uint16_t port_id, struct cmdline *cl, char type[], uint16_t value)
+{
+ struct rte_eth_dev *dev = NULL;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ PMD_LOG_ERR(DRV, "Invalid dev.");
+ return -ENODEV;
+ }
+ if (!strcmp(type, "session-id")) {
+ if (value >= SXE2_IPSEC_SESSION_MAX || !g_tx_session[port_id][value].status) {
+ cmdline_printf(cl, "Invalid session-id: %u, "
+ "must be in 0..4095 and refer to an active session.\n", value);
+ return -EINVAL;
+ }
+ g_tx_sess_id[port_id] = value;
+ cmdline_printf(cl, "session-id: %u\n", g_tx_sess_id[port_id]);
+ } else if (!strcmp(type, "esp-hdr-offset")) {
+ if (value < 34 || value > 512) {
+ cmdline_printf(cl, "Invalid esp-hdr-offset: %u, "
+ "34 <= value <= 512.\n", value);
+ return -EINVAL;
+ }
+ g_esp_header_offset[port_id] = value;
+ cmdline_printf(cl, "esp-hdr-offset: %u\n",
+ g_esp_header_offset[port_id]);
+ } else {
+ cmdline_printf(cl, "Invalid type: %s\n", type);
+ }
+
+ return 0;
+}
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_stats_show, 26.07)
+int32_t
+sxe2_ipsec_stats_show(uint16_t port_id)
+{
+ (void)port_id;
+ return 0;
+}
+
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(sxe2_ipsec_flush, 26.07)
+int32_t
+sxe2_ipsec_flush(uint16_t port_id, struct cmdline *cl)
+{
+ struct rte_eth_dev *dev = NULL;
+ struct rte_security_ctx *p_ctx = NULL;
+ struct rte_security_session *session = NULL;
+ int32_t ret = -1;
+ uint16_t i;
+
+ RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+
+ dev = &rte_eth_devices[port_id];
+ if (!sxe2_is_supported(dev)) {
+ cmdline_printf(cl, "Invalid dev.\n");
+ ret = -ENODEV;
+ goto l_end;
+ }
+
+ if (dev->data->dev_started != 0) {
+ cmdline_printf(cl, "port %d must be stopped.\n", dev->data->port_id);
+ ret = 0;
+ goto l_end;
+ }
+
+ p_ctx = rte_eth_dev_get_sec_ctx(port_id);
+
+ g_esp_header_offset[port_id] = 0;
+ g_tx_sess_id[port_id] = 0;
+
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ session = g_tx_session[port_id][i].session;
+ if (g_tx_session[port_id][i].status && session) {
+ ret = rte_security_session_destroy(p_ctx, session);
+ if (ret)
+ cmdline_printf(cl, "failed to destroy tx session: %d.\n", i);
+ else
+ sxe2_ipsec_session_mgt_free(SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS,
+ i, port_id);
+ }
+ }
+
+ for (i = 0; i < SXE2_IPSEC_SESSION_MAX; i++) {
+ session = g_rx_session[port_id][i].session;
+ if (g_rx_session[port_id][i].status && session) {
+ ret = rte_security_session_destroy(p_ctx, session);
+ if (ret)
+ cmdline_printf(cl, "failed to destroy rx session: %d.\n", i);
+ else
+ sxe2_ipsec_session_mgt_free(SXE2_TESTPMD_CMD_IPSEC_DIR_INGRESS,
+ i, port_id);
+ }
+ }
+
+ g_sxe2_ipsec_mgt_init = false;
+ ret = 0;
+
+l_end:
+ return ret;
+}
diff --git a/drivers/net/sxe2/sxe2_testpmd_lib.h b/drivers/net/sxe2/sxe2_testpmd_lib.h
new file mode 100644
index 0000000000..3d2659ef00
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_testpmd_lib.h
@@ -0,0 +1,142 @@
+
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#ifndef __SXE2_TESTPMD_LIB_H__
+#define __SXE2_TESTPMD_LIB_H__
+#include <cmdline.h>
+#include "sxe2_ipsec.h"
+
+#define SXE2_IPSEC_SESSION_MAX (4096)
+#define SXE2_IPSEC_PORT_MAX RTE_MAX_ETHPORTS
+#define MEMPOOL_CACHE_SIZE (512 / 2)
+
+enum {
+ SXE2_TESTPMD_CMD_UDP_TUNNEL_ADD = 0,
+ SXE2_TESTPMD_CMD_UDP_TUNNEL_DEL = 1,
+ SXE2_TESTPMD_CMD_UDP_TUNNEL_GET = 2,
+ SXE2_TESTPMD_CMD_UDP_TUNNEL_MAX,
+};
+
+enum sxe2_testpmd_ipsec_op {
+ SXE2_TESTPMD_CMD_IPSEC_OP_ADD = 0,
+ SXE2_TESTPMD_CMD_IPSEC_OP_RM = 1,
+ SXE2_TESTPMD_CMD_IPSEC_OP_SHOW = 2,
+ SXE2_TESTPMD_CMD_IPSEC_OP_MAX,
+};
+
+enum sxe2_testpmd_ipsec_dir {
+ SXE2_TESTPMD_CMD_IPSEC_DIR_EGRESS = 0,
+ SXE2_TESTPMD_CMD_IPSEC_DIR_INGRESS = 1,
+ SXE2_TESTPMD_CMD_IPSEC_DIR_MAX,
+};
+
+enum sxe2_testpmd_ipsec_encrypt_algo {
+ SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_AES_CBC = 0,
+ SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_SM4_CBC = 1,
+ SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_NULL = 2,
+ SXE2_TESTPMD_CMD_IPSEC_EN_ALGO_MAX,
+};
+
+enum sxe2_testpmd_ipsec_auth_algo {
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SHA_HMAC = 0,
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_SM3_HMAC = 1,
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_NULL = 2,
+ SXE2_TESTPMD_CMD_IPSEC_AUTH_ALGO_MAX,
+};
+
+struct sxe2_ipsec_conf_param {
+ enum sxe2_testpmd_ipsec_dir dir;
+ enum sxe2_testpmd_ipsec_op op;
+ enum sxe2_testpmd_ipsec_encrypt_algo encrypt_algo;
+ enum sxe2_testpmd_ipsec_auth_algo auth_algo;
+ struct sxe2_ipsec_ip_param ip_addr;
+ uint32_t spi;
+ uint16_t port_id;
+ uint16_t session_id;
+ uint16_t sport;
+ uint16_t dport;
+ uint8_t enc_key[32];
+ uint8_t enc_len;
+ uint8_t auth_key[32];
+ uint8_t auth_len;
+};
+
+struct sxe2_ipsec_session_mgt {
+ void *session;
+ enum sxe2_testpmd_ipsec_encrypt_algo encrypt_algo;
+ enum sxe2_testpmd_ipsec_auth_algo auth_algo;
+ struct sxe2_ipsec_ip_param ip_addr;
+ uint32_t spi;
+ uint16_t session_id;
+ uint16_t sport;
+ uint16_t dport;
+ uint8_t enc_key[32];
+ uint8_t auth_key[32];
+ uint8_t status;
+};
+
+__rte_experimental
+int32_t
+sxe2_testpmd_sched_reset(uint16_t port_id);
+
+__rte_experimental
+int32_t
+sxe2_flow_rule_dump(uint16_t port_id, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_udp_tunnel_operations(uint16_t port_id, struct cmdline *cl, uint8_t action,
+ uint16_t udp_port, const char *tunnel_type);
+
+__rte_experimental
+int32_t
+sxe2_stats_info_show(uint16_t port_id);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_ingress_create(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_ingress_destroy(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_ingress_show(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_egress_create(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_egress_destroy(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_egress_show(struct sxe2_ipsec_conf_param *param, struct cmdline *cl);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_conf_get(uint16_t port_id, struct cmdline *cl, char type[]);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_conf_set(uint16_t port_id, struct cmdline *cl, char type[], uint16_t value);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_stats_show(uint16_t port_id);
+
+__rte_experimental
+int32_t
+sxe2_ipsec_flush(uint16_t port_id, struct cmdline *cl);
+
+extern struct sxe2_ipsec_session_mgt g_tx_session[SXE2_IPSEC_PORT_MAX][SXE2_IPSEC_SESSION_MAX];
+extern uint16_t g_tx_sess_id[SXE2_IPSEC_PORT_MAX];
+extern uint16_t g_esp_header_offset[SXE2_IPSEC_PORT_MAX];
+extern struct rte_mempool *g_sess_pool;
+
+#endif /* __SXE2_TESTPMD_LIB_H__ */
diff --git a/drivers/net/sxe2/sxe2_tm.c b/drivers/net/sxe2/sxe2_tm.c
index 4c4f793cd5..5de9b5d3b7 100644
--- a/drivers/net/sxe2/sxe2_tm.c
+++ b/drivers/net/sxe2/sxe2_tm.c
@@ -982,6 +982,24 @@ int32_t sxe2_tm_init(struct rte_eth_dev *dev)
return ret;
}
+int32_t sxe2_tm_conf_reset(struct rte_eth_dev *dev)
+{
+ int32_t ret;
+
+ ret = sxe2_tm_uninit(dev);
+ if (ret)
+ goto l_end;
+
+ ret = sxe2_tm_init(dev);
+ if (ret)
+ goto l_end;
+
+ PMD_LOG_DEBUG(DRV, "TM config reset succeeded.");
+
+l_end:
+ return ret;
+}
+
static int32_t sxe2_tm_chk_all_leaf(struct rte_eth_dev *dev)
{
int32_t ret = 0;
diff --git a/drivers/net/sxe2/sxe2_tm.h b/drivers/net/sxe2/sxe2_tm.h
index c4f8da6a8e..b0bfc2091d 100644
--- a/drivers/net/sxe2/sxe2_tm.h
+++ b/drivers/net/sxe2/sxe2_tm.h
@@ -73,4 +73,6 @@ int32_t sxe2_tm_init(struct rte_eth_dev *dev);
int32_t sxe2_tm_uninit(struct rte_eth_dev *dev);
+int32_t sxe2_tm_conf_reset(struct rte_eth_dev *dev);
+
#endif /* __SXE2_TM_H__ */
--
2.52.0
* [PATCH v13 07/20] net/sxe2: support IPsec inline protocol offload
From: liujie5 @ 2026-06-08 7:42 UTC (permalink / raw)
To: stephen; +Cc: dev, Jie Liu
In-Reply-To: <20260608074257.3043531-1-liujie5@linkdatatechnology.com>
From: Jie Liu <liujie5@linkdatatechnology.com>
Add IPsec inline protocol offload support for both inbound and
outbound traffic.
- Implement rte_security_ops: session_create, session_destroy.
- Add hardware SA table management.
- Update Rx/Tx data path to handle security offload flags.
The hardware offloads the ESP encapsulation/decapsulation and
cryptographic processing.
Signed-off-by: Jie Liu <liujie5@linkdatatechnology.com>
---
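Note on "hardware SA table management": the sketch below illustrates the
lowest-free-index allocation pattern that sxe2_ipsec_id_alloc() and
sxe2_ipsec_id_free() follow. It is a standalone illustration, not the
driver code: the driver uses rte_bitmap_get()/rte_bitmap_set(), while this
sketch assumes a plain uint64_t word array so that it builds without DPDK.
The names sa_id_table, sa_id_table_init, sa_id_alloc, sa_id_free,
SA_ID_INVALID and SA_TABLE_MAX are hypothetical; only the 0xFFFF
"no free slot" sentinel is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SA_ID_INVALID 0xFFFF /* same "no free slot" sentinel as the patch */
#define SA_TABLE_MAX  4096   /* hypothetical capacity for this sketch */

/* Hypothetical stand-in for the rte_bitmap-backed tables in the driver. */
struct sa_id_table {
	uint64_t words[SA_TABLE_MAX / 64];
	uint16_t bits; /* number of usable IDs, at most SA_TABLE_MAX */
};

static void sa_id_table_init(struct sa_id_table *tbl, uint16_t bits)
{
	assert(tbl != NULL && bits <= SA_TABLE_MAX);
	memset(tbl->words, 0, sizeof(tbl->words));
	tbl->bits = bits;
}

/* Return the lowest free ID and mark it used, or SA_ID_INVALID if full. */
static uint16_t sa_id_alloc(struct sa_id_table *tbl)
{
	for (uint16_t i = 0; i < tbl->bits; i++) {
		uint64_t mask = UINT64_C(1) << (i % 64);

		if (!(tbl->words[i / 64] & mask)) {
			tbl->words[i / 64] |= mask;
			return i;
		}
	}
	return SA_ID_INVALID;
}

/* Release an ID; out-of-range values are ignored. */
static void sa_id_free(struct sa_id_table *tbl, uint16_t id)
{
	if (id < tbl->bits)
		tbl->words[id / 64] &= ~(UINT64_C(1) << (id % 64));
}
```

A caller would compare the returned index against the sentinel before
programming the hardware entry, and release the index again if the
firmware command fails.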
drivers/net/sxe2/meson.build | 2 +
drivers/net/sxe2/sxe2_cmd_chnl.c | 197 ++++
drivers/net/sxe2/sxe2_cmd_chnl.h | 20 +
drivers/net/sxe2/sxe2_drv_cmd.h | 61 ++
drivers/net/sxe2/sxe2_ethdev.c | 14 +
drivers/net/sxe2/sxe2_ethdev.h | 3 +
drivers/net/sxe2/sxe2_ipsec.c | 1565 +++++++++++++++++++++++++++++
drivers/net/sxe2/sxe2_ipsec.h | 254 +++++
drivers/net/sxe2/sxe2_rx.c | 5 +
drivers/net/sxe2/sxe2_security.c | 335 ++++++
drivers/net/sxe2/sxe2_security.h | 77 ++
drivers/net/sxe2/sxe2_tx.c | 8 +
drivers/net/sxe2/sxe2_txrx_poll.c | 55 +
13 files changed, 2596 insertions(+)
create mode 100644 drivers/net/sxe2/sxe2_ipsec.c
create mode 100644 drivers/net/sxe2/sxe2_ipsec.h
create mode 100644 drivers/net/sxe2/sxe2_security.c
create mode 100644 drivers/net/sxe2/sxe2_security.h
diff --git a/drivers/net/sxe2/meson.build b/drivers/net/sxe2/meson.build
index f03ea15356..86973edc99 100644
--- a/drivers/net/sxe2/meson.build
+++ b/drivers/net/sxe2/meson.build
@@ -64,4 +64,6 @@ sources += files(
'sxe2_filter.c',
'sxe2_rss.c',
'sxe2_tm.c',
+ 'sxe2_ipsec.c',
+ 'sxe2_security.c',
)
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.c b/drivers/net/sxe2/sxe2_cmd_chnl.c
index 19323ffcc4..7711e8e57d 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.c
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.c
@@ -877,3 +877,200 @@ int32_t sxe2_drv_tm_commit(struct sxe2_adapter *adapter)
l_end:
return ret;
}
+
+
+int32_t sxe2_drv_ipsec_get_capa(struct sxe2_adapter *adapter)
+{
+ int32_t ret = -1;
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_drv_ipsec_capa_resq resp;
+ struct sxe2_common_device *cdev = adapter->cdev;
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_CAP_GET,
+ NULL, 0,
+ &resp, sizeof(resp));
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to get ipsec capabilities, ret=%d", ret);
+ goto l_end;
+ }
+
+ adapter->security_ctx.ipsec_ctx.max_tx_sa = rte_le_to_cpu_16(resp.tx_sa_cnt);
+ adapter->security_ctx.ipsec_ctx.max_rx_sa = rte_le_to_cpu_16(resp.rx_sa_cnt);
+ adapter->security_ctx.ipsec_ctx.max_tcam = rte_le_to_cpu_16(resp.ip_id_cnt);
+ adapter->security_ctx.ipsec_ctx.max_udp_group = rte_le_to_cpu_16(resp.udp_group_cnt);
+
+ PMD_DEV_LOG_INFO(adapter, DRV, "Max tx sa:%u, max rx sa:%u, max tcam:%u, udp group:%u.",
+ rte_le_to_cpu_16(resp.tx_sa_cnt),
+ rte_le_to_cpu_16(resp.rx_sa_cnt),
+ rte_le_to_cpu_16(resp.ip_id_cnt),
+ rte_le_to_cpu_16(resp.udp_group_cnt));
+
+l_end:
+ return ret;
+}
+
+int32_t sxe2_drv_ipsec_resource_clear(struct sxe2_adapter *adapter)
+{
+ int32_t ret = -1;
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_RESOURCE_CLEAR,
+ NULL, 0,
+ NULL, 0);
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to clear ipsec resource, ret=%d", ret);
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+int32_t sxe2_drv_ipsec_txsa_add(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_tx_sa *tx_sa)
+{
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_drv_ipsec_txsa_add_req req = { 0 };
+ struct sxe2_drv_ipsec_txsa_add_resp resp = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+ int32_t ret = -1;
+ uint32_t mode = 0;
+ uint32_t i = 0;
+
+ if (tx_sa->algo == SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC)
+ mode |= IPSEC_TX_ENGINE_SM4;
+ if (tx_sa->mode == SXE2_IPSEC_MODE_ENC_AND_AUTH)
+ mode |= IPSEC_TX_ENCRYPT;
+ req.mode = rte_cpu_to_le_32(mode);
+ for (i = 0; i < SXE2_IPSEC_KEY_LEN; i++) {
+ req.encrypt_keys[i] = tx_sa->enc_key[i];
+ req.auth_keys[i] = tx_sa->auth_key[i];
+ }
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_TXSA_ADD,
+ &req, sizeof(req),
+ &resp, sizeof(resp));
+
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to add tx sa, ret=%d", ret);
+ goto l_end;
+ }
+ tx_sa->hw_sa_id = rte_le_to_cpu_16(resp.index);
+
+l_end:
+ return ret;
+}
+
+int32_t sxe2_drv_ipsec_rxsa_add(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_rx_sa *rx_sa,
+ struct sxe2_ipsec_rx_tcam *rx_tcam,
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group)
+{
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_drv_ipsec_rxsa_add_req req = { 0 };
+ struct sxe2_drv_ipsec_rxsa_add_resp resp = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+ int32_t ret = -1;
+ uint32_t mode = 0;
+ uint32_t i = 0;
+
+ if (rx_sa->algo == SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC)
+ mode |= IPSEC_RX_ENGINE_SM4;
+ if (rx_sa->mode == SXE2_IPSEC_MODE_ENC_AND_AUTH)
+ mode |= IPSEC_RX_DECRYPT;
+ if (rx_tcam->ip_addr.type == RTE_SECURITY_IPSEC_TUNNEL_IPV6) {
+ mode |= IPSEC_RX_IPV6;
+ memcpy(req.ipaddr, rx_tcam->ip_addr.dst_ipv6, sizeof(req.ipaddr));
+ } else {
+ req.ipaddr[0] = rx_tcam->ip_addr.dst_ipv4;
+ }
+ req.mode = rte_cpu_to_le_32(mode);
+ req.spi = rte_cpu_to_le_32(rx_sa->spi);
+ if (rx_udp_group != NULL) {
+ req.udp_port = rte_cpu_to_le_32((uint32_t)rx_udp_group->udp_port);
+ req.sport_en = rx_udp_group->sport_en;
+ req.dport_en = rx_udp_group->dport_en;
+ }
+
+ PMD_DEV_LOG_INFO(adapter, DRV, "Add rx sa, mode: 0x%x, spi: 0x%x, udp_port: %u, "
+ "sport_en: %u, dport_en: %u.",
+ req.mode, req.spi, req.udp_port, req.sport_en, req.dport_en);
+
+ /* encrypt and auth keys */
+ for (i = 0; i < SXE2_IPSEC_KEY_LEN; i++) {
+ req.encrypt_keys[i] = rx_sa->enc_key[i];
+ req.auth_keys[i] = rx_sa->auth_key[i];
+ }
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_RXSA_ADD,
+ &req, sizeof(req),
+ &resp, sizeof(resp));
+
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret) {
+ PMD_DEV_LOG_ERR(adapter, DRV, "Failed to add rx sa, ret=%d", ret);
+ goto l_end;
+ }
+ rx_sa->hw_sa_id = rte_le_to_cpu_16(resp.sa_idx);
+ rx_sa->hw_ip_id = resp.ip_id;
+ rx_tcam->hw_ip_id = resp.ip_id;
+ rx_sa->hw_udp_group_id = resp.udp_group_id;
+ if (rx_udp_group != NULL)
+ rx_udp_group->hw_group_id = resp.udp_group_id;
+
+l_end:
+ return ret;
+}
+
+int32_t sxe2_drv_ipsec_rxsa_delete(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_rx_sa *rx_sa)
+{
+ struct sxe2_drv_ipsec_rxsa_del_req req = { 0 };
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+ int32_t ret = -1;
+
+ req.sa_idx = rte_cpu_to_le_16(rx_sa->hw_sa_id);
+ req.spi = rte_cpu_to_le_32(rx_sa->spi);
+ req.ip_id = rx_sa->hw_ip_id;
+ req.group_id = rx_sa->hw_udp_group_id;
+
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_RXSA_DEL,
+ &req, sizeof(req),
+ NULL, 0);
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret)
+ PMD_DEV_LOG_ERR(adapter, DRV,
+ "Failed to delete rx sa, sa id: %u, spi: %u, "
+ "ip id: %u, udp group id: %u, ret: %d.",
+ rx_sa->hw_sa_id, rx_sa->spi, rx_sa->hw_ip_id,
+ rx_sa->hw_udp_group_id, ret);
+
+ return ret;
+}
+
+int32_t sxe2_drv_ipsec_txsa_delete(struct sxe2_adapter *adapter,
+ uint16_t sa_id)
+{
+ struct sxe2_drv_ipsec_txsa_del_req req = { 0 };
+ struct sxe2_drv_cmd_params cmd = { 0 };
+ struct sxe2_common_device *cdev = adapter->cdev;
+ int32_t ret = -1;
+
+ req.sa_idx = rte_cpu_to_le_16(sa_id);
+ sxe2_drv_cmd_params_fill(adapter, &cmd, SXE2_DRV_CMD_IPSEC_TXSA_DEL,
+ &req, sizeof(req),
+ NULL, 0);
+ ret = sxe2_drv_cmd_exec(cdev, &cmd);
+ if (ret)
+ PMD_DEV_LOG_ERR(adapter, DRV,
+ "Failed to delete tx sa, sa id: %u, ret: %d.",
+ sa_id, ret);
+
+ return ret;
+}
+
diff --git a/drivers/net/sxe2/sxe2_cmd_chnl.h b/drivers/net/sxe2/sxe2_cmd_chnl.h
index 77e689abcd..dac487fe7d 100644
--- a/drivers/net/sxe2/sxe2_cmd_chnl.h
+++ b/drivers/net/sxe2/sxe2_cmd_chnl.h
@@ -44,6 +44,26 @@ int32_t sxe2_drv_root_tree_alloc(struct rte_eth_dev *dev);
int32_t sxe2_drv_tm_commit(struct sxe2_adapter *adapter);
+int32_t sxe2_drv_ipsec_resource_clear(struct sxe2_adapter *adapter);
+
+int32_t sxe2_drv_ipsec_get_capa(struct sxe2_adapter *adapter);
+
+int32_t sxe2_drv_ipsec_rxsa_add(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_rx_sa *rx_sa,
+ struct sxe2_ipsec_rx_tcam *rx_tcam,
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group);
+
+int32_t sxe2_drv_ipsec_txsa_add(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_tx_sa *tx_sa);
+
+int32_t sxe2_drv_ipsec_rxsa_delete(struct sxe2_adapter *adapter,
+ struct sxe2_ipsec_rx_sa *rx_sa);
+
+int32_t sxe2_drv_ipsec_txsa_delete(struct sxe2_adapter *adapter,
+ uint16_t sa_id);
+
+int32_t sxe2_drv_promisc_config(struct sxe2_adapter *adapter, bool set);
+
int32_t sxe2_drv_allmulti_config(struct sxe2_adapter *adapter, bool set);
int32_t sxe2_drv_uc_config(struct sxe2_adapter *adapter, struct rte_ether_addr *addr, bool add);
diff --git a/drivers/net/sxe2/sxe2_drv_cmd.h b/drivers/net/sxe2/sxe2_drv_cmd.h
index 67c6885cae..39a108d76a 100644
--- a/drivers/net/sxe2/sxe2_drv_cmd.h
+++ b/drivers/net/sxe2/sxe2_drv_cmd.h
@@ -375,6 +375,67 @@ struct __rte_aligned(4) __rte_packed_begin sxe2_tm_add_queue_msg {
struct sxe2_tm_info info;
} __rte_packed_end;
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_capa_resq {
+ uint16_t tx_sa_cnt;
+ uint16_t rx_sa_cnt;
+ uint16_t ip_id_cnt;
+ uint16_t udp_group_cnt;
+} __rte_packed_end;
+
+#define SXE2_IPSEC_KEY_LEN (32)
+#define SXE2_IPV6_ADDR_LEN (4)
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_txsa_add_req {
+ uint32_t mode;
+ uint8_t encrypt_keys[SXE2_IPSEC_KEY_LEN];
+ uint8_t auth_keys[SXE2_IPSEC_KEY_LEN];
+ bool func_type;
+ uint8_t func_id;
+ uint8_t drv_id;
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_txsa_add_resp {
+ uint16_t index;
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_rxsa_add_req {
+ uint32_t mode;
+ uint32_t spi;
+ uint32_t ipaddr[SXE2_IPV6_ADDR_LEN];
+ uint32_t udp_port;
+ uint8_t sport_en;
+ uint8_t dport_en;
+ uint8_t is_over_sdn;
+ uint8_t sdn_group_id;
+ uint8_t encrypt_keys[SXE2_IPSEC_KEY_LEN];
+ uint8_t auth_keys[SXE2_IPSEC_KEY_LEN];
+ bool func_type;
+ uint8_t func_id;
+ uint8_t drv_id;
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_rxsa_add_resp {
+ uint8_t ip_id;
+ uint8_t udp_group_id;
+ uint16_t sa_idx;
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_txsa_del_req {
+ uint16_t sa_idx;
+ bool func_type;
+ uint8_t func_id;
+ uint8_t drv_id;
+} __rte_packed_end;
+
+struct __rte_aligned(4) __rte_packed_begin sxe2_drv_ipsec_rxsa_del_req {
+ uint8_t ip_id;
+ uint8_t group_id;
+ uint16_t sa_idx;
+ uint32_t spi;
+ bool func_type;
+ uint8_t func_id;
+ uint8_t drv_id;
+} __rte_packed_end;
+
enum sxe2_drv_cmd_module {
SXE2_DRV_CMD_MODULE_HANDSHAKE = 0,
SXE2_DRV_CMD_MODULE_DEV = 1,
diff --git a/drivers/net/sxe2/sxe2_ethdev.c b/drivers/net/sxe2/sxe2_ethdev.c
index a095888c00..00c0552d4a 100644
--- a/drivers/net/sxe2/sxe2_ethdev.c
+++ b/drivers/net/sxe2/sxe2_ethdev.c
@@ -298,6 +298,11 @@ static int32_t sxe2_dev_infos_get(struct rte_eth_dev *dev,
if (adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_PTP)
dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_TIMESTAMP;
+ if (sxe2_ipsec_supported(adapter)) {
+ dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_SECURITY;
+ dev_info->tx_offload_capa |= RTE_ETH_TX_OFFLOAD_SECURITY;
+ }
+
if (adapter->cap_flags & SXE2_DEV_CAPS_OFFLOAD_RSS) {
dev_info->rx_offload_capa |= RTE_ETH_RX_OFFLOAD_RSS_HASH;
dev_info->flow_type_rss_offloads |= SXE2_RSS_HF_SUPPORT_ALL;
@@ -1053,6 +1058,12 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
goto init_eth_err;
}
+ ret = sxe2_security_init(dev);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to initialize security, ret=%d", ret);
+ goto init_security_err;
+ }
+
ret = sxe2_rss_disable(dev);
if (ret) {
PMD_LOG_ERR(INIT, "Failed to disable rss, ret=%d", ret);
@@ -1067,6 +1078,8 @@ static int32_t sxe2_dev_init(struct rte_eth_dev *dev,
goto l_end;
+init_security_err:
+ sxe2_eth_uinit(dev);
init_sched_err:
init_rss_err:
init_eth_err:
@@ -1085,6 +1098,7 @@ static int32_t sxe2_dev_close(struct rte_eth_dev *dev)
(void)sxe2_rss_disable(dev);
(void)sxe2_sched_uinit(dev);
sxe2_vsi_uninit(dev);
+ sxe2_security_uinit(dev);
sxe2_dev_pci_map_uinit(dev);
sxe2_eth_uinit(dev);
diff --git a/drivers/net/sxe2/sxe2_ethdev.h b/drivers/net/sxe2/sxe2_ethdev.h
index 76e4cc8b33..f226d6d5f9 100644
--- a/drivers/net/sxe2/sxe2_ethdev.h
+++ b/drivers/net/sxe2/sxe2_ethdev.h
@@ -20,6 +20,8 @@
#include "sxe2_queue.h"
#include "sxe2_mac.h"
#include "sxe2_osal.h"
+#include "sxe2_security.h"
+#include "sxe2_ipsec.h"
#include "sxe2_tm.h"
#include "sxe2_filter.h"
@@ -313,6 +315,7 @@ struct sxe2_adapter {
struct sxe2_sched_hw_cap sched_ctxt;
struct sxe2_tm_context tm_ctxt;
struct sxe2_devargs devargs;
+ struct sxe2_security_ctx security_ctx;
struct sxe2_switchdev_info switchdev_info;
bool rule_started;
bool flow_isolated;
diff --git a/drivers/net/sxe2/sxe2_ipsec.c b/drivers/net/sxe2/sxe2_ipsec.c
new file mode 100644
index 0000000000..e783a51b85
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_ipsec.c
@@ -0,0 +1,1565 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#include <rte_malloc.h>
+#include <rte_bitmap.h>
+
+#include "sxe2_ethdev.h"
+#include "sxe2_security.h"
+#include "sxe2_ipsec.h"
+#include "sxe2_cmd_chnl.h"
+#include "sxe2_common_log.h"
+
+bool sxe2_ipsec_supported(struct sxe2_adapter *adapter)
+{
+ uint64_t cap = adapter->cap_flags;
+
+ return !!(cap & SXE2_DEV_CAPS_OFFLOAD_IPSEC);
+}
+
+bool sxe2_ipsec_valid_tx_offloads(uint64_t offloads)
+{
+ bool ret = true;
+ uint64_t tso_features = 0;
+ uint64_t cksum_features = 0;
+
+ if (offloads & RTE_ETH_TX_OFFLOAD_SECURITY) {
+ tso_features = RTE_ETH_TX_OFFLOAD_TCP_TSO |
+ RTE_ETH_TX_OFFLOAD_UDP_TSO |
+ RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO |
+ RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO |
+ RTE_ETH_TX_OFFLOAD_IPIP_TNL_TSO |
+ RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO;
+ if (offloads & tso_features) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with TSO offload.");
+ ret = false;
+ goto l_end;
+ }
+
+ cksum_features = RTE_ETH_TX_OFFLOAD_IPV4_CKSUM |
+ RTE_ETH_TX_OFFLOAD_UDP_CKSUM |
+ RTE_ETH_TX_OFFLOAD_TCP_CKSUM |
+ RTE_ETH_TX_OFFLOAD_SCTP_CKSUM |
+ RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM |
+ RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM;
+ if (offloads & cksum_features) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with checksum offload.");
+ ret = false;
+ goto l_end;
+ }
+
+ if (offloads & (RTE_ETH_TX_OFFLOAD_VLAN_INSERT | RTE_ETH_TX_OFFLOAD_QINQ_INSERT)) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with vlan offload.");
+ ret = false;
+ goto l_end;
+ }
+ }
+
+l_end:
+ return ret;
+}
+
+bool sxe2_ipsec_valid_rx_offloads(uint64_t offloads)
+{
+ bool ret = true;
+
+ if (offloads & RTE_ETH_RX_OFFLOAD_SECURITY) {
+ if (offloads & RTE_ETH_RX_OFFLOAD_TCP_LRO) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with LRO offload.");
+ ret = false;
+ goto l_end;
+ }
+
+ if (offloads & RTE_ETH_RX_OFFLOAD_CHECKSUM) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with checksum offload.");
+ ret = false;
+ goto l_end;
+ }
+
+ if (offloads & RTE_ETH_RX_OFFLOAD_KEEP_CRC) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with keep CRC offload.");
+ ret = false;
+ goto l_end;
+ }
+
+ if (offloads & RTE_ETH_RX_OFFLOAD_VLAN) {
+ PMD_LOG_ERR(DRV, "Security offload is not compatible with vlan offload.");
+ ret = false;
+ goto l_end;
+ }
+ }
+
+l_end:
+ return ret;
+}
+
+static int32_t sxe2_ipsec_bitmap_mem_init(struct rte_bitmap **d_bmp, void **d_mem, uint32_t bits)
+{
+ struct rte_bitmap *bmp = NULL;
+ uint32_t bmp_size = 0;
+ void *mem = NULL;
+ int32_t ret = -1;
+
+ bmp_size = rte_bitmap_get_memory_footprint(bits);
+
+ mem = rte_zmalloc("ipsec bitmap", bmp_size, RTE_CACHE_LINE_SIZE);
+ if (mem == NULL) {
+ PMD_LOG_ERR(DRV, "Alloc ipsec bitmap memory failed.");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ bmp = rte_bitmap_init(bits, mem, bmp_size);
+ if (bmp == NULL) {
+ PMD_LOG_ERR(DRV, "Failed to init ipsec bitmap.");
+ rte_free(mem);
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ *d_bmp = bmp;
+ *d_mem = mem;
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t sxe2_ipsec_bitmap_init(struct sxe2_security_ctx *sxe2_sctx)
+{
+ int32_t ret = -1;
+
+ ret = sxe2_ipsec_bitmap_mem_init(&sxe2_sctx->ipsec_ctx.bmp.tx_sa_bmp,
+ &sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem, sxe2_sctx->ipsec_ctx.max_tx_sa);
+ if (ret)
+ goto l_end;
+
+ ret = sxe2_ipsec_bitmap_mem_init(&sxe2_sctx->ipsec_ctx.bmp.rx_sa_bmp,
+ &sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem, sxe2_sctx->ipsec_ctx.max_rx_sa);
+ if (ret) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem = NULL;
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_bitmap_mem_init(&sxe2_sctx->ipsec_ctx.bmp.rx_tcam_bmp,
+ &sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem, sxe2_sctx->ipsec_ctx.max_tcam);
+ if (ret) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem);
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem = NULL;
+ sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem = NULL;
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_bitmap_mem_init(&sxe2_sctx->ipsec_ctx.bmp.rx_udp_bmp,
+ &sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem, sxe2_sctx->ipsec_ctx.max_udp_group);
+ if (ret) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem);
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem);
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem);
+ sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem = NULL;
+ sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem = NULL;
+ sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem = NULL;
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+static uint16_t sxe2_ipsec_id_alloc(struct rte_bitmap *bmp, uint16_t bits)
+{
+ uint16_t i = 0;
+ uint16_t index = 0xFFFF;
+
+ for (i = 0; i < bits; i++) {
+ if (!rte_bitmap_get(bmp, i)) {
+ index = i;
+ rte_bitmap_set(bmp, i);
+ break;
+ }
+ }
+
+ return index;
+}
+
+static void sxe2_ipsec_id_free(struct rte_bitmap *bmp, uint16_t pos)
+{
+ rte_bitmap_clear(bmp, pos);
+}
+
+static struct rte_cryptodev_symmetric_capability *
+sxe2_ipsec_cipher_cap_get(struct rte_cryptodev_capabilities *crypto_cap,
+ enum rte_crypto_cipher_algorithm algo)
+{
+ struct rte_cryptodev_symmetric_capability *capability = NULL;
+ uint8_t index = 0;
+
+ for (index = 0; index < SXE2_IPSEC_CAP_MAX; index++) {
+ if (crypto_cap[index].sym.xform_type == RTE_CRYPTO_SYM_XFORM_CIPHER &&
+ crypto_cap[index].sym.cipher.algo == algo) {
+ capability = &crypto_cap[index].sym;
+ goto l_end;
+ }
+ }
+
+l_end:
+ return capability;
+}
+
+static struct rte_cryptodev_symmetric_capability *
+sxe2_ipsec_auth_cap_get(struct rte_cryptodev_capabilities *crypto_cap,
+ enum rte_crypto_auth_algorithm algo)
+{
+ struct rte_cryptodev_symmetric_capability *capability = NULL;
+ uint8_t index = 0;
+
+ for (index = 0; index < SXE2_IPSEC_CAP_MAX; index++) {
+ if (crypto_cap[index].sym.xform_type == RTE_CRYPTO_SYM_XFORM_AUTH &&
+ crypto_cap[index].sym.auth.algo == algo) {
+ capability = &crypto_cap[index].sym;
+ goto l_end;
+ }
+ }
+
+l_end:
+ return capability;
+}
+
+static bool sxe2_security_valid_key(uint16_t src_key, uint16_t max_key,
+ uint16_t min_key, uint16_t increment)
+{
+ bool is_valid = false;
+
+ if (src_key > SXE2_IPSEC_MAX_KEY_LEN) {
+ is_valid = false;
+ goto l_end;
+ }
+
+ if (src_key < min_key || src_key > max_key) {
+ is_valid = false;
+ goto l_end;
+ }
+
+ if (increment == 0) {
+ is_valid = true;
+ goto l_end;
+ }
+
+ if ((uint16_t)(src_key - min_key) % increment) {
+ is_valid = false;
+ goto l_end;
+ }
+
+ is_valid = true;
+
+l_end:
+ return is_valid;
+}
+
+static int32_t
+sxe2_ipsec_valid_cipher(enum rte_crypto_cipher_operation cipher_op,
+ struct rte_cryptodev_capabilities *crypto_cap,
+ struct rte_crypto_sym_xform *xform)
+{
+ const struct rte_cryptodev_symmetric_capability *capability = NULL;
+ uint16_t src_key = 0;
+ uint16_t max_key = 0;
+ uint16_t min_key = 0;
+ uint16_t increment = 0;
+ int32_t ret = -1;
+
+ if (xform->cipher.op != cipher_op) {
+ PMD_LOG_ERR(DRV, "Invalid cipher direction specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ capability = sxe2_ipsec_cipher_cap_get(crypto_cap, xform->cipher.algo);
+ if (!capability) {
+ PMD_LOG_ERR(DRV, "Invalid cipher algo specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ src_key = xform->cipher.key.length;
+ min_key = capability->cipher.key_size.min;
+ max_key = capability->cipher.key_size.max;
+ increment = capability->cipher.key_size.increment;
+ if (!sxe2_security_valid_key(src_key, max_key, min_key, increment)) {
+ PMD_LOG_ERR(DRV, "Invalid cipher key size specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_valid_auth(enum rte_crypto_auth_operation auth_op,
+ struct rte_cryptodev_capabilities *crypto_cap,
+ struct rte_crypto_sym_xform *xform)
+{
+ const struct rte_cryptodev_symmetric_capability *capability = NULL;
+ uint16_t src_key = 0;
+ uint16_t max_key = 0;
+ uint16_t min_key = 0;
+ uint16_t increment = 0;
+ int32_t ret = -1;
+
+ if (xform->auth.op != auth_op) {
+ PMD_LOG_ERR(DRV, "Invalid auth direction specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ capability = sxe2_ipsec_auth_cap_get(crypto_cap, xform->auth.algo);
+ if (!capability) {
+ PMD_LOG_ERR(DRV, "Invalid auth algo specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ src_key = xform->auth.key.length;
+ min_key = capability->auth.key_size.min;
+ max_key = capability->auth.key_size.max;
+ increment = capability->auth.key_size.increment;
+ if (!sxe2_security_valid_key(src_key, max_key, min_key, increment)) {
+ PMD_LOG_ERR(DRV, "Invalid auth key size specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static bool
+sxe2_ipsec_valid_algo(enum rte_crypto_auth_algorithm auth_algo,
+ enum rte_crypto_cipher_algorithm cipher_algo)
+{
+ bool ret = false;
+
+ if ((cipher_algo == SXE2_RTE_CRYPTO_CIPHER_AES_CBC &&
+ auth_algo == SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC) ||
+ (cipher_algo == SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC &&
+ auth_algo == SXE2_RTE_CRYPTO_AUTH_SM3_HMAC)) {
+ ret = true;
+ goto l_end;
+ }
+
+l_end:
+ return ret;
+}
+
+static enum sxe2_ipsec_algorithm
+sxe2_ipsec_algo_gen(enum rte_crypto_cipher_algorithm cipher_algo)
+{
+ enum sxe2_ipsec_algorithm algo = SXE2_IPSEC_ALGO_INVALID;
+
+ if (cipher_algo == SXE2_RTE_CRYPTO_CIPHER_AES_CBC)
+ algo = SXE2_IPSEC_ALGO_AES_CBC_AND_SHA256_128_HMAC;
+ else if (cipher_algo == SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC)
+ algo = SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC;
+
+ return algo;
+}
+
+static int32_t
+sxe2_ipsec_valid_xform(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf)
+{
+ struct rte_crypto_sym_xform *xform = NULL;
+ struct rte_cryptodev_capabilities *crypto_cap =
+ sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC].crypto_capabilities;
+ enum rte_crypto_auth_algorithm auth_algo = RTE_CRYPTO_AUTH_NULL;
+ enum rte_crypto_cipher_algorithm cipher_algo = RTE_CRYPTO_CIPHER_NULL;
+ int32_t ret = -1;
+
+ if (conf->ipsec.direction == RTE_SECURITY_IPSEC_SA_DIR_EGRESS &&
+ conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+ xform = conf->crypto_xform;
+ cipher_algo = xform->cipher.algo;
+ ret = sxe2_ipsec_valid_cipher(RTE_CRYPTO_CIPHER_OP_ENCRYPT,
+ crypto_cap, xform);
+ if (ret)
+ goto l_end;
+
+ if (conf->crypto_xform->next) {
+ if (conf->crypto_xform->next->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
+ auth_algo = conf->crypto_xform->next->auth.algo;
+ if (!sxe2_ipsec_valid_algo(auth_algo, cipher_algo)) {
+ PMD_LOG_ERR(DRV, "Invalid algo group.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+ xform = conf->crypto_xform->next;
+ ret = sxe2_ipsec_valid_auth(RTE_CRYPTO_AUTH_OP_GENERATE,
+ crypto_cap, xform);
+ if (ret)
+ goto l_end;
+ } else {
+ PMD_LOG_ERR(DRV, "Egress cipher xform may only be followed by an auth xform.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+ }
+ } else if (conf->ipsec.direction == RTE_SECURITY_IPSEC_SA_DIR_INGRESS &&
+ conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+ xform = conf->crypto_xform;
+ ret = sxe2_ipsec_valid_cipher(RTE_CRYPTO_CIPHER_OP_DECRYPT,
+ crypto_cap, xform);
+ if (ret)
+ goto l_end;
+
+ } else if (conf->ipsec.direction == RTE_SECURITY_IPSEC_SA_DIR_INGRESS &&
+ conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_AUTH) {
+ xform = conf->crypto_xform;
+ ret = sxe2_ipsec_valid_auth(RTE_CRYPTO_AUTH_OP_VERIFY, crypto_cap, xform);
+ if (ret)
+ goto l_end;
+
+ if (conf->crypto_xform->next &&
+ conf->crypto_xform->next->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+ auth_algo = conf->crypto_xform->auth.algo;
+ cipher_algo = conf->crypto_xform->next->cipher.algo;
+ if (!sxe2_ipsec_valid_algo(auth_algo, cipher_algo)) {
+ PMD_LOG_ERR(DRV, "Invalid algo group.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+ xform = conf->crypto_xform->next;
+ ret = sxe2_ipsec_valid_cipher(RTE_CRYPTO_CIPHER_OP_DECRYPT,
+ crypto_cap, xform);
+ if (ret)
+ goto l_end;
+ } else {
+ PMD_LOG_ERR(DRV, "Ingress auth-only xform (verify without decrypt) is not supported.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+ } else {
+ PMD_LOG_ERR(DRV, "Encrypt/decrypt xform invalid.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_valid_udp(struct rte_security_session_conf *conf)
+{
+ int32_t ret = -1;
+ uint16_t sport = conf->ipsec.udp.sport;
+ uint16_t dport = conf->ipsec.udp.dport;
+
+ if (conf->ipsec.options.udp_encap == 0) {
+ ret = 0;
+ goto l_end;
+ }
+
+ if (sport == 0 && dport == 0) {
+ PMD_LOG_ERR(DRV, "Invalid udp port, sport and dport cannot both be zero.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (sport != 0 && dport != 0 && sport != dport) {
+ PMD_LOG_ERR(DRV, "Invalid udp port, non-zero sport and dport must be equal.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_session_conf_valid(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf)
+{
+ int32_t ret = -1;
+
+ if (sxe2_sctx == NULL) {
+ PMD_LOG_ERR(DRV, "Invalid security ctx.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->action_type !=
+ sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC].action) {
+ PMD_LOG_ERR(DRV, "Invalid action specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->ipsec.mode !=
+ sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC].ipsec.mode) {
+ PMD_LOG_ERR(DRV, "Invalid IPsec mode specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->ipsec.proto !=
+ sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC].ipsec.proto) {
+ PMD_LOG_ERR(DRV, "Invalid IPsec protocol specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->ipsec.options.esn) {
+ PMD_LOG_ERR(DRV, "ESN is not supported.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->ipsec.direction == RTE_SECURITY_IPSEC_SA_DIR_INGRESS &&
+ conf->ipsec.spi == 0) {
+ PMD_LOG_ERR(DRV, "spi cannot be zero.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ if (conf->crypto_xform == NULL) {
+ PMD_LOG_ERR(DRV, "Invalid ipsec xform specified");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_valid_udp(conf);
+ if (ret)
+ goto l_end;
+
+ ret = sxe2_ipsec_valid_xform(sxe2_sctx, conf);
+ if (ret)
+ goto l_end;
+
+l_end:
+ return ret;
+}
+
+static void
+sxe2_ipsec_session_save(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess, uint16_t sa_id, uint16_t index)
+{
+ enum rte_crypto_cipher_algorithm cipher_algo = RTE_CRYPTO_CIPHER_NULL;
+
+ sxe2_sess->adapter = sxe2_sctx->adapter;
+ sxe2_sess->direction = conf->ipsec.direction;
+ sxe2_sess->protocol = conf->protocol;
+ sxe2_sess->mode = conf->ipsec.mode;
+ sxe2_sess->sa_proto = conf->ipsec.proto;
+ sxe2_sess->sa.spi = conf->ipsec.spi;
+ sxe2_sess->sa.hw_idx = sa_id;
+ sxe2_sess->sa.sw_idx = index;
+
+ if (conf->ipsec.options.esn) {
+ sxe2_sess->esn.enabled = true;
+ sxe2_sess->esn.value = conf->ipsec.esn.value;
+ }
+
+ if (sxe2_sess->mode == RTE_SECURITY_IPSEC_SA_MODE_TUNNEL)
+ sxe2_sess->type = conf->ipsec.tunnel.type;
+
+ if (conf->ipsec.options.udp_encap) {
+ sxe2_sess->udp_cap.enabled = true;
+ memcpy(&sxe2_sess->udp_cap.value, &conf->ipsec.udp,
+ sizeof(struct rte_security_ipsec_udp_param));
+ }
+
+ sxe2_sess->pkt_metadata_template.sa_idx = sa_id;
+ sxe2_sess->pkt_metadata_template.ol_flags |= SXE2_IPSEC_OL_FLAGS_IS_TUN;
+ sxe2_sess->pkt_metadata_template.ol_flags |= SXE2_IPSEC_OL_FLAGS_IS_ESP;
+
+ if (conf->ipsec.direction == RTE_SECURITY_IPSEC_SA_DIR_EGRESS &&
+ conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+ cipher_algo = conf->crypto_xform->cipher.algo;
+ sxe2_sess->pkt_metadata_template.algo = sxe2_ipsec_algo_gen(cipher_algo);
+ if (conf->crypto_xform->next)
+ sxe2_sess->pkt_metadata_template.mode = SXE2_IPSEC_MODE_ENC_AND_AUTH;
+ else
+ sxe2_sess->pkt_metadata_template.mode = SXE2_IPSEC_MODE_ONLY_ENCRYPT;
+ }
+
+ PMD_LOG_INFO(DRV,
+ "Save security info to session ctx, said:%u, spi:%u, mode:%u, algo:%u",
+ sa_id, sxe2_sess->sa.spi,
+ sxe2_sess->pkt_metadata_template.mode,
+ sxe2_sess->pkt_metadata_template.algo);
+}
+
+static void
+sxe2_ipsec_tx_sa_fill(struct sxe2_ipsec_tx_sa *tx_sa,
+ struct rte_security_session_conf *conf)
+{
+ uint8_t *dst = NULL;
+ uint8_t len = 0;
+
+ memcpy(&tx_sa->xform, &conf->ipsec, sizeof(struct rte_security_ipsec_xform));
+
+ if (conf->crypto_xform->next)
+ tx_sa->mode = SXE2_IPSEC_MODE_ENC_AND_AUTH;
+ else
+ tx_sa->mode = SXE2_IPSEC_MODE_ONLY_ENCRYPT;
+
+ if (conf->crypto_xform->cipher.algo == SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC)
+ tx_sa->algo = SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC;
+ else
+ tx_sa->algo = SXE2_IPSEC_ALGO_AES_CBC_AND_SHA256_128_HMAC;
+
+ dst = tx_sa->enc_key;
+ len = conf->crypto_xform->cipher.key.length;
+ memcpy(dst, conf->crypto_xform->cipher.key.data, len);
+
+ if (conf->crypto_xform->next) {
+ dst = tx_sa->auth_key;
+ len = conf->crypto_xform->next->auth.key.length;
+ memcpy(dst, conf->crypto_xform->next->auth.key.data, len);
+ }
+}
+
+static int32_t
+sxe2_ipsec_tx_sa_add(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess)
+{
+ struct sxe2_ipsec_tx_sa *tx_sa = NULL;
+ struct rte_bitmap *bmp = sxe2_sctx->ipsec_ctx.bmp.tx_sa_bmp;
+ uint16_t bits = sxe2_sctx->ipsec_ctx.max_tx_sa;
+ uint16_t index = 0xFFFF;
+ int32_t ret = -1;
+
+ rte_spinlock_lock(&sxe2_sctx->security_lock);
+ index = sxe2_ipsec_id_alloc(bmp, bits);
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+ if (index == 0xFFFF) {
+ PMD_LOG_ERR(DRV, "Failed to allocate ipsec tx sa index.");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ tx_sa = &sxe2_sctx->ipsec_ctx.tx_sa[index];
+
+ sxe2_ipsec_tx_sa_fill(tx_sa, conf);
+
+ ret = sxe2_drv_ipsec_txsa_add(sxe2_sctx->adapter, tx_sa);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to add tx sa.");
+ ret = -EIO;
+ rte_spinlock_lock(&sxe2_sctx->security_lock);
+ sxe2_ipsec_id_free(bmp, index);
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+ goto l_end;
+ }
+
+ sxe2_ipsec_session_save(sxe2_sctx, conf, sxe2_sess, tx_sa->hw_sa_id, tx_sa->id);
+
+ PMD_LOG_INFO(DRV, "Add tx sa success, tx sa id: %u, index: %u.",
+ tx_sa->hw_sa_id, tx_sa->id);
+
+l_end:
+ return ret;
+}
+
+static uint16_t
+sxe2_ipsec_tcam_id_find(struct sxe2_ipsec_rx_tcam *rx_tcam,
+ struct rte_security_ipsec_tunnel_param tunnel, uint16_t len)
+{
+ struct sxe2_ipsec_rx_tcam *per = NULL;
+ uint16_t tcam_id = 0xFFFF;
+ uint16_t i = 0;
+
+ for (i = 0; i < len; i++) {
+ per = &rx_tcam[i];
+ if (per->ip_addr.type == tunnel.type) {
+ if (tunnel.type == RTE_SECURITY_IPSEC_TUNNEL_IPV4 &&
+ per->ip_addr.dst_ipv4 == (uint32_t)tunnel.ipv4.dst_ip.s_addr) {
+ tcam_id = i;
+ goto l_end;
+ }
+ if (tunnel.type == RTE_SECURITY_IPSEC_TUNNEL_IPV6) {
+ if (!memcmp(&tunnel.ipv6.dst_addr, &per->ip_addr.dst_ipv6,
+ sizeof(per->ip_addr.dst_ipv6))) {
+ tcam_id = i;
+ goto l_end;
+ }
+ }
+ }
+ }
+
+l_end:
+ return tcam_id;
+}
+
+static uint16_t
+sxe2_ipsec_group_id_find(struct sxe2_ipsec_rx_udp_group *rx_udp_group,
+ uint16_t udp_port, uint8_t sport_en, uint8_t dport_en, uint16_t len)
+{
+ struct sxe2_ipsec_rx_udp_group *per = NULL;
+ uint16_t group_id = 0xFFFF;
+ uint16_t i;
+
+ for (i = 0; i < len; i++) {
+ per = &rx_udp_group[i];
+ if (per->udp_port == udp_port && per->sport_en == sport_en &&
+ per->dport_en == dport_en) {
+ group_id = i;
+ goto l_end;
+ }
+ }
+
+l_end:
+ return group_id;
+}
+
+static void
+sxe2_ipsec_rx_sa_fill(struct sxe2_ipsec_rx_sa *rx_sa,
+ struct rte_security_session_conf *conf)
+{
+ uint8_t *dst = NULL;
+ uint8_t len = 0;
+
+ memcpy(&rx_sa->xform, &conf->ipsec, sizeof(struct rte_security_ipsec_xform));
+
+ if (conf->crypto_xform->next)
+ rx_sa->mode = SXE2_IPSEC_MODE_ENC_AND_AUTH;
+ else
+ rx_sa->mode = SXE2_IPSEC_MODE_ONLY_ENCRYPT;
+
+ if (conf->crypto_xform->type == RTE_CRYPTO_SYM_XFORM_CIPHER) {
+ if (conf->crypto_xform->cipher.algo == SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC)
+ rx_sa->algo = SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC;
+ else
+ rx_sa->algo = SXE2_IPSEC_ALGO_AES_CBC_AND_SHA256_128_HMAC;
+ } else {
+ if (conf->crypto_xform->auth.algo == SXE2_RTE_CRYPTO_AUTH_SM3_HMAC)
+ rx_sa->algo = SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC;
+ else
+ rx_sa->algo = SXE2_IPSEC_ALGO_AES_CBC_AND_SHA256_128_HMAC;
+ }
+
+ if (conf->crypto_xform->next) {
+ dst = rx_sa->auth_key;
+ len = conf->crypto_xform->auth.key.length;
+ memcpy(dst, conf->crypto_xform->auth.key.data, len);
+
+ dst = rx_sa->enc_key;
+ len = conf->crypto_xform->next->cipher.key.length;
+ memcpy(dst, conf->crypto_xform->next->cipher.key.data, len);
+ } else {
+ dst = rx_sa->enc_key;
+ len = conf->crypto_xform->cipher.key.length;
+ memcpy(dst, conf->crypto_xform->cipher.key.data, len);
+ }
+
+ rx_sa->spi = conf->ipsec.spi;
+}
+
+static int32_t
+sxe2_ipsec_rx_tcam_fill(struct sxe2_security_ctx *sxe2_sctx, uint16_t *tcam_id,
+ struct rte_security_session_conf *conf)
+{
+ int32_t ret = -1;
+ uint16_t len = sxe2_sctx->ipsec_ctx.max_tcam;
+ struct sxe2_ipsec_rx_tcam *rx_tcam = NULL;
+
+ *tcam_id = sxe2_ipsec_tcam_id_find(sxe2_sctx->ipsec_ctx.rx_tcam,
+ conf->ipsec.tunnel, len);
+ if (*tcam_id == 0xFFFF) {
+ *tcam_id = sxe2_ipsec_id_alloc(sxe2_sctx->ipsec_ctx.bmp.rx_tcam_bmp, len);
+ if (*tcam_id == 0xFFFF) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ rx_tcam = &sxe2_sctx->ipsec_ctx.rx_tcam[*tcam_id];
+
+ rx_tcam->ip_addr.type = conf->ipsec.tunnel.type;
+ if (rx_tcam->ip_addr.type == RTE_SECURITY_IPSEC_TUNNEL_IPV4) {
+ rx_tcam->ip_addr.dst_ipv4 = (uint32_t)conf->ipsec.tunnel.ipv4.dst_ip.s_addr;
+ } else {
+ memcpy(&rx_tcam->ip_addr.dst_ipv6, &conf->ipsec.tunnel.ipv6.dst_addr,
+ sizeof(rx_tcam->ip_addr.dst_ipv6));
+ }
+ } else {
+ rx_tcam = &sxe2_sctx->ipsec_ctx.rx_tcam[*tcam_id];
+ }
+ rx_tcam->ref_cnt++;
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_rx_udp_group_fill(struct sxe2_security_ctx *sxe2_sctx, uint16_t *udp_group_id,
+ struct rte_security_session_conf *conf)
+{
+ int32_t ret = -1;
+ uint16_t len = sxe2_sctx->ipsec_ctx.max_udp_group;
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group = NULL;
+ uint8_t sport_en = 0;
+ uint8_t dport_en = 0;
+ uint16_t udp_port = 0;
+
+ if (!conf->ipsec.options.udp_encap) {
+ ret = 0;
+ goto l_end;
+ }
+
+ if (conf->ipsec.udp.sport) {
+ sport_en = 1;
+ udp_port = conf->ipsec.udp.sport;
+ } else {
+ sport_en = 0;
+ }
+ if (conf->ipsec.udp.dport) {
+ dport_en = 1;
+ udp_port = conf->ipsec.udp.dport;
+ } else {
+ dport_en = 0;
+ }
+
+ *udp_group_id = sxe2_ipsec_group_id_find(sxe2_sctx->ipsec_ctx.rx_udp_group,
+ udp_port, sport_en, dport_en, len);
+ if (*udp_group_id == 0xFFFF) {
+ *udp_group_id = sxe2_ipsec_id_alloc(sxe2_sctx->ipsec_ctx.bmp.rx_udp_bmp, len);
+ if (*udp_group_id == 0xFFFF) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ rx_udp_group = &sxe2_sctx->ipsec_ctx.rx_udp_group[*udp_group_id];
+ rx_udp_group->sport_en = sport_en;
+ rx_udp_group->dport_en = dport_en;
+ rx_udp_group->udp_port = udp_port;
+ } else {
+ rx_udp_group = &sxe2_sctx->ipsec_ctx.rx_udp_group[*udp_group_id];
+ }
+ rx_udp_group->ref_cnt++;
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_rx_sa_add(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess)
+{
+ struct sxe2_ipsec_rx_tcam *rx_tcam = NULL;
+ struct sxe2_ipsec_rx_sa *rx_sa = NULL;
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group = NULL;
+ struct rte_bitmap *rx_sa_bmp = sxe2_sctx->ipsec_ctx.bmp.rx_sa_bmp;
+ struct rte_bitmap *rx_tcam_bmp = sxe2_sctx->ipsec_ctx.bmp.rx_tcam_bmp;
+ uint16_t sa_bits = sxe2_sctx->ipsec_ctx.max_rx_sa;
+ uint16_t sa_id = 0xFFFF;
+ uint16_t tcam_id = 0xFFFF;
+ uint16_t udp_group_id = 0xFFFF;
+ int32_t ret = -1;
+
+ rte_spinlock_lock(&sxe2_sctx->security_lock);
+ sa_id = sxe2_ipsec_id_alloc(rx_sa_bmp, sa_bits);
+ if (sa_id == 0xFFFF) {
+ PMD_LOG_ERR(DRV, "Failed to allocate ipsec rx sa index.");
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ rx_sa = &sxe2_sctx->ipsec_ctx.rx_sa[sa_id];
+ sxe2_ipsec_rx_sa_fill(rx_sa, conf);
+
+ ret = sxe2_ipsec_rx_tcam_fill(sxe2_sctx, &tcam_id, conf);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to allocate ipsec rx tcam index.");
+ sxe2_ipsec_id_free(rx_sa_bmp, sa_id);
+ goto l_end;
+ }
+ rx_sa->tcam_id = tcam_id;
+ rx_tcam = &sxe2_sctx->ipsec_ctx.rx_tcam[tcam_id];
+
+ ret = sxe2_ipsec_rx_udp_group_fill(sxe2_sctx, &udp_group_id, conf);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to allocate ipsec rx udp group index.");
+ sxe2_ipsec_id_free(rx_sa_bmp, sa_id);
+ if (--rx_tcam->ref_cnt == 0)
+ sxe2_ipsec_id_free(rx_tcam_bmp, tcam_id);
+ goto l_end;
+ }
+
+ if (udp_group_id != 0xFFFF) {
+ rx_sa->udp_group_id = (uint8_t)udp_group_id;
+ rx_udp_group = &sxe2_sctx->ipsec_ctx.rx_udp_group[udp_group_id];
+ } else {
+ rx_sa->udp_group_id = 0xFF;
+ }
+
+ ret = sxe2_drv_ipsec_rxsa_add(sxe2_sctx->adapter, rx_sa, rx_tcam, rx_udp_group);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Failed to add rx sa.");
+ sxe2_ipsec_id_free(rx_sa_bmp, sa_id);
+ rx_tcam->ref_cnt--;
+ if (rx_tcam->ref_cnt == 0)
+ sxe2_ipsec_id_free(rx_tcam_bmp, tcam_id);
+
+ if (rx_udp_group != NULL) {
+ rx_udp_group->ref_cnt--;
+ if (rx_udp_group->ref_cnt == 0)
+ sxe2_ipsec_id_free(sxe2_sctx->ipsec_ctx.bmp.rx_udp_bmp,
+ udp_group_id);
+ }
+
+ ret = -EIO;
+ goto l_end;
+ }
+
+ sxe2_ipsec_session_save(sxe2_sctx, conf, sxe2_sess, rx_sa->hw_sa_id, rx_sa->id);
+
+ PMD_LOG_INFO(DRV, "Add rx sa success, rx sa id: %u, rx ip id: %u, group id: %u, index: %u.",
+ rx_sa->hw_sa_id, rx_sa->hw_ip_id, rx_sa->udp_group_id, rx_sa->id);
+
+l_end:
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_hw_table_add(struct sxe2_security_ctx *sxe2_sctx,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess)
+{
+ int32_t ret = -1;
+
+ switch (conf->ipsec.direction) {
+ case RTE_SECURITY_IPSEC_SA_DIR_EGRESS:
+ ret = sxe2_ipsec_tx_sa_add(sxe2_sctx, conf, sxe2_sess);
+ break;
+ case RTE_SECURITY_IPSEC_SA_DIR_INGRESS:
+ ret = sxe2_ipsec_rx_sa_add(sxe2_sctx, conf, sxe2_sess);
+ break;
+ default:
+ PMD_LOG_ERR(DRV, "Invalid sa direction.");
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+int sxe2_ipsec_session_create(void *device,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess)
+{
+ struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(eth_dev);
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ int32_t ret = -1;
+
+ ret = sxe2_ipsec_session_conf_valid(sxe2_sctx, conf);
+ if (ret) {
+ PMD_LOG_ERR(DRV, "Input ipsec session conf invalid.");
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_hw_table_add(sxe2_sctx, conf, sxe2_sess);
+ if (ret)
+ goto l_end;
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_tx_sa_delete(struct sxe2_security_ctx *sxe2_sctx,
+ struct sxe2_security_session *sxe2_sess)
+{
+ struct sxe2_ipsec_tx_sa *tx_sa = NULL;
+ uint16_t sa_id = sxe2_sess->sa.hw_idx;
+ uint16_t sw_sa_id = sxe2_sess->sa.sw_idx;
+ int32_t ret = -1;
+
+ if (sw_sa_id >= sxe2_sctx->ipsec_ctx.max_tx_sa) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "invalid sw sa id: %u.", sw_sa_id);
+ goto l_end;
+ }
+
+ if (!rte_bitmap_get(sxe2_sctx->ipsec_ctx.bmp.tx_sa_bmp, sw_sa_id)) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "bitmap not set, index: %u.", sw_sa_id);
+ goto l_end;
+ }
+
+ tx_sa = &sxe2_sctx->ipsec_ctx.tx_sa[sw_sa_id];
+
+ if (tx_sa->hw_sa_id != sa_id) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "invalid hw sa id: %u != %u.", sa_id, tx_sa->hw_sa_id);
+ goto l_end;
+ }
+
+ ret = sxe2_drv_ipsec_txsa_delete(sxe2_sctx->adapter, sa_id);
+ if (ret)
+ goto l_end;
+
+ rte_spinlock_lock(&sxe2_sctx->security_lock);
+ sxe2_ipsec_id_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_bmp, sw_sa_id);
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_rx_sa_delete(struct sxe2_security_ctx *sxe2_sctx,
+ struct sxe2_security_session *sxe2_sess)
+{
+ struct sxe2_ipsec_rx_udp_group *rx_udp = NULL;
+ struct sxe2_ipsec_rx_tcam *rx_tcam = NULL;
+ struct sxe2_ipsec_rx_sa *rx_sa = NULL;
+ uint16_t sa_id = sxe2_sess->sa.hw_idx;
+ uint16_t sw_sa_id = sxe2_sess->sa.sw_idx;
+ int32_t ret = -1;
+
+ if (sw_sa_id >= sxe2_sctx->ipsec_ctx.max_rx_sa) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "invalid sw sa id: %u.", sw_sa_id);
+ goto l_end;
+ }
+
+ if (!rte_bitmap_get(sxe2_sctx->ipsec_ctx.bmp.rx_sa_bmp, sw_sa_id)) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "bitmap not set, index: %u.", sw_sa_id);
+ goto l_end;
+ }
+
+ rx_sa = &sxe2_sctx->ipsec_ctx.rx_sa[sw_sa_id];
+
+ if (rx_sa->hw_sa_id != sa_id) {
+ ret = 0;
+ PMD_LOG_WARN(DRV, "invalid hw sa id: %u != %u.", sa_id, rx_sa->hw_sa_id);
+ goto l_end;
+ }
+
+ ret = sxe2_drv_ipsec_rxsa_delete(sxe2_sctx->adapter, rx_sa);
+ if (ret)
+ goto l_end;
+
+ rte_spinlock_lock(&sxe2_sctx->security_lock);
+ sxe2_ipsec_id_free(sxe2_sctx->ipsec_ctx.bmp.rx_sa_bmp, sw_sa_id);
+
+ rx_tcam = &sxe2_sctx->ipsec_ctx.rx_tcam[rx_sa->tcam_id];
+ rx_tcam->ref_cnt--;
+ if (rx_tcam->ref_cnt == 0)
+ sxe2_ipsec_id_free(sxe2_sctx->ipsec_ctx.bmp.rx_tcam_bmp, rx_sa->tcam_id);
+
+ if (rx_sa->udp_group_id == 0xFF) {
+ PMD_LOG_INFO(DRV, "No udp group resource to release.");
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+ goto l_end;
+ }
+ rx_udp = &sxe2_sctx->ipsec_ctx.rx_udp_group[rx_sa->udp_group_id];
+ rx_udp->ref_cnt--;
+ if (rx_udp->ref_cnt == 0)
+ sxe2_ipsec_id_free(sxe2_sctx->ipsec_ctx.bmp.rx_udp_bmp, rx_sa->udp_group_id);
+ rte_spinlock_unlock(&sxe2_sctx->security_lock);
+
+l_end:
+ return ret;
+}
+
+static int32_t
+sxe2_ipsec_hw_table_delete(struct sxe2_security_ctx *sxe2_sctx,
+ struct sxe2_security_session *sxe2_sess)
+{
+ int32_t ret = -1;
+
+ switch (sxe2_sess->direction) {
+ case RTE_SECURITY_IPSEC_SA_DIR_EGRESS:
+ ret = sxe2_ipsec_tx_sa_delete(sxe2_sctx, sxe2_sess);
+ break;
+ case RTE_SECURITY_IPSEC_SA_DIR_INGRESS:
+ ret = sxe2_ipsec_rx_sa_delete(sxe2_sctx, sxe2_sess);
+ break;
+ default:
+ PMD_LOG_ERR(DRV, "Invalid sa direction.");
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+int sxe2_ipsec_session_destroy(void *device, struct rte_security_session *session)
+{
+ struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(eth_dev);
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ struct sxe2_security_session *sxe2_sess = SECURITY_GET_SESS_PRIV(session);
+ int32_t ret = -1;
+
+ if (unlikely(sxe2_sess == NULL || sxe2_sess->adapter != adapter)) {
+ PMD_LOG_ERR(DRV, "Invalid device adapter.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_hw_table_delete(sxe2_sctx, sxe2_sess);
+ if (ret) {
+ ret = -EIO;
+ PMD_LOG_ERR(DRV, "Failed to delete ipsec hw tables.");
+ goto l_end;
+ }
+
+ PMD_LOG_INFO(DRV, "Delete ipsec session success, sa_id: %u, spi: %u.",
+ sxe2_sess->sa.hw_idx, sxe2_sess->sa.spi);
+
+ memset(sxe2_sess, 0, sizeof(struct sxe2_security_session));
+
+l_end:
+ return ret;
+}
+
+int sxe2_ipsec_pkt_metadata_set(void *device, struct rte_security_session *session,
+ struct rte_mbuf *m, void *params)
+{
+ struct rte_eth_dev *eth_dev = (struct rte_eth_dev *)device;
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(eth_dev);
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ struct sxe2_security_session *sxe2_sess = NULL;
+ struct sxe2_ipsec_pkt_metadata *md = NULL;
+ uint16_t offset = 0;
+ int32_t ret = -1;
+
+ sxe2_sess = SECURITY_GET_SESS_PRIV(session);
+ if (unlikely(sxe2_sess == NULL || sxe2_sess->adapter != adapter)) {
+ PMD_LOG_ERR(DRV, "Invalid parameters.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ offset = ((struct sxe2_ipsec_metadata_params *)params)->esp_header_offset;
+ if (offset <= IPSEC_ESP_OFFSET_MIN || offset >= IPSEC_ESP_OFFSET_MAX) {
+ PMD_LOG_ERR(DRV, "Invalid esp header offset.");
+ ret = -EINVAL;
+ goto l_end;
+ }
+
+ md = RTE_MBUF_DYNFIELD(m, sxe2_sctx->ipsec_ctx.md_offset, struct sxe2_ipsec_pkt_metadata *);
+
+ memcpy(md, &sxe2_sess->pkt_metadata_template, sizeof(struct sxe2_ipsec_pkt_metadata));
+ md->esp_head_offset = offset;
+
+ PMD_LOG_INFO(DRV, "ipsec metadata set, offset:%u, said:%u, mode:%u, algo:%u.", offset,
+ sxe2_sess->pkt_metadata_template.sa_idx, sxe2_sess->pkt_metadata_template.mode,
+ sxe2_sess->pkt_metadata_template.algo);
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+int sxe2_ipsec_pkt_md_offset_get(struct sxe2_adapter *adapter)
+{
+ return adapter->security_ctx.ipsec_ctx.md_offset;
+}
+
+static void sxe2_ipsec_enc_aes_cbc_fill(struct rte_cryptodev_capabilities *cap)
+{
+ cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+
+ cap->sym.cipher.algo = SXE2_RTE_CRYPTO_CIPHER_AES_CBC;
+
+ cap->sym.cipher.block_size = SXE2_SECURITY_BLOCK_SIZE_16;
+
+ cap->sym.cipher.key_size.min = SXE2_IPSEC_AES_KEY_MIN;
+ cap->sym.cipher.key_size.max = SXE2_IPSEC_AES_KEY_MAX;
+ cap->sym.cipher.key_size.increment = SXE2_IPSEC_AES_KEY_INC;
+
+ cap->sym.cipher.iv_size.min = SXE2_IPSEC_AES_IV_MIN;
+ cap->sym.cipher.iv_size.max = SXE2_IPSEC_AES_IV_MAX;
+ cap->sym.cipher.iv_size.increment = SXE2_IPSEC_AES_IV_INC;
+
+ cap->sym.cipher.dataunit_set |= RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES;
+}
+
+static void sxe2_ipsec_enc_sm4_cbc_fill(struct rte_cryptodev_capabilities *cap)
+{
+ cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER;
+
+ cap->sym.cipher.algo = SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC;
+
+ cap->sym.cipher.block_size = SXE2_SECURITY_BLOCK_SIZE_16;
+
+ cap->sym.cipher.key_size.min = SXE2_IPSEC_SM4_KEY_MIN;
+ cap->sym.cipher.key_size.max = SXE2_IPSEC_SM4_KEY_MAX;
+ cap->sym.cipher.key_size.increment = SXE2_IPSEC_SM4_KEY_INC;
+
+ cap->sym.cipher.iv_size.min = SXE2_IPSEC_SM4_IV_MIN;
+ cap->sym.cipher.iv_size.max = SXE2_IPSEC_SM4_IV_MAX;
+ cap->sym.cipher.iv_size.increment = SXE2_IPSEC_SM4_IV_INC;
+
+ cap->sym.cipher.dataunit_set |= RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES;
+}
+
+static void sxe2_ipsec_auth_sha_hmac_fill(struct rte_cryptodev_capabilities *cap)
+{
+ cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_AUTH;
+
+ cap->sym.auth.algo = SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC;
+
+ cap->sym.auth.block_size = SXE2_SECURITY_BLOCK_SIZE_64;
+
+ cap->sym.auth.key_size.min = SXE2_IPSEC_SHA_KEY_MIN;
+ cap->sym.auth.key_size.max = SXE2_IPSEC_SHA_KEY_MAX;
+ cap->sym.auth.key_size.increment = SXE2_IPSEC_SHA_KEY_INC;
+
+ cap->sym.auth.iv_size.min = SXE2_IPSEC_SHA_IV_MIN;
+ cap->sym.auth.iv_size.max = SXE2_IPSEC_SHA_IV_MAX;
+ cap->sym.auth.iv_size.increment = SXE2_IPSEC_SHA_IV_INC;
+
+ cap->sym.auth.digest_size.min = SXE2_IPSEC_SHA_DIGEST_MIN;
+ cap->sym.auth.digest_size.max = SXE2_IPSEC_SHA_DIGEST_MAX;
+ cap->sym.auth.digest_size.increment = SXE2_IPSEC_SHA_DIGEST_INC;
+
+ cap->sym.auth.aad_size.min = SXE2_IPSEC_AAD_MIN;
+ cap->sym.auth.aad_size.max = SXE2_IPSEC_AAD_MAX;
+ cap->sym.auth.aad_size.increment = SXE2_IPSEC_AAD_INC;
+}
+
+static void sxe2_ipsec_auth_sm3_hmac_fill(struct rte_cryptodev_capabilities *cap)
+{
+ cap->sym.xform_type = RTE_CRYPTO_SYM_XFORM_AUTH;
+
+ cap->sym.auth.algo = SXE2_RTE_CRYPTO_AUTH_SM3_HMAC;
+
+ cap->sym.auth.block_size = SXE2_SECURITY_BLOCK_SIZE_64;
+
+ cap->sym.auth.key_size.min = SXE2_IPSEC_SM3_KEY_MIN;
+ cap->sym.auth.key_size.max = SXE2_IPSEC_SM3_KEY_MAX;
+ cap->sym.auth.key_size.increment = SXE2_IPSEC_SM3_KEY_INC;
+
+ cap->sym.auth.iv_size.min = SXE2_IPSEC_SM3_IV_MIN;
+ cap->sym.auth.iv_size.max = SXE2_IPSEC_SM3_IV_MAX;
+ cap->sym.auth.iv_size.increment = SXE2_IPSEC_SM3_IV_INC;
+
+ cap->sym.auth.digest_size.min = SXE2_IPSEC_SM3_DIGEST_MIN;
+ cap->sym.auth.digest_size.max = SXE2_IPSEC_SM3_DIGEST_MAX;
+ cap->sym.auth.digest_size.increment = SXE2_IPSEC_SM3_DIGEST_INC;
+
+ cap->sym.auth.aad_size.min = SXE2_IPSEC_AAD_MIN;
+ cap->sym.auth.aad_size.max = SXE2_IPSEC_AAD_MAX;
+ cap->sym.auth.aad_size.increment = SXE2_IPSEC_AAD_INC;
+}
+
+static int32_t
+sxe2_ipsec_capabilities_init(struct sxe2_security_ctx *sxe2_sctx)
+{
+ struct rte_cryptodev_capabilities *capabilities = NULL;
+ struct sxe2_security_capabilities *sxe2_cap =
+ &sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC];
+ int32_t ret = -1;
+ uint8_t index = 0;
+
+ sxe2_cap->action = RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO;
+ sxe2_cap->ipsec.proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP;
+ sxe2_cap->ipsec.mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL;
+ sxe2_cap->ipsec.options.stats = 1;
+
+ capabilities = rte_zmalloc("security_caps",
+ sizeof(struct rte_cryptodev_capabilities) * SXE2_IPSEC_CAP_MAX, 0);
+ if (capabilities == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ for (index = 0; index < SXE2_IPSEC_CAP_MAX; index++) {
+ capabilities[index].op = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
+ switch (index) {
+ case SXE2_IPSEC_CAP_ENC_AES_CBC:
+ sxe2_ipsec_enc_aes_cbc_fill(&capabilities[index]);
+ break;
+ case SXE2_IPSEC_CAP_ENC_SM4_CBC:
+ sxe2_ipsec_enc_sm4_cbc_fill(&capabilities[index]);
+ break;
+ case SXE2_IPSEC_CAP_AUTH_SHA256_HMAC:
+ sxe2_ipsec_auth_sha_hmac_fill(&capabilities[index]);
+ break;
+ case SXE2_IPSEC_CAP_AUTH_SM3_HMAC:
+ sxe2_ipsec_auth_sm3_hmac_fill(&capabilities[index]);
+ break;
+ default:
+ break;
+ }
+ }
+
+ sxe2_cap->crypto_capabilities = capabilities;
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+static void
+sxe2_ipsec_tx_sa_init(struct sxe2_ipsec_tx_sa *tx_sa, uint16_t len)
+{
+ struct sxe2_ipsec_tx_sa *per = NULL;
+ uint16_t i;
+
+ memset(tx_sa, 0, sizeof(struct sxe2_ipsec_tx_sa) * len);
+ for (i = 0; i < len; i++) {
+ per = &tx_sa[i];
+ per->id = i;
+ }
+}
+
+static void
+sxe2_ipsec_rx_sa_init(struct sxe2_ipsec_rx_sa *rx_sa, uint16_t len)
+{
+ struct sxe2_ipsec_rx_sa *per = NULL;
+ uint16_t i;
+
+ memset(rx_sa, 0, sizeof(struct sxe2_ipsec_rx_sa) * len);
+ for (i = 0; i < len; i++) {
+ per = &rx_sa[i];
+ per->id = i;
+ }
+}
+
+static void
+sxe2_ipsec_rx_tcam_init(struct sxe2_ipsec_rx_tcam *rx_tcam, uint16_t len)
+{
+ struct sxe2_ipsec_rx_tcam *per = NULL;
+ uint16_t i;
+
+ memset(rx_tcam, 0, sizeof(struct sxe2_ipsec_rx_tcam) * len);
+ for (i = 0; i < len; i++) {
+ per = &rx_tcam[i];
+ per->id = i;
+ }
+}
+
+static void
+sxe2_ipsec_rx_udp_group_init(struct sxe2_ipsec_rx_udp_group *rx_udp_group, uint16_t len)
+{
+ struct sxe2_ipsec_rx_udp_group *per = NULL;
+ uint16_t i;
+
+ memset(rx_udp_group, 0, sizeof(struct sxe2_ipsec_rx_udp_group) * len);
+ for (i = 0; i < len; i++) {
+ per = &rx_udp_group[i];
+ per->id = i;
+ }
+}
+
+static int32_t
+sxe2_ipsec_hw_table_init(struct sxe2_security_ctx *sxe2_sctx)
+{
+ struct sxe2_ipsec_tx_sa *tx_sa = NULL;
+ struct sxe2_ipsec_rx_sa *rx_sa = NULL;
+ struct sxe2_ipsec_rx_tcam *rx_tcam = NULL;
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group = NULL;
+ uint16_t max_tx_sa = sxe2_sctx->ipsec_ctx.max_tx_sa;
+ uint16_t max_rx_sa = sxe2_sctx->ipsec_ctx.max_rx_sa;
+ uint16_t max_tcam = sxe2_sctx->ipsec_ctx.max_tcam;
+ uint16_t max_udp_group = sxe2_sctx->ipsec_ctx.max_udp_group;
+ int32_t ret = -1;
+
+ tx_sa = rte_zmalloc("sxe2_ipsec_tx_sa", sizeof(struct sxe2_ipsec_tx_sa) * max_tx_sa, 0);
+ if (tx_sa == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ sxe2_ipsec_tx_sa_init(tx_sa, max_tx_sa);
+ sxe2_sctx->ipsec_ctx.tx_sa = tx_sa;
+
+ rx_sa = rte_zmalloc("sxe2_ipsec_rx_sa", sizeof(struct sxe2_ipsec_rx_sa) * max_rx_sa, 0);
+ if (rx_sa == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ sxe2_ipsec_rx_sa_init(rx_sa, max_rx_sa);
+ sxe2_sctx->ipsec_ctx.rx_sa = rx_sa;
+
+ rx_tcam = rte_zmalloc("sxe2_ipsec_rx_tcam",
+ sizeof(struct sxe2_ipsec_rx_tcam) * max_tcam, 0);
+ if (rx_tcam == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ sxe2_ipsec_rx_tcam_init(rx_tcam, max_tcam);
+ sxe2_sctx->ipsec_ctx.rx_tcam = rx_tcam;
+
+ rx_udp_group = rte_zmalloc("sxe2_ipsec_rx_udp_group",
+ sizeof(struct sxe2_ipsec_rx_udp_group) * max_udp_group, 0);
+ if (rx_udp_group == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+ sxe2_ipsec_rx_udp_group_init(rx_udp_group, max_udp_group);
+ sxe2_sctx->ipsec_ctx.rx_udp_group = rx_udp_group;
+
+ ret = 0;
+
+l_end:
+ if (ret) {
+ if (tx_sa != NULL) {
+ rte_free(tx_sa);
+ sxe2_sctx->ipsec_ctx.tx_sa = NULL;
+ }
+ if (rx_sa != NULL) {
+ rte_free(rx_sa);
+ sxe2_sctx->ipsec_ctx.rx_sa = NULL;
+ }
+ if (rx_tcam != NULL) {
+ rte_free(rx_tcam);
+ sxe2_sctx->ipsec_ctx.rx_tcam = NULL;
+ }
+ if (rx_udp_group != NULL) {
+ rte_free(rx_udp_group);
+ sxe2_sctx->ipsec_ctx.rx_udp_group = NULL;
+ }
+ }
+ return ret;
+}
+
+int32_t sxe2_ipsec_init(struct sxe2_adapter *adapter)
+{
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ struct sxe2_security_capabilities *sxe2_cap = NULL;
+ int32_t ret = -1;
+ struct rte_mbuf_dynfield pkt_md_dynfield = {
+ .name = "sxe2_ipsec_pkt_metadata",
+ .size = sizeof(struct sxe2_ipsec_pkt_metadata),
+ .align = alignof(struct sxe2_ipsec_pkt_metadata)
+ };
+
+ PMD_LOG_INFO(INIT, "Init ipsec.");
+
+ sxe2_sctx->ipsec_ctx.md_offset = rte_mbuf_dynfield_register(&pkt_md_dynfield);
+ if (sxe2_sctx->ipsec_ctx.md_offset < 0) {
+ PMD_LOG_ERR(INIT, "Failed to register ipsec mbuf dynamic field.");
+ ret = -EIO;
+ goto l_end;
+ }
+
+ ret = sxe2_ipsec_capabilities_init(sxe2_sctx);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to init ipsec capabilities.");
+ goto l_end;
+ }
+
+ ret = sxe2_drv_ipsec_get_capa(adapter);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to get ipsec capabilities.");
+ goto l_caps_free;
+ }
+
+ ret = sxe2_ipsec_bitmap_init(sxe2_sctx);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to init ipsec bitmap.");
+ goto l_caps_free;
+ }
+
+ ret = sxe2_ipsec_hw_table_init(sxe2_sctx);
+ if (ret) {
+ PMD_LOG_ERR(INIT, "Failed to init ipsec hw table.");
+ goto l_bitmap_free;
+ }
+
+ goto l_end;
+
+l_bitmap_free:
+
+ if (sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem = NULL;
+ }
+l_caps_free:
+ sxe2_cap = &sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC];
+ if (sxe2_cap->crypto_capabilities != NULL) {
+ rte_free(sxe2_cap->crypto_capabilities);
+ sxe2_cap->crypto_capabilities = NULL;
+ }
+l_end:
+ return ret;
+}
+
+void sxe2_ipsec_uinit(struct sxe2_adapter *adapter)
+{
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ struct sxe2_security_capabilities *sxe2_cap =
+ &sxe2_sctx->sxe2_capabilities[SXE2_SECURITY_PROTOCOL_IPSEC];
+ struct sxe2_ipsec_tx_sa *tx_sa = sxe2_sctx->ipsec_ctx.tx_sa;
+ struct sxe2_ipsec_rx_sa *rx_sa = sxe2_sctx->ipsec_ctx.rx_sa;
+ struct sxe2_ipsec_rx_tcam *rx_tcam = sxe2_sctx->ipsec_ctx.rx_tcam;
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group = sxe2_sctx->ipsec_ctx.rx_udp_group;
+
+ PMD_LOG_INFO(INIT, "Uinit ipsec.");
+
+ (void)sxe2_drv_ipsec_resource_clear(adapter);
+
+ if (sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.tx_sa_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_sa_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_tcam_mem = NULL;
+ }
+ if (sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem != NULL) {
+ rte_free(sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem);
+ sxe2_sctx->ipsec_ctx.bmp.rx_udp_mem = NULL;
+ }
+
+ if (tx_sa != NULL) {
+ rte_free(tx_sa);
+ sxe2_sctx->ipsec_ctx.tx_sa = NULL;
+ }
+ if (rx_sa != NULL) {
+ rte_free(rx_sa);
+ sxe2_sctx->ipsec_ctx.rx_sa = NULL;
+ }
+ if (rx_tcam != NULL) {
+ rte_free(rx_tcam);
+ sxe2_sctx->ipsec_ctx.rx_tcam = NULL;
+ }
+ if (rx_udp_group != NULL) {
+ rte_free(rx_udp_group);
+ sxe2_sctx->ipsec_ctx.rx_udp_group = NULL;
+ }
+
+ if (sxe2_cap->crypto_capabilities != NULL) {
+ rte_free(sxe2_cap->crypto_capabilities);
+ sxe2_cap->crypto_capabilities = NULL;
+ }
+}
diff --git a/drivers/net/sxe2/sxe2_ipsec.h b/drivers/net/sxe2/sxe2_ipsec.h
new file mode 100644
index 0000000000..02930ddb4f
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_ipsec.h
@@ -0,0 +1,254 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+#ifndef __SXE2_IPSEC_H__
+#define __SXE2_IPSEC_H__
+
+#include <rte_security.h>
+#include <rte_security_driver.h>
+
+struct sxe2_adapter;
+struct sxe2_security_session;
+
+#define SXE2_IPSEC_AES_KEY_MIN (32)
+#define SXE2_IPSEC_AES_KEY_MAX (32)
+#define SXE2_IPSEC_AES_KEY_INC (0)
+
+#define SXE2_IPSEC_SM4_KEY_MIN (16)
+#define SXE2_IPSEC_SM4_KEY_MAX (16)
+#define SXE2_IPSEC_SM4_KEY_INC (0)
+
+#define SXE2_IPSEC_SHA_KEY_MIN (32)
+#define SXE2_IPSEC_SHA_KEY_MAX (32)
+#define SXE2_IPSEC_SHA_KEY_INC (0)
+
+#define SXE2_IPSEC_SM3_KEY_MIN (32)
+#define SXE2_IPSEC_SM3_KEY_MAX (32)
+#define SXE2_IPSEC_SM3_KEY_INC (0)
+
+#define SXE2_IPSEC_AES_IV_MIN (16)
+#define SXE2_IPSEC_AES_IV_MAX (16)
+#define SXE2_IPSEC_AES_IV_INC (0)
+
+#define SXE2_IPSEC_SM4_IV_MIN (16)
+#define SXE2_IPSEC_SM4_IV_MAX (16)
+#define SXE2_IPSEC_SM4_IV_INC (0)
+
+#define SXE2_IPSEC_SHA_IV_MIN (0)
+#define SXE2_IPSEC_SHA_IV_MAX (32)
+#define SXE2_IPSEC_SHA_IV_INC (16)
+
+#define SXE2_IPSEC_SM3_IV_MIN (0)
+#define SXE2_IPSEC_SM3_IV_MAX (32)
+#define SXE2_IPSEC_SM3_IV_INC (16)
+
+#define SXE2_IPSEC_SHA_DIGEST_MIN (32)
+#define SXE2_IPSEC_SHA_DIGEST_MAX (32)
+#define SXE2_IPSEC_SHA_DIGEST_INC (0)
+
+#define SXE2_IPSEC_SM3_DIGEST_MIN (32)
+#define SXE2_IPSEC_SM3_DIGEST_MAX (32)
+#define SXE2_IPSEC_SM3_DIGEST_INC (0)
+
+#define SXE2_IPSEC_AAD_MIN (0)
+#define SXE2_IPSEC_AAD_MAX (0)
+#define SXE2_IPSEC_AAD_INC (0)
+
+#define SXE2_IPSEC_MAX_KEY_LEN (32)
+#define SXE2_IPSEC_MIN_KEY_LEN (0)
+
+#define SXE2_IPSEC_OL_FLAGS_IS_TUN (0x1 << 0)
+#define SXE2_IPSEC_OL_FLAGS_IS_ESP (0x1 << 1)
+
+#define SXE2_IPSEC_DEFAULT_SA_OFFSET (0)
+#define SXE2_IPSEC_DEFAULT_SA_LEN (1024)
+
+#define IPSEC_TX_ENCRYPT (RTE_BIT32(0))
+#define IPSEC_TX_ENGINE_SM4 (RTE_BIT32(1))
+
+#define IPSEC_RX_VALID (RTE_BIT32(0))
+#define IPSEC_RX_IPV6 (RTE_BIT32(2))
+#define IPSEC_RX_DECRYPT (RTE_BIT32(3))
+#define IPSEC_RX_ENGINE_SM4 (RTE_BIT32(4))
+
+#define IPSEC_IPV6_LEN (4)
+#define IPSEC_ESP_OFFSET_MIN (16)
+#define IPSEC_ESP_OFFSET_MAX (256)
+
+enum sxe2_ipsec_cap {
+ SXE2_IPSEC_CAP_ENC_AES_CBC = 0,
+ SXE2_IPSEC_CAP_ENC_SM4_CBC = 1,
+ SXE2_IPSEC_CAP_AUTH_SHA256_HMAC = 2,
+ SXE2_IPSEC_CAP_AUTH_SM3_HMAC = 3,
+ SXE2_IPSEC_CAP_MAX = 4,
+};
+
+enum sxe2_ipsec_icv_len {
+ SXE2_IPSEC_ICV_0_BYTES = 0,
+ SXE2_IPSEC_ICV_12_BYTES,
+ SXE2_IPSEC_ICV_16_BYTES,
+ SXE2_IPSEC_ICV_INVALID,
+};
+
+enum sxe2_ipsec_bypass_dir {
+ SXE2_IPSEC_BYPASS_DIR_RX = 0,
+ SXE2_IPSEC_BYPASS_DIR_TX,
+ SXE2_IPSEC_BYPASS_DIR_INVALID,
+};
+
+enum sxe2_ipsec_bypass_status {
+ SXE2_IPSEC_BYPASS_STATUS_DISABLE = 0,
+ SXE2_IPSEC_BYPASS_STATUS_ENABLE,
+ SXE2_IPSEC_BYPASS_STATUS_INVALID,
+};
+
+enum sxe2_ipsec_status {
+ SXE2_IPSEC_ENC_BYPASS = 0,
+ SXE2_IPSEC_ENC_ENABLE,
+ SXE2_IPSEC_ENC_INVALID,
+};
+
+enum sxe2_ipsec_mode {
+ SXE2_IPSEC_MODE_ENC_AND_AUTH = 0,
+ SXE2_IPSEC_MODE_ONLY_ENCRYPT,
+ SXE2_IPSEC_MODE_INVALID,
+};
+
+struct sxe2_ipsec_ip_param {
+ enum rte_security_ipsec_tunnel_type type;
+ union {
+ uint32_t dst_ipv4;
+ uint32_t dst_ipv6[IPSEC_IPV6_LEN];
+ };
+};
+
+enum sxe2_ipsec_algorithm {
+ SXE2_IPSEC_ALGO_AES_CBC_AND_SHA256_128_HMAC = 0,
+ SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC,
+ SXE2_IPSEC_ALGO_INVALID,
+};
+
+struct sxe2_ipsec_pkt_metadata {
+ uint16_t sa_idx;
+ uint16_t esp_head_offset;
+ uint8_t ol_flags;
+ uint8_t mode;
+ uint8_t algo;
+};
+
+struct sxe2_ipsec_bitmap {
+ struct rte_bitmap *tx_sa_bmp;
+ struct rte_bitmap *rx_sa_bmp;
+ struct rte_bitmap *rx_tcam_bmp;
+ struct rte_bitmap *rx_udp_bmp;
+ void *tx_sa_mem;
+ void *rx_sa_mem;
+ void *rx_tcam_mem;
+ void *rx_udp_mem;
+};
+
+struct sxe2_ipsec_security_sa {
+ uint32_t spi;
+ uint16_t hw_idx;
+ uint16_t sw_idx;
+};
+
+struct sxe2_ipsec_esn {
+ union {
+ uint64_t value;
+ struct {
+ uint32_t hi;
+ uint32_t low;
+ };
+ };
+ uint8_t enabled;
+};
+
+struct sxe2_ipsec_udp {
+ struct rte_security_ipsec_udp_param value;
+ uint8_t enabled;
+};
+
+struct sxe2_ipsec_tx_sa {
+ struct rte_security_ipsec_xform xform;
+ uint16_t id;
+ uint16_t hw_sa_id;
+ enum sxe2_ipsec_mode mode;
+ enum sxe2_ipsec_algorithm algo;
+ uint8_t enc_key[SXE2_IPSEC_MAX_KEY_LEN];
+ uint8_t auth_key[SXE2_IPSEC_MAX_KEY_LEN];
+};
+
+struct sxe2_ipsec_rx_sa {
+ struct rte_security_ipsec_xform xform;
+ uint32_t spi;
+ uint16_t id;
+ uint16_t hw_sa_id;
+ uint8_t hw_ip_id;
+ uint8_t hw_udp_group_id;
+ uint8_t tcam_id;
+ uint8_t udp_group_id;
+ uint8_t sdn_group_id;
+ enum sxe2_ipsec_mode mode;
+ enum sxe2_ipsec_algorithm algo;
+ uint8_t enc_key[SXE2_IPSEC_MAX_KEY_LEN];
+ uint8_t auth_key[SXE2_IPSEC_MAX_KEY_LEN];
+};
+
+struct sxe2_ipsec_rx_tcam {
+ struct sxe2_ipsec_ip_param ip_addr;
+ uint16_t id;
+ uint8_t hw_ip_id;
+ uint8_t ref_cnt;
+};
+
+struct sxe2_ipsec_rx_udp_group {
+ uint16_t udp_port;
+ uint8_t sport_en;
+ uint8_t dport_en;
+ uint8_t id;
+ uint8_t hw_group_id;
+ uint8_t ref_cnt;
+};
+
+struct sxe2_ipsec_ctx {
+ struct sxe2_ipsec_tx_sa *tx_sa;
+ struct sxe2_ipsec_rx_sa *rx_sa;
+ struct sxe2_ipsec_rx_tcam *rx_tcam;
+ struct sxe2_ipsec_rx_udp_group *rx_udp_group;
+ struct sxe2_ipsec_bitmap bmp;
+ int md_offset;
+ uint16_t max_tx_sa;
+ uint16_t max_rx_sa;
+ uint16_t max_tcam;
+ uint8_t max_udp_group;
+};
+
+struct sxe2_ipsec_metadata_params {
+ uint16_t esp_header_offset;
+ uint16_t reserved;
+};
+
+bool sxe2_ipsec_supported(struct sxe2_adapter *adapter);
+
+bool sxe2_ipsec_valid_tx_offloads(uint64_t offloads);
+
+bool sxe2_ipsec_valid_rx_offloads(uint64_t offloads);
+
+int sxe2_ipsec_pkt_md_offset_get(struct sxe2_adapter *adapter);
+
+int sxe2_ipsec_session_create(void *device,
+ struct rte_security_session_conf *conf,
+ struct sxe2_security_session *sxe2_sess);
+
+int sxe2_ipsec_session_destroy(void *device,
+ struct rte_security_session *session);
+
+int sxe2_ipsec_pkt_metadata_set(void *device, struct rte_security_session *session,
+ struct rte_mbuf *m, void *params);
+
+int32_t sxe2_ipsec_init(struct sxe2_adapter *adapter);
+
+void sxe2_ipsec_uinit(struct sxe2_adapter *adapter);
+
+#endif /* __SXE2_IPSEC_H__ */
diff --git a/drivers/net/sxe2/sxe2_rx.c b/drivers/net/sxe2/sxe2_rx.c
index 28832d5f71..007192c7d8 100644
--- a/drivers/net/sxe2/sxe2_rx.c
+++ b/drivers/net/sxe2/sxe2_rx.c
@@ -294,6 +294,11 @@ int32_t __rte_cold sxe2_rx_queue_setup(struct rte_eth_dev *dev,
goto l_end;
}
+ if (!sxe2_ipsec_valid_rx_offloads(offloads)) {
+ ret = -EINVAL;
+ goto l_end;
+ }
+
rxq = sxe2_rx_queue_alloc(dev, queue_idx, nb_desc, socket_id);
if (rxq == NULL) {
PMD_LOG_ERR(RX, "rx queue[%d] resource alloc failed", queue_idx);
diff --git a/drivers/net/sxe2/sxe2_security.c b/drivers/net/sxe2/sxe2_security.c
new file mode 100644
index 0000000000..bc59d1b880
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_security.c
@@ -0,0 +1,335 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#include <rte_malloc.h>
+
+#include "sxe2_ethdev.h"
+#include "sxe2_security.h"
+#include "sxe2_ipsec.h"
+#include "sxe2_common_log.h"
+
+static unsigned int
+sxe2_security_session_size_get(void *device __rte_unused)
+{
+ return sizeof(struct sxe2_security_session);
+}
+
+static int
+sxe2_security_session_create(void *device,
+ struct rte_security_session_conf *conf,
+ struct rte_security_session *session)
+{
+ int32_t ret = -1;
+ struct sxe2_security_session *sxe2_sess = NULL;
+ sxe2_sess = SECURITY_GET_SESS_PRIV(session);
+
+ switch (conf->protocol) {
+ case RTE_SECURITY_PROTOCOL_IPSEC:
+ ret = sxe2_ipsec_session_create(device, conf, sxe2_sess);
+ break;
+ default:
+ PMD_LOG_ERR(DRV, "Invalid security protocol.");
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static int
+sxe2_security_session_destroy(void *device, struct rte_security_session *session)
+{
+ int32_t ret = -1;
+ struct sxe2_security_session *sxe2_sess = NULL;
+ sxe2_sess = SECURITY_GET_SESS_PRIV(session);
+
+ switch (sxe2_sess->protocol) {
+ case RTE_SECURITY_PROTOCOL_IPSEC:
+ ret = sxe2_ipsec_session_destroy(device, session);
+ break;
+ default:
+ PMD_LOG_ERR(DRV, "Invalid security protocol.");
+ ret = -EINVAL;
+ break;
+ }
+ return ret;
+}
+
+static int
+sxe2_security_pkt_metadata_set(void *device,
+ struct rte_security_session *session,
+ struct rte_mbuf *m, void *params)
+{
+ struct sxe2_security_session *sxe2_sess = NULL;
+ sxe2_sess = SECURITY_GET_SESS_PRIV(session);
+ int32_t ret = -1;
+
+ switch (sxe2_sess->protocol) {
+ case RTE_SECURITY_PROTOCOL_IPSEC:
+ ret = sxe2_ipsec_pkt_metadata_set(device, session, m, params);
+ break;
+ default:
+ PMD_LOG_ERR(DRV, "Invalid security protocol.");
+ ret = -EINVAL;
+ break;
+ }
+
+ return ret;
+}
+
+static const struct rte_security_capability *
+sxe2_security_capabilities_get(void *device __rte_unused)
+{
+ static const struct rte_cryptodev_capabilities
+ ipsec_crypto_capabilities[] = {
+ {
+ .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+ {.sym = {
+ .xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
+ {.cipher = {
+ .algo = SXE2_RTE_CRYPTO_CIPHER_AES_CBC,
+ .block_size = SXE2_SECURITY_BLOCK_SIZE_16,
+ .key_size = {
+ .min = SXE2_IPSEC_AES_KEY_MIN,
+ .max = SXE2_IPSEC_AES_KEY_MAX,
+ .increment = SXE2_IPSEC_AES_KEY_INC
+ },
+ .iv_size = {
+ .min = SXE2_IPSEC_AES_IV_MIN,
+ .max = SXE2_IPSEC_AES_IV_MAX,
+ .increment = SXE2_IPSEC_AES_IV_INC
+ },
+ .dataunit_set = RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES,
+ }, }
+ }, }
+ },
+ {
+ .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+ {.sym = {
+ .xform_type = RTE_CRYPTO_SYM_XFORM_CIPHER,
+ {.cipher = {
+ .algo = SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC,
+ .block_size = SXE2_SECURITY_BLOCK_SIZE_16,
+ .key_size = {
+ .min = SXE2_IPSEC_SM4_KEY_MIN,
+ .max = SXE2_IPSEC_SM4_KEY_MAX,
+ .increment = SXE2_IPSEC_SM4_KEY_INC
+ },
+ .iv_size = {
+ .min = SXE2_IPSEC_SM4_IV_MIN,
+ .max = SXE2_IPSEC_SM4_IV_MAX,
+ .increment = SXE2_IPSEC_SM4_IV_INC
+ },
+ .dataunit_set = RTE_CRYPTO_CIPHER_DATA_UNIT_LEN_512_BYTES,
+ }, }
+ }, }
+ },
+ {
+ .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+ {.sym = {
+ .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+ {.auth = {
+ .algo = SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC,
+ .block_size = SXE2_SECURITY_BLOCK_SIZE_64,
+ .key_size = {
+ .min = SXE2_IPSEC_SHA_KEY_MIN,
+ .max = SXE2_IPSEC_SHA_KEY_MAX,
+ .increment = SXE2_IPSEC_SHA_KEY_INC
+ },
+ .digest_size = {
+ .min = SXE2_IPSEC_SHA_DIGEST_MIN,
+ .max = SXE2_IPSEC_SHA_DIGEST_MAX,
+ .increment = SXE2_IPSEC_SHA_DIGEST_INC
+ },
+ .iv_size = {
+ .min = SXE2_IPSEC_SHA_IV_MIN,
+ .max = SXE2_IPSEC_SHA_IV_MAX,
+ .increment = SXE2_IPSEC_SHA_IV_INC
+ },
+ .aad_size = {
+ .min = SXE2_IPSEC_AAD_MIN,
+ .max = SXE2_IPSEC_AAD_MAX,
+ .increment = SXE2_IPSEC_AAD_INC
+ }
+ }, }
+ }, }
+ },
+ {
+ .op = RTE_CRYPTO_OP_TYPE_SYMMETRIC,
+ {.sym = {
+ .xform_type = RTE_CRYPTO_SYM_XFORM_AUTH,
+ {.auth = {
+ .algo = SXE2_RTE_CRYPTO_AUTH_SM3_HMAC,
+ .block_size = SXE2_SECURITY_BLOCK_SIZE_64,
+ .key_size = {
+ .min = SXE2_IPSEC_SM3_KEY_MIN,
+ .max = SXE2_IPSEC_SM3_KEY_MAX,
+ .increment = SXE2_IPSEC_SM3_KEY_INC
+ },
+ .digest_size = {
+ .min = SXE2_IPSEC_SM3_DIGEST_MIN,
+ .max = SXE2_IPSEC_SM3_DIGEST_MAX,
+ .increment = SXE2_IPSEC_SM3_DIGEST_INC
+ },
+ .iv_size = {
+ .min = SXE2_IPSEC_SM3_IV_MIN,
+ .max = SXE2_IPSEC_SM3_IV_MAX,
+ .increment = SXE2_IPSEC_SM3_IV_INC
+ },
+ .aad_size = {
+ .min = SXE2_IPSEC_AAD_MIN,
+ .max = SXE2_IPSEC_AAD_MAX,
+ .increment = SXE2_IPSEC_AAD_INC
+ }
+ }, }
+ }, }
+ },
+ {
+ .op = RTE_CRYPTO_OP_TYPE_UNDEFINED,
+ {.sym = {
+ .xform_type = RTE_CRYPTO_SYM_XFORM_NOT_SPECIFIED
+ }, }
+ }
+ };
+
+ static const struct rte_security_capability
+ sxe2_security_capabilities[] = {
+ {
+ .action = RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO,
+ .protocol = RTE_SECURITY_PROTOCOL_IPSEC,
+ {.ipsec = {
+ .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP,
+ .mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL,
+ .direction = RTE_SECURITY_IPSEC_SA_DIR_EGRESS,
+ .options = {
+ .esn = 0,
+ .udp_encap = 1,
+ .copy_dscp = 0,
+ .copy_flabel = 0,
+ .copy_df = 0,
+ .dec_ttl = 0,
+ .ecn = 0,
+ .stats = 1,
+ .iv_gen_disable = 0,
+ .tunnel_hdr_verify = 1,
+ .udp_ports_verify = 1,
+ .ip_csum_enable = 0,
+ .l4_csum_enable = 0,
+ .ip_reassembly_en = 0,
+ .ingress_oop = 0
+ } } },
+ .crypto_capabilities = ipsec_crypto_capabilities,
+ .ol_flags = RTE_SECURITY_TX_OLOAD_NEED_MDATA
+ },
+ {
+ .action = RTE_SECURITY_ACTION_TYPE_INLINE_CRYPTO,
+ .protocol = RTE_SECURITY_PROTOCOL_IPSEC,
+ {.ipsec = {
+ .proto = RTE_SECURITY_IPSEC_SA_PROTO_ESP,
+ .mode = RTE_SECURITY_IPSEC_SA_MODE_TUNNEL,
+ .direction = RTE_SECURITY_IPSEC_SA_DIR_INGRESS,
+ .options = {
+ .esn = 0,
+ .udp_encap = 1,
+ .copy_dscp = 0,
+ .copy_flabel = 0,
+ .copy_df = 0,
+ .dec_ttl = 0,
+ .ecn = 0,
+ .stats = 1,
+ .iv_gen_disable = 0,
+ .tunnel_hdr_verify = 1,
+ .udp_ports_verify = 1,
+ .ip_csum_enable = 0,
+ .l4_csum_enable = 0,
+ .ip_reassembly_en = 0,
+ .ingress_oop = 0
+ } } },
+ .crypto_capabilities = ipsec_crypto_capabilities,
+ .ol_flags = 0
+ },
+ {
+ .action = RTE_SECURITY_ACTION_TYPE_NONE
+ }
+ };
+
+ return sxe2_security_capabilities;
+}
+
+static struct rte_security_ops sxe2_security_ops = {
+ .session_get_size = sxe2_security_session_size_get,
+ .session_create = sxe2_security_session_create,
+ .session_destroy = sxe2_security_session_destroy,
+ .set_pkt_metadata = sxe2_security_pkt_metadata_set,
+ .capabilities_get = sxe2_security_capabilities_get,
+};
+
+int32_t sxe2_security_init(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct rte_security_ctx *sctx = NULL;
+ struct sxe2_security_ctx *sxe2_sctx = &adapter->security_ctx;
+ int32_t ret = -1;
+
+ if (!sxe2_ipsec_supported(adapter)) {
+ ret = 0;
+ PMD_LOG_INFO(INIT, "Security feature is not supported.");
+ goto l_end;
+ }
+
+ PMD_LOG_INFO(INIT, "Init security feature.");
+
+ sctx = rte_zmalloc("security_ctx", sizeof(struct rte_security_ctx), 0);
+ if (sctx == NULL) {
+ ret = -ENOMEM;
+ goto l_end;
+ }
+
+ sctx->device = dev;
+ sctx->ops = &sxe2_security_ops;
+ sctx->sess_cnt = 0;
+ sctx->flags = 0;
+ dev->security_ctx = (void *)sctx;
+
+ rte_spinlock_init(&sxe2_sctx->security_lock);
+ sxe2_sctx->adapter = adapter;
+
+ if (sxe2_ipsec_supported(adapter)) {
+ ret = sxe2_ipsec_init(adapter);
+ if (ret) {
+ rte_free(sctx);
+ sctx = NULL;
+ dev->security_ctx = NULL;
+ goto l_end;
+ }
+ }
+
+ ret = 0;
+
+l_end:
+ return ret;
+}
+
+void sxe2_security_uinit(struct rte_eth_dev *dev)
+{
+ struct sxe2_adapter *adapter = SXE2_DEV_PRIVATE_TO_ADAPTER(dev);
+ struct rte_security_ctx *sctx = dev->security_ctx;
+
+ if (!sxe2_ipsec_supported(adapter)) {
+ PMD_LOG_INFO(INIT, "Security feature is not supported.");
+ goto l_end;
+ }
+
+ PMD_LOG_INFO(INIT, "Uinit security feature.");
+
+ if (sctx != NULL) {
+ rte_free(sctx);
+ dev->security_ctx = NULL;
+ }
+
+ sxe2_ipsec_uinit(adapter);
+
+l_end:
+ return;
+}
diff --git a/drivers/net/sxe2/sxe2_security.h b/drivers/net/sxe2/sxe2_security.h
new file mode 100644
index 0000000000..366c0614bd
--- /dev/null
+++ b/drivers/net/sxe2/sxe2_security.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (C), 2025, Wuxi Stars Micro System Technologies Co., Ltd.
+ */
+
+#ifndef __SXE2_SECURITY_H__
+#define __SXE2_SECURITY_H__
+
+#include <rte_security.h>
+#include <rte_cryptodev.h>
+#include <rte_security_driver.h>
+
+#include "sxe2_ipsec.h"
+
+#define SXE2_DEV_TO_SECURITY(eth) \
+ ((struct rte_security_ctx *)(((struct rte_eth_dev *)eth)->security_ctx))
+
+#define SXE2_RTE_CRYPTO_CIPHER_AES_CBC (RTE_CRYPTO_CIPHER_AES_CBC)
+
+#define SXE2_RTE_RTE_CRYPTO_CIPHER_SM4_CBC (RTE_CRYPTO_CIPHER_SM4_CBC)
+
+#define SXE2_RTE_CRYPTO_AUTH_SHA256_HMAC (RTE_CRYPTO_AUTH_SHA256_HMAC)
+
+#define SXE2_RTE_CRYPTO_AUTH_SM3_HMAC (RTE_CRYPTO_AUTH_SM3_HMAC)
+
+enum sxe2_security_protocol {
+ SXE2_SECURITY_PROTOCOL_IPSEC = 0,
+ SXE2_SECURITY_PROTOCOL_MAX = 1,
+};
+
+enum sxe2_security_xform {
+ SXE2_SECURITY_IPSEC_EN = 0,
+ SXE2_SECURITY_IPSEC_DE = 1,
+ SXE2_SECURITY_NUM_MAX = 2,
+};
+
+enum sxe2_security_block_size {
+ SXE2_SECURITY_BLOCK_SIZE_16 = 16,
+ SXE2_SECURITY_BLOCK_SIZE_64 = 64,
+};
+
+struct sxe2_security_ipsec_caps {
+ enum rte_security_ipsec_sa_protocol proto;
+ enum rte_security_ipsec_sa_mode mode;
+ struct rte_security_ipsec_sa_options options;
+};
+
+struct sxe2_security_capabilities {
+ struct rte_cryptodev_capabilities *crypto_capabilities;
+ enum rte_security_session_action_type action;
+ struct sxe2_security_ipsec_caps ipsec;
+};
+
+struct sxe2_security_session {
+ struct sxe2_adapter *adapter;
+ struct sxe2_ipsec_pkt_metadata pkt_metadata_template;
+ struct sxe2_ipsec_security_sa sa;
+ struct sxe2_ipsec_esn esn;
+ struct sxe2_ipsec_udp udp_cap;
+ enum rte_security_session_protocol protocol;
+ enum rte_security_ipsec_sa_direction direction;
+ enum rte_security_ipsec_sa_mode mode;
+ enum rte_security_ipsec_sa_protocol sa_proto;
+ enum rte_security_ipsec_tunnel_type type;
+};
+
+struct sxe2_security_ctx {
+ struct sxe2_adapter *adapter;
+ struct sxe2_security_capabilities sxe2_capabilities[SXE2_SECURITY_PROTOCOL_MAX];
+ struct sxe2_ipsec_ctx ipsec_ctx;
+ rte_spinlock_t security_lock;
+};
+
+int32_t sxe2_security_init(struct rte_eth_dev *dev);
+
+void sxe2_security_uinit(struct rte_eth_dev *dev);
+
+#endif /* __SXE2_SECURITY_H__ */
diff --git a/drivers/net/sxe2/sxe2_tx.c b/drivers/net/sxe2/sxe2_tx.c
index a280edc9c5..f49238ceef 100644
--- a/drivers/net/sxe2/sxe2_tx.c
+++ b/drivers/net/sxe2/sxe2_tx.c
@@ -304,6 +304,11 @@ int32_t __rte_cold sxe2_tx_queue_setup(struct rte_eth_dev *dev,
}
offloads = tx_conf->offloads | dev->data->dev_conf.txmode.offloads;
+ if (!sxe2_ipsec_valid_tx_offloads(offloads)) {
+ ret = -EINVAL;
+ goto end;
+ }
+
txq = sxe2_tx_queue_alloc(dev, queue_idx, nb_desc, socket_id);
if (txq == NULL) {
PMD_LOG_ERR(TX, "failed to alloc sxe2vf tx queue:%u resource", queue_idx);
@@ -327,6 +332,9 @@ int32_t __rte_cold sxe2_tx_queue_setup(struct rte_eth_dev *dev,
txq->ops = sxe2_tx_default_ops_get();
txq->ops.queue_reset(txq);
+ if (sxe2_ipsec_supported(adapter) && (txq->offloads & RTE_ETH_TX_OFFLOAD_SECURITY))
+ txq->ipsec_pkt_md_offset = sxe2_ipsec_pkt_md_offset_get(adapter);
+
dev->data->tx_queues[queue_idx] = txq;
ret = 0;
diff --git a/drivers/net/sxe2/sxe2_txrx_poll.c b/drivers/net/sxe2/sxe2_txrx_poll.c
index 3c6fe37404..8b6e585c36 100644
--- a/drivers/net/sxe2/sxe2_txrx_poll.c
+++ b/drivers/net/sxe2/sxe2_txrx_poll.c
@@ -307,6 +307,25 @@ static __rte_always_inline void sxe2_desc_tso_fill(struct rte_mbuf *tx_pkt,
return;
}
+static __rte_always_inline void sxe2_desc_ipsec_fill(struct rte_mbuf *tx_pkt,
+ struct sxe2_tx_queue *txq, uint16_t *ipsec_offset,
+ uint64_t *desc_type_cmd_tso_mss)
+{
+ struct sxe2_ipsec_pkt_metadata *md = NULL;
+ uint16_t ipsec_pkt_md_offset = txq->ipsec_pkt_md_offset;
+
+ md = RTE_MBUF_DYNFIELD(tx_pkt, ipsec_pkt_md_offset, struct sxe2_ipsec_pkt_metadata *);
+ *ipsec_offset = md->esp_head_offset;
+ *desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_IPSEC_EN;
+ if (md->mode == SXE2_IPSEC_MODE_ONLY_ENCRYPT)
+ *desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_IPSEC_MODE;
+
+ if (md->algo == SXE2_IPSEC_ALGO_SM4_CBC_AND_SM3_96_HMAC)
+ *desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_IPSEC_ENGINE;
+
+ *desc_type_cmd_tso_mss |= (uint64_t)(md->sa_idx) << SXE2_TX_CTXT_DESC_IPSEC_SA_SHIFT;
+}
+
static __rte_always_inline uint64_t
sxe2_tx_data_desc_build_cobt(uint32_t cmd, uint32_t offset, uint16_t buf_size, uint16_t l2tag)
{
@@ -426,6 +445,11 @@ uint16_t sxe2_tx_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkt
else if (offloads & RTE_MBUF_F_TX_IEEE1588_TMST)
desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_CMD_TSYN_MASK;
+ if (offloads & RTE_MBUF_F_TX_SEC_OFFLOAD) {
+ sxe2_desc_ipsec_fill(tx_pkt, txq, &ipsec_offset,
+ &desc_type_cmd_tso_mss);
+ }
+
if (offloads & RTE_MBUF_F_TX_QINQ) {
desc_l2tag2 = tx_pkt->vlan_tci_outer;
desc_type_cmd_tso_mss |= SXE2_TX_CTXT_DESC_CMD_IL2TAG2_MASK;
@@ -786,6 +810,36 @@ static inline void sxe2_rx_desc_ptp_para_fill(struct sxe2_rx_queue *rxq,
rxq->ts_low);
}
}
+
+static inline void sxe2_rx_desc_ipsec_para_fill(struct sxe2_rx_queue *rxq __rte_unused,
+ struct rte_mbuf *mbuf, union sxe2_rx_desc *desc)
+{
+ uint32_t status_lrocnt_fdpf_id = rte_le_to_cpu_32(desc->wb.status_lrocnt_fdpf_id);
+ enum sxe2_rx_desc_ipsec_status ipsec_status;
+
+ if (status_lrocnt_fdpf_id & SXE2_RX_DESC_IPSEC_PKT_MASK) {
+ mbuf->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD;
+ ipsec_status = SXE2_RX_DESC_IPSEC_STATUS_VAL_GET(status_lrocnt_fdpf_id);
+ switch (ipsec_status) {
+ case SXE2_RX_DESC_IPSEC_STATUS_SUCCESS:
+ break;
+ case SXE2_RX_DESC_IPSEC_STATUS_PKG_OVER_2K:
+ case SXE2_RX_DESC_IPSEC_STATUS_SPI_IP_INVALID:
+ case SXE2_RX_DESC_IPSEC_STATUS_SA_INVALID:
+ case SXE2_RX_DESC_IPSEC_STATUS_NOT_ALIGN:
+ case SXE2_RX_DESC_IPSEC_STATUS_ICV_ERROR:
+ case SXE2_RX_DESC_IPSEC_STATUS_BY_PASSH:
+ case SXE2_RX_DESC_IPSEC_STATUS_MAC_BY_PASSH:
+ PMD_LOG_INFO(RX, "IPsec status error:%d", ipsec_status);
+ mbuf->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED;
+ break;
+ default:
+ PMD_LOG_INFO(RX, "Invalid ipsec status:%d", ipsec_status);
+ mbuf->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED;
+ break;
+ }
+ }
+}
#endif
static __rte_always_inline void
@@ -803,6 +857,7 @@ sxe2_rx_mbuf_common_fields_fill(struct sxe2_rx_queue *rxq, struct rte_mbuf *mbuf
sxe2_rx_desc_vlan_para_fill(mbuf, rxd);
sxe2_rx_desc_filter_para_fill(rxq, mbuf, rxd);
#ifndef RTE_LIBRTE_SXE2_16BYTE_RX_DESC
+ sxe2_rx_desc_ipsec_para_fill(rxq, mbuf, rxd);
sxe2_rx_desc_ptp_para_fill(rxq, mbuf, rxd);
#endif
--
2.52.0
* Re: [PATCH 1/8] telemetry: fix thread-unsafe command parsing
From: Bruce Richardson @ 2026-06-08 7:49 UTC
To: Stephen Hemminger; +Cc: dev, stable, Ciara Power, Keith Wiles
In-Reply-To: <20260605205253.520196-2-stephen@networkplumber.org>
On Fri, Jun 05, 2026 at 01:50:58PM -0700, Stephen Hemminger wrote:
> The telemetry client_handler() runs in a detached thread per connection,
> and up to MAX_CONNECTIONS instances can run concurrently.
> The function strtok() keeps parser state in a static variable
> shared across all threads, so concurrent clients corrupt each other's
> command parsing. Use strtok_r() with a local saveptr.
>
> Fixes: 6dd571fd07c3 ("telemetry: introduce new functionality")
> Cc: stable@dpdk.org
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> lib/telemetry/telemetry.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
> index b109d076d4..e591c1e283 100644
> --- a/lib/telemetry/telemetry.c
> +++ b/lib/telemetry/telemetry.c
> @@ -415,8 +415,9 @@ client_handler(void *sock_id)
> int bytes = read(s, buffer, sizeof(buffer) - 1);
> while (bytes > 0) {
> buffer[bytes] = 0;
> - const char *cmd = strtok(buffer, ",");
> - const char *param = strtok(NULL, "\0");
> + char *saveptr = NULL;
> + const char *cmd = strtok_r(buffer, ",", &saveptr);
> + const char *param = strtok_r(NULL, "\0", &saveptr);
> struct cmd_callback cb = {.fn = unknown_command};
> int i;
>
> --
> 2.53.0
>
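
For readers following the thread, here is a minimal standalone sketch of the
reentrant pattern the patch adopts. It is not the telemetry code itself: the
`parsed_cmd` struct and the `parse_command()` helper are hypothetical names
introduced only for illustration. The point it shows is that `strtok_r()`
keeps its parser state in a caller-owned `saveptr`, so each thread (or each
call) has its own state, whereas `strtok()` uses hidden state that is not
safe to share between threads.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for strtok_r() under strict -std=c11 */
#endif

#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: both pointers refer into the caller's buffer. */
struct parsed_cmd {
	const char *cmd;   /* text before the first ',' (NULL if none) */
	const char *param; /* everything after it, or NULL if absent */
};

/*
 * Hypothetical helper, assuming a "command,parameters" message layout.
 * The buffer is modified in place. All parser state lives in the local
 * 'saveptr', so concurrent callers cannot disturb each other.
 */
static struct parsed_cmd parse_command(char *buffer)
{
	struct parsed_cmd out;
	char *saveptr = NULL;

	assert(buffer != NULL);

	out.cmd = strtok_r(buffer, ",", &saveptr);
	/* An empty delimiter set returns the rest of the string as one token. */
	out.param = strtok_r(NULL, "", &saveptr);
	return out;
}
```

Called on a writable copy of "/ethdev/stats,0", the helper yields the command
"/ethdev/stats" and the parameter "0"; for a message with no comma the
parameter is NULL, so callers need to check for that before using it.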
* Re: [PATCH 0/8] telemetry: thread-safe and bounded parameter parsing
From: Bruce Richardson @ 2026-06-08 7:55 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev
In-Reply-To: <20260605205253.520196-1-stephen@networkplumber.org>
On Fri, Jun 05, 2026 at 01:50:57PM -0700, Stephen Hemminger wrote:
> While looking into extending telemetry for other uses, I noticed a
> pattern of unsafe string handling in the command handlers. They run one
> thread per client connection but parse parameters with non-reentrant
> strtok(), and convert ids with atoi()/unchecked strtoul() that silently
> truncate or alias out-of-range values; in eth_rx the strtok()
> continuation chain can also dereference freed memory.
>
> This series covers the library code (telemetry, ethdev, dmadev, security,
> eventdev, eth_rx, timer). A follow-up is needed for the same strtok()
> use in drivers.
>
> They are marked for stable: the races and the use-after-free are real and
> the changes are low-risk to backport. Severity is low since telemetry is
> not a remote interface, but these are the kind of issues likely to
> be found by AI security scanning tools.
>
> In future, atoi() and strtok() look worth adding to the forbidden
> tokens list in devtools/checkpatches.sh.
>
> Stephen Hemminger (8):
> telemetry: fix thread-unsafe command parsing
> ethdev: make telemetry parameter parsing thread-safe
> dmadev: validate telemetry parameters
> security: harden telemetry parameter parsing
> eventdev: remove strtok from telemetry handlers
> eventdev/eth_rx: fix thread-unsafe telemetry parsing
> eventdev/eth_rx: reject out-of-range telemetry adapter ID
> eventdev/timer: reject out-of-range ID
>
Series-Acked-by: Bruce Richardson <bruce.richardson@intel.com>
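The truncation and aliasing problem described in the cover letter can be sketched as follows. This is an illustration under assumptions, not code from the series: `parse_u16_id()` is a hypothetical helper, and the individual patches may accept or reject input differently (for example, trailing parameters). The point is only that the value is range-checked before it is narrowed, whereas a plain `(uint16_t)atoi("65536")` would quietly become 0.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal ID that must fit in 16 bits.
 * Returns 0 on success, -EINVAL for malformed input, -ERANGE if the
 * value does not fit in uint16_t.
 */
int
parse_u16_id(const char *s, uint16_t *out)
{
	char *end = NULL;
	unsigned long v;

	assert(out != NULL);

	/* strtoul() accepts leading spaces and a sign; reject them here. */
	if (s == NULL || !isdigit((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno != 0 || end == s || *end != '\0')
		return -EINVAL;
	if (v > UINT16_MAX)
		return -ERANGE;

	*out = (uint16_t)v;
	return 0;
}
```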
^ permalink raw reply
* Re: [PATCH] eal: add destructor to unregister tailq on unload
From: Bruce Richardson @ 2026-06-08 7:57 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: dev, stable, David Marchand, Neil Horman
In-Reply-To: <20260607150418.30885-1-stephen@networkplumber.org>
On Sun, Jun 07, 2026 at 08:04:17AM -0700, Stephen Hemminger wrote:
> Libraries that use EAL_REGISTER_TAILQ insert a pointer to a static
> struct rte_tailq_elem into the process-local tailq list via a
> constructor, but have no matching destructor. When such a library
> is loaded as a dependency of a plugin via dlopen() and later
> unloaded via dlclose(), the list retains a dangling pointer to the
> now-unmapped static. Reloading the plugin crashes in
> rte_eal_tailq_local_register() when it traverses the stale entry.
>
> Add rte_eal_tailq_unregister() and extend the EAL_REGISTER_TAILQ
> macro to emit an RTE_FINI destructor alongside the existing
> RTE_INIT constructor. Every library that uses the macro
> automatically gets both sides; no per-library changes are needed.
>
> Bugzilla ID: 1081
> Fixes: 873a61c7526b ("tailq: introduce dynamic register system")
> Cc: stable@dpdk.org
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
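The constructor/destructor pairing described in the commit message can be sketched outside of EAL. Everything below is hypothetical (`struct elem`, `registry_add()`, `registry_remove()`, `REGISTER_ELEM`); it uses the GCC/Clang constructor and destructor attributes directly, on the assumption that this is the mechanism behind RTE_INIT and RTE_FINI. The destructor unlinks the static object so the list never keeps a pointer into an unloaded library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list element standing in for struct rte_tailq_elem. */
struct elem {
	const char *name;
	struct elem *next;
};

static struct elem *registry; /* process-local list head */

void
registry_add(struct elem *e)
{
	assert(e != NULL);
	e->next = registry;
	registry = e;
}

/* Unlink e if present; removing an absent element is a no-op. */
void
registry_remove(struct elem *e)
{
	struct elem **pp;

	for (pp = &registry; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			return;
		}
	}
}

int
registry_count(void)
{
	int n = 0;
	const struct elem *e;

	for (e = registry; e != NULL; e = e->next)
		n++;
	return n;
}

/* Emit both sides, so every user of the macro gets the cleanup. */
#define REGISTER_ELEM(e) \
static void __attribute__((constructor)) e##_init(void) { registry_add(&e); } \
static void __attribute__((destructor)) e##_fini(void) { registry_remove(&e); }

static struct elem demo_elem = { .name = "demo" };
REGISTER_ELEM(demo_elem)
```

In a shared object the destructor runs at dlclose() time, which is the case the patch targets; in a plain executable it runs at exit.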
^ permalink raw reply
* Re: [PATCH v15 0/5] Support add/remove memory region and get-max-slots
From: Maxime Coquelin @ 2026-06-08 8:09 UTC (permalink / raw)
To: pravin.bathija; +Cc: dev, fengchengwen, stephen, thomas
In-Reply-To: <CAO55csxM0wB4u4A8c9=C6epEs2dEt3DHxGc+C_EqQqG1aNUQew@mail.gmail.com>
On Fri, Jun 5, 2026 at 3:14 PM Maxime Coquelin
<maxime.coquelin@redhat.com> wrote:
>
> On Fri, Jun 5, 2026 at 1:57 AM <pravin.bathija@dell.com> wrote:
> >
> > From: Pravin M Bathija <pravin.bathija@dell.com>
> >
> > This is version v15 of the patchset and it incorporates the
> > recommendations made by Maxime Coquelin.
> >
> > Patch 4/5
> > - Changed VHOST_USER_REM_MEM_REG handler declaration from
> > accepts_fd=true to accepts_fd=false, as the remove request does not
> > expect FDs in ancillary data.
> > - Removed all close_msg_fds(ctx) calls from vhost_user_rem_mem_reg(), no
> > longer needed since the handler is declared as not accepting FDs.
> > - Removed validate_msg_fds(dev, ctx, 0) check from
> > vhost_user_rem_mem_reg(), as FD validation is now handled generically
> > by the framework.
> > - Added targeted IOTLB cache invalidation in vhost_user_rem_mem_reg()
> > using vhost_user_iotlb_cache_remove() for the removed region's GPA
> > range, instead of the nuclear iotlb_flush_all() used by set_mem_table.
> >
> > This implementation has been extensively tested by doing Read/Write I/O
> > from multiple instances of fio + libblkio (front-end) talking to
> > spdk/dpdk (back-end) based drives. Tested with qemu front-end talking to
> > dpdk testpmd (back-end) performing add/removal of memory regions. Also
> > tested post-copy live migration after doing add_memory_region.
> >
> > Version Log:
> > Version v15 (Current version): Incorporate code review suggestions from
> > Maxime Coquelin as described above.
> >
> > Version v14: Incorporate code review suggestions from Stephen Hemminger
> > and Fengcheng Wen.
> > Changes from Fengcheng Wen review:
> > Patch 3/5
> > - Moved free_all_mem_regions() call sites in vhost_user_set_mem_table()
> > from patch 4/5 to patch 3/5 so each commit compiles independently
> > Patch 4/5
> > - Renamed _dev_invalidate_vrings() to vhost_user_invalidate_vrings() to
> > follow vhost naming convention
> > - Added comment explaining *pdev propagation through
> > translate_ring_addresses / numa_realloc()
> > - Reordered local variables in vhost_user_add_mem_reg() and
> > vhost_user_rem_mem_reg() by descending line length
> > - Shortened overlap check variable names (current_region_guest_start/end
> > -> cur_start/end, proposed_region_guest_start/end -> new_start/end)
> > - Fixed DMA error path in vhost_user_add_mem_reg(): added
> > free_new_region_no_dma label so async_dma_map_region(false) is not
> > called when the map itself failed.
> > Changes from Stephen Hemminger review:
> > Patch 4/5
> > - vhost_user_add_mem_reg() now constructs a reply with the back-end's
> > host mapping address in userspace_addr and returns
> > RTE_VHOST_MSG_RESULT_REPLY per the vhost-user spec
> > - Added validate_msg_fds(dev, ctx, 0) in vhost_user_rem_mem_reg() to
> > reject malformed messages with unexpected file descriptors
> > - Dropped unnecessary (uint64_t) cast in vhost_user_get_max_mem_slots()
> >
> > Version v13: Incorporate code review suggestions from Fengcheng Wen
> > Patch 2/5
> > Renamed VhostUserSingleMemReg to VhostUserMemRegMsg and memory_single
> > to memreg
> > Patches 3/5 and 4/5
> > Relocated function remove_guest_pages from patch 3/5 to 4/5
> >
> > Version v12: Incorporate code review suggestions from Maxime Coquelin
> > and ai-code-review.
> > Patch 3/5
> > Refactored async_dma_map() to delegate to async_dma_map_region(),
> > eliminating code duplication between the two functions.
> > Restored original comments in async_dma_map_region() explaining why
> > ENODEV and EINVAL errors are ignored (these were stripped in v10)
> > Reverted unnecessary changes to vhost_user_postcopy_register() --
> > removed the host_user_addr == 0 checks and reg_msg_index indirection
> > that were added in v10, since this function is only called from
> > vhost_user_set_mem_table() where regions are always contiguous.
> >
> > Version v11: Incorporate code review suggestions from Stephen Hemminger.
> > Patch 4/5
> > Fix incomplete cleanup in vhost_user_add_mem_reg() when
> > vhost_user_mmap_region() fails after the mmap succeeds (e.g.
> > add_guest_pages() realloc failure). The error path now
> > calls remove_guest_pages() and free_mem_region() to undo the mapping
> > and stale guest-page entries, preventing a leaked mmap and slot reuse
> > corruption. The plain close(fd) path is kept for pre-mmap failures.
> >
> > Version v10: Incorporate code review suggestions from Stephen Hemminger.
> > Patch 4/5
> > Moved dev_invalidate_vrings after free_mem_region, array compaction, and
> > nregions decrement. This ensures translate_ring_addresses only sees
> > surviving memory regions, preventing vring pointers from resolving into
> > a region that is about to be unmapped.
> >
> > Version v9: Incorporate code review suggestions from Stephen Hemminger.
> > Patch 3/5
> > Restored max_guest_pages initial value to hardcoded 8 instead of
> > VHOST_MEMORY_MAX_NREGIONS, matching upstream semantics.
> > Patch 4/5
> > Added close(reg->fd) and reg->fd = -1 before goto close_msg_fds in the
> > mmap failure path to fix fd leak after fd was moved from ctx->fds[0].
> > Converted dev_invalidate_vrings from a plain function to a macro +
> > implementation function pair, accepting message ID as a parameter so
> > the static_assert reports the correct handler at each call site.
> > Updated dev_invalidate_vrings call in add_mem_reg to pass
> > VHOST_USER_ADD_MEM_REG as message ID.
> > Updated dev_invalidate_vrings call in rem_mem_reg to pass
> > VHOST_USER_REM_MEM_REG as message ID.
> >
> > Version v8: Incorporate code review suggestions from Stephen Hemminger.
> > rewrite async_dma_map_region function to iterate guest pages by host
> > address range matching
> > change function dev_invalidate_vrings to accept a double pointer to
> > propagate pointer updates
> > new function remove_guest_pages was added
> > add_mem_reg error path was narrowed to only clean up the single failed
> > region instead of destroying all existing regions
> >
> > Version v7: Incorporate code review suggestions from Maxime Coquelin.
> > Add debug messages to vhost_postcopy_register function.
> >
> > Version v6: Added the enablement of this feature as a final patch in
> > this patch-set and other code optimizations as suggested by Maxime
> > Coquelin.
> >
> > Version v5: removed the patch that increased the number of memory regions
> > from 8 to 128. This will be submitted as a separate feature at a later
> > point after incorporating additional optimizations. Also includes code
> > optimizations as suggested by Feng Cheng Wen.
> >
> > Version v4: code optimizations as suggested by Feng Cheng Wen.
> >
> > Version v3: code optimizations as suggested by Maxime Coquelin
> > and Thomas Monjalon.
> >
> > Version v2: code optimizations as suggested by Maxime Coquelin.
> >
> > Version v1: Initial patch set.
> >
> > Pravin M Bathija (5):
> > vhost: add user to mailmap and define to vhost hdr
> > vhost: header defines for add/rem mem region
> > vhost: refactor memory helper functions
> > vhost: add mem region add/remove handlers
> > vhost: enable configure memory slots
> >
> > .mailmap | 1 +
> > lib/vhost/rte_vhost.h | 4 +
> > lib/vhost/vhost_user.c | 425 +++++++++++++++++++++++++++++++++++------
> > lib/vhost/vhost_user.h | 10 +
> > 4 files changed, 378 insertions(+), 62 deletions(-)
> >
> > --
> > 2.43.0
> >
>
> Applied to next-virtio/for-next-net.
Given a new version has been posted and Thomas did not pull it yet,
I dropped this series from my pull request.
>
> Thanks,
> Maxime
^ permalink raw reply
* [PATCH] net/iavf: remove unreachable hash action case
From: Artem Novikov @ 2026-06-04 16:13 UTC (permalink / raw)
To: Vladimir Medvedkin, Qi Zhang, Jeff Guo; +Cc: dev, stable, Artem Novikov
The hash action parser stops iterating when it reaches
RTE_FLOW_ACTION_TYPE_END. The switch below uses action_type copied
directly from action->type inside that loop, so the END case cannot be
selected.
Remove the unreachable case. No functional change.
Fixes: 7be10c3004be ("net/iavf: add RSS configuration for VF")
Cc: stable@dpdk.org
Cc: jia.guo@intel.com
Signed-off-by: Artem Novikov <naa@amicon.ru>
---
drivers/net/intel/iavf/iavf_hash.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/net/intel/iavf/iavf_hash.c b/drivers/net/intel/iavf/iavf_hash.c
index 217f0500ab..7ce7b86d8e 100644
--- a/drivers/net/intel/iavf/iavf_hash.c
+++ b/drivers/net/intel/iavf/iavf_hash.c
@@ -1490,8 +1490,6 @@ iavf_hash_parse_action(struct iavf_pattern_match_item *match_item,
rss_type, pattern_hint);
break;
- case RTE_FLOW_ACTION_TYPE_END:
- break;
default:
rte_flow_error_set(error, EINVAL,
--
2.43.0
^ permalink raw reply related
* RE: [PATCH v3] net/iavf: fix duplicate VF reset during PF reset recovery
From: Loftus, Ciara @ 2026-06-08 9:58 UTC (permalink / raw)
To: Mandal, Anurag, dev@dpdk.org
Cc: Richardson, Bruce, Medvedkin, Vladimir, Mandal, Anurag,
stable@dpdk.org
In-Reply-To: <20260608054433.351880-1-anurag.mandal@intel.com>
> Subject: [PATCH v3] net/iavf: fix duplicate VF reset during PF reset recovery
>
> During PF initiated reset recovery, iavf_dev_close() sends
> an extra VIRTCHNL_OP_RESET_VF while recovery is already in progress.
> That second reset can leave PF/VF virtchnl state inconsistent and
> cause VIRTCHNL_OP_CONFIG_VSI_QUEUES to fail with ERR_PARAM after
> ToR link flap/power-cycle, leaving the VF unable to recover.
> This results in connection loss.
>
> Skipped close-time VF reset and related close-time virtchnl
> operations when PF triggered reset recovery is set. This is
> done to avoid a duplicate VF reset, and keep normal behavior
> for application-driven close.
> Handled link-change events through a common static function that
> reads the correct advanced & legacy link fields properly and
> updates no-poll/watchdog/LSC state consistently.
> Also treated IAVF_ERR_ADMIN_QUEUE_NO_WORK in virtchnl message
> drain as a normal empty-queue condition to avoid logging it as
> a misleading AQ failure.
>
> Fixes: 675a104e2e94 ("net/iavf: fix abnormal disable HW interrupt")
> Fixes: b34fe66ea893 ("net/iavf: delay VF reset command")
> Fixes: 5e03e316c753 ("net/iavf: handle virtchnl event message without
> interrupt")
> Fixes: 5c8ca9f13c78 ("net/iavf: fix no polling mode switching")
> Fixes: 48de41ca11f0 ("net/avf: enable link status update")
> Fixes: 02d212ca3125 ("net/iavf: rename remaining avf strings")
> Cc: stable@dpdk.org
Hi Anurag,
Thanks for the patch. There seem to be multiple logical fixes/changes in
here and I think it would be good to split them into individual patches,
each with their own Fixes tag where relevant. Having multiple fixes in
one patch with multiple Fixes tags makes backporting tricky.
I think at least the logic which prevents the RESET_VF during a PF-initiated
reset should be split out from the link-change logic.
Thanks,
Ciara
>
> Signed-off-by: Anurag Mandal <anurag.mandal@intel.com>
> ---
> V3: Addressed latest ai-code-review comments
> V2: Addressed ai-code-review comments
>
> doc/guides/rel_notes/release_26_07.rst | 3 +
> drivers/net/intel/iavf/iavf_ethdev.c | 37 +++---
> drivers/net/intel/iavf/iavf_vchnl.c | 155 ++++++++++++++++---------
> 3 files changed, 123 insertions(+), 72 deletions(-)
>
> diff --git a/doc/guides/rel_notes/release_26_07.rst
> b/doc/guides/rel_notes/release_26_07.rst
> index b8a3e2ced9..e7ac730369 100644
> --- a/doc/guides/rel_notes/release_26_07.rst
> +++ b/doc/guides/rel_notes/release_26_07.rst
> @@ -89,6 +89,9 @@ New Features
>
> * Added support for transmitting LLDP packets based on mbuf packet type.
> * Implemented AVX2 context descriptor transmit paths.
> + * Prevented duplicate 'VIRTCHNL_OP_RESET_VF' during a PF-initiated
> + reset recovery, which earlier caused virtchnl state corruption
> + and connection loss after a top-of-rack (ToR) link flap/power-cycle.
>
> * **Updated PCAP ethernet driver.**
>
> diff --git a/drivers/net/intel/iavf/iavf_ethdev.c
> b/drivers/net/intel/iavf/iavf_ethdev.c
> index bdf650b822..fb6f287d3c 100644
> --- a/drivers/net/intel/iavf/iavf_ethdev.c
> +++ b/drivers/net/intel/iavf/iavf_ethdev.c
> @@ -3166,24 +3166,27 @@ iavf_dev_close(struct rte_eth_dev *dev)
>
> ret = iavf_dev_stop(dev);
>
> - /*
> - * Release redundant queue resource when close the dev
> - * so that other vfs can re-use the queues.
> - */
> - if (vf->lv_enabled) {
> - ret = iavf_request_queues(dev,
> IAVF_MAX_NUM_QUEUES_DFLT);
> - if (ret)
> - PMD_DRV_LOG(ERR, "Reset the num of queues
> failed");
> + /* Skip RESET_VF on a PF-initiated reset */
> + if (!adapter->closed && !vf->in_reset_recovery) {
adapter->closed will always be false here so the check is redundant.
> + /*
> + * Release redundant queue resource when close the dev
> + * so that other vfs can re-use the queues.
> + */
> + if (vf->lv_enabled) {
> + ret = iavf_request_queues(dev,
> IAVF_MAX_NUM_QUEUES_DFLT);
> + if (ret)
> + PMD_DRV_LOG(ERR, "Reset the num of
> queues failed");
> + vf->max_rss_qregion =
> IAVF_MAX_NUM_QUEUES_DFLT;
> + }
>
^ permalink raw reply
* [PATCH] doc: move firmware instructions in mlx5 guide
From: Thomas Monjalon @ 2026-06-08 12:05 UTC (permalink / raw)
To: dev
Cc: Dariusz Sosnowski, Viacheslav Ovsiienko, Bing Zhao, Ori Kam,
Suanming Mou, Matan Azrad
Placing the firmware update instructions before the firmware configuration
makes them easier to find than in the compilation prerequisites.
A link is also added after listing minimum firmware versions.
Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
---
doc/guides/platform/mlx5.rst | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/doc/guides/platform/mlx5.rst b/doc/guides/platform/mlx5.rst
index 285d58be4f..0a8917530d 100644
--- a/doc/guides/platform/mlx5.rst
+++ b/doc/guides/platform/mlx5.rst
@@ -143,7 +143,7 @@ The following dependencies are not part of DPDK and must be installed separately
- BlueField-2: **24.28.1002** and above.
- BlueField-3: **32.36.3126** and above.
- New features may be added in more recent firmwares.
+ New features may be added in more recent :ref:`firmwares <mlx5_firmware_config>`.
Libraries and kernel modules can be provided either by the Linux distribution,
or by installing NVIDIA MLNX_OFED/EN which provides compatibility with older kernels.
@@ -171,11 +171,6 @@ It is possible to build rdma-core as static libraries starting with version 21::
ninja
ninja install
-The firmware can be updated with `mlxup
-<https://docs.nvidia.com/networking/display/mlxupfwutility>`_.
-The latest firmwares can be downloaded at
-https://network.nvidia.com/support/firmware/firmware-downloads/
-
NVIDIA MLNX_OFED/EN
^^^^^^^^^^^^^^^^^^^
@@ -490,6 +485,11 @@ Additional information can be found in the WinOF2 user manual.
Firmware Configuration
~~~~~~~~~~~~~~~~~~~~~~
+The firmware can be updated with `mlxup
+<https://docs.nvidia.com/networking/display/mlxupfwutility>`_
+after `downloading a new version
+<https://network.nvidia.com/support/firmware/firmware-downloads/>`_.
+
Firmware features can be configured as key/value pairs.
The command to set a value is::
--
2.54.0
^ permalink raw reply related
* [PATCH v7 0/1] net/mana: add device reset support
From: Wei Hu @ 2026-06-08 12:08 UTC (permalink / raw)
To: dev, stephen; +Cc: longli, weh
From: Wei Hu <weh@microsoft.com>
Add support for handling hardware service reset events in the
MANA driver. When the MANA kernel driver receives a hardware
service event, it initiates a device reset and notifies userspace
via IBV_EVENT_DEVICE_FATAL. The MANA PMD handles this by
performing an automatic teardown and recovery sequence.
The driver uses ethdev recovery events (ERR_RECOVERING,
RECOVERY_SUCCESS, RECOVERY_FAILED) to notify upper layers of
the reset lifecycle, and a PCI device removal event callback
to distinguish hot-remove from service reset.
Changes since v6:
- Rebased onto latest upstream for-main
- Replaced removed RTE_ETH_DEV_TO_PCI macro with
RTE_CLASS_TO_BUS_DEVICE (upstream commit 4757b8df04
removed the old bus-specific ethdev convenience macros)
Changes since v5:
- Replaced RCU QSBR with per-queue atomic burst_state using a
single-variable CAS design: bit 0 is the in-burst flag, bits
1+ encode device state. The data path uses CAS(0→1) to enter
burst and fetch_and(~1) to exit. The reset path uses fetch_or
to set state bits and polls bit 0 to drain in-flight bursts.
This eliminates the two-variable Dekker pattern and the need
for sequential consistency (seq_cst) ordering.
- Removed librte_rcu dependency
- Removed __rte_no_thread_safety_analysis annotations (no longer
needed after mutex conversion)
- Moved ERR_RECOVERING event emission before acquiring
reset_ops_lock and before mana_reset_enter, so upper layers
(e.g. netvsc) can switch data path before mana stops queues.
Emitting outside the lock avoids deadlock if the callback
calls dev_stop or dev_close.
- Replaced MANA_OPS_*_LOCK macros with mana_reset_trylock()
helper function and explicit per-operation wrappers
- Removed unused rte_alarm.h and rte_lock_annotations.h includes
- Added RECOVERY_FAILED event when mana_reset_enter fails
internally, so the application always receives a terminal event
- Added mana_clear_burst_state() helper to clear per-queue
burst_state on failure paths (reset_failed, dev_stop_lock,
dev_close_lock) preventing permanent silent packet drop after
a failed reset
Changes since v4:
- Fixed stale rte_spinlock_unlock call in mana_intr_handler that
was missed during the spinlock-to-mutex conversion, causing a
-Wincompatible-pointer-types warning
Changes since v3:
- Converted reset_ops_lock from rte_spinlock_t to pthread_mutex_t
with PTHREAD_PROCESS_SHARED, since the lock is held across
blocking IB verbs calls and IPC with 5s timeout
- Removed rte_dev_event_callback_unregister retry loop to avoid
deadlock: the callback itself blocks on reset_ops_lock, so
retrying on -EAGAIN while holding the lock is a deadlock
- Introduced mana_join_reset_thread() helper using CAS on
reset_thread_active to prevent double-join undefined behavior
- Added reset thread join in mana_dev_uninit to prevent thread
leak on device removal
- Fixed ibv handle leak: priv->ib_ctx is now only set to NULL
after ibv_close_device succeeds
- Fixed misleading "All secondary threads are quiescent" log in
mana_mp_reset_enter — changed to "Secondary doorbell pages
unmapped" since actual quiescence is enforced by the primary's
per-queue atomic flag check before IPC is sent
- Changed event list in mana.rst to RST definition list style
- Squashed documentation into the feature patch per convention
Changes since v2:
- Fixed dev_state_qsv memory leak on device removal
- Fixed reset thread TCB/stack leak: reset_thread_active is now
only cleared by the joiner, not the thread itself
- Fixed second reset crash: removed reset thread join logic from
mana_dev_close (inner function) to avoid corrupting dev_state
when called from mana_reset_enter
- Made reset_thread_active RTE_ATOMIC(bool) with explicit ordering
- Added retry loop for rte_dev_event_callback_unregister on -EAGAIN
- Initialized condvar/mutex with PTHREAD_PROCESS_SHARED since priv
is in hugepage shared memory
- Added re-check of dev_state after lock acquisition in
mana_intr_handler to prevent racing with pci_remove_event_cb
- Replaced (void *)0 with NULL in mp.c
- Added lock ownership comment block at mana_reset_enter
- Documented rte_dev_event_monitor_start() requirement
- Added mana.rst documentation and release note
Changes since v1:
- Removed net/netvsc patch from this series
- Simplified reset exit: mana_reset_exit calls
mana_reset_exit_delay directly instead of spawning a thread
- Added __rte_no_thread_safety_analysis annotations for clang
- Switched to rte_thread_create_internal_control
- Fixed declaration-after-statement style issues
- Removed unnecessary blank lines and stale comments
Wei Hu (1):
net/mana: add device reset support
doc/guides/nics/mana.rst | 39 +
doc/guides/rel_notes/release_26_07.rst | 8 +
drivers/net/mana/mana.c | 1049 ++++++++++++++++++++++--
drivers/net/mana/mana.h | 46 +-
drivers/net/mana/mp.c | 89 +-
drivers/net/mana/mr.c | 6 +-
drivers/net/mana/rx.c | 23 +-
drivers/net/mana/tx.c | 44 +-
8 files changed, 1196 insertions(+), 108 deletions(-)
--
2.34.1
^ permalink raw reply
* [PATCH v7 1/1] net/mana: add device reset support
From: Wei Hu @ 2026-06-08 12:08 UTC (permalink / raw)
To: dev, stephen; +Cc: longli, weh
In-Reply-To: <20260608120824.287050-1-weh@linux.microsoft.com>
From: Wei Hu <weh@microsoft.com>
Add support for handling hardware reset events in the MANA driver.
When the MANA kernel driver receives a hardware service event, it
initiates a device reset and notifies userspace via
IBV_EVENT_DEVICE_FATAL. The DPDK driver handles this by performing
an automatic teardown and recovery sequence.
The reset flow has two phases. In the enter phase, running on the
EAL interrupt thread, the driver transitions the device state,
waits for data path threads to drain using per-queue atomic flags,
stops queues, tears down IB resources, and frees per-queue MR
caches. A control thread is then spawned to handle the exit phase:
it waits for the hardware to recover, unregisters the interrupt
handler, re-probes the PCI device, reinitializes MR caches, and
restarts queues.
Each queue has an atomic burst_state variable where bit 0 is the
in-burst flag and bits 1+ encode device state. The data path uses
a single compare-and-swap (0 to 1) to enter a burst, which fails
immediately if the reset path has set any state bits. The reset
path sets state bits via atomic fetch-or and polls bit 0 to wait
for in-flight bursts to drain. This single-variable design avoids
the need for sequential consistency ordering.
A per-device mutex serializes the reset path with ethdev
operations. The mutex uses PTHREAD_PROCESS_SHARED for multi-process
support and is held across blocking IB verbs calls. A trylock
helper encapsulates the lock acquisition and device state check
for all ethdev operation wrappers. Operations that cannot wait
(configure, queue setup) return -EBUSY during reset, while
dev_stop and dev_close join the reset thread before acquiring
the lock to ensure proper sequencing. A CAS-based helper prevents
double-join of the reset thread.
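A minimal sketch of the locking pattern in the paragraph above, assuming a device structure placed in memory shared between processes. `struct dev`, `dev_lock_init()` and `dev_trylock_active()` are hypothetical names; the trylock helper follows the same shape as mana_reset_trylock() in the diff, while the attribute setup is the standard POSIX way to request a process-shared mutex.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

enum dev_state { DEV_ACTIVE, DEV_RESETTING };

/* Hypothetical device structure; the driver keeps this in hugepage memory. */
struct dev {
	pthread_mutex_t lock;
	_Atomic int state;
};

/* Initialise the mutex as process-shared. Returns 0 or a negative errno. */
int
dev_lock_init(struct dev *d)
{
	pthread_mutexattr_t attr;
	int ret;

	assert(d != NULL);

	ret = pthread_mutexattr_init(&attr);
	if (ret != 0)
		return -ret;
	ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
	if (ret == 0)
		ret = pthread_mutex_init(&d->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	if (ret != 0)
		return -ret;

	atomic_init(&d->state, DEV_ACTIVE);
	return 0;
}

/*
 * Returns 0 with the lock held if the device is active, or -EBUSY if
 * the lock is taken or a reset is in progress (lock not held).
 */
int
dev_trylock_active(struct dev *d)
{
	if (pthread_mutex_trylock(&d->lock) != 0)
		return -EBUSY;
	if (atomic_load_explicit(&d->state,
			memory_order_acquire) != DEV_ACTIVE) {
		pthread_mutex_unlock(&d->lock);
		return -EBUSY;
	}
	return 0;
}
```

Using trylock rather than a blocking lock is what lets operations such as configure return -EBUSY straight away instead of stalling behind a reset that may hold the mutex across slow verbs calls.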
Multi-process support is included: secondary processes unmap and
remap doorbell pages via IPC during the reset enter and exit
phases. Data path functions in both primary and secondary
processes check the device state atomically and return early when
the device is not active.
The driver emits RTE_ETH_EVENT_ERR_RECOVERING before entering the
reset path so that upper layers (e.g. netvsc) can switch their
data path before queues are stopped. The event is emitted outside
the reset lock to avoid deadlock if the callback calls dev_stop or
dev_close. On completion, the driver emits RECOVERY_SUCCESS or
RECOVERY_FAILED. If the enter phase fails internally,
RECOVERY_FAILED is sent immediately so the application receives a
terminal event. A PCI device removal event callback distinguishes
hot-remove from service reset.
Documentation for the device reset feature is added in the MANA
NIC guide and the 26.07 release notes.
Signed-off-by: Wei Hu <weh@microsoft.com>
---
doc/guides/nics/mana.rst | 39 +
doc/guides/rel_notes/release_26_07.rst | 8 +
drivers/net/mana/mana.c | 1049 ++++++++++++++++++++++--
drivers/net/mana/mana.h | 46 +-
drivers/net/mana/mp.c | 89 +-
drivers/net/mana/mr.c | 6 +-
drivers/net/mana/rx.c | 23 +-
drivers/net/mana/tx.c | 44 +-
8 files changed, 1196 insertions(+), 108 deletions(-)
diff --git a/doc/guides/nics/mana.rst b/doc/guides/nics/mana.rst
index 0fcab6e2f6..136adf8808 100644
--- a/doc/guides/nics/mana.rst
+++ b/doc/guides/nics/mana.rst
@@ -71,3 +71,42 @@ The user can specify below argument in devargs.
The default value is not set,
meaning all the NICs will be probed and loaded.
User can specify multiple mac=xx:xx:xx:xx:xx:xx arguments for up to 8 NICs.
+
+Device Reset Support
+--------------------
+
+The MANA PMD supports automatic recovery from hardware service reset events.
+When the MANA kernel driver receives a hardware service event,
+it initiates a device reset and notifies userspace
+via ``IBV_EVENT_DEVICE_FATAL``.
+
+The driver handles this transparently through a two-phase reset flow:
+
+* **Enter phase**: The driver stops the data path,
+ waits for all in-flight burst calls to drain
+ using per-queue atomic flags,
+ tears down IB resources and queues,
+ and unmaps secondary process doorbell pages.
+
+* **Exit phase**: After a delay for hardware recovery,
+ a control thread re-probes the PCI device,
+ reinstalls the interrupt handler,
+ reinitializes resources, and restarts queues.
+
+The driver emits the following ethdev recovery events
+to notify upper layers (e.g. netvsc) of the reset lifecycle:
+
+``RTE_ETH_EVENT_ERR_RECOVERING``
+ Reset has started.
+
+``RTE_ETH_EVENT_RECOVERY_SUCCESS``
+ Device has recovered successfully.
+
+``RTE_ETH_EVENT_RECOVERY_FAILED``
+ Recovery failed.
+
+To distinguish a PCI hot-remove from a service reset,
+the driver registers for PCI device removal events.
+This requires the application to call ``rte_dev_event_monitor_start()``
+for removal events to be delivered
+(e.g. testpmd ``--hot-plug-handling`` option).
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index bd0cec2709..58e8c2422e 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -122,6 +122,14 @@ New Features
Added AGENTS.md file for AI review
and supporting scripts to review patches and documentation.
+* **Added device reset support to the MANA PMD.**
+
+ Added automatic recovery from hardware service reset events
+ in the MANA poll mode driver. The driver uses ethdev recovery events
+ (``RTE_ETH_EVENT_ERR_RECOVERING``, ``RTE_ETH_EVENT_RECOVERY_SUCCESS``,
+ ``RTE_ETH_EVENT_RECOVERY_FAILED``) to notify upper layers of the
+ reset lifecycle.
+
Removed Items
-------------
diff --git a/drivers/net/mana/mana.c b/drivers/net/mana/mana.c
index 67396cda1f..eecbfb8009 100644
--- a/drivers/net/mana/mana.c
+++ b/drivers/net/mana/mana.c
@@ -103,6 +103,8 @@ mana_dev_configure(struct rte_eth_dev *dev)
RTE_ETH_RX_OFFLOAD_VLAN_STRIP);
priv->num_queues = dev->data->nb_rx_queues;
+ DRV_LOG(DEBUG, "priv %p, port %u, dev port %u, num_queues: %u",
+ priv, priv->port_id, priv->dev_port, priv->num_queues);
manadv_set_context_attr(priv->ib_ctx, MANADV_CTX_ATTR_BUF_ALLOCATORS,
(void *)((uintptr_t)&(struct manadv_ctx_allocators){
@@ -214,8 +216,8 @@ mana_dev_start(struct rte_eth_dev *dev)
DRV_LOG(INFO, "TX/RX queues have started");
- /* Enable datapath for secondary processes */
- mana_mp_req_on_rxtx(dev, MANA_MP_REQ_START_RXTX);
+ /* Intentionally ignore errors — secondary may not be running */
+ (void)mana_mp_req_on_rxtx(dev, MANA_MP_REQ_START_RXTX);
ret = rxq_intr_enable(priv);
if (ret) {
@@ -242,26 +244,33 @@ mana_dev_stop(struct rte_eth_dev *dev)
{
int ret;
struct mana_priv *priv = dev->data->dev_private;
-
- rxq_intr_disable(priv);
+ enum mana_device_state state;
+
+ state = rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire);
+ if (state == MANA_DEV_ACTIVE ||
+ state == MANA_DEV_RESET_FAILED) {
+ rxq_intr_disable(priv);
+ DRV_LOG(DEBUG, "rxq_intr_disable called");
+ }
dev->tx_pkt_burst = mana_tx_burst_removed;
dev->rx_pkt_burst = mana_rx_burst_removed;
- /* Stop datapath on secondary processes */
- mana_mp_req_on_rxtx(dev, MANA_MP_REQ_STOP_RXTX);
+ /* Intentionally ignore errors — secondary may not be running */
+ (void)mana_mp_req_on_rxtx(dev, MANA_MP_REQ_STOP_RXTX);
rte_wmb();
ret = mana_stop_tx_queues(dev);
if (ret) {
- DRV_LOG(ERR, "failed to stop tx queues");
+ DRV_LOG(ERR, "failed to stop tx queues, ret %d", ret);
return ret;
}
ret = mana_stop_rx_queues(dev);
if (ret) {
- DRV_LOG(ERR, "failed to stop tx queues");
+ DRV_LOG(ERR, "failed to stop rx queues, ret %d", ret);
return ret;
}
@@ -275,36 +284,66 @@ mana_dev_close(struct rte_eth_dev *dev)
{
struct mana_priv *priv = dev->data->dev_private;
int ret;
+ enum mana_device_state state;
+ DRV_LOG(DEBUG, "Free MR for priv %p", priv);
mana_remove_all_mr(priv);
- ret = mana_intr_uninstall(priv);
- if (ret)
- return ret;
+ state = rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire);
+ if (state == MANA_DEV_ACTIVE ||
+ state == MANA_DEV_RESET_FAILED) {
+ ret = mana_intr_uninstall(priv);
+ if (ret)
+ return ret;
+ }
if (priv->ib_parent_pd) {
- int err = ibv_dealloc_pd(priv->ib_parent_pd);
- if (err)
- DRV_LOG(ERR, "Failed to deallocate parent PD: %d", err);
+ ret = ibv_dealloc_pd(priv->ib_parent_pd);
+ if (ret)
+ DRV_LOG(ERR,
+ "Failed to deallocate parent PD: %d", ret);
priv->ib_parent_pd = NULL;
}
if (priv->ib_pd) {
- int err = ibv_dealloc_pd(priv->ib_pd);
- if (err)
- DRV_LOG(ERR, "Failed to deallocate PD: %d", err);
+ ret = ibv_dealloc_pd(priv->ib_pd);
+ if (ret)
+ DRV_LOG(ERR, "Failed to deallocate PD: %d", ret);
priv->ib_pd = NULL;
}
- ret = ibv_close_device(priv->ib_ctx);
- if (ret) {
- ret = errno;
- return ret;
+ state = rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire);
+ if (state == MANA_DEV_ACTIVE ||
+ state == MANA_DEV_RESET_FAILED) {
+ if (priv->ib_ctx) {
+ ret = ibv_close_device(priv->ib_ctx);
+ if (ret) {
+ ret = errno;
+ return ret;
+ }
+ priv->ib_ctx = NULL;
+ }
}
return 0;
}
+/*
+ * Called from mana_pci_remove to free resources allocated
+ * during probe that are not freed by dev_close.
+ */
+static void
+mana_dev_free_resources(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+
+ pthread_mutex_destroy(&priv->reset_ops_lock);
+ pthread_mutex_destroy(&priv->reset_cond_mutex);
+ pthread_cond_destroy(&priv->reset_cond);
+}
+
static int
mana_dev_info_get(struct rte_eth_dev *dev,
struct rte_eth_dev_info *dev_info)
@@ -391,6 +430,39 @@ mana_dev_info_get(struct rte_eth_dev *dev,
return 0;
}
+/*
+ * Try to acquire the reset lock and verify the device is active.
+ * Returns 0 with lock held on success, or -EBUSY if the lock
+ * could not be acquired or the device is not in ACTIVE state.
+ */
+static int
+mana_reset_trylock(struct mana_priv *priv)
+{
+ if (pthread_mutex_trylock(&priv->reset_ops_lock))
+ return -EBUSY;
+
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) != MANA_DEV_ACTIVE) {
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return -EBUSY;
+ }
+ return 0;
+}
+
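The guard used by the *_lock wrappers can be illustrated outside DPDK. The following is a minimal sketch using plain pthreads and C11 atomics; struct dev_ctx, ctx_trylock() and ctx_set_mtu() are hypothetical names for illustration and are not part of the patch.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the driver's state enum and private data. */
enum dev_state { DEV_ACTIVE = 0, DEV_RESET_ENTER = 1 };

struct dev_ctx {
	pthread_mutex_t ops_lock;
	_Atomic int state;
	int mtu;
};

/* Returns 0 with ops_lock held, or -EBUSY without the lock. */
static int ctx_trylock(struct dev_ctx *c)
{
	if (pthread_mutex_trylock(&c->ops_lock) != 0)
		return -EBUSY;
	if (atomic_load_explicit(&c->state, memory_order_acquire) != DEV_ACTIVE) {
		/* Not active: drop the lock so the caller never owns it on failure. */
		pthread_mutex_unlock(&c->ops_lock);
		return -EBUSY;
	}
	return 0;
}

/* Wrapper in the style of the *_lock ops: guard, call, release. */
static int ctx_set_mtu(struct dev_ctx *c, int mtu)
{
	int ret = ctx_trylock(c);

	if (ret)
		return ret;
	c->mtu = mtu;
	pthread_mutex_unlock(&c->ops_lock);
	return 0;
}
```

The point of the sketch is the invariant: on any non-zero return the caller does not hold the lock, so every wrapper can return early without an unlock path.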
+static int
+mana_dev_info_get_lock(struct rte_eth_dev *dev,
+ struct rte_eth_dev_info *dev_info)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_dev_info_get(dev, dev_info);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
static void
mana_dev_tx_queue_info(struct rte_eth_dev *dev, uint16_t queue_id,
struct rte_eth_txq_info *qinfo)
@@ -552,6 +624,22 @@ mana_dev_tx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
return ret;
}
+static int
+mana_dev_tx_queue_setup_lock(struct rte_eth_dev *dev, uint16_t queue_idx,
+ uint16_t nb_desc, unsigned int socket_id,
+ const struct rte_eth_txconf *tx_conf)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_dev_tx_queue_setup(dev, queue_idx,
+ nb_desc, socket_id, tx_conf);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
static void
mana_dev_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
{
@@ -629,6 +717,23 @@ mana_dev_rx_queue_setup(struct rte_eth_dev *dev, uint16_t queue_idx,
return ret;
}
+static int
+mana_dev_rx_queue_setup_lock(struct rte_eth_dev *dev, uint16_t queue_idx,
+ uint16_t nb_desc, unsigned int socket_id,
+ const struct rte_eth_rxconf *rx_conf __rte_unused,
+ struct rte_mempool *mp)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_dev_rx_queue_setup(dev, queue_idx, nb_desc,
+ socket_id, rx_conf, mp);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
static void
mana_dev_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
{
@@ -820,33 +925,253 @@ mana_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
return mana_ifreq(priv, SIOCSIFMTU, &request);
}
+static int
+mana_dev_configure_lock(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_dev_configure(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_dev_start_lock(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_dev_start(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+/*
+ * Join the reset thread if it is active. Uses CAS on
+ * reset_thread_active to ensure only one caller joins.
+ */
+static void
+mana_join_reset_thread(struct mana_priv *priv)
+{
+ bool expected = true;
+
+ if (rte_atomic_compare_exchange_strong_explicit(
+ &priv->reset_thread_active, &expected, false,
+ rte_memory_order_acq_rel,
+ rte_memory_order_acquire)) {
+ pthread_mutex_lock(&priv->reset_cond_mutex);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_ACTIVE, rte_memory_order_release);
+ pthread_cond_signal(&priv->reset_cond);
+ pthread_mutex_unlock(&priv->reset_cond_mutex);
+ rte_thread_join(priv->reset_thread, NULL);
+ }
+}
+
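The single-joiner idea in mana_join_reset_thread() can be shown in isolation. This is a rough sketch with C11 atomics and pthreads rather than the rte_* wrappers; struct worker and its helpers are hypothetical, and the joins counter exists only to make the behavior observable.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct worker {
	pthread_t thread;
	atomic_bool active;   /* true while the thread is started and not yet joined */
	int joins;            /* counts successful joins, for illustration only */
};

static void *worker_main(void *arg)
{
	(void)arg;
	return NULL;
}

static int worker_start(struct worker *w)
{
	int ret = pthread_create(&w->thread, NULL, worker_main, NULL);

	if (ret == 0)
		atomic_store_explicit(&w->active, true, memory_order_release);
	return ret;
}

/* Only the caller that flips active from true to false performs the join. */
static bool worker_join_once(struct worker *w)
{
	bool expected = true;

	if (!atomic_compare_exchange_strong_explicit(&w->active, &expected, false,
			memory_order_acq_rel, memory_order_acquire))
		return false;
	pthread_join(w->thread, NULL);
	w->joins++;
	return true;
}
```

Because the flag changes before the join, a second caller sees false and returns immediately instead of joining the same thread twice, which would be undefined behavior with pthread_join().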
+/*
+ * Clear per-queue burst_state so the data path CAS can succeed again.
+ * Must be called under reset_ops_lock when transitioning back to ACTIVE
+ * after a failed or aborted reset.
+ */
+static void
+mana_clear_burst_state(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int i;
+
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ if (rxq)
+ rte_atomic_store_explicit(&rxq->burst_state, 0,
+ rte_memory_order_release);
+ if (txq)
+ rte_atomic_store_explicit(&txq->burst_state, 0,
+ rte_memory_order_release);
+ }
+}
+
+/*
+ * Custom lock wrappers for dev_stop and dev_close.
+ * These join any active reset thread and use a blocking lock (not
+ * trylock) so they wait for any in-progress reset processing to
+ * finish, rather than returning -EBUSY. When the device is not in
+ * MANA_DEV_ACTIVE state, they transition state to MANA_DEV_ACTIVE.
+ */
+static int
+mana_dev_stop_lock(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ mana_join_reset_thread(priv);
+
+ pthread_mutex_lock(&priv->reset_ops_lock);
+
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) != MANA_DEV_ACTIVE) {
+ mana_clear_burst_state(dev);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_ACTIVE, rte_memory_order_release);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return 0;
+ }
+
+ ret = mana_dev_stop(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_dev_close_lock(struct rte_eth_dev *dev)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ mana_join_reset_thread(priv);
+
+ pthread_mutex_lock(&priv->reset_ops_lock);
+
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) != MANA_DEV_ACTIVE) {
+ mana_clear_burst_state(dev);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_ACTIVE, rte_memory_order_release);
+ }
+
+ ret = mana_dev_close(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_rss_hash_update_lock(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_rss_hash_update(dev, rss_conf);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_rss_hash_conf_get_lock(struct rte_eth_dev *dev,
+ struct rte_eth_rss_conf *rss_conf)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_rss_hash_conf_get(dev, rss_conf);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static void
+mana_dev_tx_queue_release_lock(struct rte_eth_dev *dev, uint16_t qid)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+
+ if (mana_reset_trylock(priv)) {
+ DRV_LOG(ERR, "Device reset in progress, "
+ "mana_dev_tx_queue_release not called");
+ return;
+ }
+ mana_dev_tx_queue_release(dev, qid);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+}
+
+static void
+mana_dev_rx_queue_release_lock(struct rte_eth_dev *dev, uint16_t qid)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+
+ if (mana_reset_trylock(priv)) {
+ DRV_LOG(ERR, "Device reset in progress, "
+ "mana_dev_rx_queue_release not called");
+ return;
+ }
+ mana_dev_rx_queue_release(dev, qid);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+}
+
+static int
+mana_rx_intr_enable_lock(struct rte_eth_dev *dev, uint16_t rx_queue_id)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_rx_intr_enable(dev, rx_queue_id);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_rx_intr_disable_lock(struct rte_eth_dev *dev, uint16_t rx_queue_id)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_rx_intr_disable(dev, rx_queue_id);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
+static int
+mana_mtu_set_lock(struct rte_eth_dev *dev, uint16_t mtu)
+{
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ if (mana_reset_trylock(priv))
+ return -EBUSY;
+ ret = mana_mtu_set(dev, mtu);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+}
+
static const struct eth_dev_ops mana_dev_ops = {
- .dev_configure = mana_dev_configure,
- .dev_start = mana_dev_start,
- .dev_stop = mana_dev_stop,
- .dev_close = mana_dev_close,
- .dev_infos_get = mana_dev_info_get,
+ .dev_configure = mana_dev_configure_lock,
+ .dev_start = mana_dev_start_lock,
+ .dev_stop = mana_dev_stop_lock,
+ .dev_close = mana_dev_close_lock,
+ .dev_infos_get = mana_dev_info_get_lock,
.txq_info_get = mana_dev_tx_queue_info,
.rxq_info_get = mana_dev_rx_queue_info,
.dev_supported_ptypes_get = mana_supported_ptypes,
- .rss_hash_update = mana_rss_hash_update,
- .rss_hash_conf_get = mana_rss_hash_conf_get,
- .tx_queue_setup = mana_dev_tx_queue_setup,
- .tx_queue_release = mana_dev_tx_queue_release,
- .rx_queue_setup = mana_dev_rx_queue_setup,
- .rx_queue_release = mana_dev_rx_queue_release,
- .rx_queue_intr_enable = mana_rx_intr_enable,
- .rx_queue_intr_disable = mana_rx_intr_disable,
+ .rss_hash_update = mana_rss_hash_update_lock,
+ .rss_hash_conf_get = mana_rss_hash_conf_get_lock,
+ .tx_queue_setup = mana_dev_tx_queue_setup_lock,
+ .tx_queue_release = mana_dev_tx_queue_release_lock,
+ .rx_queue_setup = mana_dev_rx_queue_setup_lock,
+ .rx_queue_release = mana_dev_rx_queue_release_lock,
+ .rx_queue_intr_enable = mana_rx_intr_enable_lock,
+ .rx_queue_intr_disable = mana_rx_intr_disable_lock,
.link_update = mana_dev_link_update,
.stats_get = mana_dev_stats_get,
.stats_reset = mana_dev_stats_reset,
- .mtu_set = mana_mtu_set,
+ .mtu_set = mana_mtu_set_lock,
};
static const struct eth_dev_ops mana_dev_secondary_ops = {
.stats_get = mana_dev_stats_get,
.stats_reset = mana_dev_stats_reset,
- .dev_infos_get = mana_dev_info_get,
+ .dev_infos_get = mana_dev_info_get_lock,
};
uint16_t
@@ -1031,28 +1356,490 @@ mana_ibv_device_to_pci_addr(const struct ibv_device *device,
return 0;
}
+static int mana_pci_probe(struct rte_pci_driver *pci_drv,
+ struct rte_pci_device *pci_dev);
+static void mana_intr_handler(void *arg);
+static void mana_reset_exit(struct mana_priv *priv);
+
+/* Delay before initiating reset exit after reset enter completes */
+#define MANA_RESET_TIMER_US (15 * 1000000ULL) /* 15 seconds */
+
/*
- * Interrupt handler from IB layer to notify this device is being removed.
+ * Callback for PCI device removal events from EAL.
+ * If the device is in reset (RESET_EXIT state), this means the PCI
+ * device was hot-removed rather than a service reset. Wake the reset
+ * thread via condvar and notify netvsc via RTE_ETH_EVENT_INTR_RMV.
+ */
+static void
+mana_pci_remove_event_cb(const char *device_name,
+ enum rte_dev_event_type event, void *cb_arg)
+{
+ struct mana_priv *priv = cb_arg;
+ struct rte_eth_dev *dev;
+
+ if (event != RTE_DEV_EVENT_REMOVE)
+ return;
+
+ DRV_LOG(INFO, "PCI device %s removed", device_name);
+
+ /* Wake the reset thread immediately */
+ pthread_mutex_lock(&priv->reset_cond_mutex);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_RESET_FAILED, rte_memory_order_release);
+ pthread_cond_signal(&priv->reset_cond);
+ pthread_mutex_unlock(&priv->reset_cond_mutex);
+
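+ /* Lock/unlock acts as a barrier: wait for any in-progress
+ * holder of reset_ops_lock to finish before notifying.
+ */
+ pthread_mutex_lock(&priv->reset_ops_lock);
+ pthread_mutex_unlock(&priv->reset_ops_lock);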
+
+ dev = &rte_eth_devices[priv->port_id];
+ DRV_LOG(INFO, "Sending RTE_ETH_EVENT_INTR_RMV for port %u",
+ priv->port_id);
+ rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_INTR_RMV, NULL);
+}
+
+/*
+ * Reset thread: sleeps for the reset timer period, then performs
+ * the reset exit sequence. Runs on a control thread so it can call
+ * rte_intr_callback_unregister (which fails from alarm/intr thread).
+ */
+static uint32_t
+mana_reset_thread(void *arg)
+{
+ struct mana_priv *priv = (struct mana_priv *)arg;
+ struct timespec ts;
+
+ DRV_LOG(INFO, "Reset thread started, waiting %us",
+ (unsigned int)(MANA_RESET_TIMER_US / 1000000));
+
+ /* Wait on condvar with timeout — can be woken early by PCI remove */
+ clock_gettime(CLOCK_REALTIME, &ts);
+ ts.tv_sec += MANA_RESET_TIMER_US / 1000000;
+
+ pthread_mutex_lock(&priv->reset_cond_mutex);
+ pthread_cond_timedwait(&priv->reset_cond, &priv->reset_cond_mutex, &ts);
+ pthread_mutex_unlock(&priv->reset_cond_mutex);
+
+ pthread_mutex_lock(&priv->reset_ops_lock);
+
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) != MANA_DEV_RESET_EXIT) {
+ DRV_LOG(INFO, "Reset thread: dev_state=%d, skipping",
+ (int)rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire));
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return 0;
+ }
+
+ DRV_LOG(INFO, "Reset thread: initiating reset exit");
+ mana_reset_exit(priv);
+ /* Lock is released by mana_reset_exit_delay at the end of
+ * the reset exit processing.
+ *
+ * reset_thread_active is NOT cleared here — the joiner
+ * (dev_stop_lock/dev_close_lock) is responsible for joining
+ * and clearing the flag to avoid leaking the thread.
+ */
+ return 0;
+}
+
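mana_reset_thread() calls pthread_cond_timedwait() once and relies on the dev_state check that follows. POSIX allows the wait to return spuriously, so a more defensive variant loops on an explicit predicate. The sketch below shows that variant under stated assumptions: struct waiter, the woken flag and both helper names are hypothetical and do not appear in the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

struct waiter {
	pthread_mutex_t mutex;
	pthread_cond_t cond;
	bool woken;            /* predicate, protected by mutex */
};

/* Returns true if woken explicitly, false on timeout or wait error. */
static bool wait_woken_or_timeout(struct waiter *w, long timeout_ms)
{
	struct timespec ts;
	bool woken;
	int rc = 0;

	/* Absolute deadline on CLOCK_REALTIME, the default condvar clock. */
	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += timeout_ms / 1000;
	ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (ts.tv_nsec >= 1000000000L) {
		ts.tv_sec += 1;
		ts.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&w->mutex);
	/* Re-check the predicate after every return: wakeups may be spurious. */
	while (!w->woken && rc == 0)
		rc = pthread_cond_timedwait(&w->cond, &w->mutex, &ts);
	woken = w->woken;
	pthread_mutex_unlock(&w->mutex);
	return woken;
}

/* Set the predicate and signal while holding the same mutex. */
static void wake(struct waiter *w)
{
	pthread_mutex_lock(&w->mutex);
	w->woken = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->mutex);
}
```

With a loop like this, an early return from the wait cannot shorten the recovery delay unless something actually requested the wakeup.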
+static void
+mana_reset_enter(struct mana_priv *priv)
+{
+ int ret;
+ int i;
+ struct rte_eth_dev *dev = &rte_eth_devices[priv->port_id];
+
+ /*
+ * Lock ownership for reset_ops_lock through the reset path:
+ *
+ * mana_intr_handler — acquires the lock
+ * mana_reset_enter — called with lock held, releases it
+ * after spawning the reset thread
+ * mana_reset_thread — re-acquires the lock after the
+ * recovery delay
+ * mana_reset_exit — called with lock held, passes it
+ * to mana_reset_exit_delay
+ * mana_reset_exit_delay — called with lock held, releases it
+ * on completion
+ */
+
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_ENTER,
+ rte_memory_order_release);
+
+ DRV_LOG(DEBUG, "Entering into device reset state");
+ DRV_LOG(DEBUG, "Resetting dev = %p, priv = %p", dev, priv);
+
+ /* Set state bits on each queue's burst_state so new bursts are
+ * rejected, then wait for any in-flight burst (bit 0) to finish.
+ */
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ if (rxq)
+ rte_atomic_fetch_or_explicit(&rxq->burst_state,
+ (uint32_t)(MANA_DEV_RESET_ENTER << 1),
+ rte_memory_order_release);
+ if (txq)
+ rte_atomic_fetch_or_explicit(&txq->burst_state,
+ (uint32_t)(MANA_DEV_RESET_ENTER << 1),
+ rte_memory_order_release);
+ }
+
+ /* Wait for all in-flight burst calls to finish (bit 0 to clear) */
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ if (rxq)
+ while (rte_atomic_load_explicit(&rxq->burst_state,
+ rte_memory_order_acquire) & 1)
+ rte_pause();
+ if (txq)
+ while (rte_atomic_load_explicit(&txq->burst_state,
+ rte_memory_order_acquire) & 1)
+ rte_pause();
+ }
+
+ DRV_LOG(DEBUG, "All data path threads drained");
+
+ /* Stop data path on primary and secondary before unmapping doorbell */
+ ret = mana_dev_stop(dev);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to stop mana dev ret %d", ret);
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+ goto reset_failed;
+ }
+
+ /* Unmap secondary doorbell pages after data path is stopped */
+ ret = mana_mp_req_on_rxtx(dev, MANA_MP_REQ_RESET_ENTER);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to reset secondary processes ret = %d",
+ ret);
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+ goto reset_failed;
+ }
+
+ ret = mana_dev_close(dev);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to close mana dev ret %d", ret);
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+ goto reset_failed;
+ }
+
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ DRV_LOG(DEBUG, "Free MR for priv = %p, queue %d", priv, i);
+ if (rxq)
+ mana_mr_btree_free(&rxq->mr_btree);
+ if (txq)
+ mana_mr_btree_free(&txq->mr_btree);
+ }
+
+ DRV_LOG(DEBUG, "Reset enter processing completed successfully");
+
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_EXIT,
+ rte_memory_order_release);
+
+ /* Join previous reset thread if it completed but was not joined.
+ * Use CAS to avoid double-join if another path joined first.
+ * Don't use mana_join_reset_thread() here: dev_state is already
+ * RESET_EXIT and must not be changed to ACTIVE.
+ */
+ {
+ bool expected = true;
+
+ if (rte_atomic_compare_exchange_strong_explicit(
+ &priv->reset_thread_active, &expected, false,
+ rte_memory_order_acq_rel,
+ rte_memory_order_acquire))
+ rte_thread_join(priv->reset_thread, NULL);
+ }
+
+ ret = rte_thread_create_internal_control(&priv->reset_thread,
+ "mana-reset",
+ mana_reset_thread, priv);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to create reset thread ret %d", ret);
+ rte_atomic_store_explicit(&priv->dev_state,
+ MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+ goto reset_failed;
+ }
+ rte_atomic_store_explicit(&priv->reset_thread_active,
+ true, rte_memory_order_release);
+
+ DRV_LOG(DEBUG, "Reset thread started");
+
+ /* Release the lock so the application can call dev_stop/dev_close */
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return;
+
+reset_failed:
+ mana_clear_burst_state(dev);
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+}
+
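The burst_state handshake in mana_reset_enter() is easier to follow with the data-path side next to it. That side is not part of this hunk, so the sketch below assumes one plausible shape: a burst enters with a compare-and-swap from 0 to 1 and clears bit 0 on exit. All names here are hypothetical; only the bit layout (bit 0 in flight, state shifted left by one) is taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define BURST_IN_FLIGHT   1u   /* bit 0: a burst call is running */
#define RESET_ENTER_STATE 1u   /* mirrors MANA_DEV_RESET_ENTER = 1 */

struct queue {
	_Atomic uint32_t burst_state;
};

/* Data path (assumed): enter only if no burst runs and no state bits are set. */
static bool burst_begin(struct queue *q)
{
	uint32_t expected = 0;

	return atomic_compare_exchange_strong_explicit(&q->burst_state,
			&expected, BURST_IN_FLIGHT,
			memory_order_acquire, memory_order_relaxed);
}

/* Data path (assumed): clear bit 0, keep state bits set in the meantime. */
static void burst_end(struct queue *q)
{
	atomic_fetch_and_explicit(&q->burst_state, ~BURST_IN_FLIGHT,
			memory_order_release);
}

/* Control path: set the state bits so that new bursts are rejected. */
static void queue_block(struct queue *q)
{
	atomic_fetch_or_explicit(&q->burst_state, RESET_ENTER_STATE << 1,
			memory_order_release);
}

/* Control path: true once no burst is in flight (bit 0 clear). */
static bool queue_drained(struct queue *q)
{
	return !(atomic_load_explicit(&q->burst_state, memory_order_acquire)
			& BURST_IN_FLIGHT);
}
```

Both sides use read-modify-write operations on the same word, so a burst either wins the race and is seen as in flight by the control path, or loses it and is rejected. This is why the state bits stay set until something stores 0 again, as mana_clear_burst_state() does.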
+static uint32_t
+mana_reset_exit_delay(void *arg)
+{
+ struct mana_priv *priv = (struct mana_priv *)arg;
+ uint32_t ret = 0;
+ int i;
+ struct rte_eth_dev *dev;
+ struct rte_pci_device *pci_dev;
+
+ DRV_LOG(DEBUG, "Delayed mana device reset complete processing");
+
+ /* If the app called dev_stop/dev_close during the timer window,
+ * state is no longer RESET_EXIT. Nothing to do.
+ */
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) != MANA_DEV_RESET_EXIT) {
+ DRV_LOG(DEBUG, "State is not RESET_EXIT, skipping");
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+ return ret;
+ }
+
+ dev = &rte_eth_devices[priv->port_id];
+ pci_dev = RTE_CLASS_TO_BUS_DEVICE(dev, *pci_dev);
+
+ DRV_LOG(DEBUG, "Resetting dev = %p, priv = %p", dev, priv);
+
+ ret = ibv_close_device(priv->ib_ctx);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to close ibv device %d", ret);
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+ goto out;
+ }
+ priv->ib_ctx = NULL;
+
+ ret = mana_pci_probe(NULL, pci_dev);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to probe mana pci dev ret %d", ret);
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+ goto out;
+ }
+
+ /*
+ * Init the local MR caches.
+ */
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ ret = mana_mr_btree_init(&rxq->mr_btree,
+ MANA_MR_BTREE_PER_QUEUE_N,
+ rxq->socket);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to init RXQ %d MR btree "
+ "on socket %u, ret %d", i, rxq->socket, ret);
+ goto mr_init_failed_rxq;
+ }
+
+ ret = mana_mr_btree_init(&txq->mr_btree,
+ MANA_MR_BTREE_PER_QUEUE_N,
+ txq->socket);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to init TXQ %d MR btree "
+ "on socket %u, ret %d", i, txq->socket, ret);
+ goto mr_init_failed_txq;
+ }
+ }
+ DRV_LOG(DEBUG, "priv %p, num_queues %u", priv, priv->num_queues);
+
+ /* Start secondaries */
+ ret = mana_mp_req_on_rxtx(dev, MANA_MP_REQ_RESET_EXIT);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to start secondary processes ret = %d",
+ ret);
+ goto mr_init_failed_all;
+ }
+
+ ret = mana_dev_start(dev);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to start mana dev ret %d", ret);
+ goto mr_init_failed_all;
+ }
+
+ /* Clear per-queue burst_state before marking device active so
+ * data path CAS can succeed again.
+ */
+ for (i = 0; i < priv->num_queues; i++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[i];
+ struct mana_txq *txq = dev->data->tx_queues[i];
+
+ if (rxq)
+ rte_atomic_store_explicit(&rxq->burst_state, 0,
+ rte_memory_order_release);
+ if (txq)
+ rte_atomic_store_explicit(&txq->burst_state, 0,
+ rte_memory_order_release);
+ }
+
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_ACTIVE,
+ rte_memory_order_release);
+
+ DRV_LOG(DEBUG, "Exiting the reset complete processing");
+ goto out;
+
+mr_init_failed_all:
+ i = priv->num_queues;
+ goto mr_init_failed_rxq;
+
+mr_init_failed_txq:
+ /* RXQ btree at index i was initialized, free it */
+ mana_mr_btree_free(&((struct mana_rxq *)
+ dev->data->rx_queues[i])->mr_btree);
+
+mr_init_failed_rxq:
+ /* Free all fully initialized btrees for indices < i */
+ for (int j = 0; j < i; j++) {
+ struct mana_rxq *rxq = dev->data->rx_queues[j];
+ struct mana_txq *txq = dev->data->tx_queues[j];
+
+ mana_mr_btree_free(&rxq->mr_btree);
+ mana_mr_btree_free(&txq->mr_btree);
+ }
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_RESET_FAILED,
+ rte_memory_order_release);
+
+out:
+ pthread_mutex_unlock(&priv->reset_ops_lock);
+
+ if (!ret) {
+ DRV_LOG(INFO, "Sending RTE_ETH_EVENT_RECOVERY_SUCCESS for port %u",
+ priv->port_id);
+ rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_RECOVERY_SUCCESS, NULL);
+ } else {
+ DRV_LOG(INFO, "Sending RTE_ETH_EVENT_RECOVERY_FAILED for port %u",
+ priv->port_id);
+ rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_RECOVERY_FAILED, NULL);
+ }
+ return ret;
+}
+
+static void
+mana_reset_exit(struct mana_priv *priv)
+{
+ int ret;
+
+ if (!priv) {
+ DRV_LOG(ERR, "Private structure invalid");
+ return;
+ }
+ DRV_LOG(DEBUG, "Entering into device reset complete processing");
+
+ rxq_intr_disable(priv);
+
+ /* Unregister the interrupt handler. Since mana_reset_exit is always
+ * called from mana_reset_thread (a non-interrupt thread), the
+ * interrupt source is inactive and rte_intr_callback_unregister
+ * succeeds directly.
+ */
+ if (priv->intr_handle) {
+ ret = rte_intr_callback_unregister(priv->intr_handle,
+ mana_intr_handler, priv);
+ if (ret < 0)
+ DRV_LOG(ERR, "Failed to unregister intr callback ret %d",
+ ret);
+ else
+ DRV_LOG(DEBUG, "%d intr callback(s) removed", ret);
+
+ rte_intr_instance_free(priv->intr_handle);
+ priv->intr_handle = NULL;
+ }
+
+ /* Proceed directly to reset exit delay (re-probe and restart).
+ * No need for a separate thread - we are already on
+ * mana_reset_thread which is a non-interrupt control thread.
+ */
+ mana_reset_exit_delay(priv);
+}
+
+/*
+ * Interrupt handler from IB layer to notify this device is
+ * being removed or reset.
*/
static void
mana_intr_handler(void *arg)
{
struct mana_priv *priv = arg;
struct ibv_context *ctx = priv->ib_ctx;
- struct ibv_async_event event;
+ struct ibv_async_event event = { 0 };
+ struct rte_eth_dev *dev;
/* Read and ack all messages from IB device */
while (true) {
if (ibv_get_async_event(ctx, &event))
break;
- if (event.event_type == IBV_EVENT_DEVICE_FATAL) {
- struct rte_eth_dev *dev;
-
- dev = &rte_eth_devices[priv->port_id];
- if (dev->data->dev_conf.intr_conf.rmv)
+ switch (event.event_type) {
+ case IBV_EVENT_DEVICE_FATAL:
+ DRV_LOG(INFO, "IBV_EVENT_DEVICE_FATAL received, dev_state=%d",
+ (int)rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire));
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) == MANA_DEV_ACTIVE) {
+ /* Notify upper layers (e.g. netvsc) before
+ * acquiring the lock so they can switch data
+ * path before mana stops queues. Emitting
+ * outside the lock avoids deadlock if the
+ * callback calls dev_stop/dev_close.
+ */
+ dev = &rte_eth_devices[priv->port_id];
+ DRV_LOG(INFO,
+ "Sending RTE_ETH_EVENT_ERR_RECOVERING for port %u",
+ priv->port_id);
rte_eth_dev_callback_process(dev,
- RTE_ETH_EVENT_INTR_RMV, NULL);
+ RTE_ETH_EVENT_ERR_RECOVERING,
+ NULL);
+
+ pthread_mutex_lock(&priv->reset_ops_lock);
+
+ /* Re-check after lock to avoid racing with
+ * mana_pci_remove_event_cb which may have
+ * set RESET_FAILED while we waited.
+ */
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) !=
+ MANA_DEV_ACTIVE) {
+ pthread_mutex_unlock(
+ &priv->reset_ops_lock);
+ break;
+ }
+
+ mana_reset_enter(priv);
+
+ if (rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire) ==
+ MANA_DEV_RESET_FAILED) {
+ DRV_LOG(INFO,
+ "Sending RTE_ETH_EVENT_RECOVERY_FAILED for port %u",
+ priv->port_id);
+ rte_eth_dev_callback_process(dev,
+ RTE_ETH_EVENT_RECOVERY_FAILED,
+ NULL);
+ }
+ } else {
+ DRV_LOG(ERR, "Already in reset handling, dev_state=%d",
+ (int)rte_atomic_load_explicit(&priv->dev_state,
+ rte_memory_order_acquire));
+ }
+ break;
+
+ default:
+ break;
}
ibv_ack_async_event(&event);
@@ -1063,6 +1850,23 @@ static int
mana_intr_uninstall(struct mana_priv *priv)
{
int ret;
+ struct rte_eth_dev *dev;
+
+ if (!priv->intr_handle)
+ return 0;
+
+ /* Unregister PCI device removal event callback.
+ * Do not retry on -EAGAIN to avoid deadlock: the callback
+ * may be blocked waiting for reset_ops_lock which we hold.
+ */
+ dev = &rte_eth_devices[priv->port_id];
+ if (dev->device) {
+ ret = rte_dev_event_callback_unregister(dev->device->name,
+ mana_pci_remove_event_cb, priv);
+ if (ret < 0 && ret != -ENOENT)
+ DRV_LOG(WARNING, "Failed to unregister PCI remove cb ret %d",
+ ret);
+ }
ret = rte_intr_callback_unregister(priv->intr_handle,
mana_intr_handler, priv);
@@ -1072,6 +1876,7 @@ mana_intr_uninstall(struct mana_priv *priv)
}
rte_intr_instance_free(priv->intr_handle);
+ priv->intr_handle = NULL;
return 0;
}
@@ -1127,6 +1932,16 @@ mana_intr_install(struct rte_eth_dev *eth_dev, struct mana_priv *priv)
goto free_intr;
}
+ /* Register for PCI device removal events to distinguish
+ * PCI hot-remove from service reset. This requires the
+ * application to call rte_dev_event_monitor_start() for
+ * events to be delivered (e.g. testpmd --hot-plug-handling).
+ */
+ ret = rte_dev_event_callback_register(eth_dev->device->name,
+ mana_pci_remove_event_cb, priv);
+ if (ret)
+ DRV_LOG(WARNING, "Failed to register PCI remove event callback, ret %d", ret);
+
eth_dev->intr_handle = priv->intr_handle;
return 0;
@@ -1156,7 +1971,7 @@ mana_proc_priv_init(struct rte_eth_dev *dev)
/*
* Map the doorbell page for the secondary process through IB device handle.
*/
-static int
+int
mana_map_doorbell_secondary(struct rte_eth_dev *eth_dev, int fd)
{
struct mana_process_priv *priv = eth_dev->process_private;
@@ -1294,17 +2109,29 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
char name[RTE_ETH_NAME_MAX_LEN];
int ret;
struct ibv_context *ctx = NULL;
+ bool is_reset = false;
+ pthread_mutexattr_t mattr;
+ pthread_condattr_t cattr;
rte_ether_format_addr(address, sizeof(address), addr);
- DRV_LOG(INFO, "device located port %u address %s", port, address);
- priv = rte_zmalloc_socket(NULL, sizeof(*priv), RTE_CACHE_LINE_SIZE,
- SOCKET_ID_ANY);
- if (!priv)
- return -ENOMEM;
+ DRV_LOG(DEBUG, "device located port %u address %s", port, address);
snprintf(name, sizeof(name), "%s_port%d", pci_dev->device.name, port);
+ eth_dev = rte_eth_dev_allocated(name);
+ if (eth_dev) {
+ is_reset = true;
+ priv = eth_dev->data->dev_private;
+ DRV_LOG(DEBUG, "Device reset for eth_dev %p priv %p",
+ eth_dev, priv);
+ } else {
+ priv = rte_zmalloc_socket(NULL, sizeof(*priv), RTE_CACHE_LINE_SIZE,
+ SOCKET_ID_ANY);
+ if (!priv)
+ return -ENOMEM;
+ }
+
if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
int fd;
@@ -1317,6 +2144,7 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
eth_dev->device = &pci_dev->device;
eth_dev->dev_ops = &mana_dev_secondary_ops;
+
ret = mana_proc_priv_init(eth_dev);
if (ret)
goto failed;
@@ -1336,7 +2164,7 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
goto failed;
}
- /* fd is no not used after mapping doorbell */
+ /* fd is not used after mapping doorbell */
close(fd);
eth_dev->tx_pkt_burst = mana_tx_burst;
@@ -1355,22 +2183,6 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
goto failed;
}
- eth_dev = rte_eth_dev_allocate(name);
- if (!eth_dev) {
- ret = -ENOMEM;
- goto failed;
- }
-
- eth_dev->data->mac_addrs =
- rte_calloc("mana_mac", 1,
- sizeof(struct rte_ether_addr), 0);
- if (!eth_dev->data->mac_addrs) {
- ret = -ENOMEM;
- goto failed;
- }
-
- rte_ether_addr_copy(addr, eth_dev->data->mac_addrs);
-
priv->ib_pd = ibv_alloc_pd(ctx);
if (!priv->ib_pd) {
DRV_LOG(ERR, "ibv_alloc_pd failed port %d", port);
@@ -1390,10 +2202,6 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
}
priv->ib_ctx = ctx;
- priv->port_id = eth_dev->data->port_id;
- priv->dev_port = port;
- eth_dev->data->dev_private = priv;
- priv->dev_data = eth_dev->data;
priv->max_rx_queues = dev_attr->orig_attr.max_qp;
priv->max_tx_queues = dev_attr->orig_attr.max_qp;
@@ -1415,23 +2223,73 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
name, priv->max_rx_queues, priv->max_rx_desc,
priv->max_send_sge, priv->max_mr_size);
- rte_eth_copy_pci_info(eth_dev, pci_dev);
+ if (!is_reset) {
+ eth_dev = rte_eth_dev_allocate(name);
+ if (!eth_dev) {
+ ret = -ENOMEM;
+ goto failed;
+ }
- /* Create async interrupt handler */
- ret = mana_intr_install(eth_dev, priv);
- if (ret) {
- DRV_LOG(ERR, "Failed to install intr handler");
- goto failed;
+ eth_dev->data->mac_addrs =
+ rte_calloc("mana_mac", 1,
+ sizeof(struct rte_ether_addr), 0);
+ if (!eth_dev->data->mac_addrs) {
+ ret = -ENOMEM;
+ goto failed;
+ }
+
+ rte_ether_addr_copy(addr, eth_dev->data->mac_addrs);
+ } else {
+ /*
+ * Reset path.
+ */
+ rte_ether_format_addr(address, RTE_ETHER_ADDR_FMT_SIZE,
+ eth_dev->data->mac_addrs);
+ DRV_LOG(DEBUG, "Found existing eth_dev %p with mac addr %s",
+ eth_dev, address);
+ DRV_LOG(DEBUG, "ib_ctx = %p", priv->ib_ctx);
+ goto out;
}
- eth_dev->device = &pci_dev->device;
+ priv->port_id = eth_dev->data->port_id;
+ priv->dev_port = port;
+ eth_dev->data->dev_private = priv;
+ priv->dev_data = eth_dev->data;
+ rte_atomic_store_explicit(&priv->dev_state, MANA_DEV_ACTIVE,
+ rte_memory_order_release);
+
+ rte_eth_copy_pci_info(eth_dev, pci_dev);
- DRV_LOG(INFO, "device %s at port %u", name, eth_dev->data->port_id);
+ pthread_mutexattr_init(&mattr);
+ pthread_mutexattr_setpshared(&mattr, PTHREAD_PROCESS_SHARED);
+ pthread_mutex_init(&priv->reset_ops_lock, &mattr);
+ pthread_mutex_init(&priv->reset_cond_mutex, &mattr);
+ pthread_mutexattr_destroy(&mattr);
+
+ pthread_condattr_init(&cattr);
+ pthread_condattr_setpshared(&cattr, PTHREAD_PROCESS_SHARED);
+ pthread_cond_init(&priv->reset_cond, &cattr);
+ pthread_condattr_destroy(&cattr);
+
+ eth_dev->device = &pci_dev->device;
eth_dev->rx_pkt_burst = mana_rx_burst_removed;
eth_dev->tx_pkt_burst = mana_tx_burst_removed;
eth_dev->dev_ops = &mana_dev_ops;
+out:
+ /* Create async interrupt handler */
+ ret = mana_intr_install(eth_dev, priv);
+ if (ret) {
+ DRV_LOG(ERR, "Failed to install intr handler, ret %d", ret);
+ goto failed;
+ } else {
+ DRV_LOG(INFO, "mana_intr_install succeeded");
+ }
+
+ DRV_LOG(INFO, "device %s priv %p dev port %d at port %u",
+ name, priv, priv->dev_port, eth_dev->data->port_id);
+
rte_eth_dev_probing_finish(eth_dev);
return 0;
@@ -1439,20 +2297,29 @@ mana_probe_port(struct ibv_device *ibdev, struct ibv_device_attr_ex *dev_attr,
failed:
/* Free the resource for the port failed */
if (priv) {
- if (priv->ib_parent_pd)
+ if (priv->ib_parent_pd) {
ibv_dealloc_pd(priv->ib_parent_pd);
+ priv->ib_parent_pd = NULL;
+ }
- if (priv->ib_pd)
+ if (priv->ib_pd) {
ibv_dealloc_pd(priv->ib_pd);
+ priv->ib_pd = NULL;
+ }
}
- if (eth_dev)
- rte_eth_dev_release_port(eth_dev);
+ if (!is_reset) {
+ if (eth_dev)
+ rte_eth_dev_release_port(eth_dev);
- rte_free(priv);
+ rte_free(priv);
+ }
- if (ctx)
+ if (ctx) {
ibv_close_device(ctx);
+ if (is_reset && priv)
+ priv->ib_ctx = NULL;
+ }
return ret;
}
@@ -1617,7 +2484,17 @@ mana_pci_probe(struct rte_pci_driver *pci_drv __rte_unused,
static int
mana_dev_uninit(struct rte_eth_dev *dev)
{
- return mana_dev_close(dev);
+ struct mana_priv *priv = dev->data->dev_private;
+ int ret;
+
+ /* Join reset thread before teardown to ensure it has exited
+ * before we destroy the condvar/mutex in free_resources.
+ */
+ mana_join_reset_thread(priv);
+
+ ret = mana_dev_close(dev);
+ mana_dev_free_resources(dev);
+ return ret;
}
/*
diff --git a/drivers/net/mana/mana.h b/drivers/net/mana/mana.h
index 79cc47b6ab..29445d9de2 100644
--- a/drivers/net/mana/mana.h
+++ b/drivers/net/mana/mana.h
@@ -5,6 +5,8 @@
#ifndef __MANA_H__
#define __MANA_H__
+#include <pthread.h>
+
#define PCI_VENDOR_ID_MICROSOFT 0x1414
#define PCI_DEVICE_ID_MICROSOFT_MANA_PF 0x00b9
#define PCI_DEVICE_ID_MICROSOFT_MANA 0x00ba
@@ -337,6 +339,20 @@ struct mana_process_priv {
void *db_page;
};
+enum mana_device_state {
+ /* Normal running */
+ MANA_DEV_ACTIVE = 0,
+ /* In reset enter processing */
+ MANA_DEV_RESET_ENTER = 1,
+ /*
+ * Reset enter processing completed.
+ * Waiting for reset exit or in reset exit processing.
+ */
+ MANA_DEV_RESET_EXIT = 2,
+ /* Reset failed */
+ MANA_DEV_RESET_FAILED = 3,
+};
+
struct mana_priv {
struct rte_eth_dev_data *dev_data;
struct mana_process_priv *process_priv;
@@ -368,6 +384,15 @@ struct mana_priv {
uint64_t max_mr_size;
struct mana_mr_btree mr_btree;
rte_spinlock_t mr_btree_lock;
+ RTE_ATOMIC(enum mana_device_state) dev_state;
+ /* mutex for synchronizing mana reset and some mana_dev_ops callbacks */
+ pthread_mutex_t reset_ops_lock;
+ /* Reset thread ID, valid when reset_thread_active is true */
+ rte_thread_t reset_thread;
+ RTE_ATOMIC(bool) reset_thread_active;
+ /* Condvar to wake reset thread early on PCI remove */
+ pthread_mutex_t reset_cond_mutex;
+ pthread_cond_t reset_cond;
};
struct mana_txq_desc {
@@ -427,6 +452,14 @@ struct mana_txq {
struct mana_mr_btree mr_btree;
struct mana_stats stats;
unsigned int socket;
+ unsigned int txq_idx;
+
+ /*
+ * Bit 0: in-burst flag (set by data path, cleared on exit).
+ * Bits 1+: device state (set by reset path via fetch_or).
+ * Data path CAS 0→1 to enter; fails if any state bits are set.
+ */
+ RTE_ATOMIC(uint32_t) burst_state;
};
struct mana_rxq {
@@ -462,6 +495,14 @@ struct mana_rxq {
struct mana_mr_btree mr_btree;
unsigned int socket;
+ unsigned int rxq_idx;
+
+ /*
+ * Bit 0: in-burst flag (set by data path, cleared on exit).
+ * Bits 1+: device state (set by reset path via fetch_or).
+ * Data path CAS 0→1 to enter; fails if any state bits are set.
+ */
+ RTE_ATOMIC(uint32_t) burst_state;
};
extern int mana_logtype_driver;
@@ -543,6 +584,8 @@ enum mana_mp_req_type {
MANA_MP_REQ_CREATE_MR,
MANA_MP_REQ_START_RXTX,
MANA_MP_REQ_STOP_RXTX,
+ MANA_MP_REQ_RESET_ENTER,
+ MANA_MP_REQ_RESET_EXIT,
};
/* Pameters for IPC. */
@@ -563,8 +606,9 @@ void mana_mp_uninit_primary(void);
void mana_mp_uninit_secondary(void);
int mana_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev);
int mana_mp_req_mr_create(struct mana_priv *priv, uintptr_t addr, uint32_t len);
+int mana_map_doorbell_secondary(struct rte_eth_dev *eth_dev, int fd);
-void mana_mp_req_on_rxtx(struct rte_eth_dev *dev, enum mana_mp_req_type type);
+int mana_mp_req_on_rxtx(struct rte_eth_dev *dev, enum mana_mp_req_type type);
void *mana_alloc_verbs_buf(size_t size, void *data);
void mana_free_verbs_buf(void *ptr, void *data __rte_unused);
diff --git a/drivers/net/mana/mp.c b/drivers/net/mana/mp.c
index 72417fc0c7..1161ebd71c 100644
--- a/drivers/net/mana/mp.c
+++ b/drivers/net/mana/mp.c
@@ -2,10 +2,13 @@
* Copyright 2022 Microsoft Corporation
*/
+#include <sys/mman.h>
#include <rte_malloc.h>
#include <ethdev_driver.h>
#include <rte_log.h>
+#include <rte_eal_paging.h>
#include <stdlib.h>
+#include <unistd.h>
#include <infiniband/verbs.h>
@@ -119,6 +122,23 @@ mana_mp_primary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
return ret;
}
+static int
+mana_mp_reset_enter(struct rte_eth_dev *dev)
+{
+ struct mana_process_priv *proc_priv = dev->process_private;
+
+ void *addr = proc_priv->db_page;
+
+ /* Reset the db_page to NULL */
+ proc_priv->db_page = NULL;
+
+ if (addr)
+ (void)munmap(addr, rte_mem_page_size());
+
+ DRV_LOG(DEBUG, "Secondary doorbell pages unmapped");
+ return 0;
+}
+
static int
mana_mp_secondary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
{
@@ -171,6 +191,49 @@ mana_mp_secondary_handle(const struct rte_mp_msg *mp_msg, const void *peer)
ret = rte_mp_reply(&mp_res, peer);
break;
+ case MANA_MP_REQ_RESET_ENTER:
+ DRV_LOG(INFO, "Port %u reset enter", dev->data->port_id);
+ res->result = mana_mp_reset_enter(dev);
+
+ ret = rte_mp_reply(&mp_res, peer);
+ break;
+
+ case MANA_MP_REQ_RESET_EXIT:
+ DRV_LOG(INFO, "Port %u reset exit", dev->data->port_id);
+ {
+ struct mana_process_priv *proc_priv =
+ dev->process_private;
+
+ if (proc_priv->db_page != NULL) {
+ DRV_LOG(DEBUG,
+ "Secondary doorbell already "
+ "mapped to %p",
+ proc_priv->db_page);
+ res->result = 0;
+ } else if (mp_msg->num_fds < 1) {
+ DRV_LOG(ERR,
+ "No FD in RESET_EXIT message");
+ res->result = -EINVAL;
+ } else {
+ int fd = mp_msg->fds[0];
+
+ ret = mana_map_doorbell_secondary(dev,
+ fd);
+ if (ret) {
+ DRV_LOG(ERR,
+ "Failed secondary "
+ "doorbell map %d",
+ fd);
+ res->result = -ENODEV;
+ } else {
+ res->result = 0;
+ }
+ close(fd);
+ }
+ }
+ ret = rte_mp_reply(&mp_res, peer);
+ break;
+
default:
DRV_LOG(ERR, "Port %u unknown secondary MP type %u",
param->port_id, param->type);
@@ -254,7 +317,7 @@ mana_mp_req_verbs_cmd_fd(struct rte_eth_dev *dev)
}
ret = mp_res->fds[0];
- DRV_LOG(ERR, "port %u command FD from primary is %d",
+ DRV_LOG(DEBUG, "port %u command FD from primary is %d",
dev->data->port_id, ret);
exit:
free(mp_rep.msgs);
@@ -298,27 +361,36 @@ mana_mp_req_mr_create(struct mana_priv *priv, uintptr_t addr, uint32_t len)
return ret;
}
-void
+int
mana_mp_req_on_rxtx(struct rte_eth_dev *dev, enum mana_mp_req_type type)
{
struct rte_mp_msg mp_req = { 0 };
struct rte_mp_msg *mp_res;
- struct rte_mp_reply mp_rep;
+ struct rte_mp_reply mp_rep = { 0 };
struct mana_mp_param *res;
struct timespec ts = {.tv_sec = MANA_MP_REQ_TIMEOUT_SEC, .tv_nsec = 0};
- int i, ret;
+ int i, ret = 0;
- if (type != MANA_MP_REQ_START_RXTX && type != MANA_MP_REQ_STOP_RXTX) {
+ if (type != MANA_MP_REQ_START_RXTX && type != MANA_MP_REQ_STOP_RXTX &&
+ type != MANA_MP_REQ_RESET_ENTER && type != MANA_MP_REQ_RESET_EXIT) {
DRV_LOG(ERR, "port %u unknown request (req_type %d)",
dev->data->port_id, type);
- return;
+ return -EINVAL;
}
if (rte_atomic_load_explicit(&mana_shared_data->secondary_cnt, rte_memory_order_relaxed) == 0)
- return;
+ return 0;
mp_init_msg(&mp_req, type, dev->data->port_id);
+ /* Include IB cmd FD for secondary doorbell remap */
+ if (type == MANA_MP_REQ_RESET_EXIT) {
+ struct mana_priv *priv = dev->data->dev_private;
+
+ mp_req.num_fds = 1;
+ mp_req.fds[0] = priv->ib_ctx->cmd_fd;
+ }
+
ret = rte_mp_request_sync(&mp_req, &mp_rep, &ts);
if (ret) {
if (rte_errno != ENOTSUP)
@@ -329,6 +401,7 @@ mana_mp_req_on_rxtx(struct rte_eth_dev *dev, enum mana_mp_req_type type)
if (mp_rep.nb_sent != mp_rep.nb_received) {
DRV_LOG(ERR, "port %u not all secondaries responded (%d)",
dev->data->port_id, type);
+ ret = -ETIMEDOUT;
goto exit;
}
for (i = 0; i < mp_rep.nb_received; i++) {
@@ -337,9 +410,11 @@ mana_mp_req_on_rxtx(struct rte_eth_dev *dev, enum mana_mp_req_type type)
if (res->result) {
DRV_LOG(ERR, "port %u request failed on secondary %d",
dev->data->port_id, i);
+ ret = res->result;
goto exit;
}
}
exit:
free(mp_rep.msgs);
+ return ret;
}
diff --git a/drivers/net/mana/mr.c b/drivers/net/mana/mr.c
index c4045141bc..8914f4cf04 100644
--- a/drivers/net/mana/mr.c
+++ b/drivers/net/mana/mr.c
@@ -314,8 +314,10 @@ mana_mr_btree_init(struct mana_mr_btree *bt, int n, int socket)
void
mana_mr_btree_free(struct mana_mr_btree *bt)
{
- rte_free(bt->table);
- memset(bt, 0, sizeof(*bt));
+ if (bt && bt->table) {
+ rte_free(bt->table);
+ memset(bt, 0, sizeof(*bt));
+ }
}
int
diff --git a/drivers/net/mana/rx.c b/drivers/net/mana/rx.c
index 1b8ba1f3a9..53b19c67ed 100644
--- a/drivers/net/mana/rx.c
+++ b/drivers/net/mana/rx.c
@@ -36,6 +36,11 @@ mana_rq_ring_doorbell(struct mana_rxq *rxq)
db_page = process_priv->db_page;
}
+ if (!db_page) {
+ DP_LOG(ERR, "db_page is NULL, cannot ring RX doorbell");
+ return -EINVAL;
+ }
+
/* Hardware Spec specifies that software client should set 0 for
* wqe_cnt for Receive Queues.
*/
@@ -172,7 +177,7 @@ mana_stop_rx_queues(struct rte_eth_dev *dev)
for (i = 0; i < priv->num_queues; i++)
if (dev->data->rx_queue_state[i] == RTE_ETH_QUEUE_STATE_STOPPED)
- return -EINVAL;
+ return 0;
if (priv->rwq_qp) {
ret = ibv_destroy_qp(priv->rwq_qp);
@@ -256,6 +261,9 @@ mana_start_rx_queues(struct rte_eth_dev *dev)
struct mana_rxq *rxq = dev->data->rx_queues[i];
struct ibv_wq_init_attr wq_attr = {};
+ rxq->rxq_idx = i;
+ DRV_LOG(DEBUG, "assigning rxq_idx to %d", i);
+
manadv_set_context_attr(priv->ib_ctx,
MANADV_CTX_ATTR_BUF_ALLOCATORS,
(void *)((uintptr_t)&(struct manadv_ctx_allocators){
@@ -451,6 +459,16 @@ mana_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
uint32_t pkt_len;
uint32_t i;
int polled = 0;
+ uint32_t expected = 0;
+
+ /* Single atomic CAS: enter burst only if device is active (0→1).
+ * Fails immediately if reset path has set state bits.
+ */
+ if (unlikely(!rte_atomic_compare_exchange_strong_explicit(
+ &rxq->burst_state, &expected, 1,
+ rte_memory_order_acquire,
+ rte_memory_order_relaxed)))
+ return 0;
repoll:
/* Polling on new completions if we have no backlog */
@@ -592,6 +610,9 @@ mana_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
wqe_consumed, ret);
}
+ rte_atomic_fetch_and_explicit(&rxq->burst_state, ~(uint32_t)1,
+ rte_memory_order_release);
+
return pkt_received;
}
diff --git a/drivers/net/mana/tx.c b/drivers/net/mana/tx.c
index 57dbbc3651..b10deedba6 100644
--- a/drivers/net/mana/tx.c
+++ b/drivers/net/mana/tx.c
@@ -17,7 +17,7 @@ mana_stop_tx_queues(struct rte_eth_dev *dev)
for (i = 0; i < priv->num_queues; i++)
if (dev->data->tx_queue_state[i] == RTE_ETH_QUEUE_STATE_STOPPED)
- return -EINVAL;
+ return 0;
for (i = 0; i < priv->num_queues; i++) {
struct mana_txq *txq = dev->data->tx_queues[i];
@@ -83,6 +83,9 @@ mana_start_tx_queues(struct rte_eth_dev *dev)
txq = dev->data->tx_queues[i];
+ txq->txq_idx = i;
+ DRV_LOG(DEBUG, "assigning txq_idx to %d", txq->txq_idx);
+
manadv_set_context_attr(priv->ib_ctx,
MANADV_CTX_ATTR_BUF_ALLOCATORS,
(void *)((uintptr_t)&(struct manadv_ctx_allocators){
@@ -190,10 +193,34 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
void *db_page;
uint16_t pkt_sent = 0;
uint32_t num_comp, i;
+ uint32_t expected = 0;
#ifdef RTE_ARCH_32
uint32_t wqe_count = 0;
#endif
+ db_page = priv->db_page;
+ if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+ struct rte_eth_dev *dev =
+ &rte_eth_devices[priv->dev_data->port_id];
+ struct mana_process_priv *process_priv = dev->process_private;
+
+ db_page = process_priv->db_page;
+ }
+
+ /* Single atomic CAS: enter burst only if device is active (0→1).
+ * Fails immediately if reset path has set state bits.
+ */
+ if (unlikely(!rte_atomic_compare_exchange_strong_explicit(
+ &txq->burst_state, &expected, 1,
+ rte_memory_order_acquire,
+ rte_memory_order_relaxed) || !db_page)) {
+ if (!expected) /* CAS succeeded but db_page NULL — undo */
+ rte_atomic_fetch_and_explicit(&txq->burst_state,
+ ~(uint32_t)1,
+ rte_memory_order_release);
+ return 0;
+ }
+
/* Process send completions from GDMA */
num_comp = gdma_poll_completion_queue(&txq->gdma_cq,
txq->gdma_comp_buf, txq->num_desc);
@@ -216,7 +243,8 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
}
if (!desc->pkt) {
- DP_LOG(ERR, "mana_txq_desc has a NULL pkt");
+ DP_LOG(ERR, "mana_txq_desc has a NULL pkt, priv %p, "
+ "txq = %d", priv, txq->txq_idx);
} else {
txq->stats.bytes += desc->pkt->pkt_len;
rte_pktmbuf_free(desc->pkt);
@@ -474,15 +502,6 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
}
/* Ring hardware door bell */
- db_page = priv->db_page;
- if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
- struct rte_eth_dev *dev =
- &rte_eth_devices[priv->dev_data->port_id];
- struct mana_process_priv *process_priv = dev->process_private;
-
- db_page = process_priv->db_page;
- }
-
if (pkt_sent) {
#ifdef RTE_ARCH_32
ret = mana_ring_short_doorbell(db_page, GDMA_QUEUE_SEND,
@@ -501,5 +520,8 @@ mana_tx_burst(void *dpdk_txq, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
DP_LOG(ERR, "mana_ring_doorbell failed ret %d", ret);
}
+ rte_atomic_fetch_and_explicit(&txq->burst_state, ~(uint32_t)1,
+ rte_memory_order_release);
+
return pkt_sent;
}
--
2.34.1
^ permalink raw reply related
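The burst_state comments in the mana.h hunks above describe a one-word gate: the data path enters with a CAS from 0 to 1, and the reset path sets higher bits with fetch_or so that later CAS attempts fail. A minimal standalone sketch of that idea follows. It uses C11 <stdatomic.h> rather than the rte_atomic wrappers, and the names queue_gate, gate_enter, gate_exit, gate_block, gate_unblock and the value of BURST_RESET_BIT are illustrative assumptions, not identifiers from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define BURST_IN_PROGRESS 0x1u /* bit 0: data path is inside a burst */
#define BURST_RESET_BIT   0x2u /* bits 1+: hypothetical "reset pending" bit */

struct queue_gate {
	_Atomic uint32_t burst_state;
};

/* Data path: enter only if no bits are set (CAS 0 -> 1). */
static bool
gate_enter(struct queue_gate *g)
{
	uint32_t expected = 0;

	return atomic_compare_exchange_strong_explicit(&g->burst_state,
			&expected, BURST_IN_PROGRESS,
			memory_order_acquire, memory_order_relaxed);
}

/* Data path: clear bit 0 on exit, leaving any state bits intact. */
static void
gate_exit(struct queue_gate *g)
{
	uint32_t prev = atomic_fetch_and_explicit(&g->burst_state,
			~(uint32_t)BURST_IN_PROGRESS, memory_order_release);

	assert(prev & BURST_IN_PROGRESS);
	(void)prev;
}

/* Reset path: set a state bit; returns true if a burst is still in flight. */
static bool
gate_block(struct queue_gate *g)
{
	uint32_t prev = atomic_fetch_or_explicit(&g->burst_state,
			BURST_RESET_BIT, memory_order_acq_rel);

	return (prev & BURST_IN_PROGRESS) != 0;
}

/* Reset path: clear the state bit so the data path may enter again. */
static void
gate_unblock(struct queue_gate *g)
{
	atomic_fetch_and_explicit(&g->burst_state,
			~(uint32_t)BURST_RESET_BIT, memory_order_release);
}
```

In this sketch a reset thread would call gate_block() and, if it returns true, wait until bit 0 clears before tearing down queue resources. How the actual driver waits is not shown in the hunks above, so that part is left out.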
* Re: [PATCH] net/mlx5: fix counter TAILQ race between free and query callback
From: Dariusz Sosnowski @ 2026-06-08 12:41 UTC (permalink / raw)
To: Laaahu; +Cc: dev, stable
In-Reply-To: <20260604101112.72177-1-lilinhu618@gmail.com>
Hi,
Thank you for the contribution.
On Thu, Jun 04, 2026 at 06:11:12PM +0800, Laaahu wrote:
> From: lilinhu <lilinhu618@gmail.com>
>
> flow_dv_counter_free() inserts counters into
> pool->counters[pool->query_gen] under pool->csl. Meanwhile,
> mlx5_flow_async_pool_query_handle() moves counters from
> pool->counters[query_gen ^ 1] to the global free list via
> TAILQ_CONCAT while holding only cmng->csl, not pool->csl.
>
> The comment in flow_dv_counter_free() claims the lock is not needed
> because the query callback and the release function operate on
> different lists. That holds only if the free path always observes
> the up-to-date query_gen. It can be violated:
>
> 1. A counter free thread (non-PMD, e.g. OVS offload thread) reads
> pool->query_gen == 0 and is about to insert into counters[0].
> 2. The free thread is preempted by the OS scheduler; it is a regular
> pthread, not pinned to a core.
> 3. The eal-intr-thread alarm fires: query_gen++ (now 1) and the async
> query is sent.
> 4. Hardware completes the query and the callback runs TAILQ_CONCAT on
> counters[0] (= query_gen ^ 1).
> 5. The free thread resumes and runs TAILQ_INSERT_TAIL on counters[0]
> concurrently with step 4 on another core.
>
> Because the two paths take different locks, TAILQ_INSERT_TAIL and
> TAILQ_CONCAT run concurrently on the same list with no
> synchronization and corrupt it: the pool-local list ends up with a
> NULL head but a dangling tqh_last, and the global free list tail no
> longer points to the real tail. The just-freed counter and every
> counter inserted afterwards become unreachable and are leaked.
>
> Non-PMD threads can be preempted for hundreds of microseconds under
> CPU pressure, which is well within the async query round-trip time,
> so the window is reachable in practice.
>
> Fix it by taking pool->csl in the query completion callback before
> operating on pool->counters[query_gen], serializing the CONCAT with
> any concurrent INSERT. The lock is taken once per pool per query
> completion in the eal-intr-thread context, not on the datapath, so
> the cost is negligible. Lock order is pool->csl then cmng->csl,
> matching all other sites.
>
> Also handle the error path: previously the counters accumulated in
> pool->counters[query_gen] were abandoned when a query failed. Move
> them back to the global free list to avoid a leak on persistent
> query failures.
>
> Fixes: ac79183dc6f7 ("net/mlx5: optimize free counter lookup")
> Cc: stable@dpdk.org
>
> Signed-off-by: lilinhu <lilinhu618@gmail.com>
Code looks good to me.
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
The DPDK community uses Signed-off-by to indicate the
Developer's Certificate of Origin:
https://developercertificate.org/
This requires a full name in the Signed-off-by tag.
Could you please provide your full name
in the English alphabet?
Best regards,
Dariusz Sosnowski
^ permalink raw reply
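The locking rule described in the quoted commit message (take the pool lock, then the manager lock, before concatenating a pool-local list onto the global free list) can be sketched in isolation. This is a simplified model, not the mlx5 code: pthread mutexes stand in for the driver's locks, the struct and function names are hypothetical, and flipping the generation under the pool lock in query_begin() is an assumption made here to keep the example self-contained.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct counter {
	TAILQ_ENTRY(counter) next;
};
TAILQ_HEAD(counter_list, counter);

struct pool {
	pthread_mutex_t csl;             /* stands in for pool->csl */
	unsigned int query_gen;          /* 0 or 1 */
	struct counter_list counters[2]; /* one list per generation */
};

struct manager {
	pthread_mutex_t csl;             /* stands in for cmng->csl */
	struct counter_list free_list;   /* global free list */
};

static void
pool_init(struct pool *p)
{
	pthread_mutex_init(&p->csl, NULL);
	p->query_gen = 0;
	TAILQ_INIT(&p->counters[0]);
	TAILQ_INIT(&p->counters[1]);
}

static void
manager_init(struct manager *m)
{
	pthread_mutex_init(&m->csl, NULL);
	TAILQ_INIT(&m->free_list);
}

/* Free path: the generation is read and used under the pool lock. */
static void
counter_free(struct pool *p, struct counter *c)
{
	pthread_mutex_lock(&p->csl);
	TAILQ_INSERT_TAIL(&p->counters[p->query_gen], c, next);
	pthread_mutex_unlock(&p->csl);
}

/* Query start: flip the generation; returns the index being queried. */
static unsigned int
query_begin(struct pool *p)
{
	unsigned int old;

	pthread_mutex_lock(&p->csl);
	old = p->query_gen;
	p->query_gen ^= 1;
	pthread_mutex_unlock(&p->csl);
	return old;
}

/* Completion: pool lock first, then manager lock, then CONCAT. */
static void
query_complete(struct manager *m, struct pool *p, unsigned int gen)
{
	assert(gen < 2);
	pthread_mutex_lock(&p->csl);
	pthread_mutex_lock(&m->csl);
	TAILQ_CONCAT(&m->free_list, &p->counters[gen], next);
	pthread_mutex_unlock(&m->csl);
	pthread_mutex_unlock(&p->csl);
}

/* Helper for inspection: number of entries on a list. */
static size_t
list_len(struct counter_list *l)
{
	struct counter *c;
	size_t n = 0;

	TAILQ_FOREACH(c, l, next)
		n++;
	return n;
}
```

Because both the insert and the concatenation hold the pool lock, a free that read a stale generation can no longer run at the same time as the completion callback on the same list.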
* Re: [PATCH] doc: move firmware instructions in mlx5 guide
From: Dariusz Sosnowski @ 2026-06-08 12:46 UTC (permalink / raw)
To: Thomas Monjalon
Cc: dev, Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou,
Matan Azrad
In-Reply-To: <20260608120531.1037367-1-thomas@monjalon.net>
On Mon, Jun 08, 2026 at 02:05:31PM +0200, Thomas Monjalon wrote:
> Having firmware update instructions before firmware config
> looks simpler to find than in compilation prerequisites.
>
> A link is also added after listing minimum firmware versions.
>
> Signed-off-by: Thomas Monjalon <thomas@monjalon.net>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
^ permalink raw reply
* [PATCH v5 1/5] eal: fix wrong log message in async IPC request
From: Anatoly Burakov @ 2026-06-08 13:13 UTC (permalink / raw)
To: dev, Jianfeng Tan
In-Reply-To: <740b39c5098b4d40cafb9881ad70865a3c889012.1773936429.git.anatoly.burakov@intel.com>
The allocation failure log message in mp_request_async() says "sync
request" but the function handles asynchronous requests.
Fix the log to say "async request".
Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/eal/common/eal_common_proc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 06f151818c..799c6e81b0 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -883,7 +883,7 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
pending_req = calloc(1, sizeof(*pending_req));
reply_msg = calloc(1, sizeof(*reply_msg));
if (pending_req == NULL || reply_msg == NULL) {
- EAL_LOG(ERR, "Could not allocate space for sync request");
+ EAL_LOG(ERR, "Could not allocate space for async request");
rte_errno = ENOMEM;
ret = -1;
goto fail;
--
2.47.3
^ permalink raw reply related
* [PATCH v5 2/5] eal: fix async IPC callback not fired when no peers
From: Anatoly Burakov @ 2026-06-08 13:13 UTC (permalink / raw)
To: dev, Jianfeng Tan
In-Reply-To: <2bc77b94493d94b53a28ea535ed96d92a157a7c7.1780924381.git.anatoly.burakov@intel.com>
Currently, when rte_mp_request_async() is called and no peer processes
are connected (nb_sent == 0), the user callback is never invoked.
The original implementation used a dedicated background thread and
pthread_cond_signal() to wake it after queuing the dummy request. When
that thread was replaced with per-message alarms, no alarm was set for
the dummy request, silently breaking the nb_sent == 0 path.
This was not noticed because async requests are used while handling
secondary process requests, where peers are typically already present.
Fix it by setting a 1us alarm on the dummy request, so the callback path
immediately triggers and processes it.
Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/eal/common/eal_common_proc.c | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 799c6e81b0..2a99162a21 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1187,11 +1187,22 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
ret = mp_request_async(eal_mp_socket_path(), copy, param, ts);
- /* if we didn't send anything, put dummy request on the queue */
+ /* if we didn't send anything, put dummy request on the queue
+ * and set a minimum-delay alarm so the callback fires immediately.
+ */
if (ret == 0 && reply->nb_sent == 0) {
TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
next);
dummy_used = true;
+
+ if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+ EAL_LOG(ERR, "Fail to set alarm for dummy request");
+ /* roll back the changes */
+ TAILQ_REMOVE(&pending_requests.requests, dummy, next);
+ dummy_used = false;
+ ret = -1;
+ goto unlock_fail;
+ }
}
pthread_mutex_unlock(&pending_requests.lock);
@@ -1232,10 +1243,21 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
} else if (mp_request_async(path, copy, param, ts))
ret = -1;
}
- /* if we didn't send anything, put dummy request on the queue */
+ /* if we didn't send anything, put dummy request on the queue
+ * and set a minimum-delay alarm so the callback fires immediately.
+ */
if (ret == 0 && reply->nb_sent == 0) {
TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
dummy_used = true;
+
+ if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+ EAL_LOG(ERR, "Fail to set alarm for dummy request");
+ /* roll back the changes */
+ TAILQ_REMOVE(&pending_requests.requests, dummy, next);
+ dummy_used = false;
+ ret = -1;
+ goto closedir_fail;
+ }
}
/* finally, unlock the queue */
--
2.47.3
^ permalink raw reply related
* [PATCH v5 3/5] eal: fix memory leak in async IPC secondary path
From: Anatoly Burakov @ 2026-06-08 13:13 UTC (permalink / raw)
To: dev, Jianfeng Tan
In-Reply-To: <2bc77b94493d94b53a28ea535ed96d92a157a7c7.1780924381.git.anatoly.burakov@intel.com>
When rte_mp_request_async() succeeds on the secondary process path, the
dummy request is freed only if it was inserted into the queue. However,
when the actual request was sent successfully (nb_sent > 0), the dummy is
not used and the function returns without freeing it.
Free dummy before returning on the success path when it was not inserted
into the queue.
Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/eal/common/eal_common_proc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 2a99162a21..0dd25bef8b 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -1210,6 +1210,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
/* if we couldn't send anything, clean up */
if (ret != 0)
goto fail;
+ if (!dummy_used)
+ free(dummy);
return 0;
}
--
2.47.3
^ permalink raw reply related
* [PATCH v5 4/5] eal: fix async IPC resource leaks on partial failure
From: Anatoly Burakov @ 2026-06-08 13:13 UTC (permalink / raw)
To: dev, Jianfeng Tan
In-Reply-To: <2bc77b94493d94b53a28ea535ed96d92a157a7c7.1780924381.git.anatoly.burakov@intel.com>
When rte_mp_request_async() fails to send requests to all peers,
copy and param can lose ownership and leak.
On partial failure, some requests may already be queued and still
reference copy and param, so freeing them directly on the error
path can cause use-after-free when those requests are later handled.
Fix this by rolling back queued requests from the current batch,
resetting nb_sent to 0, and freeing copy/param only after rollback.
Use a numeric request ID for alarm callback lookup so stale callbacks
from rolled-back requests become harmless no-ops.
Coverity issue: 501503
Fixes: f05e26051c15 ("eal: add IPC asynchronous request")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/eal/common/eal_common_proc.c | 112 +++++++++++++++++++++++--------
1 file changed, 84 insertions(+), 28 deletions(-)
diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index 0dd25bef8b..ddcaa2f20b 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -74,6 +74,7 @@ struct async_request_param {
struct pending_request {
TAILQ_ENTRY(pending_request) next;
+ unsigned long id;
enum {
REQUEST_TYPE_SYNC,
REQUEST_TYPE_ASYNC
@@ -92,6 +93,8 @@ struct pending_request {
};
};
+static unsigned long next_request_id;
+
TAILQ_HEAD(pending_request_list, pending_request);
static struct {
@@ -111,9 +114,9 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type);
static void
async_reply_handle(void *arg);
-/* for use with process_msg */
+/* for use with alarm callback and process_msg */
static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg);
+async_reply_handle_thread_unsafe(struct pending_request *req);
static void
trigger_async_action(struct pending_request *req);
@@ -132,6 +135,19 @@ find_pending_request(const char *dst, const char *act_name)
return r;
}
+static struct pending_request *
+find_async_request_by_id(unsigned long id)
+{
+ struct pending_request *r;
+
+ TAILQ_FOREACH(r, &pending_requests.requests, next) {
+ if (r->id == id && r->type == REQUEST_TYPE_ASYNC)
+ return r;
+ }
+
+ return NULL;
+}
+
/*
* Combine prefix and name(optional) to return unix domain socket path
* return the number of characters that would have been put into buffer.
@@ -519,9 +535,8 @@ trigger_async_action(struct pending_request *sr)
}
static struct pending_request *
-async_reply_handle_thread_unsafe(void *arg)
+async_reply_handle_thread_unsafe(struct pending_request *req)
{
- struct pending_request *req = (struct pending_request *)arg;
enum async_action action;
struct timespec ts_now;
@@ -534,7 +549,8 @@ async_reply_handle_thread_unsafe(void *arg)
TAILQ_REMOVE(&pending_requests.requests, req, next);
- if (rte_eal_alarm_cancel(async_reply_handle, req) < 0) {
+ if (rte_eal_alarm_cancel(async_reply_handle,
+ (void *)(uintptr_t)req->id) < 0) {
/* if we failed to cancel the alarm because it's already in
* progress, don't proceed because otherwise we will end up
* handling the same message twice.
@@ -557,9 +573,13 @@ static void
async_reply_handle(void *arg)
{
struct pending_request *req;
+ /* alarm arg carries the request ID packed into a void * via uintptr_t */
+ unsigned long id = (uintptr_t)arg;
pthread_mutex_lock(&pending_requests.lock);
- req = async_reply_handle_thread_unsafe(arg);
+ req = find_async_request_by_id(id);
+ if (req != NULL)
+ req = async_reply_handle_thread_unsafe(req);
pthread_mutex_unlock(&pending_requests.lock);
if (req != NULL)
@@ -878,7 +898,29 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
{
struct rte_mp_msg *reply_msg;
struct pending_request *pending_req, *exist;
- int ret = -1;
+ unsigned long id;
+ int ret;
+
+ /* queue already locked by caller */
+
+ exist = find_pending_request(dst, req->name);
+ if (exist) {
+ EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
+ rte_errno = EEXIST;
+ return -1;
+ }
+
+ /* Set alarm before allocating or sending so request timeout tracking
+ * is active as soon as this request ID is reserved.
+ */
+ id = ++next_request_id;
+ if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
+ async_reply_handle,
+ (void *)(uintptr_t)id) < 0) {
+ EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
+ dst, req->name);
+ return -1;
+ }
pending_req = calloc(1, sizeof(*pending_req));
reply_msg = calloc(1, sizeof(*reply_msg));
@@ -890,21 +932,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
}
pending_req->type = REQUEST_TYPE_ASYNC;
+ pending_req->id = id;
strlcpy(pending_req->dst, dst, sizeof(pending_req->dst));
pending_req->request = req;
pending_req->reply = reply_msg;
pending_req->async.param = param;
- /* queue already locked by caller */
-
- exist = find_pending_request(dst, req->name);
- if (exist) {
- EAL_LOG(ERR, "A pending request %s:%s", dst, req->name);
- rte_errno = EEXIST;
- ret = -1;
- goto fail;
- }
-
ret = send_msg(dst, req, MP_REQ);
if (ret < 0) {
EAL_LOG(ERR, "Fail to send request %s:%s",
@@ -917,14 +950,6 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
}
param->user_reply.nb_sent++;
- /* if alarm set fails, we simply ignore the reply */
- if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
- async_reply_handle, pending_req) < 0) {
- EAL_LOG(ERR, "Fail to set alarm for request %s:%s",
- dst, req->name);
- ret = -1;
- goto fail;
- }
TAILQ_INSERT_TAIL(&pending_requests.requests, pending_req, next);
return 0;
@@ -1178,6 +1203,7 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
* it, and put it on the queue if we don't send any requests.
*/
dummy->type = REQUEST_TYPE_ASYNC;
+ dummy->id = ++next_request_id;
dummy->request = copy;
dummy->reply = NULL;
dummy->async.param = param;
@@ -1194,8 +1220,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
TAILQ_INSERT_TAIL(&pending_requests.requests, dummy,
next);
dummy_used = true;
-
- if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+ if (rte_eal_alarm_set(1, async_reply_handle,
+ (void *)(uintptr_t)dummy->id) < 0) {
EAL_LOG(ERR, "Fail to set alarm for dummy request");
/* roll back the changes */
TAILQ_REMOVE(&pending_requests.requests, dummy, next);
@@ -1245,6 +1271,30 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
} else if (mp_request_async(path, copy, param, ts))
ret = -1;
}
+
+ /*
+ * On partial failure, roll back all queued requests in this batch while
+ * holding pending_requests.lock. Any alarm callback that runs later for
+ * these removed IDs will not find a pending request and will return.
+ */
+ if (ret != 0 && reply->nb_sent > 0) {
+ struct pending_request *r, *next;
+
+ for (r = TAILQ_FIRST(&pending_requests.requests);
+ r != NULL; r = next) {
+ next = TAILQ_NEXT(r, next);
+ if (r->type == REQUEST_TYPE_ASYNC &&
+ r->async.param == param) {
+ TAILQ_REMOVE(&pending_requests.requests,
+ r, next);
+ free(r->reply);
+ /* r->request == copy, freed below after the loop */
+ free(r);
+ }
+ }
+ reply->nb_sent = 0;
+ }
+
/* if we didn't send anything, put dummy request on the queue
* and set a minimum-delay alarm so the callback fires immediately.
*/
@@ -1252,7 +1302,8 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
TAILQ_INSERT_HEAD(&pending_requests.requests, dummy, next);
dummy_used = true;
- if (rte_eal_alarm_set(1, async_reply_handle, dummy) < 0) {
+ if (rte_eal_alarm_set(1, async_reply_handle,
+ (void *)(uintptr_t)dummy->id) < 0) {
EAL_LOG(ERR, "Fail to set alarm for dummy request");
/* roll back the changes */
TAILQ_REMOVE(&pending_requests.requests, dummy, next);
@@ -1274,6 +1325,11 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
/* if dummy was unused, free it */
if (!dummy_used)
free(dummy);
+ /* if nothing was sent, nobody owns copy/param */
+ if (ret != 0) {
+ free(param);
+ free(copy);
+ }
return ret;
closedir_fail:
--
2.47.3
^ permalink raw reply related
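The ID-based lookup introduced in patch 4/5 can be illustrated without EAL. In the sketch below the alarm argument carries a numeric ID packed into a void pointer, and a callback for a request that was already rolled back finds nothing and returns. All names (pending, pending_add, pending_rollback, on_alarm) are hypothetical, locking is omitted for brevity, and ID wraparound is not handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

struct pending {
	TAILQ_ENTRY(pending) next;
	unsigned long id;
};
TAILQ_HEAD(pending_list, pending);

static struct pending_list pending_q = TAILQ_HEAD_INITIALIZER(pending_q);
static unsigned long next_id;

/* Look up a queued request by ID; NULL if it is no longer queued. */
static struct pending *
find_by_id(unsigned long id)
{
	struct pending *p;

	TAILQ_FOREACH(p, &pending_q, next) {
		if (p->id == id)
			return p;
	}
	return NULL;
}

/* Queue a new request; returns its ID, or 0 on allocation failure. */
static unsigned long
pending_add(void)
{
	struct pending *p = calloc(1, sizeof(*p));

	if (p == NULL)
		return 0;
	p->id = ++next_id;
	assert(p->id != 0); /* wraparound is out of scope for this sketch */
	TAILQ_INSERT_TAIL(&pending_q, p, next);
	return p->id;
}

/* Roll back a queued request, as the partial-failure path would. */
static void
pending_rollback(unsigned long id)
{
	struct pending *p = find_by_id(id);

	if (p != NULL) {
		TAILQ_REMOVE(&pending_q, p, next);
		free(p);
	}
}

/* Alarm-style callback: arg is an ID, never a pointer to the request.
 * Returns 1 if a request was handled, 0 if the alarm was stale.
 */
static int
on_alarm(void *arg)
{
	unsigned long id = (unsigned long)(uintptr_t)arg;
	struct pending *p = find_by_id(id);

	if (p == NULL)
		return 0; /* stale alarm: request already removed */
	TAILQ_REMOVE(&pending_q, p, next);
	free(p);
	return 1;
}
```

Passing an ID instead of a pointer means a late callback never dereferences freed memory; the cost is a linear scan of the pending queue on each alarm.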
* [PATCH v5 5/5] eal: avoid deadlock in async IPC alarm callback
From: Anatoly Burakov @ 2026-06-08 13:13 UTC (permalink / raw)
To: dev, Jianfeng Tan
In-Reply-To: <2bc77b94493d94b53a28ea535ed96d92a157a7c7.1780924381.git.anatoly.burakov@intel.com>
async_reply_handle_thread_unsafe() can run while holding
pending_requests.lock and currently calls rte_eal_alarm_cancel().
rte_eal_alarm_cancel() may spin-wait for an executing callback, which can
deadlock if that callback is blocked on the same lock.
Remove callback-side alarm cancellation. It is safe to do so, because any
callback triggered without a pending request becomes a no-op.
Fixes: daf9bfca717e ("ipc: remove thread for async requests")
Cc: stable@dpdk.org
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com>
---
lib/eal/common/eal_common_proc.c | 28 ++++++++++------------------
1 file changed, 10 insertions(+), 18 deletions(-)
diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index ddcaa2f20b..908e86f6b0 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -549,19 +549,6 @@ async_reply_handle_thread_unsafe(struct pending_request *req)
TAILQ_REMOVE(&pending_requests.requests, req, next);
- if (rte_eal_alarm_cancel(async_reply_handle,
- (void *)(uintptr_t)req->id) < 0) {
- /* if we failed to cancel the alarm because it's already in
- * progress, don't proceed because otherwise we will end up
- * handling the same message twice.
- */
- if (rte_errno == EINPROGRESS) {
- EAL_LOG(DEBUG, "Request handling is already in progress");
- goto no_trigger;
- }
- EAL_LOG(ERR, "Failed to cancel alarm");
- }
-
if (action == ACTION_TRIGGER)
return req;
no_trigger:
@@ -910,8 +897,12 @@ mp_request_async(const char *dst, struct rte_mp_msg *req,
return -1;
}
- /* Set alarm before allocating or sending so request timeout tracking
- * is active as soon as this request ID is reserved.
+ /* Set alarm before allocating or sending. The alarm is never cancelled:
+ * rte_eal_alarm_cancel spin-waits for an executing callback to finish,
+ * which deadlocks if we hold pending_requests.lock while the callback
+ * is blocked on it. Instead, let stale alarms fire; with ID-based
+ * lookup the callback will simply not find the request and return
+ * harmlessly.
*/
id = ++next_request_id;
if (rte_eal_alarm_set(ts->tv_sec * 1000000 + ts->tv_nsec / 1000,
@@ -1273,9 +1264,10 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
}
/*
- * On partial failure, roll back all queued requests in this batch while
- * holding pending_requests.lock. Any alarm callback that runs later for
- * these removed IDs will not find a pending request and will return.
+ * On partial failure, roll back all queued requests. We hold the lock
+ * so no one else touches the queue. All requests in this batch share
+ * the same param pointer. Stale alarms will fire and harmlessly find
+ * nothing via ID-based lookup.
*/
if (ret != 0 && reply->nb_sent > 0) {
struct pending_request *r, *next;
--
2.47.3
^ permalink raw reply related
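The commit message above relies on a stale alarm callback becoming a no-op once the request it refers to has been removed. A minimal sketch of that idea follows, under stated assumptions: `pending_req`, `queue_req`, `find_by_id` and `alarm_cb` are hypothetical names, not the EAL symbols; the list is passed explicitly instead of being a global; and locking is left out for brevity, whereas the real callback would take pending_requests.lock before the lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a pending async request (not the EAL struct). */
struct pending_req {
	TAILQ_ENTRY(pending_req) next;
	uint64_t id;
};

TAILQ_HEAD(pending_list, pending_req);

/* Queue a new request with the given ID; returns 0 on success, -1 on OOM. */
static int
queue_req(struct pending_list *list, uint64_t id)
{
	struct pending_req *r = calloc(1, sizeof(*r));

	if (r == NULL)
		return -1;
	r->id = id;
	TAILQ_INSERT_TAIL(list, r, next);
	return 0;
}

/* Look a request up by ID; returns NULL if it is no longer queued. */
static struct pending_req *
find_by_id(struct pending_list *list, uint64_t id)
{
	struct pending_req *r;

	TAILQ_FOREACH(r, list, next) {
		if (r->id == id)
			return r;
	}
	return NULL;
}

/*
 * Alarm callback sketch. The argument carries the request ID packed into a
 * pointer-sized value, never a pointer to the request itself, so a callback
 * that fires after the request was removed cannot touch freed memory.
 * Returns 1 if a request was handled, 0 if the alarm was stale.
 */
static int
alarm_cb(struct pending_list *list, void *arg)
{
	uint64_t id = (uint64_t)(uintptr_t)arg;
	struct pending_req *r = find_by_id(list, id);

	if (r == NULL)
		return 0; /* stale alarm: nothing to do */

	TAILQ_REMOVE(list, r, next);
	free(r);
	return 1;
}
```

Because the callback only ever dereferences what it finds in the list, there is no need to cancel the alarm when a request is rolled back; the cost is one wasted lookup when the stale alarm eventually fires.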
* DPDK Tech Board meeting minutes 27-May-2026
From: Konstantin Ananyev @ 2026-06-08 14:06 UTC (permalink / raw)
To: dev@dpdk.org; +Cc: techboard@dpdk.org
Members Attending
=================
Aaron Conole
Bruce Richardson
Jerin Jacob Kollanukkaran
Kevin Traynor
Konstantin Ananyev (chair)
Maxime Coquelin
Morten Brørup
Stephen Hemminger
Thomas Monjalon
NOTE
====
The Technical Board meetings take place every second Wednesday at 3 pm UTC.
Meetings are public, and DPDK community members are welcome to attend.
Agenda and previous minutes:
http://core.dpdk.org/techboard/minutes
The next meeting will follow the regular schedule.
1. DPDK Vulnerability Management - request for more engineers (Maxime, Thomas)
---------------------------------------------------------------------------------------------------------------
- 15 unprocessed CVEs in the backlog
- One of the current DPDK security maintainers is not active any more
- DPDK security group needs more people to cope with existing and new CVEs
- Options considered:
- Intel and Marvell will look for internal resources
- Try to reach universities that specialize in that topic
- Hire research interns for that role:
AR to the current TB representative in the DPDK GB:
Bring this problem to the attention of the DPDK GB and request funding
2. Excessive usage of __rte_always_inline (Stephen)
---------------------------------------------------------------------
The ``__rte_always_inline`` attribute forces the compiler to inline a function regardless of its size or call-graph heuristics.
Excessive usage of forced inlining can hurt performance by inflating function bodies, increasing register pressure,
and overriding profile-guided optimization.
In most cases the preferred way is plain ``inline`` (or no annotation at all for static functions), letting the compiler decide.
Modern compilers at ``-O2`` make good inlining decisions for small ``static inline`` functions.
New usages of ``__rte_always_inline`` must be properly justified by the submitter of the patch.
Stephen to submit a new patch for the DPDK coding guidelines to address this matter:
https://patchwork.dpdk.org/project/dpdk/patch/20260601172104.311909-1-stephen@networkplumber.org/
^ permalink raw reply
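To illustrate the inlining guidance in item 2 of the minutes above, here is a small sketch contrasting the three annotation choices. The helper names are hypothetical, and `sketch_always_inline` is a local stand-in (GCC/Clang attribute syntax assumed) so the example builds without DPDK headers; it is not the DPDK macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for a forced-inline macro; assumes GCC or Clang. */
#define sketch_always_inline inline __attribute__((always_inline))

/* Preferred for small helpers: plain inline, the compiler decides. */
static inline uint32_t
hdr_len_words(uint8_t ver_ihl)
{
	return ver_ihl & 0x0f;
}

/*
 * Forced inlining: the compiler must inline this at every call site.
 * Per the minutes, a new use like this would need a justification,
 * for example a measured regression without it.
 */
static sketch_always_inline uint32_t
hdr_len_bytes_forced(uint8_t ver_ihl)
{
	return (uint32_t)(ver_ihl & 0x0f) * 4;
}

/* No annotation at all: a static function is still an inlining candidate. */
static uint32_t
total_hdr_bytes(const uint8_t *ver_ihl, unsigned int n)
{
	uint32_t sum = 0;

	for (unsigned int i = 0; i < n; i++)
		sum += hdr_len_words(ver_ihl[i]) * 4;
	return sum;
}
```

All three forms compute the same results; the difference is only in how much freedom the optimizer keeps, which is the point the minutes make about register pressure and profile-guided optimization.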
* RE: [PATCH v2] net/mlx5: fix counter TAILQ race between free and query callback
From: Dariusz Sosnowski @ 2026-06-08 14:11 UTC (permalink / raw)
To: Linhu Li, dev@dpdk.org; +Cc: stable@dpdk.org
In-Reply-To: <20260608132555.31439-1-lilinhu618@gmail.com>
> -----Original Message-----
> From: Linhu Li <lilinhu618@gmail.com>
> Sent: Monday, June 8, 2026 3:26 PM
> To: dev@dpdk.org
> Cc: stable@dpdk.org; Dariusz Sosnowski <dsosnowski@nvidia.com>; Linhu Li
> <lilinhu618@gmail.com>
> Subject: [PATCH v2] net/mlx5: fix counter TAILQ race between free and query
> callback
>
> flow_dv_counter_free() inserts counters into
> pool->counters[pool->query_gen] under pool->csl. Meanwhile,
> mlx5_flow_async_pool_query_handle() moves counters from
> pool->counters[query_gen ^ 1] to the global free list via
> TAILQ_CONCAT while holding only cmng->csl, not pool->csl.
>
> The comment in flow_dv_counter_free() claims the lock is not needed
> because the query callback and the release function operate on different lists.
> That holds only if the free path always observes the up-to-date query_gen. It
> can be violated:
>
> 1. A counter free thread (non-PMD, e.g. OVS offload thread) reads
> pool->query_gen == 0 and is about to insert into counters[0].
> 2. The free thread is preempted by the OS scheduler; it is a regular
> pthread, not pinned to a core.
> 3. The eal-intr-thread alarm fires: query_gen++ (now 1) and the async
> query is sent.
> 4. Hardware completes the query and the callback runs TAILQ_CONCAT on
> counters[0] (= query_gen ^ 1).
> 5. The free thread resumes and runs TAILQ_INSERT_TAIL on counters[0]
> concurrently with step 4 on another core.
>
> Because the two paths take different locks, TAILQ_INSERT_TAIL and
> TAILQ_CONCAT run concurrently on the same list with no synchronization and
> corrupt it: the pool-local list ends up with a NULL head but a dangling
> tqh_last, and the global free list tail no longer points to the real tail. The just-
> freed counter and every counter inserted afterwards become unreachable
> and are leaked.
>
> Non-PMD threads can be preempted for hundreds of microseconds under
> CPU pressure, which is well within the async query round-trip time, so the
> window is reachable in practice.
>
> Fix it by taking pool->csl in the query completion callback before operating on
> pool->counters[query_gen], serializing the CONCAT with any concurrent
> INSERT. The lock is taken once per pool per query completion in the eal-intr-
> thread context, not on the datapath, so the cost is negligible. Lock order is
> pool->csl then cmng->csl, matching all other sites.
>
> Also handle the error path: previously the counters accumulated in
> pool->counters[query_gen] were abandoned when a query failed. Move
> them back to the global free list to avoid a leak on persistent query failures.
>
> Fixes: ac79183dc6f7 ("net/mlx5: optimize free counter lookup")
> Cc: stable@dpdk.org
>
> Signed-off-by: Linhu Li <lilinhu618@gmail.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
^ permalink raw reply
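The locking scheme described in the quoted commit message can be sketched roughly as follows. This is a simplified model under stated assumptions, not the mlx5 code: the `pool` and `manager` structs, their fields, and the functions below are hypothetical, pthread mutexes stand in for pool->csl and cmng->csl, and flipping the generation under the pool lock is a choice made for this sketch only.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/queue.h>

struct counter {
	TAILQ_ENTRY(counter) next;
};
TAILQ_HEAD(counter_list, counter);

struct pool {
	pthread_mutex_t lock;            /* plays the role of pool->csl */
	unsigned int query_gen;          /* current generation, 0 or 1 */
	struct counter_list counters[2]; /* one list per generation */
};

struct manager {
	pthread_mutex_t lock;            /* plays the role of cmng->csl */
	struct counter_list free_list;   /* global free list */
};

static void
pool_init(struct pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->query_gen = 0;
	TAILQ_INIT(&p->counters[0]);
	TAILQ_INIT(&p->counters[1]);
}

static void
manager_init(struct manager *m)
{
	pthread_mutex_init(&m->lock, NULL);
	TAILQ_INIT(&m->free_list);
}

/* Free path: read the generation and insert under the same pool lock. */
static void
counter_free(struct pool *p, struct counter *c)
{
	pthread_mutex_lock(&p->lock);
	TAILQ_INSERT_TAIL(&p->counters[p->query_gen], c, next);
	pthread_mutex_unlock(&p->lock);
}

/* Query start: flip the generation so new frees go to the other list. */
static void
query_start(struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	p->query_gen ^= 1;
	pthread_mutex_unlock(&p->lock);
}

/*
 * Query completion: move the previous generation's counters to the global
 * free list. Taking the pool lock first serializes the CONCAT against any
 * concurrent INSERT; lock order is pool, then manager.
 */
static void
query_done(struct manager *m, struct pool *p)
{
	pthread_mutex_lock(&p->lock);
	pthread_mutex_lock(&m->lock);
	TAILQ_CONCAT(&m->free_list, &p->counters[p->query_gen ^ 1], next);
	pthread_mutex_unlock(&m->lock);
	pthread_mutex_unlock(&p->lock);
}
```

With both paths holding the pool lock, a free thread that was preempted after reading a stale generation can no longer interleave its insert with the concat; at worst its counter waits on the pool-local list until a later query completion moves it.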
* Re: [PATCH v4] ethdev: support inline calculating masked item value
From: Dariusz Sosnowski @ 2026-06-08 14:49 UTC (permalink / raw)
To: Bing Zhao, orika
Cc: viacheslavo, dev, rasland, stephen, suanmingm, matan, thomas
In-Reply-To: <20260603092805.9837-1-bingz@nvidia.com>
On Wed, Jun 03, 2026 at 12:28:05PM +0300, Bing Zhao wrote:
> In the asynchronous API definition and some drivers, the
> rte_flow_item spec value may not be calculated by the driver due to the
> reason of speed of light rule insertion rate and sometimes the input
> parameters will be copied and changed internally.
>
> After copying, the spec and last will be protected by the keyword
> const and cannot be changed in the code itself. And also the driver
> needs some extra memory to do the calculation and extra conditions
> to understand the length of each item spec. This is not efficient.
>
> To solve the issue and support usage of the following fix, a new OP
> was introduced to calculate the spec and last values after applying
> the mask inline.
>
> Signed-off-by: Bing Zhao <bingz@nvidia.com>
Ori has some technical issues with plain text emails on his side.
On his behalf:
Acked-by: Ori Kam <orika@nvidia.com>
Best regards,
Dariusz Sosnowski
^ permalink raw reply
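The quoted patch description talks about computing an item's spec value after the mask has been applied. As a generic illustration only, a byte-wise masking helper might look like the sketch below; `mask_item_value` is a hypothetical name and does not represent the new operation the patch introduces, whose name and signature are not given in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write spec & mask into dst, byte by byte.
 * dst may alias spec to mask a writable buffer in place.
 */
static void
mask_item_value(uint8_t *dst, const uint8_t *spec, const uint8_t *mask,
		size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = spec[i] & mask[i];
}
```

For example, masking the IPv4 address bytes c0 a8 01 17 with ff ff ff 00 keeps the first three bytes and clears the last, so only the /24 prefix takes part in matching.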
* [PATCH 0/3] net/iavf: vf reset fixes
From: Ciara Loftus @ 2026-06-08 14:55 UTC (permalink / raw)
To: dev; +Cc: Ciara Loftus
The patch [1] aimed to address a race condition in the iavf driver
during a reset and also reduced noisy logging during resets.
Patch 1 of this series extracts the noisy logging fix into its own
commit.
Patch 2 offers an alternative approach to fixing the race condition.
Patch 3 fixes a pre-existing refcount imbalance in the shared event
handler thread that became visible while investigating the reset path.
[1] https://patches.dpdk.org/project/dpdk/patch/20260605123646.1328492-1-chaitanyababux.talluri@intel.com/
Ciara Loftus (2):
net/iavf: wait for PF reset start before reinitializing
net/iavf: fix event handler refcount leak on HW reset
Talluri Chaitanyababu (1):
net/iavf: downgrade opcode 0 ARQ log to debug
drivers/net/intel/iavf/iavf.h | 1 +
drivers/net/intel/iavf/iavf_ethdev.c | 14 +++++++++++++-
drivers/net/intel/iavf/iavf_vchnl.c | 11 +++++++++--
3 files changed, 23 insertions(+), 3 deletions(-)
--
2.43.0
^ permalink raw reply
* [PATCH 1/3] net/iavf: downgrade opcode 0 ARQ log to debug
From: Ciara Loftus @ 2026-06-08 14:55 UTC (permalink / raw)
To: dev; +Cc: Talluri Chaitanyababu
In-Reply-To: <20260608145518.1705524-1-ciara.loftus@intel.com>
From: Talluri Chaitanyababu <chaitanyababux.talluri@intel.com>
After admin queue reinitialisation, completions from uninitialised
ARQ ring descriptor memory may arrive before any real PF response.
These carry opcode 0 (`VIRTCHNL_OP_UNKNOWN`) and trigger a WARNING
log on every poll iteration, flooding the log during reset recovery.
Treat opcode 0 as a distinct case and log it at DEBUG level, while
retaining WARNING for genuine opcode mismatches.
Signed-off-by: Talluri Chaitanyababu <chaitanyababux.talluri@intel.com>
---
drivers/net/intel/iavf/iavf_vchnl.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/intel/iavf/iavf_vchnl.c b/drivers/net/intel/iavf/iavf_vchnl.c
index 94ccfb5d6e..cd90d35023 100644
--- a/drivers/net/intel/iavf/iavf_vchnl.c
+++ b/drivers/net/intel/iavf/iavf_vchnl.c
@@ -299,8 +299,15 @@ iavf_read_msg_from_pf(struct iavf_adapter *adapter, uint16_t buf_len,
/* async reply msg on command issued by vf previously */
result = IAVF_MSG_CMD;
if (opcode != vf->pend_cmd) {
- PMD_DRV_LOG(WARNING, "command mismatch, expect %u, get %u",
- vf->pend_cmd, opcode);
+ if (opcode == VIRTCHNL_OP_UNKNOWN)
+ PMD_DRV_LOG(DEBUG,
+ "Spurious msg with opcode 0, pending cmd %u",
+ vf->pend_cmd);
+ else
+ PMD_DRV_LOG(WARNING,
+ "command mismatch, expect %u, get %u",
+ vf->pend_cmd, opcode);
+
result = IAVF_MSG_ERR;
}
}
--
2.43.0
^ permalink raw reply related