* [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes
@ 2024-12-04 22:09 Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 01/11] net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits Tariq Toukan
` (12 more replies)
0 siblings, 13 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Tariq Toukan
Hi,
This patchset starts with 4 patches that modify the IFC, targeted to
mlx5-next in order to be taken to rdma-next branch side sooner than in
the next merge window.
This patchset consists of two features:
1. In patches 5-6, Itamar adds SW Steering support for ConnectX-8.
2. Followed by patches by Carolina that add rate management support on
traffic classes in devlink and mlx5, more details below [1].
Series generated against:
commit bb18265c3aba ("r8169: remove support for chip version 11")
Regards,
Tariq
V5:
- Fix warning in devlink_nl_rate_tc_bw_set().
- Fix target branch of patch #4.
V4:
- Renamed the nested attribute for traffic class bandwidth to
DEVLINK_ATTR_RATE_TC_BWS.
- Changed the order of the attributes in `devlink.h`.
- Refactored the initialization tc-bw array in
devlink_nl_rate_tc_bw_set().
- Added extack messages to provide clear feedback on issues with tc-bw
arguments.
- Updated `rate-tc-bws` to support a multi-attr set, where each
attribute includes an index and the corresponding bandwidth for that
traffic class.
- Handled the issue where the user could provide
DEVLINK_ATTR_RATE_TC_BWS with duplicate indices.
- Provided ynl exmaples in devlink patch commit message.
- Take IFC patches to beginning of the series, targeted for mlx5-next.
V3:
- Dropped rate-tc-index, using tc-bw array index instead.
- Renamed rate-bw to rate-tc-bw.
- Documneted what the rate-tc-bw represents and added a range check for
validation.
- Intorduced devlink_nl_rate_tc_bw_set() to parse and set the TC
bandwidth values.
- Updated the user API in the commit message of patch 1/6 to ensure
bandwidths sum equals 100.
- Fixed missing filling of rate-parent in devlink_nl_rate_fill().
V2:
- Included <linux/dcbnl.h> in devlink.h to resolve missing
IEEE_8021QAZ_MAX_TCS definition.
- Refactored the rate-tc-bw attribute structure to use a separate
rate-tc-index.
- Updated patch 2/6 title.
[1]
This patch series extends the devlink-rate API to support traffic class
(TC) bandwidth management, enabling more granular control over traffic
shaping and rate limiting across multiple TCs. The API now allows users
to specify bandwidth proportions for different traffic classes in a
single command. This is particularly useful for managing Enhanced
Transmission Selection (ETS) for groups of Virtual Functions (VFs),
allowing precise bandwidth allocation across traffic classes.
Additionally the series refines the QoS handling in net/mlx5 to support
TC arbitration and bandwidth management on vports and rate nodes.
Extend devlink-rate API to support rate management on TCs:
- devlink: Extend the devlink rate API to support traffic class
bandwidth management
Introduce a no-op implementation:
- net/mlx5: Add no-op implementation for setting tc-bw on rate objects
Add support for enabling and disabling TC QoS on vports and nodes:
- net/mlx5: Add support for setting tc-bw on nodes
- net/mlx5: Add traffic class scheduling support for vport QoS
Support for setting tc-bw on rate objects:
- net/mlx5: Manage TC arbiter nodes and implement full support for
tc-bw
Carolina Jubran (6):
net/mlx5: Add support for new scheduling elements
devlink: Extend devlink rate API with traffic classes bandwidth
management
net/mlx5: Add no-op implementation for setting tc-bw on rate objects
net/mlx5: Add support for setting tc-bw on nodes
net/mlx5: Add traffic class scheduling support for vport QoS
net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw
Cosmin Ratiu (2):
net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits
net/mlx5: qos: Add ifc support for cross-esw scheduling
Itamar Gozlan (2):
net/mlx5: DR, Expand SWS STE callbacks and consolidate common structs
net/mlx5: DR, Add support for ConnectX-8 steering
Yevgeny Kliteynik (1):
net/mlx5: Add ConnectX-8 device to ifc
Documentation/netlink/specs/devlink.yaml | 28 +-
.../net/ethernet/mellanox/mlx5/core/Makefile | 1 +
.../net/ethernet/mellanox/mlx5/core/devlink.c | 2 +
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 795 +++++++++++++++++-
.../net/ethernet/mellanox/mlx5/core/esw/qos.h | 4 +
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 13 +-
drivers/net/ethernet/mellanox/mlx5/core/rl.c | 4 +
.../mlx5/core/steering/sws/dr_domain.c | 2 +-
.../mellanox/mlx5/core/steering/sws/dr_ste.c | 6 +-
.../mellanox/mlx5/core/steering/sws/dr_ste.h | 19 +-
.../mlx5/core/steering/sws/dr_ste_v0.c | 6 +-
.../mlx5/core/steering/sws/dr_ste_v1.c | 207 +----
.../mlx5/core/steering/sws/dr_ste_v1.h | 147 +++-
.../mlx5/core/steering/sws/dr_ste_v2.c | 169 +---
.../mlx5/core/steering/sws/dr_ste_v2.h | 168 ++++
.../mlx5/core/steering/sws/dr_ste_v3.c | 221 +++++
.../mlx5/core/steering/sws/mlx5_ifc_dr.h | 40 +
.../mellanox/mlx5/core/steering/sws/mlx5dr.h | 2 +-
include/linux/mlx5/mlx5_ifc.h | 56 +-
include/net/devlink.h | 7 +
include/uapi/linux/devlink.h | 4 +
net/devlink/netlink_gen.c | 15 +-
net/devlink/netlink_gen.h | 1 +
net/devlink/rate.c | 124 +++
24 files changed, 1645 insertions(+), 396 deletions(-)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v2.h
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v3.c
--
2.44.0
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH mlx5-next V5 01/11] net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 02/11] net/mlx5: Add ConnectX-8 device to ifc Tariq Toukan
` (11 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Cosmin Ratiu,
Tariq Toukan
From: Cosmin Ratiu <cratiu@nvidia.com>
The nested union at the end is not in the same style as the rest of the
code, so un-nest it to make the style uniformly applied again.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
include/linux/mlx5/mlx5_ifc.h | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 4fbbcf35498b..f3650f989e68 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -6324,6 +6324,20 @@ struct mlx5_ifc_modify_other_hca_cap_in_bits {
struct mlx5_ifc_other_hca_cap_bits other_capability;
};
+struct mlx5_ifc_sw_owner_icm_root_params_bits {
+ u8 sw_owner_icm_root_1[0x40];
+
+ u8 sw_owner_icm_root_0[0x40];
+};
+
+struct mlx5_ifc_rtc_params_bits {
+ u8 rtc_id_0[0x20];
+
+ u8 rtc_id_1[0x20];
+
+ u8 reserved_at_40[0x40];
+};
+
struct mlx5_ifc_flow_table_context_bits {
u8 reformat_en[0x1];
u8 decap_en[0x1];
@@ -6342,20 +6356,10 @@ struct mlx5_ifc_flow_table_context_bits {
u8 lag_master_next_table_id[0x18];
u8 reserved_at_60[0x60];
- union {
- struct {
- u8 sw_owner_icm_root_1[0x40];
-
- u8 sw_owner_icm_root_0[0x40];
- } sws;
- struct {
- u8 rtc_id_0[0x20];
- u8 rtc_id_1[0x20];
-
- u8 reserved_at_100[0x40];
-
- } hws;
+ union {
+ struct mlx5_ifc_sw_owner_icm_root_params_bits sws;
+ struct mlx5_ifc_rtc_params_bits hws;
};
};
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH mlx5-next V5 02/11] net/mlx5: Add ConnectX-8 device to ifc
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 01/11] net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 03/11] net/mlx5: Add support for new scheduling elements Tariq Toukan
` (10 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma,
Yevgeny Kliteynik, Tariq Toukan
From: Yevgeny Kliteynik <kliteyn@nvidia.com>
In preparation for ConnectX-8 SWS support, add enum for the new device
type.
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
include/linux/mlx5/mlx5_ifc.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index f3650f989e68..bd9b1833408e 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1590,6 +1590,7 @@ enum {
MLX5_STEERING_FORMAT_CONNECTX_5 = 0,
MLX5_STEERING_FORMAT_CONNECTX_6DX = 1,
MLX5_STEERING_FORMAT_CONNECTX_7 = 2,
+ MLX5_STEERING_FORMAT_CONNECTX_8 = 3,
};
struct mlx5_ifc_cmd_hca_cap_bits {
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH mlx5-next V5 03/11] net/mlx5: Add support for new scheduling elements
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 01/11] net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 02/11] net/mlx5: Add ConnectX-8 device to ifc Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 04/11] net/mlx5: qos: Add ifc support for cross-esw scheduling Tariq Toukan
` (9 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Carolina Jubran,
Cosmin Ratiu, Tariq Toukan
From: Carolina Jubran <cjubran@nvidia.com>
Introduce new scheduling elements in the E-Switch QoS hierarchy to
enhance traffic management capabilities. This patch adds support for:
- Rate Limit scheduling elements: Enables bandwidth limitation across
multiple nodes without a shared ancestor, providing a mechanism for
more granular control of bandwidth allocation.
- Traffic Class Transmit Scheduling Arbiter (TSAR): Introduces the
infrastructure for creating Traffic Class TSARs, allowing
hierarchical arbitration based on traffic classes.
- Traffic Class Arbiter TSAR: Adds support for a TSAR capable of
managing arbitration between multiple traffic classes, enabling
improved bandwidth prioritization and traffic management.
No functional changes are introduced in this patch.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/rl.c | 4 ++++
include/linux/mlx5/mlx5_ifc.h | 14 +++++++++++---
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/rl.c b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
index e393391966e0..39a209b9b684 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/rl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/rl.c
@@ -56,6 +56,8 @@ bool mlx5_qos_tsar_type_supported(struct mlx5_core_dev *dev, int type, u8 hierar
return cap & TSAR_TYPE_CAP_MASK_ROUND_ROBIN;
case TSAR_ELEMENT_TSAR_TYPE_ETS:
return cap & TSAR_TYPE_CAP_MASK_ETS;
+ case TSAR_ELEMENT_TSAR_TYPE_TC_ARB:
+ return cap & TSAR_TYPE_CAP_MASK_TC_ARB;
}
return false;
@@ -87,6 +89,8 @@ bool mlx5_qos_element_type_supported(struct mlx5_core_dev *dev, int type, u8 hie
return cap & ELEMENT_TYPE_CAP_MASK_PARA_VPORT_TC;
case SCHEDULING_CONTEXT_ELEMENT_TYPE_QUEUE_GROUP:
return cap & ELEMENT_TYPE_CAP_MASK_QUEUE_GROUP;
+ case SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT:
+ return cap & ELEMENT_TYPE_CAP_MASK_RATE_LIMIT;
}
return false;
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index bd9b1833408e..8b202521b774 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1103,7 +1103,8 @@ struct mlx5_ifc_qos_cap_bits {
u8 packet_pacing_min_rate[0x20];
- u8 reserved_at_80[0x10];
+ u8 reserved_at_80[0xb];
+ u8 log_esw_max_rate_limit[0x5];
u8 packet_pacing_rate_table_size[0x10];
u8 esw_element_type[0x10];
@@ -4104,6 +4105,7 @@ enum {
SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC = 0x2,
SCHEDULING_CONTEXT_ELEMENT_TYPE_PARA_VPORT_TC = 0x3,
SCHEDULING_CONTEXT_ELEMENT_TYPE_QUEUE_GROUP = 0x4,
+ SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT = 0x5,
};
enum {
@@ -4112,22 +4114,26 @@ enum {
ELEMENT_TYPE_CAP_MASK_VPORT_TC = 1 << 2,
ELEMENT_TYPE_CAP_MASK_PARA_VPORT_TC = 1 << 3,
ELEMENT_TYPE_CAP_MASK_QUEUE_GROUP = 1 << 4,
+ ELEMENT_TYPE_CAP_MASK_RATE_LIMIT = 1 << 5,
};
enum {
TSAR_ELEMENT_TSAR_TYPE_DWRR = 0x0,
TSAR_ELEMENT_TSAR_TYPE_ROUND_ROBIN = 0x1,
TSAR_ELEMENT_TSAR_TYPE_ETS = 0x2,
+ TSAR_ELEMENT_TSAR_TYPE_TC_ARB = 0x3,
};
enum {
TSAR_TYPE_CAP_MASK_DWRR = 1 << 0,
TSAR_TYPE_CAP_MASK_ROUND_ROBIN = 1 << 1,
TSAR_TYPE_CAP_MASK_ETS = 1 << 2,
+ TSAR_TYPE_CAP_MASK_TC_ARB = 1 << 3,
};
struct mlx5_ifc_tsar_element_bits {
- u8 reserved_at_0[0x8];
+ u8 traffic_class[0x4];
+ u8 reserved_at_4[0x4];
u8 tsar_type[0x8];
u8 reserved_at_10[0x10];
};
@@ -4164,7 +4170,9 @@ struct mlx5_ifc_scheduling_context_bits {
u8 max_average_bw[0x20];
- u8 reserved_at_e0[0x120];
+ u8 max_bw_obj_id[0x20];
+
+ u8 reserved_at_100[0x100];
};
struct mlx5_ifc_rqtc_bits {
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH mlx5-next V5 04/11] net/mlx5: qos: Add ifc support for cross-esw scheduling
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
` (2 preceding siblings ...)
2024-12-04 22:09 ` [PATCH mlx5-next V5 03/11] net/mlx5: Add support for new scheduling elements Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 05/11] net/mlx5: DR, Expand SWS STE callbacks and consolidate common structs Tariq Toukan
` (8 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Cosmin Ratiu,
Tariq Toukan
From: Cosmin Ratiu <cratiu@nvidia.com>
This adds the capability bit and the vport element fields related to
cross-esw scheduling.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
include/linux/mlx5/mlx5_ifc.h | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 8b202521b774..5451ff1d4356 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1095,7 +1095,9 @@ struct mlx5_ifc_qos_cap_bits {
u8 log_esw_max_sched_depth[0x4];
u8 reserved_at_10[0x10];
- u8 reserved_at_20[0xb];
+ u8 reserved_at_20[0x9];
+ u8 esw_cross_esw_sched[0x1];
+ u8 reserved_at_2a[0x1];
u8 log_max_qos_nic_queue_group[0x5];
u8 reserved_at_30[0x10];
@@ -4139,13 +4141,16 @@ struct mlx5_ifc_tsar_element_bits {
};
struct mlx5_ifc_vport_element_bits {
- u8 reserved_at_0[0x10];
+ u8 reserved_at_0[0x4];
+ u8 eswitch_owner_vhca_id_valid[0x1];
+ u8 eswitch_owner_vhca_id[0xb];
u8 vport_number[0x10];
};
struct mlx5_ifc_vport_tc_element_bits {
u8 traffic_class[0x4];
- u8 reserved_at_4[0xc];
+ u8 eswitch_owner_vhca_id_valid[0x1];
+ u8 eswitch_owner_vhca_id[0xb];
u8 vport_number[0x10];
};
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next V5 05/11] net/mlx5: DR, Expand SWS STE callbacks and consolidate common structs
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
` (3 preceding siblings ...)
2024-12-04 22:09 ` [PATCH mlx5-next V5 04/11] net/mlx5: qos: Add ifc support for cross-esw scheduling Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 06/11] net/mlx5: DR, Add support for ConnectX-8 steering Tariq Toukan
` (7 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Itamar Gozlan,
Yevgeny Kliteynik, Tariq Toukan
From: Itamar Gozlan <igozlan@nvidia.com>
Expand SWS STE callbacks to support ConnectX-8 hardware.
Move common enums and structures to a shared header file.
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../mellanox/mlx5/core/steering/sws/dr_ste.c | 4 +-
.../mellanox/mlx5/core/steering/sws/dr_ste.h | 18 +-
.../mlx5/core/steering/sws/dr_ste_v0.c | 6 +-
.../mlx5/core/steering/sws/dr_ste_v1.c | 207 ++++--------------
.../mlx5/core/steering/sws/dr_ste_v1.h | 147 ++++++++++++-
.../mlx5/core/steering/sws/dr_ste_v2.c | 169 +-------------
.../mlx5/core/steering/sws/dr_ste_v2.h | 168 ++++++++++++++
7 files changed, 377 insertions(+), 342 deletions(-)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v2.h
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.c
index e94fbb015efa..01ba8eae2983 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.c
@@ -555,7 +555,7 @@ void mlx5dr_ste_set_actions_tx(struct mlx5dr_ste_ctx *ste_ctx,
struct mlx5dr_ste_actions_attr *attr,
u32 *added_stes)
{
- ste_ctx->set_actions_tx(dmn, action_type_set, ste_ctx->actions_caps,
+ ste_ctx->set_actions_tx(ste_ctx, dmn, action_type_set, ste_ctx->actions_caps,
hw_ste_arr, attr, added_stes);
}
@@ -566,7 +566,7 @@ void mlx5dr_ste_set_actions_rx(struct mlx5dr_ste_ctx *ste_ctx,
struct mlx5dr_ste_actions_attr *attr,
u32 *added_stes)
{
- ste_ctx->set_actions_rx(dmn, action_type_set, ste_ctx->actions_caps,
+ ste_ctx->set_actions_rx(ste_ctx, dmn, action_type_set, ste_ctx->actions_caps,
hw_ste_arr, attr, added_stes);
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.h
index 54a6619c3ecb..b6ec8d30d990 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.h
@@ -160,13 +160,15 @@ struct mlx5dr_ste_ctx {
/* Actions */
u32 actions_caps;
- void (*set_actions_rx)(struct mlx5dr_domain *dmn,
+ void (*set_actions_rx)(struct mlx5dr_ste_ctx *ste_ctx,
+ struct mlx5dr_domain *dmn,
u8 *action_type_set,
u32 actions_caps,
u8 *hw_ste_arr,
struct mlx5dr_ste_actions_attr *attr,
u32 *added_stes);
- void (*set_actions_tx)(struct mlx5dr_domain *dmn,
+ void (*set_actions_tx)(struct mlx5dr_ste_ctx *ste_ctx,
+ struct mlx5dr_domain *dmn,
u8 *action_type_set,
u32 actions_caps,
u8 *hw_ste_arr,
@@ -197,7 +199,17 @@ struct mlx5dr_ste_ctx {
u16 *used_hw_action_num);
int (*alloc_modify_hdr_chunk)(struct mlx5dr_action *action);
void (*dealloc_modify_hdr_chunk)(struct mlx5dr_action *action);
-
+ /* Actions bit set */
+ void (*set_encap)(u8 *hw_ste_p, u8 *d_action,
+ u32 reformat_id, int size);
+ void (*set_push_vlan)(u8 *ste, u8 *d_action,
+ u32 vlan_hdr);
+ void (*set_pop_vlan)(u8 *hw_ste_p, u8 *s_action,
+ u8 vlans_num);
+ void (*set_rx_decap)(u8 *hw_ste_p, u8 *s_action);
+ void (*set_encap_l3)(u8 *hw_ste_p, u8 *frst_s_action,
+ u8 *scnd_d_action, u32 reformat_id,
+ int size);
/* Send */
void (*prepare_for_postsend)(u8 *hw_ste_p, u32 ste_size);
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v0.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v0.c
index e9f6c7ed7a7b..42536bee55e2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v0.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v0.c
@@ -406,7 +406,8 @@ static void dr_ste_v0_arr_init_next(u8 **last_ste,
}
static void
-dr_ste_v0_set_actions_tx(struct mlx5dr_domain *dmn,
+dr_ste_v0_set_actions_tx(struct mlx5dr_ste_ctx *ste_ctx,
+ struct mlx5dr_domain *dmn,
u8 *action_type_set,
u32 actions_caps,
u8 *last_ste,
@@ -476,7 +477,8 @@ dr_ste_v0_set_actions_tx(struct mlx5dr_domain *dmn,
}
static void
-dr_ste_v0_set_actions_rx(struct mlx5dr_domain *dmn,
+dr_ste_v0_set_actions_rx(struct mlx5dr_ste_ctx *ste_ctx,
+ struct mlx5dr_domain *dmn,
u8 *action_type_set,
u32 actions_caps,
u8 *last_ste,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v1.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v1.c
index 1d49704b9542..7f83d77c43ef 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v1.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v1.c
@@ -5,136 +5,6 @@
#include "mlx5_ifc_dr_ste_v1.h"
#include "dr_ste_v1.h"
-#define DR_STE_CALC_DFNR_TYPE(lookup_type, inner) \
- ((inner) ? DR_STE_V1_LU_TYPE_##lookup_type##_I : \
- DR_STE_V1_LU_TYPE_##lookup_type##_O)
-
-enum dr_ste_v1_entry_format {
- DR_STE_V1_TYPE_BWC_BYTE = 0x0,
- DR_STE_V1_TYPE_BWC_DW = 0x1,
- DR_STE_V1_TYPE_MATCH = 0x2,
- DR_STE_V1_TYPE_MATCH_RANGES = 0x7,
-};
-
-/* Lookup type is built from 2B: [ Definer mode 1B ][ Definer index 1B ] */
-enum {
- DR_STE_V1_LU_TYPE_NOP = 0x0000,
- DR_STE_V1_LU_TYPE_ETHL2_TNL = 0x0002,
- DR_STE_V1_LU_TYPE_IBL3_EXT = 0x0102,
- DR_STE_V1_LU_TYPE_ETHL2_O = 0x0003,
- DR_STE_V1_LU_TYPE_IBL4 = 0x0103,
- DR_STE_V1_LU_TYPE_ETHL2_I = 0x0004,
- DR_STE_V1_LU_TYPE_SRC_QP_GVMI = 0x0104,
- DR_STE_V1_LU_TYPE_ETHL2_SRC_O = 0x0005,
- DR_STE_V1_LU_TYPE_ETHL2_HEADERS_O = 0x0105,
- DR_STE_V1_LU_TYPE_ETHL2_SRC_I = 0x0006,
- DR_STE_V1_LU_TYPE_ETHL2_HEADERS_I = 0x0106,
- DR_STE_V1_LU_TYPE_ETHL3_IPV4_5_TUPLE_O = 0x0007,
- DR_STE_V1_LU_TYPE_IPV6_DES_O = 0x0107,
- DR_STE_V1_LU_TYPE_ETHL3_IPV4_5_TUPLE_I = 0x0008,
- DR_STE_V1_LU_TYPE_IPV6_DES_I = 0x0108,
- DR_STE_V1_LU_TYPE_ETHL4_O = 0x0009,
- DR_STE_V1_LU_TYPE_IPV6_SRC_O = 0x0109,
- DR_STE_V1_LU_TYPE_ETHL4_I = 0x000a,
- DR_STE_V1_LU_TYPE_IPV6_SRC_I = 0x010a,
- DR_STE_V1_LU_TYPE_ETHL2_SRC_DST_O = 0x000b,
- DR_STE_V1_LU_TYPE_MPLS_O = 0x010b,
- DR_STE_V1_LU_TYPE_ETHL2_SRC_DST_I = 0x000c,
- DR_STE_V1_LU_TYPE_MPLS_I = 0x010c,
- DR_STE_V1_LU_TYPE_ETHL3_IPV4_MISC_O = 0x000d,
- DR_STE_V1_LU_TYPE_GRE = 0x010d,
- DR_STE_V1_LU_TYPE_FLEX_PARSER_TNL_HEADER = 0x000e,
- DR_STE_V1_LU_TYPE_GENERAL_PURPOSE = 0x010e,
- DR_STE_V1_LU_TYPE_ETHL3_IPV4_MISC_I = 0x000f,
- DR_STE_V1_LU_TYPE_STEERING_REGISTERS_0 = 0x010f,
- DR_STE_V1_LU_TYPE_STEERING_REGISTERS_1 = 0x0110,
- DR_STE_V1_LU_TYPE_FLEX_PARSER_OK = 0x0011,
- DR_STE_V1_LU_TYPE_FLEX_PARSER_0 = 0x0111,
- DR_STE_V1_LU_TYPE_FLEX_PARSER_1 = 0x0112,
- DR_STE_V1_LU_TYPE_ETHL4_MISC_O = 0x0113,
- DR_STE_V1_LU_TYPE_ETHL4_MISC_I = 0x0114,
- DR_STE_V1_LU_TYPE_INVALID = 0x00ff,
- DR_STE_V1_LU_TYPE_DONT_CARE = MLX5DR_STE_LU_TYPE_DONT_CARE,
-};
-
-enum dr_ste_v1_header_anchors {
- DR_STE_HEADER_ANCHOR_START_OUTER = 0x00,
- DR_STE_HEADER_ANCHOR_1ST_VLAN = 0x02,
- DR_STE_HEADER_ANCHOR_IPV6_IPV4 = 0x07,
- DR_STE_HEADER_ANCHOR_INNER_MAC = 0x13,
- DR_STE_HEADER_ANCHOR_INNER_IPV6_IPV4 = 0x19,
-};
-
-enum dr_ste_v1_action_size {
- DR_STE_ACTION_SINGLE_SZ = 4,
- DR_STE_ACTION_DOUBLE_SZ = 8,
- DR_STE_ACTION_TRIPLE_SZ = 12,
-};
-
-enum dr_ste_v1_action_insert_ptr_attr {
- DR_STE_V1_ACTION_INSERT_PTR_ATTR_NONE = 0, /* Regular push header (e.g. push vlan) */
- DR_STE_V1_ACTION_INSERT_PTR_ATTR_ENCAP = 1, /* Encapsulation / Tunneling */
- DR_STE_V1_ACTION_INSERT_PTR_ATTR_ESP = 2, /* IPsec */
-};
-
-enum dr_ste_v1_action_id {
- DR_STE_V1_ACTION_ID_NOP = 0x00,
- DR_STE_V1_ACTION_ID_COPY = 0x05,
- DR_STE_V1_ACTION_ID_SET = 0x06,
- DR_STE_V1_ACTION_ID_ADD = 0x07,
- DR_STE_V1_ACTION_ID_REMOVE_BY_SIZE = 0x08,
- DR_STE_V1_ACTION_ID_REMOVE_HEADER_TO_HEADER = 0x09,
- DR_STE_V1_ACTION_ID_INSERT_INLINE = 0x0a,
- DR_STE_V1_ACTION_ID_INSERT_POINTER = 0x0b,
- DR_STE_V1_ACTION_ID_FLOW_TAG = 0x0c,
- DR_STE_V1_ACTION_ID_QUEUE_ID_SEL = 0x0d,
- DR_STE_V1_ACTION_ID_ACCELERATED_LIST = 0x0e,
- DR_STE_V1_ACTION_ID_MODIFY_LIST = 0x0f,
- DR_STE_V1_ACTION_ID_ASO = 0x12,
- DR_STE_V1_ACTION_ID_TRAILER = 0x13,
- DR_STE_V1_ACTION_ID_COUNTER_ID = 0x14,
- DR_STE_V1_ACTION_ID_MAX = 0x21,
- /* use for special cases */
- DR_STE_V1_ACTION_ID_SPECIAL_ENCAP_L3 = 0x22,
-};
-
-enum {
- DR_STE_V1_ACTION_MDFY_FLD_L2_OUT_0 = 0x00,
- DR_STE_V1_ACTION_MDFY_FLD_L2_OUT_1 = 0x01,
- DR_STE_V1_ACTION_MDFY_FLD_L2_OUT_2 = 0x02,
- DR_STE_V1_ACTION_MDFY_FLD_SRC_L2_OUT_0 = 0x08,
- DR_STE_V1_ACTION_MDFY_FLD_SRC_L2_OUT_1 = 0x09,
- DR_STE_V1_ACTION_MDFY_FLD_L3_OUT_0 = 0x0e,
- DR_STE_V1_ACTION_MDFY_FLD_L4_OUT_0 = 0x18,
- DR_STE_V1_ACTION_MDFY_FLD_L4_OUT_1 = 0x19,
- DR_STE_V1_ACTION_MDFY_FLD_IPV4_OUT_0 = 0x40,
- DR_STE_V1_ACTION_MDFY_FLD_IPV4_OUT_1 = 0x41,
- DR_STE_V1_ACTION_MDFY_FLD_IPV6_DST_OUT_0 = 0x44,
- DR_STE_V1_ACTION_MDFY_FLD_IPV6_DST_OUT_1 = 0x45,
- DR_STE_V1_ACTION_MDFY_FLD_IPV6_DST_OUT_2 = 0x46,
- DR_STE_V1_ACTION_MDFY_FLD_IPV6_DST_OUT_3 = 0x47,
- DR_STE_V1_ACTION_MDFY_FLD_IPV6_SRC_OUT_0 = 0x4c,
- DR_STE_V1_ACTION_MDFY_FLD_IPV6_SRC_OUT_1 = 0x4d,
- DR_STE_V1_ACTION_MDFY_FLD_IPV6_SRC_OUT_2 = 0x4e,
- DR_STE_V1_ACTION_MDFY_FLD_IPV6_SRC_OUT_3 = 0x4f,
- DR_STE_V1_ACTION_MDFY_FLD_TCP_MISC_0 = 0x5e,
- DR_STE_V1_ACTION_MDFY_FLD_TCP_MISC_1 = 0x5f,
- DR_STE_V1_ACTION_MDFY_FLD_CFG_HDR_0_0 = 0x6f,
- DR_STE_V1_ACTION_MDFY_FLD_CFG_HDR_0_1 = 0x70,
- DR_STE_V1_ACTION_MDFY_FLD_METADATA_2_CQE = 0x7b,
- DR_STE_V1_ACTION_MDFY_FLD_GNRL_PURPOSE = 0x7c,
- DR_STE_V1_ACTION_MDFY_FLD_REGISTER_2_0 = 0x8c,
- DR_STE_V1_ACTION_MDFY_FLD_REGISTER_2_1 = 0x8d,
- DR_STE_V1_ACTION_MDFY_FLD_REGISTER_1_0 = 0x8e,
- DR_STE_V1_ACTION_MDFY_FLD_REGISTER_1_1 = 0x8f,
- DR_STE_V1_ACTION_MDFY_FLD_REGISTER_0_0 = 0x90,
- DR_STE_V1_ACTION_MDFY_FLD_REGISTER_0_1 = 0x91,
-};
-
-enum dr_ste_v1_aso_ctx_type {
- DR_STE_V1_ASO_CTX_TYPE_POLICERS = 0x2,
-};
-
static const struct mlx5dr_ste_action_modify_field dr_ste_v1_action_modify_field_arr[] = {
[MLX5_ACTION_IN_FIELD_OUT_SMAC_47_16] = {
.hw_field = DR_STE_V1_ACTION_MDFY_FLD_SRC_L2_OUT_0, .start = 0, .end = 31,
@@ -379,13 +249,12 @@ static void dr_ste_v1_set_counter_id(u8 *hw_ste_p, u32 ctr_id)
MLX5_SET(ste_match_bwc_v1, hw_ste_p, counter_id, ctr_id);
}
-static void dr_ste_v1_set_reparse(u8 *hw_ste_p)
+void dr_ste_v1_set_reparse(u8 *hw_ste_p)
{
MLX5_SET(ste_match_bwc_v1, hw_ste_p, reparse, 1);
}
-static void dr_ste_v1_set_encap(u8 *hw_ste_p, u8 *d_action,
- u32 reformat_id, int size)
+void dr_ste_v1_set_encap(u8 *hw_ste_p, u8 *d_action, u32 reformat_id, int size)
{
MLX5_SET(ste_double_action_insert_with_ptr_v1, d_action, action_id,
DR_STE_V1_ACTION_ID_INSERT_POINTER);
@@ -432,8 +301,7 @@ static void dr_ste_v1_set_remove_hdr(u8 *hw_ste_p, u8 *s_action,
dr_ste_v1_set_reparse(hw_ste_p);
}
-static void dr_ste_v1_set_push_vlan(u8 *hw_ste_p, u8 *d_action,
- u32 vlan_hdr)
+void dr_ste_v1_set_push_vlan(u8 *hw_ste_p, u8 *d_action, u32 vlan_hdr)
{
MLX5_SET(ste_double_action_insert_with_inline_v1, d_action,
action_id, DR_STE_V1_ACTION_ID_INSERT_INLINE);
@@ -446,7 +314,7 @@ static void dr_ste_v1_set_push_vlan(u8 *hw_ste_p, u8 *d_action,
dr_ste_v1_set_reparse(hw_ste_p);
}
-static void dr_ste_v1_set_pop_vlan(u8 *hw_ste_p, u8 *s_action, u8 vlans_num)
+void dr_ste_v1_set_pop_vlan(u8 *hw_ste_p, u8 *s_action, u8 vlans_num)
{
MLX5_SET(ste_single_action_remove_header_size_v1, s_action,
action_id, DR_STE_V1_ACTION_ID_REMOVE_BY_SIZE);
@@ -459,11 +327,8 @@ static void dr_ste_v1_set_pop_vlan(u8 *hw_ste_p, u8 *s_action, u8 vlans_num)
dr_ste_v1_set_reparse(hw_ste_p);
}
-static void dr_ste_v1_set_encap_l3(u8 *hw_ste_p,
- u8 *frst_s_action,
- u8 *scnd_d_action,
- u32 reformat_id,
- int size)
+void dr_ste_v1_set_encap_l3(u8 *hw_ste_p, u8 *frst_s_action, u8 *scnd_d_action,
+ u32 reformat_id, int size)
{
/* Remove L2 headers */
MLX5_SET(ste_single_action_remove_header_v1, frst_s_action, action_id,
@@ -483,7 +348,7 @@ static void dr_ste_v1_set_encap_l3(u8 *hw_ste_p,
dr_ste_v1_set_reparse(hw_ste_p);
}
-static void dr_ste_v1_set_rx_decap(u8 *hw_ste_p, u8 *s_action)
+void dr_ste_v1_set_rx_decap(u8 *hw_ste_p, u8 *s_action)
{
MLX5_SET(ste_single_action_remove_header_v1, s_action, action_id,
DR_STE_V1_ACTION_ID_REMOVE_HEADER_TO_HEADER);
@@ -620,7 +485,8 @@ static void dr_ste_v1_arr_init_next_match_range(u8 **last_ste,
dr_ste_v1_set_entry_type(*last_ste, DR_STE_V1_TYPE_MATCH_RANGES);
}
-void dr_ste_v1_set_actions_tx(struct mlx5dr_domain *dmn,
+void dr_ste_v1_set_actions_tx(struct mlx5dr_ste_ctx *ste_ctx,
+ struct mlx5dr_domain *dmn,
u8 *action_type_set,
u32 actions_caps,
u8 *last_ste,
@@ -640,7 +506,7 @@ void dr_ste_v1_set_actions_tx(struct mlx5dr_domain *dmn,
last_ste, action);
action_sz = DR_STE_ACTION_TRIPLE_SZ;
}
- dr_ste_v1_set_pop_vlan(last_ste, action, attr->vlans.count);
+ ste_ctx->set_pop_vlan(last_ste, action, attr->vlans.count);
action_sz -= DR_STE_ACTION_SINGLE_SZ;
action += DR_STE_ACTION_SINGLE_SZ;
@@ -677,8 +543,8 @@ void dr_ste_v1_set_actions_tx(struct mlx5dr_domain *dmn,
action_sz = DR_STE_ACTION_TRIPLE_SZ;
allow_encap = true;
}
- dr_ste_v1_set_push_vlan(last_ste, action,
- attr->vlans.headers[i]);
+ ste_ctx->set_push_vlan(last_ste, action,
+ attr->vlans.headers[i]);
action_sz -= DR_STE_ACTION_DOUBLE_SZ;
action += DR_STE_ACTION_DOUBLE_SZ;
}
@@ -691,9 +557,9 @@ void dr_ste_v1_set_actions_tx(struct mlx5dr_domain *dmn,
action_sz = DR_STE_ACTION_TRIPLE_SZ;
allow_encap = true;
}
- dr_ste_v1_set_encap(last_ste, action,
- attr->reformat.id,
- attr->reformat.size);
+ ste_ctx->set_encap(last_ste, action,
+ attr->reformat.id,
+ attr->reformat.size);
action_sz -= DR_STE_ACTION_DOUBLE_SZ;
action += DR_STE_ACTION_DOUBLE_SZ;
} else if (action_type_set[DR_ACTION_TYP_L2_TO_TNL_L3]) {
@@ -706,10 +572,10 @@ void dr_ste_v1_set_actions_tx(struct mlx5dr_domain *dmn,
}
d_action = action + DR_STE_ACTION_SINGLE_SZ;
- dr_ste_v1_set_encap_l3(last_ste,
- action, d_action,
- attr->reformat.id,
- attr->reformat.size);
+ ste_ctx->set_encap_l3(last_ste,
+ action, d_action,
+ attr->reformat.id,
+ attr->reformat.size);
action_sz -= DR_STE_ACTION_TRIPLE_SZ;
action += DR_STE_ACTION_TRIPLE_SZ;
} else if (action_type_set[DR_ACTION_TYP_INSERT_HDR]) {
@@ -776,7 +642,8 @@ void dr_ste_v1_set_actions_tx(struct mlx5dr_domain *dmn,
dr_ste_v1_set_hit_addr(last_ste, attr->final_icm_addr, 1);
}
-void dr_ste_v1_set_actions_rx(struct mlx5dr_domain *dmn,
+void dr_ste_v1_set_actions_rx(struct mlx5dr_ste_ctx *ste_ctx,
+ struct mlx5dr_domain *dmn,
u8 *action_type_set,
u32 actions_caps,
u8 *last_ste,
@@ -799,7 +666,7 @@ void dr_ste_v1_set_actions_rx(struct mlx5dr_domain *dmn,
allow_modify_hdr = false;
allow_ctr = false;
} else if (action_type_set[DR_ACTION_TYP_TNL_L2_TO_L2]) {
- dr_ste_v1_set_rx_decap(last_ste, action);
+ ste_ctx->set_rx_decap(last_ste, action);
action_sz -= DR_STE_ACTION_SINGLE_SZ;
action += DR_STE_ACTION_SINGLE_SZ;
allow_modify_hdr = false;
@@ -827,7 +694,7 @@ void dr_ste_v1_set_actions_rx(struct mlx5dr_domain *dmn,
action_sz = DR_STE_ACTION_TRIPLE_SZ;
}
- dr_ste_v1_set_pop_vlan(last_ste, action, attr->vlans.count);
+ ste_ctx->set_pop_vlan(last_ste, action, attr->vlans.count);
action_sz -= DR_STE_ACTION_SINGLE_SZ;
action += DR_STE_ACTION_SINGLE_SZ;
allow_ctr = false;
@@ -868,8 +735,8 @@ void dr_ste_v1_set_actions_rx(struct mlx5dr_domain *dmn,
last_ste, action);
action_sz = DR_STE_ACTION_TRIPLE_SZ;
}
- dr_ste_v1_set_push_vlan(last_ste, action,
- attr->vlans.headers[i]);
+ ste_ctx->set_push_vlan(last_ste, action,
+ attr->vlans.headers[i]);
action_sz -= DR_STE_ACTION_DOUBLE_SZ;
action += DR_STE_ACTION_DOUBLE_SZ;
}
@@ -895,9 +762,9 @@ void dr_ste_v1_set_actions_rx(struct mlx5dr_domain *dmn,
action = MLX5_ADDR_OF(ste_mask_and_match_v1, last_ste, action);
action_sz = DR_STE_ACTION_TRIPLE_SZ;
}
- dr_ste_v1_set_encap(last_ste, action,
- attr->reformat.id,
- attr->reformat.size);
+ ste_ctx->set_encap(last_ste, action,
+ attr->reformat.id,
+ attr->reformat.size);
action_sz -= DR_STE_ACTION_DOUBLE_SZ;
action += DR_STE_ACTION_DOUBLE_SZ;
allow_modify_hdr = false;
@@ -912,10 +779,10 @@ void dr_ste_v1_set_actions_rx(struct mlx5dr_domain *dmn,
d_action = action + DR_STE_ACTION_SINGLE_SZ;
- dr_ste_v1_set_encap_l3(last_ste,
- action, d_action,
- attr->reformat.id,
- attr->reformat.size);
+ ste_ctx->set_encap_l3(last_ste,
+ action, d_action,
+ attr->reformat.id,
+ attr->reformat.size);
action_sz -= DR_STE_ACTION_TRIPLE_SZ;
allow_modify_hdr = false;
} else if (action_type_set[DR_ACTION_TYP_INSERT_HDR]) {
@@ -1027,9 +894,6 @@ void dr_ste_v1_set_action_copy(u8 *d_action,
MLX5_SET(ste_double_action_copy_v1, d_action, source_right_shifter, src_shifter);
}
-#define DR_STE_DECAP_L3_ACTION_NUM 8
-#define DR_STE_L2_HDR_MAX_SZ 20
-
int dr_ste_v1_set_action_decap_l3_list(void *data,
u32 data_sz,
u8 *hw_action,
@@ -2330,7 +2194,12 @@ static struct mlx5dr_ste_ctx ste_ctx_v1 = {
.set_action_decap_l3_list = &dr_ste_v1_set_action_decap_l3_list,
.alloc_modify_hdr_chunk = &dr_ste_v1_alloc_modify_hdr_ptrn_arg,
.dealloc_modify_hdr_chunk = &dr_ste_v1_free_modify_hdr_ptrn_arg,
-
+ /* Actions bit set */
+ .set_encap = &dr_ste_v1_set_encap,
+ .set_push_vlan = &dr_ste_v1_set_push_vlan,
+ .set_pop_vlan = &dr_ste_v1_set_pop_vlan,
+ .set_rx_decap = &dr_ste_v1_set_rx_decap,
+ .set_encap_l3 = &dr_ste_v1_set_encap_l3,
/* Send */
.prepare_for_postsend = &dr_ste_v1_prepare_for_postsend,
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v1.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v1.h
index e2fc69867088..a8d9e308d339 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v1.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v1.h
@@ -7,6 +7,138 @@
#include "dr_types.h"
#include "dr_ste.h"
+#define DR_STE_DECAP_L3_ACTION_NUM 8
+#define DR_STE_L2_HDR_MAX_SZ 20
+#define DR_STE_CALC_DFNR_TYPE(lookup_type, inner) \
+ ((inner) ? DR_STE_V1_LU_TYPE_##lookup_type##_I : \
+ DR_STE_V1_LU_TYPE_##lookup_type##_O)
+
+enum dr_ste_v1_entry_format {
+ DR_STE_V1_TYPE_BWC_BYTE = 0x0,
+ DR_STE_V1_TYPE_BWC_DW = 0x1,
+ DR_STE_V1_TYPE_MATCH = 0x2,
+ DR_STE_V1_TYPE_MATCH_RANGES = 0x7,
+};
+
+/* Lookup type is built from 2B: [ Definer mode 1B ][ Definer index 1B ] */
+enum {
+ DR_STE_V1_LU_TYPE_NOP = 0x0000,
+ DR_STE_V1_LU_TYPE_ETHL2_TNL = 0x0002,
+ DR_STE_V1_LU_TYPE_IBL3_EXT = 0x0102,
+ DR_STE_V1_LU_TYPE_ETHL2_O = 0x0003,
+ DR_STE_V1_LU_TYPE_IBL4 = 0x0103,
+ DR_STE_V1_LU_TYPE_ETHL2_I = 0x0004,
+ DR_STE_V1_LU_TYPE_SRC_QP_GVMI = 0x0104,
+ DR_STE_V1_LU_TYPE_ETHL2_SRC_O = 0x0005,
+ DR_STE_V1_LU_TYPE_ETHL2_HEADERS_O = 0x0105,
+ DR_STE_V1_LU_TYPE_ETHL2_SRC_I = 0x0006,
+ DR_STE_V1_LU_TYPE_ETHL2_HEADERS_I = 0x0106,
+ DR_STE_V1_LU_TYPE_ETHL3_IPV4_5_TUPLE_O = 0x0007,
+ DR_STE_V1_LU_TYPE_IPV6_DES_O = 0x0107,
+ DR_STE_V1_LU_TYPE_ETHL3_IPV4_5_TUPLE_I = 0x0008,
+ DR_STE_V1_LU_TYPE_IPV6_DES_I = 0x0108,
+ DR_STE_V1_LU_TYPE_ETHL4_O = 0x0009,
+ DR_STE_V1_LU_TYPE_IPV6_SRC_O = 0x0109,
+ DR_STE_V1_LU_TYPE_ETHL4_I = 0x000a,
+ DR_STE_V1_LU_TYPE_IPV6_SRC_I = 0x010a,
+ DR_STE_V1_LU_TYPE_ETHL2_SRC_DST_O = 0x000b,
+ DR_STE_V1_LU_TYPE_MPLS_O = 0x010b,
+ DR_STE_V1_LU_TYPE_ETHL2_SRC_DST_I = 0x000c,
+ DR_STE_V1_LU_TYPE_MPLS_I = 0x010c,
+ DR_STE_V1_LU_TYPE_ETHL3_IPV4_MISC_O = 0x000d,
+ DR_STE_V1_LU_TYPE_GRE = 0x010d,
+ DR_STE_V1_LU_TYPE_FLEX_PARSER_TNL_HEADER = 0x000e,
+ DR_STE_V1_LU_TYPE_GENERAL_PURPOSE = 0x010e,
+ DR_STE_V1_LU_TYPE_ETHL3_IPV4_MISC_I = 0x000f,
+ DR_STE_V1_LU_TYPE_STEERING_REGISTERS_0 = 0x010f,
+ DR_STE_V1_LU_TYPE_STEERING_REGISTERS_1 = 0x0110,
+ DR_STE_V1_LU_TYPE_FLEX_PARSER_OK = 0x0011,
+ DR_STE_V1_LU_TYPE_FLEX_PARSER_0 = 0x0111,
+ DR_STE_V1_LU_TYPE_FLEX_PARSER_1 = 0x0112,
+ DR_STE_V1_LU_TYPE_ETHL4_MISC_O = 0x0113,
+ DR_STE_V1_LU_TYPE_ETHL4_MISC_I = 0x0114,
+ DR_STE_V1_LU_TYPE_INVALID = 0x00ff,
+ DR_STE_V1_LU_TYPE_DONT_CARE = MLX5DR_STE_LU_TYPE_DONT_CARE,
+};
+
+enum dr_ste_v1_header_anchors {
+ DR_STE_HEADER_ANCHOR_START_OUTER = 0x00,
+ DR_STE_HEADER_ANCHOR_1ST_VLAN = 0x02,
+ DR_STE_HEADER_ANCHOR_IPV6_IPV4 = 0x07,
+ DR_STE_HEADER_ANCHOR_INNER_MAC = 0x13,
+ DR_STE_HEADER_ANCHOR_INNER_IPV6_IPV4 = 0x19,
+};
+
+enum dr_ste_v1_action_size {
+ DR_STE_ACTION_SINGLE_SZ = 4,
+ DR_STE_ACTION_DOUBLE_SZ = 8,
+ DR_STE_ACTION_TRIPLE_SZ = 12,
+};
+
+enum dr_ste_v1_action_insert_ptr_attr {
+ DR_STE_V1_ACTION_INSERT_PTR_ATTR_NONE = 0, /* Regular push header (e.g. push vlan) */
+ DR_STE_V1_ACTION_INSERT_PTR_ATTR_ENCAP = 1, /* Encapsulation / Tunneling */
+ DR_STE_V1_ACTION_INSERT_PTR_ATTR_ESP = 2, /* IPsec */
+};
+
+enum dr_ste_v1_action_id {
+ DR_STE_V1_ACTION_ID_NOP = 0x00,
+ DR_STE_V1_ACTION_ID_COPY = 0x05,
+ DR_STE_V1_ACTION_ID_SET = 0x06,
+ DR_STE_V1_ACTION_ID_ADD = 0x07,
+ DR_STE_V1_ACTION_ID_REMOVE_BY_SIZE = 0x08,
+ DR_STE_V1_ACTION_ID_REMOVE_HEADER_TO_HEADER = 0x09,
+ DR_STE_V1_ACTION_ID_INSERT_INLINE = 0x0a,
+ DR_STE_V1_ACTION_ID_INSERT_POINTER = 0x0b,
+ DR_STE_V1_ACTION_ID_FLOW_TAG = 0x0c,
+ DR_STE_V1_ACTION_ID_QUEUE_ID_SEL = 0x0d,
+ DR_STE_V1_ACTION_ID_ACCELERATED_LIST = 0x0e,
+ DR_STE_V1_ACTION_ID_MODIFY_LIST = 0x0f,
+ DR_STE_V1_ACTION_ID_ASO = 0x12,
+ DR_STE_V1_ACTION_ID_TRAILER = 0x13,
+ DR_STE_V1_ACTION_ID_COUNTER_ID = 0x14,
+ DR_STE_V1_ACTION_ID_MAX = 0x21,
+ /* use for special cases */
+ DR_STE_V1_ACTION_ID_SPECIAL_ENCAP_L3 = 0x22,
+};
+
+enum {
+ DR_STE_V1_ACTION_MDFY_FLD_L2_OUT_0 = 0x00,
+ DR_STE_V1_ACTION_MDFY_FLD_L2_OUT_1 = 0x01,
+ DR_STE_V1_ACTION_MDFY_FLD_L2_OUT_2 = 0x02,
+ DR_STE_V1_ACTION_MDFY_FLD_SRC_L2_OUT_0 = 0x08,
+ DR_STE_V1_ACTION_MDFY_FLD_SRC_L2_OUT_1 = 0x09,
+ DR_STE_V1_ACTION_MDFY_FLD_L3_OUT_0 = 0x0e,
+ DR_STE_V1_ACTION_MDFY_FLD_L4_OUT_0 = 0x18,
+ DR_STE_V1_ACTION_MDFY_FLD_L4_OUT_1 = 0x19,
+ DR_STE_V1_ACTION_MDFY_FLD_IPV4_OUT_0 = 0x40,
+ DR_STE_V1_ACTION_MDFY_FLD_IPV4_OUT_1 = 0x41,
+ DR_STE_V1_ACTION_MDFY_FLD_IPV6_DST_OUT_0 = 0x44,
+ DR_STE_V1_ACTION_MDFY_FLD_IPV6_DST_OUT_1 = 0x45,
+ DR_STE_V1_ACTION_MDFY_FLD_IPV6_DST_OUT_2 = 0x46,
+ DR_STE_V1_ACTION_MDFY_FLD_IPV6_DST_OUT_3 = 0x47,
+ DR_STE_V1_ACTION_MDFY_FLD_IPV6_SRC_OUT_0 = 0x4c,
+ DR_STE_V1_ACTION_MDFY_FLD_IPV6_SRC_OUT_1 = 0x4d,
+ DR_STE_V1_ACTION_MDFY_FLD_IPV6_SRC_OUT_2 = 0x4e,
+ DR_STE_V1_ACTION_MDFY_FLD_IPV6_SRC_OUT_3 = 0x4f,
+ DR_STE_V1_ACTION_MDFY_FLD_TCP_MISC_0 = 0x5e,
+ DR_STE_V1_ACTION_MDFY_FLD_TCP_MISC_1 = 0x5f,
+ DR_STE_V1_ACTION_MDFY_FLD_CFG_HDR_0_0 = 0x6f,
+ DR_STE_V1_ACTION_MDFY_FLD_CFG_HDR_0_1 = 0x70,
+ DR_STE_V1_ACTION_MDFY_FLD_METADATA_2_CQE = 0x7b,
+ DR_STE_V1_ACTION_MDFY_FLD_GNRL_PURPOSE = 0x7c,
+ DR_STE_V1_ACTION_MDFY_FLD_REGISTER_2_0 = 0x8c,
+ DR_STE_V1_ACTION_MDFY_FLD_REGISTER_2_1 = 0x8d,
+ DR_STE_V1_ACTION_MDFY_FLD_REGISTER_1_0 = 0x8e,
+ DR_STE_V1_ACTION_MDFY_FLD_REGISTER_1_1 = 0x8f,
+ DR_STE_V1_ACTION_MDFY_FLD_REGISTER_0_0 = 0x90,
+ DR_STE_V1_ACTION_MDFY_FLD_REGISTER_0_1 = 0x91,
+};
+
+enum dr_ste_v1_aso_ctx_type {
+ DR_STE_V1_ASO_CTX_TYPE_POLICERS = 0x2,
+};
+
bool dr_ste_v1_is_miss_addr_set(u8 *hw_ste_p);
void dr_ste_v1_set_miss_addr(u8 *hw_ste_p, u64 miss_addr);
u64 dr_ste_v1_get_miss_addr(u8 *hw_ste_p);
@@ -17,11 +149,18 @@ u16 dr_ste_v1_get_next_lu_type(u8 *hw_ste_p);
void dr_ste_v1_set_hit_addr(u8 *hw_ste_p, u64 icm_addr, u32 ht_size);
void dr_ste_v1_init(u8 *hw_ste_p, u16 lu_type, bool is_rx, u16 gvmi);
void dr_ste_v1_prepare_for_postsend(u8 *hw_ste_p, u32 ste_size);
-void dr_ste_v1_set_actions_tx(struct mlx5dr_domain *dmn, u8 *action_type_set,
- u32 actions_caps, u8 *last_ste,
+void dr_ste_v1_set_reparse(u8 *hw_ste_p);
+void dr_ste_v1_set_encap(u8 *hw_ste_p, u8 *d_action, u32 reformat_id, int size);
+void dr_ste_v1_set_push_vlan(u8 *hw_ste_p, u8 *d_action, u32 vlan_hdr);
+void dr_ste_v1_set_pop_vlan(u8 *hw_ste_p, u8 *s_action, u8 vlans_num);
+void dr_ste_v1_set_encap_l3(u8 *hw_ste_p, u8 *frst_s_action, u8 *scnd_d_action,
+ u32 reformat_id, int size);
+void dr_ste_v1_set_rx_decap(u8 *hw_ste_p, u8 *s_action);
+void dr_ste_v1_set_actions_tx(struct mlx5dr_ste_ctx *ste_ctx, struct mlx5dr_domain *dmn,
+ u8 *action_type_set, u32 actions_caps, u8 *last_ste,
struct mlx5dr_ste_actions_attr *attr, u32 *added_stes);
-void dr_ste_v1_set_actions_rx(struct mlx5dr_domain *dmn, u8 *action_type_set,
- u32 actions_caps, u8 *last_ste,
+void dr_ste_v1_set_actions_rx(struct mlx5dr_ste_ctx *ste_ctx, struct mlx5dr_domain *dmn,
+ u8 *action_type_set, u32 actions_caps, u8 *last_ste,
struct mlx5dr_ste_actions_attr *attr, u32 *added_stes);
void dr_ste_v1_set_action_set(u8 *d_action, u8 hw_field, u8 shifter,
u8 length, u32 data);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v2.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v2.c
index 808b013cf48c..0882dba0f64b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v2.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v2.c
@@ -2,167 +2,7 @@
/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
#include "dr_ste_v1.h"
-
-enum {
- DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_0 = 0x00,
- DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_1 = 0x01,
- DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_2 = 0x02,
- DR_STE_V2_ACTION_MDFY_FLD_SRC_L2_OUT_0 = 0x08,
- DR_STE_V2_ACTION_MDFY_FLD_SRC_L2_OUT_1 = 0x09,
- DR_STE_V2_ACTION_MDFY_FLD_L3_OUT_0 = 0x0e,
- DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_0 = 0x18,
- DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_1 = 0x19,
- DR_STE_V2_ACTION_MDFY_FLD_IPV4_OUT_0 = 0x40,
- DR_STE_V2_ACTION_MDFY_FLD_IPV4_OUT_1 = 0x41,
- DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_0 = 0x44,
- DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_1 = 0x45,
- DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_2 = 0x46,
- DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_3 = 0x47,
- DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_0 = 0x4c,
- DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_1 = 0x4d,
- DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_2 = 0x4e,
- DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_3 = 0x4f,
- DR_STE_V2_ACTION_MDFY_FLD_TCP_MISC_0 = 0x5e,
- DR_STE_V2_ACTION_MDFY_FLD_TCP_MISC_1 = 0x5f,
- DR_STE_V2_ACTION_MDFY_FLD_CFG_HDR_0_0 = 0x6f,
- DR_STE_V2_ACTION_MDFY_FLD_CFG_HDR_0_1 = 0x70,
- DR_STE_V2_ACTION_MDFY_FLD_METADATA_2_CQE = 0x7b,
- DR_STE_V2_ACTION_MDFY_FLD_GNRL_PURPOSE = 0x7c,
- DR_STE_V2_ACTION_MDFY_FLD_REGISTER_2_0 = 0x90,
- DR_STE_V2_ACTION_MDFY_FLD_REGISTER_2_1 = 0x91,
- DR_STE_V2_ACTION_MDFY_FLD_REGISTER_1_0 = 0x92,
- DR_STE_V2_ACTION_MDFY_FLD_REGISTER_1_1 = 0x93,
- DR_STE_V2_ACTION_MDFY_FLD_REGISTER_0_0 = 0x94,
- DR_STE_V2_ACTION_MDFY_FLD_REGISTER_0_1 = 0x95,
-};
-
-static const struct mlx5dr_ste_action_modify_field dr_ste_v2_action_modify_field_arr[] = {
- [MLX5_ACTION_IN_FIELD_OUT_SMAC_47_16] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_SRC_L2_OUT_0, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_OUT_SMAC_15_0] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_SRC_L2_OUT_1, .start = 16, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_OUT_ETHERTYPE] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_1, .start = 0, .end = 15,
- },
- [MLX5_ACTION_IN_FIELD_OUT_DMAC_47_16] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_0, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_OUT_DMAC_15_0] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_1, .start = 16, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_OUT_IP_DSCP] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L3_OUT_0, .start = 18, .end = 23,
- },
- [MLX5_ACTION_IN_FIELD_OUT_TCP_FLAGS] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_1, .start = 16, .end = 24,
- .l4_type = DR_STE_ACTION_MDFY_TYPE_L4_TCP,
- },
- [MLX5_ACTION_IN_FIELD_OUT_TCP_SPORT] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_0, .start = 16, .end = 31,
- .l4_type = DR_STE_ACTION_MDFY_TYPE_L4_TCP,
- },
- [MLX5_ACTION_IN_FIELD_OUT_TCP_DPORT] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_0, .start = 0, .end = 15,
- .l4_type = DR_STE_ACTION_MDFY_TYPE_L4_TCP,
- },
- [MLX5_ACTION_IN_FIELD_OUT_IP_TTL] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L3_OUT_0, .start = 8, .end = 15,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV4,
- },
- [MLX5_ACTION_IN_FIELD_OUT_IPV6_HOPLIMIT] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L3_OUT_0, .start = 8, .end = 15,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
- },
- [MLX5_ACTION_IN_FIELD_OUT_UDP_SPORT] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_0, .start = 16, .end = 31,
- .l4_type = DR_STE_ACTION_MDFY_TYPE_L4_UDP,
- },
- [MLX5_ACTION_IN_FIELD_OUT_UDP_DPORT] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_0, .start = 0, .end = 15,
- .l4_type = DR_STE_ACTION_MDFY_TYPE_L4_UDP,
- },
- [MLX5_ACTION_IN_FIELD_OUT_SIPV6_127_96] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_0, .start = 0, .end = 31,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
- },
- [MLX5_ACTION_IN_FIELD_OUT_SIPV6_95_64] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_1, .start = 0, .end = 31,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
- },
- [MLX5_ACTION_IN_FIELD_OUT_SIPV6_63_32] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_2, .start = 0, .end = 31,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
- },
- [MLX5_ACTION_IN_FIELD_OUT_SIPV6_31_0] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_3, .start = 0, .end = 31,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
- },
- [MLX5_ACTION_IN_FIELD_OUT_DIPV6_127_96] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_0, .start = 0, .end = 31,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
- },
- [MLX5_ACTION_IN_FIELD_OUT_DIPV6_95_64] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_1, .start = 0, .end = 31,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
- },
- [MLX5_ACTION_IN_FIELD_OUT_DIPV6_63_32] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_2, .start = 0, .end = 31,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
- },
- [MLX5_ACTION_IN_FIELD_OUT_DIPV6_31_0] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_3, .start = 0, .end = 31,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
- },
- [MLX5_ACTION_IN_FIELD_OUT_SIPV4] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV4_OUT_0, .start = 0, .end = 31,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV4,
- },
- [MLX5_ACTION_IN_FIELD_OUT_DIPV4] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV4_OUT_1, .start = 0, .end = 31,
- .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV4,
- },
- [MLX5_ACTION_IN_FIELD_METADATA_REG_A] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_GNRL_PURPOSE, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_METADATA_REG_B] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_METADATA_2_CQE, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_METADATA_REG_C_0] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_0_0, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_METADATA_REG_C_1] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_0_1, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_METADATA_REG_C_2] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_1_0, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_METADATA_REG_C_3] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_1_1, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_METADATA_REG_C_4] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_2_0, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_METADATA_REG_C_5] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_2_1, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_OUT_TCP_SEQ_NUM] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_TCP_MISC_0, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_OUT_TCP_ACK_NUM] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_TCP_MISC_1, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_OUT_FIRST_VID] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_2, .start = 0, .end = 15,
- },
- [MLX5_ACTION_IN_FIELD_OUT_EMD_31_0] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_CFG_HDR_0_1, .start = 0, .end = 31,
- },
- [MLX5_ACTION_IN_FIELD_OUT_EMD_47_32] = {
- .hw_field = DR_STE_V2_ACTION_MDFY_FLD_CFG_HDR_0_0, .start = 0, .end = 15,
- },
-};
+#include "dr_ste_v2.h"
static struct mlx5dr_ste_ctx ste_ctx_v2 = {
/* Builders */
@@ -223,7 +63,12 @@ static struct mlx5dr_ste_ctx ste_ctx_v2 = {
.set_action_decap_l3_list = &dr_ste_v1_set_action_decap_l3_list,
.alloc_modify_hdr_chunk = &dr_ste_v1_alloc_modify_hdr_ptrn_arg,
.dealloc_modify_hdr_chunk = &dr_ste_v1_free_modify_hdr_ptrn_arg,
-
+ /* Actions bit set */
+ .set_encap = &dr_ste_v1_set_encap,
+ .set_push_vlan = &dr_ste_v1_set_push_vlan,
+ .set_pop_vlan = &dr_ste_v1_set_pop_vlan,
+ .set_rx_decap = &dr_ste_v1_set_rx_decap,
+ .set_encap_l3 = &dr_ste_v1_set_encap_l3,
/* Send */
.prepare_for_postsend = &dr_ste_v1_prepare_for_postsend,
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v2.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v2.h
new file mode 100644
index 000000000000..d853fde49cfc
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v2.h
@@ -0,0 +1,168 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#ifndef _DR_STE_V2_
+#define _DR_STE_V2_
+
+enum {
+ DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_0 = 0x00,
+ DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_1 = 0x01,
+ DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_2 = 0x02,
+ DR_STE_V2_ACTION_MDFY_FLD_SRC_L2_OUT_0 = 0x08,
+ DR_STE_V2_ACTION_MDFY_FLD_SRC_L2_OUT_1 = 0x09,
+ DR_STE_V2_ACTION_MDFY_FLD_L3_OUT_0 = 0x0e,
+ DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_0 = 0x18,
+ DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_1 = 0x19,
+ DR_STE_V2_ACTION_MDFY_FLD_IPV4_OUT_0 = 0x40,
+ DR_STE_V2_ACTION_MDFY_FLD_IPV4_OUT_1 = 0x41,
+ DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_0 = 0x44,
+ DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_1 = 0x45,
+ DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_2 = 0x46,
+ DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_3 = 0x47,
+ DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_0 = 0x4c,
+ DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_1 = 0x4d,
+ DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_2 = 0x4e,
+ DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_3 = 0x4f,
+ DR_STE_V2_ACTION_MDFY_FLD_TCP_MISC_0 = 0x5e,
+ DR_STE_V2_ACTION_MDFY_FLD_TCP_MISC_1 = 0x5f,
+ DR_STE_V2_ACTION_MDFY_FLD_CFG_HDR_0_0 = 0x6f,
+ DR_STE_V2_ACTION_MDFY_FLD_CFG_HDR_0_1 = 0x70,
+ DR_STE_V2_ACTION_MDFY_FLD_METADATA_2_CQE = 0x7b,
+ DR_STE_V2_ACTION_MDFY_FLD_GNRL_PURPOSE = 0x7c,
+ DR_STE_V2_ACTION_MDFY_FLD_REGISTER_2_0 = 0x90,
+ DR_STE_V2_ACTION_MDFY_FLD_REGISTER_2_1 = 0x91,
+ DR_STE_V2_ACTION_MDFY_FLD_REGISTER_1_0 = 0x92,
+ DR_STE_V2_ACTION_MDFY_FLD_REGISTER_1_1 = 0x93,
+ DR_STE_V2_ACTION_MDFY_FLD_REGISTER_0_0 = 0x94,
+ DR_STE_V2_ACTION_MDFY_FLD_REGISTER_0_1 = 0x95,
+};
+
+static const struct mlx5dr_ste_action_modify_field dr_ste_v2_action_modify_field_arr[] = {
+ [MLX5_ACTION_IN_FIELD_OUT_SMAC_47_16] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_SRC_L2_OUT_0, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_SMAC_15_0] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_SRC_L2_OUT_1, .start = 16, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_ETHERTYPE] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_1, .start = 0, .end = 15,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_DMAC_47_16] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_0, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_DMAC_15_0] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_1, .start = 16, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_IP_DSCP] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L3_OUT_0, .start = 18, .end = 23,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_TCP_FLAGS] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_1, .start = 16, .end = 24,
+ .l4_type = DR_STE_ACTION_MDFY_TYPE_L4_TCP,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_TCP_SPORT] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_0, .start = 16, .end = 31,
+ .l4_type = DR_STE_ACTION_MDFY_TYPE_L4_TCP,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_TCP_DPORT] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_0, .start = 0, .end = 15,
+ .l4_type = DR_STE_ACTION_MDFY_TYPE_L4_TCP,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_IP_TTL] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L3_OUT_0, .start = 8, .end = 15,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV4,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_IPV6_HOPLIMIT] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L3_OUT_0, .start = 8, .end = 15,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_UDP_SPORT] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_0, .start = 16, .end = 31,
+ .l4_type = DR_STE_ACTION_MDFY_TYPE_L4_UDP,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_UDP_DPORT] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L4_OUT_0, .start = 0, .end = 15,
+ .l4_type = DR_STE_ACTION_MDFY_TYPE_L4_UDP,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_SIPV6_127_96] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_0, .start = 0, .end = 31,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_SIPV6_95_64] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_1, .start = 0, .end = 31,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_SIPV6_63_32] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_2, .start = 0, .end = 31,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_SIPV6_31_0] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_SRC_OUT_3, .start = 0, .end = 31,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_DIPV6_127_96] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_0, .start = 0, .end = 31,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_DIPV6_95_64] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_1, .start = 0, .end = 31,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_DIPV6_63_32] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_2, .start = 0, .end = 31,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_DIPV6_31_0] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV6_DST_OUT_3, .start = 0, .end = 31,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV6,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_SIPV4] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV4_OUT_0, .start = 0, .end = 31,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV4,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_DIPV4] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_IPV4_OUT_1, .start = 0, .end = 31,
+ .l3_type = DR_STE_ACTION_MDFY_TYPE_L3_IPV4,
+ },
+ [MLX5_ACTION_IN_FIELD_METADATA_REG_A] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_GNRL_PURPOSE, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_METADATA_REG_B] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_METADATA_2_CQE, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_METADATA_REG_C_0] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_0_0, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_METADATA_REG_C_1] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_0_1, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_METADATA_REG_C_2] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_1_0, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_METADATA_REG_C_3] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_1_1, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_METADATA_REG_C_4] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_2_0, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_METADATA_REG_C_5] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_REGISTER_2_1, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_TCP_SEQ_NUM] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_TCP_MISC_0, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_TCP_ACK_NUM] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_TCP_MISC_1, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_FIRST_VID] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_L2_OUT_2, .start = 0, .end = 15,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_EMD_31_0] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_CFG_HDR_0_1, .start = 0, .end = 31,
+ },
+ [MLX5_ACTION_IN_FIELD_OUT_EMD_47_32] = {
+ .hw_field = DR_STE_V2_ACTION_MDFY_FLD_CFG_HDR_0_0, .start = 0, .end = 15,
+ },
+};
+
+#endif /* _DR_STE_V2_ */
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next V5 06/11] net/mlx5: DR, Add support for ConnectX-8 steering
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
` (4 preceding siblings ...)
2024-12-04 22:09 ` [PATCH net-next V5 05/11] net/mlx5: DR, Expand SWS STE callbacks and consolidate common structs Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management Tariq Toukan
` (6 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Itamar Gozlan,
Yevgeny Kliteynik, Tariq Toukan
From: Itamar Gozlan <igozlan@nvidia.com>
Add support for a new steering format version that is implemented by
ConnectX-8.
Except for several differences, the STEv3 is identical to STEv2, so
for most callbacks STEv3 context struct will call STEv2 functions.
Signed-off-by: Itamar Gozlan <igozlan@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/Makefile | 1 +
.../mlx5/core/steering/sws/dr_domain.c | 2 +-
.../mellanox/mlx5/core/steering/sws/dr_ste.c | 2 +
.../mellanox/mlx5/core/steering/sws/dr_ste.h | 1 +
.../mlx5/core/steering/sws/dr_ste_v3.c | 221 ++++++++++++++++++
.../mlx5/core/steering/sws/mlx5_ifc_dr.h | 40 ++++
.../mellanox/mlx5/core/steering/sws/mlx5dr.h | 2 +-
7 files changed, 267 insertions(+), 2 deletions(-)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v3.c
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index be3d0876c521..f9db8b8374fa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -123,6 +123,7 @@ mlx5_core-$(CONFIG_MLX5_SW_STEERING) += steering/sws/dr_domain.o \
steering/sws/dr_ste_v0.o \
steering/sws/dr_ste_v1.o \
steering/sws/dr_ste_v2.o \
+ steering/sws/dr_ste_v3.o \
steering/sws/dr_cmd.o \
steering/sws/dr_fw.o \
steering/sws/dr_action.o \
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_domain.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_domain.c
index 3d74109f8230..bd361ba6658c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_domain.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_domain.c
@@ -8,7 +8,7 @@
#define DR_DOMAIN_SW_STEERING_SUPPORTED(dmn, dmn_type) \
((dmn)->info.caps.dmn_type##_sw_owner || \
((dmn)->info.caps.dmn_type##_sw_owner_v2 && \
- (dmn)->info.caps.sw_format_ver <= MLX5_STEERING_FORMAT_CONNECTX_7))
+ (dmn)->info.caps.sw_format_ver <= MLX5_STEERING_FORMAT_CONNECTX_8))
bool mlx5dr_domain_is_support_ptrn_arg(struct mlx5dr_domain *dmn)
{
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.c
index 01ba8eae2983..c8b8ff80c7c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.c
@@ -1458,6 +1458,8 @@ struct mlx5dr_ste_ctx *mlx5dr_ste_get_ctx(u8 version)
return mlx5dr_ste_get_ctx_v1();
else if (version == MLX5_STEERING_FORMAT_CONNECTX_7)
return mlx5dr_ste_get_ctx_v2();
+ else if (version == MLX5_STEERING_FORMAT_CONNECTX_8)
+ return mlx5dr_ste_get_ctx_v3();
return NULL;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.h
index b6ec8d30d990..5f409dc30aca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste.h
@@ -217,5 +217,6 @@ struct mlx5dr_ste_ctx {
struct mlx5dr_ste_ctx *mlx5dr_ste_get_ctx_v0(void);
struct mlx5dr_ste_ctx *mlx5dr_ste_get_ctx_v1(void);
struct mlx5dr_ste_ctx *mlx5dr_ste_get_ctx_v2(void);
+struct mlx5dr_ste_ctx *mlx5dr_ste_get_ctx_v3(void);
#endif /* _DR_STE_ */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v3.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v3.c
new file mode 100644
index 000000000000..cc60ce1d274e
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/dr_ste_v3.c
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */
+
+#include "dr_ste_v1.h"
+#include "dr_ste_v2.h"
+
+static void dr_ste_v3_set_encap(u8 *hw_ste_p, u8 *d_action,
+ u32 reformat_id, int size)
+{
+ MLX5_SET(ste_double_action_insert_with_ptr_v3, d_action, action_id,
+ DR_STE_V1_ACTION_ID_INSERT_POINTER);
+ /* The hardware expects here size in words (2 byte) */
+ MLX5_SET(ste_double_action_insert_with_ptr_v3, d_action, size, size / 2);
+ MLX5_SET(ste_double_action_insert_with_ptr_v3, d_action, pointer, reformat_id);
+ MLX5_SET(ste_double_action_insert_with_ptr_v3, d_action, attributes,
+ DR_STE_V1_ACTION_INSERT_PTR_ATTR_ENCAP);
+ dr_ste_v1_set_reparse(hw_ste_p);
+}
+
+static void dr_ste_v3_set_push_vlan(u8 *ste, u8 *d_action,
+ u32 vlan_hdr)
+{
+ MLX5_SET(ste_double_action_insert_with_inline_v3, d_action, action_id,
+ DR_STE_V1_ACTION_ID_INSERT_INLINE);
+ /* The hardware expects here offset to vlan header in words (2 byte) */
+ MLX5_SET(ste_double_action_insert_with_inline_v3, d_action, start_offset,
+ HDR_LEN_L2_MACS >> 1);
+ MLX5_SET(ste_double_action_insert_with_inline_v3, d_action, inline_data, vlan_hdr);
+ dr_ste_v1_set_reparse(ste);
+}
+
+static void dr_ste_v3_set_pop_vlan(u8 *hw_ste_p, u8 *s_action,
+ u8 vlans_num)
+{
+ MLX5_SET(ste_single_action_remove_header_size_v3, s_action,
+ action_id, DR_STE_V1_ACTION_ID_REMOVE_BY_SIZE);
+ MLX5_SET(ste_single_action_remove_header_size_v3, s_action,
+ start_anchor, DR_STE_HEADER_ANCHOR_1ST_VLAN);
+ /* The hardware expects here size in words (2 byte) */
+ MLX5_SET(ste_single_action_remove_header_size_v3, s_action,
+ remove_size, (HDR_LEN_L2_VLAN >> 1) * vlans_num);
+
+ dr_ste_v1_set_reparse(hw_ste_p);
+}
+
+static void dr_ste_v3_set_encap_l3(u8 *hw_ste_p,
+ u8 *frst_s_action,
+ u8 *scnd_d_action,
+ u32 reformat_id,
+ int size)
+{
+ /* Remove L2 headers */
+ MLX5_SET(ste_single_action_remove_header_v3, frst_s_action, action_id,
+ DR_STE_V1_ACTION_ID_REMOVE_HEADER_TO_HEADER);
+ MLX5_SET(ste_single_action_remove_header_v3, frst_s_action, end_anchor,
+ DR_STE_HEADER_ANCHOR_IPV6_IPV4);
+
+ /* Encapsulate with given reformat ID */
+ MLX5_SET(ste_double_action_insert_with_ptr_v3, scnd_d_action, action_id,
+ DR_STE_V1_ACTION_ID_INSERT_POINTER);
+ /* The hardware expects here size in words (2 byte) */
+ MLX5_SET(ste_double_action_insert_with_ptr_v3, scnd_d_action, size, size / 2);
+ MLX5_SET(ste_double_action_insert_with_ptr_v3, scnd_d_action, pointer, reformat_id);
+ MLX5_SET(ste_double_action_insert_with_ptr_v3, scnd_d_action, attributes,
+ DR_STE_V1_ACTION_INSERT_PTR_ATTR_ENCAP);
+
+ dr_ste_v1_set_reparse(hw_ste_p);
+}
+
+static void dr_ste_v3_set_rx_decap(u8 *hw_ste_p, u8 *s_action)
+{
+ MLX5_SET(ste_single_action_remove_header_v3, s_action, action_id,
+ DR_STE_V1_ACTION_ID_REMOVE_HEADER_TO_HEADER);
+ MLX5_SET(ste_single_action_remove_header_v3, s_action, decap, 1);
+ MLX5_SET(ste_single_action_remove_header_v3, s_action, vni_to_cqe, 1);
+ MLX5_SET(ste_single_action_remove_header_v3, s_action, end_anchor,
+ DR_STE_HEADER_ANCHOR_INNER_MAC);
+
+ dr_ste_v1_set_reparse(hw_ste_p);
+}
+
+static int
+dr_ste_v3_set_action_decap_l3_list(void *data, u32 data_sz,
+ u8 *hw_action, u32 hw_action_sz,
+ uint16_t *used_hw_action_num)
+{
+ u8 padded_data[DR_STE_L2_HDR_MAX_SZ] = {};
+ void *data_ptr = padded_data;
+ u16 used_actions = 0;
+ u32 inline_data_sz;
+ u32 i;
+
+ if (hw_action_sz / DR_STE_ACTION_DOUBLE_SZ < DR_STE_DECAP_L3_ACTION_NUM)
+ return -EINVAL;
+
+ inline_data_sz =
+ MLX5_FLD_SZ_BYTES(ste_double_action_insert_with_inline_v3, inline_data);
+
+ /* Add an alignment padding */
+ memcpy(padded_data + data_sz % inline_data_sz, data, data_sz);
+
+ /* Remove L2L3 outer headers */
+ MLX5_SET(ste_single_action_remove_header_v3, hw_action, action_id,
+ DR_STE_V1_ACTION_ID_REMOVE_HEADER_TO_HEADER);
+ MLX5_SET(ste_single_action_remove_header_v3, hw_action, decap, 1);
+ MLX5_SET(ste_single_action_remove_header_v3, hw_action, vni_to_cqe, 1);
+ MLX5_SET(ste_single_action_remove_header_v3, hw_action, end_anchor,
+ DR_STE_HEADER_ANCHOR_INNER_IPV6_IPV4);
+ hw_action += DR_STE_ACTION_DOUBLE_SZ;
+ used_actions++; /* Remove and NOP are a single double action */
+
+ /* Point to the last dword of the header */
+ data_ptr += (data_sz / inline_data_sz) * inline_data_sz;
+
+ /* Add the new header using inline action 4Byte at a time, the header
+ * is added in reversed order to the beginning of the packet to avoid
+ * incorrect parsing by the HW. Since header is 14B or 18B an extra
+ * two bytes are padded and later removed.
+ */
+ for (i = 0; i < data_sz / inline_data_sz + 1; i++) {
+ void *addr_inline;
+
+ MLX5_SET(ste_double_action_insert_with_inline_v3, hw_action, action_id,
+ DR_STE_V1_ACTION_ID_INSERT_INLINE);
+ /* The hardware expects here offset to words (2 bytes) */
+ MLX5_SET(ste_double_action_insert_with_inline_v3, hw_action, start_offset, 0);
+
+ /* Copy bytes one by one to avoid endianness problem */
+ addr_inline = MLX5_ADDR_OF(ste_double_action_insert_with_inline_v3,
+ hw_action, inline_data);
+ memcpy(addr_inline, data_ptr - i * inline_data_sz, inline_data_sz);
+ hw_action += DR_STE_ACTION_DOUBLE_SZ;
+ used_actions++;
+ }
+
+ /* Remove first 2 extra bytes */
+ MLX5_SET(ste_single_action_remove_header_size_v3, hw_action, action_id,
+ DR_STE_V1_ACTION_ID_REMOVE_BY_SIZE);
+ MLX5_SET(ste_single_action_remove_header_size_v3, hw_action, start_offset, 0);
+ /* The hardware expects here size in words (2 bytes) */
+ MLX5_SET(ste_single_action_remove_header_size_v3, hw_action, remove_size, 1);
+ used_actions++;
+
+ *used_hw_action_num = used_actions;
+
+ return 0;
+}
+
+static struct mlx5dr_ste_ctx ste_ctx_v3 = {
+ /* Builders */
+ .build_eth_l2_src_dst_init = &dr_ste_v1_build_eth_l2_src_dst_init,
+ .build_eth_l3_ipv6_src_init = &dr_ste_v1_build_eth_l3_ipv6_src_init,
+ .build_eth_l3_ipv6_dst_init = &dr_ste_v1_build_eth_l3_ipv6_dst_init,
+ .build_eth_l3_ipv4_5_tuple_init = &dr_ste_v1_build_eth_l3_ipv4_5_tuple_init,
+ .build_eth_l2_src_init = &dr_ste_v1_build_eth_l2_src_init,
+ .build_eth_l2_dst_init = &dr_ste_v1_build_eth_l2_dst_init,
+ .build_eth_l2_tnl_init = &dr_ste_v1_build_eth_l2_tnl_init,
+ .build_eth_l3_ipv4_misc_init = &dr_ste_v1_build_eth_l3_ipv4_misc_init,
+ .build_eth_ipv6_l3_l4_init = &dr_ste_v1_build_eth_ipv6_l3_l4_init,
+ .build_mpls_init = &dr_ste_v1_build_mpls_init,
+ .build_tnl_gre_init = &dr_ste_v1_build_tnl_gre_init,
+ .build_tnl_mpls_init = &dr_ste_v1_build_tnl_mpls_init,
+ .build_tnl_mpls_over_udp_init = &dr_ste_v1_build_tnl_mpls_over_udp_init,
+ .build_tnl_mpls_over_gre_init = &dr_ste_v1_build_tnl_mpls_over_gre_init,
+ .build_icmp_init = &dr_ste_v1_build_icmp_init,
+ .build_general_purpose_init = &dr_ste_v1_build_general_purpose_init,
+ .build_eth_l4_misc_init = &dr_ste_v1_build_eth_l4_misc_init,
+ .build_tnl_vxlan_gpe_init = &dr_ste_v1_build_flex_parser_tnl_vxlan_gpe_init,
+ .build_tnl_geneve_init = &dr_ste_v1_build_flex_parser_tnl_geneve_init,
+ .build_tnl_geneve_tlv_opt_init = &dr_ste_v1_build_flex_parser_tnl_geneve_tlv_opt_init,
+ .build_tnl_geneve_tlv_opt_exist_init =
+ &dr_ste_v1_build_flex_parser_tnl_geneve_tlv_opt_exist_init,
+ .build_register_0_init = &dr_ste_v1_build_register_0_init,
+ .build_register_1_init = &dr_ste_v1_build_register_1_init,
+ .build_src_gvmi_qpn_init = &dr_ste_v1_build_src_gvmi_qpn_init,
+ .build_flex_parser_0_init = &dr_ste_v1_build_flex_parser_0_init,
+ .build_flex_parser_1_init = &dr_ste_v1_build_flex_parser_1_init,
+ .build_tnl_gtpu_init = &dr_ste_v1_build_flex_parser_tnl_gtpu_init,
+ .build_tnl_header_0_1_init = &dr_ste_v1_build_tnl_header_0_1_init,
+ .build_tnl_gtpu_flex_parser_0_init = &dr_ste_v1_build_tnl_gtpu_flex_parser_0_init,
+ .build_tnl_gtpu_flex_parser_1_init = &dr_ste_v1_build_tnl_gtpu_flex_parser_1_init,
+
+ /* Getters and Setters */
+ .ste_init = &dr_ste_v1_init,
+ .set_next_lu_type = &dr_ste_v1_set_next_lu_type,
+ .get_next_lu_type = &dr_ste_v1_get_next_lu_type,
+ .is_miss_addr_set = &dr_ste_v1_is_miss_addr_set,
+ .set_miss_addr = &dr_ste_v1_set_miss_addr,
+ .get_miss_addr = &dr_ste_v1_get_miss_addr,
+ .set_hit_addr = &dr_ste_v1_set_hit_addr,
+ .set_byte_mask = &dr_ste_v1_set_byte_mask,
+ .get_byte_mask = &dr_ste_v1_get_byte_mask,
+
+ /* Actions */
+ .actions_caps = DR_STE_CTX_ACTION_CAP_TX_POP |
+ DR_STE_CTX_ACTION_CAP_RX_PUSH |
+ DR_STE_CTX_ACTION_CAP_RX_ENCAP,
+ .set_actions_rx = &dr_ste_v1_set_actions_rx,
+ .set_actions_tx = &dr_ste_v1_set_actions_tx,
+ .modify_field_arr_sz = ARRAY_SIZE(dr_ste_v2_action_modify_field_arr),
+ .modify_field_arr = dr_ste_v2_action_modify_field_arr,
+ .set_action_set = &dr_ste_v1_set_action_set,
+ .set_action_add = &dr_ste_v1_set_action_add,
+ .set_action_copy = &dr_ste_v1_set_action_copy,
+ .set_action_decap_l3_list = &dr_ste_v3_set_action_decap_l3_list,
+ .alloc_modify_hdr_chunk = &dr_ste_v1_alloc_modify_hdr_ptrn_arg,
+ .dealloc_modify_hdr_chunk = &dr_ste_v1_free_modify_hdr_ptrn_arg,
+ /* Actions bit set */
+ .set_encap = &dr_ste_v3_set_encap,
+ .set_push_vlan = &dr_ste_v3_set_push_vlan,
+ .set_pop_vlan = &dr_ste_v3_set_pop_vlan,
+ .set_rx_decap = &dr_ste_v3_set_rx_decap,
+ .set_encap_l3 = &dr_ste_v3_set_encap_l3,
+ /* Send */
+ .prepare_for_postsend = &dr_ste_v1_prepare_for_postsend,
+};
+
+struct mlx5dr_ste_ctx *mlx5dr_ste_get_ctx_v3(void)
+{
+ return &ste_ctx_v3;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/mlx5_ifc_dr.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/mlx5_ifc_dr.h
index fb078fa0f0cc..898c3618ff26 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/mlx5_ifc_dr.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/mlx5_ifc_dr.h
@@ -600,4 +600,44 @@ struct mlx5_ifc_ste_double_action_aso_v1_bits {
};
};
+struct mlx5_ifc_ste_single_action_remove_header_v3_bits {
+ u8 action_id[0x8];
+ u8 start_anchor[0x7];
+ u8 end_anchor[0x7];
+ u8 reserved_at_16[0x1];
+ u8 outer_l4_remove[0x1];
+ u8 reserved_at_18[0x4];
+ u8 decap[0x1];
+ u8 vni_to_cqe[0x1];
+ u8 qos_profile[0x2];
+};
+
+struct mlx5_ifc_ste_single_action_remove_header_size_v3_bits {
+ u8 action_id[0x8];
+ u8 start_anchor[0x7];
+ u8 start_offset[0x8];
+ u8 outer_l4_remove[0x1];
+ u8 reserved_at_18[0x2];
+ u8 remove_size[0x6];
+};
+
+struct mlx5_ifc_ste_double_action_insert_with_inline_v3_bits {
+ u8 action_id[0x8];
+ u8 start_anchor[0x7];
+ u8 start_offset[0x8];
+ u8 reserved_at_17[0x9];
+
+ u8 inline_data[0x20];
+};
+
+struct mlx5_ifc_ste_double_action_insert_with_ptr_v3_bits {
+ u8 action_id[0x8];
+ u8 start_anchor[0x7];
+ u8 start_offset[0x8];
+ u8 size[0x6];
+ u8 attributes[0x3];
+
+ u8 pointer[0x20];
+};
+
#endif /* MLX5_IFC_DR_H */
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/mlx5dr.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/mlx5dr.h
index 3ac7dc67509f..0bb3724c10c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/mlx5dr.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/sws/mlx5dr.h
@@ -160,7 +160,7 @@ mlx5dr_is_supported(struct mlx5_core_dev *dev)
(MLX5_CAP_ESW_FLOWTABLE_FDB(dev, sw_owner) ||
(MLX5_CAP_ESW_FLOWTABLE_FDB(dev, sw_owner_v2) &&
(MLX5_CAP_GEN(dev, steering_format_version) <=
- MLX5_STEERING_FORMAT_CONNECTX_7)));
+ MLX5_STEERING_FORMAT_CONNECTX_8)));
}
/* buddy functions & structure */
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
` (5 preceding siblings ...)
2024-12-04 22:09 ` [PATCH net-next V5 06/11] net/mlx5: DR, Add support for ConnectX-8 steering Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-07 2:10 ` Jakub Kicinski
2024-12-04 22:09 ` [PATCH net-next V5 08/11] net/mlx5: Add no-op implementation for setting tc-bw on rate objects Tariq Toukan
` (5 subsequent siblings)
12 siblings, 1 reply; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Carolina Jubran,
Cosmin Ratiu, Jiri Pirko, Tariq Toukan
From: Carolina Jubran <cjubran@nvidia.com>
Introduce support for specifying bandwidth proportions between traffic
classes (TC) in the devlink-rate API. This new option allows users to
allocate bandwidth across multiple traffic classes in a single command.
This feature provides a more granular control over traffic management,
especially for scenarios requiring Enhanced Transmission Selection.
Users can now define a specific bandwidth share for each traffic class,
such as allocating 20% for TC0 (TCP/UDP) and 80% for TC5 (RoCE).
Example:
DEV=pci/0000:08:00.0
$ devlink port function rate add $DEV/vfs_group tx_share 10Gbit \
tx_max 50Gbit tc-bw 0:20 1:0 2:0 3:0 4:0 5:80 6:0 7:0
$ devlink port function rate set $DEV/vfs_group \
tc-bw 0:20 1:0 2:0 3:0 4:0 5:20 6:60 7:0
Example usage with ynl:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
--do rate-set --json '{
"bus-name": "pci",
"dev-name": "0000:08:00.0",
"port-index": 1,
"rate-tc-bws": [
{"rate-tc-index": 0, "rate-tc-bw": 50},
{"rate-tc-index": 1, "rate-tc-bw": 50},
{"rate-tc-index": 2, "rate-tc-bw": 0},
{"rate-tc-index": 3, "rate-tc-bw": 0},
{"rate-tc-index": 4, "rate-tc-bw": 0},
{"rate-tc-index": 5, "rate-tc-bw": 0},
{"rate-tc-index": 6, "rate-tc-bw": 0},
{"rate-tc-index": 7, "rate-tc-bw": 0}
]
}'
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/devlink.yaml \
--do rate-get --json '{
"bus-name": "pci",
"dev-name": "0000:08:00.0",
"port-index": 1
}'
output for rate-get:
{'bus-name': 'pci',
'dev-name': '0000:08:00.0',
'port-index': 1,
'rate-tc-bws': [{'rate-tc-bw': 50, 'rate-tc-index': 0},
{'rate-tc-bw': 50, 'rate-tc-index': 1},
{'rate-tc-bw': 0, 'rate-tc-index': 2},
{'rate-tc-bw': 0, 'rate-tc-index': 3},
{'rate-tc-bw': 0, 'rate-tc-index': 4},
{'rate-tc-bw': 0, 'rate-tc-index': 5},
{'rate-tc-bw': 0, 'rate-tc-index': 6},
{'rate-tc-bw': 0, 'rate-tc-index': 7}],
'rate-tx-max': 0,
'rate-tx-priority': 0,
'rate-tx-share': 0,
'rate-tx-weight': 0,
'rate-type': 'leaf'}
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
Documentation/netlink/specs/devlink.yaml | 28 ++++-
include/net/devlink.h | 7 ++
include/uapi/linux/devlink.h | 4 +
net/devlink/netlink_gen.c | 15 ++-
net/devlink/netlink_gen.h | 1 +
net/devlink/rate.c | 124 +++++++++++++++++++++++
6 files changed, 174 insertions(+), 5 deletions(-)
diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index 09fbb4c03fc8..7fceb8fdc73b 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -820,7 +820,23 @@ attribute-sets:
-
name: region-direct
type: flag
-
+ -
+ name: rate-tc-bws
+ type: nest
+ multi-attr: true
+ nested-attributes: dl-rate-tc-bws
+ -
+ name: rate-tc-index
+ type: u8
+ -
+ name: rate-tc-bw
+ type: u32
+ doc: |
+ Specifies the bandwidth allocation for the Traffic Class as a
+ percentage.
+ checks:
+ min: 0
+ max: 100
-
name: dl-dev-stats
subset-of: devlink
@@ -1225,6 +1241,14 @@ attribute-sets:
-
name: flash
type: flag
+ -
+ name: dl-rate-tc-bws
+ subset-of: devlink
+ attributes:
+ -
+ name: rate-tc-index
+ -
+ name: rate-tc-bw
operations:
enum-model: directional
@@ -2149,6 +2173,7 @@ operations:
- rate-tx-priority
- rate-tx-weight
- rate-parent-node-name
+ - rate-tc-bws
-
name: rate-new
@@ -2169,6 +2194,7 @@ operations:
- rate-tx-priority
- rate-tx-weight
- rate-parent-node-name
+ - rate-tc-bws
-
name: rate-del
diff --git a/include/net/devlink.h b/include/net/devlink.h
index fbb9a2668e24..277b826cdd60 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -20,6 +20,7 @@
#include <uapi/linux/devlink.h>
#include <linux/xarray.h>
#include <linux/firmware.h>
+#include <linux/dcbnl.h>
struct devlink;
struct devlink_linecard;
@@ -117,6 +118,8 @@ struct devlink_rate {
u32 tx_priority;
u32 tx_weight;
+
+ u32 tc_bw[IEEE_8021QAZ_MAX_TCS];
};
struct devlink_port {
@@ -1469,6 +1472,8 @@ struct devlink_ops {
u32 tx_priority, struct netlink_ext_ack *extack);
int (*rate_leaf_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv,
u32 tx_weight, struct netlink_ext_ack *extack);
+ int (*rate_leaf_tc_bw_set)(struct devlink_rate *devlink_rate, void *priv,
+ u32 *tc_bw, struct netlink_ext_ack *extack);
int (*rate_node_tx_share_set)(struct devlink_rate *devlink_rate, void *priv,
u64 tx_share, struct netlink_ext_ack *extack);
int (*rate_node_tx_max_set)(struct devlink_rate *devlink_rate, void *priv,
@@ -1477,6 +1482,8 @@ struct devlink_ops {
u32 tx_priority, struct netlink_ext_ack *extack);
int (*rate_node_tx_weight_set)(struct devlink_rate *devlink_rate, void *priv,
u32 tx_weight, struct netlink_ext_ack *extack);
+ int (*rate_node_tc_bw_set)(struct devlink_rate *devlink_rate, void *priv,
+ u32 *tc_bw, struct netlink_ext_ack *extack);
int (*rate_node_new)(struct devlink_rate *rate_node, void **priv,
struct netlink_ext_ack *extack);
int (*rate_node_del)(struct devlink_rate *rate_node, void *priv,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 9401aa343673..b3b538c67c34 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -614,6 +614,10 @@ enum devlink_attr {
DEVLINK_ATTR_REGION_DIRECT, /* flag */
+ DEVLINK_ATTR_RATE_TC_BWS, /* nested */
+ DEVLINK_ATTR_RATE_TC_INDEX, /* u8 */
+ DEVLINK_ATTR_RATE_TC_BW, /* u32 */
+
/* Add new attributes above here, update the spec in
* Documentation/netlink/specs/devlink.yaml and re-generate
* net/devlink/netlink_gen.c.
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index f9786d51f68f..b7b7829175dc 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -18,6 +18,11 @@ const struct nla_policy devlink_dl_port_function_nl_policy[DEVLINK_PORT_FN_ATTR_
[DEVLINK_PORT_FN_ATTR_CAPS] = NLA_POLICY_BITFIELD32(15),
};
+const struct nla_policy devlink_dl_rate_tc_bws_nl_policy[DEVLINK_ATTR_RATE_TC_BW + 1] = {
+ [DEVLINK_ATTR_RATE_TC_INDEX] = { .type = NLA_U8, },
+ [DEVLINK_ATTR_RATE_TC_BW] = NLA_POLICY_RANGE(NLA_U32, 0, 100),
+};
+
const struct nla_policy devlink_dl_selftest_id_nl_policy[DEVLINK_ATTR_SELFTEST_ID_FLASH + 1] = {
[DEVLINK_ATTR_SELFTEST_ID_FLASH] = { .type = NLA_FLAG, },
};
@@ -496,7 +501,7 @@ static const struct nla_policy devlink_rate_get_dump_nl_policy[DEVLINK_ATTR_DEV_
};
/* DEVLINK_CMD_RATE_SET - do */
-static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_RATE_TX_WEIGHT + 1] = {
+static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_RATE_TC_BWS + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING, },
@@ -505,10 +510,11 @@ static const struct nla_policy devlink_rate_set_nl_policy[DEVLINK_ATTR_RATE_TX_W
[DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32, },
[DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32, },
[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING, },
+ [DEVLINK_ATTR_RATE_TC_BWS] = NLA_POLICY_NESTED(devlink_dl_rate_tc_bws_nl_policy),
};
/* DEVLINK_CMD_RATE_NEW - do */
-static const struct nla_policy devlink_rate_new_nl_policy[DEVLINK_ATTR_RATE_TX_WEIGHT + 1] = {
+static const struct nla_policy devlink_rate_new_nl_policy[DEVLINK_ATTR_RATE_TC_BWS + 1] = {
[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
[DEVLINK_ATTR_RATE_NODE_NAME] = { .type = NLA_NUL_STRING, },
@@ -517,6 +523,7 @@ static const struct nla_policy devlink_rate_new_nl_policy[DEVLINK_ATTR_RATE_TX_W
[DEVLINK_ATTR_RATE_TX_PRIORITY] = { .type = NLA_U32, },
[DEVLINK_ATTR_RATE_TX_WEIGHT] = { .type = NLA_U32, },
[DEVLINK_ATTR_RATE_PARENT_NODE_NAME] = { .type = NLA_NUL_STRING, },
+ [DEVLINK_ATTR_RATE_TC_BWS] = NLA_POLICY_NESTED(devlink_dl_rate_tc_bws_nl_policy),
};
/* DEVLINK_CMD_RATE_DEL - do */
@@ -1164,7 +1171,7 @@ const struct genl_split_ops devlink_nl_ops[74] = {
.doit = devlink_nl_rate_set_doit,
.post_doit = devlink_nl_post_doit,
.policy = devlink_rate_set_nl_policy,
- .maxattr = DEVLINK_ATTR_RATE_TX_WEIGHT,
+ .maxattr = DEVLINK_ATTR_RATE_TC_BWS,
.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
},
{
@@ -1174,7 +1181,7 @@ const struct genl_split_ops devlink_nl_ops[74] = {
.doit = devlink_nl_rate_new_doit,
.post_doit = devlink_nl_post_doit,
.policy = devlink_rate_new_nl_policy,
- .maxattr = DEVLINK_ATTR_RATE_TX_WEIGHT,
+ .maxattr = DEVLINK_ATTR_RATE_TC_BWS,
.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
},
{
diff --git a/net/devlink/netlink_gen.h b/net/devlink/netlink_gen.h
index 8f2bd50ddf5e..fb733b5d4ff1 100644
--- a/net/devlink/netlink_gen.h
+++ b/net/devlink/netlink_gen.h
@@ -13,6 +13,7 @@
/* Common nested types */
extern const struct nla_policy devlink_dl_port_function_nl_policy[DEVLINK_PORT_FN_ATTR_CAPS + 1];
+extern const struct nla_policy devlink_dl_rate_tc_bws_nl_policy[DEVLINK_ATTR_RATE_TC_BW + 1];
extern const struct nla_policy devlink_dl_selftest_id_nl_policy[DEVLINK_ATTR_SELFTEST_ID_FLASH + 1];
/* Ops table for devlink */
diff --git a/net/devlink/rate.c b/net/devlink/rate.c
index 8828ffaf6cbc..4bee2481115c 100644
--- a/net/devlink/rate.c
+++ b/net/devlink/rate.c
@@ -80,6 +80,29 @@ devlink_rate_get_from_info(struct devlink *devlink, struct genl_info *info)
return ERR_PTR(-EINVAL);
}
+static int devlink_rate_put_tc_bws(struct sk_buff *msg, u32 *tc_bw)
+{
+ struct nlattr *nla_tc_bw;
+ int i;
+
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+ nla_tc_bw = nla_nest_start(msg, DEVLINK_ATTR_RATE_TC_BWS);
+ if (!nla_tc_bw)
+ return -EMSGSIZE;
+
+ if (nla_put_u8(msg, DEVLINK_ATTR_RATE_TC_INDEX, i) ||
+ nla_put_u32(msg, DEVLINK_ATTR_RATE_TC_BW, tc_bw[i]))
+ goto nla_put_failure;
+
+ nla_nest_end(msg, nla_tc_bw);
+ }
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(msg, nla_tc_bw);
+ return -EMSGSIZE;
+}
+
static int devlink_nl_rate_fill(struct sk_buff *msg,
struct devlink_rate *devlink_rate,
enum devlink_command cmd, u32 portid, u32 seq,
@@ -129,6 +152,9 @@ static int devlink_nl_rate_fill(struct sk_buff *msg,
devlink_rate->parent->name))
goto nla_put_failure;
+ if (devlink_rate_put_tc_bws(msg, devlink_rate->tc_bw))
+ goto nla_put_failure;
+
genlmsg_end(msg, hdr);
return 0;
@@ -316,6 +342,86 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
return 0;
}
+static int devlink_nl_rate_tc_bw_parse(struct nlattr *parent_nest, u32 *tc_bw,
+ unsigned long *bitmap, struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[DEVLINK_ATTR_MAX + 1];
+ u8 tc_index;
+
+ nla_parse_nested(tb, DEVLINK_ATTR_MAX, parent_nest, devlink_dl_rate_tc_bws_nl_policy,
+ extack);
+ if (!tb[DEVLINK_ATTR_RATE_TC_INDEX]) {
+ NL_SET_ERR_MSG(extack, "Traffic class index is expected");
+ return -EINVAL;
+ }
+
+ tc_index = nla_get_u8(tb[DEVLINK_ATTR_RATE_TC_INDEX]);
+
+ if (tc_index >= IEEE_8021QAZ_MAX_TCS) {
+ NL_SET_ERR_MSG_FMT(extack,
+ "Provided traffic class index (%u) exceeds the maximum allowed value (%u)",
+ tc_index, IEEE_8021QAZ_MAX_TCS - 1);
+ return -EINVAL;
+ }
+
+ if (!tb[DEVLINK_ATTR_RATE_TC_BW]) {
+ NL_SET_ERR_MSG(extack, "Traffic class bandwidth is expected");
+ return -EINVAL;
+ }
+
+ if (test_and_set_bit(tc_index, bitmap)) {
+ NL_SET_ERR_MSG(extack, "Duplicate traffic class index specified");
+ return -EINVAL;
+ }
+
+ tc_bw[tc_index] = nla_get_u32(tb[DEVLINK_ATTR_RATE_TC_BW]);
+
+ return 0;
+}
+
+static int devlink_nl_rate_tc_bw_set(struct devlink_rate *devlink_rate,
+ struct genl_info *info)
+{
+ DECLARE_BITMAP(bitmap, IEEE_8021QAZ_MAX_TCS) = {};
+ struct devlink *devlink = devlink_rate->devlink;
+ const struct devlink_ops *ops = devlink->ops;
+ u32 tc_bw[IEEE_8021QAZ_MAX_TCS] = {};
+ int rem, err = -EOPNOTSUPP, i;
+ struct nlattr *attr;
+
+ nla_for_each_attr(attr, genlmsg_data(info->genlhdr),
+ genlmsg_len(info->genlhdr), rem) {
+ if (nla_type(attr) == DEVLINK_ATTR_RATE_TC_BWS) {
+ err = devlink_nl_rate_tc_bw_parse(attr, tc_bw, bitmap, info->extack);
+ if (err)
+ return err;
+ }
+ }
+
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+ if (!test_bit(i, bitmap)) {
+ NL_SET_ERR_MSG_FMT(info->extack,
+ "Bandwidth values must be specified for all %u traffic classes",
+ IEEE_8021QAZ_MAX_TCS);
+ return -EINVAL;
+ }
+ }
+
+ if (devlink_rate_is_leaf(devlink_rate))
+ err = ops->rate_leaf_tc_bw_set(devlink_rate, devlink_rate->priv, tc_bw,
+ info->extack);
+ else if (devlink_rate_is_node(devlink_rate))
+ err = ops->rate_node_tc_bw_set(devlink_rate, devlink_rate->priv, tc_bw,
+ info->extack);
+
+ if (err)
+ return err;
+
+ memcpy(devlink_rate->tc_bw, tc_bw, sizeof(tc_bw));
+
+ return 0;
+}
+
static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
const struct devlink_ops *ops,
struct genl_info *info)
@@ -388,6 +494,12 @@ static int devlink_nl_rate_set(struct devlink_rate *devlink_rate,
return err;
}
+ if (attrs[DEVLINK_ATTR_RATE_TC_BWS]) {
+ err = devlink_nl_rate_tc_bw_set(devlink_rate, info);
+ if (err)
+ return err;
+ }
+
return 0;
}
@@ -423,6 +535,12 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
"TX weight set isn't supported for the leafs");
return false;
}
+ if (attrs[DEVLINK_ATTR_RATE_TC_BWS] && !ops->rate_leaf_tc_bw_set) {
+ NL_SET_ERR_MSG_ATTR(info->extack,
+ attrs[DEVLINK_ATTR_RATE_TC_BWS],
+ "TC bandwidth set isn't supported for the leafs");
+ return false;
+ }
} else if (type == DEVLINK_RATE_TYPE_NODE) {
if (attrs[DEVLINK_ATTR_RATE_TX_SHARE] && !ops->rate_node_tx_share_set) {
NL_SET_ERR_MSG(info->extack, "TX share set isn't supported for the nodes");
@@ -449,6 +567,12 @@ static bool devlink_rate_set_ops_supported(const struct devlink_ops *ops,
"TX weight set isn't supported for the nodes");
return false;
}
+ if (attrs[DEVLINK_ATTR_RATE_TC_BWS] && !ops->rate_node_tc_bw_set) {
+ NL_SET_ERR_MSG_ATTR(info->extack,
+ attrs[DEVLINK_ATTR_RATE_TC_BWS],
+ "TC bandwidth set isn't supported for the nodes");
+ return false;
+ }
} else {
WARN(1, "Unknown type of rate object");
return false;
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next V5 08/11] net/mlx5: Add no-op implementation for setting tc-bw on rate objects
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
` (6 preceding siblings ...)
2024-12-04 22:09 ` [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 09/11] net/mlx5: Add support for setting tc-bw on nodes Tariq Toukan
` (4 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Carolina Jubran,
Cosmin Ratiu, Tariq Toukan
From: Carolina Jubran <cjubran@nvidia.com>
Introduce `mlx5_esw_devlink_rate_node_tc_bw_set()` and
`mlx5_esw_devlink_rate_leaf_tc_bw_set()` with no-op logic.
Future patches will add support for setting traffic class bandwidth
on rate objects.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 2 ++
drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c | 14 ++++++++++++++
drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h | 4 ++++
3 files changed, 20 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 98d4306929f3..728d5c06d612 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -320,6 +320,8 @@ static const struct devlink_ops mlx5_devlink_ops = {
.eswitch_encap_mode_get = mlx5_devlink_eswitch_encap_mode_get,
.rate_leaf_tx_share_set = mlx5_esw_devlink_rate_leaf_tx_share_set,
.rate_leaf_tx_max_set = mlx5_esw_devlink_rate_leaf_tx_max_set,
+ .rate_leaf_tc_bw_set = mlx5_esw_devlink_rate_leaf_tc_bw_set,
+ .rate_node_tc_bw_set = mlx5_esw_devlink_rate_node_tc_bw_set,
.rate_node_tx_share_set = mlx5_esw_devlink_rate_node_tx_share_set,
.rate_node_tx_max_set = mlx5_esw_devlink_rate_node_tx_max_set,
.rate_node_new = mlx5_esw_devlink_rate_node_new,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index 8b7c843446e1..db112a87b7ee 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -882,6 +882,20 @@ int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *
return err;
}
+int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, void *priv,
+ u32 *tc_bw, struct netlink_ext_ack *extack)
+{
+ NL_SET_ERR_MSG_MOD(extack, "TC bandwidth shares are not supported on leafs");
+ return -EOPNOTSUPP;
+}
+
+int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, void *priv,
+ u32 *tc_bw, struct netlink_ext_ack *extack)
+{
+ NL_SET_ERR_MSG_MOD(extack, "TC bandwidth shares are not supported on nodes");
+ return -EOPNOTSUPP;
+}
+
int mlx5_esw_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv,
u64 tx_share, struct netlink_ext_ack *extack)
{
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h
index 6eb8f6a648c8..0239f10f95e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.h
@@ -21,6 +21,10 @@ int mlx5_esw_devlink_rate_leaf_tx_share_set(struct devlink_rate *rate_leaf, void
u64 tx_share, struct netlink_ext_ack *extack);
int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *priv,
u64 tx_max, struct netlink_ext_ack *extack);
+int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_node, void *priv,
+ u32 *tc_bw, struct netlink_ext_ack *extack);
+int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, void *priv,
+ u32 *tc_bw, struct netlink_ext_ack *extack);
int mlx5_esw_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv,
u64 tx_share, struct netlink_ext_ack *extack);
int mlx5_esw_devlink_rate_node_tx_max_set(struct devlink_rate *rate_node, void *priv,
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next V5 09/11] net/mlx5: Add support for setting tc-bw on nodes
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
` (7 preceding siblings ...)
2024-12-04 22:09 ` [PATCH net-next V5 08/11] net/mlx5: Add no-op implementation for setting tc-bw on rate objects Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 10/11] net/mlx5: Add traffic class scheduling support for vport QoS Tariq Toukan
` (3 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Carolina Jubran,
Cosmin Ratiu, Tariq Toukan
From: Carolina Jubran <cjubran@nvidia.com>
Introduce support for enabling and disabling Traffic Class (TC)
arbitration for existing devlink rate nodes. This patch adds support
for a new scheduling node type, `SCHED_NODE_TYPE_TC_ARBITER_TSAR`.
Key changes include:
- New helper functions for transitioning existing rate nodes to TC
arbiter nodes and vice versa. These functions handle the allocation
of TC arbiter nodes, copying of child nodes, and restoring vport QoS
settings when TC arbitration is disabled.
- Implementation of `mlx5_esw_devlink_rate_node_tc_bw_set()` to manage
tc-bw configuration on nodes.
- Introduced stubs for `esw_qos_tc_arbiter_scheduling_setup()` and
`esw_qos_tc_arbiter_scheduling_teardown()`, which will be extended in
future patches to provide full support for tc-bw on devlink rate
objects.
- Validation functions for tc-bw settings, allowing graceful handling
of unsupported traffic class bandwidth configurations.
- Updated `__esw_qos_alloc_node()` to insert the new node into the
parent’s children list only if the parent is not NULL. For the root
TSAR, the new node is inserted directly after the allocation call.
This patch lays the groundwork for future support for configuring tc-bw
on devlink rate nodes. Although the infrastructure is in place, full
support for tc-bw is not yet implemented; attempts to set tc-bw on
nodes will return `-EOPNOTSUPP`.
No functional changes are introduced at this stage.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 260 +++++++++++++++++-
1 file changed, 246 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index db112a87b7ee..b17c3a82d175 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -64,11 +64,13 @@ static void esw_qos_domain_release(struct mlx5_eswitch *esw)
enum sched_node_type {
SCHED_NODE_TYPE_VPORTS_TSAR,
SCHED_NODE_TYPE_VPORT,
+ SCHED_NODE_TYPE_TC_ARBITER_TSAR,
};
static const char * const sched_node_type_str[] = {
[SCHED_NODE_TYPE_VPORTS_TSAR] = "vports TSAR",
[SCHED_NODE_TYPE_VPORT] = "vport",
+ [SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR",
};
struct mlx5_esw_sched_node {
@@ -92,6 +94,13 @@ struct mlx5_esw_sched_node {
struct mlx5_vport *vport;
};
+static int esw_qos_num_tcs(struct mlx5_core_dev *dev)
+{
+ int num_tcs = mlx5_max_tc(dev) + 1;
+
+ return num_tcs < IEEE_8021QAZ_MAX_TCS ? num_tcs : IEEE_8021QAZ_MAX_TCS;
+}
+
static void
esw_qos_node_set_parent(struct mlx5_esw_sched_node *node, struct mlx5_esw_sched_node *parent)
{
@@ -101,6 +110,15 @@ esw_qos_node_set_parent(struct mlx5_esw_sched_node *node, struct mlx5_esw_sched_
node->esw = parent->esw;
}
+static void
+esw_qos_nodes_set_parent(struct list_head *nodes, struct mlx5_esw_sched_node *parent)
+{
+ struct mlx5_esw_sched_node *node, *tmp;
+
+ list_for_each_entry_safe(node, tmp, nodes, entry)
+ esw_qos_node_set_parent(node, parent);
+}
+
void mlx5_esw_qos_vport_qos_free(struct mlx5_vport *vport)
{
kfree(vport->qos.sched_node);
@@ -126,16 +144,23 @@ mlx5_esw_qos_vport_get_parent(const struct mlx5_vport *vport)
static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op)
{
- if (node->vport) {
+ switch (node->type) {
+ case SCHED_NODE_TYPE_VPORT:
esw_warn(node->esw->dev,
"E-Switch %s %s scheduling element failed (vport=%d,err=%d)\n",
op, sched_node_type_str[node->type], node->vport->vport, err);
- return;
+ break;
+ case SCHED_NODE_TYPE_TC_ARBITER_TSAR:
+ case SCHED_NODE_TYPE_VPORTS_TSAR:
+ esw_warn(node->esw->dev,
+ "E-Switch %s %s scheduling element failed (err=%d)\n",
+ op, sched_node_type_str[node->type], err);
+ break;
+ default:
+ esw_warn(node->esw->dev,
+ "E-Switch %s scheduling element failed (err=%d)\n", op, err);
+ break;
}
-
- esw_warn(node->esw->dev,
- "E-Switch %s %s scheduling element failed (err=%d)\n",
- op, sched_node_type_str[node->type], err);
}
static int esw_qos_node_create_sched_element(struct mlx5_esw_sched_node *node, void *ctx,
@@ -358,7 +383,6 @@ static struct mlx5_esw_sched_node *
__esw_qos_alloc_node(struct mlx5_eswitch *esw, u32 tsar_ix, enum sched_node_type type,
struct mlx5_esw_sched_node *parent)
{
- struct list_head *parent_children;
struct mlx5_esw_sched_node *node;
node = kzalloc(sizeof(*node), GFP_KERNEL);
@@ -370,8 +394,10 @@ __esw_qos_alloc_node(struct mlx5_eswitch *esw, u32 tsar_ix, enum sched_node_type
node->type = type;
node->parent = parent;
INIT_LIST_HEAD(&node->children);
- parent_children = parent ? &parent->children : &esw->qos.domain->nodes;
- list_add_tail(&node->entry, parent_children);
+ if (parent)
+ list_add_tail(&node->entry, &parent->children);
+ else
+ INIT_LIST_HEAD(&node->entry);
return node;
}
@@ -409,6 +435,7 @@ __esw_qos_create_vports_sched_node(struct mlx5_eswitch *esw, struct mlx5_esw_sch
goto err_alloc_node;
}
+ list_add_tail(&node->entry, &esw->qos.domain->nodes);
esw_qos_normalize_min_rate(esw, NULL, extack);
trace_mlx5_esw_node_qos_create(esw->dev, node, node->ix);
@@ -475,11 +502,11 @@ static int esw_qos_create(struct mlx5_eswitch *esw, struct netlink_ext_ack *exta
/* The eswitch doesn't support scheduling nodes.
* Create a software-only node0 using the root TSAR to attach vport QoS to.
*/
- if (!__esw_qos_alloc_node(esw,
- esw->qos.root_tsar_ix,
- SCHED_NODE_TYPE_VPORTS_TSAR,
+ if (!__esw_qos_alloc_node(esw, esw->qos.root_tsar_ix, SCHED_NODE_TYPE_VPORTS_TSAR,
NULL))
esw->qos.node0 = ERR_PTR(-ENOMEM);
+ else
+ list_add_tail(&esw->qos.node0->entry, &esw->qos.domain->nodes);
}
if (IS_ERR(esw->qos.node0)) {
err = PTR_ERR(esw->qos.node0);
@@ -537,6 +564,17 @@ static void esw_qos_put(struct mlx5_eswitch *esw)
esw_qos_destroy(esw);
}
+static void esw_qos_tc_arbiter_scheduling_teardown(struct mlx5_esw_sched_node *node,
+ struct netlink_ext_ack *extack)
+{}
+
+static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node,
+ struct netlink_ext_ack *extack)
+{
+ NL_SET_ERR_MSG_MOD(extack, "TC arbiter elements are not supported.");
+ return -EOPNOTSUPP;
+}
+
static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack)
{
struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
@@ -699,6 +737,157 @@ static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw
return err;
}
+static void esw_qos_switch_vport_tcs_to_vport(struct mlx5_esw_sched_node *tc_arbiter_node,
+ struct mlx5_esw_sched_node *node,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *vports_tc_node, *vport_tc_node, *tmp;
+
+ vports_tc_node = list_first_entry(&tc_arbiter_node->children, struct mlx5_esw_sched_node,
+ entry);
+
+ list_for_each_entry_safe(vport_tc_node, tmp, &vports_tc_node->children, entry)
+ esw_qos_vport_update_parent(vport_tc_node->vport, node, extack);
+}
+
+static int esw_qos_switch_tc_arbiter_node_to_vports(struct mlx5_esw_sched_node *tc_arbiter_node,
+ struct mlx5_esw_sched_node *node,
+ struct netlink_ext_ack *extack)
+{
+ u32 parent_tsar_ix = node->parent ? node->parent->ix : node->esw->qos.root_tsar_ix;
+ int err;
+
+ err = esw_qos_create_node_sched_elem(node->esw->dev, parent_tsar_ix, &node->ix);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Failed to create scheduling element for vports node when disabliing vports TC QoS");
+ return err;
+ }
+
+ node->type = SCHED_NODE_TYPE_VPORTS_TSAR;
+
+ /* Disable TC QoS for vports in the arbiter node. */
+ esw_qos_switch_vport_tcs_to_vport(tc_arbiter_node, node, extack);
+
+ return 0;
+}
+
+static int esw_qos_switch_vports_node_to_tc_arbiter(struct mlx5_esw_sched_node *node,
+ struct mlx5_esw_sched_node *tc_arbiter_node,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *vport_node, *tmp;
+ struct mlx5_vport *vport;
+ int err;
+
+ /* Enable TC QoS for each vport in the node. */
+ list_for_each_entry_safe(vport_node, tmp, &node->children, entry) {
+ vport = vport_node->vport;
+ err = esw_qos_vport_update_parent(vport, tc_arbiter_node, extack);
+ if (err)
+ goto err_out;
+ }
+
+ /* Destroy the current vports node TSAR. */
+ err = mlx5_destroy_scheduling_element_cmd(node->esw->dev, SCHEDULING_HIERARCHY_E_SWITCH,
+ node->ix);
+ if (err)
+ goto err_out;
+
+ return 0;
+err_out:
+ /* Restore vports back into the node if an error occurs. */
+ esw_qos_switch_vport_tcs_to_vport(tc_arbiter_node, node, NULL);
+
+ return err;
+}
+
+static struct mlx5_esw_sched_node *esw_qos_move_node(struct mlx5_esw_sched_node *curr_node)
+{
+ struct mlx5_esw_sched_node *new_node;
+
+ new_node = __esw_qos_alloc_node(curr_node->esw, curr_node->ix, curr_node->type, NULL);
+ if (!IS_ERR(new_node))
+ esw_qos_nodes_set_parent(&curr_node->children, new_node);
+
+ return new_node;
+}
+
+static int esw_qos_node_disable_tc_arbitration(struct mlx5_esw_sched_node *node,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *curr_node;
+ int err;
+
+ if (node->type != SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+ return 0;
+
+ /* Allocate a new rate node to hold the current state, which will allow
+ * for restoring the vports back to this node after disabling TC arbitration.
+ */
+ curr_node = esw_qos_move_node(node);
+ if (IS_ERR(curr_node)) {
+ NL_SET_ERR_MSG_MOD(extack, "Failed setting up vports node");
+ return PTR_ERR(curr_node);
+ }
+
+ /* Disable TC QoS for all vports, and assign them back to the node. */
+ err = esw_qos_switch_tc_arbiter_node_to_vports(curr_node, node, extack);
+ if (err)
+ goto err_out;
+
+ /* Clean up the TC arbiter node after disabling TC QoS for vports. */
+ esw_qos_tc_arbiter_scheduling_teardown(curr_node, extack);
+ goto out;
+err_out:
+ esw_qos_nodes_set_parent(&curr_node->children, node);
+out:
+ __esw_qos_free_node(curr_node);
+ return err;
+}
+
+static int esw_qos_node_enable_tc_arbitration(struct mlx5_esw_sched_node *node,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *curr_node;
+ int err;
+
+ if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+ return 0;
+
+ /* Allocate a new node that will store the information of the current node.
+ * This will be used later to restore the node if necessary.
+ */
+ curr_node = esw_qos_move_node(node);
+ if (IS_ERR(curr_node)) {
+ NL_SET_ERR_MSG_MOD(extack, "Failed setting up node TC QoS");
+ return PTR_ERR(curr_node);
+ }
+
+ /* Initialize the TC arbiter node for QoS management.
+ * This step prepares the node for handling Traffic Class arbitration.
+ */
+ err = esw_qos_tc_arbiter_scheduling_setup(node, extack);
+ if (err)
+ goto err_setup;
+
+ /* Enable TC QoS for each vport within the current node. */
+ err = esw_qos_switch_vports_node_to_tc_arbiter(curr_node, node, extack);
+ if (err)
+ goto err_switch_vports;
+ goto out;
+
+err_switch_vports:
+ esw_qos_tc_arbiter_scheduling_teardown(node, NULL);
+ node->ix = curr_node->ix;
+ node->type = curr_node->type;
+err_setup:
+ esw_qos_nodes_set_parent(&curr_node->children, node);
+out:
+ __esw_qos_free_node(curr_node);
+ return err;
+}
+
static u32 mlx5_esw_qos_lag_link_speed_get_locked(struct mlx5_core_dev *mdev)
{
struct ethtool_link_ksettings lksettings;
@@ -824,6 +1013,30 @@ static int esw_qos_devlink_rate_to_mbps(struct mlx5_core_dev *mdev, const char *
return 0;
}
+static bool esw_qos_validate_unsupported_tc_bw(struct mlx5_eswitch *esw, u32 *tc_bw)
+{
+ int i, num_tcs = esw_qos_num_tcs(esw->dev);
+
+ for (i = num_tcs; i < IEEE_8021QAZ_MAX_TCS; i++) {
+ if (tc_bw[i])
+ return false;
+ }
+
+ return true;
+}
+
+static bool esw_qos_tc_bw_disabled(u32 *tc_bw)
+{
+ int i;
+
+ for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++) {
+ if (tc_bw[i])
+ return false;
+ }
+
+ return true;
+}
+
int mlx5_esw_qos_init(struct mlx5_eswitch *esw)
{
if (esw->qos.domain)
@@ -892,8 +1105,27 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, void *p
int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, void *priv,
u32 *tc_bw, struct netlink_ext_ack *extack)
{
- NL_SET_ERR_MSG_MOD(extack, "TC bandwidth shares are not supported on nodes");
- return -EOPNOTSUPP;
+ struct mlx5_esw_sched_node *node = priv;
+ struct mlx5_eswitch *esw = node->esw;
+ bool disable;
+ int err;
+
+ if (!esw_qos_validate_unsupported_tc_bw(esw, tc_bw)) {
+ NL_SET_ERR_MSG_MOD(extack, "E-Switch traffic classes number is not supported");
+ return -EOPNOTSUPP;
+ }
+
+ disable = esw_qos_tc_bw_disabled(tc_bw);
+ esw_qos_lock(esw);
+ if (disable) {
+ err = esw_qos_node_disable_tc_arbitration(node, extack);
+ goto unlock;
+ }
+
+ err = esw_qos_node_enable_tc_arbitration(node, extack);
+unlock:
+ esw_qos_unlock(esw);
+ return err;
}
int mlx5_esw_devlink_rate_node_tx_share_set(struct devlink_rate *rate_node, void *priv,
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next V5 10/11] net/mlx5: Add traffic class scheduling support for vport QoS
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
` (8 preceding siblings ...)
2024-12-04 22:09 ` [PATCH net-next V5 09/11] net/mlx5: Add support for setting tc-bw on nodes Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 11/11] net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw Tariq Toukan
` (2 subsequent siblings)
12 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Carolina Jubran,
Cosmin Ratiu, Tariq Toukan
From: Carolina Jubran <cjubran@nvidia.com>
Introduce support for traffic class (TC) scheduling on vports by
allowing the vport to own multiple TC scheduling nodes. This patch
enables more granular control of QoS by defining three distinct QoS
states for vports, each providing unique scheduling behavior:
1. Regular QoS: The `sched_node` represents the vport directly,
handling QoS as a single scheduling entity.
2. TC QoS on the vport: The `sched_node` acts as a TC arbiter, enabling
TC scheduling directly on the vport.
3. TC QoS on the parent node: The `sched_node` functions as a rate
limiter, with TC arbitration enabled at the parent level, associating
multiple scheduling nodes with each vport.
Key changes include:
- Added support for new scheduling elements, vport traffic class and
rate limiter.
- New helper functions for creating, destroying, and restoring vport TC
scheduling nodes, handling transitions between regular QoS and TC
arbitration states.
- Updated `esw_qos_vport_enable()` and `esw_qos_vport_disable()` to
support both regular QoS and TC arbitration states, ensuring consistent
transitions between scheduling modes.
- Introduced a `sched_nodes` array under `vport->qos` to store multiple
TC scheduling nodes per vport, enabling finer control over per-TC QoS.
- Enhanced `esw_qos_vport_update_parent()` to handle transitions between
the three QoS states based on the current and new parent node types.
This patch lays the groundwork for future support for configuring tc-bw
on vports. Although the infrastructure is in place, full support for
tc-bw is not yet implemented; attempts to set tc-bw on vports will
return `-EOPNOTSUPP`.
No functional changes are introduced at this stage.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 360 +++++++++++++++++-
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 13 +-
2 files changed, 352 insertions(+), 21 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index b17c3a82d175..afb00deaae16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -65,12 +65,16 @@ enum sched_node_type {
SCHED_NODE_TYPE_VPORTS_TSAR,
SCHED_NODE_TYPE_VPORT,
SCHED_NODE_TYPE_TC_ARBITER_TSAR,
+ SCHED_NODE_TYPE_RATE_LIMITER,
+ SCHED_NODE_TYPE_VPORT_TC,
};
static const char * const sched_node_type_str[] = {
[SCHED_NODE_TYPE_VPORTS_TSAR] = "vports TSAR",
[SCHED_NODE_TYPE_VPORT] = "vport",
[SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR",
+ [SCHED_NODE_TYPE_RATE_LIMITER] = "Rate Limiter",
+ [SCHED_NODE_TYPE_VPORT_TC] = "vport TC",
};
struct mlx5_esw_sched_node {
@@ -92,6 +96,8 @@ struct mlx5_esw_sched_node {
struct list_head children;
/* Valid only if this node is associated with a vport. */
struct mlx5_vport *vport;
+ /* Valid only when this node represents a traffic class. */
+ u8 tc;
};
static int esw_qos_num_tcs(struct mlx5_core_dev *dev)
@@ -121,6 +127,14 @@ esw_qos_nodes_set_parent(struct list_head *nodes, struct mlx5_esw_sched_node *pa
void mlx5_esw_qos_vport_qos_free(struct mlx5_vport *vport)
{
+ if (vport->qos.sched_nodes) {
+ int i, num_tcs = esw_qos_num_tcs(vport->qos.sched_node->esw->dev);
+
+ for (i = 0; i < num_tcs; i++)
+ kfree(vport->qos.sched_nodes[i]);
+ kfree(vport->qos.sched_nodes);
+ }
+
kfree(vport->qos.sched_node);
memset(&vport->qos, 0, sizeof(vport->qos));
}
@@ -145,11 +159,17 @@ mlx5_esw_qos_vport_get_parent(const struct mlx5_vport *vport)
static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op)
{
switch (node->type) {
+ case SCHED_NODE_TYPE_VPORT_TC:
+ esw_warn(node->esw->dev,
+ "E-Switch %s %s scheduling element failed (vport=%d,tc=%d,err=%d)\n",
+ op, sched_node_type_str[node->type], node->vport->vport, node->tc, err);
+ break;
case SCHED_NODE_TYPE_VPORT:
esw_warn(node->esw->dev,
"E-Switch %s %s scheduling element failed (vport=%d,err=%d)\n",
op, sched_node_type_str[node->type], node->vport->vport, err);
break;
+ case SCHED_NODE_TYPE_RATE_LIMITER:
case SCHED_NODE_TYPE_TC_ARBITER_TSAR:
case SCHED_NODE_TYPE_VPORTS_TSAR:
esw_warn(node->esw->dev,
@@ -243,6 +263,23 @@ static int esw_qos_sched_elem_config(struct mlx5_esw_sched_node *node, u32 max_r
return 0;
}
+static int esw_qos_create_rate_limit_element(struct mlx5_esw_sched_node *node,
+ struct netlink_ext_ack *extack)
+{
+ u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
+
+ if (!mlx5_qos_element_type_supported(node->esw->dev,
+ SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT,
+ SCHEDULING_HIERARCHY_E_SWITCH))
+ return -EOPNOTSUPP;
+
+ MLX5_SET(scheduling_context, sched_ctx, max_average_bw, node->max_rate);
+ MLX5_SET(scheduling_context, sched_ctx, element_type,
+ SCHEDULING_CONTEXT_ELEMENT_TYPE_RATE_LIMIT);
+
+ return esw_qos_node_create_sched_element(node, sched_ctx, extack);
+}
+
static u32 esw_qos_calculate_min_rate_divider(struct mlx5_eswitch *esw,
struct mlx5_esw_sched_node *parent)
{
@@ -379,6 +416,31 @@ static int esw_qos_vport_create_sched_element(struct mlx5_esw_sched_node *vport_
return esw_qos_node_create_sched_element(vport_node, sched_ctx, extack);
}
+static int esw_qos_vport_tc_create_sched_element(struct mlx5_esw_sched_node *vport_tc_node,
+ u32 rate_limit_elem_ix,
+ struct netlink_ext_ack *extack)
+{
+ u32 sched_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
+ struct mlx5_core_dev *dev = vport_tc_node->esw->dev;
+ void *attr;
+
+ if (!mlx5_qos_element_type_supported(dev,
+ SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC,
+ SCHEDULING_HIERARCHY_E_SWITCH))
+ return -EOPNOTSUPP;
+
+ MLX5_SET(scheduling_context, sched_ctx, element_type,
+ SCHEDULING_CONTEXT_ELEMENT_TYPE_VPORT_TC);
+ attr = MLX5_ADDR_OF(scheduling_context, sched_ctx, element_attributes);
+ MLX5_SET(vport_tc_element, attr, vport_number, vport_tc_node->vport->vport);
+ MLX5_SET(vport_tc_element, attr, traffic_class, vport_tc_node->tc);
+ MLX5_SET(scheduling_context, sched_ctx, max_bw_obj_id, rate_limit_elem_ix);
+ MLX5_SET(scheduling_context, sched_ctx, parent_element_id, vport_tc_node->parent->ix);
+ MLX5_SET(scheduling_context, sched_ctx, bw_share, vport_tc_node->bw_share);
+
+ return esw_qos_node_create_sched_element(vport_tc_node, sched_ctx, extack);
+}
+
static struct mlx5_esw_sched_node *
__esw_qos_alloc_node(struct mlx5_eswitch *esw, u32 tsar_ix, enum sched_node_type type,
struct mlx5_esw_sched_node *parent)
@@ -575,12 +637,169 @@ static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node,
return -EOPNOTSUPP;
}
+static int esw_qos_create_vport_tc_sched_node(struct mlx5_vport *vport,
+ u32 rate_limit_elem_ix,
+ struct mlx5_esw_sched_node *vports_tc_node,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+ struct mlx5_esw_sched_node *vport_tc_node;
+ u8 tc = vports_tc_node->tc;
+ int err;
+
+ vport_tc_node = __esw_qos_alloc_node(vport_node->esw, 0, SCHED_NODE_TYPE_VPORT_TC,
+ vports_tc_node);
+ if (!vport_tc_node)
+ return -ENOMEM;
+
+ vport_tc_node->min_rate = vport_node->min_rate;
+ vport_tc_node->tc = tc;
+ vport_tc_node->vport = vport;
+ err = esw_qos_vport_tc_create_sched_element(vport_tc_node, rate_limit_elem_ix, extack);
+ if (err)
+ goto err_out;
+
+ vport->qos.sched_nodes[tc] = vport_tc_node;
+
+ return 0;
+err_out:
+ __esw_qos_free_node(vport_tc_node);
+ return err;
+}
+
+static void esw_qos_destroy_vport_tc_sched_elements(struct mlx5_vport *vport,
+ struct netlink_ext_ack *extack)
+{
+ int i, num_tcs = esw_qos_num_tcs(vport->qos.sched_node->esw->dev);
+
+ for (i = 0; i < num_tcs; i++) {
+ if (vport->qos.sched_nodes[i])
+ __esw_qos_destroy_node(vport->qos.sched_nodes[i], extack);
+ }
+
+ kfree(vport->qos.sched_nodes);
+ vport->qos.sched_nodes = NULL;
+}
+
+static int esw_qos_create_vport_tc_sched_elements(struct mlx5_vport *vport,
+ enum sched_node_type type,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+ struct mlx5_esw_sched_node *tc_arbiter_node, *vports_tc_node;
+ int err, num_tcs = esw_qos_num_tcs(vport_node->esw->dev);
+ u32 rate_limit_elem_ix;
+
+ vport->qos.sched_nodes = kcalloc(num_tcs, sizeof(struct mlx5_esw_sched_node *), GFP_KERNEL);
+ if (!vport->qos.sched_nodes) {
+ NL_SET_ERR_MSG_MOD(extack, "Allocating the vport TC scheduling elements failed.");
+ return -ENOMEM;
+ }
+
+ rate_limit_elem_ix = type == SCHED_NODE_TYPE_RATE_LIMITER ? vport_node->ix : 0;
+ tc_arbiter_node = type == SCHED_NODE_TYPE_RATE_LIMITER ? vport_node->parent : vport_node;
+ list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) {
+ err = esw_qos_create_vport_tc_sched_node(vport, rate_limit_elem_ix, vports_tc_node,
+ extack);
+ if (err)
+ goto err_create_vport_tc;
+ }
+
+ return 0;
+
+err_create_vport_tc:
+ esw_qos_destroy_vport_tc_sched_elements(vport, NULL);
+
+ return err;
+}
+
+static int esw_qos_vport_tc_enable(struct mlx5_vport *vport, enum sched_node_type type,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+ int err;
+
+ if (type == SCHED_NODE_TYPE_TC_ARBITER_TSAR &&
+ MLX5_CAP_QOS(vport_node->esw->dev, log_esw_max_sched_depth) < 2) {
+ NL_SET_ERR_MSG_MOD(extack, "Setting up TC Arbiter for a vport is not supported.");
+ return -EOPNOTSUPP;
+ }
+
+ esw_assert_qos_lock_held(vport->dev->priv.eswitch);
+
+ if (type == SCHED_NODE_TYPE_RATE_LIMITER)
+ err = esw_qos_create_rate_limit_element(vport_node, extack);
+ else
+ err = esw_qos_tc_arbiter_scheduling_setup(vport_node, extack);
+ if (err)
+ return err;
+
+ /* Rate limiters impact multiple nodes not directly connected to them
+ * and are not direct members of the QoS hierarchy.
+ * Unlink it from the parent to reflect that.
+ */
+ if (type == SCHED_NODE_TYPE_RATE_LIMITER)
+ list_del_init(&vport_node->entry);
+
+ err = esw_qos_create_vport_tc_sched_elements(vport, type, extack);
+ if (err)
+ goto err_sched_nodes;
+
+ return 0;
+
+err_sched_nodes:
+ if (type == SCHED_NODE_TYPE_RATE_LIMITER) {
+ esw_qos_node_destroy_sched_element(vport_node, NULL);
+ list_add_tail(&vport_node->entry, &vport_node->parent->children);
+ } else {
+ esw_qos_tc_arbiter_scheduling_teardown(vport_node, NULL);
+ }
+ return err;
+}
+
+static void esw_qos_vport_tc_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+ enum sched_node_type curr_type = vport_node->type;
+
+ esw_qos_destroy_vport_tc_sched_elements(vport, extack);
+
+ if (curr_type == SCHED_NODE_TYPE_RATE_LIMITER)
+ esw_qos_node_destroy_sched_element(vport_node, extack);
+ else
+ esw_qos_tc_arbiter_scheduling_teardown(vport_node, extack);
+}
+
+static int esw_qos_set_vport_tcs_min_rate(struct mlx5_vport *vport, u32 min_rate,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
+ int err, i, num_tcs = esw_qos_num_tcs(vport_node->esw->dev);
+
+ for (i = 0; i < num_tcs; i++) {
+ err = esw_qos_set_node_min_rate(vport->qos.sched_nodes[i], min_rate, extack);
+ if (err)
+ goto err_out;
+ }
+ vport_node->min_rate = min_rate;
+
+ return 0;
+err_out:
+ for (--i; i >= 0; i--)
+ esw_qos_set_node_min_rate(vport->qos.sched_nodes[i], vport_node->min_rate, extack);
+ return err;
+}
+
static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_ack *extack)
{
struct mlx5_esw_sched_node *vport_node = vport->qos.sched_node;
struct mlx5_esw_sched_node *parent = vport_node->parent;
+ enum sched_node_type curr_type = vport_node->type;
- esw_qos_node_destroy_sched_element(vport_node, extack);
+ if (curr_type == SCHED_NODE_TYPE_VPORT)
+ esw_qos_node_destroy_sched_element(vport_node, extack);
+ else
+ esw_qos_vport_tc_disable(vport, extack);
vport_node->bw_share = 0;
list_del_init(&vport_node->entry);
@@ -589,7 +808,8 @@ static void esw_qos_vport_disable(struct mlx5_vport *vport, struct netlink_ext_a
trace_mlx5_esw_vport_qos_destroy(vport_node->esw->dev, vport);
}
-static int esw_qos_vport_enable(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent,
+static int esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_type type,
+ struct mlx5_esw_sched_node *parent,
struct netlink_ext_ack *extack)
{
int err;
@@ -597,10 +817,14 @@ static int esw_qos_vport_enable(struct mlx5_vport *vport, struct mlx5_esw_sched_
esw_assert_qos_lock_held(vport->dev->priv.eswitch);
esw_qos_node_set_parent(vport->qos.sched_node, parent);
- err = esw_qos_vport_create_sched_element(vport->qos.sched_node, extack);
+ if (type == SCHED_NODE_TYPE_VPORT)
+ err = esw_qos_vport_create_sched_element(vport->qos.sched_node, extack);
+ else
+ err = esw_qos_vport_tc_enable(vport, type, extack);
if (err)
return err;
+ vport->qos.sched_node->type = type;
esw_qos_normalize_min_rate(parent->esw, parent, extack);
return 0;
@@ -628,7 +852,7 @@ static int mlx5_esw_qos_vport_enable(struct mlx5_vport *vport, enum sched_node_t
sched_node->min_rate = min_rate;
sched_node->vport = vport;
vport->qos.sched_node = sched_node;
- err = esw_qos_vport_enable(vport, parent, extack);
+ err = esw_qos_vport_enable(vport, type, parent, extack);
if (err)
esw_qos_put(esw);
@@ -680,6 +904,8 @@ static int mlx5_esw_qos_set_vport_min_rate(struct mlx5_vport *vport, u32 min_rat
if (!vport_node)
return mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_VPORT, NULL, 0, min_rate,
extack);
+ else if (vport_node->type == SCHED_NODE_TYPE_RATE_LIMITER)
+ return esw_qos_set_vport_tcs_min_rate(vport, min_rate, extack);
else
return esw_qos_set_node_min_rate(vport_node, min_rate, extack);
}
@@ -712,12 +938,59 @@ bool mlx5_esw_qos_get_vport_rate(struct mlx5_vport *vport, u32 *max_rate, u32 *m
return enabled;
}
+static int esw_qos_vport_tc_check_type(enum sched_node_type curr_type,
+ enum sched_node_type new_type,
+ struct netlink_ext_ack *extack)
+{
+ if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR &&
+ new_type == SCHED_NODE_TYPE_RATE_LIMITER) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Cannot switch from vport-level TC arbitration to node-level TC arbitration");
+ return -EOPNOTSUPP;
+ }
+
+ if (curr_type == SCHED_NODE_TYPE_RATE_LIMITER &&
+ new_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Cannot switch from node-level TC arbitration to vport-level TC arbitration");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int esw_qos_vport_update(struct mlx5_vport *vport, enum sched_node_type type,
+ struct mlx5_esw_sched_node *parent,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *curr_parent = vport->qos.sched_node->parent;
+ enum sched_node_type curr_type = vport->qos.sched_node->type;
+ int err;
+
+ esw_assert_qos_lock_held(vport->dev->priv.eswitch);
+ parent = parent ?: curr_parent;
+ if (curr_type == type && curr_parent == parent)
+ return 0;
+
+ err = esw_qos_vport_tc_check_type(curr_type, type, extack);
+ if (err)
+ return err;
+
+ esw_qos_vport_disable(vport, extack);
+
+ err = esw_qos_vport_enable(vport, type, parent, extack);
+ if (err)
+ esw_qos_vport_enable(vport, curr_type, curr_parent, NULL);
+
+ return err;
+}
+
static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_sched_node *parent,
struct netlink_ext_ack *extack)
{
struct mlx5_eswitch *esw = vport->dev->priv.eswitch;
struct mlx5_esw_sched_node *curr_parent;
- int err;
+ enum sched_node_type type;
esw_assert_qos_lock_held(esw);
curr_parent = vport->qos.sched_node->parent;
@@ -725,16 +998,17 @@ static int esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw
if (curr_parent == parent)
return 0;
- esw_qos_vport_disable(vport, extack);
-
- err = esw_qos_vport_enable(vport, parent, extack);
- if (err) {
- if (esw_qos_vport_enable(vport, curr_parent, NULL))
- esw_warn(parent->esw->dev, "vport restore QoS failed (vport=%d)\n",
- vport->vport);
- }
+ /* Set vport QoS type based on parent node type if different from default QoS;
+ * otherwise, use the vport's current QoS type.
+ */
+ if (parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+ type = SCHED_NODE_TYPE_RATE_LIMITER;
+ else if (curr_parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+ type = SCHED_NODE_TYPE_VPORT;
+ else
+ type = vport->qos.sched_node->type;
- return err;
+ return esw_qos_vport_update(vport, type, parent, extack);
}
static void esw_qos_switch_vport_tcs_to_vport(struct mlx5_esw_sched_node *tc_arbiter_node,
@@ -1025,6 +1299,14 @@ static bool esw_qos_validate_unsupported_tc_bw(struct mlx5_eswitch *esw, u32 *tc
return true;
}
+static bool esw_qos_vport_validate_unsupported_tc_bw(struct mlx5_vport *vport, u32 *tc_bw)
+{
+ struct mlx5_eswitch *esw = vport->qos.sched_node ?
+ vport->qos.sched_node->parent->esw : vport->dev->priv.eswitch;
+
+ return esw_qos_validate_unsupported_tc_bw(esw, tc_bw);
+}
+
static bool esw_qos_tc_bw_disabled(u32 *tc_bw)
{
int i;
@@ -1098,8 +1380,44 @@ int mlx5_esw_devlink_rate_leaf_tx_max_set(struct devlink_rate *rate_leaf, void *
int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, void *priv,
u32 *tc_bw, struct netlink_ext_ack *extack)
{
- NL_SET_ERR_MSG_MOD(extack, "TC bandwidth shares are not supported on leafs");
- return -EOPNOTSUPP;
+ struct mlx5_esw_sched_node *vport_node;
+ struct mlx5_vport *vport = priv;
+ struct mlx5_eswitch *esw;
+ bool disable;
+ int err = 0;
+
+ esw = vport->dev->priv.eswitch;
+ if (!mlx5_esw_allowed(esw))
+ return -EPERM;
+
+ disable = esw_qos_tc_bw_disabled(tc_bw);
+ esw_qos_lock(esw);
+
+ if (!esw_qos_vport_validate_unsupported_tc_bw(vport, tc_bw)) {
+ NL_SET_ERR_MSG_MOD(extack, "E-Switch traffic classes number is not supported");
+ err = -EOPNOTSUPP;
+ goto unlock;
+ }
+
+ vport_node = vport->qos.sched_node;
+ if (disable && !vport_node)
+ goto unlock;
+
+ if (disable && vport_node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR) {
+ err = esw_qos_vport_update(vport, SCHED_NODE_TYPE_VPORT, NULL, extack);
+ goto unlock;
+ }
+
+ if (!vport_node) {
+ err = mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_TC_ARBITER_TSAR, NULL, 0, 0,
+ extack);
+ vport_node = vport->qos.sched_node;
+ } else {
+ err = esw_qos_vport_update(vport, SCHED_NODE_TYPE_TC_ARBITER_TSAR, NULL, extack);
+ }
+unlock:
+ esw_qos_unlock(esw);
+ return err;
}
int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, void *priv,
@@ -1218,10 +1536,14 @@ int mlx5_esw_qos_vport_update_parent(struct mlx5_vport *vport, struct mlx5_esw_s
}
esw_qos_lock(esw);
- if (!vport->qos.sched_node && parent)
- err = mlx5_esw_qos_vport_enable(vport, SCHED_NODE_TYPE_VPORT, parent, 0, 0, extack);
- else if (vport->qos.sched_node)
+ if (!vport->qos.sched_node && parent) {
+ enum sched_node_type type = parent->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR ?
+ SCHED_NODE_TYPE_RATE_LIMITER : SCHED_NODE_TYPE_VPORT;
+
+ err = mlx5_esw_qos_vport_enable(vport, type, parent, 0, 0, extack);
+ } else if (vport->qos.sched_node) {
err = esw_qos_vport_update_parent(vport, parent, extack);
+ }
esw_qos_unlock(esw);
return err;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index a83d41121db6..9b0be25631df 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -212,10 +212,19 @@ struct mlx5_vport {
struct mlx5_vport_info info;
- /* Protected with the E-Switch qos domain lock. */
+ /* Protected with the E-Switch qos domain lock. The Vport QoS can
+ * either be disabled (sched_node is NULL) or in one of three states:
+ * 1. Regular QoS (sched_node is a vport node).
+ * 2. TC QoS enabled on the vport (sched_node is a TC arbiter).
+ * 3. TC QoS enabled on the vport's parent node
+ * (sched_node is a rate limit node).
+ * When TC is enabled in either mode, the vport owns vport TC scheduling nodes.
+ */
struct {
- /* Vport scheduling element node. */
+ /* Vport scheduling node. */
struct mlx5_esw_sched_node *sched_node;
+ /* Array of vport traffic class scheduling nodes. */
+ struct mlx5_esw_sched_node **sched_nodes;
} qos;
u16 vport;
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH net-next V5 11/11] net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
` (9 preceding siblings ...)
2024-12-04 22:09 ` [PATCH net-next V5 10/11] net/mlx5: Add traffic class scheduling support for vport QoS Tariq Toukan
@ 2024-12-04 22:09 ` Tariq Toukan
2024-12-05 9:23 ` [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Leon Romanovsky
2024-12-07 2:13 ` Jakub Kicinski
12 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2024-12-04 22:09 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky
Cc: netdev, Saeed Mahameed, Gal Pressman, linux-rdma, Carolina Jubran,
Cosmin Ratiu, Tariq Toukan
From: Carolina Jubran <cjubran@nvidia.com>
Introduce support for managing Traffic Class (TC) arbiter nodes and
associated vports TC nodes within the E-Switch QoS hierarchy. This
patch adds support for the new scheduling node type,
`SCHED_NODE_TYPE_VPORTS_TC_TSAR`, and implements full support for
setting tc-bw on both vports and nodes.
Key changes include:
- Introduced the new scheduling node type,
`SCHED_NODE_TYPE_VPORTS_TC_TSAR`, for managing vports within the TC
arbiter node.
- New helper functions for creating and destroying vports TC nodes
under the TC arbiter.
- Updated the minimum rate normalization function to skip nodes of type
`SCHED_NODE_TYPE_VPORTS_TC_TSAR`. Vports TC TSARs have bandwidth
shares configured on them but not minimum rates, so their `min_rate`
cannot be normalized.
- Implementation of `esw_qos_tc_arbiter_scheduling_setup()` and
`esw_qos_tc_arbiter_scheduling_teardown()` for initializing and
cleaning up TC arbiter scheduling elements. These functions now fully
support tc-bw configuration on TC arbiter nodes.
- Added `esw_qos_tc_arbiter_get_bw_shares()` and
`esw_qos_set_tc_arbiter_bw_shares()` to handle the settings of
bandwidth shares for vports traffic class TSARs.
- Refactored `mlx5_esw_devlink_rate_node_tc_bw_set()` and
`mlx5_esw_devlink_rate_leaf_tc_bw_set()` to fully support configuring
tc-bw on devlink rate nodes and vports, respectively.
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 185 +++++++++++++++++-
1 file changed, 180 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
index afb00deaae16..87c9789c2836 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/qos.c
@@ -67,6 +67,7 @@ enum sched_node_type {
SCHED_NODE_TYPE_TC_ARBITER_TSAR,
SCHED_NODE_TYPE_RATE_LIMITER,
SCHED_NODE_TYPE_VPORT_TC,
+ SCHED_NODE_TYPE_VPORTS_TC_TSAR,
};
static const char * const sched_node_type_str[] = {
@@ -75,6 +76,7 @@ static const char * const sched_node_type_str[] = {
[SCHED_NODE_TYPE_TC_ARBITER_TSAR] = "TC Arbiter TSAR",
[SCHED_NODE_TYPE_RATE_LIMITER] = "Rate Limiter",
[SCHED_NODE_TYPE_VPORT_TC] = "vport TC",
+ [SCHED_NODE_TYPE_VPORTS_TC_TSAR] = "vports TC TSAR",
};
struct mlx5_esw_sched_node {
@@ -159,6 +161,11 @@ mlx5_esw_qos_vport_get_parent(const struct mlx5_vport *vport)
static void esw_qos_sched_elem_warn(struct mlx5_esw_sched_node *node, int err, const char *op)
{
switch (node->type) {
+ case SCHED_NODE_TYPE_VPORTS_TC_TSAR:
+ esw_warn(node->esw->dev,
+ "E-Switch %s %s scheduling element failed (tc=%d,err=%d)\n",
+ op, sched_node_type_str[node->type], node->tc, err);
+ break;
case SCHED_NODE_TYPE_VPORT_TC:
esw_warn(node->esw->dev,
"E-Switch %s %s scheduling element failed (vport=%d,tc=%d,err=%d)\n",
@@ -344,7 +351,11 @@ static void esw_qos_normalize_min_rate(struct mlx5_eswitch *esw,
if (node->esw != esw || node->ix == esw->qos.root_tsar_ix)
continue;
- esw_qos_update_sched_node_bw_share(node, divider, extack);
+ /* Vports TC TSARs don't have a minimum rate configured,
+ * so there's no need to update the bw_share on them.
+ */
+ if (node->type != SCHED_NODE_TYPE_VPORTS_TC_TSAR)
+ esw_qos_update_sched_node_bw_share(node, divider, extack);
if (list_empty(&node->children))
continue;
@@ -476,6 +487,129 @@ static void esw_qos_destroy_node(struct mlx5_esw_sched_node *node, struct netlin
__esw_qos_free_node(node);
}
+static int esw_qos_create_vports_tc_node(struct mlx5_esw_sched_node *parent, u8 tc,
+ struct netlink_ext_ack *extack)
+{
+ u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
+ struct mlx5_core_dev *dev = parent->esw->dev;
+ struct mlx5_esw_sched_node *vports_tc_node;
+ void *attr;
+ int err;
+
+ if (!mlx5_qos_element_type_supported(dev,
+ SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR,
+ SCHEDULING_HIERARCHY_E_SWITCH) ||
+ !mlx5_qos_tsar_type_supported(dev,
+ TSAR_ELEMENT_TSAR_TYPE_DWRR,
+ SCHEDULING_HIERARCHY_E_SWITCH))
+ return -EOPNOTSUPP;
+
+ vports_tc_node = __esw_qos_alloc_node(parent->esw, 0, SCHED_NODE_TYPE_VPORTS_TC_TSAR,
+ parent);
+ if (!vports_tc_node) {
+ NL_SET_ERR_MSG_MOD(extack, "E-Switch alloc node failed");
+ esw_warn(dev, "Failed to alloc vports TC node (tc=%d)\n", tc);
+ return -ENOMEM;
+ }
+
+ attr = MLX5_ADDR_OF(scheduling_context, tsar_ctx, element_attributes);
+ MLX5_SET(tsar_element, attr, tsar_type, TSAR_ELEMENT_TSAR_TYPE_DWRR);
+ MLX5_SET(tsar_element, attr, traffic_class, tc);
+ MLX5_SET(scheduling_context, tsar_ctx, parent_element_id, parent->ix);
+ MLX5_SET(scheduling_context, tsar_ctx, element_type, SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR);
+
+ err = esw_qos_node_create_sched_element(vports_tc_node, tsar_ctx, extack);
+ if (err)
+ goto err_create_sched_element;
+
+ vports_tc_node->tc = tc;
+
+ return 0;
+
+err_create_sched_element:
+ __esw_qos_free_node(vports_tc_node);
+ return err;
+}
+
+static void
+esw_qos_tc_arbiter_get_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node, u32 *tc_bw)
+{
+ struct mlx5_esw_sched_node *vports_tc_node;
+
+ list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry)
+ tc_bw[vports_tc_node->tc] = vports_tc_node->bw_share;
+}
+
+static void esw_qos_set_tc_arbiter_bw_shares(struct mlx5_esw_sched_node *tc_arbiter_node,
+ u32 *tc_bw, struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *vports_tc_node;
+
+ list_for_each_entry(vports_tc_node, &tc_arbiter_node->children, entry) {
+ u32 bw_share;
+ u8 tc;
+
+ tc = vports_tc_node->tc;
+ bw_share = tc_bw[tc] ?: MLX5_MIN_BW_SHARE;
+ esw_qos_sched_elem_config(vports_tc_node, 0, bw_share, extack);
+ }
+}
+
+static void esw_qos_destroy_vports_tc_nodes(struct mlx5_esw_sched_node *tc_arbiter_node,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_esw_sched_node *vports_tc_node, *tmp;
+
+ list_for_each_entry_safe(vports_tc_node, tmp, &tc_arbiter_node->children, entry)
+ esw_qos_destroy_node(vports_tc_node, extack);
+}
+
+static int esw_qos_create_vports_tc_nodes(struct mlx5_esw_sched_node *tc_arbiter_node,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_eswitch *esw = tc_arbiter_node->esw;
+ int err, i, num_tcs = esw_qos_num_tcs(esw->dev);
+
+ for (i = 0; i < num_tcs; i++) {
+ err = esw_qos_create_vports_tc_node(tc_arbiter_node, i, extack);
+ if (err)
+ goto err_tc_node_create;
+ }
+
+ return 0;
+
+err_tc_node_create:
+ esw_qos_destroy_vports_tc_nodes(tc_arbiter_node, NULL);
+ return err;
+}
+
+static int esw_qos_create_tc_arbiter_sched_elem(struct mlx5_esw_sched_node *tc_arbiter_node,
+ struct netlink_ext_ack *extack)
+{
+ u32 tsar_ctx[MLX5_ST_SZ_DW(scheduling_context)] = {};
+ u32 tsar_parent_ix;
+ void *attr;
+
+ if (!mlx5_qos_tsar_type_supported(tc_arbiter_node->esw->dev,
+ TSAR_ELEMENT_TSAR_TYPE_TC_ARB,
+ SCHEDULING_HIERARCHY_E_SWITCH)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "E-Switch TC Arbiter scheduling element is not supported");
+ return -EOPNOTSUPP;
+ }
+
+ attr = MLX5_ADDR_OF(scheduling_context, tsar_ctx, element_attributes);
+ MLX5_SET(tsar_element, attr, tsar_type, TSAR_ELEMENT_TSAR_TYPE_TC_ARB);
+ tsar_parent_ix = tc_arbiter_node->parent ? tc_arbiter_node->parent->ix :
+ tc_arbiter_node->esw->qos.root_tsar_ix;
+ MLX5_SET(scheduling_context, tsar_ctx, parent_element_id, tsar_parent_ix);
+ MLX5_SET(scheduling_context, tsar_ctx, element_type, SCHEDULING_CONTEXT_ELEMENT_TYPE_TSAR);
+ MLX5_SET(scheduling_context, tsar_ctx, max_average_bw, tc_arbiter_node->max_rate);
+ MLX5_SET(scheduling_context, tsar_ctx, bw_share, tc_arbiter_node->bw_share);
+
+ return esw_qos_node_create_sched_element(tc_arbiter_node, tsar_ctx, extack);
+}
+
static struct mlx5_esw_sched_node *
__esw_qos_create_vports_sched_node(struct mlx5_eswitch *esw, struct mlx5_esw_sched_node *parent,
struct netlink_ext_ack *extack)
@@ -539,6 +673,9 @@ static void __esw_qos_destroy_node(struct mlx5_esw_sched_node *node, struct netl
{
struct mlx5_eswitch *esw = node->esw;
+ if (node->type == SCHED_NODE_TYPE_TC_ARBITER_TSAR)
+ esw_qos_destroy_vports_tc_nodes(node, extack);
+
trace_mlx5_esw_node_qos_destroy(esw->dev, node, node->ix);
esw_qos_destroy_node(node, extack);
esw_qos_normalize_min_rate(esw, NULL, extack);
@@ -628,13 +765,38 @@ static void esw_qos_put(struct mlx5_eswitch *esw)
static void esw_qos_tc_arbiter_scheduling_teardown(struct mlx5_esw_sched_node *node,
struct netlink_ext_ack *extack)
-{}
+{
+ /* Clean up all Vports TC nodes within the TC arbiter node. */
+ esw_qos_destroy_vports_tc_nodes(node, extack);
+ /* Destroy the scheduling element for the TC arbiter node itself. */
+ esw_qos_node_destroy_sched_element(node, extack);
+}
static int esw_qos_tc_arbiter_scheduling_setup(struct mlx5_esw_sched_node *node,
struct netlink_ext_ack *extack)
{
- NL_SET_ERR_MSG_MOD(extack, "TC arbiter elements are not supported.");
- return -EOPNOTSUPP;
+ u32 curr_ix = node->ix;
+ int err;
+
+ err = esw_qos_create_tc_arbiter_sched_elem(node, extack);
+ if (err)
+ return err;
+ /* Initialize the vports TC nodes within created TC arbiter TSAR. */
+ err = esw_qos_create_vports_tc_nodes(node, extack);
+ if (err)
+ goto err_vports_tc_nodes;
+
+ node->type = SCHED_NODE_TYPE_TC_ARBITER_TSAR;
+
+ return 0;
+
+err_vports_tc_nodes:
+ /* If initialization fails, clean up the scheduling element
+ * for the TC arbiter node.
+ */
+ esw_qos_node_destroy_sched_element(node, NULL);
+ node->ix = curr_ix;
+ return err;
}
static int esw_qos_create_vport_tc_sched_node(struct mlx5_vport *vport,
@@ -965,6 +1127,7 @@ static int esw_qos_vport_update(struct mlx5_vport *vport, enum sched_node_type t
{
struct mlx5_esw_sched_node *curr_parent = vport->qos.sched_node->parent;
enum sched_node_type curr_type = vport->qos.sched_node->type;
+ u32 curr_tc_bw[IEEE_8021QAZ_MAX_TCS] = {0};
int err;
esw_assert_qos_lock_held(vport->dev->priv.eswitch);
@@ -976,11 +1139,19 @@ static int esw_qos_vport_update(struct mlx5_vport *vport, enum sched_node_type t
if (err)
return err;
+ if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type)
+ esw_qos_tc_arbiter_get_bw_shares(vport->qos.sched_node, curr_tc_bw);
+
esw_qos_vport_disable(vport, extack);
err = esw_qos_vport_enable(vport, type, parent, extack);
- if (err)
+ if (err) {
esw_qos_vport_enable(vport, curr_type, curr_parent, NULL);
+ extack = NULL;
+ }
+
+ if (curr_type == SCHED_NODE_TYPE_TC_ARBITER_TSAR && curr_type == type)
+ esw_qos_set_tc_arbiter_bw_shares(vport->qos.sched_node, curr_tc_bw, extack);
return err;
}
@@ -1415,6 +1586,8 @@ int mlx5_esw_devlink_rate_leaf_tc_bw_set(struct devlink_rate *rate_leaf, void *p
} else {
err = esw_qos_vport_update(vport, SCHED_NODE_TYPE_TC_ARBITER_TSAR, NULL, extack);
}
+ if (!err)
+ esw_qos_set_tc_arbiter_bw_shares(vport_node, tc_bw, extack);
unlock:
esw_qos_unlock(esw);
return err;
@@ -1441,6 +1614,8 @@ int mlx5_esw_devlink_rate_node_tc_bw_set(struct devlink_rate *rate_node, void *p
}
err = esw_qos_node_enable_tc_arbitration(node, extack);
+ if (!err)
+ esw_qos_set_tc_arbiter_bw_shares(node, tc_bw, extack);
unlock:
esw_qos_unlock(esw);
return err;
--
2.44.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
` (10 preceding siblings ...)
2024-12-04 22:09 ` [PATCH net-next V5 11/11] net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw Tariq Toukan
@ 2024-12-05 9:23 ` Leon Romanovsky
2024-12-07 2:13 ` Jakub Kicinski
12 siblings, 0 replies; 32+ messages in thread
From: Leon Romanovsky @ 2024-12-05 9:23 UTC (permalink / raw)
To: Tariq Toukan, Jakub Kicinski
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn, netdev,
Saeed Mahameed, Gal Pressman, linux-rdma
On Thu, Dec 05, 2024 at 12:09:20AM +0200, Tariq Toukan wrote:
> Hi,
>
> This patchset starts with 4 patches that modify the IFC, targeted to
> mlx5-next in order to be taken to rdma-next branch side sooner than in
> the next merge window.
>
> This patchset consists of two features:
> 1. In patches 5-6, Itamar adds SW Steering support for ConnectX-8.
> 2. Followed by patches by Carolina that add rate management support on
> traffic classes in devlink and mlx5, more details below [1].
>
> Series generated against:
> commit bb18265c3aba ("r8169: remove support for chip version 11")
>
> Regards,
> Tariq
<...>
> Carolina Jubran (6):
> net/mlx5: Add support for new scheduling elements
>
> Cosmin Ratiu (2):
> net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits
> net/mlx5: qos: Add ifc support for cross-esw scheduling
>
> Yevgeny Kliteynik (1):
> net/mlx5: Add ConnectX-8 device to ifc
I applied these IFC patches to our mlx5-next shared branch.
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-next
Thanks
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2024-12-04 22:09 ` [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management Tariq Toukan
@ 2024-12-07 2:10 ` Jakub Kicinski
2024-12-09 21:03 ` Tariq Toukan
0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2024-12-07 2:10 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Leon Romanovsky, netdev, Saeed Mahameed, Gal Pressman, linux-rdma,
Carolina Jubran, Cosmin Ratiu, Jiri Pirko
On Thu, 5 Dec 2024 00:09:27 +0200 Tariq Toukan wrote:
> + min: 0
> + max: 100
Are full percentage points sufficient granularity?
> + if (!tb[DEVLINK_ATTR_RATE_TC_INDEX]) {
NL_SET_ERR_ATTR_MISS()
Please limit the string messages where error can be expressed in
machine readable form.
> + NL_SET_ERR_MSG(extack, "Traffic class index is expected");
> + return -EINVAL;
> + }
> +
> + tc_index = nla_get_u8(tb[DEVLINK_ATTR_RATE_TC_INDEX]);
> +
> + if (tc_index >= IEEE_8021QAZ_MAX_TCS) {
This can't be enforced by the policy?
> + NL_SET_ERR_MSG_FMT(extack,
> + "Provided traffic class index (%u) exceeds the maximum allowed value (%u)",
> + tc_index, IEEE_8021QAZ_MAX_TCS - 1);
> + return -EINVAL;
> + }
> +
> + if (!tb[DEVLINK_ATTR_RATE_TC_BW]) {
> + NL_SET_ERR_MSG(extack, "Traffic class bandwidth is expected");
> + return -EINVAL;
> + }
> +
> + if (test_and_set_bit(tc_index, bitmap)) {
> + NL_SET_ERR_MSG(extack, "Duplicate traffic class index specified");
always try to point to attr that caused the issue
> + return -EINVAL;
> + }
--
pw-bot: cr
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
` (11 preceding siblings ...)
2024-12-05 9:23 ` [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Leon Romanovsky
@ 2024-12-07 2:13 ` Jakub Kicinski
2024-12-09 19:32 ` Tariq Toukan
12 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2024-12-07 2:13 UTC (permalink / raw)
To: Tariq Toukan
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Leon Romanovsky, netdev, Saeed Mahameed, Gal Pressman, linux-rdma
On Thu, 5 Dec 2024 00:09:20 +0200 Tariq Toukan wrote:
> This patch series extends the devlink-rate API to support traffic class
> (TC) bandwidth management, enabling more granular control over traffic
> shaping and rate limiting across multiple TCs. The API now allows users
> to specify bandwidth proportions for different traffic classes in a
> single command. This is particularly useful for managing Enhanced
> Transmission Selection (ETS) for groups of Virtual Functions (VFs),
> allowing precise bandwidth allocation across traffic classes.
>
> Additionally the series refines the QoS handling in net/mlx5 to support
> TC arbitration and bandwidth management on vports and rate nodes.
>
> Extend devlink-rate API to support rate management on TCs:
> - devlink: Extend the devlink rate API to support traffic class
> bandwidth management
>
> Introduce a no-op implementation:
> - net/mlx5: Add no-op implementation for setting tc-bw on rate objects
>
> Add support for enabling and disabling TC QoS on vports and nodes:
> - net/mlx5: Add support for setting tc-bw on nodes
> - net/mlx5: Add traffic class scheduling support for vport QoS
>
> Support for setting tc-bw on rate objects:
> - net/mlx5: Manage TC arbiter nodes and implement full support for
> tc-bw
Do you expect TC bw allocation to work on non-leaf nodes?
How does this relate to the rate API which Paolo added? He was asked
to build in a way to integrate with devlink now devlink is growing
extra features again, which presumably the other API will also need.
And the integration may turn out to be challenging.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes
2024-12-07 2:13 ` Jakub Kicinski
@ 2024-12-09 19:32 ` Tariq Toukan
2024-12-09 21:41 ` Jakub Kicinski
0 siblings, 1 reply; 32+ messages in thread
From: Tariq Toukan @ 2024-12-09 19:32 UTC (permalink / raw)
To: Jakub Kicinski, Tariq Toukan
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Leon Romanovsky, netdev, Saeed Mahameed, Gal Pressman, linux-rdma
On 07/12/2024 4:13, Jakub Kicinski wrote:
> On Thu, 5 Dec 2024 00:09:20 +0200 Tariq Toukan wrote:
>> This patch series extends the devlink-rate API to support traffic class
>> (TC) bandwidth management, enabling more granular control over traffic
>> shaping and rate limiting across multiple TCs. The API now allows users
>> to specify bandwidth proportions for different traffic classes in a
>> single command. This is particularly useful for managing Enhanced
>> Transmission Selection (ETS) for groups of Virtual Functions (VFs),
>> allowing precise bandwidth allocation across traffic classes.
>>
>> Additionally the series refines the QoS handling in net/mlx5 to support
>> TC arbitration and bandwidth management on vports and rate nodes.
>>
>> Extend devlink-rate API to support rate management on TCs:
>> - devlink: Extend the devlink rate API to support traffic class
>> bandwidth management
>>
>> Introduce a no-op implementation:
>> - net/mlx5: Add no-op implementation for setting tc-bw on rate objects
>>
>> Add support for enabling and disabling TC QoS on vports and nodes:
>> - net/mlx5: Add support for setting tc-bw on nodes
>> - net/mlx5: Add traffic class scheduling support for vport QoS
>>
>> Support for setting tc-bw on rate objects:
>> - net/mlx5: Manage TC arbiter nodes and implement full support for
>> tc-bw
>
> Do you expect TC bw allocation to work on non-leaf nodes?
>
Yes. That's the point. It works.
> How does this relate to the rate API which Paolo added? He was asked
> to build in a way to integrate with devlink now devlink is growing
> extra features again, which presumably the other API will also need.
> And the integration may turn out to be challenging.
>
AFAIU Paolo's work is not for shapers 'above' the network device level,
i.e. groups.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2024-12-07 2:10 ` Jakub Kicinski
@ 2024-12-09 21:03 ` Tariq Toukan
2024-12-09 21:27 ` Jakub Kicinski
0 siblings, 1 reply; 32+ messages in thread
From: Tariq Toukan @ 2024-12-09 21:03 UTC (permalink / raw)
To: Jakub Kicinski, Tariq Toukan
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Leon Romanovsky, netdev, Saeed Mahameed, Gal Pressman, linux-rdma,
Carolina Jubran, Cosmin Ratiu, Jiri Pirko
On 07/12/2024 4:10, Jakub Kicinski wrote:
> On Thu, 5 Dec 2024 00:09:27 +0200 Tariq Toukan wrote:
>> + min: 0
>> + max: 100
>
> Are full percentage points sufficient granularity?
>
Yes, we think so.
>> + if (!tb[DEVLINK_ATTR_RATE_TC_INDEX]) {
>
> NL_SET_ERR_ATTR_MISS()
>
> Please limit the string messages where error can be expressed in
> machine readable form.
>
I'll fix.
>> + NL_SET_ERR_MSG(extack, "Traffic class index is expected");
>> + return -EINVAL;
>> + }
>> +
>> + tc_index = nla_get_u8(tb[DEVLINK_ATTR_RATE_TC_INDEX]);
>> +
>> + if (tc_index >= IEEE_8021QAZ_MAX_TCS) {
>
> This can't be enforced by the policy?
>
If we enforce by policy we need to use the constant 7, not the macro
IEEE_8021QAZ_MAX_TCS-1.
I'll keep it.
>> + NL_SET_ERR_MSG_FMT(extack,
>> + "Provided traffic class index (%u) exceeds the maximum allowed value (%u)",
>> + tc_index, IEEE_8021QAZ_MAX_TCS - 1);
>> + return -EINVAL;
>> + }
>> +
>> + if (!tb[DEVLINK_ATTR_RATE_TC_BW]) {
>> + NL_SET_ERR_MSG(extack, "Traffic class bandwidth is expected");
>> + return -EINVAL;
>> + }
>> +
>> + if (test_and_set_bit(tc_index, bitmap)) {
>> + NL_SET_ERR_MSG(extack, "Duplicate traffic class index specified");
>
> always try to point to attr that caused the issue
>
I'll fix.
>> + return -EINVAL;
>> + }
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2024-12-09 21:03 ` Tariq Toukan
@ 2024-12-09 21:27 ` Jakub Kicinski
2025-01-20 11:55 ` Carolina Jubran
0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2024-12-09 21:27 UTC (permalink / raw)
To: Tariq Toukan
Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky, netdev, Saeed Mahameed,
Gal Pressman, linux-rdma, Carolina Jubran, Cosmin Ratiu,
Jiri Pirko
On Mon, 9 Dec 2024 23:03:04 +0200 Tariq Toukan wrote:
> >> + tc_index = nla_get_u8(tb[DEVLINK_ATTR_RATE_TC_INDEX]);
> >> +
> >> + if (tc_index >= IEEE_8021QAZ_MAX_TCS) {
> >
> > This can't be enforced by the policy?
> >
>
> If we enforce by policy we need to use the constant 7, not the macro
> IEEE_8021QAZ_MAX_TCS-1.
> I'll keep it.
The spec should support using "foreign constants"
Off the top of my head - you can define the ieee-8021qaz-max-tcs contant
as if you were defining a devlink constant, then add a header:
attribute. This will tell C codegen to include that header instead of
generating the definition.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes
2024-12-09 19:32 ` Tariq Toukan
@ 2024-12-09 21:41 ` Jakub Kicinski
2024-12-11 9:49 ` Cosmin Ratiu
0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2024-12-09 21:41 UTC (permalink / raw)
To: Tariq Toukan
Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky, netdev, Saeed Mahameed,
Gal Pressman, linux-rdma
On Mon, 9 Dec 2024 21:32:11 +0200 Tariq Toukan wrote:
> > Do you expect TC bw allocation to work on non-leaf nodes?
>
> Yes. That's the point. It works.
Let's level -- I'm not trying to be difficult, but you're defining
uAPI with little to no documentation. "It works" is not going to cut it.
> > How does this relate to the rate API which Paolo added? He was asked
> > to build in a way to integrate with devlink now devlink is growing
> > extra features again, which presumably the other API will also need.
> > And the integration may turn out to be challenging.
>
> AFAIU Paolo's work is not for shapers 'above' the network device level,
> i.e. groups.
What's the difference between queue group and a VF?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes
2024-12-09 21:41 ` Jakub Kicinski
@ 2024-12-11 9:49 ` Cosmin Ratiu
2024-12-12 1:49 ` Jakub Kicinski
0 siblings, 1 reply; 32+ messages in thread
From: Cosmin Ratiu @ 2024-12-11 9:49 UTC (permalink / raw)
To: ttoukan.linux@gmail.com, jiri@resnulli.us, kuba@kernel.org
Cc: andrew+netdev@lunn.ch, linux-rdma@vger.kernel.org,
davem@davemloft.net, Tariq Toukan, Gal Pressman, Leon Romanovsky,
pabeni@redhat.com, edumazet@google.com, Saeed Mahameed,
netdev@vger.kernel.org
On Mon, 2024-12-09 at 13:41 -0800, Jakub Kicinski wrote:
> On Mon, 9 Dec 2024 21:32:11 +0200 Tariq Toukan wrote:
> > > Do you expect TC bw allocation to work on non-leaf nodes?
> >
> > Yes. That's the point. It works.
>
> Let's level -- I'm not trying to be difficult, but you're defining
> uAPI with little to no documentation. "It works" is not going to cut
> it.
The original intent was to document this in the devlink man page. But
we will add something in the kernel documentation as well in the next
submission.
>
> > > How does this relate to the rate API which Paolo added? He was
> > > asked
> > > to build in a way to integrate with devlink now devlink is
> > > growing
> > > extra features again, which presumably the other API will also
> > > need.
> > > And the integration may turn out to be challenging.
> >
> > AFAIU Paolo's work is not for shapers 'above' the network device
> > level,
> > i.e. groups.
>
> What's the difference between queue group and a VF?
>
I've looked over the latest version of the net-shapers API.
There is some conceptual overlap between this patchset and net-shapers
ability to define a group of device queues and manipulate its tx
limits. But as far as I am aware ([1]), the net-shapers API doesn't
intend to shape entities above netdev level.
So there are two things to discuss here:
1. Integrating device-level TC shaping into net-shapers. The net-
shapers model would need to be extended with the ability to define TC
queues. At the moment I see it's concerned with device tx queues which
don't necessarily map 1:1 to traffic classes.
Then, it would need to have the ability to group TC queues into a node.
Then the integration should be easy. Either API can call the device
driver implementation or one API can call the other's function to do
so.
Paolo, what are your thoughts on tc shaping in the net-shapers API?
2. VF-group TC shaping. The current patchset offers the ability to
split TC bandwidth on a devlink rate node, applying to all VFs in the
node. As far as I am aware, net-shapers doesn't intend to address this
use case. Do we want to have two completely different APIs to
manipulate tc bandwidth?
Cosmin.
[1]
https://lore.kernel.org/netdev/7195630a-1021-4e1e-b48b-a07945477863@redhat.com/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes
2024-12-11 9:49 ` Cosmin Ratiu
@ 2024-12-12 1:49 ` Jakub Kicinski
2024-12-13 13:42 ` Cosmin Ratiu
0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2024-12-12 1:49 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: ttoukan.linux@gmail.com, jiri@resnulli.us, andrew+netdev@lunn.ch,
linux-rdma@vger.kernel.org, davem@davemloft.net, Tariq Toukan,
Gal Pressman, Leon Romanovsky, pabeni@redhat.com,
edumazet@google.com, Saeed Mahameed, netdev@vger.kernel.org
On Wed, 11 Dec 2024 09:49:28 +0000 Cosmin Ratiu wrote:
> I've looked over the latest version of the net-shapers API.
> There is some conceptual overlap between this patchset and net-shapers
> ability to define a group of device queues and manipulate its tx
> limits. But as far as I am aware ([1]), the net-shapers API doesn't
> intend to shape entities above netdev level.
It's not about the uAPI but about having a uniform way of representing
the shaping hierarchy.
> So there are two things to discuss here:
> 1. Integrating device-level TC shaping into net-shapers. The net-
> shapers model would need to be extended with the ability to define TC
> queues. At the moment I see it's concerned with device tx queues which
> don't necessarily map 1:1 to traffic classes.
What are "TC queues"? NIC queues with assigned TC? Your patches shape
on a group of VFs, so the equivalent would be a group of queues
(e.g. group of queues assigned to a container).
> Then, it would need to have the ability to group TC queues into a node.
🧐️ .. grouping of queues was the main direct use case for net-shapers,
so it's definitely there, perhaps I don't understand what you mean.
> Then the integration should be easy. Either API can call the device
> driver implementation or one API can call the other's function to do
> so.
>
> Paolo, what are your thoughts on tc shaping in the net-shapers API?
>
> 2. VF-group TC shaping. The current patchset offers the ability to
> split TC bandwidth on a devlink rate node, applying to all VFs in the
> node. As far as I am aware, net-shapers doesn't intend to address this
> use case. Do we want to have two completely different APIs to
> manipulate tc bandwidth?
Exactly my point. We have too many disjoint APIs. net-shapers was
merged on the premise that it will at least align the internal and
driver facing APIs, even if we still need multiple uAPIs.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes
2024-12-12 1:49 ` Jakub Kicinski
@ 2024-12-13 13:42 ` Cosmin Ratiu
2024-12-17 5:45 ` Jakub Kicinski
0 siblings, 1 reply; 32+ messages in thread
From: Cosmin Ratiu @ 2024-12-13 13:42 UTC (permalink / raw)
To: kuba@kernel.org
Cc: andrew+netdev@lunn.ch, linux-rdma@vger.kernel.org,
ttoukan.linux@gmail.com, Tariq Toukan, Gal Pressman,
davem@davemloft.net, jiri@resnulli.us, Leon Romanovsky,
edumazet@google.com, pabeni@redhat.com, Saeed Mahameed,
netdev@vger.kernel.org
On Wed, 2024-12-11 at 17:49 -0800, Jakub Kicinski wrote:
> On Wed, 11 Dec 2024 09:49:28 +0000 Cosmin Ratiu wrote:
> > I've looked over the latest version of the net-shapers API.
> > There is some conceptual overlap between this patchset and net-
> > shapers
> > ability to define a group of device queues and manipulate its tx
> > limits. But as far as I am aware ([1]), the net-shapers API doesn't
> > intend to shape entities above netdev level.
>
> It's not about the uAPI but about having a uniform way of
> representing the shaping hierarchy.
I understand your point now.
> > So there are two things to discuss here:
> > 1. Integrating device-level TC shaping into net-shapers. The net-
> > shapers model would need to be extended with the ability to define
> > TC
> > queues. At the moment I see it's concerned with device tx queues
> > which
> > don't necessarily map 1:1 to traffic classes.
>
> What are "TC queues"? NIC queues with assigned TC? Your patches shape
> on a group of VFs, so the equivalent would be a group of queues
> (e.g. group of queues assigned to a container).
My terminology was slightly off. "TC queues" are a logical construct,
not necessarily corresponding to device queues. As far as I know,
packet traffic classes are determined with a variety of methods, and
can be encoded in the IP header (ToS) or as metadata in the tx
descriptor somewhere. I am not sure there's any correspondence with
device queues although one could define specific queues for specific
traffic classes, I guess. The "TC queues" I was mentioning are a
logical representation of the packet flow and refer to the hardware's
ability to treat different TCs differently with HW scheduling elements.
> > Then, it would need to have the ability to group TC queues into a
> > node.
>
> 🧐️ .. grouping of queues was the main direct use case for net-
> shapers,
> so it's definitely there, perhaps I don't understand what you mean.
Another side effect of the terminology I used. net-shapers currently
deals with device queues and grouping them, I was thinking about
defining another logical view for traffic split according to traffic
class, grouping of traffic classes and applying limits to the group.
Right now TCs are not modeled in net-shapers at all. That's why I'm
waiting for Paolo to comment, I have no idea what thoughts were put
into modeling traffic classes in net-shapers.
> > Do we want to have two completely different APIs to
> > manipulate tc bandwidth?
>
> Exactly my point. We have too many disjoint APIs. net-shapers was
> merged on the premise that it will at least align the internal and
> driver facing APIs, even if we still need multiple uAPIs.
The current patchset extends the well-established devlink rate API with
the minimal set of fields required to model traffic classes and offers
a full implementation of this new feature in a driver. It was designed
and started before net-shapers was merged. We did consider integrating
with net-shapers, but because it wasn't yet merged, it couldn't do TCs
or multi-device grouping (nor intend to do multi-device), it was
rejected as the API of choice.
Can the missing link between TC modeling in net-shapers can be added in
another series once it is better understood and discussed with Paolo?
Cosmin.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes
2024-12-13 13:42 ` Cosmin Ratiu
@ 2024-12-17 5:45 ` Jakub Kicinski
2024-12-17 15:14 ` Cosmin Ratiu
0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2024-12-17 5:45 UTC (permalink / raw)
To: Cosmin Ratiu
Cc: andrew+netdev@lunn.ch, ttoukan.linux@gmail.com, Tariq Toukan,
Gal Pressman, davem@davemloft.net, jiri@resnulli.us,
Leon Romanovsky, edumazet@google.com, pabeni@redhat.com,
Saeed Mahameed, netdev@vger.kernel.org
On Fri, 13 Dec 2024 13:42:46 +0000 Cosmin Ratiu wrote:
> > > Then, it would need to have the ability to group TC queues into a
> > > node.
> >
> > 🧐️ .. grouping of queues was the main direct use case for net-
> > shapers,
> > so it's definitely there, perhaps I don't understand what you mean.
>
> Another side effect of the terminology I used. net-shapers currently
> deals with device queues and grouping them, I was thinking about
> defining another logical view for traffic split according to traffic
> class, grouping of traffic classes and applying limits to the group.
>
> Right now TCs are not modeled in net-shapers at all. That's why I'm
> waiting for Paolo to comment, I have no idea what thoughts were put
> into modeling traffic classes in net-shapers.
I'm not sure if Paolo is familiar with details of any device capable of
implementing shaping based on TC. You are, I hope, as you're trying
to add Linux uAPI for it.
> > > Do we want to have two completely different APIs to
> > > manipulate tc bandwidth?
> >
> > Exactly my point. We have too many disjoint APIs. net-shapers was
> > merged on the premise that it will at least align the internal and
> > driver facing APIs, even if we still need multiple uAPIs.
>
> The current patchset extends the well-established devlink rate API with
> the minimal set of fields required to model traffic classes and offers
> a full implementation of this new feature in a driver.
"I just need these few extra fields" is not conducive to good
designs which can stand the test of time.
> It was designed and started before net-shapers was merged.
> We did consider integrating with net-shapers, but because it wasn't
> yet merged, it couldn't do TCs or multi-device grouping (nor intend
> to do multi-device), it was rejected as the API of choice.
net-shapers took months of meetings and many, many revisions.
Providing an abstraction for all scheduling hierarchies was
an explicit goal. And someone from nVidia was actively reviewing.
Now you say that in another part of nVidia a decision was made
to ignore what the community was working on, sit out discussions
and then try to get your own code merged. Did I get that right?
> Can the missing link between TC modeling in net-shapers can be added
> in another series once it is better understood and discussed with
> Paolo?
Certainly Paolo's contributions would be appreciated. But I hope
you realize that he is a networking maintainer (like myself)
and may not have the time to solve your problem for you.
I want to be clear that I don't expect you to migrate devlink rate
to net-shaper style API at this point. But we do need at the very
least a clear understanding of how the objects are mapped, and
therefore how TC support would fit into net-shapers.
Ideally, also, less disregard for community efforts.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes
2024-12-17 5:45 ` Jakub Kicinski
@ 2024-12-17 15:14 ` Cosmin Ratiu
0 siblings, 0 replies; 32+ messages in thread
From: Cosmin Ratiu @ 2024-12-17 15:14 UTC (permalink / raw)
To: kuba@kernel.org
Cc: andrew+netdev@lunn.ch, ttoukan.linux@gmail.com, Tariq Toukan,
Gal Pressman, davem@davemloft.net, jiri@resnulli.us,
Leon Romanovsky, edumazet@google.com, pabeni@redhat.com,
Saeed Mahameed, netdev@vger.kernel.org
On Mon, 2024-12-16 at 21:45 -0800, Jakub Kicinski wrote:
> >
> > Right now TCs are not modeled in net-shapers at all. That's why I'm
> > waiting for Paolo to comment, I have no idea what thoughts were put
> > into modeling traffic classes in net-shapers.
>
> I'm not sure if Paolo is familiar with details of any device capable
> of
> implementing shaping based on TC. You are, I hope, as you're trying
> to add Linux uAPI for it.
I guess nobody has perfect expertise in both TC and net-shapers. But
between all of us, we'll find a way.
> > >
> > The current patchset extends the well-established devlink rate API
> > with
> > the minimal set of fields required to model traffic classes and
> > offers
> > a full implementation of this new feature in a driver.
>
> "I just need these few extra fields" is not conducive to good
> designs which can stand the test of time.
I had meant that as "the uAPI needs at least that set of fields to
express tc bandwidth proportions anyway, regardless of discussions
around net-shapers". I am not making any claims about the quality of
the design besides its minimalistic nature.
> > It was designed and started before net-shapers was merged.
> > We did consider integrating with net-shapers, but because it wasn't
> > yet merged, it couldn't do TCs or multi-device grouping (nor intend
> > to do multi-device), it was rejected as the API of choice.
>
> net-shapers took months of meetings and many, many revisions.
> Providing an abstraction for all scheduling hierarchies was
> an explicit goal. And someone from nVidia was actively reviewing.
> Now you say that in another part of nVidia a decision was made
> to ignore what the community was working on, sit out discussions
> and then try to get your own code merged. Did I get that right?
You got it wrong. Nobody decided to "ignore what the community was
working on" or "get our own code merged".
Back in July, we reached out to Paolo explicitly asking about this use
case. Back then things were still in flux with net-shapers but the
direction was clear for it, to model shaping at netdev and below
levels. Paolo's recommendation was to "stick it with the devlink-rate
objects". Our mistake was not cc-ing the ML in that discussion.
What followed was not "ignoring what the community was working on", but
"focusing on solving the TC shaping at VF and group-of-VFs level" with
the more suitable API as opposed to investing time and effort into
influencing net-shapers towards handling use-cases it didn't intend to.
I'm also not sure who else from nVidia was actively reviewing. I asked
a question at some point, Jiri was also active (but mostly with his
maintainer hat, I guess).
> > Can the missing link between TC modeling in net-shapers can be
> > added
> > in another series once it is better understood and discussed with
> > Paolo?
>
> Certainly Paolo's contributions would be appreciated. But I hope
> you realize that he is a networking maintainer (like myself)
> and may not have the time to solve your problem for you.
I'm not waiting for him to solve the problem for us, but to offer his
opinion as someone who spent the most time thinking about net-shapers.
It would be bad form for me to start hacking away at that API without
at least hearing from the original author.
> I want to be clear that I don't expect you to migrate devlink rate
> to net-shaper style API at this point. But we do need at the very
> least a clear understanding of how the objects are mapped, and
> therefore how TC support would fit into net-shapers.
> Ideally, also, less disregard for community efforts.
Again, there was no "disregard for community efforts". There was maybe
a missed opportunity to influence the design of net-shapers towards
handling TCs from the start because we focused on devlink-rate instead.
That will now be corrected.
Separately from that, I was planning to add support for net-shapers ops
in the mlx5 QoS infra, similar with how it was done for the other
drivers that do hw QoS.
Cosmin.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2024-12-09 21:27 ` Jakub Kicinski
@ 2025-01-20 11:55 ` Carolina Jubran
2025-01-20 18:14 ` Jakub Kicinski
0 siblings, 1 reply; 32+ messages in thread
From: Carolina Jubran @ 2025-01-20 11:55 UTC (permalink / raw)
To: Jakub Kicinski, Tariq Toukan
Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky, netdev, Saeed Mahameed,
Gal Pressman, linux-rdma, Cosmin Ratiu, Jiri Pirko
On 09/12/2024 23:27, Jakub Kicinski wrote:
> On Mon, 9 Dec 2024 23:03:04 +0200 Tariq Toukan wrote:
>>>> + tc_index = nla_get_u8(tb[DEVLINK_ATTR_RATE_TC_INDEX]);
>>>> +
>>>> + if (tc_index >= IEEE_8021QAZ_MAX_TCS) {
>>>
>>> This can't be enforced by the policy?
>>>
>>
>> If we enforce by policy we need to use the constant 7, not the macro
>> IEEE_8021QAZ_MAX_TCS-1.
>> I'll keep it.
>
> The spec should support using "foreign constants"
> Off the top of my head - you can define the ieee-8021qaz-max-tcs contant
> as if you were defining a devlink constant, then add a header:
> attribute. This will tell C codegen to include that header instead of
> generating the definition.
>
Hi Jakub,
I tried implementing this as you suggested, but it seems that the only
supported definition types are ['const', 'enum', 'flags', 'struct'],
while the max value in checks only accepts patterns matching
^[su](8|16|32|64)-(min|max)$.
From what I see, it doesn’t currently support using a const value for
the max or min checks. Let me know if I’m missing something or if
there’s an alternative way to achieve this.
Thanks,
Carolina
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2025-01-20 11:55 ` Carolina Jubran
@ 2025-01-20 18:14 ` Jakub Kicinski
2025-01-21 12:36 ` Carolina Jubran
0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2025-01-20 18:14 UTC (permalink / raw)
To: Carolina Jubran
Cc: Tariq Toukan, Tariq Toukan, David S. Miller, Paolo Abeni,
Eric Dumazet, Andrew Lunn, Leon Romanovsky, netdev,
Saeed Mahameed, Gal Pressman, linux-rdma, Cosmin Ratiu,
Jiri Pirko
On Mon, 20 Jan 2025 13:55:58 +0200 Carolina Jubran wrote:
> On 09/12/2024 23:27, Jakub Kicinski wrote:
> > On Mon, 9 Dec 2024 23:03:04 +0200 Tariq Toukan wrote:
> >> If we enforce by policy we need to use the constant 7, not the macro
> >> IEEE_8021QAZ_MAX_TCS-1.
> >> I'll keep it.
> >
> > The spec should support using "foreign constants"
> > Off the top of my head - you can define the ieee-8021qaz-max-tcs contant
> > as if you were defining a devlink constant, then add a header:
> > attribute. This will tell C codegen to include that header instead of
> > generating the definition.
> >
>
> Hi Jakub,
>
> I tried implementing this as you suggested, but it seems that the only
> supported definition types are ['const', 'enum', 'flags', 'struct'],
> while the max value in checks only accepts patterns matching
> ^[su](8|16|32|64)-(min|max)$.
>
> From what I see, it doesn’t currently support using a const value for
> the max or min checks. Let me know if I’m missing something or if
> there’s an alternative way to achieve this.
Ah, I thought we already implemented this, sorry.
Can you try the two patches from the top of this branch?
https://github.com/kuba-moo/linux/tree/ynl-limits
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2025-01-20 18:14 ` Jakub Kicinski
@ 2025-01-21 12:36 ` Carolina Jubran
2025-01-22 12:48 ` Carolina Jubran
0 siblings, 1 reply; 32+ messages in thread
From: Carolina Jubran @ 2025-01-21 12:36 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Tariq Toukan, Tariq Toukan, David S. Miller, Paolo Abeni,
Eric Dumazet, Andrew Lunn, Leon Romanovsky, netdev,
Saeed Mahameed, Gal Pressman, linux-rdma, Cosmin Ratiu,
Jiri Pirko
On 20/01/2025 20:14, Jakub Kicinski wrote:
> On Mon, 20 Jan 2025 13:55:58 +0200 Carolina Jubran wrote:
>> On 09/12/2024 23:27, Jakub Kicinski wrote:
>>> On Mon, 9 Dec 2024 23:03:04 +0200 Tariq Toukan wrote:
>>>> If we enforce by policy we need to use the constant 7, not the macro
>>>> IEEE_8021QAZ_MAX_TCS-1.
>>>> I'll keep it.
>>>
>>> The spec should support using "foreign constants"
>>> Off the top of my head - you can define the ieee-8021qaz-max-tcs contant
>>> as if you were defining a devlink constant, then add a header:
>>> attribute. This will tell C codegen to include that header instead of
>>> generating the definition.
>>>
>>
>> Hi Jakub,
>>
>> I tried implementing this as you suggested, but it seems that the only
>> supported definition types are ['const', 'enum', 'flags', 'struct'],
>> while the max value in checks only accepts patterns matching
>> ^[su](8|16|32|64)-(min|max)$.
>>
>> From what I see, it doesn’t currently support using a const value for
>> the max or min checks. Let me know if I’m missing something or if
>> there’s an alternative way to achieve this.
>
> Ah, I thought we already implemented this, sorry.
> Can you try the two patches from the top of this branch?
>
> https://github.com/kuba-moo/linux/tree/ynl-limits
Yes, it worked after applying the pattern changes to the
genetlink-legacy.yaml , as devlink uses it.
Thank you!
Carolina
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2025-01-21 12:36 ` Carolina Jubran
@ 2025-01-22 12:48 ` Carolina Jubran
2025-01-22 14:30 ` Jakub Kicinski
0 siblings, 1 reply; 32+ messages in thread
From: Carolina Jubran @ 2025-01-22 12:48 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Tariq Toukan, Tariq Toukan, David S. Miller, Paolo Abeni,
Eric Dumazet, Andrew Lunn, Leon Romanovsky, netdev,
Saeed Mahameed, Gal Pressman, linux-rdma, Cosmin Ratiu,
Jiri Pirko
On 21/01/2025 14:36, Carolina Jubran wrote:
>
>
> On 20/01/2025 20:14, Jakub Kicinski wrote:
>> On Mon, 20 Jan 2025 13:55:58 +0200 Carolina Jubran wrote:
>>> On 09/12/2024 23:27, Jakub Kicinski wrote:
>>>> On Mon, 9 Dec 2024 23:03:04 +0200 Tariq Toukan wrote:
>>>>> If we enforce by policy we need to use the constant 7, not the macro
>>>>> IEEE_8021QAZ_MAX_TCS-1.
>>>>> I'll keep it.
>>>>
>>>> The spec should support using "foreign constants"
>>>> Off the top of my head - you can define the ieee-8021qaz-max-tcs
>>>> contant
>>>> as if you were defining a devlink constant, then add a header:
>>>> attribute. This will tell C codegen to include that header instead of
>>>> generating the definition.
>>>
>>> Hi Jakub,
>>>
>>> I tried implementing this as you suggested, but it seems that the only
>>> supported definition types are ['const', 'enum', 'flags', 'struct'],
>>> while the max value in checks only accepts patterns matching
>>> ^[su](8|16|32|64)-(min|max)$.
>>>
>>> From what I see, it doesn’t currently support using a const value for
>>> the max or min checks. Let me know if I’m missing something or if
>>> there’s an alternative way to achieve this.
>>
>> Ah, I thought we already implemented this, sorry.
>> Can you try the two patches from the top of this branch?
>>
>> https://github.com/kuba-moo/linux/tree/ynl-limits
>
> Yes, it worked after applying the pattern changes to the
> genetlink-legacy.yaml , as devlink uses it.
>
> Thank you!
>
> Carolina
>
Since this worked and the devlink patch now depends on it, would it be
possible to include the top two patches
https://github.com/kuba-moo/linux/tree/ynl-limits in the next submission
of the devlink and mlx5 patches?
Thanks!
Carolina
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2025-01-22 12:48 ` Carolina Jubran
@ 2025-01-22 14:30 ` Jakub Kicinski
2025-02-05 6:22 ` Tariq Toukan
0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2025-01-22 14:30 UTC (permalink / raw)
To: Carolina Jubran
Cc: Tariq Toukan, Tariq Toukan, David S. Miller, Paolo Abeni,
Eric Dumazet, Andrew Lunn, Leon Romanovsky, netdev,
Saeed Mahameed, Gal Pressman, linux-rdma, Cosmin Ratiu,
Jiri Pirko
On Wed, 22 Jan 2025 14:48:55 +0200 Carolina Jubran wrote:
> Since this worked and the devlink patch now depends on it, would it be
> possible to include the top two patches
> https://github.com/kuba-moo/linux/tree/ynl-limits in the next submission
> of the devlink and mlx5 patches?
I'll post the two patches right after the merge window.
They stand on their own, and we can keep your series short-ish.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2025-01-22 14:30 ` Jakub Kicinski
@ 2025-02-05 6:22 ` Tariq Toukan
2025-02-05 6:56 ` Gal Pressman
0 siblings, 1 reply; 32+ messages in thread
From: Tariq Toukan @ 2025-02-05 6:22 UTC (permalink / raw)
To: Jakub Kicinski, Carolina Jubran
Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky, netdev, Saeed Mahameed,
Gal Pressman, linux-rdma, Cosmin Ratiu, Jiri Pirko
On 22/01/2025 16:30, Jakub Kicinski wrote:
> On Wed, 22 Jan 2025 14:48:55 +0200 Carolina Jubran wrote:
>> Since this worked and the devlink patch now depends on it, would it be
>> possible to include the top two patches
>> https://github.com/kuba-moo/linux/tree/ynl-limits in the next submission
>> of the devlink and mlx5 patches?
>
> I'll post the two patches right after the merge window.
> They stand on their own, and we can keep your series short-ish.
Hi Jakub,
A kind reminder, as we have dependency on these.
Regards,
Tariq
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2025-02-05 6:22 ` Tariq Toukan
@ 2025-02-05 6:56 ` Gal Pressman
2025-02-05 8:02 ` Tariq Toukan
0 siblings, 1 reply; 32+ messages in thread
From: Gal Pressman @ 2025-02-05 6:56 UTC (permalink / raw)
To: Tariq Toukan, Jakub Kicinski, Carolina Jubran
Cc: Tariq Toukan, David S. Miller, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Leon Romanovsky, netdev, Saeed Mahameed, linux-rdma,
Cosmin Ratiu, Jiri Pirko
On 05/02/2025 8:22, Tariq Toukan wrote:
>
>
> On 22/01/2025 16:30, Jakub Kicinski wrote:
>> On Wed, 22 Jan 2025 14:48:55 +0200 Carolina Jubran wrote:
>>> Since this worked and the devlink patch now depends on it, would it be
>>> possible to include the top two patches
>>> https://github.com/kuba-moo/linux/tree/ynl-limits in the next submission
>>> of the devlink and mlx5 patches?
>>
>> I'll post the two patches right after the merge window.
>> They stand on their own, and we can keep your series short-ish.
>
> Hi Jakub,
>
> A kind reminder, as we have dependency on these.
They're submitted already:
https://lore.kernel.org/netdev/20250203215510.1288728-2-kuba@kernel.org/
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management
2025-02-05 6:56 ` Gal Pressman
@ 2025-02-05 8:02 ` Tariq Toukan
0 siblings, 0 replies; 32+ messages in thread
From: Tariq Toukan @ 2025-02-05 8:02 UTC (permalink / raw)
To: Gal Pressman, Tariq Toukan, Jakub Kicinski, Carolina Jubran
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Leon Romanovsky, netdev, Saeed Mahameed, linux-rdma, Cosmin Ratiu,
Jiri Pirko
On 05/02/2025 8:56, Gal Pressman wrote:
> On 05/02/2025 8:22, Tariq Toukan wrote:
>>
>>
>> On 22/01/2025 16:30, Jakub Kicinski wrote:
>>> On Wed, 22 Jan 2025 14:48:55 +0200 Carolina Jubran wrote:
>>>> Since this worked and the devlink patch now depends on it, would it be
>>>> possible to include the top two patches
>>>> https://github.com/kuba-moo/linux/tree/ynl-limits in the next submission
>>>> of the devlink and mlx5 patches?
>>>
>>> I'll post the two patches right after the merge window.
>>> They stand on their own, and we can keep your series short-ish.
>>
>> Hi Jakub,
>>
>> A kind reminder, as we have dependency on these.
>
> They're submitted already:
> https://lore.kernel.org/netdev/20250203215510.1288728-2-kuba@kernel.org/
I see. Patch was renamed.
I searched for the old name "don't output foreign constants".
Great!
Thanks,
Tariq
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2025-02-05 8:02 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-04 22:09 [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 01/11] net/mlx5: ifc: Reorganize mlx5_ifc_flow_table_context_bits Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 02/11] net/mlx5: Add ConnectX-8 device to ifc Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 03/11] net/mlx5: Add support for new scheduling elements Tariq Toukan
2024-12-04 22:09 ` [PATCH mlx5-next V5 04/11] net/mlx5: qos: Add ifc support for cross-esw scheduling Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 05/11] net/mlx5: DR, Expand SWS STE callbacks and consolidate common structs Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 06/11] net/mlx5: DR, Add support for ConnectX-8 steering Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 07/11] devlink: Extend devlink rate API with traffic classes bandwidth management Tariq Toukan
2024-12-07 2:10 ` Jakub Kicinski
2024-12-09 21:03 ` Tariq Toukan
2024-12-09 21:27 ` Jakub Kicinski
2025-01-20 11:55 ` Carolina Jubran
2025-01-20 18:14 ` Jakub Kicinski
2025-01-21 12:36 ` Carolina Jubran
2025-01-22 12:48 ` Carolina Jubran
2025-01-22 14:30 ` Jakub Kicinski
2025-02-05 6:22 ` Tariq Toukan
2025-02-05 6:56 ` Gal Pressman
2025-02-05 8:02 ` Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 08/11] net/mlx5: Add no-op implementation for setting tc-bw on rate objects Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 09/11] net/mlx5: Add support for setting tc-bw on nodes Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 10/11] net/mlx5: Add traffic class scheduling support for vport QoS Tariq Toukan
2024-12-04 22:09 ` [PATCH net-next V5 11/11] net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw Tariq Toukan
2024-12-05 9:23 ` [PATCH net-next V5 00/11] net/mlx5: ConnectX-8 SW Steering + Rate management on traffic classes Leon Romanovsky
2024-12-07 2:13 ` Jakub Kicinski
2024-12-09 19:32 ` Tariq Toukan
2024-12-09 21:41 ` Jakub Kicinski
2024-12-11 9:49 ` Cosmin Ratiu
2024-12-12 1:49 ` Jakub Kicinski
2024-12-13 13:42 ` Cosmin Ratiu
2024-12-17 5:45 ` Jakub Kicinski
2024-12-17 15:14 ` Cosmin Ratiu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).