* [PATCH net-next v2 0/8] net/mlx5: HWS, Optimize matchers ICM usage
From: Mark Bloch @ 2025-06-22 17:22 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe, Mark Bloch

This series optimizes ICM usage for unidirectional rules and empty
matchers. With the last patch, we also make hardware steering the
default FDB steering provider for NICs that don't support software
steering.

Hardware steering (HWS) uses a type of rule table container (RTC) that
is unidirectional, so matchers consist of two RTCs to accommodate
bidirectional rules.

This small series enables resizing the two RTCs independently by
tracking the number of rules separately. For extreme cases where all
rules are unidirectional, this results in saving close to half the
memory footprint.
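
For orientation, the per-direction bookkeeping that makes this possible
is introduced in patch 6; the struct below is quoted from bwc.h in that
patch, with comments added here:

	struct mlx5hws_bwc_matcher_size {
		u8 size_log;              /* current log2 size of this side */
		atomic_t num_of_rules;    /* rules currently on this side */
		atomic_t rehash_required; /* set when this side should grow */
	};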

Results for inserting 1M unidirectional rules using a simple module:

			Pages		Memory
Before this patch:	300k		1.5GiB
After this patch:	160k		900MiB

The 'Pages' column measures the number of 4KiB pages the device requests
for itself (the ICM).

The 'Memory' column is the difference between peak usage and baseline
usage (before starting the test) as reported by `free -h`.
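
As a rough cross-check, 300k pages * 4KiB is ~1.2GiB and 160k pages *
4KiB is ~640MiB, so the `free -h` numbers above track the ICM page
counts with a few hundred MiB of fixed overhead on top.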

In addition, the second-to-last patch of the series handles the case
where all of a matcher's rules have been deleted: the large RTCs of the
matcher are no longer required, and we can save some more ICM by
shrinking the matcher to its initial size.

Finally, the last patch makes hardware steering the default mode in
switchdev for NICs that don't have software steering support.

Changelog
=========
Changes from v1 [0]:
- Fixed author on patches 5 and 6.

References
==========
[0] v1: https://lore.kernel.org/all/20250619115522.68469-1-mbloch@nvidia.com/

Moshe Shemesh (1):
  net/mlx5: Add HWS as secondary steering mode

Vlad Dogaru (5):
  net/mlx5: HWS, remove unused create_dest_array parameter
  net/mlx5: HWS, Refactor and export rule skip logic
  net/mlx5: HWS, Create STEs directly from matcher
  net/mlx5: HWS, Decouple matcher RX and TX sizes
  net/mlx5: HWS, Track matcher sizes individually

Yevgeny Kliteynik (2):
  net/mlx5: HWS, remove incorrect comment
  net/mlx5: HWS, Shrink empty matchers

 .../net/ethernet/mellanox/mlx5/core/fs_core.c |   2 +
 .../mellanox/mlx5/core/steering/hws/action.c  |   7 +-
 .../mellanox/mlx5/core/steering/hws/bwc.c     | 284 ++++++++++++++----
 .../mellanox/mlx5/core/steering/hws/bwc.h     |  14 +-
 .../mellanox/mlx5/core/steering/hws/debug.c   |  20 +-
 .../mellanox/mlx5/core/steering/hws/fs_hws.c  |  15 +-
 .../mellanox/mlx5/core/steering/hws/matcher.c | 166 ++++++----
 .../mellanox/mlx5/core/steering/hws/matcher.h |   3 +-
 .../mellanox/mlx5/core/steering/hws/mlx5hws.h |  36 ++-
 .../mellanox/mlx5/core/steering/hws/rule.c    |  35 +--
 .../mellanox/mlx5/core/steering/hws/rule.h    |   3 +
 11 files changed, 403 insertions(+), 182 deletions(-)


base-commit: 091d019adce033118776ef93b50a268f715ae8f6
-- 
2.34.1


* [PATCH net-next v2 1/8] net/mlx5: HWS, remove unused create_dest_array parameter
From: Mark Bloch @ 2025-06-22 17:22 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe, Vlad Dogaru, Yevgeny Kliteynik, Mark Bloch

From: Vlad Dogaru <vdogaru@nvidia.com>

`flow_source` is not used anywhere in mlx5hws_action_create_dest_array,
so remove the parameter.

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 .../mellanox/mlx5/core/steering/hws/action.c      |  7 ++-----
 .../mellanox/mlx5/core/steering/hws/fs_hws.c      | 15 ++++++---------
 .../mellanox/mlx5/core/steering/hws/mlx5hws.h     |  8 ++------
 3 files changed, 10 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c
index 447ea3f8722c..396804369b00 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c
@@ -1358,12 +1358,9 @@ mlx5hws_action_create_modify_header(struct mlx5hws_context *ctx,
 }
 
 struct mlx5hws_action *
-mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx,
-				 size_t num_dest,
+mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, size_t num_dest,
 				 struct mlx5hws_action_dest_attr *dests,
-				 bool ignore_flow_level,
-				 u32 flow_source,
-				 u32 flags)
+				 bool ignore_flow_level, u32 flags)
 {
 	struct mlx5hws_cmd_set_fte_dest *dest_list = NULL;
 	struct mlx5hws_cmd_ft_create_attr ft_attr = {0};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c
index bf4643d0ce17..57592b92e24b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c
@@ -571,14 +571,12 @@ static void mlx5_fs_put_dest_action_sampler(struct mlx5_fs_hws_context *fs_ctx,
 static struct mlx5hws_action *
 mlx5_fs_create_action_dest_array(struct mlx5hws_context *ctx,
 				 struct mlx5hws_action_dest_attr *dests,
-				 u32 num_of_dests, bool ignore_flow_level,
-				 u32 flow_source)
+				 u32 num_of_dests, bool ignore_flow_level)
 {
 	u32 flags = MLX5HWS_ACTION_FLAG_HWS_FDB | MLX5HWS_ACTION_FLAG_SHARED;
 
 	return mlx5hws_action_create_dest_array(ctx, num_of_dests, dests,
-						ignore_flow_level,
-						flow_source, flags);
+						ignore_flow_level, flags);
 }
 
 static struct mlx5hws_action *
@@ -1015,7 +1013,6 @@ static int mlx5_fs_fte_get_hws_actions(struct mlx5_flow_root_namespace *ns,
 		}
 		(*ractions)[num_actions++].action = dest_actions->dest;
 	} else if (num_dest_actions > 1) {
-		u32 flow_source = fte->act_dests.flow_context.flow_source;
 		bool ignore_flow_level;
 
 		if (num_actions == MLX5_FLOW_CONTEXT_ACTION_MAX ||
@@ -1025,10 +1022,10 @@ static int mlx5_fs_fte_get_hws_actions(struct mlx5_flow_root_namespace *ns,
 		}
 		ignore_flow_level =
 			!!(fte_action->flags & FLOW_ACT_IGNORE_FLOW_LEVEL);
-		tmp_action = mlx5_fs_create_action_dest_array(ctx, dest_actions,
-							      num_dest_actions,
-							      ignore_flow_level,
-							      flow_source);
+		tmp_action =
+			mlx5_fs_create_action_dest_array(ctx, dest_actions,
+							 num_dest_actions,
+							 ignore_flow_level);
 		if (!tmp_action) {
 			err = -EOPNOTSUPP;
 			goto free_actions;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h
index d8ac6c196211..a1295a311b70 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h
@@ -727,18 +727,14 @@ mlx5hws_action_create_push_vlan(struct mlx5hws_context *ctx, u32 flags);
  * @dests: The destination array. Each contains a destination action and can
  *	   have additional actions.
  * @ignore_flow_level: Whether to turn on 'ignore_flow_level' for this dest.
- * @flow_source: Source port of the traffic for this actions.
  * @flags: Action creation flags (enum mlx5hws_action_flags).
  *
  * Return: pointer to mlx5hws_action on success NULL otherwise.
  */
 struct mlx5hws_action *
-mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx,
-				 size_t num_dest,
+mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx, size_t num_dest,
 				 struct mlx5hws_action_dest_attr *dests,
-				 bool ignore_flow_level,
-				 u32 flow_source,
-				 u32 flags);
+				 bool ignore_flow_level, u32 flags);
 
 /**
  * mlx5hws_action_create_insert_header - Create insert header action.
-- 
2.34.1


* [PATCH net-next v2 2/8] net/mlx5: HWS, remove incorrect comment
From: Mark Bloch @ 2025-06-22 17:22 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe, Yevgeny Kliteynik, Vlad Dogaru, Mark Bloch

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Remove an incorrect comment section that is probably a copy-paste
artifact.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
index 9e057f808ea5..665e6e285db5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
@@ -876,8 +876,6 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule,
 
 	/* At this point the rule wasn't added.
 	 * It could be because there was collision, or some other problem.
-	 * If we don't dive deeper than API, the only thing we know is that
-	 * the status of completion is RTE_FLOW_OP_ERROR.
 	 * Try rehash by size and insert rule again - last chance.
 	 */
 
-- 
2.34.1


* [PATCH net-next v2 3/8] net/mlx5: HWS, Refactor and export rule skip logic
From: Mark Bloch @ 2025-06-22 17:22 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe, Vlad Dogaru, Yevgeny Kliteynik, Mark Bloch

From: Vlad Dogaru <vdogaru@nvidia.com>

The bwc layer will use `mlx5hws_rule_skip` to keep track of the number
of RX and TX rules individually, so export this function for future use.

While we're in there, reduce nesting by adding a couple of early return
statements.
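
For context, this is roughly how the exported helper ends up being used
by the bwc layer later in this series (patch 6), which keeps one rule
counter per direction:

	bool skip_rx, skip_tx;

	mlx5hws_rule_skip(matcher, flow_source, &skip_rx, &skip_tx);
	if (!skip_rx)
		atomic_inc(&bwc_matcher->rx_size.num_of_rules);
	if (!skip_tx)
		atomic_inc(&bwc_matcher->tx_size.num_of_rules);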

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 .../mellanox/mlx5/core/steering/hws/rule.c    | 35 ++++++++++---------
 .../mellanox/mlx5/core/steering/hws/rule.h    |  3 ++
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c
index 5342a4cc7194..0370b9b87d4e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c
@@ -3,10 +3,8 @@
 
 #include "internal.h"
 
-static void hws_rule_skip(struct mlx5hws_matcher *matcher,
-			  struct mlx5hws_match_template *mt,
-			  u32 flow_source,
-			  bool *skip_rx, bool *skip_tx)
+void mlx5hws_rule_skip(struct mlx5hws_matcher *matcher, u32 flow_source,
+		       bool *skip_rx, bool *skip_tx)
 {
 	/* By default FDB rules are added to both RX and TX */
 	*skip_rx = false;
@@ -14,20 +12,22 @@ static void hws_rule_skip(struct mlx5hws_matcher *matcher,
 
 	if (flow_source == MLX5_FLOW_CONTEXT_FLOW_SOURCE_LOCAL_VPORT) {
 		*skip_rx = true;
-	} else if (flow_source == MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK) {
+		return;
+	}
+
+	if (flow_source == MLX5_FLOW_CONTEXT_FLOW_SOURCE_UPLINK) {
 		*skip_tx = true;
-	} else {
-		/* If no flow source was set for current rule,
-		 * check for flow source in matcher attributes.
-		 */
-		if (matcher->attr.optimize_flow_src) {
-			*skip_tx =
-				matcher->attr.optimize_flow_src == MLX5HWS_MATCHER_FLOW_SRC_WIRE;
-			*skip_rx =
-				matcher->attr.optimize_flow_src == MLX5HWS_MATCHER_FLOW_SRC_VPORT;
-			return;
-		}
+		return;
 	}
+
+	/* If no flow source was set for the rule, check for flow source in
+	 * matcher attributes.
+	 */
+	if (matcher->attr.optimize_flow_src == MLX5HWS_MATCHER_FLOW_SRC_WIRE)
+		*skip_tx = true;
+	else if (matcher->attr.optimize_flow_src ==
+		 MLX5HWS_MATCHER_FLOW_SRC_VPORT)
+		*skip_rx = true;
 }
 
 static void
@@ -66,7 +66,8 @@ static void hws_rule_init_dep_wqe(struct mlx5hws_send_ring_dep_wqe *dep_wqe,
 				attr->rule_idx : 0;
 
 	if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) {
-		hws_rule_skip(matcher, mt, attr->flow_source, &skip_rx, &skip_tx);
+		mlx5hws_rule_skip(matcher, attr->flow_source, &skip_rx,
+				  &skip_tx);
 
 		if (!skip_rx) {
 			dep_wqe->rtc_0 = matcher->match_ste.rtc_0_id;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h
index 1c47a9c11572..d0f082b8dbf5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h
@@ -69,6 +69,9 @@ struct mlx5hws_rule {
 			   */
 };
 
+void mlx5hws_rule_skip(struct mlx5hws_matcher *matcher, u32 flow_source,
+		       bool *skip_rx, bool *skip_tx);
+
 void mlx5hws_rule_free_action_ste(struct mlx5hws_action_ste_chunk *action_ste);
 
 int mlx5hws_rule_move_hws_remove(struct mlx5hws_rule *rule,
-- 
2.34.1


* [PATCH net-next v2 4/8] net/mlx5: HWS, Create STEs directly from matcher
From: Mark Bloch @ 2025-06-22 17:22 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe, Vlad Dogaru, Yevgeny Kliteynik, Mark Bloch

From: Vlad Dogaru <vdogaru@nvidia.com>

Matchers were using the pool abstraction solely as a convenience
to allocate two STE ranges. The pool's core functionality, that
of allocating individual items from the range, was unused.
Matchers rely either on the hardware to hash rules into a table,
or on a user-provided index.

Remove the STE pool from the matcher and allocate the STE ranges
manually instead.
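
The replacement is two direct STE range allocations, one per direction
(abridged from the matcher.c hunk below):

	ste_attr.table_type = FS_FT_FDB_RX;
	ste_attr.log_obj_range = ...;
	ret = mlx5hws_cmd_ste_create(ctx->mdev, &ste_attr,
				     &matcher->match_ste.ste_0_base);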

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 .../mellanox/mlx5/core/steering/hws/debug.c   | 10 +--
 .../mellanox/mlx5/core/steering/hws/matcher.c | 71 ++++++++++---------
 .../mellanox/mlx5/core/steering/hws/matcher.h |  3 +-
 3 files changed, 41 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c
index 91568d6c1dac..f9b75aefcaa7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c
@@ -118,7 +118,6 @@ static int hws_debug_dump_matcher(struct seq_file *f, struct mlx5hws_matcher *ma
 {
 	enum mlx5hws_table_type tbl_type = matcher->tbl->type;
 	struct mlx5hws_cmd_ft_query_attr ft_attr = {0};
-	struct mlx5hws_pool *ste_pool;
 	u64 icm_addr_0 = 0;
 	u64 icm_addr_1 = 0;
 	u32 ste_0_id = -1;
@@ -133,12 +132,9 @@ static int hws_debug_dump_matcher(struct seq_file *f, struct mlx5hws_matcher *ma
 		   matcher->end_ft_id,
 		   matcher->col_matcher ? HWS_PTR_TO_ID(matcher->col_matcher) : 0);
 
-	ste_pool = matcher->match_ste.pool;
-	if (ste_pool) {
-		ste_0_id = mlx5hws_pool_get_base_id(ste_pool);
-		if (tbl_type == MLX5HWS_TABLE_TYPE_FDB)
-			ste_1_id = mlx5hws_pool_get_base_mirror_id(ste_pool);
-	}
+	ste_0_id = matcher->match_ste.ste_0_base;
+	if (tbl_type == MLX5HWS_TABLE_TYPE_FDB)
+		ste_1_id = matcher->match_ste.ste_1_base;
 
 	seq_printf(f, ",%d,%d,%d,%d",
 		   matcher->match_ste.rtc_0_id,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c
index ce28ee1c0e41..b0fcaf508e06 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c
@@ -507,10 +507,8 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher)
 		}
 	}
 
-	obj_id = mlx5hws_pool_get_base_id(matcher->match_ste.pool);
-
 	rtc_attr.pd = ctx->pd_num;
-	rtc_attr.ste_base = obj_id;
+	rtc_attr.ste_base = matcher->match_ste.ste_0_base;
 	rtc_attr.reparse_mode = mlx5hws_context_get_reparse_mode(ctx);
 	rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(tbl->type, false);
 	hws_matcher_set_rtc_attr_sz(matcher, &rtc_attr, false);
@@ -527,9 +525,7 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher)
 	}
 
 	if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) {
-		obj_id = mlx5hws_pool_get_base_mirror_id(
-			matcher->match_ste.pool);
-		rtc_attr.ste_base = obj_id;
+		rtc_attr.ste_base = matcher->match_ste.ste_1_base;
 		rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(tbl->type, true);
 
 		obj_id = mlx5hws_pool_get_base_mirror_id(ctx->stc_pool);
@@ -588,21 +584,6 @@ hws_matcher_check_attr_sz(struct mlx5hws_cmd_query_caps *caps,
 	return 0;
 }
 
-static void hws_matcher_set_pool_attr(struct mlx5hws_pool_attr *attr,
-				      struct mlx5hws_matcher *matcher)
-{
-	switch (matcher->attr.optimize_flow_src) {
-	case MLX5HWS_MATCHER_FLOW_SRC_VPORT:
-		attr->opt_type = MLX5HWS_POOL_OPTIMIZE_ORIG;
-		break;
-	case MLX5HWS_MATCHER_FLOW_SRC_WIRE:
-		attr->opt_type = MLX5HWS_POOL_OPTIMIZE_MIRROR;
-		break;
-	default:
-		break;
-	}
-}
-
 static int hws_matcher_check_and_process_at(struct mlx5hws_matcher *matcher,
 					    struct mlx5hws_action_template *at)
 {
@@ -683,8 +664,8 @@ static void hws_matcher_set_ip_version_match(struct mlx5hws_matcher *matcher)
 
 static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher)
 {
+	struct mlx5hws_cmd_ste_create_attr ste_attr = {};
 	struct mlx5hws_context *ctx = matcher->tbl->ctx;
-	struct mlx5hws_pool_attr pool_attr = {0};
 	int ret;
 
 	/* Calculate match, range and hash definers */
@@ -699,22 +680,39 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher)
 
 	hws_matcher_set_ip_version_match(matcher);
 
-	/* Create an STE pool per matcher*/
-	pool_attr.table_type = matcher->tbl->type;
-	pool_attr.pool_type = MLX5HWS_POOL_TYPE_STE;
-	pool_attr.alloc_log_sz = matcher->attr.table.sz_col_log +
-				 matcher->attr.table.sz_row_log;
-	hws_matcher_set_pool_attr(&pool_attr, matcher);
-
-	matcher->match_ste.pool = mlx5hws_pool_create(ctx, &pool_attr);
-	if (!matcher->match_ste.pool) {
-		mlx5hws_err(ctx, "Failed to allocate matcher STE pool\n");
-		ret = -EOPNOTSUPP;
+	/* Create an STE range each for RX and TX. */
+	ste_attr.table_type = FS_FT_FDB_RX;
+	ste_attr.log_obj_range =
+		matcher->attr.optimize_flow_src ==
+				MLX5HWS_MATCHER_FLOW_SRC_VPORT ?
+				0 : matcher->attr.table.sz_col_log +
+				    matcher->attr.table.sz_row_log;
+
+	ret = mlx5hws_cmd_ste_create(ctx->mdev, &ste_attr,
+				     &matcher->match_ste.ste_0_base);
+	if (ret) {
+		mlx5hws_err(ctx, "Failed to allocate RX STE range (%d)\n", ret);
 		goto uninit_match_definer;
 	}
 
+	ste_attr.table_type = FS_FT_FDB_TX;
+	ste_attr.log_obj_range =
+		matcher->attr.optimize_flow_src ==
+				MLX5HWS_MATCHER_FLOW_SRC_WIRE ?
+				0 : matcher->attr.table.sz_col_log +
+				    matcher->attr.table.sz_row_log;
+
+	ret = mlx5hws_cmd_ste_create(ctx->mdev, &ste_attr,
+				     &matcher->match_ste.ste_1_base);
+	if (ret) {
+		mlx5hws_err(ctx, "Failed to allocate TX STE range (%d)\n", ret);
+		goto destroy_rx_ste_range;
+	}
+
 	return 0;
 
+destroy_rx_ste_range:
+	mlx5hws_cmd_ste_destroy(ctx->mdev, matcher->match_ste.ste_0_base);
 uninit_match_definer:
 	if (!(matcher->flags & MLX5HWS_MATCHER_FLAGS_COLLISION))
 		mlx5hws_definer_mt_uninit(ctx, matcher->mt);
@@ -723,9 +721,12 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher)
 
 static void hws_matcher_unbind_mt(struct mlx5hws_matcher *matcher)
 {
-	mlx5hws_pool_destroy(matcher->match_ste.pool);
+	struct mlx5hws_context *ctx = matcher->tbl->ctx;
+
+	mlx5hws_cmd_ste_destroy(ctx->mdev, matcher->match_ste.ste_1_base);
+	mlx5hws_cmd_ste_destroy(ctx->mdev, matcher->match_ste.ste_0_base);
 	if (!(matcher->flags & MLX5HWS_MATCHER_FLAGS_COLLISION))
-		mlx5hws_definer_mt_uninit(matcher->tbl->ctx, matcher->mt);
+		mlx5hws_definer_mt_uninit(ctx, matcher->mt);
 }
 
 static int
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h
index 32e83cddcd60..ae20bcebfdde 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h
@@ -48,7 +48,8 @@ struct mlx5hws_match_template {
 struct mlx5hws_matcher_match_ste {
 	u32 rtc_0_id;
 	u32 rtc_1_id;
-	struct mlx5hws_pool *pool;
+	u32 ste_0_base;
+	u32 ste_1_base;
 };
 
 enum {
-- 
2.34.1


* [PATCH net-next v2 5/8] net/mlx5: HWS, Decouple matcher RX and TX sizes
From: Mark Bloch @ 2025-06-22 17:22 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe, Vlad Dogaru, Yevgeny Kliteynik, Mark Bloch

From: Vlad Dogaru <vdogaru@nvidia.com>

Kernel HWS only uses FDB tables and, as such, creates two lower-level
containers (RTCs) for each matcher: one for RX and one for TX. Allow
these RTCs to differ in size by converting the size part of the matcher
attribute to a two-element array.
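
Callers now provide one size per direction, e.g. (abridged from the
bwc.c hunk below):

	attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX].rule.num_log = size_log_rx;
	attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX].rule.num_log = size_log_tx;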

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 .../mellanox/mlx5/core/steering/hws/bwc.c     |   7 +-
 .../mellanox/mlx5/core/steering/hws/debug.c   |  10 +-
 .../mellanox/mlx5/core/steering/hws/matcher.c | 107 ++++++++++++------
 .../mellanox/mlx5/core/steering/hws/mlx5hws.h |  28 +++--
 4 files changed, 104 insertions(+), 48 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
index 665e6e285db5..009641e6c874 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
@@ -48,7 +48,7 @@ static void hws_bwc_unlock_all_queues(struct mlx5hws_context *ctx)
 
 static void hws_bwc_matcher_init_attr(struct mlx5hws_bwc_matcher *bwc_matcher,
 				      u32 priority,
-				      u8 size_log,
+				      u8 size_log_rx, u8 size_log_tx,
 				      struct mlx5hws_matcher_attr *attr)
 {
 	struct mlx5hws_bwc_matcher *first_matcher =
@@ -62,7 +62,8 @@ static void hws_bwc_matcher_init_attr(struct mlx5hws_bwc_matcher *bwc_matcher,
 	attr->optimize_flow_src = MLX5HWS_MATCHER_FLOW_SRC_ANY;
 	attr->insert_mode = MLX5HWS_MATCHER_INSERT_BY_HASH;
 	attr->distribute_mode = MLX5HWS_MATCHER_DISTRIBUTE_BY_HASH;
-	attr->rule.num_log = size_log;
+	attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX].rule.num_log = size_log_rx;
+	attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX].rule.num_log = size_log_tx;
 	attr->resizable = true;
 	attr->max_num_of_at_attach = MLX5HWS_BWC_MATCHER_ATTACH_AT_NUM;
 
@@ -93,6 +94,7 @@ int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher,
 	hws_bwc_matcher_init_attr(bwc_matcher,
 				  priority,
 				  MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG,
+				  MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG,
 				  &attr);
 
 	bwc_matcher->priority = priority;
@@ -696,6 +698,7 @@ static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher)
 	hws_bwc_matcher_init_attr(bwc_matcher,
 				  bwc_matcher->priority,
 				  bwc_matcher->size_log,
+				  bwc_matcher->size_log,
 				  &matcher_attr);
 
 	old_matcher = bwc_matcher->matcher;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c
index f9b75aefcaa7..2ec8cb10139a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c
@@ -99,17 +99,19 @@ hws_debug_dump_matcher_attr(struct seq_file *f, struct mlx5hws_matcher *matcher)
 {
 	struct mlx5hws_matcher_attr *attr = &matcher->attr;
 
-	seq_printf(f, "%d,0x%llx,%d,%d,%d,%d,%d,%d,%d,%d\n",
+	seq_printf(f, "%d,0x%llx,%d,%d,%d,%d,%d,%d,%d,%d,-1,-1,%d,%d\n",
 		   MLX5HWS_DEBUG_RES_TYPE_MATCHER_ATTR,
 		   HWS_PTR_TO_ID(matcher),
 		   attr->priority,
 		   attr->mode,
-		   attr->table.sz_row_log,
-		   attr->table.sz_col_log,
+		   attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX].table.sz_row_log,
+		   attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX].table.sz_col_log,
 		   attr->optimize_using_rule_idx,
 		   attr->optimize_flow_src,
 		   attr->insert_mode,
-		   attr->distribute_mode);
+		   attr->distribute_mode,
+		   attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX].table.sz_row_log,
+		   attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX].table.sz_col_log);
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c
index b0fcaf508e06..f3ea09caba2b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c
@@ -468,12 +468,16 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher)
 	struct mlx5hws_cmd_rtc_create_attr rtc_attr = {0};
 	struct mlx5hws_match_template *mt = matcher->mt;
 	struct mlx5hws_context *ctx = matcher->tbl->ctx;
+	union mlx5hws_matcher_size *size_rx, *size_tx;
 	struct mlx5hws_table *tbl = matcher->tbl;
 	u32 obj_id;
 	int ret;
 
-	rtc_attr.log_size = attr->table.sz_row_log;
-	rtc_attr.log_depth = attr->table.sz_col_log;
+	size_rx = &attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX];
+	size_tx = &attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX];
+
+	rtc_attr.log_size = size_rx->table.sz_row_log;
+	rtc_attr.log_depth = size_rx->table.sz_col_log;
 	rtc_attr.is_frst_jumbo = mlx5hws_matcher_mt_is_jumbo(mt);
 	rtc_attr.is_scnd_range = 0;
 	rtc_attr.miss_ft_id = matcher->end_ft_id;
@@ -525,6 +529,8 @@ static int hws_matcher_create_rtc(struct mlx5hws_matcher *matcher)
 	}
 
 	if (tbl->type == MLX5HWS_TABLE_TYPE_FDB) {
+		rtc_attr.log_size = size_tx->table.sz_row_log;
+		rtc_attr.log_depth = size_tx->table.sz_col_log;
 		rtc_attr.ste_base = matcher->match_ste.ste_1_base;
 		rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(tbl->type, true);
 
@@ -562,23 +568,33 @@ hws_matcher_check_attr_sz(struct mlx5hws_cmd_query_caps *caps,
 			  struct mlx5hws_matcher *matcher)
 {
 	struct mlx5hws_matcher_attr *attr = &matcher->attr;
+	struct mlx5hws_context *ctx = matcher->tbl->ctx;
+	union mlx5hws_matcher_size *size;
+	int i;
 
-	if (attr->table.sz_col_log > caps->rtc_log_depth_max) {
-		mlx5hws_err(matcher->tbl->ctx, "Matcher depth exceeds limit %d\n",
-			    caps->rtc_log_depth_max);
-		return -EOPNOTSUPP;
-	}
+	for (i = 0; i < 2; i++) {
+		size = &attr->size[i];
 
-	if (attr->table.sz_col_log + attr->table.sz_row_log > caps->ste_alloc_log_max) {
-		mlx5hws_err(matcher->tbl->ctx, "Total matcher size exceeds limit %d\n",
-			    caps->ste_alloc_log_max);
-		return -EOPNOTSUPP;
-	}
+		if (size->table.sz_col_log > caps->rtc_log_depth_max) {
+			mlx5hws_err(ctx, "Matcher depth exceeds limit %d\n",
+				    caps->rtc_log_depth_max);
+			return -EOPNOTSUPP;
+		}
 
-	if (attr->table.sz_col_log + attr->table.sz_row_log < caps->ste_alloc_log_gran) {
-		mlx5hws_err(matcher->tbl->ctx, "Total matcher size below limit %d\n",
-			    caps->ste_alloc_log_gran);
-		return -EOPNOTSUPP;
+		if (size->table.sz_col_log + size->table.sz_row_log >
+		    caps->ste_alloc_log_max) {
+			mlx5hws_err(ctx,
+				    "Total matcher size exceeds limit %d\n",
+				    caps->ste_alloc_log_max);
+			return -EOPNOTSUPP;
+		}
+
+		if (size->table.sz_col_log + size->table.sz_row_log <
+		    caps->ste_alloc_log_gran) {
+			mlx5hws_err(ctx, "Total matcher size below limit %d\n",
+				    caps->ste_alloc_log_gran);
+			return -EOPNOTSUPP;
+		}
 	}
 
 	return 0;
@@ -666,6 +682,7 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher)
 {
 	struct mlx5hws_cmd_ste_create_attr ste_attr = {};
 	struct mlx5hws_context *ctx = matcher->tbl->ctx;
+	union mlx5hws_matcher_size *size;
 	int ret;
 
 	/* Calculate match, range and hash definers */
@@ -682,11 +699,11 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher)
 
 	/* Create an STE range each for RX and TX. */
 	ste_attr.table_type = FS_FT_FDB_RX;
+	size = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_RX];
 	ste_attr.log_obj_range =
 		matcher->attr.optimize_flow_src ==
-				MLX5HWS_MATCHER_FLOW_SRC_VPORT ?
-				0 : matcher->attr.table.sz_col_log +
-				    matcher->attr.table.sz_row_log;
+			MLX5HWS_MATCHER_FLOW_SRC_VPORT ?
+			0 : size->table.sz_col_log + size->table.sz_row_log;
 
 	ret = mlx5hws_cmd_ste_create(ctx->mdev, &ste_attr,
 				     &matcher->match_ste.ste_0_base);
@@ -696,11 +713,11 @@ static int hws_matcher_bind_mt(struct mlx5hws_matcher *matcher)
 	}
 
 	ste_attr.table_type = FS_FT_FDB_TX;
+	size = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_TX];
 	ste_attr.log_obj_range =
 		matcher->attr.optimize_flow_src ==
-				MLX5HWS_MATCHER_FLOW_SRC_WIRE ?
-				0 : matcher->attr.table.sz_col_log +
-				    matcher->attr.table.sz_row_log;
+			MLX5HWS_MATCHER_FLOW_SRC_WIRE ?
+			0 : size->table.sz_col_log + size->table.sz_row_log;
 
 	ret = mlx5hws_cmd_ste_create(ctx->mdev, &ste_attr,
 				     &matcher->match_ste.ste_1_base);
@@ -735,6 +752,10 @@ hws_matcher_validate_insert_mode(struct mlx5hws_cmd_query_caps *caps,
 {
 	struct mlx5hws_matcher_attr *attr = &matcher->attr;
 	struct mlx5hws_context *ctx = matcher->tbl->ctx;
+	union mlx5hws_matcher_size *size_rx, *size_tx;
+
+	size_rx = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_RX];
+	size_tx = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_TX];
 
 	switch (attr->insert_mode) {
 	case MLX5HWS_MATCHER_INSERT_BY_HASH:
@@ -745,7 +766,7 @@ hws_matcher_validate_insert_mode(struct mlx5hws_cmd_query_caps *caps,
 		break;
 
 	case MLX5HWS_MATCHER_INSERT_BY_INDEX:
-		if (attr->table.sz_col_log) {
+		if (size_rx->table.sz_col_log || size_tx->table.sz_col_log) {
 			mlx5hws_err(ctx, "Matcher with INSERT_BY_INDEX supports only Nx1 table size\n");
 			return -EOPNOTSUPP;
 		}
@@ -765,7 +786,10 @@ hws_matcher_validate_insert_mode(struct mlx5hws_cmd_query_caps *caps,
 				return -EOPNOTSUPP;
 			}
 
-			if (attr->table.sz_row_log > MLX5_IFC_RTC_LINEAR_LOOKUP_TBL_LOG_MAX) {
+			if (size_rx->table.sz_row_log >
+				MLX5_IFC_RTC_LINEAR_LOOKUP_TBL_LOG_MAX ||
+			    size_tx->table.sz_row_log >
+				MLX5_IFC_RTC_LINEAR_LOOKUP_TBL_LOG_MAX) {
 				mlx5hws_err(ctx, "Matcher with linear distribute: rows exceed limit %d",
 					    MLX5_IFC_RTC_LINEAR_LOOKUP_TBL_LOG_MAX);
 				return -EOPNOTSUPP;
@@ -789,6 +813,10 @@ hws_matcher_process_attr(struct mlx5hws_cmd_query_caps *caps,
 			 struct mlx5hws_matcher *matcher)
 {
 	struct mlx5hws_matcher_attr *attr = &matcher->attr;
+	union mlx5hws_matcher_size *size_rx, *size_tx;
+
+	size_rx = &attr->size[MLX5HWS_MATCHER_SIZE_TYPE_RX];
+	size_tx = &attr->size[MLX5HWS_MATCHER_SIZE_TYPE_TX];
 
 	if (hws_matcher_validate_insert_mode(caps, matcher))
 		return -EOPNOTSUPP;
@@ -800,8 +828,12 @@ hws_matcher_process_attr(struct mlx5hws_cmd_query_caps *caps,
 
 	/* Convert number of rules to the required depth */
 	if (attr->mode == MLX5HWS_MATCHER_RESOURCE_MODE_RULE &&
-	    attr->insert_mode == MLX5HWS_MATCHER_INSERT_BY_HASH)
-		attr->table.sz_col_log = hws_matcher_rules_to_tbl_depth(attr->rule.num_log);
+	    attr->insert_mode == MLX5HWS_MATCHER_INSERT_BY_HASH) {
+		size_rx->table.sz_col_log =
+			hws_matcher_rules_to_tbl_depth(size_rx->rule.num_log);
+		size_tx->table.sz_col_log =
+			hws_matcher_rules_to_tbl_depth(size_tx->rule.num_log);
+	}
 
 	matcher->flags |= attr->resizable ? MLX5HWS_MATCHER_FLAGS_RESIZABLE : 0;
 	matcher->flags |= attr->isolated_matcher_end_ft_id ?
@@ -862,14 +894,19 @@ static int
 hws_matcher_create_col_matcher(struct mlx5hws_matcher *matcher)
 {
 	struct mlx5hws_context *ctx = matcher->tbl->ctx;
+	union mlx5hws_matcher_size *size_rx, *size_tx;
 	struct mlx5hws_matcher *col_matcher;
-	int ret;
+	int i, ret;
+
+	size_rx = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_RX];
+	size_tx = &matcher->attr.size[MLX5HWS_MATCHER_SIZE_TYPE_TX];
 
 	if (matcher->attr.mode != MLX5HWS_MATCHER_RESOURCE_MODE_RULE ||
 	    matcher->attr.insert_mode == MLX5HWS_MATCHER_INSERT_BY_INDEX)
 		return 0;
 
-	if (!hws_matcher_requires_col_tbl(matcher->attr.rule.num_log))
+	if (!hws_matcher_requires_col_tbl(size_rx->rule.num_log) &&
+	    !hws_matcher_requires_col_tbl(size_tx->rule.num_log))
 		return 0;
 
 	col_matcher = kzalloc(sizeof(*matcher), GFP_KERNEL);
@@ -886,10 +923,16 @@ hws_matcher_create_col_matcher(struct mlx5hws_matcher *matcher)
 	col_matcher->flags |= MLX5HWS_MATCHER_FLAGS_COLLISION;
 	col_matcher->attr.mode = MLX5HWS_MATCHER_RESOURCE_MODE_HTABLE;
 	col_matcher->attr.optimize_flow_src = matcher->attr.optimize_flow_src;
-	col_matcher->attr.table.sz_row_log = matcher->attr.rule.num_log;
-	col_matcher->attr.table.sz_col_log = MLX5HWS_MATCHER_ASSURED_COL_TBL_DEPTH;
-	if (col_matcher->attr.table.sz_row_log > MLX5HWS_MATCHER_ASSURED_ROW_RATIO)
-		col_matcher->attr.table.sz_row_log -= MLX5HWS_MATCHER_ASSURED_ROW_RATIO;
+	for (i = 0; i < 2; i++) {
+		union mlx5hws_matcher_size *dst = &col_matcher->attr.size[i];
+		union mlx5hws_matcher_size *src = &matcher->attr.size[i];
+
+		dst->table.sz_row_log = src->rule.num_log;
+		dst->table.sz_col_log = MLX5HWS_MATCHER_ASSURED_COL_TBL_DEPTH;
+		if (dst->table.sz_row_log > MLX5HWS_MATCHER_ASSURED_ROW_RATIO)
+			dst->table.sz_row_log -=
+				MLX5HWS_MATCHER_ASSURED_ROW_RATIO;
+	}
 
 	col_matcher->attr.max_num_of_at_attach = matcher->attr.max_num_of_at_attach;
 	col_matcher->attr.isolated_matcher_end_ft_id =
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h
index a1295a311b70..59c14745ed0c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h
@@ -93,6 +93,23 @@ enum mlx5hws_matcher_distribute_mode {
 	MLX5HWS_MATCHER_DISTRIBUTE_BY_LINEAR = 0x1,
 };
 
+enum mlx5hws_matcher_size_type {
+	MLX5HWS_MATCHER_SIZE_TYPE_RX,
+	MLX5HWS_MATCHER_SIZE_TYPE_TX,
+	MLX5HWS_MATCHER_SIZE_TYPE_MAX,
+};
+
+union mlx5hws_matcher_size {
+	struct {
+		u8 sz_row_log;
+		u8 sz_col_log;
+	} table;
+
+	struct {
+		u8 num_log;
+	} rule;
+};
+
 struct mlx5hws_matcher_attr {
 	/* Processing priority inside table */
 	u32 priority;
@@ -107,16 +124,7 @@ struct mlx5hws_matcher_attr {
 	enum mlx5hws_matcher_distribute_mode distribute_mode;
 	/* Define whether the created matcher supports resizing into a bigger matcher */
 	bool resizable;
-	union {
-		struct {
-			u8 sz_row_log;
-			u8 sz_col_log;
-		} table;
-
-		struct {
-			u8 num_log;
-		} rule;
-	};
+	union mlx5hws_matcher_size size[MLX5HWS_MATCHER_SIZE_TYPE_MAX];
 	/* Optional AT attach configuration - Max number of additional AT */
 	u8 max_num_of_at_attach;
 	/* Optional end FT (miss FT ID) for match RTC (for isolated matcher) */
-- 
2.34.1


* [PATCH net-next v2 6/8] net/mlx5: HWS, Track matcher sizes individually
From: Mark Bloch @ 2025-06-22 17:22 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe, Vlad Dogaru, Yevgeny Kliteynik, Mark Bloch

From: Vlad Dogaru <vdogaru@nvidia.com>

Track and grow matcher sizes individually for RX and TX RTCs. This
allows RX-only or TX-only use cases to effectively halve the device
resources they use.
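
Each rule now records at creation time which directions it actually
programs, so that only the relevant counters are updated (quoted from
the bwc.c hunk below):

	mlx5hws_rule_skip(bwc_matcher->matcher, flow_source,
			  &bwc_rule->skip_rx, &bwc_rule->skip_tx);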

For testing we used a simple module that inserts 1M RX-only rules, and
measured the number of pages the device requests and the memory usage
as reported by `free -h`.

			Pages		Memory
Before this patch:	300k		1.5GiB
After this patch:	160k		900MiB

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 .../mellanox/mlx5/core/steering/hws/bwc.c     | 213 +++++++++++++-----
 .../mellanox/mlx5/core/steering/hws/bwc.h     |  14 +-
 2 files changed, 167 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
index 009641e6c874..0a7903cf75e8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
@@ -93,12 +93,11 @@ int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher,
 
 	hws_bwc_matcher_init_attr(bwc_matcher,
 				  priority,
-				  MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG,
-				  MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG,
+				  bwc_matcher->rx_size.size_log,
+				  bwc_matcher->tx_size.size_log,
 				  &attr);
 
 	bwc_matcher->priority = priority;
-	bwc_matcher->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG;
 
 	bwc_matcher->size_of_at_array = MLX5HWS_BWC_MATCHER_ATTACH_AT_NUM;
 	bwc_matcher->at = kcalloc(bwc_matcher->size_of_at_array,
@@ -150,6 +149,20 @@ int mlx5hws_bwc_matcher_create_simple(struct mlx5hws_bwc_matcher *bwc_matcher,
 	return -EINVAL;
 }
 
+static void
+hws_bwc_matcher_init_size_rxtx(struct mlx5hws_bwc_matcher_size *size)
+{
+	size->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG;
+	atomic_set(&size->num_of_rules, 0);
+	atomic_set(&size->rehash_required, false);
+}
+
+static void hws_bwc_matcher_init_size(struct mlx5hws_bwc_matcher *bwc_matcher)
+{
+	hws_bwc_matcher_init_size_rxtx(&bwc_matcher->rx_size);
+	hws_bwc_matcher_init_size_rxtx(&bwc_matcher->tx_size);
+}
+
 struct mlx5hws_bwc_matcher *
 mlx5hws_bwc_matcher_create(struct mlx5hws_table *table,
 			   u32 priority,
@@ -170,8 +183,7 @@ mlx5hws_bwc_matcher_create(struct mlx5hws_table *table,
 	if (!bwc_matcher)
 		return NULL;
 
-	atomic_set(&bwc_matcher->num_of_rules, 0);
-	atomic_set(&bwc_matcher->rehash_required, false);
+	hws_bwc_matcher_init_size(bwc_matcher);
 
 	/* Check if the required match params can be all matched
 	 * in single STE, otherwise complex matcher is needed.
@@ -221,12 +233,13 @@ int mlx5hws_bwc_matcher_destroy_simple(struct mlx5hws_bwc_matcher *bwc_matcher)
 
 int mlx5hws_bwc_matcher_destroy(struct mlx5hws_bwc_matcher *bwc_matcher)
 {
-	u32 num_of_rules = atomic_read(&bwc_matcher->num_of_rules);
+	u32 rx_rules = atomic_read(&bwc_matcher->rx_size.num_of_rules);
+	u32 tx_rules = atomic_read(&bwc_matcher->tx_size.num_of_rules);
 
-	if (num_of_rules)
+	if (rx_rules || tx_rules)
 		mlx5hws_err(bwc_matcher->matcher->tbl->ctx,
-			    "BWC matcher destroy: matcher still has %d rules\n",
-			    num_of_rules);
+			    "BWC matcher destroy: matcher still has %u RX and %u TX rules\n",
+			    rx_rules, tx_rules);
 
 	if (bwc_matcher->complex)
 		mlx5hws_bwc_matcher_destroy_complex(bwc_matcher);
@@ -386,6 +399,16 @@ hws_bwc_rule_destroy_hws_sync(struct mlx5hws_bwc_rule *bwc_rule,
 	return 0;
 }
 
+static void hws_bwc_rule_cnt_dec(struct mlx5hws_bwc_rule *bwc_rule)
+{
+	struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher;
+
+	if (!bwc_rule->skip_rx)
+		atomic_dec(&bwc_matcher->rx_size.num_of_rules);
+	if (!bwc_rule->skip_tx)
+		atomic_dec(&bwc_matcher->tx_size.num_of_rules);
+}
+
 int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule)
 {
 	struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher;
@@ -402,7 +425,7 @@ int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule)
 	mutex_lock(queue_lock);
 
 	ret = hws_bwc_rule_destroy_hws_sync(bwc_rule, &attr);
-	atomic_dec(&bwc_matcher->num_of_rules);
+	hws_bwc_rule_cnt_dec(bwc_rule);
 	hws_bwc_rule_list_remove(bwc_rule);
 
 	mutex_unlock(queue_lock);
@@ -489,25 +512,27 @@ hws_bwc_rule_update_sync(struct mlx5hws_bwc_rule *bwc_rule,
 }
 
 static bool
-hws_bwc_matcher_size_maxed_out(struct mlx5hws_bwc_matcher *bwc_matcher)
+hws_bwc_matcher_size_maxed_out(struct mlx5hws_bwc_matcher *bwc_matcher,
+			       struct mlx5hws_bwc_matcher_size *size)
 {
 	struct mlx5hws_cmd_query_caps *caps = bwc_matcher->matcher->tbl->ctx->caps;
 
 	/* check the match RTC size */
-	return (bwc_matcher->size_log + MLX5HWS_MATCHER_ASSURED_MAIN_TBL_DEPTH +
+	return (size->size_log + MLX5HWS_MATCHER_ASSURED_MAIN_TBL_DEPTH +
 		MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP) >
 	       (caps->ste_alloc_log_max - 1);
 }
 
 static bool
 hws_bwc_matcher_rehash_size_needed(struct mlx5hws_bwc_matcher *bwc_matcher,
+				   struct mlx5hws_bwc_matcher_size *size,
 				   u32 num_of_rules)
 {
-	if (unlikely(hws_bwc_matcher_size_maxed_out(bwc_matcher)))
+	if (unlikely(hws_bwc_matcher_size_maxed_out(bwc_matcher, size)))
 		return false;
 
 	if (unlikely((num_of_rules * 100 / MLX5HWS_BWC_MATCHER_REHASH_PERCENT_TH) >=
-		     (1UL << bwc_matcher->size_log)))
+		     (1UL << size->size_log)))
 		return true;
 
 	return false;
@@ -564,20 +589,21 @@ hws_bwc_matcher_extend_at(struct mlx5hws_bwc_matcher *bwc_matcher,
 }
 
 static int
-hws_bwc_matcher_extend_size(struct mlx5hws_bwc_matcher *bwc_matcher)
+hws_bwc_matcher_extend_size(struct mlx5hws_bwc_matcher *bwc_matcher,
+			    struct mlx5hws_bwc_matcher_size *size)
 {
 	struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx;
 	struct mlx5hws_cmd_query_caps *caps = ctx->caps;
 
-	if (unlikely(hws_bwc_matcher_size_maxed_out(bwc_matcher))) {
+	if (unlikely(hws_bwc_matcher_size_maxed_out(bwc_matcher, size))) {
 		mlx5hws_err(ctx, "Can't resize matcher: depth exceeds limit %d\n",
 			    caps->rtc_log_depth_max);
 		return -ENOMEM;
 	}
 
-	bwc_matcher->size_log =
-		min(bwc_matcher->size_log + MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP,
-		    caps->ste_alloc_log_max - MLX5HWS_MATCHER_ASSURED_MAIN_TBL_DEPTH);
+	size->size_log = min(size->size_log + MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP,
+			     caps->ste_alloc_log_max -
+				     MLX5HWS_MATCHER_ASSURED_MAIN_TBL_DEPTH);
 
 	return 0;
 }
@@ -697,8 +723,8 @@ static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher)
 
 	hws_bwc_matcher_init_attr(bwc_matcher,
 				  bwc_matcher->priority,
-				  bwc_matcher->size_log,
-				  bwc_matcher->size_log,
+				  bwc_matcher->rx_size.size_log,
+				  bwc_matcher->tx_size.size_log,
 				  &matcher_attr);
 
 	old_matcher = bwc_matcher->matcher;
@@ -736,21 +762,39 @@ static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher)
 static int
 hws_bwc_matcher_rehash_size(struct mlx5hws_bwc_matcher *bwc_matcher)
 {
+	bool need_rx_rehash, need_tx_rehash;
 	int ret;
 
-	/* If the current matcher size is already at its max size, we can't
-	 * do the rehash. Skip it and try adding the rule again - perhaps
-	 * there was some change.
+	need_rx_rehash = atomic_read(&bwc_matcher->rx_size.rehash_required);
+	need_tx_rehash = atomic_read(&bwc_matcher->tx_size.rehash_required);
+
+	/* It is possible that another rule has already performed rehash.
+	 * Need to check again if we really need rehash.
 	 */
-	if (hws_bwc_matcher_size_maxed_out(bwc_matcher))
+	if (!need_rx_rehash && !need_tx_rehash)
 		return 0;
 
-	/* It is possible that other rule has already performed rehash.
-	 * Need to check again if we really need rehash.
+	/* If the current matcher RX/TX size is already at its max size,
+	 * it can't be rehashed.
 	 */
-	if (!atomic_read(&bwc_matcher->rehash_required) &&
-	    !hws_bwc_matcher_rehash_size_needed(bwc_matcher,
-						atomic_read(&bwc_matcher->num_of_rules)))
+	if (need_rx_rehash &&
+	    hws_bwc_matcher_size_maxed_out(bwc_matcher,
+					   &bwc_matcher->rx_size)) {
+		atomic_set(&bwc_matcher->rx_size.rehash_required, false);
+		need_rx_rehash = false;
+	}
+	if (need_tx_rehash &&
+	    hws_bwc_matcher_size_maxed_out(bwc_matcher,
+					   &bwc_matcher->tx_size)) {
+		atomic_set(&bwc_matcher->tx_size.rehash_required, false);
+		need_tx_rehash = false;
+	}
+
+	/* If both RX and TX rehash flags are now off, it means that whatever
+	 * we wanted to rehash is now at its max size - no rehash can be done.
+	 * Return and try adding the rule again - perhaps there was some change.
+	 */
+	if (!need_rx_rehash && !need_tx_rehash)
 		return 0;
 
 	/* Now we're done all the checking - do the rehash:
@@ -759,12 +803,22 @@ hws_bwc_matcher_rehash_size(struct mlx5hws_bwc_matcher *bwc_matcher)
 	 *  - move all the rules to the new matcher
 	 *  - destroy the old matcher
 	 */
+	atomic_set(&bwc_matcher->rx_size.rehash_required, false);
+	atomic_set(&bwc_matcher->tx_size.rehash_required, false);
 
-	atomic_set(&bwc_matcher->rehash_required, false);
+	if (need_rx_rehash) {
+		ret = hws_bwc_matcher_extend_size(bwc_matcher,
+						  &bwc_matcher->rx_size);
+		if (ret)
+			return ret;
+	}
 
-	ret = hws_bwc_matcher_extend_size(bwc_matcher);
-	if (ret)
-		return ret;
+	if (need_tx_rehash) {
+		ret = hws_bwc_matcher_extend_size(bwc_matcher,
+						  &bwc_matcher->tx_size);
+		if (ret)
+			return ret;
+	}
 
 	return hws_bwc_matcher_move(bwc_matcher);
 }
@@ -816,6 +870,62 @@ static int hws_bwc_rule_get_at_idx(struct mlx5hws_bwc_rule *bwc_rule,
 	return at_idx;
 }
 
+static void hws_bwc_rule_cnt_inc_rxtx(struct mlx5hws_bwc_rule *bwc_rule,
+				      struct mlx5hws_bwc_matcher_size *size)
+{
+	u32 num_of_rules = atomic_inc_return(&size->num_of_rules);
+
+	if (unlikely(hws_bwc_matcher_rehash_size_needed(bwc_rule->bwc_matcher,
+							size, num_of_rules)))
+		atomic_set(&size->rehash_required, true);
+}
+
+static void hws_bwc_rule_cnt_inc(struct mlx5hws_bwc_rule *bwc_rule)
+{
+	struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher;
+
+	if (!bwc_rule->skip_rx)
+		hws_bwc_rule_cnt_inc_rxtx(bwc_rule, &bwc_matcher->rx_size);
+	if (!bwc_rule->skip_tx)
+		hws_bwc_rule_cnt_inc_rxtx(bwc_rule, &bwc_matcher->tx_size);
+}
+
+static int hws_bwc_rule_cnt_inc_with_rehash(struct mlx5hws_bwc_rule *bwc_rule,
+					    u16 bwc_queue_idx)
+{
+	struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher;
+	struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx;
+	struct mutex *queue_lock; /* Protect the queue */
+	int ret;
+
+	hws_bwc_rule_cnt_inc(bwc_rule);
+
+	if (!atomic_read(&bwc_matcher->rx_size.rehash_required) &&
+	    !atomic_read(&bwc_matcher->tx_size.rehash_required))
+		return 0;
+
+	queue_lock = hws_bwc_get_queue_lock(ctx, bwc_queue_idx);
+	mutex_unlock(queue_lock);
+
+	hws_bwc_lock_all_queues(ctx);
+	ret = hws_bwc_matcher_rehash_size(bwc_matcher);
+	hws_bwc_unlock_all_queues(ctx);
+
+	mutex_lock(queue_lock);
+
+	if (likely(!ret))
+		return 0;
+
+	/* Failed to rehash. Print a diagnostic and rollback the counters. */
+	mlx5hws_err(ctx,
+		    "BWC rule insertion: rehash to sizes [%d, %d] failed (%d)\n",
+		    bwc_matcher->rx_size.size_log,
+		    bwc_matcher->tx_size.size_log, ret);
+	hws_bwc_rule_cnt_dec(bwc_rule);
+
+	return ret;
+}
+
 int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule,
 				   u32 *match_param,
 				   struct mlx5hws_rule_action rule_actions[],
@@ -826,7 +936,6 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule,
 	struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx;
 	struct mlx5hws_rule_attr rule_attr;
 	struct mutex *queue_lock; /* Protect the queue */
-	u32 num_of_rules;
 	int ret = 0;
 	int at_idx;
 
@@ -844,26 +953,10 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule,
 		return -EINVAL;
 	}
 
-	/* check if number of rules require rehash */
-	num_of_rules = atomic_inc_return(&bwc_matcher->num_of_rules);
-
-	if (unlikely(hws_bwc_matcher_rehash_size_needed(bwc_matcher, num_of_rules))) {
+	ret = hws_bwc_rule_cnt_inc_with_rehash(bwc_rule, bwc_queue_idx);
+	if (unlikely(ret)) {
 		mutex_unlock(queue_lock);
-
-		hws_bwc_lock_all_queues(ctx);
-		ret = hws_bwc_matcher_rehash_size(bwc_matcher);
-		hws_bwc_unlock_all_queues(ctx);
-
-		if (ret) {
-			mlx5hws_err(ctx, "BWC rule insertion: rehash size [%d -> %d] failed (%d)\n",
-				    bwc_matcher->size_log - MLX5HWS_BWC_MATCHER_SIZE_LOG_STEP,
-				    bwc_matcher->size_log,
-				    ret);
-			atomic_dec(&bwc_matcher->num_of_rules);
-			return ret;
-		}
-
-		mutex_lock(queue_lock);
+		return ret;
 	}
 
 	ret = hws_bwc_rule_create_sync(bwc_rule,
@@ -881,8 +974,11 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule,
 	 * It could be because there was collision, or some other problem.
 	 * Try rehash by size and insert rule again - last chance.
 	 */
+	if (!bwc_rule->skip_rx)
+		atomic_set(&bwc_matcher->rx_size.rehash_required, true);
+	if (!bwc_rule->skip_tx)
+		atomic_set(&bwc_matcher->tx_size.rehash_required, true);
 
-	atomic_set(&bwc_matcher->rehash_required, true);
 	mutex_unlock(queue_lock);
 
 	hws_bwc_lock_all_queues(ctx);
@@ -891,7 +987,7 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule,
 
 	if (ret) {
 		mlx5hws_err(ctx, "BWC rule insertion: rehash failed (%d)\n", ret);
-		atomic_dec(&bwc_matcher->num_of_rules);
+		hws_bwc_rule_cnt_dec(bwc_rule);
 		return ret;
 	}
 
@@ -907,7 +1003,7 @@ int mlx5hws_bwc_rule_create_simple(struct mlx5hws_bwc_rule *bwc_rule,
 	if (unlikely(ret)) {
 		mutex_unlock(queue_lock);
 		mlx5hws_err(ctx, "BWC rule insertion failed (%d)\n", ret);
-		atomic_dec(&bwc_matcher->num_of_rules);
+		hws_bwc_rule_cnt_dec(bwc_rule);
 		return ret;
 	}
 
@@ -937,6 +1033,9 @@ mlx5hws_bwc_rule_create(struct mlx5hws_bwc_matcher *bwc_matcher,
 	if (unlikely(!bwc_rule))
 		return NULL;
 
+	mlx5hws_rule_skip(bwc_matcher->matcher, flow_source,
+			  &bwc_rule->skip_rx, &bwc_rule->skip_tx);
+
 	bwc_queue_idx = hws_bwc_gen_queue_idx(ctx);
 
 	if (bwc_matcher->complex)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h
index d21fc247a510..1e9de6b9222c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h
@@ -19,6 +19,13 @@
 #define MLX5HWS_BWC_POLLING_TIMEOUT 60
 
 struct mlx5hws_bwc_matcher_complex_data;
+
+struct mlx5hws_bwc_matcher_size {
+	u8 size_log;
+	atomic_t num_of_rules;
+	atomic_t rehash_required;
+};
+
 struct mlx5hws_bwc_matcher {
 	struct mlx5hws_matcher *matcher;
 	struct mlx5hws_match_template *mt;
@@ -27,10 +34,9 @@ struct mlx5hws_bwc_matcher {
 	struct mlx5hws_bwc_matcher *complex_first_bwc_matcher;
 	u8 num_of_at;
 	u8 size_of_at_array;
-	u8 size_log;
 	u32 priority;
-	atomic_t num_of_rules;
-	atomic_t rehash_required;
+	struct mlx5hws_bwc_matcher_size rx_size;
+	struct mlx5hws_bwc_matcher_size tx_size;
 	struct list_head *rules;
 };
 
@@ -40,6 +46,8 @@ struct mlx5hws_bwc_rule {
 	struct mlx5hws_bwc_rule *isolated_bwc_rule;
 	struct mlx5hws_bwc_complex_rule_hash_node *complex_hash_node;
 	u16 bwc_queue_idx;
+	bool skip_rx;
+	bool skip_tx;
 	struct list_head list_node;
 };
 
-- 
2.34.1


* [PATCH net-next v2 7/8] net/mlx5: HWS, Shrink empty matchers
From: Mark Bloch @ 2025-06-22 17:22 UTC
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe, Yevgeny Kliteynik, Vlad Dogaru, Mark Bloch

From: Yevgeny Kliteynik <kliteyn@nvidia.com>

Matcher size is dynamic: it starts at an initial size, and then grows
through rehash as more and more rules are added to the matcher. When
rules are deleted, the matcher's size is not decreased. The rehash
approach is greedy: if the matcher got to a certain size at some point,
chances are it will get to this size again, so it is better to avoid
costly rehash operations whenever possible.

However, when all the rules of the matcher are deleted, this should be
viewed as a special case. If the matcher actually got to the point
where it has zero rules, it might be an indication that some use case
from the past is no longer happening. This is where some ICM can be
freed.

This patch handles this case: when the number of rules in a matcher
drops to zero, the matcher's tables are shrunk back to the initial
size.
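
The shrink decision itself is cheap (abridged from the new
hws_bwc_matcher_rehash_shrink() below): if both per-direction rule
counters are zero and either side is above the initial size, reset both
sizes and rebuild the (now empty) matcher:

	rx_size->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG;
	tx_size->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG;

	return hws_bwc_matcher_move(bwc_matcher);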

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 .../mellanox/mlx5/core/steering/hws/bwc.c     | 68 ++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
index 0a7903cf75e8..b7098c7d2112 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
@@ -3,6 +3,8 @@
 
 #include "internal.h"
 
+static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher);
+
 static u16 hws_bwc_gen_queue_idx(struct mlx5hws_context *ctx)
 {
 	/* assign random queue */
@@ -409,6 +411,70 @@ static void hws_bwc_rule_cnt_dec(struct mlx5hws_bwc_rule *bwc_rule)
 		atomic_dec(&bwc_matcher->tx_size.num_of_rules);
 }
 
+static int
+hws_bwc_matcher_rehash_shrink(struct mlx5hws_bwc_matcher *bwc_matcher)
+{
+	struct mlx5hws_bwc_matcher_size *rx_size = &bwc_matcher->rx_size;
+	struct mlx5hws_bwc_matcher_size *tx_size = &bwc_matcher->tx_size;
+
+	/* It is possible that another thread has added a rule.
+	 * Need to check again if we really need rehash/shrink.
+	 */
+	if (atomic_read(&rx_size->num_of_rules) ||
+	    atomic_read(&tx_size->num_of_rules))
+		return 0;
+
+	/* If the current matcher RX/TX size is already at its initial size. */
+	if (rx_size->size_log == MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG &&
+	    tx_size->size_log == MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG)
+		return 0;
+
+	/* Now we've done all the checking - do the shrinking:
+	 *  - reset match RTC size to the initial size
+	 *  - create new matcher
+	 *  - move the rules, which will not do anything as the matcher is empty
+	 *  - destroy the old matcher
+	 */
+
+	rx_size->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG;
+	tx_size->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG;
+
+	return hws_bwc_matcher_move(bwc_matcher);
+}
+
+static int hws_bwc_rule_cnt_dec_with_shrink(struct mlx5hws_bwc_rule *bwc_rule,
+					    u16 bwc_queue_idx)
+{
+	struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher;
+	struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx;
+	struct mutex *queue_lock; /* Protect the queue */
+	int ret;
+
+	hws_bwc_rule_cnt_dec(bwc_rule);
+
+	if (atomic_read(&bwc_matcher->rx_size.num_of_rules) ||
+	    atomic_read(&bwc_matcher->tx_size.num_of_rules))
+		return 0;
+
+	/* Matcher has no more rules - shrink it to save ICM. */
+
+	queue_lock = hws_bwc_get_queue_lock(ctx, bwc_queue_idx);
+	mutex_unlock(queue_lock);
+
+	hws_bwc_lock_all_queues(ctx);
+	ret = hws_bwc_matcher_rehash_shrink(bwc_matcher);
+	hws_bwc_unlock_all_queues(ctx);
+
+	mutex_lock(queue_lock);
+
+	if (unlikely(ret))
+		mlx5hws_err(ctx,
+			    "BWC rule deletion: shrinking empty matcher failed (%d)\n",
+			    ret);
+
+	return ret;
+}
+
 int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule)
 {
 	struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher;
@@ -425,8 +491,8 @@ int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule)
 	mutex_lock(queue_lock);
 
 	ret = hws_bwc_rule_destroy_hws_sync(bwc_rule, &attr);
-	hws_bwc_rule_cnt_dec(bwc_rule);
 	hws_bwc_rule_list_remove(bwc_rule);
+	hws_bwc_rule_cnt_dec_with_shrink(bwc_rule, idx);
 
 	mutex_unlock(queue_lock);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net-next v2 8/8] net/mlx5: Add HWS as secondary steering mode
  2025-06-22 17:22 [PATCH net-next v2 0/8] net/mlx5: HWS, Optimize matchers ICM usage Mark Bloch
                   ` (6 preceding siblings ...)
  2025-06-22 17:22 ` [PATCH net-next v2 7/8] net/mlx5: HWS, Shrink empty matchers Mark Bloch
@ 2025-06-22 17:22 ` Mark Bloch
  2025-06-22 22:39 ` [PATCH net-next v2 0/8] net/mlx5: HWS, Optimize matchers ICM usage Zhu Yanjun
  8 siblings, 0 replies; 23+ messages in thread
From: Mark Bloch @ 2025-06-22 17:22 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe, Yevgeny Kliteynik, Mark Bloch

From: Moshe Shemesh <moshe@nvidia.com>

Add HW Steering (HWS) as a secondary option for device steering mode. If
the device does not support SW Steering (SWS), HW Steering will be used
as the default, provided it is supported. FW Steering will now be
selected as the default only if both HWS and SWS are unavailable.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index a8046200d376..f30fc793e1fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -3919,6 +3919,8 @@ int mlx5_fs_core_alloc(struct mlx5_core_dev *dev)
 
 	if (mlx5_fs_dr_is_supported(dev))
 		steering->mode = MLX5_FLOW_STEERING_MODE_SMFS;
+	else if (mlx5_fs_hws_is_supported(dev))
+		steering->mode = MLX5_FLOW_STEERING_MODE_HMFS;
 	else
 		steering->mode = MLX5_FLOW_STEERING_MODE_DMFS;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 0/8] net/mlx5: HWS, Optimize matchers ICM usage
  2025-06-22 17:22 [PATCH net-next v2 0/8] net/mlx5: HWS, Optimize matchers ICM usage Mark Bloch
                   ` (7 preceding siblings ...)
  2025-06-22 17:22 ` [PATCH net-next v2 8/8] net/mlx5: Add HWS as secondary steering mode Mark Bloch
@ 2025-06-22 22:39 ` Zhu Yanjun
  2025-06-23 12:03   ` Mark Bloch
  8 siblings, 1 reply; 23+ messages in thread
From: Zhu Yanjun @ 2025-06-22 22:39 UTC (permalink / raw)
  To: Mark Bloch, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe

On 2025/6/22 10:22, Mark Bloch wrote:
> This series optimizes ICM usage for unidirectional rules and
> empty matchers and with the last patch we make hardware steering
> the default FDB steering provider for NICs that don't support software
> steering.

In this patchset, ICM is not explained. I googled it and found the
following:

"
ICM stands for Internal Context Memory, a specialized memory region used 
by Mellanox/NVIDIA network devices (e.g., ConnectX series NICs) to store 
hardware context and rule tables for offloaded operations like flow 
steering, filtering, and traffic redirection.

ICM is crucial when using hardware steering (HWS), where the NIC itself 
performs packet matching and forwarding without involving the host CPU.
"
If I am missing something, please correct me.

Zhu Yanjun



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 0/8] net/mlx5: HWS, Optimize matchers ICM usage
  2025-06-22 22:39 ` [PATCH net-next v2 0/8] net/mlx5: HWS, Optimize matchers ICM usage Zhu Yanjun
@ 2025-06-23 12:03   ` Mark Bloch
  0 siblings, 0 replies; 23+ messages in thread
From: Mark Bloch @ 2025-06-23 12:03 UTC (permalink / raw)
  To: Zhu Yanjun, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, Simon Horman
  Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
	linux-kernel, moshe



On 23/06/2025 1:39, Zhu Yanjun wrote:
> On 2025/6/22 10:22, Mark Bloch wrote:
>> This series optimizes ICM usage for unidirectional rules and
>> empty matchers and with the last patch we make hardware steering
>> the default FDB steering provider for NICs that don't support software
>> steering.
> 
> In this patchset, ICM is not explained. I googled it and found the following:
> 
> "
> ICM stands for Internal Context Memory, a specialized memory region used by Mellanox/NVIDIA network devices (e.g., ConnectX series NICs) to store hardware context and rule tables for offloaded operations like flow steering, filtering, and traffic redirection.
> 
> ICM is crucial when using hardware steering (HWS), where the NIC itself performs packet matching and forwarding without involving the host CPU.
> "

Broadly speaking, yes. You can also check its consumption via the
devlink health reporter:
https://docs.kernel.org/networking/devlink/mlx5.html
Check out icm_consumption on the above page.

Mark



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 1/8] net/mlx5: HWS, remove unused create_dest_array parameter
  2025-06-22 17:22 ` [PATCH net-next v2 1/8] net/mlx5: HWS, remove unused create_dest_array parameter Mark Bloch
@ 2025-06-24 18:37   ` Simon Horman
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Horman @ 2025-06-24 18:37 UTC (permalink / raw)
  To: Mark Bloch
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev,
	linux-rdma, linux-kernel, moshe, Vlad Dogaru, Yevgeny Kliteynik

On Sun, Jun 22, 2025 at 08:22:19PM +0300, Mark Bloch wrote:
> From: Vlad Dogaru <vdogaru@nvidia.com>
> 
> `flow_source` is not used anywhere in mlx5hws_action_create_dest_array.
> 
> Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 2/8] net/mlx5: HWS, remove incorrect comment
  2025-06-22 17:22 ` [PATCH net-next v2 2/8] net/mlx5: HWS, remove incorrect comment Mark Bloch
@ 2025-06-24 18:37   ` Simon Horman
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Horman @ 2025-06-24 18:37 UTC (permalink / raw)
  To: Mark Bloch
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev,
	linux-rdma, linux-kernel, moshe, Yevgeny Kliteynik, Vlad Dogaru

On Sun, Jun 22, 2025 at 08:22:20PM +0300, Mark Bloch wrote:
> From: Yevgeny Kliteynik <kliteyn@nvidia.com>
> 
> Removing an incorrect comment section that is probably a
> copy-paste artifact.
> 
> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
> Reviewed-by: Vlad Dogaru <vdogaru@nvidia.com>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 3/8] net/mlx5: HWS, Refactor and export rule skip logic
  2025-06-22 17:22 ` [PATCH net-next v2 3/8] net/mlx5: HWS, Refactor and export rule skip logic Mark Bloch
@ 2025-06-24 18:38   ` Simon Horman
  2025-06-25  0:35     ` Yevgeny Kliteynik
  0 siblings, 1 reply; 23+ messages in thread
From: Simon Horman @ 2025-06-24 18:38 UTC (permalink / raw)
  To: Mark Bloch
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev,
	linux-rdma, linux-kernel, moshe, Vlad Dogaru, Yevgeny Kliteynik

On Sun, Jun 22, 2025 at 08:22:21PM +0300, Mark Bloch wrote:
> From: Vlad Dogaru <vdogaru@nvidia.com>
> 
> The bwc layer will use `mlx5hws_rule_skip` to keep track of numbers of
> RX and TX rules individually, so export this function for future usage.
> 
> While we're in there, reduce nesting by adding a couple of early return
> statements.

I'm all for reducing nesting. But this patch has two distinct changes.
Please consider splitting it into two patches.
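
(For illustration, a minimal sketch of the early-return refactor the
quoted commit message describes - all names here are hypothetical,
not the actual mlx5hws code:)

struct example_matcher {
	bool tx_only;
	bool rx_only;
};

/* Nested form: each extra condition adds a level of indentation. */
static void example_rule_skip_nested(struct example_matcher *m,
				     bool *skip_rx, bool *skip_tx)
{
	*skip_rx = false;
	*skip_tx = false;
	if (m) {
		if (m->tx_only)
			*skip_rx = true;
		else if (m->rx_only)
			*skip_tx = true;
	}
}

/* Early-return form: bail out as soon as the answer is known, so the
 * remaining checks stay at the top nesting level.
 */
static void example_rule_skip(struct example_matcher *m,
			      bool *skip_rx, bool *skip_tx)
{
	*skip_rx = false;
	*skip_tx = false;

	if (!m)
		return;

	if (m->tx_only) {
		*skip_rx = true;
		return;
	}

	if (m->rx_only)
		*skip_tx = true;
}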

> 
> Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>

...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 4/8] net/mlx5: HWS, Create STEs directly from matcher
  2025-06-22 17:22 ` [PATCH net-next v2 4/8] net/mlx5: HWS, Create STEs directly from matcher Mark Bloch
@ 2025-06-24 18:57   ` Simon Horman
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Horman @ 2025-06-24 18:57 UTC (permalink / raw)
  To: Mark Bloch
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev,
	linux-rdma, linux-kernel, moshe, Vlad Dogaru, Yevgeny Kliteynik

On Sun, Jun 22, 2025 at 08:22:22PM +0300, Mark Bloch wrote:
> From: Vlad Dogaru <vdogaru@nvidia.com>
> 
> Matchers were using the pool abstraction solely as a convenience
> to allocate two STE ranges. The pool's core functionality, that
> of allocating individual items from the range, was unused.
> Matchers rely either on the hardware to hash rules into a table,
> or on a user-provided index.
> 
> Remove the STE pool from the matcher and allocate the STE ranges
> manually instead.
> 
> Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 5/8] net/mlx5: HWS, Decouple matcher RX and TX sizes
  2025-06-22 17:22 ` [PATCH net-next v2 5/8] net/mlx5: HWS, Decouple matcher RX and TX sizes Mark Bloch
@ 2025-06-24 18:57   ` Simon Horman
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Horman @ 2025-06-24 18:57 UTC (permalink / raw)
  To: Mark Bloch
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev,
	linux-rdma, linux-kernel, moshe, Vlad Dogaru, Yevgeny Kliteynik

On Sun, Jun 22, 2025 at 08:22:23PM +0300, Mark Bloch wrote:
> From: Vlad Dogaru <vdogaru@nvidia.com>
> 
> Kernel HWS only uses FDB tables and, as such, creates two lower-level
> containers (RTCs) for each matcher: one for RX and one for TX. Allow
> these RTCs to differ in size by converting the size part of the matcher
> attribute to a two-element array.
> 
> Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 7/8] net/mlx5: HWS, Shrink empty matchers
  2025-06-22 17:22 ` [PATCH net-next v2 7/8] net/mlx5: HWS, Shrink empty matchers Mark Bloch
@ 2025-06-25  0:08   ` Jakub Kicinski
  2025-06-25 14:42     ` Yevgeny Kliteynik
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-06-25  0:08 UTC (permalink / raw)
  To: Mark Bloch
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Simon Horman, saeedm, gal, leonro, tariqt, Leon Romanovsky,
	netdev, linux-rdma, linux-kernel, moshe, Yevgeny Kliteynik,
	Vlad Dogaru

On Sun, 22 Jun 2025 20:22:25 +0300 Mark Bloch wrote:
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
> index 0a7903cf75e8..b7098c7d2112 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
> @@ -3,6 +3,8 @@
>  
>  #include "internal.h"
>  
> +static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher);

Is there a circular dependency? Normally we recommend that people
reorder code rather than add forward declarations.

>  static u16 hws_bwc_gen_queue_idx(struct mlx5hws_context *ctx)
>  {
>  	/* assign random queue */
> @@ -409,6 +411,70 @@ static void hws_bwc_rule_cnt_dec(struct mlx5hws_bwc_rule *bwc_rule)
>  		atomic_dec(&bwc_matcher->tx_size.num_of_rules);
>  }
>  
> +static int
> +hws_bwc_matcher_rehash_shrink(struct mlx5hws_bwc_matcher *bwc_matcher)
> +{
> +	struct mlx5hws_bwc_matcher_size *rx_size = &bwc_matcher->rx_size;
> +	struct mlx5hws_bwc_matcher_size *tx_size = &bwc_matcher->tx_size;
> +
> +	/* It is possible that another thread has added a rule.
> +	 * Need to check again if we really need rehash/shrink.
> +	 */
> +	if (atomic_read(&rx_size->num_of_rules) ||
> +	    atomic_read(&tx_size->num_of_rules))
> +		return 0;
> +
> +	/* If the current matcher RX/TX size is already at its initial size. */
> +	if (rx_size->size_log == MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG &&
> +	    tx_size->size_log == MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG)
> +		return 0;
> +
> +	/* Now we've done all the checking - do the shrinking:
> +	 *  - reset match RTC size to the initial size
> +	 *  - create new matcher
> +	 *  - move the rules, which will not do anything as the matcher is empty
> +	 *  - destroy the old matcher
> +	 */
> +
> +	rx_size->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG;
> +	tx_size->size_log = MLX5HWS_BWC_MATCHER_INIT_SIZE_LOG;
> +
> +	return hws_bwc_matcher_move(bwc_matcher);
> +}
> +
> +static int hws_bwc_rule_cnt_dec_with_shrink(struct mlx5hws_bwc_rule *bwc_rule,
> +					    u16 bwc_queue_idx)
> +{
> +	struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher;
> +	struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx;
> +	struct mutex *queue_lock; /* Protect the queue */
> +	int ret;
> +
> +	hws_bwc_rule_cnt_dec(bwc_rule);
> +
> +	if (atomic_read(&bwc_matcher->rx_size.num_of_rules) ||
> +	    atomic_read(&bwc_matcher->tx_size.num_of_rules))
> +		return 0;
> +
> +	/* Matcher has no more rules - shrink it to save ICM. */
> +
> +	queue_lock = hws_bwc_get_queue_lock(ctx, bwc_queue_idx);
> +	mutex_unlock(queue_lock);
> +
> +	hws_bwc_lock_all_queues(ctx);
> +	ret = hws_bwc_matcher_rehash_shrink(bwc_matcher);
> +	hws_bwc_unlock_all_queues(ctx);
> +
> +	mutex_lock(queue_lock);

Dropping and re-taking caller-held locks is a bad code smell.
Please refactor - presumably you want some portion of the condition
to be under the lock with the dec? Return true / false based on that,
and let the caller drop the lock and do the shrink if true was
returned (directly or with another helper).
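
(A minimal sketch of that shape, with hypothetical example_ names
standing in for the driver's functions - the decrement and the
emptiness test stay under the caller's queue lock, and the shrink
runs only after the caller has dropped it:)

/* Runs under the queue lock held by the caller; returns true if the
 * matcher just became empty and a shrink attempt makes sense.
 */
static bool example_rule_cnt_dec_need_shrink(struct example_rule *rule)
{
	struct example_matcher *m = rule->matcher;

	example_rule_cnt_dec(rule);

	return !atomic_read(&m->rx_size.num_of_rules) &&
	       !atomic_read(&m->tx_size.num_of_rules);
}

/* Caller side: */
	mutex_lock(queue_lock);
	ret = example_rule_destroy_sync(rule, &attr);
	example_rule_list_remove(rule);
	need_shrink = example_rule_cnt_dec_need_shrink(rule);
	mutex_unlock(queue_lock);

	if (need_shrink) {
		example_lock_all_queues(ctx);
		/* recheck the counters under the big lock, then shrink */
		example_matcher_rehash_shrink(matcher);
		example_unlock_all_queues(ctx);
	}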

> +	if (unlikely(ret))
> +		mlx5hws_err(ctx,
> +			    "BWC rule deletion: shrinking empty matcher failed (%d)\n",
> +			    ret);
> +
> +	return ret;
> +}
> +
>  int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule)
>  {
>  	struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher;
> @@ -425,8 +491,8 @@ int mlx5hws_bwc_rule_destroy_simple(struct mlx5hws_bwc_rule *bwc_rule)
>  	mutex_lock(queue_lock);
>  
>  	ret = hws_bwc_rule_destroy_hws_sync(bwc_rule, &attr);
> -	hws_bwc_rule_cnt_dec(bwc_rule);
>  	hws_bwc_rule_list_remove(bwc_rule);
> +	hws_bwc_rule_cnt_dec_with_shrink(bwc_rule, idx);
>  
>  	mutex_unlock(queue_lock);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 3/8] net/mlx5: HWS, Refactor and export rule skip logic
  2025-06-24 18:38   ` Simon Horman
@ 2025-06-25  0:35     ` Yevgeny Kliteynik
  2025-06-25  0:45       ` Jakub Kicinski
  2025-06-25  9:45       ` Simon Horman
  0 siblings, 2 replies; 23+ messages in thread
From: Yevgeny Kliteynik @ 2025-06-25  0:35 UTC (permalink / raw)
  To: Simon Horman, Mark Bloch
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Andrew Lunn, saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev,
	linux-rdma, linux-kernel, moshe, Vlad Dogaru

Hi Simon,
Thanks for reviewing the patches!

On 24-Jun-25 21:38, Simon Horman wrote:
> On Sun, Jun 22, 2025 at 08:22:21PM +0300, Mark Bloch wrote:
>> From: Vlad Dogaru <vdogaru@nvidia.com>
>>
>> The bwc layer will use `mlx5hws_rule_skip` to keep track of numbers of
>> RX and TX rules individually, so export this function for future usage.
>>
>> While we're in there, reduce nesting by adding a couple of early return
>> statements.
> 
> I'm all for reducing nesting. But this patch has two distinct changes.
> Please consider splitting it into two patches.

Not sure I'd send the refactor thing alone - it isn't worth the effort
IMHO... But since I'm already in here - sure, will send it in a separate
patch.


>>
>> Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
>> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
>> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> 
> ...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 3/8] net/mlx5: HWS, Refactor and export rule skip logic
  2025-06-25  0:35     ` Yevgeny Kliteynik
@ 2025-06-25  0:45       ` Jakub Kicinski
  2025-06-25 14:42         ` Yevgeny Kliteynik
  2025-06-25  9:45       ` Simon Horman
  1 sibling, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-06-25  0:45 UTC (permalink / raw)
  To: Yevgeny Kliteynik
  Cc: Simon Horman, Mark Bloch, David S. Miller, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, saeedm, gal, leonro, tariqt,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel, moshe,
	Vlad Dogaru

On Wed, 25 Jun 2025 03:35:52 +0300 Yevgeny Kliteynik wrote:
> >> The bwc layer will use `mlx5hws_rule_skip` to keep track of numbers of
> >> RX and TX rules individually, so export this function for future usage.
> >>
> >> While we're in there, reduce nesting by adding a couple of early return
> >> statements.  
> > 
> > I'm all for reducing nesting. But this patch has two distinct changes.
> > Please consider splitting it into two patches.  
> 
> Not sure I'd send the refactor thing alone - it isn't worth the effort
> IMHO... But since I'm already in here - sure, will send it in a separate
> patch.

FWIW having a function which returns void but with 2 output parameters
is in itself a bit awkward. I'd personally return a 2-bit bitmask of
which mode is enabled. But there's no accounting for taste.
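
(A minimal sketch of the bitmask alternative, with hypothetical
names - the function in this series keeps the two bool output
parameters instead, and the flow-source mapping below is purely
illustrative:)

#define EXAMPLE_RULE_SKIP_RX	BIT(0)
#define EXAMPLE_RULE_SKIP_TX	BIT(1)

enum {
	EXAMPLE_FLOW_SOURCE_ANY,
	EXAMPLE_FLOW_SOURCE_VPORT,
	EXAMPLE_FLOW_SOURCE_UPLINK,
};

/* Return the directions to skip as a bitmask instead of filling two
 * bool output parameters.
 */
static u8 example_rule_skip(u32 flow_source)
{
	if (flow_source == EXAMPLE_FLOW_SOURCE_VPORT)
		return EXAMPLE_RULE_SKIP_TX;
	if (flow_source == EXAMPLE_FLOW_SOURCE_UPLINK)
		return EXAMPLE_RULE_SKIP_RX;
	return 0;
}

/* Caller side: */
	u8 skip = example_rule_skip(flow_source);

	if (!(skip & EXAMPLE_RULE_SKIP_RX))
		atomic_inc(&bwc_matcher->rx_size.num_of_rules);
	if (!(skip & EXAMPLE_RULE_SKIP_TX))
		atomic_inc(&bwc_matcher->tx_size.num_of_rules);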

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 3/8] net/mlx5: HWS, Refactor and export rule skip logic
  2025-06-25  0:35     ` Yevgeny Kliteynik
  2025-06-25  0:45       ` Jakub Kicinski
@ 2025-06-25  9:45       ` Simon Horman
  2025-06-25 13:41         ` Yevgeny Kliteynik
  1 sibling, 1 reply; 23+ messages in thread
From: Simon Horman @ 2025-06-25  9:45 UTC (permalink / raw)
  To: Yevgeny Kliteynik
  Cc: Mark Bloch, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, saeedm, gal, leonro, tariqt,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel, moshe,
	Vlad Dogaru

On Wed, Jun 25, 2025 at 03:35:52AM +0300, Yevgeny Kliteynik wrote:
> Hi Simon,
> Thanks for reviewing the patches!
> 
> On 24-Jun-25 21:38, Simon Horman wrote:
> > On Sun, Jun 22, 2025 at 08:22:21PM +0300, Mark Bloch wrote:
> > > From: Vlad Dogaru <vdogaru@nvidia.com>
> > > 
> > > The bwc layer will use `mlx5hws_rule_skip` to keep track of numbers of
> > > RX and TX rules individually, so export this function for future usage.
> > > 
> > > While we're in there, reduce nesting by adding a couple of early return
> > > statements.
> > 
> > I'm all for reducing nesting. But this patch has two distinct changes.
> > Please consider splitting it into two patches.
> 
> Not sure I'd send the refactor thing alone - it isn't worth the effort
> IMHO... But since I'm already in here - sure, will send it in a separate
> patch.

FWIW, I think the refactor is fine in the context of this patchset.
But I do feel it is best as a separate patch within the patchset.

...

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 3/8] net/mlx5: HWS, Refactor and export rule skip logic
  2025-06-25  9:45       ` Simon Horman
@ 2025-06-25 13:41         ` Yevgeny Kliteynik
  0 siblings, 0 replies; 23+ messages in thread
From: Yevgeny Kliteynik @ 2025-06-25 13:41 UTC (permalink / raw)
  To: Simon Horman
  Cc: Mark Bloch, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, saeedm, gal, leonro, tariqt,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel, moshe,
	Vlad Dogaru

On 25-Jun-25 12:45, Simon Horman wrote:
> On Wed, Jun 25, 2025 at 03:35:52AM +0300, Yevgeny Kliteynik wrote:
>> Hi Simon,
>> Thanks for reviewing the patches!
>>
>> On 24-Jun-25 21:38, Simon Horman wrote:
>>> On Sun, Jun 22, 2025 at 08:22:21PM +0300, Mark Bloch wrote:
>>>> From: Vlad Dogaru <vdogaru@nvidia.com>
>>>>
>>>> The bwc layer will use `mlx5hws_rule_skip` to keep track of numbers of
>>>> RX and TX rules individually, so export this function for future usage.
>>>>
>>>> While we're in there, reduce nesting by adding a couple of early return
>>>> statements.
>>>
>>> I'm all for reducing nesting. But this patch has two distinct changes.
>>> Please consider splitting it into two patches.
>>
>> Not sure I'd send the refactor thing alone - it isn't worth the effort
>> IMHO... But since I'm already in here - sure, will send it in a separate
>> patch.
> 
> FWIW, I think the refactor is fine in the context of this patchset.
> But I do feel it is best as a separate patch within the patchset.

Ack, thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 3/8] net/mlx5: HWS, Refactor and export rule skip logic
  2025-06-25  0:45       ` Jakub Kicinski
@ 2025-06-25 14:42         ` Yevgeny Kliteynik
  0 siblings, 0 replies; 23+ messages in thread
From: Yevgeny Kliteynik @ 2025-06-25 14:42 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Simon Horman, Mark Bloch, David S. Miller, Paolo Abeni,
	Eric Dumazet, Andrew Lunn, saeedm, gal, leonro, tariqt,
	Leon Romanovsky, netdev, linux-rdma, linux-kernel, moshe,
	Vlad Dogaru

On 25-Jun-25 03:45, Jakub Kicinski wrote:
> On Wed, 25 Jun 2025 03:35:52 +0300 Yevgeny Kliteynik wrote:
>>>> The bwc layer will use `mlx5hws_rule_skip` to keep track of numbers of
>>>> RX and TX rules individually, so export this function for future usage.
>>>>
>>>> While we're in there, reduce nesting by adding a couple of early return
>>>> statements.
>>>
>>> I'm all for reducing nesting. But this patch has two distinct changes.
>>> Please consider splitting it into two patches.
>>
>> Not sure I'd send the refactor thing alone - it isn't worth the effort
>> IMHO... But since I'm already in here - sure, will send it in a separate
>> patch.
> 
> FWIW having a function which returns void but with 2 output parameters
> is in itself a bit awkward. I'd personally return a 2-bit bitmask of
> which mode is enabled. But there's no accounting for taste.

Indeed, it's a matter of taste.
I see that it's kind of the style in this area of the code ¯\_(ツ)_/¯
There are several similar cases, and I'd like to stay aligned
with the existing style.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net-next v2 7/8] net/mlx5: HWS, Shrink empty matchers
  2025-06-25  0:08   ` Jakub Kicinski
@ 2025-06-25 14:42     ` Yevgeny Kliteynik
  0 siblings, 0 replies; 23+ messages in thread
From: Yevgeny Kliteynik @ 2025-06-25 14:42 UTC (permalink / raw)
  To: Jakub Kicinski, Mark Bloch
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
	Simon Horman, saeedm, gal, leonro, tariqt, Leon Romanovsky,
	netdev, linux-rdma, linux-kernel, moshe, Vlad Dogaru


On 25-Jun-25 03:08, Jakub Kicinski wrote:
> On Sun, 22 Jun 2025 20:22:25 +0300 Mark Bloch wrote:
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
>> index 0a7903cf75e8..b7098c7d2112 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c
>> @@ -3,6 +3,8 @@
>>   
>>   #include "internal.h"
>>   
>> +static int hws_bwc_matcher_move(struct mlx5hws_bwc_matcher *bwc_matcher);
> 
> Is there a circular dependency? Normally we recommend that people
> reorder code rather than add forward declarations.

Sure, I can rearrange the code. It would, however, mean moving a lot
of code... I think I'll do it in a separate refactoring patch before
this functional one.

>> +static int hws_bwc_rule_cnt_dec_with_shrink(struct mlx5hws_bwc_rule *bwc_rule,
>> +					    u16 bwc_queue_idx)
>> +{
>> +	struct mlx5hws_bwc_matcher *bwc_matcher = bwc_rule->bwc_matcher;
>> +	struct mlx5hws_context *ctx = bwc_matcher->matcher->tbl->ctx;
>> +	struct mutex *queue_lock; /* Protect the queue */
>> +	int ret;
>> +
>> +	hws_bwc_rule_cnt_dec(bwc_rule);
>> +
>> +	if (atomic_read(&bwc_matcher->rx_size.num_of_rules) ||
>> +	    atomic_read(&bwc_matcher->tx_size.num_of_rules))
>> +		return 0;
>> +
>> +	/* Matcher has no more rules - shrink it to save ICM. */
>> +
>> +	queue_lock = hws_bwc_get_queue_lock(ctx, bwc_queue_idx);
>> +	mutex_unlock(queue_lock);
>> +
>> +	hws_bwc_lock_all_queues(ctx);
>> +	ret = hws_bwc_matcher_rehash_shrink(bwc_matcher);
>> +	hws_bwc_unlock_all_queues(ctx);
>> +
>> +	mutex_lock(queue_lock);
> 
> Dropping and re-taking caller-held locks is a bad code smell.
> Please refactor - presumably you want some portion of the condition
> to be under the lock with the dec? return true / false based on that.
> let the caller drop the lock and do the shrink if true was returned
> (directly or with another helper)

There are multiple queues that can operate in parallel. Each rule
selects a random queue and immediately locks it. All further
processing of the rule is done while this lock is held.
Sometimes there is a need to perform an operation that requires full
ownership of the matcher, i.e. this rule has to be the only rule
being processed. In such a case, all the queue locks must be
acquired, which means we're facing the 'dining philosophers'
scenario. To avoid deadlock, the locks must always be taken in the
same order: the single queue lock is released first, and then all
the locks are acquired in that fixed order.
Having all this logic in the same function that acquires the first
lock would really complicate the code and break the simple logical
flow of the functions.
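
(A minimal sketch of that scheme, with hypothetical names and lockdep
annotations omitted - the point is that the thread first releases its
own queue lock and then takes all queue locks in a fixed index order,
which avoids the circular wait of the dining philosophers problem:)

/* Fast path: a rule works under one randomly chosen queue lock. */
	mutex_lock(&ctx->queue_locks[idx]);
	/* ... insert/delete the rule ... */
	mutex_unlock(&ctx->queue_locks[idx]);

/* Whole-matcher operations (rehash/shrink) quiesce all queues. */
static void example_lock_all_queues(struct example_ctx *ctx)
{
	int i;

	for (i = 0; i < ctx->num_queues; i++)
		mutex_lock(&ctx->queue_locks[i]); /* fixed order */
}

/* The emptying thread gives up its own lock, then takes them all: */
	mutex_unlock(queue_lock);
	example_lock_all_queues(ctx);
	ret = example_matcher_rehash_shrink(matcher); /* rechecks counts */
	example_unlock_all_queues(ctx);
	mutex_lock(queue_lock);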

Thanks for the review!

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2025-06-25 14:43 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-22 17:22 [PATCH net-next v2 0/8] net/mlx5: HWS, Optimize matchers ICM usage Mark Bloch
2025-06-22 17:22 ` [PATCH net-next v2 1/8] net/mlx5: HWS, remove unused create_dest_array parameter Mark Bloch
2025-06-24 18:37   ` Simon Horman
2025-06-22 17:22 ` [PATCH net-next v2 2/8] net/mlx5: HWS, remove incorrect comment Mark Bloch
2025-06-24 18:37   ` Simon Horman
2025-06-22 17:22 ` [PATCH net-next v2 3/8] net/mlx5: HWS, Refactor and export rule skip logic Mark Bloch
2025-06-24 18:38   ` Simon Horman
2025-06-25  0:35     ` Yevgeny Kliteynik
2025-06-25  0:45       ` Jakub Kicinski
2025-06-25 14:42         ` Yevgeny Kliteynik
2025-06-25  9:45       ` Simon Horman
2025-06-25 13:41         ` Yevgeny Kliteynik
2025-06-22 17:22 ` [PATCH net-next v2 4/8] net/mlx5: HWS, Create STEs directly from matcher Mark Bloch
2025-06-24 18:57   ` Simon Horman
2025-06-22 17:22 ` [PATCH net-next v2 5/8] net/mlx5: HWS, Decouple matcher RX and TX sizes Mark Bloch
2025-06-24 18:57   ` Simon Horman
2025-06-22 17:22 ` [PATCH net-next v2 6/8] net/mlx5: HWS, Track matcher sizes individually Mark Bloch
2025-06-22 17:22 ` [PATCH net-next v2 7/8] net/mlx5: HWS, Shrink empty matchers Mark Bloch
2025-06-25  0:08   ` Jakub Kicinski
2025-06-25 14:42     ` Yevgeny Kliteynik
2025-06-22 17:22 ` [PATCH net-next v2 8/8] net/mlx5: Add HWS as secondary steering mode Mark Bloch
2025-06-22 22:39 ` [PATCH net-next v2 0/8] net/mlx5: HWS, Optimize matchers ICM usage Zhu Yanjun
2025-06-23 12:03   ` Mark Bloch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).