netdev.vger.kernel.org archive mirror
* [pull request][net-next 00/11] mlx5 updates 2025-01-16
@ 2025-01-16 21:55 Saeed Mahameed
  2025-01-16 21:55 ` [net-next 01/11] net: Kconfig NET_DEVMEM selects GENERIC_ALLOCATOR Saeed Mahameed
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky

From: Saeed Mahameed <saeedm@nvidia.com>

This series adds support for devmem TCP with mlx5.
For more information, please see the tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.


The following changes since commit 2ee738e90e80850582cbe10f34c6447965c1d87b:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2025-01-16 10:34:59 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2025-01-16

for you to fetch changes up to 45fc1c1ce6f92b7dd1cdd5a46072d41d36a8a816:

  net/mlx5e: Support ethtool tcp-data-split settings (2025-01-16 13:52:55 -0800)

----------------------------------------------------------------
mlx5-updates-2025-01-16

devmem TCP with mlx5.

Add support for netmem, the queue management API and tcp-data-split:
 - Minor refactoring
 - Separate page pool for headers
 - Use netmem struct as the page frag container in mlx5
 - Support UNREADABLE netmem for special page pools
 - Implement queue management API
 - Support ethtool tcp-data-split settings

Tested with tools/testing/selftests/drivers/net/hw/ncdevmem.c

----------------------------------------------------------------
Saeed Mahameed (11):
      net: Kconfig NET_DEVMEM selects GENERIC_ALLOCATOR
      net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc
      net/mlx5e: SHAMPO: Remove redundant params
      net/mlx5e: SHAMPO: Improve hw gro capability checking
      net/mlx5e: SHAMPO: Separate pool for headers
      net/mlx5e: SHAMPO: Headers page pool stats
      net/mlx5e: Convert over to netmem
      net/mlx5e: Handle iov backed netmems
      net/mlx5e: Add support for UNREADABLE netmem page pools
      net/mlx5e: Implement queue mgmt ops and single channel swap
      net/mlx5e: Support ethtool tcp-data-split settings

 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  10 +-
 .../net/ethernet/mellanox/mlx5/core/en/params.c    |   4 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |  49 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 279 +++++++++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    | 112 +++++----
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.c |  53 ++++
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |  24 ++
 net/Kconfig                                        |   2 +-
 8 files changed, 391 insertions(+), 142 deletions(-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [net-next 01/11] net: Kconfig NET_DEVMEM selects GENERIC_ALLOCATOR
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  2025-01-16 21:55 ` [net-next 02/11] net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc Saeed Mahameed
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky

From: Saeed Mahameed <saeedm@nvidia.com>

GENERIC_ALLOCATOR is a non-prompt kconfig symbol, meaning users can't
enable it directly. All kconfig users of GENERIC_ALLOCATOR select it,
except for NET_DEVMEM, which only depends on it. As a result there is no
easy way to turn GENERIC_ALLOCATOR on without also enabling other,
unnecessary configs that happen to select it.

Instead of depending on it, select it when NET_DEVMEM is enabled.
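
For reference, this is roughly how the NET_DEVMEM entry looks after the
change (reconstructed from the hunk below); with select, NET_DEVMEM now
pulls in GENERIC_ALLOCATOR whenever its remaining dependencies are met:

  config NET_DEVMEM
  	def_bool y
  	select GENERIC_ALLOCATOR
  	depends on DMA_SHARED_BUFFER
  	depends on PAGE_POOL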

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
 net/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/Kconfig b/net/Kconfig
index c3fca69a7c83..4c18dd416a50 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -68,8 +68,8 @@ config SKB_EXTENSIONS
 
 config NET_DEVMEM
 	def_bool y
+	select GENERIC_ALLOCATOR
 	depends on DMA_SHARED_BUFFER
-	depends on GENERIC_ALLOCATOR
 	depends on PAGE_POOL
 
 config NET_SHAPER
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 02/11] net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
  2025-01-16 21:55 ` [net-next 01/11] net: Kconfig NET_DEVMEM selects GENERIC_ALLOCATOR Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  2025-01-16 21:55 ` [net-next 03/11] net/mlx5e: SHAMPO: Remove redundant params Saeed Mahameed
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

From: Saeed Mahameed <saeedm@nvidia.com>

Drop redundant SHAMPO structure alloc/free functions.

Gather together the function calls pertaining to the header split info,
and pass headers per WQ (hd_per_wq) as a parameter to those functions to
avoid future use-before-initialization mistakes.

Allocate HW GRO related info outside of the header related info scope.
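
A simplified sketch of the resulting allocation order in
mlx5_rq_shampo_alloc() (names taken from the hunk below; error
unwinding omitted):

	rq->mpwqe.shampo = kvzalloc_node(sizeof(*rq->mpwqe.shampo), GFP_KERNEL, node);

	/* split headers data structures */
	hd_per_wq = mlx5e_shampo_hd_per_wq(mdev, params, rqp);
	err = mlx5e_rq_shampo_hd_info_alloc(rq, hd_per_wq, node);
	err = mlx5e_create_rq_hd_umr_mkey(mdev, hd_per_wq, &rq->mpwqe.shampo->mkey);

	/* gro only data structures */
	rq->hw_gro_data = kvzalloc_node(sizeof(*rq->hw_gro_data), GFP_KERNEL, node);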

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |   1 -
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 132 +++++++++---------
 2 files changed, 63 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 979fc56205e1..66c93816803e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -628,7 +628,6 @@ struct mlx5e_shampo_hd {
 	struct mlx5e_frag_page *pages;
 	u32 hd_per_wq;
 	u16 hd_per_wqe;
-	u16 pages_per_wq;
 	unsigned long *bitmap;
 	u16 pi;
 	u16 ci;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index bd41b75d246e..c687c926cba3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -330,47 +330,6 @@ static inline void mlx5e_build_umr_wqe(struct mlx5e_rq *rq,
 	ucseg->mkey_mask     = cpu_to_be64(MLX5_MKEY_MASK_FREE);
 }
 
-static int mlx5e_rq_shampo_hd_alloc(struct mlx5e_rq *rq, int node)
-{
-	rq->mpwqe.shampo = kvzalloc_node(sizeof(*rq->mpwqe.shampo),
-					 GFP_KERNEL, node);
-	if (!rq->mpwqe.shampo)
-		return -ENOMEM;
-	return 0;
-}
-
-static void mlx5e_rq_shampo_hd_free(struct mlx5e_rq *rq)
-{
-	kvfree(rq->mpwqe.shampo);
-}
-
-static int mlx5e_rq_shampo_hd_info_alloc(struct mlx5e_rq *rq, int node)
-{
-	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
-
-	shampo->bitmap = bitmap_zalloc_node(shampo->hd_per_wq, GFP_KERNEL,
-					    node);
-	shampo->pages = kvzalloc_node(array_size(shampo->hd_per_wq,
-						 sizeof(*shampo->pages)),
-				     GFP_KERNEL, node);
-	if (!shampo->bitmap || !shampo->pages)
-		goto err_nomem;
-
-	return 0;
-
-err_nomem:
-	kvfree(shampo->bitmap);
-	kvfree(shampo->pages);
-
-	return -ENOMEM;
-}
-
-static void mlx5e_rq_shampo_hd_info_free(struct mlx5e_rq *rq)
-{
-	kvfree(rq->mpwqe.shampo->bitmap);
-	kvfree(rq->mpwqe.shampo->pages);
-}
-
 static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, int node)
 {
 	int wq_sz = mlx5_wq_ll_get_size(&rq->mpwqe.wq);
@@ -581,19 +540,18 @@ static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct mlx5e_rq
 }
 
 static int mlx5e_create_rq_hd_umr_mkey(struct mlx5_core_dev *mdev,
-				       struct mlx5e_rq *rq)
+				       u16 hd_per_wq, u32 *umr_mkey)
 {
 	u32 max_ksm_size = BIT(MLX5_CAP_GEN(mdev, log_max_klm_list_size));
 
-	if (max_ksm_size < rq->mpwqe.shampo->hd_per_wq) {
+	if (max_ksm_size < hd_per_wq) {
 		mlx5_core_err(mdev, "max ksm list size 0x%x is smaller than shampo header buffer list size 0x%x\n",
-			      max_ksm_size, rq->mpwqe.shampo->hd_per_wq);
+			      max_ksm_size, hd_per_wq);
 		return -EINVAL;
 	}
-
-	return mlx5e_create_umr_ksm_mkey(mdev, rq->mpwqe.shampo->hd_per_wq,
+	return mlx5e_create_umr_ksm_mkey(mdev, hd_per_wq,
 					 MLX5E_SHAMPO_LOG_HEADER_ENTRY_SIZE,
-					 &rq->mpwqe.shampo->mkey);
+					 umr_mkey);
 }
 
 static void mlx5e_init_frags_partition(struct mlx5e_rq *rq)
@@ -755,6 +713,33 @@ static int mlx5e_init_rxq_rq(struct mlx5e_channel *c, struct mlx5e_params *param
 				  xdp_frag_size);
 }
 
+static int mlx5e_rq_shampo_hd_info_alloc(struct mlx5e_rq *rq, u16 hd_per_wq, int node)
+{
+	struct mlx5e_shampo_hd *shampo = rq->mpwqe.shampo;
+
+	shampo->hd_per_wq = hd_per_wq;
+
+	shampo->bitmap = bitmap_zalloc_node(hd_per_wq, GFP_KERNEL, node);
+	shampo->pages = kvzalloc_node(array_size(hd_per_wq, sizeof(*shampo->pages)),
+				      GFP_KERNEL, node);
+	if (!shampo->bitmap || !shampo->pages)
+		goto err_nomem;
+
+	return 0;
+
+err_nomem:
+	kvfree(shampo->pages);
+	bitmap_free(shampo->bitmap);
+
+	return -ENOMEM;
+}
+
+static void mlx5e_rq_shampo_hd_info_free(struct mlx5e_rq *rq)
+{
+	kvfree(rq->mpwqe.shampo->pages);
+	bitmap_free(rq->mpwqe.shampo->bitmap);
+}
+
 static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev,
 				struct mlx5e_params *params,
 				struct mlx5e_rq_param *rqp,
@@ -762,42 +747,51 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev,
 				u32 *pool_size,
 				int node)
 {
+	void *wqc = MLX5_ADDR_OF(rqc, rqp->rqc, wq);
+	u16 hd_per_wq;
+	int wq_size;
 	int err;
 
 	if (!test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
 		return 0;
-	err = mlx5e_rq_shampo_hd_alloc(rq, node);
-	if (err)
-		goto out;
-	rq->mpwqe.shampo->hd_per_wq =
-		mlx5e_shampo_hd_per_wq(mdev, params, rqp);
-	err = mlx5e_create_rq_hd_umr_mkey(mdev, rq);
+
+	rq->mpwqe.shampo = kvzalloc_node(sizeof(*rq->mpwqe.shampo),
+					 GFP_KERNEL, node);
+	if (!rq->mpwqe.shampo)
+		return -ENOMEM;
+
+	/* split headers data structures */
+	hd_per_wq = mlx5e_shampo_hd_per_wq(mdev, params, rqp);
+	err = mlx5e_rq_shampo_hd_info_alloc(rq, hd_per_wq, node);
 	if (err)
-		goto err_shampo_hd;
-	err = mlx5e_rq_shampo_hd_info_alloc(rq, node);
+		goto err_shampo_hd_info_alloc;
+
+	err = mlx5e_create_rq_hd_umr_mkey(mdev, hd_per_wq, &rq->mpwqe.shampo->mkey);
 	if (err)
-		goto err_shampo_info;
+		goto err_umr_mkey;
+
+	rq->mpwqe.shampo->key = cpu_to_be32(rq->mpwqe.shampo->mkey);
+	rq->mpwqe.shampo->hd_per_wqe =
+		mlx5e_shampo_hd_per_wqe(mdev, params, rqp);
+	wq_size = BIT(MLX5_GET(wq, wqc, log_wq_sz));
+	*pool_size += (rq->mpwqe.shampo->hd_per_wqe * wq_size) /
+		     MLX5E_SHAMPO_WQ_HEADER_PER_PAGE;
+
+	/* gro only data structures */
 	rq->hw_gro_data = kvzalloc_node(sizeof(*rq->hw_gro_data), GFP_KERNEL, node);
 	if (!rq->hw_gro_data) {
 		err = -ENOMEM;
 		goto err_hw_gro_data;
 	}
-	rq->mpwqe.shampo->key =
-		cpu_to_be32(rq->mpwqe.shampo->mkey);
-	rq->mpwqe.shampo->hd_per_wqe =
-		mlx5e_shampo_hd_per_wqe(mdev, params, rqp);
-	rq->mpwqe.shampo->pages_per_wq =
-		rq->mpwqe.shampo->hd_per_wq / MLX5E_SHAMPO_WQ_HEADER_PER_PAGE;
-	*pool_size += rq->mpwqe.shampo->pages_per_wq;
+
 	return 0;
 
 err_hw_gro_data:
-	mlx5e_rq_shampo_hd_info_free(rq);
-err_shampo_info:
 	mlx5_core_destroy_mkey(mdev, rq->mpwqe.shampo->mkey);
-err_shampo_hd:
-	mlx5e_rq_shampo_hd_free(rq);
-out:
+err_umr_mkey:
+	mlx5e_rq_shampo_hd_info_free(rq);
+err_shampo_hd_info_alloc:
+	kvfree(rq->mpwqe.shampo);
 	return err;
 }
 
@@ -809,7 +803,7 @@ static void mlx5e_rq_free_shampo(struct mlx5e_rq *rq)
 	kvfree(rq->hw_gro_data);
 	mlx5e_rq_shampo_hd_info_free(rq);
 	mlx5_core_destroy_mkey(rq->mdev, rq->mpwqe.shampo->mkey);
-	mlx5e_rq_shampo_hd_free(rq);
+	kvfree(rq->mpwqe.shampo);
 }
 
 static int mlx5e_alloc_rq(struct mlx5e_params *params,
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 03/11] net/mlx5e: SHAMPO: Remove redundant params
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
  2025-01-16 21:55 ` [net-next 01/11] net: Kconfig NET_DEVMEM selects GENERIC_ALLOCATOR Saeed Mahameed
  2025-01-16 21:55 ` [net-next 02/11] net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  2025-01-16 21:55 ` [net-next 04/11] net/mlx5e: SHAMPO: Improve hw gro capability checking Saeed Mahameed
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

From: Saeed Mahameed <saeedm@nvidia.com>

Two SHAMPO params are static and always set to the same values, so
remove them from the global mlx5e_params struct and use the constants
directly.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h        | 4 ----
 drivers/net/ethernet/mellanox/mlx5/core/en/params.c | 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c   | 4 ----
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 66c93816803e..18f8c00f4d7f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -274,10 +274,6 @@ enum packet_merge {
 struct mlx5e_packet_merge_param {
 	enum packet_merge type;
 	u32 timeout;
-	struct {
-		u8 match_criteria_type;
-		u8 alignment_granularity;
-	} shampo;
 };
 
 struct mlx5e_params {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 64b62ed17b07..377363eb1faa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -930,9 +930,9 @@ int mlx5e_build_rq_param(struct mlx5_core_dev *mdev,
 			MLX5_SET(rqc, rqc, reservation_timeout,
 				 mlx5e_choose_lro_timeout(mdev, MLX5E_DEFAULT_SHAMPO_TIMEOUT));
 			MLX5_SET(rqc, rqc, shampo_match_criteria_type,
-				 params->packet_merge.shampo.match_criteria_type);
+				 MLX5_RQC_SHAMPO_MATCH_CRITERIA_TYPE_EXTENDED);
 			MLX5_SET(rqc, rqc, shampo_no_match_alignment_granularity,
-				 params->packet_merge.shampo.alignment_granularity);
+				 MLX5_RQC_SHAMPO_NO_MATCH_ALIGNMENT_GRANULARITY_STRIDE);
 		}
 		break;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index c687c926cba3..73947df91a33 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4047,10 +4047,6 @@ static int set_feature_hw_gro(struct net_device *netdev, bool enable)
 
 	if (enable) {
 		new_params.packet_merge.type = MLX5E_PACKET_MERGE_SHAMPO;
-		new_params.packet_merge.shampo.match_criteria_type =
-			MLX5_RQC_SHAMPO_MATCH_CRITERIA_TYPE_EXTENDED;
-		new_params.packet_merge.shampo.alignment_granularity =
-			MLX5_RQC_SHAMPO_NO_MATCH_ALIGNMENT_GRANULARITY_STRIDE;
 	} else if (new_params.packet_merge.type == MLX5E_PACKET_MERGE_SHAMPO) {
 		new_params.packet_merge.type = MLX5E_PACKET_MERGE_NONE;
 	} else {
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 04/11] net/mlx5e: SHAMPO: Improve hw gro capability checking
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
                   ` (2 preceding siblings ...)
  2025-01-16 21:55 ` [net-next 03/11] net/mlx5e: SHAMPO: Remove redundant params Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  2025-01-16 21:55 ` [net-next 05/11] net/mlx5e: SHAMPO: Separate pool for headers Saeed Mahameed
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

From: Saeed Mahameed <saeedm@nvidia.com>

Add a missing HW capability check, and declare the feature in
netdev->vlan_features, similar to other features in
mlx5e_build_nic_netdev. No functional change here, as all features that
are disabled by default are explicitly disabled at the bottom of the
function.
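
The ordering matters because hw_features is later initialized from
vlan_features; a minimal sketch of the resulting flow (condensed from
the hunk below):

	if (mlx5e_hw_gro_supported(mdev) &&
	    mlx5e_check_fragmented_striding_rq_cap(mdev, PAGE_SHIFT,
						   MLX5E_MPWRQ_UMR_MODE_ALIGNED))
		netdev->vlan_features |= NETIF_F_GRO_HW;

	netdev->hw_features = netdev->vlan_features; /* NETIF_F_GRO_HW inherited here */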

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 73947df91a33..66d1b3fe3134 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -77,7 +77,8 @@
 
 static bool mlx5e_hw_gro_supported(struct mlx5_core_dev *mdev)
 {
-	if (!MLX5_CAP_GEN(mdev, shampo))
+	if (!MLX5_CAP_GEN(mdev, shampo) ||
+	    !MLX5_CAP_SHAMPO(mdev, shampo_header_split_data_merge))
 		return false;
 
 	/* Our HW-GRO implementation relies on "KSM Mkey" for
@@ -5508,17 +5509,17 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 						   MLX5E_MPWRQ_UMR_MODE_ALIGNED))
 		netdev->vlan_features    |= NETIF_F_LRO;
 
+	if (mlx5e_hw_gro_supported(mdev) &&
+	    mlx5e_check_fragmented_striding_rq_cap(mdev, PAGE_SHIFT,
+						   MLX5E_MPWRQ_UMR_MODE_ALIGNED))
+		netdev->vlan_features |= NETIF_F_GRO_HW;
+
 	netdev->hw_features       = netdev->vlan_features;
 	netdev->hw_features      |= NETIF_F_HW_VLAN_CTAG_TX;
 	netdev->hw_features      |= NETIF_F_HW_VLAN_CTAG_RX;
 	netdev->hw_features      |= NETIF_F_HW_VLAN_CTAG_FILTER;
 	netdev->hw_features      |= NETIF_F_HW_VLAN_STAG_TX;
 
-	if (mlx5e_hw_gro_supported(mdev) &&
-	    mlx5e_check_fragmented_striding_rq_cap(mdev, PAGE_SHIFT,
-						   MLX5E_MPWRQ_UMR_MODE_ALIGNED))
-		netdev->hw_features    |= NETIF_F_GRO_HW;
-
 	if (mlx5e_tunnel_any_tx_proto_supported(mdev)) {
 		netdev->hw_enc_features |= NETIF_F_HW_CSUM;
 		netdev->hw_enc_features |= NETIF_F_TSO;
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 05/11] net/mlx5e: SHAMPO: Separate pool for headers
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
                   ` (3 preceding siblings ...)
  2025-01-16 21:55 ` [net-next 04/11] net/mlx5e: SHAMPO: Improve hw gro capability checking Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  2025-01-16 21:55 ` [net-next 06/11] net/mlx5e: SHAMPO: Headers page pool stats Saeed Mahameed
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

From: Saeed Mahameed <saeedm@nvidia.com>

Allocate a separate page pool for headers when SHAMPO is enabled.
This will be useful for adding zero-copy (zc) page pool support, since
the zc data pool has to be different from the headers page pool.
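
After this patch the RQ carries two pools: rq->page_pool for data
buffers and rq->hd_page_pool for SHAMPO headers. A minimal sketch of
the header pool creation (condensed from the hunk below):

	struct page_pool_params pp_params = { };

	pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
	pp_params.pool_size = (rq->mpwqe.shampo->hd_per_wqe * wq_size) /
			      MLX5E_SHAMPO_WQ_HEADER_PER_PAGE;
	pp_params.nid       = node;
	pp_params.dev       = rq->pdev;
	pp_params.napi      = rq->cq.napi;
	pp_params.netdev    = rq->netdev;
	pp_params.dma_dir   = rq->buff.map_dir;
	pp_params.max_len   = PAGE_SIZE;

	rq->hd_page_pool = page_pool_create(&pp_params);

Header allocations in the rx path (mlx5e_build_shampo_hd_umr) then go
through rq->hd_page_pool, while regular fragments keep using
rq->page_pool.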

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  3 ++
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 37 ++++++++++++++++---
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 35 +++++++++---------
 3 files changed, 52 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 18f8c00f4d7f..29b9bcecd125 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -706,7 +706,10 @@ struct mlx5e_rq {
 	struct bpf_prog __rcu *xdp_prog;
 	struct mlx5e_xdpsq    *xdpsq;
 	DECLARE_BITMAP(flags, 8);
+
+	/* page pools */
 	struct page_pool      *page_pool;
+	struct page_pool      *hd_page_pool;
 
 	/* AF_XDP zero-copy */
 	struct xsk_buff_pool  *xsk_pool;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 66d1b3fe3134..02c9737868b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -745,12 +745,10 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev,
 				struct mlx5e_params *params,
 				struct mlx5e_rq_param *rqp,
 				struct mlx5e_rq *rq,
-				u32 *pool_size,
 				int node)
 {
 	void *wqc = MLX5_ADDR_OF(rqc, rqp->rqc, wq);
 	u16 hd_per_wq;
-	int wq_size;
 	int err;
 
 	if (!test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
@@ -774,9 +772,33 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev,
 	rq->mpwqe.shampo->key = cpu_to_be32(rq->mpwqe.shampo->mkey);
 	rq->mpwqe.shampo->hd_per_wqe =
 		mlx5e_shampo_hd_per_wqe(mdev, params, rqp);
-	wq_size = BIT(MLX5_GET(wq, wqc, log_wq_sz));
-	*pool_size += (rq->mpwqe.shampo->hd_per_wqe * wq_size) /
-		     MLX5E_SHAMPO_WQ_HEADER_PER_PAGE;
+
+	/* separate page pool for shampo headers */
+	{
+		int wq_size = BIT(MLX5_GET(wq, wqc, log_wq_sz));
+		struct page_pool_params pp_params = { };
+		u32 pool_size;
+
+		pool_size = (rq->mpwqe.shampo->hd_per_wqe * wq_size) /
+				MLX5E_SHAMPO_WQ_HEADER_PER_PAGE;
+
+		pp_params.order     = 0;
+		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
+		pp_params.pool_size = pool_size;
+		pp_params.nid       = node;
+		pp_params.dev       = rq->pdev;
+		pp_params.napi      = rq->cq.napi;
+		pp_params.netdev    = rq->netdev;
+		pp_params.dma_dir   = rq->buff.map_dir;
+		pp_params.max_len   = PAGE_SIZE;
+
+		rq->hd_page_pool = page_pool_create(&pp_params);
+		if (IS_ERR(rq->hd_page_pool)) {
+			err = PTR_ERR(rq->hd_page_pool);
+			rq->hd_page_pool = NULL;
+			goto err_hds_page_pool;
+		}
+	}
 
 	/* gro only data structures */
 	rq->hw_gro_data = kvzalloc_node(sizeof(*rq->hw_gro_data), GFP_KERNEL, node);
@@ -788,6 +810,8 @@ static int mlx5_rq_shampo_alloc(struct mlx5_core_dev *mdev,
 	return 0;
 
 err_hw_gro_data:
+	page_pool_destroy(rq->hd_page_pool);
+err_hds_page_pool:
 	mlx5_core_destroy_mkey(mdev, rq->mpwqe.shampo->mkey);
 err_umr_mkey:
 	mlx5e_rq_shampo_hd_info_free(rq);
@@ -802,6 +826,7 @@ static void mlx5e_rq_free_shampo(struct mlx5e_rq *rq)
 		return;
 
 	kvfree(rq->hw_gro_data);
+	page_pool_destroy(rq->hd_page_pool);
 	mlx5e_rq_shampo_hd_info_free(rq);
 	mlx5_core_destroy_mkey(rq->mdev, rq->mpwqe.shampo->mkey);
 	kvfree(rq->mpwqe.shampo);
@@ -881,7 +906,7 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		if (err)
 			goto err_rq_mkey;
 
-		err = mlx5_rq_shampo_alloc(mdev, params, rqp, rq, &pool_size, node);
+		err = mlx5_rq_shampo_alloc(mdev, params, rqp, rq, node);
 		if (err)
 			goto err_free_mpwqe_info;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 1963bc5adb18..df561251b30b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -273,12 +273,12 @@ static inline u32 mlx5e_decompress_cqes_start(struct mlx5e_rq *rq,
 
 #define MLX5E_PAGECNT_BIAS_MAX (PAGE_SIZE / 64)
 
-static int mlx5e_page_alloc_fragmented(struct mlx5e_rq *rq,
+static int mlx5e_page_alloc_fragmented(struct page_pool *pool,
 				       struct mlx5e_frag_page *frag_page)
 {
 	struct page *page;
 
-	page = page_pool_dev_alloc_pages(rq->page_pool);
+	page = page_pool_dev_alloc_pages(pool);
 	if (unlikely(!page))
 		return -ENOMEM;
 
@@ -292,14 +292,14 @@ static int mlx5e_page_alloc_fragmented(struct mlx5e_rq *rq,
 	return 0;
 }
 
-static void mlx5e_page_release_fragmented(struct mlx5e_rq *rq,
+static void mlx5e_page_release_fragmented(struct page_pool *pool,
 					  struct mlx5e_frag_page *frag_page)
 {
 	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
 	struct page *page = frag_page->page;
 
 	if (page_pool_unref_page(page, drain_count) == 0)
-		page_pool_put_unrefed_page(rq->page_pool, page, -1, true);
+		page_pool_put_unrefed_page(pool, page, -1, true);
 }
 
 static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
@@ -313,7 +313,7 @@ static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
 		 * offset) should just use the new one without replenishing again
 		 * by themselves.
 		 */
-		err = mlx5e_page_alloc_fragmented(rq, frag->frag_page);
+		err = mlx5e_page_alloc_fragmented(rq->page_pool, frag->frag_page);
 
 	return err;
 }
@@ -332,7 +332,7 @@ static inline void mlx5e_put_rx_frag(struct mlx5e_rq *rq,
 				     struct mlx5e_wqe_frag_info *frag)
 {
 	if (mlx5e_frag_can_release(frag))
-		mlx5e_page_release_fragmented(rq, frag->frag_page);
+		mlx5e_page_release_fragmented(rq->page_pool, frag->frag_page);
 }
 
 static inline struct mlx5e_wqe_frag_info *get_frag(struct mlx5e_rq *rq, u16 ix)
@@ -584,7 +584,7 @@ mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi)
 				struct mlx5e_frag_page *frag_page;
 
 				frag_page = &wi->alloc_units.frag_pages[i];
-				mlx5e_page_release_fragmented(rq, frag_page);
+				mlx5e_page_release_fragmented(rq->page_pool, frag_page);
 			}
 		}
 	}
@@ -679,11 +679,10 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 		struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, index);
 		u64 addr;
 
-		err = mlx5e_page_alloc_fragmented(rq, frag_page);
+		err = mlx5e_page_alloc_fragmented(rq->hd_page_pool, frag_page);
 		if (unlikely(err))
 			goto err_unmap;
 
-
 		addr = page_pool_get_dma_addr(frag_page->page);
 
 		for (int j = 0; j < MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; j++) {
@@ -715,7 +714,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 		if (!header_offset) {
 			struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, index);
 
-			mlx5e_page_release_fragmented(rq, frag_page);
+			mlx5e_page_release_fragmented(rq->hd_page_pool, frag_page);
 		}
 	}
 
@@ -791,7 +790,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 	for (i = 0; i < rq->mpwqe.pages_per_wqe; i++, frag_page++) {
 		dma_addr_t addr;
 
-		err = mlx5e_page_alloc_fragmented(rq, frag_page);
+		err = mlx5e_page_alloc_fragmented(rq->page_pool, frag_page);
 		if (unlikely(err))
 			goto err_unmap;
 		addr = page_pool_get_dma_addr(frag_page->page);
@@ -836,7 +835,7 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 err_unmap:
 	while (--i >= 0) {
 		frag_page--;
-		mlx5e_page_release_fragmented(rq, frag_page);
+		mlx5e_page_release_fragmented(rq->page_pool, frag_page);
 	}
 
 	bitmap_fill(wi->skip_release_bitmap, rq->mpwqe.pages_per_wqe);
@@ -855,7 +854,7 @@ mlx5e_free_rx_shampo_hd_entry(struct mlx5e_rq *rq, u16 header_index)
 	if (((header_index + 1) & (MLX5E_SHAMPO_WQ_HEADER_PER_PAGE - 1)) == 0) {
 		struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index);
 
-		mlx5e_page_release_fragmented(rq, frag_page);
+		mlx5e_page_release_fragmented(rq->hd_page_pool, frag_page);
 	}
 	clear_bit(header_index, shampo->bitmap);
 }
@@ -1100,6 +1099,8 @@ INDIRECT_CALLABLE_SCOPE bool mlx5e_post_rx_mpwqes(struct mlx5e_rq *rq)
 
 	if (rq->page_pool)
 		page_pool_nid_changed(rq->page_pool, numa_mem_id());
+	if (rq->hd_page_pool)
+		page_pool_nid_changed(rq->hd_page_pool, numa_mem_id());
 
 	head = rq->mpwqe.actual_wq_head;
 	i = missing;
@@ -2001,7 +2002,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 	if (prog) {
 		/* area for bpf_xdp_[store|load]_bytes */
 		net_prefetchw(page_address(frag_page->page) + frag_offset);
-		if (unlikely(mlx5e_page_alloc_fragmented(rq, &wi->linear_page))) {
+		if (unlikely(mlx5e_page_alloc_fragmented(rq->page_pool, &wi->linear_page))) {
 			rq->stats->buff_alloc_err++;
 			return NULL;
 		}
@@ -2063,7 +2064,7 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 				wi->linear_page.frags++;
 			}
-			mlx5e_page_release_fragmented(rq, &wi->linear_page);
+			mlx5e_page_release_fragmented(rq->page_pool, &wi->linear_page);
 			return NULL; /* page/packet was consumed by XDP */
 		}
 
@@ -2072,13 +2073,13 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 					     mxbuf.xdp.data - mxbuf.xdp.data_hard_start, 0,
 					     mxbuf.xdp.data - mxbuf.xdp.data_meta);
 		if (unlikely(!skb)) {
-			mlx5e_page_release_fragmented(rq, &wi->linear_page);
+			mlx5e_page_release_fragmented(rq->page_pool, &wi->linear_page);
 			return NULL;
 		}
 
 		skb_mark_for_recycle(skb);
 		wi->linear_page.frags++;
-		mlx5e_page_release_fragmented(rq, &wi->linear_page);
+		mlx5e_page_release_fragmented(rq->page_pool, &wi->linear_page);
 
 		if (xdp_buff_has_frags(&mxbuf.xdp)) {
 			struct mlx5e_frag_page *pagep;
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 06/11] net/mlx5e: SHAMPO: Headers page pool stats
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
                   ` (4 preceding siblings ...)
  2025-01-16 21:55 ` [net-next 05/11] net/mlx5e: SHAMPO: Separate pool for headers Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  2025-01-16 21:55 ` [net-next 07/11] net/mlx5e: Convert over to netmem Saeed Mahameed
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

From: Saeed Mahameed <saeedm@nvidia.com>

Expose the stats of the new headers page pool.
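
When page pool stats are compiled in, the new counters are expected to
show up alongside the existing pp_* counters in ethtool -S output, e.g.
(names come from the stat descriptors below, values are purely
illustrative):

	rx_pp_hd_alloc_fast: 1024
	rx_pp_hd_alloc_slow: 16
	rx_pp_hd_recycle_ring: 980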

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en_stats.c    | 53 +++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/en_stats.h    | 24 +++++++++
 2 files changed, 77 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 611ec4b6f370..a34b829a810b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -208,6 +208,18 @@ static const struct counter_desc sw_stats_desc[] = {
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_ring) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_ring_full) },
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_released_ref) },
+
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_alloc_fast) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_alloc_slow) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_alloc_slow_high_order) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_alloc_empty) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_alloc_refill) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_alloc_waive) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_recycle_cached) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_recycle_cache_full) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_recycle_ring) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_recycle_ring_full) },
+	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_hd_recycle_released_ref) },
 #endif
 #ifdef CONFIG_MLX5_EN_TLS
 	{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_tls_decrypted_packets) },
@@ -389,6 +401,18 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
 	s->rx_pp_recycle_ring			+= rq_stats->pp_recycle_ring;
 	s->rx_pp_recycle_ring_full		+= rq_stats->pp_recycle_ring_full;
 	s->rx_pp_recycle_released_ref		+= rq_stats->pp_recycle_released_ref;
+
+	s->rx_pp_hd_alloc_fast          += rq_stats->pp_hd_alloc_fast;
+	s->rx_pp_hd_alloc_slow          += rq_stats->pp_hd_alloc_slow;
+	s->rx_pp_hd_alloc_empty         += rq_stats->pp_hd_alloc_empty;
+	s->rx_pp_hd_alloc_refill        += rq_stats->pp_hd_alloc_refill;
+	s->rx_pp_hd_alloc_waive         += rq_stats->pp_hd_alloc_waive;
+	s->rx_pp_hd_alloc_slow_high_order	+= rq_stats->pp_hd_alloc_slow_high_order;
+	s->rx_pp_hd_recycle_cached		+= rq_stats->pp_hd_recycle_cached;
+	s->rx_pp_hd_recycle_cache_full		+= rq_stats->pp_hd_recycle_cache_full;
+	s->rx_pp_hd_recycle_ring		+= rq_stats->pp_hd_recycle_ring;
+	s->rx_pp_hd_recycle_ring_full		+= rq_stats->pp_hd_recycle_ring_full;
+	s->rx_pp_hd_recycle_released_ref	+= rq_stats->pp_hd_recycle_released_ref;
 #endif
 #ifdef CONFIG_MLX5_EN_TLS
 	s->rx_tls_decrypted_packets   += rq_stats->tls_decrypted_packets;
@@ -518,6 +542,23 @@ static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
 	rq_stats->pp_recycle_ring = stats.recycle_stats.ring;
 	rq_stats->pp_recycle_ring_full = stats.recycle_stats.ring_full;
 	rq_stats->pp_recycle_released_ref = stats.recycle_stats.released_refcnt;
+
+	pool = c->rq.hd_page_pool;
+	if (!pool || !page_pool_get_stats(pool, &stats))
+		return;
+
+	rq_stats->pp_hd_alloc_fast = stats.alloc_stats.fast;
+	rq_stats->pp_hd_alloc_slow = stats.alloc_stats.slow;
+	rq_stats->pp_hd_alloc_slow_high_order = stats.alloc_stats.slow_high_order;
+	rq_stats->pp_hd_alloc_empty = stats.alloc_stats.empty;
+	rq_stats->pp_hd_alloc_waive = stats.alloc_stats.waive;
+	rq_stats->pp_hd_alloc_refill = stats.alloc_stats.refill;
+
+	rq_stats->pp_hd_recycle_cached = stats.recycle_stats.cached;
+	rq_stats->pp_hd_recycle_cache_full = stats.recycle_stats.cache_full;
+	rq_stats->pp_hd_recycle_ring = stats.recycle_stats.ring;
+	rq_stats->pp_hd_recycle_ring_full = stats.recycle_stats.ring_full;
+	rq_stats->pp_hd_recycle_released_ref = stats.recycle_stats.released_refcnt;
 }
 #else
 static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
@@ -2098,6 +2139,18 @@ static const struct counter_desc rq_stats_desc[] = {
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_ring) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_ring_full) },
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_released_ref) },
+
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_alloc_fast) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_alloc_slow) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_alloc_slow_high_order) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_alloc_empty) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_alloc_refill) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_alloc_waive) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_recycle_cached) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_recycle_cache_full) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_recycle_ring) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_recycle_ring_full) },
+	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_hd_recycle_released_ref) },
 #endif
 #ifdef CONFIG_MLX5_EN_TLS
 	{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, tls_decrypted_packets) },
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 5961c569cfe0..d69071e20083 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -227,6 +227,18 @@ struct mlx5e_sw_stats {
 	u64 rx_pp_recycle_ring;
 	u64 rx_pp_recycle_ring_full;
 	u64 rx_pp_recycle_released_ref;
+
+	u64 rx_pp_hd_alloc_fast;
+	u64 rx_pp_hd_alloc_slow;
+	u64 rx_pp_hd_alloc_slow_high_order;
+	u64 rx_pp_hd_alloc_empty;
+	u64 rx_pp_hd_alloc_refill;
+	u64 rx_pp_hd_alloc_waive;
+	u64 rx_pp_hd_recycle_cached;
+	u64 rx_pp_hd_recycle_cache_full;
+	u64 rx_pp_hd_recycle_ring;
+	u64 rx_pp_hd_recycle_ring_full;
+	u64 rx_pp_hd_recycle_released_ref;
 #endif
 #ifdef CONFIG_MLX5_EN_TLS
 	u64 tx_tls_encrypted_packets;
@@ -393,6 +405,18 @@ struct mlx5e_rq_stats {
 	u64 pp_recycle_ring;
 	u64 pp_recycle_ring_full;
 	u64 pp_recycle_released_ref;
+
+	u64 pp_hd_alloc_fast;
+	u64 pp_hd_alloc_slow;
+	u64 pp_hd_alloc_slow_high_order;
+	u64 pp_hd_alloc_empty;
+	u64 pp_hd_alloc_refill;
+	u64 pp_hd_alloc_waive;
+	u64 pp_hd_recycle_cached;
+	u64 pp_hd_recycle_cache_full;
+	u64 pp_hd_recycle_ring;
+	u64 pp_hd_recycle_ring_full;
+	u64 pp_hd_recycle_released_ref;
 #endif
 #ifdef CONFIG_MLX5_EN_TLS
 	u64 tls_decrypted_packets;
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 07/11] net/mlx5e: Convert over to netmem
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
                   ` (5 preceding siblings ...)
  2025-01-16 21:55 ` [net-next 06/11] net/mlx5e: SHAMPO: Headers page pool stats Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  2025-02-05 20:14   ` Mina Almasry
  2025-01-16 21:55 ` [net-next 08/11] net/mlx5e: Handle iov backed netmems Saeed Mahameed
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

From: Saeed Mahameed <saeedm@nvidia.com>

struct mlx5e_frag_page holds the physical page itself. To naturally
support zc page pools, remove the physical page reference from mlx5 and
replace it with a netmem_ref, so as to avoid internal handling in mlx5
for net_iov backed pages.

No performance degradation observed.
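
The core of the conversion, condensed from the hunks below: the frag
container now stores a netmem_ref, and all accesses go through the
netmem variants of the page pool and skb helpers:

	struct mlx5e_frag_page {
		netmem_ref netmem;
		u16 frags;
	};

	netmem_ref netmem = page_pool_alloc_netmems(pp, GFP_ATOMIC | __GFP_NOWARN);
	dma_addr_t addr   = page_pool_get_dma_addr_netmem(netmem);

	skb_add_rx_frag_netmem(skb, next_frag, netmem, frag_offset, len, truesize);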

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 80 ++++++++++---------
 2 files changed, 43 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 29b9bcecd125..8f4c21f88f78 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -543,7 +543,7 @@ struct mlx5e_icosq {
 } ____cacheline_aligned_in_smp;
 
 struct mlx5e_frag_page {
-	struct page *page;
+	netmem_ref netmem;
 	u16 frags;
 };
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index df561251b30b..b08c2ac10b67 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -273,33 +273,32 @@ static inline u32 mlx5e_decompress_cqes_start(struct mlx5e_rq *rq,
 
 #define MLX5E_PAGECNT_BIAS_MAX (PAGE_SIZE / 64)
 
-static int mlx5e_page_alloc_fragmented(struct page_pool *pool,
+static int mlx5e_page_alloc_fragmented(struct page_pool *pp,
 				       struct mlx5e_frag_page *frag_page)
 {
-	struct page *page;
+	netmem_ref netmem = page_pool_alloc_netmems(pp, GFP_ATOMIC | __GFP_NOWARN);
 
-	page = page_pool_dev_alloc_pages(pool);
-	if (unlikely(!page))
+	if (unlikely(!netmem))
 		return -ENOMEM;
 
-	page_pool_fragment_page(page, MLX5E_PAGECNT_BIAS_MAX);
+	page_pool_fragment_netmem(netmem, MLX5E_PAGECNT_BIAS_MAX);
 
 	*frag_page = (struct mlx5e_frag_page) {
-		.page	= page,
+		.netmem	= netmem,
 		.frags	= 0,
 	};
 
 	return 0;
 }
 
-static void mlx5e_page_release_fragmented(struct page_pool *pool,
+static void mlx5e_page_release_fragmented(struct page_pool *pp,
 					  struct mlx5e_frag_page *frag_page)
 {
 	u16 drain_count = MLX5E_PAGECNT_BIAS_MAX - frag_page->frags;
-	struct page *page = frag_page->page;
+	netmem_ref netmem = frag_page->netmem;
 
-	if (page_pool_unref_page(page, drain_count) == 0)
-		page_pool_put_unrefed_page(pool, page, -1, true);
+	if (page_pool_unref_netmem(netmem, drain_count) == 0)
+		page_pool_put_unrefed_netmem(pp, netmem, -1, true);
 }
 
 static inline int mlx5e_get_rx_frag(struct mlx5e_rq *rq,
@@ -358,7 +357,7 @@ static int mlx5e_alloc_rx_wqe(struct mlx5e_rq *rq, struct mlx5e_rx_wqe_cyc *wqe,
 		frag->flags &= ~BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
 
 		headroom = i == 0 ? rq->buff.headroom : 0;
-		addr = page_pool_get_dma_addr(frag->frag_page->page);
+		addr = page_pool_get_dma_addr_netmem(frag->frag_page->netmem);
 		wqe->data[i].addr = cpu_to_be64(addr + frag->offset + headroom);
 	}
 
@@ -499,9 +498,10 @@ mlx5e_add_skb_shared_info_frag(struct mlx5e_rq *rq, struct skb_shared_info *sinf
 			       struct xdp_buff *xdp, struct mlx5e_frag_page *frag_page,
 			       u32 frag_offset, u32 len)
 {
+	netmem_ref netmem = frag_page->netmem;
 	skb_frag_t *frag;
 
-	dma_addr_t addr = page_pool_get_dma_addr(frag_page->page);
+	dma_addr_t addr = page_pool_get_dma_addr_netmem(netmem);
 
 	dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len, rq->buff.map_dir);
 	if (!xdp_buff_has_frags(xdp)) {
@@ -514,9 +514,9 @@ mlx5e_add_skb_shared_info_frag(struct mlx5e_rq *rq, struct skb_shared_info *sinf
 	}
 
 	frag = &sinfo->frags[sinfo->nr_frags++];
-	skb_frag_fill_page_desc(frag, frag_page->page, frag_offset, len);
+	skb_frag_fill_netmem_desc(frag, netmem, frag_offset, len);
 
-	if (page_is_pfmemalloc(frag_page->page))
+	if (!netmem_is_net_iov(netmem) && page_is_pfmemalloc(netmem_to_page(netmem)))
 		xdp_buff_set_frag_pfmemalloc(xdp);
 	sinfo->xdp_frags_size += len;
 }
@@ -527,27 +527,29 @@ mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
 		   u32 frag_offset, u32 len,
 		   unsigned int truesize)
 {
-	dma_addr_t addr = page_pool_get_dma_addr(frag_page->page);
+	dma_addr_t addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
+	struct page *page = netmem_to_page(frag_page->netmem);
 	u8 next_frag = skb_shinfo(skb)->nr_frags;
 
 	dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
 				rq->buff.map_dir);
 
-	if (skb_can_coalesce(skb, next_frag, frag_page->page, frag_offset)) {
+	if (skb_can_coalesce(skb, next_frag, page, frag_offset)) {
 		skb_coalesce_rx_frag(skb, next_frag - 1, len, truesize);
-	} else {
-		frag_page->frags++;
-		skb_add_rx_frag(skb, next_frag, frag_page->page,
-				frag_offset, len, truesize);
+		return;
 	}
+
+	frag_page->frags++;
+	skb_add_rx_frag_netmem(skb, next_frag, frag_page->netmem,
+			       frag_offset, len, truesize);
 }
 
 static inline void
 mlx5e_copy_skb_header(struct mlx5e_rq *rq, struct sk_buff *skb,
-		      struct page *page, dma_addr_t addr,
+		      netmem_ref netmem, dma_addr_t addr,
 		      int offset_from, int dma_offset, u32 headlen)
 {
-	const void *from = page_address(page) + offset_from;
+	const void *from = netmem_address(netmem) + offset_from;
 	/* Aligning len to sizeof(long) optimizes memcpy performance */
 	unsigned int len = ALIGN(headlen, sizeof(long));
 
@@ -683,7 +685,7 @@ static int mlx5e_build_shampo_hd_umr(struct mlx5e_rq *rq,
 		if (unlikely(err))
 			goto err_unmap;
 
-		addr = page_pool_get_dma_addr(frag_page->page);
+		addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
 
 		for (int j = 0; j < MLX5E_SHAMPO_WQ_HEADER_PER_PAGE; j++) {
 			header_offset = mlx5e_shampo_hd_offset(index++);
@@ -793,7 +795,8 @@ static int mlx5e_alloc_rx_mpwqe(struct mlx5e_rq *rq, u16 ix)
 		err = mlx5e_page_alloc_fragmented(rq->page_pool, frag_page);
 		if (unlikely(err))
 			goto err_unmap;
-		addr = page_pool_get_dma_addr(frag_page->page);
+
+		addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
 		umr_wqe->inline_mtts[i] = (struct mlx5_mtt) {
 			.ptag = cpu_to_be64(addr | MLX5_EN_WR),
 		};
@@ -1213,7 +1216,7 @@ static void *mlx5e_shampo_get_packet_hd(struct mlx5e_rq *rq, u16 header_index)
 	struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index);
 	u16 head_offset = mlx5e_shampo_hd_offset(header_index) + rq->buff.headroom;
 
-	return page_address(frag_page->page) + head_offset;
+	return netmem_address(frag_page->netmem) + head_offset;
 }
 
 static void mlx5e_shampo_update_ipv4_udp_hdr(struct mlx5e_rq *rq, struct iphdr *ipv4)
@@ -1674,11 +1677,11 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi,
 	dma_addr_t addr;
 	u32 frag_size;
 
-	va             = page_address(frag_page->page) + wi->offset;
+	va             = netmem_address(frag_page->netmem) + wi->offset;
 	data           = va + rx_headroom;
 	frag_size      = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 
-	addr = page_pool_get_dma_addr(frag_page->page);
+	addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
 				      frag_size, rq->buff.map_dir);
 	net_prefetch(data);
@@ -1728,10 +1731,10 @@ mlx5e_skb_from_cqe_nonlinear(struct mlx5e_rq *rq, struct mlx5e_wqe_frag_info *wi
 
 	frag_page = wi->frag_page;
 
-	va = page_address(frag_page->page) + wi->offset;
+	va = netmem_address(frag_page->netmem) + wi->offset;
 	frag_consumed_bytes = min_t(u32, frag_info->frag_size, cqe_bcnt);
 
-	addr = page_pool_get_dma_addr(frag_page->page);
+	addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, wi->offset,
 				      rq->buff.frame0_sz, rq->buff.map_dir);
 	net_prefetchw(va); /* xdp_frame data area */
@@ -2001,12 +2004,13 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 
 	if (prog) {
 		/* area for bpf_xdp_[store|load]_bytes */
-		net_prefetchw(page_address(frag_page->page) + frag_offset);
+		net_prefetchw(netmem_address(frag_page->netmem) + frag_offset);
 		if (unlikely(mlx5e_page_alloc_fragmented(rq->page_pool, &wi->linear_page))) {
 			rq->stats->buff_alloc_err++;
 			return NULL;
 		}
-		va = page_address(wi->linear_page.page);
+
+		va = netmem_address(wi->linear_page.netmem);
 		net_prefetchw(va); /* xdp_frame data area */
 		linear_hr = XDP_PACKET_HEADROOM;
 		linear_data_len = 0;
@@ -2111,8 +2115,8 @@ mlx5e_skb_from_cqe_mpwrq_nonlinear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *w
 			while (++pagep < frag_page);
 		}
 		/* copy header */
-		addr = page_pool_get_dma_addr(head_page->page);
-		mlx5e_copy_skb_header(rq, skb, head_page->page, addr,
+		addr = page_pool_get_dma_addr_netmem(head_page->netmem);
+		mlx5e_copy_skb_header(rq, skb, head_page->netmem, addr,
 				      head_offset, head_offset, headlen);
 		/* skb linear part was allocated with headlen and aligned to long */
 		skb->tail += headlen;
@@ -2142,11 +2146,11 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		return NULL;
 	}
 
-	va             = page_address(frag_page->page) + head_offset;
+	va             = netmem_address(frag_page->netmem) + head_offset;
 	data           = va + rx_headroom;
 	frag_size      = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 
-	addr = page_pool_get_dma_addr(frag_page->page);
+	addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
 	dma_sync_single_range_for_cpu(rq->pdev, addr, head_offset,
 				      frag_size, rq->buff.map_dir);
 	net_prefetch(data);
@@ -2185,7 +2189,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 			  struct mlx5_cqe64 *cqe, u16 header_index)
 {
 	struct mlx5e_frag_page *frag_page = mlx5e_shampo_hd_to_frag_page(rq, header_index);
-	dma_addr_t page_dma_addr = page_pool_get_dma_addr(frag_page->page);
+	dma_addr_t page_dma_addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
 	u16 head_offset = mlx5e_shampo_hd_offset(header_index);
 	dma_addr_t dma_addr = page_dma_addr + head_offset;
 	u16 head_size = cqe->shampo.header_size;
@@ -2194,7 +2198,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 	void *hdr, *data;
 	u32 frag_size;
 
-	hdr		= page_address(frag_page->page) + head_offset;
+	hdr		= netmem_address(frag_page->netmem) + head_offset;
 	data		= hdr + rx_headroom;
 	frag_size	= MLX5_SKB_FRAG_SZ(rx_headroom + head_size);
 
@@ -2219,7 +2223,7 @@ mlx5e_skb_from_cqe_shampo(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		}
 
 		net_prefetchw(skb->data);
-		mlx5e_copy_skb_header(rq, skb, frag_page->page, dma_addr,
+		mlx5e_copy_skb_header(rq, skb, frag_page->netmem, dma_addr,
 				      head_offset + rx_headroom,
 				      rx_headroom, head_size);
 		/* skb linear part was allocated with headlen and aligned to long */
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 08/11] net/mlx5e: Handle iov backed netmems
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
                   ` (6 preceding siblings ...)
  2025-01-16 21:55 ` [net-next 07/11] net/mlx5e: Convert over to netmem Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  2025-01-16 21:55 ` [net-next 09/11] net/mlx5e: Add support for UNREADABLE netmem page pools Saeed Mahameed
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

From: Saeed Mahameed <saeedm@nvidia.com>

Special page pools can allocate iov backed netmem. Such netmem pages
are unreachable by the driver, so in those cases don't attempt to access
the pages in the driver. The only affected path is
mlx5e_add_skb_frag()->skb_can_coalesce().

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index b08c2ac10b67..2ac00962c7a3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -528,15 +528,18 @@ mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
 		   unsigned int truesize)
 {
 	dma_addr_t addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
-	struct page *page = netmem_to_page(frag_page->netmem);
 	u8 next_frag = skb_shinfo(skb)->nr_frags;
 
 	dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
 				rq->buff.map_dir);
 
-	if (skb_can_coalesce(skb, next_frag, page, frag_offset)) {
-		skb_coalesce_rx_frag(skb, next_frag - 1, len, truesize);
-		return;
+	if (!netmem_is_net_iov(frag_page->netmem)) {
+		struct page *page = netmem_to_page(frag_page->netmem);
+
+		if (skb_can_coalesce(skb, next_frag, page, frag_offset)) {
+			skb_coalesce_rx_frag(skb, next_frag - 1, len, truesize);
+			return;
+		}
 	}
 
 	frag_page->frags++;
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 09/11] net/mlx5e: Add support for UNREADABLE netmem page pools
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
                   ` (7 preceding siblings ...)
  2025-01-16 21:55 ` [net-next 08/11] net/mlx5e: Handle iov backed netmems Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  2025-01-16 21:55 ` [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap Saeed Mahameed
  2025-01-16 21:55 ` [net-next 11/11] net/mlx5e: Support ethtool tcp-data-split settings Saeed Mahameed
  10 siblings, 0 replies; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

From: Saeed Mahameed <saeedm@nvidia.com>

On netdev_rx_queue_restart, a special type of page pool may be
expected.

In this patch, declare support for UNREADABLE netmem iov pages in the
pool params only when the header data split SHAMPO RQ mode is enabled,
and also set the queue index in the page pool params struct.

SHAMPO mode is a requirement: without header split, rx needs to peek at
the data, so we can't do UNREADABLE_NETMEM.

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 02c9737868b3..340ed7d3feac 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -946,6 +946,11 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		pp_params.netdev    = rq->netdev;
 		pp_params.dma_dir   = rq->buff.map_dir;
 		pp_params.max_len   = PAGE_SIZE;
+		pp_params.queue_idx = rq->ix;
+
+		/* Shampo header data split rx path allows for unreadable netmem */
+		if (test_bit(MLX5E_RQ_STATE_SHAMPO, &rq->state))
+			pp_params.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
 
 		/* page_pool can be used even when there is no rq->xdp_prog,
 		 * given page_pool does not handle DMA mapping there is no
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
                   ` (8 preceding siblings ...)
  2025-01-16 21:55 ` [net-next 09/11] net/mlx5e: Add support for UNREADABLE netmem page pools Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  2025-01-16 23:21   ` Jakub Kicinski
  2025-01-16 21:55 ` [net-next 11/11] net/mlx5e: Support ethtool tcp-data-split settings Saeed Mahameed
  10 siblings, 1 reply; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

From: Saeed Mahameed <saeedm@nvidia.com>

The bulk of the work is done in mlx5e_queue_mem_alloc, where we
allocate and create the new channel resources, similar to
mlx5e_safe_switch_params, but here we do it for a single channel using
the existing params; in effect, we clone the channel.
To swap the old channel with the new one, we deactivate and close the
old channel and then replace it with the new one. Since the swap
procedure doesn't fail in mlx5, we do it all in one place
(mlx5e_queue_start).
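
For context, a rough sketch of how the core's per-queue restart path is
expected to drive the ops registered below (simplified, based only on
the op signatures in this patch; error handling omitted):

	/* restart RX queue i, sketched */
	ops->ndo_queue_mem_alloc(dev, new_mem, i); /* mlx5e_queue_mem_alloc: open a cloned channel */
	ops->ndo_queue_stop(dev, old_mem, i);      /* no-op in mlx5, old channel is closed in start */
	ops->ndo_queue_start(dev, new_mem, i);     /* mlx5e_queue_start: deactivate, swap, activate */
	ops->ndo_queue_mem_free(dev, old_mem);     /* mlx5e_queue_mem_free */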

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c | 96 +++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 340ed7d3feac..1e03f2afe625 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5489,6 +5489,101 @@ static const struct netdev_stat_ops mlx5e_stat_ops = {
 	.get_base_stats      = mlx5e_get_base_stats,
 };
 
+struct mlx5_qmgmt_data {
+	struct mlx5e_channel *c;
+	struct mlx5e_channel_param cparam;
+};
+
+static int mlx5e_queue_mem_alloc(struct net_device *dev, void *newq, int queue_index)
+{
+	struct mlx5_qmgmt_data *new = (struct mlx5_qmgmt_data *)newq;
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_channels *chs = &priv->channels;
+	struct mlx5e_params params = chs->params;
+	struct mlx5_core_dev *mdev;
+	int err;
+
+	ASSERT_RTNL();
+	mutex_lock(&priv->state_lock);
+	if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) {
+		err = -ENODEV;
+		goto unlock;
+	}
+
+	if (queue_index >= chs->num) {
+		err = -ERANGE;
+		goto unlock;
+	}
+
+	if (MLX5E_GET_PFLAG(&chs->params, MLX5E_PFLAG_TX_PORT_TS) ||
+	    chs->params.ptp_rx   ||
+	    chs->params.xdp_prog ||
+	    priv->htb) {
+		netdev_err(priv->netdev,
+			   "Cloning channels with Port/rx PTP, XDP or HTB is not supported\n");
+		err = -EOPNOTSUPP;
+		goto unlock;
+	}
+
+	mdev = mlx5_sd_ch_ix_get_dev(priv->mdev, queue_index);
+	err = mlx5e_build_channel_param(mdev, &params, &new->cparam);
+	if (err)
+		goto unlock;
+
+	err = mlx5e_open_channel(priv, queue_index, &params, NULL, &new->c);
+unlock:
+	mutex_unlock(&priv->state_lock);
+	return err;
+}
+
+static void mlx5e_queue_mem_free(struct net_device *dev, void *mem)
+{
+	struct mlx5_qmgmt_data *data = (struct mlx5_qmgmt_data *)mem;
+
+	/* not supposed to happen since mlx5e_queue_start never fails
+	 * but this is how this should be implemented just in case
+	 */
+	if (data->c)
+		mlx5e_close_channel(data->c);
+}
+
+static int mlx5e_queue_stop(struct net_device *dev, void *oldq, int queue_index)
+{
+	/* mlx5e_queue_start does not fail; the old queue is stopped there */
+	return 0;
+}
+
+static int mlx5e_queue_start(struct net_device *dev, void *newq, int queue_index)
+{
+	struct mlx5_qmgmt_data *new = (struct mlx5_qmgmt_data *)newq;
+	struct mlx5e_priv *priv = netdev_priv(dev);
+	struct mlx5e_channel *old;
+
+	mutex_lock(&priv->state_lock);
+
+	/* stop and close the old */
+	old = priv->channels.c[queue_index];
+	mlx5e_deactivate_priv_channels(priv);
+	/* close old before activating new, to avoid napi conflict */
+	mlx5e_close_channel(old);
+
+	/* start the new */
+	priv->channels.c[queue_index] = new->c;
+	mlx5e_activate_priv_channels(priv);
+	mutex_unlock(&priv->state_lock);
+	return 0;
+}
+
+static const struct netdev_queue_mgmt_ops mlx5e_queue_mgmt_ops = {
+	.ndo_queue_mem_size	=	sizeof(struct mlx5_qmgmt_data),
+	.ndo_queue_mem_alloc	=	mlx5e_queue_mem_alloc,
+	.ndo_queue_mem_free	=	mlx5e_queue_mem_free,
+	.ndo_queue_start	=	mlx5e_queue_start,
+	.ndo_queue_stop		=	mlx5e_queue_stop,
+};
+
 static void mlx5e_build_nic_netdev(struct net_device *netdev)
 {
 	struct mlx5e_priv *priv = netdev_priv(netdev);
@@ -5499,6 +5594,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 	SET_NETDEV_DEV(netdev, mdev->device);
 
 	netdev->netdev_ops = &mlx5e_netdev_ops;
+	netdev->queue_mgmt_ops = &mlx5e_queue_mgmt_ops;
 	netdev->xdp_metadata_ops = &mlx5e_xdp_metadata_ops;
 	netdev->xsk_tx_metadata_ops = &mlx5e_xsk_tx_metadata_ops;
 
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [net-next 11/11] net/mlx5e: Support ethtool tcp-data-split settings
  2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
                   ` (9 preceding siblings ...)
  2025-01-16 21:55 ` [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap Saeed Mahameed
@ 2025-01-16 21:55 ` Saeed Mahameed
  10 siblings, 0 replies; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 21:55 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet
  Cc: Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

From: Saeed Mahameed <saeedm@nvidia.com>

Map the ethtool tcp-data-split ring parameter onto HW GRO: try enabling
HW GRO when the split is requested, reject the request if the device
cannot do HW GRO, and report the current state based on whether SHAMPO
packet merging is active.
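
For reference, a usage example (the interface name is illustrative and
this assumes an ethtool recent enough to expose the tcp-data-split ring
parameter):

  # ethtool -G eth0 tcp-data-split on
  # ethtool -g eth0 | grep -i 'tcp data split'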

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../ethernet/mellanox/mlx5/core/en_ethtool.c  | 49 +++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index cae39198b4db..ee188e033e99 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -349,6 +349,14 @@ void mlx5e_ethtool_get_ringparam(struct mlx5e_priv *priv,
 		(priv->channels.params.packet_merge.type == MLX5E_PACKET_MERGE_SHAMPO) ?
 		ETHTOOL_TCP_DATA_SPLIT_ENABLED :
 		ETHTOOL_TCP_DATA_SPLIT_DISABLED;
+
+	/* If HW GRO is not enabled due to external limitations but is wanted,
+	 * report the HDS state as unknown so it won't get turned off explicitly.
+	 */
+	if (kernel_param->tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_DISABLED &&
+	    priv->netdev->wanted_features & NETIF_F_GRO_HW)
+		kernel_param->tcp_data_split = ETHTOOL_TCP_DATA_SPLIT_UNKNOWN;
+
 }
 
 static void mlx5e_get_ringparam(struct net_device *dev,
@@ -361,6 +369,43 @@ static void mlx5e_get_ringparam(struct net_device *dev,
 	mlx5e_ethtool_get_ringparam(priv, param, kernel_param);
 }
 
+static bool mlx5e_ethtool_set_tcp_data_split(struct mlx5e_priv *priv,
+					     u8 tcp_data_split)
+{
+	bool enable = (tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_ENABLED);
+	struct net_device *dev = priv->netdev;
+
+	if (tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_UNKNOWN)
+		return true;
+
+	if (enable && !(dev->hw_features & NETIF_F_GRO_HW)) {
+		netdev_warn(dev, "TCP-data-split requires HW GRO, which is not supported on this device\n");
+		return false; /* GRO HW is not supported */
+	}
+
+	if (enable && (dev->features & NETIF_F_GRO_HW)) {
+		/* Already enabled */
+		dev->wanted_features |= NETIF_F_GRO_HW;
+		return true;
+	}
+
+	if (!enable && !(dev->features & NETIF_F_GRO_HW)) {
+		/* Already disabled */
+		dev->wanted_features &= ~NETIF_F_GRO_HW;
+		return true;
+	}
+
+	/* Try enable or disable GRO HW */
+	if (enable)
+		dev->wanted_features |= NETIF_F_GRO_HW;
+	else
+		dev->wanted_features &= ~NETIF_F_GRO_HW;
+
+	netdev_change_features(dev);
+
+	return enable == !!(dev->features & NETIF_F_GRO_HW);
+}
+
 int mlx5e_ethtool_set_ringparam(struct mlx5e_priv *priv,
 				struct ethtool_ringparam *param,
 				struct netlink_ext_ack *extack)
@@ -419,6 +464,9 @@ static int mlx5e_set_ringparam(struct net_device *dev,
 {
 	struct mlx5e_priv *priv = netdev_priv(dev);
 
+	if (!mlx5e_ethtool_set_tcp_data_split(priv, kernel_param->tcp_data_split))
+		return -EINVAL;
+
 	return mlx5e_ethtool_set_ringparam(priv, param, extack);
 }
 
@@ -2613,6 +2661,7 @@ const struct ethtool_ops mlx5e_ethtool_ops = {
 				     ETHTOOL_COALESCE_MAX_FRAMES |
 				     ETHTOOL_COALESCE_USE_ADAPTIVE |
 				     ETHTOOL_COALESCE_USE_CQE,
+	.supported_ring_params = ETHTOOL_RING_USE_TCP_DATA_SPLIT,
 	.get_drvinfo       = mlx5e_get_drvinfo,
 	.get_link          = ethtool_op_get_link,
 	.get_link_ext_state  = mlx5e_get_link_ext_state,
-- 
2.48.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap
  2025-01-16 21:55 ` [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap Saeed Mahameed
@ 2025-01-16 23:21   ` Jakub Kicinski
  2025-01-16 23:46     ` Saeed Mahameed
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-01-16 23:21 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea

On Thu, 16 Jan 2025 13:55:28 -0800 Saeed Mahameed wrote:
> +static const struct netdev_queue_mgmt_ops mlx5e_queue_mgmt_ops = {
> +	.ndo_queue_mem_size	=	sizeof(struct mlx5_qmgmt_data),
> +	.ndo_queue_mem_alloc	=	mlx5e_queue_mem_alloc,
> +	.ndo_queue_mem_free	=	mlx5e_queue_mem_free,
> +	.ndo_queue_start	=	mlx5e_queue_start,
> +	.ndo_queue_stop		=	mlx5e_queue_stop,
> +};

We need to pay off some technical debt we accrued before we merge more
queue ops implementations. Specifically the locking needs to move from
under rtnl. Sorry, this is not going in for 6.14.
-- 
pw-bot: defer

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap
  2025-01-16 23:21   ` Jakub Kicinski
@ 2025-01-16 23:46     ` Saeed Mahameed
  2025-01-16 23:54       ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-16 23:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea

On 16 Jan 15:21, Jakub Kicinski wrote:
>On Thu, 16 Jan 2025 13:55:28 -0800 Saeed Mahameed wrote:
>> +static const struct netdev_queue_mgmt_ops mlx5e_queue_mgmt_ops = {
>> +	.ndo_queue_mem_size	=	sizeof(struct mlx5_qmgmt_data),
>> +	.ndo_queue_mem_alloc	=	mlx5e_queue_mem_alloc,
>> +	.ndo_queue_mem_free	=	mlx5e_queue_mem_free,
>> +	.ndo_queue_start	=	mlx5e_queue_start,
>> +	.ndo_queue_stop		=	mlx5e_queue_stop,
>> +};
>
>We need to pay off some technical debt we accrued before we merge more
>queue ops implementations. Specifically the locking needs to move from
>under rtnl. Sorry, this is not going in for 6.14.

What technical debt accrued ? I haven't seen any changes in queue API since
bnxt and gve got merged, what changed since then ?

mlx5 doesn't require rtnl if this is because of the assert, I can remove
it. I don't understand what this series is being deferred for, please
elaborate, what do I need to do to get it accepted ?

Thanks,
Saeed.

>-- 
>pw-bot: defer

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap
  2025-01-16 23:46     ` Saeed Mahameed
@ 2025-01-16 23:54       ` Jakub Kicinski
  2025-01-24  0:39         ` Stanislav Fomichev
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-01-16 23:54 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Saeed Mahameed,
	netdev, Tariq Toukan, Gal Pressman, Leon Romanovsky,
	Dragos Tatulea

On Thu, 16 Jan 2025 15:46:43 -0800 Saeed Mahameed wrote:
> >We need to pay off some technical debt we accrued before we merge more
> >queue ops implementations. Specifically the locking needs to move from
> >under rtnl. Sorry, this is not going in for 6.14.  
> 
> What technical debt accrued ? I haven't seen any changes in queue API since
> bnxt and gve got merged, what changed since then ?
> 
> mlx5 doesn't require rtnl if this is because of the assert, I can remove
> it. I don't understand what this series is being deferred for, please
> elaborate, what do I need to do to get it accepted ?

Remove the dependency on rtnl_lock _in the core kernel_.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap
  2025-01-16 23:54       ` Jakub Kicinski
@ 2025-01-24  0:39         ` Stanislav Fomichev
  2025-01-24  0:55           ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Stanislav Fomichev @ 2025-01-24  0:39 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

On 01/16, Jakub Kicinski wrote:
> On Thu, 16 Jan 2025 15:46:43 -0800 Saeed Mahameed wrote:
> > >We need to pay off some technical debt we accrued before we merge more
> > >queue ops implementations. Specifically the locking needs to move from
> > >under rtnl. Sorry, this is not going in for 6.14.  
> > 
> > What technical debt accrued ? I haven't seen any changes in queue API since
> > bnxt and gve got merged, what changed since then ?
> > 
> > mlx5 doesn't require rtnl if this is because of the assert, I can remove
> > it. I don't understand what this series is being deferred for, please
> > elaborate, what do I need to do to get it accepted ?
> 
> Remove the dependency on rtnl_lock _in the core kernel_.

IIUC, we want queue API to move away from rtnl and use only (new) netdev
lock. Otherwise, removing this dependency in the future might be
complicated. I'll talk to Jakub so we can maybe get something out early
in the next merge window so you can retest the mlx5 changes on top.
Will that work? (unless, Saeed, you want to look into that core locking part
yourself)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap
  2025-01-24  0:39         ` Stanislav Fomichev
@ 2025-01-24  0:55           ` Jakub Kicinski
  2025-01-24  3:11             ` Saeed Mahameed
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-01-24  0:55 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Saeed Mahameed, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

On Thu, 23 Jan 2025 16:39:05 -0800 Stanislav Fomichev wrote:
> > > What technical debt accrued ? I haven't seen any changes in queue API since
> > > bnxt and gve got merged, what changed since then ?
> > > 
> > > mlx5 doesn't require rtnl if this is because of the assert, I can remove
> > > it. I don't understand what this series is being deferred for, please
> > > elaborate, what do I need to do to get it accepted ?  
> > 
> > Remove the dependency on rtnl_lock _in the core kernel_.  
> 
> IIUC, we want queue API to move away from rtnl and use only (new) netdev
> lock. Otherwise, removing this dependency in the future might be
> complicated.

Correct. We only have one driver now which reportedly works (gve).
Let's pull queues under optional netdev_lock protection.
Then we can use queue mgmt op support as a carrot for drivers
to convert / test the netdev_lock protection... "compliance".

I added netdev_lock protection for NAPI before the merge window.
Queues are configured in much more ad-hoc fashion, so I think 
the best way to make queue changes netdev_lock safe would be to
wrap all driver ops which are currently under rtnl_lock with
netdev_lock.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap
  2025-01-24  0:55           ` Jakub Kicinski
@ 2025-01-24  3:11             ` Saeed Mahameed
  2025-01-24 15:26               ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-24  3:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Stanislav Fomichev, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

On 23 Jan 16:55, Jakub Kicinski wrote:
>On Thu, 23 Jan 2025 16:39:05 -0800 Stanislav Fomichev wrote:
>> > > What technical debt accrued ? I haven't seen any changes in queue API since
>> > > bnxt and gve got merged, what changed since then ?
>> > >
>> > > mlx5 doesn't require rtnl if this is because of the assert, I can remove
>> > > it. I don't understand what this series is being deferred for, please
>> > > elaborate, what do I need to do to get it accepted ?
>> >
>> > Remove the dependency on rtnl_lock _in the core kernel_.
>>
>> IIUC, we want queue API to move away from rtnl and use only (new) netdev
>> lock. Otherwise, removing this dependency in the future might be
>> complicated.
>
>Correct. We only have one driver now which reportedly works (gve).
>Let's pull queues under optional netdev_lock protection.
>Then we can use queue mgmt op support as a carrot for drivers
>to convert / test the netdev_lock protection... "compliance".
>
>I added netdev_lock protection for NAPI before the merge window.
>Queues are configured in much more ad-hoc fashion, so I think
>the best way to make queue changes netdev_lock safe would be to
>wrap all driver ops which are currently under rtnl_lock with
>netdev_lock.

Are you expecting drivers to hold netdev_lock internally? 
I was thinking something more scalable, queue_mgmt API to take
netdev_lock,  and any other place in the stack that can access 
"netdev queue config" e.g ethtool/netlink/netdev_ops should grab 
netdev_lock as well, this is better for the future when we want to 
reduce rtnl usage in the stack to protect single netdev ops where
netdev_lock will be sufficient, otherwise you will have to wait for ALL
drivers to properly use netdev_lock internally to even start thinking of
getting rid of rtnl from some parts of the core stack.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap
  2025-01-24  3:11             ` Saeed Mahameed
@ 2025-01-24 15:26               ` Jakub Kicinski
  2025-01-24 19:34                 ` Saeed Mahameed
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2025-01-24 15:26 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Stanislav Fomichev, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

On Thu, 23 Jan 2025 19:11:23 -0800 Saeed Mahameed wrote:
> On 23 Jan 16:55, Jakub Kicinski wrote:
> >> IIUC, we want queue API to move away from rtnl and use only (new) netdev
> >> lock. Otherwise, removing this dependency in the future might be
> >> complicated.  
> >
> >Correct. We only have one driver now which reportedly works (gve).
> >Let's pull queues under optional netdev_lock protection.
> >Then we can use queue mgmt op support as a carrot for drivers
> >to convert / test the netdev_lock protection... "compliance".
> >
> >I added netdev_lock protection for NAPI before the merge window.
> >Queues are configured in much more ad-hoc fashion, so I think
> >the best way to make queue changes netdev_lock safe would be to
> >wrap all driver ops which are currently under rtnl_lock with
> >netdev_lock.  
> 
> Are you expecting drivers to hold netdev_lock internally? 
> I was thinking something more scalable, queue_mgmt API to take
> netdev_lock,  and any other place in the stack that can access 
> "netdev queue config" e.g ethtool/netlink/netdev_ops should grab 
> netdev_lock as well, this is better for the future when we want to 
> reduce rtnl usage in the stack to protect single netdev ops where
> netdev_lock will be sufficient, otherwise you will have to wait for ALL
> drivers to properly use netdev_lock internally to even start thinking of
> getting rid of rtnl from some parts of the core stack.

Agreed, expecting drivers to get the locking right internally is easier
short term but messy long term. I'm thinking opt-in for drivers to have
netdev_lock taken by the core. Probably around all ops which today hold
rtnl_lock, to keep the expectations simple.

net_shaper and queue_mgmt ops can require that drivers that support
them opt-in and these ops can hold just the netdev_lock, no rtnl_lock.
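
Roughly, as a sketch of that shape (purely illustrative; this assumes
the per-netdev instance lock added for NAPI and its helpers):

static int core_queue_start_locked(struct net_device *dev, void *new_mem,
				   int idx)
{
	int err;

	netdev_lock(dev);	/* per-netdev instance lock, no rtnl_lock */
	err = dev->queue_mgmt_ops->ndo_queue_start(dev, new_mem, idx);
	netdev_unlock(dev);

	return err;
}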

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap
  2025-01-24 15:26               ` Jakub Kicinski
@ 2025-01-24 19:34                 ` Saeed Mahameed
  2025-01-27 19:27                   ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Saeed Mahameed @ 2025-01-24 19:34 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Stanislav Fomichev, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

On 24 Jan 07:26, Jakub Kicinski wrote:
>On Thu, 23 Jan 2025 19:11:23 -0800 Saeed Mahameed wrote:
>> On 23 Jan 16:55, Jakub Kicinski wrote:
>> >> IIUC, we want queue API to move away from rtnl and use only (new) netdev
>> >> lock. Otherwise, removing this dependency in the future might be
>> >> complicated.
>> >
>> >Correct. We only have one driver now which reportedly works (gve).
>> >Let's pull queues under optional netdev_lock protection.
>> >Then we can use queue mgmt op support as a carrot for drivers
>> >to convert / test the netdev_lock protection... "compliance".
>> >
>> >I added netdev_lock protection for NAPI before the merge window.
>> >Queues are configured in much more ad-hoc fashion, so I think
>> >the best way to make queue changes netdev_lock safe would be to
>> >wrap all driver ops which are currently under rtnl_lock with
>> >netdev_lock.
>>
>> Are you expecting drivers to hold netdev_lock internally?
>> I was thinking something more scalable, queue_mgmt API to take
>> netdev_lock,  and any other place in the stack that can access
>> "netdev queue config" e.g ethtool/netlink/netdev_ops should grab
>> netdev_lock as well, this is better for the future when we want to
>> reduce rtnl usage in the stack to protect single netdev ops where
>> netdev_lock will be sufficient, otherwise you will have to wait for ALL
>> drivers to properly use netdev_lock internally to even start thinking of
>> getting rid of rtnl from some parts of the core stack.
>
>Agreed, expecting drivers to get the locking right internally is easier
>short term but messy long term. I'm thinking opt-in for drivers to have
>netdev_lock taken by the core. Probably around all ops which today hold
>rtnl_lock, to keep the expectations simple.
>

Why opt-in? I don't see any overhead of taking netdev_lock by default in
rtnl_lock flows.

>net_shaper and queue_mgmt ops can require that drivers that support
>them opt-in and these ops can hold just the netdev_lock, no rtnl_lock.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap
  2025-01-24 19:34                 ` Saeed Mahameed
@ 2025-01-27 19:27                   ` Jakub Kicinski
  0 siblings, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2025-01-27 19:27 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Stanislav Fomichev, David S. Miller, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

On Fri, 24 Jan 2025 11:34:54 -0800 Saeed Mahameed wrote:
> On 24 Jan 07:26, Jakub Kicinski wrote:
> >> Are you expecting drivers to hold netdev_lock internally?
> >> I was thinking something more scalable, queue_mgmt API to take
> >> netdev_lock,  and any other place in the stack that can access
> >> "netdev queue config" e.g ethtool/netlink/netdev_ops should grab
> >> netdev_lock as well, this is better for the future when we want to
> >> reduce rtnl usage in the stack to protect single netdev ops where
> >> netdev_lock will be sufficient, otherwise you will have to wait for ALL
> >> drivers to properly use netdev_lock internally to even start thinking of
> >> getting rid of rtnl from some parts of the core stack.  
> >
> >Agreed, expecting drivers to get the locking right internally is easier
> >short term but messy long term. I'm thinking opt-in for drivers to have
> >netdev_lock taken by the core. Probably around all ops which today hold
> >rtnl_lock, to keep the expectations simple.
> 
> Why opt-in? I don't see any overhead of taking netdev_lock by default in
> rtnl_lock flows.

We could, depends on how close we take the dev lock to the ndo vs to
rtnl_lock. Some drivers may call back into the stack so if we're not
careful enough we'll get flooded by static analysis reports saying 
that we had deadlocked some old Sun driver :(

Then there are SW upper drivers like bonding for which we'll need at
the very least lockdep nesting annotations.
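
For instance (illustrative only, assuming the per-netdev mutex ends up
being dev->lock), the usual way to teach lockdep about a legitimate
upper/lower ordering is a nested acquisition:

	/* upper device (e.g. bonding) taking a lower device's instance lock */
	mutex_lock_nested(&lower_dev->lock, SINGLE_DEPTH_NESTING);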

Would be great to solve all these issues, but IMHO not a hard
requirement, we can at least start with opt in. Unless always
taking the lock gives us some worthwhile invariant I haven't considered?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 07/11] net/mlx5e: Convert over to netmem
  2025-01-16 21:55 ` [net-next 07/11] net/mlx5e: Convert over to netmem Saeed Mahameed
@ 2025-02-05 20:14   ` Mina Almasry
  2025-04-09 12:40     ` Dragos Tatulea
  0 siblings, 1 reply; 23+ messages in thread
From: Mina Almasry @ 2025-02-05 20:14 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, Dragos Tatulea

On Thu, Jan 16, 2025 at 1:56 PM Saeed Mahameed <saeed@kernel.org> wrote:
>
> From: Saeed Mahameed <saeedm@nvidia.com>
>
> mlx5e_page_frag holds the physical page itself, to naturally support
> zc page pools, remove physical page reference from mlx5 and replace it
> with netmem_ref, to avoid internal handling in mlx5 for net_iov backed
> pages.
>
> No performance degradation observed.
>
> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 80 ++++++++++---------
>  2 files changed, 43 insertions(+), 39 deletions(-)
>
...
> @@ -514,9 +514,9 @@ mlx5e_add_skb_shared_info_frag(struct mlx5e_rq *rq, struct skb_shared_info *sinf
>         }
>
>         frag = &sinfo->frags[sinfo->nr_frags++];
> -       skb_frag_fill_page_desc(frag, frag_page->page, frag_offset, len);
> +       skb_frag_fill_netmem_desc(frag, netmem, frag_offset, len);
>
> -       if (page_is_pfmemalloc(frag_page->page))
> +       if (!netmem_is_net_iov(netmem) && page_is_pfmemalloc(netmem_to_page(netmem)))
>                 xdp_buff_set_frag_pfmemalloc(xdp);

Consider using:

netmem_is_pfmemalloc(netmem_ref netmem)

In general we try to avoid netmem_to_page() casts in the driver. These
assumptions may break in the future.
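
I.e., roughly (a sketch of the suggested change against the hunk above):

-       if (!netmem_is_net_iov(netmem) && page_is_pfmemalloc(netmem_to_page(netmem)))
+       if (netmem_is_pfmemalloc(netmem))
                xdp_buff_set_frag_pfmemalloc(xdp);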

>         sinfo->xdp_frags_size += len;
>  }
> @@ -527,27 +527,29 @@ mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
>                    u32 frag_offset, u32 len,
>                    unsigned int truesize)
>  {
> -       dma_addr_t addr = page_pool_get_dma_addr(frag_page->page);
> +       dma_addr_t addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
> +       struct page *page = netmem_to_page(frag_page->netmem);
>         u8 next_frag = skb_shinfo(skb)->nr_frags;
>
>         dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
>                                 rq->buff.map_dir);
>
> -       if (skb_can_coalesce(skb, next_frag, frag_page->page, frag_offset)) {
> +       if (skb_can_coalesce(skb, next_frag, page, frag_offset)) {

Similarly here, consider adding skb_can_coalesce_netmem() that handles
this correctly in core code (which future drivers can reuse) rather
than doing 1-off handling in the driver.

Also, from a quick look at skb_can_coalesce(), I think it can work
fine with netmems? Because it just needs to be converted to use
skb_frag_netmem instead of skb_frag_page() inside of the function, but
otherwise the function looks applicable to netmem for me.

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [net-next 07/11] net/mlx5e: Convert over to netmem
  2025-02-05 20:14   ` Mina Almasry
@ 2025-04-09 12:40     ` Dragos Tatulea
  0 siblings, 0 replies; 23+ messages in thread
From: Dragos Tatulea @ 2025-04-09 12:40 UTC (permalink / raw)
  To: Mina Almasry, Saeed Mahameed
  Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
	Saeed Mahameed, netdev, Tariq Toukan, Gal Pressman,
	Leon Romanovsky, cratiu

On Wed, Feb 05, 2025 at 12:14:08PM -0800, Mina Almasry wrote:
> On Thu, Jan 16, 2025 at 1:56 PM Saeed Mahameed <saeed@kernel.org> wrote:
> >
> > From: Saeed Mahameed <saeedm@nvidia.com>
> >
> > mlx5e_page_frag holds the physical page itself, to naturally support
> > zc page pools, remove physical page reference from mlx5 and replace it
> > with netmem_ref, to avoid internal handling in mlx5 for net_iov backed
> > pages.
> >
> > No performance degradation observed.
> >
> > Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> > Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
> > Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/en.h  |  2 +-
> >  .../net/ethernet/mellanox/mlx5/core/en_rx.c   | 80 ++++++++++---------
> >  2 files changed, 43 insertions(+), 39 deletions(-)
> >
> ...
> > @@ -514,9 +514,9 @@ mlx5e_add_skb_shared_info_frag(struct mlx5e_rq *rq, struct skb_shared_info *sinf
> >         }
> >
> >         frag = &sinfo->frags[sinfo->nr_frags++];
> > -       skb_frag_fill_page_desc(frag, frag_page->page, frag_offset, len);
> > +       skb_frag_fill_netmem_desc(frag, netmem, frag_offset, len);
> >
> > -       if (page_is_pfmemalloc(frag_page->page))
> > +       if (!netmem_is_net_iov(netmem) && page_is_pfmemalloc(netmem_to_page(netmem)))
> >                 xdp_buff_set_frag_pfmemalloc(xdp);
> 
> Consider using:
> 
> netmem_is_pfmemalloc(netmem_ref netmem)
> 
> In general we try to avoid netmem_to_page() casts in the driver. These
> assumptions may break in the future.
>
We will fix this in v2, which we are preparing.

> >         sinfo->xdp_frags_size += len;
> >  }
> > @@ -527,27 +527,29 @@ mlx5e_add_skb_frag(struct mlx5e_rq *rq, struct sk_buff *skb,
> >                    u32 frag_offset, u32 len,
> >                    unsigned int truesize)
> >  {
> > -       dma_addr_t addr = page_pool_get_dma_addr(frag_page->page);
> > +       dma_addr_t addr = page_pool_get_dma_addr_netmem(frag_page->netmem);
> > +       struct page *page = netmem_to_page(frag_page->netmem);
> >         u8 next_frag = skb_shinfo(skb)->nr_frags;
> >
> >         dma_sync_single_for_cpu(rq->pdev, addr + frag_offset, len,
> >                                 rq->buff.map_dir);
> >
> > -       if (skb_can_coalesce(skb, next_frag, frag_page->page, frag_offset)) {
> > +       if (skb_can_coalesce(skb, next_frag, page, frag_offset)) {
> 
> Similarly here, consider adding skb_can_coalesce_netmem() that handles
> this correctly in core code (which future drivers can reuse) rather
> than doing 1-off handling in the driver.
> 
Good point. It is definitely worth adding as coalescing is desirable.

> Also, from a quick look at skb_can_coalesce(), I think it can work
> fine with netmems? Because it just needs to be converted to use
> skb_frag_netmem instead of skb_frag_page() inside of the function, but
> otherwise the function looks applicable to netmem for me.
>
Having an extra skb_can_coalesce_netmems() which can be called
by the driver on the rx path makes sense. I don't think we can drop the
skb_zcopy() check from skb_can_coalesce() as this is also used on the tx
path.
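
Something along these lines perhaps, as a rough sketch (the final name,
placement and whether the rx-side variant keeps the skb_zcopy() check
are still open):

static inline bool skb_can_coalesce_netmem(struct sk_buff *skb, int i,
					   netmem_ref netmem, int off)
{
	if (skb_zcopy(skb))
		return false;

	if (i) {
		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i - 1];

		return netmem == skb_frag_netmem(frag) &&
		       off == skb_frag_off(frag) + skb_frag_size(frag);
	}
	return false;
}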

Thanks,
Dragos

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2025-04-09 12:40 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-16 21:55 [pull request][net-next 00/11] mlx5 updates 2025-01-16 Saeed Mahameed
2025-01-16 21:55 ` [net-next 01/11] net: Kconfig NET_DEVMEM selects GENERIC_ALLOCATOR Saeed Mahameed
2025-01-16 21:55 ` [net-next 02/11] net/mlx5e: SHAMPO: Reorganize mlx5_rq_shampo_alloc Saeed Mahameed
2025-01-16 21:55 ` [net-next 03/11] net/mlx5e: SHAMPO: Remove redundant params Saeed Mahameed
2025-01-16 21:55 ` [net-next 04/11] net/mlx5e: SHAMPO: Improve hw gro capability checking Saeed Mahameed
2025-01-16 21:55 ` [net-next 05/11] net/mlx5e: SHAMPO: Separate pool for headers Saeed Mahameed
2025-01-16 21:55 ` [net-next 06/11] net/mlx5e: SHAMPO: Headers page pool stats Saeed Mahameed
2025-01-16 21:55 ` [net-next 07/11] net/mlx5e: Convert over to netmem Saeed Mahameed
2025-02-05 20:14   ` Mina Almasry
2025-04-09 12:40     ` Dragos Tatulea
2025-01-16 21:55 ` [net-next 08/11] net/mlx5e: Handle iov backed netmems Saeed Mahameed
2025-01-16 21:55 ` [net-next 09/11] net/mlx5e: Add support for UNREADABLE netmem page pools Saeed Mahameed
2025-01-16 21:55 ` [net-next 10/11] net/mlx5e: Implement queue mgmt ops and single channel swap Saeed Mahameed
2025-01-16 23:21   ` Jakub Kicinski
2025-01-16 23:46     ` Saeed Mahameed
2025-01-16 23:54       ` Jakub Kicinski
2025-01-24  0:39         ` Stanislav Fomichev
2025-01-24  0:55           ` Jakub Kicinski
2025-01-24  3:11             ` Saeed Mahameed
2025-01-24 15:26               ` Jakub Kicinski
2025-01-24 19:34                 ` Saeed Mahameed
2025-01-27 19:27                   ` Jakub Kicinski
2025-01-16 21:55 ` [net-next 11/11] net/mlx5e: Support ethtool tcp-data-split settings Saeed Mahameed
