Netdev List
* [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode
@ 2026-05-06 13:32 Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 1/3] net/mlx5: Relax capability check for eswitch query paths Tariq Toukan
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Tariq Toukan @ 2026-05-06 13:32 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Moshe Shemesh, Akiva Goldberger, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea

Hi,

Find a detailed description by Moshe below.

Regards,
Tariq

This series adds driver support for the VHCA_ID page management mode.
When firmware and driver support this mode, ICM (Interconnect Context
Memory) page management uses the device vhca_id as the function
identifier in MANAGE_PAGES, QUERY_PAGES, and page request events instead
of the legacy function_id + ec_function pair.

Background

Firmware can operate page management in two modes:

- FUNC_ID mode (current): Function identity is (function_id, ec_function).
  This remains the default and is used for boot pages and when the new
  mode capability is not set.
- VHCA_ID mode (new): Function identity is vhca_id only; ec_function is
  ignored. This aligns page management with the vhca_id-based model used
  by other firmware commands and simplifies identification on SmartNIC
  and multi-function setups.

---

V2:
- Cache vhca_id to type mapping to provide lockless lookup.
- Store the resolved type on each fw_page at allocation time so reclaim
  and release paths read it directly without any lookup.
- Reorder erasing the old key and inserting the new key when migrating a
  function.
- Fix the comment on mlx5_satisfy_startup_pages().

V1: https://lore.kernel.org/all/20260501044156.260875-1-tariqt@nvidia.com/

Moshe Shemesh (3):
  net/mlx5: Relax capability check for eswitch query paths
  net/mlx5: Make debugfs page counters by function type dynamic
  net/mlx5: Add VHCA_ID page management mode support

 .../net/ethernet/mellanox/mlx5/core/debugfs.c |  39 ++-
 .../ethernet/mellanox/mlx5/core/esw/ipsec.c   |   2 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  49 +++-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |   8 +
 .../mellanox/mlx5/core/eswitch_offloads.c     |  14 +-
 .../net/ethernet/mellanox/mlx5/core/main.c    |  10 +-
 .../ethernet/mellanox/mlx5/core/pagealloc.c   | 250 +++++++++++++-----
 include/linux/mlx5/driver.h                   |   9 +
 8 files changed, 304 insertions(+), 77 deletions(-)


base-commit: 7e0cccae6b45b12eaf71fc3ab8eb133bb50b28ad
-- 
2.44.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next V2 1/3] net/mlx5: Relax capability check for eswitch query paths
  2026-05-06 13:32 [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode Tariq Toukan
@ 2026-05-06 13:32 ` Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 2/3] net/mlx5: Make debugfs page counters by function type dynamic Tariq Toukan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Tariq Toukan @ 2026-05-06 13:32 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Moshe Shemesh, Akiva Goldberger, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea

From: Moshe Shemesh <moshe@nvidia.com>

Several eswitch functions that only query other functions' HCA
capabilities or read cached vport state are guarded by the
vhca_resource_manager capability. That capability is required for
set_hca_cap operations, but query_hca_cap of other functions requires
only the vport_group_manager capability.

Relax the capability check from vhca_resource_manager to
vport_group_manager in the following query-only paths:
- mlx5_esw_vport_caps_get() - queries other function general caps
- esw_ipsec_vf_query_generic() - queries other function ipsec cap
- mlx5_devlink_port_fn_migratable_get() - reads cached vport state
- mlx5_devlink_port_fn_roce_get() - reads cached vport state
- mlx5_devlink_port_fn_max_io_eqs_get() - queries other function caps
- mlx5_esw_vport_enable/disable() - vhca_id map/unmap

Functions that also perform set_hca_cap (migratable_set, roce_set,
max_io_eqs_set, esw_ipsec_vf_set_generic, esw_ipsec_vf_set_bytype)
retain the vhca_resource_manager requirement.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/esw/ipsec.c    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  6 +++---
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 14 ++++++++------
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec.c
index 8b12c3ae0cf7..4811b60ea430 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec.c
@@ -12,7 +12,7 @@ static int esw_ipsec_vf_query_generic(struct mlx5_core_dev *dev, u16 vport_num,
 	void *hca_cap, *query_cap;
 	int err;
 
-	if (!MLX5_CAP_GEN(dev, vhca_resource_manager))
+	if (!MLX5_CAP_GEN(dev, vport_group_manager))
 		return -EOPNOTSUPP;
 
 	if (!mlx5_esw_ipsec_vf_offload_supported(dev)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 66a773a99876..e0eafcf0c52a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -806,7 +806,7 @@ static int mlx5_esw_vport_caps_get(struct mlx5_eswitch *esw, struct mlx5_vport *
 	void *hca_caps;
 	int err;
 
-	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager))
+	if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
 		return 0;
 
 	query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
@@ -938,7 +938,7 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
 		vport->info.trusted = true;
 
 	if (!mlx5_esw_is_manager_vport(esw, vport_num) &&
-	    MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
+	    MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
 		ret = mlx5_esw_vport_vhca_id_map(esw, vport);
 		if (ret)
 			goto err_vhca_mapping;
@@ -976,7 +976,7 @@ void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 		arm_vport_context_events_cmd(esw->dev, vport_num, 0);
 
 	if (!mlx5_esw_is_manager_vport(esw, vport_num) &&
-	    MLX5_CAP_GEN(esw->dev, vhca_resource_manager))
+	    MLX5_CAP_GEN(esw->dev, vport_group_manager))
 		mlx5_esw_vport_vhca_id_unmap(esw, vport);
 
 	if (vport->vport != MLX5_VPORT_HOST_PF &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 69ddf56e2fc9..392d8f364db6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -4677,8 +4677,9 @@ int mlx5_devlink_port_fn_migratable_get(struct devlink_port *port, bool *is_enab
 		return -EOPNOTSUPP;
 	}
 
-	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
-		NL_SET_ERR_MSG_MOD(extack, "Device doesn't support VHCA management");
+	if (!MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Device doesn't support vport group management");
 		return -EOPNOTSUPP;
 	}
 
@@ -4753,8 +4754,9 @@ int mlx5_devlink_port_fn_roce_get(struct devlink_port *port, bool *is_enabled,
 	struct mlx5_eswitch *esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
 	struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
 
-	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
-		NL_SET_ERR_MSG_MOD(extack, "Device doesn't support VHCA management");
+	if (!MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Device doesn't support vport group management");
 		return -EOPNOTSUPP;
 	}
 
@@ -5076,9 +5078,9 @@ mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, u32 *max_io_eqs,
 	int err;
 
 	esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
-	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
+	if (!MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
 		NL_SET_ERR_MSG_MOD(extack,
-				   "Device doesn't support VHCA management");
+				   "Device doesn't support vport group management");
 		return -EOPNOTSUPP;
 	}
 
-- 
2.44.0



* [PATCH net-next V2 2/3] net/mlx5: Make debugfs page counters by function type dynamic
  2026-05-06 13:32 [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 1/3] net/mlx5: Relax capability check for eswitch query paths Tariq Toukan
@ 2026-05-06 13:32 ` Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support Tariq Toukan
  2026-05-09  2:10 ` [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode patchwork-bot+netdevbpf
  3 siblings, 0 replies; 6+ messages in thread
From: Tariq Toukan @ 2026-05-06 13:32 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Moshe Shemesh, Akiva Goldberger, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea

From: Moshe Shemesh <moshe@nvidia.com>

Add the per-function-type debugfs page counters dynamically, after
mlx5_eswitch_init(). When page management operates in VHCA_ID mode, only
a function acting as eSwitch or vport manager can initialize the eSwitch
structure and translate a vhca_id to the function type of the functions
it supplies pages to. The next patch adds support for page management in
VHCA_ID mode.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/debugfs.c | 39 +++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/main.c    |  7 +++-
 include/linux/mlx5/driver.h                   |  2 +
 3 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
index 8fe263190d38..6347957fefcb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
@@ -285,10 +285,6 @@ void mlx5_pages_debugfs_init(struct mlx5_core_dev *dev)
 	pages = dev->priv.dbg.pages_debugfs;
 
 	debugfs_create_u32("fw_pages_total", 0400, pages, &dev->priv.fw_pages);
-	debugfs_create_u32("fw_pages_vfs", 0400, pages, &dev->priv.page_counters[MLX5_VF]);
-	debugfs_create_u32("fw_pages_ec_vfs", 0400, pages, &dev->priv.page_counters[MLX5_EC_VF]);
-	debugfs_create_u32("fw_pages_sfs", 0400, pages, &dev->priv.page_counters[MLX5_SF]);
-	debugfs_create_u32("fw_pages_host_pf", 0400, pages, &dev->priv.page_counters[MLX5_HOST_PF]);
 	debugfs_create_u32("fw_pages_alloc_failed", 0400, pages, &dev->priv.fw_pages_alloc_failed);
 	debugfs_create_u32("fw_pages_give_dropped", 0400, pages, &dev->priv.give_pages_dropped);
 	debugfs_create_u32("fw_pages_reclaim_discard", 0400, pages,
@@ -300,6 +296,41 @@ void mlx5_pages_debugfs_cleanup(struct mlx5_core_dev *dev)
 	debugfs_remove_recursive(dev->priv.dbg.pages_debugfs);
 }
 
+void mlx5_pages_by_func_type_debugfs_init(struct mlx5_core_dev *dev)
+{
+	struct dentry *pages = dev->priv.dbg.pages_debugfs;
+
+	if (!pages)
+		return;
+
+	if (!dev->priv.eswitch &&
+	    MLX5_CAP_GEN(dev, icm_mng_function_id_mode) ==
+	    MLX5_ID_MODE_FUNCTION_VHCA_ID)
+		return;
+
+	debugfs_create_u32("fw_pages_vfs", 0400, pages,
+			   &dev->priv.page_counters[MLX5_VF]);
+	debugfs_create_u32("fw_pages_ec_vfs", 0400, pages,
+			   &dev->priv.page_counters[MLX5_EC_VF]);
+	debugfs_create_u32("fw_pages_sfs", 0400, pages,
+			   &dev->priv.page_counters[MLX5_SF]);
+	debugfs_create_u32("fw_pages_host_pf", 0400, pages,
+			   &dev->priv.page_counters[MLX5_HOST_PF]);
+}
+
+void mlx5_pages_by_func_type_debugfs_cleanup(struct mlx5_core_dev *dev)
+{
+	struct dentry *pages = dev->priv.dbg.pages_debugfs;
+
+	if (!pages)
+		return;
+
+	debugfs_lookup_and_remove("fw_pages_vfs", pages);
+	debugfs_lookup_and_remove("fw_pages_ec_vfs", pages);
+	debugfs_lookup_and_remove("fw_pages_sfs", pages);
+	debugfs_lookup_and_remove("fw_pages_host_pf", pages);
+}
+
 static u64 qp_read_field(struct mlx5_core_dev *dev, struct mlx5_core_qp *qp,
 			 int index, int *is_str)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index b1b9ebfd3866..0c1c906b60fa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -987,11 +987,12 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 		mlx5_core_err(dev, "Failed to init eswitch %d\n", err);
 		goto err_sriov_cleanup;
 	}
+	mlx5_pages_by_func_type_debugfs_init(dev);
 
 	err = mlx5_fpga_init(dev);
 	if (err) {
 		mlx5_core_err(dev, "Failed to init fpga device %d\n", err);
-		goto err_eswitch_cleanup;
+		goto err_page_debugfs_cleanup;
 	}
 
 	err = mlx5_vhca_event_init(dev);
@@ -1034,7 +1035,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 	mlx5_vhca_event_cleanup(dev);
 err_fpga_cleanup:
 	mlx5_fpga_cleanup(dev);
-err_eswitch_cleanup:
+err_page_debugfs_cleanup:
+	mlx5_pages_by_func_type_debugfs_cleanup(dev);
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
 err_sriov_cleanup:
 	mlx5_sriov_cleanup(dev);
@@ -1072,6 +1074,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 	mlx5_sf_hw_table_cleanup(dev);
 	mlx5_vhca_event_cleanup(dev);
 	mlx5_fpga_cleanup(dev);
+	mlx5_pages_by_func_type_debugfs_cleanup(dev);
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
 	mlx5_sriov_cleanup(dev);
 	mlx5_mpfs_cleanup(dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 531ce66fc8ef..d1751c5d01c7 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1039,6 +1039,8 @@ void mlx5_pagealloc_start(struct mlx5_core_dev *dev);
 void mlx5_pagealloc_stop(struct mlx5_core_dev *dev);
 void mlx5_pages_debugfs_init(struct mlx5_core_dev *dev);
 void mlx5_pages_debugfs_cleanup(struct mlx5_core_dev *dev);
+void mlx5_pages_by_func_type_debugfs_init(struct mlx5_core_dev *dev);
+void mlx5_pages_by_func_type_debugfs_cleanup(struct mlx5_core_dev *dev);
 int mlx5_satisfy_startup_pages(struct mlx5_core_dev *dev, int boot);
 int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev);
 void mlx5_register_debugfs(void);
-- 
2.44.0



* [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support
  2026-05-06 13:32 [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 1/3] net/mlx5: Relax capability check for eswitch query paths Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 2/3] net/mlx5: Make debugfs page counters by function type dynamic Tariq Toukan
@ 2026-05-06 13:32 ` Tariq Toukan
  2026-05-07 15:39   ` Moshe Shemesh
  2026-05-09  2:10 ` [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode patchwork-bot+netdevbpf
  3 siblings, 1 reply; 6+ messages in thread
From: Tariq Toukan @ 2026-05-06 13:32 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Moshe Shemesh, Akiva Goldberger, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea

From: Moshe Shemesh <moshe@nvidia.com>

Add support for VHCA_ID-based page management mode. When the device
firmware advertises the icm_mng_function_id_mode capability with
MLX5_ID_MODE_FUNCTION_VHCA_ID, page management operations between the
driver and firmware may use vhca_id instead of function_id as the
effective function identifier, and the ec_function field is ignored.

Update the page management commands to set the ec_function field only in
FUNC_ID mode. Boot page allocation always uses FUNC_ID mode semantics
for backward compatibility, as the capability bit is available only
after set_hca_cap(). If VHCA_ID mode is enabled after set_hca_cap(),
switch the tracking of the boot pages in page_root_xa to use vhca_id as
well.

Add mlx5_esw_vhca_id_to_func_type() to resolve the function type in
VHCA_ID mode, enabling per-type debugfs counters. Use a dedicated
vhca_type_map xarray to provide lockless lookup. Store the resolved
type on each fw_page at allocation time so that the reclaim and release
paths read it directly, without any lookup.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  45 +++-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |   8 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   3 +
 .../ethernet/mellanox/mlx5/core/pagealloc.c   | 250 +++++++++++++-----
 include/linux/mlx5/driver.h                   |   7 +
 5 files changed, 251 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index e0eafcf0c52a..125129ef43e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -852,6 +852,38 @@ bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
 	return true;
 }
 
+static enum mlx5_func_type
+esw_vport_to_func_type(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
+{
+	u16 vport_num = vport->vport;
+
+	if (vport_num == MLX5_VPORT_HOST_PF)
+		return MLX5_HOST_PF;
+	if (xa_get_mark(&esw->vports, vport_num, MLX5_ESW_VPT_SF))
+		return MLX5_SF;
+	if (xa_get_mark(&esw->vports, vport_num, MLX5_ESW_VPT_VF))
+		return MLX5_VF;
+	return MLX5_EC_VF;
+}
+
+u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
+{
+	struct mlx5_eswitch *esw = dev->priv.eswitch;
+	void *entry;
+
+	if (vhca_id == MLX5_CAP_GEN(dev, vhca_id))
+		return MLX5_SELF;
+
+	if (!esw)
+		return MLX5_FUNC_TYPE_NONE;
+
+	entry = xa_load(&esw->vhca_type_map, vhca_id);
+	if (entry)
+		return xa_to_value(entry);
+
+	return MLX5_FUNC_TYPE_NONE;
+}
+
 static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 {
 	bool vst_mode_steering = esw_vst_mode_is_steering(esw);
@@ -942,6 +974,11 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
 		ret = mlx5_esw_vport_vhca_id_map(esw, vport);
 		if (ret)
 			goto err_vhca_mapping;
+		ret = xa_insert(&esw->vhca_type_map, vport->vhca_id,
+				xa_mk_value(esw_vport_to_func_type(esw, vport)),
+				GFP_KERNEL);
+		if (ret)
+			goto err_type_map;
 	}
 
 	esw_vport_change_handle_locked(vport);
@@ -952,6 +989,8 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
 	mutex_unlock(&esw->state_lock);
 	return ret;
 
+err_type_map:
+	mlx5_esw_vport_vhca_id_unmap(esw, vport);
 err_vhca_mapping:
 	esw_vport_cleanup(esw, vport);
 	mutex_unlock(&esw->state_lock);
@@ -976,8 +1015,10 @@ void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 		arm_vport_context_events_cmd(esw->dev, vport_num, 0);
 
 	if (!mlx5_esw_is_manager_vport(esw, vport_num) &&
-	    MLX5_CAP_GEN(esw->dev, vport_group_manager))
+	    MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
+		xa_erase(&esw->vhca_type_map, vport->vhca_id);
 		mlx5_esw_vport_vhca_id_unmap(esw, vport);
+	}
 
 	if (vport->vport != MLX5_VPORT_HOST_PF &&
 	    (vport->info.ipsec_crypto_enabled || vport->info.ipsec_packet_enabled))
@@ -2084,6 +2125,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 	atomic64_set(&esw->offloads.num_flows, 0);
 	ida_init(&esw->offloads.vport_metadata_ida);
 	xa_init_flags(&esw->offloads.vhca_map, XA_FLAGS_ALLOC);
+	xa_init(&esw->vhca_type_map);
 	mutex_init(&esw->state_lock);
 	init_rwsem(&esw->mode_lock);
 	refcount_set(&esw->qos.refcnt, 0);
@@ -2133,6 +2175,7 @@ void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw)
 	mutex_destroy(&esw->state_lock);
 	WARN_ON(!xa_empty(&esw->offloads.vhca_map));
 	xa_destroy(&esw->offloads.vhca_map);
+	xa_destroy(&esw->vhca_type_map);
 	ida_destroy(&esw->offloads.vport_metadata_ida);
 	mlx5e_mod_hdr_tbl_destroy(&esw->offloads.mod_hdr);
 	mutex_destroy(&esw->offloads.encap_tbl_lock);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 2fd601bd102f..b06d097824ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -373,6 +373,7 @@ struct mlx5_eswitch {
 	struct dentry *debugfs_root;
 	struct workqueue_struct *work_queue;
 	struct xarray vports;
+	struct xarray vhca_type_map;
 	u32 flags;
 	int                     total_vports;
 	int                     enabled_vports;
@@ -863,6 +864,7 @@ void mlx5_esw_vport_vhca_id_unmap(struct mlx5_eswitch *esw,
 				  struct mlx5_vport *vport);
 int mlx5_eswitch_vhca_id_to_vport(struct mlx5_eswitch *esw, u16 vhca_id, u16 *vport_num);
 bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id);
+u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id);
 
 void mlx5_esw_offloads_rep_remove(struct mlx5_eswitch *esw,
 				  const struct mlx5_vport *vport);
@@ -1034,6 +1036,12 @@ mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
 	return false;
 }
 
+static inline u16
+mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
+{
+	return MLX5_FUNC_TYPE_NONE;
+}
+
 static inline void
 mlx5_eswitch_safe_aux_devs_remove(struct mlx5_core_dev *dev) {}
 static inline struct mlx5_flow_handle *
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0c1c906b60fa..296c5223cf61 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -597,6 +597,9 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)
 	if (MLX5_CAP_GEN_MAX(dev, release_all_pages))
 		MLX5_SET(cmd_hca_cap, set_hca_cap, release_all_pages, 1);
 
+	if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode))
+		MLX5_SET(cmd_hca_cap, set_hca_cap, icm_mng_function_id_mode, 1);
+
 	if (MLX5_CAP_GEN_MAX(dev, mkey_by_name))
 		MLX5_SET(cmd_hca_cap, set_hca_cap, mkey_by_name, 1);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index 77ffa31cc505..ce2f7fa9bd48 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -38,6 +38,7 @@
 #include "mlx5_core.h"
 #include "lib/eq.h"
 #include "lib/tout.h"
+#include "eswitch.h"
 
 enum {
 	MLX5_PAGES_CANT_GIVE	= 0,
@@ -59,6 +60,7 @@ struct fw_page {
 	u64			addr;
 	struct page	       *page;
 	u32			function;
+	u16			func_type;
 	unsigned long		bitmask;
 	struct list_head	list;
 	unsigned int free_count;
@@ -69,9 +71,24 @@ enum {
 	MLX5_NUM_4K_IN_PAGE		= PAGE_SIZE / MLX5_ADAPTER_PAGE_SIZE,
 };
 
-static u32 get_function(u16 func_id, bool ec_function)
+static bool mlx5_page_mgt_mode_is_vhca_id(const struct mlx5_core_dev *dev)
 {
-	return (u32)func_id | (ec_function << 16);
+	return dev->priv.page_mgt_mode == MLX5_PAGE_MGT_MODE_VHCA_ID;
+}
+
+static void mlx5_page_mgt_mode_set(struct mlx5_core_dev *dev,
+				   enum mlx5_page_mgt_mode mode)
+{
+	dev->priv.page_mgt_mode = mode;
+}
+
+static u32 get_function_key(struct mlx5_core_dev *dev, u16 func_vhca_id,
+			    bool ec_function)
+{
+	if (mlx5_page_mgt_mode_is_vhca_id(dev))
+		return (u32)func_vhca_id;
+
+	return (u32)func_vhca_id | (ec_function << 16);
 }
 
 static u16 func_id_to_type(struct mlx5_core_dev *dev, u16 func_id, bool ec_function)
@@ -89,12 +106,21 @@ static u16 func_id_to_type(struct mlx5_core_dev *dev, u16 func_id, bool ec_funct
 	return MLX5_SF;
 }
 
+static u16 func_vhca_id_to_type(struct mlx5_core_dev *dev, u16 func_vhca_id,
+				bool ec_function)
+{
+	if (mlx5_page_mgt_mode_is_vhca_id(dev))
+		return mlx5_esw_vhca_id_to_func_type(dev, func_vhca_id);
+
+	return func_id_to_type(dev, func_vhca_id, ec_function);
+}
+
 static u32 mlx5_get_ec_function(u32 function)
 {
 	return function >> 16;
 }
 
-static u32 mlx5_get_func_id(u32 function)
+static u32 mlx5_get_func_vhca_id(u32 function)
 {
 	return function & 0xffff;
 }
@@ -123,7 +149,8 @@ static struct rb_root *page_root_per_function(struct mlx5_core_dev *dev, u32 fun
 	return root;
 }
 
-static int insert_page(struct mlx5_core_dev *dev, u64 addr, struct page *page, u32 function)
+static int insert_page(struct mlx5_core_dev *dev, u64 addr, struct page *page,
+		       u32 function, u16 func_type)
 {
 	struct rb_node *parent = NULL;
 	struct rb_root *root;
@@ -156,6 +183,7 @@ static int insert_page(struct mlx5_core_dev *dev, u64 addr, struct page *page, u
 	nfp->addr = addr;
 	nfp->page = page;
 	nfp->function = function;
+	nfp->func_type = func_type;
 	nfp->free_count = MLX5_NUM_4K_IN_PAGE;
 	for (i = 0; i < MLX5_NUM_4K_IN_PAGE; i++)
 		set_bit(i, &nfp->bitmask);
@@ -196,7 +224,7 @@ static struct fw_page *find_fw_page(struct mlx5_core_dev *dev, u64 addr,
 	return result;
 }
 
-static int mlx5_cmd_query_pages(struct mlx5_core_dev *dev, u16 *func_id,
+static int mlx5_cmd_query_pages(struct mlx5_core_dev *dev, u16 *func_vhca_id,
 				s32 *npages, int boot)
 {
 	u32 out[MLX5_ST_SZ_DW(query_pages_out)] = {};
@@ -207,14 +235,20 @@ static int mlx5_cmd_query_pages(struct mlx5_core_dev *dev, u16 *func_id,
 	MLX5_SET(query_pages_in, in, op_mod, boot ?
 		 MLX5_QUERY_PAGES_IN_OP_MOD_BOOT_PAGES :
 		 MLX5_QUERY_PAGES_IN_OP_MOD_INIT_PAGES);
-	MLX5_SET(query_pages_in, in, embedded_cpu_function, mlx5_core_is_ecpf(dev));
+
+	if (mlx5_page_mgt_mode_is_vhca_id(dev))
+		MLX5_SET(query_pages_in, in, function_id,
+			 MLX5_CAP_GEN(dev, vhca_id));
+	else
+		MLX5_SET(query_pages_in, in, embedded_cpu_function,
+			 mlx5_core_is_ecpf(dev));
 
 	err = mlx5_cmd_exec_inout(dev, query_pages, in, out);
 	if (err)
 		return err;
 
 	*npages = MLX5_GET(query_pages_out, out, num_pages);
-	*func_id = MLX5_GET(query_pages_out, out, function_id);
+	*func_vhca_id = MLX5_GET(query_pages_out, out, function_id);
 
 	return err;
 }
@@ -245,6 +279,10 @@ static int alloc_4k(struct mlx5_core_dev *dev, u64 *addr, u32 function)
 	if (!fp->free_count)
 		list_del(&fp->list);
 
+	if (fp->func_type != MLX5_FUNC_TYPE_NONE)
+		dev->priv.page_counters[fp->func_type]++;
+	dev->priv.fw_pages++;
+
 	*addr = fp->addr + n * MLX5_ADAPTER_PAGE_SIZE;
 
 	return 0;
@@ -280,6 +318,11 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function)
 		mlx5_core_warn_rl(dev, "page not found\n");
 		return;
 	}
+
+	if (fwp->func_type != MLX5_FUNC_TYPE_NONE)
+		dev->priv.page_counters[fwp->func_type]--;
+	dev->priv.fw_pages--;
+
 	n = (addr & ~MLX5_U64_4K_PAGE_MASK) >> MLX5_ADAPTER_PAGE_SHIFT;
 	fwp->free_count++;
 	set_bit(n, &fwp->bitmask);
@@ -289,7 +332,8 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function)
 		list_add(&fwp->list, &dev->priv.free_list);
 }
 
-static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
+static int alloc_system_page(struct mlx5_core_dev *dev, u32 function,
+			     u16 func_type)
 {
 	struct device *device = mlx5_core_dma_dev(dev);
 	int nid = dev->priv.numa_node;
@@ -317,7 +361,7 @@ static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
 		goto map;
 	}
 
-	err = insert_page(dev, addr, page, function);
+	err = insert_page(dev, addr, page, function, func_type);
 	if (err) {
 		mlx5_core_err(dev, "failed to track allocated page\n");
 		dma_unmap_page(device, addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
@@ -334,7 +378,7 @@ static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
 	return err;
 }
 
-static void page_notify_fail(struct mlx5_core_dev *dev, u16 func_id,
+static void page_notify_fail(struct mlx5_core_dev *dev, u16 func_vhca_id,
 			     bool ec_function)
 {
 	u32 in[MLX5_ST_SZ_DW(manage_pages_in)] = {};
@@ -342,19 +386,23 @@ static void page_notify_fail(struct mlx5_core_dev *dev, u16 func_id,
 
 	MLX5_SET(manage_pages_in, in, opcode, MLX5_CMD_OP_MANAGE_PAGES);
 	MLX5_SET(manage_pages_in, in, op_mod, MLX5_PAGES_CANT_GIVE);
-	MLX5_SET(manage_pages_in, in, function_id, func_id);
-	MLX5_SET(manage_pages_in, in, embedded_cpu_function, ec_function);
+	MLX5_SET(manage_pages_in, in, function_id, func_vhca_id);
+
+	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
+		MLX5_SET(manage_pages_in, in, embedded_cpu_function,
+			 ec_function);
 
 	err = mlx5_cmd_exec_in(dev, manage_pages, in);
 	if (err)
-		mlx5_core_warn(dev, "page notify failed func_id(%d) err(%d)\n",
-			       func_id, err);
+		mlx5_core_warn(dev,
+			       "page notify failed func_vhca_id(%d) err(%d)\n",
+			       func_vhca_id, err);
 }
 
-static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
+static int give_pages(struct mlx5_core_dev *dev, u16 func_vhca_id, int npages,
 		      int event, bool ec_function)
 {
-	u32 function = get_function(func_id, ec_function);
+	u32 function = get_function_key(dev, func_vhca_id, ec_function);
 	u32 out[MLX5_ST_SZ_DW(manage_pages_out)] = {0};
 	int inlen = MLX5_ST_SZ_BYTES(manage_pages_in);
 	int notify_fail = event;
@@ -364,6 +412,8 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 	u32 *in;
 	int i;
 
+	func_type = func_vhca_id_to_type(dev, func_vhca_id, ec_function);
+
 	inlen += npages * MLX5_FLD_SZ_BYTES(manage_pages_in, pas[0]);
 	in = kvzalloc(inlen, GFP_KERNEL);
 	if (!in) {
@@ -377,7 +427,8 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 		err = alloc_4k(dev, &addr, function);
 		if (err) {
 			if (err == -ENOMEM)
-				err = alloc_system_page(dev, function);
+				err = alloc_system_page(dev, function,
+							func_type);
 			if (err) {
 				dev->priv.fw_pages_alloc_failed += (npages - i);
 				goto out_4k;
@@ -390,9 +441,12 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 
 	MLX5_SET(manage_pages_in, in, opcode, MLX5_CMD_OP_MANAGE_PAGES);
 	MLX5_SET(manage_pages_in, in, op_mod, MLX5_PAGES_GIVE);
-	MLX5_SET(manage_pages_in, in, function_id, func_id);
+	MLX5_SET(manage_pages_in, in, function_id, func_vhca_id);
 	MLX5_SET(manage_pages_in, in, input_num_entries, npages);
-	MLX5_SET(manage_pages_in, in, embedded_cpu_function, ec_function);
+
+	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
+		MLX5_SET(manage_pages_in, in, embedded_cpu_function,
+			 ec_function);
 
 	err = mlx5_cmd_do(dev, in, inlen, out, sizeof(out));
 	if (err == -EREMOTEIO) {
@@ -405,17 +459,15 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 	}
 	err = mlx5_cmd_check(dev, err, in, out);
 	if (err) {
-		mlx5_core_warn(dev, "func_id 0x%x, npages %d, err %d\n",
-			       func_id, npages, err);
+		mlx5_core_warn(dev, "func_vhca_id 0x%x, npages %d, err %d\n",
+			       func_vhca_id, npages, err);
 		goto out_dropped;
 	}
 
-	func_type = func_id_to_type(dev, func_id, ec_function);
-	dev->priv.page_counters[func_type] += npages;
-	dev->priv.fw_pages += npages;
-
-	mlx5_core_dbg(dev, "npages %d, ec_function %d, func_id 0x%x, err %d\n",
-		      npages, ec_function, func_id, err);
+	mlx5_core_dbg(dev,
+		      "npages %d, ec_function %d, func 0x%x, mode %d, err %d\n",
+		      npages, ec_function, func_vhca_id,
+		      mlx5_page_mgt_mode_is_vhca_id(dev), err);
 
 	kvfree(in);
 	return 0;
@@ -428,18 +480,17 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 out_free:
 	kvfree(in);
 	if (notify_fail)
-		page_notify_fail(dev, func_id, ec_function);
+		page_notify_fail(dev, func_vhca_id, ec_function);
 	return err;
 }
 
-static void release_all_pages(struct mlx5_core_dev *dev, u16 func_id,
+static void release_all_pages(struct mlx5_core_dev *dev, u16 func_vhca_id,
 			      bool ec_function)
 {
-	u32 function = get_function(func_id, ec_function);
+	u32 function = get_function_key(dev, func_vhca_id, ec_function);
 	struct rb_root *root;
 	struct rb_node *p;
 	int npages = 0;
-	u16 func_type;
 
 	root = xa_load(&dev->priv.page_root_xa, function);
 	if (WARN_ON_ONCE(!root))
@@ -448,18 +499,20 @@ static void release_all_pages(struct mlx5_core_dev *dev, u16 func_id,
 	p = rb_first(root);
 	while (p) {
 		struct fw_page *fwp = rb_entry(p, struct fw_page, rb_node);
+		int used = MLX5_NUM_4K_IN_PAGE - fwp->free_count;
 
 		p = rb_next(p);
-		npages += (MLX5_NUM_4K_IN_PAGE - fwp->free_count);
+		npages += used;
+		if (fwp->func_type != MLX5_FUNC_TYPE_NONE)
+			dev->priv.page_counters[fwp->func_type] -= used;
 		free_fwp(dev, fwp, fwp->free_count);
 	}
 
-	func_type = func_id_to_type(dev, func_id, ec_function);
-	dev->priv.page_counters[func_type] -= npages;
 	dev->priv.fw_pages -= npages;
 
-	mlx5_core_dbg(dev, "npages %d, ec_function %d, func_id 0x%x\n",
-		      npages, ec_function, func_id);
+	mlx5_core_dbg(dev, "npages %d, ec_function %d, func 0x%x, mode %d\n",
+		      npages, ec_function, func_vhca_id,
+		      mlx5_page_mgt_mode_is_vhca_id(dev));
 }
 
 static u32 fwp_fill_manage_pages_out(struct fw_page *fwp, u32 *out, u32 index,
@@ -487,7 +540,7 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
 	struct fw_page *fwp;
 	struct rb_node *p;
 	bool ec_function;
-	u32 func_id;
+	u32 func_vhca_id;
 	u32 npages;
 	u32 i = 0;
 	int err;
@@ -499,10 +552,11 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
 
 	/* No hard feelings, we want our pages back! */
 	npages = MLX5_GET(manage_pages_in, in, input_num_entries);
-	func_id = MLX5_GET(manage_pages_in, in, function_id);
+	func_vhca_id = MLX5_GET(manage_pages_in, in, function_id);
 	ec_function = MLX5_GET(manage_pages_in, in, embedded_cpu_function);
 
-	root = xa_load(&dev->priv.page_root_xa, get_function(func_id, ec_function));
+	root = xa_load(&dev->priv.page_root_xa,
+		       get_function_key(dev, func_vhca_id, ec_function));
 	if (WARN_ON_ONCE(!root))
 		return -EEXIST;
 
@@ -518,14 +572,14 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
 	return 0;
 }
 
-static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
-			 int *nclaimed, bool event, bool ec_function)
+static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_vhca_id,
+			 int npages, int *nclaimed, bool event,
+			 bool ec_function)
 {
-	u32 function = get_function(func_id, ec_function);
+	u32 function = get_function_key(dev, func_vhca_id, ec_function);
 	int outlen = MLX5_ST_SZ_BYTES(manage_pages_out);
 	u32 in[MLX5_ST_SZ_DW(manage_pages_in)] = {};
 	int num_claimed;
-	u16 func_type;
 	u32 *out;
 	int err;
 	int i;
@@ -540,12 +594,16 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 
 	MLX5_SET(manage_pages_in, in, opcode, MLX5_CMD_OP_MANAGE_PAGES);
 	MLX5_SET(manage_pages_in, in, op_mod, MLX5_PAGES_TAKE);
-	MLX5_SET(manage_pages_in, in, function_id, func_id);
+	MLX5_SET(manage_pages_in, in, function_id, func_vhca_id);
 	MLX5_SET(manage_pages_in, in, input_num_entries, npages);
-	MLX5_SET(manage_pages_in, in, embedded_cpu_function, ec_function);
 
-	mlx5_core_dbg(dev, "func 0x%x, npages %d, outlen %d\n",
-		      func_id, npages, outlen);
+	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
+		MLX5_SET(manage_pages_in, in, embedded_cpu_function,
+			 ec_function);
+
+	mlx5_core_dbg(dev, "func 0x%x, npages %d, outlen %d mode %d\n",
+		      func_vhca_id, npages, outlen,
+		      mlx5_page_mgt_mode_is_vhca_id(dev));
 	err = reclaim_pages_cmd(dev, in, sizeof(in), out, outlen);
 	if (err) {
 		npages = MLX5_GET(manage_pages_in, in, input_num_entries);
@@ -577,10 +635,6 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 	if (nclaimed)
 		*nclaimed = num_claimed;
 
-	func_type = func_id_to_type(dev, func_id, ec_function);
-	dev->priv.page_counters[func_type] -= num_claimed;
-	dev->priv.fw_pages -= num_claimed;
-
 out_free:
 	kvfree(out);
 	return err;
@@ -658,30 +712,102 @@ static int req_pages_handler(struct notifier_block *nb,
 	 * req->npages (and not min ()).
 	 */
 	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);
-	req->ec_function = ec_function;
+	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
+		req->ec_function = ec_function;
 	req->release_all = release_all;
 	INIT_WORK(&req->work, pages_work_handler);
 	queue_work(dev->priv.pg_wq, &req->work);
 	return NOTIFY_OK;
 }
 
+/*
+ * After set_hca_cap(), the second satisfy_startup_pages(dev, 0) may see
+ * VHCA_ID mode. If page_root_xa already has the PF entry from the first
+ * (boot) call under FUNC_ID keys 0 or (ec_function << 16), migrate that
+ * entry to the device vhca_id key so lookups use VHCA_ID semantics.
+ */
+static int mlx5_pagealloc_migrate_pf_to_vhca_id(struct mlx5_core_dev *dev)
+{
+	u32 vhca_id_key, old_key;
+	struct rb_root *root;
+	struct fw_page *fwp;
+	struct rb_node *p;
+	bool ec_function;
+	int err;
+
+	if (xa_empty(&dev->priv.page_root_xa))
+		return 0;
+
+	vhca_id_key = MLX5_CAP_GEN(dev, vhca_id);
+	ec_function = mlx5_core_is_ecpf(dev);
+
+	old_key = ec_function ? (1U << 16) : 0;
+	root = xa_load(&dev->priv.page_root_xa, old_key);
+	if (!root)
+		return 0;
+
+	if (old_key == vhca_id_key)
+		return 0;
+
+	err = xa_insert(&dev->priv.page_root_xa, vhca_id_key, root, GFP_KERNEL);
+	if (err) {
+		mlx5_core_warn(dev,
+			       "failed to migrate page root key 0x%x to vhca_id 0x%x\n",
+			       old_key, vhca_id_key);
+		return err;
+	}
+
+	for (p = rb_first(root); p; p = rb_next(p)) {
+		fwp = rb_entry(p, struct fw_page, rb_node);
+		fwp->function = vhca_id_key;
+	}
+
+	xa_erase(&dev->priv.page_root_xa, old_key);
+
+	return 0;
+}
+
 int mlx5_satisfy_startup_pages(struct mlx5_core_dev *dev, int boot)
 {
-	u16 func_id;
+	bool ec_function = false;
+	u16 func_vhca_id;
 	s32 npages;
 	int err;
 
-	err = mlx5_cmd_query_pages(dev, &func_id, &npages, boot);
+	/* Boot pages are requested before set_hca_cap(), so the capability
+	 * is not negotiated yet; use FUNC_ID mode for backward compatibility.
+	 * Init pages are requested after set_hca_cap(), which unconditionally
+	 * enables CAP_GEN_MAX. Current caps are not re-queried at this point,
+	 * so check CAP_GEN_MAX directly.
+	 */
+	if (boot) {
+		mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_FUNC_ID);
+	} else {
+		if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode) ==
+		    MLX5_ID_MODE_FUNCTION_VHCA_ID) {
+			err = mlx5_pagealloc_migrate_pf_to_vhca_id(dev);
+			if (err)
+				return err;
+			mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_VHCA_ID);
+		}
+	}
+
+	err = mlx5_cmd_query_pages(dev, &func_vhca_id, &npages, boot);
 	if (err)
 		return err;
 
-	mlx5_core_dbg(dev, "requested %d %s pages for func_id 0x%x\n",
-		      npages, boot ? "boot" : "init", func_id);
+	mlx5_core_dbg(dev,
+		      "requested %d %s pages for func_vhca_id 0x%x\n",
+		      npages, boot ? "boot" : "init", func_vhca_id);
 
 	if (!npages)
 		return 0;
 
-	return give_pages(dev, func_id, npages, 0, mlx5_core_is_ecpf(dev));
+	/* In VHCA_ID mode, ec_function remains false (not used). */
+	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
+		ec_function = mlx5_core_is_ecpf(dev);
+
+	return give_pages(dev, func_vhca_id, npages, 0, ec_function);
 }
 
 enum {
@@ -709,15 +835,17 @@ static int mlx5_reclaim_root_pages(struct mlx5_core_dev *dev,
 
 	while (!RB_EMPTY_ROOT(root)) {
 		u32 ec_function = mlx5_get_ec_function(function);
-		u32 function_id = mlx5_get_func_id(function);
+		u32 func_vhca_id = mlx5_get_func_vhca_id(function);
 		int nclaimed;
 		int err;
 
-		err = reclaim_pages(dev, function_id, optimal_reclaimed_pages(),
+		err = reclaim_pages(dev, func_vhca_id,
+				    optimal_reclaimed_pages(),
 				    &nclaimed, false, ec_function);
 		if (err) {
-			mlx5_core_warn(dev, "reclaim_pages err (%d) func_id=0x%x ec_func=0x%x\n",
-				       err, function_id, ec_function);
+			mlx5_core_warn(dev,
+				       "reclaim_pages err (%d) func_vhca_id=0x%x ec_func=0x%x\n",
+				       err, func_vhca_id, ec_function);
 			return err;
 		}
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d1751c5d01c7..8b4d384125d1 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -558,6 +558,12 @@ enum mlx5_func_type {
 	MLX5_HOST_PF,
 	MLX5_EC_VF,
 	MLX5_FUNC_TYPE_NUM,
+	MLX5_FUNC_TYPE_NONE = MLX5_FUNC_TYPE_NUM,
+};
+
+enum mlx5_page_mgt_mode {
+	MLX5_PAGE_MGT_MODE_FUNC_ID,
+	MLX5_PAGE_MGT_MODE_VHCA_ID,
 };
 
 struct mlx5_frag_buf_node_pools;
@@ -578,6 +584,7 @@ struct mlx5_priv {
 	u32			fw_pages_alloc_failed;
 	u32			give_pages_dropped;
 	u32			reclaim_pages_discard;
+	enum mlx5_page_mgt_mode	page_mgt_mode;
 
 	struct mlx5_core_health health;
 	struct list_head	traps;
-- 
2.44.0



* Re: [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support
  2026-05-06 13:32 ` [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support Tariq Toukan
@ 2026-05-07 15:39   ` Moshe Shemesh
  0 siblings, 0 replies; 6+ messages in thread
From: Moshe Shemesh @ 2026-05-07 15:39 UTC (permalink / raw)
  To: Tariq Toukan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Mark Bloch, Akiva Goldberger,
	netdev, linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea



On 5/6/2026 4:32 PM, Tariq Toukan wrote:
> From: Moshe Shemesh <moshe@nvidia.com>
> 
> Add support for VHCA_ID-based page management mode. When the device
> firmware advertises the icm_mng_function_id_mode capability with
> MLX5_ID_MODE_FUNCTION_VHCA_ID, page management operations between the
> driver and firmware may use vhca_id instead of function_id as the
> effective function identifier, and the ec_function field is ignored.
> 
> Update page management commands to conditionally set ec_function field
> only in FUNC_ID mode. Boot page allocation always uses FUNC_ID mode
> semantics for backward compatibility, as the capability bit is only
> available after set_hca_cap(). If after set_hca_cap() VHCA_ID mode was
> set, modify the tracking of the boot pages in page_root_xa to use
> vhca_id too.
> 
> Add mlx5_esw_vhca_id_to_func_type() to resolve the function type in
> VHCA_ID mode, enabling per-type debugfs counters. Use a dedicated
> vhca_type_map xarray, to provide lockless lookup. Store the resolved
> type on each fw_page at allocation time so reclaim and release paths
> read it directly without any lookup.
> 
> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
> Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
> Reviewed-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>   .../net/ethernet/mellanox/mlx5/core/eswitch.c |  45 +++-
>   .../net/ethernet/mellanox/mlx5/core/eswitch.h |   8 +
>   .../net/ethernet/mellanox/mlx5/core/main.c    |   3 +
>   .../ethernet/mellanox/mlx5/core/pagealloc.c   | 250 +++++++++++++-----
>   include/linux/mlx5/driver.h                   |   7 +
>   5 files changed, 251 insertions(+), 62 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index e0eafcf0c52a..125129ef43e3 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -852,6 +852,38 @@ bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
>   	return true;
>   }
>   
> +static enum mlx5_func_type
> +esw_vport_to_func_type(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
> +{
> +	u16 vport_num = vport->vport;
> +
> +	if (vport_num == MLX5_VPORT_HOST_PF)
> +		return MLX5_HOST_PF;
> +	if (xa_get_mark(&esw->vports, vport_num, MLX5_ESW_VPT_SF))
> +		return MLX5_SF;
> +	if (xa_get_mark(&esw->vports, vport_num, MLX5_ESW_VPT_VF))
> +		return MLX5_VF;
> +	return MLX5_EC_VF;
> +}
> +
> +u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
> +{
> +	struct mlx5_eswitch *esw = dev->priv.eswitch;
> +	void *entry;
> +
> +	if (vhca_id == MLX5_CAP_GEN(dev, vhca_id))
> +		return MLX5_SELF;
> +
> +	if (!esw)
> +		return MLX5_FUNC_TYPE_NONE;
> +
> +	entry = xa_load(&esw->vhca_type_map, vhca_id);
> +	if (entry)
> +		return xa_to_value(entry);
> +
> +	return MLX5_FUNC_TYPE_NONE;
> +}
> +
>   static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
>   {
>   	bool vst_mode_steering = esw_vst_mode_is_steering(esw);
> @@ -942,6 +974,11 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
>   		ret = mlx5_esw_vport_vhca_id_map(esw, vport);
>   		if (ret)
>   			goto err_vhca_mapping;
> +		ret = xa_insert(&esw->vhca_type_map, vport->vhca_id,
> +				xa_mk_value(esw_vport_to_func_type(esw, vport)),
> +				GFP_KERNEL);
> +		if (ret)
> +			goto err_type_map;

Sashiko says:
"
If xa_insert() fails here, the error path goes to err_type_map but does
not appear to revert vport->enabled or the increment of
esw->enabled_ipsec_vf_count that occurred earlier in the function.
Since esw->enabled_vports is only incremented on success, could this leave
the vport in an inconsistent state? A later call to
mlx5_esw_vport_disable() might see vport->enabled as true and decrement
esw->enabled_vports, potentially causing an integer underflow.
"

The same inconsistency existed before this patch in the err_vhca_mapping 
path, taken when mlx5_esw_vport_vhca_id_map() fails, so it will be 
addressed in a separate fix patch.
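
For reference, the shape of the problem can be reduced to a toy model: 
every step that mutates state before the failing step must be reverted 
on the error path, in reverse order, or a later disable() path sees 
stale state and underflows a counter. All names below are illustrative 
stand-ins, not the actual mlx5 code or the eventual fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy analogues of the vport enable state touched before the failing
 * xa_insert(): an enabled flag and two side mappings.
 */
struct toy_vport { bool enabled; };
struct toy_esw { int enabled_vports; int vhca_mapped; int type_mapped; };

/* Stand-in for the xa_insert() into vhca_type_map that may fail. */
static int toy_type_map_insert(struct toy_esw *esw, bool fail)
{
	if (fail)
		return -1;
	esw->type_mapped++;
	return 0;
}

static int toy_vport_enable(struct toy_esw *esw, struct toy_vport *vport,
			    bool fail_type_map)
{
	vport->enabled = true;	/* state set before the failing step */
	esw->vhca_mapped++;	/* mlx5_esw_vport_vhca_id_map() analogue */

	if (toy_type_map_insert(esw, fail_type_map)) {
		/* Unwind in reverse order of setup. */
		esw->vhca_mapped--;	/* err_type_map: unmap analogue */
		vport->enabled = false;	/* the revert the review flags as missing */
		return -1;
	}
	esw->enabled_vports++;	/* only incremented on full success */
	return 0;
}
```

With the full unwind, a failed enable leaves no trace, so a subsequent 
disable cannot decrement enabled_vports for a vport that never counted.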

- Moshe

>   	}
>   
>   	esw_vport_change_handle_locked(vport);
> @@ -952,6 +989,8 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
>   	mutex_unlock(&esw->state_lock);
>   	return ret;
>   
> +err_type_map:
> +	mlx5_esw_vport_vhca_id_unmap(esw, vport);
>   err_vhca_mapping:
>   	esw_vport_cleanup(esw, vport);
>   	mutex_unlock(&esw->state_lock);
> @@ -976,8 +1015,10 @@ void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
>   		arm_vport_context_events_cmd(esw->dev, vport_num, 0);
>   
>   	if (!mlx5_esw_is_manager_vport(esw, vport_num) &&
> -	    MLX5_CAP_GEN(esw->dev, vport_group_manager))
> +	    MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
> +		xa_erase(&esw->vhca_type_map, vport->vhca_id);
>   		mlx5_esw_vport_vhca_id_unmap(esw, vport);
> +	}
>   
>   	if (vport->vport != MLX5_VPORT_HOST_PF &&
>   	    (vport->info.ipsec_crypto_enabled || vport->info.ipsec_packet_enabled))
> @@ -2084,6 +2125,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
>   	atomic64_set(&esw->offloads.num_flows, 0);
>   	ida_init(&esw->offloads.vport_metadata_ida);
>   	xa_init_flags(&esw->offloads.vhca_map, XA_FLAGS_ALLOC);
> +	xa_init(&esw->vhca_type_map);
>   	mutex_init(&esw->state_lock);
>   	init_rwsem(&esw->mode_lock);
>   	refcount_set(&esw->qos.refcnt, 0);
> @@ -2133,6 +2175,7 @@ void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw)
>   	mutex_destroy(&esw->state_lock);
>   	WARN_ON(!xa_empty(&esw->offloads.vhca_map));
>   	xa_destroy(&esw->offloads.vhca_map);
> +	xa_destroy(&esw->vhca_type_map);
>   	ida_destroy(&esw->offloads.vport_metadata_ida);
>   	mlx5e_mod_hdr_tbl_destroy(&esw->offloads.mod_hdr);
>   	mutex_destroy(&esw->offloads.encap_tbl_lock);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> index 2fd601bd102f..b06d097824ad 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> @@ -373,6 +373,7 @@ struct mlx5_eswitch {
>   	struct dentry *debugfs_root;
>   	struct workqueue_struct *work_queue;
>   	struct xarray vports;
> +	struct xarray vhca_type_map;
>   	u32 flags;
>   	int                     total_vports;
>   	int                     enabled_vports;
> @@ -863,6 +864,7 @@ void mlx5_esw_vport_vhca_id_unmap(struct mlx5_eswitch *esw,
>   				  struct mlx5_vport *vport);
>   int mlx5_eswitch_vhca_id_to_vport(struct mlx5_eswitch *esw, u16 vhca_id, u16 *vport_num);
>   bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id);
> +u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id);
>   
>   void mlx5_esw_offloads_rep_remove(struct mlx5_eswitch *esw,
>   				  const struct mlx5_vport *vport);
> @@ -1034,6 +1036,12 @@ mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
>   	return false;
>   }
>   
> +static inline u16
> +mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
> +{
> +	return MLX5_FUNC_TYPE_NONE;
> +}
> +
>   static inline void
>   mlx5_eswitch_safe_aux_devs_remove(struct mlx5_core_dev *dev) {}
>   static inline struct mlx5_flow_handle *
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 0c1c906b60fa..296c5223cf61 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -597,6 +597,9 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)
>   	if (MLX5_CAP_GEN_MAX(dev, release_all_pages))
>   		MLX5_SET(cmd_hca_cap, set_hca_cap, release_all_pages, 1);
>   
> +	if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode))
> +		MLX5_SET(cmd_hca_cap, set_hca_cap, icm_mng_function_id_mode, 1);
> +
>   	if (MLX5_CAP_GEN_MAX(dev, mkey_by_name))
>   		MLX5_SET(cmd_hca_cap, set_hca_cap, mkey_by_name, 1);
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> index 77ffa31cc505..ce2f7fa9bd48 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> @@ -38,6 +38,7 @@
>   #include "mlx5_core.h"
>   #include "lib/eq.h"
>   #include "lib/tout.h"
> +#include "eswitch.h"
>   
>   enum {
>   	MLX5_PAGES_CANT_GIVE	= 0,
> @@ -59,6 +60,7 @@ struct fw_page {
>   	u64			addr;
>   	struct page	       *page;
>   	u32			function;
> +	u16			func_type;
>   	unsigned long		bitmask;
>   	struct list_head	list;
>   	unsigned int free_count;
> @@ -69,9 +71,24 @@ enum {
>   	MLX5_NUM_4K_IN_PAGE		= PAGE_SIZE / MLX5_ADAPTER_PAGE_SIZE,
>   };
>   
> -static u32 get_function(u16 func_id, bool ec_function)
> +static bool mlx5_page_mgt_mode_is_vhca_id(const struct mlx5_core_dev *dev)
>   {
> -	return (u32)func_id | (ec_function << 16);
> +	return dev->priv.page_mgt_mode == MLX5_PAGE_MGT_MODE_VHCA_ID;
> +}
> +
> +static void mlx5_page_mgt_mode_set(struct mlx5_core_dev *dev,
> +				   enum mlx5_page_mgt_mode mode)
> +{
> +	dev->priv.page_mgt_mode = mode;
> +}
> +
> +static u32 get_function_key(struct mlx5_core_dev *dev, u16 func_vhca_id,
> +			    bool ec_function)
> +{
> +	if (mlx5_page_mgt_mode_is_vhca_id(dev))
> +		return (u32)func_vhca_id;
> +
> +	return (u32)func_vhca_id | (ec_function << 16);
>   }
>   
>   static u16 func_id_to_type(struct mlx5_core_dev *dev, u16 func_id, bool ec_function)
> @@ -89,12 +106,21 @@ static u16 func_id_to_type(struct mlx5_core_dev *dev, u16 func_id, bool ec_funct
>   	return MLX5_SF;
>   }
>   
> +static u16 func_vhca_id_to_type(struct mlx5_core_dev *dev, u16 func_vhca_id,
> +				bool ec_function)
> +{
> +	if (mlx5_page_mgt_mode_is_vhca_id(dev))
> +		return mlx5_esw_vhca_id_to_func_type(dev, func_vhca_id);
> +
> +	return func_id_to_type(dev, func_vhca_id, ec_function);
> +}
> +
>   static u32 mlx5_get_ec_function(u32 function)
>   {
>   	return function >> 16;
>   }
>   
> -static u32 mlx5_get_func_id(u32 function)
> +static u32 mlx5_get_func_vhca_id(u32 function)
>   {
>   	return function & 0xffff;
>   }
> @@ -123,7 +149,8 @@ static struct rb_root *page_root_per_function(struct mlx5_core_dev *dev, u32 fun
>   	return root;
>   }
>   
> -static int insert_page(struct mlx5_core_dev *dev, u64 addr, struct page *page, u32 function)
> +static int insert_page(struct mlx5_core_dev *dev, u64 addr, struct page *page,
> +		       u32 function, u16 func_type)
>   {
>   	struct rb_node *parent = NULL;
>   	struct rb_root *root;
> @@ -156,6 +183,7 @@ static int insert_page(struct mlx5_core_dev *dev, u64 addr, struct page *page, u
>   	nfp->addr = addr;
>   	nfp->page = page;
>   	nfp->function = function;
> +	nfp->func_type = func_type;
>   	nfp->free_count = MLX5_NUM_4K_IN_PAGE;
>   	for (i = 0; i < MLX5_NUM_4K_IN_PAGE; i++)
>   		set_bit(i, &nfp->bitmask);
> @@ -196,7 +224,7 @@ static struct fw_page *find_fw_page(struct mlx5_core_dev *dev, u64 addr,
>   	return result;
>   }
>   
> -static int mlx5_cmd_query_pages(struct mlx5_core_dev *dev, u16 *func_id,
> +static int mlx5_cmd_query_pages(struct mlx5_core_dev *dev, u16 *func_vhca_id,
>   				s32 *npages, int boot)
>   {
>   	u32 out[MLX5_ST_SZ_DW(query_pages_out)] = {};
> @@ -207,14 +235,20 @@ static int mlx5_cmd_query_pages(struct mlx5_core_dev *dev, u16 *func_id,
>   	MLX5_SET(query_pages_in, in, op_mod, boot ?
>   		 MLX5_QUERY_PAGES_IN_OP_MOD_BOOT_PAGES :
>   		 MLX5_QUERY_PAGES_IN_OP_MOD_INIT_PAGES);
> -	MLX5_SET(query_pages_in, in, embedded_cpu_function, mlx5_core_is_ecpf(dev));
> +
> +	if (mlx5_page_mgt_mode_is_vhca_id(dev))
> +		MLX5_SET(query_pages_in, in, function_id,
> +			 MLX5_CAP_GEN(dev, vhca_id));
> +	else
> +		MLX5_SET(query_pages_in, in, embedded_cpu_function,
> +			 mlx5_core_is_ecpf(dev));
>   
>   	err = mlx5_cmd_exec_inout(dev, query_pages, in, out);
>   	if (err)
>   		return err;
>   
>   	*npages = MLX5_GET(query_pages_out, out, num_pages);
> -	*func_id = MLX5_GET(query_pages_out, out, function_id);
> +	*func_vhca_id = MLX5_GET(query_pages_out, out, function_id);
>   
>   	return err;
>   }
> @@ -245,6 +279,10 @@ static int alloc_4k(struct mlx5_core_dev *dev, u64 *addr, u32 function)
>   	if (!fp->free_count)
>   		list_del(&fp->list);
>   
> +	if (fp->func_type != MLX5_FUNC_TYPE_NONE)
> +		dev->priv.page_counters[fp->func_type]++;
> +	dev->priv.fw_pages++;
> +
>   	*addr = fp->addr + n * MLX5_ADAPTER_PAGE_SIZE;
>   
>   	return 0;
> @@ -280,6 +318,11 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function)
>   		mlx5_core_warn_rl(dev, "page not found\n");
>   		return;
>   	}
> +
> +	if (fwp->func_type != MLX5_FUNC_TYPE_NONE)
> +		dev->priv.page_counters[fwp->func_type]--;
> +	dev->priv.fw_pages--;
> +
>   	n = (addr & ~MLX5_U64_4K_PAGE_MASK) >> MLX5_ADAPTER_PAGE_SHIFT;
>   	fwp->free_count++;
>   	set_bit(n, &fwp->bitmask);
> @@ -289,7 +332,8 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function)
>   		list_add(&fwp->list, &dev->priv.free_list);
>   }
>   
> -static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
> +static int alloc_system_page(struct mlx5_core_dev *dev, u32 function,
> +			     u16 func_type)
>   {
>   	struct device *device = mlx5_core_dma_dev(dev);
>   	int nid = dev->priv.numa_node;
> @@ -317,7 +361,7 @@ static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
>   		goto map;
>   	}
>   
> -	err = insert_page(dev, addr, page, function);
> +	err = insert_page(dev, addr, page, function, func_type);
>   	if (err) {
>   		mlx5_core_err(dev, "failed to track allocated page\n");
>   		dma_unmap_page(device, addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
> @@ -334,7 +378,7 @@ static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
>   	return err;
>   }
>   
> -static void page_notify_fail(struct mlx5_core_dev *dev, u16 func_id,
> +static void page_notify_fail(struct mlx5_core_dev *dev, u16 func_vhca_id,
>   			     bool ec_function)
>   {
>   	u32 in[MLX5_ST_SZ_DW(manage_pages_in)] = {};
> @@ -342,19 +386,23 @@ static void page_notify_fail(struct mlx5_core_dev *dev, u16 func_id,
>   
>   	MLX5_SET(manage_pages_in, in, opcode, MLX5_CMD_OP_MANAGE_PAGES);
>   	MLX5_SET(manage_pages_in, in, op_mod, MLX5_PAGES_CANT_GIVE);
> -	MLX5_SET(manage_pages_in, in, function_id, func_id);
> -	MLX5_SET(manage_pages_in, in, embedded_cpu_function, ec_function);
> +	MLX5_SET(manage_pages_in, in, function_id, func_vhca_id);
> +
> +	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
> +		MLX5_SET(manage_pages_in, in, embedded_cpu_function,
> +			 ec_function);
>   
>   	err = mlx5_cmd_exec_in(dev, manage_pages, in);
>   	if (err)
> -		mlx5_core_warn(dev, "page notify failed func_id(%d) err(%d)\n",
> -			       func_id, err);
> +		mlx5_core_warn(dev,
> +			       "page notify failed func_vhca_id(%d) err(%d)\n",
> +			       func_vhca_id, err);
>   }
>   
> -static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
> +static int give_pages(struct mlx5_core_dev *dev, u16 func_vhca_id, int npages,
>   		      int event, bool ec_function)
>   {
> -	u32 function = get_function(func_id, ec_function);
> +	u32 function = get_function_key(dev, func_vhca_id, ec_function);
>   	u32 out[MLX5_ST_SZ_DW(manage_pages_out)] = {0};
>   	int inlen = MLX5_ST_SZ_BYTES(manage_pages_in);
>   	int notify_fail = event;
> @@ -364,6 +412,8 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>   	u32 *in;
>   	int i;
>   
> +	func_type = func_vhca_id_to_type(dev, func_vhca_id, ec_function);
> +
>   	inlen += npages * MLX5_FLD_SZ_BYTES(manage_pages_in, pas[0]);
>   	in = kvzalloc(inlen, GFP_KERNEL);
>   	if (!in) {
> @@ -377,7 +427,8 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>   		err = alloc_4k(dev, &addr, function);
>   		if (err) {
>   			if (err == -ENOMEM)
> -				err = alloc_system_page(dev, function);
> +				err = alloc_system_page(dev, function,
> +							func_type);
>   			if (err) {
>   				dev->priv.fw_pages_alloc_failed += (npages - i);
>   				goto out_4k;
> @@ -390,9 +441,12 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>   
>   	MLX5_SET(manage_pages_in, in, opcode, MLX5_CMD_OP_MANAGE_PAGES);
>   	MLX5_SET(manage_pages_in, in, op_mod, MLX5_PAGES_GIVE);
> -	MLX5_SET(manage_pages_in, in, function_id, func_id);
> +	MLX5_SET(manage_pages_in, in, function_id, func_vhca_id);
>   	MLX5_SET(manage_pages_in, in, input_num_entries, npages);
> -	MLX5_SET(manage_pages_in, in, embedded_cpu_function, ec_function);
> +
> +	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
> +		MLX5_SET(manage_pages_in, in, embedded_cpu_function,
> +			 ec_function);
>   
>   	err = mlx5_cmd_do(dev, in, inlen, out, sizeof(out));
>   	if (err == -EREMOTEIO) {
> @@ -405,17 +459,15 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>   	}
>   	err = mlx5_cmd_check(dev, err, in, out);
>   	if (err) {
> -		mlx5_core_warn(dev, "func_id 0x%x, npages %d, err %d\n",
> -			       func_id, npages, err);
> +		mlx5_core_warn(dev, "func_vhca_id 0x%x, npages %d, err %d\n",
> +			       func_vhca_id, npages, err);
>   		goto out_dropped;
>   	}
>   
> -	func_type = func_id_to_type(dev, func_id, ec_function);
> -	dev->priv.page_counters[func_type] += npages;
> -	dev->priv.fw_pages += npages;
> -
> -	mlx5_core_dbg(dev, "npages %d, ec_function %d, func_id 0x%x, err %d\n",
> -		      npages, ec_function, func_id, err);
> +	mlx5_core_dbg(dev,
> +		      "npages %d, ec_function %d, func 0x%x, mode %d, err %d\n",
> +		      npages, ec_function, func_vhca_id,
> +		      mlx5_page_mgt_mode_is_vhca_id(dev), err);
>   
>   	kvfree(in);
>   	return 0;
> @@ -428,18 +480,17 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>   out_free:
>   	kvfree(in);
>   	if (notify_fail)
> -		page_notify_fail(dev, func_id, ec_function);
> +		page_notify_fail(dev, func_vhca_id, ec_function);
>   	return err;
>   }
>   
> -static void release_all_pages(struct mlx5_core_dev *dev, u16 func_id,
> +static void release_all_pages(struct mlx5_core_dev *dev, u16 func_vhca_id,
>   			      bool ec_function)
>   {
> -	u32 function = get_function(func_id, ec_function);
> +	u32 function = get_function_key(dev, func_vhca_id, ec_function);
>   	struct rb_root *root;
>   	struct rb_node *p;
>   	int npages = 0;
> -	u16 func_type;
>   
>   	root = xa_load(&dev->priv.page_root_xa, function);
>   	if (WARN_ON_ONCE(!root))
> @@ -448,18 +499,20 @@ static void release_all_pages(struct mlx5_core_dev *dev, u16 func_id,
>   	p = rb_first(root);
>   	while (p) {
>   		struct fw_page *fwp = rb_entry(p, struct fw_page, rb_node);
> +		int used = MLX5_NUM_4K_IN_PAGE - fwp->free_count;
>   
>   		p = rb_next(p);
> -		npages += (MLX5_NUM_4K_IN_PAGE - fwp->free_count);
> +		npages += used;
> +		if (fwp->func_type != MLX5_FUNC_TYPE_NONE)
> +			dev->priv.page_counters[fwp->func_type] -= used;
>   		free_fwp(dev, fwp, fwp->free_count);
>   	}
>   
> -	func_type = func_id_to_type(dev, func_id, ec_function);
> -	dev->priv.page_counters[func_type] -= npages;
>   	dev->priv.fw_pages -= npages;
>   
> -	mlx5_core_dbg(dev, "npages %d, ec_function %d, func_id 0x%x\n",
> -		      npages, ec_function, func_id);
> +	mlx5_core_dbg(dev, "npages %d, ec_function %d, func 0x%x, mode %d\n",
> +		      npages, ec_function, func_vhca_id,
> +		      mlx5_page_mgt_mode_is_vhca_id(dev));
>   }
>   
>   static u32 fwp_fill_manage_pages_out(struct fw_page *fwp, u32 *out, u32 index,
> @@ -487,7 +540,7 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
>   	struct fw_page *fwp;
>   	struct rb_node *p;
>   	bool ec_function;
> -	u32 func_id;
> +	u32 func_vhca_id;
>   	u32 npages;
>   	u32 i = 0;
>   	int err;
> @@ -499,10 +552,11 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
>   
>   	/* No hard feelings, we want our pages back! */
>   	npages = MLX5_GET(manage_pages_in, in, input_num_entries);
> -	func_id = MLX5_GET(manage_pages_in, in, function_id);
> +	func_vhca_id = MLX5_GET(manage_pages_in, in, function_id);
>   	ec_function = MLX5_GET(manage_pages_in, in, embedded_cpu_function);
>   
> -	root = xa_load(&dev->priv.page_root_xa, get_function(func_id, ec_function));
> +	root = xa_load(&dev->priv.page_root_xa,
> +		       get_function_key(dev, func_vhca_id, ec_function));
>   	if (WARN_ON_ONCE(!root))
>   		return -EEXIST;
>   
> @@ -518,14 +572,14 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
>   	return 0;
>   }
>   
> -static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
> -			 int *nclaimed, bool event, bool ec_function)
> +static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_vhca_id,
> +			 int npages, int *nclaimed, bool event,
> +			 bool ec_function)
>   {
> -	u32 function = get_function(func_id, ec_function);
> +	u32 function = get_function_key(dev, func_vhca_id, ec_function);
>   	int outlen = MLX5_ST_SZ_BYTES(manage_pages_out);
>   	u32 in[MLX5_ST_SZ_DW(manage_pages_in)] = {};
>   	int num_claimed;
> -	u16 func_type;
>   	u32 *out;
>   	int err;
>   	int i;
> @@ -540,12 +594,16 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>   
>   	MLX5_SET(manage_pages_in, in, opcode, MLX5_CMD_OP_MANAGE_PAGES);
>   	MLX5_SET(manage_pages_in, in, op_mod, MLX5_PAGES_TAKE);
> -	MLX5_SET(manage_pages_in, in, function_id, func_id);
> +	MLX5_SET(manage_pages_in, in, function_id, func_vhca_id);
>   	MLX5_SET(manage_pages_in, in, input_num_entries, npages);
> -	MLX5_SET(manage_pages_in, in, embedded_cpu_function, ec_function);
>   
> -	mlx5_core_dbg(dev, "func 0x%x, npages %d, outlen %d\n",
> -		      func_id, npages, outlen);
> +	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
> +		MLX5_SET(manage_pages_in, in, embedded_cpu_function,
> +			 ec_function);
> +
> +	mlx5_core_dbg(dev, "func 0x%x, npages %d, outlen %d mode %d\n",
> +		      func_vhca_id, npages, outlen,
> +		      mlx5_page_mgt_mode_is_vhca_id(dev));
>   	err = reclaim_pages_cmd(dev, in, sizeof(in), out, outlen);
>   	if (err) {
>   		npages = MLX5_GET(manage_pages_in, in, input_num_entries);
> @@ -577,10 +635,6 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
>   	if (nclaimed)
>   		*nclaimed = num_claimed;
>   
> -	func_type = func_id_to_type(dev, func_id, ec_function);
> -	dev->priv.page_counters[func_type] -= num_claimed;
> -	dev->priv.fw_pages -= num_claimed;
> -
>   out_free:
>   	kvfree(out);
>   	return err;
> @@ -658,30 +712,102 @@ static int req_pages_handler(struct notifier_block *nb,
>   	 * req->npages (and not min ()).
>   	 */
>   	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);
> -	req->ec_function = ec_function;
> +	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
> +		req->ec_function = ec_function;
>   	req->release_all = release_all;
>   	INIT_WORK(&req->work, pages_work_handler);
>   	queue_work(dev->priv.pg_wq, &req->work);
>   	return NOTIFY_OK;
>   }
>   
> +/*
> + * After set_hca_cap(), the second satisfy_startup_pages(dev, 0) may see
> + * VHCA_ID mode. If page_root_xa already has the PF entry from the first
> + * (boot) call under FUNC_ID keys 0 or (ec_function << 16), migrate that
> + * entry to the device vhca_id key so lookups use VHCA_ID semantics.
> + */
> +static int mlx5_pagealloc_migrate_pf_to_vhca_id(struct mlx5_core_dev *dev)
> +{
> +	u32 vhca_id_key, old_key;
> +	struct rb_root *root;
> +	struct fw_page *fwp;
> +	struct rb_node *p;
> +	bool ec_function;
> +	int err;
> +
> +	if (xa_empty(&dev->priv.page_root_xa))
> +		return 0;
> +
> +	vhca_id_key = MLX5_CAP_GEN(dev, vhca_id);
> +	ec_function = mlx5_core_is_ecpf(dev);
> +
> +	old_key = ec_function ? (1U << 16) : 0;
> +	root = xa_load(&dev->priv.page_root_xa, old_key);
> +	if (!root)
> +		return 0;
> +
> +	if (old_key == vhca_id_key)
> +		return 0;
> +
> +	err = xa_insert(&dev->priv.page_root_xa, vhca_id_key, root, GFP_KERNEL);
> +	if (err) {
> +		mlx5_core_warn(dev,
> +			       "failed to migrate page root key 0x%x to vhca_id 0x%x\n",
> +			       old_key, vhca_id_key);
> +		return err;
> +	}
> +
> +	for (p = rb_first(root); p; p = rb_next(p)) {
> +		fwp = rb_entry(p, struct fw_page, rb_node);
> +		fwp->function = vhca_id_key;
> +	}
> +
> +	xa_erase(&dev->priv.page_root_xa, old_key);
> +
> +	return 0;
> +}
> +
>   int mlx5_satisfy_startup_pages(struct mlx5_core_dev *dev, int boot)
>   {
> -	u16 func_id;
> +	bool ec_function = false;
> +	u16 func_vhca_id;
>   	s32 npages;
>   	int err;
>   
> -	err = mlx5_cmd_query_pages(dev, &func_id, &npages, boot);
> +	/* Boot pages are requested before set_hca_cap(), so the capability
> +	 * is not negotiated yet; use FUNC_ID mode for backward compatibility.
> +	 * Init pages are requested after set_hca_cap(), which unconditionally
> +	 * enables CAP_GEN_MAX. Current caps are not re-queried at this point,
> +	 * so check CAP_GEN_MAX directly.
> +	 */
> +	if (boot) {
> +		mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_FUNC_ID);
> +	} else {
> +		if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode) ==
> +		    MLX5_ID_MODE_FUNCTION_VHCA_ID) {
> +			err = mlx5_pagealloc_migrate_pf_to_vhca_id(dev);
> +			if (err)
> +				return err;
> +			mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_VHCA_ID);
> +		}
> +	}
> +
> +	err = mlx5_cmd_query_pages(dev, &func_vhca_id, &npages, boot);
>   	if (err)
>   		return err;
>   
> -	mlx5_core_dbg(dev, "requested %d %s pages for func_id 0x%x\n",
> -		      npages, boot ? "boot" : "init", func_id);
> +	mlx5_core_dbg(dev,
> +		      "requested %d %s pages for func_vhca_id 0x%x\n",
> +		      npages, boot ? "boot" : "init", func_vhca_id);
>   
>   	if (!npages)
>   		return 0;
>   
> -	return give_pages(dev, func_id, npages, 0, mlx5_core_is_ecpf(dev));
> +	/* In VHCA_ID mode, ec_function remains false (not used). */
> +	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
> +		ec_function = mlx5_core_is_ecpf(dev);
> +
> +	return give_pages(dev, func_vhca_id, npages, 0, ec_function);
>   }
>   
>   enum {
> @@ -709,15 +835,17 @@ static int mlx5_reclaim_root_pages(struct mlx5_core_dev *dev,
>   
>   	while (!RB_EMPTY_ROOT(root)) {
>   		u32 ec_function = mlx5_get_ec_function(function);
> -		u32 function_id = mlx5_get_func_id(function);
> +		u32 func_vhca_id = mlx5_get_func_vhca_id(function);
>   		int nclaimed;
>   		int err;
>   
> -		err = reclaim_pages(dev, function_id, optimal_reclaimed_pages(),
> +		err = reclaim_pages(dev, func_vhca_id,
> +				    optimal_reclaimed_pages(),
>   				    &nclaimed, false, ec_function);
>   		if (err) {
> -			mlx5_core_warn(dev, "reclaim_pages err (%d) func_id=0x%x ec_func=0x%x\n",
> -				       err, function_id, ec_function);
> +			mlx5_core_warn(dev,
> +				       "reclaim_pages err (%d) func_vhca_id=0x%x ec_func=0x%x\n",
> +				       err, func_vhca_id, ec_function);
>   			return err;
>   		}
>   
> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> index d1751c5d01c7..8b4d384125d1 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -558,6 +558,12 @@ enum mlx5_func_type {
>   	MLX5_HOST_PF,
>   	MLX5_EC_VF,
>   	MLX5_FUNC_TYPE_NUM,
> +	MLX5_FUNC_TYPE_NONE = MLX5_FUNC_TYPE_NUM,
> +};
> +
> +enum mlx5_page_mgt_mode {
> +	MLX5_PAGE_MGT_MODE_FUNC_ID,
> +	MLX5_PAGE_MGT_MODE_VHCA_ID,
>   };
>   
>   struct mlx5_frag_buf_node_pools;
> @@ -578,6 +584,7 @@ struct mlx5_priv {
>   	u32			fw_pages_alloc_failed;
>   	u32			give_pages_dropped;
>   	u32			reclaim_pages_discard;
> +	enum mlx5_page_mgt_mode	page_mgt_mode;
>   
>   	struct mlx5_core_health health;
>   	struct list_head	traps;



* Re: [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode
  2026-05-06 13:32 [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode Tariq Toukan
                   ` (2 preceding siblings ...)
  2026-05-06 13:32 ` [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support Tariq Toukan
@ 2026-05-09  2:10 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-09  2:10 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: edumazet, kuba, pabeni, andrew+netdev, davem, saeedm, leon,
	mbloch, moshe, agoldberger, netdev, linux-rdma, linux-kernel, gal,
	dtatulea

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 6 May 2026 16:32:36 +0300 you wrote:
> Hi,
> 
> Find detailed description by Moshe below.
> 
> Regards,
> Tariq
> 
> [...]

Here is the summary with links:
  - [net-next,V2,1/3] net/mlx5: Relax capability check for eswitch query paths
    https://git.kernel.org/netdev/net-next/c/8ca32460815f
  - [net-next,V2,2/3] net/mlx5: Make debugfs page counters by function type dynamic
    https://git.kernel.org/netdev/net-next/c/5796d9fe0b88
  - [net-next,V2,3/3] net/mlx5: Add VHCA_ID page management mode support
    https://git.kernel.org/netdev/net-next/c/1fba57c91416

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



