* [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode
@ 2026-05-06 13:32 Tariq Toukan
2026-05-06 13:32 ` [PATCH net-next V2 1/3] net/mlx5: Relax capability check for eswitch query paths Tariq Toukan
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Tariq Toukan @ 2026-05-06 13:32 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Moshe Shemesh, Akiva Goldberger, netdev, linux-rdma, linux-kernel,
Gal Pressman, Dragos Tatulea
Hi,
Find a detailed description by Moshe below.
Regards,
Tariq
This series adds driver support for the VHCA_ID page management mode.
When firmware and driver support this mode, ICM (Interconnect Context
Memory) page management uses the device vhca_id as the function
identifier in MANAGE_PAGES, QUERY_PAGES, and page request events instead
of the legacy function_id + ec_function pair.
Background
Firmware can operate page management in two modes:
FUNC_ID mode (current): Function identity is (function_id, ec_function).
This remains the default and is used for boot pages and when the new
mode capability is not set.
VHCA_ID mode (new): Function identity is vhca_id only; ec_function is
ignored. This aligns page management with the vhca_id-based model used
by other firmware commands and simplifies identification on SmartNIC and
multi-function setups.
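
As a reference for the difference, here is a minimal sketch of the
per-function key used to track pages in the two modes, modeled on the
get_function_key() helper introduced in patch 3; the example_ prefix is
illustrative, not part of the series:

static u32 example_function_key(u16 func_vhca_id, bool ec_function,
				bool vhca_id_mode)
{
	/* VHCA_ID mode: the vhca_id alone identifies the function. */
	if (vhca_id_mode)
		return (u32)func_vhca_id;

	/* FUNC_ID mode: function_id in the low 16 bits, ec_function
	 * in bit 16.
	 */
	return (u32)func_vhca_id | (ec_function << 16);
}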
---
V2:
- Cache vhca_id to type mapping to provide lockless lookup (see sketch
  below).
- Store the resolved type on each fw_page at allocation time so reclaim
  and release paths read it directly without any lookup.
- Reorder the erase of the old key and the insert of the new key in the
  migrate function.
- Fix comment on mlx5_satisfy_startup_pages.
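
A minimal sketch of the lockless lookup mentioned in the first item,
following the xarray value-entry pattern that the new vhca_type_map
uses; the example_* names are illustrative, not part of the series:

#include <linux/limits.h>
#include <linux/xarray.h>

static DEFINE_XARRAY(example_vhca_type_map);

/* Writer side: pack the function type into an xarray value entry. */
static int example_cache_type(u16 vhca_id, unsigned long type)
{
	return xa_insert(&example_vhca_type_map, vhca_id,
			 xa_mk_value(type), GFP_KERNEL);
}

/* Reader side: xa_load() needs no external locking. */
static unsigned long example_lookup_type(u16 vhca_id)
{
	void *entry = xa_load(&example_vhca_type_map, vhca_id);

	return entry ? xa_to_value(entry) : ULONG_MAX; /* not cached */
}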
V1: https://lore.kernel.org/all/20260501044156.260875-1-tariqt@nvidia.com/
Moshe Shemesh (3):
net/mlx5: Relax capability check for eswitch query paths
net/mlx5: Make debugfs page counters by function type dynamic
net/mlx5: Add VHCA_ID page management mode support
.../net/ethernet/mellanox/mlx5/core/debugfs.c | 39 ++-
.../ethernet/mellanox/mlx5/core/esw/ipsec.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/eswitch.c | 49 +++-
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 8 +
.../mellanox/mlx5/core/eswitch_offloads.c | 14 +-
.../net/ethernet/mellanox/mlx5/core/main.c | 10 +-
.../ethernet/mellanox/mlx5/core/pagealloc.c | 250 +++++++++++++-----
include/linux/mlx5/driver.h | 9 +
8 files changed, 304 insertions(+), 77 deletions(-)
base-commit: 7e0cccae6b45b12eaf71fc3ab8eb133bb50b28ad
--
2.44.0
^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next V2 1/3] net/mlx5: Relax capability check for eswitch query paths
  2026-05-06 13:32 [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode Tariq Toukan
@ 2026-05-06 13:32 ` Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 2/3] net/mlx5: Make debugfs page counters by function type dynamic Tariq Toukan
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Tariq Toukan @ 2026-05-06 13:32 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Moshe Shemesh, Akiva Goldberger, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea

From: Moshe Shemesh <moshe@nvidia.com>

Several eswitch functions that only query other functions' HCA
capabilities or read cached vport state are guarded by the
vhca_resource_manager capability. This capability is required for
set_hca_cap operations, but query_hca_cap of other functions only
requires the vport_group_manager capability.

Relax the capability check from vhca_resource_manager to
vport_group_manager in the following query-only paths:
- mlx5_esw_vport_caps_get() - queries other function general caps
- esw_ipsec_vf_query_generic() - queries other function ipsec cap
- mlx5_devlink_port_fn_migratable_get() - reads cached vport state
- mlx5_devlink_port_fn_roce_get() - reads cached vport state
- mlx5_devlink_port_fn_max_io_eqs_get() - queries other function caps
- mlx5_esw_vport_enable/disable() - vhca_id map/unmap

Functions that also perform set_hca_cap (migratable_set, roce_set,
max_io_eqs_set, esw_ipsec_vf_set_generic, esw_ipsec_vf_set_bytype)
retain the vhca_resource_manager requirement.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/esw/ipsec.c    |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  6 +++---
 .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 14 ++++++++------
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec.c
index 8b12c3ae0cf7..4811b60ea430 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/ipsec.c
@@ -12,7 +12,7 @@ static int esw_ipsec_vf_query_generic(struct mlx5_core_dev *dev, u16 vport_num,
 	void *hca_cap, *query_cap;
 	int err;
 
-	if (!MLX5_CAP_GEN(dev, vhca_resource_manager))
+	if (!MLX5_CAP_GEN(dev, vport_group_manager))
 		return -EOPNOTSUPP;
 
 	if (!mlx5_esw_ipsec_vf_offload_supported(dev)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 66a773a99876..e0eafcf0c52a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -806,7 +806,7 @@ static int mlx5_esw_vport_caps_get(struct mlx5_eswitch *esw, struct mlx5_vport *
 	void *hca_caps;
 	int err;
 
-	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager))
+	if (!MLX5_CAP_GEN(esw->dev, vport_group_manager))
 		return 0;
 
 	query_ctx = kzalloc(query_out_sz, GFP_KERNEL);
@@ -938,7 +938,7 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
 	vport->info.trusted = true;
 
 	if (!mlx5_esw_is_manager_vport(esw, vport_num) &&
-	    MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
+	    MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
 		ret = mlx5_esw_vport_vhca_id_map(esw, vport);
 		if (ret)
 			goto err_vhca_mapping;
@@ -976,7 +976,7 @@ void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 	arm_vport_context_events_cmd(esw->dev, vport_num, 0);
 
 	if (!mlx5_esw_is_manager_vport(esw, vport_num) &&
-	    MLX5_CAP_GEN(esw->dev, vhca_resource_manager))
+	    MLX5_CAP_GEN(esw->dev, vport_group_manager))
 		mlx5_esw_vport_vhca_id_unmap(esw, vport);
 
 	if (vport->vport != MLX5_VPORT_HOST_PF &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 69ddf56e2fc9..392d8f364db6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -4677,8 +4677,9 @@ int mlx5_devlink_port_fn_migratable_get(struct devlink_port *port, bool *is_enab
 		return -EOPNOTSUPP;
 	}
 
-	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
-		NL_SET_ERR_MSG_MOD(extack, "Device doesn't support VHCA management");
+	if (!MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Device doesn't support vport group management");
 		return -EOPNOTSUPP;
 	}
 
@@ -4753,8 +4754,9 @@ int mlx5_devlink_port_fn_roce_get(struct devlink_port *port, bool *is_enabled,
 	struct mlx5_eswitch *esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
 	struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
 
-	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
-		NL_SET_ERR_MSG_MOD(extack, "Device doesn't support VHCA management");
+	if (!MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
+		NL_SET_ERR_MSG_MOD(extack,
+				   "Device doesn't support vport group management");
 		return -EOPNOTSUPP;
 	}
 
@@ -5076,9 +5078,9 @@ mlx5_devlink_port_fn_max_io_eqs_get(struct devlink_port *port, u32 *max_io_eqs,
 	int err;
 
 	esw = mlx5_devlink_eswitch_nocheck_get(port->devlink);
-	if (!MLX5_CAP_GEN(esw->dev, vhca_resource_manager)) {
+	if (!MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
 		NL_SET_ERR_MSG_MOD(extack,
-				   "Device doesn't support VHCA management");
+				   "Device doesn't support vport group management");
 		return -EOPNOTSUPP;
 	}
 
-- 
2.44.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread
* [PATCH net-next V2 2/3] net/mlx5: Make debugfs page counters by function type dynamic
  2026-05-06 13:32 [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 1/3] net/mlx5: Relax capability check for eswitch query paths Tariq Toukan
@ 2026-05-06 13:32 ` Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support Tariq Toukan
  2026-05-09  2:10 ` [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode patchwork-bot+netdevbpf
  3 siblings, 0 replies; 6+ messages in thread
From: Tariq Toukan @ 2026-05-06 13:32 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Moshe Shemesh, Akiva Goldberger, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea

From: Moshe Shemesh <moshe@nvidia.com>

Add the per function type debugfs page counters dynamically, after
mlx5_eswitch_init(). When page management operates in vhca_id mode,
only the function acting as either eSwitch or vport manager can
initialize the eSwitch structure and translate the vhca_id to function
type for the functions to which it supplies pages.

The next patch will add support for page management in vhca_id mode.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/debugfs.c | 39 +++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/main.c    |  7 +++-
 include/linux/mlx5/driver.h                   |  2 +
 3 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
index 8fe263190d38..6347957fefcb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/debugfs.c
@@ -285,10 +285,6 @@ void mlx5_pages_debugfs_init(struct mlx5_core_dev *dev)
 	pages = dev->priv.dbg.pages_debugfs;
 
 	debugfs_create_u32("fw_pages_total", 0400, pages, &dev->priv.fw_pages);
-	debugfs_create_u32("fw_pages_vfs", 0400, pages, &dev->priv.page_counters[MLX5_VF]);
-	debugfs_create_u32("fw_pages_ec_vfs", 0400, pages, &dev->priv.page_counters[MLX5_EC_VF]);
-	debugfs_create_u32("fw_pages_sfs", 0400, pages, &dev->priv.page_counters[MLX5_SF]);
-	debugfs_create_u32("fw_pages_host_pf", 0400, pages, &dev->priv.page_counters[MLX5_HOST_PF]);
 	debugfs_create_u32("fw_pages_alloc_failed", 0400, pages, &dev->priv.fw_pages_alloc_failed);
 	debugfs_create_u32("fw_pages_give_dropped", 0400, pages, &dev->priv.give_pages_dropped);
 	debugfs_create_u32("fw_pages_reclaim_discard", 0400, pages,
@@ -300,6 +296,41 @@ void mlx5_pages_debugfs_cleanup(struct mlx5_core_dev *dev)
 	debugfs_remove_recursive(dev->priv.dbg.pages_debugfs);
 }
 
+void mlx5_pages_by_func_type_debugfs_init(struct mlx5_core_dev *dev)
+{
+	struct dentry *pages = dev->priv.dbg.pages_debugfs;
+
+	if (!pages)
+		return;
+
+	if (!dev->priv.eswitch &&
+	    MLX5_CAP_GEN(dev, icm_mng_function_id_mode) ==
+	    MLX5_ID_MODE_FUNCTION_VHCA_ID)
+		return;
+
+	debugfs_create_u32("fw_pages_vfs", 0400, pages,
+			   &dev->priv.page_counters[MLX5_VF]);
+	debugfs_create_u32("fw_pages_ec_vfs", 0400, pages,
+			   &dev->priv.page_counters[MLX5_EC_VF]);
+	debugfs_create_u32("fw_pages_sfs", 0400, pages,
+			   &dev->priv.page_counters[MLX5_SF]);
+	debugfs_create_u32("fw_pages_host_pf", 0400, pages,
+			   &dev->priv.page_counters[MLX5_HOST_PF]);
+}
+
+void mlx5_pages_by_func_type_debugfs_cleanup(struct mlx5_core_dev *dev)
+{
+	struct dentry *pages = dev->priv.dbg.pages_debugfs;
+
+	if (!pages)
+		return;
+
+	debugfs_lookup_and_remove("fw_pages_vfs", pages);
+	debugfs_lookup_and_remove("fw_pages_ec_vfs", pages);
+	debugfs_lookup_and_remove("fw_pages_sfs", pages);
+	debugfs_lookup_and_remove("fw_pages_host_pf", pages);
+}
+
 static u64 qp_read_field(struct mlx5_core_dev *dev, struct mlx5_core_qp *qp,
 			 int index, int *is_str)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index b1b9ebfd3866..0c1c906b60fa 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -987,11 +987,12 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 		mlx5_core_err(dev, "Failed to init eswitch %d\n", err);
 		goto err_sriov_cleanup;
 	}
+	mlx5_pages_by_func_type_debugfs_init(dev);
 
 	err = mlx5_fpga_init(dev);
 	if (err) {
 		mlx5_core_err(dev, "Failed to init fpga device %d\n", err);
-		goto err_eswitch_cleanup;
+		goto err_page_debugfs_cleanup;
 	}
 
 	err = mlx5_vhca_event_init(dev);
@@ -1034,7 +1035,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev)
 	mlx5_vhca_event_cleanup(dev);
 err_fpga_cleanup:
 	mlx5_fpga_cleanup(dev);
-err_eswitch_cleanup:
+err_page_debugfs_cleanup:
+	mlx5_pages_by_func_type_debugfs_cleanup(dev);
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
 err_sriov_cleanup:
 	mlx5_sriov_cleanup(dev);
@@ -1072,6 +1074,7 @@ static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 	mlx5_sf_hw_table_cleanup(dev);
 	mlx5_vhca_event_cleanup(dev);
 	mlx5_fpga_cleanup(dev);
+	mlx5_pages_by_func_type_debugfs_cleanup(dev);
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
 	mlx5_sriov_cleanup(dev);
 	mlx5_mpfs_cleanup(dev);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 531ce66fc8ef..d1751c5d01c7 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1039,6 +1039,8 @@ void mlx5_pagealloc_start(struct mlx5_core_dev *dev);
 void mlx5_pagealloc_stop(struct mlx5_core_dev *dev);
 void mlx5_pages_debugfs_init(struct mlx5_core_dev *dev);
 void mlx5_pages_debugfs_cleanup(struct mlx5_core_dev *dev);
+void mlx5_pages_by_func_type_debugfs_init(struct mlx5_core_dev *dev);
+void mlx5_pages_by_func_type_debugfs_cleanup(struct mlx5_core_dev *dev);
 int mlx5_satisfy_startup_pages(struct mlx5_core_dev *dev, int boot);
 int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev);
 void mlx5_register_debugfs(void);
-- 
2.44.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread
* [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support
  2026-05-06 13:32 [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 1/3] net/mlx5: Relax capability check for eswitch query paths Tariq Toukan
  2026-05-06 13:32 ` [PATCH net-next V2 2/3] net/mlx5: Make debugfs page counters by function type dynamic Tariq Toukan
@ 2026-05-06 13:32 ` Tariq Toukan
  2026-05-07 15:39   ` Moshe Shemesh
  2026-05-09  2:10 ` [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode patchwork-bot+netdevbpf
  3 siblings, 1 reply; 6+ messages in thread
From: Tariq Toukan @ 2026-05-06 13:32 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
	Moshe Shemesh, Akiva Goldberger, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea

From: Moshe Shemesh <moshe@nvidia.com>

Add support for VHCA_ID-based page management mode. When the device
firmware advertises the icm_mng_function_id_mode capability with
MLX5_ID_MODE_FUNCTION_VHCA_ID, page management operations between the
driver and firmware may use vhca_id instead of function_id as the
effective function identifier, and the ec_function field is ignored.

Update page management commands to conditionally set the ec_function
field only in FUNC_ID mode. Boot page allocation always uses FUNC_ID
mode semantics for backward compatibility, as the capability bit is
only available after set_hca_cap(). If VHCA_ID mode was set after
set_hca_cap(), modify the tracking of the boot pages in page_root_xa
to use vhca_id too.

Add mlx5_esw_vhca_id_to_func_type() to resolve the function type in
VHCA_ID mode, enabling per-type debugfs counters. Use a dedicated
vhca_type_map xarray to provide lockless lookup. Store the resolved
type on each fw_page at allocation time so reclaim and release paths
read it directly without any lookup.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Akiva Goldberger <agoldberger@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/eswitch.c |  45 +++-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |   8 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   3 +
 .../ethernet/mellanox/mlx5/core/pagealloc.c   | 250 +++++++++++++-----
 include/linux/mlx5/driver.h                   |   7 +
 5 files changed, 251 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index e0eafcf0c52a..125129ef43e3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -852,6 +852,38 @@ bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
 	return true;
 }
 
+static enum mlx5_func_type
+esw_vport_to_func_type(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
+{
+	u16 vport_num = vport->vport;
+
+	if (vport_num == MLX5_VPORT_HOST_PF)
+		return MLX5_HOST_PF;
+	if (xa_get_mark(&esw->vports, vport_num, MLX5_ESW_VPT_SF))
+		return MLX5_SF;
+	if (xa_get_mark(&esw->vports, vport_num, MLX5_ESW_VPT_VF))
+		return MLX5_VF;
+	return MLX5_EC_VF;
+}
+
+u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
+{
+	struct mlx5_eswitch *esw = dev->priv.eswitch;
+	void *entry;
+
+	if (vhca_id == MLX5_CAP_GEN(dev, vhca_id))
+		return MLX5_SELF;
+
+	if (!esw)
+		return MLX5_FUNC_TYPE_NONE;
+
+	entry = xa_load(&esw->vhca_type_map, vhca_id);
+	if (entry)
+		return xa_to_value(entry);
+
+	return MLX5_FUNC_TYPE_NONE;
+}
+
 static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 {
 	bool vst_mode_steering = esw_vst_mode_is_steering(esw);
@@ -942,6 +974,11 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
 		ret = mlx5_esw_vport_vhca_id_map(esw, vport);
 		if (ret)
 			goto err_vhca_mapping;
+		ret = xa_insert(&esw->vhca_type_map, vport->vhca_id,
+				xa_mk_value(esw_vport_to_func_type(esw, vport)),
+				GFP_KERNEL);
+		if (ret)
+			goto err_type_map;
 	}
 
 	esw_vport_change_handle_locked(vport);
@@ -952,6 +989,8 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
 	mutex_unlock(&esw->state_lock);
 	return ret;
 
+err_type_map:
+	mlx5_esw_vport_vhca_id_unmap(esw, vport);
 err_vhca_mapping:
 	esw_vport_cleanup(esw, vport);
 	mutex_unlock(&esw->state_lock);
@@ -976,8 +1015,10 @@ void mlx5_esw_vport_disable(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
 	arm_vport_context_events_cmd(esw->dev, vport_num, 0);
 
 	if (!mlx5_esw_is_manager_vport(esw, vport_num) &&
-	    MLX5_CAP_GEN(esw->dev, vport_group_manager))
+	    MLX5_CAP_GEN(esw->dev, vport_group_manager)) {
+		xa_erase(&esw->vhca_type_map, vport->vhca_id);
 		mlx5_esw_vport_vhca_id_unmap(esw, vport);
+	}
 
 	if (vport->vport != MLX5_VPORT_HOST_PF &&
 	    (vport->info.ipsec_crypto_enabled || vport->info.ipsec_packet_enabled))
@@ -2084,6 +2125,7 @@ int mlx5_eswitch_init(struct mlx5_core_dev *dev)
 	atomic64_set(&esw->offloads.num_flows, 0);
 	ida_init(&esw->offloads.vport_metadata_ida);
 	xa_init_flags(&esw->offloads.vhca_map, XA_FLAGS_ALLOC);
+	xa_init(&esw->vhca_type_map);
 	mutex_init(&esw->state_lock);
 	init_rwsem(&esw->mode_lock);
 	refcount_set(&esw->qos.refcnt, 0);
@@ -2133,6 +2175,7 @@ void mlx5_eswitch_cleanup(struct mlx5_eswitch *esw)
 	mutex_destroy(&esw->state_lock);
 	WARN_ON(!xa_empty(&esw->offloads.vhca_map));
 	xa_destroy(&esw->offloads.vhca_map);
+	xa_destroy(&esw->vhca_type_map);
 	ida_destroy(&esw->offloads.vport_metadata_ida);
 	mlx5e_mod_hdr_tbl_destroy(&esw->offloads.mod_hdr);
 	mutex_destroy(&esw->offloads.encap_tbl_lock);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 2fd601bd102f..b06d097824ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -373,6 +373,7 @@ struct mlx5_eswitch {
 	struct dentry *debugfs_root;
 	struct workqueue_struct *work_queue;
 	struct xarray vports;
+	struct xarray vhca_type_map;
 	u32 flags;
 	int total_vports;
 	int enabled_vports;
@@ -863,6 +864,7 @@ void mlx5_esw_vport_vhca_id_unmap(struct mlx5_eswitch *esw,
 				  struct mlx5_vport *vport);
 int mlx5_eswitch_vhca_id_to_vport(struct mlx5_eswitch *esw, u16 vhca_id, u16 *vport_num);
 bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id);
+u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id);
 
 void mlx5_esw_offloads_rep_remove(struct mlx5_eswitch *esw,
 				  const struct mlx5_vport *vport);
@@ -1034,6 +1036,12 @@ mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
 	return false;
 }
 
+static inline u16
+mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
+{
+	return MLX5_FUNC_TYPE_NONE;
+}
+
 static inline void
 mlx5_eswitch_safe_aux_devs_remove(struct mlx5_core_dev *dev) {}
 static inline struct mlx5_flow_handle *
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0c1c906b60fa..296c5223cf61 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -597,6 +597,9 @@ static int handle_hca_cap(struct mlx5_core_dev *dev, void *set_ctx)
 	if (MLX5_CAP_GEN_MAX(dev, release_all_pages))
 		MLX5_SET(cmd_hca_cap, set_hca_cap, release_all_pages, 1);
 
+	if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode))
+		MLX5_SET(cmd_hca_cap, set_hca_cap, icm_mng_function_id_mode, 1);
+
 	if (MLX5_CAP_GEN_MAX(dev, mkey_by_name))
 		MLX5_SET(cmd_hca_cap, set_hca_cap, mkey_by_name, 1);
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index 77ffa31cc505..ce2f7fa9bd48 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -38,6 +38,7 @@
 #include "mlx5_core.h"
 #include "lib/eq.h"
 #include "lib/tout.h"
+#include "eswitch.h"
 
 enum {
 	MLX5_PAGES_CANT_GIVE = 0,
@@ -59,6 +60,7 @@ struct fw_page {
 	u64 addr;
 	struct page *page;
 	u32 function;
+	u16 func_type;
 	unsigned long bitmask;
 	struct list_head list;
 	unsigned int free_count;
@@ -69,9 +71,24 @@ enum {
 	MLX5_NUM_4K_IN_PAGE = PAGE_SIZE / MLX5_ADAPTER_PAGE_SIZE,
 };
 
-static u32 get_function(u16 func_id, bool ec_function)
+static bool mlx5_page_mgt_mode_is_vhca_id(const struct mlx5_core_dev *dev)
 {
-	return (u32)func_id | (ec_function << 16);
+	return dev->priv.page_mgt_mode == MLX5_PAGE_MGT_MODE_VHCA_ID;
+}
+
+static void mlx5_page_mgt_mode_set(struct mlx5_core_dev *dev,
+				   enum mlx5_page_mgt_mode mode)
+{
+	dev->priv.page_mgt_mode = mode;
+}
+
+static u32 get_function_key(struct mlx5_core_dev *dev, u16 func_vhca_id,
+			    bool ec_function)
+{
+	if (mlx5_page_mgt_mode_is_vhca_id(dev))
+		return (u32)func_vhca_id;
+
+	return (u32)func_vhca_id | (ec_function << 16);
 }
 
 static u16 func_id_to_type(struct mlx5_core_dev *dev, u16 func_id, bool ec_function)
@@ -89,12 +106,21 @@ static u16 func_id_to_type(struct mlx5_core_dev *dev, u16 func_id, bool ec_funct
 	return MLX5_SF;
 }
 
+static u16 func_vhca_id_to_type(struct mlx5_core_dev *dev, u16 func_vhca_id,
+				bool ec_function)
+{
+	if (mlx5_page_mgt_mode_is_vhca_id(dev))
+		return mlx5_esw_vhca_id_to_func_type(dev, func_vhca_id);
+
+	return func_id_to_type(dev, func_vhca_id, ec_function);
+}
+
 static u32 mlx5_get_ec_function(u32 function)
 {
 	return function >> 16;
 }
 
-static u32 mlx5_get_func_id(u32 function)
+static u32 mlx5_get_func_vhca_id(u32 function)
 {
 	return function & 0xffff;
 }
@@ -123,7 +149,8 @@ static struct rb_root *page_root_per_function(struct mlx5_core_dev *dev, u32 fun
 	return root;
 }
 
-static int insert_page(struct mlx5_core_dev *dev, u64 addr, struct page *page, u32 function)
+static int insert_page(struct mlx5_core_dev *dev, u64 addr, struct page *page,
+		       u32 function, u16 func_type)
 {
 	struct rb_node *parent = NULL;
 	struct rb_root *root;
@@ -156,6 +183,7 @@ static int insert_page(struct mlx5_core_dev *dev, u64 addr, struct page *page, u
 	nfp->addr = addr;
 	nfp->page = page;
 	nfp->function = function;
+	nfp->func_type = func_type;
 	nfp->free_count = MLX5_NUM_4K_IN_PAGE;
 	for (i = 0; i < MLX5_NUM_4K_IN_PAGE; i++)
 		set_bit(i, &nfp->bitmask);
@@ -196,7 +224,7 @@ static struct fw_page *find_fw_page(struct mlx5_core_dev *dev, u64 addr,
 	return result;
 }
 
-static int mlx5_cmd_query_pages(struct mlx5_core_dev *dev, u16 *func_id,
+static int mlx5_cmd_query_pages(struct mlx5_core_dev *dev, u16 *func_vhca_id,
 				s32 *npages, int boot)
 {
 	u32 out[MLX5_ST_SZ_DW(query_pages_out)] = {};
@@ -207,14 +235,20 @@ static int mlx5_cmd_query_pages(struct mlx5_core_dev *dev, u16 *func_id,
 	MLX5_SET(query_pages_in, in, op_mod, boot ?
 		 MLX5_QUERY_PAGES_IN_OP_MOD_BOOT_PAGES :
 		 MLX5_QUERY_PAGES_IN_OP_MOD_INIT_PAGES);
-	MLX5_SET(query_pages_in, in, embedded_cpu_function, mlx5_core_is_ecpf(dev));
+
+	if (mlx5_page_mgt_mode_is_vhca_id(dev))
+		MLX5_SET(query_pages_in, in, function_id,
+			 MLX5_CAP_GEN(dev, vhca_id));
+	else
+		MLX5_SET(query_pages_in, in, embedded_cpu_function,
+			 mlx5_core_is_ecpf(dev));
 
 	err = mlx5_cmd_exec_inout(dev, query_pages, in, out);
 	if (err)
 		return err;
 
 	*npages = MLX5_GET(query_pages_out, out, num_pages);
-	*func_id = MLX5_GET(query_pages_out, out, function_id);
+	*func_vhca_id = MLX5_GET(query_pages_out, out, function_id);
 
 	return err;
 }
@@ -245,6 +279,10 @@ static int alloc_4k(struct mlx5_core_dev *dev, u64 *addr, u32 function)
 	if (!fp->free_count)
 		list_del(&fp->list);
 
+	if (fp->func_type != MLX5_FUNC_TYPE_NONE)
+		dev->priv.page_counters[fp->func_type]++;
+	dev->priv.fw_pages++;
+
 	*addr = fp->addr + n * MLX5_ADAPTER_PAGE_SIZE;
 
 	return 0;
@@ -280,6 +318,11 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function)
 		mlx5_core_warn_rl(dev, "page not found\n");
 		return;
 	}
+
+	if (fwp->func_type != MLX5_FUNC_TYPE_NONE)
+		dev->priv.page_counters[fwp->func_type]--;
+	dev->priv.fw_pages--;
+
 	n = (addr & ~MLX5_U64_4K_PAGE_MASK) >> MLX5_ADAPTER_PAGE_SHIFT;
 	fwp->free_count++;
 	set_bit(n, &fwp->bitmask);
@@ -289,7 +332,8 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function)
 		list_add(&fwp->list, &dev->priv.free_list);
 }
 
-static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
+static int alloc_system_page(struct mlx5_core_dev *dev, u32 function,
+			     u16 func_type)
 {
 	struct device *device = mlx5_core_dma_dev(dev);
 	int nid = dev->priv.numa_node;
@@ -317,7 +361,7 @@ static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
 		goto map;
 	}
 
-	err = insert_page(dev, addr, page, function);
+	err = insert_page(dev, addr, page, function, func_type);
 	if (err) {
 		mlx5_core_err(dev, "failed to track allocated page\n");
 		dma_unmap_page(device, addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
@@ -334,7 +378,7 @@ static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
 	return err;
 }
 
-static void page_notify_fail(struct mlx5_core_dev *dev, u16 func_id,
+static void page_notify_fail(struct mlx5_core_dev *dev, u16 func_vhca_id,
 			     bool ec_function)
 {
 	u32 in[MLX5_ST_SZ_DW(manage_pages_in)] = {};
@@ -342,19 +386,23 @@ static void page_notify_fail(struct mlx5_core_dev *dev, u16 func_id,
 
 	MLX5_SET(manage_pages_in, in, opcode, MLX5_CMD_OP_MANAGE_PAGES);
 	MLX5_SET(manage_pages_in, in, op_mod, MLX5_PAGES_CANT_GIVE);
-	MLX5_SET(manage_pages_in, in, function_id, func_id);
-	MLX5_SET(manage_pages_in, in, embedded_cpu_function, ec_function);
+	MLX5_SET(manage_pages_in, in, function_id, func_vhca_id);
+
+	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
+		MLX5_SET(manage_pages_in, in, embedded_cpu_function,
+			 ec_function);
 
 	err = mlx5_cmd_exec_in(dev, manage_pages, in);
 	if (err)
-		mlx5_core_warn(dev, "page notify failed func_id(%d) err(%d)\n",
-			       func_id, err);
+		mlx5_core_warn(dev,
+			       "page notify failed func_vhca_id(%d) err(%d)\n",
+			       func_vhca_id, err);
 }
 
-static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
+static int give_pages(struct mlx5_core_dev *dev, u16 func_vhca_id, int npages,
 		      int event, bool ec_function)
 {
-	u32 function = get_function(func_id, ec_function);
+	u32 function = get_function_key(dev, func_vhca_id, ec_function);
 	u32 out[MLX5_ST_SZ_DW(manage_pages_out)] = {0};
 	int inlen = MLX5_ST_SZ_BYTES(manage_pages_in);
 	int notify_fail = event;
@@ -364,6 +412,8 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 	u32 *in;
 	int i;
 
+	func_type = func_vhca_id_to_type(dev, func_vhca_id, ec_function);
+
 	inlen += npages * MLX5_FLD_SZ_BYTES(manage_pages_in, pas[0]);
 	in = kvzalloc(inlen, GFP_KERNEL);
 	if (!in) {
@@ -377,7 +427,8 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 		err = alloc_4k(dev, &addr, function);
 		if (err) {
 			if (err == -ENOMEM)
-				err = alloc_system_page(dev, function);
+				err = alloc_system_page(dev, function,
+							func_type);
 			if (err) {
 				dev->priv.fw_pages_alloc_failed += (npages - i);
 				goto out_4k;
@@ -390,9 +441,12 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 
 	MLX5_SET(manage_pages_in, in, opcode, MLX5_CMD_OP_MANAGE_PAGES);
 	MLX5_SET(manage_pages_in, in, op_mod, MLX5_PAGES_GIVE);
-	MLX5_SET(manage_pages_in, in, function_id, func_id);
+	MLX5_SET(manage_pages_in, in, function_id, func_vhca_id);
 	MLX5_SET(manage_pages_in, in, input_num_entries, npages);
-	MLX5_SET(manage_pages_in, in, embedded_cpu_function, ec_function);
+
+	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
+		MLX5_SET(manage_pages_in, in, embedded_cpu_function,
+			 ec_function);
 
 	err = mlx5_cmd_do(dev, in, inlen, out, sizeof(out));
 	if (err == -EREMOTEIO) {
@@ -405,17 +459,15 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 	}
 	err = mlx5_cmd_check(dev, err, in, out);
 	if (err) {
-		mlx5_core_warn(dev, "func_id 0x%x, npages %d, err %d\n",
-			       func_id, npages, err);
+		mlx5_core_warn(dev, "func_vhca_id 0x%x, npages %d, err %d\n",
+			       func_vhca_id, npages, err);
 		goto out_dropped;
 	}
 
-	func_type = func_id_to_type(dev, func_id, ec_function);
-	dev->priv.page_counters[func_type] += npages;
-	dev->priv.fw_pages += npages;
-
-	mlx5_core_dbg(dev, "npages %d, ec_function %d, func_id 0x%x, err %d\n",
-		      npages, ec_function, func_id, err);
+	mlx5_core_dbg(dev,
+		      "npages %d, ec_function %d, func 0x%x, mode %d, err %d\n",
+		      npages, ec_function, func_vhca_id,
+		      mlx5_page_mgt_mode_is_vhca_id(dev), err);
 
 	kvfree(in);
 	return 0;
@@ -428,18 +480,17 @@ static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 out_free:
 	kvfree(in);
 	if (notify_fail)
-		page_notify_fail(dev, func_id, ec_function);
+		page_notify_fail(dev, func_vhca_id, ec_function);
 	return err;
 }
 
-static void release_all_pages(struct mlx5_core_dev *dev, u16 func_id,
+static void release_all_pages(struct mlx5_core_dev *dev, u16 func_vhca_id,
 			      bool ec_function)
 {
-	u32 function = get_function(func_id, ec_function);
+	u32 function = get_function_key(dev, func_vhca_id, ec_function);
 	struct rb_root *root;
 	struct rb_node *p;
 	int npages = 0;
-	u16 func_type;
 
 	root = xa_load(&dev->priv.page_root_xa, function);
 	if (WARN_ON_ONCE(!root))
@@ -448,18 +499,20 @@ static void release_all_pages(struct mlx5_core_dev *dev, u16 func_id,
 	p = rb_first(root);
 	while (p) {
 		struct fw_page *fwp = rb_entry(p, struct fw_page, rb_node);
+		int used = MLX5_NUM_4K_IN_PAGE - fwp->free_count;
 
 		p = rb_next(p);
-		npages += (MLX5_NUM_4K_IN_PAGE - fwp->free_count);
+		npages += used;
+		if (fwp->func_type != MLX5_FUNC_TYPE_NONE)
+			dev->priv.page_counters[fwp->func_type] -= used;
 		free_fwp(dev, fwp, fwp->free_count);
 	}
 
-	func_type = func_id_to_type(dev, func_id, ec_function);
-	dev->priv.page_counters[func_type] -= npages;
 	dev->priv.fw_pages -= npages;
 
-	mlx5_core_dbg(dev, "npages %d, ec_function %d, func_id 0x%x\n",
-		      npages, ec_function, func_id);
+	mlx5_core_dbg(dev, "npages %d, ec_function %d, func 0x%x, mode %d\n",
+		      npages, ec_function, func_vhca_id,
+		      mlx5_page_mgt_mode_is_vhca_id(dev));
 }
 
 static u32 fwp_fill_manage_pages_out(struct fw_page *fwp, u32 *out, u32 index,
@@ -487,7 +540,7 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
 	struct fw_page *fwp;
 	struct rb_node *p;
 	bool ec_function;
-	u32 func_id;
+	u32 func_vhca_id;
 	u32 npages;
 	u32 i = 0;
 	int err;
@@ -499,10 +552,11 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
 
 	/* No hard feelings, we want our pages back! */
 	npages = MLX5_GET(manage_pages_in, in, input_num_entries);
-	func_id = MLX5_GET(manage_pages_in, in, function_id);
+	func_vhca_id = MLX5_GET(manage_pages_in, in, function_id);
 	ec_function = MLX5_GET(manage_pages_in, in, embedded_cpu_function);
 
-	root = xa_load(&dev->priv.page_root_xa, get_function(func_id, ec_function));
+	root = xa_load(&dev->priv.page_root_xa,
+		       get_function_key(dev, func_vhca_id, ec_function));
 	if (WARN_ON_ONCE(!root))
 		return -EEXIST;
 
@@ -518,14 +572,14 @@ static int reclaim_pages_cmd(struct mlx5_core_dev *dev,
 	return 0;
 }
 
-static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
-			 int *nclaimed, bool event, bool ec_function)
+static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_vhca_id,
+			 int npages, int *nclaimed, bool event,
+			 bool ec_function)
 {
-	u32 function = get_function(func_id, ec_function);
+	u32 function = get_function_key(dev, func_vhca_id, ec_function);
 	int outlen = MLX5_ST_SZ_BYTES(manage_pages_out);
 	u32 in[MLX5_ST_SZ_DW(manage_pages_in)] = {};
 	int num_claimed;
-	u16 func_type;
 	u32 *out;
 	int err;
 	int i;
@@ -540,12 +594,16 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 
 	MLX5_SET(manage_pages_in, in, opcode, MLX5_CMD_OP_MANAGE_PAGES);
 	MLX5_SET(manage_pages_in, in, op_mod, MLX5_PAGES_TAKE);
-	MLX5_SET(manage_pages_in, in, function_id, func_id);
+	MLX5_SET(manage_pages_in, in, function_id, func_vhca_id);
 	MLX5_SET(manage_pages_in, in, input_num_entries, npages);
-	MLX5_SET(manage_pages_in, in, embedded_cpu_function, ec_function);
 
-	mlx5_core_dbg(dev, "func 0x%x, npages %d, outlen %d\n",
-		      func_id, npages, outlen);
+	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
+		MLX5_SET(manage_pages_in, in, embedded_cpu_function,
+			 ec_function);
+
+	mlx5_core_dbg(dev, "func 0x%x, npages %d, outlen %d mode %d\n",
+		      func_vhca_id, npages, outlen,
+		      mlx5_page_mgt_mode_is_vhca_id(dev));
 	err = reclaim_pages_cmd(dev, in, sizeof(in), out, outlen);
 	if (err) {
 		npages = MLX5_GET(manage_pages_in, in, input_num_entries);
@@ -577,10 +635,6 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
 	if (nclaimed)
 		*nclaimed = num_claimed;
 
-	func_type = func_id_to_type(dev, func_id, ec_function);
-	dev->priv.page_counters[func_type] -= num_claimed;
-	dev->priv.fw_pages -= num_claimed;
-
 out_free:
 	kvfree(out);
 	return err;
@@ -658,30 +712,102 @@ static int req_pages_handler(struct notifier_block *nb,
 	 * req->npages (and not min ()).
	 */
 	req->npages = max_t(s32, npages, MAX_RECLAIM_NPAGES);
-	req->ec_function = ec_function;
+	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
+		req->ec_function = ec_function;
 	req->release_all = release_all;
 	INIT_WORK(&req->work, pages_work_handler);
	queue_work(dev->priv.pg_wq, &req->work);
 	return NOTIFY_OK;
 }
 
+/*
+ * After set_hca_cap(), the second satisfy_startup_pages(dev, 0) may see
+ * VHCA_ID mode. If page_root_xa already has the PF entry from the first
+ * (boot) call under FUNC_ID keys 0 or (ec_function << 16), migrate that
+ * entry to the device vhca_id key so lookups use VHCA_ID semantics.
+ */
+static int mlx5_pagealloc_migrate_pf_to_vhca_id(struct mlx5_core_dev *dev)
+{
+	u32 vhca_id_key, old_key;
+	struct rb_root *root;
+	struct fw_page *fwp;
+	struct rb_node *p;
+	bool ec_function;
+	int err;
+
+	if (xa_empty(&dev->priv.page_root_xa))
+		return 0;
+
+	vhca_id_key = MLX5_CAP_GEN(dev, vhca_id);
+	ec_function = mlx5_core_is_ecpf(dev);
+
+	old_key = ec_function ? (1U << 16) : 0;
+	root = xa_load(&dev->priv.page_root_xa, old_key);
+	if (!root)
+		return 0;
+
+	if (old_key == vhca_id_key)
+		return 0;
+
+	err = xa_insert(&dev->priv.page_root_xa, vhca_id_key, root, GFP_KERNEL);
+	if (err) {
+		mlx5_core_warn(dev,
+			       "failed to migrate page root key 0x%x to vhca_id 0x%x\n",
+			       old_key, vhca_id_key);
+		return err;
+	}
+
+	for (p = rb_first(root); p; p = rb_next(p)) {
+		fwp = rb_entry(p, struct fw_page, rb_node);
+		fwp->function = vhca_id_key;
+	}
+
+	xa_erase(&dev->priv.page_root_xa, old_key);
+
+	return 0;
+}
+
 int mlx5_satisfy_startup_pages(struct mlx5_core_dev *dev, int boot)
 {
-	u16 func_id;
+	bool ec_function = false;
+	u16 func_vhca_id;
 	s32 npages;
 	int err;
 
-	err = mlx5_cmd_query_pages(dev, &func_id, &npages, boot);
+	/* Boot pages are requested before set_hca_cap(), so the capability
	 * is not negotiated yet; use FUNC_ID mode for backward compatibility.
	 * Init pages are requested after set_hca_cap(), which unconditionally
	 * enables CAP_GEN_MAX. Current caps are not re-queried at this point,
	 * so check CAP_GEN_MAX directly.
	 */
+	if (boot) {
+		mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_FUNC_ID);
+	} else {
+		if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode) ==
+		    MLX5_ID_MODE_FUNCTION_VHCA_ID) {
+			err = mlx5_pagealloc_migrate_pf_to_vhca_id(dev);
+			if (err)
+				return err;
+			mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_VHCA_ID);
+		}
+	}
+
+	err = mlx5_cmd_query_pages(dev, &func_vhca_id, &npages, boot);
 	if (err)
 		return err;
 
-	mlx5_core_dbg(dev, "requested %d %s pages for func_id 0x%x\n",
-		      npages, boot ? "boot" : "init", func_id);
+	mlx5_core_dbg(dev,
+		      "requested %d %s pages for func_vhca_id 0x%x\n",
+		      npages, boot ? "boot" : "init", func_vhca_id);
 
 	if (!npages)
 		return 0;
 
-	return give_pages(dev, func_id, npages, 0, mlx5_core_is_ecpf(dev));
+	/* In VHCA_ID mode, ec_function remains false (not used). */
+	if (!mlx5_page_mgt_mode_is_vhca_id(dev))
+		ec_function = mlx5_core_is_ecpf(dev);
+
+	return give_pages(dev, func_vhca_id, npages, 0, ec_function);
 }
 
 enum {
@@ -709,15 +835,17 @@ static int mlx5_reclaim_root_pages(struct mlx5_core_dev *dev,
 
 	while (!RB_EMPTY_ROOT(root)) {
 		u32 ec_function = mlx5_get_ec_function(function);
-		u32 function_id = mlx5_get_func_id(function);
+		u32 func_vhca_id = mlx5_get_func_vhca_id(function);
 		int nclaimed;
 		int err;
 
-		err = reclaim_pages(dev, function_id, optimal_reclaimed_pages(),
+		err = reclaim_pages(dev, func_vhca_id,
+				    optimal_reclaimed_pages(),
 				    &nclaimed, false, ec_function);
 		if (err) {
-			mlx5_core_warn(dev, "reclaim_pages err (%d) func_id=0x%x ec_func=0x%x\n",
-				       err, function_id, ec_function);
+			mlx5_core_warn(dev,
				       "reclaim_pages err (%d) func_vhca_id=0x%x ec_func=0x%x\n",
+				       err, func_vhca_id, ec_function);
 			return err;
 		}
 
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d1751c5d01c7..8b4d384125d1 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -558,6 +558,12 @@ enum mlx5_func_type {
 	MLX5_HOST_PF,
 	MLX5_EC_VF,
 	MLX5_FUNC_TYPE_NUM,
+	MLX5_FUNC_TYPE_NONE = MLX5_FUNC_TYPE_NUM,
+};
+
+enum mlx5_page_mgt_mode {
+	MLX5_PAGE_MGT_MODE_FUNC_ID,
+	MLX5_PAGE_MGT_MODE_VHCA_ID,
 };
 
 struct mlx5_frag_buf_node_pools;
@@ -578,6 +584,7 @@ struct mlx5_priv {
 	u32 fw_pages_alloc_failed;
 	u32 give_pages_dropped;
 	u32 reclaim_pages_discard;
+	enum mlx5_page_mgt_mode page_mgt_mode;
 	struct mlx5_core_health health;
 	struct list_head traps;
 
-- 
2.44.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread
* Re: [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support
  2026-05-06 13:32 ` [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support Tariq Toukan
@ 2026-05-07 15:39   ` Moshe Shemesh
  0 siblings, 0 replies; 6+ messages in thread
From: Moshe Shemesh @ 2026-05-07 15:39 UTC (permalink / raw)
  To: Tariq Toukan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Saeed Mahameed, Leon Romanovsky, Mark Bloch, Akiva Goldberger,
	netdev, linux-rdma, linux-kernel, Gal Pressman, Dragos Tatulea

On 5/6/2026 4:32 PM, Tariq Toukan wrote:
> From: Moshe Shemesh <moshe@nvidia.com>
>
> Add support for VHCA_ID-based page management mode. When the device
> firmware advertises the icm_mng_function_id_mode capability with
> MLX5_ID_MODE_FUNCTION_VHCA_ID, page management operations between the
> driver and firmware may use vhca_id instead of function_id as the
> effective function identifier, and the ec_function field is ignored.
>
[...]
> @@ -942,6 +974,11 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
>  		ret = mlx5_esw_vport_vhca_id_map(esw, vport);
>  		if (ret)
>  			goto err_vhca_mapping;
> +		ret = xa_insert(&esw->vhca_type_map, vport->vhca_id,
> +				xa_mk_value(esw_vport_to_func_type(esw, vport)),
> +				GFP_KERNEL);
> +		if (ret)
> +			goto err_type_map;

Sashiko says:
"
If xa_insert() fails here, the error path goes to err_type_map but does
not appear to revert vport->enabled or the increment of
esw->enabled_ipsec_vf_count that occurred earlier in the function.
Since esw->enabled_vports is only incremented on success, could this
leave the vport in an inconsistent state? A later call to
mlx5_esw_vport_disable() might see vport->enabled as true and decrement
esw->enabled_vports, potentially causing an integer underflow.
"

The same inconsistency existed before this patch with the
err_vhca_mapping path for the mlx5_esw_vport_vhca_id_map() failure.
So it will be addressed in a separate fix patch.

- Moshe

[...]
"boot" : "init", func_id); > + mlx5_core_dbg(dev, > + "requested %d %s pages for func_vhca_id 0x%x\n", > + npages, boot ? "boot" : "init", func_vhca_id); > > if (!npages) > return 0; > > - return give_pages(dev, func_id, npages, 0, mlx5_core_is_ecpf(dev)); > + /* In VHCA_ID mode, ec_function remains false (not used). */ > + if (!mlx5_page_mgt_mode_is_vhca_id(dev)) > + ec_function = mlx5_core_is_ecpf(dev); > + > + return give_pages(dev, func_vhca_id, npages, 0, ec_function); > } > > enum { > @@ -709,15 +835,17 @@ static int mlx5_reclaim_root_pages(struct mlx5_core_dev *dev, > > while (!RB_EMPTY_ROOT(root)) { > u32 ec_function = mlx5_get_ec_function(function); > - u32 function_id = mlx5_get_func_id(function); > + u32 func_vhca_id = mlx5_get_func_vhca_id(function); > int nclaimed; > int err; > > - err = reclaim_pages(dev, function_id, optimal_reclaimed_pages(), > + err = reclaim_pages(dev, func_vhca_id, > + optimal_reclaimed_pages(), > &nclaimed, false, ec_function); > if (err) { > - mlx5_core_warn(dev, "reclaim_pages err (%d) func_id=0x%x ec_func=0x%x\n", > - err, function_id, ec_function); > + mlx5_core_warn(dev, > + "reclaim_pages err (%d) func_vhca_id=0x%x ec_func=0x%x\n", > + err, func_vhca_id, ec_function); > return err; > } > > diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h > index d1751c5d01c7..8b4d384125d1 100644 > --- a/include/linux/mlx5/driver.h > +++ b/include/linux/mlx5/driver.h > @@ -558,6 +558,12 @@ enum mlx5_func_type { > MLX5_HOST_PF, > MLX5_EC_VF, > MLX5_FUNC_TYPE_NUM, > + MLX5_FUNC_TYPE_NONE = MLX5_FUNC_TYPE_NUM, > +}; > + > +enum mlx5_page_mgt_mode { > + MLX5_PAGE_MGT_MODE_FUNC_ID, > + MLX5_PAGE_MGT_MODE_VHCA_ID, > }; > > struct mlx5_frag_buf_node_pools; > @@ -578,6 +584,7 @@ struct mlx5_priv { > u32 fw_pages_alloc_failed; > u32 give_pages_dropped; > u32 reclaim_pages_discard; > + enum mlx5_page_mgt_mode page_mgt_mode; > > struct mlx5_core_health health; > struct list_head traps; ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode
  2026-05-06 13:32 [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode Tariq Toukan
                   ` (2 preceding siblings ...)
  2026-05-06 13:32 ` [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support Tariq Toukan
@ 2026-05-09  2:10 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-05-09  2:10 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: edumazet, kuba, pabeni, andrew+netdev, davem, saeedm, leon,
	mbloch, moshe, agoldberger, netdev, linux-rdma, linux-kernel,
	gal, dtatulea

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 6 May 2026 16:32:36 +0300 you wrote:
> Hi,
>
> Find detailed description by Moshe below.
>
> Regards,
> Tariq
>
> [...]

Here is the summary with links:
  - [net-next,V2,1/3] net/mlx5: Relax capability check for eswitch query paths
    https://git.kernel.org/netdev/net-next/c/8ca32460815f
  - [net-next,V2,2/3] net/mlx5: Make debugfs page counters by function type dynamic
    https://git.kernel.org/netdev/net-next/c/5796d9fe0b88
  - [net-next,V2,3/3] net/mlx5: Add VHCA_ID page management mode support
    https://git.kernel.org/netdev/net-next/c/1fba57c91416

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html

^ permalink raw reply	[flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-05-09  2:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-06 13:32 [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode Tariq Toukan
2026-05-06 13:32 ` [PATCH net-next V2 1/3] net/mlx5: Relax capability check for eswitch query paths Tariq Toukan
2026-05-06 13:32 ` [PATCH net-next V2 2/3] net/mlx5: Make debugfs page counters by function type dynamic Tariq Toukan
2026-05-06 13:32 ` [PATCH net-next V2 3/3] net/mlx5: Add VHCA_ID page management mode support Tariq Toukan
2026-05-07 15:39   ` Moshe Shemesh
2026-05-09  2:10 ` [PATCH net-next V2 0/3] net/mlx5: ICM page management in VHCA_ID mode patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox