* [PATCH net 0/9] mlx5 misc fixes 2025-06-10
@ 2025-06-10 15:15 Mark Bloch
2025-06-10 15:15 ` [PATCH net 1/9] net/mlx5: Ensure fw pages are always allocated on same NUMA Mark Bloch
` (10 more replies)
0 siblings, 11 replies; 20+ messages in thread
From: Mark Bloch @ 2025-06-10 15:15 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Mark Bloch
This patchset includes misc fixes from the team for the mlx5 core
and Ethernet drivers.
Thanks,
Mark
Amir Tzin (1):
net/mlx5: Fix ECVF vports unload on shutdown flow
Jianbo Liu (1):
net/mlx5e: Fix leak of Geneve TLV option object
Leon Romanovsky (1):
net/mlx5e: Properly access RCU protected qdisc_sleeping variable
Moshe Shemesh (1):
net/mlx5: Ensure fw pages are always allocated on same NUMA
Patrisious Haddad (1):
net/mlx5: Fix return value when searching for existing flow group
Shahar Shitrit (1):
net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
Vlad Dogaru (2):
net/mlx5: HWS, Init mutex on the correct path
net/mlx5: HWS, make sure the uplink is the last destination
Yevgeny Kliteynik (1):
net/mlx5: HWS, fix missing ip_version handling in definer
.../net/ethernet/mellanox/mlx5/core/en/qos.c | 4 +++-
.../ethernet/mellanox/mlx5/core/en_ethtool.c | 5 +----
.../net/ethernet/mellanox/mlx5/core/en_tc.c | 12 +++++------
.../net/ethernet/mellanox/mlx5/core/eswitch.c | 21 ++++++++++++-------
.../net/ethernet/mellanox/mlx5/core/fs_core.c | 5 ++++-
.../ethernet/mellanox/mlx5/core/pagealloc.c | 2 +-
.../mellanox/mlx5/core/steering/hws/action.c | 14 ++++++-------
.../mellanox/mlx5/core/steering/hws/definer.c | 3 +++
.../mellanox/mlx5/core/steering/hws/fs_hws.c | 5 ++++-
.../mellanox/mlx5/core/steering/hws/mlx5hws.h | 1 +
10 files changed, 43 insertions(+), 29 deletions(-)
base-commit: fdd9ebccfc32c060d027ab9a2c957097e6997de6
--
2.34.1
* [PATCH net 1/9] net/mlx5: Ensure fw pages are always allocated on same NUMA
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
@ 2025-06-10 15:15 ` Mark Bloch
2025-06-13 16:22 ` Zhu Yanjun
2025-06-10 15:15 ` [PATCH net 2/9] net/mlx5: Fix ECVF vports unload on shutdown flow Mark Bloch
` (9 subsequent siblings)
10 siblings, 1 reply; 20+ messages in thread
From: Mark Bloch @ 2025-06-10 15:15 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Moshe Shemesh, Mark Bloch
From: Moshe Shemesh <moshe@nvidia.com>
When firmware asks the driver to allocate more pages, using event of
give_pages, the driver should always allocate it from same NUMA, the
original device NUMA. Current code uses dev_to_node() which can result
in different NUMA as it is changed by other driver flows, such as
mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for
allocating firmware pages.
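For illustration, here is a simplified kernel-style sketch (not the actual
mlx5 sources; the function names below are made up) of why dev_to_node()
can transiently disagree with the NUMA node saved at probe time:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/gfp.h>

static void *alloc_coherent_on_node(struct device *dev, size_t size,
				    dma_addr_t *dma_handle, int node)
{
	int original_node = dev_to_node(dev);
	void *buf;

	/* Temporarily retarget the device to the requested node. */
	set_dev_node(dev, node);
	buf = dma_alloc_coherent(dev, size, dma_handle, GFP_KERNEL);
	/* Restore it; in between, dev_to_node() != the probe-time node. */
	set_dev_node(dev, original_node);

	return buf;
}

/* FW page allocation: use the NUMA node saved at probe time, which is
 * stable, instead of dev_to_node(), which may be transiently retargeted
 * by a concurrent flow like the one above.
 */
static struct page *alloc_fw_page(int saved_numa_node)
{
	return alloc_pages_node(saved_numa_node, GFP_HIGHUSER, 0);
}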
Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
index 972e8e9df585..9bc9bd83c232 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
@@ -291,7 +291,7 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function)
static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
{
struct device *device = mlx5_core_dma_dev(dev);
- int nid = dev_to_node(device);
+ int nid = dev->priv.numa_node;
struct page *page;
u64 zero_addr = 1;
u64 addr;
--
2.34.1
* [PATCH net 2/9] net/mlx5: Fix ECVF vports unload on shutdown flow
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
2025-06-10 15:15 ` [PATCH net 1/9] net/mlx5: Ensure fw pages are always allocated on same NUMA Mark Bloch
@ 2025-06-10 15:15 ` Mark Bloch
2025-06-10 15:15 ` [PATCH net 3/9] net/mlx5: Fix return value when searching for existing flow group Mark Bloch
` (8 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Mark Bloch @ 2025-06-10 15:15 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Amir Tzin, Daniel Jurgens, Moshe Shemesh,
Mark Bloch
From: Amir Tzin <amirtz@nvidia.com>
Fix a shutdown flow UAF when a virtual function is created on the embedded
chip (ECVF) of a BlueField device. In such a case the vport ACL ingress
table is not properly destroyed.
ECVF functionality is independent of the ecpf_vport_exists capability,
and thus the functions mlx5_eswitch_(enable|disable)_pf_vf_vports()
should not test it when enabling/disabling ECVF vports.
kernel log:
[] refcount_t: underflow; use-after-free.
[] WARNING: CPU: 3 PID: 1 at lib/refcount.c:28
refcount_warn_saturate+0x124/0x220
----------------
[] Call trace:
[] refcount_warn_saturate+0x124/0x220
[] tree_put_node+0x164/0x1e0 [mlx5_core]
[] mlx5_destroy_flow_table+0x98/0x2c0 [mlx5_core]
[] esw_acl_ingress_table_destroy+0x28/0x40 [mlx5_core]
[] esw_acl_ingress_lgcy_cleanup+0x80/0xf4 [mlx5_core]
[] esw_legacy_vport_acl_cleanup+0x44/0x60 [mlx5_core]
[] esw_vport_cleanup+0x64/0x90 [mlx5_core]
[] mlx5_esw_vport_disable+0xc0/0x1d0 [mlx5_core]
[] mlx5_eswitch_unload_ec_vf_vports+0xcc/0x150 [mlx5_core]
[] mlx5_eswitch_disable_sriov+0x198/0x2a0 [mlx5_core]
[] mlx5_device_disable_sriov+0xb8/0x1e0 [mlx5_core]
[] mlx5_sriov_detach+0x40/0x50 [mlx5_core]
[] mlx5_unload+0x40/0xc4 [mlx5_core]
[] mlx5_unload_one_devl_locked+0x6c/0xe4 [mlx5_core]
[] mlx5_unload_one+0x3c/0x60 [mlx5_core]
[] shutdown+0x7c/0xa4 [mlx5_core]
[] pci_device_shutdown+0x3c/0xa0
[] device_shutdown+0x170/0x340
[] __do_sys_reboot+0x1f4/0x2a0
[] __arm64_sys_reboot+0x2c/0x40
[] invoke_syscall+0x78/0x100
[] el0_svc_common.constprop.0+0x54/0x184
[] do_el0_svc+0x30/0xac
[] el0_svc+0x48/0x160
[] el0t_64_sync_handler+0xa4/0x12c
[] el0t_64_sync+0x1a4/0x1a8
[] --[ end trace 9c4601d68c70030e ]---
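For illustration, a minimal sketch of the enable/disable symmetry this
patch restores (illustrative names, not the actual eswitch code): EC VF
vports are loaded and unloaded based on EC SR-IOV support alone, without
being gated on the ecpf_vport_exists capability.

static int enable_vports(struct esw *esw)
{
	if (ecpf_vport_exists(esw))
		load_ecpf_vport(esw);

	/* EC VF vports depend only on EC SR-IOV being enabled. */
	if (ec_sriov_enabled(esw))
		load_ec_vf_vports(esw);

	return load_vf_vports(esw);
}

static void disable_vports(struct esw *esw)
{
	/* Teardown mirrors setup, in reverse order. */
	unload_vf_vports(esw);

	if (ec_sriov_enabled(esw))
		unload_ec_vf_vports(esw);

	if (ecpf_vport_exists(esw))
		unload_ecpf_vport(esw);
}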
Fixes: a7719b29a821 ("net/mlx5: Add management of EC VF vports")
Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Amir Tzin <amirtz@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/eswitch.c | 21 ++++++++++++-------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 7fb8a3381f84..4917d185d0c3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1295,12 +1295,15 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
ret = mlx5_eswitch_load_pf_vf_vport(esw, MLX5_VPORT_ECPF, enabled_events);
if (ret)
goto ecpf_err;
- if (mlx5_core_ec_sriov_enabled(esw->dev)) {
- ret = mlx5_eswitch_load_ec_vf_vports(esw, esw->esw_funcs.num_ec_vfs,
- enabled_events);
- if (ret)
- goto ec_vf_err;
- }
+ }
+
+ /* Enable ECVF vports */
+ if (mlx5_core_ec_sriov_enabled(esw->dev)) {
+ ret = mlx5_eswitch_load_ec_vf_vports(esw,
+ esw->esw_funcs.num_ec_vfs,
+ enabled_events);
+ if (ret)
+ goto ec_vf_err;
}
/* Enable VF vports */
@@ -1331,9 +1334,11 @@ void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw)
{
mlx5_eswitch_unload_vf_vports(esw, esw->esw_funcs.num_vfs);
+ if (mlx5_core_ec_sriov_enabled(esw->dev))
+ mlx5_eswitch_unload_ec_vf_vports(esw,
+ esw->esw_funcs.num_ec_vfs);
+
if (mlx5_ecpf_vport_exists(esw->dev)) {
- if (mlx5_core_ec_sriov_enabled(esw->dev))
- mlx5_eswitch_unload_ec_vf_vports(esw, esw->esw_funcs.num_vfs);
mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_ECPF);
}
--
2.34.1
* [PATCH net 3/9] net/mlx5: Fix return value when searching for existing flow group
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
2025-06-10 15:15 ` [PATCH net 1/9] net/mlx5: Ensure fw pages are always allocated on same NUMA Mark Bloch
2025-06-10 15:15 ` [PATCH net 2/9] net/mlx5: Fix ECVF vports unload on shutdown flow Mark Bloch
@ 2025-06-10 15:15 ` Mark Bloch
2025-06-10 15:15 ` [PATCH net 4/9] net/mlx5: HWS, Init mutex on the correct path Mark Bloch
` (7 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Mark Bloch @ 2025-06-10 15:15 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Patrisious Haddad, Gavi Teitz, Roi Dayan,
Mark Bloch
From: Patrisious Haddad <phaddad@nvidia.com>
When attempting to add a rule to an existing flow group, if a matching
flow group exists but is not active, the error code returned should be
EAGAIN, so that the rule can be added to the matching flow group once
it is active, rather than ENOENT, which indicates that no matching
flow group was found.
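As a generic illustration of the intended error contract (simplified,
with made-up names, not the mlx5 steering code): -EAGAIN tells the caller
to retry once the matching group becomes active, while -ENOENT means no
matching group exists at all.

static struct rule *add_to_matching_group(struct table *tbl,
					  const struct match *m)
{
	struct group *g;
	bool saw_inactive_match = false;

	list_for_each_entry(g, &tbl->groups, list) {
		if (!group_matches(g, m))
			continue;
		if (!g->active) {
			/* A match exists but is not usable yet. */
			saw_inactive_match = true;
			continue;
		}
		return add_rule_to_group(g, m);
	}

	return ERR_PTR(saw_inactive_match ? -EAGAIN : -ENOENT);
}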
Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Gavi Teitz <gavi@nvidia.com>
Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 23a7e8e7adfa..a8046200d376 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -2228,6 +2228,7 @@ try_add_to_existing_fg(struct mlx5_flow_table *ft,
struct mlx5_flow_handle *rule;
struct match_list *iter;
bool take_write = false;
+ bool try_again = false;
struct fs_fte *fte;
u64 version = 0;
int err;
@@ -2292,6 +2293,7 @@ try_add_to_existing_fg(struct mlx5_flow_table *ft,
nested_down_write_ref_node(&g->node, FS_LOCK_PARENT);
if (!g->node.active) {
+ try_again = true;
up_write_ref_node(&g->node, false);
continue;
}
@@ -2313,7 +2315,8 @@ try_add_to_existing_fg(struct mlx5_flow_table *ft,
tree_put_node(&fte->node, false);
return rule;
}
- rule = ERR_PTR(-ENOENT);
+ err = try_again ? -EAGAIN : -ENOENT;
+ rule = ERR_PTR(err);
out:
kmem_cache_free(steering->ftes_cache, fte);
return rule;
--
2.34.1
* [PATCH net 4/9] net/mlx5: HWS, Init mutex on the correct path
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
` (2 preceding siblings ...)
2025-06-10 15:15 ` [PATCH net 3/9] net/mlx5: Fix return value when searching for existing flow group Mark Bloch
@ 2025-06-10 15:15 ` Mark Bloch
2025-06-10 15:15 ` [PATCH net 5/9] net/mlx5: HWS, fix missing ip_version handling in definer Mark Bloch
` (6 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Mark Bloch @ 2025-06-10 15:15 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Vlad Dogaru, Yevgeny Kliteynik, Mark Bloch
From: Vlad Dogaru <vdogaru@nvidia.com>
The newly introduced mutex is only used for reformat actions, but it was
initialized for modify header instead.
The struct that contains the mutex is zero-initialized and an all-zero
mutex is valid, so the issue only shows up with CONFIG_DEBUG_MUTEXES.
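To illustrate the point about zero-initialized mutexes, a generic sketch
(not mlx5 code; the struct below is made up):

#include <linux/mutex.h>
#include <linux/slab.h>

struct pkt_action {
	struct mutex lock;
	void *data;
};

static struct pkt_action *pkt_action_create(void)
{
	/* kzalloc() leaves 'lock' all-zero, which happens to behave as an
	 * unlocked mutex, so a missing mutex_init() goes unnoticed ...
	 */
	struct pkt_action *a = kzalloc(sizeof(*a), GFP_KERNEL);

	if (!a)
		return NULL;

	/* ... unless CONFIG_DEBUG_MUTEXES is set, in which case the first
	 * mutex_lock() on a never-initialized mutex is reported.
	 */
	mutex_init(&a->lock);
	return a;
}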
Fixes: b206d9ec19df ("net/mlx5: HWS, register reformat actions with fw")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c
index 9d1c0e4b224a..372e2be90706 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c
@@ -1357,6 +1357,7 @@ mlx5_cmd_hws_packet_reformat_alloc(struct mlx5_flow_root_namespace *ns,
pkt_reformat->fs_hws_action.pr_data = pr_data;
}
+ mutex_init(&pkt_reformat->fs_hws_action.lock);
pkt_reformat->owner = MLX5_FLOW_RESOURCE_OWNER_HWS;
pkt_reformat->fs_hws_action.hws_action = hws_action;
return 0;
@@ -1503,7 +1504,6 @@ static int mlx5_cmd_hws_modify_header_alloc(struct mlx5_flow_root_namespace *ns,
err = -ENOMEM;
goto release_mh;
}
- mutex_init(&modify_hdr->fs_hws_action.lock);
modify_hdr->fs_hws_action.mh_data = mh_data;
modify_hdr->fs_hws_action.fs_pool = pool;
modify_hdr->owner = MLX5_FLOW_RESOURCE_OWNER_SW;
--
2.34.1
* [PATCH net 5/9] net/mlx5: HWS, fix missing ip_version handling in definer
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
` (3 preceding siblings ...)
2025-06-10 15:15 ` [PATCH net 4/9] net/mlx5: HWS, Init mutex on the correct path Mark Bloch
@ 2025-06-10 15:15 ` Mark Bloch
2025-06-10 15:15 ` [PATCH net 6/9] net/mlx5: HWS, make sure the uplink is the last destination Mark Bloch
` (5 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Mark Bloch @ 2025-06-10 15:15 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Yevgeny Kliteynik, Mark Bloch
From: Yevgeny Kliteynik <kliteyn@nvidia.com>
Fix missing field handling in definer - outer IP version.
Fixes: 74a778b4a63f ("net/mlx5: HWS, added definers handling")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c
index 5cc0dc002ac1..d45e1145d197 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/definer.c
@@ -785,6 +785,9 @@ hws_definer_conv_outer(struct mlx5hws_definer_conv_data *cd,
HWS_SET_HDR(fc, match_param, IP_PROTOCOL_O,
outer_headers.ip_protocol,
eth_l3_outer.protocol_next_header);
+ HWS_SET_HDR(fc, match_param, IP_VERSION_O,
+ outer_headers.ip_version,
+ eth_l3_outer.ip_version);
HWS_SET_HDR(fc, match_param, IP_TTL_O,
outer_headers.ttl_hoplimit,
eth_l3_outer.time_to_live_hop_limit);
--
2.34.1
* [PATCH net 6/9] net/mlx5: HWS, make sure the uplink is the last destination
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
` (4 preceding siblings ...)
2025-06-10 15:15 ` [PATCH net 5/9] net/mlx5: HWS, fix missing ip_version handling in definer Mark Bloch
@ 2025-06-10 15:15 ` Mark Bloch
2025-06-10 15:15 ` [PATCH net 7/9] net/mlx5e: Properly access RCU protected qdisc_sleeping variable Mark Bloch
` (4 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Mark Bloch @ 2025-06-10 15:15 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Vlad Dogaru, Yevgeny Kliteynik, Mark Bloch
From: Vlad Dogaru <vdogaru@nvidia.com>
When there is more than one destination, we create a FW flow
table and provide it with all the destinations. FW requires the
wire destination, if one exists, to be the last in the list;
otherwise the operation fails with a FW syndrome.
This patch fixes the destination array action creation: if the
array contains a wire destination, it is moved to the end.
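A minimal sketch of the reordering (illustrative types, not the mlx5hws
code; the actual change is in mlx5hws_action_create_dest_array() below):

static void put_wire_dest_last(struct dest_entry *dests, int num_dest)
{
	int i, wire_idx = -1;

	for (i = 0; i < num_dest; i++)
		if (dests[i].is_wire)
			wire_idx = i;

	/* FW requires the wire (uplink) destination to be last. */
	if (wire_idx >= 0 && wire_idx != num_dest - 1)
		swap(dests[wire_idx], dests[num_dest - 1]);
}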
Fixes: 504e536d9010 ("net/mlx5: HWS, added actions handling")
Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
.../mellanox/mlx5/core/steering/hws/action.c | 14 +++++++-------
.../mellanox/mlx5/core/steering/hws/fs_hws.c | 3 +++
.../mellanox/mlx5/core/steering/hws/mlx5hws.h | 1 +
3 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c
index fb62f3bc4bd4..447ea3f8722c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c
@@ -1370,8 +1370,8 @@ mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx,
struct mlx5hws_cmd_set_fte_attr fte_attr = {0};
struct mlx5hws_cmd_forward_tbl *fw_island;
struct mlx5hws_action *action;
- u32 i /*, packet_reformat_id*/;
- int ret;
+ int ret, last_dest_idx = -1;
+ u32 i;
if (num_dest <= 1) {
mlx5hws_err(ctx, "Action must have multiple dests\n");
@@ -1401,11 +1401,8 @@ mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx,
dest_list[i].destination_id = dests[i].dest->dest_obj.obj_id;
fte_attr.action_flags |= MLX5_FLOW_CONTEXT_ACTION_FWD_DEST;
fte_attr.ignore_flow_level = ignore_flow_level;
- /* ToDo: In SW steering we have a handling of 'go to WIRE'
- * destination here by upper layer setting 'is_wire_ft' flag
- * if the destination is wire.
- * This is because uplink should be last dest in the list.
- */
+ if (dests[i].is_wire_ft)
+ last_dest_idx = i;
break;
case MLX5HWS_ACTION_TYP_VPORT:
dest_list[i].destination_type = MLX5_FLOW_DESTINATION_TYPE_VPORT;
@@ -1429,6 +1426,9 @@ mlx5hws_action_create_dest_array(struct mlx5hws_context *ctx,
}
}
+ if (last_dest_idx != -1)
+ swap(dest_list[last_dest_idx], dest_list[num_dest - 1]);
+
fte_attr.dests_num = num_dest;
fte_attr.dests = dest_list;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c
index 372e2be90706..bf4643d0ce17 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/fs_hws.c
@@ -966,6 +966,9 @@ static int mlx5_fs_fte_get_hws_actions(struct mlx5_flow_root_namespace *ns,
switch (attr->type) {
case MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE:
dest_action = mlx5_fs_get_dest_action_ft(fs_ctx, dst);
+ if (dst->dest_attr.ft->flags &
+ MLX5_FLOW_TABLE_UPLINK_VPORT)
+ dest_actions[num_dest_actions].is_wire_ft = true;
break;
case MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE_NUM:
dest_action = mlx5_fs_get_dest_action_table_num(fs_ctx,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h
index 9bbadc4d8a0b..d8ac6c196211 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/mlx5hws.h
@@ -213,6 +213,7 @@ struct mlx5hws_action_dest_attr {
struct mlx5hws_action *dest;
/* Optional reformat action */
struct mlx5hws_action *reformat;
+ bool is_wire_ft;
};
/**
--
2.34.1
* [PATCH net 7/9] net/mlx5e: Properly access RCU protected qdisc_sleeping variable
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
` (5 preceding siblings ...)
2025-06-10 15:15 ` [PATCH net 6/9] net/mlx5: HWS, make sure the uplink is the last destination Mark Bloch
@ 2025-06-10 15:15 ` Mark Bloch
2025-06-11 21:40 ` Jakub Kicinski
2025-06-10 15:15 ` [PATCH net 8/9] net/mlx5e: Fix leak of Geneve TLV option object Mark Bloch
` (3 subsequent siblings)
10 siblings, 1 reply; 20+ messages in thread
From: Mark Bloch @ 2025-06-10 15:15 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Mark Bloch
From: Leon Romanovsky <leonro@nvidia.com>
qdisc_sleeping variable is declared as "struct Qdisc __rcu" and
as such needs proper annotation while accessing it.
Without rtnl_dereference(), the following error is generated by smatch:
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: warning:
incorrect type in initializer (different address spaces)
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: expected
struct Qdisc *qdisc
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: got struct
Qdisc [noderef] __rcu *qdisc_sleeping
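As a generic illustration of the annotation (a sketch with made-up types,
not the mlx5 or net core code): an __rcu pointer has to be read through
an accessor that documents the locking context, here the RTNL lock.

#include <linux/rtnetlink.h>

struct qdisc;				/* stand-in for the real struct Qdisc */

struct queue {				/* illustrative, not netdev_queue */
	struct qdisc __rcu *qdisc_sleeping;
};

static void reset_queue_qdisc(struct queue *q)
{
	struct qdisc *qd;

	/* A plain 'qd = q->qdisc_sleeping;' triggers the address-space
	 * warning above; rtnl_dereference() states that the caller holds
	 * RTNL and strips the __rcu annotation.
	 */
	qd = rtnl_dereference(q->qdisc_sleeping);
	if (!qd)
		return;

	/* ... operate on qd under RTNL ... */
}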
Fixes: 214baf22870c ("net/mlx5e: Support HTB offload")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
index f0744a45db92..2f32111210f8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
@@ -374,7 +374,9 @@ void mlx5e_reactivate_qos_sq(struct mlx5e_priv *priv, u16 qid, struct netdev_que
void mlx5e_reset_qdisc(struct net_device *dev, u16 qid)
{
struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, qid);
- struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
+ struct Qdisc *qdisc;
+
+ qdisc = rtnl_dereference(dev_queue->qdisc_sleeping);
if (!qdisc)
return;
--
2.34.1
* [PATCH net 8/9] net/mlx5e: Fix leak of Geneve TLV option object
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
` (6 preceding siblings ...)
2025-06-10 15:15 ` [PATCH net 7/9] net/mlx5e: Properly access RCU protected qdisc_sleeping variable Mark Bloch
@ 2025-06-10 15:15 ` Mark Bloch
2025-06-10 15:15 ` [PATCH net 9/9] net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper Mark Bloch
` (2 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Mark Bloch @ 2025-06-10 15:15 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Jianbo Liu, Alex Lazar, Mark Bloch
From: Jianbo Liu <jianbol@nvidia.com>
Previously, a unique tunnel id was added for matching on TC
non-zero chains, to support inner header rewrite with the goto action.
Later, it was used to support VF tunnel offload for VXLAN, and then
for Geneve and GRE. To support VF tunnel, a temporary mlx5_flow_spec
is used to parse the tunnel options. For Geneve, if there is a TLV
option, an object is created, or its refcount is incremented if it
already exists. But the temporary mlx5_flow_spec is freed right after
parsing, which causes a leak because no information about the object
is saved in the flow's mlx5_flow_spec, which is what is used to free
the object when the flow is deleted.
To fix the leak, call mlx5_geneve_tlv_option_del() before freeing the
temporary spec if it contains a TLV option object.
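A sketch of the ownership rule being fixed (illustrative names, not the
actual TC code): a reference taken on the TLV option object while parsing
a temporary spec must be dropped before that spec is freed, since nothing
else records that the reference was taken.

static int parse_tunnel(struct ctx *ctx, struct tmp_spec *tmp)
{
	int err;

	err = parse_options(ctx, tmp);	/* may take a TLV option ref */
	if (!err)
		err = copy_to_flow_spec(ctx, tmp);

	/* Balance the reference on both the error and success paths. */
	if (spec_has_geneve_tlv_opt(tmp))
		tlv_option_put(ctx);

	free_tmp_spec(tmp);
	return err;
}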
Fixes: 521933cdc4aa ("net/mlx5e: Support Geneve and GRE with VF tunnel offload")
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Alex Lazar <alazar@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index f1d908f61134..fef418e1ed1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2028,9 +2028,8 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv,
return err;
}
-static bool mlx5_flow_has_geneve_opt(struct mlx5e_tc_flow *flow)
+static bool mlx5_flow_has_geneve_opt(struct mlx5_flow_spec *spec)
{
- struct mlx5_flow_spec *spec = &flow->attr->parse_attr->spec;
void *headers_v = MLX5_ADDR_OF(fte_match_param,
spec->match_value,
misc_parameters_3);
@@ -2069,7 +2068,7 @@ static void mlx5e_tc_del_fdb_flow(struct mlx5e_priv *priv,
}
complete_all(&flow->del_hw_done);
- if (mlx5_flow_has_geneve_opt(flow))
+ if (mlx5_flow_has_geneve_opt(&attr->parse_attr->spec))
mlx5_geneve_tlv_option_del(priv->mdev->geneve);
if (flow->decap_route)
@@ -2574,12 +2573,13 @@ static int parse_tunnel_attr(struct mlx5e_priv *priv,
err = mlx5e_tc_tun_parse(filter_dev, priv, tmp_spec, f, match_level);
if (err) {
- kvfree(tmp_spec);
NL_SET_ERR_MSG_MOD(extack, "Failed to parse tunnel attributes");
netdev_warn(priv->netdev, "Failed to parse tunnel attributes");
- return err;
+ } else {
+ err = mlx5e_tc_set_attr_rx_tun(flow, tmp_spec);
}
- err = mlx5e_tc_set_attr_rx_tun(flow, tmp_spec);
+ if (mlx5_flow_has_geneve_opt(tmp_spec))
+ mlx5_geneve_tlv_option_del(priv->mdev->geneve);
kvfree(tmp_spec);
if (err)
return err;
--
2.34.1
* [PATCH net 9/9] net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
` (7 preceding siblings ...)
2025-06-10 15:15 ` [PATCH net 8/9] net/mlx5e: Fix leak of Geneve TLV option object Mark Bloch
@ 2025-06-10 15:15 ` Mark Bloch
2025-06-11 21:43 ` [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Jakub Kicinski
2025-06-11 21:50 ` patchwork-bot+netdevbpf
10 siblings, 0 replies; 20+ messages in thread
From: Mark Bloch @ 2025-06-10 15:15 UTC (permalink / raw)
To: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet,
Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Shahar Shitrit, Mark Bloch
From: Shahar Shitrit <shshitrit@nvidia.com>
When the link is up, either eth_proto_oper or ext_eth_proto_oper
typically reports the active link protocol, from which both speed
and number of lanes can be retrieved. However, in certain cases,
such as when a NIC is connected via a non-standard cable, the
firmware may not report the protocol.
In such scenarios, the speed can still be obtained from the
data_rate_oper field in PTYS register. Since data_rate_oper
provides only speed information and lacks lane details, it is
incorrect to derive the number of lanes from it.
This patch corrects the behavior by setting the number of lanes to
UNKNOWN instead of incorrectly using MAX_LANES when relying on
data_rate_oper.
Fixes: 7e959797f021 ("net/mlx5e: Enable lanes configuration when auto-negotiation is off")
Signed-off-by: Shahar Shitrit <shshitrit@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index ea078c9f5d15..3cb8d3bf9044 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -43,7 +43,6 @@
#include "en/fs_ethtool.h"
#define LANES_UNKNOWN 0
-#define MAX_LANES 8
void mlx5e_ethtool_get_drvinfo(struct mlx5e_priv *priv,
struct ethtool_drvinfo *drvinfo)
@@ -1098,10 +1097,8 @@ static void get_link_properties(struct net_device *netdev,
speed = info->speed;
lanes = info->lanes;
duplex = DUPLEX_FULL;
- } else if (data_rate_oper) {
+ } else if (data_rate_oper)
speed = 100 * data_rate_oper;
- lanes = MAX_LANES;
- }
out:
link_ksettings->base.duplex = duplex;
--
2.34.1
* Re: [PATCH net 7/9] net/mlx5e: Properly access RCU protected qdisc_sleeping variable
2025-06-10 15:15 ` [PATCH net 7/9] net/mlx5e: Properly access RCU protected qdisc_sleeping variable Mark Bloch
@ 2025-06-11 21:40 ` Jakub Kicinski
2025-06-12 7:31 ` Mark Bloch
0 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-06-11 21:40 UTC (permalink / raw)
To: Mark Bloch
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Simon Horman, saeedm, gal, leonro, tariqt, Leon Romanovsky,
netdev, linux-rdma, linux-kernel
On Tue, 10 Jun 2025 18:15:12 +0300 Mark Bloch wrote:
> qdisc_sleeping variable is declared as "struct Qdisc __rcu" and
> as such needs proper annotation while accessing it.
>
> Without rtnl_dereference(), the following error is generated by smatch:
sparse ?
>
> drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: warning:
> incorrect type in initializer (different address spaces)
> drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: expected
> struct Qdisc *qdisc
> drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: got struct
> Qdisc [noderef] __rcu *qdisc_sleeping
>
> Fixes: 214baf22870c ("net/mlx5e: Support HTB offload")
I don't think this is a functional change? We don't treat silencing
compiler warnings as fixes, not for sparse or W=1 warnings.
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
> index f0744a45db92..2f32111210f8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
> @@ -374,7 +374,9 @@ void mlx5e_reactivate_qos_sq(struct mlx5e_priv *priv, u16 qid, struct netdev_que
> void mlx5e_reset_qdisc(struct net_device *dev, u16 qid)
> {
> struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, qid);
> - struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
> + struct Qdisc *qdisc;
> +
> + qdisc = rtnl_dereference(dev_queue->qdisc_sleeping);
>
> if (!qdisc)
nit: no new line between action and error check
* Re: [PATCH net 0/9] mlx5 misc fixes 2025-06-10
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
` (8 preceding siblings ...)
2025-06-10 15:15 ` [PATCH net 9/9] net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper Mark Bloch
@ 2025-06-11 21:43 ` Jakub Kicinski
2025-06-11 21:50 ` patchwork-bot+netdevbpf
10 siblings, 0 replies; 20+ messages in thread
From: Jakub Kicinski @ 2025-06-11 21:43 UTC (permalink / raw)
To: Mark Bloch
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Simon Horman, saeedm, gal, leonro, tariqt, Leon Romanovsky,
netdev, linux-rdma, linux-kernel
On Tue, 10 Jun 2025 18:15:05 +0300 Mark Bloch wrote:
> This patchset includes misc fixes from the team for the mlx5 core
> and Ethernet drivers.
I'll apply the good patches, the one that should go to net-next looks
completely unrelated to the rest.
* Re: [PATCH net 0/9] mlx5 misc fixes 2025-06-10
2025-06-10 15:15 [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Mark Bloch
` (9 preceding siblings ...)
2025-06-11 21:43 ` [PATCH net 0/9] mlx5 misc fixes 2025-06-10 Jakub Kicinski
@ 2025-06-11 21:50 ` patchwork-bot+netdevbpf
10 siblings, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-06-11 21:50 UTC (permalink / raw)
To: Mark Bloch
Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, saeedm, gal,
leonro, tariqt, leon, netdev, linux-rdma, linux-kernel
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 10 Jun 2025 18:15:05 +0300 you wrote:
> This patchset includes misc fixes from the team for the mlx5 core
> and Ethernet drivers.
>
> Thanks,
> Mark
>
> Amir Tzin (1):
> net/mlx5: Fix ECVF vports unload on shutdown flow
>
> [...]
Here is the summary with links:
- [net,1/9] net/mlx5: Ensure fw pages are always allocated on same NUMA
https://git.kernel.org/netdev/net/c/f37258133c1e
- [net,2/9] net/mlx5: Fix ECVF vports unload on shutdown flow
https://git.kernel.org/netdev/net/c/687560d8a9a2
- [net,3/9] net/mlx5: Fix return value when searching for existing flow group
https://git.kernel.org/netdev/net/c/8ec40e3f1f72
- [net,4/9] net/mlx5: HWS, Init mutex on the correct path
https://git.kernel.org/netdev/net/c/a002602676cd
- [net,5/9] net/mlx5: HWS, fix missing ip_version handling in definer
https://git.kernel.org/netdev/net/c/b5e3c76f35ee
- [net,6/9] net/mlx5: HWS, make sure the uplink is the last destination
https://git.kernel.org/netdev/net/c/b8335829518e
- [net,7/9] net/mlx5e: Properly access RCU protected qdisc_sleeping variable
(no matching commit)
- [net,8/9] net/mlx5e: Fix leak of Geneve TLV option object
https://git.kernel.org/netdev/net/c/aa9c44b84209
- [net,9/9] net/mlx5e: Fix number of lanes to UNKNOWN when using data_rate_oper
https://git.kernel.org/netdev/net/c/875d7c160d60
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
* Re: [PATCH net 7/9] net/mlx5e: Properly access RCU protected qdisc_sleeping variable
2025-06-11 21:40 ` Jakub Kicinski
@ 2025-06-12 7:31 ` Mark Bloch
2025-06-12 14:22 ` Jakub Kicinski
0 siblings, 1 reply; 20+ messages in thread
From: Mark Bloch @ 2025-06-12 7:31 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Simon Horman, saeedm, gal, leonro, tariqt, Leon Romanovsky,
netdev, linux-rdma, linux-kernel
On 12/06/2025 0:40, Jakub Kicinski wrote:
> On Tue, 10 Jun 2025 18:15:12 +0300 Mark Bloch wrote:
>> qdisc_sleeping variable is declared as "struct Qdisc __rcu" and
>> as such needs proper annotation while accessing it.
>>
>> Without rtnl_dereference(), the following error is generated by smatch:
>
> sparse ?
Right, just tested it myself, it's indeed with sparse.
>
>>
>> drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: warning:
>> incorrect type in initializer (different address spaces)
>> drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: expected
>> struct Qdisc *qdisc
>> drivers/net/ethernet/mellanox/mlx5/core/en/qos.c:377:40: got struct
>> Qdisc [noderef] __rcu *qdisc_sleeping
>>
>> Fixes: 214baf22870c ("net/mlx5e: Support HTB offload")
>
> I don't think this is a functional change? We don't treat silencing
> compiler warnings as fixes, not for sparse or W=1 warnings.
Well Eric's commit: d636fc5dd692c8f4e00ae6e0359c0eceeb5d9bdb
that added this annotation was because of a syzbot report.
Anyway, we don't mind pushing via net-next.
Mark
>
>> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
>> index f0744a45db92..2f32111210f8 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/qos.c
>> @@ -374,7 +374,9 @@ void mlx5e_reactivate_qos_sq(struct mlx5e_priv *priv, u16 qid, struct netdev_que
>> void mlx5e_reset_qdisc(struct net_device *dev, u16 qid)
>> {
>> struct netdev_queue *dev_queue = netdev_get_tx_queue(dev, qid);
>> - struct Qdisc *qdisc = dev_queue->qdisc_sleeping;
>> + struct Qdisc *qdisc;
>> +
>> + qdisc = rtnl_dereference(dev_queue->qdisc_sleeping);
>>
>> if (!qdisc)
>
> nit: no new line between action and error check
* Re: [PATCH net 7/9] net/mlx5e: Properly access RCU protected qdisc_sleeping variable
2025-06-12 7:31 ` Mark Bloch
@ 2025-06-12 14:22 ` Jakub Kicinski
2025-06-12 14:47 ` Mark Bloch
0 siblings, 1 reply; 20+ messages in thread
From: Jakub Kicinski @ 2025-06-12 14:22 UTC (permalink / raw)
To: Mark Bloch
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Simon Horman, saeedm, gal, leonro, tariqt, Leon Romanovsky,
netdev, linux-rdma, linux-kernel
On Thu, 12 Jun 2025 10:31:45 +0300 Mark Bloch wrote:
> > I don't think this is a functional change? We don't treat silencing
> > compiler warnings as fixes, not for sparse or W=1 warnings.
>
> Well Eric's commit: d636fc5dd692c8f4e00ae6e0359c0eceeb5d9bdb
> that added this annotation was because of a syzbot report.
And your point is?
* Re: [PATCH net 7/9] net/mlx5e: Properly access RCU protected qdisc_sleeping variable
2025-06-12 14:22 ` Jakub Kicinski
@ 2025-06-12 14:47 ` Mark Bloch
0 siblings, 0 replies; 20+ messages in thread
From: Mark Bloch @ 2025-06-12 14:47 UTC (permalink / raw)
To: Jakub Kicinski
Cc: David S. Miller, Paolo Abeni, Eric Dumazet, Andrew Lunn,
Simon Horman, saeedm, gal, leonro, tariqt, Leon Romanovsky,
netdev, linux-rdma, linux-kernel
On 12/06/2025 17:22, Jakub Kicinski wrote:
> On Thu, 12 Jun 2025 10:31:45 +0300 Mark Bloch wrote:
>>> I don't think this is a functional change? We don't treat silencing
>>> compiler warnings as fixes, not for sparse or W=1 warnings.
>>
>> Well Eric's commit: d636fc5dd692c8f4e00ae6e0359c0eceeb5d9bdb
>> that added this annotation was because of a syzbot report.
>
> And your point is?
I just mean there's a reason for using annotations, and being
the odd one out feels off. I also wouldn't want anyone to
accidentally reference that logic (if they add HTB offloads support).
Will push via net-next.
Mark
* Re: [PATCH net 1/9] net/mlx5: Ensure fw pages are always allocated on same NUMA
2025-06-10 15:15 ` [PATCH net 1/9] net/mlx5: Ensure fw pages are always allocated on same NUMA Mark Bloch
@ 2025-06-13 16:22 ` Zhu Yanjun
2025-06-15 5:55 ` Moshe Shemesh
0 siblings, 1 reply; 20+ messages in thread
From: Zhu Yanjun @ 2025-06-13 16:22 UTC (permalink / raw)
To: Mark Bloch, David S. Miller, Jakub Kicinski, Paolo Abeni,
Eric Dumazet, Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel, Moshe Shemesh
On 2025/6/10 8:15, Mark Bloch wrote:
> From: Moshe Shemesh <moshe@nvidia.com>
>
> When firmware asks the driver to allocate more pages, using event of
> give_pages, the driver should always allocate it from same NUMA, the
> original device NUMA. Current code uses dev_to_node() which can result
> in different NUMA as it is changed by other driver flows, such as
> mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for
> allocating firmware pages.
I'm not sure whether NUMA balancing is currently being considered or not.
If I understand correctly, after this commit is applied, all pages will
be allocated from the same NUMA node — specifically, the original
device's NUMA node. This seems like it could lead to NUMA imbalance.
By using dev_to_node, it appears that pages could be allocated from
other NUMA nodes, which might help maintain better NUMA balance.
In the past, I encountered a NUMA balancing issue caused by the mlx5
NIC, so using dev_to_node might be beneficial in addressing similar
problems.
Thanks,
Zhu Yanjun
>
> Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node")
> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> index 972e8e9df585..9bc9bd83c232 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> @@ -291,7 +291,7 @@ static void free_4k(struct mlx5_core_dev *dev, u64 addr, u32 function)
> static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
> {
> struct device *device = mlx5_core_dma_dev(dev);
> - int nid = dev_to_node(device);
> + int nid = dev->priv.numa_node;
> struct page *page;
> u64 zero_addr = 1;
> u64 addr;
* Re: [PATCH net 1/9] net/mlx5: Ensure fw pages are always allocated on same NUMA
2025-06-13 16:22 ` Zhu Yanjun
@ 2025-06-15 5:55 ` Moshe Shemesh
2025-06-15 14:44 ` Zhu Yanjun
0 siblings, 1 reply; 20+ messages in thread
From: Moshe Shemesh @ 2025-06-15 5:55 UTC (permalink / raw)
To: Zhu Yanjun, Mark Bloch, David S. Miller, Jakub Kicinski,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel
On 6/13/2025 7:22 PM, Zhu Yanjun wrote:
> On 2025/6/10 8:15, Mark Bloch wrote:
>> From: Moshe Shemesh <moshe@nvidia.com>
>>
>> When firmware asks the driver to allocate more pages, using event of
>> give_pages, the driver should always allocate it from same NUMA, the
>> original device NUMA. Current code uses dev_to_node() which can result
>> in different NUMA as it is changed by other driver flows, such as
>> mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for
>> allocating firmware pages.
>
> I'm not sure whether NUMA balancing is currently being considered or not.
>
> If I understand correctly, after this commit is applied, all pages will
> be allocated from the same NUMA node — specifically, the original
> device's NUMA node. This seems like it could lead to NUMA imbalance.
The change is applied only on pages allocated for FW use. Pages which
are allocated for driver use as SQ/RQ/CQ/EQ etc, are not affected by
this change.
As for FW pages (allocated for FW use), we did mean to use only the
device close NUMA, we are not looking for balance here. Even before the
change, in most cases, FW pages are allocated from device close NUMA,
the fix only ensures it.
>
> By using dev_to_node, it appears that pages could be allocated from
> other NUMA nodes, which might help maintain better NUMA balance.
>
> In the past, I encountered a NUMA balancing issue caused by the mlx5
> NIC, so using dev_to_node might be beneficial in addressing similar
> problems.
>
> Thanks,
> Zhu Yanjun
>
>>
>> Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on
>> reader NUMA node")
>> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
>> ---
>> drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/
>> drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> index 972e8e9df585..9bc9bd83c232 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>> @@ -291,7 +291,7 @@ static void free_4k(struct mlx5_core_dev *dev, u64
>> addr, u32 function)
>> static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
>> {
>> struct device *device = mlx5_core_dma_dev(dev);
>> - int nid = dev_to_node(device);
>> + int nid = dev->priv.numa_node;
>> struct page *page;
>> u64 zero_addr = 1;
>> u64 addr;
>
* Re: [PATCH net 1/9] net/mlx5: Ensure fw pages are always allocated on same NUMA
2025-06-15 5:55 ` Moshe Shemesh
@ 2025-06-15 14:44 ` Zhu Yanjun
2025-06-19 16:31 ` Moshe Shemesh
0 siblings, 1 reply; 20+ messages in thread
From: Zhu Yanjun @ 2025-06-15 14:44 UTC (permalink / raw)
To: Moshe Shemesh, Mark Bloch, David S. Miller, Jakub Kicinski,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel
On 2025/6/14 22:55, Moshe Shemesh wrote:
>
>
> On 6/13/2025 7:22 PM, Zhu Yanjun wrote:
>> On 2025/6/10 8:15, Mark Bloch wrote:
>>> From: Moshe Shemesh <moshe@nvidia.com>
>>>
>>> When firmware asks the driver to allocate more pages, using event of
>>> give_pages, the driver should always allocate it from same NUMA, the
>>> original device NUMA. Current code uses dev_to_node() which can result
>>> in different NUMA as it is changed by other driver flows, such as
>>> mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for
>>> allocating firmware pages.
>>
>> I'm not sure whether NUMA balancing is currently being considered or
>> not.
>>
>> If I understand correctly, after this commit is applied, all pages will
>> be allocated from the same NUMA node — specifically, the original
>> device's NUMA node. This seems like it could lead to NUMA imbalance.
>
> The change is applied only on pages allocated for FW use. Pages which
> are allocated for driver use as SQ/RQ/CQ/EQ etc, are not affected by
> this change.
>
> As for FW pages (allocated for FW use), we did mean to use only the
> device close NUMA, we are not looking for balance here. Even before
> the change, in most cases, FW pages are allocated from device close
> NUMA, the fix only ensures it.
Thanks a lot. I’m fine with your explanations.
In the past, I encountered a NUMA-balancing issue where memory
allocations were dependent on the mlx5 device. Specifically, memory was
allocated only from the NUMA node closest to the mlx5 device. As a
result, during the lifetime of the process, more than 100GB of memory
was allocated from that single NUMA node, while other NUMA nodes saw no
significant allocations. This led to a NUMA imbalance problem.
According to your commit, SQ/RQ/CQ/EQ are not affected—only the firmware
(FW) pages are. These FW pages include Memory Region (MR) and On-Demand
Paging (ODP) pages. ODP pages are freed after use, and the amount of MR
pages remains fixed throughout the process lifecycle. Therefore, in
theory, this commit should not cause any NUMA imbalance. However, since
production environments can be complex, I’ll monitor for any NUMA
balancing issues after this commit is deployed in production.
In short, I’m fine with both this commit and your explanations.
Thanks,
Yanjun.Zhu
>
>>
>> By using dev_to_node, it appears that pages could be allocated from
>> other NUMA nodes, which might help maintain better NUMA balance.
>>
>> In the past, I encountered a NUMA balancing issue caused by the mlx5
>> NIC, so using dev_to_node might be beneficial in addressing similar
>> problems.
>>
>> Thanks,
>> Zhu Yanjun
>>
>>>
>>> Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on
>>> reader NUMA node")
>>> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
>>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>>> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
>>> ---
>>> drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/
>>> drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>>> index 972e8e9df585..9bc9bd83c232 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>>> @@ -291,7 +291,7 @@ static void free_4k(struct mlx5_core_dev *dev,
>>> u64 addr, u32 function)
>>> static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
>>> {
>>> struct device *device = mlx5_core_dma_dev(dev);
>>> - int nid = dev_to_node(device);
>>> + int nid = dev->priv.numa_node;
>>> struct page *page;
>>> u64 zero_addr = 1;
>>> u64 addr;
>>
>
--
Best Regards,
Yanjun.Zhu
* Re: [PATCH net 1/9] net/mlx5: Ensure fw pages are always allocated on same NUMA
2025-06-15 14:44 ` Zhu Yanjun
@ 2025-06-19 16:31 ` Moshe Shemesh
0 siblings, 0 replies; 20+ messages in thread
From: Moshe Shemesh @ 2025-06-19 16:31 UTC (permalink / raw)
To: Zhu Yanjun, Mark Bloch, David S. Miller, Jakub Kicinski,
Paolo Abeni, Eric Dumazet, Andrew Lunn, Simon Horman
Cc: saeedm, gal, leonro, tariqt, Leon Romanovsky, netdev, linux-rdma,
linux-kernel
On 6/15/2025 5:44 PM, Zhu Yanjun wrote:
>
>
> On 2025/6/14 22:55, Moshe Shemesh wrote:
>>
>>
>> On 6/13/2025 7:22 PM, Zhu Yanjun wrote:
>>> On 2025/6/10 8:15, Mark Bloch wrote:
>>>> From: Moshe Shemesh <moshe@nvidia.com>
>>>>
>>>> When firmware asks the driver to allocate more pages, using event of
>>>> give_pages, the driver should always allocate it from same NUMA, the
>>>> original device NUMA. Current code uses dev_to_node() which can result
>>>> in different NUMA as it is changed by other driver flows, such as
>>>> mlx5_dma_zalloc_coherent_node(). Instead, use saved numa node for
>>>> allocating firmware pages.
>>>
>>> I'm not sure whether NUMA balancing is currently being considered or
>>> not.
>>>
>>> If I understand correctly, after this commit is applied, all pages will
>>> be allocated from the same NUMA node — specifically, the original
>>> device's NUMA node. This seems like it could lead to NUMA imbalance.
>>
>> The change is applied only on pages allocated for FW use. Pages which
>> are allocated for driver use as SQ/RQ/CQ/EQ etc, are not affected by
>> this change.
>>
>> As for FW pages (allocated for FW use), we did mean to use only the
>> device close NUMA, we are not looking for balance here. Even before
>> the change, in most cases, FW pages are allocated from device close
>> NUMA, the fix only ensures it.
>
> Thanks a lot. I’m fine with your explanations.
>
> In the past, I encountered a NUMA-balancing issue where memory
> allocations were dependent on the mlx5 device. Specifically, memory was
> allocated only from the NUMA node closest to the mlx5 device. As a
> result, during the lifetime of the process, more than 100GB of memory
> was allocated from that single NUMA node, while other NUMA nodes saw no
> significant allocations. This led to a NUMA imbalance problem.
>
> According to your commit, SQ/RQ/CQ/EQ are not affected—only the firmware
> (FW) pages are. These FW pages include Memory Region (MR) and On-Demand
> Paging (ODP) pages. ODP pages are freed after use, and the amount of MR
> pages remains fixed throughout the process lifecycle. Therefore, in
> theory, this commit should not cause any NUMA imbalance. However, since
> production environments can be complex, I’ll monitor for any NUMA
> balancing issues after this commit is deployed in production.
Thanks for monitoring it.
Just to clarify, this change does not affect MR allocation either. It
affects pages allocated for FW internal use, i.e. requests from FW
handled via the give_pages() function and the manage_pages command.
>
> In short, I’m fine with both this commit and your explanations.
>
Thanks,
Moshe.
> Thanks,
>
> Yanjun.Zhu
>
>>
>>>
>>> By using dev_to_node, it appears that pages could be allocated from
>>> other NUMA nodes, which might help maintain better NUMA balance.
>>>
>>> In the past, I encountered a NUMA balancing issue caused by the mlx5
>>> NIC, so using dev_to_node might be beneficial in addressing similar
>>> problems.
>>>
>>> Thanks,
>>> Zhu Yanjun
>>>
>>>>
>>>> Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on
>>>> reader NUMA node")
>>>> Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
>>>> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>>>> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
>>>> ---
>>>> drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/
>>>> drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>>>> index 972e8e9df585..9bc9bd83c232 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
>>>> @@ -291,7 +291,7 @@ static void free_4k(struct mlx5_core_dev *dev,
>>>> u64 addr, u32 function)
>>>> static int alloc_system_page(struct mlx5_core_dev *dev, u32 function)
>>>> {
>>>> struct device *device = mlx5_core_dma_dev(dev);
>>>> - int nid = dev_to_node(device);
>>>> + int nid = dev->priv.numa_node;
>>>> struct page *page;
>>>> u64 zero_addr = 1;
>>>> u64 addr;
>>>
>>
> --
> Best Regards,
> Yanjun.Zhu
>