* [PATCH net 0/3] mlx5 misc fixes 2026-03-30
From: Tariq Toukan @ 2026-03-30 19:40 UTC
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Shay Drory, Shay Agroskin, Saeed Mahameed, Jianbo Liu, netdev,
linux-rdma, linux-kernel, Gal Pressman
Hi,
This patchset provides misc bug fixes from the team to the mlx5
core driver.
Thanks,
Tariq.
Saeed Mahameed (2):
net/mlx5: Avoid "No data available" when FW version queries fail
net/mlx5: Fix switchdev mode rollback in case of failure
Shay Drory (1):
net/mlx5: lag: Check for LAG device before creating debugfs
.../net/ethernet/mellanox/mlx5/core/devlink.c | 4 +-
.../mellanox/mlx5/core/eswitch_offloads.c | 2 +
drivers/net/ethernet/mellanox/mlx5/core/fw.c | 53 ++++++++++++-------
.../ethernet/mellanox/mlx5/core/lag/debugfs.c | 3 ++
.../ethernet/mellanox/mlx5/core/mlx5_core.h | 4 +-
5 files changed, 42 insertions(+), 24 deletions(-)
base-commit: d9c2a509c96378d77435e5845561c4afd3eaedad
--
2.44.0
* [PATCH net 1/3] net/mlx5: lag: Check for LAG device before creating debugfs
From: Tariq Toukan @ 2026-03-30 19:40 UTC
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Shay Drory, Shay Agroskin, Saeed Mahameed, Jianbo Liu, netdev,
linux-rdma, linux-kernel, Gal Pressman
From: Shay Drory <shayd@nvidia.com>
__mlx5_lag_dev_add_mdev() may return 0 (success) even when an error
occurs that is handled gracefully. Consequently, the initialization
flow proceeds to call mlx5_ldev_add_debugfs() even when there is no
valid LAG context.
mlx5_ldev_add_debugfs() unconditionally created the debugfs directory and
its attributes. This exposed interfaces (such as the members file) that
rely on a valid ldev pointer, leading to a potential NULL pointer
dereference if they are accessed while ldev is NULL.
Add a check to verify that mlx5_lag_dev(dev) returns a valid pointer
before attempting to create the debugfs entries.
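A minimal userspace sketch of the failure mode and the guard, assuming simplified stand-ins: struct lag_ctx, struct core_dev, lag_dev(), lag_dev_add(), ldev_add_debugfs() and members_show() are hypothetical models of the mlx5 structures and functions named above, not driver code. The "add" step reports success while leaving no LAG context behind, and the debugfs step returns early in that case, so nothing is created that could dereference a NULL ldev:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mlx5_lag. */
struct lag_ctx {
	int num_ports;
};

/* Hypothetical stand-in for struct mlx5_core_dev. */
struct core_dev {
	struct lag_ctx *ldev;     /* NULL when LAG setup was skipped */
	bool lag_debugfs_created; /* models dev->priv.dbg.lag_debugfs */
};

/* Models mlx5_lag_dev(): may legitimately return NULL. */
static struct lag_ctx *lag_dev(struct core_dev *dev)
{
	return dev->ldev;
}

/*
 * Models __mlx5_lag_dev_add_mdev(): an error that is "handled
 * gracefully" still yields 0 (success) but may leave no LAG context.
 */
static int lag_dev_add(struct core_dev *dev, struct lag_ctx *allocated)
{
	dev->ldev = allocated; /* may be NULL */
	return 0;
}

/* Models the fixed mlx5_ldev_add_debugfs(): bail out without a context. */
static void ldev_add_debugfs(struct core_dev *dev)
{
	struct lag_ctx *ldev = lag_dev(dev);

	if (!ldev)
		return; /* no "lag" directory, hence no "members" file */

	dev->lag_debugfs_created = true;
}

/* Models reading the "members" file: it dereferences ldev. */
static int members_show(struct core_dev *dev)
{
	return lag_dev(dev)->num_ports;
}
```

Without the early return, a device whose ldev is NULL would get the debugfs entries anyway, and a single read of members_show() would dereference NULL.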
Fixes: 7f46a0b7327a ("net/mlx5: Lag, add debugfs to query hardware lag state")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c
index 62b6faa4276a..b8d5f6a44d26 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/debugfs.c
@@ -160,8 +160,11 @@ DEFINE_SHOW_ATTRIBUTE(members);
void mlx5_ldev_add_debugfs(struct mlx5_core_dev *dev)
{
+ struct mlx5_lag *ldev = mlx5_lag_dev(dev);
struct dentry *dbg;
+ if (!ldev)
+ return;
dbg = debugfs_create_dir("lag", mlx5_debugfs_get_dev_root(dev));
dev->priv.dbg.lag_debugfs = dbg;
--
2.44.0
* [PATCH net 2/3] net/mlx5: Avoid "No data available" when FW version queries fail
From: Tariq Toukan @ 2026-03-30 19:40 UTC
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Shay Drory, Shay Agroskin, Saeed Mahameed, Jianbo Liu, netdev,
linux-rdma, linux-kernel, Gal Pressman, Moshe Shemesh
From: Saeed Mahameed <saeedm@nvidia.com>
Avoid printing the misleading "kernel answers: No data available" devlink
output when querying the running or pending firmware version fails
(e.g. due to MLX5 FW state errors or flash failures).
The FW can fail to load the pending flash image, or to report its version,
for various reasons. For example:
mlxfw: Firmware flash failed: key not applicable, err (7)
mlx5_fw_image_pending: can't read pending fw version while fw state is 1
which results in:
$ devlink dev info
kernel answers: No data available
Instead, report a version of 0 or 0xffffffff (U32_MAX) on failure to
indicate a problem, and let the rest of the information be shown.
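The fallback policy described above can be modeled in plain C. This is a sketch under stated assumptions, not the driver code: struct fake_fw and fw_version_query() are hypothetical, each *_err field stands in for the return value of one register query, and the unsupported-capability and missing-component-index cases are folded into query_supported:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs: canned results for each FW query (0 or -errno). */
struct fake_fw {
	bool query_supported; /* caps present and component index found */
	int running_err;      /* running-version query result */
	uint32_t running;
	int pending_err;      /* "is an image pending?" query result */
	bool pending_exists;
	int stored_err;       /* stored-version query result */
	uint32_t stored;
};

/* Never fails: errors are reported through sentinel version values. */
static void fw_version_query(const struct fake_fw *fw,
			     uint32_t *running_ver, uint32_t *pending_ver)
{
	/* 0 means "not queried": the FW cannot answer at all. */
	*running_ver = 0;
	*pending_ver = 0;

	if (!fw->query_supported)
		return;

	/* Assume failure (U32_MAX) until the query succeeds. */
	*running_ver = UINT32_MAX;
	if (!fw->running_err)
		*running_ver = fw->running;

	*pending_ver = UINT32_MAX;
	if (fw->pending_err)
		return;

	if (!fw->pending_exists) {
		*pending_ver = 0; /* nothing staged in flash */
		return;
	}

	if (!fw->stored_err)
		*pending_ver = fw->stored;
}
```

Because the function returns nothing, the caller cannot abort the whole devlink reply; at worst it prints a 0 or all-ones version next to the fields that did succeed.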
After the fix:
$ devlink dev info
pci/0000:00:06.0:
  driver mlx5_core
  serial_number xxx...
  board.serial_number MT2225300179
  versions:
      fixed:
        fw.psid MT_0000000436
      running:
        fw.version 22.41.0188
        fw 22.41.0188
      stored:
        fw.version 255.255.65535
        fw 255.255.65535
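The stored version above reads 255.255.65535 because the U32_MAX failure marker is decoded field by field. The sketch below assumes, from the printed values rather than from the driver headers, that the 32-bit version word holds the major number in bits 31..24, the minor in bits 23..16 and the subminor in bits 15..0; the helper names are hypothetical, and the output mirrors the "%d.%d.%04d" format used in devlink.c:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed field layout of the 32-bit FW version word. */
static unsigned int fw_ver_major(uint32_t ver)    { return ver >> 24; }
static unsigned int fw_ver_minor(uint32_t ver)    { return (ver >> 16) & 0xff; }
static unsigned int fw_ver_subminor(uint32_t ver) { return ver & 0xffff; }

/* Render a version word as "major.minor.subminor" (subminor zero-padded). */
static void fw_ver_str(uint32_t ver, char *buf, size_t len)
{
	snprintf(buf, len, "%u.%u.%04u", fw_ver_major(ver),
		 fw_ver_minor(ver), fw_ver_subminor(ver));
}
```

With this layout, 0 prints as 0.0.0000 and U32_MAX prints as 255.255.65535, so both sentinels are easy to tell apart from a real version such as 22.41.0188.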
Fixes: 9c86b07e3069 ("net/mlx5: Added fw version query command")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/devlink.c | 4 +-
drivers/net/ethernet/mellanox/mlx5/core/fw.c | 53 ++++++++++++-------
.../ethernet/mellanox/mlx5/core/mlx5_core.h | 4 +-
3 files changed, 37 insertions(+), 24 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 6698ac55a4bf..73cf0321bb86 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -107,9 +107,7 @@ mlx5_devlink_info_get(struct devlink *devlink, struct devlink_info_req *req,
if (err)
return err;
- err = mlx5_fw_version_query(dev, &running_fw, &stored_fw);
- if (err)
- return err;
+ mlx5_fw_version_query(dev, &running_fw, &stored_fw);
snprintf(version_str, sizeof(version_str), "%d.%d.%04d",
mlx5_fw_ver_major(running_fw), mlx5_fw_ver_minor(running_fw),
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
index eeb4437975f2..c1f220e5fe18 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c
@@ -822,48 +822,63 @@ mlx5_fw_image_pending(struct mlx5_core_dev *dev,
return 0;
}
-int mlx5_fw_version_query(struct mlx5_core_dev *dev,
- u32 *running_ver, u32 *pending_ver)
+void mlx5_fw_version_query(struct mlx5_core_dev *dev,
+ u32 *running_ver, u32 *pending_ver)
{
u32 reg_mcqi_version[MLX5_ST_SZ_DW(mcqi_version)] = {};
bool pending_version_exists;
int component_index;
int err;
+ *running_ver = 0;
+ *pending_ver = 0;
+
if (!MLX5_CAP_GEN(dev, mcam_reg) || !MLX5_CAP_MCAM_REG(dev, mcqi) ||
!MLX5_CAP_MCAM_REG(dev, mcqs)) {
mlx5_core_warn(dev, "fw query isn't supported by the FW\n");
- return -EOPNOTSUPP;
+ return;
}
component_index = mlx5_get_boot_img_component_index(dev);
- if (component_index < 0)
- return component_index;
+ if (component_index < 0) {
+ mlx5_core_warn(dev, "fw query failed to find boot img component index, err %d\n",
+ component_index);
+ return;
+ }
+ *running_ver = U32_MAX; /* indicate failure */
err = mlx5_reg_mcqi_version_query(dev, component_index,
MCQI_FW_RUNNING_VERSION,
reg_mcqi_version);
- if (err)
- return err;
-
- *running_ver = MLX5_GET(mcqi_version, reg_mcqi_version, version);
-
+ if (!err)
+ *running_ver = MLX5_GET(mcqi_version, reg_mcqi_version,
+ version);
+ else
+ mlx5_core_warn(dev, "failed to query running version, err %d\n",
+ err);
+
+ *pending_ver = U32_MAX; /* indicate failure */
err = mlx5_fw_image_pending(dev, component_index, &pending_version_exists);
- if (err)
- return err;
+ if (err) {
+ mlx5_core_warn(dev, "failed to query pending image, err %d\n",
+ err);
+ return;
+ }
if (!pending_version_exists) {
*pending_ver = 0;
- return 0;
+ return;
}
err = mlx5_reg_mcqi_version_query(dev, component_index,
MCQI_FW_STORED_VERSION,
reg_mcqi_version);
- if (err)
- return err;
-
- *pending_ver = MLX5_GET(mcqi_version, reg_mcqi_version, version);
-
- return 0;
+ if (!err)
+ *pending_ver = MLX5_GET(mcqi_version, reg_mcqi_version,
+ version);
+ else
+ mlx5_core_warn(dev, "failed to query pending version, err %d\n",
+ err);
+
+ return;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index b635b423d972..1507e881d962 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -393,8 +393,8 @@ int mlx5_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
int mlx5_firmware_flash(struct mlx5_core_dev *dev, const struct firmware *fw,
struct netlink_ext_ack *extack);
-int mlx5_fw_version_query(struct mlx5_core_dev *dev,
- u32 *running_ver, u32 *stored_ver);
+void mlx5_fw_version_query(struct mlx5_core_dev *dev, u32 *running_ver,
+ u32 *stored_ver);
#ifdef CONFIG_MLX5_CORE_EN
int mlx5e_init(void);
--
2.44.0
* [PATCH net 3/3] net/mlx5: Fix switchdev mode rollback in case of failure
From: Tariq Toukan @ 2026-03-30 19:40 UTC
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Shay Drory, Shay Agroskin, Saeed Mahameed, Jianbo Liu, netdev,
linux-rdma, linux-kernel, Gal Pressman
From: Saeed Mahameed <saeedm@nvidia.com>
If switchdev mode fails to enable for some internal reason, we roll back
to legacy mode. Before this patch, the rollback unregistered the uplink
netdev and left it unregistered, causing the kernel BUG below.
To fix this, avoid unregistering the netdev by setting the rollback flag
'MLX5_PRIV_FLAGS_SWITCH_LEGACY' to indicate legacy mode.
devlink (431) used greatest stack depth: 11048 bytes left
mlx5_core 0000:00:03.0: E-Switch: Disable: mode(LEGACY), nvfs(0), \
necvfs(0), active vports(0)
mlx5_core 0000:00:03.0: E-Switch: Supported tc chains and prios offload
mlx5_core 0000:00:03.0: Loading uplink representor for vport 65535
mlx5_core 0000:00:03.0: mlx5_cmd_out_err:816:(pid 456): \
QUERY_HCA_CAP(0x100) op_mod(0x0) failed, \
status bad parameter(0x3), syndrome (0x3a3846), err(-22)
mlx5_core 0000:00:03.0 enp0s3np0 (unregistered): Unloading uplink \
representor for vport 65535
------------[ cut here ]------------
kernel BUG at net/core/dev.c:12070!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 456 Comm: devlink Not tainted 6.16.0-rc3+ \
#9 PREEMPT(voluntary)
RIP: 0010:unregister_netdevice_many_notify+0x123/0xae0
...
Call Trace:
[ 90.923094] unregister_netdevice_queue+0xad/0xf0
[ 90.923323] unregister_netdev+0x1c/0x40
[ 90.923522] mlx5e_vport_rep_unload+0x61/0xc6
[ 90.923736] esw_offloads_enable+0x8e6/0x920
[ 90.923947] mlx5_eswitch_enable_locked+0x349/0x430
[ 90.924182] ? is_mp_supported+0x57/0xb0
[ 90.924376] mlx5_devlink_eswitch_mode_set+0x167/0x350
[ 90.924628] devlink_nl_eswitch_set_doit+0x6f/0xf0
[ 90.924862] genl_family_rcv_msg_doit+0xe8/0x140
[ 90.925088] genl_rcv_msg+0x18b/0x290
[ 90.925269] ? __pfx_devlink_nl_pre_doit+0x10/0x10
[ 90.925506] ? __pfx_devlink_nl_eswitch_set_doit+0x10/0x10
[ 90.925766] ? __pfx_devlink_nl_post_doit+0x10/0x10
[ 90.926001] ? __pfx_genl_rcv_msg+0x10/0x10
[ 90.926206] netlink_rcv_skb+0x52/0x100
[ 90.926393] genl_rcv+0x28/0x40
[ 90.926557] netlink_unicast+0x27d/0x3d0
[ 90.926749] netlink_sendmsg+0x1f7/0x430
[ 90.926942] __sys_sendto+0x213/0x220
[ 90.927127] ? __sys_recvmsg+0x6a/0xd0
[ 90.927312] __x64_sys_sendto+0x24/0x30
[ 90.927504] do_syscall_64+0x50/0x1c0
[ 90.927687] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 90.927929] RIP: 0033:0x7f7d0363e047
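A simplified model of the fixed error path follows. It assumes, based on the comment added in the diff below, that the uplink representor unload skips unregistering the netdev when MLX5_PRIV_FLAGS_SWITCH_LEGACY is set; struct uplink, PRIV_FLAGS_SWITCH_LEGACY, uplink_rep_unload() and offloads_enable() are hypothetical stand-ins, not the mlx5 code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for MLX5_PRIV_FLAGS_SWITCH_LEGACY. */
#define PRIV_FLAGS_SWITCH_LEGACY (1u << 0)

struct uplink {
	unsigned int priv_flags; /* models dev->priv.flags */
	bool netdev_registered;
	int unregister_calls;
};

/*
 * Models the uplink representor unload: the netdev is unregistered
 * only when the legacy flag is clear.
 */
static void uplink_rep_unload(struct uplink *u)
{
	if (u->priv_flags & PRIV_FLAGS_SWITCH_LEGACY)
		return; /* going back to legacy: keep the uplink netdev */

	u->unregister_calls++;
	u->netdev_registered = false;
}

/* Models the err_vports path of esw_offloads_enable() after the fix. */
static int offloads_enable(struct uplink *u, bool vports_fail)
{
	if (!vports_fail)
		return 0;

	/* Roll back to legacy: mark it before unloading the uplink rep. */
	u->priv_flags |= PRIV_FLAGS_SWITCH_LEGACY;
	uplink_rep_unload(u);
	return -EINVAL; /* e.g. the failed QUERY_HCA_CAP above */
}
```

The ordering matters: the flag is set before the unload runs, which is exactly what the two added lines in the hunk below do.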
Fixes: 2a4f56fbcc47 ("net/mlx5e: Keep netdev when leave switchdev for devlink set legacy only")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 7a9ee36b8dca..01f6aecc4fcc 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -3761,6 +3761,8 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
return 0;
err_vports:
+ /* rollback to legacy, indicates don't unregister the uplink netdev */
+ esw->dev->priv.flags |= MLX5_PRIV_FLAGS_SWITCH_LEGACY;
mlx5_esw_offloads_rep_unload(esw, MLX5_VPORT_UPLINK);
err_uplink:
esw_offloads_steering_cleanup(esw);
--
2.44.0
* Re: [PATCH net 0/3] mlx5 misc fixes 2026-03-30
From: patchwork-bot+netdevbpf @ 2026-04-02 3:20 UTC
To: Tariq Toukan
Cc: edumazet, kuba, pabeni, andrew+netdev, davem, saeedm, leon,
mbloch, shayd, shayag, saeedm, jianbol, netdev, linux-rdma,
linux-kernel, gal
Hello:
This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Mon, 30 Mar 2026 22:40:12 +0300 you wrote:
> Hi,
>
> This patchset provides misc bug fixes from the team to the mlx5
> core driver.
>
> Thanks,
> Tariq.
>
> [...]
Here is the summary with links:
- [net,1/3] net/mlx5: lag: Check for LAG device before creating debugfs
https://git.kernel.org/netdev/net/c/bf16bca66536
- [net,2/3] net/mlx5: Avoid "No data available" when FW version queries fail
https://git.kernel.org/netdev/net/c/10dc35f6a443
- [net,3/3] net/mlx5: Fix switchdev mode rollback in case of failure
https://git.kernel.org/netdev/net/c/403186400a1a
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html