* [PATCH] MLX5: Fix semaphore leak on command timeout
From: Shawn.Shao @ 2025-05-09 6:48 UTC
To: saeedm, leon, tariqt, andrew+netdev, davem, edumazet, kuba,
pabeni, netdev, linux-rdma, linux-kernel
Cc: xiaowu.ding, Shawn Shao
From: Shawn Shao <shawn.shao@jaguarmicro.com>
Fix a resource leak in the mlx5 driver when handling command timeouts.
The reference count of the command entry (struct mlx5_cmd_work_ent) is not
decremented on timeout, so the command semaphore is never released.

In the current flow, the reference count is incremented but not decremented
in the timeout case. This prevents the semaphore from being released.

Add a condition that drops the reference when a timeout occurs, so that the
semaphore is released:

	if (!forced || mlx5_cmd_is_down(dev) ||
	    !opcode_allowed(cmd, ent->op) ||
	    ent->ret == -ETIMEDOUT)
		cmd_ent_put(ent);

Signed-off-by: Shawn Shao <shawn.shao@jaguarmicro.com>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index e53dbdc0a7a1..7f1f6345d90c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1714,7 +1714,8 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force
 
 			if (!forced || /* Real FW completion */
 			    mlx5_cmd_is_down(dev) || /* No real FW completion is expected */
-			    !opcode_allowed(cmd, ent->op))
+			    !opcode_allowed(cmd, ent->op) ||
+			    ent->ret == -ETIMEDOUT)
 				cmd_ent_put(ent);
 
 			ent->ts2 = ktime_get_ns();
--
2.34.1
* Re: [PATCH] MLX5: Fix semaphore leak on command timeout
From: Moshe Shemesh @ 2025-05-11 12:52 UTC
To: Shawn.Shao, saeedm, leon, tariqt, andrew+netdev, davem, edumazet,
kuba, pabeni, netdev, linux-rdma, linux-kernel
Cc: xiaowu.ding
On 5/9/2025 9:48 AM, Shawn.Shao wrote:
> From: Shawn Shao <shawn.shao@jaguarmicro.com>
>
> Fix a resource leak in the mlx5 driver when handling command timeouts.
> The reference count of the command entry (struct mlx5_cmd_work_ent) is not
> decremented on timeout, so the command semaphore is never released.
>
> In the current flow, the reference count is incremented but not decremented
> in the timeout case. This prevents the semaphore from being released.
>
> Add a condition that drops the reference when a timeout occurs, so that the
> semaphore is released:
>
>	if (!forced || mlx5_cmd_is_down(dev) ||
>	    !opcode_allowed(cmd, ent->op) ||
>	    ent->ret == -ETIMEDOUT)
>		cmd_ent_put(ent);
>
> This ensures the semaphore is released properly on command timeouts.

We can't release it on command timeout. The firmware may still write the
answer to the command slot memory, even after the driver has timed out.

Note: a few lines above in this code there is a comment, "only real
completion can free the cmd slot". That is where the entry is released:

	/* only real completion can free the cmd slot */
	if (!forced) {
		mlx5_core_err(dev, "Command completion arrived after timeout (entry idx = %d).\n",
			      ent->idx);
		cmd_ent_put(ent);
	}
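
To illustrate the rule, here is a minimal user-space sketch, not driver
code. It assumes a simplified model with two references per command entry
(one held by the waiter, one held for the pending firmware completion) and
leaves out the mlx5_cmd_is_down() and opcode_allowed() cases. All model_*
names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a command entry; not struct mlx5_cmd_work_ent. */
struct model_ent {
	int refcnt;	/* references currently held on the entry */
	bool slot_busy;	/* true while firmware may still write to the slot */
};

/* Hypothetical stand-in for cmd_ent_put(): the last put frees the slot. */
static void model_ent_put(struct model_ent *ent)
{
	if (--ent->refcnt == 0)
		ent->slot_busy = false;
}

/* Issue a command: one reference for the waiter, one for the completion. */
static void model_issue(struct model_ent *ent)
{
	ent->refcnt = 2;
	ent->slot_busy = true;
}

/*
 * Completion path of the model. forced == true stands for a completion
 * triggered by the driver after a timeout; forced == false stands for a
 * real firmware completion. Only the real one drops the completion
 * reference, so the slot stays reserved until firmware is done with it.
 */
static void model_comp_handler(struct model_ent *ent, bool forced)
{
	if (!forced)
		model_ent_put(ent);
}
```

In this model, calling model_issue(), then model_ent_put() for the
timed-out waiter, then model_comp_handler(&ent, true) leaves slot_busy
set. Only a later model_comp_handler(&ent, false) clears it, which is the
point at which firmware can no longer write to the slot.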
>
> Signed-off-by: Shawn Shao <shawn.shao@jaguarmicro.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> index e53dbdc0a7a1..7f1f6345d90c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> @@ -1714,7 +1714,8 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force
>
>  			if (!forced || /* Real FW completion */
>  			    mlx5_cmd_is_down(dev) || /* No real FW completion is expected */
> -			    !opcode_allowed(cmd, ent->op))
> +			    !opcode_allowed(cmd, ent->op) ||
> +			    ent->ret == -ETIMEDOUT)
>  				cmd_ent_put(ent);
>
>  			ent->ts2 = ktime_get_ns();