public inbox for linux-kernel@vger.kernel.org
* [PATCH] MLX5: Fix semaphore leak on command timeout
@ 2025-05-09  6:48 Shawn.Shao
From: Shawn.Shao @ 2025-05-09  6:48 UTC
  To: saeedm, leon, tariqt, andrew+netdev, davem, edumazet, kuba,
	pabeni, netdev, linux-rdma, linux-kernel
  Cc: xiaowu.ding, Shawn Shao

From: Shawn Shao <shawn.shao@jaguarmicro.com>

Fix a resource leak in the mlx5 driver when handling command timeouts.
The reference count of the command entry (`struct mlx5_cmd_work_ent`) is not
decremented on timeout, so the semaphore is never released.

In the current flow, the reference count is incremented but not decremented
in timeout cases. This prevents proper release of the semaphore.

Add a condition to decrement the reference count when a timeout occurs,
ensuring the semaphore is released and preventing resource leaks:

    if (!forced || mlx5_cmd_is_down(dev) ||
        !opcode_allowed(cmd, ent->op) ||
        ent->ret == -ETIMEDOUT)
            cmd_ent_put(ent);

This ensures the semaphore is released properly on command timeouts.

Signed-off-by: Shawn Shao <shawn.shao@jaguarmicro.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index e53dbdc0a7a1..7f1f6345d90c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1714,7 +1714,8 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force
 
 			if (!forced || /* Real FW completion */
 			     mlx5_cmd_is_down(dev) || /* No real FW completion is expected */
-			     !opcode_allowed(cmd, ent->op))
+			     !opcode_allowed(cmd, ent->op) ||
+			     ent->ret == -ETIMEDOUT)
 				cmd_ent_put(ent);
 
 			ent->ts2 = ktime_get_ns();
-- 
2.34.1



* Re: [PATCH] MLX5: Fix semaphore leak on command timeout
  2025-05-09  6:48 [PATCH] MLX5: Fix semaphore leak on command timeout Shawn.Shao
@ 2025-05-11 12:52 ` Moshe Shemesh
From: Moshe Shemesh @ 2025-05-11 12:52 UTC
  To: Shawn.Shao, saeedm, leon, tariqt, andrew+netdev, davem, edumazet,
	kuba, pabeni, netdev, linux-rdma, linux-kernel
  Cc: xiaowu.ding



On 5/9/2025 9:48 AM, Shawn.Shao wrote:
> From: Shawn Shao <shawn.shao@jaguarmicro.com>
> 
> Fixes a resource leak in the MLX5 driver when handling command timeouts.
> The command entry reference count (`mlx5_cmd_work_ent`) was not properly
> decremented during timeouts, causing the semaphore to remain unreleased.
> 
> In the current flow, the reference count is incremented but not decremented
> in timeout cases. This prevents proper release of the semaphore.
> 
> Add a condition to decrement the reference count when a timeout occurs,
> ensuring the semaphore is released and preventing resource leaks:
> 
>      if (!forced || mlx5_cmd_is_down(dev)
> 	    ||!opcode_allowed(cmd, ent->op)
> 	    || ent->ret == -ETIMEDOUT)
>          cmd_ent_put(ent);
> 
> This ensures the semaphore is released properly on command timeouts.

We can't release it on command timeout. The firmware may still write the
answer to the command slot memory, even after the driver has timed out.

Note: a few lines above in this code, there is a comment "only real
completion can free the cmd slot". That is where it will be released:

/* only real completion can free the cmd slot */
if (!forced) {
         mlx5_core_err(dev, "Command completion arrived after timeout (entry idx = %d).\n",
                       ent->idx);
         cmd_ent_put(ent);
}
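
To illustrate the point, here is a toy userspace model of the rule, not
the driver code. Every name in it (toy_slot, toy_submit, toy_timeout,
toy_complete) is hypothetical, and it assumes one reference for the
waiter and one for the pending firmware answer. It leaves out the
mlx5_cmd_is_down() and opcode_allowed() cases, which the real handler
also treats as "no firmware completion is expected":

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in the mlx5 driver. */
struct toy_slot {
	int refcnt;     /* references currently held on the entry */
	bool busy;      /* slot memory is still owned by someone */
	bool timed_out; /* the waiter gave up on the command */
};

static void toy_get(struct toy_slot *s)
{
	s->refcnt++;
}

static void toy_put(struct toy_slot *s)
{
	assert(s->refcnt > 0);
	if (--s->refcnt == 0)
		s->busy = false; /* last reference dropped: slot may be reused */
}

/* Submit: one reference for the waiter, one for the pending firmware answer. */
static void toy_submit(struct toy_slot *s)
{
	s->refcnt = 0;
	s->busy = true;
	s->timed_out = false;
	toy_get(s); /* held by the waiter */
	toy_get(s); /* held because firmware may still write to the slot */
}

/* The waiter times out and drops only its own reference. */
static void toy_timeout(struct toy_slot *s)
{
	s->timed_out = true;
	toy_put(s);
}

/* Only a real (non-forced) completion drops the firmware reference. */
static void toy_complete(struct toy_slot *s, bool forced)
{
	if (!forced)
		toy_put(s);
}
```

In this model, toy_submit() followed by toy_timeout() leaves the slot
busy, and a forced toy_complete() does not change that. Only a later
toy_complete() with forced == false frees it, so the slot cannot be
reused while firmware might still write to it.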


> 
> Signed-off-by: Shawn Shao <shawn.shao@jaguarmicro.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> index e53dbdc0a7a1..7f1f6345d90c 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
> @@ -1714,7 +1714,8 @@ static void mlx5_cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec, bool force
>   
>   			if (!forced || /* Real FW completion */
>   			     mlx5_cmd_is_down(dev) || /* No real FW completion is expected */
> -			     !opcode_allowed(cmd, ent->op))
> +			     !opcode_allowed(cmd, ent->op) ||
> +			     ent->ret == -ETIMEDOUT)
>   				cmd_ent_put(ent);
>   
>   			ent->ts2 = ktime_get_ns();

