* [PATCH rdma-rc] RDMA/mlx5: Fix bind QP error cleanup flow
@ 2025-02-20 6:47 Leon Romanovsky
2025-02-20 12:23 ` Zhu Yanjun
2025-02-23 8:35 ` Leon Romanovsky
0 siblings, 2 replies; 4+ messages in thread
From: Leon Romanovsky @ 2025-02-20 6:47 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Patrisious Haddad, linux-rdma, Mark Zhang
From: Patrisious Haddad <phaddad@nvidia.com>
When there is a failure during bind QP, the cleanup flow destroys the
counter regardless if it is the one that created it or not, which is
problematic since if it isn't the one that created it, that counter could
still be in use.
Fix that by destroying the counter only if it was created during this call.
Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration")
Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/counters.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
index 4f6c1968a2ee..81cfa74147a1 100644
--- a/drivers/infiniband/hw/mlx5/counters.c
+++ b/drivers/infiniband/hw/mlx5/counters.c
@@ -546,6 +546,7 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
struct ib_qp *qp)
{
struct mlx5_ib_dev *dev = to_mdev(qp->device);
+ bool new = false;
int err;
if (!counter->id) {
@@ -560,6 +561,7 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
return err;
counter->id =
MLX5_GET(alloc_q_counter_out, out, counter_set_id);
+ new = true;
}
err = mlx5_ib_qp_set_counter(qp, counter);
@@ -569,8 +571,10 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
return 0;
fail_set_counter:
- mlx5_ib_counter_dealloc(counter);
- counter->id = 0;
+ if (new) {
+ mlx5_ib_counter_dealloc(counter);
+ counter->id = 0;
+ }
return err;
}
--
2.48.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH rdma-rc] RDMA/mlx5: Fix bind QP error cleanup flow
2025-02-20 6:47 [PATCH rdma-rc] RDMA/mlx5: Fix bind QP error cleanup flow Leon Romanovsky
@ 2025-02-20 12:23 ` Zhu Yanjun
2025-02-23 8:33 ` Leon Romanovsky
2025-02-23 8:35 ` Leon Romanovsky
1 sibling, 1 reply; 4+ messages in thread
From: Zhu Yanjun @ 2025-02-20 12:23 UTC (permalink / raw)
To: Leon Romanovsky, Jason Gunthorpe
Cc: Patrisious Haddad, linux-rdma, Mark Zhang
在 2025/2/20 7:47, Leon Romanovsky 写道:
> From: Patrisious Haddad <phaddad@nvidia.com>
>
> When there is a failure during bind QP, the cleanup flow destroys the
> counter regardless if it is the one that created it or not, which is
> problematic since if it isn't the one that created it, that counter could
> still be in use.
>
> Fix that by destroying the counter only if it was created during this call.
>
> Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration")
> Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
> Reviewed-by: Mark Zhang <markzhang@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> drivers/infiniband/hw/mlx5/counters.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
> index 4f6c1968a2ee..81cfa74147a1 100644
> --- a/drivers/infiniband/hw/mlx5/counters.c
> +++ b/drivers/infiniband/hw/mlx5/counters.c
> @@ -546,6 +546,7 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
> struct ib_qp *qp)
> {
> struct mlx5_ib_dev *dev = to_mdev(qp->device);
> + bool new = false;
> int err;
>
> if (!counter->id) {
> @@ -560,6 +561,7 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
> return err;
> counter->id =
> MLX5_GET(alloc_q_counter_out, out, counter_set_id);
> + new = true;
It seems that there is no other better method except that a new bool
variable is used. IMO, this method can fix this problem.
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Zhu Yanjun
> }
>
> err = mlx5_ib_qp_set_counter(qp, counter);
> @@ -569,8 +571,10 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
> return 0;
>
> fail_set_counter:
> - mlx5_ib_counter_dealloc(counter);
> - counter->id = 0;
> + if (new) {
> + mlx5_ib_counter_dealloc(counter);
> + counter->id = 0;
> + }
>
> return err;
> }
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH rdma-rc] RDMA/mlx5: Fix bind QP error cleanup flow
2025-02-20 12:23 ` Zhu Yanjun
@ 2025-02-23 8:33 ` Leon Romanovsky
0 siblings, 0 replies; 4+ messages in thread
From: Leon Romanovsky @ 2025-02-23 8:33 UTC (permalink / raw)
To: Zhu Yanjun; +Cc: Jason Gunthorpe, Patrisious Haddad, linux-rdma, Mark Zhang
On Thu, Feb 20, 2025 at 01:23:33PM +0100, Zhu Yanjun wrote:
> 在 2025/2/20 7:47, Leon Romanovsky 写道:
> > From: Patrisious Haddad <phaddad@nvidia.com>
> >
> > When there is a failure during bind QP, the cleanup flow destroys the
> > counter regardless if it is the one that created it or not, which is
> > problematic since if it isn't the one that created it, that counter could
> > still be in use.
> >
> > Fix that by destroying the counter only if it was created during this call.
> >
> > Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration")
> > Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
> > Reviewed-by: Mark Zhang <markzhang@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
> > drivers/infiniband/hw/mlx5/counters.c | 8 ++++++--
> > 1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c
> > index 4f6c1968a2ee..81cfa74147a1 100644
> > --- a/drivers/infiniband/hw/mlx5/counters.c
> > +++ b/drivers/infiniband/hw/mlx5/counters.c
> > @@ -546,6 +546,7 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
> > struct ib_qp *qp)
> > {
> > struct mlx5_ib_dev *dev = to_mdev(qp->device);
> > + bool new = false;
> > int err;
> > if (!counter->id) {
> > @@ -560,6 +561,7 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
> > return err;
> > counter->id =
> > MLX5_GET(alloc_q_counter_out, out, counter_set_id);
> > + new = true;
> It seems that there is no other better method except that a new bool
> variable is used. IMO, this method can fix this problem.
>
> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Thanks
>
> Zhu Yanjun
>
> > }
> > err = mlx5_ib_qp_set_counter(qp, counter);
> > @@ -569,8 +571,10 @@ static int mlx5_ib_counter_bind_qp(struct rdma_counter *counter,
> > return 0;
> > fail_set_counter:
> > - mlx5_ib_counter_dealloc(counter);
> > - counter->id = 0;
> > + if (new) {
> > + mlx5_ib_counter_dealloc(counter);
> > + counter->id = 0;
> > + }
> > return err;
> > }
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH rdma-rc] RDMA/mlx5: Fix bind QP error cleanup flow
2025-02-20 6:47 [PATCH rdma-rc] RDMA/mlx5: Fix bind QP error cleanup flow Leon Romanovsky
2025-02-20 12:23 ` Zhu Yanjun
@ 2025-02-23 8:35 ` Leon Romanovsky
1 sibling, 0 replies; 4+ messages in thread
From: Leon Romanovsky @ 2025-02-23 8:35 UTC (permalink / raw)
To: Jason Gunthorpe, Leon Romanovsky
Cc: Patrisious Haddad, linux-rdma, Mark Zhang
On Thu, 20 Feb 2025 08:47:10 +0200, Leon Romanovsky wrote:
> When there is a failure during bind QP, the cleanup flow destroys the
> counter regardless if it is the one that created it or not, which is
> problematic since if it isn't the one that created it, that counter could
> still be in use.
>
> Fix that by destroying the counter only if it was created during this call.
>
> [...]
Applied, thanks!
[1/1] RDMA/mlx5: Fix bind QP error cleanup flow
https://git.kernel.org/rdma/rdma/c/e1a0bdbdfdf084
Best regards,
--
Leon Romanovsky <leon@kernel.org>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2025-02-23 8:35 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-20 6:47 [PATCH rdma-rc] RDMA/mlx5: Fix bind QP error cleanup flow Leon Romanovsky
2025-02-20 12:23 ` Zhu Yanjun
2025-02-23 8:33 ` Leon Romanovsky
2025-02-23 8:35 ` Leon Romanovsky
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox