public inbox for linux-kernel@vger.kernel.org
* [PATCH rdma-next v2 0/9] Restore failure of destroy commands
From: Leon Romanovsky @ 2020-09-07 12:09 UTC
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Adit Ranadive, Ariel Elior, Bernard Metzler,
	Christian Benvenuti, Dennis Dalessandro, Devesh Sharma,
	Faisal Latif, Gal Pressman, Leon Romanovsky, Lijun Ou,
	linux-kernel, linux-rdma, Michal Kalderon, Mike Marciniszyn,
	Naresh Kumar PBS, Nelson Escobar, Parvi Kaustubhi,
	Potnuri Bharat Teja, Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier), Yishai Hadas, Yuval Shaia, Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

Changelog:
v2:
 * Rebased on top of 524d8ffd07f0
 * Removed "udata" check in destroy flows
 * Changed ib_free_cq to return early
 * Used Jason's suggestion to implement "RDMA/mlx5: Issue FW command to destroy
   SRQ on reentry" patch.
v1:
 * Changed returned value in efa_destroy_ah() from EINVAL to EOPNOTSUPP
 * https://lore.kernel.org/lkml/20200830084010.102381-1-leon@kernel.org
v0:
 * https://lore.kernel.org/lkml/20200824103247.1088464-1-leon@kernel.org

-----------------------------------------------------------------------------
Hi,

This series restores the ability of destroy commands to fail. This is
needed because the mlx5_ib DEVX implementation interleaves ib_core objects
with FW objects without sharing reference counters.

In retrospect, every part of the mlx5_ib flow is correct.

It started with the IBTA spec, which was written by HW engineers with HW in
mind and allows destruction to fail. FW implemented destroy with a
symmetrical interface, like any other command, and propagated the error back
to the kernel, which forwarded it to libibverbs and kernel ULPs.

Libibverbs was designed with the IBTA spec in hand, setting destroy errors
in stone. Until mlx5_ib DEVX, this worked well, because IB verbs objects
are counted by the kernel and ib_core ensures that FW destroy will succeed
by managing various reference counters on such objects.
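
To illustrate the reference counting idea, here is a minimal userspace
sketch. All names (demo_parent, demo_child_create, demo_child_destroy,
demo_parent_destroy) are hypothetical and are not ib_core APIs; the point is
only that a counted parent is never handed to FW for destruction while
children still reference it.

```c
#include <errno.h>

/* Hypothetical parent object (think of a PD) counted by the core layer. */
struct demo_parent {
	int usecnt; /* number of child objects still referencing the parent */
};

/* Creating a child takes a reference on the parent. */
static void demo_child_create(struct demo_parent *p)
{
	p->usecnt++;
}

/* Destroying a child drops that reference. */
static void demo_child_destroy(struct demo_parent *p)
{
	p->usecnt--;
}

/*
 * Core-level destroy: refuse with -EBUSY while children remain, so the
 * FW destroy command is only issued when it is expected to succeed.
 */
static int demo_parent_destroy(const struct demo_parent *p)
{
	if (p->usecnt)
		return -EBUSY;
	/* Only at this point would the FW destroy command be issued. */
	return 0;
}
```

An object created behind the core's back (as with DEVX) never bumps usecnt
in this sketch, which is why the FW command can still fail even though the
counter reads zero.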

The extension of the mlx5 driver changed this flow when it allowed DEVX
objects that are not managed by ib_core to be interleaved with the ones
under ib_core responsibility.

Drivers that want to implement DEVX flows must ensure that FW/HW destroys
are performed as early as possible, before any other internal cleanup.
After the HW destroy, drivers are not allowed to fail.
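
A minimal userspace sketch of that ordering rule follows. The names
(demo_obj, demo_fw_destroy, demo_destroy) are hypothetical stand-ins, not
mlx5_ib or ib_core code, and the FW command is simulated by a flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a driver object; not a real mlx5_ib structure. */
struct demo_obj {
	bool fw_busy;      /* simulates FW still referencing the object */
	bool fw_destroyed; /* set once the simulated FW command succeeded */
	void *sw_state;    /* driver-side resources, freed after FW destroy */
};

/* Hypothetical FW command: fails with -EBUSY while the object is in use. */
static int demo_fw_destroy(struct demo_obj *obj)
{
	if (obj->fw_busy)
		return -EBUSY;
	obj->fw_destroyed = true;
	return 0;
}

/*
 * Destroy flow: issue the FW/HW destroy first and return its error with
 * all software state intact, so the caller can retry later. Once the FW
 * destroy succeeded, the remaining cleanup cannot fail.
 */
static int demo_destroy(struct demo_obj *obj)
{
	int ret;

	ret = demo_fw_destroy(obj);
	if (ret)
		return ret;

	free(obj->sw_state);
	obj->sw_state = NULL;
	return 0;
}
```

If the simulated FW command returns -EBUSY, sw_state is left untouched, so
a later retry of demo_destroy() sees a fully valid object.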

This series includes two patches (WQ and "potential race") that will
require extra work in mlx5_ib; both are theoretical. WQ is not in use
in DEVX, but is needed to make the interface symmetrical with other
objects. The "potential race" is in the ULP flow that ensures that the SRQ
is destroyed in the proper order.

Thanks

Leon Romanovsky (9):
  RDMA: Restore ability to fail on PD deallocate
  RDMA: Restore ability to fail on AH destroy
  RDMA/mlx5: Issue FW command to destroy SRQ on reentry
  RDMA: Restore ability to fail on SRQ destroy
  RDMA/core: Delete function indirection for alloc/free kernel CQ
  RDMA: Allow fail of destroy CQ
  RDMA: Change XRCD destroy return value
  RDMA: Restore ability to return error for destroy WQ
  RDMA: Make counters destroy symmetrical

 drivers/infiniband/core/cq.c                  |  30 ++---
 drivers/infiniband/core/uverbs_std_types.c    |   3 +-
 .../core/uverbs_std_types_counters.c          |   4 +-
 drivers/infiniband/core/uverbs_std_types_wq.c |   2 +-
 drivers/infiniband/core/verbs.c               |  56 +++++++---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c      |  12 +-
 drivers/infiniband/hw/bnxt_re/ib_verbs.h      |   8 +-
 drivers/infiniband/hw/cxgb4/cq.c              |   3 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h        |   4 +-
 drivers/infiniband/hw/cxgb4/provider.c        |   3 +-
 drivers/infiniband/hw/cxgb4/qp.c              |   3 +-
 drivers/infiniband/hw/efa/efa.h               |   6 +-
 drivers/infiniband/hw/efa/efa_verbs.c         |  11 +-
 drivers/infiniband/hw/hns/hns_roce_ah.c       |   5 -
 drivers/infiniband/hw/hns/hns_roce_cq.c       |   3 +-
 drivers/infiniband/hw/hns/hns_roce_device.h   |  13 ++-
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c    |   3 +-
 drivers/infiniband/hw/hns/hns_roce_pd.c       |   3 +-
 drivers/infiniband/hw/hns/hns_roce_srq.c      |   3 +-
 drivers/infiniband/hw/i40iw/i40iw_verbs.c     |   6 +-
 drivers/infiniband/hw/mlx4/ah.c               |   5 -
 drivers/infiniband/hw/mlx4/cq.c               |   3 +-
 drivers/infiniband/hw/mlx4/main.c             |   6 +-
 drivers/infiniband/hw/mlx4/mlx4_ib.h          |  11 +-
 drivers/infiniband/hw/mlx4/qp.c               |   3 +-
 drivers/infiniband/hw/mlx4/srq.c              |   3 +-
 drivers/infiniband/hw/mlx5/ah.c               |   5 -
 drivers/infiniband/hw/mlx5/cmd.c              |   4 +-
 drivers/infiniband/hw/mlx5/cmd.h              |   2 +-
 drivers/infiniband/hw/mlx5/counters.c         |   3 +-
 drivers/infiniband/hw/mlx5/cq.c               |  16 ++-
 drivers/infiniband/hw/mlx5/main.c             |   4 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |  13 ++-
 drivers/infiniband/hw/mlx5/qp.c               |  12 +-
 drivers/infiniband/hw/mlx5/qp.h               |   4 +-
 drivers/infiniband/hw/mlx5/qpc.c              |   5 +-
 drivers/infiniband/hw/mlx5/srq.c              |  26 ++---
 drivers/infiniband/hw/mlx5/srq.h              |   2 +-
 drivers/infiniband/hw/mlx5/srq_cmd.c          |  22 +++-
 drivers/infiniband/hw/mthca/mthca_provider.c  |  12 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c      |   3 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.h      |   2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c   |  11 +-
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.h   |   6 +-
 drivers/infiniband/hw/qedr/verbs.c            |  14 ++-
 drivers/infiniband/hw/qedr/verbs.h            |   8 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.c  |   7 +-
 drivers/infiniband/hw/usnic/usnic_ib_verbs.h  |   4 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c  |   3 +-
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_srq.c |   3 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.c   |   8 +-
 .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.h   |   8 +-
 drivers/infiniband/sw/rdmavt/ah.c             |   3 +-
 drivers/infiniband/sw/rdmavt/ah.h             |   2 +-
 drivers/infiniband/sw/rdmavt/cq.c             |   3 +-
 drivers/infiniband/sw/rdmavt/cq.h             |   2 +-
 drivers/infiniband/sw/rdmavt/pd.c             |   3 +-
 drivers/infiniband/sw/rdmavt/pd.h             |   2 +-
 drivers/infiniband/sw/rdmavt/srq.c            |   3 +-
 drivers/infiniband/sw/rdmavt/srq.h            |   2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c         |  12 +-
 drivers/infiniband/sw/siw/siw_verbs.c         |   9 +-
 drivers/infiniband/sw/siw/siw_verbs.h         |   6 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c       |   6 +-
 include/rdma/ib_verbs.h                       | 105 +++++-------------
 65 files changed, 308 insertions(+), 269 deletions(-)

--
2.26.2


* Re: [PATCH rdma-next v2 0/9] Restore failure of destroy commands
From: Jason Gunthorpe @ 2020-09-09 18:06 UTC
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Adit Ranadive, Ariel Elior,
	Bernard Metzler, Christian Benvenuti, Dennis Dalessandro,
	Devesh Sharma, Faisal Latif, Gal Pressman, Leon Romanovsky,
	Lijun Ou, linux-kernel, linux-rdma, Michal Kalderon,
	Mike Marciniszyn, Naresh Kumar PBS, Nelson Escobar,
	Parvi Kaustubhi, Potnuri Bharat Teja, Selvin Xavier,
	Shiraz Saleem, Somnath Kotur, Sriharsha Basavapatna,
	VMware PV-Drivers, Weihang Li, Wei Hu(Xavier), Yishai Hadas,
	Yuval Shaia, Zhu Yanjun

On Mon, Sep 07, 2020 at 03:09:12PM +0300, Leon Romanovsky wrote:
> Leon Romanovsky (9):
>   RDMA: Restore ability to fail on PD deallocate
>   RDMA: Restore ability to fail on AH destroy
>   RDMA/mlx5: Issue FW command to destroy SRQ on reentry
>   RDMA: Restore ability to fail on SRQ destroy
>   RDMA/core: Delete function indirection for alloc/free kernel CQ
>   RDMA: Allow fail of destroy CQ
>   RDMA: Change XRCD destroy return value
>   RDMA: Restore ability to return error for destroy WQ
>   RDMA: Make counters destroy symmetrical

Thanks, applied to for-next with the changes I noted:

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index b2381e01bf6345..35e5bbb44d3d8e 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -1031,15 +1031,14 @@ int mlx5_ib_destroy_cq(struct ib_cq *cq, struct ib_udata *udata)
 	int ret;
 
 	ret = mlx5_core_destroy_cq(dev->mdev, &mcq->mcq);
-	if (ret && udata)
+	if (ret)
 		return ret;
 
-	if (udata) {
+	if (udata)
 		destroy_cq_user(mcq, udata);
-		return 0;
-	}
-	destroy_cq_kernel(dev, mcq);
-	return ret;
+	else
+		destroy_cq_kernel(dev, mcq);
+	return 0;
 }
 
 static int is_equal_rsn(struct mlx5_cqe64 *cqe64, u32 rsn)
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 039f55fd067640..6dfdc13bc36395 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -5092,11 +5092,11 @@ int mlx5_ib_destroy_wq(struct ib_wq *wq, struct ib_udata *udata)
 	int ret;
 
 	ret = mlx5_core_destroy_rq_tracked(dev, &rwq->core_qp);
-	if (ret && udata)
+	if (ret)
 		return ret;
 	destroy_user_rq(dev, wq->pd, rwq, udata);
 	kfree(rwq);
-	return ret;
+	return 0;
 }
 
 struct ib_rwq_ind_table *mlx5_ib_create_rwq_ind_table(struct ib_device *device,
diff --git a/drivers/infiniband/hw/mlx5/srq.c b/drivers/infiniband/hw/mlx5/srq.c
index 6789b8a6927467..e2f720eec1e18b 100644
--- a/drivers/infiniband/hw/mlx5/srq.c
+++ b/drivers/infiniband/hw/mlx5/srq.c
@@ -396,17 +396,14 @@ int mlx5_ib_destroy_srq(struct ib_srq *srq, struct ib_udata *udata)
 	int ret;
 
 	ret = mlx5_cmd_destroy_srq(dev, &msrq->msrq);
-	if (ret && udata)
+	if (ret)
 		return ret;
 
-	if (udata) {
+	if (udata)
 		destroy_srq_user(srq->pd, msrq, udata);
-		return 0;
-	}
-
-	/* We are cleaning kernel resources anyway */
-	destroy_srq_kernel(dev, msrq);
-	return ret;
+	else
+		destroy_srq_kernel(dev, msrq);
+	return 0;
 }
 
 void mlx5_ib_free_srq_wqe(struct mlx5_ib_srq *srq, int wqe_index)
diff --git a/drivers/infiniband/hw/mlx5/srq_cmd.c b/drivers/infiniband/hw/mlx5/srq_cmd.c
index 1a707c2d364c1f..db889ec3fd48e8 100644
--- a/drivers/infiniband/hw/mlx5/srq_cmd.c
+++ b/drivers/infiniband/hw/mlx5/srq_cmd.c
@@ -598,7 +598,7 @@ int mlx5_cmd_destroy_srq(struct mlx5_ib_dev *dev, struct mlx5_core_srq *srq)
 
 	/* Delete entry, but leave index occupied */
 	tmp = xa_cmpxchg_irq(&table->array, srq->srqn, srq, XA_ZERO_ENTRY, 0);
-	if (WARN_ON(!tmp || tmp != srq) || xa_err(tmp))
+	if (WARN_ON(tmp != srq))
 		return xa_err(tmp) ?: -EINVAL;
 
 	err = destroy_srq_split(dev, srq);


Jason

* Re: [PATCH rdma-next v2 0/9] Restore failure of destroy commands
From: Leon Romanovsky @ 2020-09-10 12:24 UTC
  To: Jason Gunthorpe
  Cc: Doug Ledford, Adit Ranadive, Ariel Elior, Bernard Metzler,
	Christian Benvenuti, Dennis Dalessandro, Devesh Sharma,
	Faisal Latif, Gal Pressman, Lijun Ou, linux-kernel, linux-rdma,
	Michal Kalderon, Mike Marciniszyn, Naresh Kumar PBS,
	Nelson Escobar, Parvi Kaustubhi, Potnuri Bharat Teja,
	Selvin Xavier, Shiraz Saleem, Somnath Kotur,
	Sriharsha Basavapatna, VMware PV-Drivers, Weihang Li,
	Wei Hu(Xavier), Yishai Hadas, Yuval Shaia, Zhu Yanjun

On Wed, Sep 09, 2020 at 03:06:07PM -0300, Jason Gunthorpe wrote:
> Thanks, applied to for-next with the changes I noted:

Thanks for taking care. LGTM.
