From: Jason Gunthorpe <jgg@nvidia.com>
To: Gal Pressman <galpress@amazon.com>
Cc: Leon Romanovsky <leon@kernel.org>,
	Doug Ledford <dledford@redhat.com>,
	Leon Romanovsky <leonro@mellanox.com>,
	Adit Ranadive <aditr@vmware.com>,
	Ariel Elior <aelior@marvell.com>,
	Bernard Metzler <bmt@zurich.ibm.com>,
	Christian Benvenuti <benve@cisco.com>,
	Dennis Dalessandro <dennis.dalessandro@intel.com>,
	Devesh Sharma <devesh.sharma@broadcom.com>,
	Faisal Latif <faisal.latif@intel.com>,
	Lijun Ou <oulijun@huawei.com>, <linux-rdma@vger.kernel.org>,
	Michal Kalderon <mkalderon@marvell.com>,
	"Mike Marciniszyn" <mike.marciniszyn@intel.com>,
	Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>,
	Nelson Escobar <neescoba@cisco.com>,
	"Parvi Kaustubhi" <pkaustub@cisco.com>,
	Potnuri Bharat Teja <bharat@chelsio.com>,
	Selvin Xavier <selvin.xavier@broadcom.com>,
	Shiraz Saleem <shiraz.saleem@intel.com>,
	Somnath Kotur <somnath.kotur@broadcom.com>,
	Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>,
	VMware PV-Drivers <pv-drivers@vmware.com>,
	Weihang Li <liweihang@huawei.com>,
	"Wei Hu(Xavier)" <huwei87@hisilicon.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Zhu Yanjun <yanjunz@nvidia.com>
Subject: Re: [PATCH rdma-next 01/10] RDMA: Restore ability to fail on PD deallocate
Date: Tue, 25 Aug 2020 10:07:36 -0300
Message-ID: <20200825130736.GQ1152540@nvidia.com>
In-Reply-To: <110cc351-f8f1-8f88-3912-c4dae711b393@amazon.com>

On Tue, Aug 25, 2020 at 03:12:07PM +0300, Gal Pressman wrote:
> On 25/08/2020 14:52, Jason Gunthorpe wrote:
> > On Tue, Aug 25, 2020 at 11:13:25AM +0300, Gal Pressman wrote:
> >> On 24/08/2020 13:32, Leon Romanovsky wrote:
> >>> diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
> >>> index 1889dd172a25..8547f9d543df 100644
> >>> +++ b/drivers/infiniband/hw/efa/efa.h
> >>> @@ -134,7 +134,7 @@ int efa_query_gid(struct ib_device *ibdev, u8 port, int index,
> >>>  int efa_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
> >>>  		   u16 *pkey);
> >>>  int efa_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
> >>> -void efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
> >>> +int efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata);
> >>>  int efa_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata);
> >>>  struct ib_qp *efa_create_qp(struct ib_pd *ibpd,
> >>>  			    struct ib_qp_init_attr *init_attr,
> >>> diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
> >>> index 3f7f19b9f463..660a69943e02 100644
> >>> +++ b/drivers/infiniband/hw/efa/efa_verbs.c
> >>> @@ -383,13 +383,14 @@ int efa_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
> >>>  	return err;
> >>>  }
> >>>
> >>> -void efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
> >>> +int efa_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
> >>>  {
> >>>  	struct efa_dev *dev = to_edev(ibpd->device);
> >>>  	struct efa_pd *pd = to_epd(ibpd);
> >>>
> >>>  	ibdev_dbg(&dev->ibdev, "Dealloc pd[%d]\n", pd->pdn);
> >>>  	efa_pd_dealloc(dev, pd->pdn);
> >>> +	return 0;
> >>>  }
> >>
> >> Nice change, thanks Leon.
> >> At least for EFA, I prefer to return the return value of the destroy command
> >> instead of silently ignoring it (same for the other patches).
> > 
> > Drivers can't fail the destroy unless a future destroy will succeed.
> > it breaks everything to do that.
> 
> What does it break?

For uverbs it will go into an infinite loop in
uverbs_destroy_ufile_hw() if destroy doesn't eventually succeed.

For kernel users it will trigger WARN_ONs and then a permanent memory leak.
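
To illustrate the uverbs case, here is a minimal userspace sketch of a
teardown loop that retries until every object has been destroyed. It is
not the real uverbs_destroy_ufile_hw(); demo_obj, demo_destroy(),
demo_teardown() and the max_passes cap are all hypothetical names made up
for this example. The cap exists only so the demo terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a driver object; not a kernel structure. */
struct demo_obj {
	int failures_left;	/* destroy attempts that will still fail */
	bool destroyed;
};

/* Hypothetical destroy callback: 0 on success, -1 while it still fails. */
static int demo_destroy(struct demo_obj *obj)
{
	if (obj->failures_left > 0) {
		obj->failures_left--;
		return -1;
	}
	obj->destroyed = true;
	return 0;
}

/*
 * Retry teardown until every object is gone. Returns the number of
 * passes needed, or -1 once max_passes is exhausted. The cap is an
 * assumption of this sketch; a loop without it spins forever when a
 * destroy never succeeds.
 */
static int demo_teardown(struct demo_obj *objs, size_t n, int max_passes)
{
	int pass;

	assert(objs != NULL || n == 0);

	for (pass = 1; pass <= max_passes; pass++) {
		size_t remaining = 0;

		for (size_t i = 0; i < n; i++) {
			if (objs[i].destroyed)
				continue;
			if (demo_destroy(&objs[i]))
				remaining++;	/* try again on the next pass */
		}
		if (!remaining)
			return pass;
	}
	return -1;
}
```

An object whose destroy fails a few times and then succeeds only costs
extra passes. An object whose destroy never succeeds keeps the loop from
finishing, which is why a driver may only fail destroy if a later attempt
will succeed.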

> I agree that drivers shouldn't fail destroy commands, but you know.. bugs/errors
> happen (especially when dealing with hardware), and we have a way to propagate
> them, why do it for only some of the drivers?

There is no way to propagate them.

All destroy must eventually succeed.

> > If the chip fails a destroy when it should not then it has failed and
> > should be disabled at PCI and reset, continuing to free anyhow.
> 
> How do we reset the device when there are active apps using it?

The zap stuff revokes the BAR mmapping, it triggers device fatal to
userspace, and that is mostly it for userspace.

It is more complicated for kernel users.

Jason
