From: Leon Romanovsky <leon@kernel.org>
To: "Saleem, Shiraz" <shiraz.saleem@intel.com>
Cc: Gal Pressman <galpress@amazon.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Doug Ledford <dledford@redhat.com>,
	Adit Ranadive <aditr@vmware.com>,
	Ariel Elior <aelior@marvell.com>,
	Bernard Metzler <bmt@zurich.ibm.com>,
	Christian Benvenuti <benve@cisco.com>,
	"Dalessandro, Dennis" <dennis.dalessandro@intel.com>,
	Devesh Sharma <devesh.sharma@broadcom.com>,
	"Latif, Faisal" <faisal.latif@intel.com>,
	Lijun Ou <oulijun@huawei.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	Michal Kalderon <mkalderon@marvell.com>,
	"Marciniszyn, Mike" <mike.marciniszyn@intel.com>,
	Naresh Kumar PBS <nareshkumar.pbs@broadcom.com>,
	Nelson Escobar <neescoba@cisco.com>,
	Parvi Kaustubhi <pkaustub@cisco.com>,
	Potnuri Bharat Teja <bharat@chelsio.com>,
	Selvin Xavier <selvin.xavier@broadcom.com>,
	Somnath Kotur <somnath.kotur@broadcom.com>,
	Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>,
	VMware PV-Drivers <pv-drivers@vmware.com>,
	Weihang Li <liweihang@huawei.com>,
	"Wei Hu(Xavier)" <huwei87@hisilicon.com>,
	Yishai Hadas <yishaih@nvidia.com>,
	Zhu Yanjun <yanjunz@nvidia.com>
Subject: Re: [PATCH rdma-next 01/10] RDMA: Restore ability to fail on PD deallocate
Date: Wed, 26 Aug 2020 09:34:11 +0300
Message-ID: <20200826063411.GF1362631@unreal>
In-Reply-To: <9DD61F30A802C4429A01CA4200E302A70106634FB5@fmsmsx124.amr.corp.intel.com>

On Wed, Aug 26, 2020 at 12:49:03AM +0000, Saleem, Shiraz wrote:
> > On 25/08/2020 16:44, Jason Gunthorpe wrote:
> > > On Tue, Aug 25, 2020 at 04:32:57PM +0300, Gal Pressman wrote:
> > >>> For uverbs it will go into an infinite loop in
> > >>> uverbs_destroy_ufile_hw() if destroy doesn't eventually succeed.
> > >>
> > >> The code breaks the loop in such cases, why infinite loop?
> > >
> > > Oh, that is a bug, it should WARN_ON when that happens, because the
> > > driver has triggered a permanent memory leak.
> >
> > Well, a WARN_ON won't do much good if you're stuck in an infinite loop :), the
> > break is definitely needed there.
> >
> > >>> For kernel it will trigger WARN_ON's and then a permanent memory leak.
> > >>>
> > >>>> I agree that drivers shouldn't fail destroy commands, but you
> > >>>> know.. bugs/errors happen (especially when dealing with hardware),
> > >>>> and we have a way to propagate them, why do it for only some of the
> > >>>> drivers?
> > >>>
> > >>> There is no way to propagate them.
> > >>>
> > >>> All destroy must eventually succeed.
> > >>
> > >> There is no way to propagate them on process cleanup, but the destroy
> > >> verbs have a return code all the way back to libibverbs, which we can
> > >> use for error propagation.
> > >
> > > It is sort of OK for a driver to fail during RDMA_REMOVE_DESTROY.
> > >
> > > All other reason codes must eventually succeed.
> > >
> > >> The cleanup flow can either ignore the return value, or we can add
> > >> another parameter that explicitly means the call shouldn't fail and
> > >> all allocated memory/state should be freed.
> > >
> > > I don't really see the value to return the error code to userspace, it
> > > would require churning all the drivers and all the destroy functions
> > > to pass the existing reason in.
> > >
> > > Since all the details of the FW failure reason are lost to some EINVAL
> > > (or already logged to dmesg) I don't see much point.
> >
> > Right, as always, the error code would probably not contain much information, but
> > there's a big difference between returning error code X/Y vs returning success
> > instead of an error. To me that just feels wrong, at least in cases where we can
> > prevent that.
> >
>
> The API is quite confusing now. If drivers are not expected to fail the destroy
> and there is no way to propagate the device failures, then the return type should be void.
>
> Do we really want a mixed/ambiguous definition of the API to support the quirks of one type of device?

This is an in-kernel API and it can be imperfect, because we are not
bound by the don't-break-userspace rule.

I don't like the current situation either, I just don't know how to
solve it differently.

Thanks

>
> Shiraz
>


