All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Saeed Mahameed <saeedm@nvidia.com>
Cc: Saeed Mahameed <saeed@kernel.org>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Patrisious Haddad <phaddad@nvidia.com>,
	Jakub Kicinski <kuba@kernel.org>,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
	Paolo Abeni <pabeni@redhat.com>,
	Yishai Hadas <yishaih@nvidia.com>
Subject: Re: [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction
Date: Sun, 10 Apr 2022 10:58:16 +0300	[thread overview]
Message-ID: <YlKOGMw7vemdPn/a@unreal> (raw)
In-Reply-To: <20220408193035.2uplgfjnfjo4s4f2@sx1>

On Fri, Apr 08, 2022 at 12:30:35PM -0700, Saeed Mahameed wrote:
> On 06 Apr 10:55, Leon Romanovsky wrote:
> > On Tue, Apr 05, 2022 at 12:48:45PM -0700, Saeed Mahameed wrote:
> > > On 05 Apr 11:12, Leon Romanovsky wrote:
> > > > From: Patrisious Haddad <phaddad@nvidia.com>
> > > >
> > > > Prior to this patch in the case that destroy_unmap_eq()
> > > > failed and was called again, it triggered an additional call of
> > > 
> > > Where is it being failed and called again ? this shouldn't even be an
> > > option, we try to keep mlx5 symmetrical, constructors and destructors are
> > > supposed to be called only once in their respective positions.
> > > the callers must be fixed to avoid re-entry, or change destructors to clear
> > > up all resources even on failures, no matter what do not invent a reentry
> > > protocols to mlx5 destructors.
> > 
> > It can happen when QP is exposed through DEVX interface. In that flow,
> > only FW knows about it and reference count all users. This means that
> > attempt to destroy such QP will fail, but mlx5_core is structured in
> > such way that all cleanup was done before calling to FW to get
> > success/fail response.
> 
> I wasn't talking about destroy_qp, actually destroy_qp is implemented the
> way i am asking you to implement destroy_eq(); remove debugfs on first call
> to destroy EQ, and drop the reentry logic from from mlx5_eq_destroy_generic
> and destroy_async_eq.
> 
> EQ is a core/mlx5_ib resources, it's not exposed to user nor DEVX, it
> shouldn't be subject to DEVX limitations.

I tend to agree with you. I'll take another look on it and resubmit.

> 
> Also looking at the destroy_qp implementation, it removes the debugfs
> unconditionally even if the QP has ref count and removal will fail in FW.
> just FYI.

Right, we don't care about debugfs.

> 
> For EQ I don't even understand why devx can cause ODP EQ removal to fail..
> you must fix this at mlx5_ib layer, but for this patch, please drop the
> re-entry and remove debugfs in destroy_eq, unconditionally.

The reason to complexity is not debugfs, but an existence of
"mlx5_frag_buf_free(dev, &eq->frag_buf);" line, after FW command is
executed.

We need to separate to two flows: the one that can tolerate FW cmd failures
and the one that can't. If you don't add "reentry" flag, you can (theoretically)
find yourself leaking ->frag_buf in the flows that don't know how to reentry.

I'll resubmit.

Thanks

  reply	other threads:[~2022-04-10  7:58 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-05  8:12 [PATCH rdma-next 0/3] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
2022-04-05  8:12 ` [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction Leon Romanovsky
2022-04-05 19:48   ` Saeed Mahameed
2022-04-06  7:55     ` Leon Romanovsky
2022-04-08 19:30       ` Saeed Mahameed
2022-04-10  7:58         ` Leon Romanovsky [this message]
2022-04-05  8:12 ` [PATCH rdma-next 2/3] RDMA/mlx5: Handling dct common resource destruction upon firmware failure Leon Romanovsky
2022-04-05  8:12 ` [PATCH rdma-next 3/3] RDMA/mlx5: Return the firmware result upon destroying QP/RQ Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YlKOGMw7vemdPn/a@unreal \
    --to=leon@kernel.org \
    --cc=jgg@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=phaddad@nvidia.com \
    --cc=saeed@kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=yishaih@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.