From: Leon Romanovsky <leon@kernel.org>
To: Saeed Mahameed <saeedm@nvidia.com>
Cc: Saeed Mahameed <saeed@kernel.org>,
Jason Gunthorpe <jgg@nvidia.com>,
Patrisious Haddad <phaddad@nvidia.com>,
Jakub Kicinski <kuba@kernel.org>,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
Paolo Abeni <pabeni@redhat.com>,
Yishai Hadas <yishaih@nvidia.com>
Subject: Re: [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction
Date: Sun, 10 Apr 2022 10:58:16 +0300 [thread overview]
Message-ID: <YlKOGMw7vemdPn/a@unreal> (raw)
In-Reply-To: <20220408193035.2uplgfjnfjo4s4f2@sx1>
On Fri, Apr 08, 2022 at 12:30:35PM -0700, Saeed Mahameed wrote:
> On 06 Apr 10:55, Leon Romanovsky wrote:
> > On Tue, Apr 05, 2022 at 12:48:45PM -0700, Saeed Mahameed wrote:
> > > On 05 Apr 11:12, Leon Romanovsky wrote:
> > > > From: Patrisious Haddad <phaddad@nvidia.com>
> > > >
> > > > Prior to this patch in the case that destroy_unmap_eq()
> > > > failed and was called again, it triggered an additional call of
> > >
> > > Where is it being failed and called again ? this shouldn't even be an
> > > option, we try to keep mlx5 symmetrical, constructors and destructors are
> > > supposed to be called only once in their respective positions.
> > > the callers must be fixed to avoid re-entry, or change destructors to clear
> > > up all resources even on failures, no matter what do not invent a reentry
> > > protocols to mlx5 destructors.
> >
> > It can happen when QP is exposed through DEVX interface. In that flow,
> > only FW knows about it and reference count all users. This means that
> > attempt to destroy such QP will fail, but mlx5_core is structured in
> > such way that all cleanup was done before calling to FW to get
> > success/fail response.
>
> I wasn't talking about destroy_qp, actually destroy_qp is implemented the
> way i am asking you to implement destroy_eq(); remove debugfs on first call
> to destroy EQ, and drop the reentry logic from from mlx5_eq_destroy_generic
> and destroy_async_eq.
>
> EQ is a core/mlx5_ib resources, it's not exposed to user nor DEVX, it
> shouldn't be subject to DEVX limitations.
I tend to agree with you. I'll take another look on it and resubmit.
>
> Also looking at the destroy_qp implementation, it removes the debugfs
> unconditionally even if the QP has ref count and removal will fail in FW.
> just FYI.
Right, we don't care about debugfs.
>
> For EQ I don't even understand why devx can cause ODP EQ removal to fail..
> you must fix this at mlx5_ib layer, but for this patch, please drop the
> re-entry and remove debugfs in destroy_eq, unconditionally.
The reason to complexity is not debugfs, but an existence of
"mlx5_frag_buf_free(dev, &eq->frag_buf);" line, after FW command is
executed.
We need to separate to two flows: the one that can tolerate FW cmd failures
and the one that can't. If you don't add "reentry" flag, you can (theoretically)
find yourself leaking ->frag_buf in the flows that don't know how to reentry.
I'll resubmit.
Thanks
next prev parent reply other threads:[~2022-04-10 7:58 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-05 8:12 [PATCH rdma-next 0/3] Handle FW failures to destroy QP/RQ objects Leon Romanovsky
2022-04-05 8:12 ` [PATCH mlx5-next 1/3] net/mlx5: Nullify eq->dbg and qp->dbg pointers post destruction Leon Romanovsky
2022-04-05 19:48 ` Saeed Mahameed
2022-04-06 7:55 ` Leon Romanovsky
2022-04-08 19:30 ` Saeed Mahameed
2022-04-10 7:58 ` Leon Romanovsky [this message]
2022-04-05 8:12 ` [PATCH rdma-next 2/3] RDMA/mlx5: Handling dct common resource destruction upon firmware failure Leon Romanovsky
2022-04-05 8:12 ` [PATCH rdma-next 3/3] RDMA/mlx5: Return the firmware result upon destroying QP/RQ Leon Romanovsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YlKOGMw7vemdPn/a@unreal \
--to=leon@kernel.org \
--cc=jgg@nvidia.com \
--cc=kuba@kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=phaddad@nvidia.com \
--cc=saeed@kernel.org \
--cc=saeedm@nvidia.com \
--cc=yishaih@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).