From: Leon Romanovsky <leon@kernel.org>
To: Konstantin Taranov <kotaranov@microsoft.com>
Cc: Konstantin Taranov <kotaranov@linux.microsoft.com>,
Wei Hu <weh@microsoft.com>,
"sharmaajay@microsoft.com" <sharmaajay@microsoft.com>,
Long Li <longli@microsoft.com>, "jgg@ziepe.ca" <jgg@ziepe.ca>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH rdma-next 1/1] RDMA/mana_ib: process QP error events
Date: Thu, 6 Jun 2024 13:50:33 +0300 [thread overview]
Message-ID: <20240606105033.GF13732@unreal> (raw)
In-Reply-To: <PAXPR83MB0559D46DC010185AD7F7D531B4FA2@PAXPR83MB0559.EURPRD83.prod.outlook.com>
On Thu, Jun 06, 2024 at 09:30:51AM +0000, Konstantin Taranov wrote:
> > > > > +static void
> > > > > +mana_ib_event_handler(void *ctx, struct gdma_queue *q, struct
> > > > > +gdma_event *event) {
> > > > > + struct mana_ib_dev *mdev = (struct mana_ib_dev *)ctx;
> > > > > + struct mana_ib_qp *qp;
> > > > > + struct ib_event ev;
> > > > > + unsigned long flag;
> > > > > + u32 qpn;
> > > > > +
> > > > > + switch (event->type) {
> > > > > + case GDMA_EQE_RNIC_QP_FATAL:
> > > > > + qpn = event->details[0];
> > > > > + xa_lock_irqsave(&mdev->qp_table_rq, flag);
> > > > > + qp = xa_load(&mdev->qp_table_rq, qpn);
> > > > > + if (qp)
> > > > > + refcount_inc(&qp->refcount);
> > > > > + xa_unlock_irqrestore(&mdev->qp_table_rq, flag);
> > > > > + if (!qp)
> > > > > + break;
> > > > > + if (qp->ibqp.event_handler) {
> > > > > + ev.device = qp->ibqp.device;
> > > > > + ev.element.qp = &qp->ibqp;
> > > > > + ev.event = IB_EVENT_QP_FATAL;
> > > > > + qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
> > > > > + }
> > > > > + if (refcount_dec_and_test(&qp->refcount))
> > > > > + complete(&qp->free);
> > > > > + break;
> > > > > + default:
> > > > > + break;
> > > > > + }
> > > > > +}
> > > >
> > > > <...>
> > > >
> > > > > @@ -620,6 +626,11 @@ static int mana_ib_destroy_rc_qp(struct
> > > > mana_ib_qp *qp, struct ib_udata *udata)
> > > > > container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
> > > > > int i;
> > > > >
> > > > > + xa_erase_irq(&mdev->qp_table_rq, qp->ibqp.qp_num);
> > > > > + if (refcount_dec_and_test(&qp->refcount))
> > > > > + complete(&qp->free);
> > > > > + wait_for_completion(&qp->free);
> > > >
> > > > This flow is unclear to me. You are destroying the QP and need to
> > > > make sure that mana_ib_event_handler is not running at that point
> > > > and not mess with refcount and complete.
> > >
> > > Hi, Leon. Thanks for the concern. Here is the clarification:
> > > The flow is similar to what mlx5 does with mlx5_get_rsc and
> > mlx5_core_put_rsc.
> > > When we get a QP resource, we increment the counter while holding the xa
> > lock.
> > > So, when we destroy a QP, the code first removes the QP from the xa to
> > ensure that nobody can get it.
> > > And then we check whether mana_ib_event_handler is holding it with
> > refcount_dec_and_test.
> > > If the QP is held, then we wait for the mana_ib_event_handler to call
> > complete.
> > > Once the wait returns, it is impossible to get the QP referenced, since it is
> > not in the xa and all references have been removed.
> > > So now we can release the QP in HW, so the QPN can be assigned to new
> > QPs.
> > >
> > > Leon, have you spotted a bug? Or just wanted to understand the flow?
> >
> > I understand the "general" flow, but think that implementation is very
> > convoluted here. In addition, mlx5 and other drivers make sure sure that HW
> > object is not free before they free it. They don't "mess" with ibqp, and
> > probably you should do the same.
>
> I can make the code more readable by introducing 4 helpers: add_qp_ref, get_qp_ref, put_qp_ref, destroy_qp_ref.
> Would it make the code less convoluted for you?
The thing is that you are trying to open-code part of restrack logic,
which already has xarray DB per-QPN, maybe you should extend it to support
your case, by adding some sort of barrier to prevent QP from being used.
>
> The devices are different. Mana can do EQE with QPN, which can be destroyed by OS. With that reference counting I make sure
> we do not destroy QP which is used in EQE processing. We still destroy the HW resource at same time as before the change.
> The xa table is just a lookup table, which says whether object can be referenced or not. The change just dictates that we first
> make a QP not referenceable.
>
> Note, that if we remove the QP from the table after we destroy it in HW, we can have a bug due to the collision in the xa table when
> at the same time another entity creates a QP. Since the QPN is released in HW, it will most likely given to the next create_qp (so mana, unlike mlx,
> does not assign QPNs with an increment and gives recently used QPN). So, the create_qp can fail as the QPN is still used in the xa.
>
> And what do you mean that "don't "mess" with ibqp"? Are you saying that we cannot process QP-related interrupts?
> What do you propose to change? In any case it will be the same logic:
> 1) remove from table
> 2) make sure that current IRQ does not hold a reference
> I use counters for that as most of other IB providers.
>
> I would also appreciate a list of changes to make this patch approved or confirmation that no changes are required.
I'm not asking to change anything at this point, just trying to see if
there is more general way to solve this problem, which exists in all
drivers now and in the future.
Thanks
> Thanks
>
> > Thanks
> >
> > > Thanks
> > >
> > > >
> > > > Thanks
next prev parent reply other threads:[~2024-06-06 10:50 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-04 11:36 [PATCH rdma-next 1/1] RDMA/mana_ib: process QP error events Konstantin Taranov
2024-06-04 12:08 ` Leon Romanovsky
2024-06-04 14:13 ` Konstantin Taranov
2024-06-06 8:01 ` Leon Romanovsky
2024-06-06 9:30 ` Konstantin Taranov
2024-06-06 10:50 ` Leon Romanovsky [this message]
2024-06-06 11:30 ` [EXTERNAL] " Konstantin Taranov
2024-06-05 22:42 ` Long Li
2024-06-06 8:49 ` Konstantin Taranov
2024-06-07 2:45 ` Long Li
2024-06-07 7:44 ` Konstantin Taranov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240606105033.GF13732@unreal \
--to=leon@kernel.org \
--cc=jgg@ziepe.ca \
--cc=kotaranov@linux.microsoft.com \
--cc=kotaranov@microsoft.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=longli@microsoft.com \
--cc=sharmaajay@microsoft.com \
--cc=weh@microsoft.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox