public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Konstantin Taranov <kotaranov@microsoft.com>
Cc: Konstantin Taranov <kotaranov@linux.microsoft.com>,
	Wei Hu <weh@microsoft.com>,
	"sharmaajay@microsoft.com" <sharmaajay@microsoft.com>,
	Long Li <longli@microsoft.com>, "jgg@ziepe.ca" <jgg@ziepe.ca>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH rdma-next 1/1] RDMA/mana_ib: process QP error events
Date: Thu, 6 Jun 2024 11:01:36 +0300	[thread overview]
Message-ID: <20240606080136.GB13732@unreal> (raw)
In-Reply-To: <PAXPR83MB0559D385C1AD343A506C45E3B4F82@PAXPR83MB0559.EURPRD83.prod.outlook.com>

On Tue, Jun 04, 2024 at 02:13:39PM +0000, Konstantin Taranov wrote:
> > > +static void
> > > +mana_ib_event_handler(void *ctx, struct gdma_queue *q, struct
> > > +gdma_event *event) {
> > > +	struct mana_ib_dev *mdev = (struct mana_ib_dev *)ctx;
> > > +	struct mana_ib_qp *qp;
> > > +	struct ib_event ev;
> > > +	unsigned long flag;
> > > +	u32 qpn;
> > > +
> > > +	switch (event->type) {
> > > +	case GDMA_EQE_RNIC_QP_FATAL:
> > > +		qpn = event->details[0];
> > > +		xa_lock_irqsave(&mdev->qp_table_rq, flag);
> > > +		qp = xa_load(&mdev->qp_table_rq, qpn);
> > > +		if (qp)
> > > +			refcount_inc(&qp->refcount);
> > > +		xa_unlock_irqrestore(&mdev->qp_table_rq, flag);
> > > +		if (!qp)
> > > +			break;
> > > +		if (qp->ibqp.event_handler) {
> > > +			ev.device = qp->ibqp.device;
> > > +			ev.element.qp = &qp->ibqp;
> > > +			ev.event = IB_EVENT_QP_FATAL;
> > > +			qp->ibqp.event_handler(&ev, qp->ibqp.qp_context);
> > > +		}
> > > +		if (refcount_dec_and_test(&qp->refcount))
> > > +			complete(&qp->free);
> > > +		break;
> > > +	default:
> > > +		break;
> > > +	}
> > > +}
> > 
> > <...>
> > 
> > > @@ -620,6 +626,11 @@ static int mana_ib_destroy_rc_qp(struct
> > mana_ib_qp *qp, struct ib_udata *udata)
> > >  		container_of(qp->ibqp.device, struct mana_ib_dev, ib_dev);
> > >  	int i;
> > >
> > > +	xa_erase_irq(&mdev->qp_table_rq, qp->ibqp.qp_num);
> > > +	if (refcount_dec_and_test(&qp->refcount))
> > > +		complete(&qp->free);
> > > +	wait_for_completion(&qp->free);
> > 
> > This flow is unclear to me. You are destroying the QP and need to make sure
> > that mana_ib_event_handler is not running at that point and not mess with
> > refcount and complete.
> 
> Hi, Leon. Thanks for the concern. Here is the clarification:
> The flow is similar to what mlx5 does with mlx5_get_rsc and mlx5_core_put_rsc.
> When we get a QP resource, we increment the counter while holding the xa lock.
> So, when we destroy a QP, the code first removes the QP from the xa to ensure that nobody can get it.
> And then we check whether mana_ib_event_handler is holding it with refcount_dec_and_test.
> If the QP is held, then we wait for the mana_ib_event_handler to call complete.
> Once the wait returns, it is impossible to get the QP referenced, since it is not in the xa and all references have been removed.
> So now we can release the QP in HW, so the QPN can be assigned to new QPs.
> 
> Leon, have you spotted a bug? Or just wanted to understand the flow?

I understand the "general" flow, but think that implementation is very
convoluted here. In addition, mlx5 and other drivers make sure sure that
HW object is not free before they free it. They don't "mess" with ibqp,
and probably you should do the same.

Thanks

> Thanks
> 
> > 
> > Thanks

  reply	other threads:[~2024-06-06  8:01 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-04 11:36 [PATCH rdma-next 1/1] RDMA/mana_ib: process QP error events Konstantin Taranov
2024-06-04 12:08 ` Leon Romanovsky
2024-06-04 14:13   ` Konstantin Taranov
2024-06-06  8:01     ` Leon Romanovsky [this message]
2024-06-06  9:30       ` Konstantin Taranov
2024-06-06 10:50         ` Leon Romanovsky
2024-06-06 11:30           ` [EXTERNAL] " Konstantin Taranov
2024-06-05 22:42 ` Long Li
2024-06-06  8:49   ` Konstantin Taranov
2024-06-07  2:45     ` Long Li
2024-06-07  7:44       ` Konstantin Taranov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240606080136.GB13732@unreal \
    --to=leon@kernel.org \
    --cc=jgg@ziepe.ca \
    --cc=kotaranov@linux.microsoft.com \
    --cc=kotaranov@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=sharmaajay@microsoft.com \
    --cc=weh@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox