Linux RDMA and InfiniBand development
From: Jason Gunthorpe <jgg@nvidia.com>
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: "linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	Bernard Metzler <bmt@zurich.ibm.com>,
	Bart Van Assche <bvanassche@acm.org>
Subject: Re: IB_POLL_DIRECT
Date: Tue, 6 Jun 2023 21:21:36 -0300
Message-ID: <ZH/NkL//qx/oz6kZ@nvidia.com>
In-Reply-To: <ed01bad5-b63b-855c-b2da-d98718fa2b4d@gmail.com>

On Tue, Jun 06, 2023 at 03:54:25PM -0500, Bob Pearson wrote:
> AFAIK the poll workqueue and poll softirq cqs are working correctly but the poll direct cq sometimes
> loses the thread and just stops processing those cqs. The test cases sometimes recover after about
> a 2 second delay and start processing again and eventually fail after about a 10 second delay and
> cleanup and go home.

This sort of sounds like a race with re-arming?
 
> The failures feel like a race or at least are timing sensitive. If you run the test suite several times
> various test cases will sometimes succeed and sometimes fail. But they always fail in the same way.
> 
> Looking at the mlxn drivers for inspiration, I don't see anything specific about IB_POLL_DIRECT except
> that they have a private version of send_queue_drain which also calls a cqe drain function which calls
> ib_process_cq_direct() in a loop until the cq is drained. But this is only during qp tear down. (No other
> verbs driver does this but as far as I know no other driver is passing blktests.) This is only done for
> IB_POLL_DIRECT, so I wonder, is this required to use that correctly?
> 
> I am still figuring out how IB_POLL_DIRECT works. It doesn't allow the driver to call cq->comp_handler so
> I don't know how it figures out when there are new wcs to process.

IIRC POLL_DIRECT means you don't get completion interrupts and instead
the ULP has to occasionally call ib_process_cq_direct(), which will
pull out the CQEs.

So you should look at how ib_process_cq_direct() is being called in
srp; presumably something in that logic is not calling it.

It kind of looks like SRP is using it to reap send completions when
the send queue progresses, so maybe your issue is that the sendq is
getting stuck?
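
To make that concrete, here is a toy userspace model, not the real SRP
or verbs code. Every toy_* name, the buffer counts, and the budget of 4
are made up for illustration; the only real API it imitates is
ib_process_cq_direct(cq, budget). The completion queue never notifies
anyone, and the send path is the only place that polls it, so
completions sit unprocessed until the next send attempt:

```c
#include <assert.h>

#define CQ_DEPTH 16	/* hypothetical sizes, chosen for the example */
#define TX_BUFS  4

/* Toy CQ in "direct" mode: the producer never notifies anyone. */
struct toy_cq {
	int wc[CQ_DEPTH];		/* ids of completed tx buffers */
	unsigned int head, tail;	/* next to reap / next free slot */
};

struct toy_ch {
	struct toy_cq send_cq;
	int free_tx[TX_BUFS];		/* stack of free tx buffer ids */
	int nfree;
};

static void toy_ch_init(struct toy_ch *ch)
{
	ch->send_cq.head = ch->send_cq.tail = 0;
	for (ch->nfree = 0; ch->nfree < TX_BUFS; ch->nfree++)
		ch->free_tx[ch->nfree] = ch->nfree;
}

/* "Hardware" side: queue a completion. No callback, no interrupt. */
static void toy_hw_complete(struct toy_cq *cq, int buf)
{
	assert(cq->tail - cq->head < CQ_DEPTH);	/* CQ must not overflow */
	cq->wc[cq->tail++ % CQ_DEPTH] = buf;
}

/* Reap at most 'budget' completions; return how many were processed. */
static int toy_process_cq_direct(struct toy_ch *ch, int budget)
{
	struct toy_cq *cq = &ch->send_cq;
	int done = 0;

	while (done < budget && cq->head != cq->tail) {
		int buf = cq->wc[cq->head++ % CQ_DEPTH];

		ch->free_tx[ch->nfree++] = buf;	/* "send done": buffer is free */
		done++;
	}
	return done;
}

/* Send path: the only caller that ever polls the send CQ. */
static int toy_get_tx(struct toy_ch *ch)
{
	toy_process_cq_direct(ch, 4);
	if (ch->nfree == 0)
		return -1;		/* no buffer: the send cannot proceed */
	return ch->free_tx[--ch->nfree];
}
```

If you take all TX_BUFS buffers with toy_get_tx() and then complete them
with toy_hw_complete(), ch->nfree stays at 0 until somebody calls
toy_get_tx() again. If nothing drives the send path, nothing reaps the
completions, which is the kind of stall I would look for.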

Jason
