From: Bob Pearson <rpearsonhpe@gmail.com>
To: Jason Gunthorpe <jgg@nvidia.com>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
Bernard Metzler <bmt@zurich.ibm.com>,
Bart Van Assche <bvanassche@acm.org>
Subject: IB_POLL_DIRECT
Date: Tue, 6 Jun 2023 15:54:25 -0500
Message-ID: <ed01bad5-b63b-855c-b2da-d98718fa2b4d@gmail.com>
Jason,
Both the rxe driver and the siw driver exhibit failures on my machine when running the blktests srp test
suite on the for-next branch. This has been true for months, so I decided to try again to track it down.
After a lot of tracing, it looks like the problem is that the built-in cq handling in core/cq.c stops
processing some completion queues.
The traffic is between the srp driver and the srpt driver. The srpt driver uses

    cq = ib_cq_pool_get(..., IB_POLL_WORKQUEUE)

and the srp driver uses

    cq = ib_alloc_cq(..., IB_POLL_SOFTIRQ)    for receive cqs and
    cq = ib_alloc_cq(..., IB_POLL_DIRECT)     for send cqs.
AFAIK the IB_POLL_WORKQUEUE and IB_POLL_SOFTIRQ cqs are working correctly, but the IB_POLL_DIRECT cqs
sometimes lose track and just stop being processed. The test cases sometimes recover after about a
2 second delay and start processing again, and eventually fail after about a 10 second delay, clean up
and go home.
The failures feel like a race, or at least they are timing sensitive. If you run the test suite several
times, various test cases will sometimes succeed and sometimes fail, but they always fail in the same way.
Looking at the mlxn drivers for inspiration, I don't see anything specific about IB_POLL_DIRECT except
that they have a private version of send_queue_drain, which also calls a cqe drain function that calls
ib_process_cq_direct() in a loop until the cq is drained. But this is only done during qp teardown. (No
other verbs driver does this, but as far as I know no other driver is passing blktests either.) Since
this is only done for IB_POLL_DIRECT, I wonder: is it required in order to use that mode correctly?
I am still figuring out how IB_POLL_DIRECT works. It doesn't allow the driver to call cq->comp_handler,
so I don't know how it figures out when there are new work completions (wcs) to process.
Any ideas would be really helpful.
Bob
Thread overview: 3+ messages
2023-06-06 20:54 Bob Pearson [this message]
2023-06-07 0:21 ` IB_POLL_DIRECT Jason Gunthorpe
2023-06-07 14:54 ` IB_POLL_DIRECT Bob Pearson