From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
Devesh Sharma <devesh.sharma@avagotech.com>,
Sagi Grimberg <sagig@dev.mellanox.co.il>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 03/18] xprtrdma: Remove completion polling budgets
Date: Thu, 1 Oct 2015 12:15:48 -0600 [thread overview]
Message-ID: <20151001181548.GA8670@obsidianresearch.com> (raw)
In-Reply-To: <CDDBC387-9135-405E-B535-EE89F952386E@oracle.com>
On Thu, Oct 01, 2015 at 01:36:26PM -0400, Chuck Lever wrote:
> >> A missed WC will result in an RPC/RDMA transport deadlock. In
> >> fact that is the reason for this particular patch (although
> >> it addresses only one source of missed WCs). So I would like
> >> to see that there are no windows here.
> >
> > WCs are never missed.
>
> The review comment earlier in this thread suggested there is
> a race condition where a WC can be “delayed” resulting in,
> well, I’m still not certain what the consequences are.
Yes. The consequence would typically be lockup of CQ processing.
> > while (1) {
> > struct ib_wc wcs[100];
> > int rc = ib_poll_cq(cw, NELEMS(wcs), wcs);
> >
> > .. process rc wcs ..
> >
> > if (rc != NELEMS(wcs))
> > if (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
> > IB_CQ_REPORT_MISSED_EVENTS) == 0)
> > break;
> > }
> >
> > API wise, we should probably look at forcing
> > IB_CQ_REPORT_MISSED_EVENTS on and dropping the flag.
>
> It’s been suggested that it’s not clear what a positive
> return value from ib_req_notify_cq() means when the
> REPORT_MISSED_EVENTS flags is set: does it mean that
> the CQ has been re-armed? I had assumed that a positive
> RC meant both missed events and a successful re-arm,
> but the pseudo-code above suggests that is not the
> case.
The ULP must assume the CQ has NOT been armed after a positive return.
What the driver does to the arm state is undefined - for instance the
driver may trigger a callback and still return 1 here.
However, the driver must make this guarentee:
If ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns 0 then
the call back will always be called when the CQ is non-empty.
The ULP must loop doing polling until the above happens, otherwise the
event notification may be missed.
ie the above is guarnteed to close the WC delay/lockup race.
Again, if there has been confusion on the driver side, drivers that
don't implement the above are broken.
Review Roland's original commit comments on this feature.
https://github.com/jgunthorpe/linux/commit/ed23a72778f3dbd465e55b06fe31629e7e1dd2f3
I'm not sure where we are at on the 'significant overhead for some
low-level drivers' issue, but assuming that is still the case, then
the recommendation is this:
bool exiting = false;
while (1) {
struct ib_wc wcs[100];
int rc = ib_poll_cq(cq, NELEMS(wcs), wcs);
if (rc == 0 && exiting)
break;
.. process rc wcs ..
if (rc != NELEMS(wcs)) {
ib_req_notify_cq(cq, IB_CQ_NEXT_COMP)
exiting = true;
} else
exiting = false;
}
ie a double poll.
AFAIK, this is a common pattern in the ULPs.. Perhaps we should
implement this as a core API:
struct ib_wc wcs[100];
while ((rc = ib_poll_cq_and_arm(cq, NELEMS(wcs), wcs)) != 0) {
.. process rc wcs ..
ib_poll_cq_and_arm reads wcs off the CQ. If it returns 0 then the
callback is guarenteed to happen when the CQ is non empty.
Jason
next prev parent reply other threads:[~2015-10-01 18:15 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-17 20:44 [PATCH v1 00/18] RFC NFS/RDMA patches for merging into v4.4 Chuck Lever
2015-09-17 20:44 ` [PATCH v1 01/18] xprtrdma: Enable swap-on-NFS/RDMA Chuck Lever
2015-09-21 8:58 ` Devesh Sharma
2015-09-17 20:44 ` [PATCH v1 02/18] xprtrdma: Replace global lkey with lkey local to PD Chuck Lever
2015-09-21 8:59 ` Devesh Sharma
2015-09-17 20:44 ` [PATCH v1 03/18] xprtrdma: Remove completion polling budgets Chuck Lever
2015-09-18 6:52 ` Devesh Sharma
2015-09-18 14:19 ` Chuck Lever
2015-09-20 10:35 ` Sagi Grimberg
2015-09-21 8:51 ` Devesh Sharma
2015-09-21 15:45 ` Chuck Lever
2015-09-22 17:32 ` Devesh Sharma
2015-10-01 16:37 ` Chuck Lever
2015-10-01 17:13 ` Jason Gunthorpe
2015-10-01 17:36 ` Chuck Lever
2015-10-01 18:15 ` Jason Gunthorpe [this message]
2015-10-01 18:31 ` Chuck Lever
2015-10-01 18:38 ` Jason Gunthorpe
2015-09-21 8:51 ` Devesh Sharma
2015-09-17 20:44 ` [PATCH v1 04/18] xprtrdma: Refactor reply handler error handling Chuck Lever
2015-09-21 8:59 ` Devesh Sharma
2015-09-17 20:44 ` [PATCH v1 05/18] xprtrdma: Replace send and receive arrays Chuck Lever
2015-09-20 10:52 ` Sagi Grimberg
2015-09-21 23:04 ` Chuck Lever
2015-09-17 20:45 ` [PATCH v1 06/18] SUNRPC: Abstract backchannel operations Chuck Lever
2015-09-17 20:45 ` [PATCH v1 07/18] xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers Chuck Lever
2015-09-21 10:28 ` Devesh Sharma
2015-09-17 20:45 ` [PATCH v1 08/18] xprtrdma: Pre-allocate Work Requests for backchannel Chuck Lever
2015-09-21 10:33 ` Devesh Sharma
2015-09-21 22:41 ` Chuck Lever
2015-09-17 20:45 ` [PATCH v1 09/18] xprtrdma: Add support for sending backward direction RPC replies Chuck Lever
2015-09-17 20:45 ` [PATCH v1 10/18] xprtrdma: Handle incoming backward direction RPC calls Chuck Lever
2015-09-17 20:45 ` [PATCH v1 11/18] svcrdma: Add backward direction service for RPC/RDMA transport Chuck Lever
2015-09-17 20:45 ` [PATCH v1 12/18] SUNRPC: Remove the TCP-only restriction in bc_svc_process() Chuck Lever
2015-09-17 20:45 ` [PATCH v1 13/18] NFS: Enable client side NFSv4.1 backchannel to use other transports Chuck Lever
2015-09-17 20:46 ` [PATCH v1 14/18] svcrdma: Define maximum number of backchannel requests Chuck Lever
2015-09-17 20:46 ` [PATCH v1 15/18] svcrdma: Add svc_rdma_get_context() API that is allowed to fail Chuck Lever
2015-09-20 12:40 ` Sagi Grimberg
2015-09-21 22:34 ` Chuck Lever
2015-09-17 20:46 ` [PATCH v1 16/18] svcrdma: Add infrastructure to send backwards direction RPC/RDMA calls Chuck Lever
2015-09-17 20:46 ` [PATCH v1 17/18] svcrdma: Add infrastructure to receive backwards direction RPC/RDMA replies Chuck Lever
2015-09-17 20:46 ` [PATCH v1 18/18] xprtrdma: Add class for RDMA backwards direction transport Chuck Lever
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151001181548.GA8670@obsidianresearch.com \
--to=jgunthorpe@obsidianresearch.com \
--cc=chuck.lever@oracle.com \
--cc=devesh.sharma@avagotech.com \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=sagig@dev.mellanox.co.il \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).