All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
	Devesh Sharma <devesh.sharma@avagotech.com>,
	Sagi Grimberg <sagig@dev.mellanox.co.il>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 03/18] xprtrdma: Remove completion polling budgets
Date: Thu, 1 Oct 2015 12:15:48 -0600	[thread overview]
Message-ID: <20151001181548.GA8670@obsidianresearch.com> (raw)
In-Reply-To: <CDDBC387-9135-405E-B535-EE89F952386E@oracle.com>

On Thu, Oct 01, 2015 at 01:36:26PM -0400, Chuck Lever wrote:

> >> A missed WC will result in an RPC/RDMA transport deadlock. In
> >> fact that is the reason for this particular patch (although
> >> it addresses only one source of missed WCs). So I would like
> >> to see that there are no windows here.
> > 
> > WCs are never missed.
> 
> The review comment earlier in this thread suggested there is
> a race condition where a WC can be “delayed” resulting in,
> well, I’m still not certain what the consequences are.

Yes. The consequence would typically be lockup of CQ processing.

> > while (1) {
> >  struct ib_wc wcs[100];
> >  int rc = ib_poll_cq(cw, NELEMS(wcs), wcs);
> > 
> >  .. process rc wcs ..
> > 
> >  if (rc != NELEMS(wcs))
> >    if (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
> >        IB_CQ_REPORT_MISSED_EVENTS) == 0)
> >     break;
> > }
> > 
> > API wise, we should probably look at forcing
> > IB_CQ_REPORT_MISSED_EVENTS on and dropping the flag.
> 
> It’s been suggested that it’s not clear what a positive
> return value from ib_req_notify_cq() means when the
> REPORT_MISSED_EVENTS flags is set: does it mean that
> the CQ has been re-armed? I had assumed that a positive
> RC meant both missed events and a successful re-arm,
> but the pseudo-code above suggests that is not the
> case.

The ULP must assume the CQ has NOT been armed after a positive return.

What the driver does to the arm state is undefined - for instance the
driver may trigger a callback and still return 1 here.

However, the driver must make this guarentee:

 If ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns 0 then
 the call back will always be called when the CQ is non-empty.

The ULP must loop doing polling until the above happens, otherwise the
event notification may be missed.

ie the above is guarnteed to close the WC delay/lockup race.

Again, if there has been confusion on the driver side, drivers that
don't implement the above are broken.

Review Roland's original commit comments on this feature.

 https://github.com/jgunthorpe/linux/commit/ed23a72778f3dbd465e55b06fe31629e7e1dd2f3

I'm not sure where we are at on the 'significant overhead for some
low-level drivers' issue, but assuming that is still the case, then
the recommendation is this:

bool exiting = false;
while (1) {
   struct ib_wc wcs[100];
   int rc = ib_poll_cq(cq, NELEMS(wcs), wcs);
   if (rc == 0 && exiting)
        break;

   .. process rc wcs ..

   if (rc != NELEMS(wcs)) {
        ib_req_notify_cq(cq, IB_CQ_NEXT_COMP)
	exiting = true;
   } else
	exiting = false;
}

ie a double poll.

AFAIK, this is a common pattern in the ULPs.. Perhaps we should
implement this as a core API:

struct ib_wc wcs[100];
while ((rc = ib_poll_cq_and_arm(cq, NELEMS(wcs), wcs)) != 0) {
   .. process rc wcs  ..

 ib_poll_cq_and_arm reads wcs off the CQ. If it returns 0 then the
 callback is guarenteed to happen when the CQ is non empty.

Jason

WARNING: multiple messages have this Message-ID (diff)
From: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
To: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: linux-rdma <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Devesh Sharma
	<devesh.sharma-1wcpHE2jlwO1Z/+hSey0Gg@public.gmane.org>,
	Sagi Grimberg
	<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
	Linux NFS Mailing List
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v1 03/18] xprtrdma: Remove completion polling budgets
Date: Thu, 1 Oct 2015 12:15:48 -0600	[thread overview]
Message-ID: <20151001181548.GA8670@obsidianresearch.com> (raw)
In-Reply-To: <CDDBC387-9135-405E-B535-EE89F952386E-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

On Thu, Oct 01, 2015 at 01:36:26PM -0400, Chuck Lever wrote:

> >> A missed WC will result in an RPC/RDMA transport deadlock. In
> >> fact that is the reason for this particular patch (although
> >> it addresses only one source of missed WCs). So I would like
> >> to see that there are no windows here.
> > 
> > WCs are never missed.
> 
> The review comment earlier in this thread suggested there is
> a race condition where a WC can be “delayed” resulting in,
> well, I’m still not certain what the consequences are.

Yes. The consequence would typically be lockup of CQ processing.

> > while (1) {
> >  struct ib_wc wcs[100];
> >  int rc = ib_poll_cq(cw, NELEMS(wcs), wcs);
> > 
> >  .. process rc wcs ..
> > 
> >  if (rc != NELEMS(wcs))
> >    if (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
> >        IB_CQ_REPORT_MISSED_EVENTS) == 0)
> >     break;
> > }
> > 
> > API wise, we should probably look at forcing
> > IB_CQ_REPORT_MISSED_EVENTS on and dropping the flag.
> 
> It’s been suggested that it’s not clear what a positive
> return value from ib_req_notify_cq() means when the
> REPORT_MISSED_EVENTS flags is set: does it mean that
> the CQ has been re-armed? I had assumed that a positive
> RC meant both missed events and a successful re-arm,
> but the pseudo-code above suggests that is not the
> case.

The ULP must assume the CQ has NOT been armed after a positive return.

What the driver does to the arm state is undefined - for instance the
driver may trigger a callback and still return 1 here.

However, the driver must make this guarentee:

 If ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns 0 then
 the call back will always be called when the CQ is non-empty.

The ULP must loop doing polling until the above happens, otherwise the
event notification may be missed.

ie the above is guarnteed to close the WC delay/lockup race.

Again, if there has been confusion on the driver side, drivers that
don't implement the above are broken.

Review Roland's original commit comments on this feature.

 https://github.com/jgunthorpe/linux/commit/ed23a72778f3dbd465e55b06fe31629e7e1dd2f3

I'm not sure where we are at on the 'significant overhead for some
low-level drivers' issue, but assuming that is still the case, then
the recommendation is this:

bool exiting = false;
while (1) {
   struct ib_wc wcs[100];
   int rc = ib_poll_cq(cq, NELEMS(wcs), wcs);
   if (rc == 0 && exiting)
        break;

   .. process rc wcs ..

   if (rc != NELEMS(wcs)) {
        ib_req_notify_cq(cq, IB_CQ_NEXT_COMP)
	exiting = true;
   } else
	exiting = false;
}

ie a double poll.

AFAIK, this is a common pattern in the ULPs.. Perhaps we should
implement this as a core API:

struct ib_wc wcs[100];
while ((rc = ib_poll_cq_and_arm(cq, NELEMS(wcs), wcs)) != 0) {
   .. process rc wcs  ..

 ib_poll_cq_and_arm reads wcs off the CQ. If it returns 0 then the
 callback is guarenteed to happen when the CQ is non empty.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2015-10-01 18:15 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-17 20:44 [PATCH v1 00/18] RFC NFS/RDMA patches for merging into v4.4 Chuck Lever
2015-09-17 20:44 ` Chuck Lever
2015-09-17 20:44 ` [PATCH v1 01/18] xprtrdma: Enable swap-on-NFS/RDMA Chuck Lever
2015-09-17 20:44   ` Chuck Lever
2015-09-21  8:58   ` Devesh Sharma
2015-09-21  8:58     ` Devesh Sharma
2015-09-17 20:44 ` [PATCH v1 02/18] xprtrdma: Replace global lkey with lkey local to PD Chuck Lever
2015-09-17 20:44   ` Chuck Lever
2015-09-21  8:59   ` Devesh Sharma
2015-09-21  8:59     ` Devesh Sharma
2015-09-17 20:44 ` [PATCH v1 03/18] xprtrdma: Remove completion polling budgets Chuck Lever
2015-09-17 20:44   ` Chuck Lever
2015-09-18  6:52   ` Devesh Sharma
2015-09-18  6:52     ` Devesh Sharma
2015-09-18 14:19     ` Chuck Lever
2015-09-18 14:19       ` Chuck Lever
2015-09-20 10:35       ` Sagi Grimberg
2015-09-20 10:35         ` Sagi Grimberg
2015-09-21  8:51         ` Devesh Sharma
2015-09-21  8:51           ` Devesh Sharma
2015-09-21 15:45           ` Chuck Lever
2015-09-21 15:45             ` Chuck Lever
2015-09-22 17:32             ` Devesh Sharma
2015-09-22 17:32               ` Devesh Sharma
2015-10-01 16:37               ` Chuck Lever
2015-10-01 16:37                 ` Chuck Lever
2015-10-01 17:13                 ` Jason Gunthorpe
2015-10-01 17:13                   ` Jason Gunthorpe
2015-10-01 17:36                   ` Chuck Lever
2015-10-01 17:36                     ` Chuck Lever
2015-10-01 18:15                     ` Jason Gunthorpe [this message]
2015-10-01 18:15                       ` Jason Gunthorpe
2015-10-01 18:31                       ` Chuck Lever
2015-10-01 18:31                         ` Chuck Lever
2015-10-01 18:38                         ` Jason Gunthorpe
2015-10-01 18:38                           ` Jason Gunthorpe
2015-09-21  8:51       ` Devesh Sharma
2015-09-21  8:51         ` Devesh Sharma
2015-09-17 20:44 ` [PATCH v1 04/18] xprtrdma: Refactor reply handler error handling Chuck Lever
2015-09-17 20:44   ` Chuck Lever
2015-09-21  8:59   ` Devesh Sharma
2015-09-21  8:59     ` Devesh Sharma
2015-09-17 20:44 ` [PATCH v1 05/18] xprtrdma: Replace send and receive arrays Chuck Lever
2015-09-17 20:44   ` Chuck Lever
2015-09-20 10:52   ` Sagi Grimberg
2015-09-20 10:52     ` Sagi Grimberg
2015-09-21 23:04     ` Chuck Lever
2015-09-21 23:04       ` Chuck Lever
2015-09-17 20:45 ` [PATCH v1 06/18] SUNRPC: Abstract backchannel operations Chuck Lever
2015-09-17 20:45   ` Chuck Lever
2015-09-17 20:45 ` [PATCH v1 07/18] xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers Chuck Lever
2015-09-17 20:45   ` Chuck Lever
2015-09-21 10:28   ` Devesh Sharma
2015-09-21 10:28     ` Devesh Sharma
2015-09-17 20:45 ` [PATCH v1 08/18] xprtrdma: Pre-allocate Work Requests for backchannel Chuck Lever
2015-09-17 20:45   ` Chuck Lever
2015-09-21 10:33   ` Devesh Sharma
2015-09-21 10:33     ` Devesh Sharma
2015-09-21 22:41     ` Chuck Lever
2015-09-21 22:41       ` Chuck Lever
2015-09-17 20:45 ` [PATCH v1 09/18] xprtrdma: Add support for sending backward direction RPC replies Chuck Lever
2015-09-17 20:45   ` Chuck Lever
2015-09-17 20:45 ` [PATCH v1 10/18] xprtrdma: Handle incoming backward direction RPC calls Chuck Lever
2015-09-17 20:45   ` Chuck Lever
2015-09-17 20:45 ` [PATCH v1 11/18] svcrdma: Add backward direction service for RPC/RDMA transport Chuck Lever
2015-09-17 20:45   ` Chuck Lever
2015-09-17 20:45 ` [PATCH v1 12/18] SUNRPC: Remove the TCP-only restriction in bc_svc_process() Chuck Lever
2015-09-17 20:45   ` Chuck Lever
2015-09-17 20:45 ` [PATCH v1 13/18] NFS: Enable client side NFSv4.1 backchannel to use other transports Chuck Lever
2015-09-17 20:45   ` Chuck Lever
2015-09-17 20:46 ` [PATCH v1 14/18] svcrdma: Define maximum number of backchannel requests Chuck Lever
2015-09-17 20:46   ` Chuck Lever
2015-09-17 20:46 ` [PATCH v1 15/18] svcrdma: Add svc_rdma_get_context() API that is allowed to fail Chuck Lever
2015-09-17 20:46   ` Chuck Lever
2015-09-20 12:40   ` Sagi Grimberg
2015-09-20 12:40     ` Sagi Grimberg
2015-09-21 22:34     ` Chuck Lever
2015-09-21 22:34       ` Chuck Lever
2015-09-17 20:46 ` [PATCH v1 16/18] svcrdma: Add infrastructure to send backwards direction RPC/RDMA calls Chuck Lever
2015-09-17 20:46   ` Chuck Lever
2015-09-17 20:46 ` [PATCH v1 17/18] svcrdma: Add infrastructure to receive backwards direction RPC/RDMA replies Chuck Lever
2015-09-17 20:46   ` Chuck Lever
2015-09-17 20:46 ` [PATCH v1 18/18] xprtrdma: Add class for RDMA backwards direction transport Chuck Lever
2015-09-17 20:46   ` Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151001181548.GA8670@obsidianresearch.com \
    --to=jgunthorpe@obsidianresearch.com \
    --cc=chuck.lever@oracle.com \
    --cc=devesh.sharma@avagotech.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=sagig@dev.mellanox.co.il \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.