linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Anna Schumaker <Anna.Schumaker@netapp.com>
To: Chuck Lever <chuck.lever@oracle.com>,
	Sagi Grimberg <sagig@dev.mellanox.co.il>
Cc: Linux RDMA Mailing List <linux-rdma@vger.kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v3 05/11] xprtrdma: Do not wait if ib_post_send() fails
Date: Wed, 9 Mar 2016 16:40:11 -0500	[thread overview]
Message-ID: <56E0983B.2000102@Netapp.com> (raw)
In-Reply-To: <6B59B087-9CFA-458B-8848-B08B8E14E2C7@oracle.com>

On 03/09/2016 03:47 PM, Chuck Lever wrote:
> 
>> On Mar 9, 2016, at 6:09 AM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>>
>>
>>
>> On 08/03/2016 20:03, Chuck Lever wrote:
>>>
>>>> On Mar 8, 2016, at 12:53 PM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>>>>
>>>>
>>>>
>>>> On 04/03/2016 18:28, Chuck Lever wrote:
>>>>> If ib_post_send() in ro_unmap_sync() fails, the WRs have not been
>>>>> posted, no completions will fire, and wait_for_completion() will
>>>>> wait forever. Skip the wait in that case.
>>>>>
>>>>> To ensure the MRs are invalid, disconnect.
>>>>
>>>> How does that help to ensure that?
>>>
>>> I should have said "To ensure the MRs are fenced,"
>>>
>>>> The first wr that failed and on will leave the
>>>> corresponding MRs invalid, and the others will be valid
>>>> upon completion.
>>>
>>> ? This is in the invalidation code, not in the fastreg
>>> code.
>>
>> Yes, I meant linv...
>>
>>> When this ib_post_send() fails, I've built a set of
>>> chained LOCAL_INV WRs, but they never get posted. So
>>> there is no WR failure here, the WRs are simply
>>> never posted, and they won't complete or flush.
>>
>> That's the thing, some of them may have succeeded.
>> if ib_post_send() fails on a chain of posts, it reports
>> which wr failed (in the third wr pointer).
> 
> I see.
> 
> 
>> Moving the QP into error state right after with rdma_disconnect
>> you are not sure that none of the subset of the invalidations
>> that _were_ posted completed and you get the corresponding MRs
>> in a bogus state...
> 
> Moving the QP to error state and then draining the CQs means
> that all LOCAL_INV WRs that managed to get posted will get
> completed or flushed. That's already handled today.
> 
> It's the WRs that didn't get posted that I'm worried about
> in this patch.
> 
> Are there RDMA consumers in the kernel that use that third
> argument to recover when LOCAL_INV WRs cannot be posted?
> 
> 
>>> I suppose I could reset these MRs instead (that is,
>>> pass them to ib_dereg_mr).
>>
>> Or, just wait for a completion for those that were posted
>> and then all the MRs are in a consistent state.
> 
> When a LOCAL_INV completes with IB_WC_SUCCESS, the associated
> MR is in a known state (ie, invalid).
> 
> The WRs that flush mean the associated MRs are not in a known
> state. Sometimes the MR state is different than the hardware
> state, for example. Trying to do anything with one of these
> inconsistent MRs results in IB_WC_BIND_MW_ERR until the thing
> is deregistered.
> 
> The xprtrdma completion handlers mark the MR associated with
> a flushed LOCAL_INV WR "stale". They all have to be reset with
> ib_dereg_mr to guarantee they are usable again. Have a look at
> __frwr_recovery_worker().
> 
> And, xprtrdma waits for only the last LOCAL_INV in the chain to
> complete. If that one isn't posted, then fr_done is never woken
> up. In that case, frwr_op_unmap_sync() would wait forever.
> 
> If I understand you I think the correct solution is for
> frwr_op_unmap_sync() to regroup and reset the MRs associated
> with the LOCAL_INV WRs that were never posted, using the same
> mechanism as __frwr_recovery_worker() .
> 
> It's already 4.5-rc7, a little late for a significant rework
> of this patch, so maybe I should drop it?

Git doesn't have any conflicts if I drop the patch from my tree, and I was still able to compile.  Let me know if you want me to drop the patch from my tree, so you don't have to resend an entire series!

Thanks,
Anna

> 
> 
> --
> Chuck Lever
> 
> 
> 


  reply	other threads:[~2016-03-09 21:40 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-04 16:27 [PATCH v3 00/11] NFS/RDMA client patches for v4.6 Chuck Lever
2016-03-04 16:27 ` [PATCH v3 01/11] xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro Chuck Lever
2016-03-08 17:48   ` Sagi Grimberg
2016-03-04 16:27 ` [PATCH v3 02/11] xprtrdma: Clean up physical_op_map() Chuck Lever
2016-03-08 17:48   ` Sagi Grimberg
2016-03-04 16:27 ` [PATCH v3 03/11] xprtrdma: Clean up dprintk format string containing a newline Chuck Lever
2016-03-08 17:48   ` Sagi Grimberg
2016-03-04 16:27 ` [PATCH v3 04/11] xprtrdma: Segment head and tail XDR buffers on page boundaries Chuck Lever
2016-03-04 16:28 ` [PATCH v3 05/11] xprtrdma: Do not wait if ib_post_send() fails Chuck Lever
2016-03-08 17:53   ` Sagi Grimberg
2016-03-08 18:03     ` Chuck Lever
2016-03-09 11:09       ` Sagi Grimberg
2016-03-09 20:47         ` Chuck Lever
2016-03-09 21:40           ` Anna Schumaker [this message]
2016-03-10 10:25           ` Sagi Grimberg
2016-03-10 15:04             ` Steve Wise
2016-03-10 15:05               ` Chuck Lever
2016-03-10 15:31                 ` Steve Wise
2016-03-10 15:35                   ` Chuck Lever
2016-03-10 15:54                     ` Steve Wise
2016-03-10 15:58                       ` Chuck Lever
2016-03-10 16:10                         ` Steve Wise
2016-03-10 16:14                           ` Chuck Lever
2016-03-10 16:21                             ` Steve Wise
2016-03-10 16:40             ` Chuck Lever
2016-03-10 17:01               ` Anna Schumaker
2016-03-04 16:28 ` [PATCH v3 06/11] rpcrdma: Add RPCRDMA_HDRLEN_ERR Chuck Lever
2016-03-08 17:53   ` Sagi Grimberg
2016-03-04 16:28 ` [PATCH v3 07/11] xprtrdma: Properly handle RDMA_ERROR replies Chuck Lever
2016-03-04 16:28 ` [PATCH v3 08/11] xprtrdma: Serialize credit accounting again Chuck Lever
2016-03-04 16:28 ` [PATCH v3 09/11] xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs Chuck Lever
2016-03-08 17:55   ` Sagi Grimberg
2016-03-04 16:28 ` [PATCH v3 10/11] xprtrdma: Use an anonymous union in struct rpcrdma_mw Chuck Lever
2016-03-08 17:55   ` Sagi Grimberg
2016-03-04 16:28 ` [PATCH v3 11/11] xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56E0983B.2000102@Netapp.com \
    --to=anna.schumaker@netapp.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=sagig@dev.mellanox.co.il \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).