All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anna Schumaker <Anna.Schumaker@netapp.com>
To: Chuck Lever <chuck.lever@oracle.com>,
	Sagi Grimberg <sagig@dev.mellanox.co.il>
Cc: Linux RDMA Mailing List <linux-rdma@vger.kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v3 05/11] xprtrdma: Do not wait if ib_post_send() fails
Date: Thu, 10 Mar 2016 12:01:36 -0500	[thread overview]
Message-ID: <56E1A870.5030308@Netapp.com> (raw)
In-Reply-To: <54ECE0AF-A930-45E0-A03A-FB7CD789B538@oracle.com>

On 03/10/2016 11:40 AM, Chuck Lever wrote:
> 
>> On Mar 10, 2016, at 5:25 AM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>>
>>
>>>> Moving the QP into error state right after with rdma_disconnect
>>>> you are not sure that none of the subset of the invalidations
>>>> that _were_ posted completed and you get the corresponding MRs
>>>> in a bogus state...
>>>
>>> Moving the QP to error state and then draining the CQs means
>>> that all LOCAL_INV WRs that managed to get posted will get
>>> completed or flushed. That's already handled today.
>>>
>>> It's the WRs that didn't get posted that I'm worried about
>>> in this patch.
>>>
>>> Are there RDMA consumers in the kernel that use that third
>>> argument to recover when LOCAL_INV WRs cannot be posted?
>>
>> None :)
>>
>>>>> I suppose I could reset these MRs instead (that is,
>>>>> pass them to ib_dereg_mr).
>>>>
>>>> Or, just wait for a completion for those that were posted
>>>> and then all the MRs are in a consistent state.
>>>
>>> When a LOCAL_INV completes with IB_WC_SUCCESS, the associated
>>> MR is in a known state (ie, invalid).
>>>
>>> The WRs that flush mean the associated MRs are not in a known
>>> state. Sometimes the MR state is different than the hardware
>>> state, for example. Trying to do anything with one of these
>>> inconsistent MRs results in IB_WC_BIND_MW_ERR until the thing
>>> is deregistered.
>>
>> Correct.
>>
>>> The xprtrdma completion handlers mark the MR associated with
>>> a flushed LOCAL_INV WR "stale". They all have to be reset with
>>> ib_dereg_mr to guarantee they are usable again. Have a look at
>>> __frwr_recovery_worker().
>>
>> Yes, I'm aware of that.
>>
>>> And, xprtrdma waits for only the last LOCAL_INV in the chain to
>>> complete. If that one isn't posted, then fr_done is never woken
>>> up. In that case, frwr_op_unmap_sync() would wait forever.
>>
>> Ah.. so the (missing) completions is the problem, now I get
>> it.
>>
>>> If I understand you I think the correct solution is for
>>> frwr_op_unmap_sync() to regroup and reset the MRs associated
>>> with the LOCAL_INV WRs that were never posted, using the same
>>> mechanism as __frwr_recovery_worker() .
>>
>> Yea, I'd recycle all the MRs instead of having non-trivial logic
>> to try and figure out MR states...
> 
> We have to keep that logic, since a spurious disconnect
> will result in flushed LOCAL_INV requests too. In fact
> that's the by far more likely source of inconsistent MRs.
> 
> 
>>> It's already 4.5-rc7, a little late for a significant rework
>>> of this patch, so maybe I should drop it?
>>
>> Perhaps... Although you can make it incremental because the current
>> patch doesn't seem to break anything, just not solving the complete
>> problem...
> 
> I'm preparing to extend the frwr_queue_recovery mechanism
> in v4.7 to deal with other cases, and that new code could
> be used here to fence MRs, rather than forcing a disconnect.
> 
> I'd like to leave 05/11 in place for v4.6.
> 
> Anna, can you add Sagi's Reviewed-by tags to the other
> patches in this series, as he posted earlier this week?

Yeah, I can do that.  I'll leave in the patch, and send everything to Trond later this afternoon or tomorrow!

Anna
> 
> 
> --
> Chuck Lever
> 
> 
> 


WARNING: multiple messages have this Message-ID (diff)
From: Anna Schumaker <Anna.Schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org>
To: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	Sagi Grimberg
	<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Cc: Linux RDMA Mailing List
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Linux NFS Mailing List
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v3 05/11] xprtrdma: Do not wait if ib_post_send() fails
Date: Thu, 10 Mar 2016 12:01:36 -0500	[thread overview]
Message-ID: <56E1A870.5030308@Netapp.com> (raw)
In-Reply-To: <54ECE0AF-A930-45E0-A03A-FB7CD789B538-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

On 03/10/2016 11:40 AM, Chuck Lever wrote:
> 
>> On Mar 10, 2016, at 5:25 AM, Sagi Grimberg <sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
>>
>>
>>>> Moving the QP into error state right after with rdma_disconnect
>>>> you are not sure that none of the subset of the invalidations
>>>> that _were_ posted completed and you get the corresponding MRs
>>>> in a bogus state...
>>>
>>> Moving the QP to error state and then draining the CQs means
>>> that all LOCAL_INV WRs that managed to get posted will get
>>> completed or flushed. That's already handled today.
>>>
>>> It's the WRs that didn't get posted that I'm worried about
>>> in this patch.
>>>
>>> Are there RDMA consumers in the kernel that use that third
>>> argument to recover when LOCAL_INV WRs cannot be posted?
>>
>> None :)
>>
>>>>> I suppose I could reset these MRs instead (that is,
>>>>> pass them to ib_dereg_mr).
>>>>
>>>> Or, just wait for a completion for those that were posted
>>>> and then all the MRs are in a consistent state.
>>>
>>> When a LOCAL_INV completes with IB_WC_SUCCESS, the associated
>>> MR is in a known state (ie, invalid).
>>>
>>> The WRs that flush mean the associated MRs are not in a known
>>> state. Sometimes the MR state is different than the hardware
>>> state, for example. Trying to do anything with one of these
>>> inconsistent MRs results in IB_WC_BIND_MW_ERR until the thing
>>> is deregistered.
>>
>> Correct.
>>
>>> The xprtrdma completion handlers mark the MR associated with
>>> a flushed LOCAL_INV WR "stale". They all have to be reset with
>>> ib_dereg_mr to guarantee they are usable again. Have a look at
>>> __frwr_recovery_worker().
>>
>> Yes, I'm aware of that.
>>
>>> And, xprtrdma waits for only the last LOCAL_INV in the chain to
>>> complete. If that one isn't posted, then fr_done is never woken
>>> up. In that case, frwr_op_unmap_sync() would wait forever.
>>
>> Ah.. so the (missing) completions is the problem, now I get
>> it.
>>
>>> If I understand you I think the correct solution is for
>>> frwr_op_unmap_sync() to regroup and reset the MRs associated
>>> with the LOCAL_INV WRs that were never posted, using the same
>>> mechanism as __frwr_recovery_worker() .
>>
>> Yea, I'd recycle all the MRs instead of having non-trivial logic
>> to try and figure out MR states...
> 
> We have to keep that logic, since a spurious disconnect
> will result in flushed LOCAL_INV requests too. In fact
> that's the by far more likely source of inconsistent MRs.
> 
> 
>>> It's already 4.5-rc7, a little late for a significant rework
>>> of this patch, so maybe I should drop it?
>>
>> Perhaps... Although you can make it incremental because the current
>> patch doesn't seem to break anything, just not solving the complete
>> problem...
> 
> I'm preparing to extend the frwr_queue_recovery mechanism
> in v4.7 to deal with other cases, and that new code could
> be used here to fence MRs, rather than forcing a disconnect.
> 
> I'd like to leave 05/11 in place for v4.6.
> 
> Anna, can you add Sagi's Reviewed-by tags to the other
> patches in this series, as he posted earlier this week?

Yeah, I can do that.  I'll leave in the patch, and send everything to Trond later this afternoon or tomorrow!

Anna
> 
> 
> --
> Chuck Lever
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2016-03-10 17:01 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-04 16:27 [PATCH v3 00/11] NFS/RDMA client patches for v4.6 Chuck Lever
2016-03-04 16:27 ` Chuck Lever
2016-03-04 16:27 ` [PATCH v3 01/11] xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro Chuck Lever
2016-03-04 16:27   ` Chuck Lever
2016-03-08 17:48   ` Sagi Grimberg
2016-03-08 17:48     ` Sagi Grimberg
2016-03-04 16:27 ` [PATCH v3 02/11] xprtrdma: Clean up physical_op_map() Chuck Lever
2016-03-04 16:27   ` Chuck Lever
2016-03-08 17:48   ` Sagi Grimberg
2016-03-08 17:48     ` Sagi Grimberg
2016-03-04 16:27 ` [PATCH v3 03/11] xprtrdma: Clean up dprintk format string containing a newline Chuck Lever
2016-03-04 16:27   ` Chuck Lever
2016-03-08 17:48   ` Sagi Grimberg
2016-03-08 17:48     ` Sagi Grimberg
2016-03-04 16:27 ` [PATCH v3 04/11] xprtrdma: Segment head and tail XDR buffers on page boundaries Chuck Lever
2016-03-04 16:27   ` Chuck Lever
2016-03-04 16:28 ` [PATCH v3 05/11] xprtrdma: Do not wait if ib_post_send() fails Chuck Lever
2016-03-04 16:28   ` Chuck Lever
2016-03-08 17:53   ` Sagi Grimberg
2016-03-08 17:53     ` Sagi Grimberg
2016-03-08 18:03     ` Chuck Lever
2016-03-08 18:03       ` Chuck Lever
2016-03-09 11:09       ` Sagi Grimberg
2016-03-09 11:09         ` Sagi Grimberg
2016-03-09 20:47         ` Chuck Lever
2016-03-09 20:47           ` Chuck Lever
2016-03-09 21:40           ` Anna Schumaker
2016-03-09 21:40             ` Anna Schumaker
2016-03-10 10:25           ` Sagi Grimberg
2016-03-10 10:25             ` Sagi Grimberg
2016-03-10 15:04             ` Steve Wise
2016-03-10 15:04               ` Steve Wise
2016-03-10 15:05               ` Chuck Lever
2016-03-10 15:05                 ` Chuck Lever
2016-03-10 15:31                 ` Steve Wise
2016-03-10 15:31                   ` Steve Wise
2016-03-10 15:35                   ` Chuck Lever
2016-03-10 15:35                     ` Chuck Lever
2016-03-10 15:54                     ` Steve Wise
2016-03-10 15:54                       ` Steve Wise
2016-03-10 15:58                       ` Chuck Lever
2016-03-10 15:58                         ` Chuck Lever
2016-03-10 16:10                         ` Steve Wise
2016-03-10 16:10                           ` Steve Wise
2016-03-10 16:14                           ` Chuck Lever
2016-03-10 16:14                             ` Chuck Lever
2016-03-10 16:21                             ` Steve Wise
2016-03-10 16:21                               ` Steve Wise
2016-03-10 16:40             ` Chuck Lever
2016-03-10 16:40               ` Chuck Lever
2016-03-10 17:01               ` Anna Schumaker [this message]
2016-03-10 17:01                 ` Anna Schumaker
2016-03-04 16:28 ` [PATCH v3 06/11] rpcrdma: Add RPCRDMA_HDRLEN_ERR Chuck Lever
2016-03-04 16:28   ` Chuck Lever
2016-03-08 17:53   ` Sagi Grimberg
2016-03-08 17:53     ` Sagi Grimberg
2016-03-04 16:28 ` [PATCH v3 07/11] xprtrdma: Properly handle RDMA_ERROR replies Chuck Lever
2016-03-04 16:28   ` Chuck Lever
2016-03-04 16:28 ` [PATCH v3 08/11] xprtrdma: Serialize credit accounting again Chuck Lever
2016-03-04 16:28   ` Chuck Lever
2016-03-04 16:28 ` [PATCH v3 09/11] xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs Chuck Lever
2016-03-04 16:28   ` Chuck Lever
2016-03-08 17:55   ` Sagi Grimberg
2016-03-08 17:55     ` Sagi Grimberg
2016-03-04 16:28 ` [PATCH v3 10/11] xprtrdma: Use an anonymous union in struct rpcrdma_mw Chuck Lever
2016-03-04 16:28   ` Chuck Lever
2016-03-08 17:55   ` Sagi Grimberg
2016-03-08 17:55     ` Sagi Grimberg
2016-03-04 16:28 ` [PATCH v3 11/11] xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs Chuck Lever
2016-03-04 16:28   ` Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56E1A870.5030308@Netapp.com \
    --to=anna.schumaker@netapp.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=sagig@dev.mellanox.co.il \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.