From: "Steve Wise" <swise@opengridcomputing.com>
To: "'Chuck Lever'" <chuck.lever@oracle.com>
Cc: "'Sagi Grimberg'" <sagig@dev.mellanox.co.il>,
<anna.schumaker@netapp.com>,
"'Linux RDMA Mailing List'" <linux-rdma@vger.kernel.org>,
"'Linux NFS Mailing List'" <linux-nfs@vger.kernel.org>
Subject: RE: [PATCH v3 05/11] xprtrdma: Do not wait if ib_post_send() fails
Date: Thu, 10 Mar 2016 09:54:15 -0600
Message-ID: <7b3901d17ae5$18fbf540$4af3dfc0$@opengridcomputing.com>
In-Reply-To: <BB3E1E71-E3B0-48D2-BADE-120152BE42D3@oracle.com>
> >>>>>> Moving the QP into error state right after with rdma_disconnect
> >>>>>> you are not sure that none of the subset of the invalidations
> >>>>>> that _were_ posted completed and you get the corresponding MRs
> >>>>>> in a bogus state...
> >>>>>
> >>>>> Moving the QP to error state and then draining the CQs means
> >>>>> that all LOCAL_INV WRs that managed to get posted will get
> >>>>> completed or flushed. That's already handled today.
> >>>>>
> >>>>> It's the WRs that didn't get posted that I'm worried about
> >>>>> in this patch.
> >>>>>
> >>>>> Are there RDMA consumers in the kernel that use that third
> >>>>> argument to recover when LOCAL_INV WRs cannot be posted?
> >>>>
> >>>> None :)
> >>>>
> >>>>>>> I suppose I could reset these MRs instead (that is,
> >>>>>>> pass them to ib_dereg_mr).
> >>>>>>
> >>>>>> Or, just wait for a completion for those that were posted
> >>>>>> and then all the MRs are in a consistent state.
> >>>>>
> >>>>> When a LOCAL_INV completes with IB_WC_SUCCESS, the associated
> >>>>> MR is in a known state (ie, invalid).
> >>>>>
> >>>>> The WRs that flush mean the associated MRs are not in a known
> >>>>> state. Sometimes the MR state is different than the hardware
> >>>>> state, for example. Trying to do anything with one of these
> >>>>> inconsistent MRs results in IB_WC_BIND_MW_ERR until the thing
> >>>>> is deregistered.
> >>>>
> >>>> Correct.
> >>>>
> >>>
> >>> It is legal to invalidate an MR that is not in the valid state. So you
> >>> don't have to deregister it, you can assume it is valid and post another
> >>> LINV WR.
> >>
> >> I've tried that. Once the MR is inconsistent, even LOCAL_INV
> >> does not work.
> >>
> >
> > Maybe IB Verbs don't mandate that invalidating an invalid MR must be
> > allowed? (looking at the verbs spec now).
>
IB Verbs doesn't specify this requirement. iW verbs does. So
transport-independent applications cannot rely on it, and ib_dereg_mr()
seems to be the only thing you can do.
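
As a rough sketch of the recovery choice discussed here (not code from the patch): every enum and function name below is hypothetical, and the `iwarp_only` flag is an assumption standing in for "this consumer never runs on an IB device". It only encodes the points made in this thread: a LOCAL_INV that completes successfully leaves the MR invalid, and a transport-independent consumer cannot count on a second LOCAL_INV working.

```c
#include <stdbool.h>

/* All names here are hypothetical; none are kernel identifiers. */
enum linv_outcome {
    LINV_SUCCESS,     /* completed with IB_WC_SUCCESS */
    LINV_FLUSHED,     /* WR was flushed after the QP entered error state */
    LINV_NOT_POSTED   /* ib_post_send() failed before this WR was queued */
};

enum mr_action {
    MR_REUSE,             /* MR is known to be invalid; it can be fast-registered again */
    MR_RETRY_LOCAL_INV,   /* post another LOCAL_INV on a working QP */
    MR_DEREG_AND_REALLOC  /* ib_dereg_mr(), then allocate a fresh MR */
};

/*
 * Pick a recovery action for one MR.
 * iwarp_only is an assumption: true only if the caller knows it will
 * never run on an IB device, where re-invalidating is not guaranteed.
 */
enum mr_action choose_mr_recovery(enum linv_outcome outcome, bool iwarp_only)
{
    if (outcome == LINV_SUCCESS)
        return MR_REUSE;
    if (iwarp_only)
        return MR_RETRY_LOCAL_INV;
    return MR_DEREG_AND_REALLOC;
}
```

A transport-independent caller would pass `false` for `iwarp_only`, so any MR whose LOCAL_INV was flushed or never posted ends up deregistered.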
> If the MR is truly invalid, then there is no issue, and
> the second LOCAL_INV completes successfully.
>
> The problem is after a flushed LOCAL_INV, the MR state
> sometimes does not match the hardware state. The MR is
> neither registered nor invalid.
>
There is a difference, at least with iWARP devices, between the MR state (VALID
vs INVALID) and whether the MR is allocated or not.
> A flushed LOCAL_INV tells you nothing more than that the
> LOCAL_INV didn't complete. The MR state at that point is
> unknown.
>
With respect to iWARP and cxgb4: when you allocate a fastreg MR, HW has an entry
for that MR and it is marked "allocated". The MR record in HW also has a state:
VALID or INVALID. While the MR is "allocated" you can post WRs to invalidate it
which changes the state to INVALID, or fast-register memory which makes it
VALID. Regardless of what happens on any given QP, the MR remains "allocated"
until you call ib_dereg_mr(). So at least for cxgb4, you could in fact just
post another LINV to get it back to a known state that allows subsequent
fast-reg WRs.
Perhaps IB devices don't work this way.
What error did you get when you tried just doing an LINV after a flush?
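
To make the cxgb4 behavior described above concrete, here is a toy model in C. It is not driver code: the struct, the enum and the `mr_model_*` functions are all hypothetical. It also assumes that a freshly allocated MR starts out INVALID and that a fast-register is rejected unless the MR is INVALID.

```c
#include <stdbool.h>

/* Hypothetical model of the per-MR record; not cxgb4 or kernel identifiers. */
enum mr_state { MR_INVALID, MR_VALID };

struct mr_model {
    bool allocated;       /* set at allocation, cleared only by dereg */
    enum mr_state state;  /* changed by fast-reg and LOCAL_INV WRs */
};

/* Allocate: the entry exists and starts INVALID (assumption). */
struct mr_model mr_model_alloc(void)
{
    struct mr_model mr = { .allocated = true, .state = MR_INVALID };
    return mr;
}

/* Fast-register: assumed to require an allocated, INVALID MR. */
int mr_model_fastreg(struct mr_model *mr)
{
    if (!mr->allocated || mr->state != MR_INVALID)
        return -1;
    mr->state = MR_VALID;
    return 0;
}

/* LOCAL_INV: allowed in either state while the MR is allocated. */
int mr_model_local_inv(struct mr_model *mr)
{
    if (!mr->allocated)
        return -1;
    mr->state = MR_INVALID;
    return 0;
}

/* Models ib_dereg_mr(): the only step that clears "allocated". */
void mr_model_dereg(struct mr_model *mr)
{
    mr->allocated = false;
    mr->state = MR_INVALID;
}
```

In this model nothing that happens on a QP touches `allocated`, so a second `mr_model_local_inv()` always brings the MR back to INVALID and a later `mr_model_fastreg()` succeeds.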
Steve.
Thread overview: 35+ messages
2016-03-04 16:27 [PATCH v3 00/11] NFS/RDMA client patches for v4.6 Chuck Lever
2016-03-04 16:27 ` [PATCH v3 01/11] xprtrdma: Clean up unused RPCRDMA_INLINE_PAD_THRESH macro Chuck Lever
2016-03-08 17:48 ` Sagi Grimberg
2016-03-04 16:27 ` [PATCH v3 02/11] xprtrdma: Clean up physical_op_map() Chuck Lever
2016-03-08 17:48 ` Sagi Grimberg
2016-03-04 16:27 ` [PATCH v3 03/11] xprtrdma: Clean up dprintk format string containing a newline Chuck Lever
2016-03-08 17:48 ` Sagi Grimberg
2016-03-04 16:27 ` [PATCH v3 04/11] xprtrdma: Segment head and tail XDR buffers on page boundaries Chuck Lever
2016-03-04 16:28 ` [PATCH v3 05/11] xprtrdma: Do not wait if ib_post_send() fails Chuck Lever
2016-03-08 17:53 ` Sagi Grimberg
2016-03-08 18:03 ` Chuck Lever
2016-03-09 11:09 ` Sagi Grimberg
2016-03-09 20:47 ` Chuck Lever
2016-03-09 21:40 ` Anna Schumaker
2016-03-10 10:25 ` Sagi Grimberg
2016-03-10 15:04 ` Steve Wise
2016-03-10 15:05 ` Chuck Lever
2016-03-10 15:31 ` Steve Wise
2016-03-10 15:35 ` Chuck Lever
2016-03-10 15:54 ` Steve Wise [this message]
2016-03-10 15:58 ` Chuck Lever
2016-03-10 16:10 ` Steve Wise
2016-03-10 16:14 ` Chuck Lever
2016-03-10 16:21 ` Steve Wise
2016-03-10 16:40 ` Chuck Lever
2016-03-10 17:01 ` Anna Schumaker
2016-03-04 16:28 ` [PATCH v3 06/11] rpcrdma: Add RPCRDMA_HDRLEN_ERR Chuck Lever
2016-03-08 17:53 ` Sagi Grimberg
2016-03-04 16:28 ` [PATCH v3 07/11] xprtrdma: Properly handle RDMA_ERROR replies Chuck Lever
2016-03-04 16:28 ` [PATCH v3 08/11] xprtrdma: Serialize credit accounting again Chuck Lever
2016-03-04 16:28 ` [PATCH v3 09/11] xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs Chuck Lever
2016-03-08 17:55 ` Sagi Grimberg
2016-03-04 16:28 ` [PATCH v3 10/11] xprtrdma: Use an anonymous union in struct rpcrdma_mw Chuck Lever
2016-03-08 17:55 ` Sagi Grimberg
2016-03-04 16:28 ` [PATCH v3 11/11] xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs Chuck Lever