linux-rdma.vger.kernel.org archive mirror
From: "Steve Wise" <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: 'Chuck Lever' <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: 'Sagi Grimberg'
	<sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
	anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org,
	'Linux RDMA Mailing List'
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	'Linux NFS Mailing List'
	<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: RE: [PATCH v3 05/11] xprtrdma: Do not wait if ib_post_send() fails
Date: Thu, 10 Mar 2016 10:21:28 -0600
Message-ID: <7b6d01d17ae8$e68f7e20$b3ae7a60$@opengridcomputing.com>
In-Reply-To: <B32CA8B9-3EB7-4DC3-A945-5C9F05D5F984-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

> >>>>>>>>>> Moving the QP into error state right after with rdma_disconnect
> >>>>>>>>>> you are not sure that none of the subset of the invalidations
> >>>>>>>>>> that _were_ posted completed and you get the corresponding MRs
> >>>>>>>>>> in a bogus state...
> >>>>>>>>>
> >>>>>>>>> Moving the QP to error state and then draining the CQs means
> >>>>>>>>> that all LOCAL_INV WRs that managed to get posted will get
> >>>>>>>>> completed or flushed. That's already handled today.
> >>>>>>>>>
> >>>>>>>>> It's the WRs that didn't get posted that I'm worried about
> >>>>>>>>> in this patch.
> >>>>>>>>>
> >>>>>>>>> Are there RDMA consumers in the kernel that use that third
> >>>>>>>>> argument to recover when LOCAL_INV WRs cannot be posted?
> >>>>>>>>
> >>>>>>>> None :)
> >>>>>>>>
> >>>>>>>>>>> I suppose I could reset these MRs instead (that is,
> >>>>>>>>>>> pass them to ib_dereg_mr).
> >>>>>>>>>>
> >>>>>>>>>> Or, just wait for a completion for those that were posted
> >>>>>>>>>> and then all the MRs are in a consistent state.
> >>>>>>>>>
> >>>>>>>>> When a LOCAL_INV completes with IB_WC_SUCCESS, the associated
> >>>>>>>>> MR is in a known state (ie, invalid).
> >>>>>>>>>
> >>>>>>>>> The WRs that flush mean the associated MRs are not in a known
> >>>>>>>>> state. Sometimes the MR state is different than the hardware
> >>>>>>>>> state, for example. Trying to do anything with one of these
> >>>>>>>>> inconsistent MRs results in IB_WC_MW_BIND_ERR until the thing
> >>>>>>>>> is deregistered.
> >>>>>>>>
> >>>>>>>> Correct.
> >>>>>>>>
> >>>>>>>
> >>>>>>> It is legal to invalidate an MR that is not in the valid state.  So you don't
> >>>>>>> have to deregister it, you can assume it is valid and post another LINV WR.
> >>>>>>
> >>>>>> I've tried that. Once the MR is inconsistent, even LOCAL_INV
> >>>>>> does not work.
> >>>>>>
> >>>>>
> >>>>> Maybe IB Verbs don't mandate that invalidating an invalid MR must be allowed?
> >>>>> (looking at the verbs spec now).
> >>>>
> >>>
> >>> IB Verbs doesn't specify this requirement.  iW verbs does.  So transport
> >>> independent applications cannot rely on it.  So ib_dereg_mr() seems to be the
> >>> only thing you can do.
> >>>
> >>>> If the MR is truly invalid, then there is no issue, and
> >>>> the second LOCAL_INV completes successfully.
> >>>>
> >>>> The problem is after a flushed LOCAL_INV, the MR state
> >>>> sometimes does not match the hardware state. The MR is
> >>>> neither registered nor invalid.
> >>>>
> >>>
> >>> There is a difference, at least with iWARP devices, between the MR state:
> >>> VALID vs INVALID, and if the MR is allocated or not.
> >>>
> >>>> A flushed LOCAL_INV tells you nothing more than that the
> >>>> LOCAL_INV didn't complete. The MR state at that point is
> >>>> unknown.
> >>>>
> >>>
> >>> With respect to iWARP and cxgb4: when you allocate a fastreg MR, HW has an entry
> >>> for that MR and it is marked "allocated".  The MR record in HW also has a state:
> >>> VALID or INVALID.  While the MR is "allocated" you can post WRs to invalidate it
> >>> which changes the state to INVALID, or fast-register memory which makes it
> >>> VALID.  Regardless of what happens on any given QP, the MR remains "allocated"
> >>> until you call ib_dereg_mr().  So at least for cxgb4, you could in fact just
> >>> post another LINV to get it back to a known state that allows subsequent
> >>> fast-reg WRs.
> >>>
> >>> Perhaps IB devices don't work this way.
> >>>
> >>> What error did you get when you tried just doing an LINV after a flush?
> >>
> >> With CX-2 and CX-3, after a flushed LOCAL_INV, trying either
> >> a FASTREG or LOCAL_INV on that MR can sometimes complete with
> >> IB_WC_MW_BIND_ERR.
> >
> >
> > I wonder if you post a FASTREG+LINV+LINV if you'd get the same failure?  IE
> > invalidate the same rkey twice.  Just as an experiment...
> 
> Once the MR is in this state, FASTREG does not work either.
> All FASTREG and LINV flush with IB_WC_MW_BIND_ERR until
> the MR is deregistered.

Mellanox can probably tell us why. 

I was just wondering if posting a double LINV on a valid working FRMR would fail
with these devices.  But it's moot.  As you've concluded, it looks like the only
safe way to handle this is to dereg them and reallocate...
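
To make the dereg-and-reallocate policy concrete, here is a minimal userspace
sketch in C. It is only a model, not xprtrdma code: struct sim_mr, MR_STALE,
sim_local_inv_done(), sim_mr_reset() and sim_recover_mrs() are hypothetical
names invented for illustration, and sim_mr_reset() merely stands in for
ib_dereg_mr() followed by a fresh MR allocation. The assumption it encodes is
the one from this thread: an MR whose LOCAL_INV flushed, or was never posted,
is in an unknown state and is replaced rather than reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: MR state as seen by the consumer, not by hardware. */
enum mr_state {
    MR_INVALID, /* LOCAL_INV completed successfully; safe to FASTREG again */
    MR_VALID,   /* FASTREG completed; MR is registered */
    MR_STALE    /* LOCAL_INV flushed or was never posted; state unknown */
};

struct sim_mr {
    enum mr_state state;
    unsigned int generation; /* bumped each time the MR is replaced */
};

/* Record the outcome of a LOCAL_INV WR for one MR. Only a posted WR that
 * completed successfully leaves the MR in a known (invalid) state. */
static void sim_local_inv_done(struct sim_mr *mr, bool posted, bool success)
{
    mr->state = (posted && success) ? MR_INVALID : MR_STALE;
}

/* Stand-in for ib_dereg_mr() plus a fresh allocation: the replacement MR
 * starts out invalid and ready for a FASTREG. */
static void sim_mr_reset(struct sim_mr *mr)
{
    mr->generation++;
    mr->state = MR_INVALID;
}

/* After the QP has gone to error and the CQs are drained, replace every MR
 * whose state is unknown. Returns the number of MRs replaced. */
static size_t sim_recover_mrs(struct sim_mr *mrs, size_t count)
{
    size_t replaced = 0;

    for (size_t i = 0; i < count; i++) {
        if (mrs[i].state == MR_STALE) {
            sim_mr_reset(&mrs[i]);
            replaced++;
        }
        /* No MR may be left in an unknown state after recovery. */
        assert(mrs[i].state != MR_STALE);
    }
    return replaced;
}
```

In this model a second LOCAL_INV is never attempted on a stale MR, which
sidesteps the question of whether a given device tolerates it.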


