From: "Steve Wise" <swise@opengridcomputing.com>
To: "'Chuck Lever'" <chuck.lever@oracle.com>
Cc: "'Devesh Sharma'" <Devesh.Sharma@Emulex.Com>,
"'Linux NFS Mailing List'" <linux-nfs@vger.kernel.org>,
<linux-rdma@vger.kernel.org>,
"'Trond Myklebust'" <trond.myklebust@primarydata.com>
Subject: RE: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
Date: Thu, 10 Apr 2014 13:34:56 -0500 [thread overview]
Message-ID: <006601cf54eb$92488e30$b6d9aa90$@opengridcomputing.com> (raw)
In-Reply-To: <D7836AB3-FCB6-40EF-9954-B58A05A87791@oracle.com>
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Thursday, April 10, 2014 12:44 PM
> To: Steve Wise
> Cc: Devesh Sharma; Linux NFS Mailing List; linux-rdma@vger.kernel.org; Trond Myklebust
> Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
>
>
> On Apr 10, 2014, at 11:01 AM, Steve Wise <swise@opengridcomputing.com> wrote:
>
> > On 4/9/2014 7:26 PM, Chuck Lever wrote:
> >> On Apr 9, 2014, at 7:56 PM, Devesh Sharma <Devesh.Sharma@Emulex.Com> wrote:
> >>
> >>> Hi Chuk and Trond
> >>>
> >>> I will resend a v2 for this.
> >>> What if ib_post_send() fails with immidate error, I that case also DECR_CQCOUNT()
will
> be called but no completion will be reported. Will that not cause any problems?
> >> We should investigate whether an error return from ib_post_{send,recv} means there
will
> be no completion. But I've never seen these verbs fail in practice, so I'm not in a
hurry to make
> work for anyone! ;-)
> >
> > A synchronous failure from ib_post_* means the WR (or at least one of them if there
were >
> 1) failed and did not get submitted to HW. So there will be no completion for those
that failed.
>
> OK.
>
> Our post operations are largely single WRs. Before we address CQCOUNT in error cases,
we'd
> have to deal with chained WRs.
>
> Chained WRs are used only when rpcrdma_register_frmr_external() finds an MR that hasn't
> been invalidated. That's actually working around a FRMR re-use bug (commit 5c635e09). If
the
> underlying re-use problem was fixed, we could get rid of the chained WR in
> register_frmr_external() (and we wouldn't need completions at all for FAST_REG_MR).
>
> But at 100,000 feet, if a post operation fails, that seems like a very serious issue. I
wonder
> whether we would be better off disconnecting and starting over in those cases.
>
I agree. The application is responsible to flow-control its posting of WRs to the SQs/RQs.
So we should never see sync failures with ib_post_* due to over-running the queues.
However, if the QP moves out of RTS for whatever reason, then a multi-threaded application
could encounter sync failures because the QP exited RTS. Anyway, I agree: if there are
any failures with ib_post_*, the application should kill the connection, (LOG SOMETHING!),
and setup a new connection.
My 2 centimes. :)
Steve
WARNING: multiple messages have this Message-ID (diff)
From: "Steve Wise" <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: 'Chuck Lever' <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: 'Devesh Sharma'
<Devesh.Sharma-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org>,
'Linux NFS Mailing List'
<linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
'Trond Myklebust'
<trond.myklebust-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
Subject: RE: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
Date: Thu, 10 Apr 2014 13:34:56 -0500 [thread overview]
Message-ID: <006601cf54eb$92488e30$b6d9aa90$@opengridcomputing.com> (raw)
In-Reply-To: <D7836AB3-FCB6-40EF-9954-B58A05A87791-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
> Sent: Thursday, April 10, 2014 12:44 PM
> To: Steve Wise
> Cc: Devesh Sharma; Linux NFS Mailing List; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Trond Myklebust
> Subject: Re: [PATCH V1] NFS-RDMA: fix qp pointer validation checks
>
>
> On Apr 10, 2014, at 11:01 AM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>
> > On 4/9/2014 7:26 PM, Chuck Lever wrote:
> >> On Apr 9, 2014, at 7:56 PM, Devesh Sharma <Devesh.Sharma-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org> wrote:
> >>
> >>> Hi Chuk and Trond
> >>>
> >>> I will resend a v2 for this.
> >>> What if ib_post_send() fails with immidate error, I that case also DECR_CQCOUNT()
will
> be called but no completion will be reported. Will that not cause any problems?
> >> We should investigate whether an error return from ib_post_{send,recv} means there
will
> be no completion. But I've never seen these verbs fail in practice, so I'm not in a
hurry to make
> work for anyone! ;-)
> >
> > A synchronous failure from ib_post_* means the WR (or at least one of them if there
were >
> 1) failed and did not get submitted to HW. So there will be no completion for those
that failed.
>
> OK.
>
> Our post operations are largely single WRs. Before we address CQCOUNT in error cases,
we'd
> have to deal with chained WRs.
>
> Chained WRs are used only when rpcrdma_register_frmr_external() finds an MR that hasn't
> been invalidated. That's actually working around a FRMR re-use bug (commit 5c635e09). If
the
> underlying re-use problem was fixed, we could get rid of the chained WR in
> register_frmr_external() (and we wouldn't need completions at all for FAST_REG_MR).
>
> But at 100,000 feet, if a post operation fails, that seems like a very serious issue. I
wonder
> whether we would be better off disconnecting and starting over in those cases.
>
I agree. The application is responsible to flow-control its posting of WRs to the SQs/RQs.
So we should never see sync failures with ib_post_* due to over-running the queues.
However, if the QP moves out of RTS for whatever reason, then a multi-threaded application
could encounter sync failures because the QP exited RTS. Anyway, I agree: if there are
any failures with ib_post_*, the application should kill the connection, (LOG SOMETHING!),
and setup a new connection.
My 2 centimes. :)
Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2014-04-10 18:34 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-09 18:40 [PATCH V1] NFS-RDMA: fix qp pointer validation checks Devesh Sharma
2014-04-09 18:40 ` Devesh Sharma
2014-04-09 20:22 ` Trond Myklebust
2014-04-09 20:22 ` Trond Myklebust
2014-04-09 20:26 ` Chuck Lever
2014-04-09 20:26 ` Chuck Lever
2014-04-09 23:56 ` Devesh Sharma
2014-04-09 23:56 ` Devesh Sharma
2014-04-10 0:26 ` Chuck Lever
2014-04-10 0:26 ` Chuck Lever
2014-04-10 15:01 ` Steve Wise
2014-04-10 15:01 ` Steve Wise
2014-04-10 17:43 ` Chuck Lever
2014-04-10 17:43 ` Chuck Lever
2014-04-10 18:34 ` Steve Wise [this message]
2014-04-10 18:34 ` Steve Wise
2014-04-10 17:42 ` Devesh Sharma
2014-04-10 17:42 ` Devesh Sharma
2014-04-10 17:51 ` Chuck Lever
2014-04-10 17:51 ` Chuck Lever
2014-04-10 17:54 ` Devesh Sharma
2014-04-10 17:54 ` Devesh Sharma
2014-04-10 19:53 ` Chuck Lever
2014-04-10 19:53 ` Chuck Lever
2014-04-11 23:51 ` Devesh Sharma
2014-04-11 23:51 ` Devesh Sharma
2014-04-13 4:01 ` Chuck Lever
2014-04-13 4:01 ` Chuck Lever
2014-04-14 20:53 ` Chuck Lever
2014-04-14 20:53 ` Chuck Lever
2014-04-14 22:46 ` Devesh Sharma
2014-04-14 22:46 ` Devesh Sharma
2014-04-15 0:39 ` Chuck Lever
2014-04-15 0:39 ` Chuck Lever
2014-04-15 18:25 ` Devesh Sharma
2014-04-15 18:25 ` Devesh Sharma
2014-04-23 23:30 ` Devesh Sharma
2014-04-23 23:30 ` Devesh Sharma
2014-04-24 7:12 ` Sagi Grimberg
2014-04-24 7:12 ` Sagi Grimberg
2014-04-24 15:01 ` Chuck Lever
2014-04-24 15:01 ` Chuck Lever
2014-04-24 15:48 ` Devesh Sharma
2014-04-24 15:48 ` Devesh Sharma
2014-04-24 17:44 ` Chuck Lever
2014-04-24 17:44 ` Chuck Lever
2014-04-27 10:12 ` Sagi Grimberg
2014-04-27 10:12 ` Sagi Grimberg
2014-04-27 12:37 ` Chuck Lever
2014-04-27 12:37 ` Chuck Lever
2014-04-28 8:58 ` Sagi Grimberg
2014-04-28 8:58 ` Sagi Grimberg
2014-04-14 23:55 ` Devesh Sharma
2014-04-14 23:55 ` Devesh Sharma
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='006601cf54eb$92488e30$b6d9aa90$@opengridcomputing.com' \
--to=swise@opengridcomputing.com \
--cc=Devesh.Sharma@Emulex.Com \
--cc=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=trond.myklebust@primarydata.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.