From: Chuck Lever <chuck.lever@oracle.com>
To: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sagi Grimberg <sagi@grimberg.me>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH RFC 0/5] xprtrdma Send completion batching
Date: Wed, 6 Sep 2017 17:00:29 -0400
Message-ID: <C5144079-7F9C-4783-AF4D-A73411420D5E@oracle.com>
In-Reply-To: <20170906200957.GB22567@obsidianresearch.com>


> On Sep 6, 2017, at 4:09 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
> 
> On Wed, Sep 06, 2017 at 04:02:24PM -0400, Chuck Lever wrote:
>> 
>>> On Sep 6, 2017, at 3:39 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
>>> 
>>> On Wed, Sep 06, 2017 at 02:33:50PM -0400, Chuck Lever wrote:
>>> 
>>>> B. Force RPC completion to wait for Send completion, which
>>>> would allow the post-v4.6 scatter-gather code to work
>>>> safely. This would need some guarantee that Sends will
>>>> always complete in a short period.
>>> 
>>> Why is waiting for the send completion so fundamentally different from
>>> waiting for the remote RPC reply?
>>> 
>>> I would say that 99% of the time the send completion and RPC reply
>>> completion will occur at roughly the same time.
>>> 
>>> e.g. it is quite likely the RPC reply SEND carries an embedded ACK
>>> for the requesting SEND.
>> 
>> Depends on implementation. Average RTT on IB is 3-5 usecs.
>> Average RPC RTT is about an order of magnitude more. Typically
>> the Send is ACK'd more quickly than the RPC Reply can be sent.
>> 
>> But I get your point: the normal case isn't a problem.
>> 
>> The problematic case arises when the Send is not able to complete
>> because the NFS server is not reachable. User starts pounding on
>> ^C, RPC can't complete because Send won't complete, control
>> doesn't return to user.
> 
> Sure, but why is that so different from the NFS server not generating
> a response?
> 
> I thought you already implemented a ctrl-c scheme that killed the QP?
> Or was that just a discussion?

No, we want to _avoid_ killing the QP if we can. A ctrl-C (or a
timer signal, say) on an otherwise healthy connection must not
perturb other outstanding RPCs, if possible.

What I implemented was a scheme to invalidate the memory of a
(POSIX) signaled RPC before it completes, in case the RPC Reply
hadn't yet arrived.

Currently, the only time the QP might be killed is if the server
attempts to RDMA Write an RPC Reply into one of these invalidated
memory regions. That case can't be avoided with the current RPC-
over-RDMA protocol.
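
To make that concrete: the invalidation amounts to posting a Local
Invalidate for the MR backing the signaled RPC, then sleeping until
that work request completes before letting the RPC exit. A rough
sketch (the helper name is made up, and I'm hand-waving how the
waiter is woken, but IB_WR_LOCAL_INV and ib_post_send are the
standard verbs interface):

#include <rdma/ib_verbs.h>

/* Sketch only: the real code would attach an ib_cqe so the Local
 * Invalidate completion can wake the task sleeping in RPC exit.
 */
static int invalidate_signaled_mr(struct ib_qp *qp, struct ib_mr *mr)
{
	struct ib_send_wr *bad_wr;
	struct ib_send_wr inv_wr = {
		.opcode			= IB_WR_LOCAL_INV,
		.send_flags		= IB_SEND_SIGNALED,
		.ex.invalidate_rkey	= mr->rkey,
	};

	/* Once the rkey is invalid, a late RDMA Write from the server
	 * becomes a remote access error (killing the QP, as above)
	 * rather than a stray store into freed client memory.
	 */
	return ib_post_send(qp, &inv_wr, &bad_wr);
}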


> That is the only way to async terminate outstanding RPCs and clean
> up. Killing the QP will allow the send to be 'completed'.

It forces outstanding Sends to flush.

But as you explained at the time, xprtrdma needs some way to wait
for the QP to complete its transition to the error state before
allowing RPCs to complete. Probably ib_drain_qp would be enough.
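
In other words, something like this ordering (ib_drain_qp does both
halves for us, so the sketch is trivial):

#include <rdma/ib_verbs.h>

static void drain_before_completing_rpcs(struct ib_qp *qp)
{
	/* ib_drain_qp() moves the QP to IB_QPS_ERR and blocks until
	 * marker WRs flush through both the send and receive queues,
	 * so every outstanding Send has "completed" (with
	 * IB_WC_WR_FLUSH_ERR) by the time it returns.
	 */
	ib_drain_qp(qp);

	/* Only now is it safe to complete RPCs whose Send buffers
	 * might still have been posted.
	 */
}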

And again, we want to preserve the connection if it is healthy.


> Having ctrl-c escalate to a QP tear down after a short timeout seems
> reasonable. 99% of cases will not need the teardown since the send
> will complete..

So I think we are partially there already. If an RPC timeout occurs
(which should be after a few minutes) then xprtrdma does disconnect,
which tears down the QP.

If a timer signal fires on an RPC waiting for a server that is
unreachable, the application won't see the signal until the RPC
times out. Maybe that's how it works now?

And, otherwise, a ^C on an app waiting for an unresponsive server
will not have immediate results. But again, I think that's how it
works now.
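
If we ever did want the short-timeout escalation you describe, I
imagine it would look roughly like this (the context struct and the
force_disconnect helper are made up for illustration; send_done
would be completed from the Send completion handler):

#include <linux/completion.h>
#include <linux/jiffies.h>

struct rpc_send_ctx {			/* illustrative only */
	struct completion	send_done;
};

static void force_disconnect(struct rpc_send_ctx *ctx);

static void escalate_on_signal(struct rpc_send_ctx *ctx)
{
	/* Short, arbitrary grace period: in the 99% case the Send
	 * completes in microseconds and we never time out.
	 */
	if (wait_for_completion_timeout(&ctx->send_done, HZ / 10))
		return;

	/* Peer is unresponsive: tearing down the QP flushes the
	 * Send so the RPC can complete and control returns to the
	 * user.
	 */
	force_disconnect(ctx);
}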


> How does sockets based NFS handle this? Doesn't it zero copy from these
> same buffers into SKBs? How does it cancel the SKBs before the NIC
> transmits them?
> 
> Seems like exactly the same kind of problem to me..

TCP has keep-alive, where the socket's consumer is notified as soon
as the network layer determines that the remote is unresponsive. The
connection is closed from underneath the consumer.

For RDMA, which has no keep-alive mechanism, we seem to be going
with waiting for the RPC to time out, then the consumer itself
breaks the connection.
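
For contrast, here is roughly what arming keep-alive looks like for
a kernel TCP consumer, using the usual kernel_setsockopt knobs (the
interval values below are made up, not what xprtsock uses):

#include <linux/net.h>
#include <linux/socket.h>
#include <linux/tcp.h>

static void arm_tcp_keepalive(struct socket *sock)
{
	int one = 1;
	int idle = 60, intvl = 10, cnt = 6;	/* illustrative values */

	kernel_setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE,
			  (char *)&one, sizeof(one));
	kernel_setsockopt(sock, SOL_TCP, TCP_KEEPIDLE,
			  (char *)&idle, sizeof(idle));
	kernel_setsockopt(sock, SOL_TCP, TCP_KEEPINTVL,
			  (char *)&intvl, sizeof(intvl));
	/* After cnt missed probes the socket reports ETIMEDOUT and
	 * the consumer's state-change callback sees the close.
	 */
	kernel_setsockopt(sock, SOL_TCP, TCP_KEEPCNT,
			  (char *)&cnt, sizeof(cnt));
}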


--
Chuck Lever




