From: Tom Talpey <tom@talpey.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-rdma@vger.kernel.org,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH v1 5/8] svcrdma: Add infrastructure to receive backwards direction RPC/RDMA replies
Date: Mon, 23 Nov 2015 21:02:44 -0500
Message-ID: <5653C544.1000303@talpey.com>
In-Reply-To: <0296249F-C653-4DA1-B40E-070B14BD0C85@oracle.com>

I'll have to think about whether I agree with that as a
protocol statement.

Chunks in a reply are there to account for the data that is
handled by the chunks of the request. So it kind of comes down
to whether RDMA is allowed (or used) on the backchannel. I
still think that is fundamentally an upper-layer question,
not an RPC one.


On 11/23/2015 8:47 PM, Chuck Lever wrote:
>
>> On Nov 23, 2015, at 7:44 PM, Tom Talpey <tom@talpey.com> wrote:
>>
>> On 11/23/2015 5:20 PM, Chuck Lever wrote:
>>> To support the NFSv4.1 backchannel on RDMA connections, add a
>>> capability for receiving an RPC/RDMA reply on a connection
>>> established by a client.
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>   net/sunrpc/xprtrdma/rpc_rdma.c          |   76 +++++++++++++++++++++++++++++++
>>>   net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |   60 ++++++++++++++++++++++++
>>>   net/sunrpc/xprtrdma/xprt_rdma.h         |    4 ++
>>>   3 files changed, 140 insertions(+)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> index c10d969..fef0623 100644
>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> @@ -946,3 +946,79 @@ repost:
>>>   	if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, &r_xprt->rx_ep, rep))
>>>   		rpcrdma_recv_buffer_put(rep);
>>>   }
>>> +
>>> +#if defined(CONFIG_SUNRPC_BACKCHANNEL)
>>> +
>>> +int
>>> +rpcrdma_handle_bc_reply(struct rpc_xprt *xprt, struct rpcrdma_msg *rmsgp,
>>> +			struct xdr_buf *rcvbuf)
>>> +{
>>> +	struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
>>> +	struct kvec *dst, *src = &rcvbuf->head[0];
>>> +	struct rpc_rqst *req;
>>> +	unsigned long cwnd;
>>> +	u32 credits;
>>> +	size_t len;
>>> +	__be32 xid;
>>> +	__be32 *p;
>>> +	int ret;
>>> +
>>> +	p = (__be32 *)src->iov_base;
>>> +	len = src->iov_len;
>>> +	xid = rmsgp->rm_xid;
>>> +
>>> +	pr_info("%s: xid=%08x, length=%zu\n",
>>> +		__func__, be32_to_cpu(xid), len);
>>> +	pr_info("%s: RPC/RDMA: %*ph\n",
>>> +		__func__, (int)RPCRDMA_HDRLEN_MIN, rmsgp);
>>> +	pr_info("%s:      RPC: %*ph\n",
>>> +		__func__, (int)len, p);
>>> +
>>> +	ret = -EAGAIN;
>>> +	if (src->iov_len < 24)
>>> +		goto out_shortreply;
>>> +
>>> +	spin_lock_bh(&xprt->transport_lock);
>>> +	req = xprt_lookup_rqst(xprt, xid);
>>> +	if (!req)
>>> +		goto out_notfound;
>>> +
>>> +	dst = &req->rq_private_buf.head[0];
>>> +	memcpy(&req->rq_private_buf, &req->rq_rcv_buf, sizeof(struct xdr_buf));
>>> +	if (dst->iov_len < len)
>>> +		goto out_unlock;
>>> +	memcpy(dst->iov_base, p, len);
>>> +
>>> +	credits = be32_to_cpu(rmsgp->rm_credit);
>>> +	if (credits == 0)
>>> +		credits = 1;	/* don't deadlock */
>>> +	else if (credits > r_xprt->rx_buf.rb_bc_max_requests)
>>> +		credits = r_xprt->rx_buf.rb_bc_max_requests;
>>> +
>>> +	cwnd = xprt->cwnd;
>>> +	xprt->cwnd = credits << RPC_CWNDSHIFT;
>>> +	if (xprt->cwnd > cwnd)
>>> +		xprt_release_rqst_cong(req->rq_task);
>>> +
>>> +	ret = 0;
>>> +	xprt_complete_rqst(req->rq_task, rcvbuf->len);
>>> +	rcvbuf->len = 0;
>>> +
>>> +out_unlock:
>>> +	spin_unlock_bh(&xprt->transport_lock);
>>> +out:
>>> +	return ret;
>>> +
>>> +out_shortreply:
>>> +	pr_info("svcrdma: short bc reply: xprt=%p, len=%zu\n",
>>> +		xprt, src->iov_len);
>>> +	goto out;
>>> +
>>> +out_notfound:
>>> +	pr_info("svcrdma: unrecognized bc reply: xprt=%p, xid=%08x\n",
>>> +		xprt, be32_to_cpu(xid));
>>> +
>>> +	goto out_unlock;
>>> +}
>>> +
>>> +#endif	/* CONFIG_SUNRPC_BACKCHANNEL */
>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> index ff4f01e..2b762b5 100644
>>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>>> @@ -47,6 +47,7 @@
>>>   #include <rdma/ib_verbs.h>
>>>   #include <rdma/rdma_cm.h>
>>>   #include <linux/sunrpc/svc_rdma.h>
>>> +#include "xprt_rdma.h"
>>>
>>>   #define RPCDBG_FACILITY	RPCDBG_SVCXPRT
>>>
>>> @@ -567,6 +568,42 @@ static int rdma_read_complete(struct svc_rqst *rqstp,
>>>   	return ret;
>>>   }
>>>
>>> +#if defined(CONFIG_SUNRPC_BACKCHANNEL)
>>> +
>>> +/* By convention, backchannel calls arrive via rdma_msg type
>>> + * messages, and never populate the chunk lists. This makes
>>> + * the RPC/RDMA header small and fixed in size, so it is
>>> + * straightforward to check the RPC header's direction field.
>>> + */
>>> +static bool
>>> +svc_rdma_is_backchannel_reply(struct svc_xprt *xprt, struct rpcrdma_msg *rmsgp)
>>> +{
>>> +	__be32 *p = (__be32 *)rmsgp;
>>> +
>>> +	if (!xprt->xpt_bc_xprt)
>>> +		return false;
>>> +
>>> +	if (rmsgp->rm_type != rdma_msg)
>>> +		return false;
>>> +	if (rmsgp->rm_body.rm_chunks[0] != xdr_zero)
>>> +		return false;
>>> +	if (rmsgp->rm_body.rm_chunks[1] != xdr_zero)
>>> +		return false;
>>> +	if (rmsgp->rm_body.rm_chunks[2] != xdr_zero)
>>> +		return false;
>>
>> The above assertion is only true for the NFS behavior as spec'd
>> today (no chunk-bearing bulk data on existing backchannel NFS
>> protocol messages). That at least deserves a comment.
>
> Not sure what you mean:
>
> https://datatracker.ietf.org/doc/draft-ietf-nfsv4-rpcrdma-bidirection/
>
> says a chunk-less message is how a backward RPC reply goes over
> RPC/RDMA. NFS is not in the picture here.
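
For reference, the fixed header that the draft (and this patch)
assumes -- all three chunk lists empty, RPC message carried inline --
looks like this (illustrative layout, not quoted from the patch):

  word 0     rm_xid
  word 1     rm_vers
  word 2     rm_credit
  word 3     rm_type      (rdma_msg)
  words 4-6  read / write / reply chunk lists, each xdr_zero (empty)
  word 7     RPC xid      (same value as rm_xid)
  word 8     RPC mtype    (msg_type: 0 = CALL, 1 = REPLY)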
>
>
>> Or, why
>> not simply ignore the chunks? They're not the receiver's problem.
>
> This check is done before the chunks are parsed. The point
> is the receiver has to verify that the RPC-over-RDMA header
> is the small, fixed-size kind _first_ before it can go look
> at the CALLDIR field in the RPC header.
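
To make sure I'm reading the ordering right: given the layout above,
a self-contained sketch of that test (userspace C, illustrative names
only -- not the patch code) would be something like:

  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <arpa/inet.h>        /* ntohl() stands in for be32_to_cpu() */

  #define RDMA_MSG   0          /* rm_type value for rdma_msg */
  #define RPC_REPLY  1          /* RPC msg_type: 0 = CALL, 1 = REPLY */

  /* 'p' points at the start of the received RPC/RDMA header;
   * 'nwords' is how many 32-bit words were actually received. */
  static bool looks_like_backward_reply(const uint32_t *p, size_t nwords)
  {
          if (nwords < 9)                  /* header + RPC xid + mtype */
                  return false;
          if (ntohl(p[3]) != RDMA_MSG)     /* only the fixed-size variant */
                  return false;
          if (p[4] || p[5] || p[6])        /* any chunk list present means */
                  return false;            /* the header is not 7 words   */
          if (p[7] != p[0])                /* RPC xid must match rm_xid */
                  return false;
          return ntohl(p[8]) == RPC_REPLY; /* now safe to test direction */
  }

i.e., the word-7/word-8 offsets only mean anything once the chunk
lists are known to be empty, which I take to be your point.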
>
>
>>> +
>>> +	/* sanity */
>>> +	if (p[7] != rmsgp->rm_xid)
>>> +		return false;
>>> +	/* call direction */
>>> +	if (p[8] == cpu_to_be32(RPC_CALL))
>>> +		return false;
>>> +
>>> +	return true;
>>> +}
>>> +
>>> +#endif	/* CONFIG_SUNRPC_BACKCHANNEL */
>>> +
>>>   /*
>>>    * Set up the rqstp thread context to point to the RQ buffer. If
>>>    * necessary, pull additional data from the client with an RDMA_READ
>>> @@ -632,6 +669,17 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>>>   		goto close_out;
>>>   	}
>>>
>>> +#if defined(CONFIG_SUNRPC_BACKCHANNEL)
>>> +	if (svc_rdma_is_backchannel_reply(xprt, rmsgp)) {
>>> +		ret = rpcrdma_handle_bc_reply(xprt->xpt_bc_xprt, rmsgp,
>>> +					      &rqstp->rq_arg);
>>> +		svc_rdma_put_context(ctxt, 0);
>>> +		if (ret)
>>> +			goto repost;
>>> +		return ret;
>>> +	}
>>> +#endif
>>> +
>>>   	/* Read read-list data. */
>>>   	ret = rdma_read_chunks(rdma_xprt, rmsgp, rqstp, ctxt);
>>>   	if (ret > 0) {
>>> @@ -668,4 +716,16 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>>>   	set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>>   defer:
>>>   	return 0;
>>> +
>>> +#if defined(CONFIG_SUNRPC_BACKCHANNEL)
>>> +repost:
>>> +	ret = svc_rdma_post_recv(rdma_xprt);
>>> +	if (ret) {
>>> +	pr_info("svcrdma: could not post a receive buffer, err=%d. "
>>> +		        "Closing transport %p.\n", ret, rdma_xprt);
>>> +		set_bit(XPT_CLOSE, &rdma_xprt->sc_xprt.xpt_flags);
>>> +		ret = -ENOTCONN;
>>> +	}
>>> +	return ret;
>>> +#endif
>>>   }
>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> index ac7f8d4..9aeff2b 100644
>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> @@ -309,6 +309,8 @@ struct rpcrdma_buffer {
>>>   	u32			rb_bc_srv_max_requests;
>>>   	spinlock_t		rb_reqslock;	/* protect rb_allreqs */
>>>   	struct list_head	rb_allreqs;
>>> +
>>> +	u32			rb_bc_max_requests;
>>>   };
>>>   #define rdmab_to_ia(b) (&container_of((b), struct rpcrdma_xprt, rx_buf)->rx_ia)
>>>
>>> @@ -511,6 +513,8 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *);
>>>    * RPC/RDMA protocol calls - xprtrdma/rpc_rdma.c
>>>    */
>>>   int rpcrdma_marshal_req(struct rpc_rqst *);
>>> +int rpcrdma_handle_bc_reply(struct rpc_xprt *, struct rpcrdma_msg *,
>>> +			    struct xdr_buf *);
>>>
>>>   /* RPC/RDMA module init - xprtrdma/transport.c
>>>    */
>>>
>
> --
> Chuck Lever
>
>
>
>
