* [PATCH v1 0/3] server-side NFS/RDMA patches proposed for v4.9
@ 2016-08-15 20:57 Chuck Lever
2016-08-15 20:58 ` [PATCH v1 1/3] rpcrdma: RDMA/CM private message data structure Chuck Lever
` (3 more replies)
0 siblings, 4 replies; 6+ messages in thread
From: Chuck Lever @ 2016-08-15 20:57 UTC (permalink / raw)
To: linux-rdma, linux-nfs
Posted for review, the following patch series makes these changes:
- Introduce simple RDMA-CM private message exchange
- Support Remote Invalidation
Available in the "nfsd-rdma-for-4.9" topic branch of this git repo:
git://git.linux-nfs.org/projects/cel/cel-2.6.git
Or for browsing:
http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=log;h=refs/heads/nfsd-rdma-for-4.9
---
Chuck Lever (3):
rpcrdma: RDMA/CM private message data structure
svcrdma: Server-side support for rpcrdma_connect_private
svcrdma: support Remote Invalidation for prototyping
include/linux/sunrpc/rpc_rdma.h | 35 ++++++++++++++++++
include/linux/sunrpc/svc_rdma.h | 1 +
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 58 ++++++++++++++++++++++++++++--
net/sunrpc/xprtrdma/svc_rdma_transport.c | 42 +++++++++++++++++++---
4 files changed, 128 insertions(+), 8 deletions(-)
--
Chuck Lever
* [PATCH v1 1/3] rpcrdma: RDMA/CM private message data structure
From: Chuck Lever @ 2016-08-15 20:58 UTC (permalink / raw)
To: linux-rdma, linux-nfs
Introduce data structure used by both client and server to exchange
implementation details during RDMA/CM connection establishment.
This is an experimental out-of-band exchange between Linux
RPC-over-RDMA Version One implementations, replacing the deprecated
CCP (see RFC 5666bis). The purpose of this extension is to enable
prototyping of features that might be introduced in a subsequent
version of RPC-over-RDMA.
Suggested by Christoph Hellwig and Devesh Sharma.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/rpc_rdma.h | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/include/linux/sunrpc/rpc_rdma.h b/include/linux/sunrpc/rpc_rdma.h
index 3b1ff38..a7da6bf 100644
--- a/include/linux/sunrpc/rpc_rdma.h
+++ b/include/linux/sunrpc/rpc_rdma.h
@@ -41,6 +41,7 @@
#define _LINUX_SUNRPC_RPC_RDMA_H
#include <linux/types.h>
+#include <linux/bitops.h>
#define RPCRDMA_VERSION 1
#define rpcrdma_version cpu_to_be32(RPCRDMA_VERSION)
@@ -129,4 +130,38 @@ enum rpcrdma_proc {
#define rdma_done cpu_to_be32(RDMA_DONE)
#define rdma_error cpu_to_be32(RDMA_ERROR)
+/*
+ * Private extension to RPC-over-RDMA Version One.
+ * Message passed during RDMA-CM connection set-up.
+ *
+ * Add new fields at the end, and don't permute existing
+ * fields.
+ */
+struct rpcrdma_connect_private {
+ __be32 cp_magic;
+ u8 cp_version;
+ u8 cp_flags;
+ u8 cp_send_size;
+ u8 cp_recv_size;
+} __packed;
+
+#define rpcrdma_cmp_magic __cpu_to_be32(0xf6ab0e18)
+
+enum {
+ RPCRDMA_CMP_VERSION = 1,
+ RPCRDMA_CMP_F_SND_W_INV_OK = BIT(0),
+};
+
+static inline u8
+rpcrdma_encode_buffer_size(unsigned int size)
+{
+ return (size >> 10) - 1;
+}
+
+static inline unsigned int
+rpcrdma_decode_buffer_size(u8 val)
+{
+ return ((unsigned int)val + 1) << 10;
+}
+
#endif /* _LINUX_SUNRPC_RPC_RDMA_H */
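Not part of the patch: the size encoding above can be tried outside the kernel. The sketch below copies the two helpers into userspace C, with uint8_t standing in for u8. The rpcrdma_size_is_encodable() check is a hypothetical addition for illustration only; the patch does not define it. It reflects the apparent assumption that sizes are multiples of 1KB between 1KB and 256KB, since other values are rounded down or wrap when stored in a u8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace copies of the helpers in the patch; uint8_t stands in for u8. */
static inline uint8_t rpcrdma_encode_buffer_size(unsigned int size)
{
	/* Size in 1KB units, minus one, so 0..255 covers 1KB..256KB. */
	return (size >> 10) - 1;
}

static inline unsigned int rpcrdma_decode_buffer_size(uint8_t val)
{
	return ((unsigned int)val + 1) << 10;
}

/*
 * Hypothetical helper, not in the patch: true if @size survives an
 * encode/decode round trip unchanged.
 */
static inline bool rpcrdma_size_is_encodable(unsigned int size)
{
	return size >= 1024 && size <= (256u << 10) && (size & 1023) == 0;
}
```

For example, a 4096-byte buffer is encoded as 3 and decodes back to 4096, while 1536 is encoded as 0 and decodes to 1024.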
* [PATCH v1 2/3] svcrdma: Server-side support for rpcrdma_connect_private
From: Chuck Lever @ 2016-08-15 20:58 UTC (permalink / raw)
To: linux-rdma, linux-nfs
Prepare to receive an RDMA-CM private message when handling a new
connection attempt, and send a similar message as part of connection
acceptance.
Both sides can communicate their various implementation limits.
Implementations that don't support this sideband protocol ignore it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
net/sunrpc/xprtrdma/svc_rdma_transport.c | 34 ++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index dd94401..4843824 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -642,6 +642,21 @@ int svc_rdma_repost_recv(struct svcxprt_rdma *xprt, gfp_t flags)
return ret;
}
+static void
+svc_rdma_parse_connect_private(struct svcxprt_rdma *newxprt,
+ struct rdma_conn_param *param)
+{
+ const struct rpcrdma_connect_private *pmsg = param->private_data;
+
+ if (pmsg &&
+ pmsg->cp_magic == rpcrdma_cmp_magic &&
+ pmsg->cp_version == RPCRDMA_CMP_VERSION) {
+ dprintk("svcrdma: client send_size %u, recv_size %u\n",
+ rpcrdma_decode_buffer_size(pmsg->cp_send_size),
+ rpcrdma_decode_buffer_size(pmsg->cp_recv_size));
+ }
+}
+
/*
* This function handles the CONNECT_REQUEST event on a listening
* endpoint. It is passed the cma_id for the _new_ connection. The context in
@@ -653,7 +668,8 @@ int svc_rdma_repost_recv(struct svcxprt_rdma *xprt, gfp_t flags)
* will call the recvfrom method on the listen xprt which will accept the new
* connection.
*/
-static void handle_connect_req(struct rdma_cm_id *new_cma_id, size_t client_ird)
+static void handle_connect_req(struct rdma_cm_id *new_cma_id,
+ struct rdma_conn_param *param)
{
struct svcxprt_rdma *listen_xprt = new_cma_id->context;
struct svcxprt_rdma *newxprt;
@@ -669,9 +685,10 @@ static void handle_connect_req(struct rdma_cm_id *new_cma_id, size_t client_ird)
new_cma_id->context = newxprt;
dprintk("svcrdma: Creating newxprt=%p, cm_id=%p, listenxprt=%p\n",
newxprt, newxprt->sc_cm_id, listen_xprt);
+ svc_rdma_parse_connect_private(newxprt, param);
/* Save client advertised inbound read limit for use later in accept. */
- newxprt->sc_ord = client_ird;
+ newxprt->sc_ord = param->initiator_depth;
/* Set the local and remote addresses in the transport */
sa = (struct sockaddr *)&newxprt->sc_cm_id->route.addr.dst_addr;
@@ -706,8 +723,7 @@ static int rdma_listen_handler(struct rdma_cm_id *cma_id,
dprintk("svcrdma: Connect request on cma_id=%p, xprt = %p, "
"event = %s (%d)\n", cma_id, cma_id->context,
rdma_event_msg(event->event), event->event);
- handle_connect_req(cma_id,
- event->param.conn.initiator_depth);
+ handle_connect_req(cma_id, &event->param.conn);
break;
case RDMA_CM_EVENT_ESTABLISHED:
@@ -941,6 +957,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
struct svcxprt_rdma *listen_rdma;
struct svcxprt_rdma *newxprt = NULL;
struct rdma_conn_param conn_param;
+ struct rpcrdma_connect_private pmsg;
struct ib_qp_init_attr qp_attr;
struct ib_device *dev;
unsigned int i;
@@ -1094,11 +1111,20 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
/* Swap out the handler */
newxprt->sc_cm_id->event_handler = rdma_cma_handler;
+ /* Construct RDMA-CM private message */
+ pmsg.cp_magic = rpcrdma_cmp_magic;
+ pmsg.cp_version = RPCRDMA_CMP_VERSION;
+ pmsg.cp_flags = 0;
+ pmsg.cp_send_size = pmsg.cp_recv_size =
+ rpcrdma_encode_buffer_size(newxprt->sc_max_req_size);
+
/* Accept Connection */
set_bit(RDMAXPRT_CONN_PENDING, &newxprt->sc_flags);
memset(&conn_param, 0, sizeof conn_param);
conn_param.responder_resources = 0;
conn_param.initiator_depth = newxprt->sc_ord;
+ conn_param.private_data = &pmsg;
+ conn_param.private_data_len = sizeof(pmsg);
ret = rdma_accept(newxprt->sc_cm_id, &conn_param);
if (ret) {
dprintk("svcrdma: failed to accept new connection, ret=%d\n",
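Not part of the patch: a userspace sketch of the exchange, for readers who want to see the validation logic in isolation. The names cmp_msg, cmp_limits, cmp_build() and cmp_parse() are hypothetical. The layout and magic value follow patch 1/3, and the magic and version checks follow svc_rdma_parse_connect_private(). The length check is an assumption of this sketch; the patch itself tests only for a non-NULL private_data pointer. htonl() stands in for __cpu_to_be32(), and the packed attribute assumes GCC or Clang.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CMP_MAGIC	0xf6ab0e18u	/* same value as rpcrdma_cmp_magic */
#define CMP_VERSION	1		/* same value as RPCRDMA_CMP_VERSION */

/* Userspace mirror of struct rpcrdma_connect_private. */
struct cmp_msg {
	uint32_t cp_magic;	/* big-endian on the wire */
	uint8_t  cp_version;
	uint8_t  cp_flags;
	uint8_t  cp_send_size;
	uint8_t  cp_recv_size;
} __attribute__((packed));

/* Hypothetical container for the peer's decoded limits. */
struct cmp_limits {
	unsigned int send_size;
	unsigned int recv_size;
};

/* Fill in a message the way svc_rdma_accept() does in the patch. */
static void cmp_build(struct cmp_msg *msg, uint8_t flags,
		      unsigned int max_req_size)
{
	msg->cp_magic = htonl(CMP_MAGIC);
	msg->cp_version = CMP_VERSION;
	msg->cp_flags = flags;
	msg->cp_send_size = msg->cp_recv_size =
		(uint8_t)((max_req_size >> 10) - 1);
}

/* Return true and fill @out only if @data holds a recognized message. */
static bool cmp_parse(const void *data, size_t len, struct cmp_limits *out)
{
	struct cmp_msg msg;

	/* Length check is this sketch's assumption, not the patch's. */
	if (!data || len < sizeof(msg))
		return false;
	memcpy(&msg, data, sizeof(msg));

	/* Peers that do not speak this sideband protocol fail here. */
	if (msg.cp_magic != htonl(CMP_MAGIC) ||
	    msg.cp_version != CMP_VERSION)
		return false;

	out->send_size = ((unsigned int)msg.cp_send_size + 1) << 10;
	out->recv_size = ((unsigned int)msg.cp_recv_size + 1) << 10;
	return true;
}
```

A caller would treat a false return as "peer did not send a private message" and carry on with its defaults, which matches the commit message's statement that implementations without support simply ignore the exchange.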
* [PATCH v1 3/3] svcrdma: support Remote Invalidation for prototyping
From: Chuck Lever @ 2016-08-15 20:58 UTC (permalink / raw)
To: linux-rdma, linux-nfs
To allow testing, add a sysctl that enables the use of Send With
Invalidate in place of Send when transmitting RPC replies. The
invalidate_rkey is arbitrarily chosen from among rkeys present
in the RPC-over-RDMA header's chunk lists.
Send With Invalidate can be enabled when all client and server HCAs
support it, and the client does not send persistently registered
rkeys (like a local DMA rkey).
Send With Invalidate improves performance only when clients can
recognize, while processing an RPC reply, that an rkey has already
been invalidated. That is a separate change.
In the future, the RPC-over-RDMA protocol might support Remote
Invalidation properly. The protocol needs to enable signaling
between peers to indicate when Remote Invalidation can be used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
include/linux/sunrpc/svc_rdma.h | 1 +
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 58 ++++++++++++++++++++++++++++--
net/sunrpc/xprtrdma/svc_rdma_transport.c | 12 +++++-
3 files changed, 65 insertions(+), 6 deletions(-)
diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index d6917b8..8a43650 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -136,6 +136,7 @@ struct svcxprt_rdma {
int sc_ord; /* RDMA read limit */
int sc_max_sge;
int sc_max_sge_rd; /* max sge for read target */
+ bool sc_snd_w_inv; /* OK to use Send With Invalidate */
atomic_t sc_sq_count; /* Number of SQ WR on queue */
unsigned int sc_sq_depth; /* Depth of SQ */
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index 54d53330..e9adbe5 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -225,6 +225,48 @@ svc_rdma_get_reply_array(struct rpcrdma_msg *rmsgp,
return rp_ary;
}
+/* RPC-over-RDMA Version One private extension: Remote Invalidation.
+ * Responder's choice: requester signals it can handle Send With
+ * Invalidate, and responder chooses one rkey to invalidate.
+ *
+ * Find a candidate rkey to invalidate when sending a reply. Picks the
+ * first rkey it finds in the chunk lists.
+ *
+ * Returns zero if RPC's chunk lists are empty.
+ */
+static u32 svc_rdma_get_inv_rkey(struct rpcrdma_msg *rdma_argp,
+ struct rpcrdma_write_array *wr_ary,
+ struct rpcrdma_write_array *rp_ary)
+{
+ struct rpcrdma_read_chunk *rd_ary;
+ struct rpcrdma_segment *arg_ch;
+ u32 inv_rkey;
+
+ inv_rkey = 0;
+
+ rd_ary = svc_rdma_get_read_chunk(rdma_argp);
+ if (rd_ary) {
+ inv_rkey = be32_to_cpu(rd_ary->rc_target.rs_handle);
+ goto out;
+ }
+
+ if (wr_ary && be32_to_cpu(wr_ary->wc_nchunks)) {
+ arg_ch = &wr_ary->wc_array[0].wc_target;
+ inv_rkey = be32_to_cpu(arg_ch->rs_handle);
+ goto out;
+ }
+
+ if (rp_ary && be32_to_cpu(rp_ary->wc_nchunks)) {
+ arg_ch = &rp_ary->wc_array[0].wc_target;
+ inv_rkey = be32_to_cpu(arg_ch->rs_handle);
+ goto out;
+ }
+
+out:
+ dprintk("svcrdma: Send With Invalidate rkey=%08x\n", inv_rkey);
+ return inv_rkey;
+}
+
/* Assumptions:
* - The specified write_len can be represented in sc_max_sge * PAGE_SIZE
*/
@@ -464,7 +506,8 @@ static int send_reply(struct svcxprt_rdma *rdma,
struct page *page,
struct rpcrdma_msg *rdma_resp,
struct svc_rdma_req_map *vec,
- int byte_count)
+ int byte_count,
+ u32 inv_rkey)
{
struct svc_rdma_op_ctxt *ctxt;
struct ib_send_wr send_wr;
@@ -549,7 +592,11 @@ static int send_reply(struct svcxprt_rdma *rdma,
send_wr.wr_cqe = &ctxt->cqe;
send_wr.sg_list = ctxt->sge;
send_wr.num_sge = sge_no;
- send_wr.opcode = IB_WR_SEND;
+ if (inv_rkey) {
+ send_wr.opcode = IB_WR_SEND_WITH_INV;
+ send_wr.ex.invalidate_rkey = inv_rkey;
+ } else
+ send_wr.opcode = IB_WR_SEND;
send_wr.send_flags = IB_SEND_SIGNALED;
ret = svc_rdma_send(rdma, &send_wr);
@@ -581,6 +628,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
int inline_bytes;
struct page *res_page;
struct svc_rdma_req_map *vec;
+ u32 inv_rkey;
dprintk("svcrdma: sending response for rqstp=%p\n", rqstp);
@@ -591,6 +639,10 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
wr_ary = svc_rdma_get_write_array(rdma_argp);
rp_ary = svc_rdma_get_reply_array(rdma_argp, wr_ary);
+ inv_rkey = 0;
+ if (rdma->sc_snd_w_inv)
+ inv_rkey = svc_rdma_get_inv_rkey(rdma_argp, wr_ary, rp_ary);
+
/* Build an req vec for the XDR */
vec = svc_rdma_get_req_map(rdma);
ret = svc_rdma_map_xdr(rdma, &rqstp->rq_res, vec, wr_ary != NULL);
@@ -633,7 +685,7 @@ int svc_rdma_sendto(struct svc_rqst *rqstp)
goto err1;
ret = send_reply(rdma, rqstp, res_page, rdma_resp, vec,
- inline_bytes);
+ inline_bytes, inv_rkey);
if (ret < 0)
goto err1;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 4843824..1fe34f6 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -651,9 +651,14 @@ svc_rdma_parse_connect_private(struct svcxprt_rdma *newxprt,
if (pmsg &&
pmsg->cp_magic == rpcrdma_cmp_magic &&
pmsg->cp_version == RPCRDMA_CMP_VERSION) {
- dprintk("svcrdma: client send_size %u, recv_size %u\n",
+ newxprt->sc_snd_w_inv = pmsg->cp_flags &
+ RPCRDMA_CMP_F_SND_W_INV_OK;
+
+ dprintk("svcrdma: client send_size %u, recv_size %u "
+ "remote inv %ssupported\n",
rpcrdma_decode_buffer_size(pmsg->cp_send_size),
- rpcrdma_decode_buffer_size(pmsg->cp_recv_size));
+ rpcrdma_decode_buffer_size(pmsg->cp_recv_size),
+ newxprt->sc_snd_w_inv ? "" : "un");
}
}
@@ -1087,7 +1092,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
dev->attrs.max_fast_reg_page_list_len;
newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
newxprt->sc_reader = rdma_read_chunk_frmr;
- }
+ } else
+ newxprt->sc_snd_w_inv = false;
/*
* Determine if a DMA MR is required and if so, what privs are required
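Not part of the patch: a simplified userspace model of the decision this patch makes per reply. struct chunk_lists, pick_inv_rkey(), choose_opcode() and negotiate_snd_w_inv() are hypothetical names, and the flattened rkey arrays are an assumption standing in for the XDR-encoded chunk lists that the kernel code walks. The search order (Read list, then Write list, then Reply chunk) follows svc_rdma_get_inv_rkey(). As in the patch, an rkey of zero is treated as "nothing to invalidate".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMP_F_SND_W_INV_OK	0x01	/* BIT(0), as in patch 1/3 */

/* Hypothetical flattened view of one RPC's chunk lists, host byte order. */
struct chunk_lists {
	const uint32_t *read_rkeys;
	size_t nread;
	const uint32_t *write_rkeys;
	size_t nwrite;
	const uint32_t *reply_rkeys;
	size_t nreply;
};

enum send_opcode { OP_SEND, OP_SEND_WITH_INV };

/*
 * Connection-time decision: the client must set the flag, and the
 * server clears the setting when its device lacks fast registration.
 */
static bool negotiate_snd_w_inv(uint8_t cp_flags, bool dev_has_fast_reg)
{
	return (cp_flags & CMP_F_SND_W_INV_OK) && dev_has_fast_reg;
}

/* Pick the first rkey found; zero means all lists are empty. */
static uint32_t pick_inv_rkey(const struct chunk_lists *cl)
{
	if (cl->nread)
		return cl->read_rkeys[0];
	if (cl->nwrite)
		return cl->write_rkeys[0];
	if (cl->nreply)
		return cl->reply_rkeys[0];
	return 0;
}

/* Per-reply decision, mirroring the opcode selection in send_reply(). */
static enum send_opcode choose_opcode(bool snd_w_inv,
				      const struct chunk_lists *cl,
				      uint32_t *inv_rkey)
{
	*inv_rkey = snd_w_inv ? pick_inv_rkey(cl) : 0;
	return *inv_rkey ? OP_SEND_WITH_INV : OP_SEND;
}
```

With the setting enabled and an RPC carrying only a Write list, the model selects the first Write chunk's rkey and Send With Invalidate. With empty chunk lists, or with the setting disabled, it falls back to a plain Send.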
* Re: [PATCH v1 0/3] server-side NFS/RDMA patches proposed for v4.9
From: Sagi Grimberg @ 2016-08-16 10:29 UTC (permalink / raw)
To: Chuck Lever, linux-rdma, linux-nfs
The series looks good to me Chuck,
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* Re: [PATCH v1 0/3] server-side NFS/RDMA patches proposed for v4.9
From: Chuck Lever @ 2016-08-16 14:58 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: linux-rdma, Linux NFS Mailing List
> On Aug 16, 2016, at 6:29 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
>
> The series looks good to me Chuck,
>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Thanks, Sagi.
--
Chuck Lever