linux-nfs.vger.kernel.org archive mirror
* [PATCH 0/3] Final NFS/RDMA server patches proposed for v4.14
@ 2017-08-28 19:05 Chuck Lever
  2017-08-28 19:06 ` [PATCH 1/3] svcrdma: Limit RQ depth Chuck Lever
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Chuck Lever @ 2017-08-28 19:05 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Hi Bruce-

These patches allow svcrdma to adjust more precisely to the limits
of the underlying RDMA device on the server.

These have been floating around for several months, and were posted
a few weeks ago for review on linux-rdma. They should be ready for
you to take for v4.14.

These are the final server-side patches I have for the v4.14 cycle.


---

Chuck Lever (3):
      svcrdma: Limit RQ depth
      rdma core: Add rdma_rw_mr_payload()
      svcrdma: Estimate Send Queue depth properly


 drivers/infiniband/core/rw.c             |   24 ++++++++++++++++++++
 include/rdma/rw.h                        |    2 ++
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   36 +++++++++++++++++++++---------
 3 files changed, 51 insertions(+), 11 deletions(-)

--
Chuck Lever


* [PATCH 1/3] svcrdma: Limit RQ depth
  2017-08-28 19:05 [PATCH 0/3] Final NFS/RDMA server patches proposed for v4.14 Chuck Lever
@ 2017-08-28 19:06 ` Chuck Lever
  2017-08-28 19:06 ` [PATCH 2/3] rdma core: Add rdma_rw_mr_payload() Chuck Lever
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Chuck Lever @ 2017-08-28 19:06 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

Ensure that the chosen Receive Queue depth for a newly created
transport does not overrun the QP WR limit of the underlying device.
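
For illustration, here is a minimal stand-alone sketch of the clamping
arithmetic this patch performs in svc_rdma_accept(). It compiles in
userspace; the device limit and the svcrdma default values below are
made-up numbers for the example, not taken from the patch.

    #include <stdio.h>

    static unsigned int clamp_rq_depth(unsigned int dev_max_qp_wr,
                                       unsigned int *max_requests,
                                       unsigned int *max_bc_requests)
    {
            unsigned int rq_depth = *max_requests + *max_bc_requests;

            if (rq_depth > dev_max_qp_wr) {
                    /* The device cannot post this many Receive WRs on
                     * one QP: shrink the forward channel and reserve
                     * two entries for the backchannel.
                     */
                    rq_depth = dev_max_qp_wr;
                    *max_requests = rq_depth - 2;
                    *max_bc_requests = 2;
            }
            return rq_depth;
    }

    int main(void)
    {
            unsigned int reqs = 32, bc = 2;   /* assumed defaults */
            unsigned int depth = clamp_rq_depth(16, &reqs, &bc);

            printf("rq_depth=%u max_requests=%u bc_requests=%u\n",
                   depth, reqs, bc);
            return 0;
    }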

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 2aa8473..cdb04f8 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -167,8 +167,8 @@ static bool svc_rdma_prealloc_ctxts(struct svcxprt_rdma *xprt)
 {
 	unsigned int i;
 
-	/* Each RPC/RDMA credit can consume a number of send
-	 * and receive WQEs. One ctxt is allocated for each.
+	/* Each RPC/RDMA credit can consume one Receive and
+	 * one Send WQE at the same time.
 	 */
 	i = xprt->sc_sq_depth + xprt->sc_rq_depth;
 
@@ -742,13 +742,18 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	newxprt->sc_max_sge = min((size_t)dev->attrs.max_sge,
 				  (size_t)RPCSVC_MAXPAGES);
 	newxprt->sc_max_req_size = svcrdma_max_req_size;
-	newxprt->sc_max_requests = min_t(u32, dev->attrs.max_qp_wr,
-					 svcrdma_max_requests);
-	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
-	newxprt->sc_max_bc_requests = min_t(u32, dev->attrs.max_qp_wr,
-					    svcrdma_max_bc_requests);
+	newxprt->sc_max_requests = svcrdma_max_requests;
+	newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
 	newxprt->sc_rq_depth = newxprt->sc_max_requests +
 			       newxprt->sc_max_bc_requests;
+	if (newxprt->sc_rq_depth > dev->attrs.max_qp_wr) {
+		pr_warn("svcrdma: reducing receive depth to %d\n",
+			dev->attrs.max_qp_wr);
+		newxprt->sc_rq_depth = dev->attrs.max_qp_wr;
+		newxprt->sc_max_requests = newxprt->sc_rq_depth - 2;
+		newxprt->sc_max_bc_requests = 2;
+	}
+	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
 	newxprt->sc_sq_depth = newxprt->sc_rq_depth;
 	atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);
 



* [PATCH 2/3] rdma core: Add rdma_rw_mr_payload()
  2017-08-28 19:05 [PATCH 0/3] Final NFS/RDMA server patches proposed for v4.14 Chuck Lever
  2017-08-28 19:06 ` [PATCH 1/3] svcrdma: Limit RQ depth Chuck Lever
@ 2017-08-28 19:06 ` Chuck Lever
  2017-08-28 19:06 ` [PATCH 3/3] svcrdma: Estimate Send Queue depth properly Chuck Lever
  2017-09-05 18:59 ` [PATCH 0/3] Final NFS/RDMA server patches proposed for v4.14 J. Bruce Fields
  3 siblings, 0 replies; 5+ messages in thread
From: Chuck Lever @ 2017-08-28 19:06 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

The amount of payload per MR depends on device capabilities and
the memory registration mode in use. The new rdma_rw API hides both,
making it difficult for ULPs to determine how large their transport
send queues need to be.

Expose the MR payload information via a new API.
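
For a sense of the arithmetic, here is a stand-alone, userspace-
compilable sketch of the factor computation. The FR page-list length
and max_sge_rd values below are invented for the example; the real
function reads them from the device attributes.

    #include <stdio.h>

    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    static unsigned int mr_factor(int can_use_fr_mr,
                                  unsigned int fr_page_list_len,
                                  unsigned int max_sge_rd,
                                  unsigned int maxpages)
    {
            /* Pages that one unit (an FR MR, or a single RDMA Read
             * scatter list) can move in one shot.
             */
            unsigned int mr_pages = can_use_fr_mr ? fr_page_list_len
                                                  : max_sge_rd;

            return DIV_ROUND_UP(maxpages, mr_pages);
    }

    int main(void)
    {
            unsigned int maxpages = 259;   /* ~1MB payload + overhead */

            printf("FR MR device:    %u MRs per ctx\n",
                   mr_factor(1, 256, 30, maxpages));
            printf("Read-SGE device: %u Reads per ctx\n",
                   mr_factor(0, 256, 30, maxpages));
            return 0;
    }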

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
---
 drivers/infiniband/core/rw.c |   24 ++++++++++++++++++++++++
 include/rdma/rw.h            |    2 ++
 2 files changed, 26 insertions(+)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index dbfd854..6ca607e 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -643,6 +643,30 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
 
+/**
+ * rdma_rw_mr_factor - return number of MRs required for a payload
+ * @device:	device handling the connection
+ * @port_num:	port num to which the connection is bound
+ * @maxpages:	maximum payload pages per rdma_rw_ctx
+ *
+ * Returns the number of MRs the device requires to move @maxpages
+ * pages of payload. The returned value is used during transport
+ * creation to compute max_rdma_ctxs and the size of the transport's
+ * Send and Send Completion Queues.
+ */
+unsigned int rdma_rw_mr_factor(struct ib_device *device, u8 port_num,
+			       unsigned int maxpages)
+{
+	unsigned int mr_pages;
+
+	if (rdma_rw_can_use_mr(device, port_num))
+		mr_pages = rdma_rw_fr_page_list_len(device);
+	else
+		mr_pages = device->attrs.max_sge_rd;
+	return DIV_ROUND_UP(maxpages, mr_pages);
+}
+EXPORT_SYMBOL(rdma_rw_mr_factor);
+
 void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr)
 {
 	u32 factor;
diff --git a/include/rdma/rw.h b/include/rdma/rw.h
index 377d865..a3cbbc7 100644
--- a/include/rdma/rw.h
+++ b/include/rdma/rw.h
@@ -81,6 +81,8 @@ struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 		struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
 
+unsigned int rdma_rw_mr_factor(struct ib_device *device, u8 port_num,
+		unsigned int maxpages);
 void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr);
 int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr);
 void rdma_rw_cleanup_mrs(struct ib_qp *qp);



* [PATCH 3/3] svcrdma: Estimate Send Queue depth properly
  2017-08-28 19:05 [PATCH 0/3] Final NFS/RDMA server patches proposed for v4.14 Chuck Lever
  2017-08-28 19:06 ` [PATCH 1/3] svcrdma: Limit RQ depth Chuck Lever
  2017-08-28 19:06 ` [PATCH 2/3] rdma core: Add rdma_rw_mr_payload() Chuck Lever
@ 2017-08-28 19:06 ` Chuck Lever
  2017-09-05 18:59 ` [PATCH 0/3] Final NFS/RDMA server patches proposed for v4.14 J. Bruce Fields
  3 siblings, 0 replies; 5+ messages in thread
From: Chuck Lever @ 2017-08-28 19:06 UTC (permalink / raw)
  To: bfields; +Cc: linux-rdma, linux-nfs

The rdma_rw API adjusts max_send_wr upwards during the
rdma_create_qp() call. If the ULP actually wants to take advantage
of these extra resources, it must increase the size of its send
completion queue (created before rdma_create_qp is called) and
increase its send queue accounting limit.

Use the new rdma_rw_mr_factor API to figure out the correct value
to use for the Send Queue and Send Completion Queue depths.

Also, ensure that the chosen Send Queue depth for a newly created
transport does not overrun the QP WR limit of the underlying device.

Lastly, there's no longer a need to carry the Send Queue depth in
struct svcxprt_rdma, since the value is used only in the
svc_rdma_accept() path.
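
Putting the pieces together, here is a stand-alone sketch of the Send
Queue sizing. All device limits and defaults below are invented,
chosen so the clamping path actually fires.

    #include <stdio.h>

    #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

    int main(void)
    {
            unsigned int maxpages = 259;     /* assumed RPCSVC_MAXPAGES */
            unsigned int mr_pages = 256;     /* assumed pages per MR */
            unsigned int max_requests = 30, rq_depth = 32;
            unsigned int max_qp_wr = 64;     /* assumed device limit */

            /* One rdma_rw context per MR, per credit */
            unsigned int ctxts = DIV_ROUND_UP(maxpages, mr_pages) *
                                 max_requests;
            unsigned int sq_depth = rq_depth + ctxts;

            if (sq_depth > max_qp_wr) {
                    printf("reducing send depth to %u\n", max_qp_wr);
                    sq_depth = max_qp_wr;
            }

            /* rdma_create_qp() adds WRs for the rdma_rw contexts on
             * its own, so max_send_wr asks only for the remainder.
             */
            printf("ctxts=%u sq_depth=%u max_send_wr=%u\n",
                   ctxts, sq_depth, sq_depth - ctxts);
            return 0;
    }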

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index cdb04f8..5caf8e7 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -51,6 +51,7 @@
 #include <linux/workqueue.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/rdma_cm.h>
+#include <rdma/rw.h>
 #include <linux/sunrpc/svc_rdma.h>
 #include <linux/export.h>
 #include "xprt_rdma.h"
@@ -713,7 +714,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	struct ib_qp_init_attr qp_attr;
 	struct ib_device *dev;
 	struct sockaddr *sap;
-	unsigned int i;
+	unsigned int i, ctxts;
 	int ret = 0;
 
 	listen_rdma = container_of(xprt, struct svcxprt_rdma, sc_xprt);
@@ -754,7 +755,14 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 		newxprt->sc_max_bc_requests = 2;
 	}
 	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
-	newxprt->sc_sq_depth = newxprt->sc_rq_depth;
+	ctxts = rdma_rw_mr_factor(dev, newxprt->sc_port_num, RPCSVC_MAXPAGES);
+	ctxts *= newxprt->sc_max_requests;
+	newxprt->sc_sq_depth = newxprt->sc_rq_depth + ctxts;
+	if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr) {
+		pr_warn("svcrdma: reducing send depth to %d\n",
+			dev->attrs.max_qp_wr);
+		newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
+	}
 	atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);
 
 	if (!svc_rdma_prealloc_ctxts(newxprt))
@@ -789,8 +797,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	qp_attr.event_handler = qp_event_handler;
 	qp_attr.qp_context = &newxprt->sc_xprt;
 	qp_attr.port_num = newxprt->sc_port_num;
-	qp_attr.cap.max_rdma_ctxs = newxprt->sc_max_requests;
-	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth;
+	qp_attr.cap.max_rdma_ctxs = ctxts;
+	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth - ctxts;
 	qp_attr.cap.max_recv_wr = newxprt->sc_rq_depth;
 	qp_attr.cap.max_send_sge = newxprt->sc_max_sge;
 	qp_attr.cap.max_recv_sge = newxprt->sc_max_sge;
@@ -858,6 +866,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	dprintk("    remote address  : %pIS:%u\n", sap, rpc_get_port(sap));
 	dprintk("    max_sge         : %d\n", newxprt->sc_max_sge);
 	dprintk("    sq_depth        : %d\n", newxprt->sc_sq_depth);
+	dprintk("    rdma_rw_ctxs    : %d\n", ctxts);
 	dprintk("    max_requests    : %d\n", newxprt->sc_max_requests);
 	dprintk("    ord             : %d\n", newxprt->sc_ord);
 



* Re: [PATCH 0/3] Final NFS/RDMA server patches proposed for v4.14
  2017-08-28 19:05 [PATCH 0/3] Final NFS/RDMA server patches proposed for v4.14 Chuck Lever
                   ` (2 preceding siblings ...)
  2017-08-28 19:06 ` [PATCH 3/3] svcrdma: Estimate Send Queue depth properly Chuck Lever
@ 2017-09-05 18:59 ` J. Bruce Fields
  3 siblings, 0 replies; 5+ messages in thread
From: J. Bruce Fields @ 2017-09-05 18:59 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs

Thanks, applying for 4.14.

--b.

On Mon, Aug 28, 2017 at 03:05:57PM -0400, Chuck Lever wrote:
> Hi Bruce-
> 
> These patches allow svcrdma to adjust more precisely to the limits
> of the underlying RDMA device on the server.
> 
> These have been floating around for several months, and were posted
> a few weeks ago for review on linux-rdma. They should be ready for
> you to take for v4.14.
> 
> These are the final server-side patches I have for the v4.14 cycle.
> 
> 
> ---
> 
> Chuck Lever (3):
>       svcrdma: Limit RQ depth
>       rdma core: Add rdma_rw_mr_payload()
>       svcrdma: Estimate Send Queue depth properly
> 
> 
>  drivers/infiniband/core/rw.c             |   24 ++++++++++++++++++++
>  include/rdma/rw.h                        |    2 ++
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |   36 +++++++++++++++++++++---------
>  3 files changed, 51 insertions(+), 11 deletions(-)
> 
> --
> Chuck Lever

