linux-nfs.vger.kernel.org archive mirror
From: Chuck Lever <chuck.lever@oracle.com>
To: anna.schumaker@netapp.com
Cc: linux-rdma@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: [PATCH 7/9] xprtrdma: Introduce ->alloc_slot call-out for xprtrdma
Date: Mon, 05 Mar 2018 15:13:29 -0500
Message-ID: <20180305201329.10904.5164.stgit@manet.1015granger.net>
In-Reply-To: <20180305200825.10904.40829.stgit@manet.1015granger.net>

rpcrdma_buffer_get acquires an rpcrdma_req and rep for each RPC.
Currently this is done in the call_allocate action, and it can fail
when many RPCs are outstanding.

When call_allocate fails, the RPC task is put on the delayq. It is
awoken a few milliseconds later, but there's no guarantee it will
get a buffer at that time. The RPC task can be repeatedly put back
to sleep or even starved.
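To make the starvation concrete, here is a toy, single-threaded C model. It is not kernel code, and `delayq_wait` and its parameters are hypothetical. One buffer is held for `hold` ticks at a time, a newly arriving RPC grabs it whenever it becomes free, and the task that failed call_allocate looks again only on its retry ticks. It gets the buffer only if a retry happens to coincide with a tick on which the buffer is free:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of delayq retries (not kernel code). One buffer exists.
 * Whoever holds it keeps it for `hold` ticks; a newly arriving caller
 * grabs it on any tick where it is free, unless the delayed task
 * happens to retry on that very tick. The delayed task first retries
 * at `first_retry` and then every `retry_every` ticks.
 *
 * Returns the tick at which the delayed task gets the buffer, or -1
 * if it is still waiting after `horizon` ticks.
 */
static int delayq_wait(int hold, int first_retry, int retry_every,
		       int horizon)
{
	int busy_until = hold;		/* someone took the buffer at tick 0 */
	int next_retry = first_retry;

	assert(hold > 0 && retry_every > 0);

	for (int tick = 1; tick < horizon; tick++) {
		bool is_free = tick >= busy_until;

		if (tick == next_retry) {
			if (is_free)
				return tick;	/* woke up at the right moment */
			next_retry += retry_every;	/* back onto the delayq */
		}
		if (is_free)
			busy_until = tick + hold;	/* a newcomer takes it */
	}
	return -1;
}
```

With `hold = 3` and retries at ticks 1, 4, 7, ..., the delayed task never lands on a free tick and is starved; with retries at ticks 1, 3, 5, ..., it wins at tick 3. Which outcome occurs depends purely on timing, not on how long the task has been waiting.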

The call_allocate action should rarely fail. The delayq mechanism is
not meant to deal with transport congestion.

In the current sunrpc stack, there is a friendlier way to deal with
this situation. These objects are actually tantamount to an RPC
slot (rpc_rqst) and there is a separate FSM action, distinct from
call_allocate, for allocating slot resources. This is the
call_reserve action.

When allocation fails during this action, the RPC is placed on the
transport's backlog queue. The backlog mechanism provides a stronger
guarantee that a buffer will be available when the RPC is awoken,
and backlogged RPCs are awoken one at a time.
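By contrast, a backlog serves waiters in order. The toy model below is illustrative only: `struct slot_pool`, `pool_alloc` and `pool_free` are hypothetical names. It simplifies "return the slot to the free list, then wake the next backlogged task" into a direct handoff to the oldest waiter, which makes the one-at-a-time ordering explicit. The real backlog is an rpc_wait_queue, and rpc_wake_up_next() may also take task priority into account; the sketch models a single FIFO.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WAITERS 16

/* Toy slot pool with a FIFO backlog (illustrative, not kernel code). */
struct slot_pool {
	int free_slots;
	int backlog[MAX_WAITERS];	/* ids of waiting tasks, oldest first */
	int head, tail;			/* ring-buffer indices */
};

/*
 * Reserve a slot for task @id. On failure, append @id to the backlog
 * and return false; the task then sleeps until pool_free() picks it.
 */
static bool pool_alloc(struct slot_pool *p, int id)
{
	if (p->free_slots > 0) {
		p->free_slots--;
		return true;
	}
	assert(p->tail - p->head < MAX_WAITERS);	/* toy: no overflow */
	p->backlog[p->tail++ % MAX_WAITERS] = id;
	return false;
}

/*
 * Release a slot. If tasks are backlogged, hand the slot to the oldest
 * one and return its id (exactly one task is woken). Otherwise put
 * the slot back in the pool and return -1.
 */
static int pool_free(struct slot_pool *p)
{
	if (p->head == p->tail) {
		p->free_slots++;
		return -1;
	}
	return p->backlog[p->head++ % MAX_WAITERS];
}
```

With one slot, task 1 gets it, tasks 2 and 3 are backlogged, and successive releases go to task 2 and then task 3. Because a released slot never lands in `free_slots` while someone is waiting, a later arrival cannot jump ahead of a backlogged task in this model.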

To make slot resource allocation occur in the call_reserve action,
create special ->alloc_slot and ->free_slot call-outs for xprtrdma.
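The call-out mechanism itself is ordinary C function-pointer dispatch. The generic reserve path calls through the transport's ops table, so a transport changes how slots are allocated by installing different `.alloc_slot` and `.free_slot` entries, which is what the `xprt_rdma_procs` hunk in this patch does. A minimal userspace sketch follows; every `toy_*` and `*_like_*` name is hypothetical, and the structures are only loosely modelled on struct rpc_xprt_ops:

```c
#include <assert.h>

struct toy_xprt;

/* Hypothetical ops table, loosely modelled on struct rpc_xprt_ops. */
struct toy_xprt_ops {
	int	(*alloc_slot)(struct toy_xprt *xprt);
	void	(*free_slot)(struct toy_xprt *xprt, int slot);
};

struct toy_xprt {
	const struct toy_xprt_ops *ops;
	int in_use;
	int max_slots;
};

/* Generic code: knows only the ops table, not the transport type. */
static int toy_reserve(struct toy_xprt *xprt)
{
	return xprt->ops->alloc_slot(xprt);
}

static void toy_release(struct toy_xprt *xprt, int slot)
{
	xprt->ops->free_slot(xprt, slot);
}

/* Default call-out: this toy version ignores max_slots and never fails. */
static int generic_alloc_slot(struct toy_xprt *xprt)
{
	return xprt->in_use++;
}

/* Transport-specific call-out: fixed budget; -1 means "backlog me". */
static int rdma_like_alloc_slot(struct toy_xprt *xprt)
{
	if (xprt->in_use >= xprt->max_slots)
		return -1;
	return xprt->in_use++;
}

static void toy_free_slot(struct toy_xprt *xprt, int slot)
{
	(void)slot;		/* the toy pool only counts slots */
	assert(xprt->in_use > 0);
	xprt->in_use--;
}

static const struct toy_xprt_ops generic_ops = {
	.alloc_slot	= generic_alloc_slot,
	.free_slot	= toy_free_slot,
};

/* Overriding the call-out is just a different table entry. */
static const struct toy_xprt_ops rdma_like_ops = {
	.alloc_slot	= rdma_like_alloc_slot,
	.free_slot	= toy_free_slot,
};
```

The caller `toy_reserve()` is identical for both transports; only the table differs. That keeps the transport's allocation policy out of the generic state machine while still letting it run at the reserve step.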

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/transport.c |   52 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 40ff91d..1dac949 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -537,6 +537,54 @@
 	}
 }
 
+/**
+ * xprt_rdma_alloc_slot - allocate an rpc_rqst
+ * @xprt: controlling RPC transport
+ * @task: RPC task requesting a fresh rpc_rqst
+ *
+ * tk_status values:
+ *	%0 if task->tk_rqstp points to a fresh rpc_rqst
+ *	%-EAGAIN if no rpc_rqst is available; queued on backlog
+ */
+static void
+xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
+{
+	struct rpc_rqst *rqst;
+
+	spin_lock(&xprt->reserve_lock);
+	if (list_empty(&xprt->free))
+		goto out_sleep;
+	rqst = list_first_entry(&xprt->free, struct rpc_rqst, rq_list);
+	list_del(&rqst->rq_list);
+	spin_unlock(&xprt->reserve_lock);
+
+	task->tk_rqstp = rqst;
+	task->tk_status = 0;
+	return;
+
+out_sleep:
+	rpc_sleep_on(&xprt->backlog, task, NULL);
+	spin_unlock(&xprt->reserve_lock);
+	task->tk_status = -EAGAIN;
+}
+
+/**
+ * xprt_rdma_free_slot - release an rpc_rqst
+ * @xprt: controlling RPC transport
+ * @rqst: rpc_rqst to release
+ *
+ */
+static void
+xprt_rdma_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *rqst)
+{
+	memset(rqst, 0, sizeof(*rqst));
+
+	spin_lock(&xprt->reserve_lock);
+	list_add(&rqst->rq_list, &xprt->free);
+	rpc_wake_up_next(&xprt->backlog);
+	spin_unlock(&xprt->reserve_lock);
+}
+
 static bool
 rpcrdma_get_sendbuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
 		    size_t size, gfp_t flags)
@@ -779,8 +827,8 @@ void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
 static const struct rpc_xprt_ops xprt_rdma_procs = {
 	.reserve_xprt		= xprt_reserve_xprt_cong,
 	.release_xprt		= xprt_release_xprt_cong, /* sunrpc/xprt.c */
-	.alloc_slot		= xprt_alloc_slot,
-	.free_slot		= xprt_free_slot,
+	.alloc_slot		= xprt_rdma_alloc_slot,
+	.free_slot		= xprt_rdma_free_slot,
 	.release_request	= xprt_release_rqst_cong,       /* ditto */
 	.set_retrans_timeout	= xprt_set_retrans_timeout_def, /* ditto */
 	.timer			= xprt_rdma_timer,


Thread overview: 17+ messages
2018-03-05 20:12 [PATCH 0/9] Second round of v4.17 NFS/RDMA client patches Chuck Lever
2018-03-05 20:12 ` [PATCH 1/9] SUNRPC: Move xprt_update_rtt callsite Chuck Lever
2018-03-05 20:13 ` [PATCH 2/9] SUNRPC: Make RTT measurement more precise (Receive) Chuck Lever
2018-03-05 20:13 ` [PATCH 3/9] SUNRPC: Make RTT measurement more precise (Send) Chuck Lever
2018-03-05 20:13 ` [PATCH 4/9] SUNRPC: Make num_reqs a non-atomic integer Chuck Lever
2018-03-05 20:13 ` [PATCH 5/9] SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock Chuck Lever
2018-03-06 22:02   ` Anna Schumaker
2018-03-06 22:07     ` Chuck Lever
2018-03-06 22:30       ` Chuck Lever
2018-03-07 20:00         ` Anna Schumaker
2018-03-07 20:23           ` Chuck Lever
2018-03-07 20:32             ` Anna Schumaker
2018-03-07 20:44               ` Chuck Lever
2018-03-05 20:13 ` [PATCH 6/9] SUNRPC: Add a ->free_slot transport callout Chuck Lever
2018-03-05 20:13 ` Chuck Lever [this message]
2018-03-05 20:13 ` [PATCH 8/9] xprtrdma: Make rpc_rqst part of rpcrdma_req Chuck Lever
2018-03-05 20:13 ` [PATCH 9/9] xprtrdma: Allocate rpcrdma_reps during Receive completion Chuck Lever
