From: Chuck Lever <chuck.lever@oracle.com>
To: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: linux-rdma <linux-rdma@vger.kernel.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Trond Myklebust <trond.myklebust@primarydata.com>
Subject: Re: [PATCH v1 10/19] xprtrdma: Move Receive posting to Receive handler
Date: Tue, 8 May 2018 15:56:15 -0400
Message-ID: <B2B5A84B-EFEA-4FF5-A861-FCE78FE73EA8@oracle.com>
In-Reply-To: <27ddafba-f31d-40b3-0064-f87e3fcd8019@Netapp.com>



> On May 8, 2018, at 3:52 PM, Anna Schumaker <Anna.Schumaker@Netapp.com> wrote:
>
> On 05/08/2018 03:47 PM, Chuck Lever wrote:
>>
>>
>>> On May 8, 2018, at 3:40 PM, Anna Schumaker <anna.schumaker@netapp.com> wrote:
>>>
>>> Hi Chuck,
>>>
>>> On 05/04/2018 03:35 PM, Chuck Lever wrote:
>>>> Receive completion and Reply handling are done by a BOUND
>>>> workqueue, meaning they run on only one CPU.
>>>>
>>>> Posting receives is currently done in the send_request path, which
>>>> on large systems is typically done on a different CPU than the one
>>>> handling Receive completions. This results in movement of
>>>> Receive-related cachelines between the sending and receiving CPUs.
>>>>=20
>>>> More importantly, it means that currently Receives are posted while
>>>> the transport's write lock is held, which is unnecessary and costly.
>>>>
>>>> Finally, allocation of Receive buffers is performed on-demand in
>>>> the Receive completion handler. This helps guarantee that they are
>>>> allocated on the same NUMA node as the CPU that handles Receive
>>>> completions.
>>>>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>
>>> Running this against a 4.17-rc4 server seems to work okay, but
>>> running against a 4.16 server fails the cthon special tests with:
>>>
>>> write/read 30 MB file
>>> verify failed, offset 11272192; expected 79, got
>>> 79 79 79 79 79 79 79 79 79 79
>>> 79 79 79 79 79 79 79 79 79 79
>>> 79 79 79 79 79 79 79 79 79 79
>>> 79 79 79 79 79 79 79 79 79 79
>>> 79 79 79 79 79 79 79 79 79 79
>>> 79 79 79 79 79 79 79 79 79 79
>>>
>>> and it goes on for several hundred more lines after this.  How
>>> worried do we need to be about somebody running a new client
>>> against an old server?
>>
>> I'm not sure what that result means; I've never seen it
>> before. But I can't think of a reason there would be an
>> incompatibility with a v4.16 server. That behavior needs
>> to be chased down and explained.
>>
>> Can you bisect this to a particular commit?
>
> Do you mean on the server side?  I don't see the problem on the
> client until I apply this patch.

Sure, why not try bisecting the server. I'll see if I can
reproduce the behavior here with InfiniBand.
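
Since the failure is against v4.16 and v4.17-rc4 works, a server-side
bisect would be hunting for whatever commit changed the behavior in
between. Something like this on the server tree should narrow it down
(a sketch; adjust the endpoints if your known-good and known-bad
kernels differ):

  git bisect start --term-old=broken --term-new=fixed
  git bisect broken v4.16
  git bisect fixed v4.17-rc4
  # at each step: build and boot the server, rerun the cthon special
  # tests, then mark "git bisect broken" or "git bisect fixed"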


>>> Some of the performance issues I've had in the past seem to have gone
>>> away with the 4.17-rc4 code as well.  I'm not sure if that's related
>>> to your code or something changing in soft roce, but either way I'm
>>> much happier :)
>>>
>>> Anna
>>>
>>>> ---
>>>> include/trace/events/rpcrdma.h    |   40 +++++++-
>>>> net/sunrpc/xprtrdma/backchannel.c |   32 +------
>>>> net/sunrpc/xprtrdma/rpc_rdma.c    |   22 +----
>>>> net/sunrpc/xprtrdma/transport.c   |    3 -
>>>> net/sunrpc/xprtrdma/verbs.c       |  176 +++++++++++++++++++++----------------
>>>> net/sunrpc/xprtrdma/xprt_rdma.h   |    6 +
>>>> 6 files changed, 150 insertions(+), 129 deletions(-)
>>>>
>>>> diff --git a/include/trace/events/rpcrdma.h b/include/trace/events/rpcrdma.h
>>>> index 99c0049..ad27e19 100644
>>>> --- a/include/trace/events/rpcrdma.h
>>>> +++ b/include/trace/events/rpcrdma.h
>>>> @@ -546,6 +546,39 @@
>>>> 	)
>>>> );
>>>>
>>>> +TRACE_EVENT(xprtrdma_post_recvs,
>>>> +	TP_PROTO(
>>>> +		const struct rpcrdma_xprt *r_xprt,
>>>> +		unsigned int count,
>>>> +		int status
>>>> +	),
>>>> +
>>>> +	TP_ARGS(r_xprt, count, status),
>>>> +
>>>> +	TP_STRUCT__entry(
>>>> +		__field(const void *, r_xprt)
>>>> +		__field(unsigned int, count)
>>>> +		__field(int, status)
>>>> +		__field(int, posted)
>>>> +		__string(addr, rpcrdma_addrstr(r_xprt))
>>>> +		__string(port, rpcrdma_portstr(r_xprt))
>>>> +	),
>>>> +
>>>> +	TP_fast_assign(
>>>> +		__entry->r_xprt = r_xprt;
>>>> +		__entry->count = count;
>>>> +		__entry->status = status;
>>>> +		__entry->posted = r_xprt->rx_buf.rb_posted_receives;
>>>> +		__assign_str(addr, rpcrdma_addrstr(r_xprt));
>>>> +		__assign_str(port, rpcrdma_portstr(r_xprt));
>>>> +	),
>>>> +
>>>> +	TP_printk("peer=[%s]:%s r_xprt=%p: %u new recvs, %d active (rc %d)",
>>>> +		__get_str(addr), __get_str(port), __entry->r_xprt,
>>>> +		__entry->count, __entry->posted, __entry->status
>>>> +	)
>>>> +);
>>>> +
>>>> /**
>>>> ** Completion events
>>>> **/
>>>> @@ -800,7 +833,6 @@
>>>> 		__field(unsigned int, task_id)
>>>> 		__field(unsigned int, client_id)
>>>> 		__field(const void *, req)
>>>> -		__field(const void *, rep)
>>>> 		__field(size_t, callsize)
>>>> 		__field(size_t, rcvsize)
>>>> 	),
>>>> @@ -809,15 +841,13 @@
>>>> 		__entry->task_id = task->tk_pid;
>>>> 		__entry->client_id = task->tk_client->cl_clid;
>>>> 		__entry->req = req;
>>>> -		__entry->rep = req ? req->rl_reply : NULL;
>>>> 		__entry->callsize = task->tk_rqstp->rq_callsize;
>>>> 		__entry->rcvsize = task->tk_rqstp->rq_rcvsize;
>>>> 	),
>>>>
>>>> -	TP_printk("task:%u@%u req=%p rep=%p (%zu, %zu)",
>>>> +	TP_printk("task:%u@%u req=%p (%zu, %zu)",
>>>> 		__entry->task_id, __entry->client_id,
>>>> -		__entry->req, __entry->rep,
>>>> -		__entry->callsize, __entry->rcvsize
>>>> +		__entry->req, __entry->callsize, __entry->rcvsize
>>>> 	)
>>>> );
>>>>
>>>> diff --git a/net/sunrpc/xprtrdma/backchannel.c b/net/sunrpc/xprtrdma/backchannel.c
>>>> index 4034788..c8f1c2b 100644
>>>> --- a/net/sunrpc/xprtrdma/backchannel.c
>>>> +++ b/net/sunrpc/xprtrdma/backchannel.c
>>>> @@ -71,23 +71,6 @@ static int rpcrdma_bc_setup_reqs(struct rpcrdma_xprt *r_xprt,
>>>> 	return -ENOMEM;
>>>> }
>>>>
>>>> -/* Allocate and add receive buffers to the rpcrdma_buffer's
>>>> - * existing list of rep's. These are released when the
>>>> - * transport is destroyed.
>>>> - */
>>>> -static int rpcrdma_bc_setup_reps(struct rpcrdma_xprt *r_xprt,
>>>> -				 unsigned int count)
>>>> -{
>>>> -	int rc = 0;
>>>> -
>>>> -	while (count--) {
>>>> -		rc = rpcrdma_create_rep(r_xprt);
>>>> -		if (rc)
>>>> -			break;
>>>> -	}
>>>> -	return rc;
>>>> -}
>>>> -
>>>> /**
>>>> * xprt_rdma_bc_setup - Pre-allocate resources for handling backchannel requests
>>>> * @xprt: transport associated with these backchannel resources
>>>> @@ -116,14 +99,6 @@ int xprt_rdma_bc_setup(struct rpc_xprt *xprt, unsigned int reqs)
>>>> 	if (rc)
>>>> 		goto out_free;
>>>>
>>>> -	rc = rpcrdma_bc_setup_reps(r_xprt, reqs);
>>>> -	if (rc)
>>>> -		goto out_free;
>>>> -
>>>> -	rc = rpcrdma_ep_post_extra_recv(r_xprt, reqs);
>>>> -	if (rc)
>>>> -		goto out_free;
>>>> -
>>>> 	r_xprt->rx_buf.rb_bc_srv_max_requests = reqs;
>>>> 	request_module("svcrdma");
>>>> 	trace_xprtrdma_cb_setup(r_xprt, reqs);
>>>> @@ -228,6 +203,7 @@ int xprt_rdma_bc_send_reply(struct rpc_rqst *rqst)
>>>> 	if (rc < 0)
>>>> 		goto failed_marshal;
>>>>
>>>> +	rpcrdma_post_recvs(r_xprt, true);
>>>> 	if (rpcrdma_ep_post(&r_xprt->rx_ia, &r_xprt->rx_ep, req))
>>>> 		goto drop_connection;
>>>> 	return 0;
>>>> @@ -268,10 +244,14 @@ void xprt_rdma_bc_destroy(struct rpc_xprt *xprt, unsigned int reqs)
>>>> */
>>>> void xprt_rdma_bc_free_rqst(struct rpc_rqst *rqst)
>>>> {
>>>> +	struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
>>>> 	struct rpc_xprt *xprt = rqst->rq_xprt;
>>>>
>>>> 	dprintk("RPC:       %s: freeing rqst %p (req %p)\n",
>>>> -		__func__, rqst, rpcr_to_rdmar(rqst));
>>>> +		__func__, rqst, req);
>>>> +
>>>> +	rpcrdma_recv_buffer_put(req->rl_reply);
>>>> +	req->rl_reply = NULL;
>>>>
>>>> 	spin_lock_bh(&xprt->bc_pa_lock);
>>>> 	list_add_tail(&rqst->rq_bc_pa_list, &xprt->bc_pa_list);
>>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> index 8f89e3f..d676106 100644
>>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>>> @@ -1027,8 +1027,6 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
>>>>
>>>> out_short:
>>>> 	pr_warn("RPC/RDMA short backward direction call\n");
>>>> -	if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, rep))
>>>> -		xprt_disconnect_done(&r_xprt->rx_xprt);
>>>> 	return true;
>>>> }
>>>> #else	/* CONFIG_SUNRPC_BACKCHANNEL */
>>>> @@ -1334,13 +1332,14 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>> 	u32 credits;
>>>> 	__be32 *p;
>>>>
>>>> +	--buf->rb_posted_receives;
>>>> +
>>>> 	if (rep->rr_hdrbuf.head[0].iov_len == 0)
>>>> 		goto out_badstatus;
>>>>
>>>> +	/* Fixed transport header fields */
>>>> 	xdr_init_decode(&rep->rr_stream, &rep->rr_hdrbuf,
>>>> 			rep->rr_hdrbuf.head[0].iov_base);
>>>> -
>>>> -	/* Fixed transport header fields */
>>>> 	p = xdr_inline_decode(&rep->rr_stream, 4 * sizeof(*p));
>>>> 	if (unlikely(!p))
>>>> 		goto out_shortreply;
>>>> @@ -1379,17 +1378,10 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>>
>>>> 	trace_xprtrdma_reply(rqst->rq_task, rep, req, credits);
>>>>=20
>>>> +	rpcrdma_post_recvs(r_xprt, false);
>>>> 	queue_work(rpcrdma_receive_wq, &rep->rr_work);
>>>> 	return;
>>>>
>>>> -out_badstatus:
>>>> -	rpcrdma_recv_buffer_put(rep);
>>>> -	if (r_xprt->rx_ep.rep_connected == 1) {
>>>> -		r_xprt->rx_ep.rep_connected = -EIO;
>>>> -		rpcrdma_conn_func(&r_xprt->rx_ep);
>>>> -	}
>>>> -	return;
>>>> -
>>>> out_badversion:
>>>> 	trace_xprtrdma_reply_vers(rep);
>>>> 	goto repost;
>>>> @@ -1409,7 +1401,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
>>>> * receive buffer before returning.
>>>> */
>>>> repost:
>>>> -	r_xprt->rx_stats.bad_reply_count++;
>>>> -	if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, rep))
>>>> -		rpcrdma_recv_buffer_put(rep);
>>>> +	rpcrdma_post_recvs(r_xprt, false);
>>>> +out_badstatus:
>>>> +	rpcrdma_recv_buffer_put(rep);
>>>> }
>>>> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
>>>> index 79885aa..0c775f0 100644
>>>> --- a/net/sunrpc/xprtrdma/transport.c
>>>> +++ b/net/sunrpc/xprtrdma/transport.c
>>>> @@ -722,9 +722,6 @@
>>>> 	if (rc < 0)
>>>> 		goto failed_marshal;
>>>>
>>>> -	if (req->rl_reply == NULL) 		/* e.g. reconnection */
>>>> -		rpcrdma_recv_buffer_get(req);
>>>> -
>>>> 	/* Must suppress retransmit to maintain credits */
>>>> 	if (rqst->rq_connect_cookie == xprt->connect_cookie)
>>>> 		goto drop_connection;
>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>>> index f4ce7af..2a38301 100644
>>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>>> @@ -74,6 +74,7 @@
>>>> */
>>>> static void rpcrdma_mrs_create(struct rpcrdma_xprt *r_xprt);
>>>> static void rpcrdma_mrs_destroy(struct rpcrdma_buffer *buf);
>>>> +static int rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt, bool temp);
>>>> static void rpcrdma_dma_unmap_regbuf(struct rpcrdma_regbuf *rb);
>>>>
>>>> struct workqueue_struct *rpcrdma_receive_wq __read_mostly;
>>>> @@ -726,7 +727,6 @@
>>>> {
>>>> 	struct rpcrdma_xprt *r_xprt = container_of(ia, struct rpcrdma_xprt,
>>>> 						   rx_ia);
>>>> -	unsigned int extras;
>>>> 	int rc;
>>>>
>>>> retry:
>>>> @@ -770,9 +770,8 @@
>>>> 	}
>>>>
>>>> 	dprintk("RPC:       %s: connected\n", __func__);
>>>> -	extras = r_xprt->rx_buf.rb_bc_srv_max_requests;
>>>> -	if (extras)
>>>> -		rpcrdma_ep_post_extra_recv(r_xprt, extras);
>>>> +
>>>> +	rpcrdma_post_recvs(r_xprt, true);
>>>>
>>>> out:
>>>> 	if (rc)
>>>> @@ -1082,14 +1081,8 @@ struct rpcrdma_req *
>>>> 	return req;
>>>> }
>>>>
>>>> -/**
>>>> - * rpcrdma_create_rep - Allocate an rpcrdma_rep object
>>>> - * @r_xprt: controlling transport
>>>> - *
>>>> - * Returns 0 on success or a negative errno on failure.
>>>> - */
>>>> -int
>>>> -rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
>>>> +static int
>>>> +rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt, bool temp)
>>>> {
>>>> 	struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
>>>> 	struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
>>>> @@ -1117,6 +1110,7 @@ struct rpcrdma_req *
>>>> 	rep->rr_recv_wr.wr_cqe = &rep->rr_cqe;
>>>> 	rep->rr_recv_wr.sg_list = &rep->rr_rdmabuf->rg_iov;
>>>> 	rep->rr_recv_wr.num_sge = 1;
>>>> +	rep->rr_temp = temp;
>>>>
>>>> 	spin_lock(&buf->rb_lock);
>>>> 	list_add(&rep->rr_list, &buf->rb_recv_bufs);
>>>> @@ -1168,12 +1162,8 @@ struct rpcrdma_req *
>>>> 		list_add(&req->rl_list, &buf->rb_send_bufs);
>>>> 	}
>>>>
>>>> +	buf->rb_posted_receives = 0;
>>>> 	INIT_LIST_HEAD(&buf->rb_recv_bufs);
>>>> -	for (i = 0; i <= buf->rb_max_requests; i++) {
>>>> -		rc = rpcrdma_create_rep(r_xprt);
>>>> -		if (rc)
>>>> -			goto out;
>>>> -	}
>>>>
>>>> 	rc = rpcrdma_sendctxs_create(r_xprt);
>>>> 	if (rc)
>>>> @@ -1268,7 +1258,6 @@ struct rpcrdma_req *
>>>> 		rep = rpcrdma_buffer_get_rep_locked(buf);
>>>> 		rpcrdma_destroy_rep(rep);
>>>> 	}
>>>> -	buf->rb_send_count = 0;
>>>>
>>>> 	spin_lock(&buf->rb_reqslock);
>>>> 	while (!list_empty(&buf->rb_allreqs)) {
>>>> @@ -1283,7 +1272,6 @@ struct rpcrdma_req *
>>>> 		spin_lock(&buf->rb_reqslock);
>>>> 	}
>>>> 	spin_unlock(&buf->rb_reqslock);
>>>> -	buf->rb_recv_count = 0;
>>>>
>>>> 	rpcrdma_mrs_destroy(buf);
>>>> }
>>>> @@ -1356,27 +1344,11 @@ struct rpcrdma_mr *
>>>> 	__rpcrdma_mr_put(&r_xprt->rx_buf, mr);
>>>> }
>>>>
>>>> -static struct rpcrdma_rep *
>>>> -rpcrdma_buffer_get_rep(struct rpcrdma_buffer *buffers)
>>>> -{
>>>> -	/* If an RPC previously completed without a reply (say, a
>>>> -	 * credential problem or a soft timeout occurs) then hold off
>>>> -	 * on supplying more Receive buffers until the number of new
>>>> -	 * pending RPCs catches up to the number of posted Receives.
>>>> -	 */
>>>> -	if (unlikely(buffers->rb_send_count < buffers->rb_recv_count))
>>>> -		return NULL;
>>>> -
>>>> -	if (unlikely(list_empty(&buffers->rb_recv_bufs)))
>>>> -		return NULL;
>>>> -	buffers->rb_recv_count++;
>>>> -	return rpcrdma_buffer_get_rep_locked(buffers);
>>>> -}
>>>> -
>>>> -/*
>>>> - * Get a set of request/reply buffers.
>>>> +/**
>>>> + * rpcrdma_buffer_get - Get a request buffer
>>>> + * @buffers: Buffer pool from which to obtain a buffer
>>>> *
>>>> - * Reply buffer (if available) is attached to send buffer upon return.
>>>> + * Returns a fresh rpcrdma_req, or NULL if none are available.
>>>> */
>>>> struct rpcrdma_req *
>>>> rpcrdma_buffer_get(struct rpcrdma_buffer *buffers)
>>>> @@ -1384,23 +1356,21 @@ struct rpcrdma_req *
>>>> 	struct rpcrdma_req *req;
>>>>
>>>> 	spin_lock(&buffers->rb_lock);
>>>> -	if (list_empty(&buffers->rb_send_bufs))
>>>> -		goto out_reqbuf;
>>>> -	buffers->rb_send_count++;
>>>> +	if (unlikely(list_empty(&buffers->rb_send_bufs)))
>>>> +		goto out_noreqs;
>>>> 	req = rpcrdma_buffer_get_req_locked(buffers);
>>>> -	req->rl_reply = rpcrdma_buffer_get_rep(buffers);
>>>> 	spin_unlock(&buffers->rb_lock);
>>>> -
>>>> 	return req;
>>>>
>>>> -out_reqbuf:
>>>> +out_noreqs:
>>>> 	spin_unlock(&buffers->rb_lock);
>>>> 	return NULL;
>>>> }
>>>>
>>>> -/*
>>>> - * Put request/reply buffers back into pool.
>>>> - * Pre-decrement counter/array index.
>>>> +/**
>>>> + * rpcrdma_buffer_put - Put request/reply buffers back into pool
>>>> + * @req: object to return
>>>> + *
>>>> */
>>>> void
>>>> rpcrdma_buffer_put(struct rpcrdma_req *req)
>>>> @@ -1411,27 +1381,16 @@ struct rpcrdma_req *
>>>> 	req->rl_reply = NULL;
>>>>
>>>> 	spin_lock(&buffers->rb_lock);
>>>> -	buffers->rb_send_count--;
>>>> -	list_add_tail(&req->rl_list, &buffers->rb_send_bufs);
>>>> +	list_add(&req->rl_list, &buffers->rb_send_bufs);
>>>> 	if (rep) {
>>>> -		buffers->rb_recv_count--;
>>>> -		list_add_tail(&rep->rr_list, &buffers->rb_recv_bufs);
>>>> +		if (!rep->rr_temp) {
>>>> +			list_add(&rep->rr_list, &buffers->rb_recv_bufs);
>>>> +			rep = NULL;
>>>> +		}
>>>> 	}
>>>> 	spin_unlock(&buffers->rb_lock);
>>>> -}
>>>> -
>>>> -/*
>>>> - * Recover reply buffers from pool.
>>>> - * This happens when recovering from disconnect.
>>>> - */
>>>> -void
>>>> -rpcrdma_recv_buffer_get(struct rpcrdma_req *req)
>>>> -{
>>>> -	struct rpcrdma_buffer *buffers = req->rl_buffer;
>>>> -
>>>> -	spin_lock(&buffers->rb_lock);
>>>> -	req->rl_reply = rpcrdma_buffer_get_rep(buffers);
>>>> -	spin_unlock(&buffers->rb_lock);
>>>> +	if (rep)
>>>> +		rpcrdma_destroy_rep(rep);
>>>> }
>>>>
>>>> /*
>>>> @@ -1443,10 +1402,13 @@ struct rpcrdma_req *
>>>> {
>>>> 	struct rpcrdma_buffer *buffers = &rep->rr_rxprt->rx_buf;
>>>>
>>>> -	spin_lock(&buffers->rb_lock);
>>>> -	buffers->rb_recv_count--;
>>>> -	list_add_tail(&rep->rr_list, &buffers->rb_recv_bufs);
>>>> -	spin_unlock(&buffers->rb_lock);
>>>> +	if (!rep->rr_temp) {
>>>> +		spin_lock(&buffers->rb_lock);
>>>> +		list_add(&rep->rr_list, &buffers->rb_recv_bufs);
>>>> +		spin_unlock(&buffers->rb_lock);
>>>> +	} else {
>>>> +		rpcrdma_destroy_rep(rep);
>>>> +	}
>>>> }
>>>>
>>>> /**
>>>> @@ -1542,13 +1504,6 @@ struct rpcrdma_regbuf *
>>>> 	struct ib_send_wr *send_wr = &req->rl_sendctx->sc_wr;
>>>> 	int rc;
>>>>
>>>> -	if (req->rl_reply) {
>>>> -		rc = rpcrdma_ep_post_recv(ia, req->rl_reply);
>>>> -		if (rc)
>>>> -			return rc;
>>>> -		req->rl_reply =3D NULL;
>>>> -	}
>>>> -
>>>> 	if (!ep->rep_send_count ||
>>>> 	    test_bit(RPCRDMA_REQ_F_TX_RESOURCES, &req->rl_flags)) {
>>>> 		send_wr->send_flags |= IB_SEND_SIGNALED;
>>>> @@ -1623,3 +1578,70 @@ struct rpcrdma_regbuf *
>>>> 	rpcrdma_recv_buffer_put(rep);
>>>> 	return rc;
>>>> }
>>>> +
>>>> +/**
>>>> + * rpcrdma_post_recvs - Maybe post some Receive buffers
>>>> + * @r_xprt: controlling transport
>>>> + * @temp: when true, allocate temp rpcrdma_rep objects
>>>> + *
>>>> + */
>>>> +void
>>>> +rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, bool temp)
>>>> +{
>>>> +	struct rpcrdma_buffer *buf = &r_xprt->rx_buf;
>>>> +	struct ib_recv_wr *wr, *bad_wr;
>>>> +	int needed, count, rc;
>>>> +
>>>> +	needed = buf->rb_credits + (buf->rb_bc_srv_max_requests << 1);
>>>> +	if (buf->rb_posted_receives > needed)
>>>> +		return;
>>>> +	needed -= buf->rb_posted_receives;
>>>> +
>>>> +	count = 0;
>>>> +	wr = NULL;
>>>> +	while (needed) {
>>>> +		struct rpcrdma_regbuf *rb;
>>>> +		struct rpcrdma_rep *rep;
>>>> +
>>>> +		spin_lock(&buf->rb_lock);
>>>> +		rep = list_first_entry_or_null(&buf->rb_recv_bufs,
>>>> +					       struct rpcrdma_rep, rr_list);
>>>> +		if (likely(rep))
>>>> +			list_del(&rep->rr_list);
>>>> +		spin_unlock(&buf->rb_lock);
>>>> +		if (!rep) {
>>>> +			if (rpcrdma_create_rep(r_xprt, temp))
>>>> +				break;
>>>> +			continue;
>>>> +		}
>>>> +
>>>> +		rb = rep->rr_rdmabuf;
>>>> +		if (!rpcrdma_regbuf_is_mapped(rb)) {
>>>> +			if (!__rpcrdma_dma_map_regbuf(&r_xprt->rx_ia, rb)) {
>>>> +				rpcrdma_recv_buffer_put(rep);
>>>> +				break;
>>>> +			}
>>>> +		}
>>>> +
>>>> +		trace_xprtrdma_post_recv(rep->rr_recv_wr.wr_cqe);
>>>> +		rep->rr_recv_wr.next = wr;
>>>> +		wr = &rep->rr_recv_wr;
>>>> +		++count;
>>>> +		--needed;
>>>> +	}
>>>> +	if (!count)
>>>> +		return;
>>>> +
>>>> +	rc = ib_post_recv(r_xprt->rx_ia.ri_id->qp, wr, &bad_wr);
>>>> +	if (rc) {
>>>> +		for (wr = bad_wr; wr; wr = wr->next) {
>>>> +			struct rpcrdma_rep *rep;
>>>> +
>>>> +			rep = container_of(wr, struct rpcrdma_rep, rr_recv_wr);
>>>> +			rpcrdma_recv_buffer_put(rep);
>>>> +			--count;
>>>> +		}
>>>> +	}
>>>> +	buf->rb_posted_receives += count;
>>>> +	trace_xprtrdma_post_recvs(r_xprt, count, rc);
>>>> +}
>>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> index 765e4df..a6d0d6e 100644
>>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>>> @@ -197,6 +197,7 @@ struct rpcrdma_rep {
>>>> 	__be32			rr_proc;
>>>> 	int			rr_wc_flags;
>>>> 	u32			rr_inv_rkey;
>>>> +	bool			rr_temp;
>>>> 	struct rpcrdma_regbuf	*rr_rdmabuf;
>>>> 	struct rpcrdma_xprt	*rr_rxprt;
>>>> 	struct work_struct	rr_work;
>>>> @@ -397,11 +398,11 @@ struct rpcrdma_buffer {
>>>> 	struct rpcrdma_sendctx	**rb_sc_ctxs;
>>>>
>>>> 	spinlock_t		rb_lock;	/* protect buf lists */
>>>> -	int			rb_send_count, rb_recv_count;
>>>> 	struct list_head	rb_send_bufs;
>>>> 	struct list_head	rb_recv_bufs;
>>>> 	u32			rb_max_requests;
>>>> 	u32			rb_credits;	/* most recent credit grant */
>>>> +	int			rb_posted_receives;
>>>>=20
>>>> 	u32			rb_bc_srv_max_requests;
>>>> 	spinlock_t		rb_reqslock;	/* protect rb_allreqs */
>>>> @@ -558,13 +559,13 @@ int rpcrdma_ep_create(struct rpcrdma_ep *, struct rpcrdma_ia *,
>>>> int rpcrdma_ep_post(struct rpcrdma_ia *, struct rpcrdma_ep *,
>>>> 				struct rpcrdma_req *);
>>>> int rpcrdma_ep_post_recv(struct rpcrdma_ia *, struct rpcrdma_rep *);
>>>> +void rpcrdma_post_recvs(struct rpcrdma_xprt *r_xprt, bool temp);
>>>>
>>>> /*
>>>> * Buffer calls - xprtrdma/verbs.c
>>>> */
>>>> struct rpcrdma_req *rpcrdma_create_req(struct rpcrdma_xprt *);
>>>> void rpcrdma_destroy_req(struct rpcrdma_req *);
>>>> -int rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt);
>>>> int rpcrdma_buffer_create(struct rpcrdma_xprt *);
>>>> void rpcrdma_buffer_destroy(struct rpcrdma_buffer *);
>>>> struct rpcrdma_sendctx *rpcrdma_sendctx_get_locked(struct rpcrdma_buffer *buf);
>>>> @@ -577,7 +578,6 @@ int rpcrdma_ep_post(struct rpcrdma_ia *, struct rpcrdma_ep *,
>>>>
>>>> struct rpcrdma_req *rpcrdma_buffer_get(struct rpcrdma_buffer *);
>>>> void rpcrdma_buffer_put(struct rpcrdma_req *);
>>>> -void rpcrdma_recv_buffer_get(struct rpcrdma_req *);
>>>> void rpcrdma_recv_buffer_put(struct rpcrdma_rep *);
>>>>=20
>>>> struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(size_t, enum dma_data_direction,
>>>>
>>
>> --
>> Chuck Lever
>>

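P.S. For anyone reviewing the rpcrdma_post_recvs() logic above: the
replenishment arithmetic is easy to check in isolation. Below is a
minimal userspace sketch of just that calculation (the parameter names
mirror the struct rpcrdma_buffer fields, but this is illustrative code,
not the kernel implementation):

  #include <stdio.h>

  /*
   * Mirrors the test at the top of rpcrdma_post_recvs(): the target
   * is the current credit grant plus twice the backchannel depth,
   * and only the shortfall versus the Receives still posted gets
   * replenished.
   */
  static int recvs_needed(int rb_credits, int rb_bc_srv_max_requests,
                          int rb_posted_receives)
  {
          int needed = rb_credits + (rb_bc_srv_max_requests << 1);

          if (rb_posted_receives > needed)
                  return 0;
          return needed - rb_posted_receives;
  }

  int main(void)
  {
          /* e.g. a 32-credit grant, no backchannel, 20 still posted */
          printf("post %d more Receives\n", recvs_needed(32, 0, 20));
          return 0;
  }

So a completion that leaves the posted count under the credit target
triggers one batched ib_post_recv() for the difference, rather than
one post per send as before.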
--
Chuck Lever
