linux-nfs.vger.kernel.org archive mirror
* [PATCH v1 0/3] NFS/RDMA client patches for v4.15
@ 2017-10-09 16:03 Chuck Lever
  2017-10-09 16:03 ` [PATCH v1 1/3] xprtrdma: Don't defer fencing an async RPC's chunks Chuck Lever
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Chuck Lever @ 2017-10-09 16:03 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

Hi Anna-

Here is the first set of NFS/RDMA client patches for v4.15. These
close a window where an async RPC that never receives a Reply can
terminate while the invalidation of its memory is deferred until
later.

I don't think there was much of an exposure here. Hard async RPCs
typically just hang anyway; they can't be interrupted.

Please consider these patches for v4.15.

---

Chuck Lever (3):
      xprtrdma: Don't defer fencing an async RPC's chunks
      xprtrdma: Use ro_unmap_sync in xprt_rdma_send_request
      xprtrdma: Remove ro_unmap_safe


 net/sunrpc/xprtrdma/fmr_ops.c   |   19 -------------------
 net/sunrpc/xprtrdma/frwr_ops.c  |   19 -------------------
 net/sunrpc/xprtrdma/transport.c |    5 +++--
 net/sunrpc/xprtrdma/xprt_rdma.h |    2 --
 4 files changed, 3 insertions(+), 42 deletions(-)

--
Chuck Lever


* [PATCH v1 1/3] xprtrdma: Don't defer fencing an async RPC's chunks
  2017-10-09 16:03 [PATCH v1 0/3] NFS/RDMA client patches for v4.15 Chuck Lever
@ 2017-10-09 16:03 ` Chuck Lever
  2017-10-09 16:03 ` [PATCH v1 2/3] xprtrdma: Use ro_unmap_sync in xprt_rdma_send_request Chuck Lever
  2017-10-09 16:03 ` [PATCH v1 3/3] xprtrdma: Remove ro_unmap_safe Chuck Lever
  2 siblings, 0 replies; 7+ messages in thread
From: Chuck Lever @ 2017-10-09 16:03 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

In current kernels, waiting in xprt_release appears to be safe. I
had erroneously believed that for ASYNC RPCs, waiting of any kind
in xprt_release->xprt_rdma_free would result in a deadlock. I've
done injection testing and consulted with Trond to confirm that
waiting in the RPC release path is safe.

In the rare cases where the reply handler has not already released
an RPC's resources, it is safe to wait synchronously in
xprt_rdma_free for invalidation rather than deferring it to MR
recovery.

Note: When the QP is in the error state, posting a LocalInvalidate
should flush and mark the MR as bad. There is no way the remote HCA
can access that MR via a QP in the error state, so it is effectively
already inaccessible and thus safe for the Upper Layer to access.
The next time the MR is used it should be recognized and cleaned up
properly by frwr_op_map.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/transport.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index c84e2b6..8cf5ccf 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -686,7 +686,7 @@
 	dprintk("RPC:       %s: called on 0x%p\n", __func__, req->rl_reply);
 
 	if (!list_empty(&req->rl_registered))
-		ia->ri_ops->ro_unmap_safe(r_xprt, req, !RPC_IS_ASYNC(task));
+		ia->ri_ops->ro_unmap_sync(r_xprt, &req->rl_registered);
 	rpcrdma_unmap_sges(ia, req);
 	rpcrdma_buffer_put(req);
 }



* [PATCH v1 2/3] xprtrdma: Use ro_unmap_sync in xprt_rdma_send_request
  2017-10-09 16:03 [PATCH v1 0/3] NFS/RDMA client patches for v4.15 Chuck Lever
  2017-10-09 16:03 ` [PATCH v1 1/3] xprtrdma: Don't defer fencing an async RPC's chunks Chuck Lever
@ 2017-10-09 16:03 ` Chuck Lever
  2017-10-13 17:43   ` Anna Schumaker
  2017-10-09 16:03 ` [PATCH v1 3/3] xprtrdma: Remove ro_unmap_safe Chuck Lever
  2 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2017-10-09 16:03 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

The "safe" version of ro_unmap is used here to avoid waiting
unnecessarily. However:

 - It is safe to wait. After all, we have to wait anyway when using
   FMR to register memory.

 - This case is rare: it occurs only after a reconnect.

By switching this call site to ro_unmap_sync, the final use of
ro_unmap_safe is removed.
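
As an aside, the FMR point above is structural: unmapping an FMR is
a synchronous verb. A sketch (list handling simplified; "mr" is an
illustrative struct ib_fmr pointer):

	LIST_HEAD(fmr_list);
	int rc;

	/* ib_unmap_fmr does not return until the device has finished
	 * invalidating every FMR on the list.
	 */
	list_add_tail(&mr->list, &fmr_list);
	rc = ib_unmap_fmr(&fmr_list);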

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/transport.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 8cf5ccf..eb46d24 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -728,7 +728,8 @@
 
 	/* On retransmit, remove any previously registered chunks */
 	if (unlikely(!list_empty(&req->rl_registered)))
-		r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false);
+		r_xprt->rx_ia.ri_ops->ro_unmap_sync(r_xprt,
+						    &req->rl_registered);
 
 	rc = rpcrdma_marshal_req(r_xprt, rqst);
 	if (rc < 0)



* [PATCH v1 3/3] xprtrdma: Remove ro_unmap_safe
  2017-10-09 16:03 [PATCH v1 0/3] NFS/RDMA client patches for v4.15 Chuck Lever
  2017-10-09 16:03 ` [PATCH v1 1/3] xprtrdma: Don't defer fencing an async RPC's chunks Chuck Lever
  2017-10-09 16:03 ` [PATCH v1 2/3] xprtrdma: Use ro_unmap_sync in xprt_rdma_send_request Chuck Lever
@ 2017-10-09 16:03 ` Chuck Lever
  2 siblings, 0 replies; 7+ messages in thread
From: Chuck Lever @ 2017-10-09 16:03 UTC (permalink / raw)
  To: anna.schumaker; +Cc: linux-rdma, linux-nfs

Clean up: There are no remaining callers of this method.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/fmr_ops.c   |   19 -------------------
 net/sunrpc/xprtrdma/frwr_ops.c  |   19 -------------------
 net/sunrpc/xprtrdma/xprt_rdma.h |    2 --
 3 files changed, 40 deletions(-)

diff --git a/net/sunrpc/xprtrdma/fmr_ops.c b/net/sunrpc/xprtrdma/fmr_ops.c
index 6c71513..30bf713 100644
--- a/net/sunrpc/xprtrdma/fmr_ops.c
+++ b/net/sunrpc/xprtrdma/fmr_ops.c
@@ -305,28 +305,9 @@ enum {
 	}
 }
 
-/* Use a slow, safe mechanism to invalidate all memory regions
- * that were registered for "req".
- */
-static void
-fmr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
-		  bool sync)
-{
-	struct rpcrdma_mw *mw;
-
-	while (!list_empty(&req->rl_registered)) {
-		mw = rpcrdma_pop_mw(&req->rl_registered);
-		if (sync)
-			fmr_op_recover_mr(mw);
-		else
-			rpcrdma_defer_mr_recovery(mw);
-	}
-}
-
 const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
 	.ro_map				= fmr_op_map,
 	.ro_unmap_sync			= fmr_op_unmap_sync,
-	.ro_unmap_safe			= fmr_op_unmap_safe,
 	.ro_recover_mr			= fmr_op_recover_mr,
 	.ro_open			= fmr_op_open,
 	.ro_maxpages			= fmr_op_maxpages,
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index df062e0..3053fb0 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -558,28 +558,9 @@
 	goto unmap;
 }
 
-/* Use a slow, safe mechanism to invalidate all memory regions
- * that were registered for "req".
- */
-static void
-frwr_op_unmap_safe(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
-		   bool sync)
-{
-	struct rpcrdma_mw *mw;
-
-	while (!list_empty(&req->rl_registered)) {
-		mw = rpcrdma_pop_mw(&req->rl_registered);
-		if (sync)
-			frwr_op_recover_mr(mw);
-		else
-			rpcrdma_defer_mr_recovery(mw);
-	}
-}
-
 const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
 	.ro_map				= frwr_op_map,
 	.ro_unmap_sync			= frwr_op_unmap_sync,
-	.ro_unmap_safe			= frwr_op_unmap_safe,
 	.ro_recover_mr			= frwr_op_recover_mr,
 	.ro_open			= frwr_op_open,
 	.ro_maxpages			= frwr_op_maxpages,
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index e26a97d..74e0174 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -473,8 +473,6 @@ struct rpcrdma_memreg_ops {
 				  struct rpcrdma_mw **);
 	void		(*ro_unmap_sync)(struct rpcrdma_xprt *,
 					 struct list_head *);
-	void		(*ro_unmap_safe)(struct rpcrdma_xprt *,
-					 struct rpcrdma_req *, bool);
 	void		(*ro_recover_mr)(struct rpcrdma_mw *);
 	int		(*ro_open)(struct rpcrdma_ia *,
 				   struct rpcrdma_ep *,



* Re: [PATCH v1 2/3] xprtrdma: Use ro_unmap_sync in xprt_rdma_send_request
  2017-10-09 16:03 ` [PATCH v1 2/3] xprtrdma: Use ro_unmap_sync in xprt_rdma_send_request Chuck Lever
@ 2017-10-13 17:43   ` Anna Schumaker
  2017-10-13 17:55     ` Chuck Lever
  0 siblings, 1 reply; 7+ messages in thread
From: Anna Schumaker @ 2017-10-13 17:43 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs

Hi Chuck,

On 10/09/2017 12:03 PM, Chuck Lever wrote:
> The "safe" version of ro_unmap is used here to avoid waiting
> unnecessarily. However:
> 
>  - It is safe to wait. After all, we have to wait anyway when using
>    FMR to register memory.
> 
>  - This case is rare: it occurs only after a reconnect.

I'm just double-checking that this really is safe. I'm seeing the hung task watchdog firing more often after applying this patch:

[  245.376879] INFO: task kworker/0:5:193 blocked for more than 120 seconds.
[  245.379909]       Not tainted 4.14.0-rc4-ANNA+ #10934
[  245.382001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  245.385926] kworker/0:5     D    0   193      2 0x80000080
[  245.388148] Workqueue: rpciod rpc_async_schedule [sunrpc]
[  245.389386] Call Trace:
[  245.390070]  __schedule+0x23d/0x850
[  245.390673]  schedule+0x46/0xa0
[  245.391219]  schedule_timeout+0x230/0x480
[  245.391817]  ? ib_send_mad+0x32d/0x430
[  245.392387]  ? __slab_alloc.isra.24+0x3e/0x60
[  245.392988]  wait_for_common+0xc9/0x190
[  245.393574]  ? wait_for_common+0xc9/0x190
[  245.395290]  ? wake_up_q+0x90/0x90
[  245.396239]  wait_for_completion+0x30/0x40
[  245.397279]  frwr_op_unmap_sync+0x139/0x2a0 [rpcrdma]
[  245.398401]  ? call_decode+0x830/0x830 [sunrpc]
[  245.399435]  xprt_rdma_send_request+0xdc/0xf0 [rpcrdma]
[  245.400558]  xprt_transmit+0x7f/0x370 [sunrpc]
[  245.401500]  ? call_decode+0x830/0x830 [sunrpc]
[  245.402079]  call_transmit+0x1c4/0x2a0 [sunrpc]
[  245.402652]  ? call_decode+0x830/0x830 [sunrpc]
[  245.403194]  __rpc_execute+0x92/0x430 [sunrpc]
[  245.403737]  ? rpc_wake_up+0x7e/0x90 [sunrpc]
[  245.404264]  rpc_async_schedule+0x25/0x30 [sunrpc]
[  245.404796]  process_one_work+0x1ed/0x430
[  245.405276]  worker_thread+0x45/0x3f0
[  245.405730]  kthread+0x134/0x150
[  245.406144]  ? process_one_work+0x430/0x430
[  245.406627]  ? kthread_create_on_node+0x80/0x80
[  245.407114]  ret_from_fork+0x25/0x30

I see this when running cthon tests over NFS v3 with kvm / softroce configured on my client and server.  I'm willing to accept that the problem might be on my end, but I still want to check with you first :)
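
For reference, the soft-RoCE pieces of my setup look roughly like
this (device and export names here are illustrative, not my exact
configuration):

    # rxe_cfg start
    # rxe_cfg add eth0        # create an rxe device on top of eth0
    # mount -t nfs -o vers=3,proto=rdma,port=20049 server:/export /mnt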

Thanks,
Anna


* Re: [PATCH v1 2/3] xprtrdma: Use ro_unmap_sync in xprt_rdma_send_request
  2017-10-13 17:43   ` Anna Schumaker
@ 2017-10-13 17:55     ` Chuck Lever
  2017-10-13 19:05       ` Chuck Lever
  0 siblings, 1 reply; 7+ messages in thread
From: Chuck Lever @ 2017-10-13 17:55 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: linux-rdma, Linux NFS Mailing List


> On Oct 13, 2017, at 1:43 PM, Anna Schumaker <Anna.Schumaker@Netapp.com> wrote:
> 
> Hi Chuck,
> 
> On 10/09/2017 12:03 PM, Chuck Lever wrote:
>> The "safe" version of ro_unmap is used here to avoid waiting
>> unnecessarily. However:
>> 
>> - It is safe to wait. After all, we have to wait anyway when using
>>   FMR to register memory.
>> 
>> - This case is rare: it occurs only after a reconnect.
> 
> I'm just double-checking that this really is safe. I'm seeing the hung task watchdog firing more often after applying this patch:
> 
> [  245.376879] INFO: task kworker/0:5:193 blocked for more than 120 seconds.
> [...]
> [  245.396239]  wait_for_completion+0x30/0x40
> [  245.397279]  frwr_op_unmap_sync+0x139/0x2a0 [rpcrdma]
> [  245.398401]  ? call_decode+0x830/0x830 [sunrpc]
> [  245.399435]  xprt_rdma_send_request+0xdc/0xf0 [rpcrdma]
> [...]
> 
> I see this when running cthon tests over NFS v3 with kvm / softroce configured on my client and server.  I'm willing to accept that the problem might be on my end, but I still want to check with you first :)

My guess is that your RDMA layer is dropping the connection
frequently, so you're hitting this case a lot. If LocalInvalidate
is taking a long time to complete, I'd say that's a bug in the
provider.

But I'm just speculating. You need to look into this and figure
out what's going on. ftrace with the function_graph tracer should
show where the time is being spent.
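
For instance, something like this should work (assuming debugfs is
mounted at /sys/kernel/debug; untested here, so adjust for your
setup):

    # cd /sys/kernel/debug/tracing
    # echo frwr_op_unmap_sync > set_graph_function
    # echo function_graph > current_tracer
    # ... reproduce the hang, then ...
    # cat trace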


> Thanks,
> Anna

--
Chuck Lever





* Re: [PATCH v1 2/3] xprtrdma: Use ro_unmap_sync in xprt_rdma_send_request
  2017-10-13 17:55     ` Chuck Lever
@ 2017-10-13 19:05       ` Chuck Lever
  0 siblings, 0 replies; 7+ messages in thread
From: Chuck Lever @ 2017-10-13 19:05 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: linux-rdma, Linux NFS Mailing List


> On Oct 13, 2017, at 1:55 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>> 
>> On Oct 13, 2017, at 1:43 PM, Anna Schumaker <Anna.Schumaker@Netapp.com> wrote:
>> 
>> [... hung task report and test details snipped ...]
> 
> My guess is that this is because your RDMA layer is dropping the
> connection frequently, so you're hitting this case a lot. If
> LocalInvalidation is taking a long time, that's a bug in the
> provider, I'd say.
> 
> But I'm just speculating. You need to look into this and figure
> out what's going on. ftrace with function_graph should give
> some impression of where the waiting is going on.

Reproduced here. Trying to figure out why the client is not
reconnecting.



--
Chuck Lever





end of thread, other threads:[~2017-10-13 19:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-09 16:03 [PATCH v1 0/3] NFS/RDMA client patches for v4.15 Chuck Lever
2017-10-09 16:03 ` [PATCH v1 1/3] xprtrdma: Don't defer fencing an async RPC's chunks Chuck Lever
2017-10-09 16:03 ` [PATCH v1 2/3] xprtrdma: Use ro_unmap_sync in xprt_rdma_send_request Chuck Lever
2017-10-13 17:43   ` Anna Schumaker
2017-10-13 17:55     ` Chuck Lever
2017-10-13 19:05       ` Chuck Lever
2017-10-09 16:03 ` [PATCH v1 3/3] xprtrdma: Remove ro_unmap_safe Chuck Lever
