public inbox for linux-nfs@vger.kernel.org
* [PATCH v2] SUNRPC: Fix backchannel reply, again
@ 2024-06-19 13:51 cel
  2024-06-20 11:41 ` Benjamin Coddington
  0 siblings, 1 reply; 3+ messages in thread
From: cel @ 2024-06-19 13:51 UTC
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

I still see "RPC: Could not send backchannel reply error: -110"
quite often, along with slow-running tests. Debugging shows that the
backchannel is still stumbling when it has to queue a callback reply
on a busy transport.

Note that every one of these timeouts causes a connection loss by
virtue of the xprt_conditional_disconnect() call in that arm of
call_cb_transmit_status().

I found that setting to_maxval is necessary to get the RPC timeout
logic to behave whenever to_exponential is not set.

Fixes: 57331a59ac0d ("NFSv4.1: Use the nfs_client's rpc timeouts for backchannel")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/svc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 965a27806bfd..e03f14024e47 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1588,9 +1588,11 @@ void svc_process(struct svc_rqst *rqstp)
  */
 void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
 {
+	struct rpc_timeout timeout = {
+		.to_increment		= 0,
+	};
 	struct rpc_task *task;
 	int proc_error;
-	struct rpc_timeout timeout;
 
 	/* Build the svc_rqst used by the common processing routine */
 	rqstp->rq_xid = req->rq_xid;
@@ -1643,6 +1645,7 @@ void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp)
 		timeout.to_initval = req->rq_xprt->timeout->to_initval;
 		timeout.to_retries = req->rq_xprt->timeout->to_retries;
 	}
+	timeout.to_maxval = timeout.to_initval;
 	memcpy(&req->rq_snd_buf, &rqstp->rq_res, sizeof(req->rq_snd_buf));
 	task = rpc_run_bc_task(req, &timeout);
 
-- 
2.45.1



* Re: [PATCH v2] SUNRPC: Fix backchannel reply, again
  2024-06-19 13:51 [PATCH v2] SUNRPC: Fix backchannel reply, again cel
@ 2024-06-20 11:41 ` Benjamin Coddington
  2024-06-20 14:11   ` Chuck Lever
  0 siblings, 1 reply; 3+ messages in thread
From: Benjamin Coddington @ 2024-06-20 11:41 UTC
  To: cel; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs, Chuck Lever

On 19 Jun 2024, at 9:51, cel@kernel.org wrote:

> From: Chuck Lever <chuck.lever@oracle.com>
>
> I still see "RPC: Could not send backchannel reply error: -110"
> quite often, along with slow-running tests. Debugging shows that the
> backchannel is still stumbling when it has to queue a callback reply
> on a busy transport.
>
> Note that every one of these timeouts causes a connection loss by
> virtue of the xprt_conditional_disconnect() call in that arm of
> call_cb_transmit_status().
>
> I found that setting to_maxval is necessary to get the RPC timeout
> logic to behave whenever to_exponential is not set.
>
> Fixes: 57331a59ac0d ("NFSv4.1: Use the nfs_client's rpc timeouts for backchannel")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

That makes sense - I guess we were getting some random stack value in there?

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>

Ben




* Re: [PATCH v2] SUNRPC: Fix backchannel reply, again
  2024-06-20 11:41 ` Benjamin Coddington
@ 2024-06-20 14:11   ` Chuck Lever
  0 siblings, 0 replies; 3+ messages in thread
From: Chuck Lever @ 2024-06-20 14:11 UTC
  To: Benjamin Coddington; +Cc: cel, Trond Myklebust, Anna Schumaker, linux-nfs

On Thu, Jun 20, 2024 at 07:41:21AM -0400, Benjamin Coddington wrote:
> On 19 Jun 2024, at 9:51, cel@kernel.org wrote:
> 
> > From: Chuck Lever <chuck.lever@oracle.com>
> >
> > I still see "RPC: Could not send backchannel reply error: -110"
> > quite often, along with slow-running tests. Debugging shows that the
> > backchannel is still stumbling when it has to queue a callback reply
> > on a busy transport.
> >
> > Note that every one of these timeouts causes a connection loss by
> > virtue of the xprt_conditional_disconnect() call in that arm of
> > call_cb_transmit_status().
> >
> > I found that setting to_maxval is necessary to get the RPC timeout
> > logic to behave whenever to_exponential is not set.
> >
> > Fixes: 57331a59ac0d ("NFSv4.1: Use the nfs_client's rpc timeouts for backchannel")
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> 
> That makes sense - I guess we were getting some random stack value in there?

Hi Ben-

On my systems it was always zero (which is why v1 of this patch did
not clear the other fields in @timeout before using it).
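
As a side note on the v2 declaration: in C, an initializer that names only
one member zero-fills every member it does not name. A minimal userspace
sketch of that rule follows. The struct is a hypothetical stand-in for
struct rpc_timeout (field names taken from the patch, types assumed), and
demo_timeout_init() is an illustrative helper, not a kernel function.

```c
/* Hypothetical stand-in for the kernel's struct rpc_timeout. The field
 * names follow the patch; the types are assumptions. */
struct demo_timeout {
	unsigned long to_initval;
	unsigned long to_maxval;
	unsigned long to_increment;
	unsigned int  to_retries;
	unsigned char to_exponential;
};

/* Build a timeout the way the v2 patch declares it. Naming a single
 * member in the initializer zero-initializes all unnamed members, so
 * no field is left holding an indeterminate stack value. */
static struct demo_timeout demo_timeout_init(void)
{
	struct demo_timeout timeout = {
		.to_increment = 0,
	};

	return timeout;
}
```

With this form, to_maxval and to_exponential start at zero on every
system, regardless of what was previously on the stack.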

A zero to_maxval value results in the same timeout-on-sleep behavior
as you saw before 57331a59ac0d was applied.

A random non-zero value will behave correctly as long as the transport
is making forward progress, so we never noticed a problem.
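
A minimal userspace sketch of why to_maxval matters when to_exponential is
clear. It assumes the major timeout is computed by adding
to_increment * to_retries to the base timeout and then clamping the result
to to_maxval, which is only an approximation of the logic in
net/sunrpc/xprt.c. The names model_timeout and model_major_timeout() are
hypothetical.

```c
/* Hypothetical model of the timeout parameters; not the kernel struct. */
struct model_timeout {
	unsigned long to_initval;
	unsigned long to_maxval;
	unsigned long to_increment;
	unsigned int  to_retries;
	unsigned char to_exponential;
};

/* Assumed model of the major-timeout calculation: grow the base timeout
 * either exponentially or linearly, then clamp it to to_maxval. */
static unsigned long model_major_timeout(unsigned long base,
					 const struct model_timeout *to)
{
	unsigned long major = base;

	if (to->to_exponential)
		major <<= to->to_retries;
	else
		major += to->to_increment * to->to_retries;

	/* The clamp is where a zero to_maxval hurts: any non-zero
	 * result exceeds it and collapses to zero. */
	if (major > to->to_maxval || major == 0)
		major = to->to_maxval;

	return major;
}
```

Under this model, to_initval = 60 with to_maxval = 0 yields a major timeout
of 0, so the task times out as soon as it has to wait. Setting
to_maxval = to_initval, as the patch does, yields 60.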


> Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
> 
> Ben
> 

-- 
Chuck Lever
