[PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry

linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
@ 2025-08-04 18:46 Dai Ngo
  2025-08-04 18:49 ` Chuck Lever
  0 siblings, 1 reply; 11+ messages in thread
From: Dai Ngo @ 2025-08-04 18:46 UTC (permalink / raw)
  To: chuck.lever, jlayton, neilb, okorniev, tom; +Cc: linux-nfs

Currently, when an RPC connection times out during the connect phase,
the task is retried by placing it back on the pending queue and waiting
again. In some cases, the timeout occurs because TCP is unable to send
the SYN packet. This situation most often arises on bare metal systems
at boot time, when the NFS mount is attempted while the network link
appears to be up but is not yet stable.

This patch addresses the issue by updating call_connect_status to destroy
the transport on ETIMEDOUT error before retrying the connection. This
ensures that subsequent connection attempts use a fresh transport,
reducing the likelihood of repeated failures due to lingering network
issues.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 net/sunrpc/clnt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 21426c3049d3..701b742750c5 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task *task)
 	case -EHOSTUNREACH:
 	case -EPIPE:
 	case -EPROTO:
+	case -ETIMEDOUT:
 		xprt_conditional_disconnect(task->tk_rqstp->rq_xprt,
 					    task->tk_rqstp->rq_connect_cookie);
 		if (RPC_IS_SOFTCONN(task))
@@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task *task)
 	case -EADDRINUSE:
 	case -ENOTCONN:
 	case -EAGAIN:
-	case -ETIMEDOUT:
 		if (!(task->tk_flags & RPC_TASK_NO_ROUND_ROBIN) &&
 		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
 		    test_bit(XPRT_REMOVE, &xprt->state)) {
-- 
2.43.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
  2025-08-04 18:46 Dai Ngo
@ 2025-08-04 18:49 ` Chuck Lever
  0 siblings, 0 replies; 11+ messages in thread
From: Chuck Lever @ 2025-08-04 18:49 UTC (permalink / raw)
  To: Dai Ngo, jlayton, neilb, okorniev, tom; +Cc: linux-nfs

On 8/4/25 2:46 PM, Dai Ngo wrote:
> Currently, when an RPC connection times out during the connect phase,
> the task is retried by placing it back on the pending queue and waiting
> again. In some cases, the timeout occurs because TCP is unable to send
> the SYN packet. This situation most often arises on bare metal systems
> at boot time, when the NFS mount is attempted while the network link
> appears to be up but is not yet stable.
> 
> This patch addresses the issue by updating call_connect_status to destroy
> the transport on ETIMEDOUT error before retrying the connection. This
> ensures that subsequent connection attempts use a fresh transport,
> reducing the likelihood of repeated failures due to lingering network
> issues.
> 
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  net/sunrpc/clnt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 21426c3049d3..701b742750c5 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task *task)
>  	case -EHOSTUNREACH:
>  	case -EPIPE:
>  	case -EPROTO:
> +	case -ETIMEDOUT:
>  		xprt_conditional_disconnect(task->tk_rqstp->rq_xprt,
>  					    task->tk_rqstp->rq_connect_cookie);
>  		if (RPC_IS_SOFTCONN(task))
> @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task *task)
>  	case -EADDRINUSE:
>  	case -ENOTCONN:
>  	case -EAGAIN:
> -	case -ETIMEDOUT:
>  		if (!(task->tk_flags & RPC_TASK_NO_ROUND_ROBIN) &&
>  		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
>  		    test_bit(XPRT_REMOVE, &xprt->state)) {

Hello Dai, net/sunrpc/clnt.c is client-side. Can you resend to Trond and
Anna, please?


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
@ 2025-08-04 19:08 Dai Ngo
  2025-08-04 19:21 ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Dai Ngo @ 2025-08-04 19:08 UTC (permalink / raw)
  To: trondmy, anna; +Cc: linux-nfs

Currently, when an RPC connection times out during the connect phase,
the task is retried by placing it back on the pending queue and waiting
again. In some cases, the timeout occurs because TCP is unable to send
the SYN packet. This situation most often arises on bare metal systems
at boot time, when the NFS mount is attempted while the network link
appears to be up but is not yet stable.

This patch addresses the issue by updating call_connect_status to destroy
the transport on ETIMEDOUT error before retrying the connection. This
ensures that subsequent connection attempts use a fresh transport,
reducing the likelihood of repeated failures due to lingering network
issues.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 net/sunrpc/clnt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 21426c3049d3..701b742750c5 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task *task)
 	case -EHOSTUNREACH:
 	case -EPIPE:
 	case -EPROTO:
+	case -ETIMEDOUT:
 		xprt_conditional_disconnect(task->tk_rqstp->rq_xprt,
 					    task->tk_rqstp->rq_connect_cookie);
 		if (RPC_IS_SOFTCONN(task))
@@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task *task)
 	case -EADDRINUSE:
 	case -ENOTCONN:
 	case -EAGAIN:
-	case -ETIMEDOUT:
 		if (!(task->tk_flags & RPC_TASK_NO_ROUND_ROBIN) &&
 		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
 		    test_bit(XPRT_REMOVE, &xprt->state)) {
-- 
2.43.5

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
  2025-08-04 19:08 [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry Dai Ngo
@ 2025-08-04 19:21 ` Trond Myklebust
  2025-08-04 20:13   ` Dai Ngo
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2025-08-04 19:21 UTC (permalink / raw)
  To: Dai Ngo, anna; +Cc: linux-nfs

On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
> Currently, when an RPC connection times out during the connect phase,
> the task is retried by placing it back on the pending queue and
> waiting
> again. In some cases, the timeout occurs because TCP is unable to
> send
> the SYN packet. This situation most often arises on bare metal
> systems
> at boot time, when the NFS mount is attempted while the network link
> appears to be up but is not yet stable.
> 
> This patch addresses the issue by updating call_connect_status to
> destroy
> the transport on ETIMEDOUT error before retrying the connection. This
> ensures that subsequent connection attempts use a fresh transport,
> reducing the likelihood of repeated failures due to lingering network
> issues.
> 
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  net/sunrpc/clnt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 21426c3049d3..701b742750c5 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task *task)
>  	case -EHOSTUNREACH:
>  	case -EPIPE:
>  	case -EPROTO:
> +	case -ETIMEDOUT:
>  		xprt_conditional_disconnect(task->tk_rqstp->rq_xprt,
>  					    task->tk_rqstp-
> >rq_connect_cookie);
>  		if (RPC_IS_SOFTCONN(task))
> @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task *task)
>  	case -EADDRINUSE:
>  	case -ENOTCONN:
>  	case -EAGAIN:
> -	case -ETIMEDOUT:
>  		if (!(task->tk_flags & RPC_TASK_NO_ROUND_ROBIN) &&
>  		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
>  		    test_bit(XPRT_REMOVE, &xprt->state)) {

Why is this needed? The ETIMEDOUT is supposed to be a task level error,
not a connection level thing.

Oh... Is this because of TLS? If so, then please fix that to use a more
appropriate error.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
  2025-08-04 19:21 ` Trond Myklebust
@ 2025-08-04 20:13   ` Dai Ngo
  2025-08-04 23:55     ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Dai Ngo @ 2025-08-04 20:13 UTC (permalink / raw)
  To: Trond Myklebust, anna; +Cc: linux-nfs


On 8/4/25 12:21 PM, Trond Myklebust wrote:
> On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
>> Currently, when an RPC connection times out during the connect phase,
>> the task is retried by placing it back on the pending queue and
>> waiting
>> again. In some cases, the timeout occurs because TCP is unable to
>> send
>> the SYN packet. This situation most often arises on bare metal
>> systems
>> at boot time, when the NFS mount is attempted while the network link
>> appears to be up but is not yet stable.
>>
>> This patch addresses the issue by updating call_connect_status to
>> destroy
>> the transport on ETIMEDOUT error before retrying the connection. This
>> ensures that subsequent connection attempts use a fresh transport,
>> reducing the likelihood of repeated failures due to lingering network
>> issues.
>>
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>>   net/sunrpc/clnt.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>> index 21426c3049d3..701b742750c5 100644
>> --- a/net/sunrpc/clnt.c
>> +++ b/net/sunrpc/clnt.c
>> @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task *task)
>>   	case -EHOSTUNREACH:
>>   	case -EPIPE:
>>   	case -EPROTO:
>> +	case -ETIMEDOUT:
>>   		xprt_conditional_disconnect(task->tk_rqstp->rq_xprt,
>>   					    task->tk_rqstp-
>>> rq_connect_cookie);
>>   		if (RPC_IS_SOFTCONN(task))
>> @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task *task)
>>   	case -EADDRINUSE:
>>   	case -ENOTCONN:
>>   	case -EAGAIN:
>> -	case -ETIMEDOUT:
>>   		if (!(task->tk_flags & RPC_TASK_NO_ROUND_ROBIN) &&
>>   		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
>>   		    test_bit(XPRT_REMOVE, &xprt->state)) {
> Why is this needed? The ETIMEDOUT is supposed to be a task level error,
> not a connection level thing.

If TCP was not able to sent the SYN out on due to the transient error
with the link status and the task just turns around a wait again, since
TCP does not retry the SYN, eventually systemd times out and stops the
mount.


Below is the snippet from the system log with rpcdebug enabled:


Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info>  [1750414989.6033] manager: startup complete

... <NFS mount starts>
Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
...
Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue 0000000093f481cd "xprt_pending"
Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 setting alarm for 60000 ms

... <link status down & up>
Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1: Link is up at 1000 Mbps, full duplex
Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>  [1750414990.6359] device (em1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>  [1750414990.6361] device (em1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>  [1750414990.6364] device (em1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>  [1750414990.6383] device (em1): Activation: successful, device activated.

... <connect timed out>
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task (now 4294742016)
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 disabling timer
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 removed from queue 0000000093f481cd "xprt_pending"
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 call_connect_status (status -110)

... <wait again>
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue "xprt_pending" time 4294742016)
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue 0000000093f481cd "xprt_pending"

... <systemd timed out>
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting timed out. Terminating.

Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 got signal
Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task (now 4294770229)
Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 disabling timer
Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 removed from queue 0000000093f481cd "xprt_pending"

Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512 [error]
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount process exited, code=killed status=15
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with result 'timeout'.

This patch forces TCP to send the SYN on ETIMEDOUT error.

-Dai

>
> Oh... Is this because of TLS? If so, then please fix that to use a more
> appropriate error.
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
  2025-08-04 20:13   ` Dai Ngo
@ 2025-08-04 23:55     ` Trond Myklebust
  2025-08-05 15:46       ` Dai Ngo
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2025-08-04 23:55 UTC (permalink / raw)
  To: Dai Ngo, anna; +Cc: linux-nfs

On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote:
> 
> On 8/4/25 12:21 PM, Trond Myklebust wrote:
> > On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
> > > Currently, when an RPC connection times out during the connect
> > > phase,
> > > the task is retried by placing it back on the pending queue and
> > > waiting
> > > again. In some cases, the timeout occurs because TCP is unable to
> > > send
> > > the SYN packet. This situation most often arises on bare metal
> > > systems
> > > at boot time, when the NFS mount is attempted while the network
> > > link
> > > appears to be up but is not yet stable.
> > > 
> > > This patch addresses the issue by updating call_connect_status to
> > > destroy
> > > the transport on ETIMEDOUT error before retrying the connection.
> > > This
> > > ensures that subsequent connection attempts use a fresh
> > > transport,
> > > reducing the likelihood of repeated failures due to lingering
> > > network
> > > issues.
> > > 
> > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > ---
> > >   net/sunrpc/clnt.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > index 21426c3049d3..701b742750c5 100644
> > > --- a/net/sunrpc/clnt.c
> > > +++ b/net/sunrpc/clnt.c
> > > @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task *task)
> > >   	case -EHOSTUNREACH:
> > >   	case -EPIPE:
> > >   	case -EPROTO:
> > > +	case -ETIMEDOUT:
> > >   		xprt_conditional_disconnect(task->tk_rqstp-
> > > >rq_xprt,
> > >   					    task->tk_rqstp-
> > > > rq_connect_cookie);
> > >   		if (RPC_IS_SOFTCONN(task))
> > > @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task *task)
> > >   	case -EADDRINUSE:
> > >   	case -ENOTCONN:
> > >   	case -EAGAIN:
> > > -	case -ETIMEDOUT:
> > >   		if (!(task->tk_flags & RPC_TASK_NO_ROUND_ROBIN)
> > > &&
> > >   		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
> > >   		    test_bit(XPRT_REMOVE, &xprt->state)) {
> > Why is this needed? The ETIMEDOUT is supposed to be a task level
> > error,
> > not a connection level thing.
> 
> If TCP was not able to sent the SYN out on due to the transient error
> with the link status and the task just turns around a wait again,
> since
> TCP does not retry the SYN, eventually systemd times out and stops
> the
> mount.
> 
> 
> Below is the snippet from the system log with rpcdebug enabled:
> 
> 
> Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1: NIC
> Link is Up, 10 Gbps Full Duplex, Flow Control: None
> Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414989.6033] manager: startup complete
> 
> ... <NFS mount starts>
> Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
> ...
> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue
> 0000000093f481cd "xprt_pending"
> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 setting alarm for
> 60000 ms
> 
> ... <link status down & up>
> Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1: Link is
> up at 1000 Mbps, full duplex
> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414990.6359] device (em1): state change: disconnected -> prepare
> (reason 'none', sys-iface-state: 'managed')
> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414990.6361] device (em1): state change: prepare -> config
> (reason 'none', sys-iface-state: 'managed')
> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414990.6364] device (em1): state change: config -> ip-config
> (reason 'none', sys-iface-state: 'managed')
> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414990.6383] device (em1): Activation: successful, device
> activated.
> 
> ... <connect timed out>
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task
> (now 4294742016)
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 disabling timer
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 removed from queue
> 0000000093f481cd "xprt_pending"
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 call_connect_status
> (status -110)
> 
> ... <wait again>
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue
> "xprt_pending" time 4294742016)
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue
> 0000000093f481cd "xprt_pending"
> 
> ... <systemd timed out>
> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting timed
> out. Terminating.
> 
> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 got signal
> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task
> (now 4294770229)
> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 disabling timer
> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 removed from queue
> 0000000093f481cd "xprt_pending"
> 
> Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512
> [error]
> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount process
> exited, code=killed status=15
> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with
> result 'timeout'.
> 
> This patch forces TCP to send the SYN on ETIMEDOUT error.
> 

Something is very wrong here...

If this patch is correct, and the call to xprt_conditional_disconnect()
does indeed cause the socket to close, then something must have bumped
xprt->connect_cookie. The only things that can do that are the state
changes recorded in xs_tcp_state_change(),
xs_sock_reset_connection_flags(), or xprt_autoclose().

If it was xs_tcp_state_change() that bumped xprt->connect_cookie, then
either we're in TCP_ESTABLISHED (in which case triggering a close on
ETIMEDOUT is just wrong), or we're in the TCP_CLOSE state, in which
case autoclose should already be scheduled.
If xs_sock_reset_connection_flags() got called, then the socket is in
the process of being closed.
Ditto if xprt_autoclose() got called.

So why do we need the call to xprt_conditional_disconnect()?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
  2025-08-04 23:55     ` Trond Myklebust
@ 2025-08-05 15:46       ` Dai Ngo
  2025-08-05 16:41         ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Dai Ngo @ 2025-08-05 15:46 UTC (permalink / raw)
  To: Trond Myklebust, anna; +Cc: linux-nfs


On 8/4/25 4:55 PM, Trond Myklebust wrote:
> On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote:
>> On 8/4/25 12:21 PM, Trond Myklebust wrote:
>>> On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
>>>> Currently, when an RPC connection times out during the connect
>>>> phase,
>>>> the task is retried by placing it back on the pending queue and
>>>> waiting
>>>> again. In some cases, the timeout occurs because TCP is unable to
>>>> send
>>>> the SYN packet. This situation most often arises on bare metal
>>>> systems
>>>> at boot time, when the NFS mount is attempted while the network
>>>> link
>>>> appears to be up but is not yet stable.
>>>>
>>>> This patch addresses the issue by updating call_connect_status to
>>>> destroy
>>>> the transport on ETIMEDOUT error before retrying the connection.
>>>> This
>>>> ensures that subsequent connection attempts use a fresh
>>>> transport,
>>>> reducing the likelihood of repeated failures due to lingering
>>>> network
>>>> issues.
>>>>
>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>> ---
>>>>    net/sunrpc/clnt.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>> index 21426c3049d3..701b742750c5 100644
>>>> --- a/net/sunrpc/clnt.c
>>>> +++ b/net/sunrpc/clnt.c
>>>> @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task *task)
>>>>    	case -EHOSTUNREACH:
>>>>    	case -EPIPE:
>>>>    	case -EPROTO:
>>>> +	case -ETIMEDOUT:
>>>>    		xprt_conditional_disconnect(task->tk_rqstp-
>>>>> rq_xprt,
>>>>    					    task->tk_rqstp-
>>>>> rq_connect_cookie);
>>>>    		if (RPC_IS_SOFTCONN(task))
>>>> @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task *task)
>>>>    	case -EADDRINUSE:
>>>>    	case -ENOTCONN:
>>>>    	case -EAGAIN:
>>>> -	case -ETIMEDOUT:
>>>>    		if (!(task->tk_flags & RPC_TASK_NO_ROUND_ROBIN)
>>>> &&
>>>>    		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
>>>>    		    test_bit(XPRT_REMOVE, &xprt->state)) {
>>> Why is this needed? The ETIMEDOUT is supposed to be a task level
>>> error,
>>> not a connection level thing.
>> If TCP was not able to sent the SYN out on due to the transient error
>> with the link status and the task just turns around a wait again,
>> since
>> TCP does not retry the SYN, eventually systemd times out and stops
>> the
>> mount.
>>
>>
>> Below is the snippet from the system log with rpcdebug enabled:
>>
>>
>> Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1: NIC
>> Link is Up, 10 Gbps Full Duplex, Flow Control: None
>> Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info>
>> [1750414989.6033] manager: startup complete
>>
>> ... <NFS mount starts>
>> Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
>> ...
>> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue
>> 0000000093f481cd "xprt_pending"
>> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 setting alarm for
>> 60000 ms
>>
>> ... <link status down & up>
>> Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1: Link is
>> up at 1000 Mbps, full duplex
>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>> [1750414990.6359] device (em1): state change: disconnected -> prepare
>> (reason 'none', sys-iface-state: 'managed')
>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>> [1750414990.6361] device (em1): state change: prepare -> config
>> (reason 'none', sys-iface-state: 'managed')
>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>> [1750414990.6364] device (em1): state change: config -> ip-config
>> (reason 'none', sys-iface-state: 'managed')
>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>> [1750414990.6383] device (em1): Activation: successful, device
>> activated.
>>
>> ... <connect timed out>
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task
>> (now 4294742016)
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 disabling timer
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 removed from queue
>> 0000000093f481cd "xprt_pending"
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 call_connect_status
>> (status -110)
>>
>> ... <wait again>
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue
>> "xprt_pending" time 4294742016)
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue
>> 0000000093f481cd "xprt_pending"
>>
>> ... <systemd timed out>
>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting timed
>> out. Terminating.
>>
>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 got signal
>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task
>> (now 4294770229)
>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 disabling timer
>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 removed from queue
>> 0000000093f481cd "xprt_pending"
>>
>> Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512
>> [error]
>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount process
>> exited, code=killed status=15
>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with
>> result 'timeout'.
>>
>> This patch forces TCP to send the SYN on ETIMEDOUT error.
>>
> Something is very wrong here...
>
> If this patch is correct, and the call to xprt_conditional_disconnect()
> does indeed cause the socket to close, then something must have bumped
> xprt->connect_cookie.

I'm a little confused here, xprt_conditional_disconnect only closes the
socket if the connect cookie is still the same:

         if (cookie != xprt->connect_cookie)
                 goto out;

So in this case the xprt->connect_cookie has not been bumped.

-Dai

>   The only things that can do that are the state
> changes recorded in xs_tcp_state_change(),
> xs_sock_reset_connection_flags(), or xprt_autoclose().
>
> If it was xs_tcp_state_change() that bumped xprt->connect_cookie, then
> either we're in TCP_ESTABLISHED (in which case triggering a close on
> ETIMEDOUT is just wrong), or we're in the TCP_CLOSE state, in which
> case autoclose should already be scheduled.
> If xs_sock_reset_connection_flags() got called, then the socket is in
> the process of being closed.
> Ditto if xprt_autoclose() got called.
>
> So why do we need the call to xprt_conditional_disconnect()?
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
  2025-08-05 15:46       ` Dai Ngo
@ 2025-08-05 16:41         ` Trond Myklebust
  2025-08-05 17:07           ` Dai Ngo
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2025-08-05 16:41 UTC (permalink / raw)
  To: Dai Ngo, anna; +Cc: linux-nfs

On Tue, 2025-08-05 at 08:46 -0700, Dai Ngo wrote:
> 
> On 8/4/25 4:55 PM, Trond Myklebust wrote:
> > On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote:
> > > On 8/4/25 12:21 PM, Trond Myklebust wrote:
> > > > On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
> > > > > Currently, when an RPC connection times out during the
> > > > > connect
> > > > > phase,
> > > > > the task is retried by placing it back on the pending queue
> > > > > and
> > > > > waiting
> > > > > again. In some cases, the timeout occurs because TCP is
> > > > > unable to
> > > > > send
> > > > > the SYN packet. This situation most often arises on bare
> > > > > metal
> > > > > systems
> > > > > at boot time, when the NFS mount is attempted while the
> > > > > network
> > > > > link
> > > > > appears to be up but is not yet stable.
> > > > > 
> > > > > This patch addresses the issue by updating
> > > > > call_connect_status to
> > > > > destroy
> > > > > the transport on ETIMEDOUT error before retrying the
> > > > > connection.
> > > > > This
> > > > > ensures that subsequent connection attempts use a fresh
> > > > > transport,
> > > > > reducing the likelihood of repeated failures due to lingering
> > > > > network
> > > > > issues.
> > > > > 
> > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > ---
> > > > >    net/sunrpc/clnt.c | 2 +-
> > > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > > > index 21426c3049d3..701b742750c5 100644
> > > > > --- a/net/sunrpc/clnt.c
> > > > > +++ b/net/sunrpc/clnt.c
> > > > > @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task
> > > > > *task)
> > > > >    	case -EHOSTUNREACH:
> > > > >    	case -EPIPE:
> > > > >    	case -EPROTO:
> > > > > +	case -ETIMEDOUT:
> > > > >    		xprt_conditional_disconnect(task->tk_rqstp-
> > > > > > rq_xprt,
> > > > >    					    task->tk_rqstp-
> > > > > > rq_connect_cookie);
> > > > >    		if (RPC_IS_SOFTCONN(task))
> > > > > @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task
> > > > > *task)
> > > > >    	case -EADDRINUSE:
> > > > >    	case -ENOTCONN:
> > > > >    	case -EAGAIN:
> > > > > -	case -ETIMEDOUT:
> > > > >    		if (!(task->tk_flags &
> > > > > RPC_TASK_NO_ROUND_ROBIN)
> > > > > &&
> > > > >    		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
> > > > >    		    test_bit(XPRT_REMOVE, &xprt->state)) {
> > > > Why is this needed? The ETIMEDOUT is supposed to be a task
> > > > level
> > > > error,
> > > > not a connection level thing.
> > > If TCP was not able to sent the SYN out on due to the transient
> > > error
> > > with the link status and the task just turns around a wait again,
> > > since
> > > TCP does not retry the SYN, eventually systemd times out and
> > > stops
> > > the
> > > mount.
> > > 
> > > 
> > > Below is the snippet from the system log with rpcdebug enabled:
> > > 
> > > 
> > > Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1: NIC
> > > Link is Up, 10 Gbps Full Duplex, Flow Control: None
> > > Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info>
> > > [1750414989.6033] manager: startup complete
> > > 
> > > ... <NFS mount starts>
> > > Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
> > > ...
> > > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue
> > > 0000000093f481cd "xprt_pending"
> > > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 setting alarm
> > > for
> > > 60000 ms
> > > 
> > > ... <link status down & up>
> > > Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1: Link
> > > is
> > > up at 1000 Mbps, full duplex
> > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > [1750414990.6359] device (em1): state change: disconnected ->
> > > prepare
> > > (reason 'none', sys-iface-state: 'managed')
> > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > [1750414990.6361] device (em1): state change: prepare -> config
> > > (reason 'none', sys-iface-state: 'managed')
> > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > [1750414990.6364] device (em1): state change: config -> ip-config
> > > (reason 'none', sys-iface-state: 'managed')
> > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > [1750414990.6383] device (em1): Activation: successful, device
> > > activated.
> > > 
> > > ... <connect timed out>
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
> > > __rpc_wake_up_task
> > > (now 4294742016)
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 disabling timer
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 removed from
> > > queue
> > > 0000000093f481cd "xprt_pending"
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
> > > call_connect_status
> > > (status -110)
> > > 
> > > ... <wait again>
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue
> > > "xprt_pending" time 4294742016)
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue
> > > 0000000093f481cd "xprt_pending"
> > > 
> > > ... <systemd timed out>
> > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting
> > > timed
> > > out. Terminating.
> > > 
> > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 got signal
> > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1
> > > __rpc_wake_up_task
> > > (now 4294770229)
> > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 disabling timer
> > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 removed from
> > > queue
> > > 0000000093f481cd "xprt_pending"
> > > 
> > > Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512
> > > [error]
> > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount
> > > process
> > > exited, code=killed status=15
> > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with
> > > result 'timeout'.
> > > 
> > > This patch forces TCP to send the SYN on ETIMEDOUT error.
> > > 
> > Something is very wrong here...
> > 
> > If this patch is correct, and the call to
> > xprt_conditional_disconnect()
> > does indeed cause the socket to close, then something must have
> > bumped
> > xprt->connect_cookie.
> 
> I'm a little confused here, xprt_conditional_disconnect only closes
> the
> socket if the connect cookie is still the same:
> 
>          if (cookie != xprt->connect_cookie)
>                  goto out;
> 
> So in this case the xprt->connect_cookie has not been bumped.

You're right... Sorry...

So does this mean that the socket is still in the SYN-SENT state? It is
normally supposed to remain in that state for 60 seconds, and resend
SYN using an exponential back off. What you are doing should not be
necessary if the socket is in that state.

> 
> -Dai
> 
> >   The only things that can do that are the state
> > changes recorded in xs_tcp_state_change(),
> > xs_sock_reset_connection_flags(), or xprt_autoclose().
> > 
> > If it was xs_tcp_state_change() that bumped xprt->connect_cookie,
> > then
> > either we're in TCP_ESTABLISHED (in which case triggering a close
> > on
> > ETIMEDOUT is just wrong), or we're in the TCP_CLOSE state, in which
> > case autoclose should already be scheduled.
> > If xs_sock_reset_connection_flags() got called, then the socket is
> > in
> > the process of being closed.
> > Ditto if xprt_autoclose() got called.
> > 
> > So why do we need the call to xprt_conditional_disconnect()?
> > 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
  2025-08-05 16:41         ` Trond Myklebust
@ 2025-08-05 17:07           ` Dai Ngo
  2025-08-05 17:41             ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Dai Ngo @ 2025-08-05 17:07 UTC (permalink / raw)
  To: Trond Myklebust, anna; +Cc: linux-nfs


On 8/5/25 9:41 AM, Trond Myklebust wrote:
> On Tue, 2025-08-05 at 08:46 -0700, Dai Ngo wrote:
>> On 8/4/25 4:55 PM, Trond Myklebust wrote:
>>> On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote:
>>>> On 8/4/25 12:21 PM, Trond Myklebust wrote:
>>>>> On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
>>>>>> Currently, when an RPC connection times out during the
>>>>>> connect
>>>>>> phase,
>>>>>> the task is retried by placing it back on the pending queue
>>>>>> and
>>>>>> waiting
>>>>>> again. In some cases, the timeout occurs because TCP is
>>>>>> unable to
>>>>>> send
>>>>>> the SYN packet. This situation most often arises on bare
>>>>>> metal
>>>>>> systems
>>>>>> at boot time, when the NFS mount is attempted while the
>>>>>> network
>>>>>> link
>>>>>> appears to be up but is not yet stable.
>>>>>>
>>>>>> This patch addresses the issue by updating
>>>>>> call_connect_status to
>>>>>> destroy
>>>>>> the transport on ETIMEDOUT error before retrying the
>>>>>> connection.
>>>>>> This
>>>>>> ensures that subsequent connection attempts use a fresh
>>>>>> transport,
>>>>>> reducing the likelihood of repeated failures due to lingering
>>>>>> network
>>>>>> issues.
>>>>>>
>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>> ---
>>>>>>     net/sunrpc/clnt.c | 2 +-
>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>>>> index 21426c3049d3..701b742750c5 100644
>>>>>> --- a/net/sunrpc/clnt.c
>>>>>> +++ b/net/sunrpc/clnt.c
>>>>>> @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task
>>>>>> *task)
>>>>>>     	case -EHOSTUNREACH:
>>>>>>     	case -EPIPE:
>>>>>>     	case -EPROTO:
>>>>>> +	case -ETIMEDOUT:
>>>>>>     		xprt_conditional_disconnect(task->tk_rqstp-
>>>>>>> rq_xprt,
>>>>>>     					    task->tk_rqstp-
>>>>>>> rq_connect_cookie);
>>>>>>     		if (RPC_IS_SOFTCONN(task))
>>>>>> @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task
>>>>>> *task)
>>>>>>     	case -EADDRINUSE:
>>>>>>     	case -ENOTCONN:
>>>>>>     	case -EAGAIN:
>>>>>> -	case -ETIMEDOUT:
>>>>>>     		if (!(task->tk_flags &
>>>>>> RPC_TASK_NO_ROUND_ROBIN)
>>>>>> &&
>>>>>>     		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
>>>>>>     		    test_bit(XPRT_REMOVE, &xprt->state)) {
>>>>> Why is this needed? The ETIMEDOUT is supposed to be a task
>>>>> level
>>>>> error,
>>>>> not a connection level thing.
>>>> If TCP was not able to sent the SYN out on due to the transient
>>>> error
>>>> with the link status and the task just turns around a wait again,
>>>> since
>>>> TCP does not retry the SYN, eventually systemd times out and
>>>> stops
>>>> the
>>>> mount.
>>>>
>>>>
>>>> Below is the snippet from the system log with rpcdebug enabled:
>>>>
>>>>
>>>> Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1: NIC
>>>> Link is Up, 10 Gbps Full Duplex, Flow Control: None
>>>> Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info>
>>>> [1750414989.6033] manager: startup complete
>>>>
>>>> ... <NFS mount starts>
>>>> Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
>>>> ...
>>>> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue
>>>> 0000000093f481cd "xprt_pending"
>>>> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 setting alarm
>>>> for
>>>> 60000 ms
>>>>
>>>> ... <link status down & up>
>>>> Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1: Link
>>>> is
>>>> up at 1000 Mbps, full duplex
>>>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>>>> [1750414990.6359] device (em1): state change: disconnected ->
>>>> prepare
>>>> (reason 'none', sys-iface-state: 'managed')
>>>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>>>> [1750414990.6361] device (em1): state change: prepare -> config
>>>> (reason 'none', sys-iface-state: 'managed')
>>>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>>>> [1750414990.6364] device (em1): state change: config -> ip-config
>>>> (reason 'none', sys-iface-state: 'managed')
>>>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>>>> [1750414990.6383] device (em1): Activation: successful, device
>>>> activated.
>>>>
>>>> ... <connect timed out>
>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
>>>> __rpc_wake_up_task
>>>> (now 4294742016)
>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 disabling timer
>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 removed from
>>>> queue
>>>> 0000000093f481cd "xprt_pending"
>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
>>>> call_connect_status
>>>> (status -110)
>>>>
>>>> ... <wait again>
>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue
>>>> "xprt_pending" time 4294742016)
>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue
>>>> 0000000093f481cd "xprt_pending"
>>>>
>>>> ... <systemd timed out>
>>>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting
>>>> timed
>>>> out. Terminating.
>>>>
>>>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 got signal
>>>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1
>>>> __rpc_wake_up_task
>>>> (now 4294770229)
>>>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 disabling timer
>>>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 removed from
>>>> queue
>>>> 0000000093f481cd "xprt_pending"
>>>>
>>>> Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512
>>>> [error]
>>>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount
>>>> process
>>>> exited, code=killed status=15
>>>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with
>>>> result 'timeout'.
>>>>
>>>> This patch forces TCP to send the SYN on ETIMEDOUT error.
>>>>
>>> Something is very wrong here...
>>>
>>> If this patch is correct, and the call to
>>> xprt_conditional_disconnect()
>>> does indeed cause the socket to close, then something must have
>>> bumped
>>> xprt->connect_cookie.
>> I'm a little confused here, xprt_conditional_disconnect only closes
>> the
>> socket if the connect cookie is still the same:
>>
>>           if (cookie != xprt->connect_cookie)
>>                   goto out;
>>
>> So in this case the xprt->connect_cookie has not been bumped.
> You're right... Sorry...
>
> So does this mean that the socket is still in the SYN-SENT state? It is
> normally supposed to remain in that state for 60 seconds, and resend
> SYN using an exponential back off.

Apparently this was not done by TCP in this scenario:

... mount starts
Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue 0000000093f481cd "xprt_pending"

Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>  [1750414990.6383] device (em1): Activation: successful, device activated.

... RPC task timed out on connect after 62 secs.  Note that at this time the network link status was already up for more than 60 secs
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task (now 4294742016)

... RPC task goes back and wait again.
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue "xprt_pending" time 4294742016)
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue 0000000093f481cd "xprt_pending"

... nothing happened and systemd timed out after 90 secs and kill the mount
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting timed out. Terminating.
Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512 [error]
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount process exited, code=killed status=15
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with result 'timeout'.

I don't know why TCP does not retry the SYN after the networkk link is
up again but I though instead of trying to figure out and fix the problem
at TCP layer, we can make the RPC connect process a bit more robust by using
a new socket for retry.

-Dai

> What you are doing should not be
> necessary if the socket is in that state.
>
>> -Dai
>>
>>>    The only things that can do that are the state
>>> changes recorded in xs_tcp_state_change(),
>>> xs_sock_reset_connection_flags(), or xprt_autoclose().
>>>
>>> If it was xs_tcp_state_change() that bumped xprt->connect_cookie,
>>> then
>>> either we're in TCP_ESTABLISHED (in which case triggering a close
>>> on
>>> ETIMEDOUT is just wrong), or we're in the TCP_CLOSE state, in which
>>> case autoclose should already be scheduled.
>>> If xs_sock_reset_connection_flags() got called, then the socket is
>>> in
>>> the process of being closed.
>>> Ditto if xprt_autoclose() got called.
>>>
>>> So why do we need the call to xprt_conditional_disconnect()?
>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
  2025-08-05 17:07           ` Dai Ngo
@ 2025-08-05 17:41             ` Trond Myklebust
  2025-08-05 18:25               ` Dai Ngo
  0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2025-08-05 17:41 UTC (permalink / raw)
  To: Dai Ngo, anna; +Cc: linux-nfs

On Tue, 2025-08-05 at 10:07 -0700, Dai Ngo wrote:
> 
> On 8/5/25 9:41 AM, Trond Myklebust wrote:
> > On Tue, 2025-08-05 at 08:46 -0700, Dai Ngo wrote:
> > > On 8/4/25 4:55 PM, Trond Myklebust wrote:
> > > > On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote:
> > > > > On 8/4/25 12:21 PM, Trond Myklebust wrote:
> > > > > > On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
> > > > > > > Currently, when an RPC connection times out during the
> > > > > > > connect
> > > > > > > phase,
> > > > > > > the task is retried by placing it back on the pending
> > > > > > > queue
> > > > > > > and
> > > > > > > waiting
> > > > > > > again. In some cases, the timeout occurs because TCP is
> > > > > > > unable to
> > > > > > > send
> > > > > > > the SYN packet. This situation most often arises on bare
> > > > > > > metal
> > > > > > > systems
> > > > > > > at boot time, when the NFS mount is attempted while the
> > > > > > > network
> > > > > > > link
> > > > > > > appears to be up but is not yet stable.
> > > > > > > 
> > > > > > > This patch addresses the issue by updating
> > > > > > > call_connect_status to
> > > > > > > destroy
> > > > > > > the transport on ETIMEDOUT error before retrying the
> > > > > > > connection.
> > > > > > > This
> > > > > > > ensures that subsequent connection attempts use a fresh
> > > > > > > transport,
> > > > > > > reducing the likelihood of repeated failures due to
> > > > > > > lingering
> > > > > > > network
> > > > > > > issues.
> > > > > > > 
> > > > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > > > ---
> > > > > > >     net/sunrpc/clnt.c | 2 +-
> > > > > > >     1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > > > > > index 21426c3049d3..701b742750c5 100644
> > > > > > > --- a/net/sunrpc/clnt.c
> > > > > > > +++ b/net/sunrpc/clnt.c
> > > > > > > @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task
> > > > > > > *task)
> > > > > > >     	case -EHOSTUNREACH:
> > > > > > >     	case -EPIPE:
> > > > > > >     	case -EPROTO:
> > > > > > > +	case -ETIMEDOUT:
> > > > > > >     		xprt_conditional_disconnect(task-
> > > > > > > >tk_rqstp-
> > > > > > > > rq_xprt,
> > > > > > >     					    task-
> > > > > > > >tk_rqstp-
> > > > > > > > rq_connect_cookie);
> > > > > > >     		if (RPC_IS_SOFTCONN(task))
> > > > > > > @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task
> > > > > > > *task)
> > > > > > >     	case -EADDRINUSE:
> > > > > > >     	case -ENOTCONN:
> > > > > > >     	case -EAGAIN:
> > > > > > > -	case -ETIMEDOUT:
> > > > > > >     		if (!(task->tk_flags &
> > > > > > > RPC_TASK_NO_ROUND_ROBIN)
> > > > > > > &&
> > > > > > >     		    (task->tk_flags & RPC_TASK_MOVEABLE)
> > > > > > > &&
> > > > > > >     		    test_bit(XPRT_REMOVE, &xprt->state))
> > > > > > > {
> > > > > > Why is this needed? The ETIMEDOUT is supposed to be a task
> > > > > > level
> > > > > > error,
> > > > > > not a connection level thing.
> > > > > If TCP was not able to sent the SYN out on due to the
> > > > > transient
> > > > > error
> > > > > with the link status and the task just turns around a wait
> > > > > again,
> > > > > since
> > > > > TCP does not retry the SYN, eventually systemd times out and
> > > > > stops
> > > > > the
> > > > > mount.
> > > > > 
> > > > > 
> > > > > Below is the snippet from the system log with rpcdebug
> > > > > enabled:
> > > > > 
> > > > > 
> > > > > Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1:
> > > > > NIC
> > > > > Link is Up, 10 Gbps Full Duplex, Flow Control: None
> > > > > Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info>
> > > > > [1750414989.6033] manager: startup complete
> > > > > 
> > > > > ... <NFS mount starts>
> > > > > Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
> > > > > ...
> > > > > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to
> > > > > queue
> > > > > 0000000093f481cd "xprt_pending"
> > > > > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 setting
> > > > > alarm
> > > > > for
> > > > > 60000 ms
> > > > > 
> > > > > ... <link status down & up>
> > > > > Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1:
> > > > > Link
> > > > > is
> > > > > up at 1000 Mbps, full duplex
> > > > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > > > [1750414990.6359] device (em1): state change: disconnected ->
> > > > > prepare
> > > > > (reason 'none', sys-iface-state: 'managed')
> > > > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > > > [1750414990.6361] device (em1): state change: prepare ->
> > > > > config
> > > > > (reason 'none', sys-iface-state: 'managed')
> > > > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > > > [1750414990.6364] device (em1): state change: config -> ip-
> > > > > config
> > > > > (reason 'none', sys-iface-state: 'managed')
> > > > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > > > [1750414990.6383] device (em1): Activation: successful,
> > > > > device
> > > > > activated.
> > > > > 
> > > > > ... <connect timed out>
> > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
> > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
> > > > > __rpc_wake_up_task
> > > > > (now 4294742016)
> > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 disabling
> > > > > timer
> > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 removed from
> > > > > queue
> > > > > 0000000093f481cd "xprt_pending"
> > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
> > > > > call_connect_status
> > > > > (status -110)
> > > > > 
> > > > > ... <wait again>
> > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
> > > > > sleep_on(queue
> > > > > "xprt_pending" time 4294742016)
> > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to
> > > > > queue
> > > > > 0000000093f481cd "xprt_pending"
> > > > > 
> > > > > ... <systemd timed out>
> > > > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting
> > > > > timed
> > > > > out. Terminating.
> > > > > 
> > > > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 got signal
> > > > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1
> > > > > __rpc_wake_up_task
> > > > > (now 4294770229)
> > > > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 disabling
> > > > > timer
> > > > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 removed from
> > > > > queue
> > > > > 0000000093f481cd "xprt_pending"
> > > > > 
> > > > > Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() =
> > > > > -512
> > > > > [error]
> > > > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount
> > > > > process
> > > > > exited, code=killed status=15
> > > > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed
> > > > > with
> > > > > result 'timeout'.
> > > > > 
> > > > > This patch forces TCP to send the SYN on ETIMEDOUT error.
> > > > > 
> > > > Something is very wrong here...
> > > > 
> > > > If this patch is correct, and the call to
> > > > xprt_conditional_disconnect()
> > > > does indeed cause the socket to close, then something must have
> > > > bumped
> > > > xprt->connect_cookie.
> > > I'm a little confused here, xprt_conditional_disconnect only
> > > closes
> > > the
> > > socket if the connect cookie is still the same:
> > > 
> > >           if (cookie != xprt->connect_cookie)
> > >                   goto out;
> > > 
> > > So in this case the xprt->connect_cookie has not been bumped.
> > You're right... Sorry...
> > 
> > So does this mean that the socket is still in the SYN-SENT state?
> > It is
> > normally supposed to remain in that state for 60 seconds, and
> > resend
> > SYN using an exponential back off.
> 
> Apparently this was not done by TCP in this scenario:
> 
> ... mount starts
> Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue
> 0000000093f481cd "xprt_pending"
> 
> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414990.6383] device (em1): Activation: successful, device
> activated.
> 
> ... RPC task timed out on connect after 62 secs.  Note that at this
> time the network link status was already up for more than 60 secs
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task
> (now 4294742016)
> 
> ... RPC task goes back and wait again.
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue
> "xprt_pending" time 4294742016)
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue
> 0000000093f481cd "xprt_pending"
> 
> ... nothing happened and systemd timed out after 90 secs and kill the
> mount
> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting timed
> out. Terminating.
> Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512
> [error]
> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount process
> exited, code=killed status=15
> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with
> result 'timeout'.
> 
> I don't know why TCP does not retry the SYN after the networkk link
> is
> up again but I though instead of trying to figure out and fix the
> problem
> at TCP layer, we can make the RPC connect process a bit more robust
> by using
> a new socket for retry.
> 

We should never be using individual RPC message timeouts as a measure
for whether or not the TCP connection is down.

For the initial connection attempt, we use the TCP_SYNCNT socket option
to decide when to give up. Once the connection is established, we set
TCP_USER_TIMEOUT as well as socket keepalive to ensure that the
connection is still alive.

I'm willing to consider further socket-specific timers, but not if the
reason is to work around bugs (assuming that is indeed the case here).

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
  2025-08-05 17:41             ` Trond Myklebust
@ 2025-08-05 18:25               ` Dai Ngo
  0 siblings, 0 replies; 11+ messages in thread
From: Dai Ngo @ 2025-08-05 18:25 UTC (permalink / raw)
  To: Trond Myklebust, anna; +Cc: linux-nfs


On 8/5/25 10:41 AM, Trond Myklebust wrote:
> On Tue, 2025-08-05 at 10:07 -0700, Dai Ngo wrote:
>> On 8/5/25 9:41 AM, Trond Myklebust wrote:
>>> On Tue, 2025-08-05 at 08:46 -0700, Dai Ngo wrote:
>>>> On 8/4/25 4:55 PM, Trond Myklebust wrote:
>>>>> On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote:
>>>>>> On 8/4/25 12:21 PM, Trond Myklebust wrote:
>>>>>>> On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
>>>>>>>> Currently, when an RPC connection times out during the
>>>>>>>> connect
>>>>>>>> phase,
>>>>>>>> the task is retried by placing it back on the pending
>>>>>>>> queue
>>>>>>>> and
>>>>>>>> waiting
>>>>>>>> again. In some cases, the timeout occurs because TCP is
>>>>>>>> unable to
>>>>>>>> send
>>>>>>>> the SYN packet. This situation most often arises on bare
>>>>>>>> metal
>>>>>>>> systems
>>>>>>>> at boot time, when the NFS mount is attempted while the
>>>>>>>> network
>>>>>>>> link
>>>>>>>> appears to be up but is not yet stable.
>>>>>>>>
>>>>>>>> This patch addresses the issue by updating
>>>>>>>> call_connect_status to
>>>>>>>> destroy
>>>>>>>> the transport on ETIMEDOUT error before retrying the
>>>>>>>> connection.
>>>>>>>> This
>>>>>>>> ensures that subsequent connection attempts use a fresh
>>>>>>>> transport,
>>>>>>>> reducing the likelihood of repeated failures due to
>>>>>>>> lingering
>>>>>>>> network
>>>>>>>> issues.
>>>>>>>>
>>>>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>>>>> ---
>>>>>>>>      net/sunrpc/clnt.c | 2 +-
>>>>>>>>      1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
>>>>>>>> index 21426c3049d3..701b742750c5 100644
>>>>>>>> --- a/net/sunrpc/clnt.c
>>>>>>>> +++ b/net/sunrpc/clnt.c
>>>>>>>> @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task
>>>>>>>> *task)
>>>>>>>>      	case -EHOSTUNREACH:
>>>>>>>>      	case -EPIPE:
>>>>>>>>      	case -EPROTO:
>>>>>>>> +	case -ETIMEDOUT:
>>>>>>>>      		xprt_conditional_disconnect(task-
>>>>>>>>> tk_rqstp-
>>>>>>>>> rq_xprt,
>>>>>>>>      					    task-
>>>>>>>>> tk_rqstp-
>>>>>>>>> rq_connect_cookie);
>>>>>>>>      		if (RPC_IS_SOFTCONN(task))
>>>>>>>> @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task
>>>>>>>> *task)
>>>>>>>>      	case -EADDRINUSE:
>>>>>>>>      	case -ENOTCONN:
>>>>>>>>      	case -EAGAIN:
>>>>>>>> -	case -ETIMEDOUT:
>>>>>>>>      		if (!(task->tk_flags &
>>>>>>>> RPC_TASK_NO_ROUND_ROBIN)
>>>>>>>> &&
>>>>>>>>      		    (task->tk_flags & RPC_TASK_MOVEABLE)
>>>>>>>> &&
>>>>>>>>      		    test_bit(XPRT_REMOVE, &xprt->state))
>>>>>>>> {
>>>>>>> Why is this needed? The ETIMEDOUT is supposed to be a task
>>>>>>> level
>>>>>>> error,
>>>>>>> not a connection level thing.
>>>>>> If TCP was not able to sent the SYN out on due to the
>>>>>> transient
>>>>>> error
>>>>>> with the link status and the task just turns around a wait
>>>>>> again,
>>>>>> since
>>>>>> TCP does not retry the SYN, eventually systemd times out and
>>>>>> stops
>>>>>> the
>>>>>> mount.
>>>>>>
>>>>>>
>>>>>> Below is the snippet from the system log with rpcdebug
>>>>>> enabled:
>>>>>>
>>>>>>
>>>>>> Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1:
>>>>>> NIC
>>>>>> Link is Up, 10 Gbps Full Duplex, Flow Control: None
>>>>>> Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info>
>>>>>> [1750414989.6033] manager: startup complete
>>>>>>
>>>>>> ... <NFS mount starts>
>>>>>> Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
>>>>>> ...
>>>>>> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to
>>>>>> queue
>>>>>> 0000000093f481cd "xprt_pending"
>>>>>> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 setting
>>>>>> alarm
>>>>>> for
>>>>>> 60000 ms
>>>>>>
>>>>>> ... <link status down & up>
>>>>>> Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1:
>>>>>> Link
>>>>>> is
>>>>>> up at 1000 Mbps, full duplex
>>>>>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>>>>>> [1750414990.6359] device (em1): state change: disconnected ->
>>>>>> prepare
>>>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>>>>>> [1750414990.6361] device (em1): state change: prepare ->
>>>>>> config
>>>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>>>>>> [1750414990.6364] device (em1): state change: config -> ip-
>>>>>> config
>>>>>> (reason 'none', sys-iface-state: 'managed')
>>>>>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>>>>>> [1750414990.6383] device (em1): Activation: successful,
>>>>>> device
>>>>>> activated.
>>>>>>
>>>>>> ... <connect timed out>
>>>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
>>>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
>>>>>> __rpc_wake_up_task
>>>>>> (now 4294742016)
>>>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 disabling
>>>>>> timer
>>>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 removed from
>>>>>> queue
>>>>>> 0000000093f481cd "xprt_pending"
>>>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
>>>>>> call_connect_status
>>>>>> (status -110)
>>>>>>
>>>>>> ... <wait again>
>>>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
>>>>>> sleep_on(queue
>>>>>> "xprt_pending" time 4294742016)
>>>>>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to
>>>>>> queue
>>>>>> 0000000093f481cd "xprt_pending"
>>>>>>
>>>>>> ... <systemd timed out>
>>>>>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting
>>>>>> timed
>>>>>> out. Terminating.
>>>>>>
>>>>>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 got signal
>>>>>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1
>>>>>> __rpc_wake_up_task
>>>>>> (now 4294770229)
>>>>>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 disabling
>>>>>> timer
>>>>>> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 removed from
>>>>>> queue
>>>>>> 0000000093f481cd "xprt_pending"
>>>>>>
>>>>>> Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() =
>>>>>> -512
>>>>>> [error]
>>>>>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount
>>>>>> process
>>>>>> exited, code=killed status=15
>>>>>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed
>>>>>> with
>>>>>> result 'timeout'.
>>>>>>
>>>>>> This patch forces TCP to send the SYN on ETIMEDOUT error.
>>>>>>
>>>>> Something is very wrong here...
>>>>>
>>>>> If this patch is correct, and the call to
>>>>> xprt_conditional_disconnect()
>>>>> does indeed cause the socket to close, then something must have
>>>>> bumped
>>>>> xprt->connect_cookie.
>>>> I'm a little confused here, xprt_conditional_disconnect only
>>>> closes
>>>> the
>>>> socket if the connect cookie is still the same:
>>>>
>>>>            if (cookie != xprt->connect_cookie)
>>>>                    goto out;
>>>>
>>>> So in this case the xprt->connect_cookie has not been bumped.
>>> You're right... Sorry...
>>>
>>> So does this mean that the socket is still in the SYN-SENT state?
>>> It is
>>> normally supposed to remain in that state for 60 seconds, and
>>> resend
>>> SYN using an exponential back off.
>> Apparently this was not done by TCP in this scenario:
>>
>> ... mount starts
>> Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
>> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue
>> 0000000093f481cd "xprt_pending"
>>
>> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
>> [1750414990.6383] device (em1): Activation: successful, device
>> activated.
>>
>> ... RPC task timed out on connect after 62 secs.  Note that at this
>> time the network link status was already up for more than 60 secs
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task
>> (now 4294742016)
>>
>> ... RPC task goes back and wait again.
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue
>> "xprt_pending" time 4294742016)
>> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue
>> 0000000093f481cd "xprt_pending"
>>
>> ... nothing happened and systemd timed out after 90 secs and kill the
>> mount
>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting timed
>> out. Terminating.
>> Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512
>> [error]
>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount process
>> exited, code=killed status=15
>> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with
>> result 'timeout'.
>>
>> I don't know why TCP does not retry the SYN after the networkk link
>> is
>> up again but I though instead of trying to figure out and fix the
>> problem
>> at TCP layer, we can make the RPC connect process a bit more robust
>> by using
>> a new socket for retry.
>>
> We should never be using individual RPC message timeouts as a measure
> for whether or not the TCP connection is down.
>
> For the initial connection attempt, we use the TCP_SYNCNT socket option
> to decide when to give up.

The problem we're having is the initial connection attempt at boot time,
NFS mount started while the network link has been stable yet. NetworkManager
could also be blamed here, it claims 'startup complete' which allows
systemd to start the NFS mounts.

And for some reasons TCP did not send the SYN under this flaky link condition,
even after the network link is up again; we don't see any SYN in the network
trace. The TCP_SYNCNT option is not in play here.

This seems like a bug in the TCP/network layer and it seems to me that
you prefer to push the responsibility to resolve this problem down to
TCP/network stack.

This problem is reproducible, do you want to collect any additional data
before making the final decision?

-Dai

>   Once the connection is established, we set
> TCP_USER_TIMEOUT as well as socket keepalive to ensure that the
> connection is still alive.
>
> I'm willing to consider further socket-specific timers, but not if the
> reason is to work around bugs (assuming that is indeed the case here).
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-08-05 18:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-04 19:08 [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry Dai Ngo
2025-08-04 19:21 ` Trond Myklebust
2025-08-04 20:13   ` Dai Ngo
2025-08-04 23:55     ` Trond Myklebust
2025-08-05 15:46       ` Dai Ngo
2025-08-05 16:41         ` Trond Myklebust
2025-08-05 17:07           ` Dai Ngo
2025-08-05 17:41             ` Trond Myklebust
2025-08-05 18:25               ` Dai Ngo
  -- strict thread matches above, loose matches on Subject: below --
2025-08-04 18:46 Dai Ngo
2025-08-04 18:49 ` Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).