From: Trond Myklebust <trondmy@kernel.org>
To: Dai Ngo <dai.ngo@oracle.com>, anna@kernel.org
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry
Date: Tue, 05 Aug 2025 09:41:28 -0700 [thread overview]
Message-ID: <3978e03aa1638621bced96738ef210ed389e2afb.camel@kernel.org> (raw)
In-Reply-To: <de62d42e-5e85-45d5-825b-369938a4a24c@oracle.com>
On Tue, 2025-08-05 at 08:46 -0700, Dai Ngo wrote:
>
> On 8/4/25 4:55 PM, Trond Myklebust wrote:
> > On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote:
> > > On 8/4/25 12:21 PM, Trond Myklebust wrote:
> > > > On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
> > > > > Currently, when an RPC connection times out during the
> > > > > connect
> > > > > phase,
> > > > > the task is retried by placing it back on the pending queue
> > > > > and
> > > > > waiting
> > > > > again. In some cases, the timeout occurs because TCP is
> > > > > unable to
> > > > > send
> > > > > the SYN packet. This situation most often arises on bare
> > > > > metal
> > > > > systems
> > > > > at boot time, when the NFS mount is attempted while the
> > > > > network
> > > > > link
> > > > > appears to be up but is not yet stable.
> > > > >
> > > > > This patch addresses the issue by updating
> > > > > call_connect_status to
> > > > > destroy
> > > > > the transport on ETIMEDOUT error before retrying the
> > > > > connection.
> > > > > This
> > > > > ensures that subsequent connection attempts use a fresh
> > > > > transport,
> > > > > reducing the likelihood of repeated failures due to lingering
> > > > > network
> > > > > issues.
> > > > >
> > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > ---
> > > > > net/sunrpc/clnt.c | 2 +-
> > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > > > index 21426c3049d3..701b742750c5 100644
> > > > > --- a/net/sunrpc/clnt.c
> > > > > +++ b/net/sunrpc/clnt.c
> > > > > @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task
> > > > > *task)
> > > > > case -EHOSTUNREACH:
> > > > > case -EPIPE:
> > > > > case -EPROTO:
> > > > > + case -ETIMEDOUT:
> > > > > xprt_conditional_disconnect(task->tk_rqstp-
> > > > > > rq_xprt,
> > > > > task->tk_rqstp-
> > > > > > rq_connect_cookie);
> > > > > if (RPC_IS_SOFTCONN(task))
> > > > > @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task
> > > > > *task)
> > > > > case -EADDRINUSE:
> > > > > case -ENOTCONN:
> > > > > case -EAGAIN:
> > > > > - case -ETIMEDOUT:
> > > > > if (!(task->tk_flags &
> > > > > RPC_TASK_NO_ROUND_ROBIN)
> > > > > &&
> > > > > (task->tk_flags & RPC_TASK_MOVEABLE) &&
> > > > > test_bit(XPRT_REMOVE, &xprt->state)) {
> > > > Why is this needed? The ETIMEDOUT is supposed to be a task
> > > > level
> > > > error,
> > > > not a connection level thing.
> > > If TCP was not able to sent the SYN out on due to the transient
> > > error
> > > with the link status and the task just turns around a wait again,
> > > since
> > > TCP does not retry the SYN, eventually systemd times out and
> > > stops
> > > the
> > > mount.
> > >
> > >
> > > Below is the snippet from the system log with rpcdebug enabled:
> > >
> > >
> > > Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1: NIC
> > > Link is Up, 10 Gbps Full Duplex, Flow Control: None
> > > Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info>
> > > [1750414989.6033] manager: startup complete
> > >
> > > ... <NFS mount starts>
> > > Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
> > > ...
> > > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC: 1 added to queue
> > > 0000000093f481cd "xprt_pending"
> > > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC: 1 setting alarm
> > > for
> > > 60000 ms
> > >
> > > ... <link status down & up>
> > > Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1: Link
> > > is
> > > up at 1000 Mbps, full duplex
> > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > [1750414990.6359] device (em1): state change: disconnected ->
> > > prepare
> > > (reason 'none', sys-iface-state: 'managed')
> > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > [1750414990.6361] device (em1): state change: prepare -> config
> > > (reason 'none', sys-iface-state: 'managed')
> > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > [1750414990.6364] device (em1): state change: config -> ip-config
> > > (reason 'none', sys-iface-state: 'managed')
> > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
> > > [1750414990.6383] device (em1): Activation: successful, device
> > > activated.
> > >
> > > ... <connect timed out>
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 timeout
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1
> > > __rpc_wake_up_task
> > > (now 4294742016)
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 disabling timer
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 removed from
> > > queue
> > > 0000000093f481cd "xprt_pending"
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1
> > > call_connect_status
> > > (status -110)
> > >
> > > ... <wait again>
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 sleep_on(queue
> > > "xprt_pending" time 4294742016)
> > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 added to queue
> > > 0000000093f481cd "xprt_pending"
> > >
> > > ... <systemd timed out>
> > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting
> > > timed
> > > out. Terminating.
> > >
> > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 got signal
> > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1
> > > __rpc_wake_up_task
> > > (now 4294770229)
> > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 disabling timer
> > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 removed from
> > > queue
> > > 0000000093f481cd "xprt_pending"
> > >
> > > Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512
> > > [error]
> > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount
> > > process
> > > exited, code=killed status=15
> > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with
> > > result 'timeout'.
> > >
> > > This patch forces TCP to send the SYN on ETIMEDOUT error.
> > >
> > Something is very wrong here...
> >
> > If this patch is correct, and the call to
> > xprt_conditional_disconnect()
> > does indeed cause the socket to close, then something must have
> > bumped
> > xprt->connect_cookie.
>
> I'm a little confused here, xprt_conditional_disconnect only closes
> the
> socket if the connect cookie is still the same:
>
> if (cookie != xprt->connect_cookie)
> goto out;
>
> So in this case the xprt->connect_cookie has not been bumped.
You're right... Sorry...
So does this mean that the socket is still in the SYN-SENT state? It is
normally supposed to remain in that state for 60 seconds, and resend
SYN using an exponential back off. What you are doing should not be
necessary if the socket is in that state.
>
> -Dai
>
> > The only things that can do that are the state
> > changes recorded in xs_tcp_state_change(),
> > xs_sock_reset_connection_flags(), or xprt_autoclose().
> >
> > If it was xs_tcp_state_change() that bumped xprt->connect_cookie,
> > then
> > either we're in TCP_ESTABLISHED (in which case triggering a close
> > on
> > ETIMEDOUT is just wrong), or we're in the TCP_CLOSE state, in which
> > case autoclose should already be scheduled.
> > If xs_sock_reset_connection_flags() got called, then the socket is
> > in
> > the process of being closed.
> > Ditto if xprt_autoclose() got called.
> >
> > So why do we need the call to xprt_conditional_disconnect()?
> >
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
next prev parent reply other threads:[~2025-08-05 16:41 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-04 19:08 [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry Dai Ngo
2025-08-04 19:21 ` Trond Myklebust
2025-08-04 20:13 ` Dai Ngo
2025-08-04 23:55 ` Trond Myklebust
2025-08-05 15:46 ` Dai Ngo
2025-08-05 16:41 ` Trond Myklebust [this message]
2025-08-05 17:07 ` Dai Ngo
2025-08-05 17:41 ` Trond Myklebust
2025-08-05 18:25 ` Dai Ngo
-- strict thread matches above, loose matches on Subject: below --
2025-08-04 18:46 Dai Ngo
2025-08-04 18:49 ` Chuck Lever
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3978e03aa1638621bced96738ef210ed389e2afb.camel@kernel.org \
--to=trondmy@kernel.org \
--cc=anna@kernel.org \
--cc=dai.ngo@oracle.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).