* [PATCH 0/4] Improve failover times for pNFS mirroring
From: trondmy @ 2023-08-18 13:41 UTC
To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

When a data server goes down, it can currently take 3 minutes for the
RPC connection attempt to give up, and return control to the NFS layer.
If the file is mirrored, we usually want to fail the attempt to the
downed data server much earlier, and retry using one of the other
mirrors.

This patchset sets the connect timeout to be closer to the I/O timeout
value for the case of pNFS to NFSv3 data servers.

Trond Myklebust (4):
  SUNRPC: Set the TCP_SYNCNT to match the socket timeout
  SUNRPC: Refactor and simplify connect timeout
  SUNRPC: Allow specification of TCP client connect timeout at setup
  NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver

 fs/nfs/client.c             |  2 ++
 fs/nfs/internal.h           |  2 ++
 fs/nfs/nfs3client.c         |  3 +++
 fs/nfs/pnfs_nfs.c           |  3 +++
 include/linux/sunrpc/clnt.h |  2 ++
 include/linux/sunrpc/xprt.h |  2 ++
 net/sunrpc/clnt.c           |  2 ++
 net/sunrpc/xprtsock.c       | 58 +++++++++++++++++++++++++++++++++++++++++++---------------------
 8 files changed, 59 insertions(+), 15 deletions(-)

-- 
2.41.0

* [PATCH 1/4] SUNRPC: Set the TCP_SYNCNT to match the socket timeout
From: trondmy @ 2023-08-18 13:41 UTC
To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Set the TCP SYN count so that we abort the connection attempt at around
the expected timeout value.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprtsock.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 9f010369100a..ba045187cf65 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2230,9 +2230,13 @@ static void xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt,
 		struct socket *sock)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
+	struct net *net = sock_net(sock->sk);
+	unsigned long connect_timeout;
+	unsigned long syn_retries;
 	unsigned int keepidle;
 	unsigned int keepcnt;
 	unsigned int timeo;
+	unsigned long t;
 
 	spin_lock(&xprt->transport_lock);
 	keepidle = DIV_ROUND_UP(xprt->timeout->to_initval, HZ);
@@ -2250,6 +2254,16 @@ static void xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt,
 
 	/* TCP user timeout (see RFC5482) */
 	tcp_sock_set_user_timeout(sock->sk, timeo);
+
+	/* Connect timeout */
+	connect_timeout = max_t(unsigned long,
+				DIV_ROUND_UP(xprt->connect_timeout, HZ), 1);
+	syn_retries = max_t(unsigned long,
+			    READ_ONCE(net->ipv4.sysctl_tcp_syn_retries), 1);
+	for (t = 0; t < syn_retries && (1 << t) < connect_timeout; t++)
+		;
+	if (t <= syn_retries)
+		tcp_sock_set_syncnt(sock->sk, t - 1);
 }
 
 static void xs_tcp_set_connect_timeout(struct rpc_xprt *xprt,
-- 
2.41.0

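To make the SYN-count arithmetic in this patch concrete, here is a minimal user-space sketch that mirrors the loop added to xs_tcp_set_socket_timeouts(). It is not kernel code: the helper names sketch_syncnt() and sketch_syn_give_up_s() are hypothetical, all values are whole seconds, and the give-up estimate assumes the first SYN retransmission fires after 1 s and that the timer doubles on each retry with no upper cap. It also assumes tcp_sock_set_syncnt() rejects values below 1.

```c
/*
 * Hypothetical user-space mirror of the SYN-count loop added to
 * xs_tcp_set_socket_timeouts(). All values are whole seconds.
 * Returns the value the patch would pass to tcp_sock_set_syncnt().
 */
static long sketch_syncnt(unsigned long connect_timeout_s,
			  unsigned long sysctl_syn_retries)
{
	/* Both inputs are clamped to at least 1, like max_t(..., 1) */
	unsigned long connect_timeout = connect_timeout_s ? connect_timeout_s : 1;
	unsigned long syn_retries = sysctl_syn_retries ? sysctl_syn_retries : 1;
	unsigned long t;

	/* Smallest t with 2^t >= connect_timeout, but never above syn_retries */
	for (t = 0; t < syn_retries && (1UL << t) < connect_timeout; t++)
		;
	/*
	 * The patch passes t - 1; a result below 1 is assumed to be rejected
	 * by tcp_sock_set_syncnt(), leaving the sysctl default in effect.
	 */
	return (long)t - 1;
}

/*
 * Rough time (s) before a connect attempt is abandoned after `syncnt`
 * SYN retransmissions, ASSUMING a 1 s initial timeout that doubles on
 * every retry with no cap: 1 + 2 + ... + 2^syncnt = 2^(syncnt + 1) - 1.
 */
static unsigned long sketch_syn_give_up_s(long syncnt)
{
	if (syncnt < 0)
		return 0; /* no valid SYN count; not modelled */
	return (1UL << (syncnt + 1)) - 1;
}
```

With an illustrative 30 s connect timeout and the sysctl at 6, the loop stops at t = 5, so sketch_syncnt(30, 6) returns 4 and the estimated give-up time is 2^5 - 1 = 31 s, close to the requested 30 s. With the same sysctl, a 10 s timeout yields 3 and a 180 s timeout yields 5.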
* [PATCH 2/4] SUNRPC: Refactor and simplify connect timeout
From: trondmy @ 2023-08-18 13:41 UTC
To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Instead of requiring the requests to redrive the connection several
times, just let the TCP connect code manage it now that we've adjusted
the TCP_SYNCNT value.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprtsock.c | 35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index ba045187cf65..f1909c22cea3 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2266,6 +2266,26 @@ static void xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt,
 		tcp_sock_set_syncnt(sock->sk, t - 1);
 }
 
+static void xs_tcp_do_set_connect_timeout(struct rpc_xprt *xprt,
+					  unsigned long connect_timeout)
+{
+	struct sock_xprt *transport =
+		container_of(xprt, struct sock_xprt, xprt);
+	struct rpc_timeout to;
+	unsigned long initval;
+
+	memcpy(&to, xprt->timeout, sizeof(to));
+	/* Arbitrary lower limit */
+	initval = max_t(unsigned long, connect_timeout,
+			XS_TCP_INIT_REEST_TO << 1);
+	to.to_initval = initval;
+	to.to_maxval = initval;
+	to.to_retries = 0;
+	memcpy(&transport->tcp_timeout, &to, sizeof(transport->tcp_timeout));
+	xprt->timeout = &transport->tcp_timeout;
+	xprt->connect_timeout = connect_timeout;
+}
+
 static void xs_tcp_set_connect_timeout(struct rpc_xprt *xprt,
 		unsigned long connect_timeout,
 		unsigned long reconnect_timeout)
@@ -2277,19 +2297,8 @@ static void xs_tcp_set_connect_timeout(struct rpc_xprt *xprt,
 	spin_lock(&xprt->transport_lock);
 	if (reconnect_timeout < xprt->max_reconnect_timeout)
 		xprt->max_reconnect_timeout = reconnect_timeout;
-	if (connect_timeout < xprt->connect_timeout) {
-		memcpy(&to, xprt->timeout, sizeof(to));
-		initval = DIV_ROUND_UP(connect_timeout, to.to_retries + 1);
-		/* Arbitrary lower limit */
-		if (initval < XS_TCP_INIT_REEST_TO << 1)
-			initval = XS_TCP_INIT_REEST_TO << 1;
-		to.to_initval = initval;
-		to.to_maxval = initval;
-		memcpy(&transport->tcp_timeout, &to,
-				sizeof(transport->tcp_timeout));
-		xprt->timeout = &transport->tcp_timeout;
-		xprt->connect_timeout = connect_timeout;
-	}
+	if (connect_timeout < xprt->connect_timeout)
+		xs_tcp_do_set_connect_timeout(xprt, connect_timeout);
 	set_bit(XPRT_SOCK_UPD_TIMEOUT, &transport->sock_state);
 	spin_unlock(&xprt->transport_lock);
 }
-- 
2.41.0

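The effect of this refactor is easiest to see by comparing the old and new timeout computations side by side. The sketch below is a hypothetical user-space model, not the kernel code: struct sketch_timeout stands in for struct rpc_timeout, all values are in seconds rather than jiffies, and min_initval stands in for the XS_TCP_INIT_REEST_TO << 1 lower limit (the value used in the example is an assumption).

```c
/* Hypothetical stand-in for struct rpc_timeout; units are seconds. */
struct sketch_timeout {
	unsigned long to_initval;
	unsigned long to_maxval;
	unsigned int to_retries;
};

/* Old behaviour: split the connect timeout across to_retries + 1 attempts. */
static struct sketch_timeout sketch_old(struct sketch_timeout to,
					unsigned long connect_timeout,
					unsigned long min_initval)
{
	/* DIV_ROUND_UP(connect_timeout, to_retries + 1) */
	unsigned long initval = (connect_timeout + to.to_retries) /
				(to.to_retries + 1UL);

	if (initval < min_initval)
		initval = min_initval;
	to.to_initval = initval;
	to.to_maxval = initval;
	return to; /* to_retries is left unchanged */
}

/* New behaviour: a single attempt lasting the whole connect timeout. */
static struct sketch_timeout sketch_new(struct sketch_timeout to,
					unsigned long connect_timeout,
					unsigned long min_initval)
{
	unsigned long initval = connect_timeout > min_initval ?
				connect_timeout : min_initval;

	to.to_initval = initval;
	to.to_maxval = initval;
	to.to_retries = 0;
	return to;
}
```

Starting from an illustrative 60 s initial timeout with 2 retries, a 30 s connect timeout and a 6 s lower limit, the old code yields three 10 s attempts (to_initval 10, to_retries 2), while the new code yields one 30 s attempt (to_initval 30, to_retries 0), leaving the SYN retransmissions configured in patch 1 to pace the actual connection attempt.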
* [PATCH 3/4] SUNRPC: Allow specification of TCP client connect timeout at setup
From: trondmy @ 2023-08-18 13:41 UTC
To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

When we create a TCP transport, the connect timeout parameters are
currently fixed to be 90s. This is problematic in the pNFS flexfiles
case, where we may have multiple mirrors, and we would like to fail over
quickly to the next mirror if a data server is down.

This patch adds the ability to specify the connection parameters at RPC
client creation time.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/clnt.h | 2 ++
 include/linux/sunrpc/xprt.h | 2 ++
 net/sunrpc/clnt.c           | 2 ++
 net/sunrpc/xprtsock.c       | 9 +++++++--
 4 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 4f41d839face..af7358277f1c 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -148,6 +148,8 @@ struct rpc_create_args {
 	const struct cred	*cred;
 	unsigned int		max_connect;
 	struct xprtsec_parms	xprtsec;
+	unsigned long		connect_timeout;
+	unsigned long		reconnect_timeout;
 };
 
 struct rpc_add_xprt_test {
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index b52411bcfe4e..4ecc89301eb7 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -351,6 +351,8 @@ struct xprt_create {
 	struct rpc_xprt_switch	*bc_xps;
 	unsigned int		flags;
 	struct xprtsec_parms	xprtsec;
+	unsigned long		connect_timeout;
+	unsigned long		reconnect_timeout;
 };
 
 struct xprt_class {
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index d7c697af3762..9edebfdb5ce1 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -534,6 +534,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
 		.servername = args->servername,
 		.bc_xprt = args->bc_xprt,
 		.xprtsec = args->xprtsec,
+		.connect_timeout = args->connect_timeout,
+		.reconnect_timeout = args->reconnect_timeout,
 	};
 	char servername[48];
 	struct rpc_clnt *clnt;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index f1909c22cea3..77420362e525 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2291,8 +2291,6 @@ static void xs_tcp_set_connect_timeout(struct rpc_xprt *xprt,
 		unsigned long reconnect_timeout)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
-	struct rpc_timeout to;
-	unsigned long initval;
 
 	spin_lock(&xprt->transport_lock);
 	if (reconnect_timeout < xprt->max_reconnect_timeout)
@@ -3351,8 +3349,15 @@ static struct rpc_xprt *xs_setup_tcp(struct xprt_create *args)
 	xprt->timeout = &xs_tcp_default_timeout;
 
 	xprt->max_reconnect_timeout = xprt->timeout->to_maxval;
+	if (args->reconnect_timeout &&
+	    args->reconnect_timeout < xprt->max_reconnect_timeout)
+		xprt->max_reconnect_timeout = args->reconnect_timeout;
+
 	xprt->connect_timeout = xprt->timeout->to_initval *
 		(xprt->timeout->to_retries + 1);
+	if (args->connect_timeout &&
+	    args->connect_timeout < xprt->connect_timeout)
+		xs_tcp_do_set_connect_timeout(xprt, args->connect_timeout);
 
 	INIT_WORK(&transport->recv_worker, xs_stream_data_receive_workfn);
 	INIT_WORK(&transport->error_worker, xs_error_handle);
-- 
2.41.0

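The rule xs_setup_tcp() applies to the two new fields fits in a few lines: zero means "not specified", and a caller-supplied value only takes effect when it is shorter than the transport default. The sketch below is a hypothetical user-space model in seconds; sketch_pick_timeout() and sketch_default_connect() are illustrative names, and the 30 s / 2-retry input in the example is illustrative rather than the actual kernel default.

```c
/* Hypothetical model of the override rule added to xs_setup_tcp(). */
static unsigned long sketch_pick_timeout(unsigned long dflt,
					 unsigned long requested)
{
	/* 0 == "not specified"; a caller may only shorten the default */
	if (requested && requested < dflt)
		return requested;
	return dflt;
}

/* Default connect timeout as computed in xs_setup_tcp():
 * to_initval * (to_retries + 1). */
static unsigned long sketch_default_connect(unsigned long to_initval,
					    unsigned int to_retries)
{
	return to_initval * (to_retries + 1UL);
}
```

With a 30 s initial timeout and 2 retries the default is 90 s; a requested 30 s connect timeout wins, while 0 or a longer value such as 120 s leaves 90 s in place. In the patch itself, a shortened connect timeout is applied through xs_tcp_do_set_connect_timeout(), which also replaces the transport's timeout parameters; the sketch only models which value is selected.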
* [PATCH 4/4] NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver
From: trondmy @ 2023-08-18 13:41 UTC
To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Ensure that the connect timeout for the pNFS flexfiles driver is of the
same order as the I/O timeout, so that we can fail over quickly when
trying to read from a data server that is down.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/client.c     | 2 ++
 fs/nfs/internal.h   | 2 ++
 fs/nfs/nfs3client.c | 3 +++
 fs/nfs/pnfs_nfs.c   | 3 +++
 4 files changed, 10 insertions(+)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index e4c5f193ed5e..44eca51b2808 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -517,6 +517,8 @@ int nfs_create_rpc_client(struct nfs_client *clp,
 		.authflavor	= flavor,
 		.cred		= cl_init->cred,
 		.xprtsec	= cl_init->xprtsec,
+		.connect_timeout = cl_init->connect_timeout,
+		.reconnect_timeout = cl_init->reconnect_timeout,
 	};
 
 	if (test_bit(NFS_CS_DISCRTRY, &clp->cl_flags))
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 0019c7578f9d..4d80925c94f7 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -82,6 +82,8 @@ struct nfs_client_initdata {
 	const struct rpc_timeout *timeparms;
 	const struct cred *cred;
 	struct xprtsec_parms xprtsec;
+	unsigned long connect_timeout;
+	unsigned long reconnect_timeout;
 };
 
 /*
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index eff3802c5e03..674c012868b1 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -86,6 +86,7 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
 		int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans)
 {
 	struct rpc_timeout ds_timeout;
+	unsigned long connect_timeout = ds_timeo * (ds_retrans + 1) * HZ / 10;
 	struct nfs_client *mds_clp = mds_srv->nfs_client;
 	struct nfs_client_initdata cl_init = {
 		.addr = ds_addr,
@@ -98,6 +99,8 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
 		.timeparms = &ds_timeout,
 		.cred = mds_srv->cred,
 		.xprtsec = mds_clp->cl_xprtsec,
+		.connect_timeout = connect_timeout,
+		.reconnect_timeout = connect_timeout,
 	};
 	struct nfs_client *clp;
 	char buf[INET6_ADDRSTRLEN + 1];
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index a0112ad4937a..a08cfda6fff1 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -852,6 +852,7 @@ static int _nfs4_pnfs_v3_ds_connect(struct nfs_server *mds_srv,
 {
 	struct nfs_client *clp = ERR_PTR(-EIO);
 	struct nfs4_pnfs_ds_addr *da;
+	unsigned long connect_timeout = timeo * (retrans + 1) * HZ / 10;
 	int status = 0;
 
 	dprintk("--> %s DS %s\n", __func__, ds->ds_remotestr);
@@ -870,6 +871,8 @@ static int _nfs4_pnfs_v3_ds_connect(struct nfs_server *mds_srv,
 			.dstaddr = (struct sockaddr *)&da->da_addr,
 			.addrlen = da->da_addrlen,
 			.servername = clp->cl_hostname,
+			.connect_timeout = connect_timeout,
+			.reconnect_timeout = connect_timeout,
 		};
 
 		if (da->da_transport != clp->cl_proto)
-- 
2.41.0

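The expression added to both nfs3_set_ds_client() and _nfs4_pnfs_v3_ds_connect() converts the NFS timeo value, which is in tenths of a second, into jiffies and scales it by the total number of transmissions (retrans + 1). A hypothetical user-space version is sketched below; sketch_ds_connect_timeout() is an illustrative name, and HZ is passed in as a parameter because the real value is a kernel build option.

```c
/*
 * Hypothetical model of:
 *     connect_timeout = timeo * (retrans + 1) * HZ / 10
 * timeo is in deciseconds; the result is in jiffies for the given hz.
 */
static unsigned long sketch_ds_connect_timeout(unsigned int timeo,
					       unsigned int retrans,
					       unsigned long hz)
{
	/* Multiply before dividing so sub-second timeo values survive */
	return (unsigned long)timeo * (retrans + 1UL) * hz / 10;
}
```

With illustrative values timeo=100 (10 s) and retrans=2, this gives 30 s worth of jiffies, i.e. 7500 at HZ=250 or 30000 at HZ=1000. Dividing by 10 last matters for small values: timeo=5 with retrans=0 at HZ=100 gives 50 jiffies (0.5 s) instead of 0. The same value is passed as both connect_timeout and reconnect_timeout, and it only takes effect when it is shorter than the transport default.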