* [PATCH v2 0/5] Improve failover times for pNFS mirroring
From: trondmy @ 2023-08-19 21:32 UTC
  To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

When a data server goes down, it can currently take 3 minutes for the RPC
connection attempt to give up and return control to the NFS layer. If the
file is mirrored, we usually want to fail the attempt to the downed data
server much earlier, and retry using one of the other mirrors.

This patchset sets the connect timeout to be closer to the I/O timeout
value for the case of pNFS to NFSv3 data servers.

v2:
 - Don't override connect timeouts in rpc_clnt_add_xprt()
 - Don't override specified connect timeouts at setup

Trond Myklebust (5):
  SUNRPC: Set the TCP_SYNCNT to match the socket timeout
  SUNRPC: Refactor and simplify connect timeout
  SUNRPC: Allow specification of TCP client connect timeout at setup
  SUNRPC: Don't override connect timeouts in rpc_clnt_add_xprt()
  NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver

 fs/nfs/client.c             |  2 ++
 fs/nfs/internal.h           |  2 ++
 fs/nfs/nfs3client.c         |  3 ++
 fs/nfs/pnfs_nfs.c           |  3 ++
 include/linux/sunrpc/clnt.h |  2 ++
 include/linux/sunrpc/xprt.h |  2 ++
 net/sunrpc/clnt.c           |  7 +++++
 net/sunrpc/xprtsock.c       | 55 +++++++++++++++++++++++++++----------
 8 files changed, 61 insertions(+), 15 deletions(-)

-- 
2.41.0

* [PATCH v2 1/5] SUNRPC: Set the TCP_SYNCNT to match the socket timeout
From: trondmy @ 2023-08-19 21:32 UTC
  To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Set the TCP SYN count so that we abort the connection attempt at around
the expected timeout value.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprtsock.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 9f010369100a..47d0b6a8c32e 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2230,9 +2230,13 @@ static void xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt,
 		struct socket *sock)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
+	struct net *net = sock_net(sock->sk);
+	unsigned long connect_timeout;
+	unsigned long syn_retries;
 	unsigned int keepidle;
 	unsigned int keepcnt;
 	unsigned int timeo;
+	unsigned long t;
 
 	spin_lock(&xprt->transport_lock);
 	keepidle = DIV_ROUND_UP(xprt->timeout->to_initval, HZ);
@@ -2250,6 +2254,16 @@ static void xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt,
 
 	/* TCP user timeout (see RFC5482) */
 	tcp_sock_set_user_timeout(sock->sk, timeo);
+
+	/* Connect timeout */
+	connect_timeout = max_t(unsigned long,
+				DIV_ROUND_UP(xprt->connect_timeout, HZ), 1);
+	syn_retries = max_t(unsigned long,
+			    READ_ONCE(net->ipv4.sysctl_tcp_syn_retries), 1);
+	for (t = 0; t <= syn_retries && (1UL << t) < connect_timeout; t++)
+		;
+	if (t <= syn_retries)
+		tcp_sock_set_syncnt(sock->sk, t - 1);
 }
 
 static void xs_tcp_set_connect_timeout(struct rpc_xprt *xprt,
-- 
2.41.0
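
To make the SYN-count loop above concrete, here is a user-space sketch that re-implements it outside the kernel. The names demo_syncnt(), demo_syn_giveup_secs() and DEMO_HZ are hypothetical, HZ is fixed at 1000 purely for illustration, and the give-up estimate assumes Linux's 1 s initial SYN retransmission timeout doubling on each retry (ignoring the TCP_RTO_MAX cap).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical HZ for illustration; the kernel's HZ is a build option. */
#define DEMO_HZ 1000UL

/* Same rounding as the kernel macro of this name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * User-space re-implementation of the loop added to
 * xs_tcp_set_socket_timeouts(). Returns 1 and stores in *syncnt the value
 * the patch would pass to tcp_sock_set_syncnt(), or returns 0 when the
 * patch leaves the net.ipv4.tcp_syn_retries default in place.
 * tcp_sock_set_syncnt() is understood to reject values below 1.
 */
static int demo_syncnt(unsigned long connect_timeout_jiffies,
		       unsigned long sysctl_syn_retries, long *syncnt)
{
	unsigned long connect_timeout, syn_retries, t;

	assert(syncnt != NULL);
	/* Connect timeout in whole seconds, at least 1 */
	connect_timeout = DIV_ROUND_UP(connect_timeout_jiffies, DEMO_HZ);
	if (connect_timeout < 1)
		connect_timeout = 1;
	syn_retries = sysctl_syn_retries < 1 ? 1 : sysctl_syn_retries;

	/* Smallest t with 2^t >= connect_timeout, bounded by syn_retries */
	for (t = 0; t <= syn_retries && (1UL << t) < connect_timeout; t++)
		;
	if (t > syn_retries)
		return 0;
	*syncnt = (long)t - 1;
	return 1;
}

/*
 * Rough time (seconds) until TCP gives up after n SYN retransmissions,
 * assuming a 1 s initial RTO that doubles each time and no RTO cap:
 * 1 + 2 + ... + 2^n = 2^(n+1) - 1.
 */
static unsigned long demo_syn_giveup_secs(long n)
{
	return (1UL << (n + 1)) - 1;
}
```

With net.ipv4.tcp_syn_retries at its default of 6, a 60 s connect timeout gives a SYN count of 5 (roughly 63 s before TCP gives up), a 10 s timeout gives 3 (roughly 15 s), and a 180 s timeout needs more retries than the sysctl allows, so the loop runs past syn_retries and the default is left untouched.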

* [PATCH v2 2/5] SUNRPC: Refactor and simplify connect timeout
From: trondmy @ 2023-08-19 21:32 UTC
  To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Instead of requiring the requests to redrive the connection several times,
just let the TCP connect code manage it now that we've adjusted the
TCP_SYNCNT value.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprtsock.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 47d0b6a8c32e..e558f0024fe5 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2266,6 +2266,25 @@ static void xs_tcp_set_socket_timeouts(struct rpc_xprt *xprt,
 		tcp_sock_set_syncnt(sock->sk, t - 1);
 }
 
+static void xs_tcp_do_set_connect_timeout(struct rpc_xprt *xprt,
+					  unsigned long connect_timeout)
+{
+	struct sock_xprt *transport =
+		container_of(xprt, struct sock_xprt, xprt);
+	struct rpc_timeout to;
+	unsigned long initval;
+
+	memcpy(&to, xprt->timeout, sizeof(to));
+	/* Arbitrary lower limit */
+	initval = max_t(unsigned long, connect_timeout, XS_TCP_INIT_REEST_TO);
+	to.to_initval = initval;
+	to.to_maxval = initval;
+	to.to_retries = 0;
+	memcpy(&transport->tcp_timeout, &to, sizeof(transport->tcp_timeout));
+	xprt->timeout = &transport->tcp_timeout;
+	xprt->connect_timeout = connect_timeout;
+}
+
 static void xs_tcp_set_connect_timeout(struct rpc_xprt *xprt,
 		unsigned long connect_timeout,
 		unsigned long reconnect_timeout)
@@ -2277,19 +2296,8 @@ static void xs_tcp_set_connect_timeout(struct rpc_xprt *xprt,
 	spin_lock(&xprt->transport_lock);
 	if (reconnect_timeout < xprt->max_reconnect_timeout)
 		xprt->max_reconnect_timeout = reconnect_timeout;
-	if (connect_timeout < xprt->connect_timeout) {
-		memcpy(&to, xprt->timeout, sizeof(to));
-		initval = DIV_ROUND_UP(connect_timeout, to.to_retries + 1);
-		/* Arbitrary lower limit */
-		if (initval < XS_TCP_INIT_REEST_TO << 1)
-			initval = XS_TCP_INIT_REEST_TO << 1;
-		to.to_initval = initval;
-		to.to_maxval = initval;
-		memcpy(&transport->tcp_timeout, &to,
-				sizeof(transport->tcp_timeout));
-		xprt->timeout = &transport->tcp_timeout;
-		xprt->connect_timeout = connect_timeout;
-	}
+	if (connect_timeout < xprt->connect_timeout)
+		xs_tcp_do_set_connect_timeout(xprt, connect_timeout);
 	set_bit(XPRT_SOCK_UPD_TIMEOUT, &transport->sock_state);
 	spin_unlock(&xprt->transport_lock);
 }
-- 
2.41.0
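
The behavioural change in this patch is easiest to see by computing the RPC-level timeout both ways. The sketch below is a simplified, hypothetical model (demo_old(), demo_new() and struct demo_rpc_timeout are not kernel names), and it assumes XS_TCP_INIT_REEST_TO is 3 s, as defined in xprtsock.c at the time; check that against your tree.

```c
#include <assert.h>

#define DEMO_HZ 1000UL /* hypothetical HZ for illustration */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
/* Assumed value of XS_TCP_INIT_REEST_TO (3 s) */
#define DEMO_INIT_REEST_TO (3UL * DEMO_HZ)

/* Simplified stand-in for the struct rpc_timeout fields used here */
struct demo_rpc_timeout {
	unsigned long to_initval;
	unsigned long to_maxval;
	unsigned int to_retries;
};

/* Before the patch: split the connect timeout across the RPC retries,
 * with a lower bound of twice XS_TCP_INIT_REEST_TO. */
static struct demo_rpc_timeout demo_old(struct demo_rpc_timeout to,
					unsigned long connect_timeout)
{
	unsigned long initval = DIV_ROUND_UP(connect_timeout, to.to_retries + 1);

	if (initval < DEMO_INIT_REEST_TO << 1)
		initval = DEMO_INIT_REEST_TO << 1;
	to.to_initval = initval;
	to.to_maxval = initval;
	return to;
}

/* After the patch, as in xs_tcp_do_set_connect_timeout(): one attempt
 * lasting the whole connect timeout (floored at XS_TCP_INIT_REEST_TO)
 * and no RPC-level retries. */
static struct demo_rpc_timeout demo_new(struct demo_rpc_timeout to,
					unsigned long connect_timeout)
{
	unsigned long initval = connect_timeout > DEMO_INIT_REEST_TO ?
				connect_timeout : DEMO_INIT_REEST_TO;

	to.to_initval = initval;
	to.to_maxval = initval;
	to.to_retries = 0;
	return to;
}
```

For a 60 s connect timeout with to_retries = 2, the old code made three 20 s attempts, while the new code makes a single 60 s attempt and leaves the SYN retransmissions to TCP, bounded by the TCP_SYNCNT value set in the previous patch.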

* [PATCH v2 3/5] SUNRPC: Allow specification of TCP client connect timeout at setup
From: trondmy @ 2023-08-19 21:32 UTC
  To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

When we create a TCP transport, the connect timeout parameters are
currently fixed at 90s. This is problematic in the pNFS flexfiles case,
where we may have multiple mirrors, and we would like to fail over quickly
to the next mirror if a data server is down.

This patch adds the ability to specify the connect timeout parameters at
RPC client creation time.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/clnt.h | 2 ++
 include/linux/sunrpc/xprt.h | 2 ++
 net/sunrpc/clnt.c           | 2 ++
 net/sunrpc/xprtsock.c       | 7 +++++--
 4 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 4f41d839face..af7358277f1c 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -148,6 +148,8 @@ struct rpc_create_args {
 	const struct cred	*cred;
 	unsigned int		max_connect;
 	struct xprtsec_parms	xprtsec;
+	unsigned long		connect_timeout;
+	unsigned long		reconnect_timeout;
 };
 
 struct rpc_add_xprt_test {
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index b52411bcfe4e..4ecc89301eb7 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -351,6 +351,8 @@ struct xprt_create {
 	struct rpc_xprt_switch	*bc_xps;
 	unsigned int		flags;
 	struct xprtsec_parms	xprtsec;
+	unsigned long		connect_timeout;
+	unsigned long		reconnect_timeout;
 };
 
 struct xprt_class {
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index d7c697af3762..9edebfdb5ce1 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -534,6 +534,8 @@ struct rpc_clnt *rpc_create(struct rpc_create_args *args)
 		.servername = args->servername,
 		.bc_xprt = args->bc_xprt,
 		.xprtsec = args->xprtsec,
+		.connect_timeout = args->connect_timeout,
+		.reconnect_timeout = args->reconnect_timeout,
 	};
 	char servername[48];
 	struct rpc_clnt *clnt;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index e558f0024fe5..6e845e51cbf3 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2290,8 +2290,6 @@ static void xs_tcp_set_connect_timeout(struct rpc_xprt *xprt,
 		unsigned long reconnect_timeout)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
-	struct rpc_timeout to;
-	unsigned long initval;
 
 	spin_lock(&xprt->transport_lock);
 	if (reconnect_timeout < xprt->max_reconnect_timeout)
@@ -3350,8 +3348,13 @@ static struct rpc_xprt *xs_setup_tcp(struct xprt_create *args)
 	xprt->timeout = &xs_tcp_default_timeout;
 
 	xprt->max_reconnect_timeout = xprt->timeout->to_maxval;
+	if (args->reconnect_timeout)
+		xprt->max_reconnect_timeout = args->reconnect_timeout;
+
 	xprt->connect_timeout = xprt->timeout->to_initval *
 		(xprt->timeout->to_retries + 1);
+	if (args->connect_timeout)
+		xs_tcp_do_set_connect_timeout(xprt, args->connect_timeout);
 
 	INIT_WORK(&transport->recv_worker, xs_stream_data_receive_workfn);
 	INIT_WORK(&transport->error_worker, xs_error_handle);
-- 
2.41.0
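
The setup-time rule added to xs_setup_tcp() is that a zero field in struct xprt_create means "not specified", so the transport keeps its built-in defaults. A minimal model of that rule follows; struct demo_xprt, demo_setup_tcp(), DEMO_HZ and the 60 s / 2-retry defaults used to exercise it are hypothetical, and the XS_TCP_INIT_REEST_TO floor is assumed to be 3 s.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HZ 1000UL /* hypothetical HZ for illustration */
#define DEMO_INIT_REEST_TO (3UL * DEMO_HZ) /* assumed XS_TCP_INIT_REEST_TO */

struct demo_rpc_timeout {
	unsigned long to_initval;
	unsigned long to_maxval;
	unsigned int to_retries;
};

/* Hypothetical subset of struct rpc_xprt touched by xs_setup_tcp() */
struct demo_xprt {
	struct demo_rpc_timeout timeout;
	unsigned long connect_timeout;
	unsigned long max_reconnect_timeout;
};

/* Models the setup-time logic: zero arguments keep the defaults */
static void demo_setup_tcp(struct demo_xprt *xprt,
			   const struct demo_rpc_timeout *def,
			   unsigned long arg_connect_timeout,
			   unsigned long arg_reconnect_timeout)
{
	assert(xprt != NULL && def != NULL);
	xprt->timeout = *def;

	xprt->max_reconnect_timeout = def->to_maxval;
	if (arg_reconnect_timeout)
		xprt->max_reconnect_timeout = arg_reconnect_timeout;

	xprt->connect_timeout = def->to_initval * (def->to_retries + 1);
	if (arg_connect_timeout) {
		/* Same effect as xs_tcp_do_set_connect_timeout() */
		unsigned long initval = arg_connect_timeout > DEMO_INIT_REEST_TO ?
					arg_connect_timeout : DEMO_INIT_REEST_TO;

		xprt->timeout.to_initval = initval;
		xprt->timeout.to_maxval = initval;
		xprt->timeout.to_retries = 0;
		xprt->connect_timeout = arg_connect_timeout;
	}
}
```

Leaving both arguments at zero reproduces the pre-patch behaviour exactly, which is why existing rpc_create() callers that never set the new fields are unaffected.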

* [PATCH v2 4/5] SUNRPC: Don't override connect timeouts in rpc_clnt_add_xprt()
From: trondmy @ 2023-08-19 21:32 UTC
  To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If the caller specifies the connect timeouts in the arguments to
rpc_clnt_add_xprt(), then we shouldn't override them.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/clnt.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 9edebfdb5ce1..943dc3897378 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -3071,6 +3071,11 @@ int rpc_clnt_add_xprt(struct rpc_clnt *clnt,
 	}
 	xprt->resvport = resvport;
 	xprt->reuseport = reuseport;
+
+	if (xprtargs->connect_timeout)
+		connect_timeout = xprtargs->connect_timeout;
+	if (xprtargs->reconnect_timeout)
+		reconnect_timeout = xprtargs->reconnect_timeout;
 	if (xprt->ops->set_connect_timeout != NULL)
 		xprt->ops->set_connect_timeout(xprt,
 				connect_timeout,
-- 
2.41.0
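
This patch interacts with the "only ever shorten" rule in xs_tcp_set_connect_timeout(): without it, the values rpc_clnt_add_xprt() would otherwise pass could override (shorten) timeouts the caller had explicitly asked for. The sketch below models both steps with hypothetical names (struct demo_xprt, demo_add_xprt_timeouts()); where the otherwise-passed values come from is outside the hunk, so they are simply inputs here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the transport state involved */
struct demo_xprt {
	unsigned long connect_timeout;
	unsigned long max_reconnect_timeout;
};

/*
 * Models the tail of rpc_clnt_add_xprt() after this patch, followed by
 * the rule xs_tcp_set_connect_timeout() applies. inherited_* stand for the
 * values computed before the hunk; zero args_* mean "not specified".
 */
static void demo_add_xprt_timeouts(struct demo_xprt *xprt,
				   unsigned long inherited_connect,
				   unsigned long inherited_reconnect,
				   unsigned long args_connect,
				   unsigned long args_reconnect)
{
	unsigned long connect_timeout = inherited_connect;
	unsigned long reconnect_timeout = inherited_reconnect;

	assert(xprt != NULL);
	/* Caller-specified values win over the inherited ones */
	if (args_connect)
		connect_timeout = args_connect;
	if (args_reconnect)
		reconnect_timeout = args_reconnect;

	/* xs_tcp_set_connect_timeout(): timeouts are only ever shortened */
	if (reconnect_timeout < xprt->max_reconnect_timeout)
		xprt->max_reconnect_timeout = reconnect_timeout;
	if (connect_timeout < xprt->connect_timeout)
		xprt->connect_timeout = connect_timeout;
}
```

For example, if a transport was set up with an explicit 60 s timeout and rpc_clnt_add_xprt() would otherwise pass 20 s, the explicit 60 s is now kept; when the caller specifies nothing, the smaller value still applies.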

* [PATCH v2 5/5] NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver
From: trondmy @ 2023-08-19 21:32 UTC
  To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Ensure that the connect timeout for the pNFS flexfiles driver is of the
same order as the I/O timeout, so that we can fail over quickly when
trying to read from a data server that is down.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/client.c     | 2 ++
 fs/nfs/internal.h   | 2 ++
 fs/nfs/nfs3client.c | 3 +++
 fs/nfs/pnfs_nfs.c   | 3 +++
 4 files changed, 10 insertions(+)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index e4c5f193ed5e..44eca51b2808 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -517,6 +517,8 @@ int nfs_create_rpc_client(struct nfs_client *clp,
 		.authflavor	= flavor,
 		.cred		= cl_init->cred,
 		.xprtsec	= cl_init->xprtsec,
+		.connect_timeout = cl_init->connect_timeout,
+		.reconnect_timeout = cl_init->reconnect_timeout,
 	};
 
 	if (test_bit(NFS_CS_DISCRTRY, &clp->cl_flags))
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 0019c7578f9d..4d80925c94f7 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -82,6 +82,8 @@ struct nfs_client_initdata {
 	const struct rpc_timeout *timeparms;
 	const struct cred *cred;
 	struct xprtsec_parms xprtsec;
+	unsigned long connect_timeout;
+	unsigned long reconnect_timeout;
 };
 
 /*
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index eff3802c5e03..674c012868b1 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -86,6 +86,7 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
 		int ds_proto, unsigned int ds_timeo, unsigned int ds_retrans)
 {
 	struct rpc_timeout ds_timeout;
+	unsigned long connect_timeout = ds_timeo * (ds_retrans + 1) * HZ / 10;
 	struct nfs_client *mds_clp = mds_srv->nfs_client;
 	struct nfs_client_initdata cl_init = {
 		.addr = ds_addr,
@@ -98,6 +99,8 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
 		.timeparms = &ds_timeout,
 		.cred = mds_srv->cred,
 		.xprtsec = mds_clp->cl_xprtsec,
+		.connect_timeout = connect_timeout,
+		.reconnect_timeout = connect_timeout,
 	};
 	struct nfs_client *clp;
 	char buf[INET6_ADDRSTRLEN + 1];
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index a0112ad4937a..a08cfda6fff1 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -852,6 +852,7 @@ static int _nfs4_pnfs_v3_ds_connect(struct nfs_server *mds_srv,
 {
 	struct nfs_client *clp = ERR_PTR(-EIO);
 	struct nfs4_pnfs_ds_addr *da;
+	unsigned long connect_timeout = timeo * (retrans + 1) * HZ / 10;
 	int status = 0;
 
 	dprintk("--> %s DS %s\n", __func__, ds->ds_remotestr);
@@ -870,6 +871,8 @@ static int _nfs4_pnfs_v3_ds_connect(struct nfs_server *mds_srv,
 			.dstaddr = (struct sockaddr *)&da->da_addr,
 			.addrlen = da->da_addrlen,
 			.servername = clp->cl_hostname,
+			.connect_timeout = connect_timeout,
+			.reconnect_timeout = connect_timeout,
 		};
 
 		if (da->da_transport != clp->cl_proto)
-- 
2.41.0
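
The new connect_timeout initialisers convert the data server's timeo value (tenths of a second) and retrans count into jiffies covering the first attempt plus every retry. A sketch of that arithmetic, using a hypothetical demo_ds_connect_timeout() helper and a hypothetical HZ of 1000:

```c
#include <assert.h>

#define DEMO_HZ 1000UL /* hypothetical; the kernel's HZ is a build option */

/*
 * Same formula as the connect_timeout initialisers in nfs3_set_ds_client()
 * and _nfs4_pnfs_v3_ds_connect(): timeo is in tenths of a second, and the
 * total covers the first attempt plus retrans retries. Multiplying by HZ
 * before dividing by 10 keeps sub-second precision; this sketch widens to
 * unsigned long first so the product cannot overflow an unsigned int.
 */
static unsigned long demo_ds_connect_timeout(unsigned int timeo,
					     unsigned int retrans)
{
	return (unsigned long)timeo * (retrans + 1UL) * DEMO_HZ / 10;
}
```

With hypothetical values timeo=100 (10 s) and retrans=2, the result is 30 s; the SYN-count loop from patch 1 would map that to a TCP_SYNCNT of 4, roughly 31 s of SYN retransmissions assuming a 1 s initial RTO that doubles on each retry. The same value is also used as the reconnect timeout.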