* [PATCH RFC 1/4] NFS: Add a mount option to make ENETUNREACH errors fatal
2025-03-20 17:44 [PATCH RFC 0/4] Containerised NFS clients and teardown trondmy
@ 2025-03-20 17:44 ` trondmy
2025-03-20 17:44 ` [PATCH RFC 2/4] NFS: Treat ENETUNREACH errors as fatal in containers trondmy
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: trondmy @ 2025-03-20 17:44 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Josef Bacik
From: Trond Myklebust <trond.myklebust@hammerspace.com>
If the NFS client was initially created in a container, and that
container is torn down, there is usually no possibility of going back
to destroy any NFS clients that are hung because their virtual network
devices have been unlinked.
Add a flag that tells the NFS client to treat ENETDOWN and ENETUNREACH
errors as fatal in these circumstances.
The option defaults to being on when the mount happens from inside a net
namespace that is not "init_net".
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs/fs_context.c | 38 ++++++++++++++++++++++++++++++++++++++
fs/nfs/super.c | 2 ++
include/linux/nfs_fs_sb.h | 1 +
3 files changed, 41 insertions(+)
diff --git a/fs/nfs/fs_context.c b/fs/nfs/fs_context.c
index 1cabba1231d6..5eb8c0a7833b 100644
--- a/fs/nfs/fs_context.c
+++ b/fs/nfs/fs_context.c
@@ -50,6 +50,7 @@ enum nfs_param {
Opt_clientaddr,
Opt_cto,
Opt_alignwrite,
+ Opt_fatal_neterrors,
Opt_fg,
Opt_fscache,
Opt_fscache_flag,
@@ -97,6 +98,20 @@ enum nfs_param {
Opt_xprtsec,
};
+enum {
+ Opt_fatal_neterrors_default,
+ Opt_fatal_neterrors_enetunreach,
+ Opt_fatal_neterrors_none,
+};
+
+static const struct constant_table nfs_param_enums_fatal_neterrors[] = {
+ { "default", Opt_fatal_neterrors_default },
+ { "enetdown:enetunreach", Opt_fatal_neterrors_enetunreach },
+ { "enetunreach:enetdown", Opt_fatal_neterrors_enetunreach },
+ { "none", Opt_fatal_neterrors_none },
+ {}
+};
+
enum {
Opt_local_lock_all,
Opt_local_lock_flock,
@@ -153,6 +168,7 @@ static const struct fs_parameter_spec nfs_fs_parameters[] = {
fsparam_string("clientaddr", Opt_clientaddr),
fsparam_flag_no("cto", Opt_cto),
fsparam_flag_no("alignwrite", Opt_alignwrite),
+ fsparam_enum ("fatal_errors", Opt_fatal_neterrors, nfs_param_enums_fatal_neterrors),
fsparam_flag ("fg", Opt_fg),
fsparam_flag_no("fsc", Opt_fscache_flag),
fsparam_string("fsc", Opt_fscache),
@@ -896,6 +912,25 @@ static int nfs_fs_context_parse_param(struct fs_context *fc,
goto out_of_bounds;
ctx->nfs_server.max_connect = result.uint_32;
break;
+ case Opt_fatal_neterrors:
+ trace_nfs_mount_assign(param->key, param->string);
+ switch (result.uint_32) {
+ case Opt_fatal_neterrors_default:
+ if (fc->net_ns != &init_net)
+ ctx->flags |= NFS_MOUNT_NETUNREACH_FATAL;
+ else
+ ctx->flags &= ~NFS_MOUNT_NETUNREACH_FATAL;
+ break;
+ case Opt_fatal_neterrors_enetunreach:
+ ctx->flags |= NFS_MOUNT_NETUNREACH_FATAL;
+ break;
+ case Opt_fatal_neterrors_none:
+ ctx->flags &= ~NFS_MOUNT_NETUNREACH_FATAL;
+ break;
+ default:
+ goto out_invalid_value;
+ }
+ break;
case Opt_lookupcache:
trace_nfs_mount_assign(param->key, param->string);
switch (result.uint_32) {
@@ -1675,6 +1710,9 @@ static int nfs_init_fs_context(struct fs_context *fc)
ctx->xprtsec.cert_serial = TLS_NO_CERT;
ctx->xprtsec.privkey_serial = TLS_NO_PRIVKEY;
+ if (fc->net_ns != &init_net)
+ ctx->flags |= NFS_MOUNT_NETUNREACH_FATAL;
+
fc->s_iflags |= SB_I_STABLE_WRITES;
}
fc->fs_private = ctx;
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 96de658a7886..23ed1ed67a10 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -457,6 +457,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
{ NFS_MOUNT_FORCE_RDIRPLUS, ",rdirplus=force", "" },
{ NFS_MOUNT_UNSHARED, ",nosharecache", "" },
{ NFS_MOUNT_NORESVPORT, ",noresvport", "" },
+ { NFS_MOUNT_NETUNREACH_FATAL,
+ ",fatal_neterrors=enetdown:enetunreach", "" },
{ 0, NULL, NULL }
};
const struct proc_nfs_info *nfs_infop;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b83d16a42afc..a6ce8590eaaf 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -168,6 +168,7 @@ struct nfs_server {
#define NFS_MOUNT_SHUTDOWN 0x08000000
#define NFS_MOUNT_NO_ALIGNWRITE 0x10000000
#define NFS_MOUNT_FORCE_RDIRPLUS 0x20000000
+#define NFS_MOUNT_NETUNREACH_FATAL 0x40000000
unsigned int fattr_valid; /* Valid attributes */
unsigned int caps; /* server capabilities */
--
2.48.1
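For readers skimming the diff, the semantics of the new option can be sketched in user space. This is an illustrative model of the Opt_fatal_neterrors switch above, not the kernel code itself: the helper name and the bool standing in for the fc->net_ns != &init_net test are invented for the sketch, while the flag value and enum names mirror the patch.

```c
#include <stdbool.h>

#define NFS_MOUNT_NETUNREACH_FATAL 0x40000000

enum {
	OPT_FATAL_NETERRORS_DEFAULT,
	OPT_FATAL_NETERRORS_ENETUNREACH,
	OPT_FATAL_NETERRORS_NONE,
};

/* in_init_net models the fc->net_ns == &init_net test at mount time. */
static unsigned int resolve_fatal_neterrors(int opt, bool in_init_net,
					    unsigned int flags)
{
	switch (opt) {
	case OPT_FATAL_NETERRORS_DEFAULT:
		/* "default": fatal only inside a non-init net namespace */
		if (!in_init_net)
			flags |= NFS_MOUNT_NETUNREACH_FATAL;
		else
			flags &= ~NFS_MOUNT_NETUNREACH_FATAL;
		break;
	case OPT_FATAL_NETERRORS_ENETUNREACH:
		/* "enetdown:enetunreach": always fatal */
		flags |= NFS_MOUNT_NETUNREACH_FATAL;
		break;
	case OPT_FATAL_NETERRORS_NONE:
		/* "none": preserve the historical hard-mount behaviour */
		flags &= ~NFS_MOUNT_NETUNREACH_FATAL;
		break;
	}
	return flags;
}
```

Note how "default" both sets and clears the flag, so a later fatal_errors=default on the command line can undo an earlier explicit setting, matching the parser above.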
* [PATCH RFC 2/4] NFS: Treat ENETUNREACH errors as fatal in containers
2025-03-20 17:44 [PATCH RFC 0/4] Containerised NFS clients and teardown trondmy
2025-03-20 17:44 ` [PATCH RFC 1/4] NFS: Add a mount option to make ENETUNREACH errors fatal trondmy
@ 2025-03-20 17:44 ` trondmy
2025-03-20 17:44 ` [PATCH RFC 3/4] pNFS/flexfiles: " trondmy
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: trondmy @ 2025-03-20 17:44 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Josef Bacik
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag into the generic NFS
client. If the flag is set, the client will receive ENETDOWN and
ENETUNREACH errors from the RPC layer, and is expected to treat them as
fatal.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs/client.c | 5 +++++
fs/nfs/nfs4proc.c | 3 +++
include/linux/nfs_fs_sb.h | 1 +
include/linux/sunrpc/clnt.h | 5 ++++-
include/linux/sunrpc/sched.h | 1 +
net/sunrpc/clnt.c | 30 ++++++++++++++++++++++--------
6 files changed, 36 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 3b0918ade53c..02c916a55020 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -546,6 +546,8 @@ int nfs_create_rpc_client(struct nfs_client *clp,
args.flags |= RPC_CLNT_CREATE_NOPING;
if (test_bit(NFS_CS_REUSEPORT, &clp->cl_flags))
args.flags |= RPC_CLNT_CREATE_REUSEPORT;
+ if (test_bit(NFS_CS_NETUNREACH_FATAL, &clp->cl_flags))
+ args.flags |= RPC_CLNT_CREATE_NETUNREACH_FATAL;
if (!IS_ERR(clp->cl_rpcclient))
return 0;
@@ -709,6 +711,9 @@ static int nfs_init_server(struct nfs_server *server,
if (ctx->flags & NFS_MOUNT_NORESVPORT)
set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
+ if (ctx->flags & NFS_MOUNT_NETUNREACH_FATAL)
+ __set_bit(NFS_CS_NETUNREACH_FATAL, &cl_init.init_flags);
+
/* Allocate or find a client reference we can use */
clp = nfs_get_client(&cl_init);
if (IS_ERR(clp))
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index f195e7ceca1b..6fb3708560cf 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -195,6 +195,9 @@ static int nfs4_map_errors(int err)
return -EBUSY;
case -NFS4ERR_NOT_SAME:
return -ENOTSYNC;
+ case -ENETDOWN:
+ case -ENETUNREACH:
+ break;
default:
dprintk("%s could not handle NFSv4 error %d\n",
__func__, -err);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index a6ce8590eaaf..71319637a84e 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -50,6 +50,7 @@ struct nfs_client {
#define NFS_CS_DS 7 /* - Server is a DS */
#define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
#define NFS_CS_PNFS 9 /* - Server used for pnfs */
+#define NFS_CS_NETUNREACH_FATAL 10 /* - ENETUNREACH errors are fatal */
struct sockaddr_storage cl_addr; /* server identifier */
size_t cl_addrlen;
char * cl_hostname; /* hostname of server */
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 9ad353f26f4f..5d9cdf31853c 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -64,7 +64,9 @@ struct rpc_clnt {
cl_noretranstimeo: 1,/* No retransmit timeouts */
cl_autobind : 1,/* use getport() */
cl_chatty : 1,/* be verbose */
- cl_shutdown : 1;/* rpc immediate -EIO */
+ cl_shutdown : 1,/* rpc immediate -EIO */
+ cl_netunreach_fatal : 1;
+ /* Treat ENETUNREACH errors as fatal */
struct xprtsec_parms cl_xprtsec; /* transport security policy */
struct rpc_rtt * cl_rtt; /* RTO estimator data */
@@ -175,6 +177,7 @@ struct rpc_add_xprt_test {
#define RPC_CLNT_CREATE_SOFTERR (1UL << 10)
#define RPC_CLNT_CREATE_REUSEPORT (1UL << 11)
#define RPC_CLNT_CREATE_CONNECTED (1UL << 12)
+#define RPC_CLNT_CREATE_NETUNREACH_FATAL (1UL << 13)
struct rpc_clnt *rpc_create(struct rpc_create_args *args);
struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *,
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index eac57914dcf3..ccba79ebf893 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -134,6 +134,7 @@ struct rpc_task_setup {
#define RPC_TASK_MOVEABLE 0x0004 /* nfs4.1+ rpc tasks */
#define RPC_TASK_NULLCREDS 0x0010 /* Use AUTH_NULL credential */
#define RPC_CALL_MAJORSEEN 0x0020 /* major timeout seen */
+#define RPC_TASK_NETUNREACH_FATAL 0x0040 /* ENETUNREACH is fatal */
#define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */
#define RPC_TASK_NO_ROUND_ROBIN 0x0100 /* send requests on "main" xprt */
#define RPC_TASK_SOFT 0x0200 /* Use soft timeouts */
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 3499b17ffea7..45f0154a0d07 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -512,6 +512,8 @@ static struct rpc_clnt *rpc_create_xprt(struct rpc_create_args *args,
clnt->cl_discrtry = 1;
if (!(args->flags & RPC_CLNT_CREATE_QUIET))
clnt->cl_chatty = 1;
+ if (args->flags & RPC_CLNT_CREATE_NETUNREACH_FATAL)
+ clnt->cl_netunreach_fatal = 1;
return clnt;
}
@@ -662,6 +664,7 @@ static struct rpc_clnt *__rpc_clone_client(struct rpc_create_args *args,
new->cl_noretranstimeo = clnt->cl_noretranstimeo;
new->cl_discrtry = clnt->cl_discrtry;
new->cl_chatty = clnt->cl_chatty;
+ new->cl_netunreach_fatal = clnt->cl_netunreach_fatal;
new->cl_principal = clnt->cl_principal;
new->cl_max_connect = clnt->cl_max_connect;
return new;
@@ -1195,6 +1198,8 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
task->tk_flags |= RPC_TASK_TIMEOUT;
if (clnt->cl_noretranstimeo)
task->tk_flags |= RPC_TASK_NO_RETRANS_TIMEOUT;
+ if (clnt->cl_netunreach_fatal)
+ task->tk_flags |= RPC_TASK_NETUNREACH_FATAL;
atomic_inc(&clnt->cl_task_count);
}
@@ -2101,14 +2106,17 @@ call_bind_status(struct rpc_task *task)
case -EPROTONOSUPPORT:
trace_rpcb_bind_version_err(task);
goto retry_timeout;
+ case -ENETDOWN:
+ case -ENETUNREACH:
+ if (task->tk_flags & RPC_TASK_NETUNREACH_FATAL)
+ break;
+ fallthrough;
case -ECONNREFUSED: /* connection problems */
case -ECONNRESET:
case -ECONNABORTED:
case -ENOTCONN:
case -EHOSTDOWN:
- case -ENETDOWN:
case -EHOSTUNREACH:
- case -ENETUNREACH:
case -EPIPE:
trace_rpcb_unreachable_err(task);
if (!RPC_IS_SOFTCONN(task)) {
@@ -2190,19 +2198,22 @@ call_connect_status(struct rpc_task *task)
task->tk_status = 0;
switch (status) {
+ case -ENETDOWN:
+ case -ENETUNREACH:
+ if (task->tk_flags & RPC_TASK_NETUNREACH_FATAL)
+ break;
+ fallthrough;
case -ECONNREFUSED:
case -ECONNRESET:
/* A positive refusal suggests a rebind is needed. */
- if (RPC_IS_SOFTCONN(task))
- break;
if (clnt->cl_autobind) {
rpc_force_rebind(clnt);
+ if (RPC_IS_SOFTCONN(task))
+ break;
goto out_retry;
}
fallthrough;
case -ECONNABORTED:
- case -ENETDOWN:
- case -ENETUNREACH:
case -EHOSTUNREACH:
case -EPIPE:
case -EPROTO:
@@ -2454,10 +2465,13 @@ call_status(struct rpc_task *task)
trace_rpc_call_status(task);
task->tk_status = 0;
switch(status) {
- case -EHOSTDOWN:
case -ENETDOWN:
- case -EHOSTUNREACH:
case -ENETUNREACH:
+ if (task->tk_flags & RPC_TASK_NETUNREACH_FATAL)
+ goto out_exit;
+ fallthrough;
+ case -EHOSTDOWN:
+ case -EHOSTUNREACH:
case -EPERM:
if (RPC_IS_SOFTCONN(task))
goto out_exit;
--
2.48.1
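As a rough model of the control flow this patch introduces, the new ENETDOWN/ENETUNREACH cases in call_status() can be sketched in user space. The flag values are copied from include/linux/sunrpc/sched.h; the function name and the action enum are invented for illustration, and the real code distinguishes "exit" paths more finely than this sketch does.

```c
#include <errno.h>

/* Flag values mirror include/linux/sunrpc/sched.h. */
#define RPC_TASK_NETUNREACH_FATAL 0x0040	/* new in this patch */
#define RPC_TASK_SOFTCONN	  0x0400

enum rpc_action { RPC_EXIT, RPC_RETRY };

/* Sketch of the call_status() dispatch after this patch: ENETDOWN and
 * ENETUNREACH exit immediately when RPC_TASK_NETUNREACH_FATAL is set,
 * and otherwise fall through to the soft-connection handling shared
 * with EHOSTDOWN/EHOSTUNREACH/EPERM. */
static enum rpc_action classify_status(int status, unsigned int tk_flags)
{
	switch (status) {
	case -ENETDOWN:
	case -ENETUNREACH:
		if (tk_flags & RPC_TASK_NETUNREACH_FATAL)
			return RPC_EXIT;
		/* fallthrough */
	case -EHOSTDOWN:
	case -EHOSTUNREACH:
	case -EPERM:
		if (tk_flags & RPC_TASK_SOFTCONN)
			return RPC_EXIT;
		return RPC_RETRY;
	default:
		return RPC_RETRY;
	}
}
```

The key design point is the placement of the new cases above the old ones, so that a non-fatal ENETUNREACH keeps its existing retry behaviour via fallthrough.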
* [PATCH RFC 3/4] pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
2025-03-20 17:44 [PATCH RFC 0/4] Containerised NFS clients and teardown trondmy
2025-03-20 17:44 ` [PATCH RFC 1/4] NFS: Add a mount option to make ENETUNREACH errors fatal trondmy
2025-03-20 17:44 ` [PATCH RFC 2/4] NFS: Treat ENETUNREACH errors as fatal in containers trondmy
@ 2025-03-20 17:44 ` trondmy
2025-03-20 17:44 ` [PATCH RFC 4/4] pNFS/flexfiles: Report ENETDOWN as a connection error trondmy
2025-03-20 19:32 ` [PATCH RFC 0/4] Containerised NFS clients and teardown Jeff Layton
4 siblings, 0 replies; 7+ messages in thread
From: trondmy @ 2025-03-20 17:44 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Josef Bacik
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag to the pNFS flexfiles
client. When the flag is set, the client needs to treat ENETDOWN and
ENETUNREACH errors as fatal, and should abandon the attempted I/O.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 23 +++++++++++++++++++++--
fs/nfs/nfs3client.c | 2 ++
fs/nfs/nfs4client.c | 5 +++++
include/linux/nfs4.h | 1 +
4 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 98b45b636be3..f89fdba7289d 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1154,10 +1154,14 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task,
rpc_wake_up(&tbl->slot_tbl_waitq);
goto reset;
/* RPC connection errors */
+ case -ENETDOWN:
+ case -ENETUNREACH:
+ if (test_bit(NFS_CS_NETUNREACH_FATAL, &clp->cl_flags))
+ return -NFS4ERR_FATAL_IOERROR;
+ fallthrough;
case -ECONNREFUSED:
case -EHOSTDOWN:
case -EHOSTUNREACH:
- case -ENETUNREACH:
case -EIO:
case -ETIMEDOUT:
case -EPIPE:
@@ -1183,6 +1187,7 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task,
/* Retry all errors through either pNFS or MDS except for -EJUKEBOX */
static int ff_layout_async_handle_error_v3(struct rpc_task *task,
+ struct nfs_client *clp,
struct pnfs_layout_segment *lseg,
u32 idx)
{
@@ -1200,6 +1205,11 @@ static int ff_layout_async_handle_error_v3(struct rpc_task *task,
case -EJUKEBOX:
nfs_inc_stats(lseg->pls_layout->plh_inode, NFSIOS_DELAY);
goto out_retry;
+ case -ENETDOWN:
+ case -ENETUNREACH:
+ if (test_bit(NFS_CS_NETUNREACH_FATAL, &clp->cl_flags))
+ return -NFS4ERR_FATAL_IOERROR;
+ fallthrough;
default:
dprintk("%s DS connection error %d\n", __func__,
task->tk_status);
@@ -1234,7 +1244,7 @@ static int ff_layout_async_handle_error(struct rpc_task *task,
switch (vers) {
case 3:
- return ff_layout_async_handle_error_v3(task, lseg, idx);
+ return ff_layout_async_handle_error_v3(task, clp, lseg, idx);
case 4:
return ff_layout_async_handle_error_v4(task, state, clp,
lseg, idx);
@@ -1337,6 +1347,9 @@ static int ff_layout_read_done_cb(struct rpc_task *task,
return task->tk_status;
case -EAGAIN:
goto out_eagain;
+ case -NFS4ERR_FATAL_IOERROR:
+ task->tk_status = -EIO;
+ return 0;
}
return 0;
@@ -1507,6 +1520,9 @@ static int ff_layout_write_done_cb(struct rpc_task *task,
return task->tk_status;
case -EAGAIN:
return -EAGAIN;
+ case -NFS4ERR_FATAL_IOERROR:
+ task->tk_status = -EIO;
+ return 0;
}
if (hdr->res.verf->committed == NFS_FILE_SYNC ||
@@ -1551,6 +1567,9 @@ static int ff_layout_commit_done_cb(struct rpc_task *task,
case -EAGAIN:
rpc_restart_call_prepare(task);
return -EAGAIN;
+ case -NFS4ERR_FATAL_IOERROR:
+ task->tk_status = -EIO;
+ return 0;
}
ff_layout_set_layoutcommit(data->inode, data->lseg, data->lwb);
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index b0c8a39c2bbd..0d7310c1ee0c 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -120,6 +120,8 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
__set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
+ if (test_bit(NFS_CS_NETUNREACH_FATAL, &mds_clp->cl_flags))
+ __set_bit(NFS_CS_NETUNREACH_FATAL, &cl_init.init_flags);
__set_bit(NFS_CS_DS, &cl_init.init_flags);
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 83378f69b35e..96e7a17203a4 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -937,6 +937,9 @@ static int nfs4_set_client(struct nfs_server *server,
__set_bit(NFS_CS_TSM_POSSIBLE, &cl_init.init_flags);
server->port = rpc_get_port((struct sockaddr *)addr);
+ if (server->options & NFS_MOUNT_NETUNREACH_FATAL)
+ __set_bit(NFS_CS_NETUNREACH_FATAL, &cl_init.init_flags);
+
/* Allocate or find a client reference we can use */
clp = nfs_get_client(&cl_init);
if (IS_ERR(clp))
@@ -1011,6 +1014,8 @@ struct nfs_client *nfs4_set_ds_client(struct nfs_server *mds_srv,
if (mds_srv->flags & NFS_MOUNT_NORESVPORT)
__set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
+ if (test_bit(NFS_CS_NETUNREACH_FATAL, &mds_clp->cl_flags))
+ __set_bit(NFS_CS_NETUNREACH_FATAL, &cl_init.init_flags);
__set_bit(NFS_CS_PNFS, &cl_init.init_flags);
cl_init.max_connect = NFS_MAX_TRANSPORTS;
diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 5fa60fe441b5..d8cad844870a 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -300,6 +300,7 @@ enum nfsstat4 {
/* error codes for internal client use */
#define NFS4ERR_RESET_TO_MDS 12001
#define NFS4ERR_RESET_TO_PNFS 12002
+#define NFS4ERR_FATAL_IOERROR 12003
static inline bool seqid_mutating_err(u32 err)
{
--
2.48.1
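The convention this patch adds can be summarised as: the DS error handlers return the new internal -NFS4ERR_FATAL_IOERROR code, and the read/write/commit done callbacks translate it into a final -EIO instead of retrying. A user-space sketch (struct and function names invented; only the error-code plumbing mirrors the patch):

```c
#include <errno.h>

#define NFS4ERR_FATAL_IOERROR 12003	/* internal code from nfs4.h */

struct fake_task { int tk_status; };

/* Sketch of the done-callback convention: a return of 0 means the RPC
 * result is final and callers consume task->tk_status; -EAGAIN means
 * "resend through pNFS or the MDS". -NFS4ERR_FATAL_IOERROR from the
 * error handler becomes a final -EIO, abandoning the I/O. */
static int done_cb(struct fake_task *task, int handler_err)
{
	switch (handler_err) {
	case -NFS4ERR_FATAL_IOERROR:
		task->tk_status = -EIO;	/* abandon the attempted I/O */
		return 0;
	case -EAGAIN:
		return -EAGAIN;		/* retry via another path */
	}
	return 0;
}
```

Keeping the fatal translation inside the done callbacks means all three I/O types (read, write, commit) share one exit point, which is why the patch repeats the same three-line case in each callback.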
* [PATCH RFC 4/4] pNFS/flexfiles: Report ENETDOWN as a connection error
2025-03-20 17:44 [PATCH RFC 0/4] Containerised NFS clients and teardown trondmy
` (2 preceding siblings ...)
2025-03-20 17:44 ` [PATCH RFC 3/4] pNFS/flexfiles: " trondmy
@ 2025-03-20 17:44 ` trondmy
2025-03-20 19:32 ` [PATCH RFC 0/4] Containerised NFS clients and teardown Jeff Layton
4 siblings, 0 replies; 7+ messages in thread
From: trondmy @ 2025-03-20 17:44 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Josef Bacik
From: Trond Myklebust <trond.myklebust@hammerspace.com>
If the client sees an ENETDOWN error when trying to connect to the data
server, it might still be able to talk to the metadata server through
another NIC. If so, report the ENETDOWN as a data-server connection
error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index f89fdba7289d..61ad269c825f 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1274,6 +1274,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
case -ECONNRESET:
case -EHOSTDOWN:
case -EHOSTUNREACH:
+ case -ENETDOWN:
case -ENETUNREACH:
case -EADDRINUSE:
case -ENOBUFS:
--
2.48.1
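The one-line change can be read as extending the set of errnos that ff_layout_io_track_ds_error() classifies as connection errors. A user-space sketch of that classification (predicate name invented; the errno list is taken from the cases visible in the hunk above, which may not be the complete kernel switch):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical predicate mirroring the connection-error cases shown in
 * the ff_layout_io_track_ds_error() hunk after this patch: ENETDOWN now
 * joins the list, so it is tracked and reported like its neighbours. */
static bool is_ds_connection_error(int err)
{
	switch (err) {
	case -ECONNRESET:
	case -EHOSTDOWN:
	case -EHOSTUNREACH:
	case -ENETDOWN:		/* newly added by this patch */
	case -ENETUNREACH:
	case -EADDRINUSE:
	case -ENOBUFS:
		return true;
	}
	return false;
}
```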
* Re: [PATCH RFC 0/4] Containerised NFS clients and teardown
2025-03-20 17:44 [PATCH RFC 0/4] Containerised NFS clients and teardown trondmy
` (3 preceding siblings ...)
2025-03-20 17:44 ` [PATCH RFC 4/4] pNFS/flexfiles: Report ENETDOWN as a connection error trondmy
@ 2025-03-20 19:32 ` Jeff Layton
2025-03-20 20:40 ` Trond Myklebust
4 siblings, 1 reply; 7+ messages in thread
From: Jeff Layton @ 2025-03-20 19:32 UTC (permalink / raw)
To: trondmy, linux-nfs; +Cc: Josef Bacik
[-- Attachment #1: Type: text/plain, Size: 6694 bytes --]
On Thu, 2025-03-20 at 13:44 -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> When an NFS client is started from inside a container, it is often not
> possible to ensure a safe shutdown and flush of the data before the
> container orchestrator steps in to tear down the network. Typically,
> what can happen is that the orchestrator triggers a lazy umount of the
> mounted filesystems, then proceeds to delete virtual network device
> links, bridges, NAT configurations, etc.
>
> Once that happens, it may be impossible to reach into the container to
> perform any further shutdown actions on the NFS client.
>
> This patchset proposes to allow the client to deal with these situations
> by treating the two errors ENETDOWN and ENETUNREACH as being fatal.
> The intention is to then allow the I/O queue to drain, and any remaining
> RPC calls to error out, so that the lazy umounts can complete the
> shutdown process.
>
> In order to do so, a new mount option "fatal_errors" is introduced,
> which can take the values "default", "none" and "enetdown:enetunreach".
> The value "none" forces the existing behaviour, whereby hard mounts are
> unaffected by the ENETDOWN and ENETUNREACH errors.
> The value "enetdown:enetunreach" forces ENETDOWN and ENETUNREACH errors
> to always be fatal.
> If the user does not specify the "fatal_errors" option, or uses the
> value "default", then ENETDOWN and ENETUNREACH will be fatal if the
> mount was started from inside a network namespace that is not
> "init_net", and otherwise not.
>
> The expectation is that users will normally not need to set this option,
> unless they are running inside a container, and want to prevent ENETDOWN
> and ENETUNREACH from being fatal by setting "-ofatal_errors=none".
>
> Trond Myklebust (4):
> NFS: Add a mount option to make ENETUNREACH errors fatal
> NFS: Treat ENETUNREACH errors as fatal in containers
> pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
> pNFS/flexfiles: Report ENETDOWN as a connection error
>
> fs/nfs/client.c | 5 ++++
> fs/nfs/flexfilelayout/flexfilelayout.c | 24 ++++++++++++++--
> fs/nfs/fs_context.c | 38 ++++++++++++++++++++++++++
> fs/nfs/nfs3client.c | 2 ++
> fs/nfs/nfs4client.c | 5 ++++
> fs/nfs/nfs4proc.c | 3 ++
> fs/nfs/super.c | 2 ++
> include/linux/nfs4.h | 1 +
> include/linux/nfs_fs_sb.h | 2 ++
> include/linux/sunrpc/clnt.h | 5 +++-
> include/linux/sunrpc/sched.h | 1 +
> net/sunrpc/clnt.c | 30 ++++++++++++++------
> 12 files changed, 107 insertions(+), 11 deletions(-)
>
I like the concept, but unfortunately it doesn't help with the
reproducer I have. The rpc_tasks remain stuck. Here's the contents of
the rpc_tasks file:
252 c825 0 0x3 0xd2147cd2 2147 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
251 c825 0 0x3 0xd3147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
241 c825 0 0x3 0xd4147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
531 c825 0 0x3 0xd5147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
640 c825 0 0x3 0xd6147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
634 c825 0 0x3 0xd7147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
564 c825 0 0x3 0xd8147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
567 c825 0 0x3 0xd9147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
258 c825 0 0x3 0xda147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
259 c825 0 0x3 0xdb147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1159 c825 0 0x3 0xdc147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
246 c825 0 0x3 0xdd147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
536 c825 0 0x3 0xde147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
645 c825 0 0x3 0xdf147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
637 c825 0 0x3 0xe0147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
572 c825 0 0x3 0xe1147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
568 c825 0 0x3 0xe2147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
263 c825 0 0x3 0xe3147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1163 c825 0 0x3 0xe4147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
262 c825 0 0x3 0xe5147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1162 c825 0 0x3 0xe6147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
250 c825 0 0x3 0xe7147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
537 c825 0 0x3 0xe8147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
646 c825 0 0x3 0xe9147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
642 c825 0 0x3 0xea147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1165 c825 0 0x3 0xeb147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
579 c825 0 0x3 0xec147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
574 c825 0 0x3 0xed147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
269 c825 0 0x3 0xee147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
265 c825 0 0x3 0xef147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
I turned up a bunch of tracepoints, and collected some output for a
while waiting for the tasks to die. It's attached.
I see some ENETUNREACH (-101) errors in there, but the rpc_tasks didn't
die off. It looks sort of like the rpc_task flag didn't get set
properly? I'll plan to take a closer look tomorrow unless you figure it
out.
--
Jeff Layton <jlayton@kernel.org>
[-- Attachment #2: nfs-nonet-trace.txt.gz --]
[-- Type: application/gzip, Size: 52907 bytes --]
* Re: [PATCH RFC 0/4] Containerised NFS clients and teardown
2025-03-20 19:32 ` [PATCH RFC 0/4] Containerised NFS clients and teardown Jeff Layton
@ 2025-03-20 20:40 ` Trond Myklebust
0 siblings, 0 replies; 7+ messages in thread
From: Trond Myklebust @ 2025-03-20 20:40 UTC (permalink / raw)
To: linux-nfs@vger.kernel.org, jlayton@kernel.org; +Cc: josef@toxicpanda.com
On Thu, 2025-03-20 at 15:32 -0400, Jeff Layton wrote:
> On Thu, 2025-03-20 at 13:44 -0400, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >
> > When an NFS client is started from inside a container, it is often not
> > possible to ensure a safe shutdown and flush of the data before the
> > container orchestrator steps in to tear down the network. Typically,
> > what can happen is that the orchestrator triggers a lazy umount of the
> > mounted filesystems, then proceeds to delete virtual network device
> > links, bridges, NAT configurations, etc.
> >
> > Once that happens, it may be impossible to reach into the container to
> > perform any further shutdown actions on the NFS client.
> >
> > This patchset proposes to allow the client to deal with these situations
> > by treating the two errors ENETDOWN and ENETUNREACH as being fatal.
> > The intention is to then allow the I/O queue to drain, and any remaining
> > RPC calls to error out, so that the lazy umounts can complete the
> > shutdown process.
> >
> > In order to do so, a new mount option "fatal_errors" is introduced,
> > which can take the values "default", "none" and "enetdown:enetunreach".
> > The value "none" forces the existing behaviour, whereby hard mounts are
> > unaffected by the ENETDOWN and ENETUNREACH errors.
> > The value "enetdown:enetunreach" forces ENETDOWN and ENETUNREACH errors
> > to always be fatal.
> > If the user does not specify the "fatal_errors" option, or uses the
> > value "default", then ENETDOWN and ENETUNREACH will be fatal if the
> > mount was started from inside a network namespace that is not
> > "init_net", and otherwise not.
> >
> > The expectation is that users will normally not need to set this option,
> > unless they are running inside a container, and want to prevent ENETDOWN
> > and ENETUNREACH from being fatal by setting "-ofatal_errors=none".
> >
> > Trond Myklebust (4):
> > NFS: Add a mount option to make ENETUNREACH errors fatal
> > NFS: Treat ENETUNREACH errors as fatal in containers
> > pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
> > pNFS/flexfiles: Report ENETDOWN as a connection error
> >
> > fs/nfs/client.c | 5 ++++
> > fs/nfs/flexfilelayout/flexfilelayout.c | 24 ++++++++++++++--
> > fs/nfs/fs_context.c | 38 ++++++++++++++++++++++++++
> > fs/nfs/nfs3client.c | 2 ++
> > fs/nfs/nfs4client.c | 5 ++++
> > fs/nfs/nfs4proc.c | 3 ++
> > fs/nfs/super.c | 2 ++
> > include/linux/nfs4.h | 1 +
> > include/linux/nfs_fs_sb.h | 2 ++
> > include/linux/sunrpc/clnt.h | 5 +++-
> > include/linux/sunrpc/sched.h | 1 +
> > net/sunrpc/clnt.c | 30 ++++++++++++++------
> > 12 files changed, 107 insertions(+), 11 deletions(-)
> >
>
> I like the concept, but unfortunately it doesn't help with the
> reproducer I have. The rpc_tasks remain stuck. Here's the contents of
> the rpc_tasks file:
>
> 252 c825 0 0x3 0xd2147cd2 2147 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 251 c825 0 0x3 0xd3147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 241 c825 0 0x3 0xd4147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 531 c825 0 0x3 0xd5147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 640 c825 0 0x3 0xd6147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
> 634 c825 0 0x3 0xd7147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 564 c825 0 0x3 0xd8147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 567 c825 0 0x3 0xd9147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 258 c825 0 0x3 0xda147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 259 c825 0 0x3 0xdb147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 1159 c825 0 0x3 0xdc147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
> 246 c825 0 0x3 0xdd147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 536 c825 0 0x3 0xde147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 645 c825 0 0x3 0xdf147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
> 637 c825 0 0x3 0xe0147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 572 c825 0 0x3 0xe1147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 568 c825 0 0x3 0xe2147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 263 c825 0 0x3 0xe3147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 1163 c825 0 0x3 0xe4147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
> 262 c825 0 0x3 0xe5147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 1162 c825 0 0x3 0xe6147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
> 250 c825 0 0x3 0xe7147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 537 c825 0 0x3 0xe8147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 646 c825 0 0x3 0xe9147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
> 642 c825 0 0x3 0xea147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 1165 c825 0 0x3 0xeb147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
> 579 c825 0 0x3 0xec147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 574 c825 0 0x3 0xed147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 269 c825 0 0x3 0xee147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
> 265 c825 0 0x3 0xef147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
>
> I turned up a bunch of tracepoints, and collected some output for a
> while waiting for the tasks to die. It's attached.
>
> I see some ENETUNREACH (-101) errors in there, but the rpc_tasks didn't
> die off. It looks sort of like the rpc_task flag didn't get set
> properly? I'll plan to take a closer look tomorrow unless you figure it
> out.
Ah, crap... The client clp->cl_flag gets initialised differently in
NFSv4, so the mount flag wasn't getting propagated.
A v2 is forthcoming with the fix.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com