From: Jeff Layton <jlayton@kernel.org>
To: trondmy@kernel.org, linux-nfs@vger.kernel.org
Cc: Josef Bacik <josef@toxicpanda.com>
Subject: Re: [PATCH RFC 0/4] Containerised NFS clients and teardown
Date: Thu, 20 Mar 2025 15:32:10 -0400
Message-ID: <143c1b07b8c8957ee3041cf7872a80965b14b4fd.camel@kernel.org>
In-Reply-To: <cover.1742490771.git.trond.myklebust@hammerspace.com>
On Thu, 2025-03-20 at 13:44 -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> When an NFS client is started from inside a container, it is often not
> possible to ensure a safe shutdown and flush of the data before the
> container orchestrator steps in to tear down the network. Typically,
> what can happen is that the orchestrator triggers a lazy umount of the
> mounted filesystems, then proceeds to delete virtual network device
> links, bridges, NAT configurations, etc.
>
> Once that happens, it may be impossible to reach into the container to
> perform any further shutdown actions on the NFS client.
>
> This patchset proposes to allow the client to deal with these situations
> by treating the two errors ENETDOWN and ENETUNREACH as being fatal.
> The intention is to then allow the I/O queue to drain, and any remaining
> RPC calls to error out, so that the lazy umounts can complete the
> shutdown process.
>
> In order to do so, a new mount option "fatal_errors" is introduced,
> which can take the values "default", "none" and "enetdown:enetunreach".
> The value "none" forces the existing behaviour, whereby hard mounts are
> unaffected by the ENETDOWN and ENETUNREACH errors.
> The value "enetdown:enetunreach" forces ENETDOWN and ENETUNREACH errors
> to always be fatal.
> If the user does not specify the "fatal_errors" option, or uses the
> value "default", then ENETDOWN and ENETUNREACH will be fatal if the
> mount was started from inside a network namespace other than
> "init_net", and non-fatal otherwise.
>
> The expectation is that users will normally not need to set this option,
> unless they are running inside a container, and want to prevent ENETDOWN
> and ENETUNREACH from being fatal by setting "-ofatal_errors=none".
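For illustration, mounts using the proposed option would presumably look
like the following (server, export path and mountpoint are placeholders;
only the fatal_errors values come from the description above):

```shell
# Default: if the mount runs in a non-init_net namespace,
# ENETDOWN/ENETUNREACH become fatal automatically.
mount -t nfs -o vers=4.2,hard server:/export /mnt

# Opt out of the new behaviour, even inside a container:
mount -t nfs -o vers=4.2,hard,fatal_errors=none server:/export /mnt

# Force the two errors to be fatal regardless of namespace:
mount -t nfs -o vers=4.2,hard,fatal_errors=enetdown:enetunreach server:/export /mnt
```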
>
> Trond Myklebust (4):
> NFS: Add a mount option to make ENETUNREACH errors fatal
> NFS: Treat ENETUNREACH errors as fatal in containers
> pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
> pNFS/flexfiles: Report ENETDOWN as a connection error
>
> fs/nfs/client.c | 5 ++++
> fs/nfs/flexfilelayout/flexfilelayout.c | 24 ++++++++++++++--
> fs/nfs/fs_context.c | 38 ++++++++++++++++++++++++++
> fs/nfs/nfs3client.c | 2 ++
> fs/nfs/nfs4client.c | 5 ++++
> fs/nfs/nfs4proc.c | 3 ++
> fs/nfs/super.c | 2 ++
> include/linux/nfs4.h | 1 +
> include/linux/nfs_fs_sb.h | 2 ++
> include/linux/sunrpc/clnt.h | 5 +++-
> include/linux/sunrpc/sched.h | 1 +
> net/sunrpc/clnt.c | 30 ++++++++++++++------
> 12 files changed, 107 insertions(+), 11 deletions(-)
>
I like the concept, but unfortunately it doesn't help with the
reproducer I have. The rpc_tasks remain stuck. Here's the contents of
the rpc_tasks file:
252 c825 0 0x3 0xd2147cd2 2147 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
251 c825 0 0x3 0xd3147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
241 c825 0 0x3 0xd4147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
531 c825 0 0x3 0xd5147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
640 c825 0 0x3 0xd6147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
634 c825 0 0x3 0xd7147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
564 c825 0 0x3 0xd8147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
567 c825 0 0x3 0xd9147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
258 c825 0 0x3 0xda147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
259 c825 0 0x3 0xdb147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1159 c825 0 0x3 0xdc147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
246 c825 0 0x3 0xdd147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
536 c825 0 0x3 0xde147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
645 c825 0 0x3 0xdf147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
637 c825 0 0x3 0xe0147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
572 c825 0 0x3 0xe1147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
568 c825 0 0x3 0xe2147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
263 c825 0 0x3 0xe3147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1163 c825 0 0x3 0xe4147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
262 c825 0 0x3 0xe5147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1162 c825 0 0x3 0xe6147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
250 c825 0 0x3 0xe7147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
537 c825 0 0x3 0xe8147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
646 c825 0 0x3 0xe9147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
642 c825 0 0x3 0xea147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1165 c825 0 0x3 0xeb147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
579 c825 0 0x3 0xec147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
574 c825 0 0x3 0xed147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
269 c825 0 0x3 0xee147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
265 c825 0 0x3 0xef147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
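For reference, each row in the listing above appears to follow a fixed
column layout: task pid, flags, status, client id, XID, timeout, ops,
program/version, procedure, action, and wait queue. A small hypothetical
parser, with the layout inferred purely from the output above:

```python
# Hypothetical helper: split one line of the per-client SUNRPC "tasks"
# debugfs output into named fields. The column layout is inferred from
# the listing above, not taken from the kernel sources.
import re

TASK_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<flags>[0-9a-f]+)\s+(?P<status>-?\d+)\s+"
    r"(?P<clid>0x[0-9a-f]+)\s+(?P<xid>0x[0-9a-f]+)\s+(?P<timeout>\d+)\s+"
    r"(?P<ops>\S+(?:\s+\[\w+\])?)\s+(?P<prog>\S+)\s+(?P<proc>\S+)\s+"
    r"a:(?P<action>\S+)(?:\s+\[\w+\])?\s+q:(?P<queue>\S+)\s*$"
)

def parse_task(line: str) -> dict:
    """Parse one task line; flags come back as an int for bit tests."""
    m = TASK_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised task line: {line!r}")
    d = m.groupdict()
    d["pid"] = int(d["pid"])
    d["flags"] = int(d["flags"], 16)   # e.g. 0xc825 in the dump above
    d["status"] = int(d["status"])
    d["timeout"] = int(d["timeout"])
    return d
```

With flags decoded as an integer it is easy to grep the dump for tasks
that do (or do not) carry a particular tk_flags bit, which is the kind
of check the "flag didn't get set" question below calls for.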
I turned up a bunch of tracepoints and collected some output while
waiting for the tasks to die. It's attached.

I see some ENETUNREACH (-101) errors in there, but the rpc_tasks didn't
die off. It looks as though the rpc_task flag didn't get set properly?
I'll plan to take a closer look tomorrow unless you figure it out.
--
Jeff Layton <jlayton@kernel.org>
[-- Attachment #2: nfs-nonet-trace.txt.gz --]
[-- Type: application/gzip, Size: 52907 bytes --]