From: Jeff Layton <jlayton@kernel.org>
To: trondmy@kernel.org, linux-nfs@vger.kernel.org
Cc: Josef Bacik <josef@toxicpanda.com>
Subject: Re: [PATCH RFC 0/4] Containerised NFS clients and teardown
Date: Thu, 20 Mar 2025 15:32:10 -0400
Message-ID: <143c1b07b8c8957ee3041cf7872a80965b14b4fd.camel@kernel.org>
In-Reply-To: <cover.1742490771.git.trond.myklebust@hammerspace.com>
On Thu, 2025-03-20 at 13:44 -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> When an NFS client is started from inside a container, it is often not
> possible to ensure a safe shutdown and flush of the data before the
> container orchestrator steps in to tear down the network. Typically,
> what can happen is that the orchestrator triggers a lazy umount of the
> mounted filesystems, then proceeds to delete virtual network device
> links, bridges, NAT configurations, etc.
>
> Once that happens, it may be impossible to reach into the container to
> perform any further shutdown actions on the NFS client.
>
> This patchset proposes to allow the client to deal with these situations
> by treating the two errors ENETDOWN and ENETUNREACH as being fatal.
> The intention is to then allow the I/O queue to drain, and any remaining
> RPC calls to error out, so that the lazy umounts can complete the
> shutdown process.
>
> In order to do so, a new mount option "fatal_errors" is introduced,
> which can take the values "default", "none" and "enetdown:enetunreach".
> The value "none" forces the existing behaviour, whereby hard mounts are
> unaffected by the ENETDOWN and ENETUNREACH errors.
> The value "enetdown:enetunreach" forces ENETDOWN and ENETUNREACH errors
> to always be fatal.
> If the user does not specify the "fatal_errors" option, or uses the
> value "default", then ENETDOWN and ENETUNREACH will be fatal if the
> mount was started from inside a network namespace other than
> "init_net", and non-fatal otherwise.
>
> The expectation is that users will normally not need to set this option,
> unless they are running inside a container, and want to prevent ENETDOWN
> and ENETUNREACH from being fatal by setting "-ofatal_errors=none".
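For illustration, mounts using the proposed option would presumably look
like the following (server, export path and mountpoint are placeholders;
only the fatal_errors values come from the description above):

```shell
# Default: if the mount runs in a non-init_net namespace,
# ENETDOWN/ENETUNREACH become fatal automatically.
mount -t nfs -o vers=4.2,hard server:/export /mnt

# Opt out of the new behaviour, even inside a container:
mount -t nfs -o vers=4.2,hard,fatal_errors=none server:/export /mnt

# Force the two errors to be fatal regardless of namespace:
mount -t nfs -o vers=4.2,hard,fatal_errors=enetdown:enetunreach server:/export /mnt
```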
>
> Trond Myklebust (4):
> NFS: Add a mount option to make ENETUNREACH errors fatal
> NFS: Treat ENETUNREACH errors as fatal in containers
> pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
> pNFS/flexfiles: Report ENETDOWN as a connection error
>
> fs/nfs/client.c | 5 ++++
> fs/nfs/flexfilelayout/flexfilelayout.c | 24 ++++++++++++++--
> fs/nfs/fs_context.c | 38 ++++++++++++++++++++++++++
> fs/nfs/nfs3client.c | 2 ++
> fs/nfs/nfs4client.c | 5 ++++
> fs/nfs/nfs4proc.c | 3 ++
> fs/nfs/super.c | 2 ++
> include/linux/nfs4.h | 1 +
> include/linux/nfs_fs_sb.h | 2 ++
> include/linux/sunrpc/clnt.h | 5 +++-
> include/linux/sunrpc/sched.h | 1 +
> net/sunrpc/clnt.c | 30 ++++++++++++++------
> 12 files changed, 107 insertions(+), 11 deletions(-)
>
I like the concept, but unfortunately it doesn't help with the
reproducer I have. The rpc_tasks remain stuck. Here's the contents of
the rpc_tasks file:
252 c825 0 0x3 0xd2147cd2 2147 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
251 c825 0 0x3 0xd3147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
241 c825 0 0x3 0xd4147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
531 c825 0 0x3 0xd5147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
640 c825 0 0x3 0xd6147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
634 c825 0 0x3 0xd7147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
564 c825 0 0x3 0xd8147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
567 c825 0 0x3 0xd9147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
258 c825 0 0x3 0xda147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
259 c825 0 0x3 0xdb147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1159 c825 0 0x3 0xdc147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
246 c825 0 0x3 0xdd147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
536 c825 0 0x3 0xde147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
645 c825 0 0x3 0xdf147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
637 c825 0 0x3 0xe0147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
572 c825 0 0x3 0xe1147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
568 c825 0 0x3 0xe2147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
263 c825 0 0x3 0xe3147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1163 c825 0 0x3 0xe4147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
262 c825 0 0x3 0xe5147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1162 c825 0 0x3 0xe6147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
250 c825 0 0x3 0xe7147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
537 c825 0 0x3 0xe8147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
646 c825 0 0x3 0xe9147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
642 c825 0 0x3 0xea147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1165 c825 0 0x3 0xeb147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
579 c825 0 0x3 0xec147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
574 c825 0 0x3 0xed147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
269 c825 0 0x3 0xee147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
265 c825 0 0x3 0xef147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
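For reference, each row in the listing above appears to follow a fixed
column layout: task pid, flags, status, client id, XID, timeout, ops,
program/version, procedure, action, and wait queue. A small hypothetical
parser, with the layout inferred purely from the output above:

```python
# Hypothetical helper: split one line of the per-client SUNRPC "tasks"
# debugfs output into named fields. The column layout is inferred from
# the listing above, not taken from the kernel sources.
import re

TASK_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<flags>[0-9a-f]+)\s+(?P<status>-?\d+)\s+"
    r"(?P<clid>0x[0-9a-f]+)\s+(?P<xid>0x[0-9a-f]+)\s+(?P<timeout>\d+)\s+"
    r"(?P<ops>\S+(?:\s+\[\w+\])?)\s+(?P<prog>\S+)\s+(?P<proc>\S+)\s+"
    r"a:(?P<action>\S+)(?:\s+\[\w+\])?\s+q:(?P<queue>\S+)\s*$"
)

def parse_task(line: str) -> dict:
    """Parse one task line; flags come back as an int for bit tests."""
    m = TASK_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised task line: {line!r}")
    d = m.groupdict()
    d["pid"] = int(d["pid"])
    d["flags"] = int(d["flags"], 16)   # e.g. 0xc825 in the dump above
    d["status"] = int(d["status"])
    d["timeout"] = int(d["timeout"])
    return d
```

With flags decoded as an integer it is easy to grep the dump for tasks
that do (or do not) carry a particular tk_flags bit, which is the kind
of check the "flag didn't get set" question below calls for.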
I turned up a bunch of tracepoints and collected some output while
waiting for the tasks to die. It's attached.

I see some ENETUNREACH (-101) errors in there, but the rpc_tasks didn't
die off. It looks as though the rpc_task flag didn't get set properly?
I'll plan to take a closer look tomorrow unless you figure it out.
--
Jeff Layton <jlayton@kernel.org>
[-- Attachment #2: nfs-nonet-trace.txt.gz --]
[-- Type: application/gzip, Size: 52907 bytes --]