From: Trond Myklebust <trondmy@hammerspace.com>
To: "jlayton@kernel.org" <jlayton@kernel.org>,
"bcodding@redhat.com" <bcodding@redhat.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"josef@toxicpanda.com" <josef@toxicpanda.com>
Subject: Re: [PATCH v3 6/6] NFSv4: Treat ENETUNREACH errors as fatal for state recovery
Date: Wed, 26 Mar 2025 13:10:02 +0000 [thread overview]
Message-ID: <1b4fe6e8af344cc44b039b02ef0ce67702a91140.camel@hammerspace.com> (raw)
In-Reply-To: <578359da41b13a0ca35879e61e8769b95546e480.camel@kernel.org>
On Wed, 2025-03-26 at 07:18 -0400, Jeff Layton wrote:
> On Wed, 2025-03-26 at 06:39 -0400, Benjamin Coddington wrote:
> > On 25 Mar 2025, at 18:35, trondmy@kernel.org wrote:
> >
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > >
> > > If a containerised process is killed and causes an ENETUNREACH or
> > > ENETDOWN error to be propagated to the state manager, then mark
> > > the
> > > nfs_client as being dead so that we don't loop in functions that
> > > are
> > > expecting recovery to succeed.
> > >
> > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > ---
> > > fs/nfs/nfs4state.c | 10 +++++++++-
> > > 1 file changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > index 272d2ebdae0f..7612e977e80b 100644
> > > --- a/fs/nfs/nfs4state.c
> > > +++ b/fs/nfs/nfs4state.c
> > > @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct
> > > nfs_client *clp)
> > > pr_warn_ratelimited("NFS: state manager%s%s failed on
> > > NFSv4 server %s"
> > > " with error %d\n", section_sep,
> > > section,
> > > clp->cl_hostname, -status);
> > > - ssleep(1);
> > > + switch (status) {
> > > + case -ENETDOWN:
> > > + case -ENETUNREACH:
> > > + nfs_mark_client_ready(clp, -EIO);
> > > + break;
> > > + default:
> > > + ssleep(1);
> > > + break;
> > > + }
> > > out_drain:
> > > memalloc_nofs_restore(memflags);
> > > nfs4_end_drain_session(clp);
> > > --
> > > 2.49.0
> >
> > Doesn't this have the same bug as the sysfs shutdown - in that a
> > mount with
> > fatal_neterrors=ENETDOWN:ENETUNREACH can take down the state
> > manager for a
> > mount without it? I think the same consideration applies as
> > shutdown so
> > far: in practical use, you're not going to care.
> >
>
> True. In principle we probably ought to avoid sharing superblocks
> between mounts with different fatal_errors= options. In practice, I
> agree that no one will care since this means that the server is
> borked.
>
> > Another thought - its pretty subtle that the only way those errors
> > might/should reach us here is if that mount option is in play.
> >
> > Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
> >
> > Ben
> >
>
The difference here is that the fatal_neterrors option as it stands
today, only looks at whether or not your request is routable. It is not
intended to function as a mechanism for dealing with servers being
down. It only works as a last resort for when the host's orchestrator
software destroys a container's local virtual network devices without
first ensuring that its mounts have been safely shut down.
IOW: yes there are some subtleties here, which is why we need a mount
option in the first place to allow override of the default behaviours.
However those default settings as they stand are hopefully sufficiently
conservative that admins should only rarely need to override them.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
prev parent reply other threads:[~2025-03-26 13:10 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-25 22:35 [PATCH v3 0/6] Ensure that ENETUNREACH terminates state recovery trondmy
2025-03-25 22:35 ` [PATCH v3 1/6] SUNRPC: rpcbind should never reset the port to the value '0' trondmy
2025-03-26 10:43 ` Benjamin Coddington
2025-03-25 22:35 ` [PATCH v3 2/6] SUNRPC: rpc_clnt_set_transport() must not change the autobind setting trondmy
2025-03-26 10:43 ` Benjamin Coddington
2025-03-25 22:35 ` [PATCH v3 3/6] NFS: Shut down the nfs_client only after all the superblocks trondmy
2025-03-26 0:15 ` Jeff Layton
2025-03-26 0:37 ` Trond Myklebust
2025-03-26 10:07 ` Benjamin Coddington
2025-03-25 22:35 ` [PATCH v3 4/6] NFSv4: Further cleanups to shutdown loops trondmy
2025-03-26 0:17 ` Jeff Layton
2025-03-26 10:13 ` Jeff Layton
2025-03-26 10:46 ` Benjamin Coddington
2025-03-26 18:11 ` Trond Myklebust
2025-03-26 18:21 ` Jeff Layton
2025-03-26 18:24 ` Trond Myklebust
2025-03-27 0:35 ` Jeff Layton
2025-03-25 22:35 ` [PATCH v3 5/6] NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client trondmy
2025-03-26 0:17 ` Jeff Layton
2025-03-26 10:46 ` Benjamin Coddington
2025-03-25 22:35 ` [PATCH v3 6/6] NFSv4: Treat ENETUNREACH errors as fatal for state recovery trondmy
2025-03-26 10:39 ` Benjamin Coddington
2025-03-26 11:18 ` Jeff Layton
2025-03-26 13:10 ` Trond Myklebust [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1b4fe6e8af344cc44b039b02ef0ce67702a91140.camel@hammerspace.com \
--to=trondmy@hammerspace.com \
--cc=bcodding@redhat.com \
--cc=jlayton@kernel.org \
--cc=josef@toxicpanda.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox