From: Oleksandr Tymoshenko <ovt@google.com>
To: trondmy@kernel.org
Cc: anna@kernel.org, jbongio@google.com, linux-nfs@vger.kernel.org,
ovt@google.com, stable@vger.kernel.org, gonzo@bluezbox.com
Subject: [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
Date: Mon, 9 Sep 2024 17:46:59 +0000
Message-ID: <20240909174659.2163601-1-ovt@google.com>
In-Reply-To: <8f2e20f2fc894398da371517c6c8111aba072fb1.camel@kernel.org>
>> nfs41_init_clientid does not signal a failure condition from
>> nfs4_proc_exchange_id and nfs4_proc_create_session to a client which
>> may lead to mount syscall indefinitely blocked in the following stack
> NACK. This will break all sorts of recovery scenarios, because it
> doesn't distinguish between an initial 'mount' and a server reboot
> recovery situation.
> Even in the case where we are in the initial mount, it also doesn't
> distinguish between transient errors such as NFS4ERR_DELAY or reboot
> errors such as NFS4ERR_STALE_CLIENTID, etc.
> Exactly what is the scenario that is causing your hang? Let's try to
> address that with a more targeted fix.
(Re-sending with the correct subject; the previous mistake was due to a
failure in my tooling.)
The scenario is as follows: there are several NFS servers and several
production machines with multiple NFS mounts. This is a containerized
multi-tenant workflow, so every tenant gets its own NFS mount to access
its data. At some point nfs41_init_clientid fails during the initial
mount.nfs call, and all subsequent mount.nfs calls hang in
nfs_wait_client_init_complete until the original one, where
nfs4_proc_exchange_id failed, is killed.

In the production case, nfs41_init_clientid fails because of a timeout
(error 110 is ETIMEDOUT). The following error message appears in the
logs:
NFS: state manager: lease expired failed on NFSv4 server <ip> with error 110
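
To make the hang easier to reason about, here is a rough userspace model
of the pattern described above. It is only a sketch, not the kernel
code: struct model_client, model_mark_done and model_wait_init_complete
are hypothetical names, and pthreads stand in for the kernel's wait
machinery. The assumption is that later mounters share one client object
and block until its initialization state leaves "pending". If the first
mounter fails but never records that failure, the waiters have nothing
to wake up for.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical userspace model; this is NOT the kernel's struct nfs_client. */
enum init_state { INIT_PENDING, INIT_READY, INIT_FAILED };

struct model_client {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	enum init_state state;
	int             error;	/* negative errno when state == INIT_FAILED */
};

#define MODEL_CLIENT_INITIALIZER \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, INIT_PENDING, 0 }

/*
 * Called by the first mounter once initialization has finished, whether
 * it succeeded (error == 0) or failed (error < 0). Skipping this call on
 * the failure path is what leaves every waiter blocked in this model.
 */
static void model_mark_done(struct model_client *c, int error)
{
	assert(error <= 0);	/* kernel-style negative errno, or 0 */

	pthread_mutex_lock(&c->lock);
	c->state = error ? INIT_FAILED : INIT_READY;
	c->error = error;
	pthread_cond_broadcast(&c->cond);	/* wake all waiting mounters */
	pthread_mutex_unlock(&c->lock);
}

/*
 * Called by later mounters that share the client. Blocks while the state
 * is still INIT_PENDING, then returns 0 or the recorded negative errno.
 */
static int model_wait_init_complete(struct model_client *c)
{
	int ret;

	pthread_mutex_lock(&c->lock);
	while (c->state == INIT_PENDING)
		pthread_cond_wait(&c->cond, &c->lock);
	ret = (c->state == INIT_READY) ? 0 : c->error;
	pthread_mutex_unlock(&c->lock);
	return ret;
}
```

In this model, calling model_mark_done(&c, -ETIMEDOUT) on the failure
path lets every caller of model_wait_init_complete(&c) return
-ETIMEDOUT instead of sleeping forever. Whether and when the real client
should report such a failure (initial mount versus recovery, transient
versus fatal errors) is exactly the open question raised in the review
above, and the sketch takes no position on it.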