From: Trond Myklebust <trondmy@kernel.org>
To: Zhihao Cheng <chengzhihao1@huawei.com>, anna@kernel.org
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
yangerkun@huawei.com, Li Lingfeng <lilingfeng3@huawei.com>
Subject: Re: [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period
Date: Wed, 22 Apr 2026 08:38:25 -0400
Message-ID: <ac3165e050f4fa9ca4dfd102f7ec1c5e554db693.camel@kernel.org>
In-Reply-To: <e8c5a503-e8b4-11ef-68fb-a0195ce07b07@huawei.com>
On Wed, 2026-04-22 at 14:55 +0800, Zhihao Cheng wrote:
> On 2026/4/22 14:44, Zhihao Cheng wrote:
> Add lilingfeng3@huawei.com
> > An NFS server restart can cause the client to enter an infinite loop
> > during state recovery. The state manager gets stuck in
> > NFS4CLNT_RECLAIM_NOGRACE processing, with the server repeatedly
> > returning NFS4ERR_GRACE on every file iteration.
> > This problem was reported in [1].
> >
> > Trigger sequence:
> > 1. Client opens 2 files. After server reboot, client enters
> >    nfs4_do_reclaim(RECLAIM_REBOOT). Server misses grace period and
> >    returns NFS4ERR_NO_GRACE, causing client to set
> >    NFS4CLNT_RECLAIM_NOGRACE.
> > 2. Client enters nfs4_do_reclaim(RECLAIM_NOGRACE) to recover the
> >    first file. Server reboots again, the open request returns
> >    NFS4ERR_BADSESSION, and client sets NFS4CLNT_SESSION_RESET.
> > 3. nfs4_reset_session calls nfs4_proc_create_session, which fails
> >    with ETIMEDOUT due to a network failure.
> >    nfs4_handle_reclaim_lease_error sets NFS4CLNT_LEASE_EXPIRED but
> >    does NOT set NFS4CLNT_RECLAIM_REBOOT.
> > 4. When nfs4_reclaim_lease runs, because NFS4CLNT_RECLAIM_NOGRACE is
> >    already set, it skips setting NFS4CLNT_RECLAIM_REBOOT (the bug,
> >    introduced by commit b42353ff8d346 ("NFSv4.1: Clean up
> >    nfs4_reclaim_lease")).
> > 5. Server never receives RECLAIM_COMPLETE, so cl_flags lacks
> >    NFSD4_CLIENT_RECLAIM_COMPLETE. When processing subsequent files,
> >    server always returns nfserr_grace, causing an infinite retry
> >    loop.
> >
> > Fix it by setting NFS4CLNT_RECLAIM_REBOOT in nfs4_reclaim_lease if
> > NFS4CLNT_SERVER_SCOPE_MISMATCH is not set, so that the client sends
> > RECLAIM_COMPLETE to the server first, allowing subsequent nograce
> > recovery to proceed.
> >
> > A reproducer is available in [2].
> >
> > [1] https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com/
> > [2] https://bugzilla.kernel.org/show_bug.cgi?id=221399
> >
> > Fixes: b42353ff8d346 ("NFSv4.1: Clean up nfs4_reclaim_lease")
> > Cc: stable@vger.kernel.org
> > Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
> > Closes: https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com/
> > Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> > ---
> > fs/nfs/nfs4state.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 305a772e5497..817327e73d88 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -2012,7 +2012,7 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
> >  		return nfs4_handle_reclaim_lease_error(clp, status);
> >  	if (test_and_clear_bit(NFS4CLNT_SERVER_SCOPE_MISMATCH, &clp->cl_state))
> >  		nfs4_state_start_reclaim_nograce(clp);
> > -	if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
> > +	else
> >  		set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
> >  	clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
> >  	clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
> >
>
This will cause the client to try to do reboot recovery in a situation
where it isn't allowed to do so by the spec. We should never be setting
NFS4CLNT_RECLAIM_REBOOT if NFS4CLNT_RECLAIM_NOGRACE is already set.
One solution would be to just immediately call
nfs4_state_end_reclaim_reboot() if NFS4CLNT_RECLAIM_NOGRACE is set.
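
As a rough illustration only (a user-space model, not kernel code and not
a tested patch), the three behaviours discussed in this thread can be
compared side by side. Everything prefixed MODEL_ or model_ is a
hypothetical stand-in invented for this sketch. In particular,
model_end_reclaim_reboot() simply assumes that the call leads to a
RECLAIM_COMPLETE being sent; how the real nfs4_state_end_reclaim_reboot()
behaves when NFS4CLNT_RECLAIM_REBOOT is clear is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the NFS4CLNT_* bits in clp->cl_state.
 * The bit positions are arbitrary and do not match the kernel. */
enum {
	MODEL_RECLAIM_REBOOT        = 1u << 0,
	MODEL_RECLAIM_NOGRACE       = 1u << 1,
	MODEL_SERVER_SCOPE_MISMATCH = 1u << 2,
	MODEL_CHECK_LEASE           = 1u << 3,
	MODEL_LEASE_EXPIRED         = 1u << 4,
};

/* Which version of the nfs4_reclaim_lease() tail to model. */
enum model_variant {
	MODEL_CURRENT,   /* code before the posted patch */
	MODEL_PATCH,     /* the posted patch ("else" branch) */
	MODEL_SUGGESTED, /* end reboot reclaim at once if NOGRACE is set */
};

struct model_client {
	unsigned int cl_state;
	int reclaim_complete_sent; /* simulated RECLAIM_COMPLETE count */
};

/* Assumption: ending reboot reclaim clears the REBOOT bit and results
 * in one RECLAIM_COMPLETE. The real function has more conditions. */
static void model_end_reclaim_reboot(struct model_client *clp)
{
	clp->cl_state &= ~MODEL_RECLAIM_REBOOT;
	clp->reclaim_complete_sent++;
}

/* Models only the flag handling after the lease was re-established. */
static void model_reclaim_lease_tail(struct model_client *clp,
				     enum model_variant v)
{
	bool scope_mismatch;

	assert(clp);
	scope_mismatch = (clp->cl_state & MODEL_SERVER_SCOPE_MISMATCH) != 0;
	clp->cl_state &= ~MODEL_SERVER_SCOPE_MISMATCH;
	if (scope_mismatch) /* simplified nfs4_state_start_reclaim_nograce() */
		clp->cl_state |= MODEL_RECLAIM_NOGRACE;

	switch (v) {
	case MODEL_CURRENT:
		if (!(clp->cl_state & MODEL_RECLAIM_NOGRACE))
			clp->cl_state |= MODEL_RECLAIM_REBOOT;
		break;
	case MODEL_PATCH:
		if (!scope_mismatch)
			clp->cl_state |= MODEL_RECLAIM_REBOOT;
		break;
	case MODEL_SUGGESTED:
		if (clp->cl_state & MODEL_RECLAIM_NOGRACE)
			model_end_reclaim_reboot(clp);
		else
			clp->cl_state |= MODEL_RECLAIM_REBOOT;
		break;
	}
	clp->cl_state &= ~(MODEL_CHECK_LEASE | MODEL_LEASE_EXPIRED);
}

/* Start from the state described in step 4 of the trigger sequence:
 * NOGRACE already set, lease expired, no scope mismatch. */
static struct model_client model_run(enum model_variant v)
{
	struct model_client clp = {
		.cl_state = MODEL_RECLAIM_NOGRACE | MODEL_LEASE_EXPIRED,
		.reclaim_complete_sent = 0,
	};

	model_reclaim_lease_tail(&clp, v);
	return clp;
}
```

In this model, model_run(MODEL_CURRENT) leaves the REBOOT bit clear and
sends nothing, model_run(MODEL_PATCH) sets the REBOOT bit while NOGRACE is
still set, and model_run(MODEL_SUGGESTED) records one simulated
RECLAIM_COMPLETE without ever setting the REBOOT bit.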
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com
Thread overview: 4+ messages
2026-04-22 6:44 [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period Zhihao Cheng
2026-04-22 6:55 ` Zhihao Cheng
2026-04-22 12:38 ` Trond Myklebust [this message]
2026-04-23 9:05 ` Zhihao Cheng