public inbox for linux-kernel@vger.kernel.org
From: Trond Myklebust <trondmy@kernel.org>
To: Zhihao Cheng <chengzhihao1@huawei.com>, anna@kernel.org
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	 yangerkun@huawei.com, Li Lingfeng <lilingfeng3@huawei.com>
Subject: Re: [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period
Date: Wed, 22 Apr 2026 08:38:25 -0400
Message-ID: <ac3165e050f4fa9ca4dfd102f7ec1c5e554db693.camel@kernel.org>
In-Reply-To: <e8c5a503-e8b4-11ef-68fb-a0195ce07b07@huawei.com>

On Wed, 2026-04-22 at 14:55 +0800, Zhihao Cheng wrote:
> On 2026/4/22 14:44, Zhihao Cheng wrote:
> Adding lilingfeng3@huawei.com to Cc.
> > An NFS server restart causes the client to enter an infinite loop
> > during state recovery. The state manager gets stuck in
> > NFS4CLNT_RECLAIM_NOGRACE processing, with the server repeatedly
> > returning NFS4ERR_GRACE for each file it tries to recover.
> > This problem was reported in [1].
> > 
> > Trigger sequence:
> >   1. The client opens 2 files. After a server reboot, the client
> >      enters nfs4_do_reclaim(RECLAIM_REBOOT). The server misses the
> >      grace period and returns NFS4ERR_NO_GRACE, causing the client to
> >      set NFS4CLNT_RECLAIM_NOGRACE.
> >   2. The client enters nfs4_do_reclaim(RECLAIM_NOGRACE) to recover the
> >      first file. The server reboots again, the OPEN request returns
> >      NFS4ERR_BADSESSION, and the client sets NFS4CLNT_SESSION_RESET.
> >   3. nfs4_reset_session calls nfs4_proc_create_session, which fails
> >      with ETIMEDOUT due to a network failure.
> >      nfs4_handle_reclaim_lease_error sets NFS4CLNT_LEASE_EXPIRED but
> >      does NOT set NFS4CLNT_RECLAIM_REBOOT.
> >   4. When nfs4_reclaim_lease runs, NFS4CLNT_RECLAIM_NOGRACE is already
> >      set, so it skips setting NFS4CLNT_RECLAIM_REBOOT (this is the bug,
> >      introduced by commit b42353ff8d346 ("NFSv4.1: Clean up
> >      nfs4_reclaim_lease")).
> >   5. The server never receives RECLAIM_COMPLETE, so its cl_flags lack
> >      NFSD4_CLIENT_RECLAIM_COMPLETE. When processing subsequent files,
> >      the server always returns nfserr_grace, causing an infinite retry
> >      loop.
> > 
> > Fix it by setting NFS4CLNT_RECLAIM_REBOOT in nfs4_reclaim_lease if
> > NFS4CLNT_SERVER_SCOPE_MISMATCH is not set, so that the client sends
> > RECLAIM_COMPLETE to the server first, allowing subsequent nograce
> > recovery to proceed.
> > 
> > A reproducer is available in [2].
> > 
> > [1] https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com/
> > [2] https://bugzilla.kernel.org/show_bug.cgi?id=221399
> > 
> > Fixes: b42353ff8d346 ("NFSv4.1: Clean up nfs4_reclaim_lease")
> > Cc: stable@vger.kernel.org
> > Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
> > Closes: https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com/
> > Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
> > ---
> >   fs/nfs/nfs4state.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 305a772e5497..817327e73d88 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -2012,7 +2012,7 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
> >   		return nfs4_handle_reclaim_lease_error(clp, status);
> >   	if (test_and_clear_bit(NFS4CLNT_SERVER_SCOPE_MISMATCH, &clp->cl_state))
> >   		nfs4_state_start_reclaim_nograce(clp);
> > -	if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
> > +	else
> >   		set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
> >   	clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
> >   	clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
> > 
> 
This will cause the client to try to do reboot recovery in a situation
where it isn't allowed to do so by the spec. We should never be setting
NFS4CLNT_RECLAIM_REBOOT if NFS4CLNT_RECLAIM_NOGRACE is already set.

One solution would be to just immediately call
nfs4_state_end_reclaim_reboot() if NFS4CLNT_RECLAIM_NOGRACE is set.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

Thread overview: 4+ messages
2026-04-22  6:44 [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period Zhihao Cheng
2026-04-22  6:55 ` Zhihao Cheng
2026-04-22 12:38   ` Trond Myklebust [this message]
2026-04-23  9:05     ` Zhihao Cheng