public inbox for linux-kernel@vger.kernel.org
* [PATCH] NFSv4: Fix state recovery deadlock when server misses grace period
@ 2026-04-22  6:44 Zhihao Cheng
  2026-04-22  6:55 ` Zhihao Cheng
From: Zhihao Cheng @ 2026-04-22  6:44 UTC (permalink / raw)
  To: trondmy, anna; +Cc: linux-nfs, linux-kernel, chengzhihao1, yangerkun

An NFS server restart can cause the client to enter an infinite loop during
state recovery. The state manager gets stuck processing
NFS4CLNT_RECLAIM_NOGRACE, with the server returning NFS4ERR_GRACE for every
file it tries to recover. The problem was reported in [1].

Trigger sequence:
 1. Client opens 2 files. After server reboot, client enters
    nfs4_do_reclaim(RECLAIM_REBOOT). Server misses grace period and returns
    NFS4ERR_NO_GRACE, causing client to set NFS4CLNT_RECLAIM_NOGRACE.
 2. Client enters nfs4_do_reclaim(RECLAIM_NOGRACE) to recover first file.
    Server reboots again, open request returns NFS4ERR_BADSESSION, client
    sets NFS4CLNT_SESSION_RESET.
 3. nfs4_reset_session calls nfs4_proc_create_session, which fails with
    ETIMEDOUT due to a network failure. nfs4_handle_reclaim_lease_error sets
    NFS4CLNT_LEASE_EXPIRED but does NOT set NFS4CLNT_RECLAIM_REBOOT.
 4. When nfs4_reclaim_lease runs, because NFS4CLNT_RECLAIM_NOGRACE is already
    set, it skips setting NFS4CLNT_RECLAIM_REBOOT (the bug, introduced by
    commit b42353ff8d346 ("NFSv4.1: Clean up nfs4_reclaim_lease")).
 5. The server never receives RECLAIM_COMPLETE, so cl_flags lacks
    NFSD4_CLIENT_RECLAIM_COMPLETE. When processing subsequent files, the
    server always returns nfserr_grace, causing an infinite retry loop.

Fix it by setting NFS4CLNT_RECLAIM_REBOOT in nfs4_reclaim_lease if
NFS4CLNT_SERVER_SCOPE_MISMATCH is not set, so that the client sends
RECLAIM_COMPLETE to the server first, allowing subsequent nograce
recovery to proceed.
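
For illustration only, the flag handling at the tail of nfs4_reclaim_lease
before and after this change can be modelled in userspace roughly as below.
This is a simplified sketch, not kernel code: the MODEL_* bits and model_*
helpers are hypothetical stand-ins, and nfs4_state_start_reclaim_nograce is
assumed to reduce to setting the NOGRACE bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the relevant clp->cl_state bits. */
enum {
	MODEL_RECLAIM_REBOOT        = 1u << 0,
	MODEL_RECLAIM_NOGRACE       = 1u << 1,
	MODEL_SERVER_SCOPE_MISMATCH = 1u << 2,
};

/* Non-atomic model of test_and_clear_bit(). */
static bool model_test_and_clear(unsigned int *state, unsigned int bit)
{
	bool was_set = (*state & bit) != 0;

	*state &= ~bit;
	return was_set;
}

/* Flag handling before the patch. */
static unsigned int model_reclaim_lease_old(unsigned int state)
{
	/* Assumption: starting nograce reclaim just sets the NOGRACE bit. */
	if (model_test_and_clear(&state, MODEL_SERVER_SCOPE_MISMATCH))
		state |= MODEL_RECLAIM_NOGRACE;
	if (!(state & MODEL_RECLAIM_NOGRACE))
		state |= MODEL_RECLAIM_REBOOT;
	return state;
}

/* Flag handling after the patch. */
static unsigned int model_reclaim_lease_new(unsigned int state)
{
	if (model_test_and_clear(&state, MODEL_SERVER_SCOPE_MISMATCH))
		state |= MODEL_RECLAIM_NOGRACE;
	else
		state |= MODEL_RECLAIM_REBOOT;
	return state;
}

/* Compare both variants for the three interesting input states. */
static void model_self_check(void)
{
	/* Step 4 above: NOGRACE already set, no scope mismatch. */
	unsigned int s = MODEL_RECLAIM_NOGRACE;

	assert(!(model_reclaim_lease_old(s) & MODEL_RECLAIM_REBOOT));
	assert(model_reclaim_lease_new(s) & MODEL_RECLAIM_REBOOT);

	/* Scope mismatch: both variants go straight to nograce reclaim. */
	s = MODEL_SERVER_SCOPE_MISMATCH;
	assert(model_reclaim_lease_old(s) == MODEL_RECLAIM_NOGRACE);
	assert(model_reclaim_lease_new(s) == MODEL_RECLAIM_NOGRACE);

	/* Clean state: both variants request reboot reclaim. */
	assert(model_reclaim_lease_old(0) == MODEL_RECLAIM_REBOOT);
	assert(model_reclaim_lease_new(0) == MODEL_RECLAIM_REBOOT);
}
```

In this model the two variants differ only when NOGRACE is already set and
no scope mismatch is pending, which is the state reached in step 4.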

A reproducer is available in [2].

[1] https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com/
[2] https://bugzilla.kernel.org/show_bug.cgi?id=221399

Fixes: b42353ff8d346 ("NFSv4.1: Clean up nfs4_reclaim_lease")
Cc: stable@vger.kernel.org
Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Closes: https://lore.kernel.org/linux-nfs/55da00d4-a656-4ed2-ae57-7f881297a1b2@huawei.com/
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
---
 fs/nfs/nfs4state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 305a772e5497..817327e73d88 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2012,7 +2012,7 @@ static int nfs4_reclaim_lease(struct nfs_client *clp)
 		return nfs4_handle_reclaim_lease_error(clp, status);
 	if (test_and_clear_bit(NFS4CLNT_SERVER_SCOPE_MISMATCH, &clp->cl_state))
 		nfs4_state_start_reclaim_nograce(clp);
-	if (!test_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state))
+	else
 		set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);
 	clear_bit(NFS4CLNT_CHECK_LEASE, &clp->cl_state);
 	clear_bit(NFS4CLNT_LEASE_EXPIRED, &clp->cl_state);
-- 
2.52.0


