From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B11A283FDC; Fri, 29 May 2026 16:05:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780070712; cv=none; b=JJm3dShFA4ML+Nfv6GBlqzUJNClDoFiKqFVodvldgqxbi3Tp6lSETUpFCJopHQnhlVcMwdwx9YSHLn1BWm7bpAA+Fu7A56NAlujDHFdH0I8z+wsvekzpmVx0kSCCbOpoqotSooUUj3RjfZrwjkBEKq7nx4Aj5AW2WlFs2BV9VoU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780070712; c=relaxed/simple; bh=go/O26jSK+55BXNe8Mbksu4VOiqI+cqaiVz/B6htdVM=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=rN/V2rbFVnOmpnf07YugXluTL34vvHV6X0eBzbD+4nodSvYfbEVCcIO9uk6QRPsyMZhp2RcUTcZQPLYZKLMjpIVNs286ZOgxjRv1gP1VTDCCvq4puWOqE/WMEZfauPvQyOTYVCboRFWRxqASiLOOEm2D4cB8DLQTb4Zdhcr1oC0= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=VFE6IoO8; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="VFE6IoO8" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 021481F00893; Fri, 29 May 2026 16:05:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780070711; bh=N3fnnX7VE9PwhBi0D6DVowo2N8Sz/N5+UcweBLjWKgc=; h=Date:Subject:To:Cc:References:From:In-Reply-To; b=VFE6IoO8nqYZZS+vP+OgRIAvXWwEDcBW5aIx7IAlcq4O3b6WjRFAgLEjyYi89XlKf 3ylr1DTJxO9UYwYbHvEOYNwacBJ1BWbC5VWZcASswdUNNPW39KYvyRmrEWTH2syyCw Md25KZg9pCb4ed0WzxIGf8ogeSUhAPlvRsvDpdds7PWcUl+vcsgr8j3qyKwoy4u9NV w3piJvmrkBsQVEGZx7vG2ZC8BMm0oh8H9ocCo/bM9b6gMZgEod1R2JjcvQGQLrTlj8 EK93kCUmvFUPGlMnK4JX3bQ+c/MXaaTgIWGrmB4MikMfMSDs+5mf/b0gBP+9MwrCmI ax23i+gzahBnw== Message-ID: Date: Fri, 29 May 2026 12:05:09 -0400 Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 03/10] nfsd: serialize nfsd4_end_grace() with atomic test-and-set To: Jeff Layton , Chuck Lever , NeilBrown , Olga Kornievskaia , Dai Ngo , Tom Talpey , "J. Bruce Fields" , Scott Mayhew , Trond Myklebust , Andreas Gruenbacher , Mike Snitzer , Rick Macklem Cc: Chris Mason , linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org References: <20260528-nfsd-fixes-v1-0-e78708eff77d@kernel.org> <20260528-nfsd-fixes-v1-3-e78708eff77d@kernel.org> From: Chuck Lever Content-Language: en-US Organization: kernel.org In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 5/29/26 11:57 AM, Jeff Layton wrote: > On Fri, 2026-05-29 at 11:38 -0400, Chuck Lever wrote: >> >> On Thu, May 28, 2026, at 5:55 PM, Jeff Layton wrote: >>> From: Chris Mason >>> >>> nfsd4_end_grace() guards its drain path with a plain bool: >>> >>> if (nn->grace_ended) >>> return; >>> nn->grace_ended = true; >>> >>> The read and the write are independent, and nothing in struct >>> nfsd_net serializes them. At least two contexts can reach this >>> code with no lock held: >>> >>> laundromat path >>> laundry_wq kworker >>> nfs4_laundromat() >>> nfsd4_end_grace() >>> >>> RECLAIM_COMPLETE path >>> nfsd compound kthread >>> nfsd4_reclaim_complete() >>> inc_reclaim_complete() >>> nfsd4_end_grace() >>> >>> Both callers can observe grace_ended == false on different CPUs, >>> both store true, and both proceed into nfsd4_record_grace_done(), >>> which invokes the active client_tracking_ops->grace_done callback. >>> For tracking ops that drain reclaim_str_hashtbl (legacy_tracking_ops >>> via nfsd4_recdir_purge_old, and the cld v1+ ops via >>> nfsd4_cld_grace_done), grace_done calls nfs4_release_reclaim(), >>> which walks every bucket of reclaim_str_hashtbl with no lock and >>> calls nfs4_remove_reclaim_record() (list_del + kfree) on each >>> entry. Two concurrent walkers corrupt the list and double-free >>> every nfs4_client_reclaim. A concurrent nfsd4_find_reclaim_client() >>> iterating the same bucket reads through freed memory. >>> >>> A third call site exists in nfs4_state_start_net() on the >>> skip_grace startup path, but it runs under nfsd_mutex before any >>> client has connected and before the laundromat's first delayed >>> work fires, so it cannot race with the two callers above. >>> >>> Fix by replacing the read/write pair with try_cmpxchg() so exactly >>> one caller transitions grace_ended from false to true and proceeds >>> into the drain; the loser returns immediately. bool supports >>> 1-byte cmpxchg on all supported architectures, and no lock >>> ordering changes are needed. >>> >>> Fixes: 362063a595be ("nfsd: keep a tally of RECLAIM_COMPLETE operations >>> when using nfsdcld") >>> Assisted-by: kres:claude-opus-4-7 >>> Signed-off-by: Chris Mason >>> --- >>> fs/nfsd/nfs4state.c | 17 ++++++++++++++--- >>> 1 file changed, 14 insertions(+), 3 deletions(-) >>> >>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >>> index f4d12dbcf97b..dc4ac541436f 100644 >>> --- a/fs/nfsd/nfs4state.c >>> +++ b/fs/nfsd/nfs4state.c >>> @@ -7022,12 +7022,23 @@ nfsd4_renew(struct svc_rqst *rqstp, struct >>> nfsd4_compound_state *cstate, >>> static void >>> nfsd4_end_grace(struct nfsd_net *nn) >>> { >>> - /* do nothing if grace period already ended */ >>> - if (nn->grace_ended) >>> + bool expected = false; >>> + >>> + /* >>> + * nfsd4_end_grace() can be entered concurrently from the >>> + * laundromat workqueue and from an nfsd compound thread >>> + * handling RECLAIM_COMPLETE. Without serialization, both >>> + * callers can observe grace_ended==false and proceed into >>> + * nfsd4_record_grace_done(). For tracking ops whose >>> + * grace_done drains reclaim_str_hashtbl, that results in >>> + * list corruption and a double free of every >>> + * nfs4_client_reclaim entry. Use an atomic test-and-set so >>> + * exactly one caller proceeds. >>> + */ >>> + if (!try_cmpxchg(&nn->grace_ended, &expected, true)) >>> return; >>> >>> trace_nfsd_grace_complete(nn); >>> - nn->grace_ended = true; >>> /* >>> * If the server goes down again right now, an NFSv4 >>> * client will still be allowed to reclaim after it comes back up, >>> >>> -- >>> 2.54.0 >> >> Seems like the usual idiom for something like this is an atomic >> bit op. Perhaps try_cmpxchg on a boolean variable is not going >> to behave as you expect on every hardware platform. > > We just need a single flag here though. try_cmpxchg() had better work > the same way on every platform or a lot of stuff is FUBAR. Where > wouldn't it? Codex suggests on Hexagon, cmpxchg grabs more than just that boolean. -- Chuck Lever