From: Jeff Layton <jlayton@kernel.org>
To: NeilBrown <neilb@suse.de>, Chuck Lever <chuck.lever@oracle.com>,
Dai Ngo <Dai.Ngo@oracle.com>,
Olga Kornievskaia <kolga@netapp.com>, Tom Talpey <tom@talpey.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] nfsd: fix RELEASE_LOCKOWNER
Date: Mon, 22 Jan 2024 07:53:09 -0500 [thread overview]
Message-ID: <df56708aced4ec40e62592ed055ba0075b429d8f.camel@kernel.org> (raw)
In-Reply-To: <170589589641.23031.16356786177193106749@noble.neil.brown.name>
On Mon, 2024-01-22 at 14:58 +1100, NeilBrown wrote:
> The test on so_count in nfsd4_release_lockowner() is nonsense and
> harmful. Revert to using check_for_locks(), changing that to not sleep.
>
> First: harmful.
> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
> test on so_count can transiently return a false positive resulting in a
> return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is
> clearly a protocol violation and with the Linux NFS client it can cause
> incorrect behaviour.
>
> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
> processing a LOCK request which failed because, at the time that request
> was received, the given owner held a conflicting lock, then the nfsd
> thread processing that LOCK request can hold a reference (conflock) to
> the lock owner that causes nfsd4_release_lockowner() to return an
> incorrect error.
>
> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
> it knows that the error is impossible. It assumes the lock owner was in
> fact released so it feels free to use the same lock owner identifier in
> some later locking request.
>
> When it does reuse a lock owner identifier for which a previous RELEASE
> failed, it will naturally use a lock_seqid of zero. However the server,
> which didn't release the lock owner, will expect a larger lock_seqid and
> so will respond with NFS4ERR_BAD_SEQID.
>
> So clearly it is harmful to allow a false positive, which testing
> so_count allows.
>
> The test is nonsense because ... well... it doesn't mean anything.
>
> so_count is the sum of three different counts.
> 1/ the set of states listed on so_stateids
> 2/ the set of active vfs locks owned by any of those states
> 3/ various transient counts such as for conflicting locks.
>
> When it is tested against '2' it is clear that one of these is the
> transient reference obtained by find_lockowner_str_locked(). It is not
> clear what the other one is expected to be.
>
> In practice, the count is often 2 because there is precisely one state
> on so_stateids. If there were more, this would fail.
>
> It my testing I see two circumstances when RELEASE_LOCKOWNER is called.
> In one case, CLOSE is called before RELEASE_LOCKOWNER. That results in
> all the lock states being removed, and so the lockowner being discarded
> (it is removed when there are no more references which usually happens
> when the lock state is discarded). When nfsd4_release_lockowner() finds
> that the lock owner doesn't exist, it returns success.
>
> The other case shows an so_count of '2' and precisely one state listed
> in so_stateid. It appears that the Linux client uses a separate lock
> owner for each file resulting in one lock state per lock owner, so this
> test on '2' is safe. For another client it might not be safe.
>
> So this patch changes check_for_locks() to use the (newish)
> find_any_file_locked() so that it doesn't take a reference on the
> nfs4_file and so never calls nfsd_file_put(), and so never sleeps. With
> this check is it safe to restore the use of check_for_locks() rather
> than testing so_count against the mysterious '2'.
>
> Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()")
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> fs/nfsd/nfs4state.c | 26 +++++++++++++++-----------
> 1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 2fa54cfd4882..6dc6340e2852 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -7911,14 +7911,16 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
> {
> struct file_lock *fl;
> int status = false;
> - struct nfsd_file *nf = find_any_file(fp);
> + struct nfsd_file *nf;
> struct inode *inode;
> struct file_lock_context *flctx;
>
> + spin_lock(&fp->fi_lock);
> + nf = find_any_file_locked(fp);
> if (!nf) {
> /* Any valid lock stateid should have some sort of access */
> WARN_ON_ONCE(1);
> - return status;
> + goto out;
> }
>
> inode = file_inode(nf->nf_file);
> @@ -7934,7 +7936,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
> }
> spin_unlock(&flctx->flc_lock);
> }
> - nfsd_file_put(nf);
> +out:
> + spin_unlock(&fp->fi_lock);
> return status;
> }
>
^^^
Nice cleanup here! Not having to take a reference in this path is great.
> @@ -7944,10 +7947,8 @@ check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
> * @cstate: NFSv4 COMPOUND state
> * @u: RELEASE_LOCKOWNER arguments
> *
> - * The lockowner's so_count is bumped when a lock record is added
> - * or when copying a conflicting lock. The latter case is brief,
> - * but can lead to fleeting false positives when looking for
> - * locks-in-use.
> + * Check if theree are any locks still held and if not - free the lockowner
> + * and any lock state that is owned.
> *
> * Return values:
> * %nfs_ok: lockowner released or not found
> @@ -7983,10 +7984,13 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
> spin_unlock(&clp->cl_lock);
> return nfs_ok;
> }
> - if (atomic_read(&lo->lo_owner.so_count) != 2) {
> - spin_unlock(&clp->cl_lock);
> - nfs4_put_stateowner(&lo->lo_owner);
> - return nfserr_locks_held;
> +
> + list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) {
> + if (check_for_locks(stp->st_stid.sc_file, lo)) {
> + spin_unlock(&clp->cl_lock);
> + nfs4_put_stateowner(&lo->lo_owner);
> + return nfserr_locks_held;
> + }
> }
> unhash_lockowner_locked(lo);
> while (!list_empty(&lo->lo_owner.so_stateids)) {
Anytime I see codepaths checking reference counts for specific values,
that's always a red flag to me, and I've hated this particular so_count
check since we added it several years ago.
This is a much better solution.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2024-01-22 12:53 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-22 3:58 [PATCH] nfsd: fix RELEASE_LOCKOWNER NeilBrown
2024-01-22 12:53 ` Jeff Layton [this message]
2024-01-22 14:27 ` Chuck Lever
2024-01-22 21:57 ` NeilBrown
2024-01-22 23:14 ` Chuck Lever III
2024-01-22 23:18 ` Chuck Lever III
2024-01-22 23:31 ` NeilBrown
2024-01-22 23:44 ` Chuck Lever III
2024-01-23 0:18 ` NeilBrown
2024-01-23 0:52 ` Chuck Lever III
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=df56708aced4ec40e62592ed055ba0075b429d8f.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=Dai.Ngo@oracle.com \
--cc=chuck.lever@oracle.com \
--cc=kolga@netapp.com \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=tom@talpey.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox