From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: linux-nfs-owner@vger.kernel.org
Received: from fieldses.org ([174.143.236.118]:58340 "EHLO fieldses.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752430Ab2EYNqj (ORCPT <rfc822;linux-nfs@vger.kernel.org>);
	Fri, 25 May 2012 09:46:39 -0400
Date: Fri, 25 May 2012 09:46:36 -0400
To: Chuck Lever <chuck.lever@oracle.com>
Cc: bfields@redhat.com, linux-nfs@vger.kernel.org,
        Bryan Schumaker <bjschuma@netapp.com>
Subject: Re: [PATCH] NFSD: TEST_STATEID should not return
 NFS4ERR_STALE_STATEID
Message-ID: <20120525134636.GA28842@fieldses.org>
References: <20120524192932.8296.44698.stgit@lebasque.1015granger.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20120524192932.8296.44698.stgit@lebasque.1015granger.net>
From: "J. Bruce Fields" <bfields@fieldses.org>
Sender: linux-nfs-owner@vger.kernel.org
List-ID: <linux-nfs.vger.kernel.org>

On Thu, May 24, 2012 at 03:29:51PM -0400, Chuck Lever wrote:
> The error values that TEST_STATEID is allowed to return does not
> include NFS4ERR_STALE_STATEID.  In addition, RFC 5661 says:
> 
> 15.1.16.5.  NFS4ERR_STALE_STATEID (Error Code 10023)
> 
>    A stateid generated by an earlier server instance was used.  This
>    error is moot in NFSv4.1 because all operations that take a stateid
>    MUST be preceded by the SEQUENCE operation, and the earlier server
>    instance is detected by the session infrastructure that supports
>    SEQUENCE.
> 
> I triggered the NFS4ERR_STALE_STATEID during nograce recovery testing.
> My client had updated its boot verifier, so the server instance hadn't
> changed, but the client instance had.  Thus the server allowed the
> SEQUENCE operation, but returned NFS4ERR_STALE_STATEID on the
> TEST_STATEID operation.
> 
> After a client's lease expires, TEST_STATEID should report
> NFS4ERR_EXPIRED for state IDs that the client tries to recover.  I
> don't see a way to make that happen, though.

After the client's lease expires, the SEQUENCE operation will fail.

(Which I believe to be a valid, if unforgiving, server implementation.
If we were to implement "courtesy locks" in this case, I believe we'd
remember the clientid for longer, permit the SEQUENCE, and fail
individual stateid's with EXPIRED as appropriate?)

> Finally, RFC 5661, section 18.48.3 has this:
> 
>  o  Special stateids are always considered invalid (they result in the
>     error code NFS4ERR_BAD_STATEID).

Thanks for the explanation!

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
> Bruce, would you consider taking something like this?

Sure; nits:

>  fs/nfsd/nfs4state.c |    7 ++++---
>  1 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 9235cfa..ae1fab3 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -3365,12 +3365,13 @@ __be32 nfs4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
>  	struct nfs4_ol_stateid *ols;
>  	__be32 status;
>  
> +	if (ZERO_STATEID(stateid) || ONE_STATEID(stateid))
> +		return nfserr_bad_stateid;

Or inval?  This is just a buggy client.

>  	if (STALE_STATEID(stateid))
> -		return nfserr_stale_stateid;
> -
> +		return nfserr_bad_stateid;

Again, this is just a buggy client, since we shouldn't have gotten past
the SEQUENCE in this case unless the client's sending a stateid that's
actually someone else's.

If you think it's worth checking for those buggy client cases, we could
instaed check that stateid->si_opaque.so_clid and cl->clientid agree.
That'd cover the special-stateid checks too.

>  	s = find_stateid(cl, stateid);
>  	if (!s)
> -		 return nfserr_stale_stateid;
> +		 return nfserr_bad_stateid;

So this must be the case you actually hit.  Agreed with this change.


--b.

>  	status = check_stateid_generation(stateid, &s->sc_stateid, 1);
>  	if (status)
>  		return status;
>