linux-nfs.vger.kernel.org archive mirror
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jeff Layton <jlayton@poochiereds.net>
Cc: linux-nfs@vger.kernel.org, Andrew W Elble <aweits@rit.edu>
Subject: Re: [PATCH] nfsd: serialize state seqid morphing operations
Date: Tue, 29 Sep 2015 17:11:55 -0400
Message-ID: <20150929211155.GI3190@fieldses.org>
In-Reply-To: <1442490428-29487-1-git-send-email-jeff.layton@primarydata.com>

On Thu, Sep 17, 2015 at 07:47:08AM -0400, Jeff Layton wrote:
> Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
> running in parallel. The server would receive the OPEN_DOWNGRADE first
> and check its seqid, but then an OPEN would race in and bump it. The
> OPEN_DOWNGRADE would then complete and bump the seqid again.  The result
> was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
> it should have been rejected since the seqid changed.
> 
> The only recourse we have here I think is to serialize operations that
> bump the seqid in a stateid, particularly when we're given a seqid in
> the call. To address this, we add a new rw_semaphore to the
> nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
> after looking up the stateid to ensure that nothing else is going to
> bump it while we're operating on it.
> 
> In the case of OPEN, we do a down_read, as the call doesn't contain a
> seqid. Those can run in parallel -- we just need to serialize them when
> there is a concurrent OPEN_DOWNGRADE or CLOSE.
> 
> LOCK and LOCKU however always take the write lock as there is no
> opportunity for parallelizing those.
> 
> Reported-and-Tested-by: Andrew W Elble <aweits@rit.edu>
> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
> ---
>  fs/nfsd/nfs4state.c | 33 ++++++++++++++++++++++++++++-----
>  fs/nfsd/state.h     | 19 ++++++++++---------
>  2 files changed, 38 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 0f1d5691b795..1b39edf10b67 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -3360,6 +3360,7 @@ static void init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
>  	stp->st_access_bmap = 0;
>  	stp->st_deny_bmap = 0;
>  	stp->st_openstp = NULL;
> +	init_rwsem(&stp->st_rwsem);
>  	spin_lock(&oo->oo_owner.so_client->cl_lock);
>  	list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
>  	spin_lock(&fp->fi_lock);
> @@ -4187,15 +4188,20 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>  	 */
>  	if (stp) {
>  		/* Stateid was found, this is an OPEN upgrade */
> +		down_read(&stp->st_rwsem);
>  		status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
> -		if (status)
> +		if (status) {
> +			up_read(&stp->st_rwsem);
>  			goto out;
> +		}
>  	} else {
>  		stp = open->op_stp;
>  		open->op_stp = NULL;
>  		init_open_stateid(stp, fp, open);
> +		down_read(&stp->st_rwsem);
>  		status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
>  		if (status) {
> +			up_read(&stp->st_rwsem);
>  			release_open_stateid(stp);
>  			goto out;
>  		}
> @@ -4207,6 +4213,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
>  	}
>  	update_stateid(&stp->st_stid.sc_stateid);
>  	memcpy(&open->op_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
> +	up_read(&stp->st_rwsem);
>  
>  	if (nfsd4_has_session(&resp->cstate)) {
>  		if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) {

The patch looks good, but:

Does it matter that we don't have an exclusive lock over that
update_stateid?

I think there's at least one small bug there:

	static inline void update_stateid(stateid_t *stateid)
	{
		stateid->si_generation++;
		/* Wraparound recommendation from 3530bis-13 9.1.3.2: */
		if (stateid->si_generation == 0)
			stateid->si_generation = 1;
	}

The si_generation increment isn't atomic, and even if it were, the
wraparound handling definitely wouldn't be.  That's a pretty small race.

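One possible way to close that small race without a lock is a compare-and-swap loop. The loop computes the successor, including the skip over 0, and publishes it in a single atomic step. The sketch below uses C11 `<stdatomic.h>` in userspace. `stateid_model` and `update_stateid_atomic` are hypothetical names. Kernel code would use its own atomic primitives (for example `cmpxchg()`) rather than these.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the generation field of stateid_t. */
struct stateid_model {
	_Atomic uint32_t si_generation;
};

/*
 * Lock-free variant of update_stateid(): compute the successor
 * (skipping 0, as the wraparound recommendation requires) and install
 * it with compare-and-swap. On failure, 'old' is reloaded with the
 * current value and the successor is recomputed.
 * Returns the generation this caller installed.
 */
static uint32_t update_stateid_atomic(struct stateid_model *s)
{
	uint32_t old = atomic_load(&s->si_generation);
	uint32_t next;

	do {
		next = old + 1;
		if (next == 0)
			next = 1;	/* never hand out generation 0 */
	} while (!atomic_compare_exchange_weak(&s->si_generation, &old, next));

	assert(next != 0);
	return next;
}
```

Returning the installed value matters. A caller that copies the stateid afterwards, as the memcpy() after update_stateid() does, could otherwise pick up a generation that a concurrent OPEN has already bumped again. This only fixes the counter itself. It does not make the bump atomic with the share-bit changes asked about next.
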
Does it also matter that this si_generation update isn't atomic with respect
to the actual open and upgrade of the share bits?

--b.

> @@ -4819,10 +4826,13 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_
>  		 * revoked delegations are kept only for free_stateid.
>  		 */
>  		return nfserr_bad_stateid;
> +	down_write(&stp->st_rwsem);
>  	status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate));
> -	if (status)
> -		return status;
> -	return nfs4_check_fh(current_fh, &stp->st_stid);
> +	if (status == nfs_ok)
> +		status = nfs4_check_fh(current_fh, &stp->st_stid);
> +	if (status != nfs_ok)
> +		up_write(&stp->st_rwsem);
> +	return status;
>  }
>  
>  /* 
> @@ -4869,6 +4879,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs
>  		return status;
>  	oo = openowner(stp->st_stateowner);
>  	if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) {
> +		up_write(&stp->st_rwsem);
>  		nfs4_put_stid(&stp->st_stid);
>  		return nfserr_bad_stateid;
>  	}
> @@ -4899,11 +4910,14 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		goto out;
>  	oo = openowner(stp->st_stateowner);
>  	status = nfserr_bad_stateid;
> -	if (oo->oo_flags & NFS4_OO_CONFIRMED)
> +	if (oo->oo_flags & NFS4_OO_CONFIRMED) {
> +		up_write(&stp->st_rwsem);
>  		goto put_stateid;
> +	}
>  	oo->oo_flags |= NFS4_OO_CONFIRMED;
>  	update_stateid(&stp->st_stid.sc_stateid);
>  	memcpy(&oc->oc_resp_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
> +	up_write(&stp->st_rwsem);
>  	dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n",
>  		__func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid));
>  
> @@ -4982,6 +4996,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp,
>  	memcpy(&od->od_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
>  	status = nfs_ok;
>  put_stateid:
> +	up_write(&stp->st_rwsem);
>  	nfs4_put_stid(&stp->st_stid);
>  out:
>  	nfsd4_bump_seqid(cstate, status);
> @@ -5035,6 +5050,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  		goto out; 
>  	update_stateid(&stp->st_stid.sc_stateid);
>  	memcpy(&close->cl_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
> +	up_write(&stp->st_rwsem);
>  
>  	nfsd4_close_open_stateid(stp);
>  
> @@ -5260,6 +5276,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct nfs4_lockowner *lo,
>  	stp->st_access_bmap = 0;
>  	stp->st_deny_bmap = open_stp->st_deny_bmap;
>  	stp->st_openstp = open_stp;
> +	init_rwsem(&stp->st_rwsem);
>  	list_add(&stp->st_locks, &open_stp->st_locks);
>  	list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids);
>  	spin_lock(&fp->fi_lock);
> @@ -5428,6 +5445,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  					&open_stp, nn);
>  		if (status)
>  			goto out;
> +		up_write(&open_stp->st_rwsem);
>  		open_sop = openowner(open_stp->st_stateowner);
>  		status = nfserr_bad_stateid;
>  		if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid,
> @@ -5435,6 +5453,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  			goto out;
>  		status = lookup_or_create_lock_state(cstate, open_stp, lock,
>  							&lock_stp, &new);
> +		if (status == nfs_ok)
> +			down_write(&lock_stp->st_rwsem);
>  	} else {
>  		status = nfs4_preprocess_seqid_op(cstate,
>  				       lock->lk_old_lock_seqid,
> @@ -5540,6 +5560,8 @@ out:
>  		    seqid_mutating_err(ntohl(status)))
>  			lock_sop->lo_owner.so_seqid++;
>  
> +		up_write(&lock_stp->st_rwsem);
> +
>  		/*
>  		 * If this is a new, never-before-used stateid, and we are
>  		 * returning an error, then just go ahead and release it.
> @@ -5709,6 +5731,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
>  fput:
>  	fput(filp);
>  put_stateid:
> +	up_write(&stp->st_rwsem);
>  	nfs4_put_stid(&stp->st_stid);
>  out:
>  	nfsd4_bump_seqid(cstate, status);
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 583ffc13cae2..31bde12feefe 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -534,15 +534,16 @@ struct nfs4_file {
>   * Better suggestions welcome.
>   */
>  struct nfs4_ol_stateid {
> -	struct nfs4_stid    st_stid; /* must be first field */
> -	struct list_head              st_perfile;
> -	struct list_head              st_perstateowner;
> -	struct list_head              st_locks;
> -	struct nfs4_stateowner      * st_stateowner;
> -	struct nfs4_clnt_odstate    * st_clnt_odstate;
> -	unsigned char                 st_access_bmap;
> -	unsigned char                 st_deny_bmap;
> -	struct nfs4_ol_stateid         * st_openstp;
> +	struct nfs4_stid		st_stid;
> +	struct list_head		st_perfile;
> +	struct list_head		st_perstateowner;
> +	struct list_head		st_locks;
> +	struct nfs4_stateowner		*st_stateowner;
> +	struct nfs4_clnt_odstate	*st_clnt_odstate;
> +	unsigned char			st_access_bmap;
> +	unsigned char			st_deny_bmap;
> +	struct nfs4_ol_stateid		*st_openstp;
> +	struct rw_semaphore		st_rwsem;
>  };
>  
>  static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)
> -- 
> 2.4.3


Thread overview: 8+ messages
2015-09-17 11:47 [PATCH] nfsd: serialize state seqid morphing operations Jeff Layton
2015-09-29 21:11 ` J. Bruce Fields [this message]
2015-09-29 21:26   ` Jeff Layton
2015-09-29 23:14     ` J. Bruce Fields
2015-09-30 10:53       ` Jeff Layton
2015-09-30 14:30         ` J. Bruce Fields
2015-09-30 14:35           ` Jeff Layton
2015-10-01 17:51 ` J. Bruce Fields
