From: Jeff Layton <jlayton@poochiereds.net>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org, Andrew W Elble <aweits@rit.edu>
Subject: Re: [PATCH] nfsd: serialize state seqid morphing operations
Date: Tue, 29 Sep 2015 17:26:38 -0400 [thread overview]
Message-ID: <20150929172638.6cc775ff@tlielax.poochiereds.net> (raw)
In-Reply-To: <20150929211155.GI3190@fieldses.org>
On Tue, 29 Sep 2015 17:11:55 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:
> On Thu, Sep 17, 2015 at 07:47:08AM -0400, Jeff Layton wrote:
> > Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
> > running in parallel. The server would receive the OPEN_DOWNGRADE first
> > and check its seqid, but then an OPEN would race in and bump it. The
> > OPEN_DOWNGRADE would then complete and bump the seqid again. The result
> > was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
> > it should have been rejected since the seqid changed.
> >
> > The only recourse we have here I think is to serialize operations that
> > bump the seqid in a stateid, particularly when we're given a seqid in
> > the call. To address this, we add a new rw_semaphore to the
> > nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
> > after looking up the stateid to ensure that nothing else is going to
> > bump it while we're operating on it.
> >
> > In the case of OPEN, we do a down_read, as the call doesn't contain a
> > seqid. Those can run in parallel -- we just need to serialize them when
> > there is a concurrent OPEN_DOWNGRADE or CLOSE.
> >
> > LOCK and LOCKU however always take the write lock as there is no
> > opportunity for parallelizing those.
> >
> > Reported-and-Tested-by: Andrew W Elble <aweits@rit.edu>
> > Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
> > ---
> > fs/nfsd/nfs4state.c | 33 ++++++++++++++++++++++++++++-----
> > fs/nfsd/state.h | 19 ++++++++++---------
> > 2 files changed, 38 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 0f1d5691b795..1b39edf10b67 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -3360,6 +3360,7 @@ static void init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
> > stp->st_access_bmap = 0;
> > stp->st_deny_bmap = 0;
> > stp->st_openstp = NULL;
> > + init_rwsem(&stp->st_rwsem);
> > spin_lock(&oo->oo_owner.so_client->cl_lock);
> > list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
> > spin_lock(&fp->fi_lock);
> > @@ -4187,15 +4188,20 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
> > */
> > if (stp) {
> > /* Stateid was found, this is an OPEN upgrade */
> > + down_read(&stp->st_rwsem);
> > status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
> > - if (status)
> > + if (status) {
> > + up_read(&stp->st_rwsem);
> > goto out;
> > + }
> > } else {
> > stp = open->op_stp;
> > open->op_stp = NULL;
> > init_open_stateid(stp, fp, open);
> > + down_read(&stp->st_rwsem);
> > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
> > if (status) {
> > + up_read(&stp->st_rwsem);
> > release_open_stateid(stp);
> > goto out;
> > }
> > @@ -4207,6 +4213,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
> > }
> > update_stateid(&stp->st_stid.sc_stateid);
> > memcpy(&open->op_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
> > + up_read(&stp->st_rwsem);
> >
> > if (nfsd4_has_session(&resp->cstate)) {
> > if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) {
>
> The patch looks good, but:
>
> Does it matter that we don't have an exclusive lock over that
> update_stateid?
>
> I think there's at least one small bug there:
>
> static inline void update_stateid(stateid_t *stateid)
> {
> stateid->si_generation++;
> /* Wraparound recommendation from 3530bis-13 9.1.3.2: */
> if (stateid->si_generation == 0)
> stateid->si_generation = 1;
> }
>
> The si_generation increment isn't atomic, and even if it were the wraparound
> handling definitely wouldn't. That's a pretty small race.
>
Yeah, I eyeballed that some time ago and convinced myself it was ok,
but I think you're right that there is a potential race there. That
counter sort of seems like something that ought to use atomics in some
fashion. The wraparound is tricky, but could be done locklessly with
cmpxchg, I think...
> Does it also matter that this si_generation update isn't atomic with respect
> to the actual open and upgrade of the share bits?
>
I don't think so. If you race with a concurrent OPEN upgrade, the
server will either have what you requested or a superset of that. It's
only OPEN_DOWNGRADE and CLOSE that need full serialization.
> --b.
>
> > @@ -4819,10 +4826,13 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_
> > * revoked delegations are kept only for free_stateid.
> > */
> > return nfserr_bad_stateid;
> > + down_write(&stp->st_rwsem);
> > status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate));
> > - if (status)
> > - return status;
> > - return nfs4_check_fh(current_fh, &stp->st_stid);
> > + if (status == nfs_ok)
> > + status = nfs4_check_fh(current_fh, &stp->st_stid);
> > + if (status != nfs_ok)
> > + up_write(&stp->st_rwsem);
> > + return status;
> > }
> >
> > /*
> > @@ -4869,6 +4879,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs
> > return status;
> > oo = openowner(stp->st_stateowner);
> > if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) {
> > + up_write(&stp->st_rwsem);
> > nfs4_put_stid(&stp->st_stid);
> > return nfserr_bad_stateid;
> > }
> > @@ -4899,11 +4910,14 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > goto out;
> > oo = openowner(stp->st_stateowner);
> > status = nfserr_bad_stateid;
> > - if (oo->oo_flags & NFS4_OO_CONFIRMED)
> > + if (oo->oo_flags & NFS4_OO_CONFIRMED) {
> > + up_write(&stp->st_rwsem);
> > goto put_stateid;
> > + }
> > oo->oo_flags |= NFS4_OO_CONFIRMED;
> > update_stateid(&stp->st_stid.sc_stateid);
> > memcpy(&oc->oc_resp_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
> > + up_write(&stp->st_rwsem);
> > dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n",
> > __func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid));
> >
> > @@ -4982,6 +4996,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp,
> > memcpy(&od->od_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
> > status = nfs_ok;
> > put_stateid:
> > + up_write(&stp->st_rwsem);
> > nfs4_put_stid(&stp->st_stid);
> > out:
> > nfsd4_bump_seqid(cstate, status);
> > @@ -5035,6 +5050,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > goto out;
> > update_stateid(&stp->st_stid.sc_stateid);
> > memcpy(&close->cl_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
> > + up_write(&stp->st_rwsem);
> >
> > nfsd4_close_open_stateid(stp);
> >
> > @@ -5260,6 +5276,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct nfs4_lockowner *lo,
> > stp->st_access_bmap = 0;
> > stp->st_deny_bmap = open_stp->st_deny_bmap;
> > stp->st_openstp = open_stp;
> > + init_rwsem(&stp->st_rwsem);
> > list_add(&stp->st_locks, &open_stp->st_locks);
> > list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids);
> > spin_lock(&fp->fi_lock);
> > @@ -5428,6 +5445,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > &open_stp, nn);
> > if (status)
> > goto out;
> > + up_write(&open_stp->st_rwsem);
> > open_sop = openowner(open_stp->st_stateowner);
> > status = nfserr_bad_stateid;
> > if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid,
> > @@ -5435,6 +5453,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > goto out;
> > status = lookup_or_create_lock_state(cstate, open_stp, lock,
> > &lock_stp, &new);
> > + if (status == nfs_ok)
> > + down_write(&lock_stp->st_rwsem);
> > } else {
> > status = nfs4_preprocess_seqid_op(cstate,
> > lock->lk_old_lock_seqid,
> > @@ -5540,6 +5560,8 @@ out:
> > seqid_mutating_err(ntohl(status)))
> > lock_sop->lo_owner.so_seqid++;
> >
> > + up_write(&lock_stp->st_rwsem);
> > +
> > /*
> > * If this is a new, never-before-used stateid, and we are
> > * returning an error, then just go ahead and release it.
> > @@ -5709,6 +5731,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> > fput:
> > fput(filp);
> > put_stateid:
> > + up_write(&stp->st_rwsem);
> > nfs4_put_stid(&stp->st_stid);
> > out:
> > nfsd4_bump_seqid(cstate, status);
> > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > index 583ffc13cae2..31bde12feefe 100644
> > --- a/fs/nfsd/state.h
> > +++ b/fs/nfsd/state.h
> > @@ -534,15 +534,16 @@ struct nfs4_file {
> > * Better suggestions welcome.
> > */
> > struct nfs4_ol_stateid {
> > - struct nfs4_stid st_stid; /* must be first field */
> > - struct list_head st_perfile;
> > - struct list_head st_perstateowner;
> > - struct list_head st_locks;
> > - struct nfs4_stateowner * st_stateowner;
> > - struct nfs4_clnt_odstate * st_clnt_odstate;
> > - unsigned char st_access_bmap;
> > - unsigned char st_deny_bmap;
> > - struct nfs4_ol_stateid * st_openstp;
> > + struct nfs4_stid st_stid;
> > + struct list_head st_perfile;
> > + struct list_head st_perstateowner;
> > + struct list_head st_locks;
> > + struct nfs4_stateowner *st_stateowner;
> > + struct nfs4_clnt_odstate *st_clnt_odstate;
> > + unsigned char st_access_bmap;
> > + unsigned char st_deny_bmap;
> > + struct nfs4_ol_stateid *st_openstp;
> > + struct rw_semaphore st_rwsem;
> > };
> >
> > static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)
> > --
> > 2.4.3
--
Jeff Layton <jlayton@poochiereds.net>
next prev parent reply other threads:[~2015-09-29 21:26 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-17 11:47 [PATCH] nfsd: serialize state seqid morphing operations Jeff Layton
2015-09-29 21:11 ` J. Bruce Fields
2015-09-29 21:26 ` Jeff Layton [this message]
2015-09-29 23:14 ` J. Bruce Fields
2015-09-30 10:53 ` Jeff Layton
2015-09-30 14:30 ` J. Bruce Fields
2015-09-30 14:35 ` Jeff Layton
2015-10-01 17:51 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150929172638.6cc775ff@tlielax.poochiereds.net \
--to=jlayton@poochiereds.net \
--cc=aweits@rit.edu \
--cc=bfields@fieldses.org \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).