From: "J. Bruce Fields" <bfields@fieldses.org>
To: Jeff Layton <jlayton@poochiereds.net>
Cc: linux-nfs@vger.kernel.org, Andrew W Elble <aweits@rit.edu>
Subject: Re: [PATCH] nfsd: serialize state seqid morphing operations
Date: Tue, 29 Sep 2015 19:14:27 -0400 [thread overview]
Message-ID: <20150929231427.GA12002@fieldses.org> (raw)
In-Reply-To: <20150929172638.6cc775ff@tlielax.poochiereds.net>
On Tue, Sep 29, 2015 at 05:26:38PM -0400, Jeff Layton wrote:
> On Tue, 29 Sep 2015 17:11:55 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
>
> > On Thu, Sep 17, 2015 at 07:47:08AM -0400, Jeff Layton wrote:
> > > Andrew was seeing a race occur when an OPEN and OPEN_DOWNGRADE were
> > > running in parallel. The server would receive the OPEN_DOWNGRADE first
> > > and check its seqid, but then an OPEN would race in and bump it. The
> > > OPEN_DOWNGRADE would then complete and bump the seqid again. The result
> > > was that the OPEN_DOWNGRADE would be applied after the OPEN, even though
> > > it should have been rejected since the seqid changed.
> > >
> > > The only recourse we have here I think is to serialize operations that
> > > bump the seqid in a stateid, particularly when we're given a seqid in
> > > the call. To address this, we add a new rw_semaphore to the
> > > nfs4_ol_stateid struct. We do a down_write prior to checking the seqid
> > > after looking up the stateid to ensure that nothing else is going to
> > > bump it while we're operating on it.
> > >
> > > In the case of OPEN, we do a down_read, as the call doesn't contain a
> > > seqid. Those can run in parallel -- we just need to serialize them when
> > > there is a concurrent OPEN_DOWNGRADE or CLOSE.
> > >
> > > LOCK and LOCKU however always take the write lock as there is no
> > > opportunity for parallelizing those.
> > >
> > > Reported-and-Tested-by: Andrew W Elble <aweits@rit.edu>
> > > Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
> > > ---
> > > fs/nfsd/nfs4state.c | 33 ++++++++++++++++++++++++++++-----
> > > fs/nfsd/state.h | 19 ++++++++++---------
> > > 2 files changed, 38 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index 0f1d5691b795..1b39edf10b67 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -3360,6 +3360,7 @@ static void init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
> > > stp->st_access_bmap = 0;
> > > stp->st_deny_bmap = 0;
> > > stp->st_openstp = NULL;
> > > + init_rwsem(&stp->st_rwsem);
> > > spin_lock(&oo->oo_owner.so_client->cl_lock);
> > > list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
> > > spin_lock(&fp->fi_lock);
> > > @@ -4187,15 +4188,20 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
> > > */
> > > if (stp) {
> > > /* Stateid was found, this is an OPEN upgrade */
> > > + down_read(&stp->st_rwsem);
> > > status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
> > > - if (status)
> > > + if (status) {
> > > + up_read(&stp->st_rwsem);
> > > goto out;
> > > + }
> > > } else {
> > > stp = open->op_stp;
> > > open->op_stp = NULL;
> > > init_open_stateid(stp, fp, open);
> > > + down_read(&stp->st_rwsem);
> > > status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
> > > if (status) {
> > > + up_read(&stp->st_rwsem);
> > > release_open_stateid(stp);
> > > goto out;
> > > }
> > > @@ -4207,6 +4213,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
> > > }
> > > update_stateid(&stp->st_stid.sc_stateid);
> > > memcpy(&open->op_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
> > > + up_read(&stp->st_rwsem);
> > >
> > > if (nfsd4_has_session(&resp->cstate)) {
> > > if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) {
> >
> > The patch looks good, but:
> >
> > Does it matter that we don't have an exclusive lock over that
> > update_stateid?
> >
> > I think there's at least one small bug there:
> >
> > static inline void update_stateid(stateid_t *stateid)
> > {
> > stateid->si_generation++;
> > /* Wraparound recommendation from 3530bis-13 9.1.3.2: */
> > if (stateid->si_generation == 0)
> > stateid->si_generation = 1;
> > }
> >
> > The si_generation increment isn't atomic, and even if it were the wraparound
> > handling definitely wouldn't. That's a pretty small race.
> >
>
> Yeah, I eyeballed that some time ago and convinced myself it was ok,
> but I think you're right that there is a potential race there. That
> counter sort of seems like something that ought to use atomics in some
> fashion. The wraparound is tricky, but could be done locklessly with
> cmpxchg, I think...
> > Does it also matter that this si_generation update isn't atomic with respect
> > to the actual open and upgrade of the share bits?
> >
>
> I don't think so. If you race with a concurrent OPEN upgrade, the
> server will either have what you requested or a superset of that. It's
> only OPEN_DOWNGRADE and CLOSE that need full serialization.
IO also takes stateids, and I think it'd be interesting to think about
scenarios that involve concurrent opens and reads or writes. For
example:
- process_open2 upgrades R to RW
- Write processed with si_generation=1
- process_open2 bumps si_generation
The write succeeds, but it was performed with a stateid that represented
a read-only open. I believe that write should have gotten either
OPENMODE (if it happened before the open) or OLD_STATEID (if it happened
after).
I don't know if that's actually a problem in practice.
Hm, I also wonder about the window between the si_generation bump and
memcpy. Multiple concurrent opens could end up returning the same
stateid:
- si_generation++
- si_generation++
- memcpy new stateid
- memcpy new stateid
Again, I'm not sure if that will actually cause application-visible
problems.
--b.
next prev parent reply other threads:[~2015-09-29 23:14 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-17 11:47 [PATCH] nfsd: serialize state seqid morphing operations Jeff Layton
2015-09-29 21:11 ` J. Bruce Fields
2015-09-29 21:26 ` Jeff Layton
2015-09-29 23:14 ` J. Bruce Fields [this message]
2015-09-30 10:53 ` Jeff Layton
2015-09-30 14:30 ` J. Bruce Fields
2015-09-30 14:35 ` Jeff Layton
2015-10-01 17:51 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150929231427.GA12002@fieldses.org \
--to=bfields@fieldses.org \
--cc=aweits@rit.edu \
--cc=jlayton@poochiereds.net \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.