From: Trond Myklebust <trondmy@hammerspace.com>
To: "neilb@suse.com" <neilb@suse.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: Another OPEN / OPEN_DOWNGRADE race
Date: Fri, 22 Feb 2019 13:31:41 +0000 [thread overview]
Message-ID: <6b2eecda4f82a1628a25f44d792b4bfeb90a6a65.camel@hammerspace.com> (raw)
In-Reply-To: <87tvgwv2jj.fsf@notabene.neil.brown.name>
Hi Neil,
On Fri, 2019-02-22 at 16:02 +1100, NeilBrown wrote:
> On Fri, Feb 22 2019, Trond Myklebust wrote:
>
> > Hi Neil
> >
> > On Fri, 2019-02-22 at 11:58 +1100, NeilBrown wrote:
> > > Hi,
> > > I have a report of an NFSv4.1 state management problem in our
> > > 4.4
> > > kernel which appears to be caused by a race between OPEN and
> > > OPEN_DOWNGRADE, and I don't think the race is fixed in mainline.
> > >
> > > The test program creates multiple threads which open a file
> > > O_RDWR
> > > and
> > > write to it, then opens for O_RDONLY and reads from it.
> > > A network trace which shows the problem suggests that at a point
> > > in
> > > time where there are some O_RDONLY opens and one O_RDWR open, a
> > > new
> > > O_RDWR open is requested just as the O_RDWR open is being
> > > closed.
> > >
> > > The close of the O_RDWR clears state->n_rdwr early so
> > > can_open_cached()
> > > fails for the O_RDWR open, and it needs to go to the server.
> > > The open for O_RDWR doesn't increment n_rdwr until after the
> > > open
> > > succeeds, so nfs4_close_prepare sees
> > > n_rdwr == 0
> > > n_rdonly > 0
> > > NFS_O_RDWR_STATE and NFS_O_RDONLY_STATE set
> > > which causes it to choose an OPEN_DOWNGRADE.
> > >
> > > What we see is a OPEN/share-all and an OPEN_DOWNGRADE/share-read
> > > request are sent one after the other without waiting for a
> > > reply.
> > > The OPEN is processed first, then the OPEN_DOWNGRADE, resulting
> > > in a
> > > state that only allows reads. Then a WRITE is attempted which
> > > fails.
> > > This enters a infinite loop with 2 steps:
> > > - a WRITE gets NFS4ERR_OPENMODE
> > > - a TEST_STATEID succeeds
> > >
> > > Once an OPEN/share-all request has been sent, it isn't really
> > > correct
> > > to send an OPEN_DOWNGRADE/share-read request. However the fact
> > > that
> > > the OPEN has been sent isn't visible to nfs4_close_prepare().
> > >
> > > There is an asymmetry between open and close w.r.t. updating the
> > > n_[mode] counter and setting the NFS_O_mode_STATE bits.
> > >
> > > For close, the counter is decremented, then the server is told,
> > > then
> > > the state bits are cleared.
> > > For open, the counter and state bits are both cleared after the
> > > server
> > > is asked.
> > >
> > > I understand that this is probably because the OPEN could fail,
> > > and
> > > incrementing a counter before we are sure of success seems
> > > unwise. But
> > > doing so would allow us to avoid the incorrect OPEN_DOWNGRADE.
> > >
> > > Any suggestions on what a good solution would look like? Does
> > > it
> > > ever
> > > make sense for an OPEN request to be concurrent with a CLOSE or
> > > OPEN_DOWNGRADE ?? Maybe they should be serialized with each
> > > other
> > > (maybe not as fully serialized as NFSv4.0, but more than they
> > > currently
> > > are in NFSv4.1)
>
> Hi Trond,
> thanks for the reply.
>
> > If the CLOSE/OPEN_DOWNGRADE is processed by the server _after_ the
> > OPEN, then the stateid presented by the client will have an
> > incorrect
> > seqid, and so the server should reject the operation with a
> > NFS4ERR_OLD_STATEID.
>
> According to RFC5661, section 18.18.3, the DESCRIPTION of
> OPEN_DOWNGRADE,
>
> The seqid argument is not used in NFSv4.1, MAY be any value, and
> MUST be ignored by the server.
>
> So the fact that the stateid passed (3) is less than the stateids
> already returned for some OPEN operations (4 and 5), the server MUST
> still process the OPEN_DOWNGRADE, not give an error.
That refers to the explicit "seqid" parameter in struct
OPEN_DOWNGRADE4args. I'm referring to the seqid that constitutes the
first 4 bytes of the stateid.
In other words, I'm referring to the mechanism described here:
https://tools.ietf.org/html/rfc5661#section-8.2.2
Note that the Linux client doesn't ever set the stateid seqid to zero
on CLOSE or OPEN_DOWNGRADE precisely because we rely on the
NFS4ERR_OLD_STATEID mechanism to detect races with OPEN.
>
> > If, on the other hand, the CLOSE/OPEN_DOWNGRADE is processed before
> > the
> > OPEN, then the client should either see a new stateid with a new
> > 'other' field (CLOSE) or it will see the same stateid but with a
> > seqid
> > that does not match what it expects (OPEN_DOWNGRADE). In the
> > current
> > mainline code, the client will then attempt to wait for any
> > "missing"
> > seqids to get processed (see nfs_set_open_stateid_locked()).
>
> Hmm.. I don't have the waiting in my 4.4 code - I should probably
> make
> sure I understand that. However the tcpdump trace shows that this
> scenario isn't what is happening, so it isn't immediately relevant.
>
True. The waiting mechanism was introduced by commit c9399f21c215
("NFSv4: Fix OPEN / CLOSE race") in v4.15. We should probably have
kicked that down the stable patch mechanism.
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
next prev parent reply other threads:[~2019-02-22 13:31 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-02-22 0:58 Another OPEN / OPEN_DOWNGRADE race NeilBrown
2019-02-22 4:06 ` Trond Myklebust
2019-02-22 5:02 ` NeilBrown
2019-02-22 13:31 ` Trond Myklebust [this message]
2019-02-23 1:22 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6b2eecda4f82a1628a25f44d792b4bfeb90a6a65.camel@hammerspace.com \
--to=trondmy@hammerspace.com \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox