From: NeilBrown <neilb@suse.com>
To: Trond Myklebust <trondmy@hammerspace.com>,
"linux-nfs\@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: Another OPEN / OPEN_DOWNGRADE race
Date: Sat, 23 Feb 2019 12:22:24 +1100
Message-ID: <87imxbuwmn.fsf@notabene.neil.brown.name>
In-Reply-To: <6b2eecda4f82a1628a25f44d792b4bfeb90a6a65.camel@hammerspace.com>
On Fri, Feb 22 2019, Trond Myklebust wrote:
> Hi Neil,
>
> On Fri, 2019-02-22 at 16:02 +1100, NeilBrown wrote:
>> On Fri, Feb 22 2019, Trond Myklebust wrote:
>>
>> > Hi Neil
>> >
>> > On Fri, 2019-02-22 at 11:58 +1100, NeilBrown wrote:
>> > > Hi,
>> > > I have a report of an NFSv4.1 state management problem in our 4.4
>> > > kernel which appears to be caused by a race between OPEN and
>> > > OPEN_DOWNGRADE, and I don't think the race is fixed in mainline.
>> > >
>> > > The test program creates multiple threads which open a file
>> > > O_RDWR and write to it, then open it O_RDONLY and read from it.
>> > > A network trace which shows the problem suggests that at a point
>> > > in time where there are some O_RDONLY opens and one O_RDWR open,
>> > > a new O_RDWR open is requested just as the O_RDWR open is being
>> > > closed.
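
The reproducer is essentially this pattern (a sketch of it, not the
actual test program):

    /* Several threads hammer one file with write-then-read open
     * cycles, so a CLOSE of the last O_RDWR open keeps racing
     * against a new O_RDWR open. */
    #include <fcntl.h>
    #include <unistd.h>

    static void *worker(void *path)
    {
            char c = 'x';
            for (;;) {
                    int fd = open(path, O_RDWR);
                    write(fd, &c, 1);
                    close(fd);

                    fd = open(path, O_RDONLY);
                    read(fd, &c, 1);
                    close(fd);
            }
            return NULL;
    }

Run enough of these in parallel (e.g. via pthread_create()) and the
window described above opens up regularly.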
>> > >
>> > > The close of the O_RDWR clears state->n_rdwr early, so
>> > > can_open_cached() fails for the O_RDWR open and it needs to go to
>> > > the server.
>> > > The open for O_RDWR doesn't increment n_rdwr until after the open
>> > > succeeds, so nfs4_close_prepare sees
>> > >     n_rdwr == 0
>> > >     n_rdonly > 0
>> > >     NFS_O_RDWR_STATE and NFS_O_RDONLY_STATE set
>> > > which causes it to choose an OPEN_DOWNGRADE.
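
To make the choice concrete: nfs4_close_prepare() picks between CLOSE,
OPEN_DOWNGRADE and doing nothing based only on those counters and
flags.  As a simplified userspace model (a sketch, not the actual
kernel code):

    #include <stdbool.h>

    enum close_op { OP_NONE, OP_CLOSE, OP_OPEN_DOWNGRADE };

    struct open_state {
            int  n_rdonly, n_wronly, n_rdwr;  /* per-mode refcounts */
            bool rdonly, wronly, rdwr;        /* NFS_O_*_STATE bits */
    };

    static enum close_op close_prepare(const struct open_state *s)
    {
            if (s->n_rdonly == 0 && s->n_wronly == 0 && s->n_rdwr == 0)
                    return OP_CLOSE;          /* nothing left open  */
            if (s->n_rdwr == 0 && s->rdwr)
                    return OP_OPEN_DOWNGRADE; /* shed unused modes  */
            return OP_NONE;
    }

With n_rdwr == 0, n_rdonly > 0 and the rdwr bit still set, this picks
OPEN_DOWNGRADE even though a new OPEN/share-all is already on the wire.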
>> > >
>> > > What we see is an OPEN/share-all and an OPEN_DOWNGRADE/share-read
>> > > request sent one after the other without waiting for a reply.
>> > > The OPEN is processed first, then the OPEN_DOWNGRADE, resulting
>> > > in a state that only allows reads.  Then a WRITE is attempted,
>> > > which fails.  The client enters an infinite loop with two steps:
>> > >  - a WRITE gets NFS4ERR_OPENMODE
>> > >  - a TEST_STATEID succeeds
>> > >
>> > > Once an OPEN/share-all request has been sent, it isn't really
>> > > correct to send an OPEN_DOWNGRADE/share-read request.  However,
>> > > the fact that the OPEN has been sent isn't visible to
>> > > nfs4_close_prepare().
>> > >
>> > > There is an asymmetry between open and close w.r.t. updating the
>> > > n_[mode] counter and setting the NFS_O_mode_STATE bits.
>> > >
>> > > For close, the counter is decremented, then the server is told,
>> > > then the state bits are cleared.
>> > > For open, the counter and state bits are both updated after the
>> > > server has replied.
>> > >
>> > > I understand that this is probably because the OPEN could fail,
>> > > and incrementing a counter before we are sure of success seems
>> > > unwise.  But doing so would allow us to avoid the incorrect
>> > > OPEN_DOWNGRADE.
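
Sketching just the ordering (rpc_call() is a stand-in for the real RPC
machinery, not an actual function):

    /* close/downgrade path: the counter drops before the server
     * ever sees the request ... */
    state->n_rdwr--;                                  /* step 1 */
    rpc_call(OP_CLOSE_OR_DOWNGRADE);                  /* step 2 */
    clear_bit(NFS_O_RDWR_STATE, &state->flags);       /* step 3 */

    /* ... while the open path defers both updates until the server
     * has replied successfully. */
    status = rpc_call(OP_OPEN);                       /* step 1 */
    if (status == 0) {
            state->n_rdwr++;                          /* step 2 */
            set_bit(NFS_O_RDWR_STATE, &state->flags); /* step 3 */
    }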
>> > >
>> > > Any suggestions on what a good solution would look like?  Does it
>> > > ever make sense for an OPEN request to be concurrent with a CLOSE
>> > > or OPEN_DOWNGRADE?  Maybe they should be serialized with each
>> > > other (maybe not as fully serialized as in NFSv4.0, but more than
>> > > they currently are in NFSv4.1).
>>
>> Hi Trond,
>> thanks for the reply.
>>
>> > If the CLOSE/OPEN_DOWNGRADE is processed by the server _after_ the
>> > OPEN, then the stateid presented by the client will have an
>> > incorrect seqid, and so the server should reject the operation with
>> > NFS4ERR_OLD_STATEID.
>>
>> According to RFC 5661, section 18.18.3, the DESCRIPTION of
>> OPEN_DOWNGRADE says:
>>
>>    The seqid argument is not used in NFSv4.1, MAY be any value, and
>>    MUST be ignored by the server.
>>
>> So even though the seqid passed (3) is lower than the seqids already
>> returned for some OPEN operations (4 and 5), the server MUST still
>> process the OPEN_DOWNGRADE, not give an error.
>
> That refers to the explicit "seqid" parameter in struct
> OPEN_DOWNGRADE4args. I'm referring to the seqid that constitutes the
> first 4 bytes of the stateid.
>
> In other words, I'm referring to the mechanism described here:
> https://tools.ietf.org/html/rfc5661#section-8.2.2
>
> Note that the Linux client doesn't ever set the stateid seqid to zero
> on CLOSE or OPEN_DOWNGRADE precisely because we rely on the
> NFS4ERR_OLD_STATEID mechanism to detect races with OPEN.
Ahhh, I had missed that there are two seqids in OPEN_DOWNGRADE, one in
the stateid and one not.
Makes sense now.
With that perspective, it is pretty clear that the server is
misbehaving.
The OPEN_DOWNGRADE presents a seqid of 3, yet after replying to two
OPEN requests with seqids of 4 and 5, the server processes that
OPEN_DOWNGRADE without error and returns a seqid of 6.
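
To spell the distinction out for the archives (the struct follows
RFC 5661; check_stateid_seqid() and 'cur' are my own illustration of
the section 8.2.2 rule, not any server's actual code):

    #include <stdint.h>

    enum { NFS4_OK = 0, NFS4ERR_OLD_STATEID = 10024 };

    /* RFC 5661 stateid: the first 4 bytes are a seqid that the
     * server bumps on every operation that changes the state. */
    typedef struct {
            uint32_t seqid;
            char     other[12];
    } stateid4;

    /* What the server should have done with the stateid presented
     * in OPEN_DOWNGRADE (wraparound handling omitted): */
    static int check_stateid_seqid(const stateid4 *arg, uint32_t cur)
    {
            if (arg->seqid == 0)      /* zero means "use current"  */
                    return NFS4_OK;
            if (arg->seqid < cur)     /* 3 < 5 in our trace        */
                    return NFS4ERR_OLD_STATEID;
            return NFS4_OK;
    }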
Thanks a lot,
NeilBrown
>
>>
>> > If, on the other hand, the CLOSE/OPEN_DOWNGRADE is processed before
>> > the OPEN, then the client should either see a new stateid with a
>> > new 'other' field (CLOSE), or it will see the same stateid but with
>> > a seqid that does not match what it expects (OPEN_DOWNGRADE).  In
>> > the current mainline code, the client will then attempt to wait for
>> > any "missing" seqids to get processed (see
>> > nfs_set_open_stateid_locked()).
>>
>> Hmm.. I don't have the waiting in my 4.4 code - I should probably
>> make sure I understand that.  However, the tcpdump trace shows that
>> this scenario isn't what is happening, so it isn't immediately
>> relevant.
>>
>
> True. The waiting mechanism was introduced by commit c9399f21c215
> ("NFSv4: Fix OPEN / CLOSE race") in v4.15. We should probably have
> kicked that down the stable patch mechanism.
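
Having now looked at that commit, my understanding of the idea is the
sketch below; seqid_is_next() and wait_for_missing_seqid() are made-up
stand-ins for the real logic in nfs_set_open_stateid_locked():

    /* Only apply an OPEN reply's stateid once every earlier reply
     * for the same state has been processed. */
    for (;;) {
            if (!nfs4_stateid_match_other(stateid, &state->open_stateid))
                    break;  /* new 'other' value: take it as-is */
            if (seqid_is_next(&state->open_stateid, stateid))
                    break;  /* in sequence: safe to apply now   */
            /* an earlier reply is still outstanding: wait */
            wait_for_missing_seqid(state);
    }
    nfs4_stateid_copy(&state->open_stateid, stateid);

That closes the client-side window when replies arrive out of order,
though it wouldn't have helped here given the server never returned
NFS4ERR_OLD_STATEID.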
> Cheers
> Trond
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com