Subject: Another OPEN / OPEN_DOWNGRADE race
From: NeilBrown @ 2019-02-22  0:58 UTC
To: linux-nfs

Hi,
 I have a report of an NFSv4.1 state management problem in our 4.4
 kernel which appears to be caused by a race between OPEN and
 OPEN_DOWNGRADE, and I don't think the race is fixed in mainline.

 The test program creates multiple threads which each open a file
 O_RDWR and write to it, then open it O_RDONLY and read from it.
 A network trace showing the problem suggests that, at a point in
 time where there are some O_RDONLY opens and one O_RDWR open, a new
 O_RDWR open is requested just as the existing O_RDWR open is being
 closed.
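
 A reproducer along these lines (a sketch of my understanding, not the
 actual test program; the thread and loop counts are arbitrary) looks
 like:

#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

#define NTHREADS 8
#define NLOOPS   10000

static const char *path;	/* an existing file on the NFSv4.1 mount */

static void *worker(void *unused)
{
	char buf[64];

	for (int i = 0; i < NLOOPS; i++) {
		int fd = open(path, O_RDWR);	/* open for read/write ... */
		if (fd >= 0) {
			(void)write(fd, "x", 1);
			close(fd);
		}
		fd = open(path, O_RDONLY);	/* ... then read-only */
		if (fd >= 0) {
			(void)read(fd, buf, sizeof(buf));
			close(fd);
		}
	}
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t tid[NTHREADS];

	if (argc != 2)
		return 1;
	path = argv[1];
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}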

 The close of the O_RDWR open decrements state->n_rdwr early, so
 can_open_cached() fails for the new O_RDWR open, which must then go
 to the server.
 The open for O_RDWR doesn't increment n_rdwr until after the OPEN
 succeeds, so nfs4_close_prepare() sees
    n_rdwr == 0
    n_rdonly > 0
    NFS_O_RDWR_STATE and NFS_O_RDONLY_STATE set
 which causes it to choose an OPEN_DOWNGRADE.
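
 To make that concrete, here is a userspace model of the bookkeeping
 (a sketch only -- the field and flag names echo struct nfs4_state,
 but this is not the mainline code):

#include <stdio.h>

#define NFS_O_RDONLY_STATE	(1 << 0)
#define NFS_O_WRONLY_STATE	(1 << 1)
#define NFS_O_RDWR_STATE	(1 << 2)

struct state {
	unsigned int flags;			/* NFS_O_*_STATE bits */
	int n_rdonly, n_wronly, n_rdwr;
};

/* Roughly the choice nfs4_close_prepare() makes from the counters. */
static const char *close_decision(const struct state *s)
{
	int need_read  = s->n_rdonly > 0 || s->n_rdwr > 0;
	int need_write = s->n_wronly > 0 || s->n_rdwr > 0;
	int have_read  = s->flags & (NFS_O_RDONLY_STATE | NFS_O_RDWR_STATE);
	int have_write = s->flags & (NFS_O_WRONLY_STATE | NFS_O_RDWR_STATE);

	if (!need_read && !need_write)
		return "CLOSE";
	if ((have_read && !need_read) || (have_write && !need_write))
		return "OPEN_DOWNGRADE";
	return "no RPC needed";
}

int main(void)
{
	/* The racy snapshot from above: the closer has decremented
	 * n_rdwr, but NFS_O_RDWR_STATE is still set and some O_RDONLY
	 * opens remain. */
	struct state s = {
		.flags = NFS_O_RDONLY_STATE | NFS_O_RDWR_STATE,
		.n_rdonly = 2, .n_wronly = 0, .n_rdwr = 0,
	};
	printf("%s\n", close_decision(&s));	/* prints OPEN_DOWNGRADE */
	return 0;
}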

 What we see is that an OPEN/share-all request and an
 OPEN_DOWNGRADE/share-read request are sent one after the other,
 without waiting for a reply.
 The OPEN is processed first, then the OPEN_DOWNGRADE, resulting in a
 state that only allows reads.  A WRITE is then attempted, which
 fails, and the client enters an infinite loop with two steps:
  - a WRITE gets NFS4ERR_OPENMODE
  - a TEST_STATEID succeeds
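
 The loop makes no progress because the stateid itself is perfectly
 valid, so recovery finds nothing to repair.  As a toy model (not the
 client's actual recovery code):

#include <stdio.h>

enum { NFS4_OK, NFS4ERR_OPENMODE };

int main(void)
{
	/* The server's share mode after the misordered OPEN_DOWNGRADE
	 * is read-only, and nothing in the loop ever changes it. */
	int server_allows_write = 0;

	for (int attempt = 1; attempt <= 3; attempt++) {
		int err = server_allows_write ? NFS4_OK : NFS4ERR_OPENMODE;

		if (err == NFS4_OK)
			break;
		/* TEST_STATEID reports the stateid valid, so the open
		 * state is left alone and the WRITE is retried. */
		printf("attempt %d: OPENMODE -> TEST_STATEID ok -> retry\n",
		       attempt);
	}
	return 0;
}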

 Once an OPEN/share-all request has been sent, it isn't really correct
 to send an OPEN_DOWNGRADE/share-read request.  However, the fact that
 the OPEN has been sent isn't visible to nfs4_close_prepare().

 There is an asymmetry between open and close w.r.t. updating the
 n_[mode] counters and setting the NFS_O_[mode]_STATE bits.

 For close, the counter is decremented, then the server is told, then
 the state bits are cleared.
 For open, the counter is incremented and the state bits are set only
 after the server has replied successfully.
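
 In code terms the two orders look roughly like this (a sketch with
 the RPCs reduced to comments; the real paths are nfs4_close_state()
 and update_open_stateid()):

#define NFS_O_RDWR_STATE	(1 << 2)

struct state { unsigned int flags; int n_rdwr; };

static void close_rdwr(struct state *s)
{
	s->n_rdwr--;			/* 1: counter drops immediately */
	/* 2: CLOSE or OPEN_DOWNGRADE is chosen from the counters, sent */
	s->flags &= ~NFS_O_RDWR_STATE;	/* 3: bit cleared on the reply */
}

static void open_rdwr(struct state *s)
{
	/* 1: OPEN is sent and the reply awaited */
	s->n_rdwr++;			/* 2: counter and bit updated */
	s->flags |= NFS_O_RDWR_STATE;	/*    only after success */
}

/* Between close_rdwr() step 1 and open_rdwr() step 2, a concurrent
 * nfs4_close_prepare() sees n_rdwr == 0 even though an OPEN for
 * FMODE_READ|FMODE_WRITE is already on the wire. */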

 I understand that this is probably because the OPEN could fail, and
 incrementing a counter before we are sure of success seems unwise.
 But incrementing early would let us avoid the incorrect
 OPEN_DOWNGRADE.

 Any suggestions on what a good solution would look like?  Does it
 ever make sense for an OPEN request to be concurrent with a CLOSE or
 OPEN_DOWNGRADE?  Maybe they should be serialized with each other
 (maybe not as fully serialized as in NFSv4.0, but more than they
 currently are in NFSv4.1).
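
 One possible shape, purely as a sketch with invented names (and
 glossing over blocking on an RPC while a lock is held): OPEN takes a
 per-state lock for read, so OPENs still run in parallel, while
 CLOSE/OPEN_DOWNGRADE takes it for write, so the downgrade decision
 can never be computed while an OPEN is on the wire:

#include <pthread.h>

struct state_model {
	pthread_rwlock_t open_close_lock;	/* hypothetical */
	int n_rdwr;
};

static void model_open(struct state_model *s)
{
	pthread_rwlock_rdlock(&s->open_close_lock);
	/* send OPEN, wait for the reply, then update the counters and
	 * bits (counter updates would still need the existing state
	 * spinlock in reality) */
	s->n_rdwr++;
	pthread_rwlock_unlock(&s->open_close_lock);
}

static void model_close(struct state_model *s)
{
	pthread_rwlock_wrlock(&s->open_close_lock);
	s->n_rdwr--;
	/* the counters are now stable: choose CLOSE vs OPEN_DOWNGRADE
	 * and send it before any new OPEN can be submitted */
	pthread_rwlock_unlock(&s->open_close_lock);
}

 That would be more serialization than NFSv4.1 strictly needs, but
 much less than NFSv4.0's per-owner sequencing.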

Thanks,
NeilBrown
