* Linux client misses lack of open-confirm?
@ 2007-12-22  4:15 Jeff Garzik
  2007-12-22 15:27 ` Trond Myklebust
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff Garzik @ 2007-12-22  4:15 UTC
To: NFS list

While debugging my NFS server, I may have caught a Linux client bug.

My server is currently buggy in that it never sets the
OPEN4_RESULT_CONFIRM bit after an OPEN with a new owner. Shockingly, I
can pass ~530 pynfs tests, fsx-linux [Linux v4 client], and build a
kernel [Linux v4 client] even with such brokenness. ;-)

Anyway, the Linux NFSv4 client (2.6.24-rc6) seems quite happy with this
state of affairs, right until CLOSE time, when it passes "seqid + 2" to
my server rather than the expected "seqid + 1".

Though I am quite happy that Linux managed to work around my stupid
server and store data successfully _anyway_, I thought it was worth
commenting. I was assuming either

    a) Linux would notice the lack of OPEN4_RESULT_CONFIRM and
       complain accordingly, or,

    b) Linux would generate a correct seqid, taking into account
       the fact that it did not issue OPEN_CONFIRM.

As you can see from the wireshark-0.99.7-2.fc8 binary dump at

    http://gtf.org/garzik/misc/dump.bz2 (33k compressed)

we see many examples of

    C: OPEN (seqid == 0)
    S: NFS4_OK

    C: [perhaps some intervening READ or WRITE or *ATTR]
    S: [replies as expected]

    C: CLOSE (seqid == 2)
    S: NFS4ERR_BAD_SEQID

If you feel this behavior is fine given a broken server, that's cool...
I just figured I would post in case somebody cared about this data point.

    Jeff

P.S. I really really hate stateid/seqids at this point. RFC
notwithstanding, they are basically undocumented. I am reduced to
poking through NFSv4 WG archives and Linux kernel code to find out what
my server should be doing. pynfs is no help here, either.
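The seqid arithmetic behind the trace above can be sketched roughly as follows. This is only an illustration of the counting described in the report, not the Linux client's code: `next_seqid` and `expected_close_seqid` are hypothetical names, the 32-bit wraparound is an assumption of the sketch, and it assumes the intervening READ/WRITE/*ATTR operations do not consume an owner seqid.

```python
def next_seqid(seqid: int) -> int:
    """Advance an owner seqid by one (assumed to wrap as a 32-bit unsigned value)."""
    return (seqid + 1) & 0xFFFFFFFF


def expected_close_seqid(open_seqid: int, confirm_requested: bool) -> int:
    """Seqid the server would expect on CLOSE after a single OPEN.

    If the server set OPEN4_RESULT_CONFIRM, the client sends OPEN_CONFIRM,
    which uses the seqid right after OPEN, so CLOSE comes one later still.
    """
    seqid = next_seqid(open_seqid)
    if confirm_requested:
        # OPEN_CONFIRM used this seqid; CLOSE takes the following one.
        seqid = next_seqid(seqid)
    return seqid


# The reported trace: OPEN used seqid 0 and no confirm was requested.
print(expected_close_seqid(0, confirm_requested=False))  # server expects this on CLOSE
print(expected_close_seqid(0, confirm_requested=True))   # what the client actually sent
```

Under these assumptions the client's CLOSE seqid matches the "confirm requested" case even though no OPEN_CONFIRM was ever issued, which is consistent with the NFS4ERR_BAD_SEQID replies in the dump.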
* Re: Linux client misses lack of open-confirm?
  2007-12-22  4:15 Linux client misses lack of open-confirm? Jeff Garzik
@ 2007-12-22 15:27 ` Trond Myklebust
  [not found] ` <1198337249.7741.52.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Trond Myklebust @ 2007-12-22 15:27 UTC
To: Jeff Garzik; +Cc: NFS list

On Fri, 2007-12-21 at 23:15 -0500, Jeff Garzik wrote:
> While debugging my NFS server, I may have caught a Linux client bug.
>
> My server is currently buggy in that it never sets the
> OPEN4_RESULT_CONFIRM bit after an OPEN with a new owner. Shockingly, I
> can pass ~530 pynfs tests, fsx-linux [Linux v4 client], and build a
> kernel [Linux v4 client] even with such brokenness. ;-)
>
> Anyway, the Linux NFSv4 client (2.6.24-rc6) seems quite happy with this
> state of affairs, right until CLOSE time, when it passes "seqid + 2" to
> my server rather than the expected "seqid + 1".
>
> Though I am quite happy that Linux managed to work around my stupid
> server and store data successfully _anyway_, I thought it was worth
> commenting. I was assuming either
>
>     a) Linux would notice the lack of OPEN4_RESULT_CONFIRM and
>        complain accordingly, or,
>
>     b) Linux would generate a correct seqid, taking into account
>        the fact that it did not issue OPEN_CONFIRM.
>
> As you can see from the wireshark-0.99.7-2.fc8 binary dump at
>
>     http://gtf.org/garzik/misc/dump.bz2 (33k compressed)
>
> we see many examples of
>
>     C: OPEN (seqid == 0)
>     S: NFS4_OK
>
>     C: [perhaps some intervening READ or WRITE or *ATTR]
>     S: [replies as expected]
>
>     C: CLOSE (seqid == 2)
>     S: NFS4ERR_BAD_SEQID
>
> If you feel this behavior is fine given a broken server, that's cool...
> I just figured I would post in case somebody cared about this data point.

Hmm... That's not good. It is perfectly legal for a server to not
request OPEN4_RESULT_CONFIRM (although it is probably not a very good
idea), and the client should be able to cope with that.

I'll have a look at what is going on there.

> P.S. I really really hate stateid/seqids at this point. RFC
> notwithstanding, they are basically undocumented. I am reduced to
> poking through NFSv4 WG archives and Linux kernel code to find out what
> my server should be doing. pynfs is no help here, either.

The primary function of seqids is to allow the server to distinguish
replayed non-idempotent RPC requests from new requests, so their
properties are really quite simple:

      * If the seqid presented by the client is in sequence, then the
        server is supposed to handle the request.
      * If the seqid matches that of the last request, then the server
        is supposed to replay the reply.
      * If the seqid is completely out of sequence, then the server
        should return the BAD_SEQID error.

As for stateids, their purpose is to allow the server to figure out to
which client it is talking, and to track what state the client thinks it
is holding. Apart from the seqid field (which is there in order to track
the ordering of OPEN requests), a stateid is an opaque structure.
The only really important requirement here is that you need to be able
to distinguish stale state from valid state so that you can fence off
RPC requests that refer to stale locks.

Cheers
  Trond
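The three seqid rules listed in the message above could be sketched on the server side roughly like this. It is a minimal sketch under stated assumptions, not code from any real NFS server: `OwnerSeqState`, `handle`, and the string replies are hypothetical, "in sequence" is taken to mean last seqid plus one with 32-bit wraparound, and the cases where an error reply must not advance the seqid are left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional

NFS4ERR_BAD_SEQID = "NFS4ERR_BAD_SEQID"  # stand-in for the real status code


@dataclass
class OwnerSeqState:
    """Per-owner replay state: last accepted seqid and the reply sent for it."""
    last_seqid: int
    last_reply: Optional[str] = None

    def handle(self, seqid: int, execute: Callable[[], str]) -> str:
        expected = (self.last_seqid + 1) & 0xFFFFFFFF
        if seqid == expected:
            # Rule 1: in sequence, so run the operation and cache its reply.
            reply = execute()
            self.last_seqid = seqid
            self.last_reply = reply
            return reply
        if seqid == self.last_seqid and self.last_reply is not None:
            # Rule 2: retransmission of the last request, so replay the
            # cached reply without executing the operation again.
            return self.last_reply
        # Rule 3: anything else is out of sequence.
        return NFS4ERR_BAD_SEQID


# Mirror the reported trace: OPEN was accepted with seqid 0.
state = OwnerSeqState(last_seqid=0, last_reply="OPEN: NFS4_OK")
print(state.handle(2, lambda: "CLOSE: NFS4_OK"))  # out of sequence
print(state.handle(1, lambda: "CLOSE: NFS4_OK"))  # in sequence, executed
print(state.handle(1, lambda: "CLOSE: NFS4_OK"))  # replayed from cache
```

Caching the whole reply alongside the seqid is what makes the second rule cheap: a retransmitted non-idempotent request is answered without being re-executed.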
[parent not found: <1198337249.7741.52.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>]
* Re: Linux client misses lack of open-confirm?
  [not found] ` <1198337249.7741.52.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2007-12-23  2:05 ` Jeff Garzik
  0 siblings, 0 replies; 3+ messages in thread
From: Jeff Garzik @ 2007-12-23  2:05 UTC
To: Trond Myklebust; +Cc: NFS list

Trond Myklebust wrote:
> On Fri, 2007-12-21 at 23:15 -0500, Jeff Garzik wrote:
>> While debugging my NFS server, I may have caught a Linux client bug.
>>
>> My server is currently buggy in that it never sets the
>> OPEN4_RESULT_CONFIRM bit after an OPEN with a new owner. Shockingly, I
>> can pass ~530 pynfs tests, fsx-linux [Linux v4 client], and build a
>> kernel [Linux v4 client] even with such brokenness. ;-)
>>
>> Anyway, the Linux NFSv4 client (2.6.24-rc6) seems quite happy with this
>> state of affairs, right until CLOSE time, when it passes "seqid + 2" to
>> my server rather than the expected "seqid + 1".
>>
>> Though I am quite happy that Linux managed to work around my stupid
>> server and store data successfully _anyway_, I thought it was worth
>> commenting. I was assuming either
>>
>>     a) Linux would notice the lack of OPEN4_RESULT_CONFIRM and
>>        complain accordingly, or,
>>
>>     b) Linux would generate a correct seqid, taking into account
>>        the fact that it did not issue OPEN_CONFIRM.
>>
>> As you can see from the wireshark-0.99.7-2.fc8 binary dump at
>>
>>     http://gtf.org/garzik/misc/dump.bz2 (33k compressed)
>>
>> we see many examples of
>>
>>     C: OPEN (seqid == 0)
>>     S: NFS4_OK
>>
>>     C: [perhaps some intervening READ or WRITE or *ATTR]
>>     S: [replies as expected]
>>
>>     C: CLOSE (seqid == 2)
>>     S: NFS4ERR_BAD_SEQID
>>
>> If you feel this behavior is fine given a broken server, that's cool...
>> I just figured I would post in case somebody cared about this data point.
>
> Hmm... That's not good. It is perfectly legal for a server to not
> request OPEN4_RESULT_CONFIRM (although it is probably not a very good
> idea), and the client should be able to cope with that.

If you want to reproduce, my server is open (though largely unannounced,
since it's still in the initial coding phase):

    git://git.kernel.org/pub/scm/daemon/nfs/nfs4-ram.git

Commit b3f602203ab023aa559c4db5449448b9c7044f36 (HEAD~2 currently) can
reproduce the behavior nicely.

The server is currently a zero-configuration-file RAM server, so it's
easy to test: just build and run (./nfs4_ramd). It binds to port 2049
with an empty filesystem each time it is started. (--help for alternate
port or other options)

> I'll have a look at what is going on there.
>
>> P.S. I really really hate stateid/seqids at this point. RFC
>> notwithstanding, they are basically undocumented. I am reduced to
>> poking through NFSv4 WG archives and Linux kernel code to find out what
>> my server should be doing. pynfs is no help here, either.
>
> The primary function of seqids is to allow the server to distinguish
> replayed non-idempotent RPC requests from new requests, so their
> properties are really quite simple:
>
>       * If the seqid presented by the client is in sequence, then the
>         server is supposed to handle the request.
>       * If the seqid matches that of the last request, then the server
>         is supposed to replay the reply.
>       * If the seqid is completely out of sequence, then the server
>         should return the BAD_SEQID error.
>
> As for stateids, their purpose is to allow the server to figure out to
> which client it is talking, and to track what state the client thinks it
> is holding. Apart from the seqid field (which is there in order to track
> the ordering of OPEN requests), a stateid is an opaque structure.
> The only really important requirement here is that you need to be able
> to distinguish stale state from valid state so that you can fence off
> RPC requests that refer to stale locks.

Yeah, I figured out the purpose pretty quickly. The thing I missed was
that the seqid is per-lockowner, and not per-openfile. No surprise
things got weird when I coded a server following that logic...

Plus there are a ton of undocumented -ordering- constraints you must
follow, with regard to validating seqid/stateid and then returning the
correct error.

Thanks for the response! Hope my buggy server helps you track down
client problems ;-)

    Jeff
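The per-owner point above can be illustrated with a small sketch. This is not how nfs4-ram or any other server is actually written: `OwnerSeqidTable` and `in_sequence` are hypothetical names, the first seqid seen from a new owner is simply accepted as a simplifying assumption, and replay handling is omitted. The only thing it is meant to show is that the counter is keyed by owner, so operations on different files from the same owner share one sequence.

```python
from typing import Dict, Tuple

OwnerKey = Tuple[int, bytes]  # (clientid, opaque owner string); illustrative only


class OwnerSeqidTable:
    """Tracks the last accepted seqid per owner, not per open file."""

    def __init__(self) -> None:
        self._last: Dict[OwnerKey, int] = {}

    def in_sequence(self, clientid: int, owner: bytes, seqid: int) -> bool:
        key = (clientid, owner)
        last = self._last.get(key)
        # A known owner must present last + 1 (32-bit wraparound assumed);
        # a new owner's first seqid is accepted as-is in this sketch.
        if last is not None and seqid != (last + 1) & 0xFFFFFFFF:
            return False
        self._last[key] = seqid
        return True


table = OwnerSeqidTable()
print(table.in_sequence(1, b"owner-a", 0))  # OPEN of file A
print(table.in_sequence(1, b"owner-a", 1))  # OPEN of file B, same owner
print(table.in_sequence(1, b"owner-a", 1))  # CLOSE of file A counted per file: rejected
print(table.in_sequence(1, b"owner-a", 2))  # CLOSE of file A counted per owner: accepted
print(table.in_sequence(1, b"owner-b", 0))  # a different owner has its own sequence
```

A server that kept one counter per open file would expect seqid 1 on the CLOSE of file A, which is exactly the kind of mismatch described above.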
end of thread, other threads:[~2007-12-23 2:05 UTC | newest]
Thread overview: 3+ messages
2007-12-22 4:15 Linux client misses lack of open-confirm? Jeff Garzik
2007-12-22 15:27 ` Trond Myklebust
[not found] ` <1198337249.7741.52.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2007-12-23 2:05 ` Jeff Garzik