Linux NFS development
 help / color / mirror / Atom feed
* Linux client misses lack of open-confirm?
@ 2007-12-22  4:15 Jeff Garzik
  2007-12-22 15:27 ` Trond Myklebust
  0 siblings, 1 reply; 3+ messages in thread
From: Jeff Garzik @ 2007-12-22  4:15 UTC (permalink / raw)
  To: NFS list

While debugging my NFS server, I may have caught a Linux client bug.

My server is currently buggy, in that, it never sets the 
OPEN4_RESULT_CONFIRM bit after an OPEN with a new owner.  Shockingly, I 
can pass ~530 pynfs tests, fsx-linux [Linux v4 client], and build a 
kernel [Linux v4 client] even with such brokenness.  ;-)

Anyway, the Linux NFSv4 client (2.6.24-rc6) seems quite happy with this 
state of affairs, right until CLOSE time, when it passes "seqid + 2" to 
my server rather than the expected "seqid + 1".

Though I am quite happy that Linux managed to workaround my stupid 
server and store data successfully _anyway_, I thought it was worth 
commenting.  I was assuming either

	a) Linux would notice the lack of OPEN4_RESULT_CONFIRM and
	   complain accordingly, or,

	b) Linux would generate a correct seqid, taking into account
	   the fact that it did not issue OPEN_CONFIRM.

As you can see from the wireshark-0.99.7-2.fc8 binary dump at

	http://gtf.org/garzik/misc/dump.bz2 (33k compressed)

we see many examples of

	C:	OPEN	(seqid == 0)
	S:	NFS4_OK

	C:	[perhaps some intervening READ or WRITE or *ATTR]
	S:	[replies as expected]

	C:	CLOSE	(seqid == 2)
	S:	NFS4ERR_BAD_SEQID

If you feel this behavior is fine given a broken server, that's cool... 
  I just figured I would post in case somebody cared about this data point.

	Jeff


P.S.  I really really hate stateid/seqids at this point.  RFC 
nonwithstanding, they are basically undocumented.  I am reduced to 
poking through NFSv4 WG archives and Linux kernel code to find out what 
my server should be doing.  pynfs is no help here, either.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Linux client misses lack of open-confirm?
  2007-12-22  4:15 Linux client misses lack of open-confirm? Jeff Garzik
@ 2007-12-22 15:27 ` Trond Myklebust
       [not found]   ` <1198337249.7741.52.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
  0 siblings, 1 reply; 3+ messages in thread
From: Trond Myklebust @ 2007-12-22 15:27 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: NFS list


On Fri, 2007-12-21 at 23:15 -0500, Jeff Garzik wrote:
> While debugging my NFS server, I may have caught a Linux client bug.
> 
> My server is currently buggy, in that, it never sets the 
> OPEN4_RESULT_CONFIRM bit after an OPEN with a new owner.  Shockingly, I 
> can pass ~530 pynfs tests, fsx-linux [Linux v4 client], and build a 
> kernel [Linux v4 client] even with such brokenness.  ;-)
> 
> Anyway, the Linux NFSv4 client (2.6.24-rc6) seems quite happy with this 
> state of affairs, right until CLOSE time, when it passes "seqid + 2" to 
> my server rather than the expected "seqid + 1".
> 
> Though I am quite happy that Linux managed to workaround my stupid 
> server and store data successfully _anyway_, I thought it was worth 
> commenting.  I was assuming either
> 
> 	a) Linux would notice the lack of OPEN4_RESULT_CONFIRM and
> 	   complain accordingly, or,
> 
> 	b) Linux would generate a correct seqid, taking into account
> 	   the fact that it did not issue OPEN_CONFIRM.
> 
> As you can see from the wireshark-0.99.7-2.fc8 binary dump at
> 
> 	http://gtf.org/garzik/misc/dump.bz2 (33k compressed)
> 
> we see many examples of
> 
> 	C:	OPEN	(seqid == 0)
> 	S:	NFS4_OK
> 
> 	C:	[perhaps some intervening READ or WRITE or *ATTR]
> 	S:	[replies as expected]
> 
> 	C:	CLOSE	(seqid == 2)
> 	S:	NFS4ERR_BAD_SEQID
> 
> If you feel this behavior is fine given a broken server, that's cool... 
>   I just figured I would post in case somebody cared about this data point.

Hmm... That's not good. It is perfectly legal for a server to not
request OPEN4_RESULT_CONFIRM (although it is probably not a very good
idea), and the client should be able to cope with that.

I'll have a look at what is going on there.

> P.S.  I really really hate stateid/seqids at this point.  RFC 
> nonwithstanding, they are basically undocumented.  I am reduced to 
> poking through NFSv4 WG archives and Linux kernel code to find out what 
> my server should be doing.  pynfs is no help here, either.

The primary function of seqids is to allow the server to distinguish
replayed non-idempotent RPC requests from new requests, so their
properties are really quite simple:

      * If the seqid presented by the client is in sequence, then the
        server is supposed to handle the request.
      * If the seqid matches that of the last request, then the server
        is supposed to replay the reply.
      * If the seqid is completely out of sequence, then the server
        should return the BAD_SEQID error.

As for stateids, their purpose is to allow the server to figure out to
which client it is talking, and to track what state the client thinks it
is holding. Apart from the seqid field (which is there in order to track
the ordering of OPEN requests), a stateid is an opaque structure.
The only really important requirement here is that you need to be able
to distinguish stale state from valid state so that you can fence off
RPC requests that refer to stale locks.

Cheers
  Trond


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Linux client misses lack of open-confirm?
       [not found]   ` <1198337249.7741.52.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
@ 2007-12-23  2:05     ` Jeff Garzik
  0 siblings, 0 replies; 3+ messages in thread
From: Jeff Garzik @ 2007-12-23  2:05 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: NFS list

Trond Myklebust wrote:
> On Fri, 2007-12-21 at 23:15 -0500, Jeff Garzik wrote:
>> While debugging my NFS server, I may have caught a Linux client bug.
>>
>> My server is currently buggy, in that, it never sets the 
>> OPEN4_RESULT_CONFIRM bit after an OPEN with a new owner.  Shockingly, I 
>> can pass ~530 pynfs tests, fsx-linux [Linux v4 client], and build a 
>> kernel [Linux v4 client] even with such brokenness.  ;-)
>>
>> Anyway, the Linux NFSv4 client (2.6.24-rc6) seems quite happy with this 
>> state of affairs, right until CLOSE time, when it passes "seqid + 2" to 
>> my server rather than the expected "seqid + 1".
>>
>> Though I am quite happy that Linux managed to workaround my stupid 
>> server and store data successfully _anyway_, I thought it was worth 
>> commenting.  I was assuming either
>>
>> 	a) Linux would notice the lack of OPEN4_RESULT_CONFIRM and
>> 	   complain accordingly, or,
>>
>> 	b) Linux would generate a correct seqid, taking into account
>> 	   the fact that it did not issue OPEN_CONFIRM.
>>
>> As you can see from the wireshark-0.99.7-2.fc8 binary dump at
>>
>> 	http://gtf.org/garzik/misc/dump.bz2 (33k compressed)
>>
>> we see many examples of
>>
>> 	C:	OPEN	(seqid == 0)
>> 	S:	NFS4_OK
>>
>> 	C:	[perhaps some intervening READ or WRITE or *ATTR]
>> 	S:	[replies as expected]
>>
>> 	C:	CLOSE	(seqid == 2)
>> 	S:	NFS4ERR_BAD_SEQID
>>
>> If you feel this behavior is fine given a broken server, that's cool... 
>>   I just figured I would post in case somebody cared about this data point.
> 
> Hmm... That's not good. It is perfectly legal for a server to not
> request OPEN4_RESULT_CONFIRM (although it is probably not a very good
> idea), and the client should be able to cope with that.

If you want to reproduce, my server is open (though largely unannounced, 
since its still in initial coding phase): 
git://git.kernel.org/pub/scm/daemon/nfs/nfs4-ram.git

Commit b3f602203ab023aa559c4db5449448b9c7044f36 (HEAD~2 currently) can 
reproduce the behavior nicely.

The server is currently a zero-configuration-file RAM server, so its 
easy to test: just build and run (./nfs4_ramd).  It binds to port 2049 
with an empty filesystem, each time it is started.  (--help for 
alternate port or other options)


> I'll have a look at what is going on there.
> 
>> P.S.  I really really hate stateid/seqids at this point.  RFC 
>> nonwithstanding, they are basically undocumented.  I am reduced to 
>> poking through NFSv4 WG archives and Linux kernel code to find out what 
>> my server should be doing.  pynfs is no help here, either.
> 
> The primary function of seqids is to allow the server to distinguish
> replayed non-idempotent RPC requests from new requests, so their
> properties are really quite simple:
> 
>       * If the seqid presented by the client is in sequence, then the
>         server is supposed to handle the request.
>       * If the seqid matches that of the last request, then the server
>         is supposed to replay the reply.
>       * If the seqid is completely out of sequence, then the server
>         should return the BAD_SEQID error.
> 
> As for stateids, their purpose is to allow the server to figure out to
> which client it is talking, and to track what state the client thinks it
> is holding. Apart from the seqid field (which is there in order to track
> the ordering of OPEN requests), a stateid is an opaque structure.
> The only really important requirement here is that you need to be able
> to distinguish stale state from valid state so that you can fence off
> RPC requests that refer to stale locks.

Yeah I figured out the purpose pretty quickly.  The thing I missed was 
that the seqid is per-lockowner, and not per-openfile.  No surprise 
things got weird, when I coded a server following that logic...

Plus there are a ton of undocumented -ordering- constraints you must 
follow, with regards to validating seqid/stateid and then returning the 
correct error.

Thanks for the response!  Hope my buggy server helps you track down 
client problems ;-)

	Jeff



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-12-23  2:05 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-22  4:15 Linux client misses lack of open-confirm? Jeff Garzik
2007-12-22 15:27 ` Trond Myklebust
     [not found]   ` <1198337249.7741.52.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org>
2007-12-23  2:05     ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox