Linux NFS development
* A new NFSv4 server...
@ 2008-01-03 12:16 Jeff Garzik
  2008-01-03 16:32 ` J. Bruce Fields
  2008-01-04  9:15 ` Peter Åstrand
  0 siblings, 2 replies; 37+ messages in thread
From: Jeff Garzik @ 2008-01-03 12:16 UTC
  To: nfsv4, NFS list

In case some developers are interested...  I'm poking at a from-scratch 
userland NFSv4 server, as a side project.

In my personal opinion, version 4 of NFS is a quantum-leap improvement 
over previous versions.  While I used NFS v3 extensively, I always felt 
it was a crappy protocol, and unworthy of serious development effort. 
That changed with v4.

I chose to use NFSv4 as the basis for experiments (hopefully yielding 
production software) that I've long wanted to do in reliable 
filesystems, distributed filesystems, and other fun areas.

In the first step down this long path, I've created an NFSv4 userland 
server from scratch.  Currently it merely serves data straight from RAM, 
but the long term goal is to permit modular storage backends.  Thus you 
could implement a simple RAM backend, an sqlite-based backend or a 
complex distributed storage backend.
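
A minimal sketch of what such a pluggable backend boundary might look
like. This is an illustration only, written in Python for brevity; the
names StorageBackend and RamBackend and the read/write signatures are
assumptions, not the interface used in the nfs4-ram repo.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface the NFS front end would call into."""

    @abstractmethod
    def read(self, fh: bytes, offset: int, count: int) -> bytes:
        """Return up to `count` bytes of the file named by filehandle `fh`."""

    @abstractmethod
    def write(self, fh: bytes, offset: int, data: bytes) -> int:
        """Store `data` at `offset`; return the number of bytes written."""


class RamBackend(StorageBackend):
    """Keeps every file in a bytearray; all data is lost when the process exits."""

    def __init__(self) -> None:
        self._files = {}  # filehandle (bytes) -> file contents (bytearray)

    def read(self, fh, offset, count):
        buf = self._files.get(fh, bytearray())
        return bytes(buf[offset:offset + count])

    def write(self, fh, offset, data):
        buf = self._files.setdefault(fh, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any hole
        buf[offset:offset + len(data)] = data
        return len(data)
```

An sqlite-based or distributed backend would then subclass the same
interface, leaving the protocol code unchanged.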

As this is a first-mention developer-only announcement, I didn't bother 
to create source tarballs.  Here is the git repo:
	git://git.kernel.org/pub/scm/daemon/nfs/nfs4-ram.git

This is the home page, but it's mainly a stub pointing to the git repo:
	http://linux.yyz.us/projects/nfsv4.html

The server will
* serve data from RAM, with NFSv4 persistent filehandles and FILE_SYNC4
* destroy all data, when the process exits
* pass 97% of the useful pynfs tests (cvs latest)
* pass fsx-linux stress testing, with Linux NFSv4 client (2.6.recent)
* pass kernel build stress testing, with Linux NFSv4 client (2.6.recent)

It will not, at the present time,
* store any data or metadata in stable storage
* do RPCSEC_GSS (thus, not yet RFC-compliant)
* do delegations (thus, with reduced caching and increased 
revalidations, can be slower than disk-based storage)

At this point, I'm quite interested to hear feedback on how the server 
works with other NFSv4 clients.  I'm interested in making sure the 
server is portable to FreeBSD and other OS's, even though it was 
developed and tested only on Linux.  I also intend to use some 
Linux-specific syscalls, most notably sync_file_range(2) and 
sendfile/tee/splice/vmsplice, so that will have to be glossed over by 
portability code.
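
One way such portability glue could be structured, sketched in Python
rather than the server's own language: try the zero-copy call first and
fall back to a plain read/write loop where it is missing or refuses the
descriptor types. The helper name copy_range and its use_sendfile flag
are made up for this example; os.sendfile and os.pread are the standard
library calls and are assumed to be available (POSIX systems).

```python
import os


def copy_range(out_fd, in_fd, offset, count, use_sendfile=True):
    """Copy up to `count` bytes from in_fd, starting at `offset`, to out_fd.

    Returns the number of bytes copied (less than `count` at end of file).
    """
    sent = 0
    if use_sendfile and hasattr(os, "sendfile"):
        try:
            while sent < count:
                n = os.sendfile(out_fd, in_fd, offset + sent, count - sent)
                if n == 0:  # end of input file
                    return sent
                sent += n
            return sent
        except OSError:
            pass  # unsupported here; continue with the portable loop below
    while sent < count:
        chunk = os.pread(in_fd, min(65536, count - sent), offset + sent)
        if not chunk:  # end of input file
            break
        view = memoryview(chunk)
        while view:  # os.write may write fewer bytes than requested
            view = view[os.write(out_fd, view):]
        sent += len(chunk)
    return sent
```

The caller sees one function either way, so the Linux-only fast path
stays an implementation detail.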

Finally, this is a spare time project, something I've mostly been poking 
at while having idle time on a not-Internet-connected laptop. 
Technically it's sponsored by Red Hat, since RH pays my salary for all my 
open source work, but this is largely a personal project done for 
personal reasons.  I just hope others find it interesting or useful, as 
it progresses.

	Jeff


* Re: A new NFSv4 server...
@ 2008-01-04 15:28 Rick Macklem
       [not found] ` <200801041528.KAA18776-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Rick Macklem @ 2008-01-04 15:28 UTC
  To: jeff; +Cc: linux-nfs, nfsv4

> Plus, surely in this day and age, we can figure out something better 
> than waiting for face-to-face events to test something.  Maybe somebody 
> could arrange a donation of some slice of a grid (Amazon EC2?), make 
> various OS images available, and give engineers some way to request a 
> selection of tests, with a selection of OS images?

I tried putting a server up accessible over the internet and only ever
got one person testing on it once (or maybe it was just a hacker:-). I
did test my client against a server at CITI once, after signing a
bakeathon NDA. But, I agree, and I don't really think it even needs
a central site. I don't see why vendors couldn't put up servers
(production software or whatever they are comfortable having internet
 accessible) that clients can test against. I'll be happy to put my
server up and I'd be happy to test against internet accessible servers
with my client.

And, like you, I don't get to connectathon since I don't "make a living
at this" (to loosely quote another poster).

rick
Btw: You might want to post to nfsv4@ietf.org w.r.t. interoperability
     testing, since that will catch people who don't lurk on this list.

* Re: A new NFSv4 server...
@ 2008-01-04 15:48 Rick Macklem
       [not found] ` <200801041548.KAA18953-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Rick Macklem @ 2008-01-04 15:48 UTC
  To: gnb; +Cc: linux-nfs, nfsv4

> You had me worried there for a moment, I thought you might be the first
> person to admit to liking the NFS4 protocol design.

I actually like quite a bit about it, although I agree that the XDR/Sun
RPC underpinnings are getting pretty tired (mid-1980s). I liked Sessions,
but think that it's gotten overly complex (for example, why 5 required
encrypted checksum algorithms? Wouldn't one be enough?). It would have
been simpler if it had been "posix only" and not tried to be Windows
compatible, but I see the argument for Windows compatibility, hence the
Open.

> The classic persistent file handles, for example, could be considered a
> major design flaw.  Firstly it makes the inode# -> dentry lookup a
> performance path for the underlying filesystem, which it isn't in any
> local load.

Sounds like a server implementor's perspective. From a client implementor's
point of view, a T-stable file handle is a wonderful thing. I have no idea
how to correctly implement client side support for the volatile file
handles allowed in NFSv4. My client doesn't support them.

> Secondly, it's inherently insecure if you export anything less than an
> entire filesystem, unless you use a slow, buggy, and non-conformant hack
> like subtree_check.

Security is definitely an issue. The RPCSEC_GSS stuff works ok, but it
would have been nice to have some sort of "machine credential" that
could be used to authenticate a client (the host credential used by
SetClientID etc, kinda does that, but it isn't really specified). It
seems easier to do encryption/encrypted checksumming down by the transport
layer and not visible to the RPC. With that, many sites would be
comfortable with simpler user credentials than what Kerberos provides.
(Actually, I think most of the problem is that nice tools to set up and
 manage Kerberos aren't in most Unix-like systems, so sysadmins don't
 want to bother with Kerberos.)

rick

* Re: A new NFSv4 server...
@ 2008-01-04 17:11 Rick Macklem
       [not found] ` <200801041711.MAA19577-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Rick Macklem @ 2008-01-04 17:11 UTC
  To: jeff; +Cc: linux-nfs, nfsv4

> heh, tell me about it.  First I started out using rpcgen, then rewrote 
> everything to do raw XDR decoding.  OPEN is huge.
> 
> IMO, OPEN should be split into multiple operations, probably one for 
> each "OPEN arm".  It's not like new opcode numbers are expensive.

As I hinted at, Open is the way it is, since Windows requires one Op
so that Open/Share locks can be implemented correctly. Anything else
would not have satisfied a Win client's requirements.
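
To illustrate why the share check and the open have to be one step, here
is a rough Python sketch of a server-side share reservation table. The
constant values follow RFC 3530; the ShareTable class and try_open
method are invented for the example. A new open conflicts when it asks
for access that an existing open denies, or denies access that an
existing open already holds.

```python
# Share bits as defined in RFC 3530.
OPEN4_SHARE_ACCESS_READ = 0x1
OPEN4_SHARE_ACCESS_WRITE = 0x2
OPEN4_SHARE_DENY_NONE = 0x0
OPEN4_SHARE_DENY_READ = 0x1
OPEN4_SHARE_DENY_WRITE = 0x2


class ShareTable:
    """Hypothetical per-file record of the share modes currently held."""

    def __init__(self):
        self._opens = []  # list of (access, deny) pairs

    def try_open(self, access, deny):
        """Check for conflicts and record the open in a single step.

        Returns False where a real server would answer NFS4ERR_SHARE_DENIED.
        """
        for held_access, held_deny in self._opens:
            if (access & held_deny) or (deny & held_access):
                return False
        self._opens.append((access, deny))
        return True
```

If the check and the open were separate operations, another client
could slip in between them and invalidate the check.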

> One of my personal desires is for a high level of cache coherence 
> throughout the system for all clients (though perhaps an admin could 
> optionally relax this requirement).  I'm a fan of Google's "Chubby", a 
> distributed reliable filesystem that stalls client writes until cache 
> invalidations for the associated byte range are processed for all 
> interested clients.

Delegations provide cache coherency "in a sense". When a client has a
delegation, it knows that no-one else is writing the file. Unfortunately,
as soon as a client gets an Open without a delegation, it no longer
gets the coherency guarantee (and servers are completely free to not
issue delegations if they don't feel like doing so). A client can
re-open a file when it gets an Open without a delegation, but if the
server still doesn't give it a delegation, it can't do anything more.
(The re-open trick is useful for an Open that requires confirmation, since
 the server can't issue a delegation for that case.)
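
The client-side logic described above, sketched in Python under heavy
assumptions: OpenResult, open_file, and the do_open/confirm callables
are all hypothetical stand-ins for the real Open and Open_confirm
exchanges, and only one re-open is attempted.

```python
from dataclasses import dataclass


@dataclass
class OpenResult:
    has_delegation: bool
    needs_confirm: bool = False


def open_file(do_open, confirm):
    """Open a file, retrying once in the hope of getting a delegation.

    do_open() returns an OpenResult; confirm() stands in for Open_confirm.
    """
    result = do_open()
    if result.needs_confirm:
        confirm()  # no delegation can be issued on an unconfirmed Open
    if not result.has_delegation:
        result = do_open()  # the re-open trick; the server may still refuse
    return result


def may_use_cache_without_revalidation(result):
    """Only a delegation tells the client that no one else is writing."""
    return result.has_delegation
```

If the second Open also comes back without a delegation, the client has
no further recourse and must fall back to revalidating its cache.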

rick

* Re: A new NFSv4 server...
@ 2008-01-04 17:28 Rick Macklem
       [not found] ` <200801041728.MAA19743-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
  0 siblings, 1 reply; 37+ messages in thread
From: Rick Macklem @ 2008-01-04 17:28 UTC
  To: bfields; +Cc: linux-nfs, nfsv4, gnb

> As long as they persist while you have an open (or a delegation), it
> shouldn't be so hard to implement, should it?  If a filehandle expires,
> then you throw away any cache associated with it, but as long as no
> applications hold file descriptors for it, that's not a catastrophe.
> 
> But I'm a little confused whether rfc 3530's 4.2.3 gives a way for the
> server to express that guarantee.

Agreed, but I've always assumed the server can return NFS4ERR_FHEXPIRED
at any time. (It's listed as an error for many Ops, such as Read and Write.)
Also, what does a client do after a server reboot? It can't use
Open/Claim_previous for recovery.

Even if there is a "don't expire while Open" guarantee, it's still a pita
for the client to hang onto pathnames for directories and such, so that
they can re-lookup the fh. (And if that re-lookup happens to fail or end
up in a different place?)
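
A rough Python sketch of the bookkeeping this implies for a client that
does try to support volatile file handles. The error number is the one
RFC 3530 assigns to NFS4ERR_FHEXPIRED; the VolatileFhClient class, the
FhExpired exception, and the lookup callable are made up for the
example.

```python
NFS4ERR_FHEXPIRED = 10014  # value from RFC 3530


class FhExpired(Exception):
    """Raised by an operation when the server answers NFS4ERR_FHEXPIRED."""


class VolatileFhClient:
    def __init__(self, lookup):
        self._lookup = lookup  # hypothetical callable: pathname -> filehandle
        self._fh_by_path = {}  # the client has to remember every pathname

    def call(self, path, op):
        """Run op(fh) for `path`, re-looking up the handle once if it expired."""
        if path not in self._fh_by_path:
            self._fh_by_path[path] = self._lookup(path)
        try:
            return op(self._fh_by_path[path])
        except FhExpired:
            # The re-lookup can fail, or resolve to a different object if
            # the path was renamed in the meantime; nothing here detects that.
            self._fh_by_path[path] = self._lookup(path)
            return op(self._fh_by_path[path])
```

The sketch retries only once and does nothing about the rename case,
which is exactly the part that is hard to get right.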

rick


end of thread (newest message: 2008-01-06 23:54 UTC)

Thread overview: 37+ messages
2008-01-03 12:16 A new NFSv4 server Jeff Garzik
2008-01-03 16:32 ` J. Bruce Fields
2008-01-04  5:32   ` Jeff Garzik
2008-01-04  6:24     ` Greg Banks
     [not found]       ` <477DD11B.40909-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2008-01-04  7:04         ` Jeff Garzik
2008-01-04  9:07           ` Benny Halevy
2008-01-04 15:49             ` Jeff Garzik
2008-01-04 19:51               ` Benny Halevy
2008-01-05  1:46               ` Greg Banks
2008-01-05  7:56                 ` Benny Halevy
2008-01-04 17:47             ` J. Bruce Fields
2008-01-04 19:55               ` Benny Halevy
2008-01-04  9:15           ` Peter Åstrand
2008-01-04 10:05             ` Neil Brown
     [not found]             ` <Pine.LNX.4.64.0801040954070.5004-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
2008-01-04 13:50               ` Frank van Maarseveen
2008-01-04 16:41               ` Jeff Garzik
2008-01-04 20:03                 ` Peter Åstrand
     [not found]                   ` <Pine.LNX.4.64.0801042030380.18738-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
2008-01-06 23:54                     ` James Morris
2008-01-04 20:31             ` Muntz, Daniel
2008-01-04  9:15 ` Peter Åstrand
2008-01-04 16:14   ` Jeff Garzik
2008-01-04 19:58     ` Peter Åstrand
  -- strict thread matches above, loose matches on Subject: below --
2008-01-04 15:28 Rick Macklem
     [not found] ` <200801041528.KAA18776-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
2008-01-04 17:21   ` J. Bruce Fields
2008-01-04 18:03     ` Tom Haynes
     [not found]       ` <477E750A.2030905-8AdZ+HgO7noAvxtiuMwx3w@public.gmane.org>
2008-01-04 18:21         ` J. Bruce Fields
2008-01-04 19:50     ` Jeff Garzik
2008-01-04 19:57       ` Peter Åstrand
     [not found]         ` <Pine.LNX.4.64.0801042055490.18738-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
2008-01-05  0:43           ` Jeff Garzik
2008-01-04 15:48 Rick Macklem
     [not found] ` <200801041548.KAA18953-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
2008-01-04 17:15   ` J. Bruce Fields
2008-01-05  2:32   ` Greg Banks
2008-01-04 17:11 Rick Macklem
     [not found] ` <200801041711.MAA19577-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
2008-01-05  0:51   ` Jeff Garzik
2008-01-04 17:28 Rick Macklem
     [not found] ` <200801041728.MAA19743-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
2008-01-04 17:42   ` J. Bruce Fields
2008-01-04 17:45   ` Trond Myklebust
