* A new NFSv4 server...
From: Jeff Garzik @ 2008-01-03 12:16 UTC
To: nfsv4, NFS list
In case some developers are interested... I'm poking at a from-scratch
userland NFSv4 server, as a side project.
In my personal opinion, version 4 of NFS is a quantum-leap improvement
over previous versions. While I used NFS v3 extensively, I always felt
it was a crappy protocol, and unworthy of serious development effort.
That changed with v4.
I chose to use NFSv4 as the basis for experiments (hopefully yielding
production software) that I've long wanted to do in reliable
filesystems, distributed filesystems, and other fun areas.
In the first step down this long path, I've created an NFSv4 userland
server from scratch. Currently it merely serves data straight from RAM,
but the long term goal is to permit modular storage backends. Thus you
could implement a simple RAM backend, an sqlite-based backend or a
complex distributed storage backend.
As this is a first-mention developer-only announcement, I didn't bother
to create source tarballs. Here is the git repo:
git://git.kernel.org/pub/scm/daemon/nfs/nfs4-ram.git
This is the home page, but it's mainly a stub pointing to the git repo:
http://linux.yyz.us/projects/nfsv4.html
The server will
* serve data from RAM, with NFSv4 persistent filehandles and FILE_SYNC4
* destroy all data, when the process exits
* pass 97% of the useful pynfs tests (cvs latest)
* pass fsx-linux stress testing, with Linux NFSv4 client (2.6.recent)
* pass kernel build stress testing, with Linux NFSv4 client (2.6.recent)
It will not, at the present time,
* store any data or metadata in stable storage
* do RPCSEC_GSS (thus, not yet RFC-compliant)
* do delegations (thus, with reduced caching and increased
revalidations, can be slower than disk-based storage)
At this point, I'm quite interested to hear feedback on how the server
works with other NFSv4 clients. I'm interested in making sure the
server is portable to FreeBSD and other OS's, even though it was
developed and tested only on Linux. I also intend to use some
Linux-specific syscalls, most notably sync_file_range(2) and
sendfile/tee/splice/vmsplice, so that will have to be glossed over by
portability code.
Finally, this is a spare time project, something I've mostly been poking
at while having idle time on a not-Internet-connected laptop.
Technically it's sponsored by Red Hat, since RH pays my salary for all my
open source work, but this is largely a personal project done for
personal reasons. I just hope others find it interesting or useful, as
it progresses.
Jeff
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
From: J. Bruce Fields @ 2008-01-03 16:32 UTC
To: Jeff Garzik; +Cc: NFS list, nfsv4
On Thu, Jan 03, 2008 at 07:16:49AM -0500, Jeff Garzik wrote:
> At this point, I'm quite interested to hear feedback on how the server
> works with other NFSv4 clients.
Any possibility of making Connectathon in May?:
http://www.connectathon.org/
The major NFSv4 implementors normally have their clients there, so it's
usually the quickest way to find and solve any interoperability
problems. Almost all the work will probably be on sessions and pNFS,
but people should be willing to do basic 4.0 testing too.
Glad to hear you're still working on this--it sounds interesting.
--b.
* Re: A new NFSv4 server...
From: Jeff Garzik @ 2008-01-04 5:32 UTC
To: J. Bruce Fields; +Cc: nfsv4, NFS list
J. Bruce Fields wrote:
> On Thu, Jan 03, 2008 at 07:16:49AM -0500, Jeff Garzik wrote:
>> At this point, I'm quite interested to hear feedback on how the server
>> works with other NFSv4 clients.
>
> Any possibility of making Connectathon in May?:
>
> http://www.connectathon.org/
>
> The major NFSv4 implementors normally have their clients there, so it's
> usually the quickest way to find and solve any interoperability
> problems. Almost all the work will probably be on sessions and pNFS,
> but people should be willing to do basic 4.0 testing too.
As this isn't an official RH project, I would probably have to pay my
own way, which makes it doubtful :)
Plus, surely in this day and age, we can figure out something better
than waiting for face-to-face events to test something. Maybe somebody
could arrange a donation of some slice of a grid (Amazon EC2?), make
various OS images available, and give engineers some way to request a
selection of tests, with a selection of OS images?
> Glad to hear you're still working on this--it sounds interesting.
Certainly pNFS parallels some of the work I want to do... NFSv4.1 is so
darned complex though. I am torn as to whether or not I want to take my
server down that path.
I really wish the entire wire protocol were scrapped and replaced with
something more sane, and easier to parse. The variable-length
structures passed to PCI hardware these days [as seen in the kernel
drivers I hack on, IOW] are just as compact, if not more so, but are
designed to be parsed quickly in large chunks, rather than the "next XDR
may be your last!" approach :)
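To make the contrast concrete, here is a minimal sketch of decoding a few XDR primitives as RFC 4506 defines them: 4-byte big-endian integers, and variable-length opaque data prefixed by its length and zero-padded to a 4-byte boundary. The class and method names are invented for illustration; this is not code from nfs4-ram or pynfs.

```python
import struct


class XdrReader:
    """Minimal XDR (RFC 4506) decoder: big-endian, every item 4-byte aligned."""

    def __init__(self, buf: bytes):
        self.buf = buf
        self.pos = 0

    def _take(self, n: int) -> bytes:
        # Each item is decoded and bounds-checked one at a time; the decoder
        # cannot know how large a structure is until it has walked all of it.
        if self.pos + n > len(self.buf):
            raise ValueError("XDR buffer underrun")
        chunk = self.buf[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def uint32(self) -> int:
        return struct.unpack(">I", self._take(4))[0]

    def uint64(self) -> int:
        return struct.unpack(">Q", self._take(8))[0]

    def opaque(self) -> bytes:
        # Variable-length opaque: 4-byte length, the data, then zero padding
        # up to the next multiple of 4 bytes.
        length = self.uint32()
        data = self._take(length)
        self._take((4 - length % 4) % 4)
        return data

    def string(self) -> str:
        return self.opaque().decode("utf-8")
```

For example, the 7-byte string "/export" occupies 4 + 7 + 1 = 12 bytes on the wire, and every field after it must be located by walking the fields before it.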
Sessions are IMO a tad overdone, too... largely due to necessities
forced upon NFSv4.1 by the legacy RPC protocol assumptions. If you
simply /assume/ basic properties of TCP or SCTP, it's a lot easier to do
multi-channel or multi-homed messaging. Multi-channel _isn't_ really
that hard, and we've been doing it since the earliest days of NNTP and
the Usenet Top 1000 pissing contest, if not longer.
It's tempting to see what would arise from a clean-slate wire protocol
effort, something that is otherwise compatible with NFS 4.x operations,
objects, and data model.
Jeff
* Re: A new NFSv4 server...
From: Greg Banks @ 2008-01-04 6:24 UTC
To: Jeff Garzik; +Cc: J. Bruce Fields, NFS list, nfsv4
Jeff Garzik wrote:
> J. Bruce Fields wrote:
>>
>> http://www.connectathon.org/
>>
>>
> As this isn't an official RH project, I would probably have to pay my
> own way, which makes it doubtful :)
>
It's entirely possible someone might run your server code on a spare
machine, given functioning install packages and easy instructions.
> Plus, surely in this day and age, we can figure out something better
> than waiting for face-to-face events to test something. Maybe somebody
> could arrange a donation of some slice of a grid (Amazon EC2?), make
> various OS images available, and give engineers some way to request a
> selection of tests, with a selection of OS images?
>
Vendors turn up to cthon with proprietary and unreleased software and
hardware which they most certainly are not going to let anyone else run
for them. Also, being in the same hall with all those vendors' technical
folks tends to make bugs shallow. It's a very valuable exercise for any
organisation making a living from NFS.
> I really wish the entire wire protocol were scrapped and replaced with
> something more sane, and easier to parse.
You had me worried there for a moment, I thought you might be the first
person to admit to liking the NFS4 protocol design.
> It's tempting to see what would arise from a clean-slate wire protocol
> effort, something that is otherwise compatible with NFS 4.x operations,
> objects, and data model.
>
Much like the old phone system, the primary value of protocols like NFS
is the widespread presence of reliable conformant implementations. Most
of the rest of NFS is problematic. I would argue that some aspects of the
NFS operations, objects, and data model are rather more busted than the
XDR encoding.
The classic persistent file handles, for example, could be considered a
major design flaw. Firstly it makes the inode# -> dentry lookup a
performance path for the underlying filesystem, which it isn't in any
local load. Secondly, it's inherently insecure if you export anything
less than an entire filesystem, unless you use a slow, buggy, and
non-conformant hack like subtree_check.
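A rough sketch of why persistent handles put that lookup on the server's hot path. It assumes a hypothetical handle layout of (fsid, inode number, generation); real servers encode different things, and the names here are invented for illustration.

```python
import struct
from dataclasses import dataclass
from typing import Dict

# Hypothetical handle layout: 32-bit fsid, 64-bit inode number, 32-bit generation.
FH_FORMAT = ">IQI"


@dataclass
class Inode:
    ino: int
    generation: int
    name: str


class HandleTable:
    """Server-side map from persistent file handles back to file objects."""

    def __init__(self, fsid: int):
        self.fsid = fsid
        self.by_ino: Dict[int, Inode] = {}

    def make_handle(self, inode: Inode) -> bytes:
        self.by_ino[inode.ino] = inode
        return struct.pack(FH_FORMAT, self.fsid, inode.ino, inode.generation)

    def lookup(self, fh: bytes) -> Inode:
        # Every request carrying a handle goes through this lookup. Here it is
        # a dict; on a real filesystem it is the inode# -> dentry path.
        fsid, ino, gen = struct.unpack(FH_FORMAT, fh)
        inode = self.by_ino.get(ino)
        if fsid != self.fsid or inode is None or inode.generation != gen:
            # A reused inode number gets a new generation, so old handles go stale.
            raise LookupError("stale file handle")
        return inode
```

Note that the handle carries no path: `lookup()` cannot tell whether the file lies under an exported subdirectory, which is the gap subtree_check tries to close.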
Another major flaw is putting the client in control of when unstable data is
written to disk, but not providing any way for the client to find out how to
do that optimally.
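For readers who haven't implemented it, a minimal sketch of the client-side bookkeeping involved. A WRITE sent UNSTABLE returns a write verifier. The client later sends COMMIT and compares the verifier in the COMMIT reply against the ones it saved. Any mismatch means the server may have lost that data, so it must be re-sent. The `commit_threshold` knob is a hypothetical stand-in for the guess the protocol leaves to the client.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class UnstableWriteTracker:
    """Client-side bookkeeping for UNSTABLE4 writes that still need a COMMIT."""

    # Hypothetical tuning knob: the protocol gives the client no hint here.
    commit_threshold: int
    # offset -> (data, write verifier returned by the server's WRITE reply)
    pending: Dict[int, Tuple[bytes, bytes]] = field(default_factory=dict)
    dirty_bytes: int = 0

    def record_write(self, offset: int, data: bytes, write_verf: bytes) -> bool:
        """Remember an unstable write; return True when a COMMIT looks due."""
        if offset in self.pending:
            # A second write at the same offset replaces the older pending data
            # (overlapping ranges at different offsets are not merged here).
            self.dirty_bytes -= len(self.pending[offset][0])
        self.pending[offset] = (data, write_verf)
        self.dirty_bytes += len(data)
        return self.dirty_bytes >= self.commit_threshold

    def on_commit_reply(self, commit_verf: bytes) -> Dict[int, bytes]:
        """Return writes that must be re-sent because the verifier changed."""
        # A verifier mismatch means the server may have restarted and lost
        # data it had acknowledged only as unstable.
        resend = {off: data for off, (data, verf) in self.pending.items()
                  if verf != commit_verf}
        self.pending.clear()
        self.dirty_bytes = 0
        return resend
```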
Then there's the NFS4 approach to extended attributes.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.
* Re: A new NFSv4 server...
From: Jeff Garzik @ 2008-01-04 7:04 UTC
To: Greg Banks; +Cc: J. Bruce Fields, NFS list, nfsv4
Greg Banks wrote:
> It's entirely possible someone might run your server code on a spare
> machine, given functioning install packages and easy instructions.
Easy enough to do...
>> Plus, surely in this day and age, we can figure out something better
>> than waiting for face-to-face events to test something. Maybe somebody
>> could arrange a donation of some slice of a grid (Amazon EC2?), make
>> various OS images available, and give engineers some way to request a
>> selection of tests, with a selection of OS images?
>>
> Vendors turn up to cthon with proprietary and unreleased software and
> hardware which they most certainly are not going to let anyone else run
> for them. Also, being in the same hall with all those vendors' technical
> folks tends to make bugs shallow. It's a very valuable exercise for any
> organisation making a living from NFS.
Certainly, but I could see a grid of released, non-proprietary software
as quite a valuable resource in addition to f2f events. Quality can
only increase, if the [Linux | *BSD | OpenSolaris |...] NFS clients
could run regression tests against several different NFS servers, each
time an NFS client receives a set of changes.
Even if it's only the open source operating systems that wish to
participate, having a mix of OS's and platforms would be useful. A
permanent, virtual cthon.
>> I really wish the entire wire protocol were scrapped and replaced with
>> something more sane, and easier to parse.
> You had me worried there for a moment, I thought you might be the first
> person to admit to liking the NFS4 protocol design.
>
>> It's tempting to see what would arise from a clean-slate wire protocol
>> effort, something that is otherwise compatible with NFS 4.x operations,
>> objects, and data model.
It's more like v4 is a vast relative improvement over prior NFS. Given
the huge number of NFS users and sites, IMO v4 is a huge improvement for
Unix file sharing overall.
But if you are dreaming of a truly clean slate protocol... I've got a
long wish list too :)
> Much like the old phone system, the primary value of protocols like NFS
> is the widespread presence of reliable conformant implementations. Most
> of the rest of NFS is problematic. I would argue that some aspects of the
> NFS operations, objects, and data model are rather more busted than the
> XDR encoding.
>
> The classic persistent file handles, for example, could be considered a
> major design flaw. Firstly it makes the inode# -> dentry lookup a
> performance path for the underlying filesystem, which it isn't in any
> local load.
Oh, certainly. I was mainly thinking a replacement of the wire protocol
would be an easier step for people to swallow than a new protocol.
But if you are implying there is enough momentum to simply rewrite NFS
from scratch, I'll cheer and help out with coding :) Or maybe Zach
Brown will do it for us with CRFS:
http://linux.conf.au/programme/detail?TalkID=247
A big feature of NFS today is its high Just Works(tm) value (ease of
configuration and some minimum level of fault tolerance), so any
replacement would need to have similar attributes.
> Another major flaw is putting the client in control of when unstable data is
> written to disk, but not providing any way for the client to find out how to
> do that optimally.
>
> Then there's the NFS4 approach to extended attributes.
Ugh. Don't get me started. That's not in my server yet, but I can
already see the mess ahead.
Jeff
* Re: A new NFSv4 server...
From: Benny Halevy @ 2008-01-04 9:07 UTC
To: Jeff Garzik; +Cc: NFS list, nfsv4, Greg Banks
On Jan. 04, 2008, 9:04 +0200, Jeff Garzik <jeff@garzik.org> wrote:
>>> I really wish the entire wire protocol were scrapped and replaced with
>>> something more sane, and easier to parse.
>> You had me worried there for a moment, I thought you might be the first
>> person to admit to liking the NFS4 protocol design.
>>
>>> It's tempting to see what would arise from a clean-slate wire protocol
>>> effort, something that is otherwise compatible with NFS 4.x operations,
>>> objects, and data model.
>
> It's more like v4 is a vast relative improvement over prior NFS. Given
> the huge number of NFS users and sites, IMO v4 is a huge improvement for
> Unix file sharing overall.
>
> But if you are dreaming of a truly clean slate protocol... I've got a
> long wish list too :)
>
>
>> Much like the old phone system, the primary value of protocols like NFS
>> is the widespread presence of reliable conformant implementations. Most
>> of the rest of NFS is problematic. I would argue that some aspects of the
>> NFS operations, objects, and data model are rather more busted than the
>> XDR encoding.
>>
>> The classic persistent file handles, for example, could be considered a
>> major design flaw. Firstly it makes the inode# -> dentry lookup a
>> performance path for the underlying filesystem, which it isn't in any
>> local load.
>
> Oh, certainly. I was mainly thinking a replacement of the wire protocol
> would be an easier step for people to swallow than a new protocol.
>
> But if you are implying there is enough momentum to simply rewrite NFS
> from scratch, I'll cheer and help out with coding :) Or maybe Zach
> Brown will do it for us with CRFS:
> http://linux.conf.au/programme/detail?TalkID=247
>
> A big feature of NFS today is its high Just Works(tm) value (ease of
> configuration and some minimum level of fault tolerance), so any
> replacement would need to have similar attributes.
>
>
>> Another major flaw is putting the client in control of when unstable data is
>> written to disk, but not providing any way for the client to find out how to
>> do that optimally.
>>
>> Then there's the NFS4 approach to extended attributes.
>
> Ugh. Don't get me started. That's not in my server yet, but I can
> already see the mess ahead.
>
> Jeff
>
Jeff, taking into account the amount of effort people and different
organizations have already put into NFSv4 and NFSv4.1 I wish you could
tunnel your inventive energy into making NFSv4.1 better rather than
trying to reinvent NFS/RPC/XDR.
Although it's rather late in the process, since the NFSv4 working group
is close to putting the NFSv4.1 Internet-draft up for last call, we
would certainly appreciate more implementation feedback.
Benny
* Re: A new NFSv4 server...
From: Peter Åstrand @ 2008-01-04 9:15 UTC
To: Jeff Garzik; +Cc: NFS list, nfsv4
[About v4]
On Fri, 4 Jan 2008, Jeff Garzik wrote:
> > > I really wish the entire wire protocol were scrapped and replaced with
> > > something more sane, and easier to parse.
> > You had me worried there for a moment, I thought you might be the first
> > person to admit to liking the NFS4 protocol design.
Couldn't agree more.
> In my personal opinion, version 4 of NFS is a quantum-leap improvement over
> previous versions. While I used NFS v3 extensively, I always felt it was a
> crappy protocol, and unworthy of serious development effort. That changed with
> v4.
...
> > > It's tempting to see what would arise from a clean-slate wire protocol
> > > effort, something that is otherwise compatible with NFS 4.x operations,
> > > objects, and data model.
>
> It's more like v4 is a vast relative improvement over prior NFS. Given the
> huge number of NFS users and sites, IMO v4 is a huge improvement for Unix file
> sharing overall.
Many years ago, before NFSv4 was finished, I felt the same. I was waiting
for v4 and thought that everything would be so much better. I wanted to
help and started the "pynfs" project. Today, I have a different opinion. I
think v3 is a fairly good protocol, if you use it correctly. For example,
many people don't realize that you don't need the portmapper, that you can
use a single well-known TCP port, that you can use RPCSEC_GSS and so
forth, even with v3.
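As an illustration of the "no portmapper" point: a client that already knows the server listens on the well-known NFS port (2049) can send ONC RPC calls there directly. This minimal sketch only assembles bytes for the simplest call, the NFSv3 NULL procedure, with AUTH_NONE credentials and the TCP record-marking header described in the ONC RPC specification (RFC 5531). The constant and function names are illustrative.

```python
import struct

NFS_PROGRAM = 100003
NFS_PORT = 2049      # well-known NFS port; no portmapper query needed
CALL = 0             # msg_type CALL
RPC_VERSION = 2
AUTH_NONE = 0


def build_null_call(xid: int, version: int = 3) -> bytes:
    """Build an ONC RPC CALL for NFS procedure 0 (NULL), framed for TCP."""
    body = struct.pack(">IIIIII", xid, CALL, RPC_VERSION,
                       NFS_PROGRAM, version, 0)  # procedure 0 = NULL
    body += struct.pack(">II", AUTH_NONE, 0)     # credential: flavor, empty body
    body += struct.pack(">II", AUTH_NONE, 0)     # verifier: flavor, empty body
    # TCP record marking: high bit = last fragment, low 31 bits = length.
    return struct.pack(">I", 0x80000000 | len(body)) + body
```

Sending it would be a matter of `socket.create_connection((server, NFS_PORT)).sendall(build_null_call(1))`, where `server` is whatever host is under test.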
I think v4 has a few valuable improvements, but it comes with a very high
price. v3 has a minimalistic beauty which v4 lacks. For example, take a
look at the OPEN operation with 7 arguments, of which many are complex
data structures:
(cfh), seqid, share_access, share_deny, owner, openhow, claim ->
(cfh), stateid, cinfo, rflags, open_confirm, attrset, delegation
Not pretty...
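To see how much structure hides behind those names, here is a simplified sketch of the NFSv4.0 OPEN arguments and successful result as Python dataclasses. Field names follow RFC 3530's OPEN4args and OPEN4resok. The nested unions (openflag4, open_claim4, open_delegation4) are collapsed to a few fields, so this shows the shape only and is not a full encoding.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional, Tuple


class ShareAccess(IntEnum):
    READ = 1
    WRITE = 2
    BOTH = 3


class ShareDeny(IntEnum):
    NONE = 0
    READ = 1
    WRITE = 2
    BOTH = 3


class ClaimType(IntEnum):
    NULL = 0           # open by name, relative to the current filehandle
    PREVIOUS = 1       # reclaim after a server restart
    DELEGATE_CUR = 2
    DELEGATE_PREV = 3


@dataclass
class OpenOwner:       # open_owner4
    clientid: int
    owner: bytes


@dataclass
class OpenHow:         # openflag4, simplified
    create: bool
    createmode: Optional[int] = None  # UNCHECKED4/GUARDED4/EXCLUSIVE4 if create


@dataclass
class OpenClaim:       # open_claim4, simplified
    claim: ClaimType
    file: Optional[str] = None        # component name for CLAIM_NULL


@dataclass
class Open4Args:
    # (cfh) is implicit: OPEN works on the COMPOUND's current filehandle.
    seqid: int
    share_access: ShareAccess
    share_deny: ShareDeny
    owner: OpenOwner
    openhow: OpenHow
    claim: OpenClaim


@dataclass
class Open4ResOk:
    stateid: Tuple[int, bytes]        # stateid4: seqid + 12 opaque bytes
    cinfo: Tuple[bool, int, int]      # change_info4: atomic, before, after
    rflags: int                       # flags, e.g. whether OPEN_CONFIRM is needed
    attrset: List[int] = field(default_factory=list)  # bitmap4 words
    delegation: int = 0               # OPEN_DELEGATE_NONE; details omitted
```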
> Oh, certainly. I was mainly thinking a replacement of the wire protocol would
> be an easier step for people to swallow than a new protocol.
I've been thinking of trying to put together something like NFS v3.5. Some
parts of v4 are nice, but the complexity is too high.
Regards,
---
Peter Åstrand ThinLinc Chief Developer
Cendio AB http://www.cendio.se
Wallenbergs gata 4
583 30 Linköping Phone: +46-13-21 46 00
* Re: A new NFSv4 server...
From: Peter Åstrand @ 2008-01-04 9:15 UTC
To: Jeff Garzik; +Cc: NFS list, nfsv4
[About your implementation]
> In case some developers are interested... I'm poking at a from-scratch
> userland NFSv4 server, as a side project.
Cool! Being one of the unfs3 developers, I believe that userland NFS
servers are very useful.
> In the first step down this long path, I've created an NFSv4 userland server
> from scratch. Currently it merely serves data straight from RAM, but the long
Do you know about the n4 project
(http://cvs.samba.org/cgi-bin/cvsweb/n4/)? It was abandoned many years
ago, but might have some usefulness.
> term goal is to permit modular storage backends. Thus you could implement a
> simple RAM backend, an sqlite-based backend or a complex distributed storage
> backend.
unfs3 has a basic modular backend system. I wonder if it would be
possible to merge your server with unfs3, and still have something
that's readable. If you are interested, take a look at
http://cvs.lysator.liu.se/viewcvs/viewcvs.cgi/unfs3/?root=unfs3.
Regards,
---
Peter Åstrand ThinLinc Chief Developer
Cendio AB http://www.cendio.se
Wallenbergs gata 4
583 30 Linköping Phone: +46-13-21 46 00
* Re: A new NFSv4 server...
From: Neil Brown @ 2008-01-04 10:05 UTC
To: Peter Åstrand; +Cc: Jeff Garzik, NFS list, nfsv4
On Friday January 4, astrand-+4tYiAq3b6azQB+pC5nmwQ@public.gmane.org wrote:
>
> > Oh, certainly. I was mainly thinking a replacement of the wire protocol would
> > be an easier step for people to swallow than a new protocol.
>
> I've been thinking of trying to put together something like NFS v3.5. Some
> parts of v4 are nice, but the complexity is too high.
>
That is soooooooo tempting.
Of course we would need to be clear on how it is better than NFS,
CIFS, 9P, and CRFS to name but a few. And it would not be a small
undertaking.
But it is very tempting.
NeilBrown
* Re: A new NFSv4 server...
From: Frank van Maarseveen @ 2008-01-04 13:50 UTC
To: Peter Åstrand; +Cc: Jeff Garzik, NFS list, nfsv4
On Fri, Jan 04, 2008 at 10:15:00AM +0100, Peter Åstrand wrote:
>
> [About v4]
>
> On Fri, 4 Jan 2008, Jeff Garzik wrote:
>
> > > > I really wish the entire wire protocol were scrapped and replaced with
> > > > something more sane, and easier to parse.
> > > You had me worried there for a moment, I thought you might be the first
> > > person to admit to liking the NFS4 protocol design.
>
> Couldn't agree more.
>
> > In my personal opinion, version 4 of NFS is a quantum-leap improvement over
> > previous versions. While I used NFS v3 extensively, I always felt it was a
> > crappy protocol, and unworthy of serious development effort. That changed with
> > v4.
> ...
> > > > It's tempting to see what would arise from a clean-slate wire protocol
> > > > effort, something that is otherwise compatible with NFS 4.x operations,
> > > > objects, and data model.
> >
> > It's more like v4 is a vast relative improvement over prior NFS. Given the
> > huge number of NFS users and sites, IMO v4 is a huge improvement for Unix file
> > sharing overall.
>
> Many years ago, before NFSv4 was finished, I felt the same. I was waiting
> for v4 and thought that everything would be so much better. I wanted to
> help and started the "pynfs" project. Today, I have a different opinion. I
> think v3 is a fairly good protocol, if you use it correctly. For example,
> many people don't realize that you don't need the portmapper, that you can
> use a single well-known TCP port, that you can use RPCSEC_GSS and so
> forth, even with v3.
Somehow this reminds me of IPv4 vs. IPv6. IIRC some protocol features have
in a sense been "backported" to IPv4.
--
Frank
* Re: A new NFSv4 server...
From: Rick Macklem @ 2008-01-04 15:28 UTC
To: jeff; +Cc: linux-nfs, nfsv4
> Plus, surely in this day and age, we can figure out something better
> than waiting for face-to-face events to test something. Maybe somebody
> could arrange a donation of some slice of a grid (Amazon EC2?), make
> various OS images available, and give engineers some way to request a
> selection of tests, with a selection of OS images?
I tried putting a server up accessible over the internet and only ever
got one person testing on it once (or maybe it was just a hacker:-). I
did test my client against a server at CITI once, after signing a
bakeathon NDA. But, I agree, and I don't really think it even needs
a central site. I don't see why vendors couldn't put up servers
(production software or whatever they are comfortable having internet
accessible) that clients can test against. I'll be happy to put my
server up and I'd be happy to test against internet accessible servers
with my client.
And, like you, I don't get to connectathon since I don't "make a living
at this" (to loosely quote another poster).
rick
Btw: You might want to post to nfsv4@ietf.org w.r.t. interoperability
testing, since that will catch people who don't lurk on this list.
* Re: A new NFSv4 server...
From: Rick Macklem @ 2008-01-04 15:48 UTC
To: gnb; +Cc: linux-nfs, nfsv4
> You had me worried there for a moment, I thought you might be the first
> person to admit to liking the NFS4 protocol design.
I actually like quite a bit about it, although I agree that the XDR/Sun
RPC underpinnings are getting pretty tired (mid-1980s). I liked Sessions,
but think that it's gotten overly complex (for example, why 5 required
encrypted checksum algorithms? Wouldn't one be enough?). It would have been
simpler if it had been "posix only" and not tried to be Windows
compatible, but I see the argument for Windows compatibility, hence the Open.
> The classic persistent file handles, for example, could be considered a
> major design flaw. Firstly it makes the inode# -> dentry lookup a
> performance path for the underlying filesystem, which it isn't in any
> local load.
Sounds like a server implementor's perspective. From a client implementor's
point of view, a T-stable file handle is a wonderful thing. I have no idea
how to correctly implement client side support for the volatile file
handles allowed in NFSv4. My client doesn't support them.
> Secondly, it's inherently insecure if you export anything less than an
> entire filesystem, unless you use a slow, buggy, and non-conformant hack
> like subtree_check.
Security is definitely an issue. The RPCSEC_GSS stuff works ok, but it
would have been nice to have some sort of "machine credential" that
could be used to authenticate a client (the host credential used by
SetClientID etc, kinda does that, but it isn't really specified). It
seems easier to do encryption/encrypted checksumming down by the transport
layer and not visible to the RPC. With that, many sites would be
comfortable with simpler user credentials than what Kerberos provides.
(Actually, I think most of the problem is that nice tools to set up and
manage Kerberos aren't in most Unix-like systems, so sysadmins don't
want to bother with Kerberos.)
rick
* Re: A new NFSv4 server...
From: Jeff Garzik @ 2008-01-04 15:49 UTC
To: Benny Halevy; +Cc: NFS list, nfsv4, Greg Banks
Benny Halevy wrote:
> Jeff, taking into account the amount of effort people and different
> organizations have already put into NFSv4 and NFSv4.1 I wish you could
> tunnel your inventive energy into making NFSv4.1 better rather than
> trying to reinvent NFS/RPC/XDR.
>
> Although it's rather late in the process, since the NFSv4 working group
> is close to putting the NFSv4.1 Internet-draft up for last call, we
> would certainly appreciate more implementation feedback.
I am more than happy to give feedback, though (as you say) it is
probably too late for substantial feedback to have any large effect.
My general engineering opinions of pNFS:
* Fills an obvious need: eliminating the need to copy data through the
metadata interface to backend storage. Many clear, tangible benefits here.
* pNFS major issue #1: client storage protocol
Storing and retrieving blobs over the network, with strong
authentication/integrity/security, is a solved problem.
Pick ONE client storage protocol (HTTP? iSCSI OSD2?), and stick to it.
Or maybe HTTP|SCSI but nothing more. Heck, even BitTorrent w/ auth
extensions would be better than yet another protocol for similar
purposes (not that I'm advocating BT, just saying...).
Maximize reuse of existing software and mindshare.
* pNFS major issue #2: abandons NFS's "one true generic" path
I believe pNFS violates the "spirit of NFS" by deviating from a de facto
assumption found in earlier versions: data transfer is simple,
arbitrary blobs, addressed in the same manner, and sent via the same
protocol.
Pick ONE layout type, and stick to it. Banish all other layout types to
other software layers.
Protocol conversion servers, firmware, and other software can easily
convert from a generic layout to something more exotic like OSD or
[insert site specific protocol here].
NFS itself should not be delving into low-level storage details like
this. Clients should not need to know low-level details (like stripe
sizes). In Linux, we call this a layering violation.
Working on kernel storage drivers as I do, I can see the attraction of
wanting to do things this way... but we invented layering and
abstraction in computer science for good reasons :)
* pNFS major issue #3: no longer a "closed loop" protocol
By permitting multiple layout types, and in particular undefined
(site-specific) layout types, it is by definition _impossible_ for
anyone to claim full protocol interoperability with other implementations.
The number of possible combinations approaches infinity, with obvious
consequences on testing, and production software quality.
And when a marketing department advertises "fully NFSv4.1 compliant!" on
their company's appliance, it is trivial for any engineer to construct
another "fully NFSv4.1 compliant" setup -- with equivalent
authentication, metadata and data sets -- that is not interoperable
except via the fallback case (copy through the metadata server).
Such interoperability breakdowns are IMO not in the spirit of NFS.
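Issue #2 turns on clients having to know layout details such as stripe sizes. As a rough illustration of what that entails, here is a simplified round-robin striping calculation. It is similar in spirit to the NFSv4.1 file layout but ignores its details (dense versus sparse packing, the first stripe index, per-device file handles). The function names and parameters are hypothetical.

```python
from typing import List, Tuple


def locate(offset: int, stripe_unit: int, num_servers: int) -> Tuple[int, int]:
    """Map a file offset to (data server index, offset within the stripe unit)."""
    if stripe_unit <= 0 or num_servers <= 0:
        raise ValueError("stripe_unit and num_servers must be positive")
    stripe_number = offset // stripe_unit
    return stripe_number % num_servers, offset % stripe_unit


def split_io(offset: int, length: int, stripe_unit: int,
             num_servers: int) -> List[Tuple[int, int, int]]:
    """Split a byte range into (server index, file offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        server, within = locate(offset, stripe_unit, num_servers)
        # Never cross a stripe-unit boundary within one piece.
        chunk = min(stripe_unit - within, end - offset)
        pieces.append((server, offset, chunk))
        offset += chunk
    return pieces
```

Every pNFS client has to get arithmetic like this right for each layout type it supports, which is the layering concern raised above.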
Jeff
* Re: A new NFSv4 server...
From: Jeff Garzik @ 2008-01-04 16:14 UTC
To: Peter Åstrand; +Cc: NFS list, nfsv4
Peter Åstrand wrote:
> Do you know about the n4 project
> (http://cvs.samba.org/cgi-bin/cvsweb/n4/)? It was abandoned many years
> ago, but might have some usefulness.
Nope, I'll definitely take a look. Already imported it into git using
git-cvsimport. :)
>> term goal is to permit modular storage backends. Thus you could implement a
>> simple RAM backend, an sqlite-based backend or a complex distributed storage
>> backend.
>
> unfs3 has a basic modular backend system. I wonder if it would be
> possible to merge your server with unfs3, and still have something
> that's readable. If you are interested, take a look at
> http://cvs.lysator.liu.se/viewcvs/viewcvs.cgi/unfs3/?root=unfs3.
I'm always interested in [legally] stealing useful ideas and code, so I
will definitely take a look.
I was sorta thinking about implementing a couple backends, and seeing
what API organically appears. That's sorta how Linux kernel API
"design" happens, and it tends to produce something useful and compact,
if not a bit unique :)
I can imagine that some backends may wish that the server handle some =
details of state and locking, while other backends may wish to record =
all that information into a database in stable storage. So it's =
difficult to forecast how all that will fall out in the end. unfs3 =
probably has many lessons to teach me...
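The idea of letting a backend API emerge from a couple of concrete backends can be sketched as follows. This is a minimal, hypothetical illustration: the `StorageBackend` class, its method names, and `RamBackend` are invented for this example and are not taken from nfs4-ram or unfs3.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageBackend(ABC):
    """Hypothetical storage interface that a server core might call into."""

    @abstractmethod
    def create(self, fileid: int) -> None: ...

    @abstractmethod
    def read(self, fileid: int, offset: int, count: int) -> bytes: ...

    @abstractmethod
    def write(self, fileid: int, offset: int, data: bytes) -> int: ...


class RamBackend(StorageBackend):
    """Keeps file contents in a dict; everything is lost when the process exits."""

    def __init__(self) -> None:
        self._files: Dict[int, bytearray] = {}

    def create(self, fileid: int) -> None:
        self._files.setdefault(fileid, bytearray())

    def read(self, fileid: int, offset: int, count: int) -> bytes:
        return bytes(self._files[fileid][offset:offset + count])

    def write(self, fileid: int, offset: int, data: bytes) -> int:
        buf = self._files[fileid]
        if offset > len(buf):
            # Writing past EOF leaves a hole, which reads back as zeros.
            buf.extend(b"\0" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data
        return len(data)
```

An sqlite or distributed backend would subclass the same interface. State and locking, which some backends may want the server to handle and others may want to own, are deliberately left out of this sketch.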
So far my best resource for NFS technical "folklore" is generally
google, which turns up a wealth of useful mailing list discussions
involving neilb, meisler, and others.
Jeff
P.S. cvsps, the util git-cvsimport uses, doesn't seem to like the unfs3
CVS repository. Any ideas?
Running cvsps...
connect error: Network is unreachable
cvs rlog: Logging unfs3
cvs rlog: Logging unfs3/Config
cvs rlog: Logging unfs3/Extras
cvs rlog: Logging unfs3/contrib
cvs rlog: Logging unfs3/contrib/nfsotpclient
cvs rlog: Logging unfs3/contrib/nfsotpclient/mountclient
cvs rlog: Logging unfs3/contrib/rpcproxy
cvs rlog: Logging unfs3/doc
Fetching LICENSE v 1.1
New LICENSE: 1416 bytes
Fetching Makefile.in v 1.1
Unknown: error
The same command works just fine with 99% of cvs repositories out there,
pserver, ssh, or whatever.
And a regular CVS checkout works just fine; I am able to check out and
browse files and look for gems.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
[not found] ` <Pine.LNX.4.64.0801040954070.5004-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
2008-01-04 13:50 ` Frank van Maarseveen
@ 2008-01-04 16:41 ` Jeff Garzik
2008-01-04 20:03 ` Peter Åstrand
1 sibling, 1 reply; 37+ messages in thread
From: Jeff Garzik @ 2008-01-04 16:41 UTC (permalink / raw)
To: Peter Åstrand; +Cc: NFS list, nfsv4
Peter Åstrand wrote:
> Many years ago, before NFSv4 was finished, I felt the same. I was waiting
> for v4 and thought that everything would be so much better. I wanted to
> help and started the "pynfs" project. Today, I have a different opinion. I
<grin>
> think v3 is a fairly good protocol, if you use it correctly. For example,
> many people don't realize that you don't need the portmapper, that you can
> use a single well-known TCP port, that you can use RPCSEC_GSS and so
> forth, even with v3.
Absolutely... But still, I think integrated mount protocol (aka pseudo
filesystem namespace) and integrated locking were big steps forward.
You really shouldn't need more than one protocol.
Speaking of RPCSEC_GSS: I would love to see a much more straightforward
authentication process, something /not/ buried inside special behaviors
triggered by opcodes found in an opaque cred struct :/ RPCSEC_GSS
context creation, the special casing around the 'null' procedure, and
the overloading of the RPC data portion of things are a huge pain to
implement.
Authentication and security should be simple and tough to screw up. I
would tend to prefer an ASCII-based authentication/security negotiation
at the start of a [SCTP|TCP] stream.
Use TLS to give most people what they want: AUTH_SYS with encryption.
GSSAPI is fine as a "required option" but you shouldn't need GSSAPI to
do simple wire encryption between IP-authenticated hosts.
> I think v4 has a few valuable improvements, but it comes with a very high
> price. v3 has a minimalistic beauty which v4 lacks. For example, take a
> look at the OPEN operation with 7 arguments, of which many are complex
> data structures:
>
> (cfh), seqid, share_access, share_deny, owner, openhow, claim ->
> (cfh), stateid, cinfo, rflags, open_confirm, attrset delegation
>
> Not pretty...
heh, tell me about it. First I started out using rpcgen, then rewrote
everything to do raw XDR decoding. OPEN is huge.
IMO, OPEN should be split into multiple operations, probably one for
each "OPEN arm". It's not like new opcode numbers are expensive.
Or, hope of hopes, simplify OPEN in some other manner, like delegating
tasks to other operations.
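To give a feel for what hand-written XDR decoding of OPEN involves, here is a minimal sketch that decodes only the fixed leading fields of OPEN4args as defined in RFC 3530: seqid, share_access, share_deny, and the open_owner4 pair (a 64-bit clientid plus an owner opaque limited to NFS4_OPAQUE_LIMIT). The `XdrDecoder` class and `decode_open_prefix` function are hypothetical helpers, not code from nfs4-ram; `openhow` and `claim`, the discriminated unions that make OPEN so large, are deliberately left undecoded.

```python
import struct

NFS4_OPAQUE_LIMIT = 1024


class XdrDecoder:
    """Minimal XDR reader: big-endian 4-byte units, opaque data padded to 4 bytes."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def _take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise ValueError("XDR buffer underrun")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def uint32(self) -> int:
        return struct.unpack(">I", self._take(4))[0]

    def uint64(self) -> int:
        return struct.unpack(">Q", self._take(8))[0]

    def opaque(self, limit: int) -> bytes:
        """Variable-length opaque: 4-byte length, data, then zero padding."""
        length = self.uint32()
        if length > limit:
            raise ValueError("opaque exceeds declared limit")
        value = self._take(length)
        self._take((4 - length % 4) % 4)  # skip padding to a 4-byte boundary
        return value


def decode_open_prefix(dec: XdrDecoder) -> dict:
    """Decode seqid, share_access, share_deny and open_owner4 of OPEN4args.

    openhow and claim are discriminated unions and are left to the caller.
    """
    return {
        "seqid": dec.uint32(),
        "share_access": dec.uint32(),
        "share_deny": dec.uint32(),
        "clientid": dec.uint64(),
        "owner": dec.opaque(NFS4_OPAQUE_LIMIT),
    }
```

Each remaining argument (`openhow` with its create modes, `claim` with its several claim types) needs its own switch on a discriminant, which is where much of the bulk comes from.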
>> Oh, certainly. I was mainly thinking a replacement of the wire protocol would
>> be an easier step for people to swallow than a new protocol.
>
> I've been thinking of trying to put together something like NFS v3.5. Some
> parts of v4 are nice, but the complexity is too high.
Agreed that it's quite complex.
One of my personal desires is for a high level of cache coherence
throughout the system for all clients (though perhaps an admin could
optionally relax this requirement). I'm a fan of Google's "Chubby", a
distributed reliable filesystem that stalls client writes until cache
invalidations for the associated byte range are processed for all
interested clients.
And anything approaching cache coherence requires some complexity :/
Another thing I like about NFSv4 is that batching sequences into chunks
of fine-grained operations is generally a useful practice. So while the
end result (COMPOUND) is a bit of a pain, bundling a sequence of
operations into a single unit is useful.
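The COMPOUND bundling described above can be modelled in a few lines. This sketch assumes the RFC 3530 rules that operations are evaluated in order, share a current filehandle, and stop at the first failure, with the reply carrying results up to and including the failing operation. `run_compound`, `putfh` and `getfh` are hypothetical toy stand-ins, not any server's real dispatcher; the status values are the RFC 3530 numbers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

NFS4_OK = 0
NFS4ERR_NOFILEHANDLE = 10020


@dataclass
class CompoundState:
    """State shared by the operations of one COMPOUND, e.g. the current filehandle."""
    current_fh: Optional[bytes] = None


# Each operation reads/updates the shared state and returns (status, result).
Op = Callable[[CompoundState], Tuple[int, object]]


def run_compound(ops: List[Op]) -> Tuple[int, List[Tuple[int, object]]]:
    """Evaluate operations in order, stopping at the first one that fails.

    The reply holds results for every operation up to and including the
    failing one; the overall status is that of the last operation evaluated.
    """
    state = CompoundState()
    results: List[Tuple[int, object]] = []
    status = NFS4_OK
    for op in ops:
        status, res = op(state)
        results.append((status, res))
        if status != NFS4_OK:
            break
    return status, results


def putfh(fh: bytes) -> Op:
    """Toy PUTFH: set the current filehandle."""
    def op(state: CompoundState) -> Tuple[int, object]:
        state.current_fh = fh
        return NFS4_OK, None
    return op


def getfh(state: CompoundState) -> Tuple[int, object]:
    """Toy GETFH: return the current filehandle, or fail if none is set."""
    if state.current_fh is None:
        return NFS4ERR_NOFILEHANDLE, None
    return NFS4_OK, state.current_fh
```

For example, `run_compound([putfh(b"fh1"), getfh])` succeeds with two results, while `run_compound([getfh, putfh(b"fh1")])` stops after the first operation with NFS4ERR_NOFILEHANDLE.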
Jeff
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
@ 2008-01-04 17:11 Rick Macklem
[not found] ` <200801041711.MAA19577-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
0 siblings, 1 reply; 37+ messages in thread
From: Rick Macklem @ 2008-01-04 17:11 UTC (permalink / raw)
To: jeff; +Cc: linux-nfs, nfsv4
> heh, tell me about it. First I started out using rpcgen, then rewrote
> everything to do raw XDR decoding. OPEN is huge.
>
> IMO, OPEN should be split into multiple operations, probably one for
> each "OPEN arm". It's not like new opcode numbers are expensive.
As I hinted at, Open is the way it is, since Windows requires one Op
so that Open/Share locks can be implemented correctly. Anything else
would not have satisfied a Win client's requirements.
> One of my personal desires is for a high level of cache coherence
> throughout the system for all clients (though perhaps an admin could
> optionally relax this requirement). I'm a fan of Google's "Chubby", a
> distributed reliable filesystem that stalls client writes until cache
> invalidations for the associated byte range are processed for all
> interested clients.
Delegations provide cache coherency "in a sense". When a client has a
delegation, it knows that no-one else is writing the file. Unfortunately,
as soon as a client gets an Open without a delegation, it no longer
gets the coherency guarantee (and servers are completely free to not
issue delegations if they don't feel like doing so). A client can
re-open a file when it gets an Open without a delegation, but if the
server still doesn't give it a delegation, it can't do anything more.
(The re-open trick is useful for an Open that requires confirmation, since
the server can't issue a delegation for that case.)
rick
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
[not found] ` <200801041548.KAA18953-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
@ 2008-01-04 17:15 ` J. Bruce Fields
2008-01-05 2:32 ` Greg Banks
1 sibling, 0 replies; 37+ messages in thread
From: J. Bruce Fields @ 2008-01-04 17:15 UTC (permalink / raw)
To: Rick Macklem; +Cc: gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf, jeff, linux-nfs, nfsv4
On Fri, Jan 04, 2008 at 10:48:39AM -0500, Rick Macklem wrote:
> Sounds like a server implementor's perspective. From a client implementor's
> point of view, a T-stable file handle is a wonderful thing. I have no idea
> how to correctly implement client side support for the volatile file
> handles allowed in NFSv4. My client doesn't support them.
As long as they persist while you have an open (or a delegation), it
shouldn't be so hard to implement, should it? If a filehandle expires,
then you throw away any cache associated with it, but as long as no
applications hold file descriptors for it, that's not a catastrophe.
But I'm a little confused whether rfc 3530's 4.2.3 gives a way for the
server to express that guarantee.
--b.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
[not found] ` <200801041528.KAA18776-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
@ 2008-01-04 17:21 ` J. Bruce Fields
2008-01-04 18:03 ` Tom Haynes
2008-01-04 19:50 ` Jeff Garzik
0 siblings, 2 replies; 37+ messages in thread
From: J. Bruce Fields @ 2008-01-04 17:21 UTC (permalink / raw)
To: Rick Macklem; +Cc: jeff, linux-nfs, nfsv4
On Fri, Jan 04, 2008 at 10:28:10AM -0500, Rick Macklem wrote:
> > Plus, surely in this day and age, we can figure out something better
> > than waiting for face-to-face events to test something. Maybe somebody
> > could arrange a donation of some slice of a grid (Amazon EC2?), make
> > various OS images available, and give engineers some way to request a
> > selection of tests, with a selection of OS images?
>
> I tried putting a server up accessible over the internet and only ever
> got one person testing on it once (or maybe it was just a hacker:-). I
> did test my client against a server at CITI once, after signing a
> bakeathon NDA. But, I agree, and I don't really think it even needs
> a central site. I don't see why vendors couldn't put up servers
> (production software or whatever they are comfortable having internet
> accessible) that clients can test against. I'll be happy to put my
> server up and I'd be happy to test against internet accessible servers
> with my client.
Ditto. I think it'd be great to have a variety of client and server
implementations available over the net, but I've had no luck talking
anybody else into it.
--b.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
@ 2008-01-04 17:28 Rick Macklem
[not found] ` <200801041728.MAA19743-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
0 siblings, 1 reply; 37+ messages in thread
From: Rick Macklem @ 2008-01-04 17:28 UTC (permalink / raw)
To: bfields; +Cc: linux-nfs, nfsv4, gnb
> As long as they persist while you have an open (or a delegation), it
> shouldn't be so hard to implement, should it? If a filehandle expires,
> then you throw away any cache associated with it, but as long as no
> applications hold file descriptors for it, that's not a catastrophe.
>
> But I'm a little confused whether rfc 3530's 4.2.3 gives a way for the
> server to express that guarantee.
Agreed, but I've always assumed the server can return NFS4ERR_FHEXPIRED
at any time. (It's listed as an error for many Ops, such as Read and Write.)
Also, what does a client do after a server reboot? It can't use
Open/Claim_previous for recovery.
Even if there is a "don't expire while Open" guarantee, it's still a pita
for the client to hang onto pathnames for directories and such, so that
they can re-lookup the fh. (And if that re-lookup happens to fail or end
up in a different place?)
rick
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
[not found] ` <200801041728.MAA19743-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
@ 2008-01-04 17:42 ` J. Bruce Fields
2008-01-04 17:45 ` Trond Myklebust
1 sibling, 0 replies; 37+ messages in thread
From: J. Bruce Fields @ 2008-01-04 17:42 UTC (permalink / raw)
To: Rick Macklem; +Cc: gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf, jeff, linux-nfs, nfsv4
On Fri, Jan 04, 2008 at 12:28:26PM -0500, Rick Macklem wrote:
> > As long as they persist while you have an open (or a delegation), it
> > shouldn't be so hard to implement, should it? If a filehandle expires,
> > then you throw away any cache associated with it, but as long as no
> > applications hold file descriptors for it, that's not a catastrophe.
> >
> > But I'm a little confused whether rfc 3530's 4.2.3 gives a way for the
> > server to express that guarantee.
>
> Agreed, but I've always assumed the server can return NFS4ERR_FHEXPIRED
> at any time. (It's listed as an error for many Ops, such as Read and Write.)
> Also, what does a client do after a server reboot? It can't use
> Open/Claim_previous for recovery.
I was hoping that FH4_VOL_NOEXPIRE_WITH_OPEN might also cover opens over
server reboot, but that's a question for the ietf list. And perhaps the
requirement to keep that filehandle->inode mapping in persistent storage
would negate the advantage to the server of volatile filehandles.
> Even if there is a "don't expire while Open" guarantee, it's still a pita
> for the client to hang onto pathnames for directories and such, so that
> they can re-lookup the fh. (And if that re-lookup happens to fail or end
> up in a different place?)
I'm confused--can't you just throw away your lookup/readdir cache for
that directory and not use it again until an application actually does a
new lookup?
Oh, but I guess the client can hold references to the directory itself
in the form of filehandles or current working directories. So I guess
you'd need some kind of open/close (or get/put) operations for
directories as well to get agreed-on lifetimes for directory
filehandles. Does it still seem worth it after all that?
--b.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
[not found] ` <200801041728.MAA19743-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
2008-01-04 17:42 ` J. Bruce Fields
@ 2008-01-04 17:45 ` Trond Myklebust
1 sibling, 0 replies; 37+ messages in thread
From: Trond Myklebust @ 2008-01-04 17:45 UTC (permalink / raw)
To: Rick Macklem
Cc: bfields, linux-nfs, nfsv4, gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf
On Fri, 2008-01-04 at 12:28 -0500, Rick Macklem wrote:
> > As long as they persist while you have an open (or a delegation), it
> > shouldn't be so hard to implement, should it? If a filehandle expires,
> > then you throw away any cache associated with it, but as long as no
> > applications hold file descriptors for it, that's not a catastrophe.
> >
> > But I'm a little confused whether rfc 3530's 4.2.3 gives a way for the
> > server to express that guarantee.
>
> Agreed, but I've always assumed the server can return NFS4ERR_FHEXPIRED
> at any time. (It's listed as an error for many Ops, such as Read and Write.)
> Also, what does a client do after a server reboot? It can't use
> Open/Claim_previous for recovery.
You need to check the fh_expire_type attribute in order to figure out
what the rules are. If the bits FH4_VOLATILE_ANY and
FH4_NO_EXPIRE_WITH_OPEN are set, then the rule is that the filehandle
will not expire while the file is open, as Bruce said.
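A client-side check of the fh_expire_type attribute might look like the sketch below. The bit values are those in the RFC 3530 XDR definitions, where the "no expire while open" bit is spelled FH4_NOEXPIRE_WITH_OPEN and may only be set together with FH4_VOLATILE_ANY. The helper functions are hypothetical, and treating any migration or rename bit as "may expire" is a deliberately conservative assumption of this sketch, not a rule taken from the RFC.

```python
# fh_expire_type bit values from the RFC 3530 XDR definitions.
FH4_PERSISTENT = 0x00000000
FH4_NOEXPIRE_WITH_OPEN = 0x00000001
FH4_VOLATILE_ANY = 0x00000002
FH4_VOL_MIGRATION = 0x00000004
FH4_VOL_RENAME = 0x00000008


def check_fh_expire_type(value: int) -> None:
    """Reject FH4_NOEXPIRE_WITH_OPEN without FH4_VOLATILE_ANY."""
    if value & FH4_NOEXPIRE_WITH_OPEN and not value & FH4_VOLATILE_ANY:
        raise ValueError("FH4_NOEXPIRE_WITH_OPEN requires FH4_VOLATILE_ANY")


def fh_stable_while_open(value: int) -> bool:
    """Conservative test: can a client rely on an open file's filehandle?

    True only for persistent filehandles, or for exactly
    FH4_VOLATILE_ANY | FH4_NOEXPIRE_WITH_OPEN (hypothetical policy).
    """
    check_fh_expire_type(value)
    if value == FH4_PERSISTENT:
        return True
    return value == FH4_VOLATILE_ANY | FH4_NOEXPIRE_WITH_OPEN
```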
On server reboot, the client gets the additional task of recovering
filehandles before it can recover opens. It can't do an OPEN by
pathname, since the server may have imposed a grace period, which will
not allow this.
Yes, this is all very silly...
> Even if there is a "don't expire while Open" guarantee, it's still a pita
> for the client to hang onto pathnames for directories and such, so that
> they can re-lookup the fh. (And if that re-lookup happens to fail or end
> up in a different place?)
Worse: if the server doesn't support fileids (which are non-mandatory
'cos NTFS & co don't have them) then you have no way to figure out if
the directory you just looked up is the same as the one you have cached.
So, while volatile filehandles may seem like a boon to server
implementations, they can still make life hell for the clients.
Trond
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
2008-01-04 9:07 ` Benny Halevy
2008-01-04 15:49 ` Jeff Garzik
@ 2008-01-04 17:47 ` J. Bruce Fields
2008-01-04 19:55 ` Benny Halevy
1 sibling, 1 reply; 37+ messages in thread
From: J. Bruce Fields @ 2008-01-04 17:47 UTC (permalink / raw)
To: Benny Halevy; +Cc: Jeff Garzik, NFS list, nfsv4, Greg Banks
On Fri, Jan 04, 2008 at 11:07:45AM +0200, Benny Halevy wrote:
> Jeff, taking into account the amount of effort people and different
> organizations have already put into NFSv4 and NFSv4.1 I wish you could
> tunnel your inventive energy into making NFSv4.1 better rather than
> trying to reinvent NFS/RPC/XDR.
It's also important to have fun. Imagining what you could do from a
clean slate is, if nothing else, a fun exercise. And it may end up with
ideas that turn out to be implementable without starting from scratch.
--b.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
2008-01-04 17:21 ` J. Bruce Fields
@ 2008-01-04 18:03 ` Tom Haynes
[not found] ` <477E750A.2030905-8AdZ+HgO7noAvxtiuMwx3w@public.gmane.org>
2008-01-04 19:50 ` Jeff Garzik
1 sibling, 1 reply; 37+ messages in thread
From: Tom Haynes @ 2008-01-04 18:03 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs, nfsv4
J. Bruce Fields wrote:
> Ditto. I think it'd be great to have a variety of client and server
> implementations available over the net, but I've had no luck talking
> anybody else into it.
>
> --b.
> _______________________________________________
> NFSv4 mailing list
> NFSv4@linux-nfs.org
> http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4
>
I suspect that the explosion of virtual servers has
probably killed this type of effort. It appears much
easier for me to give you an image than to expose
a machine on the network.
Sun actually has a set of test machines the public can
use to regression test OpenSolaris fixes. I'm not sure
if it can accommodate foreign OSes just yet.
And to add to Rick's story, we've got the pNFS enabled
bits available for download. We've only gotten feedback
from one person. So either our code is really great and
we are not getting feedback, or people are just waiting
for the pNFS implementations to mature.
We did get a request to make the bits available as a
VMware image. :->
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
[not found] ` <477E750A.2030905-8AdZ+HgO7noAvxtiuMwx3w@public.gmane.org>
@ 2008-01-04 18:21 ` J. Bruce Fields
0 siblings, 0 replies; 37+ messages in thread
From: J. Bruce Fields @ 2008-01-04 18:21 UTC (permalink / raw)
To: Tom Haynes; +Cc: Rick Macklem, linux-nfs, nfsv4
On Fri, Jan 04, 2008 at 12:03:54PM -0600, Tom Haynes wrote:
> I suspect that the explosion of virtual servers has
> probably killed this type of effort. It appears much
> easier for me to give you an image than to expose
> a machine on the network.
Each little bit you can take away from the overhead will make people
much more likely to test. The connectathon experience is just:
- Look up server's name.
- mount servername:/ /mnt/
- run test
The virtual server image approach requires at least getting the server
image and probably doing some configuration too, and then remembering to
get updates occasionally.
> Sun actually has a set of test machines the public can
> use to regression test OpenSolaris fixes. I'm not sure
> if it can accommodate foreign OSes just yet.
>
> And to add to Rick's story, we've got the pNFS enabled
> bits available for download. We've only gotten feedback
> from one person. So either our code is really great and
> we are not getting feedback, or people are just waiting
> for the pNFS implementations to mature.
Or it got released just before the holidays....
>
> We did get a request to make the bits available as a
> VMware image. :->
Hah.
--b.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
2008-01-04 17:21 ` J. Bruce Fields
2008-01-04 18:03 ` Tom Haynes
@ 2008-01-04 19:50 ` Jeff Garzik
2008-01-04 19:57 ` Peter Åstrand
1 sibling, 1 reply; 37+ messages in thread
From: Jeff Garzik @ 2008-01-04 19:50 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Rick Macklem, linux-nfs, nfsv4
J. Bruce Fields wrote:
> On Fri, Jan 04, 2008 at 10:28:10AM -0500, Rick Macklem wrote:
>>> Plus, surely in this day and age, we can figure out something better
>>> than waiting for face-to-face events to test something. Maybe somebody
>>> could arrange a donation of some slice of a grid (Amazon EC2?), make
>>> various OS images available, and give engineers some way to request a
>>> selection of tests, with a selection of OS images?
>> I tried putting a server up accessible over the internet and only ever
>> got one person testing on it once (or maybe it was just a hacker:-). I
>> did test my client against a server at CITI once, after signing a
>> bakeathon NDA. But, I agree, and I don't really think it even needs
>> a central site. I don't see why vendors couldn't put up servers
>> (production software or whatever they are comfortable having internet
>> accessible) that clients can test against. I'll be happy to put my
>> server up and I'd be happy to test against internet accessible servers
>> with my client.
>
> Ditto. I think it'd be great to have a variety of client and server
> implementations available over the net, but I've had no luck talking
> anybody else into it.
I think blanket public access wouldn't be as effective as passworded
access to a cluster, much like how people get accounts on kernel.org
(which is an excellent model for shared-interest services).
On the test cluster, I would want to be able to really stress my
software, which to any normal firewall or casual observer would look
like a DoS attempt.
Jeff
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
2008-01-04 15:49 ` Jeff Garzik
@ 2008-01-04 19:51 ` Benny Halevy
2008-01-05 1:46 ` Greg Banks
1 sibling, 0 replies; 37+ messages in thread
From: Benny Halevy @ 2008-01-04 19:51 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Greg Banks, NFS list, nfsv4
Jeff Garzik wrote:
> Benny Halevy wrote:
>> Jeff, taking into account the amount of effort people and different
>> organizations have already put into NFSv4 and NFSv4.1 I wish you could
>> tunnel your inventive energy into making NFSv4.1 better rather than
>> trying to reinvent NFS/RPC/XDR.
>>
>> Although it's rather late in the process since the NFSv4 working group
>> is close to putting the NFSv4.1 Internet-draft up for last
>> call, we would certainly appreciate more implementation feedback.
>
>
> I am more than happy to give feedback, though (as you say) it is
> probably too late for substantial feedback to have any large effect.
>
> My general engineering opinions of pNFS:
>
> * Fills an obvious need: eliminating the need to copy data through
> the metadata interface to backend storage. Many clear, tangible
> benefits here.
>
>
> * pNFS major issue #1: client storage protocol
>
> Storing and retrieving blobs over the network, with strong
> authentication/integrity/security, is a solved problem.
>
> Pick ONE client storage protocol (HTTP? iSCSI OSD2?), and stick to it.
> Or maybe HTTP|SCSI but nothing more. Heck, even BitTorrent w/ auth
> extensions would be better than yet another protocol for similar
> purposes (not that I'm advocating BT, just saying...).
Well, two things about that. First, had pNFS started from scratch, this
approach could perhaps have been taken, but one of the motivating
factors for pNFS (and actually one that I believe will help make it
successful) was the desire to replace existing proprietary file system
protocols from several vendors, such as EMC, IBM, or Panasas, that used
different storage protocols. Second, supporting several kinds of storage
is better for customers who have existing storage they want to harness
together with pNFS.
>
> Maximize reuse of existing software and mindshare.
That's always good.
>
>
> * pNFS major issue #2: abandons NFS's "one true generic" path
>
> I believe pNFS violates the "spirit of NFS" by deviating from a
> de facto assumption found in earlier versions: data transfer is
> simple, arbitrary blobs, addressed in the same manner, and sent via
> the same protocol.
>
> Pick ONE layout type, and stick to it. Banish all other layout types
> to other software layers.
I agree wholeheartedly that we should have had one layout data structure
("layout type" is a loaded term in the spec...). This was my position
from day one (and even before), but unfortunately it wasn't accepted,
and each "layout type" got to define its own layout data structure.
Instead, we could have defined one generic data structure for mapping
files onto all different kinds of storage devices, while keeping only
the device addressing information private to the "layout type"
(== storage protocol class), plus some other data that's internal to
the layout type, e.g. OSD capabilities.
>
> Protocol conversion servers, firmware, and other software can easily
> convert from a generic layout to something more exotic like OSD or
> [insert site specific protocol here].
>
> NFS itself should not be delving into low-level storage details like
> this. Clients should not need to know low-level details (like stripe
> sizes). In Linux, we call this a layering violation.
This is the essence of the layout type concept, implemented as a layout
driver in the Linux NFSv4.1 client implementation. The files layout type
definition is part of the NFSv4.1 protocol merely because the files-based
layout type uses NFSv4.1 as its storage protocol. It could very well be
defined in a separate RFC, exactly like the blocks and objects layout type
specifications, and keep its internal data structures opaque to the
generic NFSv4.1 protocol (which behaves as a transport protocol for the
layout-type-specific data).
>
> Working on kernel storage drivers as I do, I can see the attraction of
> wanting to do things this way... but we invented layering and
> abstraction in computer science for good reasons :)
Yup. I think that the layout driver is the software layer you're looking
for.
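The layout driver as a layering boundary can be sketched as a simple dispatch table. This is a hypothetical illustration, not the Linux pNFS client's API: the generic code passes the layout-type-specific body through as opaque bytes, only the registered driver interprets it, and I/O falls back to the metadata server when no driver is registered. The layout type numbers are the values later published in RFC 5661; the registry and function names are invented for this example.

```python
from typing import Callable, Dict

LAYOUT4_NFSV4_1_FILES = 0x1
LAYOUT4_OSD2_OBJECTS = 0x2
LAYOUT4_BLOCK_VOLUME = 0x3

# Hypothetical registry: the generic client only knows how to look up a driver.
_drivers: Dict[int, Callable[[bytes], str]] = {}


def register_layout_driver(layout_type: int, driver: Callable[[bytes], str]) -> None:
    """Attach a driver that understands one layout type's opaque body."""
    _drivers[layout_type] = driver


def do_io(layout_type: int, layout_body: bytes) -> str:
    """Dispatch I/O for a layout; the body stays opaque to this generic code."""
    driver = _drivers.get(layout_type)
    if driver is None:
        # No driver for this layout type: fall back to I/O through the metadata server.
        return "io via metadata server"
    return driver(layout_body)
```

In this shape, adding the objects or blocks layout means registering another driver; the generic path never parses stripe sizes or device addresses itself.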
>
>
> * pNFS major issue #3: no longer a "closed loop" protocol
>
> By permitting multiple layout types, and in particular undefined
> (site-specific) layout types, it is by definition _impossible_ for
> anyone to claim full protocol interoperability with other
> implementations.
>
> The number of possible combinations approaches infinity, with obvious
> consequences on testing, and production software quality.
The non-standard layout types are defined as experimental. To claim
interoperability, one would need to publish a suitable specification of
the new layout type (see "Defining new layout types", section 22.4 of
http://www.nfsv4-editor.org/draft-18/draft-ietf-nfsv4-minorversion1-18.html#pnfsiana).
Though I agree that allowing a single file to be accessed with multiple
layout types (in theory) complicates testing, typically there will be at
most one layout type per file system.
>
> And when a marketing department advertises "fully NFSv4.1 compliant!"
> on their company's appliance, it is trivial for any engineer to
> construct another "fully NFSv4.1 compliant" setup -- with equivalent
> authentication, metadata and data sets -- that is not interoperable
> except via the fallback case (copy through the metadata server).
Compliance with NFSv4.1 is indeed tied to the legacy I/O path and, for
pNFS, to the files layout type, as it is a part of NFSv4.1. Other
implementations of NFSv4.1 with pNFS over non-files layout types will
have to claim compliance with their respective standards. For example,
Panasas's implementation will need to comply with the OSD standard,
iSCSI (or FC), the Object-based pNFS RFC, and finally NFSv4.1.
> Such interoperability breakdowns are IMO not in the spirit of NFS.
That's a part of making progress IMO...
Benny
>
> Jeff
>
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
2008-01-04 17:47 ` J. Bruce Fields
@ 2008-01-04 19:55 ` Benny Halevy
0 siblings, 0 replies; 37+ messages in thread
From: Benny Halevy @ 2008-01-04 19:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: NFS list, nfsv4, Greg Banks
J. Bruce Fields wrote:
> On Fri, Jan 04, 2008 at 11:07:45AM +0200, Benny Halevy wrote:
>
>> Jeff, taking into account the amount of effort people and different
>> organizations have already put into NFSv4 and NFSv4.1 I wish you could
>> tunnel your inventive energy into making NFSv4.1 better rather than
>> trying to reinvent NFS/RPC/XDR.
>>
>
> It's also important to have fun. Imagining what you could do from a
> clean slate is, if nothing else, a fun exercise. And it may end up with
> ideas that turn out to be implementable without starting from scratch.
>
> --b.
>
Absolutely :)
And I certainly hope it will benefit all of us.
Benny
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
2008-01-04 19:50 ` Jeff Garzik
@ 2008-01-04 19:57 ` Peter Åstrand
[not found] ` <Pine.LNX.4.64.0801042055490.18738-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
0 siblings, 1 reply; 37+ messages in thread
From: Peter Åstrand @ 2008-01-04 19:57 UTC (permalink / raw)
To: Jeff Garzik; +Cc: linux-nfs, nfsv4
On Fri, 4 Jan 2008, Jeff Garzik wrote:
> > Ditto. I think it'd be great to have a variety of client and server
> > implementations available over the net, but I've had no luck talking
> > anybody else into it.
>
> I think blanket public access wouldn't be as effective as passworded access to
> a cluster, much like how people get accounts on kernel.org (which is an
> excellent model for shared-interest services).
>
> On the test cluster, I would want to be able to really stress my software,
> which to any normal firewall or casual observer would look like a DoS attempt.
I'm not afraid of stress tests that look like DoS attempts. What worries
me is that a writable export will be used for sharing warez or something
like that.
Rgds,
---
Peter Åstrand ThinLinc Chief Developer
Cendio AB http://www.cendio.se
Wallenbergs gata 4
583 30 Linköping Phone: +46-13-21 46 00
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
2008-01-04 16:14 ` Jeff Garzik
@ 2008-01-04 19:58 ` Peter Åstrand
0 siblings, 0 replies; 37+ messages in thread
From: Peter Åstrand @ 2008-01-04 19:58 UTC (permalink / raw)
To: Jeff Garzik; +Cc: NFS list, nfsv4
On Fri, 4 Jan 2008, Jeff Garzik wrote:
> P.S. cvsps, the util git-cvsimport uses, doesn't seem to like the unfs3 CVS
> repository. Any ideas?
Seems to work for me. I tested with:
$ cvs -d :pserver:anonymous-D0yiLyI4lD89hKwg4ls7rsugMpMbD5Xr@public.gmane.org:/cvsroot/unfs3 co unfs3
$ cd unfs3
$ cvsps
Using cvsps version 2.1.
Rgds,
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
2008-01-04 16:41 ` Jeff Garzik
@ 2008-01-04 20:03 ` Peter Åstrand
[not found] ` <Pine.LNX.4.64.0801042030380.18738-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
0 siblings, 1 reply; 37+ messages in thread
From: Peter Åstrand @ 2008-01-04 20:03 UTC (permalink / raw)
To: Jeff Garzik; +Cc: NFS list, nfsv4
> > think v3 is a fairly good protocol, if you use it correctly. For example,
> > many people don't realize that you don't need the portmapper, that you can
> > use a single well-known TCP port, that you can use RPCSEC_GSS and so forth,
> > even with v3.
>
> Absolutely... But still, I think integrated mount protocol (aka pseudo
> filesystem namespace) and integrated locking were big steps forward. You
> really shouldn't need more than one protocol.
I don't think the mount protocol is a big problem. Either you can continue
to use the mount protocol and just have one export (/) and export all file
systems below it ("nohide"). Modern clients (such as modern Linux kernels)
will automatically create "sub mounts" when traversing and discovering new
fsids.
Or, you can get rid of the mount protocol by using WebNFS and the public
filehandle.
File locking support is harder, though.
> Authentication and security should be simple, tough to screw up. I would tend
> to prefer an ASCII-based authentication/security negotiation at the start of a
> [SCTP|TCP] stream.
>
> Use TLS to give most people what they want: AUTH_SYS with encryption. GSSAPI
> is fine as a "required option" but you shouldn't need GSSAPI to do simple wire
> encryption between IP-authenticated hosts.
SSH is another option if you just want encryption, but my impression is
that AUTH_SYS is a very big problem as well.
> heh, tell me about it. First I started out using rpcgen, then rewrote
> everything to do raw XDR decoding. OPEN is huge.
>
> IMO, OPEN should be split into multiple operations, probably one for each
> "OPEN arm". It's not like new opcode numbers are expensive.
>
> Or, hope of hopes, simplify OPEN in some other manner, like delegating tasks
> to other operations.
Or perhaps aiming for something less than perfect. Remember, the perfect
is the enemy of the good.
Regards,
---
Peter Åstrand ThinLinc Chief Developer
Cendio AB http://www.cendio.se
Wallenbergs gata 4
583 30 Linköping Phone: +46-13-21 46 00
^ permalink raw reply [flat|nested] 37+ messages in thread
* RE: A new NFSv4 server...
2008-01-04 9:15 ` Peter Åstrand
2008-01-04 10:05 ` Neil Brown
[not found] ` <Pine.LNX.4.64.0801040954070.5004-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
@ 2008-01-04 20:31 ` Muntz, Daniel
2 siblings, 0 replies; 37+ messages in thread
From: Muntz, Daniel @ 2008-01-04 20:31 UTC (permalink / raw)
To: Peter Åstrand, Jeff Garzik; +Cc: NFS list, nfsv4
I like it. NFS 3.5. Drop pNFS into the v3 protocol (a greatly simplified
pNFS compared to what we ended up with in 4.1), and you'd have yourself a
sweet little distributed fs. The important part is that any such effort be
called "NFS x.y". Naming is important, or we'd be running AFS/DFS instead
of NFS 4.0 (please, try to resist flaming me on the hyperbole).
-Dan
-----Original Message-----
From: Peter Åstrand [mailto:astrand@cendio.se]
Sent: Friday, January 04, 2008 1:15 AM
To: Jeff Garzik
Cc: NFS list; nfsv4@linux-nfs.org
Subject: Re: A new NFSv4 server...
[About v4]
On Fri, 4 Jan 2008, Jeff Garzik wrote:
> > > I really wish the entire wire protocol were scrapped and replaced
> > > with something more sane, and easier to parse.
> > You had me worried there for a moment, I thought you might be the
> > first person to admit to liking the NFS4 protocol design.
Couldn't agree more.
> In my personal opinion, version 4 of NFS is a quantum-leap improvement
> over previous versions. While I used NFS v3 extensively, I always
> felt it was a crappy protocol, and unworthy of serious development
> effort. That changed with v4.
...
> > > It's tempting to see what would arise from a clean-slate wire
> > > protocol effort, something that is otherwise compatible with NFS
> > > 4.x operations, objects, and data model.
>
> It's more like v4 is a vast relative improvement over prior NFS.
> Given the huge number of NFS users and sites, IMO v4 is a huge
> improvement for Unix file sharing overall.
Many years ago, before NFSv4 was finished, I felt the same. I was waiting
for v4 and thought that everything would be so much better. I wanted to
help and started the "pynfs" project. Today, I have a different opinion. I
think v3 is a fairly good protocol, if you use it correctly. For example,
many people don't realize that you don't need the portmapper, that you can
use a single well-known TCP port, that you can use RPCSEC_GSS and so forth,
even with v3.
I think v4 has a few valuable improvements, but it comes with a very high
price. v3 has a minimalistic beauty which v4 lacks. For example, take a
look at the OPEN operation with 7 arguments, of which many are complex
data structures:
(cfh), seqid, share_access, share_deny, owner, openhow, claim ->
(cfh), stateid, cinfo, rflags, open_confirm, attrset delegation
Not pretty...
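To make the "7 arguments" point concrete, here is a minimal Python sketch (Python because pynfs, mentioned above, is written in it) that models the OPEN arguments as nested unions. The enum names and values follow the RFC 3530 XDR definitions; the union payloads are deliberately simplified, and `decode_arms` is a hypothetical helper, not part of pynfs or any server:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class OpenType(IntEnum):        # opentype4
    OPEN4_NOCREATE = 0
    OPEN4_CREATE = 1


class CreateMode(IntEnum):      # createmode4
    UNCHECKED4 = 0
    GUARDED4 = 1
    EXCLUSIVE4 = 2


class ClaimType(IntEnum):       # open_claim_type4
    CLAIM_NULL = 0
    CLAIM_PREVIOUS = 1
    CLAIM_DELEGATE_CUR = 2
    CLAIM_DELEGATE_PREV = 3


@dataclass
class OpenOwner:                # open_owner4
    clientid: int               # 64-bit client ID
    owner: bytes                # opaque owner string


@dataclass
class OpenHow:                  # openflag4 (simplified)
    opentype: OpenType
    mode: Optional[CreateMode] = None   # only meaningful for OPEN4_CREATE
    create_data: bytes = b""            # attrs or verifier, left opaque here


@dataclass
class OpenClaim:                # open_claim4 (simplified)
    claim: ClaimType
    payload: object = None      # file name, delegation type, or stateid+name


@dataclass
class Open4Args:
    seqid: int
    share_access: int           # e.g. 1 = READ, 2 = WRITE, 3 = BOTH
    share_deny: int             # e.g. 0 = NONE
    owner: OpenOwner
    openhow: OpenHow
    claim: OpenClaim


def decode_arms(args: Open4Args) -> List[str]:
    """Return the union arms a decoder has to walk for this OPEN."""
    arms = [args.openhow.opentype.name]
    if args.openhow.opentype == OpenType.OPEN4_CREATE:
        if args.openhow.mode is None:
            raise ValueError("OPEN4_CREATE requires a createmode")
        arms.append(args.openhow.mode.name)
    arms.append(args.claim.claim.name)
    return arms
```

An exclusive create of a new name, for instance, walks three arms (OPEN4_CREATE, EXCLUSIVE4, CLAIM_NULL) before the server even looks at share reservations.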
> Oh, certainly. I was mainly thinking a replacement of the wire
> protocol would be an easier step for people to swallow than a new protocol.
I've been thinking of trying to put together something like NFS v3.5. Some
parts of v4 are nice, but the complexity is too high.
Regards,
---
Peter Åstrand ThinLinc Chief Developer
Cendio AB http://www.cendio.se
Wallenbergs gata 4
583 30 Linköping Phone: +46-13-21 46 00
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
[not found] ` <Pine.LNX.4.64.0801042055490.18738-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
@ 2008-01-05 0:43 ` Jeff Garzik
0 siblings, 0 replies; 37+ messages in thread
From: Jeff Garzik @ 2008-01-05 0:43 UTC (permalink / raw)
To: Peter Åstrand; +Cc: linux-nfs, nfsv4
Peter Åstrand wrote:
> On Fri, 4 Jan 2008, Jeff Garzik wrote:
>
>>> Ditto. I think it'd be great to have a variety of client and server
>>> implementations available over the net, but I've had no luck talking
>>> anybody else into it.
>> I think blanket public access wouldn't be as effective as passworded access to
>> a cluster, much like how people get accounts on kernel.org (which is an
>> excellent model for shared-interest services).
>>
>> On the test cluster, I would want to be able to really stress my software,
>> which to any normal firewall or casual observer would look like a DoS attempt.
>
> I'm not afraid of stress tests that look like DoS attempts. What worries
> me is that a writable export will be used for sharing warez or something
> like that.
I would think that LAN testing would have more value than WAN testing
anyway. So just close off the firewall, except for an ssh bastion host or
similar.
Jeff
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
[not found] ` <200801041711.MAA19577-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
@ 2008-01-05 0:51 ` Jeff Garzik
0 siblings, 0 replies; 37+ messages in thread
From: Jeff Garzik @ 2008-01-05 0:51 UTC (permalink / raw)
To: Rick Macklem; +Cc: linux-nfs, nfsv4
Rick Macklem wrote:
>> heh, tell me about it. First I started out using rpcgen, then rewrote
>> everything to do raw XDR decoding. OPEN is huge.
>>
>> IMO, OPEN should be split into multiple operations, probably one for
>> each "OPEN arm". It's not like new opcode numbers are expensive.
>
> As I hinted at, Open is the way it is, since Windows requires one Op
> so that Open/Share locks can be implemented correctly. Anything else
> would not have satisfied a Win client's requirements.
My apologies for being unclear. I meant something along the lines of
giving each OPEN arm its own opcode, to "flatten" things a bit. The
guarantees should and would remain the same, and Windows would continue
to work.
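A sketch of what that flattening might look like on the server side, assuming hypothetical opcode numbers (100 to 103 are made up here and are not assigned by any NFSv4 specification). Each claim arm gets its own entry point, and all of them funnel into one shared core, so the OPEN and share-lock semantics that Windows depends on stay in a single place:

```python
from enum import IntEnum
from typing import Callable, Dict


class FlatOpenOp(IntEnum):
    # Hypothetical opcode numbers, one per OPEN claim arm; NOT assigned by
    # any NFSv4 specification.
    OPEN_NULL = 100
    OPEN_PREVIOUS = 101
    OPEN_DELEGATE_CUR = 102
    OPEN_DELEGATE_PREV = 103


def open_core(claim: str, target: str, create: bool) -> dict:
    """Shared OPEN semantics. A real server would check share reservations
    and issue a stateid here; this sketch just echoes the request."""
    return {"claim": claim, "target": target, "create": create}


# Each flattened op decodes only the fields its arm needs. For brevity this
# sketch only allows create on the CLAIM_NULL arm.
def op_open_null(a: dict) -> dict:
    return open_core("CLAIM_NULL", a["name"], a.get("create", False))


def op_open_previous(a: dict) -> dict:
    # CLAIM_PREVIOUS reclaims the file named by the current filehandle.
    return open_core("CLAIM_PREVIOUS", "<current fh>", False)


def op_open_delegate_cur(a: dict) -> dict:
    return open_core("CLAIM_DELEGATE_CUR", a["name"], False)


def op_open_delegate_prev(a: dict) -> dict:
    return open_core("CLAIM_DELEGATE_PREV", a["name"], False)


DISPATCH: Dict[FlatOpenOp, Callable[[dict], dict]] = {
    FlatOpenOp.OPEN_NULL: op_open_null,
    FlatOpenOp.OPEN_PREVIOUS: op_open_previous,
    FlatOpenOp.OPEN_DELEGATE_CUR: op_open_delegate_cur,
    FlatOpenOp.OPEN_DELEGATE_PREV: op_open_delegate_prev,
}


def dispatch(opcode: int, args: dict) -> dict:
    try:
        op = FlatOpenOp(opcode)
    except ValueError:
        raise ValueError(f"unsupported opcode {opcode}") from None
    return DISPATCH[op](args)
```

The design choice here is that flattening only changes decoding: every handler ends in `open_core`, so the guarantees a single OPEN provides are unchanged.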
>> One of my personal desires is for a high level of cache coherence
>> throughout the system for all clients (though perhaps an admin could
>> optionally relax this requirement). I'm a fan of Google's "Chubby", a
>> distributed reliable filesystem that stalls client writes until cache
>> invalidations for the associated byte range are processed for all
>> interested clients.
>
> Delegations provide cache coherency "in a sense". When a client has a
> delegation, it knows that no-one else is writing the file. Unfortunately,
> as soon as a client gets an Open without a delegation, it no longer
> gets the coherency guarantee (and servers are completely free to not
> issue delegations if they don't feel like doing so). A client can
> re-open a file when it gets an Open without a delegation, but if the
> server still doesn't give it a delegation, it can't do anything more.
> (The re-open trick is useful for an Open that requires confirmation, since
> the server can't issue a delegation for that case.)
Yes, delegation makes for nice caching. I'm interested in making the
other stuff coherent (as possible) too, for use cases such as
shared-writer files with locking, "watched" files (1 writer, many
readers), files to which many clients append data (a la GoogleFS's
atomic append), directories that are polled by many clients, etc.
It's all in how one juggles workload-specific priorities, really. Some
workloads are nice with push-invalidation strategies like Chubby, others
not so much.
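The Chubby-style write path described above can be sketched as a toy, single-process model under stated assumptions: clients cache half-open byte ranges, `invalidate` stands in for a callback that would really travel over the network, and the write loop simply blocks until every client caching an overlapping range has acknowledged. None of these names come from Chubby or NFS:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Client:
    name: str
    # Cached byte ranges, as half-open [start, end) intervals.
    cached: List[Tuple[int, int]] = field(default_factory=list)

    def invalidate(self, start: int, end: int) -> bool:
        """Drop every cached range overlapping [start, end) and acknowledge."""
        self.cached = [(s, e) for (s, e) in self.cached if e <= start or s >= end]
        return True


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start < b_end and b_start < a_end


class PushInvalidationServer:
    def __init__(self, size: int):
        self.data = bytearray(size)
        self.clients: List[Client] = []

    def register(self, client: Client) -> None:
        self.clients.append(client)

    def read(self, client: Client, start: int, end: int) -> bytes:
        client.cached.append((start, end))   # server tracks who caches what
        return bytes(self.data[start:end])

    def write(self, writer: Client, start: int, payload: bytes) -> None:
        end = start + len(payload)
        if start < 0 or end > len(self.data):
            raise ValueError("write outside file")
        # Stall the write until every other client caching an overlapping
        # range has acknowledged its invalidation. This is synchronous here;
        # a real server would send callbacks and wait, likely with a lease
        # timeout. The writer is assumed to update its own cache locally.
        for c in self.clients:
            if c is writer:
                continue
            if any(overlaps(s, e, start, end) for (s, e) in c.cached):
                if not c.invalidate(start, end):
                    raise RuntimeError(f"{c.name} did not acknowledge")
        self.data[start:end] = payload
```

Clients whose cached ranges do not overlap the write are never contacted, which is what keeps this approach cheap for workloads like many readers of disjoint regions and costly for heavily shared hot spots.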
Jeff
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
2008-01-04 15:49 ` Jeff Garzik
2008-01-04 19:51 ` Benny Halevy
@ 2008-01-05 1:46 ` Greg Banks
2008-01-05 7:56 ` Benny Halevy
1 sibling, 1 reply; 37+ messages in thread
From: Greg Banks @ 2008-01-05 1:46 UTC (permalink / raw)
To: Jeff Garzik; +Cc: nfsv4, NFS list
Jeff Garzik wrote:
> Benny Halevy wrote:
>>
>> Although it's rather late in the process since the NFSv4 working group
>> is close to putting the NFSv4.1 Internet-draft up for last
>> call, we would certainly appreciate more implementation feedback.
>
>
> I am more than happy to give feedback, though (as you say) it is
> probably too late for substantial feedback to have any large effect.
About five years too late. Witness the uncomfortable hacks required to
retrofit the extra Sessions fields into the protocol without changing
the basic COMPOUND arguments and results structures, which a minor version
doesn't allow you to do.
>
> My general engineering opinions of pNFS:
>
> * Fills an obvious need: eliminating the need to copy data through
> the metadata interface to backend storage. Many clear, tangible
> benefits here.
>
>
> *[...]
> Pick ONE client storage protocol [...] and stick to it.
>
> * [...]
> Pick ONE layout type, and stick to it.
> * pNFS major issue #3: no longer a "closed loop" protocol
>
>
> And when a marketing department advertises "fully NFSv4.1 compliant!"
> on their company's appliance, it is trivial for any engineer to
> construct another "fully NFSv4.1 compliant" setup -- with equivalent
> authentication, metadata and data sets -- that is not interoperable
> except via the fallback case (copy through the metadata server).
>
> Such interoperability breakdowns are IMO not in the spirit of NFS.
Strongly agreed with all the above. It's difficult to avoid the
conclusion that the current pNFS spec is designed to allow the
existing parallel filesystem vendors to sell "standards compliant"
solutions that only work with their client software and where all
the MDS and DS machines need to be bought from the same vendor.
If the spec defined a single layout type, and all three protocols
involved were variants of NFSv4.1 and were defined in the spec,
we would have a true open standard.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
[not found] ` <200801041548.KAA18953-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
2008-01-04 17:15 ` J. Bruce Fields
@ 2008-01-05 2:32 ` Greg Banks
1 sibling, 0 replies; 37+ messages in thread
From: Greg Banks @ 2008-01-05 2:32 UTC (permalink / raw)
To: Rick Macklem; +Cc: jeff, linux-nfs, nfsv4
Rick Macklem wrote:
>> You had me worried there for a moment, I thought you might be the first
>> person to admit to liking the NFS4 protocol design.
>>
>
> I actually like quite a bit about it, although I agree that the XDR/Sun
> RPC underpinnings are getting pretty tired (mid-1980s). I liked Sessions,
> but think that it's gotten overly complex (why 5 required encrypted
> checksum algorithms, wouldn't one be enough? for example).
Agreed, the basic ideas behind Sessions are good and long overdue.
> It would have been simpler,
> if it had been "posix only" and not tried to be Windows compatible, but
> I see the argument for Windows compatibility, hence the Open.
>
I don't see the argument. I've yet to meet a sysadmin who would want
to use NFS from a Windows client when that client already has an adequate
remote filesystem client implementation on it.
>
>> The classic persistent file handles, for example, could be considered a
>> major design flaw. Firstly it makes the inode# -> dentry lookup a
>> performance path for the underlying filesystem, which it isn't in any
>> local load.
>>
>
> Sounds like a server implementor's perspective.
Well, yeah ;-)
> From a client implementor's
> point of view, a T-stable file handle is a wonderful thing. I have no idea
> how to correctly implement client side support for the volatile file
> handles allowed in NFSv4. My client doesn't support them.
>
I can see how volatile file handles would be a problem for clients, and
I don't think they're the answer either. For files, a better solution
would be to use an index into a small per-session table of open files.
Directories are a different matter though.
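Greg's per-session table idea can be sketched in a few lines. `SessionOpenTable` is a hypothetical name, the objects stored in each slot stand in for whatever the backend uses to represent an open file, and the sketch ignores persistence and session recovery entirely:

```python
from typing import Any, List, Optional


class SessionOpenTable:
    """Hypothetical per-session table: small integer index -> open file."""

    def __init__(self, capacity: int = 64):
        self.slots: List[Optional[Any]] = [None] * capacity
        # Stack of free indices: initially hands out 0, 1, 2, ...; after
        # that, the most recently freed index is reused first.
        self.free: List[int] = list(range(capacity - 1, -1, -1))

    def open(self, server_object: Any) -> int:
        if not self.free:
            raise OSError("per-session open-file table is full")
        idx = self.free.pop()
        self.slots[idx] = server_object
        return idx

    def lookup(self, idx: int) -> Any:
        if not 0 <= idx < len(self.slots) or self.slots[idx] is None:
            raise KeyError(idx)   # stale or bogus index
        return self.slots[idx]

    def close(self, idx: int) -> None:
        self.lookup(idx)          # validate before freeing
        self.slots[idx] = None
        self.free.append(idx)
```

One caveat with this minimal version: freed slots are reused immediately, so a client holding a stale index could reach a different file. Pairing each index with a per-slot generation counter would be one way to catch that.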
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
2008-01-05 1:46 ` Greg Banks
@ 2008-01-05 7:56 ` Benny Halevy
0 siblings, 0 replies; 37+ messages in thread
From: Benny Halevy @ 2008-01-05 7:56 UTC (permalink / raw)
To: Greg Banks; +Cc: NFS list, nfsv4
Greg Banks wrote:
> Strongly agreed with all the above. It's difficult to avoid the
> conclusion that the current pNFS spec is designed to allow the
> existing parallel filesystem vendors to sell "standards compliant"
> solutions that only work with their client software and where all
> the MDS and DS machines need to be bought from the same vendor.
> If the spec defined a single layout type, and all three protocols
> involved were variants of NFSv4.1 and were defined in the spec,
> we would have a true open standard.
>
>
Greg, I'm afraid your conclusion is just wrong. What exactly is it
based on?
I'd appreciate it if you could look again at the current Internet Drafts
comprising NFSv4.1 and layout types and please raise any issues you see in
the specs that would jeopardize interoperability between different client
software vendors and different server / storage vendors.
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1
http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj
http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-block
There is indeed a third protocol in the overall architecture used by the
MDS to manage the storage devices, and this protocol is outside the scope
of NFSv4.1. This may lead to non-interoperable implementations of
server/storage systems, but that definitely was not the intent of the
design decision to leave the storage management protocol unspecified in
NFSv4.1.
The object-based layout type, for example, is based on using the
standard OSD protocol between the MDS and the OSDs for control as well as
between the clients and the OSDs for data transfer. How does that preclude
interoperability between different client, MDS, and DS vendors if the MDS,
OSDs, and clients comply with T10 OSD, and the MDS and client comply with
NFSv4.1 and the pnfs-obj RFC (when it becomes one)?
Benny
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: A new NFSv4 server...
[not found] ` <Pine.LNX.4.64.0801042030380.18738-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
@ 2008-01-06 23:54 ` James Morris
0 siblings, 0 replies; 37+ messages in thread
From: James Morris @ 2008-01-06 23:54 UTC (permalink / raw)
To: Peter Åstrand
Cc: Jeff Garzik, NFS list, nfsv4, mike-z9p9JiHjuePQT0dZR+AlfA
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1403 bytes --]
On Fri, 4 Jan 2008, Peter Åstrand wrote:
> > Use TLS to give most people what they want: AUTH_SYS with encryption. GSSAPI
> > is fine as a "required option" but you shouldn't need GSSAPI to do simple wire
> > encryption between IP-authenticated hosts.
>
> SSH is another option if you just want encryption, but my impression is
> that AUTH_SYS is a very big problem as well.
I've been looking into this recently, essentially ending up down a very
similar track to the SSiLKey proposal presented at IETF67:
http://www3.ietf.org/proceedings/06nov/slides/spkm-5/spkm-5.ppt
The basic idea in SSiLKey is to bootstrap an RPCSEC_GSS session with TLS
and then layer LIPKEY on top.
It seems to me that SSH might be preferable to TLS as a low-infrastructure
mechanism, as many people already have ssh keys (and use them), there's no
need for an HTTP server, and SSH already supports a variety of
authentication mechanisms.
In the SSH case, I'm not sure yet whether LIPKEY would be the most
appropriate mechanism to utilize, and whether this scheme might in fact be
cleaner overall without using GSS at this level. i.e. GSS can be used
directly by SSH itself if desired, and there's also PAM.
There's also a patch for SSH to utilize GPG keys (raising the possibility
of utilizing existing webs of trust), although it does not seem to be
current.
- James
--
James Morris
<jmorris@namei.org>
^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2008-01-06 23:54 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-03 12:16 A new NFSv4 server Jeff Garzik
2008-01-03 16:32 ` J. Bruce Fields
2008-01-04 5:32 ` Jeff Garzik
2008-01-04 6:24 ` Greg Banks
[not found] ` <477DD11B.40909-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2008-01-04 7:04 ` Jeff Garzik
2008-01-04 9:07 ` Benny Halevy
2008-01-04 15:49 ` Jeff Garzik
2008-01-04 19:51 ` Benny Halevy
2008-01-05 1:46 ` Greg Banks
2008-01-05 7:56 ` Benny Halevy
2008-01-04 17:47 ` J. Bruce Fields
2008-01-04 19:55 ` Benny Halevy
2008-01-04 9:15 ` Peter Åstrand
2008-01-04 10:05 ` Neil Brown
[not found] ` <Pine.LNX.4.64.0801040954070.5004-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
2008-01-04 13:50 ` Frank van Maarseveen
2008-01-04 16:41 ` Jeff Garzik
2008-01-04 20:03 ` Peter Åstrand
[not found] ` <Pine.LNX.4.64.0801042030380.18738-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
2008-01-06 23:54 ` James Morris
2008-01-04 20:31 ` Muntz, Daniel
2008-01-04 9:15 ` Peter Åstrand
2008-01-04 16:14 ` Jeff Garzik
2008-01-04 19:58 ` Peter Åstrand
-- strict thread matches above, loose matches on Subject: below --
2008-01-04 15:28 Rick Macklem
[not found] ` <200801041528.KAA18776-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
2008-01-04 17:21 ` J. Bruce Fields
2008-01-04 18:03 ` Tom Haynes
[not found] ` <477E750A.2030905-8AdZ+HgO7noAvxtiuMwx3w@public.gmane.org>
2008-01-04 18:21 ` J. Bruce Fields
2008-01-04 19:50 ` Jeff Garzik
2008-01-04 19:57 ` Peter Åstrand
[not found] ` <Pine.LNX.4.64.0801042055490.18738-K9BqGu7AvB3wj5YHdwD3Ga2PxDmRETKR@public.gmane.org>
2008-01-05 0:43 ` Jeff Garzik
2008-01-04 15:48 Rick Macklem
[not found] ` <200801041548.KAA18953-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
2008-01-04 17:15 ` J. Bruce Fields
2008-01-05 2:32 ` Greg Banks
2008-01-04 17:11 Rick Macklem
[not found] ` <200801041711.MAA19577-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
2008-01-05 0:51 ` Jeff Garzik
2008-01-04 17:28 Rick Macklem
[not found] ` <200801041728.MAA19743-bYVALtacgsT800Iu1Vt84J3p9npsUQCG@public.gmane.org>
2008-01-04 17:42 ` J. Bruce Fields
2008-01-04 17:45 ` Trond Myklebust