* NFS as a Cluster File System.
@ 2003-01-09 19:39 Lorn Kay
From: Lorn Kay @ 2003-01-09 19:39 UTC (permalink / raw)
To: nfs, linux-ha
Is NFS a viable CFS? (I'm cross posting this due to a discussion on the
linux-ha list recently.)
NFS has a bad reputation probably due to (at least) the following:
It has been used in networking environments where different server hardware
configurations (NICS, drivers, etc.) running different operating systems
have connected to each other (in many-to-many configurations).
It "grew up" on networks that were perhaps unstable, or immature
("Someone's kicked the token ring coax cable laying on the floor again")
long before switches were commonplace, and the network was loaded down with
all kinds of network traffic.
It wasn't understood very well. Since the default mount options worked,
system administrators often didn't fully understand the ramifications of
their NFS client mount option choices.
It relied on UDP, which is susceptible to huge retransmission efforts on
noisy or lossy networks.
NFS was used over many-hop WAN connections.
NFS servers were often used for many other tasks, not just NFS.
A cluster configuration, however, offers several advantages over the typical
NFS configuration:
All NFS clients (the cluster nodes) run the same operating system (Linux).
All clients run the same version of NFS and the kernel.
All clients use the same network tuned configuration.
A physical network can be dedicated to NFS. (Using a high quality switch,
with short data-center-only cable runs.)
All clients connect to one NFS server.
The NFS server is a high-quality dedicated machine (Net App, EMC, etc.)
Only one mount point need be used with one set of mount options.
Linux clients can use TCP instead of UDP.
Except for the vagaries of the load placed on the cluster nodes, this sounds
like a test lab environment. If NFS can't work in this environment where
will it ever work?
--K
-------------------------------------------------------
This SF.NET email is sponsored by:
SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
http://www.vasoftware.com
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* Re: NFS as a Cluster File System.
From: Brian Tinsley @ 2003-01-09 21:11 UTC
To: Lorn Kay; +Cc: nfs, linux-ha

Lorn Kay wrote:
> Is NFS a viable CFS? (I'm cross posting this due to a discussion on
> the linux-ha list recently.)

Since there is not a really good cluster filesystem for Linux that is
not either "half baked" (IMHO - I'm probably going to get smacked over
that statement!) or costs an arm and a leg, this is exactly the route we
have taken.

> The NFS server is a high-quality dedicated machine (Net App, EMC,
> etc.)

We've had great success with just using SMP Linux servers. We do have
one EMC IP4700 in production, and it's a nice system, but I prefer the
Linux based alternative.

> Linux clients can use TCP instead of UDP.

Although I haven't had problems with this in our lab, I believe the NFS
authors still consider this experimental.

--
Brian Tinsley
Chief Systems Engineer
Emageon

* Re: NFS as a Cluster File System.
From: Brian Jackson @ 2003-01-09 22:04 UTC
To: nfs, linux-ha

On Thursday 09 January 2003 03:11 pm, Brian Tinsley wrote:
> Lorn Kay wrote:
> > Is NFS a viable CFS? (I'm cross posting this due to a discussion on
> > the linux-ha list recently.)
>
> Since there is not a really good cluster filesystem for Linux that is
> not either "half baked"

Hey, we're working on it ;)

--Brian Jackson
OpenGFS Project

> (IMHO - I'm probably going to get smacked over
> that statement!) or cost an arm and a leg, this is exactly the route we
> have taken.

* Re: Re: NFS as a Cluster File System.
From: Brian Tinsley @ 2003-01-09 23:02 UTC
To: Brian Jackson; +Cc: nfs, linux-ha

[-- Attachment #1: Type: text/html, Size: 1389 bytes --]

* Re: NFS as a Cluster File System.
From: Alan Robertson @ 2003-01-09 21:29 UTC
To: Lorn Kay; +Cc: nfs, linux-ha

Lorn Kay wrote:
> Is NFS a viable CFS? (I'm cross posting this due to a discussion on the
> linux-ha list recently.)
>
> [...]
>
> Except for the vagaries of the load placed on the cluster nodes, this
> sounds like a test lab environment. If NFS can't work in this
> environment where will it ever work?

NFS V3 and before have problems with "cache coherency". That is, the
different nodes in the cluster are not guaranteed to see the same contents.

I think this is supposed to be fixed in v4.

--
Alan Robertson <alanr@unix.sh>

"Openness is the foundation and preservative of friendship.... Let me claim
from you at all times your undisguised opinions." - William Wilberforce

* Re: Re: NFS as a Cluster File System.
From: Neil Brown @ 2003-01-13 19:36 UTC
To: Alan Robertson; +Cc: Lorn Kay, nfs, linux-ha

On Thursday January 9, alanr@unix.sh wrote:
>
> NFS V3 and before have problems with "cache coherency". That is, the
> different nodes in the cluster are not guaranteed to see the same contents.
>
> I think this is supposed to be fixed in v4.
>

NFSv4 does not try to "fix" this. It makes no attempt at "cache
coherency" beyond what NFSv2/3 provide, which is "close to open"
coherence, meaning that if only one process has a file open at a
time, then everything will appear coherent, and if multiple processes
have the file open at the same time, they need to use record locking.

I really don't think total cache coherency is a sensible goal for a
network filesystem, even a cluster filesystem. It imposes lots of
extra network traffic that most of the time will be of no value.
If an application needs some degree of coherence, it should be
explicit about its needs (using open/close or locking) so that the
protocol can provide it then, and only then.

NeilBrown
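
The "be explicit about its needs" advice in the message above can be sketched from the application side. This is a minimal illustration, not anything posted in the thread: it uses Python's standard `fcntl.lockf` (POSIX byte-range locks) around each access to a shared file. The helper names `append_record` and `read_all` are hypothetical, and the sketch assumes the mount has not disabled locking (for example with the `nolock` option); how a given NFS client treats its cache around a lock is implementation-specific.

```python
import fcntl
import os


def append_record(path, record):
    """Append bytes to a shared file while holding an exclusive lock."""
    with open(path, "ab") as f:
        # Blocks until no other process holds a conflicting lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # push the data out before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)


def read_all(path):
    """Read the whole file under a shared lock."""
    with open(path, "rb") as f:
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            return f.read()
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

Processes that only ever open, read or write, and close the file one at a time would rely on close-to-open behaviour instead and would not need the locks.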

* Re: Re: NFS as a Cluster File System.
From: David B. Ritch @ 2003-01-13 20:25 UTC
To: Neil Brown; +Cc: Alan Robertson, Lorn Kay, NFS mailing list, linux-ha

I agree that cache coherency is not a sensible goal for a cluster
filesystem. However, cache coherency of metadata is rather important.
For example, when one node creates a file of intermediate data, it is
important for the other nodes to be able to see that. Using actimeo=0 is
the conventional mechanism for allowing file creation and deletion to be
propagated quickly. Usually, one can tweak that a bit to reduce the
burden on the server. However, it might be nice if there were a
mechanism to propagate this sort of metadata change without dumping all
metadata over a second or two old.

dbr

--
David B. Ritch
High Performance Technologies, Inc.

* Re: Re: NFS as a Cluster File System.
From: Neil Brown @ 2003-01-13 20:40 UTC
To: David B. Ritch; +Cc: Alan Robertson, Lorn Kay, NFS mailing list, linux-ha

On January 13, dritch@hpti.com wrote:
> I agree that cache coherency is not a sensible goal for a cluster
> filesystem. However, cache coherency of metadata is rather important.
> For example, when one node creates a file of intermediate data, it is
> important for the other nodes to be able to see that. Using actimeo=0 is
> the conventional mechanism for allowing file creation and deletion to be
> propagated quickly. Usually, one can tweak that a bit to reduce the
> burden on the server. However, it might be nice if there were a
> mechanism to propagate this sort of metadata change without dumping all
> metadata over a second or two old.

If the 'other nodes' open the file and look in it, then they should
see current data (if they don't it's a bug). If they just 'stat' it
to see if it has changed then they may see an old timestamp.

I recommend opening the file. It is an explicit way for the
application to say "I really want to know the current state of this
file".

NeilBrown

* Re: Re: NFS as a Cluster File System.
From: David B. Ritch @ 2003-01-13 20:50 UTC
To: Neil Brown; +Cc: Alan Robertson, Lorn Kay, NFS mailing list, linux-ha

That makes sense. However, it is common practice in many shops to write
intermediate data files with some sort of serial number or timestamp in
the name, and for the next step in the process to look for data using
"ls" with a wildcard. When doing that, you don't know what the name of
the next file might be, so you can't simply open it.

While I agree that this is not the most ideal method for coordinating
processing, it is widely used and I have found a need to support it.

We've also had processes fail with a "file not found" error when trying
to read a file recently written by a process on another node. It has
always been my belief that this was a failure when a process tried to
open the file, and the local metadata cache had not yet been updated.
Just to clarify - are you saying that the open system call should have
contacted the server, even if the local cached information said that the
file (and perhaps its parent directory) did not exist?

Thanks,

dbr

--
David B. Ritch
High Performance Technologies, Inc.

* Re: Re: NFS as a Cluster File System.
From: Neil Brown @ 2003-01-13 22:11 UTC
To: David B. Ritch; +Cc: Alan Robertson, Lorn Kay, NFS mailing list, linux-ha

On January 13, dritch@hpti.com wrote:
> That makes sense. However, it is common practice in many shops to write
> intermediate data files with some sort of serial number or timestamp in
> the name, and for the next step in the process to look for data using
> "ls" with a wildcard. When doing that, you don't know what the name of
> the next file might be, so you can't simply open it.

I don't think that this should be a problem for NFS. To do the 'ls'
or to expand the wildcard you need to open the directory (and do a
readdir) and this should cause the client to check with the server.
Once you have the name the open should work.

>
> While I agree that this is not the most ideal method for coordinating
> processing, it is widely used and I have found a need to support it.

It seems reasonable to me.

>
> We've also had processes fail with a "file not found" error when trying
> to read a file recently written by a process on another node. It has
> always been my belief that this was a failure when a process tried to
> open the file, and the local metadata cache had not yet been updated.
> Just to clarify - are you saying that the open system call should have
> contacted the server, even if the local cached information said that the
> file (and perhaps its parent directory) did not exist?

Hmmm...
My understanding of NFS and 'close to open' semantics is that on
'open' the client should definitely contact the server, at least to do
a GETATTR on the file, and possibly to do a LOOKUP if there is doubt
as to the current information in the name cache.

However the Linux VFS is not very friendly to network filesystems in
this respect. The NFS client doesn't know if a given name lookup is
for an "open" or for a "stat" and so it cannot impose its subtly
different semantics.

So I can well imagine an "open" on a file that another client has just
created failing, if a recent name lookup has said that it didn't
exist.

However if you always do an opendir/readdir first, and only try to
open files that were found in the readdir, then the client should be
able to reliably detect the change to the directory and submit a new
LOOKUP request. I don't know if the Linux NFS client does this or
not.

I think it is fair to say that Linux isn't really ready for this sort
of tightly-coupled-network-filesystem thing yet. The VFS just isn't
ready. It doesn't even allow O_CREAT|O_EXCL to work over NFS even
though the NFSv3 protocol supports it.

The implementers of Lustre have enhanced the VFS for their use. This
may get into mainline in 2.7 (too late for 2.6), or something else
might be developed.

With careful coding it should be possible to achieve any particular
result, but you really need to know exactly what functionality the NFS
client does, and does not, provide.

[[ NOTE: these replies aren't making it to linux-ha@muc.de as I am not
a subscriber. Feel free to forward them if you think that is
appropriate]]

NeilBrown
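
The "readdir first, then open only what the listing returned" pattern from the message above might look like the following on the consumer side. This is a hedged sketch rather than code from the thread: `open_new_files`, the `seen` set, and the example pattern are hypothetical names chosen for illustration, and whether the fresh listing actually forces a revalidation depends on the client, as the message itself notes.

```python
import fnmatch
import os


def open_new_files(directory, pattern, seen):
    """List the directory afresh and open only names the listing returned.

    `seen` is a set of names already processed; it is updated in place.
    Returns a list of (name, contents) pairs for newly found files.
    """
    results = []
    # os.listdir performs the opendir/readdir step on every call.
    for name in sorted(os.listdir(directory)):
        if name in seen or not fnmatch.fnmatch(name, pattern):
            continue
        path = os.path.join(directory, name)
        try:
            with open(path, "rb") as f:
                data = f.read()
        except FileNotFoundError:
            # Listed but not yet openable: leave it unseen and retry
            # on the next poll instead of failing the whole job.
            continue
        seen.add(name)
        results.append((name, data))
    return results
```

A polling consumer would call `open_new_files(spool_dir, "step-*.dat", seen)` in a loop. The point of the design is that it never guesses the next file name and never treats a single "file not found" as fatal.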

* Re: Re: NFS as a Cluster File System.
From: Trond Myklebust @ 2003-01-14 15:46 UTC
To: Neil Brown; +Cc: Alan Robertson, Lorn Kay, nfs, linux-ha

>>>>> " " == Neil Brown <neilb@cse.unsw.edu.au> writes:

     > NFSv4 does not try to "fix" this. It makes no attempts at
     > "cache coherency" beyond what NFSv2/3 provide which is "close
     > to open" coherence, meaning that if only one process has a
     > file open at a time, then everything will appear coherent, and
     > if multiple processes have the file open at the same time, they
     > need to use record locking.

Note, though, that in addition to supporting file locking, NFSv4 adds
support for file 'delegation', which allows the NFS client to do locking
entirely as a local operation (i.e. there is no need to contact the
server). For most applications, this will make locking a much faster
operation...

Cheers,
  Trond

* Re: Re: NFS as a Cluster File System.
From: Kumaran Rajaram @ 2003-01-14 16:01 UTC
To: Trond Myklebust; +Cc: nfs

> Note, though, that in addition to supporting file locking, NFSv4 adds
> support for file 'delegation' which allow the NFS client to do locking
> entirely as a local operation (i.e. there is no need to contact the
> server). For most applications, this will make locking a much faster
> operation...

If file locking is made local, how do other NFS clients get to know the
locking info? I suspect this might lead to multiple NFS clients holding
the lock on the same file simultaneously, leading to file inconsistency.
Please correct me if I am wrong.

Thanks,
-Kums

--
Kumaran Rajaram, Mississippi State University
-- kums@cs.msstate.edu <http://www.cs.msstate.edu/~kums>

* Re: Re: NFS as a Cluster File System.
From: Trond Myklebust @ 2003-01-14 16:08 UTC
To: Kumaran Rajaram; +Cc: nfs

>>>>> " " == Kumaran Rajaram <kums@CS.MsState.EDU> writes:

     > If file-locking is made local, how do other NFS-clients get
     > to know the locking info. I suspect this might lead to multiple
     > NFS-clients holding the lock on the same file simultaneously,
     > leading to file-inconsistency. Please correct me if Iam wrong.

I suggest you read RFC3010, as this is what the file delegation takes
care of.

Delegation is a way for the server to tell the client that it is the
only current user of that particular file. If another client comes
along and opens the same file, then the server must notify the first
client that it is about to revoke the delegation, and give it a short
period of time in which to flush back all changes (including any locks
that are being held).

Cheers,
  Trond
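
The recall sequence described above can be made concrete with a toy, in-memory model. This is only a sketch of the idea as stated in the message (sole opener gets a delegation, a second open triggers a recall, local locks are flushed back); it is not the NFSv4 wire protocol, and `ToyServer`, `ToyClient`, and their methods are invented names for illustration.

```python
class ToyServer:
    """Grants a delegation to the sole opener and recalls it on conflict."""

    def __init__(self):
        self.delegations = {}  # path -> client object holding the delegation
        self.openers = {}      # path -> set of client names with the file open
        self.locks = {}        # path -> name of client holding a server-side lock

    def open(self, client, path):
        holder = self.delegations.get(path)
        if holder is not None and holder is not client:
            # Another client wants the file: recall before proceeding.
            holder.recall(path, self)
            del self.delegations[path]
        openers = self.openers.setdefault(path, set())
        openers.add(client.name)
        if len(openers) == 1:
            self.delegations[path] = client
            return True   # delegation granted
        return False


class ToyClient:
    def __init__(self, name):
        self.name = name
        self.delegated = set()    # paths we hold a delegation for
        self.local_locks = set()  # locks taken without telling the server

    def open(self, server, path):
        if server.open(self, path):
            self.delegated.add(path)

    def lock(self, server, path):
        if path in self.delegated:
            self.local_locks.add(path)      # purely local, no round trip
            return "local"
        if server.locks.get(path, self.name) != self.name:
            return "denied"                 # someone else holds it
        server.locks[path] = self.name
        return "server"

    def recall(self, path, server):
        # Flush state back: a locally held lock becomes a server-side lock.
        if path in self.local_locks:
            server.locks[path] = self.name
            self.local_locks.discard(path)
        self.delegated.discard(path)
```

In this model a second client can never take the lock silently: by the time its open completes, the first client's lock is already recorded at the server, which addresses the concern raised in the question.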

* Re: NFS as a Cluster File System.
From: Lars Marowsky-Bree @ 2003-01-09 21:50 UTC
To: Lorn Kay, nfs, linux-ha

On 2003-01-09T19:39:50,
   Lorn Kay <lorn_kay@hotmail.com> said:

> Is NFS a viable CFS? (I'm cross posting this due to a discussion on the
> linux-ha list recently.)

NFS might be a viable system for making content available in a cluster,
given a highly available NFS server (not that easy to do right, actually)
and provided that the bandwidth and latency are good enough for you; file
locking might also be a problem.

However, it is NOT a "CFS", which people commonly use to refer to a
filesystem which is distributed and usually shares the same storage system
connected to all nodes.

I believe there might be a confusion of words here ;-)

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

--
Principal Squirrel
SuSE Labs - Research & Development, SuSE Linux AG

"If anything can go wrong, it will." -- Capt. Edward A. Murphy
"Chance favors the prepared (mind)." -- Louis Pasteur

* Re: NFS as a Cluster File System.
From: Brian Tinsley @ 2003-01-09 23:09 UTC
To: Lars Marowsky-Bree; +Cc: Lorn Kay, nfs, linux-ha

Lars Marowsky-Bree wrote:

> NFS might be a viable system for making content available in a cluster,
> given a highly available NFS server (not that easy to do right, actually)
> and provided that the bandwidth and latency is good enough for you; file
> locking might also be a problem.

There are numerous threads in the NFS mailing list archives (and
probably in the NFS HOWTO - it's been quite a while since I've read it)
on how to set up an HA NFS cluster. Yes, there are quite a few pitfalls
to watch for and some applications may not behave well in this
configuration, but it's definitely achievable.

> However, it is NOT a "CFS", which people commonly use to refer to a
> filesystem which is distributed and usually shares the same storage
> system connected to all nodes.

Yes, good clarification.
* Re: NFS as a Cluster File System. 2003-01-09 19:39 NFS as a Cluster File System Lorn Kay ` (2 preceding siblings ...) 2003-01-09 21:50 ` Lars Marowsky-Bree @ 2003-01-13 4:20 ` David B. Ritch 3 siblings, 0 replies; 18+ messages in thread From: David B. Ritch @ 2003-01-13 4:20 UTC (permalink / raw) To: Lorn Kay; +Cc: NFS mailing list, linux-ha As has been discussed, there are various meanings for the expression CFS. So, I'll assume that you are looking for a filesystem to serve files to a cluster. You're right - NFS has a bad reputation. However, I believe that there are 3 additional reasons that I have not seen in this thread. First, until very recently, NFS has not been stable under Linux. Before the 2.4.18 (or possibly 2.4.14) kernel, it had frequent hangs, at least on SMP systems. Even under 2.4.18 and 2.4.19, we have seen peculiar results occasionally, such as "ls -l" displaying the wrong owners for most of the files in a directory. 2.4.20 looks pretty good. This is not an NFS problem as such, but a Linux problem. I've used NFS extensively with many commercial versions of Unix without such problems. Thanks to Trond, Neil, and others for solving this for Linux! Second, NFS does not provide much security. It doesn't provide for strong authentication, and it doesn't provide for encryption in transit. It's vulnerable to lots of DOS attacks. It's really only suited to a local, controlled network. Finally, NFS is very sensitive to latency. I'm not sure whether this is an issue inherent to the protocol, or just to all implementations that I have used. However, I have seen a few millisecond latency reduce NFS throughput from 10-12MB/sec over 100BaseT to 3MB/sec or less. In addition, for a cluster, nfs has an additional weakness over some newer filesystems. It typically depends on a single server, or sometimes a cluster of servers. 
Either way, when a parallel job starts up on a fairly large cluster,
typically many nodes suddenly attempt to access the same filesystem on a
single server. This may be just to load an executable, or it may be to
access a data file. In both cases, the server is suddenly subject to a very
high load, and its performance plummets, as a result of many nodes
simultaneously trying to access the same thing.

There are various workarounds to avoid this problem. For example, many
clusters (such as Cplant, at Sandia National Lab) use special software to
replicate an executable across the active set of nodes before running it.

There are several shared filesystems, which allow multiple servers to
access the same shared disks, and simultaneously serve the same files and
filesystems to multiple servers. Typically, these have very good
performance, but less stability than is required in a production
environment.

A variant of a cluster filesystem is ENFS, which is also used at Sandia.
The old user-space NFS daemon from Linux has been modified to be used as a
forwarder. For every 32 (or so) compute nodes, there is a leader node. The
compute nodes NFS-mount filesystems from the leader nodes. The leader
nodes NFS-mount filesystems from servers, and then use the NFS daemon to
re-export them to the compute nodes. This system actually works quite
well. A 1500-node system is booted diskless, from a single admin node, in
just a few minutes (I don't recall the exact speed, but I believe it's in
the 5-20 minute range).

ENFS has some weaknesses. First, it does not support NFS-V3 or -V4, so it
is limited to files of no more than 2GB. Second, it has never been
productized and released.

I would *love* to see the kernel NFS implementation able to provide the
same sort of forwarding. Last summer, I tested the kernel as a forwarder.
With the filesystem ID patch, I thought that it would be possible.
Unfortunately, although it was able to forward filesystems enough for a
few listings to succeed, it soon hung. Perhaps a newer version would
actually work, but I don't believe that this has been a priority for any
of the developers.

There are many other "solutions", GFS, CVFS, PVFS, etc., each with its own
issues. There are some characteristics that I believe any real solution
will have, most of which are shared by the existing "solutions".

1) There must be a single image available to all the compute nodes. This
means thousands of nodes, not just 10s of nodes. This may be achievable,
though, by a combination of methods. One example may be a shared
filesystem, mounted by 10s of nodes, which is then NFS-exported to the
rest of the nodes.

2) There must be a fan-out effect. That is, in order to be scalable, the
same file/filesystem must be able to be cached on multiple servers.
Ideally, a hierarchy of servers should be possible. That is, a leader may
cache for 32 sub-leaders, each of which caches for 32 compute nodes. 32 is
an arbitrary number - replace it with your favorite.

3) It must be stable.

4) It must provide high performance. Ideally, an individual node in a high
performance cluster should be able to read or write 100MB/sec at least.

5) It should be able to function over a variety of networks, including,
for example, Ethernet, Myrinet, and Quadrics.

6) It should not have a single point of failure. Many shared filesystems,
for example, depend on a single metadata server.

7) It must support the full normal filesystem semantics. PVFS, for
example, meets most of these requirements, but doesn't support symbolic
links.

There are probably other requirements, too, but these are the ones that
immediately come to mind.

Unfortunately, I don't know of a solution that meets all of these. Is
there one?

Thanks,

dbr

On Thu, 2003-01-09 at 14:39, Lorn Kay wrote:
> Is NFS a viable CFS? (I'm cross posting this due to a discussion on the
> linux-ha list recently.)
--
David B. Ritch
High Performance Technologies, Inc.
^ permalink raw reply [flat|nested] 18+ messages in thread
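Two of the figures in David B. Ritch's message can be sanity-checked with
rough arithmetic: the throughput drop under a few milliseconds of latency,
and the size of a 32-way leader hierarchy. The sketch below is only a model
under assumptions that do not come from the thread: each client keeps a
single 8 KB read request in flight at a time (no read-ahead), the 100BaseT
link delivers about 12.5 MB/s, and all function names are made up for
illustration.

```python
import math

WIRE_RATE = 12.5e6  # bytes/s: assumed usable rate of a 100BaseT link
RSIZE = 8192        # bytes per read request: assumed, not stated in the thread


def sync_read_throughput(added_latency_s, rsize=RSIZE, wire_rate=WIRE_RATE):
    """Bytes/s when only one read request is outstanding at a time.

    Each request costs the added latency plus the time to move the data.
    """
    time_per_request = added_latency_s + rsize / wire_rate
    return rsize / time_per_request


def leaders_needed(nodes, fan_out=32):
    """Leader nodes required if each leader serves `fan_out` compute nodes."""
    return math.ceil(nodes / fan_out)


def nodes_reached(fan_out, levels):
    """Compute nodes reachable below one server through `levels` of fan-out."""
    return fan_out ** levels


if __name__ == "__main__":
    for ms in (0, 1, 2, 5):
        mb_per_s = sync_read_throughput(ms / 1000) / 1e6
        print(f"{ms} ms added latency: {mb_per_s:.1f} MB/s")
    print(leaders_needed(1500), "leaders for a 1500-node system")
    print(nodes_reached(32, 2), "compute nodes under a leader/sub-leader tree")
```

Under these assumptions, 2 ms of added latency gives roughly 3 MB/s, which
is in line with the figure reported above. Larger requests, or several
requests in flight at once, would soften the effect, so this says nothing
about whether the sensitivity is inherent to the protocol.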
* Re: NFS as a Cluster File System.
@ 2003-01-09 23:13 Lorn Kay
2003-01-10 3:34 ` Alan Cox
0 siblings, 1 reply; 18+ messages in thread
From: Lorn Kay @ 2003-01-09 23:13 UTC (permalink / raw)
To: lmb, nfs, linux-ha
>However, it is NOT a "CFS", which people commonly use to refer to a
>filesystem
>which is distributed and usually shares the same storage system connected
>to
>all nodes.
>
>I believe there might be a confusion of words here ;-)
>
>
>Sincerely,
> Lars Marowsky-Brée <lmb@suse.de>
Sorry, still confused about what a "CFS" really is. In "In Search Of
Clusters" Gregory Pfister takes the position that a distributed file system
is what he calls a valid "single system image" file system, what I would
take to mean a cluster file system (though he doesn't use those words).
I guess you are saying that a clustered file system does not necessarily
support a cluster of application servers, but is itself stored on a
cluster? (That is, a single server could be the only server using a
cluster file system.)
--K
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: NFS as a Cluster File System.
2003-01-09 23:13 Lorn Kay
@ 2003-01-10 3:34 ` Alan Cox
0 siblings, 0 replies; 18+ messages in thread
From: Alan Cox @ 2003-01-10 3:34 UTC (permalink / raw)
To: Lorn Kay; +Cc: lmb, nfs, linux-ha

On Thu, 2003-01-09 at 23:13, Lorn Kay wrote:
> Sorry, still confused about what a "CFS" really is. In "In Search Of
> Clusters" Gregory Pfister takes the position that a distributed file system
> is what he calls a valid "single system image" file system, what I would
> take to mean a cluster file system (though he doesn't use those words).
>
> I guess you are saying a clustered file system isn't necessarily supporting
> a cluster of application servers but is itself stored on a cluster. (A
> single server can be the only server using a cluster file system.) ?

It seems to mean about three different things:

1. "A clusterwide view of the file store implemented by any unspecified
means" - i.e. an application view point.

2. "A filesystem which supports operation of a cluster"

3. "A filesystem with multiple systems accessing a single shared file
system on shared storage"

Meaning #3 can be really confusing because a 'cluster file system' in that
sense is actually exactly what you don't want for many cluster setups,
especially those with little active shared storage.

[For example if you are doing database failover you can do I/O fencing and
mount/umount of a more normal fs]

^ permalink raw reply [flat|nested] 18+ messages in thread
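Alan Cox's bracketed example (I/O fencing plus mount/umount of an ordinary
filesystem) depends on ordering: the failed node has to be cut off from
the shared disk before the standby mounts it, because a non-cluster
filesystem cannot tolerate two writers. A minimal sketch of that ordering
follows, written as a pure-Python simulation. The class, method, and node
names are hypothetical, and no real fencing or mount tool is invoked.

```python
class SharedDisk:
    """Toy model of shared storage holding an ordinary (non-cluster) fs."""

    def __init__(self):
        self.fenced = set()     # nodes whose I/O path has been cut
        self.mounted_by = None  # at most one node may have the fs mounted

    def fence(self, node):
        # Cutting off a node's I/O also ends any mount it held.
        self.fenced.add(node)
        if self.mounted_by == node:
            self.mounted_by = None

    def mount(self, node):
        if node in self.fenced:
            raise RuntimeError(f"{node} is fenced off from the disk")
        if self.mounted_by not in (None, node):
            raise RuntimeError(f"already mounted by {self.mounted_by}")
        self.mounted_by = node


def fail_over(disk, failed, standby):
    """Fence the failed node first, then mount on the standby."""
    disk.fence(failed)
    disk.mount(standby)
    return disk.mounted_by


if __name__ == "__main__":
    disk = SharedDisk()
    disk.mount("db1")
    print(fail_over(disk, "db1", "db2"))
```

In this model, trying to mount on the standby without fencing the old
owner raises an error, which is the situation the fencing step is meant to
rule out.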
end of thread, other threads:[~2003-01-14 16:08 UTC | newest]
Thread overview: 18+ messages
2003-01-09 19:39 NFS as a Cluster File System Lorn Kay
2003-01-09 21:11 ` Brian Tinsley
2003-01-09 22:04 ` Brian Jackson
2003-01-09 23:02 ` Brian Tinsley
2003-01-09 21:29 ` Alan Robertson
2003-01-13 19:36 ` Neil Brown
2003-01-13 20:25 ` David B. Ritch
2003-01-13 20:40 ` Neil Brown
2003-01-13 20:50 ` David B. Ritch
2003-01-13 22:11 ` Neil Brown
2003-01-14 15:46 ` Trond Myklebust
2003-01-14 16:01 ` Kumaran Rajaram
2003-01-14 16:08 ` Trond Myklebust
2003-01-09 21:50 ` Lars Marowsky-Bree
2003-01-09 23:09 ` Brian Tinsley
2003-01-13 4:20 ` David B. Ritch
-- strict thread matches above, loose matches on Subject: below --
2003-01-09 23:13 Lorn Kay
2003-01-10 3:34 ` Alan Cox