* NFS as a Cluster File System.
@ 2003-01-09 19:39 Lorn Kay
  2003-01-09 21:11 ` Brian Tinsley
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Lorn Kay @ 2003-01-09 19:39 UTC
  To: nfs, linux-ha


Is NFS a viable CFS? (I'm cross posting this due to a discussion on the 
linux-ha list recently.)

NFS has a bad reputation probably due to (at least) the following:

	It has been used in networking environments where different server hardware 
configurations (NICs, drivers, etc.) running different operating systems 
have connected to each other (in many-to-many configurations).

	It “grew up” on networks that were perhaps unstable, or immature 
(“Someone’s kicked the token ring coax cable lying on the floor again”) 
long before switches were commonplace, and the network was loaded down with 
all kinds of network traffic.

	It wasn’t understood very well. Since the default mount options worked, 
system administrators often didn’t fully understand the ramifications of 
their NFS client mount option choices.

	It relied on UDP, which is susceptible to heavy retransmission on 
noisy or lossy networks.

	NFS was used over many-hop WAN connections.

	NFS servers were often used for many other tasks, not just NFS.
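To make the UDP point concrete: an NFS read or write larger than the Ethernet MTU travels as several IP fragments, and losing any one fragment loses the whole datagram, which the client then resends in full after a timeout. A rough sketch, assuming independent per-packet loss, a 1500-byte MTU, a 20-byte IP header and an 8 KB transfer size (all assumptions, not figures from this thread):

```python
import math

def fragments_per_datagram(payload_bytes: int, mtu: int = 1500, ip_header: int = 20) -> int:
    """IP fragments needed for one UDP datagram (assumes no IP options)."""
    udp_header = 8
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    return math.ceil((payload_bytes + udp_header) / per_fragment)

def datagram_loss_probability(packet_loss: float, fragments: int) -> float:
    """Chance that at least one fragment is lost, assuming independent losses."""
    return 1.0 - (1.0 - packet_loss) ** fragments
```

With 1% packet loss an 8 KB request needs 6 fragments, so roughly 6% of datagrams must be resent whole, each usually after an RPC timeout; over TCP typically only the lost segment is retransmitted.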


A cluster configuration, however, offers several advantages over the typical 
NFS configuration:

	All NFS clients (the cluster nodes) run the same operating system (Linux).

	All clients run the same version of NFS and the kernel.

	All clients use the same tuned network configuration.

	A physical network can be dedicated to NFS (using a high-quality switch, 
with short data-center-only cable runs).

	All clients connect to one NFS server.

	The NFS server is a high-quality dedicated machine (NetApp, EMC, etc.)

	Only one mount point need be used with one set of mount options.

	Linux clients can use TCP instead of UDP.

Except for the vagaries of the load placed on the cluster nodes, this sounds 
like a test lab environment. If NFS can’t work in this environment, where 
will it ever work?

--K






_________________________________________________________________
STOP MORE SPAM with the new MSN 8 and get 2 months FREE* 
http://join.msn.com/?page=features/junkmail



-------------------------------------------------------
This SF.NET email is sponsored by:
SourceForge Enterprise Edition + IBM + LinuxWorld = Something 2 See!
http://www.vasoftware.com
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: NFS as a Cluster File System.
  2003-01-09 19:39 NFS as a Cluster File System Lorn Kay
@ 2003-01-09 21:11 ` Brian Tinsley
  2003-01-09 22:04   ` Brian Jackson
  2003-01-09 21:29 ` Alan Robertson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Brian Tinsley @ 2003-01-09 21:11 UTC
  To: Lorn Kay; +Cc: nfs, linux-ha

Lorn Kay wrote:

>
> Is NFS a viable CFS? (I'm cross posting this due to a discussion on 
> the linux-ha list recently.) 

Since there is not a really good cluster filesystem for Linux that is 
not either "half baked" (IMHO - I'm probably going to get smacked over 
that statement!) or cost an arm and a leg, this is exactly the route we 
have taken.

>     The NFS server is a high-quality dedicated machine (NetApp, EMC, 
> etc.) 

We've had great success with just using SMP Linux servers. We do have 
one EMC IP4700 in production, and it's a nice system, but I prefer the 
Linux based alternative.

>     Linux clients can use TCP instead of UDP. 

Although I haven't had problems with this in our lab, I believe the NFS 
authors still consider this experimental.


-- 

-[========================]-
-[      Brian Tinsley     ]-
-[ Chief Systems Engineer ]-
-[        Emageon         ]-
-[========================]-







* Re: NFS as a Cluster File System.
  2003-01-09 19:39 NFS as a Cluster File System Lorn Kay
  2003-01-09 21:11 ` Brian Tinsley
@ 2003-01-09 21:29 ` Alan Robertson
  2003-01-13 19:36   ` Neil Brown
  2003-01-09 21:50 ` Lars Marowsky-Bree
  2003-01-13  4:20 ` David B. Ritch
  3 siblings, 1 reply; 18+ messages in thread
From: Alan Robertson @ 2003-01-09 21:29 UTC
  To: Lorn Kay; +Cc: nfs, linux-ha

Lorn Kay wrote:
> 
> Is NFS a viable CFS? (I'm cross posting this due to a discussion on the 
> linux-ha list recently.)
> 
> [...]

NFS V3 and before have problems with "cache coherency".  That is, the 
different nodes in the cluster are not guaranteed to see the same contents.

I think this is supposed to be fixed in v4.


-- 
     Alan Robertson <alanr@unix.sh>

"Openness is the foundation and preservative of friendship....  Let me claim 
from you at all times your undisguised opinions." - William Wilberforce





* Re: NFS as a Cluster File System.
  2003-01-09 19:39 NFS as a Cluster File System Lorn Kay
  2003-01-09 21:11 ` Brian Tinsley
  2003-01-09 21:29 ` Alan Robertson
@ 2003-01-09 21:50 ` Lars Marowsky-Bree
  2003-01-09 23:09   ` Brian Tinsley
  2003-01-13  4:20 ` David B. Ritch
  3 siblings, 1 reply; 18+ messages in thread
From: Lars Marowsky-Bree @ 2003-01-09 21:50 UTC
  To: Lorn Kay, nfs, linux-ha

On 2003-01-09T19:39:50,
   Lorn Kay <lorn_kay@hotmail.com> said:

> Is NFS a viable CFS? (I'm cross posting this due to a discussion on the 
> linux-ha list recently.)

NFS might be a viable system for making content available in a cluster, given
a highly available NFS server (not that easy to do right, actually) and
provided that the bandwidth and latency are good enough for you; file locking
might also be a problem.

However, it is NOT a "CFS", which people commonly use to refer to a filesystem
which is distributed and usually shares the same storage system connected to
all nodes.

I believe there might be a confusion of words here ;-)


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Principal Squirrel 
SuSE Labs - Research & Development, SuSE Linux AG
  
"If anything can go wrong, it will." "Chance favors the prepared (mind)."
  -- Capt. Edward A. Murphy            -- Louis Pasteur



* Re: NFS as a Cluster File System.
  2003-01-09 21:11 ` Brian Tinsley
@ 2003-01-09 22:04   ` Brian Jackson
  2003-01-09 23:02     ` Brian Tinsley
  0 siblings, 1 reply; 18+ messages in thread
From: Brian Jackson @ 2003-01-09 22:04 UTC
  To: nfs, linux-ha

On Thursday 09 January 2003 03:11 pm, Brian Tinsley wrote:
> Lorn Kay wrote:
> > Is NFS a viable CFS? (I'm cross posting this due to a discussion on
> > the linux-ha list recently.)
>
> Since there is not a really good cluster filesystem for Linux that is
> not either "half baked"

Hey, we're working on it ;)

--Brian Jackson
OpenGFS Project

> (IMHO - I'm probably going to get smacked over
> that statement!) or cost an arm and a leg, this is exactly the route we
> have taken.
>
> >     The NFS server is a high-quality dedicated machine (NetApp, EMC,
> > etc.)
>
> We've had great success with just using SMP Linux servers. We do have
> one EMC IP4700 in production, and it's a nice system, but I prefer the
> Linux based alternative.
>
> >     Linux clients can use TCP instead of UDP.
>
> Although I haven't had problems with this in our lab, I believe the NFS
> authors still consider this experimental.




* Re: Re: NFS as a Cluster File System.
  2003-01-09 22:04   ` Brian Jackson
@ 2003-01-09 23:02     ` Brian Tinsley
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Tinsley @ 2003-01-09 23:02 UTC
  To: Brian Jackson; +Cc: nfs, linux-ha

[-- Attachment #1: Type: text/html, Size: 1389 bytes --]


* Re: NFS as a Cluster File System.
  2003-01-09 21:50 ` Lars Marowsky-Bree
@ 2003-01-09 23:09   ` Brian Tinsley
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Tinsley @ 2003-01-09 23:09 UTC
  To: Lars Marowsky-Bree; +Cc: Lorn Kay, nfs, linux-ha

Lars Marowsky-Bree wrote:

>On 2003-01-09T19:39:50,
>   Lorn Kay <lorn_kay@hotmail.com> said:
>
>>Is NFS a viable CFS? (I'm cross posting this due to a discussion on the linux-ha list recently.)
>>    
>>
>NFS might be a viable system for making content available in a cluster, given a highly available NFS server (not that easy to do right, actually) and provided that the bandwidth and latency are good enough for you; file locking might also be a problem.
>
There are numerous threads in the NFS mailing list archives (and 
probably in the NFS HOWTO - it's been quite a while since I've read it) 
on how to set up an HA NFS cluster. Yes, there are quite a few pitfalls 
to watch for and some applications may not behave well in this 
configuration, but it's definitely achievable.

>However, it is NOT a "CFS", which people commonly use to refer to a filesystem which is distributed and usually shares the same storage system connected to all nodes.
>
Yes, good clarification.





* Re: NFS as a Cluster File System.
@ 2003-01-09 23:13 Lorn Kay
  2003-01-10  3:34 ` Alan Cox
  0 siblings, 1 reply; 18+ messages in thread
From: Lorn Kay @ 2003-01-09 23:13 UTC
  To: lmb, nfs, linux-ha


>However, it is NOT a "CFS", which people commonly use to refer to a 
>filesystem
>which is distributed and usually shares the same storage system connected 
>to
>all nodes.
>
>I believe there might be a confusion of words here ;-)
>
>
>Sincerely,
>     Lars Marowsky-Brée <lmb@suse.de>

Sorry, still confused about what a "CFS" really is. In "In Search Of 
Clusters" Gregory Pfister takes the position that a distributed file system 
is what he calls a valid "single system image" file system, what I would 
take to mean a cluster file system (though he doesn't use those words).

I guess you are saying a clustered file system isn't necessarily supporting 
a cluster of application servers but is itself stored on a cluster. (A 
single server can be the only server using a cluster file system.) ?

--K




* Re: NFS as a Cluster File System.
  2003-01-09 23:13 Lorn Kay
@ 2003-01-10  3:34 ` Alan Cox
  0 siblings, 0 replies; 18+ messages in thread
From: Alan Cox @ 2003-01-10  3:34 UTC
  To: Lorn Kay; +Cc: lmb, nfs, linux-ha

On Thu, 2003-01-09 at 23:13, Lorn Kay wrote:
> Sorry, still confused about what a "CFS" really is. In "In Search Of 
> Clusters" Gregory Pfister takes the position that a distributed file system 
> is what he calls a valid "single system image" file system, what I would 
> take to mean a cluster file system (though he doesn't use those words).
> 
> I guess you are saying a clustered file system isn't necessarily supporting 
> a cluster of application servers but is itself stored on a cluster. (A 
> single server can be the only server using a cluster file system.) ?

It seems to mean about three different things:

1.   "A clusterwide view of the file store implemented by any unspecified
means" - i.e., an application viewpoint.

2.   "A filesystem which supports operation of a cluster"

3.   "A filesystem with multiple systems accessing a single shared 
file system on shared storage"

Meaning #3 can be really confusing because a 'cluster file system' in that
sense is actually exactly what you don't want for many cluster setups,
especially those with little active shared storage.

[For example if you are doing database failover you can do I/O fencing
 and mount/umount of a more normal fs]





* Re: NFS as a Cluster File System.
  2003-01-09 19:39 NFS as a Cluster File System Lorn Kay
                   ` (2 preceding siblings ...)
  2003-01-09 21:50 ` Lars Marowsky-Bree
@ 2003-01-13  4:20 ` David B. Ritch
  3 siblings, 0 replies; 18+ messages in thread
From: David B. Ritch @ 2003-01-13  4:20 UTC
  To: Lorn Kay; +Cc: NFS mailing list, linux-ha

As has been discussed, there are various meanings for the expression
CFS.  So, I'll assume that you are looking for a filesystem to serve
files to a cluster.

You're right - NFS has a bad reputation.  However, I believe that there
are 3 additional reasons that I have not seen in this thread.  First,
until very recently, NFS had not been stable under Linux.  Before the
2.4.18 (or possibly 2.4.14) kernel, it had frequent hangs, at least on
SMP systems.  Even under 2.4.18 and 2.4.19, we have seen peculiar
results occasionally, such as "ls -l" displaying the wrong owners for
most of the files in a directory.  2.4.20 looks pretty good.  This is
not an NFS problem as such, but a Linux problem.  I've used NFS
extensively with many commercial versions of Unix without such
problems.  Thanks to Trond, Neil, and others for solving this for Linux!

Second, NFS does not provide much security.  It doesn't provide for
strong authentication, and it doesn't provide for encryption in
transit.  It's vulnerable to lots of DoS attacks.  It's really only
suited to a local, controlled network.

Finally, NFS is very sensitive to latency.  I'm not sure whether this is
an issue inherent to the protocol, or just to all implementations that I
have used.  However, I have seen a latency of a few milliseconds reduce NFS
throughput from 10-12MB/sec over 100BaseT to 3MB/sec or less.
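One plausible explanation, offered as a back-of-the-envelope model rather than a measurement: if a client keeps only one request outstanding, each request costs its round-trip latency plus its wire time, so throughput is capped at request size divided by that sum. A sketch assuming 8 KB requests on a 100 Mbit/s link (real clients use read-ahead and several outstanding RPCs, which hides some of this):

```python
def nfs_throughput_mb_s(request_bytes: int, link_mbit_s: float, rtt_ms: float,
                        outstanding: int = 1) -> float:
    """Toy model: throughput in MB/s for fixed-size requests, `outstanding` in flight."""
    link_bytes_s = link_mbit_s * 1_000_000 / 8
    wire_s = request_bytes / link_bytes_s       # time to move one request's data
    per_request_s = rtt_ms / 1000 + wire_s      # latency + transfer for one request
    rate = outstanding * request_bytes / per_request_s
    return min(rate, link_bytes_s) / 1_000_000  # never faster than the link itself
```

Under these assumptions a 0.2 ms round trip gives about 9.6 MB/s, while a 3 ms round trip drops a synchronous client to about 2.2 MB/s, the same order as the drop described above.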

In addition, for a cluster, NFS has an additional weakness compared with some
newer filesystems.  It typically depends on a single server, or
sometimes a cluster of servers.  Either way, when a parallel job starts
up on a fairly large cluster, typically many nodes suddenly attempt to
access the same filesystem on a single server.  This may be just to load
an executable, or it may be to access a data file.  Either way, the
server is suddenly subject to a very high load, and its performance
plummets, as a result of many nodes simultaneously trying to access the
same thing.
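The arithmetic behind this is simple: if N nodes each need the same S bytes from one server with outbound bandwidth B, the server must send N×S bytes, so the last node cannot finish before N×S/B seconds however the requests are interleaved. A sketch with illustrative numbers (the node count, file size and bandwidth below are assumptions, not figures from this thread):

```python
def single_server_fetch_seconds(nodes: int, file_mb: float, server_mb_s: float) -> float:
    """Lower bound on the time for `nodes` clients to each read the same file from one server.

    The server must push nodes * file_mb megabytes through server_mb_s MB/s;
    request overhead and contention can only make the real figure worse.
    """
    if server_mb_s <= 0:
        raise ValueError("server bandwidth must be positive")
    return nodes * file_mb / server_mb_s
```

For example, 1000 nodes loading a 10 MB executable through a single 100 Mbit/s (about 12.5 MB/s) server interface need at least 800 seconds.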

There are various workarounds to avoid this problem.  For example, many
clusters (such as Cplant, at Sandia National Lab) use special software
to replicate an executable across the active set of nodes before running
it.  There are several shared filesystems, which allow multiple servers
to access the same shared disks, and simultaneously serve the same files
and filesystems to multiple servers.  Typically, these have very good
performance, but less stability than is required in a production
environment.

A variant of a cluster filesystem is ENFS, which is also used at
Sandia.  The old user-space NFS daemon from Linux has been modified to
be used as a forwarder.  For every 32 (or so) compute nodes, there is a
leader node.  The compute nodes NFS-mount filesystems from the leader
nodes.  The leader nodes NFS-mount filesystems from servers, and then
use the NFS daemon to re-export them to the compute nodes.  This system
actually works quite well.  A 1500-node system is booted diskless, from
a single admin node, in just a few minutes (I don't recall the exact
speed, but I believe it's in the 5-20 minute range).
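The leader/compute-node arrangement is a fan-out tree. A small sketch of how many servers each level needs and how long one file takes to reach every node, assuming every server has the same fan-out, that each level caches the file so it is fetched once per level, and store-and-forward transfer; the bandwidth and file size are illustrative assumptions:

```python
import math
from typing import List

def servers_per_level(compute_nodes: int, fanout: int) -> List[int]:
    """Servers needed at each level above the compute nodes, nearest level first."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = []
    count = compute_nodes
    while count > 1:
        count = math.ceil(count / fanout)  # each server feeds at most `fanout` children
        levels.append(count)
    return levels

def tree_fetch_seconds(compute_nodes: int, fanout: int, file_mb: float,
                       link_mb_s: float) -> float:
    """Store-and-forward estimate: each level receives the whole file before re-serving it.

    Every server is charged for a full `fanout` children, so this overestimates
    slightly when a level is not full.
    """
    depth = len(servers_per_level(compute_nodes, fanout))
    per_level = fanout * file_mb / link_mb_s
    return depth * per_level
```

For 1500 nodes with a fan-out of 32 this gives levels of 47, 2 and 1 servers; a 10 MB file over 12.5 MB/s links then takes roughly 77 seconds, versus at least 1200 seconds (1500 × 10 / 12.5) from one server.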

ENFS has some weaknesses.  First, it does not support NFS-V3 or -V4, so
it is limited to files of no more than 2GB.  Second, it has never been
productized and released.  I would *love* to see the kernel NFS
implementation able to provide the same sort of forwarding.  Last
summer, I tested the kernel as a forwarder.  With the filesystem ID
patch, I thought that it would be possible.  Unfortunately, although it
was able to forward filesystems enough for a few listings to succeed, it
soon hung.  Perhaps a newer version would actually work, but I don't
believe that this has been a priority for any of the developers.

There are many other "solutions", GFS, CVFS, PVFS, etc., each with its
own issues.  There are some characteristics that I believe any real
solution will have, most of which are shared by the existing
"solutions".

1) There must be a single image available to all the compute nodes. 
This means thousands of nodes, not just 10s of nodes.  This may be
achievable, though, by a combination of methods.  One example may be a
shared filesystem, mounted by 10s of nodes, which is then NFS-exported
to the rest of the nodes.

2)  There must be a fan-out effect.  That is, in order to be scalable,
the same file/filesystem must be able to be cached on multiple servers. 
Ideally, a hierarchy of servers should be possible.  That is, a leader
may cache for 32 sub-leaders, each of which caches for 32 compute nodes. 
32 is an arbitrary number - replace it with your favorite.

3) It must be stable.

4) It must provide high performance.  Ideally, an individual node in a
high performance cluster should be able to read or write 100MB/sec at
least.

5) It should be able to function over a variety of networks, including,
for example, Ethernet, Myrinet, and Quadrics.

6) It should not have a single point of failure.  Many shared
filesystems, for example, depend on a single metadata server.

7) It must support the full normal filesystem semantics.  PVFS, for
example, meets most of these requirements, but doesn't support symbolic
links.

There are probably other requirements, too, but these are the
requirements that immediately come to mind.  Unfortunately, I don't know
of a solution that meets all of these.  Is there one?

Thanks,

dbr

On Thu, 2003-01-09 at 14:39, Lorn Kay wrote:
> Is NFS a viable CFS? (I'm cross posting this due to a discussion on the 
> linux-ha list recently.)

-- 
David B. Ritch
High Performance Technologies, Inc.



* Re: Re: NFS as a Cluster File System.
  2003-01-09 21:29 ` Alan Robertson
@ 2003-01-13 19:36   ` Neil Brown
  2003-01-13 20:25     ` David B. Ritch
  2003-01-14 15:46     ` Trond Myklebust
  0 siblings, 2 replies; 18+ messages in thread
From: Neil Brown @ 2003-01-13 19:36 UTC
  To: Alan Robertson; +Cc: Lorn Kay, nfs, linux-ha

On Thursday January 9, alanr@unix.sh wrote:
> 
> NFS V3 and before have problems with "cache coherency".  That is, the 
> different nodes in the cluster are not guaranteed to see the same contents.
> 
> I think this is supposed to be fixed in v4.
> 

NFSv4 does not try to "fix" this.  It makes no attempts at "cache
coherency" beyond what NFSv2/3 provide which is "close to open"
coherence, meaning that if only one process has a file open at a
time, then everything will appear coherent, and if multiple processes
have the file open at the same time, they need to use record locking.

I really don't think total cache coherency is a sensible goal for a
network filesystem, even a cluster filesystem.  It imposes lots of
extra network traffic that most of the time will be of no value.
If an application needs some degree of coherence, it should be
explicit about its needs (using open/close or locking) so that the
protocol can provide it then, and only then.
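A sketch of what "being explicit" can look like from the application side when several processes share a file, using POSIX record locks through Python's `fcntl.lockf`. That these locks reach the server through the NFS lock manager, and that a Linux client revalidates cached data when a lock is taken and flushes writes before it is released, are assumptions about your client and server setup rather than statements from this thread:

```python
import fcntl
import os

def append_record(path: str, record: bytes) -> None:
    """Append a record while holding an exclusive POSIX lock on the whole file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # whole file (start=0, len=0); blocks until free
        os.write(fd, record)
        os.fsync(fd)                    # push the data to the server before unlocking
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)                    # close also drops any lock still held

def read_all(path: str) -> bytes:
    """Read the whole file while holding a shared lock."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.lockf(fd, fcntl.LOCK_SH)  # shared lock needs a descriptor open for reading
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return b"".join(chunks)
    finally:
        os.close(fd)
```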

NeilBrown



* Re: Re: NFS as a Cluster File System.
  2003-01-13 19:36   ` Neil Brown
@ 2003-01-13 20:25     ` David B. Ritch
  2003-01-13 20:40       ` Neil Brown
  2003-01-14 15:46     ` Trond Myklebust
  1 sibling, 1 reply; 18+ messages in thread
From: David B. Ritch @ 2003-01-13 20:25 UTC
  To: Neil Brown; +Cc: Alan Robertson, Lorn Kay, NFS mailing list, linux-ha

I agree that cache coherency is not a sensible goal for a cluster
filesystem.  However, cache coherency of metadata is rather important. 
For example, when one node creates a file of intermediate data, it is
important for the other nodes to be able to see that.  Using actimeo=0 is
the conventional mechanism for allowing file creation and deletion to be
propagated quickly.  Usually, one can tweak that a bit to reduce the
burden on the server.  However, it might be nice if there were a
mechanism to propagate this sort of metadata change without dumping all
metadata over a second or two old.
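A toy model of this trade-off (an illustration only, not the Linux client's implementation): a client that trusts cached attributes for a fixed timeout (the Linux `actimeo` mount option sets these timeouts) answers lookups locally until the entry expires, so a file created on another node can stay invisible for up to that long, while a timeout of 0 forces a server round trip on every lookup. The `fetch` callback stands in for a hypothetical "ask the server" call:

```python
from typing import Callable, Dict, Tuple

class AttrCache:
    """Toy time-based attribute cache keyed by path."""

    def __init__(self, fetch: Callable[[str], object], actimeo: float,
                 clock: Callable[[], float]):
        self.fetch = fetch        # hypothetical server lookup
        self.actimeo = actimeo    # seconds a cached answer is trusted
        self.clock = clock
        self.entries: Dict[str, Tuple[float, object]] = {}
        self.server_calls = 0

    def lookup(self, path: str) -> object:
        now = self.clock()
        cached = self.entries.get(path)
        if cached is not None and now - cached[0] < self.actimeo:
            return cached[1]      # still fresh: no network traffic
        self.server_calls += 1    # expired (or actimeo=0): go to the server
        value = self.fetch(path)
        self.entries[path] = (now, value)
        return value
```

The single timeout applies to every entry alike, which is why tuning it trades server load against staleness for all metadata at once.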

dbr

On Mon, 2003-01-13 at 14:36, Neil Brown wrote:
> On Thursday January 9, alanr@unix.sh wrote:
> > 
> > NFS V3 and before have problems with "cache coherency".  That is, the 
> > different nodes in the cluster are not guaranteed to see the same contents.
> > 
> > I think this is supposed to be fixed in v4.
> > 
> 
> NFSv4 does not try to "fix" this.  It makes no attempts at "cache
> coherency" beyond what NFSv2/3 provide which is "close to open"
> coherence, meaning that if only one process has a file open at a
> time, then everything will appear coherent, and if multiple processes
> have the file open at the same time, they need to use record locking.
> 
> I really don't think total cache coherency is a sensible goal for a
> network filesystem, even a cluster filesystem.  It imposes lots of
> extra network traffic that most of the time will be of no value.
> If an application needs some degree of coherence, it should be
> explicit about its needs (using open/close or locking) so that the
> protocol can provide it then, and only then.
> 
> NeilBrown
> 
> 
-- 
David B. Ritch
High Performance Technologies, Inc.



* Re: Re: NFS as a Cluster File System.
  2003-01-13 20:25     ` David B. Ritch
@ 2003-01-13 20:40       ` Neil Brown
  2003-01-13 20:50         ` David B. Ritch
  0 siblings, 1 reply; 18+ messages in thread
From: Neil Brown @ 2003-01-13 20:40 UTC
  To: David B. Ritch; +Cc: Alan Robertson, Lorn Kay, NFS mailing list, linux-ha

On  January 13, dritch@hpti.com wrote:
> I agree that cache coherency is not a sensible goal for a cluster
> filesystem.  However, cache coherency of metadata is rather important. 
> For example, when one node creates a file of intermediate data, it is
> important for the other nodes to be able to see that.  Using actimeo=0 is
> the conventional mechanism for allowing file creation and deletion to be
> propagated quickly.  Usually, one can tweak that a bit to reduce the
> burden on the server.  However, it might be nice if there were a
> mechanism to propagate this sort of metadata change without dumping all
> metadata over a second or two old.

If the 'other nodes' open the file and look in it, then they should
see current data (if they don't it's a bug).  If they just 'stat' it
to see if it has changed then they may see an old timestamp.

I recommend opening the file.  It is an explicit way for the
application to say "I really want to know the current state of this
file". 
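In application terms, this advice amounts to checking a file by opening it and calling `fstat` on the open descriptor rather than calling `stat` on the path: under close-to-open semantics the open is the point at which the client should revalidate with the server (exactly when a given client revalidates is implementation-specific). A minimal sketch:

```python
import os

def current_size(path: str) -> int:
    """Size of `path` read after an open(), the close-to-open revalidation point."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.fstat(fd).st_size  # attributes of the file actually opened
    finally:
        os.close(fd)
```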

NeilBrown



* Re: Re: NFS as a Cluster File System.
  2003-01-13 20:40       ` Neil Brown
@ 2003-01-13 20:50         ` David B. Ritch
  2003-01-13 22:11           ` Neil Brown
  0 siblings, 1 reply; 18+ messages in thread
From: David B. Ritch @ 2003-01-13 20:50 UTC
  To: Neil Brown; +Cc: Alan Robertson, Lorn Kay, NFS mailing list, linux-ha

That makes sense.  However, it is common practice in many shops to write
intermediate data files with some sort of serial number or timestamp in
the name, and for the next step in the process to look for data using
"ls" with a wildcard.  When doing that, you don't know what the name of
the next file might be, so you can't simply open it.

While I agree that this is not the most ideal method for coordinating
processing, it is widely used and I have found a need to support it.

We've also had processes fail with a "file not found" error when trying
to read a file recently written by a process on another node.  It has
always been my belief that this was a failure when a process tried to
open the file, and the local metadata cache had not yet been updated. 
Just to clarify - are you saying that the open system call should have
contacted the server, even if the local cached information said that the
file (and perhaps its parent directory) did not exist?
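Until that question is settled, one pragmatic application-side workaround is to retry an open that fails with "file not found" a few times, on the assumption that the failure comes from stale cached directory information that expires within a short, bounded time. A sketch; the retry count and delay are arbitrary assumptions:

```python
import os
import time

def open_with_retry(path: str, attempts: int = 5, delay_s: float = 0.5) -> int:
    """Open `path` read-only, retrying on ENOENT in case a cached lookup is stale."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for _ in range(attempts - 1):
        try:
            return os.open(path, os.O_RDONLY)
        except FileNotFoundError:
            time.sleep(delay_s)          # give cached directory info time to expire
    return os.open(path, os.O_RDONLY)    # last try: let FileNotFoundError propagate
```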

Thanks,

dbr

On Mon, 2003-01-13 at 15:40, Neil Brown wrote:
> On  January 13, dritch@hpti.com wrote:
> > I agree that cache coherency is not a sensible goal for a cluster
> > filesystem.  However, cache coherency of metadata is rather important. 
> > For example, when one node creates a file of intermediate data, it is
> > important for the other nodes to be able to see that.  Using actimeo=0 is
> > the conventional mechanism for allowing file creation and deletion to be
> > propagated quickly.  Usually, one can tweak that a bit to reduce the
> > burden on the server.  However, it might be nice if there were a
> > mechanism to propagate this sort of metadata change without dumping all
> > metadata over a second or two old.
> 
> If the 'other nodes' open the file and look in it, then they should
> see current data (if they don't it's a bug).  If they just 'stat' it
> to see if it has changed then they may see an old timestamp.
> 
> I recommend opening the file.  It is an explicit way for the
> application to say "I really want to know the current state of this
> file". 
> 
> NeilBrown
-- 
David B. Ritch
High Performance Technologies, Inc.


-------------------------------------------------------
This SF.NET email is sponsored by: FREE  SSL Guide from Thawte
are you planning your Web Server Security? Click here to get a FREE
Thawte SSL guide and find the answers to all your  SSL security issues.
http://ads.sourceforge.net/cgi-bin/redirect.pl?thaw0026en
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* Re: Re: NFS as a Cluster File System.
  2003-01-13 20:50         ` David B. Ritch
@ 2003-01-13 22:11           ` Neil Brown
  0 siblings, 0 replies; 18+ messages in thread
From: Neil Brown @ 2003-01-13 22:11 UTC
  To: David B. Ritch; +Cc: Alan Robertson, Lorn Kay, NFS mailing list, linux-ha

On  January 13, dritch@hpti.com wrote:
> That makes sense.  However, it is common practice in many shops to write
> intermediate data files with some sort of serial number or timestamp in
> the name, and for the next step in the process to look for data using
> "ls" with a wildcard.  When doing that, you don't know what the name of
> the next file might be, so you can't simply open it.

I don't think that this should be a problem for NFS.  To do the 'ls'
or to expand the wildcard you need to open the directory (and do a
readdir) and this should cause the client to check with the server.
Once you have the name the open should work.
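A minimal Python sketch of the consumer side of the wildcard pattern, assuming a hypothetical spool directory and a hypothetical `part-*.dat` naming scheme: the directory listing (which `glob` performs via readdir) comes first, and only names that listing returned are opened. Whether a given NFS client actually revalidates the directory at that point is the question discussed in this thread, so treat this as an illustration of the access order, not a guarantee.

```python
import glob
import os


def consume_new_files(spool_dir, pattern="part-*.dat", seen=None):
    """Yield (name, contents) for matching files not already in `seen`.

    glob() lists the directory (opendir/readdir) before anything is
    opened, and only names returned by that listing are opened afterwards.
    The default pattern is a hypothetical naming scheme.
    """
    if seen is None:
        seen = set()
    # Sorting approximates production order when names carry a serial
    # number or timestamp.
    for path in sorted(glob.glob(os.path.join(spool_dir, pattern))):
        name = os.path.basename(path)
        if name in seen:
            continue
        # Open only after the listing: the name came from readdir,
        # not from a guess about what the next file might be called.
        with open(path, "rb") as f:
            data = f.read()
        seen.add(name)
        yield name, data
```

Passing the same `seen` set on every polling pass means each file is returned once.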

> 
> While I agree that this is not the most ideal method for coordinating
> processing, it is widely used and I have found a need to support it.

It seems reasonable to me.

> 
> We've also had processes fail with a "file not found" error when trying
> to read a file recently written by a process on another node.  It has
> always been my belief that this was a failure when a process tried to
> open the file, and the local metadata cache had not yet been updated. 
> Just to clarify - are you saying that the open system call should have
> contacted the server, even if the local cached information said that the
> file (and perhaps its parent directory) did not exist?

Hmmm... My understanding of NFS and 'close to open' semantics is that
on 'open' the client should definitely contact the server, at least to
do a GETATTR on the file, and possibly to do a LOOKUP if there is
doubt as to the current information in the name cache.

However the Linux VFS is not very friendly to network filesystems in
this respect.  The NFS client doesn't know if a given name lookup is
for an "open" or for a "stat" and so it cannot impose its subtly
different semantics.

So I can well imagine an "open" on a file that another client has just
created failing, if a recent name lookup has said that it didn't
exist.
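Under close-to-open semantics, the explicit way for an application to ask for the current state of a file is to open it and then query the open descriptor, rather than calling stat() on the name. A minimal sketch, with a hypothetical helper name `current_attrs`; given the VFS limitation described above, on Linux the open may still fail with ENOENT if a recent lookup said the file did not exist.

```python
import os


def current_attrs(path):
    """Return file attributes obtained through an open descriptor.

    With close-to-open semantics, open() is where an NFS client is
    expected to revalidate with the server (at least a GETATTR), while a
    bare os.stat() may be answered from the client's attribute cache.
    """
    fd = os.open(path, os.O_RDONLY)  # raises FileNotFoundError if absent
    try:
        return os.fstat(fd)
    finally:
        os.close(fd)
```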

However if you always do an opendir/readdir first, and only try to
open files that were found in the readdir, then the client should be
able to reliably detect the change to the directory and submit a new
LOOKUP request.  I don't know if the Linux NFS client does this or
not.

I think it is fair to say that Linux isn't really ready for this sort
of tightly-coupled-network-filesystem thing yet.  The VFS just isn't
ready.  It doesn't even allow O_CREAT|O_EXCL to work over NFS even
though the NFSv3 protocol supports it.
The implementers of Lustre have enhanced the VFS for their use.  This
may get into mainline in 2.7 (too late for 2.6), or something else
might be developed.
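For reference, this is what an exclusive create looks like from the application side: a single open() with O_CREAT|O_EXCL that either creates the file or fails because it already exists. A minimal sketch with a hypothetical helper `try_claim`; as noted above, at the time of this thread such a call over NFS on Linux was not necessarily atomic across clients, even though NFSv3 defines an exclusive CREATE mode.

```python
import os


def try_claim(path):
    """Create `path` exclusively; return True if this call created it.

    O_CREAT|O_EXCL asks for "create only if absent" as one operation.
    Across NFS clients this is only as atomic as the client and VFS
    make it, which is the limitation discussed in the thread.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # someone else created it first
    os.close(fd)
    return True
```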

With careful coding it should be possible to achieve any particular
result, but you really need to know exactly what functionality the NFS
client does, and does not, provide.

[[ NOTE: these replies aren't making it to  linux-ha@muc.de as I am
not a subscriber.  Feel free to forward them if you think that is
appropriate]]

NeilBrown


* Re: Re: NFS as a Cluster File System.
  2003-01-13 19:36   ` Neil Brown
  2003-01-13 20:25     ` David B. Ritch
@ 2003-01-14 15:46     ` Trond Myklebust
  2003-01-14 16:01       ` Kumaran Rajaram
  1 sibling, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2003-01-14 15:46 UTC
  To: Neil Brown; +Cc: Alan Robertson, Lorn Kay, nfs, linux-ha

>>>>> " " == Neil Brown <neilb@cse.unsw.edu.au> writes:

     > NFSv4 does not try to "fix" this.  It makes no attempts at
     > "cache coherency" beyond what NFSv2/3 provide which is "close
     > to open" coherence, meaning that if only one process has a
     > file open at a time, then everything will appear coherent, and
     > if multiple processes have the file open at the same time, they
     > need to use record locking.

Note, though, that in addition to supporting file locking, NFSv4 adds
support for file 'delegation', which allows the NFS client to do locking
entirely as a local operation (i.e. there is no need to contact the
server). For most applications, this will make locking a much faster
operation...
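A minimal sketch of the record locking that the quoted text recommends when several processes have the same file open at once, using POSIX locks through Python's `fcntl.lockf`. The helper name and record format are hypothetical. On NFSv2/3 such locks are normally handled by the server's lock manager; with an NFSv4 delegation, as described above, the client may be able to grant them locally.

```python
import fcntl
import os


def append_record(path, record):
    """Append `record` (bytes) while holding an exclusive POSIX lock."""
    with open(path, "ab") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # push the data out before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```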

Cheers,
  Trond


* Re: Re: NFS as a Cluster File System.
  2003-01-14 15:46     ` Trond Myklebust
@ 2003-01-14 16:01       ` Kumaran Rajaram
  2003-01-14 16:08         ` Trond Myklebust
  0 siblings, 1 reply; 18+ messages in thread
From: Kumaran Rajaram @ 2003-01-14 16:01 UTC
  To: Trond Myklebust; +Cc: nfs


> Note, though, that in addition to supporting file locking, NFSv4 adds
> support for file 'delegation' which allow the NFS client to do locking
> entirely as a local operation (i.e. there is no need to contact the
> server). For most applications, this will make locking a much faster
> operation...

If file-locking is made local, how do other NFS-clients get to know the
locking info? I suspect this might lead to multiple NFS-clients holding
the lock on the same file simultaneously, leading to file-inconsistency.
Please correct me if I am wrong.

Thanks,
-Kums

-- Kumaran Rajaram, Mississippi State University --
kums@cs.msstate.edu  <http://www.cs.msstate.edu/~kums>


* Re: Re: NFS as a Cluster File System.
  2003-01-14 16:01       ` Kumaran Rajaram
@ 2003-01-14 16:08         ` Trond Myklebust
  0 siblings, 0 replies; 18+ messages in thread
From: Trond Myklebust @ 2003-01-14 16:08 UTC
  To: Kumaran Rajaram; +Cc: nfs

>>>>> " " == Kumaran Rajaram <kums@CS.MsState.EDU> writes:

     >   If file-locking is made local, how do other NFS-clients get
     >   to know the
     > locking info. I suspect this might lead to multiple NFS-clients
     > holding the lock on the same file simultaneously, leading to
     > file-inconsistency.  Please correct me if Iam wrong.

I suggest you read RFC3010, as this is what the file delegation takes
care of.

Delegation is a way for the server to tell the client that it is the
only current user of that particular file. If another client comes
along and opens the same file, then the server must notify the first
client that it is about to revoke the delegation, and give it a short
period of time in which to flush back all changes (including any locks
that are being held).
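The recall sequence described above can be modelled as a toy, single-process simulation. All names here are hypothetical and the model is deliberately simplified: a real server sends the recall as an asynchronous callback and waits only a bounded time for the client to return the delegation, and this sketch does not track plain (non-delegated) opens. It illustrates the idea, not RFC 3010's actual procedure.

```python
class Client:
    """Hypothetical NFS client that caches changes while delegated."""

    def __init__(self, name):
        self.name = name
        self.dirty = []         # changes (writes, locks) not yet sent
        self.delegated = set()  # paths this client holds delegations for

    def recall(self, path, server):
        # Server callback: flush cached changes, then return the delegation.
        server.flushed.extend(self.dirty)
        self.dirty.clear()
        self.delegated.discard(path)


class Server:
    """Toy model of delegation recall (synchronous, no timeouts)."""

    def __init__(self):
        self.holder = {}   # path -> Client holding the delegation
        self.flushed = []  # changes that have reached the server

    def open(self, client, path):
        """Return True if `client` holds a delegation for `path` afterwards."""
        current = self.holder.get(path)
        if current is None:
            # Simplification: plain opens are not tracked, so any open of
            # an undelegated file is treated as the sole current user.
            self.holder[path] = client
            client.delegated.add(path)
            return True
        if current is client:
            return True
        # Conflicting open: recall the delegation before proceeding.
        current.recall(path, self)
        del self.holder[path]
        return False
```

In this model the second client's open succeeds only after the first client's cached writes and locks have been flushed to the server.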

Cheers,
  Trond


Thread overview: 18+ messages
2003-01-09 19:39 NFS as a Cluster File System Lorn Kay
2003-01-09 21:11 ` Brian Tinsley
2003-01-09 22:04   ` Brian Jackson
2003-01-09 23:02     ` Brian Tinsley
2003-01-09 21:29 ` Alan Robertson
2003-01-13 19:36   ` Neil Brown
2003-01-13 20:25     ` David B. Ritch
2003-01-13 20:40       ` Neil Brown
2003-01-13 20:50         ` David B. Ritch
2003-01-13 22:11           ` Neil Brown
2003-01-14 15:46     ` Trond Myklebust
2003-01-14 16:01       ` Kumaran Rajaram
2003-01-14 16:08         ` Trond Myklebust
2003-01-09 21:50 ` Lars Marowsky-Bree
2003-01-09 23:09   ` Brian Tinsley
2003-01-13  4:20 ` David B. Ritch
  -- strict thread matches above, loose matches on Subject: below --
2003-01-09 23:13 Lorn Kay
2003-01-10  3:34 ` Alan Cox