Re: network storage solutions

* Re: network storage solutions
       [not found] <1053018023.2883.168.camel@protein.scalableinformatics.com>
@ 2003-05-15 17:50 ` Jeff Layton
  2003-05-15 18:19   ` Brian Pawlowski
  2003-05-15 18:56   ` Joe Landman
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff Layton @ 2003-05-15 17:50 UTC (permalink / raw)
  To: Beowulf; +Cc: Joseph Landman, nfs

Joseph Landman wrote:

> On Thu, 2003-05-15 at 12:24, Jeff Layton wrote:
> > Jeff Layton wrote:
> >
> > > Joe Landman wrote:
> > >
> > >> Note: the soft vs hard mount is a matter of "religion" to some folk.  I
> > >> usually specify
> > >>
> > >
> > >   I don't think it's really a religion. From what I've read,
> > > the NFS gurus say that you have to use hard mounts to
> > > guarantee data integrity (which I'm sure everyone wants
> > > for a rw mounted filesystem). Here is one reference:
> > >
> > > http://www.netapp.com/tech_library/3183.html#3.
>
> I still maintain it is a religious preference.  Hard mounts can and will
> crash client machines in the event of a server being permanently down.
> Some folks want that behavior.  Some do not.  This is also a religious
> war.
>

   I'm cc-ing the NFS mailing list to get their input on this.
However, let me say that I don't really view it as a religious
preference. If I lose my server in a cluster, I don't mind
losing the nodes (however, we've lost the NFS server
before and never lost any of the nodes on a 288 node cluster
even though they are hard mounted - strange).
   Since we use our cluster for production work (please, I'm
not trying to offend anyone), we HAVE to have non-corrupted
data. This is why we use hard mounts with 'sync' as well as
a few other options. The URL above to Chuck's paper has
several examples of "good" mount options.
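
As an illustration of checking which of these options a client actually ends
up with, here is a minimal Python sketch that classifies NFS mounts as hard or
soft from text in the /proc/mounts layout. The sample servers, paths and option
strings are made up, and the sketch assumes a mount is hard unless "soft"
appears in its option list (hard being the documented default).

```python
def nfs_mount_modes(mounts_text):
    """Map each NFS mount point to 'hard' or 'soft' based on its option list."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected layout: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        # Assumption: a mount is hard unless "soft" is requested explicitly.
        modes[fields[1]] = "soft" if "soft" in options else "hard"
    return modes


# Hypothetical sample in the /proc/mounts layout; hosts and paths are made up.
SAMPLE = """\
server1:/export/home /home nfs rw,sync,hard,intr 0 0
server2:/export/scratch /scratch nfs rw,soft,timeo=7,retrans=3 0 0
/dev/sda1 / ext3 rw 0 0
"""

print(nfs_mount_modes(SAMPLE))
```

On a Linux client the same function could be fed the contents of /proc/mounts
instead of the sample string.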

> Amazing how many of them occur.
>
> The way I and others who use soft mounts view it, data lossage occurs
> when the server crashes, as you cannot guarantee (except with sync),
> that the data was committed to disk.
>

However, if I read Chuck's paper correctly, with a soft mount
you can get a soft time-out that interrupts an operation,
but the client will then continue with corrupted data. Am I
understanding this correctly? If so, the clients may be
up, but now the data is corrupt and the application doesn't
know it.


> Worse, if you are using a
> journaling fs on the NFS server side, recovering the fs may
> require a roll-back of the fs state.  This would crash a transaction in
> progress on the client with a hard mount and sync, and in a number of
> cases, crash the kernel.  With a soft mount, and sync, you would get an
> error.  Please note that this is a highly oversimplified version of what
> really happens, and some may disagree with the statements.  Refer to the
> source to see what happens.  Won't be reproduced here.
>
> Which one is more relevant to you is more a matter of preference than of
> data security.  If your server crashes, you are going to lose
> transactions in flight, written but not committed.  How the client
> responds to those is a matter of preference.  This is where the
> religious aspect crops up.
>

I'm not sure... If the server crashes, I think this is true.
But what if you get an interrupt? Soft mounts will allow
the application to continue with corrupted data while hard
mounts will produce an error, but not corrupt data (I think).

Jeff

> [...]
>
> > >> as options on my mounts.  I prefer the soft mount for a number of
> > >> reasons, most notably stability of the whole cluster is not a function
> > >> of the least stable server.
>
> This really opens up some of the points of how to handle errors in the
> cluster shared file system.
>
> -- 
> Joseph Landman <landman@scalableinformatics.com>
>


-- 
Jeff Layton
Senior Engineer - Aerodynamics and CFD
Lockheed-Martin Aeronautical Company - Marietta

"Is it possible to overclock a cattle prod?" - Irv Mullins




-------------------------------------------------------
Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
The only event dedicated to issues related to Linux enterprise solutions
www.enterpriselinuxforum.com

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* Re: Re: network storage solutions
  2003-05-15 17:50 ` network storage solutions Jeff Layton
@ 2003-05-15 18:19   ` Brian Pawlowski
  2003-05-16  6:07     ` Brian Pawlowski
  2003-05-15 18:56   ` Joe Landman
  1 sibling, 1 reply; 9+ messages in thread
From: Brian Pawlowski @ 2003-05-15 18:19 UTC (permalink / raw)
  To: jeffrey.b.layton; +Cc: beowulf, landman, nfs

It's not religious.

It's simple.

NFS servers (that I use) commit their data to persistent storage
before replying to the client.  This protects against simple data loss
in the face of server reboots.  If they didn't do this, I could get
silent data loss or data corruption that my application may not be aware of
or be able to recover from.

That's expected behaviour from hard mounts on a client to a server.

Soft mounts say "Try, but after N errors of transmission give up."
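
The difference can be sketched as a toy retry loop in Python. This is only a
model of the idea, not the kernel's actual logic: the function names, the
simulated server, and the choice to count one initial attempt plus `retrans`
retries are all assumptions made for illustration.

```python
import errno


def make_flaky_server(failures):
    """Return a fake operation that times out `failures` times, then succeeds."""
    state = {"left": failures}

    def operation():
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("no reply from server")
        return "ok"

    return operation


def call_with_retries(operation, retrans, hard=False):
    """Toy model: hard retries forever, soft gives up after `retrans` retries."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation()
        except TimeoutError:
            # A soft mount reports an I/O error once its retry budget is used
            # up; a hard mount simply loops until the server answers.
            if not hard and attempts > retrans:
                raise OSError(errno.EIO, "soft mount gave up")


print(call_with_retries(make_flaky_server(2), retrans=3))              # recovers
print(call_with_retries(make_flaky_server(10), retrans=3, hard=True))  # waits it out
try:
    call_with_retries(make_flaky_server(10), retrans=3)                # soft: gives up
except OSError as exc:
    print("error reported to caller:", errno.errorcode[exc.errno])
```

In this model the soft case ends with an error handed back to the caller,
which is only useful if the caller actually looks at it.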

People use soft mounts for (1) improved performance (you can juice
up cheap servers by caching data), or (2) preventing hung clients
in the face of unreliable networks and servers (when a client is accessing
many NFS servers).

At Sun, I felt in the end soft mounts were a bad idea.  Better was "intr"
where at least user interaction could override "hard" mount guarantees,
and the user can make a choice of "screw my data".

Today, though, even the reboot persistence of data is inadequate
for many critical apps.  Commercial servers have RAID or mirroring,
clustered configs for eliminating single points of failure (and hung mounts),
etc.





* Re: Re: network storage solutions
  2003-05-15 18:19   ` Brian Pawlowski
@ 2003-05-16  6:07     ` Brian Pawlowski
  0 siblings, 0 replies; 9+ messages in thread
From: Brian Pawlowski @ 2003-05-16  6:07 UTC (permalink / raw)
  To: Brian Pawlowski; +Cc: jeffrey.b.layton, beowulf, landman, nfs

> People use soft mounts for (1) improved performance (you can juice
> up cheap servers by caching data), or (2) preventing hung clients
> in the face of unreliable networks and servers (when a client is accessing
> many NFS servers).

Skip (1) - that is (a)sync on server (not soft mounts).  I should think before
I type:-)

So, Windows CIFS has dramatic "soft mount" like behaviour (some
popup like "Delayed writes lost" on session disconnect).  Always
a pain - makes me want hard mount NFS behaviour.



* Re: network storage solutions
  2003-05-15 17:50 ` network storage solutions Jeff Layton
  2003-05-15 18:19   ` Brian Pawlowski
@ 2003-05-15 18:56   ` Joe Landman
  2003-05-15 22:01     ` Trent Piepho
  1 sibling, 1 reply; 9+ messages in thread
From: Joe Landman @ 2003-05-15 18:56 UTC (permalink / raw)
  To: jeffrey.b.layton; +Cc: Beowulf, nfs

On Thu, 2003-05-15 at 13:50, Jeff Layton wrote:

>    Since we use our cluster for production work (please, I'm
> not trying to offend anyone), we HAVE to have non-corrupted
> data. This is why we use hard mounts with 'sync' as well as
> a few other options. The URL above to Chuck's paper has
> several examples of "good" mount options.

Hmmm.  I am reasonably sure that when the IO system returns an error, it
does in fact get propagated to the appropriate user-land calling
program.  The program then makes the determination as to whether or not
to continue.  There are quite a few programs that rarely inspect return
codes from file operations.  If you really require uncorrupted data, then
you are probably using synchronous/unbuffered file writes anyway
(the O_SYNC, and possibly O_DIRECT, options, though NFS has experimental
support for O_DIRECT, judging from the notes around Trond's patches).
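
For illustration, a minimal Python sketch of a write path that does inspect
every result: it opens the file with O_SYNC, loops over short writes, and lets
any OSError (including one raised by close) reach the caller. The function
name and the file name in the usage part are hypothetical, and the sketch
assumes a Unix-like system where os.O_SYNC is available.

```python
import os
import tempfile


def write_all_sync(path, data):
    """Write all of `data` with O_SYNC; any failure is raised as OSError."""
    # O_SYNC asks that each write return only after the data has been
    # handed to stable storage (for NFS, committed by the server).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; keep going.
            written = os.write(fd, view)
            view = view[written:]
    finally:
        # close can also report a deferred write error, so it is not ignored.
        os.close(fd)
    return len(data)


if __name__ == "__main__":
    # Hypothetical target; a real test would point at a file on the NFS mount.
    target = os.path.join(tempfile.mkdtemp(), "out.bin")
    try:
        print(write_all_sync(target, b"hello nfs"), "bytes written")
    except OSError as exc:
        print("write failed, data must not be trusted:", exc)
```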

> > The way I and others who use soft mounts view it, data lossage occurs
> > when the server crashes, as you cannot guarantee (except with sync),
> > that the data was committed to disk.
> >
> 
> However, if I read Chuck's paper correctly, with soft mount
> you can get a soft time-out that can interrupt an operation
> but the client will continue then with corrupted data. Am I
> understanding this correctly? Therefore, the clients may be
> up, but now the data is corrupt and the application doesn't
> know it.

I would like to know that as well.  I would like to believe it will not
continue with corrupt data, but return an error code/condition which
should be handled.

[...]

> I'm not sure... If the server crashes, I think this is true.
> But what if you get an interrupt. Soft mounts will allow
> the application to continue with corrupted data while hard
> mounts will produce an error, but not corrupt data (I think).

I hope not.  The programs that I send an INTR to on an NFS system (with
the intr flag allowed) seem to accept the signal and die.  I guess the
question here is: what should be the state of the filesystem upon
acceptance of that signal?  Can you assume it is in a known state?

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615


* Re: network storage solutions
  2003-05-15 18:56   ` Joe Landman
@ 2003-05-15 22:01     ` Trent Piepho
  0 siblings, 0 replies; 9+ messages in thread
From: Trent Piepho @ 2003-05-15 22:01 UTC (permalink / raw)
  Cc: Beowulf, nfs

On 15 May 2003, Joe Landman wrote:
> > However, if I read Chuck's paper correctly, with soft mount
> > you can get a soft time-out that can interrupt an operation
> > but the client will continue then with corrupted data. Am I
> > understanding this correctly? Therefore, the clients may be
> > up, but now the data is corrupt and the application doesn't
> > know it.
> 
> I would like to know that as well.  I would like to believe it will not
> continue with corrupt data, but return an error code/condition which
> should be handled.

That was my experience.  We had a problem with soft NFS mounts timing out during
huge IO loads to large RAID arrays.  With a large server-side cache getting
flushed, some NFS requests could take several tens of seconds before the
server got around to processing them.  The soft NFS timeout limit turned out
to be quite small.  When this happened, the kernel logged a message to syslog
about the NFS timeout being exceeded, and the application got an error back
(read:  I/O error or something of that nature).  I can see how a poorly coded
(though not uncommon) program that doesn't check the return value of read and
write calls would not detect the failure.  I raised the timeout to a more
reasonable value, and have had no problems since.
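
To give a feel for why the default limit can be "quite small", here is a rough
Python model of how long a soft mount waits before giving up. It assumes the
behaviour described in the nfs man page of that era: the first timeout is
`timeo` tenths of a second (assumed default 7), it doubles after each
retransmission, and it is capped at 60 seconds. The function name is made up,
and the real client (over TCP, or with adaptive timeouts) may behave
differently.

```python
def major_timeout_ds(timeo_ds=7, retrans=3, cap_ds=600):
    """Deciseconds spent waiting before giving up, under a doubling backoff."""
    total = 0
    wait = timeo_ds
    for _ in range(retrans):
        # Each wait is capped; the uncapped value keeps doubling.
        total += min(wait, cap_ds)
        wait *= 2
    return total


for retrans in (3, 15):
    seconds = major_timeout_ds(retrans=retrans) / 10
    print(f"timeo=7, retrans={retrans}: about {seconds} s before an error")
```

Under these assumptions the defaults give up after roughly 5 seconds, well
short of a request that takes tens of seconds, while retrans=15 stretches the
wait to several minutes.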


[parent not found: <3EC2A740.9060902@cert.ucr.edu>]

[parent not found: <Pine.LNX.3.96.1030514135224.2430H-100000@Maggie.Linux-Consulting.com>]

[parent not found: <20030515070359.GB1912@greglaptop.attbi.com>]

[parent not found: <3EC3ECC6.6000802@cert.ucr.edu>]

[parent not found: <3EC40815.9040504@lanl.gov>]

* Re: network storage solutions
       [not found]       ` <3EC40815.9040504@lanl.gov>
@ 2003-05-15 18:12         ` Jeffrey B. Layton
  2003-05-16 13:21           ` Robert G. Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Jeffrey B. Layton @ 2003-05-15 18:12 UTC (permalink / raw)
  To: beowulf; +Cc: Josip Loncaric, nfs, Charles.Lever

Josip Loncaric wrote:

> Glen Kaukola wrote:
>
>> Greg Lindahl wrote:
>>
>>> There are soft and hard mounts, and there are interruptable mounts
>>> ("intr" -- check out "man nfs").
>>>
>>> A hard mount will never time out. If you make it interruptable, then
>>> the user can choose to ^C. This is the safe option.
>>>  
>>>
>>
>> You know, I thought that's how it was supposed to work too.  I do use 
>> the intr option, but even with that option, when a nfs drive is down, 
>> and something like a df command gets stuck, hitting ctrl-c doesn't 
>> seem to do a thing.  All I can ever do is just kill my xterm or 
>> whatever.  
>
>
> We've had similar problems while I was at ICASE.  "Hard" mounts would 
> lock up client processes (even unmount) when the NFS server went down, 
> but "soft" mounts were "too soft" for some of our users.  A reasonable 
> solution is to "harden" your soft mounts by insisting on longer major 
> timeouts, as in "retrans=15" (the default is 3). 


I still think this is dangerous. With soft mounts you can
still get silent data corruption despite the longer timeouts.
Chuck, do you agree?

Jeff

>
>
> Sincerely,
> Josip
>
> P.S.  Our NFS servers virtually never went down, except due to 
> hardware problems or service, so indefinite retransmissions were 
> highly undesirable.
>





* Re: network storage solutions
  2003-05-15 18:12         ` Jeffrey B. Layton
@ 2003-05-16 13:21           ` Robert G. Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Robert G. Brown @ 2003-05-16 13:21 UTC (permalink / raw)
  To: Jeffrey B. Layton; +Cc: beowulf, Josip Loncaric, nfs, Charles.Lever

On Thu, 15 May 2003, Jeffrey B. Layton wrote:

> > We've had similar problems while I was at ICASE.  "Hard" mounts would 
> > lock up client processes (even unmount) when the NFS server went down, 
> > but "soft" mounts were "too soft" for some of our users.  A reasonable 
> > solution is to "harden" your soft mounts by insisting on longer major 
> > timeouts, as in "retrans=15" (the default is 3). 
> 
> 
> I still think this is dangerous. With soft mounts you can
> still get silent data corruption despite the longer timeouts.
> Chuck, do you agree?
> 
> Jeff

Perhaps it is a question of probability, and what people are willing to
accept in terms of data loss in a given environment.  It is a
cost-benefit equation, as always, so acceptable solutions do have to at
least examine the cost of a corrupted file against other costs
associated with using hard mounts everywhere.  

In one somewhat jaded view, one says "crashes happen, and if a crash
occurs in the middle of a file write there is a distinct chance of
losing that file".  This is (I suspect) true anyway for both hard and
soft mounts, depending on the cause of the crash and what has to be done
to fix it.  If the exported filesystem is left in an inconsistent state
post-crash and is modified before being re-exported to the clients, they
are likely to see a stale mount and not be able to complete the ongoing
write transaction.

Nowadays (within Linux) it is indeed pretty rare, as Greg noted, for a
client not to recover gracefully from a server crash and reboot,
although I confess to being less lucky -- we still see stale NFS mounts
after certain crashes, and generally plan on being ABLE to reboot all
the clients in the department after any major, planned downtime of our
principal servers.  There seems to be a bit of state dependence here --
"most" clients recover, but one or two sometimes seem to hang and need
either a therapeutic reboot or at least a remount to clear some
state-dependent problem.

A question for the experts out there -- does the use of a journalling
filesystem affect the probability of NFS file corruption on a soft
mount?  As in, is there any interaction between the journal and the NFS
server that causes an incomplete or corrupted transaction to be
interpreted as cause for invoking some of the protections journalling
provides?  I'm just curious...one would think that NFS would effectively
"journal" itself to consistently end up in a "reliable" state (which
might well cost one the latest writes to the file!) even on a soft
mount.

The probability and cost-benefit issues are often related to LAN
architecture.  In a common architecture, one has a single (or perhaps
2-3) "major server(s)" that have lots of capacity in all dimensions.
This is where users manipulate "critical data" (e.g. home directories,
project directories), and one EXPECTS the LAN to effectively go down
when these servers are down so the mounts should definitely be hard
mounts (although they might well be automounts, so your system isn't
hung if YOUR home directory server stays up).  To protect against
anomalous amounts of downtime (which DEFINITELY costs one work at a
fixed rate, compared to the stochastic expectation of loss in the case
of possible data corruption) one makes the servers as reliable as
possible -- they are architected "not to go down" and have things like
four-hour service or hot mirror spares.
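
The comparison between a fixed-rate downtime cost and a stochastic expectation
of loss can be written down as simple arithmetic. The Python sketch below is
only that: every number in it (outage length, user count, hourly cost,
corruption probability, cost per file) is a made-up placeholder, not a
measurement from any site.

```python
def downtime_cost(hours_down, users, cost_per_user_hour):
    """Deterministic cost: everyone blocked on a hard mount waits out the outage."""
    return hours_down * users * cost_per_user_hour


def expected_corruption_cost(p_corrupt, files_in_flight, cost_per_file):
    """Expected cost if each in-flight file is silently corrupted with p_corrupt."""
    return p_corrupt * files_in_flight * cost_per_file


# Hypothetical inputs, chosen only to show the shape of the comparison.
hard = downtime_cost(hours_down=4, users=50, cost_per_user_hour=40)
soft = expected_corruption_cost(p_corrupt=0.01, files_in_flight=200, cost_per_file=500)
print("hard-mount downtime cost:     ", hard)
print("soft-mount expected loss cost:", soft)
```

Which side wins depends entirely on the inputs, in particular on how costly an
undetected bad file is, which is the hardest number to estimate.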

In a few cases, as in Greg's example, lots of people with desktop
workstations export workspace and crossmount it all over the place.
Then the issue becomes one of cumulative stability of the workstation
space.  Because of the nasty behaviors of e.g. stat, it is quite common
for a system or at least a session to effectively hang when ANY of its
hard mounts go down -- perhaps not to crash, and to recover gracefully
when the offending server comes back up, but in the meantime you can
lose access to your workstation and ability to do work -- a real cost,
potentially multiplied by N on a big network where NOBODY gets to do
work until the workstation is back up.  The downed workstation might NOT
be so reliable and might NOT have hot and cold running service and might
stay down a day or more, and a decision might well be made to take
draconian measures to free up the (not really) "hung" clients.  This
also can cost work, e.g. work in progress, where a user may have to
choose between not being able to work interactively while their
background task completes or e.g. killing the background task with a
reboot to come back up without the hung mount.

Obviously the "best" solution to this sort of situation is to not put
NFS exports on your path where you can avoid it, and to use the
automounter to effectively reduce the number of mounts that can hang on
a general path stat or df to the unavoidable main (hopefully reliable)
filesystems plus those exported spaces belonging to your buddies that
you actually are using NOW.  However, in a small/informal LAN (like a
home network), where the workstations that are providing the mounts
aren't horribly overloaded at either the network or CPU or memory level
(so they aren't at all likely to timeout on an NFS request) and where
the admin either doesn't want to figure out the automounter or just
isn't that concerned about the (low) probability of data corruption, one
might choose as a "quick and dirty" solution to use a soft mount and bet
that data corruption never occurs.
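
One way to see whether NFS exports have crept onto a search path is to compare
PATH against the mount table. The following Python sketch does that on sample
data; the server name, mount points and PATH value are hypothetical, and the
match is a simple prefix test that ignores the case of a local filesystem
mounted underneath an NFS mount point.

```python
import os


def path_dirs_on_nfs(path_value, mounts_text):
    """Return PATH directories that sit at or below an NFS mount point."""
    nfs_points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts layout: device, mount point, fs type, options, ...
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            nfs_points.append(fields[1])
    hits = []
    for directory in path_value.split(os.pathsep):
        for point in nfs_points:
            prefix = point.rstrip("/") + "/"
            if directory == point or directory.startswith(prefix):
                hits.append(directory)
                break
    return hits


# Hypothetical sample data; on a real client use /proc/mounts and $PATH.
SAMPLE_MOUNTS = """\
server1:/export/tools /opt/tools nfs rw,hard,intr 0 0
/dev/sda1 / ext3 rw 0 0
"""
SAMPLE_PATH = os.pathsep.join(["/usr/bin", "/opt/tools/bin", "/bin"])

print(path_dirs_on_nfs(SAMPLE_PATH, SAMPLE_MOUNTS))
```

Any directory this reports is one whose server going down could stall command
lookup in a shell.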

Back a LONG time ago when NFS recovery on a hard mount was basically
nonexistent (e.g. SunOS, Irix, etc.) I sometimes used soft mounts
on crossmounted workstation spaces, and our LAN (with much slower, much less
powerful client "servers") never knowingly had a problem with data
actually being corrupted -- although files were sometimes lost, leaving
one of those lovely .nfs323112114 tags -- so I'd guess that the
>>probability<< of silent corruption is actually pretty low.  On the
other hand, even a soft mount was never really all that recoverable
either -- NFS just plain had a way of stubbornly hanging whenever a
server went down, no matter what.

It proved smarter, more cost-beneficial, and more professional in the
long run (in a production LAN environment, with real costs associated
with EVERYTHING) to consolidate exported space, including e.g. project
space, into a very few, very reliable servers, period, and just not LET
"everybody mount everybody else", soft OR hard.  Um, so to speak;-) I AM
talking about networking here, after all...

In summary, while soft mounts exist(ed) "for a reason", they never
worked terribly well or reliably and still don't, and the reasons
themselves for using them have mostly passed on.  There are better ways
to cope with the cost/benefit dilemma between de facto hung workstations
and possibly lost/corrupt data.  The vast improvements in automounters
(which back in those same old days sucked incredibly and were as likely
to produce problems as to solve them:-) make automounters with hard
mounts, from a few, reliable, consolidated servers, by far the
preferable solution to the problem.  The fact is that most users of
single user workstations are most unlikely to have more than one or two
automount directories mounted at any one time (within the mount timeout
window) simply because they will typically be "working" at one path
location or perhaps two at a time.

Server consolidation also makes it MUCH easier to back things up,
another "chronic" problem in cowboy networks where there otherwise would
be project directories on fifty workstations, most of them with
relatively unreliable IDE disks, every one being used by somebody that
would whine or bluster and threaten if their data went away upon the
crash of their cheap, three year old disk.  Then there are the "control
and security issues" -- NFS is a bleeding wound as far as security is
concerned anyway (or at least has been historically) and all those
crossmounts on private workstations offer a cracker or evil employee
numerous opportunities to be naughty.  True, one generally keeps a
sucker rod handy to school the latter, but cleaning it afterwards is
such a mess.  Workstations just aren't architected (without effort and
additional expense) to be good, secure, reliable servers in a LAN
serving hundreds of clients.

The "best" solution for "most" LAN architectures is thus to automount
basically everything but the home directories or other "critical"
filesystems mounted from a few reliable servers -- maybe even automount
the home directories (if you have more than one home server, e.g.)!
That way a desktop client system doesn't (generally) "hang"
(recoverably, but hang nonetheless as far as the user is concerned) or
become otherwise difficult to work with if a non-critical or currently
unused server dies -- at most one might lose a tty window when one tries
to access an automount, but if one keeps the automounts off of one's
path then the path stat won't hang (almost) every shell transaction.
Administrative control is concentrated in a relatively few points of
failure and systems to secure and back up.  You get data reliability and
protection against loss of work time and access at the same time.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu


* RE: network storage solutions
@ 2003-05-16 16:16 Lever, Charles
  0 siblings, 0 replies; 9+ messages in thread
From: Lever, Charles @ 2003-05-16 16:16 UTC (permalink / raw)
  To: Jeffrey B. Layton, beowulf; +Cc: Josip Loncaric, nfs

> > We've had similar problems while I was at ICASE.  "Hard" mounts would
> > lock up client processes (even unmount) when the NFS server went down,
> > but "soft" mounts were "too soft" for some of our users.  A reasonable
> > solution is to "harden" your soft mounts by insisting on longer major
> > timeouts, as in "retrans=15" (the default is 3).
>
>
> I still think this is dangerous. With soft mounts you can
> still get silent data corruption despite the longer timeouts.
> Chuck, do you agree?

yes. there is always a probability of corruption if there is
the possibility that the client will give up before the
operation has completed.

you can reduce that probability by following the suggestions
i posted yesterday.



* RE: network storage solutions
@ 2003-05-16 16:23 Lever, Charles
  0 siblings, 0 replies; 9+ messages in thread
From: Lever, Charles @ 2003-05-16 16:23 UTC (permalink / raw)
  To: Robert G. Brown, Jeffrey B. Layton; +Cc: beowulf, Josip Loncaric, nfs

> -----Original Message-----
> From: Robert G. Brown [mailto:rgb@phy.duke.edu]
> Sent: Friday, May 16, 2003 9:22 AM
> To: Jeffrey B. Layton
> Cc: beowulf@beowulf.org; Josip Loncaric; nfs@lists.sourceforge.net;
> Lever, Charles
> Subject: Re: network storage solutions
>
> Perhaps it is a question of probability, and what people are willing to
> accept in terms of data loss in a given environment.  It is a
> cost-benefit equation, as always, so acceptable solutions do have to at
> least examine the cost of a corrupted file against other costs
> associated with using hard mounts everywhere.

right.  we're dealing with probabilities here.  there is always
a non-zero probability of data corruption, even with local
file systems.

using soft mounts increases the probability of silent data
corruption.  if you can live with that, or you have solid
recovery mechanisms, then soft is a reasonable choice.

but it's best to be informed about this choice, rather than
just stabbing at using soft mounts because it makes other
problems go away.
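
As one possible shape for such a recovery mechanism, here is a minimal Python
sketch: write to a temporary file, force it out with fsync, read it back and
compare checksums, and only then rename it over the real name. The function
name and the ".tmp" suffix are assumptions for illustration. Note that a
read-back may be satisfied from the client's cache, so this lowers the risk
rather than eliminating it.

```python
import hashlib
import os
import tempfile


def checked_replace(path, data):
    """Write `data` to a temp file, verify by read-back, then rename into place."""
    tmp = path + ".tmp"  # hypothetical naming convention
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        # fsync raises OSError if pushing the data to the server failed.
        os.fsync(f.fileno())
    with open(tmp, "rb") as f:
        stored = f.read()
    if hashlib.sha256(stored).digest() != hashlib.sha256(data).digest():
        os.remove(tmp)
        raise OSError("read-back mismatch for " + path)
    # The old file stays intact until the new copy has been verified.
    os.replace(tmp, path)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "result.dat")
    try:
        checked_replace(target, b"payload")
        print("stored", os.path.getsize(target), "bytes")
    except OSError as exc:
        print("write failed; previous copy left untouched:", exc)
```

The caller still has to decide what to do on failure (retry, alert, or stop),
which is the informed part of the choice.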


