NFS retry on disconnection

* NFS retry on disconnection
@ 2002-05-11 17:41 David Chow
  2002-05-12 19:58 ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: David Chow @ 2002-05-11 17:41 UTC (permalink / raw)
  To: linux-fsdevel

Dear all,

If the NFS server is disconnected for some reason, will the current NFSv3
client retry forever, or will it give up after some timeouts? If I want to
change this behaviour, is it a default behaviour of the sunrpc layer in the
kernel, or is it some code handled by the NFS client in the Linux
kernel? Thanks for helping.

regards,

David

* Re: NFS retry on disconnection
  2002-05-11 17:41 NFS retry on disconnection David Chow
@ 2002-05-12 19:58 ` Trond Myklebust
  0 siblings, 0 replies; 12+ messages in thread
From: Trond Myklebust @ 2002-05-12 19:58 UTC (permalink / raw)
  To: David Chow; +Cc: linux-fsdevel

>>>>> " " == David Chow <davidchow@shaolinmicro.com> writes:

     > Dear all, If the NFS server is disconnected for some reason,
     > will the current NFSv3 client retry forever, or will it give
     > up after some timeouts? If I want to change this behaviour, is
     > it a default behaviour of the sunrpc layer in the kernel, or
     > is it some code handled by the NFS client in the Linux
     > kernel? Thanks for helping.

Timeouts & similar is a *REALLY BADLY CONCEIVED* non-starter of an
idea.

If you are adamant that you want all the aggravation of data loss,
programs crashing, user complaints, etc etc though, you can use the
'soft' mount option.

Cheers,
  Trond

* Re: NFS retry on disconnection
@ 2002-05-13 16:17 Bryan Henderson
  2002-05-13 19:53 ` Ion Badulescu
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Bryan Henderson @ 2002-05-13 16:17 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: David Chow, linux-fsdevel

>Timeouts & similar is a *REALLY BADLY CONCEIVED* non-starter of an
>idea.
>
>If you are adamant that you want all the aggravation of data loss,
>programs crashing, user complaints, etc etc though, you can use the
>'soft' mount option.

You sound like someone who has not faced the aggravation of resources that
are hung indefinitely because communication has been lost with an NFS
server which is no longer of any relevance to anything.  I'd say in many
cases that must outweigh the aggravation of data loss and programs crashing
and cause more user complaints.  Or do you have a way besides timeouts to
ease that aggravation?

Though you didn't answer the question, your advice implies that the Linux
NFSv3 filesystem driver retries forever when communication with the NFS
server has been lost.  If not, please correct us.

* Re: NFS retry on disconnection
  2002-05-13 16:17 Bryan Henderson
@ 2002-05-13 19:53 ` Ion Badulescu
  2002-05-14  6:41 ` David Chow
  2002-05-14  7:11 ` Trond Myklebust
  2 siblings, 0 replies; 12+ messages in thread
From: Ion Badulescu @ 2002-05-13 19:53 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: nfs

On Mon, 13 May 2002 09:17:54 -0700, Bryan Henderson <hbryan@us.ibm.com> wrote:

>>If you are adamant that you want all the aggravation of data loss,
>>programs crashing, user complaints, etc etc though, you can use the
>>'soft' mount option.
> 
> You sound like someone who has not faced the aggravation of resources that
> are hung indefinitely because communication has been lost with an NFS
> server which is no longer of any relevance to anything.  I'd say in many
> cases that must outweigh the aggravation of data loss and programs crashing
> and cause more user complaints.  Or do you have a way besides timeouts to
> ease that aggravation?

It's a lose-lose situation, and really the only proper way to fix the mess
is to bring the downed server back up.

Having NFS timeout underneath you is exactly the same as having a hard 
drive fail underneath you. Most if not all applications are utterly 
unprepared to deal with the resulting I/O errors, thus guaranteeing
data corruption.
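
To illustrate what being prepared for these I/O errors would involve, here
is a minimal C sketch. It is not code from this thread: save_buffer() is a
hypothetical helper, and the sketch assumes that on a 'soft' mount a timeout
may surface as EIO from write(), fsync(), or close(), so every one of those
results is checked.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and report the first error
 * seen, including errors that only surface at fsync() or close().
 * Returns 0 on success or a negative errno value such as -EIO. */
int save_buffer(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    const char *p = buf;
    int err = 0;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            err = -errno;
            break;
        }
        p += n;                 /* handle short writes */
        len -= (size_t)n;
    }

    /* Errors from delayed write-back may only be reported here. */
    if (err == 0 && fsync(fd) < 0)
        err = -errno;

    /* close() can report a failure too; it is checked but not retried. */
    if (close(fd) < 0 && err == 0)
        err = -errno;

    return err;
}
```

A caller that discards the return value of save_buffer(), or a program that
never checks fsync() and close() at all, would carry on as if the data had
reached the server.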

The 'soft' mounting is only acceptable for read-only mounts, and even then
only if you really don't care about your input data disappearing at random.
That's AT RANDOM -- it's not a typo, you'll get RANDOM failures, depending
on how loaded your network and/or server are at times.

> Though you didn't answer the question, your advice implies that the Linux
> NFSv3 filesystem driver retries forever when communication with the NFS
> server has been lost.  If not, please correct us.

The default is 'hard' which will retry forever.

You can mount with '-o soft' which will give you the desired data corruption.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.

* Re: NFS retry on disconnection
  2002-05-13 16:17 Bryan Henderson
  2002-05-13 19:53 ` Ion Badulescu
@ 2002-05-14  6:41 ` David Chow
  2002-05-14  7:11 ` Trond Myklebust
  2 siblings, 0 replies; 12+ messages in thread
From: David Chow @ 2002-05-14  6:41 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: Trond Myklebust, linux-fsdevel

On Tue, 2002-05-14 at 00:17, Bryan Henderson wrote:
> 
> >Timeouts & similar is a *REALLY BADLY CONCEIVED* non-starter of an
> >idea.
> >
> >If you are adamant that you want all the aggravation of data loss,
> >programs crashing, user complaints, etc etc though, you can use the
> >'soft' mount option.
> 
> You sound like someone who has not faced the aggravation of resources that
> are hung indefinitely because communication has been lost with an NFS
> server which is no longer of any relevance to anything.  I'd say in many
> cases that must outweigh the aggravation of data loss and programs crashing
> and cause more user complaints.  Or do you have a way besides timeouts to
> ease that aggravation?
> 
> Though you didn't answer the question, your advice implies that the Linux
> NFSv3 filesystem driver retries forever when communication with the NFS
> server has been lost.  If not, please correct us.

Trond is talking about the "hard" and "soft" mount options. Let me rephrase:
let's say I am using hard mounts. If a disconnection occurs, will a
hard-mounted NFS client retry forever, or will the client simply give up,
like a soft mount, after a while? I have experienced some "real
disconnections" after some time (say 5 min). Since I am working on the
server's file system, I am afraid the problem is in my dentry_to_fh() or
fh_to_dentry() code on the server side. I just want to clarify this, so
please answer my question. I also want to know whether there is some way to
change this kind of behavior in the current Linux NFS client code. Thanks.

regards,
David

-
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: NFS retry on disconnection
  2002-05-13 16:17 Bryan Henderson
  2002-05-13 19:53 ` Ion Badulescu
  2002-05-14  6:41 ` David Chow
@ 2002-05-14  7:11 ` Trond Myklebust
  2002-05-14  8:55   ` David Woodhouse
  2 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2002-05-14  7:11 UTC (permalink / raw)
  To: Bryan Henderson; +Cc: David Chow, linux-fsdevel

On Monday 13 May 2002 18:17, Bryan Henderson wrote:

> You sound like someone who has not faced the aggravation of resources that
> are hung indefinitely because communication has been lost with an NFS
> server which is no longer of any relevance to anything.  I'd say in many
> cases that must outweigh the aggravation of data loss and programs crashing
> and cause more user complaints.

Believe me: people tend to complain a lot more vociferously when their program 
goes belly up with an EIO error as a result of server congestion. It always 
appears to come as a big surprise to these people that timeouts can occur due 
to random chance. Try browsing through the L-K or NFS mailing list archives
some day...

'soft' RPC calls (i.e. RPC calls with timeouts) might make some limited sense
on a system where you are using reliable transport (TCP) and have very long 
timeouts. They make no sense whatsoever when designing a filesystem that is 
to run over UDP.

The 'intr' mount option allows the user him/herself to choose to interrupt the RPC
call explicitly. It achieves what you appeared to want without the danger of 
some transient server side issue leading to permanent loss of data for the 
client.

In conclusion therefore:
  - YES: Linux has sane defaults for NFS. We retry indefinitely if the server 
is down.
  - Use 'intr' if you don't care as much about data integrity.
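
As an illustration of how a program could find out which of these mount
options is in effect, here is a minimal C sketch using glibc's getmntent()
and hasmntopt() from <mntent.h>. It is not from this thread: the helper
names is_soft_nfs_mount() and count_soft_nfs_mounts() are hypothetical, and
matching the filesystem types "nfs" and "nfs4" is an assumption about how
the mount table labels NFS mounts.

```c
#include <assert.h>
#include <mntent.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the entry is an NFS mount carrying the 'soft' option,
 * 0 otherwise. An NFS mount without 'soft' is treated as 'hard'. */
int is_soft_nfs_mount(const struct mntent *m)
{
    assert(m != NULL);
    if (strcmp(m->mnt_type, "nfs") != 0 && strcmp(m->mnt_type, "nfs4") != 0)
        return 0;
    /* hasmntopt() matches whole comma-separated options only. */
    return hasmntopt(m, "soft") != NULL;
}

/* Counts soft NFS mounts listed in a mount table file.
 * Returns -1 if the table cannot be opened. */
int count_soft_nfs_mounts(const char *table)
{
    assert(table != NULL);
    FILE *f = setmntent(table, "r");
    if (f == NULL)
        return -1;

    int count = 0;
    struct mntent *m;
    while ((m = getmntent(f)) != NULL)
        count += is_soft_nfs_mount(m);

    endmntent(f);
    return count;
}
```

Calling count_soft_nfs_mounts("/proc/mounts") would be one way to use it,
assuming the table on the system in question lists the 'soft' option for
such mounts.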

Cheers,
  Trond

* Re: NFS retry on disconnection
  2002-05-14  7:11 ` Trond Myklebust
@ 2002-05-14  8:55   ` David Woodhouse
  2002-05-14  9:24     ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: David Woodhouse @ 2002-05-14  8:55 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bryan Henderson, David Chow, linux-fsdevel

trond.myklebust@fys.uio.no said:
>   - Use 'intr' if you don't care as much about data integrity.

If 'intr' jeopardises data integrity, that should be considered a bug.

There's no reason we should allow ourselves to get confused to the point of 
losing data just because the user wants their prompt back -- other than 
'cleaning up is hard'. 

--
dwmw2

* Re: NFS retry on disconnection
  2002-05-14  8:55   ` David Woodhouse
@ 2002-05-14  9:24     ` Trond Myklebust
  2002-05-14  9:30       ` David Woodhouse
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2002-05-14  9:24 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Bryan Henderson, David Chow, linux-fsdevel

On Tuesday 14 May 2002 10:55, David Woodhouse wrote:
> trond.myklebust@fys.uio.no said:
> >   - Use 'intr' if you don't care as much about data integrity.
>
> If 'intr' jeopardises data integrity, that should be considered a bug.
>
> There's no reason we should allow ourselves to get confused to the point of
> losing data just because the user wants their prompt back -- other than
> 'cleaning up is hard'.

Some things are unavoidable. The standard NFS close-to-open cache consistency 
rules are not, for instance, compatible with 'intr' since the latter breaks 
requirements such as 'close() must wait for all data to have been flushed to 
disk'.

Some things should be looked into. For instance, w.r.t. file locking the
function 'locks_unlock_delete()' doesn't actually try to check for error 
conditions before it clears a lock from the lists. That means that the server 
can easily be left with a lock that never gets cleared.

As you can see, there are good reasons why 'intr' too is not a default mount 
option...

Cheers,
  Trond

* Re: NFS retry on disconnection
  2002-05-14  9:24     ` Trond Myklebust
@ 2002-05-14  9:30       ` David Woodhouse
  2002-05-14  9:41         ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: David Woodhouse @ 2002-05-14  9:30 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bryan Henderson, David Chow, linux-fsdevel

trond.myklebust@fys.uio.no said:
>  Some things are unavoidable. The standard NFS close-to-open cache
> consistency  rules are not, for instance, compatible with 'intr' since
> the latter breaks  requirements such as 'close() must wait for all
> data to have been flushed to  disk'.

If the sys_close() system call is interrupted, surely it returns -EINTR and 
the file is still open? If userspace doesn't check that, this is not our 
problem.

And if we _must_ let the sys_close() call succeed even though it was 
interrupted, surely we can enforce cache consistency before allowing the 
next open() to succeed?

--
dwmw2

* Re: NFS retry on disconnection
  2002-05-14  9:30       ` David Woodhouse
@ 2002-05-14  9:41         ` Trond Myklebust
  2002-05-14  9:46           ` David Woodhouse
  0 siblings, 1 reply; 12+ messages in thread
From: Trond Myklebust @ 2002-05-14  9:41 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Bryan Henderson, David Chow, linux-fsdevel

On Tuesday 14 May 2002 11:30, David Woodhouse wrote:

> If the sys_close() system call is interrupted, surely it returns -EINTR and
> the file is still open? If userspace doesn't check that, this is not our
> problem.

Look at the code for close on exit. I can't see that those EINTRs actually do 
get propagated to userland.

> And if we _must_ let the sys_close() call succeed even though it was
> interrupted, surely we can enforce cache consistency before allowing the
> next open() to succeed?

Provided you have only one client, yes. The close-to-open consistency rules 
are there to ensure that 2 different NFS clients can operate in serial (i.e. 
not simultaneously) on the same set of files without major cache consistency 
problems.

Cheers,
  Trond

* Re: NFS retry on disconnection
  2002-05-14  9:41         ` Trond Myklebust
@ 2002-05-14  9:46           ` David Woodhouse
  2002-05-14 10:11             ` Trond Myklebust
  0 siblings, 1 reply; 12+ messages in thread
From: David Woodhouse @ 2002-05-14  9:46 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Bryan Henderson, David Chow, linux-fsdevel


trond.myklebust@fys.uio.no said:
>  Look at the code for close on exit. I can't see that those EINTRs
> actually do  get propagated to userland. 

True. I suppose you could make a case for ignoring signals in the
close-on-exit case, or just attempt to do the flush as soon as possible 
thereafter. Are there any other cases where it's actually a problem?

--
dwmw2



* Re: NFS retry on disconnection
  2002-05-14  9:46           ` David Woodhouse
@ 2002-05-14 10:11             ` Trond Myklebust
  0 siblings, 0 replies; 12+ messages in thread
From: Trond Myklebust @ 2002-05-14 10:11 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Bryan Henderson, David Chow, linux-fsdevel

On Tuesday 14 May 2002 11:46, David Woodhouse wrote:
> True. I suppose you could make a case for ignoring signals in the
> close-on-exit case, or just attempt to do the flush as soon as possible
> thereafter.

In theory you could still get into a mess if some script uses lockfile or a 
similar utility in order to serialize write access between 2 different NFS 
clients to the same file.
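
For readers unfamiliar with lock-file serialization, here is a minimal C
sketch of one well-known approach, the link()-based scheme described in the
Linux open(2) manual page. It is an illustration only and is not claimed to
be how the lockfile utility itself works; try_lock() and release_lock() are
hypothetical names, and the caller is assumed to supply a unique_path on the
same filesystem (for example one built from the hostname and PID).

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Try to take the lock at lock_path. Returns 0 if acquired, -1 if not. */
int try_lock(const char *lock_path, const char *unique_path)
{
    assert(lock_path != NULL && unique_path != NULL);

    /* Create a file that only this process will use. */
    int fd = open(unique_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    /* Attempt to hard-link it to the lock name. The return value is
     * ignored on purpose: the link count is checked instead. */
    (void)link(unique_path, lock_path);

    struct stat st;
    int acquired = (stat(unique_path, &st) == 0 && st.st_nlink == 2);

    /* The unique name is no longer needed; the lock name stays if we won. */
    unlink(unique_path);
    return acquired ? 0 : -1;
}

/* Release a lock previously taken with try_lock(). */
int release_lock(const char *lock_path)
{
    assert(lock_path != NULL);
    return unlink(lock_path);
}
```

The lock only orders the writers. The second client sees the first client's
data only if that data was flushed before the lock was released, which is
the step an interrupted close() could skip.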

>  Are there any other cases where it's actually a problem?

mmap() is usually good for a laugh or two when discussing the effects of lack 
of cache consistency, but offhand I can't think of any application that would 
care...

Cheers,
  Trond
