why do attempts to access a nfs v3 filesystem (ro,soft) block the process for minutes at a time? (when the nfs server is down)

linux-nfs.vger.kernel.org archive mirror

* why do attempts to access a nfs v3 filesystem (ro,soft) block the process for minutes at a time? (when the nfs server is down)
From: Tom H @ 2010-07-16 15:20 UTC
  To: linux-nfs

(Apologies for the cross-post from the deprecated list.)

Hi all,

I have a web server that serves some content from an NFS filesystem
mounted like so:
nfsserver1:/somemount /var/www/html/somefiles  nfs  rw,soft  0 0

# mount | grep nfs
nfsserver1:/somemount on /var/www/html/somefiles type nfs (ro,soft,addr=xx.xx.xx.xx)

According to the documentation, an NFS operation on a soft mount should
wait for a "major timeout", then report "server not responding" to
syslog and return an error, where a major timeout occurs after the
default retrans=3 retransmissions.

I understand the process to be like this:
call ---> 0.7 secs ---> retransmission ---> 1.4 secs ---> retransmission
---> 2.8 secs ---> server not responding (major timeout)
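The timeline above can be sketched in Python as a simplified model of the doubling backoff that nfs(5) describes for UDP mounts. The helper name `udp_backoff_schedule` is hypothetical, and the model assumes one initial transmission followed by `retrans` retransmissions, with the timeout doubling after each one up to a 60-second cap. It does not model the adaptive timeout estimation the client uses for frequent UDP requests such as READ and WRITE, so the elapsed times are rough figures, not a prediction of exactly when the kernel gives up.

```python
# Simplified model of the doubling (exponential) retransmit backoff that
# nfs(5) describes for NFS over UDP. All times are in deciseconds (tenths
# of a second), the unit used by the timeo mount option. The kernel's
# adaptive timeout estimation for frequent requests is NOT modeled.
from typing import List

UDP_MAX_TIMEOUT_DS = 600  # 60-second cap on a single timeout


def udp_backoff_schedule(timeo_ds: int, retrans: int) -> List[int]:
    """Hypothetical helper: the wait before each timeout fires.

    Assumes one initial transmission plus `retrans` retransmissions,
    with the timeout doubling after each one, capped at 60 seconds.
    """
    waits = []
    timeout = timeo_ds
    for _ in range(retrans + 1):
        waits.append(min(timeout, UDP_MAX_TIMEOUT_DS))
        timeout *= 2
    return waits


if __name__ == "__main__":
    elapsed = 0
    for attempt, wait in enumerate(udp_backoff_schedule(timeo_ds=7, retrans=3)):
        elapsed += wait
        label = "initial call" if attempt == 0 else f"retransmission {attempt}"
        print(f"{label}: waits {wait / 10:.1f} s ({elapsed / 10:.1f} s elapsed)")
```

With timeo=7 and retrans=3 this model gives per-try waits of 0.7, 1.4, 2.8 and 5.6 s, about 10.5 s in total. Integer deciseconds are used to match the unit of the timeo option and to avoid floating-point rounding.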

However, it is pretty clear that this is retrying indefinitely (or at
least many more times than I would like), as the log files show loads of:
Jul 16 07:56:09 server1 kernel: nfs: server server2 not responding, timed out
Jul 16 07:57:09 server1 last message repeated 4 times
Jul 16 07:57:09 server1 last message repeated 6 times

Eventually this kills the Apache server, because all the available
processes stay blocked in this "retrying indefinitely" state until
Apache is restarted. (Restarting the NFS server at this point does not
seem to recover the Apache child processes.)

So what strategy should I use to stop a failed mount from killing Apache?
I care more about Apache staying up, as I don't have much control
over the NFS server.

(I also noticed that it seems to time out more quickly with the mount
options set explicitly to (soft, timeo=7, retrans=3), which is unexpected
because those are supposed to be the defaults.)

Regards and thanks in advance,
T

* Re: why do attempts to access a nfs v3 filesystem (ro,soft) block the process for minutes at a time? (when the nfs server is down)
From: Chuck Lever @ 2010-07-16 15:25 UTC
  To: Tom H; +Cc: linux-nfs

On 07/16/2010 11:20 AM, Tom H wrote:
> (also I noticed that it seems to timeout quicker with the mount options
> set like (soft, timeo=7, retrans=3) which is unexpected, because they
> are supposed to be the default)

They are the default settings for UDP mounts, but you didn't specify
UDP.  TCP is the default transport protocol, and has been for some time.
TCP uses a long retransmit timeout.  See nfs(5).
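To make the contrast concrete, here is a hypothetical Python sketch of the linear backoff that nfs(5) describes for TCP mounts: the first timeout is timeo (600 deciseconds, i.e. 60 seconds, by default) and each retransmission adds another timeo, up to 600 seconds. The helper name `tcp_backoff_schedule` is made up for illustration, the default retrans=2 is an assumption (check nfs(5) on your system), and time spent reconnecting to a dead server is not modeled.

```python
# Simplified model of the linear retransmit backoff that nfs(5) describes
# for NFS over TCP. Times are in deciseconds, as for the timeo option.
# Time spent reconnecting to a dead server is NOT modeled.
from typing import List

TCP_MAX_TIMEOUT_DS = 6000  # 600-second cap on a single timeout


def tcp_backoff_schedule(timeo_ds: int = 600, retrans: int = 2) -> List[int]:
    """Hypothetical helper: per-try timeouts for one send plus `retrans` retries.

    Each retransmission adds another timeo to the timeout, capped at 600 s.
    The defaults (timeo=600, retrans=2) are assumed TCP defaults.
    """
    return [min(timeo_ds * (n + 1), TCP_MAX_TIMEOUT_DS) for n in range(retrans + 1)]


if __name__ == "__main__":
    waits = tcp_backoff_schedule()
    print("TCP per-try timeouts (s):", [w // 10 for w in waits])
```

With those assumed defaults the per-try timeouts come out as 60, 120 and 180 s. Under this model, the first retransmission is not even sent until a minute after the original request, compared with 0.7 s for the UDP timeo=7 setting quoted above.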

-- 
Chuck Lever

* Re: why do attempts to access a nfs v3 filesystem (ro,soft) block the process for minutes at a time? (when the nfs server is down)
From: Tom H @ 2010-07-16 16:10 UTC
  Cc: linux-nfs

Chuck Lever wrote:
> On 07/16/2010 11:20 AM, Tom H wrote:
>>
>> (also I noticed that it seems to timeout quicker with the mount options
>> set like (soft, timeo=7, retrans=3) which is unexpected, because they
>> are supposed to be the default)
>
> They are the default settings for UDP mounts, but you didn't specify 
> UDP.  TCP is the default transport protocol, and has been for some 
> time.  TCP uses a long retransmit timeout.  See nfs(5).
>
OK, I see that now. Thanks!

However, after further experimentation with the mount options
(ro,soft,retrans=0,timeo=0,intr,proto=tcp), requests to a failed NFS
filesystem still block the Apache process for an apparently random
time of up to 3 minutes.

Cheers
T

* Re: why do attempts to access a nfs v3 filesystem (ro,soft) block the process for minutes at a time? (when the nfs server is down)
From: Chuck Lever @ 2010-07-16 16:26 UTC
  To: Tom H; +Cc: linux-nfs

On 07/16/10 12:10 PM, Tom H wrote:
> However further experimentation with mount options
> (ro,soft,retrans=0,timeo=0,intr,proto=tcp) - requests to a failed nfs
> file-system still block the apache process for some apparently random
> time up to 3 minutes.

I don't know exactly what retrans=0 and timeo=0 might do, but short 
timeouts over TCP are not recommended.  If you want it to fail sooner 
(and your network is clean enough), use proto=udp.
