From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from woodbine.london.02.net ([87.194.255.145]:33177 "EHLO
	woodbine.london.02.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S965798Ab0GPPUW (ORCPT ); Fri, 16 Jul 2010 11:20:22 -0400
Received: from [192.168.1.65] (87.194.12.203) by woodbine.london.02.net
	(8.5.124.03) id 4C1F980300D8110B for linux-nfs@vger.kernel.org;
	Fri, 16 Jul 2010 16:20:21 +0100
Message-ID: <4C4078AE.5070300@limepepper.co.uk>
Date: Fri, 16 Jul 2010 16:20:14 +0100
From: Tom H
To: linux-nfs@vger.kernel.org
Subject: why do attempts to access a nfs v3 filesystem (ro,soft) block the
	process for minutes at a time? (when the nfs server is down)
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Sender: linux-nfs-owner@vger.kernel.org
List-ID:
MIME-Version: 1.0

(apologies for the cross post from the deprecated list)

Hi all,

I have a web server which serves some content from an NFS filesystem mounted like so:

nfsserver1:/somemount /var/www/html/somefiles nfs rw,soft 0 0

# mount | grep nfs
nfsserver1:/somemount on /var/www/html/somefiles type nfs (ro,soft,addr=xx.xx.xx.xx)

According to the documentation, an NFS operation on a soft mount should wait for a "major timeout", then report "server not responding" to syslog and return an error, where a major timeout occurs after the default retrans=3 retransmissions.

I understand the process to be like this:

call ---> 0.7 secs ---> retransmission ---> 1.4 secs ---> retransmission ---> 2.8 secs ---> server not responding (major timeout)

However, it is pretty clear that this is retrying indefinitely (or at least many more times than I would like), as the log files show loads of:

Jul 16 07:56:09 server1 kernel: nfs: server server2 not responding, timed out
Jul 16 07:57:09 server1 last message repeated 4 times
Jul 16 07:57:09 server1 last message repeated 6 times

and eventually this kills the Apache server, as all the available processes are blocked while the requests are retried, until the Apache server is restarted. (Restarting the NFS server at this point does not seem to recover the Apache child processes.)

So what should my strategy be to stop the failed mount killing Apache? I care more about Apache staying up, as I don't have that much control over the NFS server.

(Also, I noticed that it seems to time out quicker with the mount options set explicitly, like (soft,timeo=7,retrans=3), which is unexpected, because those are supposed to be the defaults.)

Regards and thanks in advance,

T
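
P.S. For reference, here is the arithmetic behind the retry sequence as I understand it, sketched in Python. This only models my reading of the documentation (timeo in tenths of a second, each wait double the previous one, three waits before the major timeout); it is not a description of what the kernel client actually does. The function names and the `waits` parameter are made up for illustration.

```python
def retry_waits(timeo_ds=7, waits=3):
    """Return each wait in seconds, assuming the wait doubles every time.

    timeo_ds is in tenths of a second, like the timeo= mount option.
    waits is the number of waits before giving up (an assumption taken
    from the call/retransmission sequence described in this message).
    """
    return [timeo_ds * 2 ** i / 10 for i in range(waits)]


def major_timeout(timeo_ds=7, waits=3):
    """Total seconds until the assumed major timeout (sum of all waits)."""
    # Sum in integer deciseconds first to avoid floating-point drift.
    return sum(timeo_ds * 2 ** i for i in range(waits)) / 10


if __name__ == "__main__":
    print(retry_waits())    # [0.7, 1.4, 2.8]
    print(major_timeout())  # 4.9
```

Under these assumptions a request should fail after roughly five seconds, which is what makes the minutes-long blocking in the logs above so surprising.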