public inbox for linux-nfs@vger.kernel.org
* nfs: server not responding
@ 2009-05-16  0:57 Jerome Walters
  0 siblings, 0 replies; 27+ messages in thread
From: Jerome Walters @ 2009-05-16  0:57 UTC (permalink / raw)
  To: linux-nfs

Description of problem:
Periodically, and with no obvious cause, all NFS connections between our
Debian Testing (_Squeeze_) x86 client (a diskless node which uses nfsroot
and boots from the server) and our Debian Testing (_Squeeze_) x86 server
hang, and dmesg on the client side reports that the server is
"not responding".

The server is responding to everyone else's requests.

Restarting the nfsd on the server doesn't appear to solve the problem.

At first I wasn't able to capture any debug information since /var/log was
mounted over NFS, so I have installed a hard drive on which I mounted
only /var/log, to be able to capture debug logs from the client as well.


Debug Logs:
http://fixity.net/tmp/client.log.gz - Kernel RPC Debug Log from the client
http://fixity.net/tmp/server.log.gz - Kernel RPC Debug Log from the server


How reproducible:
Happens from 10 to 90 minutes after booting the diskless node.


Actual results:
NFS connections stop responding, and the system hangs or becomes very slow
and unresponsive (it doesn't respond to Ctrl+Alt+Del either). 60 to 90
minutes after the first server timeout, the client says server OK but the
client is still unresponsive. Immediately after that the client logs a
server connection loss again, which leads to a continuous loop. The client
is still unresponsive. Sometimes the client resumes normal operation for a
couple of hours, but then the problem repeats.


Connectivity info:
Both the client and the server are connected to a Gigabit Ethernet Cisco
Metro series manageable switch. Both of them use Intel Pro 82545GM Gigabit
Ethernet Server Controllers. Neither of them logs any Ethernet errors, and
none are logged by the switch.


Expected results:
NFS connections continue to function and don't fail like clockwork when
every other client on the network has no issues.


Client & Server Load:
For the purposes of testing both machines were only running needed daemons
and weren’t loaded at all.


Client & Server Kernel:
On both the client and the server, a custom-compiled Linux 2.6.29.3 kernel
was used.
Configuration file @ http://fixity.net/tmp/config-2.6.29.3.gz


Client & Server Network interface fragmented packet queue length:
net.ipv4.ipfrag_high_thresh = 524288
net.ipv4.ipfrag_low_thresh = 393216


Client Versions:
libnfsidmap2/squeeze uptodate 0.21-2
nfs-common/squeeze uptodate 1:1.1.4-1


Client Mount (cat /proc/mounts | grep nfsroot):
10.11.11.1:/nfsroot / nfs rw,vers=3,rsize=524288,wsize=524288,namlen=255,hard,nointr,nolock,proto=tcp,timeo=7,retrans=10,sec=sys,addr=10.11.11.1 0 0
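
For anyone who wants to compare these options across several clients, here
is a minimal Python sketch that splits the mount line reported above into
its fields and converts timeo into seconds (timeo is expressed in tenths of
a second). The function name parse_mount_line is purely illustrative and is
not part of any NFS tool.

```python
def parse_mount_line(line):
    """Split one /proc/mounts line into its fields and an options dict."""
    device, mountpoint, fstype, opts, _dump, _passno = line.split()
    options = {}
    for item in opts.split(","):
        key, sep, value = item.partition("=")
        # Flags such as "hard" or "nolock" carry no value; store True for them.
        options[key] = value if sep else True
    return device, mountpoint, fstype, options


# The mount line from the report, with the line wrapping removed.
line = ("10.11.11.1:/nfsroot / nfs rw,vers=3,rsize=524288,wsize=524288,"
        "namlen=255,hard,nointr,nolock,proto=tcp,timeo=7,retrans=10,"
        "sec=sys,addr=10.11.11.1 0 0")

device, mountpoint, fstype, options = parse_mount_line(line)
timeo_seconds = int(options["timeo"]) / 10  # timeo is in tenths of a second

print(f"{device} on {mountpoint} ({fstype}): proto={options['proto']}, "
      f"timeo={timeo_seconds}s, retrans={options['retrans']}")
```

On a live client the same function could be fed each line of /proc/mounts
whose filesystem type is nfs.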


Client fstab:
proc            /proc           proc    defaults        0       0
/dev/nfs        /               nfs     defaults        1       1
none            /tmp            tmpfs   defaults        0       0
none            /var/run        tmpfs   defaults        0       0
none            /var/lock       tmpfs   defaults        0       0
none            /var/tmp        tmpfs   defaults        0       0


Client Daemons:
portmap, rpc.statd, rpc.idmapd


Server Daemons:
portmap, rpc.statd, rpc.idmapd, rpc.mountd --manage-gids


Server Versions:
libnfsidmap2/squeeze uptodate 0.21-2
nfs-common/squeeze uptodate 1:1.1.4-1
nfs-kernel-server/testing uptodate 1:1.1.4-1


Server Export:
/nfsroot 10.11.11.*(rw,no_root_squash,async,no_subtree_check)


Server Options:
RPCNFSDCOUNT=16
RPCNFSDPRIORITY=0
RPCMOUNTDOPTS=--manage-gids
NEED_SVCGSSD=no
RPCSVCGSSDOPTS=no


Additional Info:
Since I have read that tweaking the nfsroot mount options could improve the
situation, I have tested with different options as follows:
rsize/wsize=1024|2048|4096|8192|32768|524288
timeo=15|60|600
retrans=3|10|20
None of them solved the problem.





Any help or suggestions on fixing the problem would be highly appreciated. I
have been wrestling with this problem for the last couple of weeks and have
run out of ideas.


Best Regards,
Jerome Walters


* NFS server not responding
@ 2003-11-27 12:00 Douglas Furlong
  2003-11-27 16:30 ` Trond Myklebust
  2003-11-28  8:46 ` Juergen Sauer
  0 siblings, 2 replies; 27+ messages in thread
From: Douglas Furlong @ 2003-11-27 12:00 UTC (permalink / raw)
  To: nfs

Good day all.

I am running into excessive amounts of NFS errors as below.

kernel: nfs: server neon not responding, still trying
kernel: nfs: server neon OK

I was hoping that some of you may be able to provide me with some
assistance.

First The Hardware
------------------
Neon: FileServer
Disks: 4xSATA connected to a HighPoint RAID controller. I am using their
drivers but using Linux software RAID (md0). This stores the bulk of the
data.
	1xATA connected to on-board IDE; this has the rest of the OS on it.
Network Card: 3c905 (more details can be obtained if needed).
OS: Redhat9 + all current updates + statd version 1.0.6 (from sf.net)
Authentication/User Details: Via an OpenLDAP server
Memory: 512MB
CPU: XP2800

Wibbit: Workstation
Disks: Normal ATA disk.
Network Card: 3c905 I believe.
OS: Fedora Core1 (was previously RedHat9 suffering the same problems)
Authentication/User Details: Via an OpenLDAP server
Memory: 512MB
CPU: XP2200

Network: Switched 10/100. Fileserver connected to a HP switch,
workstations connected to the HP switch via smaller 5port switches.


The Software
------------

Server
------
A bit more about the software.

The server is using an LDAP server (on the same physical network,
separate IP network) to authenticate users' credentials. nscd is running
and working on this machine.
I have exported several directory structures including home drives from
this machine.

/etc/exports
/mnt/raid/ISO/          192.168.0.1/255.255.255.0(ro,sync)
/mnt/raid/home          192.168.0.1/255.255.255.0(rw,sync)
/mnt/raid/Operations    192.168.0.1/255.255.255.0(rw,sync)
/mnt/raid/Systems       192.168.0.1/255.255.255.0(rw,sync)
/mnt/raid/CustomerServices      192.168.0.1/255.255.255.0(rw,sync)
/mnt/raid/cvs           192.168.0.1/255.255.255.0(rw,sync)
/opt    192.168.0.1/255.255.255.0(rw,sync)
# For testing using iozone
/mnt/raid/test          192.168.0.150(rw,sync,no_root_squash)

I have upgraded the version of statd due to a problem reported on a
newsgroup referring to a problem with RedHat's patches. I am not sure if
it was causing the problem, but I was (am) running out of ideas. The
patch was with regards to statd dropping root privileges.

Clients
-------
All of my testing is being done from my client, however I have about 16
Linux desktops with their home directories mounted off of Neon, and
numerous applications that are mounted off of Neon (oh plus the data).

/etc/fstab
# NFS Mounts
neon:/mnt/raid/home     /home                   nfs    wsize=8192,rsize=8192,intr,hard 0 0
neon:/mnt/raid/ISO/     /mnt/neon/iso           nfs    wsize=8192,rsize=8192,intr,hard 0 0
neon:/opt               /opt                    nfs    wsize=8192,rsize=8192,intr,hard 0 0

# NFS Mount for testing
neon:/mnt/raid/test     /mnt/neon/test          nfs    rw,hard,intr,rsize=8192,wsize=8192 0 0

I have started nfslock on both the clients and server, as well as nfs.

Usability
---------
When my users are working on their Linux machines, they notice from time
to time that they get intermittent "freezing" where applications stop
responding, they are unable to switch desktops, or they get error messages
from Evolution saying it can't store data.
All of these freezes coincide with error messages like the below
appearing in /var/log/messages:
kernel: nfs: server neon not responding, still trying
kernel: nfs: server neon OK
The above can be repeated hundreds of times over the course of several
hours.
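
To put a number on "hundreds of times", the two kernel messages can be
tallied per server. The following is a minimal Python sketch; the function
name count_nfs_events and the sample lines (timestamps and the client
hostname in particular) are made up for illustration, and only the message
text after "kernel:" is taken from the report above.

```python
import re
from collections import Counter

# Matches the two kernel messages quoted in the report.
PATTERN = re.compile(
    r"kernel: nfs: server (\S+) (not responding, still trying|OK)"
)


def count_nfs_events(lines):
    """Count timeout and recovery messages per NFS server name."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            server, state = match.groups()
            kind = "recovered" if state == "OK" else "timeout"
            counts[(server, kind)] += 1
    return counts


# Hypothetical sample; real input would be the lines of /var/log/messages.
sample = [
    "Nov 27 11:58:01 wibbit kernel: nfs: server neon not responding, still trying",
    "Nov 27 11:58:03 wibbit kernel: nfs: server neon OK",
    "Nov 27 11:59:40 wibbit kernel: nfs: server neon not responding, still trying",
    "Nov 27 11:59:41 wibbit sshd[1234]: unrelated line",
]

counts = count_nfs_events(sample)
for (server, kind), n in sorted(counts.items()):
    print(f"{server}: {kind} x{n}")
```

Comparing the timeout count with the recovery count also shows whether the
client was left waiting at the end of the log.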

I had attempted to set up a network install of OpenOffice, but this
caused the machines to become 100% unusable due to OpenOffice tying up
the system. Setting the mount option to soft prevented this; however,
OpenOffice was then not usable (it would not start).

However, I am able to run Phoenix and aMSN off of the NFS server, but I
do find at times that there is a delay opening/closing the browser. I
believe this is once again down to NFS timeouts.

Below is a cat of the nfsd file in /proc/net/rpc, I am not sure what the
th value should be, but I think those numbers are quite high.

[root@neon rpc]# cat nfsd 
rc 70031 9018069 27954571
fh 10717 36541222 0 278580 494554
io 3860485896 4234117935
th 32 73218 6754.760 3694.770 2485.590 1861.300 1778.710 906.570 689.360 588.490 494.790 5316.810
ra 64 4680995 22399 14758 7499 4804 4549 2906 2844 2000 2174 306976
net 37042672 37042672 0 0
rpc 37042671 1 1 0 0
proc2 18 2 330 0 0 244 0 1306091 0 0 0 0 0 0 0 0 0 17 25
proc3 22 2 16164612 257385 4123444 1202703 5040 3745880 7412118 526581 2427 5126 108 398040 2342 350136 133820 68430 20129 37392 11528 0 1268719
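
On reading the th line: as far as I understand the format on kernels of
this era, the first number is the count of nfsd threads, the second is how
many times all threads were busy at once, and the remaining ten values are
a histogram of seconds spent in each 10% band of thread utilisation. That
interpretation is an assumption worth checking against the documentation
for your kernel. A minimal Python sketch (parse_th_line is an illustrative
name) that splits the line from the output above:

```python
def parse_th_line(line):
    """Split the 'th' line of /proc/net/rpc/nfsd into its three parts."""
    fields = line.split()
    if not fields or fields[0] != "th":
        raise ValueError("not a 'th' line")
    threads = int(fields[1])          # number of nfsd threads
    all_busy = int(fields[2])         # times every thread was in use
    histogram = [float(x) for x in fields[3:]]  # seconds per 10% band
    return threads, all_busy, histogram


# The th line from the output above, joined onto a single line.
th_line = ("th 32 73218 6754.760 3694.770 2485.590 1861.300 1778.710 "
           "906.570 689.360 588.490 494.790 5316.810")

threads, all_busy, histogram = parse_th_line(th_line)
print(f"threads={threads}, all threads busy {all_busy} times")
for i, seconds in enumerate(histogram):
    # Assumed band layout: bucket i covers i*10% to (i+1)*10% utilisation.
    print(f"{i * 10:3d}-{(i + 1) * 10:3d}% busy: {seconds:10.3f} s")
```

If that reading is right, a large second field together with a large last
bucket would suggest the server regularly had every thread occupied.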

Does anyone have any hints or suggestions that I could take away and
work with?

Cheers

doug





end of thread, other threads:[~2009-05-16  1:00 UTC | newest]

Thread overview: 27+ messages
2009-05-16  0:57 nfs: server not responding Jerome Walters
  -- strict thread matches above, loose matches on Subject: below --
2003-11-27 12:00 NFS " Douglas Furlong
2003-11-27 16:30 ` Trond Myklebust
2003-11-27 19:07   ` Douglas Furlong
2003-11-27 20:02     ` Trond Myklebust
2003-11-28  8:46 ` Juergen Sauer
2003-11-28  9:37   ` Douglas Furlong
2003-11-28 10:11     ` Juergen Sauer
2003-11-28 10:48       ` Douglas Furlong
2003-11-28 12:28         ` Bogdan Costescu
2003-11-28 16:56           ` Trond Myklebust
2003-11-28 18:43             ` Bogdan Costescu
2003-12-02 14:37             ` Douglas Furlong
2003-12-02 15:37               ` Trond Myklebust
2003-12-04 17:17                 ` Steve Dickson
2003-12-04 17:37             ` Steve Dickson
2003-12-04 18:39               ` Trond Myklebust
2003-12-04 19:11                 ` Steve Dickson
2003-12-04 20:55                   ` seth vidal
2003-12-04 21:24                     ` Steve Dickson
2003-12-05  2:53                       ` Kyle Rose
2003-12-09 19:47                         ` Steve Dickson
2003-12-09 20:09                           ` Kyle Rose
2003-12-05 15:50                 ` Bogdan Costescu
2003-11-30 20:01           ` seth vidal
2003-12-01 10:58             ` Bogdan Costescu
2003-11-28 12:36       ` Bogdan Costescu
