From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jean-Christophe Ducom
Subject: Re: processes stuck in....?
Date: Tue, 06 May 2003 17:43:26 -0500
Sender: nfs-admin@lists.sourceforge.net
Message-ID: <3EB83A8E.5050702@nd.edu>
References: <482A3FA0050D21419C269D13989C6113127DAD@lavender-fe.eng.netapp.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Return-path:
Received: from pickering.cc.nd.edu ([129.74.250.225])
	by sc8-sf-list1.sourceforge.net with esmtp (Cipher TLSv1:DES-CBC3-SHA:168)
	(Exim 3.31-VA-mm2 #1 (Debian)) id 19DBCC-0003lU-00
	for ; Tue, 06 May 2003 15:46:00 -0700
To: "Lever, Charles" , nfs
Errors-To: nfs-admin@lists.sourceforge.net
List-Help:
List-Post:
List-Subscribe: ,
List-Id: Discussion of NFS under Linux development, interoperability, and testing.
List-Unsubscribe: ,
List-Archive:

Thanks for your email, Charles.

> your symptoms sound like a hardware problem to me. have
> you checked your mainboard and memory?

Except that
a) if I run the program locally, everything is fine
b) the node is part of a medium-size cluster and this happens on basically
every node.

One thing I noticed is that if I 'tail -f big_file' on the server, I get
"Stale NFS file handle" several times.
I'll use Fstress and other tests to reproduce the problem and hopefully will
have more debug messages to submit.

Thanks again for your help

JC

>
>
>>-----Original Message-----
>>From: Jean-Christophe Ducom [mailto:jducom@nd.edu]
>>Sent: Tuesday, May 06, 2003 3:23 PM
>>To: nfs@lists.sourceforge.net
>>Subject: [NFS] processes stuck in....?
>>
>>
>>Hi,
>>
>>I'd like to get some feedback on a problem that freezes SMP clients so
>>hard that
>>a) even when a monitor is plugged in, there is no display (no control
>>from the keyboard either)
>>b) there is no terminal console access via the serial port
>>c) apparently even nmi_watchdog can't get out of it (no report)
>>This happens pretty consistently when the NFS client creates a lot of
>>files (like 800+) or writes a big file.
>>I don't have any tcpdump or anything else for now.
>>Any ideas/suggestions (change mount options?)
>>Thanks for any help or feedback
>>
>>JC
>>
>>
>>Client: Precision 530 Dual 1.7GHz Xeon with Syskonnect SK9D21 GigE,
>>Redhat 7.2, kernel 2.4.21-rc1, nfs-utils-1.0.3-1 with ext2 file
>># rpcinfo -p
>>   program vers proto   port
>>    100000    2   tcp    111  portmapper
>>    100000    2   udp    111  portmapper
>>    100021    1   udp  32768  nlockmgr
>>    100021    3   udp  32768  nlockmgr
>>    100021    4   udp  32768  nlockmgr
>>    100024    1   udp    904  status
>>    100024    1   tcp    907  status
>>The directories are mounted with these options:
>>10.0.0.10:/export/data /opt/data nfs rw,nosuid,nodev,hard,intr,bg,rsize=8192,wsize=8192 0 0
>>
>>Excerpt from /var/log/messages:
>>May 6 13:59:27 bob1 kernel: sk9dlin: Network Device Driver v1.03
>>May 6 13:59:27 bob1 kernel: (C)Copyright 2001 SysKonnect GmbH.
>>May 6 13:59:27 bob1 kernel: eth0: SysKonnect SK-9D21 Gigabit Ethernet
>>May 6 13:59:27 bob1 kernel: eth0: network connection down
>>May 6 13:59:27 bob1 kernel: eth0: NIC Link is downeth0: network connection down
>>May 6 13:59:27 bob1 kernel: eth0: Network connection up using
>>May 6 13:59:27 bob1 kernel: speed = 1000 Mbps
>>May 6 13:59:27 bob1 kernel: duplex mode = full
>>May 6 13:59:27 bob1 kernel: card = copper
>>May 6 13:59:27 bob1 kernel: flowctrl = none
>>May 6 13:59:27 bob1 kernel: autoneg = no
>>.....
>>May 6 13:59:24 bob1 sysctl: net.ipv4.ip_forward = 0
>>May 6 13:59:24 bob1 sysctl: net.ipv4.conf.default.rp_filter = 1
>>May 6 13:59:24 bob1 sysctl: kernel.sysrq = 0
>>May 6 13:59:24 bob1 sysctl: kernel.shmall = 33554432
>>May 6 13:59:24 bob1 sysctl: kernel.shmmax = 536870912
>>May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_fin_timeout = 30
>>May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_keepalive_time = 1800
>>May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_window_scaling = 0
>>May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_sack = 0
>>May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_timestamps = 0
>>May 6 13:59:24 bob1 sysctl: net.ipv4.ip_no_pmtu_disc = 1
>>May 6 13:59:24 bob1 sysctl: net.core.rmem_max = 262143
>>May 6 13:59:24 bob1 sysctl: net.core.rmem_default = 262143
>>May 6 13:59:24 bob1 sysctl: net.core.wmem_max = 262143
>>.....
>>May 6 13:59:34 bob1 rpc.lockd: lockdsvc: Function not implemented
>>May 6 13:59:34 bob1 nfslock: rpc.lockd startup failed
>>May 6 13:59:34 bob1 rpc.statd[724]: Version 1.0.3 Starting
>>May 6 13:59:34 bob1 nfslock: rpc.statd startup succeeded
>>
>>
>>-----------------------------
>>Server: 530 Dual 1.7GhZ Xeon with Syskonnect SK9D21 GigE, Redhat 7.2,
>>kernel 2.4.21-rc1, nfs-utils-1.0.3-1 with ext3 on a raid5
>># rpcinfo -p
>>   program vers proto   port
>>    100000    2   tcp    111  portmapper
>>    100000    2   udp    111  portmapper
>>    100011    1   udp    696  rquotad
>>    100011    2   udp    696  rquotad
>>    100011    1   tcp    699  rquotad
>>    100011    2   tcp    699  rquotad
>>    100003    2   udp   2049  nfs
>>    100003    3   udp   2049  nfs
>>    100021    1   udp  32768  nlockmgr
>>    100021    3   udp  32768  nlockmgr
>>    100021    4   udp  32768  nlockmgr
>>    100005    1   udp    732  mountd
>>    100005    1   tcp    735  mountd
>>    100005    2   udp    732  mountd
>>    100005    2   tcp    735  mountd
>>    100005    3   udp    732  mountd
>>    100005    3   tcp    735  mountd
>>    100024    1   udp    757  status
>>    100024    1   tcp    760  status
>># cat /etc/exports
>>/export/data 10.0.3.0/8(rw) 10.0.0.1(rw)
>>
>>
>>Excerpt From /var/log/messages:
>>May 5 21:24:44 file1 exportfs[938]: /etc/exports [4]: No 'sync' or 'async' option specified for export "10.0.0.1:/export/data". Assuming default behaviour ('sync'). NOTE: this default has changed from previous versions
>>May 5 21:24:44 file1 nfs: Starting NFS services: succeeded
>>May 5 21:24:45 file1 nfs: rpc.rquotad startup succeeded
>>May 5 21:24:45 file1 nfs: rpc.nfsd startup succeeded
>>May 5 21:24:45 file1 nfs: rpc.mountd startup succeeded
>>May 5 21:24:45 file1 nfslock: rpc.lockd startup succeeded
>>May 5 21:24:45 file1 rpc.statd[1001]: Version 1.0.3 Starting
>>May 5 21:24:45 file1 nfslock: rpc.statd startup succeeded
>>
>>The exported directory is on a Promise Ultratrak100 TX8 connected to a
>>Tekram 390U3W.
>>
>>
>>
>>-------------------------------------------------------
>>Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
>>The only event dedicated to issues related to Linux enterprise solutions
>>www.enterpriselinuxforum.com
>>
>>_______________________________________________
>>NFS maillist - NFS@lists.sourceforge.net
>>https://lists.sourceforge.net/lists/listinfo/nfs