From: Jean-Christophe Ducom <jducom@nd.edu>
To: "Lever, Charles" <Charles.Lever@netapp.com>,
nfs <nfs@lists.sourceforge.net>
Subject: Re: processes stuck in....?
Date: Tue, 06 May 2003 17:43:26 -0500
Message-ID: <3EB83A8E.5050702@nd.edu>
In-Reply-To: 482A3FA0050D21419C269D13989C6113127DAD@lavender-fe.eng.netapp.com
Thanks for your email Charles.
> your symptoms sound like a hardware problem to me. have
> you checked your mainboard and memory?
Except that a) if I run the program locally, everything is fine, and
b) the node is part of a medium-sized cluster and this happens on basically
every node.
One thing I noticed is that if I run 'tail -f big_file' on the server, I get
"Stale NFS file handle" several times.
I'll use Fstress and other tests to reproduce the problem, and hopefully I will
have more debug messages to submit.
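For reference, here is a minimal sketch of the kind of reproduction workload I
have in mind. It is not Fstress; the function name, file names, and sizes are
placeholders chosen for illustration, and the 8192-byte chunk is only an
assumption that mirrors the wsize in the mount options. It creates many small
files and then writes one big file, which is the pattern that seems to trigger
the freeze:

```python
import os
import tempfile


def stress_directory(target_dir, n_files=800, small_size=1024,
                     big_size=8 * 1024 * 1024, chunk=8192):
    """Create n_files small files, then one big file, under target_dir.

    Returns (files_created, bytes_written_to_big_file).
    All names and default sizes are illustrative placeholders.
    """
    os.makedirs(target_dir, exist_ok=True)
    payload = b"x" * small_size
    created = 0
    for i in range(n_files):
        path = os.path.join(target_dir, "small_%05d.dat" % i)
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push the data out instead of leaving it cached
        created += 1

    big_path = os.path.join(target_dir, "big_file")
    block = b"y" * chunk
    written = 0
    with open(big_path, "wb") as f:
        while written < big_size:
            n = min(chunk, big_size - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    return created, written


if __name__ == "__main__":
    # Point this at a scratch directory on the NFS mount to test the client;
    # a temporary local directory is used here so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as d:
        print(stress_directory(d, n_files=50, big_size=1024 * 1024))
```

Running it once against a local directory and once against the NFS mount gives
a simple control, since the local run is the case that works fine.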
Thanks again for your help
JC
>
>
>>-----Original Message-----
>>From: Jean-Christophe Ducom [mailto:jducom@nd.edu]
>>Sent: Tuesday, May 06, 2003 3:23 PM
>>To: nfs@lists.sourceforge.net
>>Subject: [NFS] processes stuck in....?
>>
>>
>>Hi,
>>
>> I'd like to get some feedback on a problem that freezes
>>SMP clients so hard that
>>a) even when a monitor is plugged in, there is no display
>>(and no control from the keyboard either),
>>b) there is no terminal console access via the serial port, and
>>c) even nmi_watchdog apparently can't get out of it (no report).
>>This happens pretty consistently when the NFS client creates a lot
>>of files (800+) or writes a big file.
>>I don't have a tcpdump or any other trace for now.
>>Any ideas or suggestions (change mount options?)
>>Thanks for any help or feedback
>>
>> JC
>>
>>
>>
>>
>>
>>Client: Precision 530 Dual 1.7GhZ Xeon with Syskonnect SK9D21
>>GigE, Redhat 7.2,
>>kernel 2.4.21-rc1, nfs-utils-1.0.3-1 with ext2 file
>># rpcinfo -p
>> program vers proto port
>> 100000 2 tcp 111 portmapper
>> 100000 2 udp 111 portmapper
>> 100021 1 udp 32768 nlockmgr
>> 100021 3 udp 32768 nlockmgr
>> 100021 4 udp 32768 nlockmgr
>> 100024 1 udp 904 status
>> 100024 1 tcp 907 status
>>The directories are mounted with these options:
>>10.0.0.10:/export/data /opt/data nfs rw,nosuid,nodev,hard,intr,bg,rsize=8192,wsize=8192 0 0
>>
>>Excerpt from /var/log/messages:
>>May 6 13:59:27 bob1 kernel: sk9dlin: Network Device Driver v1.03
>>May 6 13:59:27 bob1 kernel: (C)Copyright 2001 SysKonnect GmbH.
>>May 6 13:59:27 bob1 kernel: eth0: SysKonnect SK-9D21 Gigabit Ethernet
>>May 6 13:59:27 bob1 kernel: eth0: network connection down
>>May 6 13:59:27 bob1 kernel: eth0: NIC Link is downeth0:
>>network connection down
>>May 6 13:59:27 bob1 kernel: eth0: Network connection up using
>>May 6 13:59:27 bob1 kernel: speed = 1000 Mbps
>>May 6 13:59:27 bob1 kernel: duplex mode = full
>>May 6 13:59:27 bob1 kernel: card = copper
>>May 6 13:59:27 bob1 kernel: flowctrl = none
>>May 6 13:59:27 bob1 kernel: autoneg = no
>>.....
>>May 6 13:59:24 bob1 sysctl: net.ipv4.ip_forward = 0
>>May 6 13:59:24 bob1 sysctl: net.ipv4.conf.default.rp_filter = 1
>>May 6 13:59:24 bob1 sysctl: kernel.sysrq = 0
>>May 6 13:59:24 bob1 sysctl: kernel.shmall = 33554432
>>May 6 13:59:24 bob1 sysctl: kernel.shmmax = 536870912
>>May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_fin_timeout = 30
>>May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_keepalive_time = 1800
>>May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_window_scaling = 0
>>May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_sack = 0
>>May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_timestamps = 0
>>May 6 13:59:24 bob1 sysctl: net.ipv4.ip_no_pmtu_disc = 1
>>May 6 13:59:24 bob1 sysctl: net.core.rmem_max = 262143
>>May 6 13:59:24 bob1 sysctl: net.core.rmem_default = 262143
>>May 6 13:59:24 bob1 sysctl: net.core.wmem_max = 262143
>>.....
>>May 6 13:59:34 bob1 rpc.lockd: lockdsvc: Function not implemented
>>May 6 13:59:34 bob1 nfslock: rpc.lockd startup failed
>>May 6 13:59:34 bob1 rpc.statd[724]: Version 1.0.3 Starting
>>May 6 13:59:34 bob1 nfslock: rpc.statd startup succeeded
>>
>>
>>-----------------------------
>>Server: 530 Dual 1.7GhZ Xeon with Syskonnect SK9D21 GigE,
>>Redhat 7.2, kernel
>>2.4.21-rc1, nfs-utils-1.0.3-1 with ext3 on a raid5
>># rpcinfo -p
>> program vers proto port
>> 100000 2 tcp 111 portmapper
>> 100000 2 udp 111 portmapper
>> 100011 1 udp 696 rquotad
>> 100011 2 udp 696 rquotad
>> 100011 1 tcp 699 rquotad
>> 100011 2 tcp 699 rquotad
>> 100003 2 udp 2049 nfs
>> 100003 3 udp 2049 nfs
>> 100021 1 udp 32768 nlockmgr
>> 100021 3 udp 32768 nlockmgr
>> 100021 4 udp 32768 nlockmgr
>> 100005 1 udp 732 mountd
>> 100005 1 tcp 735 mountd
>> 100005 2 udp 732 mountd
>> 100005 2 tcp 735 mountd
>> 100005 3 udp 732 mountd
>> 100005 3 tcp 735 mountd
>> 100024 1 udp 757 status
>> 100024 1 tcp 760 status
>># cat /etc/exports
>>/export/data 10.0.3.0/8(rw) 10.0.0.1(rw)
>>
>>
>>Excerpt From /var/log/messages:
>>May 5 21:24:44 file1 exportfs[938]: /etc/exports [4]: No
>>'sync' or 'async'
>>option specified for export "10.0.0.1:/export/data".
>>Assuming default
>>behaviour ('sync'). NOTE: this default has changed from
>>previous versions
>>May 5 21:24:44 file1 nfs: Starting NFS services: succeeded
>>May 5 21:24:45 file1 nfs: rpc.rquotad startup succeeded
>>May 5 21:24:45 file1 nfs: rpc.nfsd startup succeeded
>>May 5 21:24:45 file1 nfs: rpc.mountd startup succeeded
>>May 5 21:24:45 file1 nfslock: rpc.lockd startup succeeded
>>May 5 21:24:45 file1 rpc.statd[1001]: Version 1.0.3 Starting
>>May 5 21:24:45 file1 nfslock: rpc.statd startup succeeded
>>
>>The exported directory is on a Promise Ultratrak100 TX8
>>connected to a Tekram
>>390U3W.
>>
>>
>>
>>-------------------------------------------------------
>>Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
>>The only event dedicated to issues related to Linux
>>enterprise solutions
>>www.enterpriselinuxforum.com
>>
>>_______________________________________________
>>NFS maillist - NFS@lists.sourceforge.net
>>https://lists.sourceforge.net/lists/listinfo/nfs
>>