* processes stuck in....?
@ 2003-05-06 19:22 Jean-Christophe Ducom
0 siblings, 0 replies; 3+ messages in thread
From: Jean-Christophe Ducom @ 2003-05-06 19:22 UTC (permalink / raw)
To: nfs
Hi,
I'd like to get some feedback on a problem that freezes SMP clients so hard
that:
a) even when a monitor is plugged in, there is no display (and no control
from the keyboard either)
b) there is no terminal console access via the serial port
c) even nmi_watchdog apparently can't recover from it (no report)
This happens pretty consistently when the NFS client creates a lot of files
(800+) or writes a big file.
I don't have a tcpdump or any other trace for now.
Any ideas or suggestions (change mount options?)
Thanks for any help or feedback
JC
Client: Precision 530 Dual 1.7GHz Xeon with SysKonnect SK9D21 GigE, Red Hat 7.2,
kernel 2.4.21-rc1, nfs-utils-1.0.3-1 with ext2 file
# rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   udp  32768  nlockmgr
    100021    3   udp  32768  nlockmgr
    100021    4   udp  32768  nlockmgr
    100024    1   udp    904  status
    100024    1   tcp    907  status
The directory is mounted with these options:
10.0.0.10:/export/data /opt/data nfs rw,nosuid,nodev,hard,intr,bg,rsize=8192,wsize=8192 0 0
Excerpt from /var/log/messages:
May 6 13:59:27 bob1 kernel: sk9dlin: Network Device Driver v1.03
May 6 13:59:27 bob1 kernel: (C)Copyright 2001 SysKonnect GmbH.
May 6 13:59:27 bob1 kernel: eth0: SysKonnect SK-9D21 Gigabit Ethernet
May 6 13:59:27 bob1 kernel: eth0: network connection down
May 6 13:59:27 bob1 kernel: eth0: NIC Link is downeth0: network connection down
May 6 13:59:27 bob1 kernel: eth0: Network connection up using
May 6 13:59:27 bob1 kernel: speed = 1000 Mbps
May 6 13:59:27 bob1 kernel: duplex mode = full
May 6 13:59:27 bob1 kernel: card = copper
May 6 13:59:27 bob1 kernel: flowctrl = none
May 6 13:59:27 bob1 kernel: autoneg = no
.....
May 6 13:59:24 bob1 sysctl: net.ipv4.ip_forward = 0
May 6 13:59:24 bob1 sysctl: net.ipv4.conf.default.rp_filter = 1
May 6 13:59:24 bob1 sysctl: kernel.sysrq = 0
May 6 13:59:24 bob1 sysctl: kernel.shmall = 33554432
May 6 13:59:24 bob1 sysctl: kernel.shmmax = 536870912
May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_fin_timeout = 30
May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_keepalive_time = 1800
May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_window_scaling = 0
May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_sack = 0
May 6 13:59:24 bob1 sysctl: net.ipv4.tcp_timestamps = 0
May 6 13:59:24 bob1 sysctl: net.ipv4.ip_no_pmtu_disc = 1
May 6 13:59:24 bob1 sysctl: net.core.rmem_max = 262143
May 6 13:59:24 bob1 sysctl: net.core.rmem_default = 262143
May 6 13:59:24 bob1 sysctl: net.core.wmem_max = 262143
.....
May 6 13:59:34 bob1 rpc.lockd: lockdsvc: Function not implemented
May 6 13:59:34 bob1 nfslock: rpc.lockd startup failed
May 6 13:59:34 bob1 rpc.statd[724]: Version 1.0.3 Starting
May 6 13:59:34 bob1 nfslock: rpc.statd startup succeeded
-----------------------------
Server: 530 Dual 1.7GHz Xeon with SysKonnect SK9D21 GigE, Red Hat 7.2, kernel
2.4.21-rc1, nfs-utils-1.0.3-1 with ext3 on a raid5
# rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100011    1   udp    696  rquotad
    100011    2   udp    696  rquotad
    100011    1   tcp    699  rquotad
    100011    2   tcp    699  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100021    1   udp  32768  nlockmgr
    100021    3   udp  32768  nlockmgr
    100021    4   udp  32768  nlockmgr
    100005    1   udp    732  mountd
    100005    1   tcp    735  mountd
    100005    2   udp    732  mountd
    100005    2   tcp    735  mountd
    100005    3   udp    732  mountd
    100005    3   tcp    735  mountd
    100024    1   udp    757  status
    100024    1   tcp    760  status
# cat /etc/exports
/export/data 10.0.3.0/8(rw) 10.0.0.1(rw)
Excerpt From /var/log/messages:
May 5 21:24:44 file1 exportfs[938]: /etc/exports [4]: No 'sync' or 'async'
option specified for export "10.0.0.1:/export/data". Assuming default
behaviour ('sync'). NOTE: this default has changed from previous versions
May 5 21:24:44 file1 nfs: Starting NFS services: succeeded
May 5 21:24:45 file1 nfs: rpc.rquotad startup succeeded
May 5 21:24:45 file1 nfs: rpc.nfsd startup succeeded
May 5 21:24:45 file1 nfs: rpc.mountd startup succeeded
May 5 21:24:45 file1 nfslock: rpc.lockd startup succeeded
May 5 21:24:45 file1 rpc.statd[1001]: Version 1.0.3 Starting
May 5 21:24:45 file1 nfslock: rpc.statd startup succeeded
The exported directory is on a Promise Ultratrak100 TX8 connected to a Tekram
390U3W.
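
The server's rpcinfo listing above can also be checked mechanically, for
example to see which transports the nfs program (100003) is registered on.
What follows is only a minimal sketch: it assumes the listing has been pasted
in as text (abbreviated here), and the helper name transports_by_service is
hypothetical, not part of nfs-utils.

```python
from collections import defaultdict

# Abbreviated copy of the server's `rpcinfo -p` listing from this message.
SERVER_RPCINFO = """\
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100021 1 udp 32768 nlockmgr
100005 3 udp 732 mountd
100005 3 tcp 735 mountd
"""


def transports_by_service(text):
    """Map service name -> set of (version, proto) pairs (hypothetical helper)."""
    services = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 5-column data row.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        _program, vers, proto, _port, name = fields
        services[name].add((int(vers), proto))
    return dict(services)


services = transports_by_service(SERVER_RPCINFO)
print(sorted(services["nfs"]))  # [(2, 'udp'), (3, 'udp')]
```

For the listing in this message, nfs shows up for versions 2 and 3 over udp
only, while mountd and rquotad are registered on both udp and tcp.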
-------------------------------------------------------
Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
The only event dedicated to issues related to Linux enterprise solutions
www.enterpriselinuxforum.com
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* RE: processes stuck in....?
@ 2003-05-06 22:24 Lever, Charles
2003-05-06 22:43 ` Jean-Christophe Ducom
0 siblings, 1 reply; 3+ messages in thread
From: Lever, Charles @ 2003-05-06 22:24 UTC (permalink / raw)
To: Jean-Christophe Ducom; +Cc: nfs
hi jean-christophe-
i have a similar system here (1.26GHz dual Tualatin P-III
system with 1GB and SysKonnect GbE, currently running
2.4.21-pre7 on RH7.2) that i use to run a nightly backup
of about 45GB via tar over NFS. (naturally there are
better ways to run a backup; i use this as an NFS client
exerciser).
your symptoms sound like a hardware problem to me. have
you checked your mainboard and memory?
* Re: processes stuck in....?
2003-05-06 22:24 processes stuck in....? Lever, Charles
@ 2003-05-06 22:43 ` Jean-Christophe Ducom
0 siblings, 0 replies; 3+ messages in thread
From: Jean-Christophe Ducom @ 2003-05-06 22:43 UTC (permalink / raw)
To: Lever, Charles, nfs
Thanks for your email Charles.
> your symptoms sound like a hardware problem to me. have
> you checked your mainboard and memory?
Except that:
a) if I run the program locally, everything is fine
b) the node is part of a medium-sized cluster and this happens on basically
every node.
One thing I noticed is that if I 'tail -f big_file' on the server, I get
"Stale NFS file handle" several times.
I'll use Fstress and other tests to reproduce the problem and will hopefully
have more debug messages to submit.
Thanks again for your help
JC
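
Independent of Fstress, the workload described in this thread (creating 800+
files, or writing one large file) can be approximated with a short script.
This is only a sketch under stated assumptions, not the program that triggered
the freeze: the function names, file names and sizes are hypothetical, and the
demo writes to a local temporary directory rather than the NFS mount.

```python
import os
import tempfile


def create_many_files(directory, count=800, size=1024):
    """Create `count` files of `size` bytes in `directory`; return their paths."""
    payload = b"x" * size
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"stress_{i:05d}.dat")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            # Force the data out of the page cache (on NFS, to the server).
            os.fsync(f.fileno())
        paths.append(path)
    return paths


def write_big_file(path, total_bytes, chunk=8192):
    """Write `total_bytes` zero bytes to `path` in `chunk`-sized writes."""
    written = 0
    block = b"\0" * chunk
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk, total_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    return written


# Small local demo; the counts are reduced so it finishes quickly.
with tempfile.TemporaryDirectory() as target:
    created = create_many_files(target, count=50, size=512)
    total = write_big_file(os.path.join(target, "big_file"), 1_000_000)
    print(len(created), total)  # 50 1000000
```

To exercise a client, the same two functions would be pointed at a directory
under the NFS mount (/opt/data in the original report) with the default count
of 800 and a much larger total_bytes. The chunk size of 8192 only mirrors the
wsize mount option for illustration; application write sizes do not map
one-to-one onto NFS write requests.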