* 2.4.17 NFS hangup
From: Burjan Gabor @ 2002-02-03 20:22 UTC
To: LKML

Hello,

I have a reproducible problem with 2.4.17 kernel and NFS client after
netbooting an RS/6000 (ppc architecture). Immediately after boot:

partvis:/tmp# dd if=/dev/zero of=blah1 count=1
1+0 records in
1+0 records out
partvis:/tmp#
partvis:/tmp# dd if=/dev/zero of=blah2 count=2
2+0 records in
2+0 records out
nfs: server 157.181.150.31 not responding, still trying
nfs: server 157.181.150.31 not responding, still trying
nfs: task 913 can't get a request slot
... and so on

Relevant tcpdump output:

20:41:40.927855 heron.elte.hu.nfs > partvis.elte.hu.3648238371: reply ok 28 lookup ERROR: No such file or directory (DF)
20:41:40.928622 partvis.elte.hu.3648238372 > heron.elte.hu.nfs: 148 create [|nfs] (DF)
20:41:40.929271 heron.elte.hu.nfs > partvis.elte.hu.3648238372: reply ok 128 create [|nfs] (DF)
20:41:40.930655 partvis.elte.hu.3648238373 > heron.elte.hu.nfs: 100 getattr [|nfs] (DF)
20:41:40.930976 heron.elte.hu.nfs > partvis.elte.hu.3648238373: reply ok 96 getattr REG 100644 ids 0

However, reading works without any problems.

Full tcpdump output from poweron:
http://www.csoma.elte.hu/~burjang/nfs-tcpdump-20010203.out.gz

buga

* Re: 2.4.17 NFS hangup
From: Trond Myklebust @ 2002-02-03 21:06 UTC
To: Burjan Gabor; +Cc: LKML

>>>>> " " == Burjan Gabor <buga+dated+1013026971.2270df@elte.hu> writes:

    > 20:41:40.927855 heron.elte.hu.nfs > partvis.elte.hu.3648238371: reply ok 28 lookup ERROR: No such file or directory (DF)
    > 20:41:40.928622 partvis.elte.hu.3648238372 > heron.elte.hu.nfs: 148 create [|nfs] (DF)
    > 20:41:40.929271 heron.elte.hu.nfs > partvis.elte.hu.3648238372: reply ok 128 create [|nfs] (DF)
    > 20:41:40.930655 partvis.elte.hu.3648238373 > heron.elte.hu.nfs: 100 getattr [|nfs] (DF)
    > 20:41:40.930976 heron.elte.hu.nfs > partvis.elte.hu.3648238373: reply ok 96 getattr REG 100644 ids 0

Nothing abnormal there or in your file. However, when you start
getting 'server not responding' messages, and no tcpdump output it's
usually a sign that the networking layer has given up on you. Any
strange output from 'netstat -s'?

It would be useful to know what networking card/driver combination you
are using? Any firewalls/netfilter setups? Any special mount options?

Cheers,
  Trond

* Re: 2.4.17 NFS hangup
From: Burján Gábor @ 2002-02-03 21:34 UTC
To: Trond Myklebust; +Cc: LKML

Hello,

On Sun, Feb 03, Trond Myklebust wrote:

> Nothing abnormal there or in your file. However, when you start
> getting 'server not responding' messages, and no tcpdump output it's
> usually a sign that the networking layer has given up on you. Any
> strange output from 'netstat -s'?

Output is here: http://www.csoma.elte.hu/~burjang/netstat-s-20020203.out

I think `1710 reassemblies required' may be strange after boot...
How can I figure out what causes this?

> It would be useful to know what networking card/driver combination you
> are using? Any firewalls/netfilter setups? Any special mount options?

eth0: PCnet/PCI II 79C970A at 0x1020, 08 00 5a f8 82 e7
pcnet32: pcnet32_private lp=c0591000 lp_dma_addr=0x80591000 assigned IRQ 15.
pcnet32.c:v1.25kf 17.11.2001 tsbogend@alpha.franken.de

(this card is an integrated AMD pcnet32 in a 43P-140)

There are no firewalls or packet filters. I didn't specify any special
mount options for nfs:

partvis:~$ cat /proc/mounts
/dev/root / nfs rw,v2,rsize=4096,wsize=4096,hard,udp,nolock,addr=157.181.150.31 0 0
proc /proc proc rw 0 0
partvis:~$

buga

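For anyone comparing mount options across the machines in this thread, the /proc/mounts line quoted above can be split into its fields mechanically. The following is only a small sketch: the function name parse_mount_line is hypothetical, and it assumes the standard six whitespace-separated fields with no escaped spaces in the paths.

```python
def parse_mount_line(line):
    """Split one /proc/mounts line into its fields and an options dict.

    Hypothetical helper for illustration; assumes six whitespace-separated
    fields and no escaped spaces in the device or mount point.
    """
    device, mountpoint, fstype, options, _dump, _passno = line.split()
    opts = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags such as "hard" or "udp" carry no value; store True for them.
        opts[key] = value if sep else True
    return device, mountpoint, fstype, opts


# The root filesystem line reported in the message above.
line = ("/dev/root / nfs "
        "rw,v2,rsize=4096,wsize=4096,hard,udp,nolock,addr=157.181.150.31 0 0")
device, mountpoint, fstype, opts = parse_mount_line(line)
print(fstype, opts["rsize"], opts["wsize"], "udp" in opts, "hard" in opts)
```

Note that values stay as strings here (rsize is "4096", not 4096), which is enough for a side-by-side comparison of options.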
* Re: 2.4.17 NFS hangup
From: Alan Cox @ 2002-02-03 21:58 UTC
To: "Burján Gábor"; +Cc: Trond Myklebust, LKML

> Output is here: http://www.csoma.elte.hu/~burjang/netstat-s-20020203.out
>
> I think `1710 reassemblies required' may be strange after boot...
> How can I figure out what causes this?

NFS uses large packets.

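To put rough numbers on the "large packets" remark: with NFS over UDP, each READ reply or WRITE request travels as a single UDP datagram, and a datagram larger than the link MTU is split into IP fragments that the receiver must reassemble. The sketch below estimates the fragment count per transfer. The 1500-byte Ethernet MTU, the 20-byte IP header without options, and the 128-byte allowance for RPC and NFS headers are assumptions for illustration, not figures taken from the thread.

```python
import math

IP_HEADER = 20   # bytes, assuming an IPv4 header without options
UDP_HEADER = 8   # bytes


def ip_fragments(nfs_payload, mtu=1500, rpc_overhead=128):
    """Estimate how many IP fragments one NFS-over-UDP message needs.

    rpc_overhead is a hypothetical allowance for the RPC and NFS headers;
    the real value depends on the procedure and credentials in use.
    """
    # Everything IP has to carry: UDP header + RPC/NFS headers + file data.
    ip_payload = UDP_HEADER + rpc_overhead + nfs_payload
    # Each fragment carries at most (mtu - IP header) bytes, rounded down
    # to a multiple of 8 because fragment offsets count 8-byte units.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


for size in (1024, 4096, 8192):
    print(size, ip_fragments(size))
```

Under these assumptions a 4096-byte transfer (the rsize/wsize reported above) needs three fragments, and losing any one fragment loses the whole datagram, so a steadily growing reassembly counter is expected on such a mount.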
* Re: 2.4.17 NFS hangup
From: Trond Myklebust @ 2002-02-03 22:44 UTC
To: Burján Gábor; +Cc: Alan Cox, Linux Kernel

Hmm... pcnet32.c seems to engage in some dubious practices. Look for
instance at the way it can call pcnet32_restart() from within the
interrupt handler.

Are you seeing any kernel log messages about 'Tx FIFO error!' that
might indicate that particular code is getting triggered?

Cheers,
  Trond

* Re: 2.4.17 NFS hangup
From: Burján Gábor @ 2002-02-03 23:00 UTC
To: Trond Myklebust; +Cc: Alan Cox, Linux Kernel

On Sun, Feb 03, Trond Myklebust wrote:

> Are you seeing any kernel log messages about 'Tx FIFO error!' that
> might indicate that particular code is getting triggered?

No, nothing logged except the NFS related messages. However, after NFS
hangup I cannot scp from the host, but ssh works... I am beginning to
think that this is not an NFS issue. Then what could it be?

buga

* Re: 2.4.17 NFS hangup
From: Athanasius @ 2002-02-04 13:21 UTC
To: Burján Gábor; +Cc: Trond Myklebust, Alan Cox, Linux Kernel

On Mon, Feb 04, 2002 at 12:00:30AM +0100, Burján Gábor wrote:
> On Sun, Feb 03, Trond Myklebust wrote:
>
> > Are you seeing any kernel log messages about 'Tx FIFO error!' that
> > might indicate that particular code is getting triggered?
>
> No, nothing logged except the NFS related messages. However, after NFS
> hangup I cannot scp from the host, but ssh works... I am beginning to
> think that this is not an NFS issue. Then what could it be?

I'm seeing something like this as well. Two machines using
BNC/thinwire (yes, I know, waiting on finances to make this better), 2
other machines on the same segment. I use an NFS mount from the server
(jimblewix) on the workstation (emelia) for amongst other things playing
mp3s.

Machine specs:

SERVER
    PII-400 @400MHz
    384MB PC100 SDRAM
    eth0: NE2000 (ISA)   <--- internal interface
    eth1: 3com509b       <--- external interface, NFS traffic NOT on this
    Linux jimblewix 2.4.17 #7 Sat Jan 5 16:15:44 GMT 2002 i686 unknown

WORKSTATION
    AMD Athlon XP 1600+ 1.4GHz, not overclocked
    512MB PC2100 DDR
    eth0: NE2000 (PCI eth0: NetVin NV5000SC found at 0xdc00, IRQ 11, 00:40:95:45:91:38.)
    Linux emelia 2.4.18-pre7 #3 Thu Jan 31 07:07:48 GMT 2002 i686 unknown
    ALSO on 2.4.17

Repeatedly I'll have xmms stop playing an mp3 mid-file due to NFS
timeouts. I have the same problem cp'ing large files over the NFS
mounts as well. Currently these are soft mounts. If I change them to
hard mounts, then rather than an I/O error on that file and control
coming back, the app will just lock hard in D state until a reboot.

/etc/fstab on the WORKSTATION:

192.168.0.162:/home/users on /home/users type nfs (rw,nosuid,nodev,nolock,rsize=8192,wsize=8192,soft,intr,addr=192.168.0.162)
192.168.0.162:/usr/local on /export/miggy-1/usr-local type nfs (rw,nosuid,nodev,rsize=8192,wsize=8192,soft,intr,addr=192.168.0.162)
192.168.0.162:/other on /other type nfs (rw,nosuid,nodev,rsize=8192,wsize=8192,soft,intr,addr=192.168.0.162)

That last one is usually where I'm doing the big cp'ing to/from.

I've just had the problem twice whilst typing this email:

Feb 4 13:07:31 emelia kernel: nfs: server 192.168.0.162 not responding, timed out
Feb 4 13:07:52 emelia last message repeated 2 times
Feb 4 13:12:17 emelia kernel: nfs: server 192.168.0.162 not responding, timed out
Feb 4 13:12:38 emelia last message repeated 2 times
<NOTHING in /var/log/kern.log on jimblewix>

I haven't had any of the following since this line:

kern.log.2.gz:1649:Jan 18 07:39:28 emelia kernel: nfs: task 13016 can't get a request slot

Whilst I appreciate that thinnet/BNC isn't the best technology to be
using, this segment isn't THAT busy most of the time, certainly not the
majority of times mp3s cut out (ones that WILL play fine end to end at
other times so it's not corruption in them).

If there are any patches/options (other than hard mounts without other
changes) I should be trying please let me know.

thanks,

-Ath
--
- Athanasius = Athanasius(at)gurus.tf / http://www.clan-lovely.org/~athan/
  Finger athan(at)fysh.org for PGP key
  "And it's me who is my enemy. Me who beats me up.
  Me who makes the monsters. Me who strips my confidence." Paula Cole - ME

* Re: 2.4.17 NFS hangup
From: Athanasius @ 2002-02-04 14:47 UTC
To: Linux Kernel; +Cc: Burján Gábor, Trond Myklebust, Alan Cox

On Mon, Feb 04, 2002 at 01:21:46PM +0000, Athanasius wrote:
> I'm seeing something like this as well. Two machines using
> BNC/thinwire (yes, I know, waiting on finances to make this better), 2
> other machines on the same segment. I use an NFS mount from the server
> (jimblewix) on the workstation (emelia) for amongst other things playing
> mp3s.

Seems to be my day for this happening. A bit more data:

There are next to no collisions going on, from ifconfig eth0 on the
SERVER:

    RX packets:31331103 errors:0 dropped:1 overruns:0 frame:151
    TX packets:42576602 errors:0 dropped:0 overruns:0 carrier:0
    collisions:33733 txqueuelen:100

and the WORKSTATION:

    RX packets:301884 errors:0 dropped:0 overruns:0 frame:0
    TX packets:238086 errors:0 dropped:0 overruns:0 carrier:0
    collisions:397 txqueuelen:100

Also the numbers on the SERVER at least for collisions didn't increase
the last time NFS cut out on me.

I'm not seeing ANY other logging in kern.log on either machine above
the NFS timeout reports, nothing about NICs having trouble or the like.

-Ath

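The "next to no collisions" reading can be checked by dividing the collision counter by the transmitted packet count from the ifconfig output quoted in the message above. This is a minimal sketch: the function name collision_ratio is made up for illustration, and it assumes the 2.4-era ifconfig text layout shown in the message.

```python
import re

# ifconfig excerpts as quoted in the message above.
SERVER = """\
RX packets:31331103 errors:0 dropped:1 overruns:0 frame:151
TX packets:42576602 errors:0 dropped:0 overruns:0 carrier:0
collisions:33733 txqueuelen:100
"""

WORKSTATION = """\
RX packets:301884 errors:0 dropped:0 overruns:0 frame:0
TX packets:238086 errors:0 dropped:0 overruns:0 carrier:0
collisions:397 txqueuelen:100
"""


def collision_ratio(ifconfig_text):
    """Return collisions per transmitted packet (hypothetical helper)."""
    tx = int(re.search(r"TX packets:(\d+)", ifconfig_text).group(1))
    collisions = int(re.search(r"collisions:(\d+)", ifconfig_text).group(1))
    return collisions / tx


for name, text in (("SERVER", SERVER), ("WORKSTATION", WORKSTATION)):
    print(f"{name}: {collision_ratio(text):.3%} of TX packets collided")
```

Both ratios come out well under 1% of transmitted packets, which is consistent with the segment not being heavily loaded. These are counters accumulated since the interface came up, so they say nothing about short bursts around the moments NFS timed out.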