2.4.17 NFS hangup


* 2.4.17 NFS hangup
@ 2002-02-03 20:22 Burjan Gabor
  2002-02-03 21:06 ` Trond Myklebust
  0 siblings, 1 reply; 8+ messages in thread
From: Burjan Gabor @ 2002-02-03 20:22 UTC
  To: LKML

Hello,

I have a reproducible problem with the 2.4.17 kernel and the NFS client
after netbooting an RS/6000 (ppc architecture).

Immediately after boot:

partvis:/tmp# dd if=/dev/zero of=blah1 count=1
1+0 records in
1+0 records out
partvis:/tmp#

partvis:/tmp# dd if=/dev/zero of=blah2 count=2
2+0 records in
2+0 records out
nfs: server 157.181.150.31 not responding, still trying
nfs: server 157.181.150.31 not responding, still trying
nfs: task 913 can't get a request slot
... and so on
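A small script can make the size threshold in the transcript above easier to pin down. The sketch below is a hypothetical reproducer, not something posted in the thread: it assumes a Python interpreter on the client, writes files of 1, 2, 3, ... 512-byte blocks (dd's default block size) into a directory on the NFS mount, calls fsync so the data actually goes to the server, and reports the first block count whose write fails or does not finish within a timeout. The names `write_blocks`, `probe`, the `probe<N>` file names and the 30-second timeout are illustrative choices only.

```python
import os
import sys
import tempfile
import threading

BLOCK = 512  # dd's default block size (bs=512)


def write_blocks(path, count, block=BLOCK):
    """Write `count` zero-filled blocks to `path`, like dd if=/dev/zero, then fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            os.write(fd, b"\0" * block)
        os.fsync(fd)  # push the data to the server instead of leaving it in the page cache
    finally:
        os.close(fd)


def _attempt(path, count, errors):
    # Runs in a worker thread so the caller can stop waiting on a hung write.
    try:
        write_blocks(path, count)
    except OSError as exc:  # e.g. EIO once a soft mount gives up
        errors.append(exc)


def probe(directory, max_count=16, timeout=30.0):
    """Return the first block count whose write fails or hangs, or None if all succeed."""
    for count in range(1, max_count + 1):
        errors = []
        path = os.path.join(directory, "probe%d" % count)  # hypothetical file names
        worker = threading.Thread(target=_attempt, args=(path, count, errors), daemon=True)
        worker.start()
        worker.join(timeout)
        if worker.is_alive() or errors:
            # A thread stuck in D state on a hard mount cannot be killed; it is abandoned.
            return count
    return None


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print("first failing block count:", probe(target))
```

On a healthy local filesystem this should print `None`; on the affected client, running it in the NFS-mounted /tmp would show at which size writes start to stall.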

Relevant tcpdump output:

20:41:40.927855 heron.elte.hu.nfs > partvis.elte.hu.3648238371: reply ok 28 lookup ERROR: No such file or directory (DF)
20:41:40.928622 partvis.elte.hu.3648238372 > heron.elte.hu.nfs: 148 create [|nfs] (DF)
20:41:40.929271 heron.elte.hu.nfs > partvis.elte.hu.3648238372: reply ok 128 create [|nfs] (DF)
20:41:40.930655 partvis.elte.hu.3648238373 > heron.elte.hu.nfs: 100 getattr [|nfs] (DF)
20:41:40.930976 heron.elte.hu.nfs > partvis.elte.hu.3648238373: reply ok 96 getattr REG 100644 ids 0
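When reading dumps like this one, it helps to pair each call with its reply. In tcpdump's NFS decoding, as far as I know, the number printed after the client's hostname (3648238372 and so on) is the RPC transaction ID (xid) rather than a UDP port, and the reply carries the same xid. The sketch below relies on that assumption and on the one-line-per-packet layout shown above; it lists calls that never received a reply, which is where a write hang should show up. `LINE` and `unanswered_calls` are hypothetical names.

```python
import re

# Assumed line shape: "<time> <src host>.<xid or port> > <dst host>.<xid or port>: <rest>"
LINE = re.compile(r"^(\S+) (\S+)\.(\w+) > (\S+)\.(\w+): (.*)$")


def unanswered_calls(lines):
    """Return (xid, timestamp, procedure) for NFS calls that have no matching reply."""
    pending = {}  # xid -> (timestamp, procedure), kept in arrival order
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # IP fragments, ARP and other traffic are ignored
        ts, _src, src_port, _dst, dst_port, rest = m.groups()
        if dst_port == "nfs":
            # Client -> server call; a retransmission reuses the xid and refreshes the entry.
            words = rest.split()
            proc = words[1] if len(words) > 1 else "?"
            pending[src_port] = (ts, proc)
        elif src_port == "nfs" and rest.startswith("reply"):
            pending.pop(dst_port, None)  # server -> client reply closes that xid
    return [(xid, ts, proc) for xid, (ts, proc) in pending.items()]
```

Fed the lines of a full capture, the calls left over (typically WRITEs, if the problem is on the send path) are the ones worth looking at together with the fragments around them.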

However, reading works without any problems.  Full tcpdump output from
poweron: http://www.csoma.elte.hu/~burjang/nfs-tcpdump-20010203.out.gz

	buga


* Re: 2.4.17 NFS hangup
  2002-02-03 20:22 2.4.17 NFS hangup Burjan Gabor
@ 2002-02-03 21:06 ` Trond Myklebust
  2002-02-03 21:34   ` Burján Gábor
  0 siblings, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2002-02-03 21:06 UTC
  To: Burjan Gabor; +Cc: LKML

>>>>> " " == Burjan Gabor <buga+dated+1013026971.2270df@elte.hu> writes:

     > 20:41:40.927855 heron.elte.hu.nfs > partvis.elte.hu.3648238371: reply ok 28 lookup ERROR: No such file or directory (DF)
     > 20:41:40.928622 partvis.elte.hu.3648238372 > heron.elte.hu.nfs: 148 create [|nfs] (DF)
     > 20:41:40.929271 heron.elte.hu.nfs > partvis.elte.hu.3648238372: reply ok 128 create [|nfs] (DF)
     > 20:41:40.930655 partvis.elte.hu.3648238373 > heron.elte.hu.nfs: 100 getattr [|nfs] (DF)
     > 20:41:40.930976 heron.elte.hu.nfs > partvis.elte.hu.3648238373: reply ok 96 getattr REG 100644 ids 0

Nothing abnormal there or in your file. However, when you start
getting 'server not responding' messages and no tcpdump output, it's
usually a sign that the networking layer has given up on you. Any
strange output from 'netstat -s'?

It would be useful to know which networking card/driver combination you
are using. Any firewalls/netfilter setups? Any special mount options?

Cheers,
  Trond


* Re: 2.4.17 NFS hangup
  2002-02-03 21:06 ` Trond Myklebust
@ 2002-02-03 21:34   ` Burján Gábor
  2002-02-03 21:58     ` Alan Cox
  2002-02-03 22:44     ` Trond Myklebust
  0 siblings, 2 replies; 8+ messages in thread
From: Burján Gábor @ 2002-02-03 21:34 UTC
  To: Trond Myklebust; +Cc: LKML

Hello,

On Sun, Feb 03, Trond Myklebust wrote:

> Nothing abnormal there or in your file. However, when you start
> getting 'server not responding' messages, and no tcpdump output it's
> usually a sign that the networking layer has given up on you. Any
> strange output from 'netstat -s'?

Output is here: http://www.csoma.elte.hu/~burjang/netstat-s-20020203.out

I think `1710 reassemblies required' may be strange after boot...
How can I figure out what causes this?
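One way to see whether the reassembly counter is connected to the hang is to snapshot it immediately before and after a failing write. On Linux, `netstat -s` takes its IP statistics from /proc/net/snmp, where (to my understanding) 'reassemblies required' corresponds to the `ReasmReqds` field and reassembly failures to `ReasmFails`. Reassembly on the client covers incoming datagrams such as 4096-byte READ replies; outgoing WRITE calls are fragmented by the client (`FragCreates`) and reassembled on the server. The sketch below parses the file's header/value line pairs; `parse_snmp`, `reassembly_delta` and `REASM_FIELDS` are hypothetical names, and the expected field names are an assumption to check against the local file.

```python
REASM_FIELDS = ("ReasmReqds", "ReasmOKs", "ReasmFails", "FragOKs", "FragFails", "FragCreates")


def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {field: value}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    # The file alternates a header line ("Ip: Forwarding ...") with a value line ("Ip: 2 ...").
    for header, values in zip(rows[0::2], rows[1::2]):
        proto = header[0].rstrip(":")
        stats[proto] = {name: int(value) for name, value in zip(header[1:], values[1:])}
    return stats


def reassembly_delta(before, after):
    """Change in IP fragmentation/reassembly counters between two snapshots."""
    return {name: after["Ip"][name] - before["Ip"][name] for name in REASM_FIELDS}
```

For example, call `parse_snmp(open("/proc/net/snmp").read())` before and after the failing `dd` and pass both results to `reassembly_delta`; a growing `ReasmFails` would suggest incoming fragments are being lost, while the server's own counters would be the place to look for lost WRITE fragments.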

> It would be useful to know what networking card/driver combination you
> are using? Any firewalls/netfilter setups? Any special mount options?

eth0: PCnet/PCI II 79C970A at 0x1020, 08 00 5a f8 82 e7
pcnet32: pcnet32_private lp=c0591000 lp_dma_addr=0x80591000 assigned IRQ 15.
pcnet32.c:v1.25kf 17.11.2001 tsbogend@alpha.franken.de

(this card is an integrated AMD pcnet32 in a 43P-140)

There are no firewalls or packet filters.  I didn't specify any
special mount options for nfs:

partvis:~$ cat /proc/mounts
/dev/root / nfs rw,v2,rsize=4096,wsize=4096,hard,udp,nolock,addr=157.181.150.31 0 0
proc /proc proc rw 0 0
partvis:~$
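The transport (UDP) and the 4096-byte rsize/wsize in this listing are what decide how NFS requests get fragmented on the wire, so when comparing several clients it can be handy to extract them programmatically. A minimal sketch, assuming the usual whitespace-separated /proc/mounts layout (device, mount point, type, options, dump, pass); `nfs_mounts` is a hypothetical helper name.

```python
def nfs_mounts(text):
    """Return the NFS entries of /proc/mounts-style text with their options as a dict."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue  # skip proc, local filesystems and malformed lines
        options = {}
        for opt in fields[3].split(","):
            key, sep, value = opt.partition("=")
            options[key] = value if sep else True  # bare flags such as "udp" or "hard"
        mounts.append({"device": fields[0], "mountpoint": fields[1], "options": options})
    return mounts
```

Note that newer kernels report the transport as `proto=udp` rather than a bare `udp` flag, so a check for UDP mounts would need to accept both forms.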

	buga


* Re: 2.4.17 NFS hangup
  2002-02-03 21:34   ` Burján Gábor
@ 2002-02-03 21:58     ` Alan Cox
  2002-02-03 22:44     ` Trond Myklebust
  1 sibling, 0 replies; 8+ messages in thread
From: Alan Cox @ 2002-02-03 21:58 UTC
  To: "Burján Gábor"; +Cc: Trond Myklebust, LKML

> Output is here: http://www.csoma.elte.hu/~burjang/netstat-s-20020203.out
> 
> I think `1710 reassemblies required' may be strange after boot...
> How can I figure out what causes this?

NFS uses large packets.
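To put a number on "large": with wsize=4096 over UDP, every full-size WRITE carries at least 4096 bytes of file data, which cannot fit in one 1500-byte Ethernet frame, so the IP layer splits the datagram into fragments; if any single fragment is lost, the whole RPC is lost and must be retransmitted. The sketch below computes the on-wire IPv4 packet sizes for one UDP datagram. It assumes a 1500-byte MTU and a 20-byte IP header without options, and for simplicity counts only the data plus the 8-byte UDP header, not the RPC/NFS headers, so the fragment count it gives is a lower bound. `ip_fragments` is a hypothetical name.

```python
def ip_fragments(udp_payload, mtu=1500, ip_header=20, udp_header=8):
    """On-wire IPv4 packet sizes for one UDP datagram of `udp_payload` bytes."""
    data = udp_payload + udp_header         # the UDP header travels in the first fragment
    per_frag = (mtu - ip_header) // 8 * 8   # fragment offsets count in 8-byte units
    sizes = []
    while data > per_frag:
        sizes.append(per_frag + ip_header)  # full-size fragment
        data -= per_frag
    sizes.append(data + ip_header)          # last (or only) fragment
    return sizes
```

`ip_fragments(4096)` returns `[1500, 1500, 1164]`: three packets per 4 KB transfer. The same arithmetic applies to 4096-byte READ replies arriving at the client, which is consistent with a reassembly counter that grows quickly after booting from an NFS root.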


* Re: 2.4.17 NFS hangup
  2002-02-03 21:34   ` Burján Gábor
  2002-02-03 21:58     ` Alan Cox
@ 2002-02-03 22:44     ` Trond Myklebust
  2002-02-03 23:00       ` Burján Gábor
  1 sibling, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2002-02-03 22:44 UTC
  To: Burján Gábor; +Cc: Alan Cox, Linux Kernel

Hmm... pcnet32.c seems to engage in some dubious practices. Look, for
instance, at the way it can call pcnet32_restart() from within the
interrupt handler.

Are you seeing any kernel log messages about 'Tx FIFO error!' that
might indicate that particular code is getting triggered?

Cheers,
  Trond


* Re: 2.4.17 NFS hangup
  2002-02-03 22:44     ` Trond Myklebust
@ 2002-02-03 23:00       ` Burján Gábor
  2002-02-04 13:21         ` Athanasius
  0 siblings, 1 reply; 8+ messages in thread
From: Burján Gábor @ 2002-02-03 23:00 UTC
  To: Trond Myklebust; +Cc: Alan Cox, Linux Kernel

On Sun, Feb 03, Trond Myklebust wrote:

> Are you seeing any kernel log messages about 'Tx FIFO error!' that
> might indicate that particular code is getting triggered?

No, nothing logged except the NFS related messages.  However, after NFS
hangup I cannot scp from the host, but ssh works...   I am beginning to
think that this is not an NFS issue.  Then what could it be?

	buga


* Re: 2.4.17 NFS hangup
  2002-02-03 23:00       ` Burján Gábor
@ 2002-02-04 13:21         ` Athanasius
  2002-02-04 14:47           ` Athanasius
  0 siblings, 1 reply; 8+ messages in thread
From: Athanasius @ 2002-02-04 13:21 UTC
  To: Burján Gábor; +Cc: Trond Myklebust, Alan Cox, Linux Kernel

[-- Attachment #1: Type: text/plain, Size: 3292 bytes --]

On Mon, Feb 04, 2002 at 12:00:30AM +0100, Burján Gábor wrote:
> On Sun, Feb 03, Trond Myklebust wrote:
> 
> > Are you seeing any kernel log messages about 'Tx FIFO error!' that
> > might indicate that particular code is getting triggered?
> 
> No, nothing logged except the NFS related messages.  However, after NFS
> hangup I cannot scp from the host, but ssh works...   I am beginning to
> think that this is not an NFS issue.  Then what could it be?

   I'm seeing something like this as well.  Two machines are on
BNC/thinwire (yes, I know, waiting on finances to make this better),
with 2 other machines on the same segment.  I use an NFS mount from the
server (jimblewix) on the workstation (emelia) for, amongst other
things, playing mp3s.
   Machine specs:

	SERVER
	PII-400 @400MHz
	384MB PC100 SDRAM
	eth0: NE2000 (ISA) <--- internal interface
	eth1: 3com509b     <--- external interface, NFS traffic NOT on this
	Linux jimblewix 2.4.17 #7 Sat Jan 5 16:15:44 GMT 2002 i686 unknown

	WORKSTATION
	AMD Athlon XP 1600+ 1.4GHz, not overclocked
	512MB PC2100 DDR
	eth0: NE2000 (PCI eth0: NetVin NV5000SC found at 0xdc00, IRQ 11,
	00:40:95:45:91:38.)
	Linux emelia 2.4.18-pre7 #3 Thu Jan 31 07:07:48 GMT 2002 i686 unknown
	ALSO on 2.4.17

Repeatedly, xmms will stop playing an mp3 mid-file due to NFS
timeouts.  I have the same problem cp'ing large files over the NFS
mounts as well.  Currently these are soft mounts.  If I change them to
hard mounts, then rather than getting an I/O error on that file and
control coming back, the app just locks hard in D state until a reboot.
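The difference between the soft and hard behaviour comes down to the "major timeout". As the nfs(5) manual page describes it, `timeo` is the initial retransmission timeout in tenths of a second, the timeout doubles after each retransmission up to a 60-second ceiling, and after `retrans` retransmissions a major timeout occurs: a soft mount then returns an I/O error to the application, while a hard mount logs "server not responding" and keeps retrying. The sketch below adds up that schedule. The defaults (timeo=7, retrans=3, the values older nfs(5) pages give for UDP) are an assumption, since the mount listing in this message does not show the values actually in effect; `major_timeout_tenths` is a hypothetical name.

```python
def major_timeout_tenths(timeo=7, retrans=3, max_timeo=600):
    """Tenths of a second from the first send until a major timeout (assumed nfs(5) rules)."""
    total, current = 0, timeo
    for _ in range(retrans + 1):  # the original transmission plus `retrans` retransmissions
        total += current
        current = min(current * 2, max_timeo)  # exponential backoff, capped at 60 s
    return total
```

With the assumed defaults this is 7 + 14 + 28 + 56 = 105 tenths, roughly 10.5 seconds before a soft mount gives up on a request, which is short enough for a briefly congested shared segment to trigger it.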

NFS mounts on the WORKSTATION (as listed by mount):

192.168.0.162:/home/users on /home/users type nfs (rw,nosuid,nodev,nolock,rsize=8192,wsize=8192,soft,intr,addr=192.168.0.162)
192.168.0.162:/usr/local on /export/miggy-1/usr-local type nfs (rw,nosuid,nodev,rsize=8192,wsize=8192,soft,intr,addr=192.168.0.162)
192.168.0.162:/other on /other type nfs (rw,nosuid,nodev,rsize=8192,wsize=8192,soft,intr,addr=192.168.0.162)

That last one is usually where I'm doing the big cp'ing to/from.

I've just had the problem twice whilst typing this email:

Feb  4 13:07:31 emelia kernel: nfs: server 192.168.0.162 not responding, timed out
Feb  4 13:07:52 emelia last message repeated 2 times
Feb  4 13:12:17 emelia kernel: nfs: server 192.168.0.162 not responding, timed out
Feb  4 13:12:38 emelia last message repeated 2 times

<NOTHING in /var/log/kern.log on jimblewix>
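When correlating these timeouts with network counters, or checking for the 'Tx FIFO error!' messages Trond asked about earlier in the thread, it is easy to undercount, because syslogd folds identical lines into "last message repeated N times". The sketch below counts matching messages and expands those repeat lines. It assumes the classic sysklogd wording shown above; `count_messages` is a hypothetical name. Applied to the log lines above, with each message on a single line, it counts six "not responding" events rather than two.

```python
import re

REPEAT = re.compile(r"last message repeated (\d+) times?")


def count_messages(lines, needle):
    """Count syslog messages containing `needle`, expanding 'last message repeated N times'."""
    total = 0
    last_matched = False
    for line in lines:
        repeat = REPEAT.search(line)
        if repeat:
            if last_matched:  # the repeat refers to the previous real message
                total += int(repeat.group(1))
            continue
        last_matched = needle in line
        if last_matched:
            total += 1
    return total
```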

The last 'can't get a request slot' message I've had is this one:

kern.log.2.gz:1649:Jan 18 07:39:28 emelia kernel: nfs: task 13016 can't get a request slot

   Whilst I appreciate that thinnet/BNC isn't the best technology to be
using, this segment isn't THAT busy most of the time, and certainly not
on most of the occasions when mp3s cut out (these are files that WILL
play fine end to end at other times, so it's not corruption in them).

  If there are any patches/options (other than hard mounts without other
changes) I should be trying, please let me know.

thanks,

-Ath
-- 
- Athanasius = Athanasius(at)gurus.tf / http://www.clan-lovely.org/~athan/
                  Finger athan(at)fysh.org for PGP key
	   "And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence." Paula Cole - ME

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]


* Re: 2.4.17 NFS hangup
  2002-02-04 13:21         ` Athanasius
@ 2002-02-04 14:47           ` Athanasius
  0 siblings, 0 replies; 8+ messages in thread
From: Athanasius @ 2002-02-04 14:47 UTC
  To: Linux Kernel; +Cc: Burján Gábor, Trond Myklebust, Alan Cox

[-- Attachment #1: Type: text/plain, Size: 1427 bytes --]

On Mon, Feb 04, 2002 at 01:21:46PM +0000, Athanasius wrote:
>    I'm seeing something like this as well.  Two machines using
> BNC/thinwire (yes, I know, waiting on finances to make this better), 2
> other machines on the same segment.  I use an NFS mount from the server
> (jimblewix) on the workstation (emelia) for amongst other things playing
> mp3s.

   Seems to be my day for this happening.  A bit more data:

   There are next to no collisions going on; from ifconfig eth0 on the
SERVER:

          RX packets:31331103 errors:0 dropped:1 overruns:0 frame:151
          TX packets:42576602 errors:0 dropped:0 overruns:0 carrier:0
          collisions:33733 txqueuelen:100 

and the WORKSTATION:

          RX packets:301884 errors:0 dropped:0 overruns:0 frame:0
          TX packets:238086 errors:0 dropped:0 overruns:0 carrier:0
          collisions:397 txqueuelen:100 
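Turning the raw counters into rates backs up the "next to no collisions" reading: 33733 collisions over 42576602 transmitted packets on the SERVER is roughly 0.08%, and 397 over 238086 on the WORKSTATION is roughly 0.17%. The sketch below does that arithmetic from 2.4-era `ifconfig` text; it assumes the "RX packets:/TX packets:/frame:/collisions:" layout shown above, and `nic_rates` is a hypothetical helper. Because these counters are cumulative since boot, comparing two snapshots taken around an NFS stall says more than the totals.

```python
import re


def _counter(text, name):
    # Pull the first "name:<digits>" counter out of ifconfig output.
    match = re.search(re.escape(name) + r":(\d+)", text)
    if match is None:
        raise ValueError("counter %r not found" % name)
    return int(match.group(1))


def nic_rates(ifconfig_text):
    """Collisions per transmitted packet and frame errors per received packet."""
    rx = _counter(ifconfig_text, "RX packets")
    tx = _counter(ifconfig_text, "TX packets")
    return {
        "collisions_per_tx": _counter(ifconfig_text, "collisions") / tx,
        "frame_errors_per_rx": _counter(ifconfig_text, "frame") / rx,
    }
```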

Also, on the SERVER at least, the collision count didn't increase the
last time NFS cut out on me.

   I'm not seeing ANY logging in kern.log on either machine beyond the
NFS timeout reports, nothing about NICs having trouble or the like.

-Ath
-- 
- Athanasius = Athanasius(at)gurus.tf / http://www.clan-lovely.org/~athan/
                  Finger athan(at)fysh.org for PGP key
	   "And it's me who is my enemy. Me who beats me up.
Me who makes the monsters. Me who strips my confidence." Paula Cole - ME

[-- Attachment #2: Type: application/pgp-signature, Size: 240 bytes --]

