Linux NFS development
* processes stuck in D state
@ 2003-05-06 14:51 Michael Buesch
  2003-05-06 15:20 ` Trond Myklebust
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Buesch @ 2003-05-06 14:51 UTC (permalink / raw)
  To: neilb; +Cc: nfs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi!

Please take a look at this problem:

[linux-kernel-mailing-list thread]
http://marc.theaimsgroup.com/?t=98639966100003&r=1&w=2

thanks.
Please cc me, as I'm not subscribed to the nfs-list.

- -- 
Regards Michael Büsch
http://www.8ung.at/tuxsoft
 16:28:34 up 20 min,  1 user,  load average: 1.07, 0.97, 0.66
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+t8wdoxoigfggmSgRAm5aAJsGJLPe9yUd4sqah4yiU0GsMIAGzACfSa2+
gAMZvSHQirHmE8yZChpgH/8=
=pka2
-----END PGP SIGNATURE-----



-------------------------------------------------------
Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
The only event dedicated to issues related to Linux enterprise solutions
www.enterpriselinuxforum.com

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* RE: processes stuck in D state
@ 2003-05-06 15:18 Lever, Charles
  0 siblings, 0 replies; 13+ messages in thread
From: Lever, Charles @ 2003-05-06 15:18 UTC (permalink / raw)
  To: Michael Buesch; +Cc: nfs, neilb

hi michael-

i'm not sure why you mailed neilb -- this appears to
be NFS client related, not server related.

can you spell out the sequence of events that leads
to the stuck processes?  it looks like the client
is working-as-designed, but if you can provide more
details, we can verify what's going on.

> -----Original Message-----
> From: Michael Buesch [mailto:fsdeveloper@yahoo.de]
> Sent: Tuesday, May 06, 2003 10:52 AM
> To: neilb@cse.unsw.edu.au
> Cc: nfs@lists.sourceforge.net
> Subject: [NFS] processes stuck in D state
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi!
>
> Please take a look at this problem:
>
> [linux-kernel-mailing-list thread]
> http://marc.theaimsgroup.com/?t=98639966100003&r=1&w=2
>
> thanks.
> Please cc me, as I'm not subscribed to the nfs-list.
>
> - -- 
> Regards Michael Büsch
> http://www.8ung.at/tuxsoft
>  16:28:34 up 20 min,  1 user,  load average: 1.07, 0.97, 0.66
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.1 (GNU/Linux)
>
> iD8DBQE+t8wdoxoigfggmSgRAm5aAJsGJLPe9yUd4sqah4yiU0GsMIAGzACfSa2+
> gAMZvSHQirHmE8yZChpgH/8=
> =pka2
> -----END PGP SIGNATURE-----

* RE: processes stuck in D state
@ 2003-05-06 15:47 Lever, Charles
  0 siblings, 0 replies; 13+ messages in thread
From: Lever, Charles @ 2003-05-06 15:47 UTC (permalink / raw)
  To: Michael Buesch; +Cc: nfs, linux kernel mailing list, Zeev Fisher

> To reproduce the problem:
> - mount some nfs from a server in your lan.
> - Open an app, that uses the mounted fs. I've simply opened a
>   konqueror-window for the directory where the nfs is mounted.
> - shut down or crash the server or just pull the network-cable.
> - Now the konqueror-process is nonkillable in D state. There's no
>   chance to kill it.

does the problem persist after you reconnect the network cable?
what happens when the server becomes available again?
are you mounting with UDP or TCP?
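
When collecting details for a report like this, it helps to list exactly which tasks are in uninterruptible sleep. Below is a minimal Python sketch, assuming `ps auxw`-style text with the process state in the eighth column. The function name `d_state_processes` and the sample lines are illustrative only and are not taken from any tool mentioned in this thread.

```python
def d_state_processes(ps_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'D'."""
    stuck = []
    for line in ps_output.splitlines():
        # Assumed column order:
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header and any truncated lines
        if fields[7].startswith("D"):
            stuck.append((int(fields[1]), fields[10]))
    return stuck


# Illustrative sample, not real output from the reporter's machine.
sample = """\
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1  1336  480 ?        S    May05   0:04 init
andy      9087  0.0  3.2 12900 10312 ?       D    May05   0:02 /0/tmp/av_explore .index
"""
print(d_state_processes(sample))
```

Feeding it the captured output of `ps auxw` gives a short list of PIDs and command lines to include alongside the mount options.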



* RE: processes stuck in D state
@ 2003-05-06 17:09 pwitting
  2003-05-06 17:31 ` Michael Buesch
  0 siblings, 1 reply; 13+ messages in thread
From: pwitting @ 2003-05-06 17:09 UTC (permalink / raw)
  To: nfs; +Cc: fsdeveloper

Actually, I have seen this on the server before, specifically
when using an older version of IBM's JFS for Linux with RH 7.3.

I worked with the JFS team and JFS v1.1.0 and kernel 2.4.20 
seemed to stabilize it, at least I've seen no more "freezes"
as a result. Judging by your mail address this might be your 
problem as well.

One noticeable symptom is that cd'ing to an affected dir and 
attempting an ls would also freeze (I had a large (20GB+) file 
copy going, so I usually knew what the affected dir was).

Two other things that helped:
1) increasing the # of nfs threads (120 or more)
2) ensuring the uid/gid the remote thread was using existed on 
the server. (sounds stupid but it helped)

neither "cured" the issue, but it went from being reproducible 
to being occasional.

Good Luck.

> From: "Lever, Charles" <Charles.Lever@netapp.com>
>
> i'm not sure why you mailed neilb -- this appears to
> be NFS client related, not server related.
> 
> can you spell out the sequence of events that leads
> to the stuck processes?  it looks like the client
> is working-as-designed, but if you can provide more
> details, we can verify what's going on.
> 
>> -----Original Message-----
>> From: Michael Buesch [mailto:fsdeveloper@yahoo.de]
>> Sent: Tuesday, May 06, 2003 10:52 AM
>> To: neilb@cse.unsw.edu.au
>> Cc: nfs@lists.sourceforge.net
>> Subject: [NFS] processes stuck in D state
>>
>> Please take a look at this problem:
>>
>> [linux-kernel-mailing-list thread]
>> http://marc.theaimsgroup.com/?t=98639966100003&r=1&w=2
>>
>> thanks.
>> Please cc me, as I'm not subscribed to the nfs-list.




* RE: processes stuck in D state
@ 2003-05-06 18:32 Guolin Cheng
  0 siblings, 0 replies; 13+ messages in thread
From: Guolin Cheng @ 2003-05-06 18:32 UTC (permalink / raw)
  To: 'Lever, Charles', Michael Buesch, Andy Jewell,
	Jonathan Baker, Paul Sauer
  Cc: nfs, neilb, Ops

Hi, all,

 We are encountering the same problem here as well:

[root@arc100 root]# ps auxw
......
 andy      9087  0.0  3.2 12900 10312 ?       D    May05   0:02 /0/tmp/av_explore .index /net/arc295/0/DQ_crawl26.20030416034937.arc.gz
......

although the nfs server has no problem at all. If I try to "ls
/net/arc295/0" then the new "ls" process will hang as well.
The method I followed to fix the problem is:

[root@arc100 root]# umount -f /net/arc295/0
umount2: Device or resource busy
umount: /net/arc295/0: Illegal seek
[root@arc100 root]# /etc/init.d/amd restart
Stopping amd:                                              [  OK  ]
Starting amd:                                              [  OK  ]
[root@arc100 root]# cd /net/arc295/0
[root@arc100 0]# ls

 My NFS clients and servers all have the same configuration:

	Redhat 8.0
	General Linux Kernel 2.4.20 ("nfs over tcp" is enabled)
	gcc-3.2-7
	amd ( am-utils-6.0.7-9 )
	amd mount options in map amd.master
(opts:=rw,intr,nfsv3,tcp,nosuid,nodev,noresvport)
      The real amd mount status for the nfs directory:
        arc295:/0 on /.amd_mnt/arc295/host/0 type nfs
(rw,intr,nfsv3,tcp,nosuid,nodev,noresvport,dev=0000f10e,vers=3,proto=tcp)
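
Since the transport (UDP or TCP) and NFS version matter for diagnosing these hangs, here is a minimal Python sketch that pulls them out of a mount status line. It assumes the `<source> on <mountpoint> type <fstype> (<options>)` layout shown above; `parse_nfs_mount` is a hypothetical helper name, not part of amd or any NFS utility.

```python
def parse_nfs_mount(line):
    """Parse '<source> on <mountpoint> type <fstype> (<options>)'."""
    head, _, opts = line.strip().partition(" (")
    source, _, rest = head.partition(" on ")
    mountpoint, _, fstype = rest.partition(" type ")
    options = {}
    for item in opts.rstrip(")").split(","):
        key, _, value = item.partition("=")
        # Bare flags such as 'intr' have no value; record them as True.
        options[key] = value or True
    return {"source": source, "mountpoint": mountpoint,
            "fstype": fstype, "options": options}


line = ("arc295:/0 on /.amd_mnt/arc295/host/0 type nfs "
        "(rw,intr,nfsv3,tcp,nosuid,nodev,noresvport,dev=0000f10e,vers=3,proto=tcp)")
info = parse_nfs_mount(line)
print(info["options"]["proto"], info["options"]["vers"])
```

For the mount line above this reports `proto` as `tcp` and `vers` as `3`, both kept as strings.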

 If anyone needs more info to troubleshoot this sort of problem, let me know.

 Thanks.
 --Guolin Cheng


-----Original Message-----
From: Lever, Charles [mailto:Charles.Lever@netapp.com]
Sent: Tuesday, May 06, 2003 8:18 AM
To: Michael Buesch
Cc: nfs@lists.sourceforge.net; neilb@cse.unsw.edu.au
Subject: RE: [NFS] processes stuck in D state


hi michael-

i'm not sure why you mailed neilb -- this appears to
be NFS client related, not server related.

can you spell out the sequence of events that leads
to the stuck processes?  it looks like the client
is working-as-designed, but if you can provide more
details, we can verify what's going on.


* Processes stuck in D state
@ 2004-02-18 16:33 Olaf Kirch
  0 siblings, 0 replies; 13+ messages in thread
From: Olaf Kirch @ 2004-02-18 16:33 UTC (permalink / raw)
  To: nfs; +Cc: Olaf Hering

[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]


Hi,

I spent much of today investigating a weird NFS problem on 2.6.3.

After one of our servers went away and came back, several processes on
a ppc machine were left in D state. They did not get woken up during
the whole day, until I did a "umount -f" after several hours of debugging.

The internal state of this RPC client looks a little weird. I'm
attaching some debug output that shows where they got stuck. Some
general observations:

 -	this does not seem to be a queue corruption bug, which is good :)
 -	the tasks were sleeping on different wait queues (pending,
 	sending, and one on resend)
 -	all tasks have a tk_timeout value of 0
 -	the ntimeo values of the RTT estimators being 0 looks a
	little weird, given that the mount froze because the
	server wasn't reachable.
 -	the task on the resend queue has a timer with
	tk_timer.expires != 0, but unfortunately I forgot to check whether
	it was active.	But I doubt it; I had debugging enabled for much
	of the day and the tk_pid in question never showed up in the log

I'm not sure yet what exactly happened here. I don't understand how a
task on xprt->pending can have a timeout value of 0...

Does anyone have an idea what might be going wrong here?

Olaf
-- 
Olaf Kirch     |  Stop wasting entropy - start using predictable
okir@suse.de   |  tempfile names today!
---------------+ 

[-- Attachment #2: nfs-messages --]
[-- Type: text/plain, Size: 1489 bytes --]

 Found NFS mount, server=Hilbert2,v3,rsize=8192,wsize=8192
   RPC client 6 users
     Active RPC tasks for this client:
       task 21384, status=-11, timeout=0, active, sleeping, on queue c45ea044(xprt_sending)
       task 9969, status=-11, timeout=0, active, sleeping, on queue c45ea044(xprt_sending)
       task 52431, status=-11, timeout=0, active, sleeping, on queue c45ea044(xprt_sending)
       task 43528, status=-11, timeout=0, active, sleeping, on queue c45ea044(xprt_sending)
       task 43527, status=-11, timeout=0, timer, async, active, sleeping, on queue c45ea050(xprt_resend)
       task 55816, status=0, timeout=0, active, sleeping, on queue c45ea05c(xprt_pending)
     Transport c45ea000, sockstate=0x1
       cong 256/cwnd 256
       RTT estimates (def timeout 700):
         0: rtt   15 srtt  100 ntimeo   0
         1: rtt   15 srtt  100 ntimeo   0
         2: rtt   49 srtt  100 ntimeo   0
         3: rtt   15 srtt  100 ntimeo   0
         4: rtt   15 srtt  100 ntimeo   0
     RPC wait queue sending:
       task 43528, status=-11, timeout=0, active, sleeping
       task 52431, status=-11, timeout=0, active, sleeping
       task 9969, status=-11, timeout=0, active, sleeping
       task 21384, status=-11, timeout=0, active, sleeping
     RPC wait queue pending:
       task 55816, status=0, timeout=0, active, sleeping
     RPC wait queue resend:
       task 43527, status=-11, timeout=0, active timer, async, active, sleeping
     RPC wait queue backlog: empty
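
The task list in this dump can be summarized per wait queue. Below is a minimal Python sketch, assuming the `task ..., on queue <addr>(<name>)` line format shown in the attachment above. `TASK_RE` and `tasks_by_queue` are hypothetical names and not part of any RPC debugging tool.

```python
import re
from collections import defaultdict

# Matches only the "Active RPC tasks" lines, which end in "on queue addr(name)".
TASK_RE = re.compile(
    r"task (?P<pid>\d+), status=(?P<status>-?\d+), timeout=(?P<timeout>\d+),"
    r".* on queue \w+\((?P<queue>\w+)\)"
)


def tasks_by_queue(log_text):
    """Map queue name -> list of (tk_pid, status, timeout) tuples."""
    queues = defaultdict(list)
    for line in log_text.splitlines():
        match = TASK_RE.search(line)
        if match:
            queues[match.group("queue")].append(
                (int(match.group("pid")),
                 int(match.group("status")),
                 int(match.group("timeout")))
            )
    return dict(queues)


# Three lines copied from the attachment above.
sample = """\
       task 21384, status=-11, timeout=0, active, sleeping, on queue c45ea044(xprt_sending)
       task 43527, status=-11, timeout=0, timer, async, active, sleeping, on queue c45ea050(xprt_resend)
       task 55816, status=0, timeout=0, active, sleeping, on queue c45ea05c(xprt_pending)
"""
for queue, tasks in tasks_by_queue(sample).items():
    print(queue, tasks)
```

On these sample lines it groups one task each under `xprt_sending`, `xprt_resend` and `xprt_pending`, all with a timeout of 0, which matches the observation in the message. The per-queue sections of the dump lack the `on queue` suffix, so they are skipped and tasks are not counted twice.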

Thread overview: 13+ messages
2003-05-06 14:51 processes stuck in D state Michael Buesch
2003-05-06 15:20 ` Trond Myklebust
2003-05-06 15:41   ` Michael Buesch
2003-05-06 16:05     ` Trond Myklebust
2003-05-06 16:30       ` Michael Buesch
2003-05-06 16:54         ` Trond Myklebust
2003-05-06 17:32           ` [NFS] " Michael Buesch
  -- strict thread matches above, loose matches on Subject: below --
2003-05-06 15:18 Lever, Charles
2003-05-06 15:47 Lever, Charles
2003-05-06 17:09 pwitting
2003-05-06 17:31 ` Michael Buesch
2003-05-06 18:32 Guolin Cheng
2004-02-18 16:33 Processes " Olaf Kirch
