* RE: processes stuck in D state
From: pwitting @ 2003-05-06 17:09 UTC
  To: nfs; +Cc: fsdeveloper

Actually, I have seen this on the server before, specifically when
using an older version of IBM's JFS for Linux with RH 7.3.

I worked with the JFS team, and JFS v1.1.0 with kernel 2.4.20
seemed to stabilize it; at least I've seen no more "freezes"
as a result. Judging by your mail address, this might be your
problem as well.

One noticeable symptom is that cd'ing to an affected dir and
attempting an ls would also freeze. (I had a large (20GB+) file
copy going, so I usually knew which dir was affected.)

Two other things that helped:
1) increasing the number of nfs threads (120 or more)
2) ensuring that the uid/gid the remote thread was using existed on
the server (sounds stupid, but it helped)

Neither "cured" the issue, but it went from being reproducible
to being occasional.
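One way to judge whether more server threads might help, sketched in Python under an assumption: Linux nfsd statistics (/proc/net/rpc/nfsd) include a "th" line whose first two numbers are the thread count and how many times all threads were busy at once. Check this against your kernel's documentation, since later kernels changed what some of these fields report. nfsd_thread_stats is a hypothetical helper name.

```python
from __future__ import annotations


def nfsd_thread_stats(stats_text: str) -> tuple[int, int]:
    """Return (threads, times_all_busy) from the 'th' line of nfsd stats.

    Assumption: the line looks like "th <threads> <all-busy count> ...",
    as in /proc/net/rpc/nfsd on Linux; later kernels may report 0 for
    the all-busy count.
    """
    for line in stats_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "th":
            return int(fields[1]), int(fields[2])
    raise ValueError("no 'th' line in nfsd statistics")
```

On a live server you would pass it open("/proc/net/rpc/nfsd").read(). An all-busy count that keeps growing suggests every thread was occupied at times, which is the situation where raising the thread count (as suggested above) might help.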

Good Luck.

> From: "Lever, Charles" <Charles.Lever@netapp.com>
>
> i'm not sure why you mailed neilb -- this appears to
> be NFS client related, not server related.
> 
> can you spell out the sequence of events that leads
> to the stuck processes?  it looks like the client
> is working-as-designed, but if you can provide more
> details, we can verify what's going on.
> 
>> -----Original Message-----
>> From: Michael Buesch [mailto:fsdeveloper@yahoo.de]
>> Sent: Tuesday, May 06, 2003 10:52 AM
>> To: neilb@cse.unsw.edu.au
>> Cc: nfs@lists.sourceforge.net
>> Subject: [NFS] processes stuck in D state
>>
>> Please take a look at this problem:
>>
>> [linux-kernel-mailing-list thread]
>> http://marc.theaimsgroup.com/?t=98639966100003&r=1&w=2
>>
>> thanks.
>> Please cc me, as I'm not subscribed to the nfs-list.



-------------------------------------------------------
Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
The only event dedicated to issues related to Linux enterprise solutions
www.enterpriselinuxforum.com

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* Processes stuck in D state
From: Olaf Kirch @ 2004-02-18 16:33 UTC
  To: nfs; +Cc: Olaf Hering

[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]


Hi,

I spent much of today investigating a weird NFS problem on 2.6.3.

After one of our servers went away and came back, several processes on
a ppc machine were left in D state. They did not get woken up during
the whole day, until I did a "umount -f" after several hours of debugging.

The internal state of this RPC client looks a little weird. I'm
attaching some debug output that shows where they got stuck. Some
general observations:

 -	this does not seem to be a queue corruption bug, which is good :)
 -	the tasks were sleeping on different wait queues (pending,
 	sending, and one on resend)
 -	all tasks have a tk_timeout value of 0
 -	the ntimeo values of the RTT estimators are all 0, which looks
	a little weird, given that the mount froze because the
	server wasn't reachable.
 -	the task on the resend queue has a timer with
	tk_timer.expires != 0, but unfortunately I forgot to check whether
	it was active. But I doubt it; I had debugging enabled for much
	of the day and the tk_pid in question never showed up in the log.

I'm not sure yet what exactly happened here. I don't understand how a
task on xprt->pending can have a timeout value of 0...

Does anyone have an idea what might be going wrong here?

Olaf
-- 
Olaf Kirch     |  Stop wasting entropy - start using predictable
okir@suse.de   |  tempfile names today!
---------------+ 

[-- Attachment #2: nfs-messages --]
[-- Type: text/plain, Size: 1489 bytes --]

 Found NFS mount, server=Hilbert2,v3,rsize=8192,wsize=8192
   RPC client 6 users
     Active RPC tasks for this client:
       task 21384, status=-11, timeout=0, active, sleeping, on queue c45ea044(xprt_sending)
       task 9969, status=-11, timeout=0, active, sleeping, on queue c45ea044(xprt_sending)
       task 52431, status=-11, timeout=0, active, sleeping, on queue c45ea044(xprt_sending)
       task 43528, status=-11, timeout=0, active, sleeping, on queue c45ea044(xprt_sending)
       task 43527, status=-11, timeout=0, timer, async, active, sleeping, on queue c45ea050(xprt_resend)
       task 55816, status=0, timeout=0, active, sleeping, on queue c45ea05c(xprt_pending)
     Transport c45ea000, sockstate=0x1
       cong 256/cwnd 256
       RTT estimates (def timeout 700):
         0: rtt   15 srtt  100 ntimeo   0
         1: rtt   15 srtt  100 ntimeo   0
         2: rtt   49 srtt  100 ntimeo   0
         3: rtt   15 srtt  100 ntimeo   0
         4: rtt   15 srtt  100 ntimeo   0
     RPC wait queue sending:
       task 43528, status=-11, timeout=0, active, sleeping
       task 52431, status=-11, timeout=0, active, sleeping
       task 9969, status=-11, timeout=0, active, sleeping
       task 21384, status=-11, timeout=0, active, sleeping
     RPC wait queue pending:
       task 55816, status=0, timeout=0, active, sleeping
     RPC wait queue resend:
       task 43527, status=-11, timeout=0, active timer, async, active, sleeping
     RPC wait queue backlog: empty
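
To tally the dump above by wait queue, here is a minimal Python sketch. It assumes the one-line-per-task format shown in this attachment (taken from a single 2.6.3 dump; other kernels may print something different), and summarize_tasks and zero_timeout_pids are hypothetical helper names. For reference, status=-11 is -EAGAIN on Linux.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches per-task lines such as
#   task 21384, status=-11, timeout=0, active, sleeping, on queue c45ea044(xprt_sending)
# Assumption: the format is copied from the dump above; other kernels may differ.
# Lines without "on queue" (the per-queue listings) are skipped, so each task
# is counted once.
TASK_RE = re.compile(
    r"task (?P<pid>\d+), status=(?P<status>-?\d+), timeout=(?P<timeout>\d+),"
    r".*on queue \w+\((?P<queue>\w+)\)"
)


def summarize_tasks(dump: str) -> dict[str, list[tuple[int, int, int]]]:
    """Group RPC tasks by wait queue: queue -> [(pid, status, timeout), ...]."""
    by_queue: dict[str, list[tuple[int, int, int]]] = defaultdict(list)
    for line in dump.splitlines():
        m = TASK_RE.search(line)
        if m:
            by_queue[m["queue"]].append(
                (int(m["pid"]), int(m["status"]), int(m["timeout"]))
            )
    return dict(by_queue)


def zero_timeout_pids(dump: str) -> list[int]:
    """PIDs of tasks sleeping with timeout=0, the oddity noted in the mail."""
    return [pid for tasks in summarize_tasks(dump).values()
            for pid, _status, timeout in tasks if timeout == 0]
```

Fed the whole attachment, this reports four tasks on xprt_sending, one on xprt_resend and one on xprt_pending, all with a timeout of 0.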

* RE: processes stuck in D state
From: Guolin Cheng @ 2003-05-06 18:32 UTC
  To: 'Lever, Charles', Michael Buesch, Andy Jewell,
	Jonathan Baker, Paul Sauer
  Cc: nfs, neilb, Ops

Hi, all,

 We are encountering the same problem here as well:

[root@arc100 root]# ps auxw
......
 andy      9087  0.0  3.2 12900 10312 ?       D    May05   0:02 /0/tmp/av_explore .index /net/arc295/0/DQ_crawl26.20030416034937.arc.gz
......

although the NFS server has no problem at all. If I try to "ls
/net/arc295/0", the new ls process hangs as well.
The method I used to fix the problem is:

[root@arc100 root]# umount -f /net/arc295/0
umount2: Device or resource busy
umount: /net/arc295/0: Illegal seek
[root@arc100 root]# /etc/init.d/amd restart
Stopping amd:                                              [  OK  ]
Starting amd:                                              [  OK  ]
[root@arc100 root]# cd /net/arc295/0
[root@arc100 0]# ls

 My NFS clients and servers all have the same configuration:

	Red Hat 8.0
	General Linux kernel 2.4.20 ("NFS over TCP" is enabled)
	gcc-3.2-7
	amd (am-utils-6.0.7-9)
	amd mount options in map amd.master:
	(opts:=rw,intr,nfsv3,tcp,nosuid,nodev,noresvport)
      The real amd mount status for the NFS directory:
        arc295:/0 on /.amd_mnt/arc295/host/0 type nfs (rw,intr,nfsv3,tcp,nosuid,nodev,noresvport,dev=0000f10e,vers=3,proto=tcp)
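
Comparing the options configured in the amd map with what the live mount reports can be scripted. A minimal Python sketch, assuming /proc/mounts-style input (one mount per line, space-separated fields: device, mount point, type, options, dump, pass); nfs_mount_options and missing_options are hypothetical helper names.

```python
from __future__ import annotations


def nfs_mount_options(mounts_text: str) -> dict[str, set[str]]:
    """Map each NFS mount point to its set of options.

    Expects /proc/mounts-style lines: device mountpoint fstype options dump pass.
    Options such as vers=3 are kept as plain "key=value" strings, and
    octal-escaped characters in paths (e.g. spaces) are not decoded.
    """
    result: dict[str, set[str]] = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] in ("nfs", "nfs4"):
            result[fields[1]] = set(fields[3].split(","))
    return result


def missing_options(mounts_text: str, wanted: set[str]) -> dict[str, set[str]]:
    """For each NFS mount, report which of the wanted options are absent."""
    return {mnt: wanted - opts
            for mnt, opts in nfs_mount_options(mounts_text).items()
            if wanted - opts}
```

For example, missing_options(open("/proc/mounts").read(), {"intr", "tcp"}) returns an empty dict when every NFS mount carries both options.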

 If anyone needs more info to troubleshoot this sort of problem, let me know.

 Thanks.
 --Guolin Cheng


-----Original Message-----
From: Lever, Charles [mailto:Charles.Lever@netapp.com]
Sent: Tuesday, May 06, 2003 8:18 AM
To: Michael Buesch
Cc: nfs@lists.sourceforge.net; neilb@cse.unsw.edu.au
Subject: RE: [NFS] processes stuck in D state


hi michael-

i'm not sure why you mailed neilb -- this appears to
be NFS client related, not server related.

can you spell out the sequence of events that leads
to the stuck processes?  it looks like the client
is working-as-designed, but if you can provide more
details, we can verify what's going on.

* RE: processes stuck in D state
From: Lever, Charles @ 2003-05-06 15:47 UTC
  To: Michael Buesch; +Cc: nfs, linux kernel mailing list, Zeev Fisher

> To reproduce the problem:
> - mount some nfs from a server in your lan.
> - Open an app that uses the mounted fs. I've simply opened a
>   konqueror-window for the directory where the nfs is mounted.
> - shut down or crash the server or just pull the network-cable.
> - Now the konqueror-process is nonkillable in D state. There's no
>   chance to kill it.

does the problem persist after you reconnect the network cable?
what happens when the server becomes available again?
are you mounting with UDP or TCP?


* RE: processes stuck in D state
From: Lever, Charles @ 2003-05-06 15:18 UTC
  To: Michael Buesch; +Cc: nfs, neilb

hi michael-

i'm not sure why you mailed neilb -- this appears to
be NFS client related, not server related.

can you spell out the sequence of events that leads
to the stuck processes?  it looks like the client
is working-as-designed, but if you can provide more
details, we can verify what's going on.

* processes stuck in D state
From: Michael Buesch @ 2003-05-06 14:51 UTC
  To: neilb; +Cc: nfs

Hi!

Please take a look at this problem:

[linux-kernel-mailing-list thread]
http://marc.theaimsgroup.com/?t=98639966100003&r=1&w=2

thanks.
Please cc me, as I'm not subscribed to the nfs-list.

--
Regards Michael Büsch
http://www.8ung.at/tuxsoft
 16:28:34 up 20 min,  1 user,  load average: 1.07, 0.97, 0.66



* processes stuck in D state
From: Zeev Fisher @ 2003-05-05  5:52 UTC
  To: linux-kernel

Hi!

I have a recurring problem with unkillable processes stuck in D state
(uninterruptible sleep) on my Linux servers.
It happens randomly, each time on a different server and a different
process (all the servers are configured identically, with the 2.4.18-10
kernel). Here's an example:

[root@lnx35 /]# ps -el|grep D
 F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
000 D   911 29327     1  0  75   0    -  9382 lock_p ?        00:00:00 calibre
000 D   894 30049 15854  0  75   0    -  8995 lock_p ?        00:00:01 calibrewb
000 D   894 30092  8661  0  75   0    -  8995 lock_p ?        00:00:01 calibrewb
000 D   894 29773 26052  0  75   0    -  8977 lock_p ?        00:00:01 calibrewb
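
Note that "grep D" also passes the header line (it contains ADDR and CMD) and would pass any process whose name contains a capital D. A minimal Python sketch that filters on the S column instead, assuming the ps -el column layout shown above; d_state_from_ps_el is a hypothetical helper name.

```python
from __future__ import annotations


def d_state_from_ps_el(ps_output: str) -> list[tuple[int, str, str]]:
    """Return (pid, wchan, cmd) for rows of `ps -el` output whose S column is D.

    Unlike `ps -el | grep D`, this skips the header line and processes
    whose name merely contains a D.
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    s_col = header.index("S")
    pid_col = header.index("PID")
    wchan_col = header.index("WCHAN")
    rows = []
    for line in lines[1:]:
        # CMD is the last column; cap the split so a name with spaces stays whole.
        fields = line.split(None, len(header) - 1)
        if len(fields) == len(header) and fields[s_col] == "D":
            rows.append((int(fields[pid_col]), fields[wchan_col], fields[-1]))
    return rows
```

On a live system it could be fed subprocess.run(["ps", "-el"], capture_output=True, text=True).stdout (after importing subprocess).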


It was probably stuck while trying to get a lock (which was
certainly free) on an NFS volume mounted from a Netapp server.

Enabling RPC debugging (echo '65535' >/proc/sys/sunrpc/rpc_debug)
didn't give me any clue.
Tracing the stuck process's PID doesn't give any output.

Those processes have already been there for a few days and will stay
there until the next reboot.

The load average is now 4 (although the machine is 100% idle), and the
system seems to work fine.
If other programs are started, they run and use the same mounts
that the processes above are stuck on.

Another detail is that these problems may have started when I added the
'intr' option to my NFS-mounted filesystems, but I'm not sure. Also, I
can't easily check that, since this problem is not reproducible.

Has anyone noticed the same behavior ? Is this a well known problem ?


Thanks for your help.

-- 
Zeev Fisher - Unix System Administrator
Marvell Semiconductor Israel Ltd
Moshav Manof, D.N. Misgav 20184, ISRAEL
Email    -  Zeev.Fisher@il.marvell.com
Tel      -  + 972 4 9091402
Cell     -  + 972 54 995402
Fax      -  + 972 4 9091501
WWW Page:     http://www.marvell.com

* processes stuck in D state
From: Pau Aliagas @ 2001-04-04 15:47 UTC
  To: lkml


Since 2.2.4-ac28 and 2.4.3 I keep getting processes in D state that I
cannot kill, usually mozilla or nautilus, which use a large amount of RAM.
Today it is galeon:

Running ps -eo pid,stat,pcpu,nwchan,wchan=WIDE-WCHAN-COLUMN -o args shows
the following:
11520 D     0.0 105db1 down_write_failed /usr/bin/galeon-bin
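
ps takes the state and wait channel from /proc. A minimal Python sketch that scans /proc directly for tasks in state D, assuming a Linux /proc that exposes /proc/<pid>/stat and, on kernels where it is available, /proc/<pid>/wchan; parse_proc_stat and d_state_processes are hypothetical helper names.

```python
from __future__ import annotations

import os


def parse_proc_stat(text: str) -> tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    lpar, rpar = text.index("("), text.rindex(")")
    return int(text[:lpar]), text[lpar + 1:rpar], text[rpar + 2]


def d_state_processes(proc: str = "/proc") -> list[tuple[int, str, str]]:
    """Return (pid, comm, wchan) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_proc_stat(f.read())
            if state != "D":
                continue
            # wchan names the kernel function the task is sleeping in.
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            continue  # the process exited or the file is unreadable
        found.append((pid, comm, wchan))
    return found
```

On the machine above, the galeon task would be expected to show up as a tuple along the lines of (11520, "galeon-bin", "down_write_failed").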

This didn't happen with either 2.4.2 or 2.4.3-pre7; I'm not sure
about pre8.

Pau



Thread overview: 19+ messages
2003-05-06 17:09 processes stuck in D state pwitting
2003-05-06 17:31 ` Michael Buesch
  -- strict thread matches above, loose matches on Subject: below --
2004-02-18 16:33 Processes " Olaf Kirch
2003-05-06 18:32 processes " Guolin Cheng
2003-05-06 15:47 Lever, Charles
2003-05-06 15:18 Lever, Charles
2003-05-06 14:51 Michael Buesch
2003-05-06 15:20 ` Trond Myklebust
2003-05-06 15:41   ` Michael Buesch
2003-05-06 16:05     ` Trond Myklebust
2003-05-06 16:30       ` Michael Buesch
2003-05-06 16:54         ` Trond Myklebust
2003-05-05  5:52 Zeev Fisher
2003-05-05 14:56 ` Michael Buesch
2003-05-05 15:24   ` Mike Waychison
2003-05-05 16:25     ` Michael Buesch
2003-05-05 22:12       ` jw schultz
2001-04-04 15:47 Pau Aliagas
2001-04-07 22:07 ` Barry K. Nathan
