* process lockups @ 2000-10-24 1:22 Karsten Merker
From: Karsten Merker @ 2000-10-24 1:22 UTC
To: linux-mips; +Cc: linux-mips

Hello everyone,

I am running Kernel 2.4.0-test9 on a DECstation 5000/150. I am
experiencing a strange behaviour when having strong I/O-load, such as
running a "tar xvf foobar.tgz" with a large archive. After some time of
activity the process (in this case tar) is stuck in status "D". There is
neither an entry in the syslog nor on the console that would give me a
hint what is happening. Is anyone else experiencing this?

Another thing I see on my 5000/150 (and only there - this is my only
R4K-machine, so I do not know whether this is CPU- or machine-type-bound)
is "top" going weird, eating lots of CPU cycles and spitting messages
"schedule_timeout: wrong timeout value fffbd0b2 from 800900f8; Setting
flush to zero for top". I know Florian also has this on his 5000/150.
Anyone else with the same behaviour or any idea about the cause for this?

Greetings,
Karsten
-- 
#include <standard_disclaimer>
Pursuant to Section 28(3) of the German Federal Data Protection Act
(Bundesdatenschutzgesetz), I object to the use or transfer of my data for
advertising purposes or for market or opinion research.

* Re: process lockups @ 2000-10-24 2:47 Ralf Baechle
From: Ralf Baechle @ 2000-10-24 2:47 UTC
To: linux-mips, linux-mips

On Tue, Oct 24, 2000 at 03:22:32AM +0200, Karsten Merker wrote:

> I am running Kernel 2.4.0-test9 on a DECstation 5000/150. I am
> experiencing a strange behaviour when having strong I/O-load, such as
> running a "tar xvf foobar.tgz" with a large archive. After some time of
> activity the process (in this case tar) is stuck in status "D". There is
> neither an entry in the syslog nor on the console that would give me a
> hint what is happening. Is anyone else experiencing this?

I observe similar stuck processes on Origins - even without massive I/O
load. I'm trying to track them down, but with little success aside from
fixing a few unrelated little bugs. Do you observe those on your R4k box
also?

Another thing which I'm observing is that I occasionally can't unmount
a filesystem. umount then says the fs is still in use. Sometimes it's
at least possible to remount the fs r/o. Have you also observed this one?

> Another thing I see on my 5000/150 (and only there - this is my only
> R4K-machine, so I do not know whether this is CPU- or machine-type-bound)
> is "top" going weird, eating lots of CPU cycles and spitting messages
> "schedule_timeout: wrong timeout value fffbd0b2 from 800900f8; Setting
> flush to zero for top". I know Florian also has this on his 5000/150.
> Anyone else with the same behaviour or any idea about the cause for this?

Setting flush to zero for <process name> means that the floating point
approximator is now enabled ;-)

The schedule_timeout thing is unrelated; I've never heard of it before.

  Ralf

* Re: process lockups @ 2000-10-24 5:51 Houten K.H.C. van (Karel)
From: Houten K.H.C. van (Karel) @ 2000-10-24 5:51 UTC
To: Ralf Baechle; +Cc: linux-mips, linux-mips, K.H.C.vanHouten

Ralf Baechle wrote:
>
> On Tue, Oct 24, 2000 at 03:22:32AM +0200, Karsten Merker wrote:
>
> > I am running Kernel 2.4.0-test9 on a DECstation 5000/150. I am
> > experiencing a strange behaviour when having strong I/O-load, such as
> > running a "tar xvf foobar.tgz" with a large archive. After some time of
> > activity the process (in this case tar) is stuck in status "D". There is
> > neither an entry in the syslog nor on the console that would give me a
> > hint what is happening. Is anyone else experiencing this?
>
> I observe similar stuck processes on Origins - even without massive I/O
> load. I'm trying to track them down, but with little success aside from
> fixing a few unrelated little bugs. Do you observe those on your R4k box
> also?

On my DEC 5000/260 (R4k) I have no stuck processes, but I should mention
that I am running without swap (I have 192 MB RAM).

> Another thing which I'm observing is that I occasionally can't unmount
> a filesystem. umount then says the fs is still in use. Sometimes it's
> at least possible to remount the fs r/o. Have you also observed this one?

Yes, but only the root FS. I thought I might have to upgrade to a newer
mount program for the 2.4 kernel, or is the system call returning the error?

> > Another thing I see on my 5000/150 (and only there - this is my only
> > R4K-machine, so I do not know whether this is CPU- or machine-type-bound)
> > is "top" going weird, eating lots of CPU cycles and spitting messages
> > "schedule_timeout: wrong timeout value fffbd0b2 from 800900f8; Setting
> > flush to zero for top". I know Florian also has this on his 5000/150.
> > Anyone else with the same behaviour or any idea about the cause for this?
>
> Setting flush to zero for <process name> means that the floating point
> approximator is now enabled ;-)
>
> The schedule_timeout thing is unrelated; I've never heard of it before.

Aside from this I still get 'bug in get_wchan' messages, but everything
seems to run fine. I hope to test my current kernels on a 5000/150 and
a 3100.

Regards,
-- 
Karel van Houten

----------------------------------------------------------
The box said "Requires Windows 95 or better."
I can't understand why it won't work on my Linux computer.
----------------------------------------------------------

* Re: process lockups @ 2000-10-24 11:15 Karsten Merker
From: Karsten Merker @ 2000-10-24 11:15 UTC
To: linux-mips, linux-mips

On Tue, Oct 24, 2000 at 07:51:42AM +0200, Houten K.H.C. van (Karel) wrote:
>
> Ralf Baechle wrote:

[hanging processes in status "D"]

> > I observe similar stuck processes on Origins - even without massive I/O
> > load. I'm trying to track them down, but with little success aside from
> > fixing a few unrelated little bugs. Do you observe those on your R4k box
> > also?

> On my DEC 5000/260 (R4k) I have no stuck processes, but I should mention
> that I am running without swap (I have 192 MB RAM).

Having swap or not does not seem to influence the behaviour - I also get
hangs with swap disabled. Good candidates for hanging are either tar or
gcc.

> > Another thing which I'm observing is that I occasionally can't unmount
> > a filesystem. umount then says the fs is still in use. Sometimes it's
> > at least possible to remount the fs r/o. Have you also observed this one?

> Yes, but only the root FS. I thought I might have to upgrade to a newer
> mount program for the 2.4 kernel, or is the system call returning the error?

Similar effect here - sometimes unmounting the root fs on shutdown is
successful, sometimes I get "/ is busy" without being able to find a
reason for that. Possibly it is a bug in mount (I am still running
mount-2.9o).

> > > Another thing I see on my 5000/150 (and only there - this is my only
> > > R4K-machine, so I do not know whether this is CPU- or machine-type-bound)
> > > is "top" going weird, eating lots of CPU cycles and spitting messages
> > > "schedule_timeout: wrong timeout value fffbd0b2 from 800900f8; Setting
> > > flush to zero for top". I know Florian also has this on his 5000/150.
> > > Anyone else with the same behaviour or any idea about the cause for this?
> >
> > Setting flush to zero for <process name> means that the floating point
> > approximator is now enabled ;-)

???

Greetings,
Karsten

* Re: process lockups @ 2000-10-24 14:38 Ralf Baechle
From: Ralf Baechle @ 2000-10-24 14:38 UTC
To: K.H.C.vanHouten; +Cc: linux-mips, linux-mips, K.H.C.vanHouten

On Tue, Oct 24, 2000 at 07:51:42AM +0200, Houten K.H.C. van (Karel) wrote:

> > > I am running Kernel 2.4.0-test9 on a DECstation 5000/150. I am
> > > experiencing a strange behaviour when having strong I/O-load, such as
> > > running a "tar xvf foobar.tgz" with a large archive. After some time of
> > > activity the process (in this case tar) is stuck in status "D". There is
> > > neither an entry in the syslog nor on the console that would give me a
> > > hint what is happening. Is anyone else experiencing this?
> >
> > I observe similar stuck processes on Origins - even without massive I/O
> > load. I'm trying to track them down, but with little success aside from
> > fixing a few unrelated little bugs. Do you observe those on your R4k box
> > also?

> On my DEC 5000/260 (R4k) I have no stuck processes, but I should mention
> that I am running without swap (I have 192 MB RAM).

That matches my Origin experience with its 1.5 GB RAM and no swap.

> > Another thing which I'm observing is that I occasionally can't unmount
> > a filesystem. umount then says the fs is still in use. Sometimes it's
> > at least possible to remount the fs r/o. Have you also observed this one?

> Yes, but only the root FS. I thought I might have to upgrade to a newer
> mount program for the 2.4 kernel, or is the system call returning the error?

It also happens for other filesystems; the heavier the usage of the
filesystem has been, the more often. But I've never seen a hanging tar or
gcc process.

> Aside from this I still get 'bug in get_wchan' messages, but everything
> seems to run fine. I hope to test my current kernels on a 5000/150 and
> a 3100.

This message is harmless. The only effect is that the WCHAN column of
ps axl will have bogus information.

Which is a problem - I need exactly the WCHAN information to debug this
problem.

  Ralf

* Re: process lockups @ 2000-10-24 17:55 Karsten Merker
From: Karsten Merker @ 2000-10-24 17:55 UTC
To: Ralf Baechle; +Cc: linux-mips, linux-mips

On Tue, Oct 24, 2000 at 04:38:43PM +0200, Ralf Baechle wrote:

> Which is a problem - I need exactly the WCHAN information to debug this
> problem.

Here we go...

Two major processes are running: a tar zxvf (PIDs 212 and 213) and a
dpkg-buildpackage. Both together should consume all CPU time available,
but they do not, they just sit idle. Interestingly, there is no process
in state "D" this time, unlike before. This seems to be reproducible.

These logs were created from a fresh cvs-checkout (already including your
patch).

root# ps -laww
  F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
100 S     0   189   168  0  60   0    -  1000 pause  ttyp0    00:00:00 screen
100 S     0   212   191  1  60   0    -   579 ?      ttya0    00:00:00 tar
000 S     0   213   212  0  60   0    -   394 pipe_w ttya0    00:00:00 gzip
100 S     0   220   197  0  60   0    -   873 wait4  ttya2    00:00:00 dpkg-buildpacka
100 S     0   272   220  0  60   0    -  1563 wait4  ttya2    00:00:02 dpkg-source
000 S     0   277   272  0  60   0    -   394 pipe_w ttya2    00:00:00 gunzip
100 S     0   278   272  0  60   0    -   536 ?      ttya2    00:00:00 cpio
000 S     0   279   278  0  60   0    -   864 wait4  ttya2    00:00:00 sh
000 S     0   280   279  0  60   0    -   333 pipe_w ttya2    00:00:00 egrep
000 R     0   283   196  0  60   0    -   800 -      ttya1    00:00:00 ps

While this happens, top tells:

27 processes: 26 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 10.5% user,  9.5% system,  0.0% nice, 79.9% idle
Mem:  127056K av,  22064K used, 104992K free,      0K shrd,    488K buff
Swap:      0K av,      0K used,      0K free                 10952K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
  284 root       0   0  1048 1048   684 R       0 12.9  0.8   0:00 top
  190 root       0   0  1204 1204   984 S       0  0.8  0.9   0:02 screen
    1 root       0   0   484  484   408 S       0  0.0  0.3   0:02 init
[...]
Any further processes have 0% CPU.

After some time the tar zxvf suddenly starts running and decompresses the
archive in one step.

Hope this description is helpful; if you need further information, just
mail me.

Greetings,
Karsten

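The rows worth a closer look in the `ps -laww` listing above are the sleeping processes whose WCHAN column shows `?`, since those are the ones for which the kernel could not report a wait channel. A minimal sketch of picking them out of such a listing follows; the function name `unresolved_sleepers` is hypothetical, and the embedded listing is simply the one from the message with whitespace collapsed.

```python
# Listing copied from the ps -laww output in the message (whitespace collapsed).
PS_OUTPUT = """\
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
100 S 0 189 168 0 60 0 - 1000 pause ttyp0 00:00:00 screen
100 S 0 212 191 1 60 0 - 579 ? ttya0 00:00:00 tar
000 S 0 213 212 0 60 0 - 394 pipe_w ttya0 00:00:00 gzip
100 S 0 220 197 0 60 0 - 873 wait4 ttya2 00:00:00 dpkg-buildpacka
100 S 0 272 220 0 60 0 - 1563 wait4 ttya2 00:00:02 dpkg-source
000 S 0 277 272 0 60 0 - 394 pipe_w ttya2 00:00:00 gunzip
100 S 0 278 272 0 60 0 - 536 ? ttya2 00:00:00 cpio
000 S 0 279 278 0 60 0 - 864 wait4 ttya2 00:00:00 sh
000 S 0 280 279 0 60 0 - 333 pipe_w ttya2 00:00:00 egrep
000 R 0 283 196 0 60 0 - 800 - ttya1 00:00:00 ps
"""


def unresolved_sleepers(text):
    """Return (pid, cmd) for sleeping processes whose WCHAN is '?'.

    Hypothetical helper for illustration; it assumes the column layout
    of the listing above, with CMD as the last column.
    """
    lines = text.strip().splitlines()
    header = lines[0].split()
    result = []
    for line in lines[1:]:
        # Split into at most len(header) fields so CMD may contain spaces.
        fields = line.split(None, len(header) - 1)
        row = dict(zip(header, fields))
        # "S" is interruptible sleep, "D" is uninterruptible sleep.
        if row["S"] in ("S", "D") and row["WCHAN"] == "?":
            result.append((int(row["PID"]), row["CMD"]))
    return result


print(unresolved_sleepers(PS_OUTPUT))  # [(212, 'tar'), (278, 'cpio')]
```

For this listing the filter leaves tar (PID 212) and cpio (PID 278), the readers at the end of the two decompression pipelines.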
* Re: process lockups @ 2000-10-25 1:29 Ralf Baechle
From: Ralf Baechle @ 2000-10-25 1:29 UTC
To: linux-mips, linux-mips

On Tue, Oct 24, 2000 at 07:55:55PM +0200, Karsten Merker wrote:

> Two major processes are running: a tar zxvf (PIDs 212 and 213) and a
> dpkg-buildpackage. Both together should consume all CPU time available,
> but they do not, they just sit idle. Interesting is that here there is no
> process in state "D" as I had before. This seems to be reproducible.
>
> These logs were created from a fresh cvs-checkout (already including your
> patch).

Which was still pretty fishy. The scheduler has changed significantly and
so it took a little bit more fixing. Which explains the `?' in the listing
below. I tried to fix this in the CVS tree.

> root# ps -laww
>   F S   UID   PID  PPID  C PRI  NI ADDR    SZ WCHAN  TTY          TIME CMD
> 100 S     0   189   168  0  60   0    -  1000 pause  ttyp0    00:00:00 screen
> 100 S     0   212   191  1  60   0    -   579 ?      ttya0    00:00:00 tar
> 000 S     0   213   212  0  60   0    -   394 pipe_w ttya0    00:00:00 gzip
> 100 S     0   220   197  0  60   0    -   873 wait4  ttya2    00:00:00 dpkg-buildpacka
> 100 S     0   272   220  0  60   0    -  1563 wait4  ttya2    00:00:02 dpkg-source
> 000 S     0   277   272  0  60   0    -   394 pipe_w ttya2    00:00:00 gunzip
> 100 S     0   278   272  0  60   0    -   536 ?      ttya2    00:00:00 cpio
> 000 S     0   279   278  0  60   0    -   864 wait4  ttya2    00:00:00 sh
> 000 S     0   280   279  0  60   0    -   333 pipe_w ttya2    00:00:00 egrep
> 000 R     0   283   196  0  60   0    -   800 -      ttya1    00:00:00 ps

Ok, so dpkg-buildpackage is waiting for the termination of some other
process.

> While this happens, top tells:
>
> 27 processes: 26 sleeping, 1 running, 0 zombie, 0 stopped
> CPU states: 10.5% user,  9.5% system,  0.0% nice, 79.9% idle
> Mem:  127056K av,  22064K used, 104992K free,      0K shrd,    488K buff
> Swap:      0K av,      0K used,      0K free                 10952K cached
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
>   284 root       0   0  1048 1048   684 R       0 12.9  0.8   0:00 top
>   190 root       0   0  1204 1204   984 S       0  0.8  0.9   0:02 screen
>     1 root       0   0   484  484   408 S       0  0.0  0.3   0:02 init
> [...]
> Any further processes have 0% CPU.

Those CPU percentages are meaningless anyway. They don't indicate anything
about a process' current CPU usage.

> After some time the tar zxvf suddenly starts running and decompresses the
> archive in one step.

The `?' shows that tar is sleeping, but due to this get_wchan bug we don't
see what it is waiting for, so there is little I can do with this
information ...

  Ralf

* Re: process lockups @ 2000-10-24 15:09 Ralf Baechle
From: Ralf Baechle @ 2000-10-24 15:09 UTC
To: K.H.C.vanHouten; +Cc: linux-mips, linux-mips

On Tue, Oct 24, 2000 at 07:51:42AM +0200, Houten K.H.C. van (Karel) wrote:

> Aside from this I still get 'bug in get_wchan' messages, but everything
> seems to run fine. I hope to test my current kernels on a 5000/150 and
> a 3100.

Try this untested fix for get_wchan. The values in the WCHAN column of
ps axl should now be numbers that make sense as addresses. Unless the `n'
option is also used, ps will try to translate the address back into a
symbol. Quoting from ps(1):

[...]
       To produce the WCHAN field, ps needs to read the System.map file
       created when the kernel is compiled. The search path is:

              $PS_SYSTEM_MAP
              /boot/System.map-`uname -r`
              /boot/System.map
              /lib/modules/`uname -r`/System.map
              /usr/src/linux/System.map
              /System.map
[...]

If that's working as planned, please send me the WCHAN of any stuck
process. I need to know where they're stuck.

  Ralf

--- arch/mips/kernel/process.c	2000/10/05 01:18:43	1.21
+++ arch/mips/kernel/process.c	2000/10/24 14:54:29
@@ -203,18 +203,9 @@
 		return 0;
 
 	pc = thread_saved_pc(&p->thread);
-	if (pc == (unsigned long) interruptible_sleep_on
-	    || pc == (unsigned long) sleep_on) {
-		schedule_frame = ((unsigned long *)p->thread.reg30)[9];
-		return ((unsigned long *)schedule_frame)[15];
-	}
-	if (pc == (unsigned long) interruptible_sleep_on_timeout
-	    || pc == (unsigned long) sleep_on_timeout) {
-		schedule_frame = ((unsigned long *)p->thread.reg30)[9];
-		return ((unsigned long *)schedule_frame)[16];
-	}
 	if (pc >= first_sched && pc < last_sched) {
-		printk(KERN_DEBUG "Bug in %s\n", __FUNCTION__);
+		schedule_frame = ((unsigned long *)p->thread.reg30)[9];
+		return ((unsigned long *)schedule_frame)[11];
 	}
 
 	return pc;

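The translation from a WCHAN address back into a symbol, as described in the ps(1) excerpt, can be thought of as a nearest-symbol lookup: find the last System.map entry whose address is at or below the reported address. The following is only a sketch of that idea, not the actual procps implementation; the System.map excerpt, its addresses, and the helper names `parse_system_map` and `wchan_to_symbol` are all invented for illustration.

```python
import bisect

# Hypothetical System.map excerpt: "<hex address> <type> <symbol>" per line.
# The addresses are made up and do not come from any real kernel build.
SAMPLE_SYSTEM_MAP = """\
80010000 T sys_pause
80020000 T pipe_wait
80030000 T sys_wait4
80040000 T schedule
"""


def parse_system_map(text):
    """Return parallel lists (addresses, names), sorted by address."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _symtype, name = parts
        entries.append((int(addr, 16), name))
    entries.sort()
    return [a for a, _ in entries], [n for _, n in entries]


def wchan_to_symbol(addr, addresses, names):
    """Map an address to the closest symbol at or below it, or '?' if none."""
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return "?"
    return names[i]


addresses, names = parse_system_map(SAMPLE_SYSTEM_MAP)
print(wchan_to_symbol(0x80020040, addresses, names))  # pipe_wait
print(wchan_to_symbol(0x80000000, addresses, names))  # ?
```

With a lookup of this kind, an address inside a function body resolves to that function's name, which is why get_wchan must return an address in the caller of the scheduler rather than one inside schedule() itself for the WCHAN column to be informative.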
* Re: process lockups @ 2000-10-24 12:25 Florian Lohoff
From: Florian Lohoff @ 2000-10-24 12:25 UTC
To: linux-mips

On Tue, Oct 24, 2000 at 03:22:32AM +0200, Karsten Merker wrote:

> Hello everyone,
>
> I am running Kernel 2.4.0-test9 on a DECstation 5000/150. I am
> experiencing a strange behaviour when having strong I/O-load, such as
> running a "tar xvf foobar.tgz" with a large archive. After some time of
> activity the process (in this case tar) is stuck in status "D". There is
> neither an entry in the syslog nor on the console that would give me a
> hint what is happening. Is anyone else experiencing this?

I have not seen this on my /150, although I have not been running -test9.
I have that machine at home right now, so I'll check whether I can
reproduce this.

> Another thing I see on my 5000/150 (and only there - this is my only
> R4K-machine, so I do not know whether this is CPU- or machine-type-bound)
> is "top" going weird, eating lots of CPU cycles and spitting messages
> "schedule_timeout: wrong timeout value fffbd0b2 from 800900f8; Setting
> flush to zero for top". I know Florian also has this on his 5000/150.
> Anyone else with the same behaviour or any idea about the cause for this?

I have seen this too. I guess it is DECstation-specific, as I can't seem
to reproduce it on the I2.

Flo
-- 
Florian Lohoff		flo@rfc822.org		+49-5201-669912
"Write only memory - Oops. Time for my medication again ..."

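One way to read the "wrong timeout value fffbd0b2" message: as far as I know, the 2.4 schedule_timeout() prints it when it is handed a negative timeout, with the value printed as an unsigned hex number and followed by the caller's return address. Assuming a 32-bit long, as on these MIPS machines, the pattern can be decoded as a two's-complement value. The helper name `as_signed32` below is hypothetical.

```python
def as_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value - 0x100000000
    return value


# Value taken from the message quoted in the thread.
timeout = as_signed32(0xfffbd0b2)
print(timeout)  # -274254
```

So the caller at 800900f8 apparently requested a sleep of about minus 274,000 jiffies, which suggests a miscalculated or wrapped time difference in whatever code path top triggers rather than a problem in the scheduler itself; the thread does not establish the actual cause.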