public inbox for linux-kernel@vger.kernel.org
* uninteruptable sleep
@ 2001-04-03 13:08 Trevor Nichols
  2001-04-03 13:32 ` Stephen E. Clark
  2001-04-03 14:35 ` Alan Cox
  0 siblings, 2 replies; 20+ messages in thread
From: Trevor Nichols @ 2001-04-03 13:08 UTC (permalink / raw)
  To: linux-kernel

Hi all,

Since upgrading to the latest stable (2.4.3) kernel, I've noticed that
some processes randomly go into an uninterruptible sleep and never
wake up.

It's happened to nautilus, and today it happened to mozilla as well.
A related symptom is that the load average goes up to n + "normal",
where n is the number of processes stuck in uninterruptible sleep.
This makes me think it's a kernel-related problem.

On one occasion nautilus, with 9 [presumably forked] processes of
itself, went this way, pushing the load average to 9+, yet the system
didn't appear to be straining or struggling under that much load.

The previous kernel version that I was using (2.4.1) did not have this
problem.

One last thing, even if this turns out to be a non-kernel problem: the
processes that *do* get stuck are unkillable - even by root with SIGKILL.
Is there any way to kill them? :)  So far I have had to reboot each
time it happens.


Best regards,
Trevor Nichols.


PS: please CC replies to my address. Thanks.



* Re: uninteruptable sleep
  2001-04-03 13:08 Trevor Nichols
@ 2001-04-03 13:32 ` Stephen E. Clark
  2001-04-08  1:56   ` Anton Blanchard
  2001-04-03 14:35 ` Alan Cox
  1 sibling, 1 reply; 20+ messages in thread
From: Stephen E. Clark @ 2001-04-03 13:32 UTC (permalink / raw)
  To: Trevor Nichols; +Cc: linux-kernel

That happened to me with 2.4.2-ac28 when I tried using DRM.
I also got the following messages in syslog.

/var/log/messages.1:Mar 31 12:15:04 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:15:04 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:15:15 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:15:15 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:15:16 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:15:40 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:16:18 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:16:31 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:16:32 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:16:45 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:16:45 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:16:48 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!
/var/log/messages.1:Mar 31 12:16:49 joker kernel:
[drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!

So I turned off DRI in X 4.0.3.

HTH
Steve



* Re: uninteruptable sleep
  2001-04-03 13:08 Trevor Nichols
  2001-04-03 13:32 ` Stephen E. Clark
@ 2001-04-03 14:35 ` Alan Cox
  2001-04-03 16:13   ` Trevor Nichols
  1 sibling, 1 reply; 20+ messages in thread
From: Alan Cox @ 2001-04-03 14:35 UTC (permalink / raw)
  To: Trevor Nichols; +Cc: linux-kernel

> One last thing, if this turns out to be a non-kernel problem, the
> processes that *do* get stuck, are unkillable - even by root with SIGKILL.
> Is there any way for it to be able to? :)  So far I have to reboot each
> time it happens.

It's a kernel bug if it gets stuck like this. You need to provide more info
though - what file system, what devices, how much memory. Also ps can give you
the wait address of a process stuck in 'D' state, which is valuable for debugging.


* Re: uninteruptable sleep
  2001-04-03 14:35 ` Alan Cox
@ 2001-04-03 16:13   ` Trevor Nichols
  2001-04-03 18:04     ` J Sloan
  2001-04-05 15:47     ` Christian Pernegger
  0 siblings, 2 replies; 20+ messages in thread
From: Trevor Nichols @ 2001-04-03 16:13 UTC (permalink / raw)
  To: linux-kernel

> Its a kernel bug if it gets stuck like this. You need to provide more info
> though - what file system, what devices, how much memory. Also ps can give you
> the wait address of a process stuck in 'D' state which is valuable for debug

System specs:
Pentium 200 MMX
80MB RAM

2 IDE Drives:
SAMSUNG SV0844D 8.4GB
WDC AC21200H 1.2GB

All partitions are ext2 filesystems.

ps xl:
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME  COMMAND
040  1000  1230     1   9   0 24320    4 down_w D    ?          0:00  /home/data/mozilla/obj/dist/bin/mozi

[I'm not exactly sure how to get the wait address if it isn't shown above]

Other stuff:

Creative SB AWE64 PnP
16MB Voodoo 3 2000 and a 2MB S3 Virge display
RealTek RTL-8029 NIC
Sony CRX100E Burner

I'm running X in a dual-head configuration using the above 2 cards.
That's all I can think of at this time.

Thanks,
Trevor Nichols.



* Re: uninteruptable sleep
@ 2001-04-03 16:40 Manfred Spraul
  2001-04-04  7:47 ` uninteruptable sleep (D state => load_avrg++) christophe barbe
  2001-04-04 16:07 ` uninteruptable sleep christophe barbe
  0 siblings, 2 replies; 20+ messages in thread
From: Manfred Spraul @ 2001-04-03 16:40 UTC (permalink / raw)
  To: ocdi; +Cc: linux-kernel, Alan Cox

> ps xl:
>   F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
> 040 1000 1230 1 9 0 24320 4 down_w D ? 0:00
>           /home/data/mozilla/obj/dist/bin/mozi
>
down_w

Perhaps down_write_failed()? 2.4.3 converted the mmap semaphore to a
rw-sem.
Did you compile sysrq into your kernel? If so, enable it with

# echo 1 > /proc/sys/kernel/sysrq
and press <Alt>+<SysRq>+'t'.

It prints the complete back trace, not just one function name.

--
    Manfred





* Re: uninteruptable sleep
  2001-04-03 16:13   ` Trevor Nichols
@ 2001-04-03 18:04     ` J Sloan
  2001-04-03 23:09       ` Trevor Nichols
  2001-04-05 15:47     ` Christian Pernegger
  1 sibling, 1 reply; 20+ messages in thread
From: J Sloan @ 2001-04-03 18:04 UTC (permalink / raw)
  To: Trevor Nichols; +Cc: linux-kernel

Trevor Nichols wrote:

> > Its a kernel bug if it gets stuck like this. You need to provide more info
> > though - what file system, what devices, how much memory. Also ps can give you
> > the wait address of a process stuck in 'D' state which is valuable for debug
>
> ps xl:
>   F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME  COMMAND
> 040  1000  1230     1   9   0 24320    4 down_w D    ?          0:00  /home/data/mozilla/obj/dist/bin/mozi
>
> [I'm not exactly sure how to get the wait address if it isn't shown above]
>

Try this:

ps -eo pid,stat,pcpu,nwchan,wchan=WIDE-WCHAN-COLUMN -o args
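
For scripting the same check, here is a minimal Python sketch that lists
tasks in D state by reading /proc/<pid>/stat directly. It assumes a
Linux-style /proc is mounted; the helper names parse_stat_state and
uninterruptible_tasks are made up for illustration and are not part of ps.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every task currently in state 'D'."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (assumption: treat as "nothing found")
    found = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the task exited or its stat file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

This only reports which tasks are stuck; the wait channel still has to
come from ps as shown above.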


cu

Jup



* Re: uninteruptable sleep
  2001-04-03 18:04     ` J Sloan
@ 2001-04-03 23:09       ` Trevor Nichols
  2001-04-04 19:30         ` andersg
  0 siblings, 1 reply; 20+ messages in thread
From: Trevor Nichols @ 2001-04-03 23:09 UTC (permalink / raw)
  To: linux-kernel

> Did you compile sysrq into your kernel?

I haven't yet.  I'll enable it and see if I can trigger it next time I
reboot again.


> ps -eo pid,stat,pcpu,nwchan,wchan=WIDE-WCHAN-COLUMN -o args

1230 D     0.0 105cc1 down_write_failed /home/data/mozilla/obj/dist/bin/mozilla-bin


Hopefully that helps a bit more.

-Trev



* Re: uninteruptable sleep (D state => load_avrg++)
  2001-04-03 16:40 uninteruptable sleep Manfred Spraul
@ 2001-04-04  7:47 ` christophe barbe
  2001-04-04 11:15   ` Alan Cox
  2001-04-04 16:07 ` uninteruptable sleep christophe barbe
  1 sibling, 1 reply; 20+ messages in thread
From: christophe barbe @ 2001-04-04  7:47 UTC (permalink / raw)
  To: linux-kernel

Sorry to fork the thread a bit, but I'm wondering why the load average is incremented for each process in D state.

I don't know if the kernel uses this information (if so, please let me know).
But some programs, like sendmail, use it to back off when the load is too high (I believe from 12 for sendmail).
That makes sense, but D-state processes give a misleading picture of the load because they don't use any CPU.

I use GFS to share a filesystem across several nodes.
Its file locking uses real I/O, so when you ask for a lock that is already owned, you end up in D state.
This differs from what a local filesystem does, but IMHO it makes sense for a distributed filesystem like GFS.

Christophe

-- 
Christophe Barbé
Software Engineer
Lineo High Availability Group
42-46, rue Médéric
92110 Clichy - France
phone (33).1.41.40.02.12
fax (33).1.41.40.02.01
www.lineo.com


* Re: uninteruptable sleep (D state => load_avrg++)
  2001-04-04  7:47 ` uninteruptable sleep (D state => load_avrg++) christophe barbe
@ 2001-04-04 11:15   ` Alan Cox
  2001-04-04 12:13     ` christophe barbe
  0 siblings, 1 reply; 20+ messages in thread
From: Alan Cox @ 2001-04-04 11:15 UTC (permalink / raw)
  To: christophe barbe; +Cc: linux-kernel

> The file locking use real IO and so when you ask for a lock, if the lock
> is already owned, you fall in a D state.

That seems odd. They should be using interruptible sleeps so you can interrupt
the task waiting for the lock, surely.



* Re: uninteruptable sleep (D state => load_avrg++)
  2001-04-04 11:15   ` Alan Cox
@ 2001-04-04 12:13     ` christophe barbe
  2001-04-04 12:53       ` Alan Cox
                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: christophe barbe @ 2001-04-04 12:13 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

The sleep should certainly be interruptible, and that's what I said to the GFS guy.
But what is the reason to increment the load average for each D process?

Thanks,
Christophe


* Re: uninteruptable sleep (D state => load_avrg++)
  2001-04-04 12:13     ` christophe barbe
@ 2001-04-04 12:53       ` Alan Cox
  2001-04-04 14:20       ` Paul Jakma
  2001-04-04 22:39       ` Tim Wright
  2 siblings, 0 replies; 20+ messages in thread
From: Alan Cox @ 2001-04-04 12:53 UTC (permalink / raw)
  To: christophe barbe; +Cc: Alan Cox, linux-kernel

> The sleep should certainly be interruptible and I that's what I said to the
> GFS guy.
> But what the reason to increment the load average for each D process ?

D indicates short term I/O wait. This is how unix has always computed the
load average.



* Re: uninteruptable sleep (D state => load_avrg++)
  2001-04-04 12:13     ` christophe barbe
  2001-04-04 12:53       ` Alan Cox
@ 2001-04-04 14:20       ` Paul Jakma
  2001-04-04 14:48         ` christophe barbe
  2001-04-04 22:39       ` Tim Wright
  2 siblings, 1 reply; 20+ messages in thread
From: Paul Jakma @ 2001-04-04 14:20 UTC (permalink / raw)
  To: christophe barbe; +Cc: Alan Cox, linux-kernel

On Wed, 4 Apr 2001, christophe barbe wrote:

> The sleep should certainly be interruptible and I that's what I
> said to the GFS guy. But what the reason to increment the load
> average for each D process ?

from a philosophical POV: they are processes that will be runnable as
soon as the kernel returns to them.

no idea if there are technical reasons for it.

>
> Thanks,
> Christophe

--paulj



* Re: uninteruptable sleep (D state => load_avrg++)
  2001-04-04 14:20       ` Paul Jakma
@ 2001-04-04 14:48         ` christophe barbe
  2001-04-04 15:05           ` Paul Jakma
  0 siblings, 1 reply; 20+ messages in thread
From: christophe barbe @ 2001-04-04 14:48 UTC (permalink / raw)
  To: Paul Jakma; +Cc: Alan Cox, linux-kernel

<skip>
Unfortunately I have no significant Unix culture.
I'm certainly young enough to be excused, and luckily Linux is showing me the road to hacker heaven.
So now I'm moving in the right direction, trying to understand the POSIX stuff ....
</skip>

For me, a POV without technical reasons is not a philosophical one but more likely a historical one.

Processes that will be runnable are not contributing to the load, so why increment the load average?
Moreover, if a process should be in D state only for a short time, the effect of the increment on an AVERAGE value should be close to zero.
So why do it (I mean load++) if it only has an effect when a process stays in D state for a long time (i.e. when the only effect is to distort the load measurement)?

What's the technical reason behind this load_avrg++ ???

Christophe



* Re: uninteruptable sleep (D state => load_avrg++)
  2001-04-04 14:48         ` christophe barbe
@ 2001-04-04 15:05           ` Paul Jakma
  2001-04-04 15:15             ` christophe barbe
  0 siblings, 1 reply; 20+ messages in thread
From: Paul Jakma @ 2001-04-04 15:05 UTC (permalink / raw)
  To: christophe barbe; +Cc: Alan Cox, linux-kernel

On Wed, 4 Apr 2001, christophe barbe wrote:

> From me, a POV without technical reasons is not a philosical one
> but more certainly an historical one.

there may be (and indeed probably are) good technical reasons, however
i am not well enough informed to say what they are.

> Process that will be runnable are not participating to the load so
> why incrementing the load average.

As i understand it:

load avg by nature is a measure of how many processes are 'runnable'
(ie waiting to run) over time.

a process waiting for the kernel to complete IO will indeed be
runnable as soon as the kernel is finished.

instead of waiting for CPU time (as with processes marked R), these
processes are waiting for the kernel to complete.

> Moreover if a process should be
> in state D only for a short time, the influence of the
> incrementation should be near null for an AVERAGE value.

because the number of processes asleep, waiting on kernel to complete
IO may reasonably be considered to be a load.

imagine a box with a bunch of processes that do almost nothing but
call on the kernel to do IO. If you only count the runnable state
towards load_avg then your load_avg will be very low, even though your
box is swamped - you are ignoring the work of the kernel.

if you count D towards load_avg then it will reflect this abstract
'load' concept more accurately.

Ie, counting D towards load_avg is a way of taking kernel IO work into
account when calculating the load average figures.
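
The argument above can be checked numerically. A minimal sketch, assuming
the load average is an exponentially decaying average of (runnable +
uninterruptible) tasks sampled every 5 seconds; the floating-point
arithmetic and the names update_load and simulate are illustrative
simplifications, not the kernel's actual code.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed)
WINDOW = 60.0          # time constant of the "1 minute" average


def update_load(load, runnable, uninterruptible,
                interval=SAMPLE_INTERVAL, window=WINDOW):
    """Fold one sample of active tasks into the running average."""
    decay = math.exp(-interval / window)
    active = runnable + uninterruptible  # D-state tasks count as load
    return load * decay + active * (1.0 - decay)


def simulate(runnable, uninterruptible, seconds, load=0.0):
    """Hold the task counts constant for `seconds` and return the load."""
    for _ in range(int(seconds / SAMPLE_INTERVAL)):
        load = update_load(load, runnable, uninterruptible)
    return load


if __name__ == "__main__":
    # Nine tasks stuck in D state, nothing runnable, for ten minutes.
    print(round(simulate(0, 9, 600), 2))
    # One task caught in D state at a single sample.
    print(round(update_load(0.0, 0, 1), 3))
```

Under these assumptions, nine tasks stuck in D state for ten minutes pull
the 1-minute figure to roughly 9 with nothing runnable, while a task that
is in D state at a single sample moves it by less than 0.1.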

> What's the technical reason behind this load_avrg++ ???
>
> Christophe
>

--paulj



* Re: uninteruptable sleep (D state => load_avrg++)
  2001-04-04 15:05           ` Paul Jakma
@ 2001-04-04 15:15             ` christophe barbe
  0 siblings, 0 replies; 20+ messages in thread
From: christophe barbe @ 2001-04-04 15:15 UTC (permalink / raw)
  To: Paul Jakma; +Cc: linux-kernel

On Wed, 04 Apr 2001 17:05:05 Paul Jakma wrote:
> imagine a box with a bunch of processes that do almost nothing but
> call on the kernel to do IO. If you only count the runnable state
> towards load_avg then your load_avg will be very low, even though your
> box is swamped - you are ignoring the work of the kernel.
> 
> if you count D towards load_avg then it will reflect this abstract
> 'load' concept more accurately.
> 
> Ie, counting D towards load_avg is a way of taking kernel IO work into
> account when calculating the load average figures.

ok I'm convinced.
And a measure can't be perfect.

Thank you,
Christophe


* Re: uninteruptable sleep
  2001-04-03 16:40 uninteruptable sleep Manfred Spraul
  2001-04-04  7:47 ` uninteruptable sleep (D state => load_avrg++) christophe barbe
@ 2001-04-04 16:07 ` christophe barbe
  1 sibling, 0 replies; 20+ messages in thread
From: christophe barbe @ 2001-04-04 16:07 UTC (permalink / raw)
  To: linux-kernel

This problem seems to be related to the recent post from David Howells <dhowells@cambridge.redhat.com> with the subject "rw_semaphore bug".

Christophe


* Re: uninteruptable sleep
  2001-04-03 23:09       ` Trevor Nichols
@ 2001-04-04 19:30         ` andersg
  0 siblings, 0 replies; 20+ messages in thread
From: andersg @ 2001-04-04 19:30 UTC (permalink / raw)
  To: Trevor Nichols; +Cc: linux-kernel

On Wed, Apr 04, 2001 at 08:39:19AM +0930, Trevor Nichols wrote:

> > ps -eo pid,stat,pcpu,nwchan,wchan=WIDE-WCHAN-COLUMN -o args
> 
> 1230 D     0.0 105cc1 down_write_failed /home/data/mozilla/obj/dist/bin/mozilla-bin

My mysql-server got stuck in down_write_failed today too.
SMP dual Pentium III system with no swap. I can provide more info on request
and am willing to do more bug-hunting if needed.

-- 

//anders/g



* Re: uninteruptable sleep (D state => load_avrg++)
  2001-04-04 12:13     ` christophe barbe
  2001-04-04 12:53       ` Alan Cox
  2001-04-04 14:20       ` Paul Jakma
@ 2001-04-04 22:39       ` Tim Wright
  2 siblings, 0 replies; 20+ messages in thread
From: Tim Wright @ 2001-04-04 22:39 UTC (permalink / raw)
  To: christophe barbe; +Cc: linux-kernel

On Wed, Apr 04, 2001 at 02:13:49PM +0200, christophe barbe wrote:
> The sleep should certainly be interruptible and I that's what I said to the GFS guy.
> But what the reason to increment the load average for each D process ?
> 

OK, the Unix history goes something like this. Synchronization was achieved
using two primitives, sleep() and wakeup(). These guys rendezvous'd on a
wait channel, which was simply an 'int', and by convention was actually the
address of a data structure (yes I know int and pointers aren't the same, this
is a long time ago, OK ? :-).
Anyway, when you called sleep, you also had an associated priority. Priority
values less than PZERO were "high" priority, and >= PZERO were "low" priority.
sleeping above PZERO was interruptible, and processes sleeping at this priority
did not count towards the load. The idea was to use this for events that
potentially might never happen. Sleeping at a priority < PZERO was intended
to be used for things that are absolutely 100% guaranteed to happen, preferably
sometime very soon. Disk I/O (real disks, not NFS) fell into this category,
and hence it counts towards the load since this could be deemed a "fast wait"
state, and the process is nominally runnable. All a bit hand-wavy I know, but
it worked well enough.

The really important part of all this is that you should never sleep
uninterruptibly for anything that you cannot absolutely guarantee will happen,
otherwise you wind up with a stuck process.
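
The convention described above can be written down as a small sketch. It
follows the description in this message only: the numeric value of PZERO
(25 here), the name sleep_properties, and the "terminal input" example are
illustrative assumptions, and the boundary case priority == PZERO is
treated as interruptible.

```python
PZERO = 25  # illustrative value; real systems defined their own constant


def sleep_properties(priority, pzero=PZERO):
    """Classify a sleep() priority under the convention described above."""
    # Numerically lower values mean "higher" priority.
    fast_wait = priority < pzero
    return {
        "interruptible": not fast_wait,   # a signal can wake the sleeper
        "counts_toward_load": fast_wait,  # treated as nominally runnable
    }


if __name__ == "__main__":
    # Hypothetical priorities chosen only to land on either side of PZERO.
    for name, prio in [("disk I/O", PZERO - 5), ("terminal input", PZERO + 3)]:
        print(name, sleep_properties(prio))
```

In this model the stuck process is exactly the case where something that
is not guaranteed to complete sleeps with a priority below PZERO.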

Regards,

Tim


-- 
Tim Wright - timw@splhi.com or timw@aracnet.com or twright@us.ibm.com
IBM Linux Technology Center, Beaverton, Oregon
Interested in Linux scalability ? Look at http://lse.sourceforge.net/
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI


* Re: uninteruptable sleep
  2001-04-03 16:13   ` Trevor Nichols
  2001-04-03 18:04     ` J Sloan
@ 2001-04-05 15:47     ` Christian Pernegger
  1 sibling, 0 replies; 20+ messages in thread
From: Christian Pernegger @ 2001-04-05 15:47 UTC (permalink / raw)
  To: linux-kernel

> Its a kernel bug if it gets stuck like this. You need to provide more info
> though - what file system, what devices, how much memory. Also ps can give you
> the wait address of a process stuck in 'D' state which is valuable for debug

Let's see if I'm getting this right: processes in D state should be killable?

I do not know if this is related, but I had two occurrences of those within
the last 48 hours, albeit on 2.2.18. 

1. starting tin (1.4.1) as a user - nothing happened, but the ssh session froze.
   Same on the second try, and on a third with mutt (1.2.5). The three processes
   ended up in D state and unkillable by root. A few seconds later the system
   became totally unresponsive, with the kernel spewing 'VM: do_try_to_free_pages
   failed for cupsd' (sp?) at top speed... reboot.

2. The cups (1.1.4) usb backend ended up in this state after I did a
   'rmmod printer; insmod printer'

Regards
	Christian


Kernel: linux-2.2.18 + raid-2.2.17-A0

System: Pentium III 600 w/ 256MB RAM

hard disks: 
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: DDRS-34560D      Rev: DC1B
  Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 01 Lun: 00
  Vendor: YAMAHA   Model: CRW4260          Rev: 1.0q
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi0 Channel: 00 Id: 08 Lun: 00
  Vendor: QUANTUM  Model: ATLAS_V_18_WLS   Rev: 0230
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 09 Lun: 00
  Vendor: QUANTUM  Model: ATLAS_V_18_WLS   Rev: 0230
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 10 Lun: 00
  Vendor: QUANTUM  Model: ATLAS_V_18_WLS   Rev: 0230
  Type:   Direct-Access                    ANSI SCSI revision: 03

RAID info:
Personalities : [raid5] 
read_ahead 1024 sectors
md0 : active raid5 sdd1[2] sdc1[1] sdb1[0] 35069824 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
unused devices: <none>

Layout of RAID disks (sdb-sdd):
Disk /dev/sdb: 64 heads, 32 sectors, 17510 cylinders
Units = cylinders of 2048 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1           387     17510  17534976   fd  Linux raid autodetect
/dev/sdb3           130       386    263168   83  Linux
/dev/sdb4             1       129    132095+  82  Linux swap

lspci -vvv:
00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
00:08.0 VGA compatible controller: Cirrus Logic GD 5430/40 [Alpine] (rev 48)
00:09.0 SCSI storage controller: Adaptec 7892A (rev 02)
00:0b.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
jesus:/raid/home/chris# lspci -vvv
00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
        Latency: 64 set
        Region 0: Memory at d8000000 (32-bit, prefetchable)
        Capabilities: [a0] AGP version 1.0
                Status: RQ=31 SBA+ 64bit- FW- Rate=21
                Command: RQ=0 SBA- AGP- 64bit- FW- Rate=

00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B-
        Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 set
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
        I/O behind bridge: 0000d000-0000dfff
        Memory behind bridge: fff00000-000fffff
        Prefetchable memory behind bridge: fff00000-000fffff
        BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B+

00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
        Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 0 set

00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01) (prog-if 80 [Master])
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Region 4: I/O ports at f000 [disabled]

00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01) (prog-if 00 [UHCI])
        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 64 set
        Interrupt: pin D routed to IRQ 15
        Region 4: I/O ports at e000

00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 02)
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:08.0 VGA compatible controller: Cirrus Logic GD 5430/40 [Alpine] (rev 48) (prog-if 00 [VGA])
        Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at dc000000 (32-bit, prefetchable)
        Expansion ROM at dd000000 [disabled]

00:09.0 SCSI storage controller: Adaptec 7892A (rev 02)
        Subsystem: Adaptec: Unknown device e2a0
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 40 min, 25 max, 64 set, cache line size 08
        Interrupt: pin A routed to IRQ 10
        BIST result: 00
        Region 0: I/O ports at e400
        Region 1: Memory at e0001000 (64-bit, non-prefetchable)
        Expansion ROM at de000000 [disabled]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- AuxPwr- DSI- D1- D2- PME-
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0b.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink] (rev 74)
        Subsystem: 3Com Corporation: Unknown device 1000
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
        Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
        Latency: 10 min, 10 max, 64 set, cache line size 08
        Interrupt: pin A routed to IRQ 15
        Region 0: I/O ports at e800
        Region 1: Memory at e0000000 (32-bit, non-prefetchable)
        Expansion ROM at df000000 [disabled]
        Capabilities: [dc] Power Management version 2
                Flags: PMEClk- AuxPwr- DSI- D1+ D2+ PME+
                Status: D0 PME-Enable+ DSel=0 DScale=2 PME-


* Re: uninteruptable sleep
  2001-04-03 13:32 ` Stephen E. Clark
@ 2001-04-08  1:56   ` Anton Blanchard
  0 siblings, 0 replies; 20+ messages in thread
From: Anton Blanchard @ 2001-04-08  1:56 UTC (permalink / raw)
  To: Stephen E. Clark; +Cc: linux-kernel


> That happened to me with 2.4.2-ac28 when I tried using DRM.
> I also got the following messages in syslog.
> 
> /var/log/messages.1:Mar 31 12:15:04 joker kernel:
> [drm:r128_do_wait_for_fifo] *ERROR* r128_do_wait_for_fifo failed!

You need to replace down(...->mmap_sem), up(...->mmap_sem) with
down_write(...), up_write(...) in the X11 r128 drm kernel module.

Anton

