public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [patch] O(1) scheduler, -H5
@ 2002-01-11  0:38 Ingo Molnar
  2002-01-11 11:31 ` Russell King
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2002-01-11  0:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


the -H5 patch adds a debugging check:

    http://redhat.com/~mingo/O(1)-scheduler/sched-O1-2.5.2-pre11-H5.patch

it adds code to catch places that call schedule() from global-cli()
sections. Right now release_kernel_lock() doesnt automatically release the
IRQ lock if there is no kernel lock held. A fair amount of code does this
still, and i think we should fix them in 2.5.

(Such code, while of questionable quality, is safe if it also holds the
big kernel lock, but it's definitely SMP-unsafe it doesnt hold the bkl -
the BUG() assert only catches the later case.)

(Andi Kleen noticed this on the first day the patch was released, and
Andrew Morton reminded me today that i forgot to fix it ... :-| )

my systems do not trigger the BUG(), so there cannot be all that much
broken code left.

	Ingo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] O(1) scheduler, -H5
  2002-01-11  0:38 Ingo Molnar
@ 2002-01-11 11:31 ` Russell King
  2002-01-11 12:05   ` David S. Miller
  2002-01-11 13:09   ` Alan Cox
  0 siblings, 2 replies; 11+ messages in thread
From: Russell King @ 2002-01-11 11:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel

On Fri, Jan 11, 2002 at 01:38:51AM +0100, Ingo Molnar wrote:
> it adds code to catch places that call schedule() from global-cli()
> sections. Right now release_kernel_lock() doesnt automatically release the
> IRQ lock if there is no kernel lock held. A fair amount of code does this
> still, and i think we should fix them in 2.5.

The serial driver (old or new) open/close functions are one of the worst
offenders of the global-cli-and-hold-kernel-lock-and-schedule problem.
I'm currently working on fixing this in the new serial driver.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] O(1) scheduler, -H5
       [not found] ` <20020111113131.C30756@flint.arm.linux.org.uk.suse.lists.linux.kernel>
@ 2002-01-11 11:42   ` Andi Kleen
  2002-01-11 11:57     ` Russell King
  2002-01-11 12:00     ` Arjan van de Ven
  0 siblings, 2 replies; 11+ messages in thread
From: Andi Kleen @ 2002-01-11 11:42 UTC (permalink / raw)
  To: Russell King; +Cc: linux-kernel

Russell King <rmk@arm.linux.org.uk> writes:
> 
> The serial driver (old or new) open/close functions are one of the worst
> offenders of the global-cli-and-hold-kernel-lock-and-schedule problem.
> I'm currently working on fixing this in the new serial driver.

When they hold the kernel lock in addition to the global cli() before
schedule() it should be ok. Only the behaviour of code not holding
kernel lock but global cli and calling schedule() has changed.

-Andi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] O(1) scheduler, -H5
  2002-01-11 11:42   ` [patch] O(1) scheduler, -H5 Andi Kleen
@ 2002-01-11 11:57     ` Russell King
  2002-01-11 12:00     ` Arjan van de Ven
  1 sibling, 0 replies; 11+ messages in thread
From: Russell King @ 2002-01-11 11:57 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

On Fri, Jan 11, 2002 at 12:42:14PM +0100, Andi Kleen wrote:
> When they hold the kernel lock in addition to the global cli() before
> schedule() it should be ok. Only the behaviour of code not holding
> kernel lock but global cli and calling schedule() has changed.

Agreed, however, there is one thing that has bugged me for a long time
(and which I believe is causing someone a problem at the moment) - when
we shut down a port, we're holding the BKL, and have global IRQs disabled.
We unhook the port from the serial drivers chain, and maybe free and
reclaim the IRQ with a different handler, and then disable the IRQ from
the port in question.

If we happen to schedule within request_irq, it doesn't take too much
imagination to see that Bad Things can happen.

(There is a report of complete lockup, and re-ordering stuff around here
fixes the problem, but the example patch changed a number of things, and
I'm trying to work towards a proper solution).

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] O(1) scheduler, -H5
  2002-01-11 11:42   ` [patch] O(1) scheduler, -H5 Andi Kleen
  2002-01-11 11:57     ` Russell King
@ 2002-01-11 12:00     ` Arjan van de Ven
  1 sibling, 0 replies; 11+ messages in thread
From: Arjan van de Ven @ 2002-01-11 12:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:
> 
> Russell King <rmk@arm.linux.org.uk> writes:
> >
> > The serial driver (old or new) open/close functions are one of the worst
> > offenders of the global-cli-and-hold-kernel-lock-and-schedule problem.
> > I'm currently working on fixing this in the new serial driver.
> 
> When they hold the kernel lock in addition to the global cli() before
> schedule() it should be ok. Only the behaviour of code not holding
> kernel lock but global cli and calling schedule() has changed.

well the biggest serial.c offender is block_til_ready of course...
oh and there's quite some dusty old code that does

save_flags();
cli();

while (some_condition)
	sleep_on(&queue);

eg not re-disabling interrupts after the sleep_on().....
to the point where just about every use of
sleep_on/interruptible_sleep_on is buggy
except in serial.c ;(

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] O(1) scheduler, -H5
  2002-01-11 11:31 ` Russell King
@ 2002-01-11 12:05   ` David S. Miller
  2002-01-11 13:09   ` Alan Cox
  1 sibling, 0 replies; 11+ messages in thread
From: David S. Miller @ 2002-01-11 12:05 UTC (permalink / raw)
  To: rmk; +Cc: mingo, torvalds, linux-kernel

   From: Russell King <rmk@arm.linux.org.uk>
   Date: Fri, 11 Jan 2002 11:31:31 +0000
   
   The serial driver (old or new) open/close functions are one of the worst
   offenders of the global-cli-and-hold-kernel-lock-and-schedule problem.
   I'm currently working on fixing this in the new serial driver.

The tty layer is really the only layer that hasn't had any
"SMP love and care" given to it.

Last time I looked at trying to do something it appeared you could fix
the whole thing up with a semaphore for the user bits and a spinlock
with which to interlock with the drivers.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] O(1) scheduler, -H5
  2002-01-11 11:31 ` Russell King
  2002-01-11 12:05   ` David S. Miller
@ 2002-01-11 13:09   ` Alan Cox
  2002-01-11 14:58     ` Russell King
  1 sibling, 1 reply; 11+ messages in thread
From: Alan Cox @ 2002-01-11 13:09 UTC (permalink / raw)
  To: Russell King; +Cc: Ingo Molnar, Linus Torvalds, linux-kernel

> The serial driver (old or new) open/close functions are one of the worst
> offenders of the global-cli-and-hold-kernel-lock-and-schedule problem.
> I'm currently working on fixing this in the new serial driver.

Someone fixed serial.c to use spinlocks a long while ago. Its just not
merged

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] O(1) scheduler, -H5
  2002-01-11 13:09   ` Alan Cox
@ 2002-01-11 14:58     ` Russell King
  2002-01-11 15:22       ` Alan Cox
  0 siblings, 1 reply; 11+ messages in thread
From: Russell King @ 2002-01-11 14:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Fri, Jan 11, 2002 at 01:09:03PM +0000, Alan Cox wrote:
> > The serial driver (old or new) open/close functions are one of the worst
> > offenders of the global-cli-and-hold-kernel-lock-and-schedule problem.
> > I'm currently working on fixing this in the new serial driver.
> 
> Someone fixed serial.c to use spinlocks a long while ago. Its just not
> merged

Unfortunately it wasn't a simple "replace global irq with spinlocks" - some
code also got moved around so its not clear that the problem was fixed by
the spinlocks or the code reordering.  I'd rather know which it was.

In addition, holding a spinlock while calling request_irq is asking for
trouble.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] O(1) scheduler, -H5
  2002-01-11 14:58     ` Russell King
@ 2002-01-11 15:22       ` Alan Cox
  2002-01-11 15:23         ` Russell King
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Cox @ 2002-01-11 15:22 UTC (permalink / raw)
  To: Russell King; +Cc: Alan Cox, linux-kernel

> Unfortunately it wasn't a simple "replace global irq with spinlocks" - some
> code also got moved around so its not clear that the problem was fixed by
> the spinlocks or the code reordering.  I'd rather know which it was.

The code re-ordering fixes the bug. The spinlocks are an unrelated change
that belong in a seperate diff.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] O(1) scheduler, -H5
  2002-01-11 15:22       ` Alan Cox
@ 2002-01-11 15:23         ` Russell King
  0 siblings, 0 replies; 11+ messages in thread
From: Russell King @ 2002-01-11 15:23 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

On Fri, Jan 11, 2002 at 03:22:48PM +0000, Alan Cox wrote:
> > Unfortunately it wasn't a simple "replace global irq with spinlocks" - some
> > code also got moved around so its not clear that the problem was fixed by
> > the spinlocks or the code reordering.  I'd rather know which it was.
> 
> The code re-ordering fixes the bug. The spinlocks are an unrelated change
> that belong in a seperate diff.

This is at odds with the evidence at present:

| I'll give it a try, but from what I experienced in those days was that
| adding the _spinlock protection_ finally solved all.

He tried 2.4.17 without patch which locked up, and then he tried 2.4.17
with the patch, which didn't.  Unfortunately it contains both the reordering
and the spinlocking.

I'm trying to work with him at the moment to find out which change fixed
the problem.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [patch] O(1) scheduler, -H5
       [not found] <20020111091744.B1170@w-mikek2.des.beaverton.ibm.com>
@ 2002-01-13 17:18 ` Ingo Molnar
  0 siblings, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2002-01-13 17:18 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Linus Torvalds, linux-kernel


On Fri, 11 Jan 2002, Mike Kravetz wrote:

>         reacquire_kernel_lock(current);
>         if (unlikely(current->need_resched))
>                 goto need_resched_back;
>         return;
>
> The question is why do we reacquire the kernel lock before checking
> for need_resched?.  If it is not needed, we can save a lock/unlock
> cycle in the case of need_resched.

that code is something i discovered when trying to reduce scheduling
latencies for the very first lowlatency patchset i released. Back then,
1-2-3 years ago, kernel_lock usage was still common in the kernel. So it
often happened that tasks were spending more than 1 msec spinning for the
kernel lock, and often they did that in the reacquire code. So to reduce
latencies, i've added ->need_resched checking to kernel_lock() and
reacquire_kernel_lock() as well.

these days kernel_lock contention doesnt happen all that often, so i think
we should remove it from the 2.5 tree. I consider the preemptible kernel
patch to be the most advanced method to get low scheduling-latencies
anyway.

> This code isn't new with the O(1) scheduler, so I'm guessing there is
> a need to hold the kernel_lock when checking need_resched.  I just
> don't know what it is.

there is no need for it to be under the kernel_lock - i simply found it to
be an easy and common preemption point.

	Ingo


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2002-01-13 15:21 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.33.0201110130290.11478-100000@localhost.localdomain.suse.lists.linux.kernel>
     [not found] ` <20020111113131.C30756@flint.arm.linux.org.uk.suse.lists.linux.kernel>
2002-01-11 11:42   ` [patch] O(1) scheduler, -H5 Andi Kleen
2002-01-11 11:57     ` Russell King
2002-01-11 12:00     ` Arjan van de Ven
     [not found] <20020111091744.B1170@w-mikek2.des.beaverton.ibm.com>
2002-01-13 17:18 ` Ingo Molnar
2002-01-11  0:38 Ingo Molnar
2002-01-11 11:31 ` Russell King
2002-01-11 12:05   ` David S. Miller
2002-01-11 13:09   ` Alan Cox
2002-01-11 14:58     ` Russell King
2002-01-11 15:22       ` Alan Cox
2002-01-11 15:23         ` Russell King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox