linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* start_kernel(): bug: interrupts were enabled early
@ 2010-03-25 19:41 Rabin Vincent
  2010-03-31 20:40 ` Andrew Morton
  0 siblings, 1 reply; 43+ messages in thread
From: Rabin Vincent @ 2010-03-25 19:41 UTC (permalink / raw)
  To: lkml; +Cc: yinghai, hpa, penberg, cl

On latest git, I'm seeing "start_kernel(): bug: interrupts were enabled
early" messages on ARM (sample log below).

This appears to be caused by:

  start_kernel -> radix_tree_init -> kmem_cache_create (slub) ->
   down_write -> __down_write (lib/rwsem-spinlock.c) -> spin_unlock_irq

radix_tree_init was moved earlier by:

  commit 773e3eb7b81e5ba13b5155dfb3bb75b8ce37f8f9
  Author: Yinghai Lu <yinghai@kernel.org>
  Date:   Wed Feb 10 01:20:33 2010 -0800

      init: Move radix_tree_init() early

      Prepare for using radix trees in early_irq_init().

      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-30-git-send-email-yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>

Rabin

---
Uncompressing Linux... done, booting the kernel.
Linux version 2.6.34-rc2-00184-g01e7770 (rabin@debian) (gcc version 4.4.1 (Sourcery G++ Lite 2009q3-67) ) #5 Fri Mar 26 00:56:33 IST 2010
CPU: ARMv7 Processor [410fc080] revision 0 (ARMv7), cr=10c03c7f
CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
Machine: ARM-RealView PB-A8
Ignoring unrecognised tag 0x00000000
bootconsole [earlycon0] enabled
Memory policy: ECC disabled, Data cache writeback
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512
Kernel command line: earlyprintk mem=128M console=ttyAMA0 root=/dev/ram0 init=/linuxrc
PID hash table entries: 512 (order: -1, 2048 bytes)
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 128MB = 128MB total
Memory: 123916k/123916k available, 7156k reserved, 0K highmem
Virtual kernel memory layout:
    vector  : 0xffff0000 - 0xffff1000   (   4 kB)
    fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
    DMA     : 0xffc00000 - 0xffe00000   (   2 MB)
    vmalloc : 0xc8800000 - 0xf8000000   ( 760 MB)
    lowmem  : 0xc0000000 - 0xc8000000   ( 128 MB)
    modules : 0xbf000000 - 0xc0000000   (  16 MB)
      .init : 0xc0008000 - 0xc0331000   (3236 kB)
      .text : 0xc0331000 - 0xc040b000   ( 872 kB)
      .data : 0xc040c000 - 0xc0418180   (  49 kB)
SLUB: Genslabs=11, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Hierarchical RCU implementation.
RCU-based detection of stalled CPUs is enabled.
NR_IRQS:96
start_kernel(): bug: interrupts were enabled early
Calibrating delay loop... 419.43 BogoMIPS (lpj=2097152)
Mount-cache hash table entries: 512
CPU: Testing write buffer coherency: ok
devtmpfs: initialized
Serial: AMBA PL011 UART driver
dev:uart0: ttyAMA0 at MMIO 0x10009000 (irq = 44) is a AMBA/PL011
console [ttyAMA0] enabled, bootconsole disabled
console [ttyAMA0] enabled, bootconsole disabled
dev:uart1: ttyAMA1 at MMIO 0x1000a000 (irq = 45) is a AMBA/PL011
dev:uart2: ttyAMA2 at MMIO 0x1000b000 (irq = 46) is a AMBA/PL011
fpga:uart3: ttyAMA3 at MMIO 0x1000c000 (irq = 47) is a AMBA/PL011
Switching to clocksource timer3
Unpacking initramfs...
Freeing initrd memory: 1680K
VFP support v0.3: implementor 41 architecture 3 part 30 variant c rev 0
Freeing init memory: 3236K

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-25 19:41 start_kernel(): bug: interrupts were enabled early Rabin Vincent
@ 2010-03-31 20:40 ` Andrew Morton
  2010-03-31 20:47   ` Yinghai Lu
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2010-03-31 20:40 UTC (permalink / raw)
  To: Rabin Vincent
  Cc: lkml, yinghai, hpa, penberg, cl, Benjamin Herrenschmidt,
	linux-arch

On Fri, 26 Mar 2010 01:11:00 +0530
Rabin Vincent <rabin@rab.in> wrote:

> On latest git, I'm seeing "start_kernel(): bug: interrupts were enabled
> early" messages on ARM (sample log below).
> 
> This appears to be caused by:
> 
>   start_kernel -> radix_tree_init -> kmem_cache_create (slub) ->
>    down_write -> __down_write (lib/rwsem-spinlock.c) -> spin_unlock_irq
> 
> radix_tree_init was moved earlier by:
> 
>   commit 773e3eb7b81e5ba13b5155dfb3bb75b8ce37f8f9
>   Author: Yinghai Lu <yinghai@kernel.org>
>   Date:   Wed Feb 10 01:20:33 2010 -0800
> 
>       init: Move radix_tree_init() early
> 
>       Prepare for using radix trees in early_irq_init().
> 
>       Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>       LKML-Reference: <1265793639-15071-30-git-send-email-yinghai@kernel.org>
>       Signed-off-by: H. Peter Anvin <hpa@zytor.com>
> 

That's going to be hard to fix.

Once upon a time, enabling interrupts too early in boot would kill
powerpc boxes stone dead.  From the lack of noise I assume that this is
not happening in current kernels for some reason.

We have two checks in start_kernel():

	if (!irqs_disabled()) {
		printk(KERN_WARNING "start_kernel(): bug: interrupts were "
				"enabled *very* early, fixing it\n");
		local_irq_disable();
	}
	rcu_init();
	radix_tree_init();
	/* init some links before init_ISA_irqs() */
	early_irq_init();
	init_IRQ();
	prio_tree_init();
	init_timers();
	hrtimers_init();
	softirq_init();
	timekeeping_init();
	time_init();
	profile_init();
	if (!irqs_disabled())
		printk(KERN_CRIT "start_kernel(): bug: interrupts were "
				 "enabled early\n");

perhaps the second one isn't needed?  Perhaps no architecture requires
that local interrupts be disabled across the above initialisations?


I'll ask Rafael and Maciej to track this as a post-2.6.33 regression.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 20:40 ` Andrew Morton
@ 2010-03-31 20:47   ` Yinghai Lu
  2010-03-31 20:52     ` Andrew Morton
                       ` (4 more replies)
  0 siblings, 5 replies; 43+ messages in thread
From: Yinghai Lu @ 2010-03-31 20:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rabin Vincent, lkml, hpa, penberg, cl, Benjamin Herrenschmidt,
	linux-arch

On 03/31/2010 01:40 PM, Andrew Morton wrote:
> On Fri, 26 Mar 2010 01:11:00 +0530
> Rabin Vincent <rabin@rab.in> wrote:
> 
>> > On latest git, I'm seeing "start_kernel(): bug: interrupts were enabled
>> > early" messages on ARM (sample log below).
>> > 
>> > This appears to be caused by:
>> > 
>> >   start_kernel -> radix_tree_init -> kmem_cache_create (slub) ->
>> >    down_write -> __down_write (lib/rwsem-spinlock.c) -> spin_unlock_irq
>> > 
>> > radix_tree_init was moved earlier by:
>> > 
>> >   commit 773e3eb7b81e5ba13b5155dfb3bb75b8ce37f8f9
>> >   Author: Yinghai Lu <yinghai@kernel.org>
>> >   Date:   Wed Feb 10 01:20:33 2010 -0800
>> > 
>> >       init: Move radix_tree_init() early
>> > 
>> >       Prepare for using radix trees in early_irq_init().
>> > 
>> >       Signed-off-by: Yinghai Lu <yinghai@kernel.org>
>> >       LKML-Reference: <1265793639-15071-30-git-send-email-yinghai@kernel.org>
>> >       Signed-off-by: H. Peter Anvin <hpa@zytor.com>
>> > 
> That's going to be hard to fix.
> 
> Once upon a time, enabling interrupts too early in boot would kill
> powerpc boxes stone dead.  From the lack of noise I assume that this is
> not happening in current kernels for some reason.
> 
> We have two checks in start_kernel():
> 
> 	if (!irqs_disabled()) {
> 		printk(KERN_WARNING "start_kernel(): bug: interrupts were "
> 				"enabled *very* early, fixing it\n");
> 		local_irq_disable();
> 	}
> 	rcu_init();
> 	radix_tree_init();
> 	/* init some links before init_ISA_irqs() */
> 	early_irq_init();
> 	init_IRQ();
> 	prio_tree_init();
> 	init_timers();
> 	hrtimers_init();
> 	softirq_init();
> 	timekeeping_init();
> 	time_init();
> 	profile_init();
> 	if (!irqs_disabled())
> 		printk(KERN_CRIT "start_kernel(): bug: interrupts were "
> 				 "enabled early\n");
> 
> perhaps the second one isn't needed?  Perhaps no architecture requires
> that local interrupts be disabled across the above initialisations?
	
spin_unlock_irq from arm is different from other archs?

YH

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 20:47   ` Yinghai Lu
@ 2010-03-31 20:52     ` Andrew Morton
  2010-03-31 21:12       ` H. Peter Anvin
  2010-03-31 21:01     ` Matthew Wilcox
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2010-03-31 20:52 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Rabin Vincent, lkml, hpa, penberg, cl, Benjamin Herrenschmidt,
	linux-arch

On Wed, 31 Mar 2010 13:47:23 -0700
Yinghai Lu <yinghai@kernel.org> wrote:

> spin_unlock_irq from arm is different from other archs?

No, spin_unlock_irq() unconditionally enables interrupts on all
architectures.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 20:47   ` Yinghai Lu
  2010-03-31 20:52     ` Andrew Morton
@ 2010-03-31 21:01     ` Matthew Wilcox
  2010-03-31 21:05       ` H. Peter Anvin
  2010-03-31 21:05     ` Russell King
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2010-03-31 21:01 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, Rabin Vincent, lkml, hpa, penberg, cl,
	Benjamin Herrenschmidt, linux-arch

On Wed, Mar 31, 2010 at 01:47:23PM -0700, Yinghai Lu wrote:
> >> > This appears to be caused by:
> >> > 
> >> >   start_kernel -> radix_tree_init -> kmem_cache_create (slub) ->
> >> >    down_write -> __down_write (lib/rwsem-spinlock.c) -> spin_unlock_irq
> >> > 
> > That's going to be hard to fix.
> > 
> spin_unlock_irq from arm is different from other archs?

Not all arches use lib/rwsem-spinlock.c.  In particular, x86 doesn't
when X86_XADD is set.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:01     ` Matthew Wilcox
@ 2010-03-31 21:05       ` H. Peter Anvin
  2010-03-31 21:17         ` Matthew Wilcox
  0 siblings, 1 reply; 43+ messages in thread
From: H. Peter Anvin @ 2010-03-31 21:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Yinghai Lu, Andrew Morton, Rabin Vincent, lkml, penberg, cl,
	Benjamin Herrenschmidt, linux-arch

On 03/31/2010 02:01 PM, Matthew Wilcox wrote:
> On Wed, Mar 31, 2010 at 01:47:23PM -0700, Yinghai Lu wrote:
>>>>> This appears to be caused by:
>>>>>
>>>>>   start_kernel -> radix_tree_init -> kmem_cache_create (slub) ->
>>>>>    down_write -> __down_write (lib/rwsem-spinlock.c) -> spin_unlock_irq
>>>>>
>>> That's going to be hard to fix.
>>>
>> spin_unlock_irq from arm is different from other archs?
> 
> Not all arches use lib/rwsem-spinlock.c.  In particular, x86 doesn't
> when X86_XADD is set.
> 

What I note is that lib/rwsem-spinlock.c seems to be rather inconsistent
in its use of spin_lock_irqsave/spin_lock_irqrestore versus
spin_lock_irq/spin_unlock_irq... in fact, __down_read is the *only*
place where we use the latter as opposed to the former.

Is that a bug?  If so, it would certainly explain this behavior.

	-hpa

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 20:47   ` Yinghai Lu
  2010-03-31 20:52     ` Andrew Morton
  2010-03-31 21:01     ` Matthew Wilcox
@ 2010-03-31 21:05     ` Russell King
  2010-03-31 21:08       ` H. Peter Anvin
  2010-03-31 22:31     ` Benjamin Herrenschmidt
  2010-03-31 22:58     ` David Howells
  4 siblings, 1 reply; 43+ messages in thread
From: Russell King @ 2010-03-31 21:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, Rabin Vincent, lkml, hpa, penberg, cl,
	Benjamin Herrenschmidt, linux-arch

On Wed, Mar 31, 2010 at 01:47:23PM -0700, Yinghai Lu wrote:
> On 03/31/2010 01:40 PM, Andrew Morton wrote:
> > We have two checks in start_kernel():
> > 
> > 	if (!irqs_disabled()) {
> > 		printk(KERN_WARNING "start_kernel(): bug: interrupts were "
> > 				"enabled *very* early, fixing it\n");
> > 		local_irq_disable();
> > 	}
> > 	rcu_init();
> > 	radix_tree_init();
> > 	/* init some links before init_ISA_irqs() */
> > 	early_irq_init();
> > 	init_IRQ();
> > 	prio_tree_init();
> > 	init_timers();
> > 	hrtimers_init();
> > 	softirq_init();
> > 	timekeeping_init();
> > 	time_init();
> > 	profile_init();
> > 	if (!irqs_disabled())
> > 		printk(KERN_CRIT "start_kernel(): bug: interrupts were "
> > 				 "enabled early\n");
> > 
> > perhaps the second one isn't needed?  Perhaps no architecture requires
> > that local interrupts be disabled across the above initialisations?
> 	
> spin_unlock_irq from arm is different from other archs?

We use the standard generic kernel implementation.  Is x86 different? ;)

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:05     ` Russell King
@ 2010-03-31 21:08       ` H. Peter Anvin
  0 siblings, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2010-03-31 21:08 UTC (permalink / raw)
  To: Yinghai Lu, Andrew Morton, Rabin Vincent, lkml, penberg, cl,
	Benjamin Herrenschmidt, linux-arch

On 03/31/2010 02:05 PM, Russell King wrote:
> 
> We use the standard generic kernel implementation.  Is x86 different? ;)
> 

Yes, these are optimized on x86.

	-hpa

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 20:52     ` Andrew Morton
@ 2010-03-31 21:12       ` H. Peter Anvin
  2010-03-31 21:28         ` Andrew Morton
  2010-04-01 16:13         ` Linus Torvalds
  0 siblings, 2 replies; 43+ messages in thread
From: H. Peter Anvin @ 2010-03-31 21:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yinghai Lu, Rabin Vincent, lkml, penberg, cl,
	Benjamin Herrenschmidt, linux-arch, David Howells, Linus Torvalds

On 03/31/2010 01:52 PM, Andrew Morton wrote:
> On Wed, 31 Mar 2010 13:47:23 -0700
> Yinghai Lu <yinghai@kernel.org> wrote:
> 
>> spin_unlock_irq from arm is different from other archs?
> 
> No, spin_unlock_irq() unconditionally enables interrupts on all
> architectures.

So I found checkin 60ba96e546da45d9e22bb04b84971a25684e4d46 in the
bk-historic git tree:

[PATCH] rwsem: Make rwsems use interrupt disabling spinlocks

The attached patch makes read/write semaphores use interrupt disabling
spinlocks in the slow path, thus rendering the up functions and trylock
functions available for use in interrupt context.  This matches the
regular semaphore behaviour.

I've assumed that the normal down functions must be called with
interrupts enabled (since they might schedule), and used the
irq-disabling spinlock variants that don't save the flags.

Signed-Off-By: David Howells <dhowells@redhat.com>
Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

What we have here is a case of this assumption being violated, because
the lock is taken with interrupts disabled on a path where contention
cannot happen (because the code is single-threaded at this point), but
the lock is taken due to reuse of generic code.

The obvious way to fix this would be to use
spin_lock_irqsave..spin_lock_irqrestore in __down_read as well as in the
other locations; I don't have a good feel for what the cost of doing so
would be, though.  On x86 it's fairly expensive simply because the only
way to save the state is to push it on the stack, which the compiler
doesn't deal well with, but this code isn't used on x86.

	-hpa

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:05       ` H. Peter Anvin
@ 2010-03-31 21:17         ` Matthew Wilcox
  2010-03-31 21:42           ` Christoph Lameter
  0 siblings, 1 reply; 43+ messages in thread
From: Matthew Wilcox @ 2010-03-31 21:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Andrew Morton, Rabin Vincent, lkml, penberg, cl,
	Benjamin Herrenschmidt, linux-arch

On Wed, Mar 31, 2010 at 02:05:00PM -0700, H. Peter Anvin wrote:
> What I note is that lib/rwsem-spinlock.c seems to be rather inconsistent
> in its use of spin_lock_irqsave/spin_lock_irqrestore versus
> spin_lock_irq/spin_unlock_irq... in fact, __down_read is the *only*
> place where we use the latter as opposed to the former.
> 
> Is that a bug?  If so, it would certainly explain this behavior.

It's based on down_read() and down_write() not being callable from
interrupt context, or with interrupts disabled (since they can sleep).
up_read(), up_write(), down_read_trylock(), down_write_trylock(),
downgrade_write() can all be called from interrupt context since they
cannot sleep.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:12       ` H. Peter Anvin
@ 2010-03-31 21:28         ` Andrew Morton
  2010-03-31 22:35           ` Benjamin Herrenschmidt
  2010-04-01 16:13         ` Linus Torvalds
  1 sibling, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2010-03-31 21:28 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Yinghai Lu, Rabin Vincent, lkml, penberg, cl,
	Benjamin Herrenschmidt, linux-arch, David Howells, Linus Torvalds

On Wed, 31 Mar 2010 14:12:54 -0700
"H. Peter Anvin" <hpa@zytor.com> wrote:

> On 03/31/2010 01:52 PM, Andrew Morton wrote:
> > On Wed, 31 Mar 2010 13:47:23 -0700
> > Yinghai Lu <yinghai@kernel.org> wrote:
> > 
> >> spin_unlock_irq from arm is different from other archs?
> > 
> > No, spin_unlock_irq() unconditionally enables interrupts on all
> > architectures.
> 
> So I found checkin 60ba96e546da45d9e22bb04b84971a25684e4d46 in the
> bk-historic git tree:
> 
> [PATCH] rwsem: Make rwsems use interrupt disabling spinlocks
> 
> The attached patch makes read/write semaphores use interrupt disabling
> spinlocks in the slow path, thus rendering the up functions and trylock
> functions available for use in interrupt context.  This matches the
> regular semaphore behaviour.
> 
> I've assumed that the normal down functions must be called with
> interrupts enabled (since they might schedule), and used the
> irq-disabling spinlock variants that don't save the flags.
> 
> Signed-Off-By: David Howells <dhowells@redhat.com>
> Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
> 
> What we have here is a case of this assumption being violated, because
> the lock is taken with interrupts disabled on a path where contention
> cannot happen (because the code is single-threaded at this point), but
> the lock is taken due to reuse of generic code.
> 
> The obvious way to fix this would be to use
> spin_lock_irqsave..spin_lock_irqrestore in __down_read as well as in the
> other locations; I don't have a good feel for what the cost of doing so
> would be, though.  On x86 it's fairly expensive simply because the only
> way to save the state is to push it on the stack, which the compiler
> doesn't deal well with, but this code isn't used on x86.
> 

Well, it's all a bit nasty.  kmem_cache_create() does a lot of stuff,
including calling into the page allocator with GFP_KERNEL - expecting
kmem_cache_create() to preserve local_irq_disable() is a bit optimistic.

radix_tree_init() calls hotcpu_notifier() which also does
mutex_lock(&cpu_add_remove_lock);

The easiest fix is to reposition the interrutps-are-now-enabled point
in start_kernel().  But I have a feeling that some versions of
early_irq_init() won't like that.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:17         ` Matthew Wilcox
@ 2010-03-31 21:42           ` Christoph Lameter
  2010-03-31 21:54             ` Russell King
  2010-03-31 22:36             ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 43+ messages in thread
From: Christoph Lameter @ 2010-03-31 21:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: H. Peter Anvin, Yinghai Lu, Andrew Morton, Rabin Vincent, lkml,
	penberg, Benjamin Herrenschmidt, linux-arch

On Wed, 31 Mar 2010, Matthew Wilcox wrote:

> On Wed, Mar 31, 2010 at 02:05:00PM -0700, H. Peter Anvin wrote:
> > What I note is that lib/rwsem-spinlock.c seems to be rather inconsistent
> > in its use of spin_lock_irqsave/spin_lock_irqrestore versus
> > spin_lock_irq/spin_unlock_irq... in fact, __down_read is the *only*
> > place where we use the latter as opposed to the former.
> >
> > Is that a bug?  If so, it would certainly explain this behavior.
>
> It's based on down_read() and down_write() not being callable from
> interrupt context, or with interrupts disabled (since they can sleep).
> up_read(), up_write(), down_read_trylock(), down_write_trylock(),
> downgrade_write() can all be called from interrupt context since they
> cannot sleep.

Do not run the checks while we are in a single threaded context?

I thought we had some dynamic code patching thingamy that could change
those when we go to smp mode?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:42           ` Christoph Lameter
@ 2010-03-31 21:54             ` Russell King
  2010-03-31 21:57               ` H. Peter Anvin
  2010-03-31 22:36             ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 43+ messages in thread
From: Russell King @ 2010-03-31 21:54 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matthew Wilcox, H. Peter Anvin, Yinghai Lu, Andrew Morton,
	Rabin Vincent, lkml, penberg, Benjamin Herrenschmidt, linux-arch

On Wed, Mar 31, 2010 at 04:42:25PM -0500, Christoph Lameter wrote:
> Do not run the checks while we are in a single threaded context?
> 
> I thought we had some dynamic code patching thingamy that could change
> those when we go to smp mode?

You have to remember that on embedded architectures, such as ARM,
where XIP is supported we can't change the text segment at run time -
which means dynamic code patching won't work.

However, the kernel should still work in such situations.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:54             ` Russell King
@ 2010-03-31 21:57               ` H. Peter Anvin
  2010-03-31 22:30                 ` Russell King
  2010-03-31 22:37                 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 43+ messages in thread
From: H. Peter Anvin @ 2010-03-31 21:57 UTC (permalink / raw)
  To: Christoph Lameter, Matthew Wilcox, Yinghai Lu, Andrew Morton,
	Rabin Vincent, lkml, penberg, Benjamin Herrenschmidt, linux-arch

On 03/31/2010 02:54 PM, Russell King wrote:
> On Wed, Mar 31, 2010 at 04:42:25PM -0500, Christoph Lameter wrote:
>> Do not run the checks while we are in a single threaded context?
>>
>> I thought we had some dynamic code patching thingamy that could change
>> those when we go to smp mode?
> 
> You have to remember that on embedded architectures, such as ARM,
> where XIP is supported we can't change the text segment at run time -
> which means dynamic code patching won't work.
> 
> However, the kernel should still work in such situations.
> 

The question still remains what the incremental cost is of doing
irqsave/irqrestore.

	-hpa

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01  1:17                     ` Benjamin Herrenschmidt
@ 2010-03-31 22:26                       ` Andrew Morton
  2010-04-01  6:26                         ` H. Peter Anvin
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2010-03-31 22:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: H. Peter Anvin, Christoph Lameter, Matthew Wilcox, Yinghai Lu,
	Rabin Vincent, lkml, penberg, linux-arch

On Thu, 01 Apr 2010 12:17:11 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Wed, 2010-03-31 at 15:49 -0700, Andrew Morton wrote:
> > 
> > But these things are all utterly gross.  The bottom line is that
> > radix_tree_init() is manifestly unsuited to being called with local
> > interrupts disabled.  773e3eb7b81e5ba13b5155dfb3bb75b8ce37f8f9 was
> > just a wrong patch. 
> 
> Except that powerpc (and now it seems x86) both want to use radix trees
> for interrupt handling... At least on powerpc, we trick and use a linear
> search until the radix trees are initialized, which we do later during
> boot, but that somewhat sucks.
> 
> I believe sherry picking things like not calling radix_tree_init() is
> going to fix one case today, until we have another one, and another one,
> and etc...
> 
> I suspect we're better off fixing the root of the problem in down/up.
> 

Not by adding overhead to every single down_read()/down_write() just to
fix a once-off startup problem - that's taking laziness way too far.

We'd be better off hacking a kmem_cache_create() special case to avoid
taking the rwsem.  Add SLAB_I_SUCK to `flags' perhaps.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:57               ` H. Peter Anvin
@ 2010-03-31 22:30                 ` Russell King
  2010-03-31 22:37                 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 43+ messages in thread
From: Russell King @ 2010-03-31 22:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Lameter, Matthew Wilcox, Yinghai Lu, Andrew Morton,
	Rabin Vincent, lkml, penberg, Benjamin Herrenschmidt, linux-arch

On Wed, Mar 31, 2010 at 02:57:20PM -0700, H. Peter Anvin wrote:
> On 03/31/2010 02:54 PM, Russell King wrote:
> > On Wed, Mar 31, 2010 at 04:42:25PM -0500, Christoph Lameter wrote:
> >> Do not run the checks while we are in a single threaded context?
> >>
> >> I thought we had some dynamic code patching thingamy that could change
> >> those when we go to smp mode?
> > 
> > You have to remember that on embedded architectures, such as ARM,
> > where XIP is supported we can't change the text segment at run time -
> > which means dynamic code patching won't work.
> > 
> > However, the kernel should still work in such situations.
> > 
> 
> The question still remains what the incremental cost is of doing
> irqsave/irqrestore.

Compared to irq disable/enable, it wouldn't be much higher; saving
can be done by a direct register to register move, so that should be
relatively cheap.  The restore may be a little bit more depending on
the CPU arch version, but not significant.

So there shouldn't be a problem from ARM POV to switch to using
irqsave/irqrestore.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 20:47   ` Yinghai Lu
                       ` (2 preceding siblings ...)
  2010-03-31 21:05     ` Russell King
@ 2010-03-31 22:31     ` Benjamin Herrenschmidt
  2010-03-31 22:36       ` H. Peter Anvin
  2010-03-31 22:58     ` David Howells
  4 siblings, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-31 22:31 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Andrew Morton, Rabin Vincent, lkml, hpa, penberg, cl, linux-arch

On Wed, 2010-03-31 at 13:47 -0700, Yinghai Lu wrote:
> > perhaps the second one isn't needed?  Perhaps no architecture
> requires
> > that local interrupts be disabled across the above initialisations?
>         
> spin_unlock_irq from arm is different from other archs?

No, it's not, it will enable IRQs and thats illegal to do so early
during boot. We've been over that one again and again, the problem is
that people want to keep using that instead of irqsave/restore because
it's a nano-optimisation on x86... oh well...

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:28         ` Andrew Morton
@ 2010-03-31 22:35           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-31 22:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: H. Peter Anvin, Yinghai Lu, Rabin Vincent, lkml, penberg, cl,
	linux-arch, David Howells, Linus Torvalds

On Wed, 2010-03-31 at 14:28 -0700, Andrew Morton wrote:
> 
> Well, it's all a bit nasty.  kmem_cache_create() does a lot of stuff,
> including calling into the page allocator with GFP_KERNEL - expecting
> kmem_cache_create() to preserve local_irq_disable() is a bit
> optimistic.

Well, the sl*b allocator -has- been modified to avoid enabling IRQs
early, at least I remember we did that when we moved it to be
initialized earlier.

> radix_tree_init() calls hotcpu_notifier() which also does
> mutex_lock(&cpu_add_remove_lock);
> 
> The easiest fix is to reposition the interrutps-are-now-enabled point
> in start_kernel().  But I have a feeling that some versions of
> early_irq_init() won't like that.

Yeah that won't work. Interrupts must not be enabled before at least
init_IRQ() and time_init(). The problem is that until all these guys
have gone through their initializations, there may be pending spurrious
crap coming from the HW (timers, external IRQs, profile IRQs) due to
such HW not yet properly "sanitized" by the kernel.

Plenty of archs have those assumptions wired in. I don't think moving
the IRQ enable point earlier is the right approach.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 22:31     ` Benjamin Herrenschmidt
@ 2010-03-31 22:36       ` H. Peter Anvin
  0 siblings, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2010-03-31 22:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Yinghai Lu, Andrew Morton, Rabin Vincent, lkml, penberg, cl,
	linux-arch

On 03/31/2010 03:31 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2010-03-31 at 13:47 -0700, Yinghai Lu wrote:
>>> perhaps the second one isn't needed?  Perhaps no architecture
>> requires
>>> that local interrupts be disabled across the above initialisations?
>>         
>> spin_unlock_irq from arm is different from other archs?
> 
> No, it's not, it will enable IRQs and thats illegal to do so early
> during boot. We've been over that one again and again, the problem is
> that people want to keep using that instead of irqsave/restore because
> it's a nano-optimisation on x86... oh well...
> 

Well, guess what... the particular user in this case *isn't used at all
on x86* so it is a non-issue here.  So I take it we have your particular
vote to use irqsave/irqrestore in lib/rwsem-spinlock.c?

	-hpa

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:42           ` Christoph Lameter
  2010-03-31 21:54             ` Russell King
@ 2010-03-31 22:36             ` Benjamin Herrenschmidt
  2010-04-01 15:57               ` Christoph Lameter
  1 sibling, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-31 22:36 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Matthew Wilcox, H. Peter Anvin, Yinghai Lu, Andrew Morton,
	Rabin Vincent, lkml, penberg, linux-arch

On Wed, 2010-03-31 at 16:42 -0500, Christoph Lameter wrote:
> 
> Do not run the checks while we are in a single threaded context?
> 
> I thought we had some dynamic code patching thingamy that could change
> those when we go to smp mode? 

The problem isn't about checks. Those -will- enable IRQs before it's
safe to do so, the checks are perfectly right to point it out, it -is- a
bug waiting to happen on some random HW.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:57               ` H. Peter Anvin
  2010-03-31 22:30                 ` Russell King
@ 2010-03-31 22:37                 ` Benjamin Herrenschmidt
  2010-03-31 22:49                   ` Andrew Morton
  1 sibling, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2010-03-31 22:37 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Christoph Lameter, Matthew Wilcox, Yinghai Lu, Andrew Morton,
	Rabin Vincent, lkml, penberg, linux-arch

On Wed, 2010-03-31 at 14:57 -0700, H. Peter Anvin wrote:
> 
> The question still remains what the incremental cost is of doing
> irqsave/irqrestore. 

The only other option is to have local_irq_enable() check a global
(system_state ?) before enabling. Almost as gross ...

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 22:37                 ` Benjamin Herrenschmidt
@ 2010-03-31 22:49                   ` Andrew Morton
  2010-04-01  1:17                     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Morton @ 2010-03-31 22:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: H. Peter Anvin, Christoph Lameter, Matthew Wilcox, Yinghai Lu,
	Rabin Vincent, lkml, penberg, linux-arch

On Thu, 01 Apr 2010 09:37:51 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Wed, 2010-03-31 at 14:57 -0700, H. Peter Anvin wrote:
> > 
> > The question still remains what the incremental cost is of doing
> > irqsave/irqrestore. 
> 
> The only other option is to have local_irq_enable() check a global
> (system_state ?) before enabling. Almost as gross ...
> 

Add an irq-disable-depth counter to the task_struct, fix all the bugs
which that exposes..

But these things are all utterly gross.  The bottom line is that
radix_tree_init() is manifestly unsuited to being called with local
interrupts disabled.  773e3eb7b81e5ba13b5155dfb3bb75b8ce37f8f9 was just
a wrong patch.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 20:47   ` Yinghai Lu
                       ` (3 preceding siblings ...)
  2010-03-31 22:31     ` Benjamin Herrenschmidt
@ 2010-03-31 22:58     ` David Howells
  2010-04-01  9:41       ` Jamie Lokier
  2010-04-01 10:50       ` David Howells
  4 siblings, 2 replies; 43+ messages in thread
From: David Howells @ 2010-03-31 22:58 UTC (permalink / raw)
  To: Russell King
  Cc: dhowells, Yinghai Lu, Andrew Morton, Rabin Vincent, lkml, hpa,
	penberg, cl, Benjamin Herrenschmidt, linux-arch

Russell King <rmk@arm.linux.org.uk> wrote:

> We use the standard generic kernel implementation.  Is x86 different? ;)

The optimised fast paths used on x86 rwsems don't disable interrupts.

David

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 22:49                   ` Andrew Morton
@ 2010-04-01  1:17                     ` Benjamin Herrenschmidt
  2010-03-31 22:26                       ` Andrew Morton
  0 siblings, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2010-04-01  1:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: H. Peter Anvin, Christoph Lameter, Matthew Wilcox, Yinghai Lu,
	Rabin Vincent, lkml, penberg, linux-arch

On Wed, 2010-03-31 at 15:49 -0700, Andrew Morton wrote:
> 
> But these things are all utterly gross.  The bottom line is that
> radix_tree_init() is manifestly unsuited to being called with local
> interrupts disabled.  773e3eb7b81e5ba13b5155dfb3bb75b8ce37f8f9 was
> just a wrong patch. 

Except that powerpc (and now it seems x86) both want to use radix trees
for interrupt handling... At least on powerpc, we trick and use a linear
search until the radix trees are initialized, which we do later during
boot, but that somewhat sucks.

I believe sherry picking things like not calling radix_tree_init() is
going to fix one case today, until we have another one, and another one,
and etc...

I suspect we're better off fixing the root of the problem in down/up.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01  6:26                         ` H. Peter Anvin
@ 2010-04-01  3:33                           ` Andrew Morton
  2010-04-01  6:48                             ` Benjamin Herrenschmidt
  2010-04-01 11:06                             ` David Howells
  2010-04-01  6:50                           ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 43+ messages in thread
From: Andrew Morton @ 2010-04-01  3:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Benjamin Herrenschmidt, Christoph Lameter, Matthew Wilcox,
	Yinghai Lu, Rabin Vincent, lkml, penberg, linux-arch

On Wed, 31 Mar 2010 23:26:52 -0700 "H. Peter Anvin" <hpa@zytor.com> wrote:

> On 03/31/2010 03:26 PM, Andrew Morton wrote:
> > 
> > Not by adding overhead to every single down_read()/down_write() just to
> > fix a once-off startup problem - that's taking laziness way too far.
> > 
> 
> How much overhead is this on non-x86 architectures (keep in mind x86
> doesn't use this?)
> 

Just a few instructions, I guess.   But we can do it with zero.

And from a design POV, pretending that down_read()/down_write() can be
called with interrupts disabled is daft - they cannot!  Why muck up the
usual code paths with this startup-specific hack?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 22:26                       ` Andrew Morton
@ 2010-04-01  6:26                         ` H. Peter Anvin
  2010-04-01  3:33                           ` Andrew Morton
  2010-04-01  6:50                           ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 43+ messages in thread
From: H. Peter Anvin @ 2010-04-01  6:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Benjamin Herrenschmidt, Christoph Lameter, Matthew Wilcox,
	Yinghai Lu, Rabin Vincent, lkml, penberg, linux-arch

On 03/31/2010 03:26 PM, Andrew Morton wrote:
> 
> Not by adding overhead to every single down_read()/down_write() just to
> fix a once-off startup problem - that's taking laziness way too far.
> 

How much overhead is this on non-x86 architectures (keep in mind x86
doesn't use this?)

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01  3:33                           ` Andrew Morton
@ 2010-04-01  6:48                             ` Benjamin Herrenschmidt
  2010-04-01 16:15                               ` H. Peter Anvin
  2010-04-01 11:06                             ` David Howells
  1 sibling, 1 reply; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2010-04-01  6:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: H. Peter Anvin, Christoph Lameter, Matthew Wilcox, Yinghai Lu,
	Rabin Vincent, lkml, penberg, linux-arch

On Wed, 2010-03-31 at 23:33 -0400, Andrew Morton wrote:
> Just a few instructions, I guess.   But we can do it with zero.
> 
> And from a design POV, pretending that down_read()/down_write() can be
> called with interrupts disabled is daft - they cannot!  Why muck up
> the
> usual code paths with this startup-specific hack? 

Because we the problem of when interrupts are enabled for the first time
is a nasty one, and having entire layer of things not usable at the
right level of init because somewhere something might do an irq enable
due to calling generic code that down's a semaphore is a PITA.

Seriously, Andrew, I don't see a clean solution... Something -somewhere-
will have to be ugly.

Allocation is a pretty basic service that a lot of stuff expect
especially when booting.

We went through that discussion before when we moved the SLAB init
earlier during boot, because it makes no sense to have tons of code to
have to figure out what allocator to call depending on what phase of the
moon it's called from (especially when said code can also be called
later during boot, say for hotplug reasons).

So we moved sl*b init earlier, thus we ought to be able to also
kmem_cache_alloc() earlier. We -fixed- that problem already afaik.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01  6:26                         ` H. Peter Anvin
  2010-04-01  3:33                           ` Andrew Morton
@ 2010-04-01  6:50                           ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2010-04-01  6:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Christoph Lameter, Matthew Wilcox, Yinghai Lu,
	Rabin Vincent, lkml, penberg, linux-arch

On Wed, 2010-03-31 at 23:26 -0700, H. Peter Anvin wrote:
> On 03/31/2010 03:26 PM, Andrew Morton wrote:
> > 
> > Not by adding overhead to every single down_read()/down_write() just to
> > fix a once-off startup problem - that's taking laziness way too far.
> > 
> 
> How much overhead is this on non-x86 architectures (keep in mind x86
> doesn't use this?)

None on powerpc, we use atomic ops and don't disable IRQs.

BTW. The same problem used to happen with mutex debug. Was this ever
fixed ?

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 22:58     ` David Howells
@ 2010-04-01  9:41       ` Jamie Lokier
  2010-04-01 11:23         ` Matthew Wilcox
  2010-04-01 10:50       ` David Howells
  1 sibling, 1 reply; 43+ messages in thread
From: Jamie Lokier @ 2010-04-01  9:41 UTC (permalink / raw)
  To: David Howells
  Cc: Russell King, Yinghai Lu, Andrew Morton, Rabin Vincent, lkml, hpa,
	penberg, cl, Benjamin Herrenschmidt, linux-arch

David Howells wrote:
> Russell King <rmk@arm.linux.org.uk> wrote:
> > We use the standard generic kernel implementation.  Is x86 different? ;)
> 
> The optimised fast paths used on x86 rwsems don't disable interrupts.

Any reason not to use the same technique for all the archs - plus the
trick used in arch/armkernel/entry-armv.S:__kuser_cmpxchg for those
archs which don't have atomic instructions or ll/sc?

If the problem here is _only_ semaphores, and the above might make
semaphores faster anyway, perhaps it's a solution.

-- Jamie

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 22:58     ` David Howells
  2010-04-01  9:41       ` Jamie Lokier
@ 2010-04-01 10:50       ` David Howells
  1 sibling, 0 replies; 43+ messages in thread
From: David Howells @ 2010-04-01 10:50 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: dhowells, Russell King, Yinghai Lu, Andrew Morton, Rabin Vincent,
	lkml, hpa, penberg, cl, Benjamin Herrenschmidt, linux-arch

Jamie Lokier <jamie@shareable.org> wrote:

> Any reason not to use the same technique for all the archs

It depends what atomic instructions you have available.

David

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01  3:33                           ` Andrew Morton
  2010-04-01  6:48                             ` Benjamin Herrenschmidt
@ 2010-04-01 11:06                             ` David Howells
  2010-04-01 15:55                               ` Christoph Lameter
  2010-04-01 23:00                               ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 43+ messages in thread
From: David Howells @ 2010-04-01 11:06 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: dhowells, Andrew Morton, H. Peter Anvin, Christoph Lameter,
	Matthew Wilcox, Yinghai Lu, Rabin Vincent, lkml, penberg,
	linux-arch


Can we provide a kmem_cache_create_early()?  One that takes no locks and gets
cleaned up with the other __init stuff?

David

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01  9:41       ` Jamie Lokier
@ 2010-04-01 11:23         ` Matthew Wilcox
  0 siblings, 0 replies; 43+ messages in thread
From: Matthew Wilcox @ 2010-04-01 11:23 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: David Howells, Russell King, Yinghai Lu, Andrew Morton,
	Rabin Vincent, lkml, hpa, penberg, cl, Benjamin Herrenschmidt,
	linux-arch

On Thu, Apr 01, 2010 at 10:41:11AM +0100, Jamie Lokier wrote:
> David Howells wrote:
> > Russell King <rmk@arm.linux.org.uk> wrote:
> > > We use the standard generic kernel implementation.  Is x86 different? ;)
> > 
> > The optimised fast paths used on x86 rwsems don't disable interrupts.
> 
> Any reason not to use the same technique for all the archs - plus the
> trick used in arch/armkernel/entry-armv.S:__kuser_cmpxchg for those
> archs which don't have atomic instructions or ll/sc?

Assuming you're talking about the __LINUX_ARM_ARCH__ < 6 + CONFIG_MMU
case, then this only works for uniprocessor machines.

> If the problem here is _only_ semaphores, and the above might make
> semaphores faster anyway, perhaps it's a solution.

You trade off a bit of overhead in the semaphore path for a bit of
overhead in the interrupt path.  We probably take more sempahores than
we do interrupts, so it's probably worthwhile.  Still, cmpxchg() needs
to be SMP-safe.

Realistically, this isn't something that can be done in generic code.
It has to be done in arch-specific code.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01 16:13         ` Linus Torvalds
@ 2010-04-01 14:27           ` Andrew Morton
  2010-04-01 20:12             ` Linus Torvalds
  2010-04-02 14:46             ` David Howells
  2010-04-07 19:09           ` Kevin Hilman
  1 sibling, 2 replies; 43+ messages in thread
From: Andrew Morton @ 2010-04-01 14:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Yinghai Lu, Rabin Vincent, lkml, penberg, cl,
	Benjamin Herrenschmidt, linux-arch, David Howells

On Thu, 1 Apr 2010 09:13:31 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Wed, 31 Mar 2010, H. Peter Anvin wrote:
> > 
> > The obvious way to fix this would be to use
> > spin_lock_irqsave..spin_lock_irqrestore in __down_read as well as in the
> > other locations; I don't have a good feel for what the cost of doing so
> > would be, though.  On x86 it's fairly expensive simply because the only
> > way to save the state is to push it on the stack, which the compiler
> > doesn't deal well with, but this code isn't used on x86.
> 
> I think that's what we should just do, with a good comment both in the 
> code and the changelog. I'm not entirely happy with it, because obviously 
> it's conceptually kind of dubious to take a lock with interrupts disabled 
> in the first place, but this is not a new issue per se.
> 
> The whole bootup code is special, and we already make similar guarantees 
> about memory allocators and friends - just because it's too dang painful 
> to have some special code that does GFP_ATOMIC for early bootup when the 
> same code is often shared and used at run-time too.
> 
> So we've accepted that people can do GFP_KERNEL allocations and we won't 
> care about them if we're in the boot phase (and suspend/resume), and we 
> have that whole 'gfp_allowed_mask' thing for that.
> 
> I think this probably falls under exactly the same heading of "not pretty, 
> but let's not blow up".
> 
> So making the slow-path do the spin_[un]lock_irq{save,restore}() versions 
> sounds like the right thing. It won't be a performance issue: it _is_ the 
> slow-path, and we're already doing the expensive part (the spinlock itself 
> and the irq thing).

It's actually on the fastpath for lib/rwsem-spinlock.c.

> So ACK on the idea. Who wants to write the trivial patch and test it? 
> Preferably somebody who sees the problem in the first place - x86 should 
> not be impacted, since the irq-disabling slow-path should never be hit 
> without contention anyway (and contention cannot/mustnot happen for this 
> case).
> 
> 			Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01 11:06                             ` David Howells
@ 2010-04-01 15:55                               ` Christoph Lameter
  2010-04-01 23:00                               ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2010-04-01 15:55 UTC (permalink / raw)
  To: David Howells
  Cc: Benjamin Herrenschmidt, Andrew Morton, H. Peter Anvin,
	Matthew Wilcox, Yinghai Lu, Rabin Vincent, lkml, penberg,
	linux-arch

On Thu, 1 Apr 2010, David Howells wrote:

>
> Can we provide a kmem_cache_create_early()?  One that takes no locks and gets
> cleaned up with the other __init stuff?

Sure. We can also check if we are early in boot in kmem_cache_create and
just not take the semaphores (which are useless if we are single
threaded).



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 22:36             ` Benjamin Herrenschmidt
@ 2010-04-01 15:57               ` Christoph Lameter
  0 siblings, 0 replies; 43+ messages in thread
From: Christoph Lameter @ 2010-04-01 15:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Matthew Wilcox, H. Peter Anvin, Yinghai Lu, Andrew Morton,
	Rabin Vincent, lkml, penberg, linux-arch

On Thu, 1 Apr 2010, Benjamin Herrenschmidt wrote:

> On Wed, 2010-03-31 at 16:42 -0500, Christoph Lameter wrote:
> >
> > Do not run the checks while we are in a single threaded context?
> >
> > I thought we had some dynamic code patching thingamy that could change
> > those when we go to smp mode?
>
> The problem isn't about checks. Those -will- enable IRQs before it's
> safe to do so, the checks are perfectly right to point it out, it -is- a
> bug waiting to happen on some random HW.

Taking sems in single threaded mode is pretty pointless. Those functions
could just return success until the scheduler is actually able to do
something with threads.



^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-03-31 21:12       ` H. Peter Anvin
  2010-03-31 21:28         ` Andrew Morton
@ 2010-04-01 16:13         ` Linus Torvalds
  2010-04-01 14:27           ` Andrew Morton
  2010-04-07 19:09           ` Kevin Hilman
  1 sibling, 2 replies; 43+ messages in thread
From: Linus Torvalds @ 2010-04-01 16:13 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Andrew Morton, Yinghai Lu, Rabin Vincent, lkml, penberg, cl,
	Benjamin Herrenschmidt, linux-arch, David Howells



On Wed, 31 Mar 2010, H. Peter Anvin wrote:
> 
> The obvious way to fix this would be to use
> spin_lock_irqsave..spin_lock_irqrestore in __down_read as well as in the
> other locations; I don't have a good feel for what the cost of doing so
> would be, though.  On x86 it's fairly expensive simply because the only
> way to save the state is to push it on the stack, which the compiler
> doesn't deal well with, but this code isn't used on x86.

I think that's what we should just do, with a good comment both in the 
code and the changelog. I'm not entirely happy with it, because obviously 
it's conceptually kind of dubious to take a lock with interrupts disabled 
in the first place, but this is not a new issue per se.

The whole bootup code is special, and we already make similar guarantees 
about memory allocators and friends - just because it's too dang painful 
to have some special code that does GFP_ATOMIC for early bootup when the 
same code is often shared and used at run-time too.

So we've accepted that people can do GFP_KERNEL allocations and we won't 
care about them if we're in the boot phase (and suspend/resume), and we 
have that whole 'gfp_allowed_mask' thing for that.

I think this probably falls under exactly the same heading of "not pretty, 
but let's not blow up".

So making the slow-path do the spin_[un]lock_irq{save,restore}() versions 
sounds like the right thing. It won't be a performance issue: it _is_ the 
slow-path, and we're already doing the expensive part (the spinlock itself 
and the irq thing).

So ACK on the idea. Who wants to write the trivial patch and test it? 
Preferably somebody who sees the problem in the first place - x86 should 
not be impacted, since the irq-disabling slow-path should never be hit 
without contention anyway (and contention cannot/mustnot happen for this 
case).

			Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01  6:48                             ` Benjamin Herrenschmidt
@ 2010-04-01 16:15                               ` H. Peter Anvin
  0 siblings, 0 replies; 43+ messages in thread
From: H. Peter Anvin @ 2010-04-01 16:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Andrew Morton, Christoph Lameter, Matthew Wilcox, Yinghai Lu,
	Rabin Vincent, lkml, penberg, linux-arch

On 03/31/2010 11:48 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2010-03-31 at 23:33 -0400, Andrew Morton wrote:
>> Just a few instructions, I guess.   But we can do it with zero.
>>
>> And from a design POV, pretending that down_read()/down_write() can be
>> called with interrupts disabled is daft - they cannot!  Why muck up
>> the
>> usual code paths with this startup-specific hack? 
> 
> Because we the problem of when interrupts are enabled for the first time
> is a nasty one, and having entire layer of things not usable at the
> right level of init because somewhere something might do an irq enable
> due to calling generic code that down's a semaphore is a PITA.
> 
> Seriously, Andrew, I don't see a clean solution... Something -somewhere-
> will have to be ugly.
> 
> Allocation is a pretty basic service that a lot of stuff expect
> especially when booting.
> 
> We went through that discussion before when we moved the SLAB init
> earlier during boot, because it makes no sense to have tons of code to
> have to figure out what allocator to call depending on what phase of the
> moon it's called from (especially when said code can also be called
> later during boot, say for hotplug reasons).
> 
> So we moved sl*b init earlier, thus we ought to be able to also
> kmem_cache_alloc() earlier. We -fixed- that problem already afaik.

I would like to point out that initialization is a particular subcase of
a more general rule:

- It is safe to call a semaphore/rwlock down with IRQ disabled *if and
only if* the caller can guarantee non-contention.

Initialization is an obvious subcase, but there might be others.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01 14:27           ` Andrew Morton
@ 2010-04-01 20:12             ` Linus Torvalds
  2010-04-02 14:46             ` David Howells
  1 sibling, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2010-04-01 20:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: H. Peter Anvin, Yinghai Lu, Rabin Vincent, lkml, penberg, cl,
	Benjamin Herrenschmidt, linux-arch, David Howells



On Thu, 1 Apr 2010, Andrew Morton wrote:
>
> > So making the slow-path do the spin_[un]lock_irq{save,restore}() versions 
> > sounds like the right thing. It won't be a performance issue: it _is_ the 
> > slow-path, and we're already doing the expensive part (the spinlock itself 
> > and the irq thing).
> 
> It's actually on the fastpath for lib/rwsem-spinlock.c.

Ahh, yes. In this case, that doesn't likely change anything. The 
save/restore versions of the irq-safe locks shouldn't be appreciably more 
expensive than the non-saving ones. And architectures that really care 
should have done their own per-arch optimized version anyway.

Maybe we should even document that - so that nobody else makes the mistake 
x86-64 did of thinking that the "generic spinlock" version of the rwsem's 
is anything but a hacky and bad fallback case.

				Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01 11:06                             ` David Howells
  2010-04-01 15:55                               ` Christoph Lameter
@ 2010-04-01 23:00                               ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 43+ messages in thread
From: Benjamin Herrenschmidt @ 2010-04-01 23:00 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew Morton, H. Peter Anvin, Christoph Lameter, Matthew Wilcox,
	Yinghai Lu, Rabin Vincent, lkml, penberg, linux-arch

On Thu, 2010-04-01 at 12:06 +0100, David Howells wrote:
> Can we provide a kmem_cache_create_early()?  One that takes no locks and gets
> cleaned up with the other __init stuff?

Yuck. I hate having to expose more APIs. Also the problem with that is
means callers have to know. So we need to propagate up all call chains
etc... (ie, radix_tree_init_early(), etc...)

This is pretty much exactly the discussion we had when moving sl*b
early, and back then, the final word from Linus (heh, for once he agreed
with me :-) was that this made no sense.

We can bury logic inside kmem_cache_create() though, it's not -that- a
hot path.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01 14:27           ` Andrew Morton
  2010-04-01 20:12             ` Linus Torvalds
@ 2010-04-02 14:46             ` David Howells
  2010-04-02 14:54               ` Linus Torvalds
  1 sibling, 1 reply; 43+ messages in thread
From: David Howells @ 2010-04-02 14:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: dhowells, Andrew Morton, H. Peter Anvin, Yinghai Lu,
	Rabin Vincent, lkml, penberg, cl, Benjamin Herrenschmidt,
	linux-arch

Linus Torvalds <torvalds@linux-foundation.org> wrote:

> Ahh, yes. In this case, that doesn't likely change anything. The 
> save/restore versions of the irq-safe locks shouldn't be appreciably more 
> expensive than the non-saving ones. And architectures that really care 
> should have done their own per-arch optimized version anyway.

That depends on the CPU.  Some CPUs have quite expensive interrupt disablement
instructions.  FRV does for instance; but fortunately, on the FRV, I can use
some of the excessive quantities of conditional registers to pretend that I
disable interrupts, and only actually do so if an interrupt actually happens.

> Maybe we should even document that - so that nobody else makes the mistake 
> x86-64 did of thinking that the "generic spinlock" version of the rwsem's 
> is anything but a hacky and bad fallback case.

In some cases, it's actually the best way.  On a UP machine, for instance,
where they reduce to nothing or where your only atomic instruction is an XCHG
equivalent.

David

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-02 14:46             ` David Howells
@ 2010-04-02 14:54               ` Linus Torvalds
  0 siblings, 0 replies; 43+ messages in thread
From: Linus Torvalds @ 2010-04-02 14:54 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew Morton, H. Peter Anvin, Yinghai Lu, Rabin Vincent, lkml,
	penberg, cl, Benjamin Herrenschmidt, linux-arch



On Fri, 2 Apr 2010, David Howells wrote:

> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > Ahh, yes. In this case, that doesn't likely change anything. The 
> > save/restore versions of the irq-safe locks shouldn't be appreciably more 
> > expensive than the non-saving ones. And architectures that really care 
> > should have done their own per-arch optimized version anyway.
> 
> That depends on the CPU.  Some CPUs have quite expensive interrupt disablement
> instructions.  FRV does for instance; but fortunately, on the FRV, I can use
> some of the excessive quantities of conditional registers to pretend that I
> disable interrupts, and only actually do so if an interrupt actually happens.

I think you're missing the part where we're not _adding_ any irq disables: 
we're just changing the unconditional irq disable to a save-and-disable 
(and the unconditional irq enable to a restore).

So even if irq's are expensive to disable, the change from

	spin_lock_irq()

to

	spin_lock_irqsave()

won't make that code any more expensive.

> > Maybe we should even document that - so that nobody else makes the mistake 
> > x86-64 did of thinking that the "generic spinlock" version of the rwsem's 
> > is anything but a hacky and bad fallback case.
> 
> In some cases, it's actually the best way.  On a UP machine, for instance,
> where they reduce to nothing or where your only atomic instruction is an XCHG
> equivalent.

Again, you seem to think that we used to have just a plain spin_lock. Not 
so. We currently have a spin_lock_irq(), and it is NOT a no-op even on UP. 
It does that irq disable.

Anyway, I suspect that even with just an atomic xchg, you can do a better 
job at doing down_read() than using the generic spin-lock version (likely 
by busy-looping on a special "we're busy" value). But if you do want to 
use the generic spin-lock version, I doubt any architecture makes that 
irqsave version noticeable slower than the unconditional irq version.

		Linus

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-01 16:13         ` Linus Torvalds
  2010-04-01 14:27           ` Andrew Morton
@ 2010-04-07 19:09           ` Kevin Hilman
  2010-04-08 15:55             ` Américo Wang
  1 sibling, 1 reply; 43+ messages in thread
From: Kevin Hilman @ 2010-04-07 19:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H. Peter Anvin, Andrew Morton, Yinghai Lu, Rabin Vincent, lkml,
	penberg, cl, Benjamin Herrenschmidt, linux-arch, David Howells

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Wed, 31 Mar 2010, H. Peter Anvin wrote:
>> 
>> The obvious way to fix this would be to use
>> spin_lock_irqsave..spin_lock_irqrestore in __down_read as well as in the
>> other locations; I don't have a good feel for what the cost of doing so
>> would be, though.  On x86 it's fairly expensive simply because the only
>> way to save the state is to push it on the stack, which the compiler
>> doesn't deal well with, but this code isn't used on x86.
>

[...]

> So making the slow-path do the spin_[un]lock_irq{save,restore}() versions 
> sounds like the right thing. It won't be a performance issue: it _is_ the 
> slow-path, and we're already doing the expensive part (the spinlock itself 
> and the irq thing).
>
> So ACK on the idea. Who wants to write the trivial patch and test it? 

OK, I'll bite since I was seeing boot-time hangs on ARM (TI OMAP3) due
to this.  Patch below.

Kevin


>From 7baff4008353bbfd2a2e2a4da22b87bc4efa4194 Mon Sep 17 00:00:00 2001
From: Kevin Hilman <khilman@deeprootsystems.com>
Date: Wed, 7 Apr 2010 11:52:46 -0700
Subject: [PATCH] rwsem generic spinlock: use IRQ save/restore spinlocks

rwsems can be used with IRQs disabled, particularily in early boot
before IRQs are enabled.  Currently the spin_unlock_irq() usage in the
slow-patch will unconditionally enable interrupts and cause problems
since interrupts are not yet initialized or enabled.

This patch uses save/restore versions of IRQ spinlocks in the slowpath
to ensure interrupts are not unintentionally disabled.

Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
---
 lib/rwsem-spinlock.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
index ccf95bf..ffc9fc7 100644
--- a/lib/rwsem-spinlock.c
+++ b/lib/rwsem-spinlock.c
@@ -143,13 +143,14 @@ void __sched __down_read(struct rw_semaphore *sem)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags);
 
 	if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity++;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags);
 		goto out;
 	}
 
@@ -164,7 +165,7 @@ void __sched __down_read(struct rw_semaphore *sem)
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	/* wait to be given the lock */
 	for (;;) {
@@ -209,13 +210,14 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 {
 	struct rwsem_waiter waiter;
 	struct task_struct *tsk;
+	unsigned long flags;
 
-	spin_lock_irq(&sem->wait_lock);
+	spin_lock_irqsave(&sem->wait_lock, flags);
 
 	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
 		/* granted */
 		sem->activity = -1;
-		spin_unlock_irq(&sem->wait_lock);
+		spin_unlock_irqrestore(&sem->wait_lock, flags);
 		goto out;
 	}
 
@@ -230,7 +232,7 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 	list_add_tail(&waiter.list, &sem->wait_list);
 
 	/* we don't need to touch the semaphore struct anymore */
-	spin_unlock_irq(&sem->wait_lock);
+	spin_unlock_irqrestore(&sem->wait_lock, flags);
 
 	/* wait to be given the lock */
 	for (;;) {
-- 
1.7.0.2


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: start_kernel(): bug: interrupts were enabled early
  2010-04-07 19:09           ` Kevin Hilman
@ 2010-04-08 15:55             ` Américo Wang
  0 siblings, 0 replies; 43+ messages in thread
From: Américo Wang @ 2010-04-08 15:55 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Linus Torvalds, H. Peter Anvin, Andrew Morton, Yinghai Lu,
	Rabin Vincent, lkml, penberg, cl, Benjamin Herrenschmidt,
	linux-arch, David Howells

On Wed, Apr 07, 2010 at 12:09:17PM -0700, Kevin Hilman wrote:
>Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> On Wed, 31 Mar 2010, H. Peter Anvin wrote:
>>> 
>>> The obvious way to fix this would be to use
>>> spin_lock_irqsave..spin_lock_irqrestore in __down_read as well as in the
>>> other locations; I don't have a good feel for what the cost of doing so
>>> would be, though.  On x86 it's fairly expensive simply because the only
>>> way to save the state is to push it on the stack, which the compiler
>>> doesn't deal well with, but this code isn't used on x86.
>>
>
>[...]
>
>> So making the slow-path do the spin_[un]lock_irq{save,restore}() versions 
>> sounds like the right thing. It won't be a performance issue: it _is_ the 
>> slow-path, and we're already doing the expensive part (the spinlock itself 
>> and the irq thing).
>>
>> So ACK on the idea. Who wants to write the trivial patch and test it? 
>
>OK, I'll bite since I was seeing boot-time hangs on ARM (TI OMAP3) due
>to this.  Patch below.
>
>Kevin
>
>
>From 7baff4008353bbfd2a2e2a4da22b87bc4efa4194 Mon Sep 17 00:00:00 2001
>From: Kevin Hilman <khilman@deeprootsystems.com>
>Date: Wed, 7 Apr 2010 11:52:46 -0700
>Subject: [PATCH] rwsem generic spinlock: use IRQ save/restore spinlocks
>
>rwsems can be used with IRQs disabled, particularily in early boot
>before IRQs are enabled.  Currently the spin_unlock_irq() usage in the
>slow-patch will unconditionally enable interrupts and cause problems
>since interrupts are not yet initialized or enabled.
>
>This patch uses save/restore versions of IRQ spinlocks in the slowpath
>to ensure interrupts are not unintentionally disabled.
>
>Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>

This looks reasonable and fine for me.

Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>

Thanks.


>---
> lib/rwsem-spinlock.c |   14 ++++++++------
> 1 files changed, 8 insertions(+), 6 deletions(-)
>
>diff --git a/lib/rwsem-spinlock.c b/lib/rwsem-spinlock.c
>index ccf95bf..ffc9fc7 100644
>--- a/lib/rwsem-spinlock.c
>+++ b/lib/rwsem-spinlock.c
>@@ -143,13 +143,14 @@ void __sched __down_read(struct rw_semaphore *sem)
> {
> 	struct rwsem_waiter waiter;
> 	struct task_struct *tsk;
>+	unsigned long flags;
> 
>-	spin_lock_irq(&sem->wait_lock);
>+	spin_lock_irqsave(&sem->wait_lock, flags);
> 
> 	if (sem->activity >= 0 && list_empty(&sem->wait_list)) {
> 		/* granted */
> 		sem->activity++;
>-		spin_unlock_irq(&sem->wait_lock);
>+		spin_unlock_irqrestore(&sem->wait_lock, flags);
> 		goto out;
> 	}
> 
>@@ -164,7 +165,7 @@ void __sched __down_read(struct rw_semaphore *sem)
> 	list_add_tail(&waiter.list, &sem->wait_list);
> 
> 	/* we don't need to touch the semaphore struct anymore */
>-	spin_unlock_irq(&sem->wait_lock);
>+	spin_unlock_irqrestore(&sem->wait_lock, flags);
> 
> 	/* wait to be given the lock */
> 	for (;;) {
>@@ -209,13 +210,14 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
> {
> 	struct rwsem_waiter waiter;
> 	struct task_struct *tsk;
>+	unsigned long flags;
> 
>-	spin_lock_irq(&sem->wait_lock);
>+	spin_lock_irqsave(&sem->wait_lock, flags);
> 
> 	if (sem->activity == 0 && list_empty(&sem->wait_list)) {
> 		/* granted */
> 		sem->activity = -1;
>-		spin_unlock_irq(&sem->wait_lock);
>+		spin_unlock_irqrestore(&sem->wait_lock, flags);
> 		goto out;
> 	}
> 
>@@ -230,7 +232,7 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
> 	list_add_tail(&waiter.list, &sem->wait_list);
> 
> 	/* we don't need to touch the semaphore struct anymore */
>-	spin_unlock_irq(&sem->wait_lock);
>+	spin_unlock_irqrestore(&sem->wait_lock, flags);
> 
> 	/* wait to be given the lock */
> 	for (;;) {
>-- 
>1.7.0.2
>
>--
>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/

-- 
Live like a child, think like the god.
 

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2010-04-08 15:52 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-25 19:41 start_kernel(): bug: interrupts were enabled early Rabin Vincent
2010-03-31 20:40 ` Andrew Morton
2010-03-31 20:47   ` Yinghai Lu
2010-03-31 20:52     ` Andrew Morton
2010-03-31 21:12       ` H. Peter Anvin
2010-03-31 21:28         ` Andrew Morton
2010-03-31 22:35           ` Benjamin Herrenschmidt
2010-04-01 16:13         ` Linus Torvalds
2010-04-01 14:27           ` Andrew Morton
2010-04-01 20:12             ` Linus Torvalds
2010-04-02 14:46             ` David Howells
2010-04-02 14:54               ` Linus Torvalds
2010-04-07 19:09           ` Kevin Hilman
2010-04-08 15:55             ` Américo Wang
2010-03-31 21:01     ` Matthew Wilcox
2010-03-31 21:05       ` H. Peter Anvin
2010-03-31 21:17         ` Matthew Wilcox
2010-03-31 21:42           ` Christoph Lameter
2010-03-31 21:54             ` Russell King
2010-03-31 21:57               ` H. Peter Anvin
2010-03-31 22:30                 ` Russell King
2010-03-31 22:37                 ` Benjamin Herrenschmidt
2010-03-31 22:49                   ` Andrew Morton
2010-04-01  1:17                     ` Benjamin Herrenschmidt
2010-03-31 22:26                       ` Andrew Morton
2010-04-01  6:26                         ` H. Peter Anvin
2010-04-01  3:33                           ` Andrew Morton
2010-04-01  6:48                             ` Benjamin Herrenschmidt
2010-04-01 16:15                               ` H. Peter Anvin
2010-04-01 11:06                             ` David Howells
2010-04-01 15:55                               ` Christoph Lameter
2010-04-01 23:00                               ` Benjamin Herrenschmidt
2010-04-01  6:50                           ` Benjamin Herrenschmidt
2010-03-31 22:36             ` Benjamin Herrenschmidt
2010-04-01 15:57               ` Christoph Lameter
2010-03-31 21:05     ` Russell King
2010-03-31 21:08       ` H. Peter Anvin
2010-03-31 22:31     ` Benjamin Herrenschmidt
2010-03-31 22:36       ` H. Peter Anvin
2010-03-31 22:58     ` David Howells
2010-04-01  9:41       ` Jamie Lokier
2010-04-01 11:23         ` Matthew Wilcox
2010-04-01 10:50       ` David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).