public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] request_irq(...,SA_BOOTMEM);
@ 2006-06-05  5:40 Benjamin Herrenschmidt
  2006-06-05  7:08 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-05  5:40 UTC (permalink / raw)
  To: Linux Kernel list; +Cc: Ingo Molnar, Thomas Gleixner

In various places, architectures need to request interrupts early during
boot. Before slab is initialized typically. We used to have all sorts of
arch hacks to do so, which ended up being turned into something like:

static struct irqaction xxx_irq = { xxx_irq_handler, 0, CPU_MASK_NONE,
"xxx", NULL, NULL };

 .../...

setup_irq(&xxx_irq);

(for example fpu_irq in i386's i8259.c)

I quite dislike that and would like to propose that patch instead,
adding an SA_BOOTMEM flag that can be used to request IRQs before slab
is initialized. (Note that the register_* calls to the proc code aren't
protected, they shouldn't be a problem as they test for the /proc/irq
root node before doing anything and this can't have been created if slab
doesn't work).

Of course, an SA_BOOTMEM irqaction can't be freed. My proposed patch
"disconnects" it completely and just skips the freeing step but we might
want to refuse with a WARN_ON() attempts to free_irq() such an interrupt
instead. Note that the existing practice using a static descriptor
doesn't protect against such attempts at freeing.

Ben.

Index: linux-work/include/linux/signal.h
===================================================================
--- linux-work.orig/include/linux/signal.h	2006-05-30 13:03:41.000000000 +1000
+++ linux-work/include/linux/signal.h	2006-06-05 15:14:16.000000000 +1000
@@ -19,7 +19,7 @@
 #define SA_SAMPLE_RANDOM	SA_RESTART
 #define SA_SHIRQ		0x04000000
 #define SA_PROBEIRQ		0x08000000
-
+#define SA_BOOTMEM		0x02000000
 /*
  * As above, these correspond to the IORESOURCE_IRQ_* defines in
  * linux/ioport.h to select the interrupt line behaviour.  When
Index: linux-work/kernel/irq/manage.c
===================================================================
--- linux-work.orig/kernel/irq/manage.c	2006-05-31 11:26:45.000000000 +1000
+++ linux-work/kernel/irq/manage.c	2006-06-05 15:34:27.000000000 +1000
@@ -12,6 +12,7 @@
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/interrupt.h>
+#include <linux/bootmem.h>
 
 #include "internals.h"
 
@@ -206,7 +207,7 @@ int setup_irq(unsigned int irq, struct i
 	 * so we have to be careful not to interfere with a
 	 * running system.
 	 */
-	if (new->flags & SA_SAMPLE_RANDOM) {
+	if ((new->flags & SA_SAMPLE_RANDOM) && !(new->flags & SA_BOOTMEM)) {
 		/*
 		 * This function might sleep, we want to call it first,
 		 * outside of the atomic block.
@@ -363,7 +364,8 @@ void free_irq(unsigned int irq, void *de
 
 			/* Make sure it's not being used on another CPU */
 			synchronize_irq(irq);
-			kfree(action);
+			if (!(action->flags & SA_BOOTMEM))
+				kfree(action);
 			return;
 		}
 		printk(KERN_ERR "Trying to free free IRQ%d\n", irq);
@@ -424,7 +426,10 @@ int request_irq(unsigned int irq,
 	if (!handler)
 		return -EINVAL;
 
-	action = kmalloc(sizeof(struct irqaction), GFP_ATOMIC);
+	if (irqflags & SA_BOOTMEM)
+		action = alloc_bootmem(sizeof(struct irqaction));
+	else
+		action = kmalloc(sizeof(struct irqaction), GFP_ATOMIC);
 	if (!action)
 		return -ENOMEM;
 



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] request_irq(...,SA_BOOTMEM);
  2006-06-05  5:40 [RFC][PATCH] request_irq(...,SA_BOOTMEM); Benjamin Herrenschmidt
@ 2006-06-05  7:08 ` Benjamin Herrenschmidt
  2006-06-05  7:31   ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-05  7:08 UTC (permalink / raw)
  To: Linux Kernel list
  Cc: Ingo Molnar, Thomas Gleixner, Andrew Morton, Linus Torvalds

On Mon, 2006-06-05 at 15:40 +1000, Benjamin Herrenschmidt wrote:
> In various places, architectures need to request interrupts early during
> boot. Before slab is initialized typically. We used to have all sorts of
> arch hacks to do so, which ended up being turned into something like:

 ..../.... (skip workaround based on bootmem)

Hrm... I'm running into more problems with some of my powerpc irq
cleanups related to slab not being initialized.

Why do we do it so late ? I don't see any reason. A bunch of stuff like
init_IRQ(), time_init() etc...end up being called without a working slab
(not even GFP_ATOMIC). Damn, even console_init().

What are the fundamental reasons, if any, why we initialize the slab so
late ? Would it be possible to have the slab fallback to bootmem or some
crap like that early during boot (though that may be an issue for things
trying to free blocks) ?

For example, in my big irq rework for powerpc, I need to clean up the
virtual to physical irq mapping for which we are using a radix tree.
Right now, we kludge around in the pseries code to use that radix tree
late (that is to setup the cascaded interrupt from an arch_initcall),
but that's becoming a bigger issue as I'm trying to push some of that
stuff into a nice common remapping layer (we need that for cell too
among others). That's just an example of a low level utility (radix
trees) we can't use because it relies on the slab.

Another example is that things like init_IRQ(), time_init(),
console_init() etc.. need to be able to do ioremap. How many hacks are
archs cluttered with to be able to do ioremap before mem_init ? It
varies from hacks in pte_alloc_kernel() to use bootmem, to the use of
bolted hash entries on ppc64, etc....

Maybe we should have a look at pushing some things later into the boot
process that have no reason anymore to be done that early... And even if
not, a consistent memory allocator is something that I would really
consider as a very basic service that should be up before pretty much
everything else. It's not like the slab was relying on hardware to be up
or anything fancy... it doesn't even rely on vmalloc() to be available
nor the MMU in any consistent state outside of having the linear mapping
up (which I think all archs have at this point in time anyway).

I'm sure there is a subtle sneaky dependency, I would suspect something
to do with the scheduler and/or the cpumask/numa infos, whatever, but I
think we should really consider solving that.

Bootmem is a hack. It may be good enough for very-early-boot type
allocations, but them, I mean really really early (and even then, heh,
powerpc has it's own allocator that works even before bootmem is up). 

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] request_irq(...,SA_BOOTMEM);
  2006-06-05  7:08 ` Benjamin Herrenschmidt
@ 2006-06-05  7:31   ` Andrew Morton
  2006-06-05  7:48     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2006-06-05  7:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-kernel, mingo, tglx, torvalds

On Mon, 05 Jun 2006 17:08:29 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> On Mon, 2006-06-05 at 15:40 +1000, Benjamin Herrenschmidt wrote:
> > In various places, architectures need to request interrupts early during
> > boot. Before slab is initialized typically. We used to have all sorts of
> > arch hacks to do so, which ended up being turned into something like:
> 
>  ..../.... (skip workaround based on bootmem)
> 
> Hrm... I'm running into more problems with some of my powerpc irq
> cleanups related to slab not being initialized.
> 
> Why do we do it so late ? I don't see any reason. A bunch of stuff like
> init_IRQ(), time_init() etc...end up being called without a working slab
> (not even GFP_ATOMIC). Damn, even console_init().
> 
> What are the fundamental reasons, if any, why we initialize the slab so
> late ?

I suspect because it's been like that since for ever, and any time we touch
anything in there, bad things happen.

> ...
>
> I'm sure there is a subtle sneaky dependency, I would suspect something
> to do with the scheduler and/or the cpumask/numa infos, whatever, but I
> think we should really consider solving that.

I don't immediately see anything in there which would prevent us from
running these:

	vfs_caches_init_early();
	cpuset_init_early();
	mem_init();
	kmem_cache_init();
	setup_per_cpu_pageset();

just after sort_main_extable().

But things will explode ;)

I suggest you run up a patch, test it on whatever machines you have, send
it over and I'll do the same.  But please make sure it has a config option
to restore the old sequence for now.  a) So people can work out that it was
this patch which broke things and b) so it doesn't adversely affect testing
of other things too much.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] request_irq(...,SA_BOOTMEM);
  2006-06-05  7:31   ` Andrew Morton
@ 2006-06-05  7:48     ` Benjamin Herrenschmidt
  2006-06-05  8:24       ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-05  7:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo, tglx, torvalds


> I don't immediately see anything in there which would prevent us from
> running these:
> 
> 	vfs_caches_init_early();
> 	cpuset_init_early();
> 	mem_init();
> 	kmem_cache_init();
> 	setup_per_cpu_pageset();
> 
> just after sort_main_extable().
> 
> But things will explode ;)
> 
> I suggest you run up a patch, test it on whatever machines you have, send
> it over and I'll do the same.  But please make sure it has a config option
> to restore the old sequence for now.  a) So people can work out that it was
> this patch which broke things and b) so it doesn't adversely affect testing
> of other things too much.

Good ideas. I'll give these things a spin. One thing that may explode is
that all that code is running with local_irq_disable() (since local irqs
aren't enabled before init_IRQ()) and that means possible use of some
types of semaphores may trigger warn-on's (or worse as I think some
implementations of down_read() might even force-enable irqs).

But there is no fundamental reasons to do so ... that's the trick :) If
that happens, those semaphores are still ok as they should never get
into contention that early.

Anyway, I'll give it a spin on ppc and maybe x86 if I can find a victim
to test on here, and will send something.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] request_irq(...,SA_BOOTMEM);
  2006-06-05  7:48     ` Benjamin Herrenschmidt
@ 2006-06-05  8:24       ` Andrew Morton
  2006-06-05  8:49         ` Benjamin Herrenschmidt
  2006-06-05  9:01         ` Ingo Molnar
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2006-06-05  8:24 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-kernel, mingo, tglx, torvalds

On Mon, 05 Jun 2006 17:48:10 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> 
> > I don't immediately see anything in there which would prevent us from
> > running these:
> > 
> > 	vfs_caches_init_early();
> > 	cpuset_init_early();
> > 	mem_init();
> > 	kmem_cache_init();
> > 	setup_per_cpu_pageset();
> > 
> > just after sort_main_extable().
> > 
> > But things will explode ;)
> > 
> > I suggest you run up a patch, test it on whatever machines you have, send
> > it over and I'll do the same.  But please make sure it has a config option
> > to restore the old sequence for now.  a) So people can work out that it was
> > this patch which broke things and b) so it doesn't adversely affect testing
> > of other things too much.
> 
> Good ideas. I'll give these things a spin. One thing that may explode is
> that all that code is running with local_irq_disable() (since local irqs
> aren't enabled before init_IRQ()) and that means possible use of some
> types of semaphores may trigger warn-on's (or worse as I think some
> implementations of down_read() might even force-enable irqs).

Yes, there are a few sleeping locks taken in
kmem_cache_init()->kmem_cache_init(), for example.

They would normally generate might_sleep() warnings, but __might_sleep()
suppresses that early in boot, for this very reason.

It's all a bit sleazy, "knowing" that these locks won't be contended, so
it's safe to do apparently-unsafe things.  We haven't even started the
other CPUs yet.

For something like kmem_cache_init() we could, I suppose, pass in a
dont-take-any-locks flag.  But for a fastpath thing (if there are any such
cases) that wouldn't be an option.

All very unpleasant.

And yes, the mutex code will (with debug enabled) unconditionally enable
interrupts.  ppc64 tends to oops when this happens, in the timer handler
(so it'll be intermittent...)

But looking at
work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
I realise I don't understand it.  We only go into the irq-enabling code in
the case of contention, and there cannot be contention in this case?


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] request_irq(...,SA_BOOTMEM);
  2006-06-05  8:24       ` Andrew Morton
@ 2006-06-05  8:49         ` Benjamin Herrenschmidt
  2006-06-05  8:57           ` Andrew Morton
  2006-06-05  9:01         ` Ingo Molnar
  1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2006-06-05  8:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mingo, tglx, torvalds

\
> Yes, there are a few sleeping locks taken in
> kmem_cache_init()->kmem_cache_init(), for example.
> 
> They would normally generate might_sleep() warnings, but __might_sleep()
> suppresses that early in boot, for this very reason.
> 
> It's all a bit sleazy, "knowing" that these locks won't be contended, so
> it's safe to do apparently-unsafe things.  We haven't even started the
> other CPUs yet.
> 
> For something like kmem_cache_init() we could, I suppose, pass in a
> dont-take-any-locks flag.  But for a fastpath thing (if there are any such
> cases) that wouldn't be an option.
> 
> All very unpleasant.

It is, but I think it's ok to assume that we won't contention that early
during boot. I mean, irqs are disabled not because we are in an atomic
section but bcs we just have never enabled them yet :) 

> And yes, the mutex code will (with debug enabled) unconditionally enable
> interrupts.  ppc64 tends to oops when this happens, in the timer handler
> (so it'll be intermittent...)

I tend to say that any code that hard-enables interrupts is looking for
trouble (mostly for that very reason of init stuff).

The thing is, to talk to the PIC, we need generally quite a bit of stuff
up, like page tables for ioremap etc... and thus because of that, as I
said, archs carry horrible hacks all over the place.

We can't just local_irq_enable() and setup the PIC later because the PIC
may have been left in a crap state by the BIOS (or whatever else we boot
from like kexec/kdump).

Thus it's a matter of

 - making sure we have some sanity with irq enabling/disabling
(basically not hard-enabling, always doing save/restore)
 - making sure our very zealous runtime checks don't trigger on
"interrupts have never been enabled yet" :)

I think the above is a small price to pay for all the added sanity of
having an allocator earlier and removing all of the crap archs carry
around to have ioremap working very early.

Ok, not _all_ of it because archs took bad habits and setup_arch() tends
to be full of stuff needing to tap the hardware as well, but heh, let's
clean things one step at a time. Once we have done that, we can/should
probably split setup_arch() (before allocator etc... setup) and
init_arch() after. That way, all the code that need to atually probe
hardware can be moved there (on ppc, that's also where we need to
discover PCI bridges so we can get the legacy ISA stuff etc....) 

I can already foresee vast amounts of cleanups in the ppc code (and
getting rid of bootmem allocations in lots of places) with such
things :)

> But looking at
> work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch
> I realise I don't understand it.  We only go into the irq-enabling code in
> the case of contention, and there cannot be contention in this case?

I'll have a look, possibly not before tomorrow though.

Ben.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] request_irq(...,SA_BOOTMEM);
  2006-06-05  8:49         ` Benjamin Herrenschmidt
@ 2006-06-05  8:57           ` Andrew Morton
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2006-06-05  8:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-kernel, mingo, tglx, torvalds

On Mon, 05 Jun 2006 18:49:36 +1000
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> > And yes, the mutex code will (with debug enabled) unconditionally enable
> > interrupts.  ppc64 tends to oops when this happens, in the timer handler
> > (so it'll be intermittent...)
> 
> I tend to say that any code that hard-enables interrupts is looking for
> trouble (mostly for that very reason of init stuff).

Usually.  But it's 100% daft to be preserving local irq state in
mutex_lock().

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC][PATCH] request_irq(...,SA_BOOTMEM);
  2006-06-05  8:24       ` Andrew Morton
  2006-06-05  8:49         ` Benjamin Herrenschmidt
@ 2006-06-05  9:01         ` Ingo Molnar
  1 sibling, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2006-06-05  9:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Benjamin Herrenschmidt, linux-kernel, tglx, torvalds


* Andrew Morton <akpm@osdl.org> wrote:

> And yes, the mutex code will (with debug enabled) unconditionally 
> enable interrupts.  ppc64 tends to oops when this happens, in the 
> timer handler (so it'll be intermittent...)

hm, i sent a patch to fix that, long time ago.

> But looking at 
> work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch 
> I realise I don't understand it.  We only go into the irq-enabling 
> code in the case of contention, and there cannot be contention in this 
> case?

in the debug case we go into the 'slowpath' all the time - so that we 
can do the debug checks under the mutex lock.

if we get real contention then we have a might_sleep() check that will 
catch that.

i'd suggest to push 
work-around-ppc64-bootup-bug-by-making-mutex-debugging-save-restore-irqs.patch 
upstream - i thought we agreed that while it's a bit hacky and slows the 
mutex code down a bit, it's not practical right now to forbid 
uncontended mutex_lock() in early init code?

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-06-05  9:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-05  5:40 [RFC][PATCH] request_irq(...,SA_BOOTMEM); Benjamin Herrenschmidt
2006-06-05  7:08 ` Benjamin Herrenschmidt
2006-06-05  7:31   ` Andrew Morton
2006-06-05  7:48     ` Benjamin Herrenschmidt
2006-06-05  8:24       ` Andrew Morton
2006-06-05  8:49         ` Benjamin Herrenschmidt
2006-06-05  8:57           ` Andrew Morton
2006-06-05  9:01         ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox