public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Serial Oopsen caused by global IRQ chanes
@ 2002-07-27 18:11 Russell King
  2002-07-27 18:19 ` Russell King
  2002-07-27 18:43 ` [patch] " Ingo Molnar
  0 siblings, 2 replies; 7+ messages in thread
From: Russell King @ 2002-07-27 18:11 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo

Hi,

Two people have now reported to me a couple of oopsen which appear to be
caused by a change in 2.5.29 to synchronize_irq(), which I believe has
made synchronize_irq() useless.

In effect, we no longer guarantee that any IRQ handlers for a particular
IRQ will have finished running by the time free_irq() returns.  So, code
which has:

int bar;
int *foo = &bar;

irq_handler()
{
	*foo = 0;
}

void module_exit(void)
{
	free_irq(irq, NULL);
	foo = NULL;
}

is currently broken in two ways:

1. it's possible for irq_handler to dereference foo on another CPU _after_
   free_irq has returned.
2. it's possible for the module to be unloaded while the irq_handler is
   still running on another CPU.

Would someone else (Ingo?) like to comment on the above please?
The serial code regularly trips up because of this on SMP boxen.

Thanks.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Serial Oopsen caused by global IRQ chanes
  2002-07-27 18:11 Serial Oopsen caused by global IRQ chanes Russell King
@ 2002-07-27 18:19 ` Russell King
  2002-07-27 18:21   ` Ingo Molnar
  2002-07-27 18:43 ` [patch] " Ingo Molnar
  1 sibling, 1 reply; 7+ messages in thread
From: Russell King @ 2002-07-27 18:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo

On Sat, Jul 27, 2002 at 07:11:19PM +0100, Russell King wrote:
> Two people have now reported to me a couple of oopsen which appear to be
> caused by a change in 2.5.29 to synchronize_irq(), which I believe has
> made synchronize_irq() useless.

Sorry, not useless.  Just free_irq extremely buggy.

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Serial Oopsen caused by global IRQ chanes
  2002-07-27 18:19 ` Russell King
@ 2002-07-27 18:21   ` Ingo Molnar
  0 siblings, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2002-07-27 18:21 UTC (permalink / raw)
  To: Russell King; +Cc: linux-kernel, mingo


On Sat, 27 Jul 2002, Russell King wrote:

> > Two people have now reported to me a couple of oopsen which appear to be
> > caused by a change in 2.5.29 to synchronize_irq(), which I believe has
> > made synchronize_irq() useless.
> 
> Sorry, not useless.  Just free_irq extremely buggy.

i found the bug, will send a patch soon.

	Ingo


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [patch] Re: Serial Oopsen caused by global IRQ chanes
  2002-07-27 18:11 Serial Oopsen caused by global IRQ chanes Russell King
  2002-07-27 18:19 ` Russell King
@ 2002-07-27 18:43 ` Ingo Molnar
  2002-07-27 22:36   ` William Lee Irwin III
  1 sibling, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2002-07-27 18:43 UTC (permalink / raw)
  To: Russell King; +Cc: linux-kernel, Linus Torvalds


the attached patch fixes a synchronize_irq() bug: if the interrupt is 
freed while an IRQ handler is running (irq state is IRQ_INPROGRESS) then 
synchronize_irq() will return early, which is incorrect.

there was another do_IRQ() bug that in fact necessiated the bad code that
caused the synchronize_irq() bug - we kept the IRQ_INPROGRESS bit set for
not active interrupt sources - after they happen for the first time. Now
the only effect this has is on i8259A irq handling - we used to keep these
irqs disabled after the first 'spurious' interrupt happened.  Now what the
i8259A code really wants to do IMO is to keep the interrupt disabled if
there is no handler defined for that interrupt source. The patch adds
exactly this. I dont remember why this was needed in the first place (irq
probing? avoidance of interrupt storms?), but with the patch the behavior
should be equivalent.

	Ingo

--- linux/arch/i386/kernel/irq.c.orig2	Sat Jul 27 20:28:05 2002
+++ linux/arch/i386/kernel/irq.c	Sat Jul 27 20:31:58 2002
@@ -187,10 +187,6 @@
 #if CONFIG_SMP
 inline void synchronize_irq(unsigned int irq)
 {
-	/* is there anything to synchronize with? */
-	if (!irq_desc[irq].action)
-		return;
-
 	while (irq_desc[irq].status & IRQ_INPROGRESS)
 		cpu_relax();
 }
@@ -350,7 +346,7 @@
 	 * use the action we have.
 	 */
 	action = NULL;
-	if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
+	if (likely(!(status & (IRQ_DISABLED | IRQ_INPROGRESS)))) {
 		action = desc->action;
 		status &= ~IRQ_PENDING; /* we commit to handling */
 		status |= IRQ_INPROGRESS; /* we are handling it */
@@ -363,7 +359,7 @@
 	   a different instance of this same irq, the other processor
 	   will take care of it.
 	 */
-	if (!action)
+	if (unlikely(!action))
 		goto out;
 
 	/*
@@ -381,12 +377,12 @@
 		handle_IRQ_event(irq, &regs, action);
 		spin_lock(&desc->lock);
 		
-		if (!(desc->status & IRQ_PENDING))
+		if (likely(!(desc->status & IRQ_PENDING)))
 			break;
 		desc->status &= ~IRQ_PENDING;
 	}
-	desc->status &= ~IRQ_INPROGRESS;
 out:
+	desc->status &= ~IRQ_INPROGRESS;
 	/*
 	 * The ->end() handler has to deal with interrupts which got
 	 * disabled while the handler was running.
--- linux/arch/i386/kernel/i8259.c.orig2	Sat Jul 27 20:40:11 2002
+++ linux/arch/i386/kernel/i8259.c	Sat Jul 27 20:42:44 2002
@@ -38,7 +38,8 @@
 
 static void end_8259A_irq (unsigned int irq)
 {
-	if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)))
+	if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)) &&
+							irq_desc[irq].action)
 		enable_8259A_irq(irq);
 }
 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] Re: Serial Oopsen caused by global IRQ chanes
  2002-07-27 18:43 ` [patch] " Ingo Molnar
@ 2002-07-27 22:36   ` William Lee Irwin III
  2002-07-27 23:01     ` William Lee Irwin III
  0 siblings, 1 reply; 7+ messages in thread
From: William Lee Irwin III @ 2002-07-27 22:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Russell King, linux-kernel, Linus Torvalds

On Sat, Jul 27, 2002 at 08:43:04PM +0200, Ingo Molnar wrote:
> the attached patch fixes a synchronize_irq() bug: if the interrupt is 
> freed while an IRQ handler is running (irq state is IRQ_INPROGRESS) then 
> synchronize_irq() will return early, which is incorrect.
> there was another do_IRQ() bug that in fact necessiated the bad code that
> caused the synchronize_irq() bug - we kept the IRQ_INPROGRESS bit set for
> not active interrupt sources - after they happen for the first time. Now
> the only effect this has is on i8259A irq handling - we used to keep these
> irqs disabled after the first 'spurious' interrupt happened.  Now what the
> i8259A code really wants to do IMO is to keep the interrupt disabled if
> there is no handler defined for that interrupt source. The patch adds
> exactly this. I dont remember why this was needed in the first place (irq
> probing? avoidance of interrupt storms?), but with the patch the behavior
> should be equivalent.

I'm having trouble with this one, I seem to get lots of these messages:

pu: 12, clocks: 99983, slice: 3029
CPU12<T0:99968,T1:60576,D:15,S:3029,C:99983>
CPU 12 IS NOW UP!
Bringing up 13
cpu: 13, clocks: 99983, slice: 3029
CPU13<T0:99968,T1:57552,D:10,S:3029,C:99983>
CPU 13 IS NOW UP!
Bringing up 14
cpu: 14, clocks: 99983, slice: 3029
CPU14<T0:99968,T1:54528,D:5,S:3029,C:99983>
CPU 14 IS NOW UP!
Bringing up 15
cpu: 15, clocks: 99983, slice: 3029
CPU15<T0:99968,T1:51504,D:0,S:3029,C:99983>
CPU 15 IS NOW UP!
CPUS done 4294967295
Linux NET4.0 for Linux 2.4

... and then the kernel deadlocks after free_initmem()'s printk().


Cheers,
Bill

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] Re: Serial Oopsen caused by global IRQ chanes
  2002-07-27 22:36   ` William Lee Irwin III
@ 2002-07-27 23:01     ` William Lee Irwin III
  2002-07-27 23:27       ` William Lee Irwin III
  0 siblings, 1 reply; 7+ messages in thread
From: William Lee Irwin III @ 2002-07-27 23:01 UTC (permalink / raw)
  To: Ingo Molnar, Russell King, linux-kernel, Linus Torvalds

On Sat, Jul 27, 2002 at 08:43:04PM +0200, Ingo Molnar wrote:
>> the attached patch fixes a synchronize_irq() bug: if the interrupt is 
>> freed while an IRQ handler is running (irq state is IRQ_INPROGRESS) then 

On Sat, Jul 27, 2002 at 03:36:17PM -0700, William Lee Irwin III wrote:
> I'm having trouble with this one, I seem to get lots of these messages:
> pu: 12, clocks: 99983, slice: 3029
> CPU12<T0:99968,T1:60576,D:15,S:3029,C:99983>
> CPU 12 IS NOW UP!

Sorry, this is the hotplug stuff which surprised me, but is unrelated
to any deadlock (unless it's printk'ing at a bad time, which I doubt).
Thought it was an arch complaint about trying to wake already-woken cpus.

On Sat, Jul 27, 2002 at 03:36:17PM -0700, William Lee Irwin III wrote:
> ... and then the kernel deadlocks after free_initmem()'s printk().

Seems to be timing related, printk's in init/main.c made it go away.
ISTR hearing something about hotplug merge troubles, I'll investigate
that thread and report results with the fixes posted there.

Cheers,
Bill

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [patch] Re: Serial Oopsen caused by global IRQ chanes
  2002-07-27 23:01     ` William Lee Irwin III
@ 2002-07-27 23:27       ` William Lee Irwin III
  0 siblings, 0 replies; 7+ messages in thread
From: William Lee Irwin III @ 2002-07-27 23:27 UTC (permalink / raw)
  To: Ingo Molnar, Russell King, linux-kernel, Linus Torvalds

On Sat, Jul 27, 2002 at 08:43:04PM +0200, Ingo Molnar wrote:
>>> the attached patch fixes a synchronize_irq() bug: if the interrupt is 
>>> freed while an IRQ handler is running (irq state is IRQ_INPROGRESS) then 

In combination with Rusty's hotplug fixes, and your additional diff
atop that, this successfully allows my 16-way to boot.

Let me know if there's anything else you'd like me to test.


Thanks,
Bill

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2002-07-27 23:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-07-27 18:11 Serial Oopsen caused by global IRQ chanes Russell King
2002-07-27 18:19 ` Russell King
2002-07-27 18:21   ` Ingo Molnar
2002-07-27 18:43 ` [patch] " Ingo Molnar
2002-07-27 22:36   ` William Lee Irwin III
2002-07-27 23:01     ` William Lee Irwin III
2002-07-27 23:27       ` William Lee Irwin III

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox