[RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
@ 2010-03-08 23:57 Lars-Peter Clausen
  2010-03-09  7:58 ` Thomas Gleixner
  2010-03-09 23:22 ` Thomas Gleixner
  0 siblings, 2 replies; 13+ messages in thread
From: Lars-Peter Clausen @ 2010-03-08 23:57 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Ingo Molnar, linux-kernel, Lars-Peter Clausen

If the kernel has been compiled with preemtion support and handle_level_irq is
called from process context for a oneshot irq there is a race between
irq_finalize_oneshot and handle_level_irq which results in the irq not being
unmasked after its handlers have been run.

irq_finalize_oneshot is expected to unmask the irq after the threaded irq
handler has been run. It only does so if IRQ_MASKED is set for the irqs status.
IRQ_MASKED gets set in the lower part of handle_level_irq after handle_IRQ_event
has been called.
handle_IRQ_event will wakeup the oneshot irqs threaded handler and if the
kernel has been build with preemption there is a chance that the threaded irq
handler will finish before execution is returned to handle_level_irq.
As a result irq_finalize_oneshot will not unmask the irq and handle_level_irq
will set the IRQ_MASKED flag. Thus the irq will stay masked and stalls.

In case of an race the call-graph would look like this:
 handle_level_irq
 |- mask_ack_irq
 |- handle_IRQ_event
    |- wake_up_process
       |- irq_thread
          |- action->thread_fn
          |- irq_finalize_oneshot # Does not unmask the irq
|- # Set IRQ_MASKED status flag

This patch fixes the race by adding an additional irq status flag which indicates
whether the oneshot threaded handler is still running. And if it's not running
anymore handle_level_irq is going to unmask the irq as for non oneshot irqs.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
---
 include/linux/irq.h |    1 +
 kernel/irq/chip.c   |    5 +++--
 kernel/irq/manage.c |    1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 707ab12..48ff905 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -71,6 +71,7 @@ typedef	void (*irq_flow_handler_t)(unsigned int irq,
 #define IRQ_SUSPENDED		0x04000000	/* IRQ has gone through suspend sequence */
 #define IRQ_ONESHOT		0x08000000	/* IRQ is not unmasked after hardirq */
 #define IRQ_NESTED_THREAD	0x10000000	/* IRQ is nested into another, no own handler thread */
+#define IRQ_ONESHOT_INPROGRESS	0x20000000	/* IRQ onshot handler is running */
 
 #ifdef CONFIG_IRQ_PER_CPU
 # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 42ec11b..ea5c398 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -474,7 +474,7 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
 	if (unlikely(!action || (desc->status & IRQ_DISABLED)))
 		goto out_unlock;
 
-	desc->status |= IRQ_INPROGRESS;
+	desc->status |= IRQ_INPROGRESS | IRQ_ONESHOT_INPROGRESS;
 	raw_spin_unlock(&desc->lock);
 
 	action_ret = handle_IRQ_event(irq, action);
@@ -484,7 +484,8 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
 	raw_spin_lock(&desc->lock);
 	desc->status &= ~IRQ_INPROGRESS;
 
-	if (unlikely(desc->status & IRQ_ONESHOT))
+	if (unlikely((desc->status & (IRQ_ONESHOT | IRQ_ONESHOT_INPROGRESS)) ==
+			IRQ_ONESHOT | IRQ_ONESHOT_INPROGRESS))
 		desc->status |= IRQ_MASKED;
 	else if (!(desc->status & IRQ_DISABLED) && desc->chip->unmask)
 		desc->chip->unmask(irq);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index eb6078c..cfff5e5 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -489,6 +489,7 @@ static void irq_finalize_oneshot(unsigned int irq, struct irq_desc *desc)
 		desc->status &= ~IRQ_MASKED;
 		desc->chip->unmask(irq);
 	}
+	desc->status &= ~IRQ_ONESHOT_INPROGRESS;
 	raw_spin_unlock_irq(&desc->lock);
 	chip_bus_sync_unlock(irq, desc);
 }
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-08 23:57 [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq Lars-Peter Clausen
@ 2010-03-09  7:58 ` Thomas Gleixner
  2010-03-09  8:08   ` Yong Zhang
  2010-03-09 16:59   ` Valdis.Kletnieks
  2010-03-09 23:22 ` Thomas Gleixner
  1 sibling, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2010-03-09  7:58 UTC (permalink / raw)
  To: Lars-Peter Clausen; +Cc: Ingo Molnar, linux-kernel

On Tue, 9 Mar 2010, Lars-Peter Clausen wrote:

> If the kernel has been compiled with preemtion support and handle_level_irq is
> called from process context for a oneshot irq there is a race between
> irq_finalize_oneshot and handle_level_irq which results in the irq not being
> unmasked after its handlers have been run.
> 
> irq_finalize_oneshot is expected to unmask the irq after the threaded irq
> handler has been run. It only does so if IRQ_MASKED is set for the irqs status.
> IRQ_MASKED gets set in the lower part of handle_level_irq after handle_IRQ_event
> has been called.
> handle_IRQ_event will wakeup the oneshot irqs threaded handler and if the
> kernel has been build with preemption there is a chance that the threaded irq
> handler will finish before execution is returned to handle_level_irq.
> As a result irq_finalize_oneshot will not unmask the irq and handle_level_irq
> will set the IRQ_MASKED flag. Thus the irq will stay masked and stalls.
> 
> In case of an race the call-graph would look like this:
>  handle_level_irq
>  |- mask_ack_irq
>  |- handle_IRQ_event
>     |- wake_up_process
>        |- irq_thread
>           |- action->thread_fn
>           |- irq_finalize_oneshot # Does not unmask the irq
> |- # Set IRQ_MASKED status flag

Errm, a thread _CANNOT_ preempt a hard interrupt handler.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-09  7:58 ` Thomas Gleixner
@ 2010-03-09  8:08   ` Yong Zhang
  2010-03-09 16:59   ` Valdis.Kletnieks
  1 sibling, 0 replies; 13+ messages in thread
From: Yong Zhang @ 2010-03-09  8:08 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Lars-Peter Clausen, Ingo Molnar, linux-kernel

On Tue, Mar 09, 2010 at 08:58:11AM +0100, Thomas Gleixner wrote:
> On Tue, 9 Mar 2010, Lars-Peter Clausen wrote:
> 
> > If the kernel has been compiled with preemtion support and handle_level_irq is
> > called from process context for a oneshot irq there is a race between
> > irq_finalize_oneshot and handle_level_irq which results in the irq not being
> > unmasked after its handlers have been run.
> > 
> > irq_finalize_oneshot is expected to unmask the irq after the threaded irq
> > handler has been run. It only does so if IRQ_MASKED is set for the irqs status.
> > IRQ_MASKED gets set in the lower part of handle_level_irq after handle_IRQ_event
> > has been called.
> > handle_IRQ_event will wakeup the oneshot irqs threaded handler and if the
> > kernel has been build with preemption there is a chance that the threaded irq
> > handler will finish before execution is returned to handle_level_irq.
> > As a result irq_finalize_oneshot will not unmask the irq and handle_level_irq
> > will set the IRQ_MASKED flag. Thus the irq will stay masked and stalls.
> > 
> > In case of an race the call-graph would look like this:
> >  handle_level_irq
> >  |- mask_ack_irq
> >  |- handle_IRQ_event
> >     |- wake_up_process
> >        |- irq_thread
> >           |- action->thread_fn
> >           |- irq_finalize_oneshot # Does not unmask the irq
> > |- # Set IRQ_MASKED status flag
> 
> Errm, a thread _CANNOT_ preempt a hard interrupt handler.

But on SMP, an irq thread could run on another cpu, right?

Thanks,
Yong

> 
> Thanks,
> 
> 	tglx
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-09  7:58 ` Thomas Gleixner
  2010-03-09  8:08   ` Yong Zhang
@ 2010-03-09 16:59   ` Valdis.Kletnieks
  2010-03-09 18:10     ` Thomas Gleixner
  1 sibling, 1 reply; 13+ messages in thread
From: Valdis.Kletnieks @ 2010-03-09 16:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Lars-Peter Clausen, Ingo Molnar, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1562 bytes --]

On Tue, 09 Mar 2010 08:58:11 +0100, Thomas Gleixner said:
> On Tue, 9 Mar 2010, Lars-Peter Clausen wrote:
> 
> > If the kernel has been compiled with preemtion support and handle_level_irq is
> > called from process context for a oneshot irq there is a race between
> > irq_finalize_oneshot and handle_level_irq which results in the irq not being
> > unmasked after its handlers have been run.
> > 
> > irq_finalize_oneshot is expected to unmask the irq after the threaded irq
> > handler has been run. It only does so if IRQ_MASKED is set for the irqs status.
> > IRQ_MASKED gets set in the lower part of handle_level_irq after handle_IRQ_event
> > has been called.
> > handle_IRQ_event will wakeup the oneshot irqs threaded handler and if the
> > kernel has been build with preemption there is a chance that the threaded irq
> > handler will finish before execution is returned to handle_level_irq.
> > As a result irq_finalize_oneshot will not unmask the irq and handle_level_irq
> > will set the IRQ_MASKED flag. Thus the irq will stay masked and stalls.
> > 
> > In case of an race the call-graph would look like this:
> >  handle_level_irq
> >  |- mask_ack_irq
> >  |- handle_IRQ_event
> >     |- wake_up_process
> >        |- irq_thread
> >           |- action->thread_fn
> >           |- irq_finalize_oneshot # Does not unmask the irq
> > |- # Set IRQ_MASKED status flag
> 
> Errm, a thread _CANNOT_ preempt a hard interrupt handler.

What stops the thread from concurrently running on another CPU and racing
that way?  I'm an idiot, use small words. :)

[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-09 16:59   ` Valdis.Kletnieks
@ 2010-03-09 18:10     ` Thomas Gleixner
  2010-03-09 22:48       ` Lars-Peter Clausen
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2010-03-09 18:10 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Lars-Peter Clausen, Ingo Molnar, linux-kernel

On Tue, 9 Mar 2010, Valdis.Kletnieks@vt.edu wrote:
> On Tue, 09 Mar 2010 08:58:11 +0100, Thomas Gleixner said:
> > On Tue, 9 Mar 2010, Lars-Peter Clausen wrote:
> > 
> > > If the kernel has been compiled with preemtion support and handle_level_irq is
> > > called from process context for a oneshot irq there is a race between
> > > irq_finalize_oneshot and handle_level_irq which results in the irq not being
> > > unmasked after its handlers have been run.
> > > 
> > > irq_finalize_oneshot is expected to unmask the irq after the threaded irq
> > > handler has been run. It only does so if IRQ_MASKED is set for the irqs status.
> > > IRQ_MASKED gets set in the lower part of handle_level_irq after handle_IRQ_event
> > > has been called.
> > > handle_IRQ_event will wakeup the oneshot irqs threaded handler and if the
> > > kernel has been build with preemption there is a chance that the threaded irq
> > > handler will finish before execution is returned to handle_level_irq.
> > > As a result irq_finalize_oneshot will not unmask the irq and handle_level_irq
> > > will set the IRQ_MASKED flag. Thus the irq will stay masked and stalls.
> > > 
> > > In case of an race the call-graph would look like this:
> > >  handle_level_irq
> > >  |- mask_ack_irq
> > >  |- handle_IRQ_event
> > >     |- wake_up_process
> > >        |- irq_thread
> > >           |- action->thread_fn
> > >           |- irq_finalize_oneshot # Does not unmask the irq
> > > |- # Set IRQ_MASKED status flag
> > 
> > Errm, a thread _CANNOT_ preempt a hard interrupt handler.
> 
> What stops the thread from concurrently running on another CPU and racing
> that way?  I'm an idiot, use small words. :)

Right it's a valid SMP problem, but I got confused by the lengthy
explanation of a thread preempting an hard interrupt handler. :)

Will have a look after dinner.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-09 18:10     ` Thomas Gleixner
@ 2010-03-09 22:48       ` Lars-Peter Clausen
  2010-03-09 23:32         ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Lars-Peter Clausen @ 2010-03-09 22:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Valdis.Kletnieks, Ingo Molnar, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Thomas Gleixner wrote:
> On Tue, 9 Mar 2010, Valdis.Kletnieks@vt.edu wrote:
>> On Tue, 09 Mar 2010 08:58:11 +0100, Thomas Gleixner said:
>>> On Tue, 9 Mar 2010, Lars-Peter Clausen wrote:
>>>
>>>> If the kernel has been compiled with preemtion support and
handle_level_irq is
>>>> called from process context for a oneshot irq there is a race between
>>>> irq_finalize_oneshot and handle_level_irq which results in the irq
not being
>>>> unmasked after its handlers have been run.
>>>>
>>>> irq_finalize_oneshot is expected to unmask the irq after the
threaded irq
>>>> handler has been run. It only does so if IRQ_MASKED is set for the
irqs status.
>>>> IRQ_MASKED gets set in the lower part of handle_level_irq after
handle_IRQ_event
>>>> has been called.
>>>> handle_IRQ_event will wakeup the oneshot irqs threaded handler and
if the
>>>> kernel has been build with preemption there is a chance that the
threaded irq
>>>> handler will finish before execution is returned to handle_level_irq.
>>>> As a result irq_finalize_oneshot will not unmask the irq and
handle_level_irq
>>>> will set the IRQ_MASKED flag. Thus the irq will stay masked and stalls.
>>>>
>>>> In case of an race the call-graph would look like this:
>>>>  handle_level_irq
>>>>  |- mask_ack_irq
>>>>  |- handle_IRQ_event
>>>>     |- wake_up_process
>>>>        |- irq_thread
>>>>           |- action->thread_fn
>>>>           |- irq_finalize_oneshot # Does not unmask the irq
>>>> |- # Set IRQ_MASKED status flag
>>> Errm, a thread _CANNOT_ preempt a hard interrupt handler.
>> What stops the thread from concurrently running on another CPU and racing
>> that way?  I'm an idiot, use small words. :)
>
> Right it's a valid SMP problem, but I got confused by the lengthy
> explanation of a thread preempting an hard interrupt handler. :)
Yes, sorry for bothering you with that. I actually had the symptoms
described on a non SMP system. The irq in question was part of a
longer irq chain and I though that it might be detached from the
hardirq context somewhere along the chain.
After your comment I reread some parts and it turned out that
irq_enter()/irq_exit() was missing around the first level irq handler.

- - Lars
>
> Will have a look after dinner.
>
> Thanks,
>
>     tglx
>

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkuW0FUACgkQBX4mSR26RiOiSgCfUJJM6i8rEoiifuY5LwoOaJnA
kfAAnjHh5us4m8NgPEV2wYkyIPh7zzly
=R/J/
-----END PGP SIGNATURE-----


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-08 23:57 [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq Lars-Peter Clausen
  2010-03-09  7:58 ` Thomas Gleixner
@ 2010-03-09 23:22 ` Thomas Gleixner
  2010-03-10  3:21   ` Yong Zhang
  1 sibling, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2010-03-09 23:22 UTC (permalink / raw)
  To: Lars-Peter Clausen; +Cc: Ingo Molnar, linux-kernel

B1;2005;0cOn Tue, 9 Mar 2010, Lars-Peter Clausen wrote:
>  
> -	desc->status |= IRQ_INPROGRESS;
> +	desc->status |= IRQ_INPROGRESS | IRQ_ONESHOT_INPROGRESS;
>  	raw_spin_unlock(&desc->lock);

That keeps the IRQ_ONESHOT_INPROGRESS dangling for non ONESHOT
interrupts. Not a big deal, but not pretty either.
  
The race between the thread and the irq handler exists indeed on SMP,
but I think there are more fundamental issues about the state which
need to be addressed.

The first thing is that we do not mark the status MASKED when we
actually mask the interrupt in mask_ack_irq(). 

That conditional MASKED after running the primary handler is really
horrible - I already ranted in private at the moron who committed that
crime :)

So the following patch fixes that and the SMP race scenario:

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index d70394f..3e53334 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -359,6 +359,7 @@ static inline void mask_ack_irq(struct irq_desc *desc, int irq)
 		if (desc->chip->ack)
 			desc->chip->ack(irq);
 	}
+	desc->status |= IRQ_MASKED;
 }
 
 /*
@@ -484,10 +485,11 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
 	raw_spin_lock(&desc->lock);
 	desc->status &= ~IRQ_INPROGRESS;
 
-	if (unlikely(desc->status & IRQ_ONESHOT))
-		desc->status |= IRQ_MASKED;
-	else if (!(desc->status & IRQ_DISABLED) && desc->chip->unmask)
+	if (!(desc->status & (IRQ_DISABLED | IRQ_ONESHOT)) &&
+	    desc->chip->unmask) {
 		desc->chip->unmask(irq);
+		desc->status &= ~IRQ_MASKED;
+	}
 out_unlock:
 	raw_spin_unlock(&desc->lock);
 }

But that change opens up another race which is similar to the one you
pointed out:

 CPU0                            CPU1
 hande_level_irq(irq X)
   mask_ack_irq(irq X)
   handle_IRQ_event(irq X)
     wake_up(thread_handler)
                                 thread handler(irq X) runs
     interrupt irqY
       handle_*_irq(irq Y)
       .....
                                 finalize_oneshot(irq X)
                                   unmask(irq X)
     interrupt irq X
       handle_level_irq(irq X)
       mask_ack_irq(irq X)
     return from irq X due to IRQ_INPROGRESS

   return from irq Y

 return from irq X w/o unmask due to IRQ_ONESHOT

That's pretty unlikely, but possible. I know that your patch would
mitigate that problem, but I'm not happy about making it go away with
non obvious state magic. I think I know how to fix it, but I need some
sleep now.

Thanks,

	tglx



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-09 22:48       ` Lars-Peter Clausen
@ 2010-03-09 23:32         ` Thomas Gleixner
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2010-03-09 23:32 UTC (permalink / raw)
  To: Lars-Peter Clausen; +Cc: Valdis.Kletnieks, Ingo Molnar, linux-kernel

Lars-Peter,

On Tue, 9 Mar 2010, Lars-Peter Clausen wrote:
> > Right it's a valid SMP problem, but I got confused by the lengthy
> > explanation of a thread preempting an hard interrupt handler. :)
>
> Yes, sorry for bothering you with that. I actually had the symptoms
> described on a non SMP system. The irq in question was part of a
> longer irq chain and I though that it might be detached from the
> hardirq context somewhere along the chain.
> After your comment I reread some parts and it turned out that
> irq_enter()/irq_exit() was missing around the first level irq handler.

I'm happy you stared at that code even if you came to the wrong
conclusion vs. the problem that triggered your patch. You pointed out
a real bug in the code, which did not yet trigger because UP is not
affected and usually irqbalance pins irqs and also the irq threads to
a single cpu.

Thanks,

	tglx



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-09 23:22 ` Thomas Gleixner
@ 2010-03-10  3:21   ` Yong Zhang
  2010-03-10  7:56     ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Yong Zhang @ 2010-03-10  3:21 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Lars-Peter Clausen, Ingo Molnar, linux-kernel

On Wed, Mar 10, 2010 at 12:22:12AM +0100, Thomas Gleixner wrote:
> B1;2005;0cOn Tue, 9 Mar 2010, Lars-Peter Clausen wrote:
> >  
> > -	desc->status |= IRQ_INPROGRESS;
> > +	desc->status |= IRQ_INPROGRESS | IRQ_ONESHOT_INPROGRESS;
> >  	raw_spin_unlock(&desc->lock);
> 
> That keeps the IRQ_ONESHOT_INPROGRESS dangling for non ONESHOT
> interrupts. Not a big deal, but not pretty either.
>   
> The race between the thread and the irq handler exists indeed on SMP,
> but I think there are more fundamental issues about the state which
> need to be addressed.
> 
> The first thing is that we do not mark the status MASKED when we
> actually mask the interrupt in mask_ack_irq(). 
> 
> That conditional MASKED after running the primary handler is really
> horrible - I already ranted in private at the moron who committed that
> crime :)
> 
> So the following patch fixes that and the SMP race scenario:

Hi Thomas,

How about the following patch(maybe a little ugly). I think it will
resolve your concerns.

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index d70394f..23b79c6 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -461,9 +461,24 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
 	raw_spin_lock(&desc->lock);
 	mask_ack_irq(desc, irq);
 
-	if (unlikely(desc->status & IRQ_INPROGRESS))
-		goto out_unlock;
+	/*
+	 * if we are in oneshot mode and the irq thread is running on
+	 * another cpu, just return because the irq thread will unmask
+	 * the irq
+	 */
+	if (unlikely(desc->status & IRQ_ONESHOT)) {
+		if (unlikely(desc->status & (IRQ_INPROGRESS | IRQ_MASKED)
+					  == IRQ_INPROGRESS | IRQ_MASKED))
+			goto out_unlock;
+	}
+	else {
+		if (unlikely(desc->status & IRQ_INPROGRESS))
+			goto out_unlock;
+	}
+
 	desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
+	if (unlikely(desc->status & IRQ_ONESHOT))
+		desc->status |= IRQ_MASKED;
 	kstat_incr_irqs_this_cpu(irq, desc);
 
 	/*
@@ -484,9 +499,7 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
 	raw_spin_lock(&desc->lock);
 	desc->status &= ~IRQ_INPROGRESS;
 
-	if (unlikely(desc->status & IRQ_ONESHOT))
-		desc->status |= IRQ_MASKED;
-	else if (!(desc->status & IRQ_DISABLED) && desc->chip->unmask)
+	if (!(desc->status & (IRQ_DISABLED | IRQ_MASKED) && desc->chip->unmask)
 		desc->chip->unmask(irq);
 out_unlock:
 	raw_spin_unlock(&desc->lock);

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-10  3:21   ` Yong Zhang
@ 2010-03-10  7:56     ` Thomas Gleixner
  2010-03-11  2:55       ` Yong Zhang
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2010-03-10  7:56 UTC (permalink / raw)
  To: Yong Zhang; +Cc: Lars-Peter Clausen, Ingo Molnar, linux-kernel

On Wed, 10 Mar 2010, Yong Zhang wrote:

> On Wed, Mar 10, 2010 at 12:22:12AM +0100, Thomas Gleixner wrote:
> > B1;2005;0cOn Tue, 9 Mar 2010, Lars-Peter Clausen wrote:
> > >  
> > > -	desc->status |= IRQ_INPROGRESS;
> > > +	desc->status |= IRQ_INPROGRESS | IRQ_ONESHOT_INPROGRESS;
> > >  	raw_spin_unlock(&desc->lock);
> > 
> > That keeps the IRQ_ONESHOT_INPROGRESS dangling for non ONESHOT
> > interrupts. Not a big deal, but not pretty either.
> >   
> > The race between the thread and the irq handler exists indeed on SMP,
> > but I think there are more fundamental issues about the state which
> > need to be addressed.
> > 
> > The first thing is that we do not mark the status MASKED when we
> > actually mask the interrupt in mask_ack_irq(). 
> > 
> > That conditional MASKED after running the primary handler is really
> > horrible - I already ranted in private at the moron who committed that
> > crime :)
> > 
> > So the following patch fixes that and the SMP race scenario:
> 
> Hi Thomas,
> 
> How about the following patch(maybe a little ugly). I think it will
> resolve your concerns.

No it does not, but you are right that it's ugly. And it is patently
wrong as well.
 
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index d70394f..23b79c6 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -461,9 +461,24 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
>  	raw_spin_lock(&desc->lock);
>  	mask_ack_irq(desc, irq);
>  
> -	if (unlikely(desc->status & IRQ_INPROGRESS))
> -		goto out_unlock;
> +	/*
> +	 * if we are in oneshot mode and the irq thread is running on
> +	 * another cpu, just return because the irq thread will unmask
> +	 * the irq
> +	 */
> +	if (unlikely(desc->status & IRQ_ONESHOT)) {
> +		if (unlikely(desc->status & (IRQ_INPROGRESS | IRQ_MASKED)
> +					  == IRQ_INPROGRESS | IRQ_MASKED))
> +			goto out_unlock;
> +	}
> +	else {
> +		if (unlikely(desc->status & IRQ_INPROGRESS))
> +			goto out_unlock;
> +	}

  In case of IRQ_SHOT and IRQ_INPROGRESS and the other CPU having
  unmasked the interrupt already you are reentering the handler which
  is a nono.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-10  7:56     ` Thomas Gleixner
@ 2010-03-11  2:55       ` Yong Zhang
  2010-03-11  8:41         ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Yong Zhang @ 2010-03-11  2:55 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Lars-Peter Clausen, Ingo Molnar, linux-kernel

On Wed, Mar 10, 2010 at 08:56:22AM +0100, Thomas Gleixner wrote:
> > How about the following patch(maybe a little ugly). I think it will
> > resolve your concerns.
> 
> No it does not, but you are right that it's ugly. And it is patently
> wrong as well.
>  
> > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > index d70394f..23b79c6 100644
> > --- a/kernel/irq/chip.c
> > +++ b/kernel/irq/chip.c
> > @@ -461,9 +461,24 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
> >  	raw_spin_lock(&desc->lock);
> >  	mask_ack_irq(desc, irq);

I must say this patch isn't based on your previous one in which
mask_ack_irq() is modified to flag IRQ_MASKED.
I let IRQ_MASKED serialise interrupt-handler and irq-thread in oneshot
mode. 

> >  
> > -	if (unlikely(desc->status & IRQ_INPROGRESS))
> > -		goto out_unlock;
> > +	/*
> > +	 * if we are in oneshot mode and the irq thread is running on
> > +	 * another cpu, just return because the irq thread will unmask
> > +	 * the irq
> > +	 */
> > +	if (unlikely(desc->status & IRQ_ONESHOT)) {
> > +		if (unlikely(desc->status & (IRQ_INPROGRESS | IRQ_MASKED)
> > +					  == IRQ_INPROGRESS | IRQ_MASKED))
> > +			goto out_unlock;
> > +	}
> > +	else {
> > +		if (unlikely(desc->status & IRQ_INPROGRESS))
> > +			goto out_unlock;
> > +	}
> 
>   In case of IRQ_SHOT and IRQ_INPROGRESS and the other CPU having
>   unmasked the interrupt already you are reentering the handler which
>   is a nono.

I have thought of that kind of reentering you point above,
if IRQ_MASKED is cleared by irq_finalize_oneshot(), the thread_fn()
is done as well as the hardware opration. If another irq comes in,
the reentering does happen, but I think it's harmless because at
this time we just set IRQTF_RUNTHREAD and IRQ_MASKED and wake
irq thread up and then the irq thread loops for another time.
So irq will not return unmask in case of oneshot.

If I miss something, correct me please.

Thanks,
Yong



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-11  2:55       ` Yong Zhang
@ 2010-03-11  8:41         ` Thomas Gleixner
  2010-03-11  9:13           ` Yong Zhang
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2010-03-11  8:41 UTC (permalink / raw)
  To: Yong Zhang; +Cc: Lars-Peter Clausen, Ingo Molnar, linux-kernel

On Thu, 11 Mar 2010, Yong Zhang wrote:

> On Wed, Mar 10, 2010 at 08:56:22AM +0100, Thomas Gleixner wrote:
> > > How about the following patch(maybe a little ugly). I think it will
> > > resolve your concerns.
> > 
> > No it does not, but you are right that it's ugly. And it is patently
> > wrong as well.
> >  
> > > diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> > > index d70394f..23b79c6 100644
> > > --- a/kernel/irq/chip.c
> > > +++ b/kernel/irq/chip.c
> > > @@ -461,9 +461,24 @@ handle_level_irq(unsigned int irq, struct irq_desc *desc)
> > >  	raw_spin_lock(&desc->lock);
> > >  	mask_ack_irq(desc, irq);
> 
> I must say this patch isn't based on your previous one in which
> mask_ack_irq() is modified to flag IRQ_MASKED.
>
> I let IRQ_MASKED serialise interrupt-handler and irq-thread in oneshot
> mode. 

And I said in a previous mail, that magically covering stuff with
inconsistent status information is the wrong way to fix it. 

The IRQ_MASKED state is inconsistent in the current code and that
needs to be fixed first.

And your patch does solve nothing at all. It merily violates the
semantics of non reentrant interrupt handlers.

> > >  
> > > -	if (unlikely(desc->status & IRQ_INPROGRESS))
> > > -		goto out_unlock;
> > > +	/*
> > > +	 * if we are in oneshot mode and the irq thread is running on
> > > +	 * another cpu, just return because the irq thread will unmask
> > > +	 * the irq
> > > +	 */
> > > +	if (unlikely(desc->status & IRQ_ONESHOT)) {
> > > +		if (unlikely(desc->status & (IRQ_INPROGRESS | IRQ_MASKED)
> > > +					  == IRQ_INPROGRESS | IRQ_MASKED))
> > > +			goto out_unlock;
> > > +	}
> > > +	else {
> > > +		if (unlikely(desc->status & IRQ_INPROGRESS))
> > > +			goto out_unlock;
> > > +	}
> > 
> >   In case of IRQ_SHOT and IRQ_INPROGRESS and the other CPU having
> >   unmasked the interrupt already you are reentering the handler which
> >   is a nono.
> 
> I have thought of that kind of reentering you point above,
> if IRQ_MASKED is cleared by irq_finalize_oneshot(), the thread_fn()
> is done as well as the hardware opration. If another irq comes in,
> the reentering does happen, but I think it's harmless because at
> this time we just set IRQTF_RUNTHREAD and IRQ_MASKED and wake
> irq thread up and then the irq thread loops for another time.
> So irq will not return unmask in case of oneshot.

No, it is _NOT_ harmless. interrupt handlers are guaranteed _NOT_ to
be reentered. That's why we check the INPROGRESS flag. 

And you totally miss the case which caused the disussion in the first
place: When the thread finishes _before_ the hard irq handler on the
other CPU has set IRQ_MASKED. So this patch solves nothing at all, it
just makes stuff worse than it was.

Here is the solution which solves the inconsistent lock state _AND_
the reentrancy race.

http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=0b1adaa031a55e44f5dd942f234bf09d28e8a0d6

The change log has a full explanation of the scenarios.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq
  2010-03-11  8:41         ` Thomas Gleixner
@ 2010-03-11  9:13           ` Yong Zhang
  0 siblings, 0 replies; 13+ messages in thread
From: Yong Zhang @ 2010-03-11  9:13 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Lars-Peter Clausen, Ingo Molnar, linux-kernel

On Thu, Mar 11, 2010 at 09:41:45AM +0100, Thomas Gleixner wrote:
> No, it is _NOT_ harmless. interrupt handlers are guaranteed _NOT_ to
> be reentered. That's why we check the INPROGRESS flag. 
> 
> And you totally miss the case which caused the disussion in the first
> place: When the thread finishes _before_ the hard irq handler on the
> other CPU has set IRQ_MASKED. So this patch solves nothing at all, it
> just makes stuff worse than it was.
> 
> Here is the solution which solves the inconsistent lock state _AND_
> the reentrancy race.
> 
> http://git.kernel.org/?p=linux/kernel/git/tip/linux-2.6-tip.git;a=commit;h=0b1adaa031a55e44f5dd942f234bf09d28e8a0d6
> 
> The change log has a full explanation of the scenarios.

Yeah, I see. Thanks for your point.
It will smooth the concerns.

Thanks,
Yong

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-03-11  9:14 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-08 23:57 [RFC][PATCH] IRQ: Fix oneshot irq race between irq_finalize_oneshot and handle_level_irq Lars-Peter Clausen
2010-03-09  7:58 ` Thomas Gleixner
2010-03-09  8:08   ` Yong Zhang
2010-03-09 16:59   ` Valdis.Kletnieks
2010-03-09 18:10     ` Thomas Gleixner
2010-03-09 22:48       ` Lars-Peter Clausen
2010-03-09 23:32         ` Thomas Gleixner
2010-03-09 23:22 ` Thomas Gleixner
2010-03-10  3:21   ` Yong Zhang
2010-03-10  7:56     ` Thomas Gleixner
2010-03-11  2:55       ` Yong Zhang
2010-03-11  8:41         ` Thomas Gleixner
2010-03-11  9:13           ` Yong Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).