All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
@ 2013-12-23 19:25 Austin Schuh
  2014-01-06 13:32 ` Oliver Hartkopp
  0 siblings, 1 reply; 15+ messages in thread
From: Austin Schuh @ 2013-12-23 19:25 UTC (permalink / raw)
  To: Thomas Gleixner, Wolfgang Grandegger, Pavel Pisa,
	Marc Kleine-Budde, Oliver Hartkopp, linux-can

Hi Thomas,

Did anything happen with your patch to note_interrupt, originally
posted on May 8th of 2013?  (https://lkml.org/lkml/2013/3/7/222)

I am seeing an issue on a machine right now running a
config-preempt-rt kernel and a SJA1000 CAN card from PEAK.  It works
for ~1 day, and then proceeds to die with a "Disabling IRQ #18"
message.  I posted on the Linux CAN mailing list, and Oliver Hartkopp
was able to reproduce the issue only on a realtime kernel.  A function
trace ending when the IRQ was disabled shows that note_interrupt is
being called regularly from the IRQ handler threads, and one of the
threads is doing work (and therefore calling note_interrupt with
IRQ_HANDLED).

Oliver Hartkopp and I ran tests over the weekend on numerous machines
and verified that the patch that you proposed fixes the problem.  We
think that the race condition that Till reported is causing the
problem here.

In reply to the comment about using the upper bit of
threads_handled_last for holding the SPURIOUS_DEFERRED flag, while
that may still be an over-optimization, the code should still work.
All comparisons are done with the bit set, which just makes it a 31
bit counter.  It will take 8 more days for the counter to overflow on
my machine, so I won't know for certain until then.

My only concern is that there may still be a small race condition with
this new code.  If the interrupt handler thread is running at a
realtime priority, but lower than another task, it may not get run
until a large number of IRQs get triggered, and then process them
quickly.  With your new handler code, this would be counted as one
single handled interrupt.  With the current constants, this is only a
problem if more than 1000 calls to the handler happen between IRQs.  I
starved my card's irq threads by running 4 tasks at a higher realtime
priority than the handler threads, and saw the number of unhandled
IRQs jump from 1/100000 to 3/100000, so that problem may not show up
in practice.

Austin Schuh

Tested-by: Austin Schuh <austin@peloton-tech.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread
* [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs
@ 2013-03-07 13:53 Thomas Gleixner
  2013-03-08 14:47 ` Till Straumann
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2013-03-07 13:53 UTC (permalink / raw)
  To: LKML; +Cc: Till Straumann

Till reported that he spurious interrupt detection of threaded
interrupts is broken in two ways:

- note_interrupt() is called for each action thread of a shared
  interrupt line. That's wrong as we are only interested whether none
  of the device drivers felt responsible for the interrupt, but by
  calling multiple times for a single interrupt line we account
  IRQ_NONE even if one of the drivers felt responsible.

- note_interrupt() when called from the thread handler is not
  serialized. That leaves the members of irq_desc which are used for
  the spurious detection unprotected.

To solve this we need to defer the spurious detection of a threaded
interrupt to the next hardware interrupt context where we have
implicit serialization.

If note_interrupt is called with action_ret == IRQ_WAKE_THREAD, we
check whether the previous interrupt requested a deferred check. If
not, we request a deferred check for the next hardware interrupt and
return. 

If set, we check whether one of the interrupt threads signaled
success. Depending on this information we feed the result into the
spurious detector.

If one primary handler of a shared interrupt returns IRQ_HANDLED we
disable the deferred check of irq threads on the same line, as we have
found at least one device driver who cared.

Reported-by: Till Straumann <strauman@slac.stanford.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/irqdesc.h |    4 +++
 kernel/irq/internals.h  |    4 +++
 kernel/irq/manage.c     |    4 +--
 kernel/irq/spurious.c   |   57 ++++++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 63 insertions(+), 6 deletions(-)

Index: linux-2.6/include/linux/irqdesc.h
===================================================================
--- linux-2.6.orig/include/linux/irqdesc.h
+++ linux-2.6/include/linux/irqdesc.h
@@ -27,6 +27,8 @@ struct irq_desc;
  * @irq_count:		stats field to detect stalled irqs
  * @last_unhandled:	aging timer for unhandled count
  * @irqs_unhandled:	stats field for spurious unhandled interrupts
+ * @threads_handled:	stats field for deferred spurious detection of threaded handlers
+ * @threads_handled_last: comparator field for deferred spurious detection of theraded handlers
  * @lock:		locking for SMP
  * @affinity_hint:	hint to user space for preferred irq affinity
  * @affinity_notify:	context for notification of affinity changes
@@ -52,6 +54,8 @@ struct irq_desc {
 	unsigned int		irq_count;	/* For detecting broken IRQs */
 	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
 	unsigned int		irqs_unhandled;
+	atomic_t		threads_handled;
+	int			threads_handled_last;
 	raw_spinlock_t		lock;
 	struct cpumask		*percpu_enabled;
 #ifdef CONFIG_SMP
Index: linux-2.6/kernel/irq/internals.h
===================================================================
--- linux-2.6.orig/kernel/irq/internals.h
+++ linux-2.6/kernel/irq/internals.h
@@ -55,6 +55,10 @@ enum {
 	IRQS_SUSPENDED		= 0x00000800,
 };
 
+/* Bits for spurious detection of threaded handlers */
+#define SPURIOUS_MASK		0x7fffffff
+#define SPURIOUS_DEFERRED	0x80000000
+
 #include "debug.h"
 #include "settings.h"
 
Index: linux-2.6/kernel/irq/manage.c
===================================================================
--- linux-2.6.orig/kernel/irq/manage.c
+++ linux-2.6/kernel/irq/manage.c
@@ -867,8 +867,8 @@ static int irq_thread(void *data)
 		irq_thread_check_affinity(desc, action);
 
 		action_ret = handler_fn(desc, action);
-		if (!noirqdebug)
-			note_interrupt(action->irq, desc, action_ret);
+		if (!noirqdebug && action_ret == IRQ_HANDLED)
+			atomic_inc(&desc->threads_handled);
 
 		wake_threads_waitq(desc);
 	}
Index: linux-2.6/kernel/irq/spurious.c
===================================================================
--- linux-2.6.orig/kernel/irq/spurious.c
+++ linux-2.6/kernel/irq/spurious.c
@@ -271,15 +271,64 @@ void note_interrupt(unsigned int irq, st
 	if (desc->istate & IRQS_POLL_INPROGRESS)
 		return;
 
-	/* we get here again via the threaded handler */
-	if (action_ret == IRQ_WAKE_THREAD)
-		return;
-
 	if (bad_action_ret(action_ret)) {
 		report_bad_irq(irq, desc, action_ret);
 		return;
 	}
 
+	/*
+	 * We cannot call note_interrupt from the threaded handler. So
+	 * the threaded handlers store whether they sucessfully
+	 * handled an interrupt. Here is the right place to take care
+	 * of this.
+	 */
+	if (action_ret & IRQ_WAKE_THREAD) {
+		/*
+		 * There is a thread woken. Check whether one of the
+		 * shared primary handlers returned IRQ_HANDLED. If
+		 * not we defer the spurious detection to the next
+		 * interrupt.
+		 */
+		if (action_ret == IRQ_WAKE_THREAD) {
+			int handled;
+
+			/*
+			 * We use bit 31 of thread_handled_last to
+			 * note the deferred spurious detection
+			 * active. No locking necessary as
+			 * thread_handled_last is only accessed here
+			 * and we have the guarantee that hard
+			 * interrupts are not reentrant.
+			 */
+			if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) {
+				desc->threads_handled_last |= SPURIOUS_DEFERRED;
+				return;
+			}
+			/*
+			 * Check whether one of the threaded handlers
+			 * returned IRQ_HANDLED since the last
+			 * interrupt happened. Or the spurious
+			 * deferred flag for ease of comparison and
+			 * preservation in the writeback.
+			 */
+			handled = atomic_read(&desc->threads_handled);
+			handled |= SPURIOUS_DEFERRED;
+			if (handled != desc->threads_handled_last) {
+				action_ret = IRQ_HANDLED;
+				desc->threads_handled_last = handled;
+			} else {
+				action_ret = IRQ_NONE;
+			}
+		} else {
+			/*
+			 * One of the primary handlers returned
+			 * IRQ_HANDLED. So we don't care about the
+			 * threaded handlers on the same line.
+			 */
+			desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
+		}
+	}
+
 	if (unlikely(action_ret == IRQ_NONE)) {
 		/*
 		 * If we are seeing only the odd spurious IRQ caused by

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-04-28 20:43 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-12-23 19:25 [PATCH] genirq: Sanitize spurious interrupt detection of threaded irqs Austin Schuh
2014-01-06 13:32 ` Oliver Hartkopp
2014-04-07 18:38   ` Austin Schuh
2014-04-07 18:41     ` Thomas Gleixner
2014-04-07 20:05       ` Austin Schuh
2014-04-07 20:07         ` Thomas Gleixner
2014-04-07 20:08           ` Austin Schuh
2014-04-28 20:20             ` Austin Schuh
2014-04-28 20:44               ` Thomas Gleixner
  -- strict thread matches above, loose matches on Subject: below --
2013-03-07 13:53 Thomas Gleixner
2013-03-08 14:47 ` Till Straumann
2013-03-08 16:12   ` Thomas Gleixner
2013-03-08 17:19     ` Till Straumann
2013-03-08 19:41       ` Thomas Gleixner
2013-03-12 13:22         ` Till Straumann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.