* [PATCH] FIXUP: genirq: defuse spurious-irq timebomb
@ 2024-06-15  4:42 Pete Swain
  2024-07-07 18:39 ` Thomas Gleixner
From: Pete Swain @ 2024-06-15  4:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: tglx, Pete Swain, stable

The flapping-irq detector still has a timebomb.

A pathological workload, or test script,
can arm the spurious-irq timebomb described in
  4f27c00bf80f ("Improve behaviour of spurious IRQ detect")

This leads to irqs being moved to the much slower polled mode,
despite the actual unhandled-irq rate being well under the
99.9k/100k threshold that the code appears to check.
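
For reference, the detector's bookkeeping in note_interrupt() looks
roughly like this (paraphrased from kernel/irq/spurious.c; details
vary slightly between kernel versions):

  if (unlikely(action_ret == IRQ_NONE)) {
          /*
           * Tolerate the odd spurious irq: restart the count if
           * the last unhandled one was more than HZ/10 ago.
           */
          if (time_after(jiffies, desc->last_unhandled + HZ/10))
                  desc->irqs_unhandled = 1;
          else
                  desc->irqs_unhandled++;
          desc->last_unhandled = jiffies;
  }

  desc->irq_count++;
  if (likely(desc->irq_count < 100000))
          return;

  desc->irq_count = 0;
  if (unlikely(desc->irqs_unhandled > 99900)) {
          /* The interrupt is stuck: report it and switch to polling */
          __report_bad_irq(desc, action_ret);
          desc->istate |= IRQS_SPURIOUS_DISABLED;
          desc->depth++;
          irq_disable(desc);
  }
  desc->irqs_unhandled = 0;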

How?
  - A queued completion handler, like nvme's, services events
    as they appear in the queue, even if the irq corresponding
    to an event has not yet been seen.

  - The queues are frequently empty, so "spurious" irqs appear
    whenever a threaded handler's
      while (events_queued()) process_them();
    consumes the last event(s) while those events' irqs are still
    being posted. The next handler invocation then finds nothing
    to do and returns IRQ_NONE (see the sketch after this list).

  - In each run of "unhandled" irqs, exactly one IRQ_NONE response
    is promoted to IRQ_HANDLED by note_interrupt()'s
    SPURIOUS_DEFERRED logic.

  - Any run of 2+ unhandled irqs will increment irqs_unhandled.
    The time_after() check in note_interrupt() (see the excerpt
    above) resets irqs_unhandled to 1 after an idle period, but
    if irqs are never spaced more than HZ/10 apart, irqs_unhandled
    keeps growing.

  - During processing of long completion queues, the non-threaded
    handlers will return IRQ_WAKE_THREAD for potentially thousands
    of per-event irqs. These bypass note_interrupt()'s irq_count++
    logic, so they do not count as handled and never trigger the
    flapping-irq check.

  - When the _counted_ irq_count reaches the 100k threshold,
    it's possible for irqs_unhandled > 99.9k to force a move
    to polling mode, even though many millions of _WAKE_THREAD
    irqs have been handled without being counted.
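
To make the failure mode concrete, here is a minimal, hypothetical
driver pattern of the kind described above (my_hw_irq(),
my_thread_fn(), events_queued() and process_next_event() are
illustrative names, not taken from any real driver):

  /* hard handler: does no work itself, just wakes the thread */
  static irqreturn_t my_hw_irq(int irq, void *dev)
  {
          /* today this return bypasses note_interrupt()'s irq_count++ */
          return IRQ_WAKE_THREAD;
  }

  /* threaded handler: drains the completion queue */
  static irqreturn_t my_thread_fn(int irq, void *dev)
  {
          bool did_work = false;

          /* may consume events whose irqs are still in flight */
          while (events_queued(dev)) {
                  process_next_event(dev);
                  did_work = true;
          }

          /*
           * If an earlier pass already consumed this irq's event,
           * there is nothing left to do; IRQ_NONE is what the
           * spurious-irq detector counts as "unhandled".
           */
          return did_work ? IRQ_HANDLED : IRQ_NONE;
  }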

Solution: include IRQ_WAKE_THREAD events in irq_count.
Only when IRQ_NONE responses outweigh (IRQ_HANDLED + IRQ_WAKE_THREAD)
by the old 999:1 ratio will an irq be moved to polling mode.
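
In note_interrupt() terms the change is small: the early return taken
when SPURIOUS_DEFERRED is first armed becomes an else-branch, so a
WAKE_THREAD delivery falls through to the accounting code. A sketch of
the intent (the real hunk is below):

  if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) {
          /* arm deferred detection; no early return any more, so
           * control reaches the irq_count++ accounting below */
          desc->threads_handled_last |= SPURIOUS_DEFERRED;
  } else {
          /* deferred check of threads_handled, as before */
  }
  ...
  desc->irq_count++;    /* now counts WAKE_THREAD deliveries too */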

Fixes: 4f27c00bf80f ("Improve behaviour of spurious IRQ detect")
Cc: stable@vger.kernel.org
Signed-off-by: Pete Swain <swine@google.com>
---
 kernel/irq/spurious.c | 68 +++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 34 deletions(-)

diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index 02b2daf07441..ac596c8dc4b1 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -321,44 +321,44 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
 			 */
 			if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) {
 				desc->threads_handled_last |= SPURIOUS_DEFERRED;
-				return;
-			}
-			/*
-			 * Check whether one of the threaded handlers
-			 * returned IRQ_HANDLED since the last
-			 * interrupt happened.
-			 *
-			 * For simplicity we just set bit 31, as it is
-			 * set in threads_handled_last as well. So we
-			 * avoid extra masking. And we really do not
-			 * care about the high bits of the handled
-			 * count. We just care about the count being
-			 * different than the one we saw before.
-			 */
-			handled = atomic_read(&desc->threads_handled);
-			handled |= SPURIOUS_DEFERRED;
-			if (handled != desc->threads_handled_last) {
-				action_ret = IRQ_HANDLED;
-				/*
-				 * Note: We keep the SPURIOUS_DEFERRED
-				 * bit set. We are handling the
-				 * previous invocation right now.
-				 * Keep it for the current one, so the
-				 * next hardware interrupt will
-				 * account for it.
-				 */
-				desc->threads_handled_last = handled;
 			} else {
 				/*
-				 * None of the threaded handlers felt
-				 * responsible for the last interrupt
+				 * Check whether one of the threaded handlers
+				 * returned IRQ_HANDLED since the last
+				 * interrupt happened.
 				 *
-				 * We keep the SPURIOUS_DEFERRED bit
-				 * set in threads_handled_last as we
-				 * need to account for the current
-				 * interrupt as well.
+				 * For simplicity we just set bit 31, as it is
+				 * set in threads_handled_last as well. So we
+				 * avoid extra masking. And we really do not
+				 * care about the high bits of the handled
+				 * count. We just care about the count being
+				 * different than the one we saw before.
 				 */
-				action_ret = IRQ_NONE;
+				handled = atomic_read(&desc->threads_handled);
+				handled |= SPURIOUS_DEFERRED;
+				if (handled != desc->threads_handled_last) {
+					action_ret = IRQ_HANDLED;
+					/*
+					 * Note: We keep the SPURIOUS_DEFERRED
+					 * bit set. We are handling the
+					 * previous invocation right now.
+					 * Keep it for the current one, so the
+					 * next hardware interrupt will
+					 * account for it.
+					 */
+					desc->threads_handled_last = handled;
+				} else {
+					/*
+					 * None of the threaded handlers felt
+					 * responsible for the last interrupt
+					 *
+					 * We keep the SPURIOUS_DEFERRED bit
+					 * set in threads_handled_last as we
+					 * need to account for the current
+					 * interrupt as well.
+					 */
+					action_ret = IRQ_NONE;
+				}
 			}
 		} else {
 			/*
-- 
2.45.2.627.g7a2c4fd464-goog



* Re: [PATCH] FIXUP: genirq: defuse spurious-irq timebomb
  2024-06-15  4:42 [PATCH] FIXUP: genirq: defuse spurious-irq timebomb Pete Swain
@ 2024-07-07 18:39 ` Thomas Gleixner
From: Thomas Gleixner @ 2024-07-07 18:39 UTC (permalink / raw)
  To: Pete Swain, linux-kernel; +Cc: Pete Swain, stable

Pete!

On Fri, Jun 14 2024 at 21:42, Pete Swain wrote:
> The flapping-irq detector still has a timebomb.
>
> A pathological workload, or test script,
> can arm the spurious-irq timebomb described in
>   4f27c00bf80f ("Improve behaviour of spurious IRQ detect")
>
> This leads to irqs being moved to the much slower polled mode,
> despite the actual unhandled-irq rate being well under the
> 99.9k/100k threshold that the code appears to check.
>
> How?
>   - A queued completion handler, like nvme's, services events
>     as they appear in the queue, even if the irq corresponding
>     to an event has not yet been seen.
>
>   - The queues are frequently empty, so "spurious" irqs appear
>     whenever a threaded handler's
>       while (events_queued()) process_them();
>     consumes the last event(s) while those events' irqs are still
>     being posted. The next handler invocation then finds nothing
>     to do and returns IRQ_NONE.
>
>   - In each run of "unhandled" irqs, exactly one IRQ_NONE response
>     is promoted to IRQ_HANDLED by note_interrupt()'s
>     SPURIOUS_DEFERRED logic.
>
>   - Any run of 2+ unhandled irqs will increment irqs_unhandled.
>     The time_after() check in note_interrupt() resets irqs_unhandled
>     to 1 after an idle period, but if irqs are never spaced more
>     than HZ/10 apart, irqs_unhandled keeps growing.
>
>   - During processing of long completion queues, the non-threaded
>     handlers will return IRQ_WAKE_THREAD for potentially thousands
>     of per-event irqs. These bypass note_interrupt()'s irq_count++
>     logic, so they do not count as handled and never trigger the
>     flapping-irq check.
>
>   - When the _counted_ irq_count reaches the 100k threshold,
>     it's possible for irqs_unhandled > 99.9k to force a move
>     to polling mode, even though many millions of _WAKE_THREAD
>     irqs have been handled without being counted.
>
> Solution: include IRQ_WAKE_THREAD events in irq_count.
> Only when IRQ_NONE responses outweigh (IRQ_HANDLED + IRQ_WAKE_THREAD)
> by the old 999:1 ratio will an irq be moved to polling mode.

Nice detective work. Though I'm not entirely sure whether that's the
correct approach as it might misjudge the situation where
IRQ_WAKE_THREAD is issued but the thread does not make progress at all.

Let me think about it some more.

Thanks,

        tglx
