From: Pete Swain <swine@google.com>
To: linux-kernel@vger.kernel.org
Cc: tglx@linutronix.de, Pete Swain <swine@google.com>,
stable@vger.kernel.org
Subject: [PATCH] FIXUP: genirq: defuse spurious-irq timebomb
Date: Fri, 14 Jun 2024 21:42:56 -0700 [thread overview]
Message-ID: <20240615044307.359980-1-swine@google.com> (raw)
The flapping-irq detector still has a timebomb.
A pathological workload, or test script,
can arm the spurious-irq timebomb described in
4f27c00bf80f ("Improve behaviour of spurious IRQ detect")
This leads to irqs being moved the much slower polled mode,
despite the actual unhandled-irq rate being well under the
99.9k/100k threshold that the code appears to check.
How?
- Queued completion handler, like nvme, servicing events
as they appear in the queue, even if the irq corresponding
to the event has not yet been seen.
- queues frequently empty, so seeing "spurious" irqs
whenever the last events of a threaded handler's
while (events_queued()) process_them();
ends with those events' irqs posted while thread was scanning.
In this case the while() has consumed last event(s),
so next handler says IRQ_NONE.
- In each run of "unhandled" irqs, exactly one IRQ_NONE response
is promoted from IRQ_NONE to IRQ_HANDLED, by note_interrupt()'s
SPURIOUS_DEFERRED logic.
- Any 2+ unhandled-irq runs will increment irqs_unhandled.
The time_after() check in note_interrupt() resets irqs_unhandled
to 1 after an idle period, but if irqs are never spaced more
than HZ/10 apart, irqs_unhandled keeps growing.
- During processing of long completion queues, the non-threaded
handlers will return IRQ_WAKE_THREAD, for potentially thousands
of per-event irqs. These bypass note_interrupt()'s irq_count++ logic,
so do not count as handled, and do not invoke the flapping-irq
logic.
- When the _counted_ irq_count reaches the 100k threshold,
it's possible for irqs_unhandled > 99.9k to force a move
to polling mode, even though many millions of _WAKE_THREAD
irqs have been handled without being counted.
Solution: include IRQ_WAKE_THREAD events in irq_count.
Only when IRQ_NONE responses outweigh (IRQ_HANDLED + IRQ_WAKE_THREAD)
by the old 99:1 ratio will an irq be moved to polling mode.
Fixes: 4f27c00bf80f ("Improve behaviour of spurious IRQ detect")
Cc: stable@vger.kernel.org
Signed-off-by: Pete Swain <swine@google.com>
---
kernel/irq/spurious.c | 68 +++++++++++++++++++++----------------------
1 file changed, 34 insertions(+), 34 deletions(-)
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index 02b2daf07441..ac596c8dc4b1 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -321,44 +321,44 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
*/
if (!(desc->threads_handled_last & SPURIOUS_DEFERRED)) {
desc->threads_handled_last |= SPURIOUS_DEFERRED;
- return;
- }
- /*
- * Check whether one of the threaded handlers
- * returned IRQ_HANDLED since the last
- * interrupt happened.
- *
- * For simplicity we just set bit 31, as it is
- * set in threads_handled_last as well. So we
- * avoid extra masking. And we really do not
- * care about the high bits of the handled
- * count. We just care about the count being
- * different than the one we saw before.
- */
- handled = atomic_read(&desc->threads_handled);
- handled |= SPURIOUS_DEFERRED;
- if (handled != desc->threads_handled_last) {
- action_ret = IRQ_HANDLED;
- /*
- * Note: We keep the SPURIOUS_DEFERRED
- * bit set. We are handling the
- * previous invocation right now.
- * Keep it for the current one, so the
- * next hardware interrupt will
- * account for it.
- */
- desc->threads_handled_last = handled;
} else {
/*
- * None of the threaded handlers felt
- * responsible for the last interrupt
+ * Check whether one of the threaded handlers
+ * returned IRQ_HANDLED since the last
+ * interrupt happened.
*
- * We keep the SPURIOUS_DEFERRED bit
- * set in threads_handled_last as we
- * need to account for the current
- * interrupt as well.
+ * For simplicity we just set bit 31, as it is
+ * set in threads_handled_last as well. So we
+ * avoid extra masking. And we really do not
+ * care about the high bits of the handled
+ * count. We just care about the count being
+ * different than the one we saw before.
*/
- action_ret = IRQ_NONE;
+ handled = atomic_read(&desc->threads_handled);
+ handled |= SPURIOUS_DEFERRED;
+ if (handled != desc->threads_handled_last) {
+ action_ret = IRQ_HANDLED;
+ /*
+ * Note: We keep the SPURIOUS_DEFERRED
+ * bit set. We are handling the
+ * previous invocation right now.
+ * Keep it for the current one, so the
+ * next hardware interrupt will
+ * account for it.
+ */
+ desc->threads_handled_last = handled;
+ } else {
+ /*
+ * None of the threaded handlers felt
+ * responsible for the last interrupt
+ *
+ * We keep the SPURIOUS_DEFERRED bit
+ * set in threads_handled_last as we
+ * need to account for the current
+ * interrupt as well.
+ */
+ action_ret = IRQ_NONE;
+ }
}
} else {
/*
--
2.45.2.627.g7a2c4fd464-goog
next reply other threads:[~2024-06-15 4:45 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-15 4:42 Pete Swain [this message]
2024-07-07 18:39 ` [PATCH] FIXUP: genirq: defuse spurious-irq timebomb Thomas Gleixner
2024-07-26 17:27 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240615044307.359980-1-swine@google.com \
--to=swine@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox