* [patch 1/4] genirq: Remove pointless local variable
2025-07-18 18:54 [patch 0/4] genirq: Prevent migration live lock in handle_edge_irq() Thomas Gleixner
@ 2025-07-18 18:54 ` Thomas Gleixner
2025-07-22 12:45 ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2025-07-18 18:54 ` [patch 2/4] genirq: Move irq_wait_for_poll() to call site Thomas Gleixner
` (3 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2025-07-18 18:54 UTC (permalink / raw)
To: LKML; +Cc: Liangyan, Yicong Shen, Jiri Slaby
The variable is only used in one place, which can simply take the constant
as a function argument.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/irq/chip.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -458,13 +458,11 @@ static bool irq_check_poll(struct irq_de
static bool irq_can_handle_pm(struct irq_desc *desc)
{
- unsigned int mask = IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED;
-
/*
* If the interrupt is not in progress and is not an armed
* wakeup interrupt, proceed.
*/
- if (!irqd_has_set(&desc->irq_data, mask))
+ if (!irqd_has_set(&desc->irq_data, IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED))
return true;
/*
^ permalink raw reply [flat|nested] 14+ messages in thread
* [tip: irq/core] genirq: Remove pointless local variable
2025-07-18 18:54 ` [patch 1/4] genirq: Remove pointless local variable Thomas Gleixner
@ 2025-07-22 12:45 ` tip-bot2 for Thomas Gleixner
0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2025-07-22 12:45 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Thomas Gleixner, Liangyan, x86, linux-kernel, maz
The following commit has been merged into the irq/core branch of tip:
Commit-ID: 46958a7bac2d32fda43fd7cd1858aa414640fbd1
Gitweb: https://git.kernel.org/tip/46958a7bac2d32fda43fd7cd1858aa414640fbd1
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 18 Jul 2025 20:54:06 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 22 Jul 2025 14:30:42 +02:00
genirq: Remove pointless local variable
The variable is only used in one place, which can simply take the constant
as a function argument.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Link: https://lore.kernel.org/all/20250718185311.884314473@linutronix.de
---
kernel/irq/chip.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 2b27400..5bb26fc 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -466,13 +466,11 @@ static bool irq_check_poll(struct irq_desc *desc)
static bool irq_can_handle_pm(struct irq_desc *desc)
{
- unsigned int mask = IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED;
-
/*
* If the interrupt is not in progress and is not an armed
* wakeup interrupt, proceed.
*/
- if (!irqd_has_set(&desc->irq_data, mask))
+ if (!irqd_has_set(&desc->irq_data, IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED))
return true;
/*
* [patch 2/4] genirq: Move irq_wait_for_poll() to call site
2025-07-18 18:54 [patch 0/4] genirq: Prevent migration live lock in handle_edge_irq() Thomas Gleixner
2025-07-18 18:54 ` [patch 1/4] genirq: Remove pointless local variable Thomas Gleixner
@ 2025-07-18 18:54 ` Thomas Gleixner
2025-07-22 7:07 ` Jiri Slaby
2025-07-22 12:45 ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2025-07-18 18:54 ` [patch 3/4] genirq: Split up irq_pm_check_wakeup() Thomas Gleixner
` (2 subsequent siblings)
4 siblings, 2 replies; 14+ messages in thread
From: Thomas Gleixner @ 2025-07-18 18:54 UTC (permalink / raw)
To: LKML; +Cc: Liangyan, Yicong Shen, Jiri Slaby
Move it to the call site so that the waiting for the INPROGRESS flag can be
reused by an upcoming mitigation for a potential live lock in the edge type
handler.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/irq/chip.c | 29 +++++++++++++++++++++--------
kernel/irq/internals.h | 2 +-
kernel/irq/spurious.c | 37 +------------------------------------
3 files changed, 23 insertions(+), 45 deletions(-)
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -449,11 +449,19 @@ void unmask_threaded_irq(struct irq_desc
unmask_irq(desc);
}
-static bool irq_check_poll(struct irq_desc *desc)
+/* Busy wait until INPROGRESS is cleared */
+static bool irq_wait_on_inprogress(struct irq_desc *desc)
{
- if (!(desc->istate & IRQS_POLL_INPROGRESS))
- return false;
- return irq_wait_for_poll(desc);
+ if (IS_ENABLED(CONFIG_SMP)) {
+ do {
+ raw_spin_unlock(&desc->lock);
+ while (irqd_irq_inprogress(&desc->irq_data))
+ cpu_relax();
+ raw_spin_lock(&desc->lock);
+ } while (irqd_irq_inprogress(&desc->irq_data));
+ }
+ /* Might have been disabled in meantime */
+ return !irqd_irq_disabled(&desc->irq_data) && desc->action;
}
static bool irq_can_handle_pm(struct irq_desc *desc)
@@ -473,10 +481,15 @@ static bool irq_can_handle_pm(struct irq
if (irq_pm_check_wakeup(desc))
return false;
- /*
- * Handle a potential concurrent poll on a different core.
- */
- return irq_check_poll(desc);
+ /* Check whether the interrupt is polled on another CPU */
+ if (unlikely(desc->istate & IRQS_POLL_INPROGRESS)) {
+ if (WARN_ONCE(irq_poll_cpu == smp_processor_id(),
+ "irq poll in progress on cpu %d for irq %d\n",
+ smp_processor_id(), desc->irq_data.irq))
+ return false;
+ return irq_wait_on_inprogress(desc);
+ }
+ return false;
}
static inline bool irq_can_handle_actions(struct irq_desc *desc)
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -20,6 +20,7 @@
#define istate core_internal_state__do_not_mess_with_it
extern bool noirqdebug;
+extern int irq_poll_cpu;
extern struct irqaction chained_action;
@@ -112,7 +113,6 @@ irqreturn_t handle_irq_event(struct irq_
int check_irq_resend(struct irq_desc *desc, bool inject);
void clear_irq_resend(struct irq_desc *desc);
void irq_resend_init(struct irq_desc *desc);
-bool irq_wait_for_poll(struct irq_desc *desc);
void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action);
void wake_threads_waitq(struct irq_desc *desc);
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -19,45 +19,10 @@ static int irqfixup __read_mostly;
#define POLL_SPURIOUS_IRQ_INTERVAL (HZ/10)
static void poll_spurious_irqs(struct timer_list *unused);
static DEFINE_TIMER(poll_spurious_irq_timer, poll_spurious_irqs);
-static int irq_poll_cpu;
+int irq_poll_cpu;
static atomic_t irq_poll_active;
/*
- * We wait here for a poller to finish.
- *
- * If the poll runs on this CPU, then we yell loudly and return
- * false. That will leave the interrupt line disabled in the worst
- * case, but it should never happen.
- *
- * We wait until the poller is done and then recheck disabled and
- * action (about to be disabled). Only if it's still active, we return
- * true and let the handler run.
- */
-bool irq_wait_for_poll(struct irq_desc *desc)
-{
- lockdep_assert_held(&desc->lock);
-
- if (WARN_ONCE(irq_poll_cpu == smp_processor_id(),
- "irq poll in progress on cpu %d for irq %d\n",
- smp_processor_id(), desc->irq_data.irq))
- return false;
-
-#ifdef CONFIG_SMP
- do {
- raw_spin_unlock(&desc->lock);
- while (irqd_irq_inprogress(&desc->irq_data))
- cpu_relax();
- raw_spin_lock(&desc->lock);
- } while (irqd_irq_inprogress(&desc->irq_data));
- /* Might have been disabled in meantime */
- return !irqd_irq_disabled(&desc->irq_data) && desc->action;
-#else
- return false;
-#endif
-}
-
-
-/*
* Recovery handler for misrouted interrupts.
*/
static bool try_one_irq(struct irq_desc *desc, bool force)
* Re: [patch 2/4] genirq: Move irq_wait_for_poll() to call site
2025-07-18 18:54 ` [patch 2/4] genirq: Move irq_wait_for_poll() to call site Thomas Gleixner
@ 2025-07-22 7:07 ` Jiri Slaby
2025-07-22 12:36 ` Thomas Gleixner
2025-07-22 12:45 ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
1 sibling, 1 reply; 14+ messages in thread
From: Jiri Slaby @ 2025-07-22 7:07 UTC (permalink / raw)
To: Thomas Gleixner, LKML; +Cc: Liangyan, Yicong Shen
On 18. 07. 25, 20:54, Thomas Gleixner wrote:
> Move it to the call site so that the waiting for the INPROGRESS flag can be
> reused by an upcoming mitigation for a potential live lock in the edge type
> handler.
>
> No functional change.
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -449,11 +449,19 @@ void unmask_threaded_irq(struct irq_desc
> unmask_irq(desc);
> }
>
> -static bool irq_check_poll(struct irq_desc *desc)
> +/* Busy wait until INPROGRESS is cleared */
> +static bool irq_wait_on_inprogress(struct irq_desc *desc)
> {
> - if (!(desc->istate & IRQS_POLL_INPROGRESS))
> - return false;
> - return irq_wait_for_poll(desc);
> + if (IS_ENABLED(CONFIG_SMP)) {
> + do {
> + raw_spin_unlock(&desc->lock);
> + while (irqd_irq_inprogress(&desc->irq_data))
> + cpu_relax();
> + raw_spin_lock(&desc->lock);
> + } while (irqd_irq_inprogress(&desc->irq_data));
> + }
> + /* Might have been disabled in meantime */
> + return !irqd_irq_disabled(&desc->irq_data) && desc->action;
Just noting that this line is newly evaluated on !SMP too. But it is
still supposed to evaluate to false, given we are here on this only CPU.
thanks,
--
js
suse labs
* Re: [patch 2/4] genirq: Move irq_wait_for_poll() to call site
2025-07-22 7:07 ` Jiri Slaby
@ 2025-07-22 12:36 ` Thomas Gleixner
0 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2025-07-22 12:36 UTC (permalink / raw)
To: Jiri Slaby, LKML; +Cc: Liangyan, Yicong Shen
On Tue, Jul 22 2025 at 09:07, Jiri Slaby wrote:
>> + if (IS_ENABLED(CONFIG_SMP)) {
>> + do {
>> + raw_spin_unlock(&desc->lock);
>> + while (irqd_irq_inprogress(&desc->irq_data))
>> + cpu_relax();
>> + raw_spin_lock(&desc->lock);
>> + } while (irqd_irq_inprogress(&desc->irq_data));
>> + }
>> + /* Might have been disabled in meantime */
>> + return !irqd_irq_disabled(&desc->irq_data) && desc->action;
>
> Just noting that this line is newly evaluated on !SMP too. But it is
> still supposed to evaluate to false, given we are here on this only CPU.
It does not, but in that case the code is not reached, because the check
at the call site, which evaluates whether the polling CPU is the current
CPU, triggers. So it does not really matter in this context.
But for the other case in handle_edge_irq() this has to return false. It
should not ever get there because interrupts are disabled while
INPROGRESS is set, emphasis on *should* :)
So I moved it back into the CONFIG_SMP conditional. Thanks for spotting
it!
tglx
* [tip: irq/core] genirq: Move irq_wait_for_poll() to call site
2025-07-18 18:54 ` [patch 2/4] genirq: Move irq_wait_for_poll() to call site Thomas Gleixner
2025-07-22 7:07 ` Jiri Slaby
@ 2025-07-22 12:45 ` tip-bot2 for Thomas Gleixner
2025-07-23 6:22 ` Ingo Molnar
1 sibling, 1 reply; 14+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2025-07-22 12:45 UTC (permalink / raw)
To: linux-tip-commits
Cc: Thomas Gleixner, Liangyan, Jiri Slaby, x86, linux-kernel, maz
The following commit has been merged into the irq/core branch of tip:
Commit-ID: 4e879dedd571128ed5aa4d5989ec0a1938804d20
Gitweb: https://git.kernel.org/tip/4e879dedd571128ed5aa4d5989ec0a1938804d20
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 18 Jul 2025 20:54:08 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 22 Jul 2025 14:30:42 +02:00
genirq: Move irq_wait_for_poll() to call site
Move it to the call site so that the waiting for the INPROGRESS flag can be
reused by an upcoming mitigation for a potential live lock in the edge type
handler.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/all/20250718185311.948555026@linutronix.de
---
kernel/irq/chip.c | 33 ++++++++++++++++++++++++---------
kernel/irq/internals.h | 2 +-
kernel/irq/spurious.c | 37 +------------------------------------
3 files changed, 26 insertions(+), 46 deletions(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 5bb26fc..290244c 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -457,11 +457,21 @@ void unmask_threaded_irq(struct irq_desc *desc)
unmask_irq(desc);
}
-static bool irq_check_poll(struct irq_desc *desc)
-{
- if (!(desc->istate & IRQS_POLL_INPROGRESS))
- return false;
- return irq_wait_for_poll(desc);
+/* Busy wait until INPROGRESS is cleared */
+static bool irq_wait_on_inprogress(struct irq_desc *desc)
+{
+ if (IS_ENABLED(CONFIG_SMP)) {
+ do {
+ raw_spin_unlock(&desc->lock);
+ while (irqd_irq_inprogress(&desc->irq_data))
+ cpu_relax();
+ raw_spin_lock(&desc->lock);
+ } while (irqd_irq_inprogress(&desc->irq_data));
+
+ /* Might have been disabled in meantime */
+ return !irqd_irq_disabled(&desc->irq_data) && desc->action;
+ }
+ return false;
}
static bool irq_can_handle_pm(struct irq_desc *desc)
@@ -481,10 +491,15 @@ static bool irq_can_handle_pm(struct irq_desc *desc)
if (irq_pm_check_wakeup(desc))
return false;
- /*
- * Handle a potential concurrent poll on a different core.
- */
- return irq_check_poll(desc);
+ /* Check whether the interrupt is polled on another CPU */
+ if (unlikely(desc->istate & IRQS_POLL_INPROGRESS)) {
+ if (WARN_ONCE(irq_poll_cpu == smp_processor_id(),
+ "irq poll in progress on cpu %d for irq %d\n",
+ smp_processor_id(), desc->irq_data.irq))
+ return false;
+ return irq_wait_on_inprogress(desc);
+ }
+ return false;
}
static inline bool irq_can_handle_actions(struct irq_desc *desc)
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index aebfe22..82b0d67 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -20,6 +20,7 @@
#define istate core_internal_state__do_not_mess_with_it
extern bool noirqdebug;
+extern int irq_poll_cpu;
extern struct irqaction chained_action;
@@ -112,7 +113,6 @@ irqreturn_t handle_irq_event(struct irq_desc *desc);
int check_irq_resend(struct irq_desc *desc, bool inject);
void clear_irq_resend(struct irq_desc *desc);
void irq_resend_init(struct irq_desc *desc);
-bool irq_wait_for_poll(struct irq_desc *desc);
void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action);
void wake_threads_waitq(struct irq_desc *desc);
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index 8f26982..73280cc 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -19,45 +19,10 @@ static int irqfixup __read_mostly;
#define POLL_SPURIOUS_IRQ_INTERVAL (HZ/10)
static void poll_spurious_irqs(struct timer_list *unused);
static DEFINE_TIMER(poll_spurious_irq_timer, poll_spurious_irqs);
-static int irq_poll_cpu;
+int irq_poll_cpu;
static atomic_t irq_poll_active;
/*
- * We wait here for a poller to finish.
- *
- * If the poll runs on this CPU, then we yell loudly and return
- * false. That will leave the interrupt line disabled in the worst
- * case, but it should never happen.
- *
- * We wait until the poller is done and then recheck disabled and
- * action (about to be disabled). Only if it's still active, we return
- * true and let the handler run.
- */
-bool irq_wait_for_poll(struct irq_desc *desc)
-{
- lockdep_assert_held(&desc->lock);
-
- if (WARN_ONCE(irq_poll_cpu == smp_processor_id(),
- "irq poll in progress on cpu %d for irq %d\n",
- smp_processor_id(), desc->irq_data.irq))
- return false;
-
-#ifdef CONFIG_SMP
- do {
- raw_spin_unlock(&desc->lock);
- while (irqd_irq_inprogress(&desc->irq_data))
- cpu_relax();
- raw_spin_lock(&desc->lock);
- } while (irqd_irq_inprogress(&desc->irq_data));
- /* Might have been disabled in meantime */
- return !irqd_irq_disabled(&desc->irq_data) && desc->action;
-#else
- return false;
-#endif
-}
-
-
-/*
* Recovery handler for misrouted interrupts.
*/
static bool try_one_irq(struct irq_desc *desc, bool force)
* Re: [tip: irq/core] genirq: Move irq_wait_for_poll() to call site
2025-07-22 12:45 ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
@ 2025-07-23 6:22 ` Ingo Molnar
0 siblings, 0 replies; 14+ messages in thread
From: Ingo Molnar @ 2025-07-23 6:22 UTC (permalink / raw)
To: linux-kernel, Thomas Gleixner
Cc: linux-tip-commits, Liangyan, Jiri Slaby, x86, maz
Two minor nits:
* tip-bot2 for Thomas Gleixner <tip-bot2@linutronix.de> wrote:
> + /* Might have been disabled in meantime */
> + return !irqd_irq_disabled(&desc->irq_data) && desc->action;
This has a (pre-existing) spelling mistake:
s/in meantime
/in the meantime
> + if (WARN_ONCE(irq_poll_cpu == smp_processor_id(),
> + "irq poll in progress on cpu %d for irq %d\n",
> + smp_processor_id(), desc->irq_data.irq))
And we usually capitalize these:
s/on cpu
/on CPU
s/irq poll
/IRQ poll
Just like in the surrounding comments:
> - * If the poll runs on this CPU, then we yell loudly and return
Thanks,
Ingo
* [patch 3/4] genirq: Split up irq_pm_check_wakeup()
2025-07-18 18:54 [patch 0/4] genirq: Prevent migration live lock in handle_edge_irq() Thomas Gleixner
2025-07-18 18:54 ` [patch 1/4] genirq: Remove pointless local variable Thomas Gleixner
2025-07-18 18:54 ` [patch 2/4] genirq: Move irq_wait_for_poll() to call site Thomas Gleixner
@ 2025-07-18 18:54 ` Thomas Gleixner
2025-07-22 12:45 ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2025-07-18 18:54 ` [patch 4/4] genirq: Prevent migration live lock in handle_edge_irq() Thomas Gleixner
2025-07-21 15:05 ` [External] [patch 0/4] " Liangyan
4 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2025-07-18 18:54 UTC (permalink / raw)
To: LKML; +Cc: Liangyan, Yicong Shen, Jiri Slaby
Let the calling code check for the IRQD_WAKEUP_ARMED flag to prepare for a
live lock mitigation in the edge type handler.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
kernel/irq/chip.c | 4 +++-
kernel/irq/internals.h | 4 ++--
kernel/irq/pm.c | 16 ++++++----------
3 files changed, 11 insertions(+), 13 deletions(-)
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -478,8 +478,10 @@ static bool irq_can_handle_pm(struct irq
* and suspended, disable it and notify the pm core about the
* event.
*/
- if (irq_pm_check_wakeup(desc))
+ if (unlikely(irqd_has_set(irqd, IRQD_WAKEUP_ARMED))) {
+ irq_pm_handle_wakeup(desc);
return false;
+ }
/* Check whether the interrupt is polled on another CPU */
if (unlikely(desc->istate & IRQS_POLL_INPROGRESS)) {
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -277,11 +277,11 @@ static inline bool irq_is_nmi(struct irq
}
#ifdef CONFIG_PM_SLEEP
-bool irq_pm_check_wakeup(struct irq_desc *desc);
+void irq_pm_handle_wakeup(struct irq_desc *desc);
void irq_pm_install_action(struct irq_desc *desc, struct irqaction *action);
void irq_pm_remove_action(struct irq_desc *desc, struct irqaction *action);
#else
-static inline bool irq_pm_check_wakeup(struct irq_desc *desc) { return false; }
+static inline void irq_pm_handle_wakeup(struct irq_desc *desc) { }
static inline void
irq_pm_install_action(struct irq_desc *desc, struct irqaction *action) { }
static inline void
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -13,17 +13,13 @@
#include "internals.h"
-bool irq_pm_check_wakeup(struct irq_desc *desc)
+void irq_pm_handle_wakeup(struct irq_desc *desc)
{
- if (irqd_is_wakeup_armed(&desc->irq_data)) {
- irqd_clear(&desc->irq_data, IRQD_WAKEUP_ARMED);
- desc->istate |= IRQS_SUSPENDED | IRQS_PENDING;
- desc->depth++;
- irq_disable(desc);
- pm_system_irq_wakeup(irq_desc_get_irq(desc));
- return true;
- }
- return false;
+ irqd_clear(&desc->irq_data, IRQD_WAKEUP_ARMED);
+ desc->istate |= IRQS_SUSPENDED | IRQS_PENDING;
+ desc->depth++;
+ irq_disable(desc);
+ pm_system_irq_wakeup(irq_desc_get_irq(desc));
}
/*
* [tip: irq/core] genirq: Split up irq_pm_check_wakeup()
2025-07-18 18:54 ` [patch 3/4] genirq: Split up irq_pm_check_wakeup() Thomas Gleixner
@ 2025-07-22 12:45 ` tip-bot2 for Thomas Gleixner
0 siblings, 0 replies; 14+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2025-07-22 12:45 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Thomas Gleixner, Liangyan, x86, linux-kernel, maz
The following commit has been merged into the irq/core branch of tip:
Commit-ID: c609045abc778689ce42e8f5827a84179ace52c5
Gitweb: https://git.kernel.org/tip/c609045abc778689ce42e8f5827a84179ace52c5
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 18 Jul 2025 20:54:10 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 22 Jul 2025 14:30:42 +02:00
genirq: Split up irq_pm_check_wakeup()
Let the calling code check for the IRQD_WAKEUP_ARMED flag to prepare for a
live lock mitigation in the edge type handler.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Link: https://lore.kernel.org/all/20250718185312.012392426@linutronix.de
---
kernel/irq/chip.c | 4 +++-
kernel/irq/internals.h | 4 ++--
kernel/irq/pm.c | 16 ++++++----------
3 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 290244c..11ecf6c 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -488,8 +488,10 @@ static bool irq_can_handle_pm(struct irq_desc *desc)
* and suspended, disable it and notify the pm core about the
* event.
*/
- if (irq_pm_check_wakeup(desc))
+ if (unlikely(irqd_has_set(irqd, IRQD_WAKEUP_ARMED))) {
+ irq_pm_handle_wakeup(desc);
return false;
+ }
/* Check whether the interrupt is polled on another CPU */
if (unlikely(desc->istate & IRQS_POLL_INPROGRESS)) {
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 82b0d67..0164ca4 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -277,11 +277,11 @@ static inline bool irq_is_nmi(struct irq_desc *desc)
}
#ifdef CONFIG_PM_SLEEP
-bool irq_pm_check_wakeup(struct irq_desc *desc);
+void irq_pm_handle_wakeup(struct irq_desc *desc);
void irq_pm_install_action(struct irq_desc *desc, struct irqaction *action);
void irq_pm_remove_action(struct irq_desc *desc, struct irqaction *action);
#else
-static inline bool irq_pm_check_wakeup(struct irq_desc *desc) { return false; }
+static inline void irq_pm_handle_wakeup(struct irq_desc *desc) { }
static inline void
irq_pm_install_action(struct irq_desc *desc, struct irqaction *action) { }
static inline void
diff --git a/kernel/irq/pm.c b/kernel/irq/pm.c
index 445912d..f739472 100644
--- a/kernel/irq/pm.c
+++ b/kernel/irq/pm.c
@@ -13,17 +13,13 @@
#include "internals.h"
-bool irq_pm_check_wakeup(struct irq_desc *desc)
+void irq_pm_handle_wakeup(struct irq_desc *desc)
{
- if (irqd_is_wakeup_armed(&desc->irq_data)) {
- irqd_clear(&desc->irq_data, IRQD_WAKEUP_ARMED);
- desc->istate |= IRQS_SUSPENDED | IRQS_PENDING;
- desc->depth++;
- irq_disable(desc);
- pm_system_irq_wakeup(irq_desc_get_irq(desc));
- return true;
- }
- return false;
+ irqd_clear(&desc->irq_data, IRQD_WAKEUP_ARMED);
+ desc->istate |= IRQS_SUSPENDED | IRQS_PENDING;
+ desc->depth++;
+ irq_disable(desc);
+ pm_system_irq_wakeup(irq_desc_get_irq(desc));
}
/*
* [patch 4/4] genirq: Prevent migration live lock in handle_edge_irq()
2025-07-18 18:54 [patch 0/4] genirq: Prevent migration live lock in handle_edge_irq() Thomas Gleixner
` (2 preceding siblings ...)
2025-07-18 18:54 ` [patch 3/4] genirq: Split up irq_pm_check_wakeup() Thomas Gleixner
@ 2025-07-18 18:54 ` Thomas Gleixner
2025-07-22 7:37 ` Jiri Slaby
2025-07-22 12:45 ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
2025-07-21 15:05 ` [External] [patch 0/4] " Liangyan
4 siblings, 2 replies; 14+ messages in thread
From: Thomas Gleixner @ 2025-07-18 18:54 UTC (permalink / raw)
To: LKML; +Cc: Liangyan, Yicong Shen, Jiri Slaby
Yicong reported and Liangyan debugged a live lock in handle_edge_irq()
related to interrupt migration.
If the interrupt affinity is moved to a new target CPU while the interrupt
is still being handled on the previous target CPU, the edge type handler
might get stuck on the previous target:
CPU 0 (previous target) CPU 1 (new target)
handle_edge_irq()
repeat:
handle_event() handle_edge_irq()
if (INPROGRESS) {
set(PENDING);
mask();
return;
}
if (PENDING) {
clear(PENDING);
unmask();
goto repeat;
}
The migration in software never completes and CPU0 continues to handle the
pending events forever. This happens when the device raises interrupts at
a high rate, always before handle_event() completes and before the CPU0
handler can clear INPROGRESS so that CPU1 sets the PENDING flag over and
over. This has been observed in virtual machines.
Prevent this by checking whether the CPU which observes the INPROGRESS flag
is the new affinity target. If that's the case, do not set the PENDING flag
and wait for the INPROGRESS flag to be cleared instead, so that the new
interrupt is handled on the new target CPU and the previous CPU is released
from the action.
This is restricted to the edge type handler and only utilized on systems
which use single CPU targets for interrupt affinity.
Reported-by: Yicong Shen <shenyicong.1023@bytedance.com>
Reported-by: Liangyan <liangyan.peng@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250701163558.2588435-1-liangyan.peng@bytedance.com
---
kernel/irq/chip.c | 41 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 39 insertions(+), 2 deletions(-)
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -466,11 +466,14 @@ static bool irq_wait_on_inprogress(struc
static bool irq_can_handle_pm(struct irq_desc *desc)
{
+ struct irq_data *irqd = &desc->irq_data;
+ const struct cpumask *aff;
+
/*
* If the interrupt is not in progress and is not an armed
* wakeup interrupt, proceed.
*/
- if (!irqd_has_set(&desc->irq_data, IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED))
+ if (!irqd_has_set(irqd, IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED))
return true;
/*
@@ -491,7 +494,41 @@ static bool irq_can_handle_pm(struct irq
return false;
return irq_wait_on_inprogress(desc);
}
- return false;
+
+ /* The below works only for single target interrupts */
+ if (!IS_ENABLED(CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK) ||
+ !irqd_is_single_target(irqd) || desc->handle_irq != handle_edge_irq)
+ return false;
+
+ /*
+ * If the interrupt affinity was moved to this CPU and the
+ * interrupt is currently handled on the previous target CPU, then
+ * busy wait for INPROGRESS to be cleared. Otherwise for edge type
+ * interrupts the handler might get stuck on the previous target:
+ *
+ * CPU 0 CPU 1 (new target)
+ * handle_edge_irq()
+ * repeat:
+ * handle_event() handle_edge_irq()
+ * if (INPROGRESS) {
+ * set(PENDING);
+ * mask();
+ * return;
+ * }
+ * if (PENDING) {
+ * clear(PENDING);
+ * unmask();
+ * goto repeat;
+ * }
+ *
+ * This happens when the device raises interrupts with a high rate
+ * and always before handle_event() completes and the CPU0 handler
+ * can clear INPROGRESS. This has been observed in virtual machines.
+ */
+ aff = irq_data_get_effective_affinity_mask(irqd);
+ if (cpumask_first(aff) != smp_processor_id())
+ return false;
+ return irq_wait_on_inprogress(desc);
}
static inline bool irq_can_handle_actions(struct irq_desc *desc)
* Re: [patch 4/4] genirq: Prevent migration live lock in handle_edge_irq()
2025-07-18 18:54 ` [patch 4/4] genirq: Prevent migration live lock in handle_edge_irq() Thomas Gleixner
@ 2025-07-22 7:37 ` Jiri Slaby
2025-07-22 12:45 ` [tip: irq/core] " tip-bot2 for Thomas Gleixner
1 sibling, 0 replies; 14+ messages in thread
From: Jiri Slaby @ 2025-07-22 7:37 UTC (permalink / raw)
To: Thomas Gleixner, LKML; +Cc: Liangyan, Yicong Shen
On 18. 07. 25, 20:54, Thomas Gleixner wrote:
> Yicong reported and Liangyan debugged a live lock in handle_edge_irq()
> related to interrupt migration.
>
> If the interrupt affinity is moved to a new target CPU and the interrupt is
> currently handled on the previous target CPU for edge type interrupts the
> handler might get stuck on the previous target:
>
> CPU 0 (previous target) CPU 1 (new target)
>
> handle_edge_irq()
> repeat:
> handle_event() handle_edge_irq()
> if (INPROGRESS) {
> set(PENDING);
> mask();
> return;
> }
> if (PENDING) {
> clear(PENDING);
> unmask();
> goto repeat;
> }
>
> The migration in software never completes and CPU0 continues to handle the
> pending events forever. This happens when the device raises interrupts with
> a high rate and always before handle_event() completes and before the CPU0
> handler can clear INPROGRESS so that CPU1 sets the PENDING flag over and
> over. This has been observed in virtual machines.
>
> Prevent this by checking whether the CPU which observes the INPROGRESS flag
> is the new affinity target. If that's the case, do not set the PENDING flag
> and wait for the INPROGRESS flag to be cleared instead, so that the new
> interrupt is handled on the new target CPU and the previous CPU is released
> from the action.
>
> This is restricted to the edge type handler and only utilized on systems,
> which use single CPU targets for interrupt affinity.
LGTM
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
> Reported-by: Yicong Shen <shenyicong.1023@bytedance.com>
> Reported-by: Liangyan <liangyan.peng@bytedance.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lore.kernel.org/all/20250701163558.2588435-1-liangyan.peng@bytedance.com
thanks,
--
js
suse labs
* [tip: irq/core] genirq: Prevent migration live lock in handle_edge_irq()
2025-07-18 18:54 ` [patch 4/4] genirq: Prevent migration live lock in handle_edge_irq() Thomas Gleixner
2025-07-22 7:37 ` Jiri Slaby
@ 2025-07-22 12:45 ` tip-bot2 for Thomas Gleixner
1 sibling, 0 replies; 14+ messages in thread
From: tip-bot2 for Thomas Gleixner @ 2025-07-22 12:45 UTC (permalink / raw)
To: linux-tip-commits
Cc: Yicong Shen, Liangyan, Thomas Gleixner, Jiri Slaby, x86,
linux-kernel, maz
The following commit has been merged into the irq/core branch of tip:
Commit-ID: 8d39d6ec4db5da9899993092227584a97c203fd3
Gitweb: https://git.kernel.org/tip/8d39d6ec4db5da9899993092227584a97c203fd3
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 18 Jul 2025 20:54:12 +02:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 22 Jul 2025 14:30:42 +02:00
genirq: Prevent migration live lock in handle_edge_irq()
Yicong reported and Liangyan debugged a live lock in handle_edge_irq()
related to interrupt migration.
If the interrupt affinity is moved to a new target CPU while the interrupt
is still being handled on the previous target CPU, the edge type handler
might get stuck on the previous target:
CPU 0 (previous target) CPU 1 (new target)
handle_edge_irq()
repeat:
handle_event() handle_edge_irq()
if (INPROGRESS) {
set(PENDING);
mask();
return;
}
if (PENDING) {
clear(PENDING);
unmask();
goto repeat;
}
The migration in software never completes and CPU0 continues to handle the
pending events forever. This happens when the device raises interrupts at
a high rate, always before handle_event() completes and before the CPU0
handler can clear INPROGRESS so that CPU1 sets the PENDING flag over and
over. This has been observed in virtual machines.
Prevent this by checking whether the CPU which observes the INPROGRESS flag
is the new affinity target. If that's the case, do not set the PENDING flag
and wait for the INPROGRESS flag to be cleared instead, so that the new
interrupt is handled on the new target CPU and the previous CPU is released
from the action.
This is restricted to the edge type handler and only utilized on systems
which use single CPU targets for interrupt affinity.
Reported-by: Yicong Shen <shenyicong.1023@bytedance.com>
Reported-by: Liangyan <liangyan.peng@bytedance.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/all/20250701163558.2588435-1-liangyan.peng@bytedance.com
Link: https://lore.kernel.org/all/20250718185312.076515034@linutronix.de
---
kernel/irq/chip.c | 41 +++++++++++++++++++++++++++++++++++++++--
1 file changed, 39 insertions(+), 2 deletions(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 11ecf6c..624106e 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -476,11 +476,14 @@ static bool irq_wait_on_inprogress(struct irq_desc *desc)
static bool irq_can_handle_pm(struct irq_desc *desc)
{
+ struct irq_data *irqd = &desc->irq_data;
+ const struct cpumask *aff;
+
/*
* If the interrupt is not in progress and is not an armed
* wakeup interrupt, proceed.
*/
- if (!irqd_has_set(&desc->irq_data, IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED))
+ if (!irqd_has_set(irqd, IRQD_IRQ_INPROGRESS | IRQD_WAKEUP_ARMED))
return true;
/*
@@ -501,7 +504,41 @@ static bool irq_can_handle_pm(struct irq_desc *desc)
return false;
return irq_wait_on_inprogress(desc);
}
- return false;
+
+ /* The below works only for single target interrupts */
+ if (!IS_ENABLED(CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK) ||
+ !irqd_is_single_target(irqd) || desc->handle_irq != handle_edge_irq)
+ return false;
+
+ /*
+ * If the interrupt affinity was moved to this CPU and the
+ * interrupt is currently handled on the previous target CPU, then
+ * busy wait for INPROGRESS to be cleared. Otherwise for edge type
+ * interrupts the handler might get stuck on the previous target:
+ *
+ *	CPU 0				CPU 1 (new target)
+ *	handle_edge_irq()
+ *	repeat:
+ *		handle_event()		handle_edge_irq()
+ *					if (INPROGRESS) {
+ *						set(PENDING);
+ *						mask();
+ *						return;
+ *					}
+ *		if (PENDING) {
+ *			clear(PENDING);
+ *			unmask();
+ *			goto repeat;
+ *		}
+ *
+ * This happens when the device raises interrupts with a high rate
+ * and always before handle_event() completes and the CPU0 handler
+ * can clear INPROGRESS. This has been observed in virtual machines.
+ */
+ aff = irq_data_get_effective_affinity_mask(irqd);
+ if (cpumask_first(aff) != smp_processor_id())
+ return false;
+ return irq_wait_on_inprogress(desc);
}
static inline bool irq_can_handle_actions(struct irq_desc *desc)
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [External] [patch 0/4] genirq: Prevent migration live lock in handle_edge_irq()
2025-07-18 18:54 [patch 0/4] genirq: Prevent migration live lock in handle_edge_irq() Thomas Gleixner
` (3 preceding siblings ...)
2025-07-18 18:54 ` [patch 4/4] genirq: Prevent migration live lock in handle_edge_irq() Thomas Gleixner
@ 2025-07-21 15:05 ` Liangyan
4 siblings, 0 replies; 14+ messages in thread
From: Liangyan @ 2025-07-21 15:05 UTC (permalink / raw)
To: Thomas Gleixner, LKML; +Cc: Liangyan, Yicong Shen, Jiri Slaby
On 2025/7/19 02:54, Thomas Gleixner wrote:
> Yicong reported and Liangyan debugged a live lock in handle_edge_irq()
> related to interrupt migration.
>
> If the interrupt affinity is moved to a new target CPU and the interrupt is
> currently handled on the previous target CPU for edge type interrupts the
> handler might get stuck on the previous target:
>
>     CPU 0 (previous target)         CPU 1 (new target)
>
>     handle_edge_irq()
>     repeat:
>         handle_event()              handle_edge_irq()
>                                     if (INPROGRESS) {
>                                         set(PENDING);
>                                         mask();
>                                         return;
>                                     }
>         if (PENDING) {
>             clear(PENDING);
>             unmask();
>             goto repeat;
>         }
>
> The migration in software never completes and CPU0 continues to handle the
> pending events forever. This happens when the device raises interrupts with
> a high rate and always before handle_event() completes and before the CPU0
> handler can clear INPROGRESS so that CPU1 sets the PENDING flag over and
> over. This has been observed in virtual machines.
>
> The following series is addressing this by making the new target CPU wait
> for the handler to complete on CPU 0 and thereby completing the software
> migration.
>
> A draft combo patch of this has been tested by Liangyan:
>
> https://lore.kernel.org/all/87o6u0rpaa.ffs@tglx
>
> The series splits up the draft patch and has proper changelogs.
>
> Thanks,
>
> tglx
> ---
> chip.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------
> internals.h | 6 ++---
> pm.c | 16 +++++---------
> spurious.c | 37 --------------------------------
> 4 files changed, 69 insertions(+), 58 deletions(-)
>
>
Tested-by: Liangyan <liangyan.peng@bytedance.com>
Regards,
Liangyan