From: Brian Norris <briannorris@chromium.org>
To: Aleksandrs Vinarskis <alex.vinarskis@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Tsai Sung-Fu <danielsftsai@google.com>,
Douglas Anderson <dianders@chromium.org>,
linux-kernel@vger.kernel.org, Johan Hovold <johan@kernel.org>
Subject: Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
Date: Mon, 9 Jun 2025 10:13:48 -0700 [thread overview]
Message-ID: <aEcWTM3Y1roOf4Ph@google.com> (raw)
In-Reply-To: <24ec4adc-7c80-49e9-93ee-19908a97ab84@gmail.com>
Hi Alex,
On Fri, Jun 06, 2025 at 02:21:54PM +0200, Aleksandrs Vinarskis wrote:
> On 5/14/25 22:13, Brian Norris wrote:
> > Affinity-managed IRQs may be shut down and restarted during CPU
> > hotunplug/plug, and the IRQ may be left in an unexpected state.
> > Specifically:
[...]
> It appears that this commit introduces a critical bug observed on at least
> some Qualcomm Snapdragon X1E/X1P laptops, rendering the suspend function
> unusable.
>
> With this change in place, after successful suspend the device either:
> 1. Cannot wake up at all. Screen stays black, even though PM has existed
> suspend (observed by external LEDs controlled by PM)
>
> 2. Wakes up eventually after minutes (instead of seconds) with SSD related
> errors in dmesg. System still exhibits errors eg. UI icons are not properly
> loaded, WiFi does not (always) connect.
I'm sorry to hear this has caused regressions. I don't yet know why your
particular problems have occurred, but I did notice last week that there
were some issues with the patch in question. I wrote a patch which I'll
append, and have started (but not completely finished) testing it.
Perhaps you could try it out and let me know how it goes?
> Is it possible to have this addressed/patched up/reverted before 6.16-rc1
> goes live and introduces the regression?
> It also appears this series was selected for backporting to 6.6, 6.12, 6.14,
> 6.15: perhaps this should be postponed/aborted until better solution is
> found?
Regarding stable backports: yes, please. It looks like Johan requested
holding this back on stable here:
https://lore.kernel.org/all/aELf3QmuEJOlR7Dv@hovoldconsulting.com/
Hopefully we can figure out a mainline solution promptly enough, but
revert is also OK if it comes down to it.
Below is a patch I'm working with so far. I can submit it as a separate
patch if that helps you.
Brian
---
Subject: [PATCH] genirq: Rebalance managed interrupts across multi-CPU hotplug
Commit 788019eb559f ("genirq: Retain disable depth for managed
interrupts across CPU hotplug") intended to only decrement the disable
depth once per managed shutdown, but instead it decrements for each CPU
hotplug in the affinity mask, until its depth reaches a point where it
finally gets re-started.
For example, consider:
1. Interrupt is affine to CPU {M,N}
2. disable_irq() -> depth is 1
3. CPU M goes offline -> interrupt migrates to CPU N / depth is still 1
4. CPU N goes offline -> irq_shutdown() / depth is 2
5. CPU N goes online
-> irq_restore_affinity_of_irq()
-> irqd_is_managed_and_shutdown()==true
-> irq_startup_managed() -> depth is 1
6. CPU M goes online
-> irq_restore_affinity_of_irq()
-> irqd_is_managed_and_shutdown()==true
-> irq_startup_managed() -> depth is 0
*** BUG: driver expects the interrupt is still disabled ***
-> irq_startup() -> irqd_clr_managed_shutdown()
7. enable_irq() -> depth underflow / unbalanced enable_irq() warning
We should clear the managed-shutdown flag at step 6, so that further
hotplugs don't cause further imbalance.
Fixes: commit 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug")
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
kernel/irq/chip.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b0e0a7332993..1af5fe14f3e0 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -175,8 +175,6 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
if (!irqd_affinity_is_managed(d))
return IRQ_STARTUP_NORMAL;
- irqd_clr_managed_shutdown(d);
-
if (!cpumask_intersects(aff, cpu_online_mask)) {
/*
* Catch code which fiddles with enable_irq() on a managed
@@ -205,12 +203,15 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
void irq_startup_managed(struct irq_desc *desc)
{
+ struct irq_data *d = irq_desc_get_irq_data(desc);
+
/*
* Only start it up when the disable depth is 1, so that a disable,
* hotunplug, hotplug sequence does not end up enabling it during
* hotplug unconditionally.
*/
desc->depth--;
+ irqd_clr_managed_shutdown(d);
if (!desc->depth)
irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
}
--
2.50.0.rc0.642.g800a2b2222-goog
next prev parent reply other threads:[~2025-06-09 17:13 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-14 20:13 [PATCH v2 0/2] genirq: Retain depth for managed IRQs across CPU hotplug Brian Norris
2025-05-14 20:13 ` [PATCH v2 1/2] " Brian Norris
2025-05-15 14:51 ` [tip: irq/core] genirq: Retain disable depth for managed interrupts " tip-bot2 for Brian Norris
2025-06-06 12:21 ` [PATCH v2 1/2] genirq: Retain depth for managed IRQs " Aleksandrs Vinarskis
2025-06-09 17:13 ` Brian Norris [this message]
2025-06-09 18:19 ` Aleksandrs Vinarskis
2025-06-10 20:07 ` Brian Norris
2025-06-11 6:50 ` Thomas Gleixner
2025-06-11 8:50 ` Thomas Gleixner
2025-06-11 18:51 ` Brian Norris
2025-06-11 6:56 ` Aleksandrs Vinarskis
2025-06-11 19:08 ` Brian Norris
2025-06-12 18:40 ` Brian Norris
2025-06-18 10:17 ` Johan Hovold
2025-06-18 17:10 ` Brian Norris
2025-06-19 8:32 ` Johan Hovold
2025-05-14 20:13 ` [PATCH v2 2/2] genirq: Add kunit tests for depth counts Brian Norris
2025-05-15 14:01 ` kernel test robot
2025-05-15 17:21 ` Brian Norris
2025-05-15 22:24 ` Thomas Gleixner
2025-05-15 14:51 ` [tip: irq/core] genirq: Add kunit tests for disable " tip-bot2 for Brian Norris
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aEcWTM3Y1roOf4Ph@google.com \
--to=briannorris@chromium.org \
--cc=alex.vinarskis@gmail.com \
--cc=danielsftsai@google.com \
--cc=dianders@chromium.org \
--cc=johan@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).