[PATCH v2 0/2] genirq: Retain depth for managed IRQs across CPU hotplug

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2 0/2] genirq: Retain depth for managed IRQs across CPU hotplug
@ 2025-05-14 20:13 Brian Norris
  2025-05-14 20:13 ` [PATCH v2 1/2] " Brian Norris
  2025-05-14 20:13 ` [PATCH v2 2/2] genirq: Add kunit tests for depth counts Brian Norris
  0 siblings, 2 replies; 21+ messages in thread
From: Brian Norris @ 2025-05-14 20:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tsai Sung-Fu, Douglas Anderson, linux-kernel, Brian Norris

v1:
 https://lore.kernel.org/lkml/20250513224402.864767-1-briannorris@chromium.org/
 [PATCH 0/2] genirq: Retain disable-depth across irq_{shutdown,startup}()

I'm seeing problems in a driver that:
(a) requests an affinity-managed IRQ (struct
    irq_affinity_desc::is_managed == 1);
(b) disables that IRQ (disable_irq()); and
(c) undergoes CPU hotplug for the affined CPU.

When we do the above, the genirq core leaves the IRQ in a different
state than it started -- the kernel IRQ is re-enabled after CPU hot
unplug/plug.

This problem seems to stem from the behavior of irq_shutdown() and
irq_shutdown(): that they assume they always run with an enabled IRQ,
and can simply set depth to 1 and 0 respectively.

I incorporate a fix suggested by Thomas Gleixner in patch 1, and provide
some new kunit test cases for this area in patch 2.

Side note: I understand my colleague has reported other issues related
to the same code:

  Subject: [PATCH] genirq/PM: Fix IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND if depth > 1
  https://lore.kernel.org/lkml/20250512173250.1.If5c00cf9f08732f4af5f104ae59b8785c7f69536@changeid/

We're addressing different problems, but they do happen to hit on some
of the same awkwardness in irq_startup(). These two patches would
probably need to be reconciled in some way.

Changes in v2:
 * Adapt Thomas Gleixner's alternative solution, to focus only on CPU
   hotplug cases
 * add request_irq()/disable_irq()/free_irq()/request_irq() test
   sequence
 * clean up more resources in tests
 * move tests to patch 2 (i.e., after bugs are fixed and tests pass)
 * adapt to irq_startup_managed() (new API)

Brian Norris (2):
  genirq: Retain depth for managed IRQs across CPU hotplug
  genirq: Add kunit tests for depth counts

 kernel/irq/Kconfig      |  11 ++
 kernel/irq/Makefile     |   1 +
 kernel/irq/chip.c       |  22 +++-
 kernel/irq/cpuhotplug.c |   2 +-
 kernel/irq/internals.h  |   1 +
 kernel/irq/irq_test.c   | 229 ++++++++++++++++++++++++++++++++++++++++
 6 files changed, 264 insertions(+), 2 deletions(-)
 create mode 100644 kernel/irq/irq_test.c

-- 
2.49.0.1045.g170613ef41-goog


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-05-14 20:13 [PATCH v2 0/2] genirq: Retain depth for managed IRQs across CPU hotplug Brian Norris
@ 2025-05-14 20:13 ` Brian Norris
  2025-05-15 14:51   ` [tip: irq/core] genirq: Retain disable depth for managed interrupts " tip-bot2 for Brian Norris
  2025-06-06 12:21   ` [PATCH v2 1/2] genirq: Retain depth for managed IRQs " Aleksandrs Vinarskis
  2025-05-14 20:13 ` [PATCH v2 2/2] genirq: Add kunit tests for depth counts Brian Norris
  1 sibling, 2 replies; 21+ messages in thread
From: Brian Norris @ 2025-05-14 20:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tsai Sung-Fu, Douglas Anderson, linux-kernel, Brian Norris

Affinity-managed IRQs may be shut down and restarted during CPU
hotunplug/plug, and the IRQ may be left in an unexpected state.
Specifically:

 1. IRQ affines to CPU N
 2. disable_irq() -> depth is 1
 3. CPU N goes offline
 4. irq_shutdown() -> depth is set to 1 (again)
 5. CPU N goes online
 6. irq_startup() -> depth is set to 0 (BUG! client expected IRQ is
    still disabled)
 7. enable_irq() -> depth underflow / unbalanced enable_irq() WARN

It seems depth only needs preserved for managed IRQs + CPU hotplug, so
per Thomas's recommendation, we make that explicit.

I add kunit tests that cover some of this in a following patch.

Signed-off-by: Brian Norris <briannorris@chromium.org>
Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
Thomas provided a better suggestion than my v1, without fully-formed
patch metadata. I've incorporated that as "Co-developed-by". Feel free to
suggest something different.

Changes in v2:
 * Adapt Thomas Gleixner's alternative solution, to focus only on CPU
   hotplug cases

 kernel/irq/chip.c       | 22 +++++++++++++++++++++-
 kernel/irq/cpuhotplug.c |  2 +-
 kernel/irq/internals.h  |  1 +
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 36cf1b09cc84..ab2bf0de3422 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -223,6 +223,19 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
 		return IRQ_STARTUP_ABORT;
 	return IRQ_STARTUP_MANAGED;
 }
+
+void irq_startup_managed(struct irq_desc *desc)
+{
+	/*
+	 * Only start it up when the disable depth is 1, so that a disable,
+	 * hotunplug, hotplug sequence does not end up enabling it during
+	 * hotplug unconditionally.
+	 */
+	desc->depth--;
+	if (!desc->depth)
+		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
+}
+
 #else
 static __always_inline int
 __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
@@ -290,6 +303,7 @@ int irq_startup(struct irq_desc *desc, bool resend, bool force)
 			ret = __irq_startup(desc);
 			break;
 		case IRQ_STARTUP_ABORT:
+			desc->depth = 1;
 			irqd_set_managed_shutdown(d);
 			return 0;
 		}
@@ -322,7 +336,13 @@ void irq_shutdown(struct irq_desc *desc)
 {
 	if (irqd_is_started(&desc->irq_data)) {
 		clear_irq_resend(desc);
-		desc->depth = 1;
+		/*
+		 * Increment disable depth, so that a managed shutdown on
+		 * CPU hotunplug preserves the actual disabled state when the
+		 * CPU comes back online. See irq_startup_managed().
+		 */
+		desc->depth++;
+
 		if (desc->irq_data.chip->irq_shutdown) {
 			desc->irq_data.chip->irq_shutdown(&desc->irq_data);
 			irq_state_set_disabled(desc);
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 15a7654eff68..3ed5b1592735 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -219,7 +219,7 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
 		return;
 
 	if (irqd_is_managed_and_shutdown(data))
-		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
+		irq_startup_managed(desc);
 
 	/*
 	 * If the interrupt can only be directed to a single target
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index b0290849c395..7111747ecb86 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -87,6 +87,7 @@ extern void __enable_irq(struct irq_desc *desc);
 extern int irq_activate(struct irq_desc *desc);
 extern int irq_activate_and_startup(struct irq_desc *desc, bool resend);
 extern int irq_startup(struct irq_desc *desc, bool resend, bool force);
+extern void irq_startup_managed(struct irq_desc *desc);
 
 extern void irq_shutdown(struct irq_desc *desc);
 extern void irq_shutdown_and_deactivate(struct irq_desc *desc);
-- 
2.49.0.1045.g170613ef41-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [tip: irq/core] genirq: Retain disable depth for managed interrupts across CPU hotplug
  2025-05-14 20:13 ` [PATCH v2 1/2] " Brian Norris
@ 2025-05-15 14:51   ` tip-bot2 for Brian Norris
  2025-06-06 12:21   ` [PATCH v2 1/2] genirq: Retain depth for managed IRQs " Aleksandrs Vinarskis
  1 sibling, 0 replies; 21+ messages in thread
From: tip-bot2 for Brian Norris @ 2025-05-15 14:51 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, Brian Norris, x86, linux-kernel, maz

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     788019eb559fd0b365f501467ceafce540e377cc
Gitweb:        https://git.kernel.org/tip/788019eb559fd0b365f501467ceafce540e377cc
Author:        Brian Norris <briannorris@chromium.org>
AuthorDate:    Wed, 14 May 2025 13:13:16 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 15 May 2025 16:44:25 +02:00

genirq: Retain disable depth for managed interrupts across CPU hotplug

Affinity-managed interrupts can be shut down and restarted during CPU
hotunplug/plug. Thereby the interrupt may be left in an unexpected state.
Specifically:

 1. Interrupt is affine to CPU N
 2. disable_irq() -> depth is 1
 3. CPU N goes offline
 4. irq_shutdown() -> depth is set to 1 (again)
 5. CPU N goes online
 6. irq_startup() -> depth is set to 0 (BUG! driver expects that the interrupt
    		     	      	        still disabled)
 7. enable_irq() -> depth underflow / unbalanced enable_irq() warning

This is only a problem for managed interrupts and CPU hotplug, all other
cases like request()/free()/request() truly needs to reset a possibly stale
disable depth value.

Provide a startup function, which takes the disable depth into account, and
invoked it for the managed interrupts in the CPU hotplug path.

This requires to change irq_shutdown() to do a depth increment instead of
setting it to 1, which allows to retain the disable depth, but is harmless
for the other code paths using irq_startup(), which will still reset the
disable depth unconditionally to keep the original correct behaviour.

A kunit tests will be added separately to cover some of these aspects.

[ tglx: Massaged changelog ]

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250514201353.3481400-2-briannorris@chromium.org
---
 kernel/irq/chip.c       | 22 +++++++++++++++++++++-
 kernel/irq/cpuhotplug.c |  2 +-
 kernel/irq/internals.h  |  1 +
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index 1d45c84..b0e0a73 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -202,6 +202,19 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
 		return IRQ_STARTUP_ABORT;
 	return IRQ_STARTUP_MANAGED;
 }
+
+void irq_startup_managed(struct irq_desc *desc)
+{
+	/*
+	 * Only start it up when the disable depth is 1, so that a disable,
+	 * hotunplug, hotplug sequence does not end up enabling it during
+	 * hotplug unconditionally.
+	 */
+	desc->depth--;
+	if (!desc->depth)
+		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
+}
+
 #else
 static __always_inline int
 __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
@@ -269,6 +282,7 @@ int irq_startup(struct irq_desc *desc, bool resend, bool force)
 			ret = __irq_startup(desc);
 			break;
 		case IRQ_STARTUP_ABORT:
+			desc->depth = 1;
 			irqd_set_managed_shutdown(d);
 			return 0;
 		}
@@ -301,7 +315,13 @@ void irq_shutdown(struct irq_desc *desc)
 {
 	if (irqd_is_started(&desc->irq_data)) {
 		clear_irq_resend(desc);
-		desc->depth = 1;
+		/*
+		 * Increment disable depth, so that a managed shutdown on
+		 * CPU hotunplug preserves the actual disabled state when the
+		 * CPU comes back online. See irq_startup_managed().
+		 */
+		desc->depth++;
+
 		if (desc->irq_data.chip->irq_shutdown) {
 			desc->irq_data.chip->irq_shutdown(&desc->irq_data);
 			irq_state_set_disabled(desc);
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index e77ca6d..f07529a 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -218,7 +218,7 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
 		return;
 
 	if (irqd_is_managed_and_shutdown(data))
-		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
+		irq_startup_managed(desc);
 
 	/*
 	 * If the interrupt can only be directed to a single target
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 476a20f..aebfe22 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -87,6 +87,7 @@ extern void __enable_irq(struct irq_desc *desc);
 extern int irq_activate(struct irq_desc *desc);
 extern int irq_activate_and_startup(struct irq_desc *desc, bool resend);
 extern int irq_startup(struct irq_desc *desc, bool resend, bool force);
+extern void irq_startup_managed(struct irq_desc *desc);
 
 extern void irq_shutdown(struct irq_desc *desc);
 extern void irq_shutdown_and_deactivate(struct irq_desc *desc);

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-05-14 20:13 ` [PATCH v2 1/2] " Brian Norris
  2025-05-15 14:51   ` [tip: irq/core] genirq: Retain disable depth for managed interrupts " tip-bot2 for Brian Norris
@ 2025-06-06 12:21   ` Aleksandrs Vinarskis
  2025-06-09 17:13     ` Brian Norris
  1 sibling, 1 reply; 21+ messages in thread
From: Aleksandrs Vinarskis @ 2025-06-06 12:21 UTC (permalink / raw)
  To: Brian Norris, Thomas Gleixner
  Cc: Tsai Sung-Fu, Douglas Anderson, linux-kernel, Johan Hovold


On 5/14/25 22:13, Brian Norris wrote:
> Affinity-managed IRQs may be shut down and restarted during CPU
> hotunplug/plug, and the IRQ may be left in an unexpected state.
> Specifically:
>
>   1. IRQ affines to CPU N
>   2. disable_irq() -> depth is 1
>   3. CPU N goes offline
>   4. irq_shutdown() -> depth is set to 1 (again)
>   5. CPU N goes online
>   6. irq_startup() -> depth is set to 0 (BUG! client expected IRQ is
>      still disabled)
>   7. enable_irq() -> depth underflow / unbalanced enable_irq() WARN
>
> It seems depth only needs preserved for managed IRQs + CPU hotplug, so
> per Thomas's recommendation, we make that explicit.
>
> I add kunit tests that cover some of this in a following patch.
>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> Co-developed-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>


Hi All,

It appears that this commit introduces a critical bug observed on at 
least some Qualcomm Snapdragon X1E/X1P laptops, rendering the suspend 
function unusable.

With this change in place, after successful suspend the device either:
1. Cannot wake up at all. Screen stays black, even though PM has existed 
suspend (observed by external LEDs controlled by PM)

2. Wakes up eventually after minutes (instead of seconds) with SSD 
related errors in dmesg. System still exhibits errors eg. UI icons are 
not properly loaded, WiFi does not (always) connect.

Example of SSD errors:
```

[   45.997238] PM: suspend exit
[   76.276320] nvme nvme0: I/O tag 320 (5140) QID 3 timeout, completion 
polled
[  104.945562] nvme nvme0: I/O tag 38 (9026) QID 2 timeout, completion 
polled
[  104.946170] nvme nvme0: I/O tag 147 (8093) QID 4 timeout, completion 
polled
[  106.354320] nvme nvme0: I/O tag 321 (3141) QID 3 timeout, completion 
polled
[  136.689693] nvme nvme0: I/O tag 322 (3142) QID 3 timeout, completion 
polled
[  141.428102] wlP4p1s0: authenticate with 50:64:2b:5f:e3:ba (local 
address=8c:3b:4a:a6:fa:f3)
[  141.428123] wlP4p1s0: send auth to 50:64:2b:5f:e3:ba (try 1/3)
[  141.433397] wlP4p1s0: authenticated
[  141.434303] wlP4p1s0: associate with 50:64:2b:5f:e3:ba (try 1/3)
[  141.438224] wlP4p1s0: RX AssocResp from 50:64:2b:5f:e3:ba 
(capab=0x1011 status=0 aid=2)
[  141.451828] wlP4p1s0: associated
[  149.984776] ath11k_pci 0004:01:00.0: msdu_done bit in attention is 
not set
[  160.635229] ath11k_pci 0004:01:00.0: msdu_done bit in attention is 
not set
[  165.873389] nvme nvme0: I/O tag 12 (500c) QID 7 timeout, completion 
polled
[  165.873478] nvme nvme0: I/O tag 526 (120e) QID 8 timeout, completion 
polled
[  166.026369] nvme nvme0: I/O tag 776 (a308) QID 5 timeout, completion 
polled
[  166.026406] nvme nvme0: I/O tag 704 (22c0) QID 6 timeout, completion 
polled
[  166.769312] nvme nvme0: I/O tag 128 (d080) QID 4 timeout, completion 
polled
[  166.858582] systemd-journald[452]: Time jumped backwards, rotating.
[  196.072359] nvme nvme0: I/O tag 45 (a02d) QID 2 timeout, completion 
polled
[  196.072429] nvme nvme0: I/O tag 778 (630a) QID 5 timeout, completion 
polled
[  196.072440] nvme nvme0: I/O tag 705 (82c1) QID 6 timeout, completion 
polled
[  196.904376] nvme nvme0: I/O tag 346 (215a) QID 3 timeout, completion 
polled
[  212.970816] nvme nvme0: I/O tag 129 (e081) QID 4 timeout, completion 
polled

```


This series was merged to linux-next on 20250516 introducing this bug.

Reverting commit 788019eb559fd0b3 ("genirq: Retain disable depth for 
managed interrupts across CPU hotplug") on either `next-20250516` or 
anything newer eg. `next-20250605` fixes the issue.

Tested on Dell XPS 9345 (Snaprdagon X1E-80-100), Asus Zenbook A14 
(Snapdragon X1-26-100).


Is it possible to have this addressed/patched up/reverted before 
6.16-rc1 goes live and introduces the regression?
It also appears this series was selected for backporting to 6.6, 6.12, 
6.14, 6.15:  perhaps this should be postponed/aborted until better 
solution is found?


Thanks in advance,

Alex

> ---
> Thomas provided a better suggestion than my v1, without fully-formed
> patch metadata. I've incorporated that as "Co-developed-by". Feel free to
> suggest something different.
>
> Changes in v2:
>   * Adapt Thomas Gleixner's alternative solution, to focus only on CPU
>     hotplug cases
>
>   kernel/irq/chip.c       | 22 +++++++++++++++++++++-
>   kernel/irq/cpuhotplug.c |  2 +-
>   kernel/irq/internals.h  |  1 +
>   3 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index 36cf1b09cc84..ab2bf0de3422 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -223,6 +223,19 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
>   		return IRQ_STARTUP_ABORT;
>   	return IRQ_STARTUP_MANAGED;
>   }
> +
> +void irq_startup_managed(struct irq_desc *desc)
> +{
> +	/*
> +	 * Only start it up when the disable depth is 1, so that a disable,
> +	 * hotunplug, hotplug sequence does not end up enabling it during
> +	 * hotplug unconditionally.
> +	 */
> +	desc->depth--;
> +	if (!desc->depth)
> +		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
> +}
> +
>   #else
>   static __always_inline int
>   __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
> @@ -290,6 +303,7 @@ int irq_startup(struct irq_desc *desc, bool resend, bool force)
>   			ret = __irq_startup(desc);
>   			break;
>   		case IRQ_STARTUP_ABORT:
> +			desc->depth = 1;
>   			irqd_set_managed_shutdown(d);
>   			return 0;
>   		}
> @@ -322,7 +336,13 @@ void irq_shutdown(struct irq_desc *desc)
>   {
>   	if (irqd_is_started(&desc->irq_data)) {
>   		clear_irq_resend(desc);
> -		desc->depth = 1;
> +		/*
> +		 * Increment disable depth, so that a managed shutdown on
> +		 * CPU hotunplug preserves the actual disabled state when the
> +		 * CPU comes back online. See irq_startup_managed().
> +		 */
> +		desc->depth++;
> +
>   		if (desc->irq_data.chip->irq_shutdown) {
>   			desc->irq_data.chip->irq_shutdown(&desc->irq_data);
>   			irq_state_set_disabled(desc);
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index 15a7654eff68..3ed5b1592735 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -219,7 +219,7 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
>   		return;
>   
>   	if (irqd_is_managed_and_shutdown(data))
> -		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
> +		irq_startup_managed(desc);
>   
>   	/*
>   	 * If the interrupt can only be directed to a single target
> diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
> index b0290849c395..7111747ecb86 100644
> --- a/kernel/irq/internals.h
> +++ b/kernel/irq/internals.h
> @@ -87,6 +87,7 @@ extern void __enable_irq(struct irq_desc *desc);
>   extern int irq_activate(struct irq_desc *desc);
>   extern int irq_activate_and_startup(struct irq_desc *desc, bool resend);
>   extern int irq_startup(struct irq_desc *desc, bool resend, bool force);
> +extern void irq_startup_managed(struct irq_desc *desc);
>   
>   extern void irq_shutdown(struct irq_desc *desc);
>   extern void irq_shutdown_and_deactivate(struct irq_desc *desc);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-06 12:21   ` [PATCH v2 1/2] genirq: Retain depth for managed IRQs " Aleksandrs Vinarskis
@ 2025-06-09 17:13     ` Brian Norris
  2025-06-09 18:19       ` Aleksandrs Vinarskis
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Norris @ 2025-06-09 17:13 UTC (permalink / raw)
  To: Aleksandrs Vinarskis
  Cc: Thomas Gleixner, Tsai Sung-Fu, Douglas Anderson, linux-kernel,
	Johan Hovold

Hi Alex,

On Fri, Jun 06, 2025 at 02:21:54PM +0200, Aleksandrs Vinarskis wrote:
> On 5/14/25 22:13, Brian Norris wrote:
> > Affinity-managed IRQs may be shut down and restarted during CPU
> > hotunplug/plug, and the IRQ may be left in an unexpected state.
> > Specifically:

[...]

> It appears that this commit introduces a critical bug observed on at least
> some Qualcomm Snapdragon X1E/X1P laptops, rendering the suspend function
> unusable.
> 
> With this change in place, after successful suspend the device either:
> 1. Cannot wake up at all. Screen stays black, even though PM has existed
> suspend (observed by external LEDs controlled by PM)
> 
> 2. Wakes up eventually after minutes (instead of seconds) with SSD related
> errors in dmesg. System still exhibits errors eg. UI icons are not properly
> loaded, WiFi does not (always) connect.

I'm sorry to hear this has caused regressions. I don't yet know why your
particular problems have occurred, but I did notice last week that there
were some issues with the patch in question. I wrote a patch which I'll
append, and have started (but not completely finished) testing it.
Perhaps you could try it out and let me know how it goes?

> Is it possible to have this addressed/patched up/reverted before 6.16-rc1
> goes live and introduces the regression?
> It also appears this series was selected for backporting to 6.6, 6.12, 6.14,
> 6.15:  perhaps this should be postponed/aborted until better solution is
> found?

Regarding stable backports: yes, please. It looks like Johan requested
holding this back on stable here:

https://lore.kernel.org/all/aELf3QmuEJOlR7Dv@hovoldconsulting.com/

Hopefully we can figure out a mainline solution promptly enough, but
revert is also OK if it comes down to it.

Below is a patch I'm working with so far. I can submit it as a separate
patch if that helps you.

Brian

---

Subject: [PATCH] genirq: Rebalance managed interrupts across multi-CPU hotplug

Commit 788019eb559f ("genirq: Retain disable depth for managed
interrupts across CPU hotplug") intended to only decrement the disable
depth once per managed shutdown, but instead it decrements for each CPU
hotplug in the affinity mask, until its depth reaches a point where it
finally gets re-started.

For example, consider:

1. Interrupt is affine to CPU {M,N}
2. disable_irq() -> depth is 1
3. CPU M goes offline -> interrupt migrates to CPU N / depth is still 1
4. CPU N goes offline -> irq_shutdown() / depth is 2
5. CPU N goes online
    -> irq_restore_affinity_of_irq()
       -> irqd_is_managed_and_shutdown()==true
          -> irq_startup_managed() -> depth is 1
6. CPU M goes online
    -> irq_restore_affinity_of_irq()
       -> irqd_is_managed_and_shutdown()==true
          -> irq_startup_managed() -> depth is 0
          *** BUG: driver expects the interrupt is still disabled ***
             -> irq_startup() -> irqd_clr_managed_shutdown()
7. enable_irq() -> depth underflow / unbalanced enable_irq() warning

We should clear the managed-shutdown flag at step 6, so that further
hotplugs don't cause further imbalance.

Fixes: commit 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug")
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
 kernel/irq/chip.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b0e0a7332993..1af5fe14f3e0 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -175,8 +175,6 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
 	if (!irqd_affinity_is_managed(d))
 		return IRQ_STARTUP_NORMAL;
 
-	irqd_clr_managed_shutdown(d);
-
 	if (!cpumask_intersects(aff, cpu_online_mask)) {
 		/*
 		 * Catch code which fiddles with enable_irq() on a managed
@@ -205,12 +203,15 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
 
 void irq_startup_managed(struct irq_desc *desc)
 {
+	struct irq_data *d = irq_desc_get_irq_data(desc);
+
 	/*
 	 * Only start it up when the disable depth is 1, so that a disable,
 	 * hotunplug, hotplug sequence does not end up enabling it during
 	 * hotplug unconditionally.
 	 */
 	desc->depth--;
+	irqd_clr_managed_shutdown(d);
 	if (!desc->depth)
 		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
 }
-- 
2.50.0.rc0.642.g800a2b2222-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-09 17:13     ` Brian Norris
@ 2025-06-09 18:19       ` Aleksandrs Vinarskis
  2025-06-10 20:07         ` Brian Norris
  0 siblings, 1 reply; 21+ messages in thread
From: Aleksandrs Vinarskis @ 2025-06-09 18:19 UTC (permalink / raw)
  To: Brian Norris
  Cc: Thomas Gleixner, Tsai Sung-Fu, Douglas Anderson, linux-kernel,
	Johan Hovold

On Mon, 9 Jun 2025 at 19:13, Brian Norris <briannorris@chromium.org> wrote:
>
> Hi Alex,
>
> On Fri, Jun 06, 2025 at 02:21:54PM +0200, Aleksandrs Vinarskis wrote:
> > On 5/14/25 22:13, Brian Norris wrote:
> > > Affinity-managed IRQs may be shut down and restarted during CPU
> > > hotunplug/plug, and the IRQ may be left in an unexpected state.
> > > Specifically:
>
> [...]
>
> > It appears that this commit introduces a critical bug observed on at least
> > some Qualcomm Snapdragon X1E/X1P laptops, rendering the suspend function
> > unusable.
> >
> > With this change in place, after successful suspend the device either:
> > 1. Cannot wake up at all. Screen stays black, even though PM has existed
> > suspend (observed by external LEDs controlled by PM)
> >
> > 2. Wakes up eventually after minutes (instead of seconds) with SSD related
> > errors in dmesg. System still exhibits errors eg. UI icons are not properly
> > loaded, WiFi does not (always) connect.
>
> I'm sorry to hear this has caused regressions. I don't yet know why your
> particular problems have occurred, but I did notice last week that there
> were some issues with the patch in question. I wrote a patch which I'll
> append, and have started (but not completely finished) testing it.
> Perhaps you could try it out and let me know how it goes?

Hi Brian,

I have tested your attached patch in addition to the original one, and
unfortunately it did not resolve the problem on either of the two
laptops: neither managed to wake up, just like before.
Will be happy to promptly test other proposed solutions.

Thanks,
Alex

>
> > Is it possible to have this addressed/patched up/reverted before 6.16-rc1
> > goes live and introduces the regression?
> > It also appears this series was selected for backporting to 6.6, 6.12, 6.14,
> > 6.15:  perhaps this should be postponed/aborted until better solution is
> > found?
>
> Regarding stable backports: yes, please. It looks like Johan requested
> holding this back on stable here:
>
> https://lore.kernel.org/all/aELf3QmuEJOlR7Dv@hovoldconsulting.com/
>
> Hopefully we can figure out a mainline solution promptly enough, but
> revert is also OK if it comes down to it.
>
> Below is a patch I'm working with so far. I can submit it as a separate
> patch if that helps you.
>
> Brian
>
> ---
>
> Subject: [PATCH] genirq: Rebalance managed interrupts across multi-CPU hotplug
>
> Commit 788019eb559f ("genirq: Retain disable depth for managed
> interrupts across CPU hotplug") intended to only decrement the disable
> depth once per managed shutdown, but instead it decrements for each CPU
> hotplug in the affinity mask, until its depth reaches a point where it
> finally gets re-started.
>
> For example, consider:
>
> 1. Interrupt is affine to CPU {M,N}
> 2. disable_irq() -> depth is 1
> 3. CPU M goes offline -> interrupt migrates to CPU N / depth is still 1
> 4. CPU N goes offline -> irq_shutdown() / depth is 2
> 5. CPU N goes online
>     -> irq_restore_affinity_of_irq()
>        -> irqd_is_managed_and_shutdown()==true
>           -> irq_startup_managed() -> depth is 1
> 6. CPU M goes online
>     -> irq_restore_affinity_of_irq()
>        -> irqd_is_managed_and_shutdown()==true
>           -> irq_startup_managed() -> depth is 0
>           *** BUG: driver expects the interrupt is still disabled ***
>              -> irq_startup() -> irqd_clr_managed_shutdown()
> 7. enable_irq() -> depth underflow / unbalanced enable_irq() warning
>
> We should clear the managed-shutdown flag at step 6, so that further
> hotplugs don't cause further imbalance.
>
> Fixes: commit 788019eb559f ("genirq: Retain disable depth for managed interrupts across CPU hotplug")
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> ---
>  kernel/irq/chip.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index b0e0a7332993..1af5fe14f3e0 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -175,8 +175,6 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
>         if (!irqd_affinity_is_managed(d))
>                 return IRQ_STARTUP_NORMAL;
>
> -       irqd_clr_managed_shutdown(d);
> -
>         if (!cpumask_intersects(aff, cpu_online_mask)) {
>                 /*
>                  * Catch code which fiddles with enable_irq() on a managed
> @@ -205,12 +203,15 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
>
>  void irq_startup_managed(struct irq_desc *desc)
>  {
> +       struct irq_data *d = irq_desc_get_irq_data(desc);
> +
>         /*
>          * Only start it up when the disable depth is 1, so that a disable,
>          * hotunplug, hotplug sequence does not end up enabling it during
>          * hotplug unconditionally.
>          */
>         desc->depth--;
> +       irqd_clr_managed_shutdown(d);
>         if (!desc->depth)
>                 irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
>  }
> --
> 2.50.0.rc0.642.g800a2b2222-goog
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-09 18:19       ` Aleksandrs Vinarskis
@ 2025-06-10 20:07         ` Brian Norris
  2025-06-11  6:50           ` Thomas Gleixner
  2025-06-11  6:56           ` Aleksandrs Vinarskis
  0 siblings, 2 replies; 21+ messages in thread
From: Brian Norris @ 2025-06-10 20:07 UTC (permalink / raw)
  To: Aleksandrs Vinarskis
  Cc: Thomas Gleixner, Tsai Sung-Fu, Douglas Anderson, linux-kernel,
	Johan Hovold

Hi Alex,

On Mon, Jun 09, 2025 at 08:19:58PM +0200, Aleksandrs Vinarskis wrote:
> On Mon, 9 Jun 2025 at 19:13, Brian Norris <briannorris@chromium.org> wrote:
> > On Fri, Jun 06, 2025 at 02:21:54PM +0200, Aleksandrs Vinarskis wrote:
> > > It appears that this commit introduces a critical bug observed on at least
> > > some Qualcomm Snapdragon X1E/X1P laptops, rendering the suspend function
> > > unusable.

For my reference, are these laptops represented by the
arch/arm64/boot/dts/qcom/x1e80100.dtsi family of device trees? I'm just
trying to reason through what sorts of driver(s) are in use here, in
case there's something I'm overlooking, as I don't have the laptop in
question to test.

> > > With this change in place, after successful suspend the device either:
> > > 1. Cannot wake up at all. Screen stays black, even though PM has existed
> > > suspend (observed by external LEDs controlled by PM)
> > >
> > > 2. Wakes up eventually after minutes (instead of seconds) with SSD related
> > > errors in dmesg. System still exhibits errors eg. UI icons are not properly
> > > loaded, WiFi does not (always) connect.

FYI, my assumption here based on the log snippets and the patch in
question is that "only" the NVMe driver's IRQs are getting b0rked by my
change. I could imagine that would produce the above symptoms in most
laptop configurations, because failing disk I/O will likely block most
wakeup-related activity, and cause all sorts of UI and system daemon
(e.g., WiFi supplicant) misbehavior.

> > I'm sorry to hear this has caused regressions. I don't yet know why your
> > particular problems have occurred, but I did notice last week that there
> > were some issues with the patch in question. I wrote a patch which I'll
> > append, and have started (but not completely finished) testing it.
> > Perhaps you could try it out and let me know how it goes?
> 
> Hi Brian,
> 
> I have tested your attached patch in addition to the original one, and
> unfortunately it did not resolve the problem on either of the two
> laptops: neither managed to wake up, just like before.
> Will be happy to promptly test other proposed solutions.

Thanks for the testing. I've found a few problems with my proposed
patch, and I've come up with the appended alternative that solves them.
Could you give it a try?

Also, if it's not too much trouble (and especially if my patch still
doesn't help you), could you also provide a more complete kernel log and
kernel .config file? (Attachment is fine with me. Or a direct email, if
somehow the lists don't like it.) It's possible that would give me more
hints as to what's going wrong for you.

Thanks,
Brian

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index b0e0a7332993..3e873c5ce623 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -205,12 +205,15 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
 
 void irq_startup_managed(struct irq_desc *desc)
 {
+	struct irq_data *d = irq_desc_get_irq_data(desc);
+
 	/*
 	 * Only start it up when the disable depth is 1, so that a disable,
 	 * hotunplug, hotplug sequence does not end up enabling it during
 	 * hotplug unconditionally.
 	 */
 	desc->depth--;
+	irqd_clr_managed_shutdown(d);
 	if (!desc->depth)
 		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
 }
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index f07529ae4895..755346ea9819 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -210,13 +210,6 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
 	    !irq_data_get_irq_chip(data) || !cpumask_test_cpu(cpu, affinity))
 		return;
 
-	/*
-	 * Don't restore suspended interrupts here when a system comes back
-	 * from S3. They are reenabled via resume_device_irqs().
-	 */
-	if (desc->istate & IRQS_SUSPENDED)
-		return;
-
 	if (irqd_is_managed_and_shutdown(data))
 		irq_startup_managed(desc);
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-10 20:07         ` Brian Norris
@ 2025-06-11  6:50           ` Thomas Gleixner
  2025-06-11  8:50             ` Thomas Gleixner
  2025-06-11  6:56           ` Aleksandrs Vinarskis
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2025-06-11  6:50 UTC (permalink / raw)
  To: Brian Norris, Aleksandrs Vinarskis
  Cc: Tsai Sung-Fu, Douglas Anderson, linux-kernel, Johan Hovold

On Tue, Jun 10 2025 at 13:07, Brian Norris wrote:
> On Mon, Jun 09, 2025 at 08:19:58PM +0200, Aleksandrs Vinarskis wrote:
>  
>  void irq_startup_managed(struct irq_desc *desc)
>  {
> +	struct irq_data *d = irq_desc_get_irq_data(desc);
> +
>  	/*
>  	 * Only start it up when the disable depth is 1, so that a disable,
>  	 * hotunplug, hotplug sequence does not end up enabling it during
>  	 * hotplug unconditionally.
>  	 */
>  	desc->depth--;
> +	irqd_clr_managed_shutdown(d);

If depth > 0, then it's still shutdown and the subsequent enable
operation which brings it down to 0 will take care of it. So what are
you trying to solve here?

>  	if (!desc->depth)
>  		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
>  }
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index f07529ae4895..755346ea9819 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -210,13 +210,6 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
>  	    !irq_data_get_irq_chip(data) || !cpumask_test_cpu(cpu, affinity))
>  		return;
>  
> -	/*
> -	 * Don't restore suspended interrupts here when a system comes back
> -	 * from S3. They are reenabled via resume_device_irqs().
> -	 */
> -	if (desc->istate & IRQS_SUSPENDED)
> -		return;
> -

Huch? Care to read:

a60dd06af674 ("genirq/cpuhotplug: Skip suspended interrupts when restoring affinity")

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-11  6:50           ` Thomas Gleixner
@ 2025-06-11  8:50             ` Thomas Gleixner
  2025-06-11 18:51               ` Brian Norris
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2025-06-11  8:50 UTC (permalink / raw)
  To: Brian Norris, Aleksandrs Vinarskis
  Cc: Tsai Sung-Fu, Douglas Anderson, linux-kernel, Johan Hovold

On Wed, Jun 11 2025 at 08:50, Thomas Gleixner wrote:
> On Tue, Jun 10 2025 at 13:07, Brian Norris wrote:
>> On Mon, Jun 09, 2025 at 08:19:58PM +0200, Aleksandrs Vinarskis wrote:
>>  
>>  void irq_startup_managed(struct irq_desc *desc)
>>  {
>> +	struct irq_data *d = irq_desc_get_irq_data(desc);
>> +
>>  	/*
>>  	 * Only start it up when the disable depth is 1, so that a disable,
>>  	 * hotunplug, hotplug sequence does not end up enabling it during
>>  	 * hotplug unconditionally.
>>  	 */
>>  	desc->depth--;
>> +	irqd_clr_managed_shutdown(d);
>
> If depth > 0, then it's still shutdown and the subsequent enable
> operation which brings it down to 0 will take care of it. So what are
> you trying to solve here?

I found the previous version which has an explanation for this. That
makes sense. You really want this to be:

	struct irq_data *d = irq_desc_get_irq_data(desc);

        /* Proper comment */
	irqd_clr_managed_shutdown(d);

  	/*
  	 * Only start it up when the disable depth is 1, so that a disable,
  	 * hotunplug, hotplug sequence does not end up enabling it during
  	 * hotplug unconditionally.
  	 */
  	desc->depth--;

...

>>  	if (!desc->depth)
>>  		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
>>  }
>> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
>> index f07529ae4895..755346ea9819 100644
>> --- a/kernel/irq/cpuhotplug.c
>> +++ b/kernel/irq/cpuhotplug.c
>> @@ -210,13 +210,6 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
>>  	    !irq_data_get_irq_chip(data) || !cpumask_test_cpu(cpu, affinity))
>>  		return;
>>  
>> -	/*
>> -	 * Don't restore suspended interrupts here when a system comes back
>> -	 * from S3. They are reenabled via resume_device_irqs().
>> -	 */
>> -	if (desc->istate & IRQS_SUSPENDED)
>> -		return;
>> -
>
> Huch? Care to read:
>
> a60dd06af674 ("genirq/cpuhotplug: Skip suspended interrupts when restoring affinity")

Never mind. After staring long enough at it, this should work because
suspend increments depth and shutdown does too.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-11  8:50             ` Thomas Gleixner
@ 2025-06-11 18:51               ` Brian Norris
  0 siblings, 0 replies; 21+ messages in thread
From: Brian Norris @ 2025-06-11 18:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Aleksandrs Vinarskis, Tsai Sung-Fu, Douglas Anderson,
	linux-kernel, Johan Hovold

Hi Thomas,

On Wed, Jun 11, 2025 at 10:50:32AM +0200, Thomas Gleixner wrote:
> On Wed, Jun 11 2025 at 08:50, Thomas Gleixner wrote:
> > On Tue, Jun 10 2025 at 13:07, Brian Norris wrote:
> >> On Mon, Jun 09, 2025 at 08:19:58PM +0200, Aleksandrs Vinarskis wrote:
> >>  
> >>  void irq_startup_managed(struct irq_desc *desc)
> >>  {
> >> +	struct irq_data *d = irq_desc_get_irq_data(desc);
> >> +
> >>  	/*
> >>  	 * Only start it up when the disable depth is 1, so that a disable,
> >>  	 * hotunplug, hotplug sequence does not end up enabling it during
> >>  	 * hotplug unconditionally.
> >>  	 */
> >>  	desc->depth--;
> >> +	irqd_clr_managed_shutdown(d);
> >
> > If depth > 0, then it's still shutdown and the subsequent enable
> > operation which brings it down to 0 will take care of it. So what are
> > you trying to solve here?
> 
> I found the previous version which has an explanation for this. That

Right, I didn't update and carry the explanations here, as I figured I'd
format this all as proper patches after getting Alex's testing
feedbabck. (I think I'll split to two patches, since there are two
distinct bugs I'm fixing in here.)

> makes sense. You really want this to be:
> 
> 	struct irq_data *d = irq_desc_get_irq_data(desc);
> 
>         /* Proper comment */
> 	irqd_clr_managed_shutdown(d);

Ack, will add a comment.

>   	/*
>   	 * Only start it up when the disable depth is 1, so that a disable,
>   	 * hotunplug, hotplug sequence does not end up enabling it during
>   	 * hotplug unconditionally.
>   	 */
>   	desc->depth--;
> 
> ...
> 
> >>  	if (!desc->depth)
> >>  		irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
> >>  }
> >> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> >> index f07529ae4895..755346ea9819 100644
> >> --- a/kernel/irq/cpuhotplug.c
> >> +++ b/kernel/irq/cpuhotplug.c
> >> @@ -210,13 +210,6 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
> >>  	    !irq_data_get_irq_chip(data) || !cpumask_test_cpu(cpu, affinity))
> >>  		return;
> >>  
> >> -	/*
> >> -	 * Don't restore suspended interrupts here when a system comes back
> >> -	 * from S3. They are reenabled via resume_device_irqs().
> >> -	 */
> >> -	if (desc->istate & IRQS_SUSPENDED)
> >> -		return;
> >> -
> >
> > Huch? Care to read:
> >
> > a60dd06af674 ("genirq/cpuhotplug: Skip suspended interrupts when restoring affinity")
> 
> Never mind. After staring long enough at it, this should work because
> suspend increments depth and shutdown does too.

Yeah, that one definitely deserves an explanation :)

I wrote out a commit message already, but didn't include it here yet, as
I was only looking for testing. Sorry to have sent you staring so long
at it.

One snippet:

    This effectively reverts commit a60dd06af674 ("genirq/cpuhotplug: Skip
    suspended interrupts when restoring affinity"), because it is replaced
    by commit 788019eb559f ("genirq: Retain disable depth for managed
    interrupts across CPU hotplug").

IOW, the depth-tracking we added accomplishes the same goal as commit
a60dd06af674, but including both causes further bugs/imbalances.

Brian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-10 20:07         ` Brian Norris
  2025-06-11  6:50           ` Thomas Gleixner
@ 2025-06-11  6:56           ` Aleksandrs Vinarskis
  2025-06-11 19:08             ` Brian Norris
  1 sibling, 1 reply; 21+ messages in thread
From: Aleksandrs Vinarskis @ 2025-06-11  6:56 UTC (permalink / raw)
  To: Brian Norris
  Cc: Thomas Gleixner, Tsai Sung-Fu, Douglas Anderson, linux-kernel,
	Johan Hovold

On Tue, 10 Jun 2025 at 22:07, Brian Norris <briannorris@chromium.org> wrote:
>
> Hi Alex,
>
> On Mon, Jun 09, 2025 at 08:19:58PM +0200, Aleksandrs Vinarskis wrote:
> > On Mon, 9 Jun 2025 at 19:13, Brian Norris <briannorris@chromium.org> wrote:
> > > On Fri, Jun 06, 2025 at 02:21:54PM +0200, Aleksandrs Vinarskis wrote:
> > > > It appears that this commit introduces a critical bug observed on at least
> > > > some Qualcomm Snapdragon X1E/X1P laptops, rendering the suspend function
> > > > unusable.
>
> For my reference, are these laptops represented by the
> arch/arm64/boot/dts/qcom/x1e80100.dtsi family of device trees? I'm just
> trying to reason through what sorts of driver(s) are in use here, in
> case there's something I'm overlooking, as I don't have the laptop in
> question to test.

Hi,

Yes. Dell XPS 9345 is arch/arm64/boot/dts/qcom/x1e80100.dtsi based,
and Asus Zenbook A14 is arch/arm64/boot/dts/qcom/x1p42100.dtsi based,
which is a derivative but has a slightly different PCIe setup. So far
both laptops would behave in the same ways.

>
> > > > With this change in place, after successful suspend the device either:
> > > > 1. Cannot wake up at all. Screen stays black, even though PM has existed
> > > > suspend (observed by external LEDs controlled by PM)
> > > >
> > > > 2. Wakes up eventually after minutes (instead of seconds) with SSD related
> > > > errors in dmesg. System still exhibits errors eg. UI icons are not properly
> > > > loaded, WiFi does not (always) connect.
>
> FYI, my assumption here based on the log snippets and the patch in
> question is that "only" the NVMe driver's IRQs are getting b0rked by my
> change. I could imagine that would produce the above symptoms in most
> laptop configurations, because failing disk I/O will likely block most
> wakeup-related activity, and cause all sorts of UI and system daemon
> (e.g., WiFi supplicant) misbehavior.
>
> > > I'm sorry to hear this has caused regressions. I don't yet know why your
> > > particular problems have occurred, but I did notice last week that there
> > > were some issues with the patch in question. I wrote a patch which I'll
> > > append, and have started (but not completely finished) testing it.
> > > Perhaps you could try it out and let me know how it goes?
> >
> > Hi Brian,
> >
> > I have tested your attached patch in addition to the original one, and
> > unfortunately it did not resolve the problem on either of the two
> > laptops: neither managed to wake up, just like before.
> > Will be happy to promptly test other proposed solutions.
>
> Thanks for the testing. I've found a few problems with my proposed
> patch, and I've come up with the appended alternative that solves them.
> Could you give it a try?

Just tested, and it appears to solve it, though I see some errors on
wakeup that I don't remember seeing before. I will test-drive this
setup for a day to provide better feedback and confirm if it is
related to the fixup or not.

>
> Also, if it's not too much trouble (and especially if my patch still
> doesn't help you), could you also provide a more complete kernel log and
> kernel .config file? (Attachment is fine with me. Or a direct email, if
> somehow the lists don't like it.) It's possible that would give me more
> hints as to what's going wrong for you.

I will share the logs with and without the fixup by private email
attachment in a bit.

Thanks for looking into this,
Alex


>
> Thanks,
> Brian
>
> diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
> index b0e0a7332993..3e873c5ce623 100644
> --- a/kernel/irq/chip.c
> +++ b/kernel/irq/chip.c
> @@ -205,12 +205,15 @@ __irq_startup_managed(struct irq_desc *desc, const struct cpumask *aff,
>
>  void irq_startup_managed(struct irq_desc *desc)
>  {
> +       struct irq_data *d = irq_desc_get_irq_data(desc);
> +
>         /*
>          * Only start it up when the disable depth is 1, so that a disable,
>          * hotunplug, hotplug sequence does not end up enabling it during
>          * hotplug unconditionally.
>          */
>         desc->depth--;
> +       irqd_clr_managed_shutdown(d);
>         if (!desc->depth)
>                 irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
>  }
> diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
> index f07529ae4895..755346ea9819 100644
> --- a/kernel/irq/cpuhotplug.c
> +++ b/kernel/irq/cpuhotplug.c
> @@ -210,13 +210,6 @@ static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int cpu)
>             !irq_data_get_irq_chip(data) || !cpumask_test_cpu(cpu, affinity))
>                 return;
>
> -       /*
> -        * Don't restore suspended interrupts here when a system comes back
> -        * from S3. They are reenabled via resume_device_irqs().
> -        */
> -       if (desc->istate & IRQS_SUSPENDED)
> -               return;
> -
>         if (irqd_is_managed_and_shutdown(data))
>                 irq_startup_managed(desc);
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-11  6:56           ` Aleksandrs Vinarskis
@ 2025-06-11 19:08             ` Brian Norris
  2025-06-12 18:40               ` Brian Norris
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Norris @ 2025-06-11 19:08 UTC (permalink / raw)
  To: Aleksandrs Vinarskis
  Cc: Thomas Gleixner, Tsai Sung-Fu, Douglas Anderson, linux-kernel,
	Johan Hovold

Hi Alex,

On Wed, Jun 11, 2025 at 08:56:40AM +0200, Aleksandrs Vinarskis wrote:
> Yes. Dell XPS 9345 is arch/arm64/boot/dts/qcom/x1e80100.dtsi based,
> and Asus Zenbook A14 is arch/arm64/boot/dts/qcom/x1p42100.dtsi based,
> which is a derivative but has a slightly different PCIe setup. So far
> both laptops would behave in the same ways.

Thanks. So that's what I suspected, a DWC/pcie-qcom PCIe driver, and
seemingly standard NVMe on top. pcie-qcom doesn't seem to do anything
weird regarding MSIs or affinity, so I wonder why I can't reproduce the
same symptoms on other NVMe setups so far. I can see some of the NVMe
queue MSIs left disabled (queue 0 / CPU 0 doesn't hit the same bugs,
since we don't hotplug CPU 0), but operations seem to function OK even
when missing a few queues. Maybe that's an implementation-specific
behavior that exhibits differently depending on the exact disk in
question.

Anyway, I think we've probably honed in on the bugs, so this might just
be a curiosity.

> > Thanks for the testing. I've found a few problems with my proposed
> > patch, and I've come up with the appended alternative that solves them.
> > Could you give it a try?
> 
> Just tested, and it appears to solve it, though I see some errors on
> wakeup that I don't remember seeing before. I will test-drive this
> setup for a day to provide better feedback and confirm if it is
> related to the fixup or not.

That's promising, I think. Do feel free to forward info if you think
there's still a problem though. I'll await your feedback before spinning
patches.

Brian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-11 19:08             ` Brian Norris
@ 2025-06-12 18:40               ` Brian Norris
  2025-06-18 10:17                 ` Johan Hovold
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Norris @ 2025-06-12 18:40 UTC (permalink / raw)
  To: Aleksandrs Vinarskis
  Cc: Thomas Gleixner, Tsai Sung-Fu, Douglas Anderson, linux-kernel,
	Johan Hovold

On Wed, Jun 11, 2025 at 12:08:16PM -0700, Brian Norris wrote:
> On Wed, Jun 11, 2025 at 08:56:40AM +0200, Aleksandrs Vinarskis wrote:
> > Yes. Dell XPS 9345 is arch/arm64/boot/dts/qcom/x1e80100.dtsi based,
> > and Asus Zenbook A14 is arch/arm64/boot/dts/qcom/x1p42100.dtsi based,
> > which is a derivative but has a slightly different PCIe setup. So far
> > both laptops would behave in the same ways.
> 
> Thanks. So that's what I suspected, a DWC/pcie-qcom PCIe driver, and
> seemingly standard NVMe on top. pcie-qcom doesn't seem to do anything
> weird regarding MSIs or affinity, [...]

For the record, I was reminded that DWC/pcie-qcom does not, in fact,
support irq_chip::irq_set_affinity(), which could perhaps be a unique
factor in his systems' behavior.

> > > Thanks for the testing. I've found a few problems with my proposed
> > > patch, and I've come up with the appended alternative that solves them.
> > > Could you give it a try?
> > 
> > Just tested, and it appears to solve it, though I see some errors on
> > wakeup that I don't remember seeing before. I will test-drive this
> > setup for a day to provide better feedback and confirm if it is
> > related to the fixup or not.
> 
> That's promising, I think. Do feel free to forward info if you think
> there's still a problem though. I'll await your feedback before spinning
> patches.

Alex sent some private feedback, and from what I could tell, there was
nothing concerning. The "new" errors are simply about a wakeup attempt
interrupting the CPU offlining process, which I believe is normal
behavior depending on the wakeup actvity on his laptop (e.g., input
devices).

I've submitted my fixes here:

Subject: [PATCH 6.16 0/2] genirq: Fixes for CPU hotplug / disable-depth regressions
https://lore.kernel.org/lkml/20250612183303.3433234-1-briannorris@chromium.org/

Brian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-12 18:40               ` Brian Norris
@ 2025-06-18 10:17                 ` Johan Hovold
  2025-06-18 17:10                   ` Brian Norris
  0 siblings, 1 reply; 21+ messages in thread
From: Johan Hovold @ 2025-06-18 10:17 UTC (permalink / raw)
  To: Brian Norris
  Cc: Aleksandrs Vinarskis, Thomas Gleixner, Tsai Sung-Fu,
	Douglas Anderson, linux-kernel

On Thu, Jun 12, 2025 at 11:40:38AM -0700, Brian Norris wrote:
> On Wed, Jun 11, 2025 at 12:08:16PM -0700, Brian Norris wrote:
> > On Wed, Jun 11, 2025 at 08:56:40AM +0200, Aleksandrs Vinarskis wrote:
> > > Yes. Dell XPS 9345 is arch/arm64/boot/dts/qcom/x1e80100.dtsi based,
> > > and Asus Zenbook A14 is arch/arm64/boot/dts/qcom/x1p42100.dtsi based,
> > > which is a derivative but has a slightly different PCIe setup. So far
> > > both laptops would behave in the same ways.
> > 
> > Thanks. So that's what I suspected, a DWC/pcie-qcom PCIe driver, and
> > seemingly standard NVMe on top. pcie-qcom doesn't seem to do anything
> > weird regarding MSIs or affinity, [...]
> 
> For the record, I was reminded that DWC/pcie-qcom does not, in fact,
> support irq_chip::irq_set_affinity(), which could perhaps be a unique
> factor in his systems' behavior.

No, we use the GIC ITS for the NVMe interrupts on X1E so that should
not be involved here.

Johan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-18 10:17                 ` Johan Hovold
@ 2025-06-18 17:10                   ` Brian Norris
  2025-06-19  8:32                     ` Johan Hovold
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Norris @ 2025-06-18 17:10 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Aleksandrs Vinarskis, Thomas Gleixner, Tsai Sung-Fu,
	Douglas Anderson, linux-kernel

On Wed, Jun 18, 2025 at 12:17:17PM +0200, Johan Hovold wrote:
> On Thu, Jun 12, 2025 at 11:40:38AM -0700, Brian Norris wrote:
> > For the record, I was reminded that DWC/pcie-qcom does not, in fact,
> > support irq_chip::irq_set_affinity(), which could perhaps be a unique
> > factor in his systems' behavior.
> 
> No, we use the GIC ITS for the NVMe interrupts on X1E so that should
> not be involved here.

Huh, interesting callout. I had looked at trying that previously on
another DWC-based platform, for the same reasons noted in that
switchover.

Anyway, all I can tell you is that Alex's logs clearly showed:

  set affinity failed(-22)

for what looked like the NVMe interrupts during CPU unplug / migration.

Brian

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 1/2] genirq: Retain depth for managed IRQs across CPU hotplug
  2025-06-18 17:10                   ` Brian Norris
@ 2025-06-19  8:32                     ` Johan Hovold
  0 siblings, 0 replies; 21+ messages in thread
From: Johan Hovold @ 2025-06-19  8:32 UTC (permalink / raw)
  To: Brian Norris
  Cc: Aleksandrs Vinarskis, Thomas Gleixner, Tsai Sung-Fu,
	Douglas Anderson, linux-kernel

On Wed, Jun 18, 2025 at 10:10:14AM -0700, Brian Norris wrote:
> On Wed, Jun 18, 2025 at 12:17:17PM +0200, Johan Hovold wrote:
> > On Thu, Jun 12, 2025 at 11:40:38AM -0700, Brian Norris wrote:
> > > For the record, I was reminded that DWC/pcie-qcom does not, in fact,
> > > support irq_chip::irq_set_affinity(), which could perhaps be a unique
> > > factor in his systems' behavior.
> > 
> > No, we use the GIC ITS for the NVMe interrupts on X1E so that should
> > not be involved here.
> 
> Huh, interesting callout. I had looked at trying that previously on
> another DWC-based platform, for the same reasons noted in that
> switchover.
> 
> Anyway, all I can tell you is that Alex's logs clearly showed:
> 
>   set affinity failed(-22)
> 
> for what looked like the NVMe interrupts during CPU unplug / migration.

No, those warnings are for wakeup enabled GPIO interrupts (e.g. the lid
switch), which are currently partly disabled on this platform pending
some rework of the PDC driver:

	602cb14e310a ("pinctrl: qcom: x1e80100: Bypass PDC wakeup parent for now")

(And with the rework those warnings go away, while the IRQ suspend
regression is still there.)

Johan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2 2/2] genirq: Add kunit tests for depth counts
  2025-05-14 20:13 [PATCH v2 0/2] genirq: Retain depth for managed IRQs across CPU hotplug Brian Norris
  2025-05-14 20:13 ` [PATCH v2 1/2] " Brian Norris
@ 2025-05-14 20:13 ` Brian Norris
  2025-05-15 14:01   ` kernel test robot
  2025-05-15 14:51   ` [tip: irq/core] genirq: Add kunit tests for disable " tip-bot2 for Brian Norris
  1 sibling, 2 replies; 21+ messages in thread
From: Brian Norris @ 2025-05-14 20:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Tsai Sung-Fu, Douglas Anderson, linux-kernel, Brian Norris

There have been a few bugs and/or misunderstandings about the reference
counting, and startup/shutdown behaviors in the IRQ core and related CPU
hotplug code. These 4 test cases try to capture a few interesting cases.

* irq_disable_depth_test: basic request/disable/enable sequence

* irq_free_disabled_test: request/disable/free/re-request sequence -
  this catches errors on previous revisions of my work

* irq_cpuhotplug_test: exercises managed-affinity IRQ + CPU hotplug.
  This captures a problematic test case that I've fixed.
  This test requires CONFIG_SMP and a hotpluggable CPU#1.

* irq_shutdown_depth_test: exercises similar behavior from
  irq_cpuhotplug_test, but directly using irq_*() APIs instead of going
  through CPU hotplug. This still requires CONFIG_SMP, because
  managed-affinity is stubbed out (and not all APIs are even present)
  without it.

Note the use of 'imply SMP': ARCH=um doesn't support SMP, and kunit is
often exercised there. Thus, 'imply' will force SMP on where possible
(such as ARCH=x86_64), but leave it off where it's not.

Behavior on various SMP and ARCH configurations:

  $ tools/testing/kunit/kunit.py run 'irq_test_cases*' --arch x86_64 --qemu_args '-smp 2'
  [...]
  [11:12:24] Testing complete. Ran 4 tests: passed: 4

  $ tools/testing/kunit/kunit.py run 'irq_test_cases*' --arch x86_64
  [...]
  [11:13:27] [SKIPPED] irq_cpuhotplug_test
  [11:13:27] ================= [PASSED] irq_test_cases ==================
  [11:13:27] ============================================================
  [11:13:27] Testing complete. Ran 4 tests: passed: 3, skipped: 1

  # default: ARCH=um
  $ tools/testing/kunit/kunit.py run 'irq_test_cases*'
  [11:14:26] [SKIPPED] irq_shutdown_depth_test
  [11:14:26] [SKIPPED] irq_cpuhotplug_test
  [11:14:26] ================= [PASSED] irq_test_cases ==================
  [11:14:26] ============================================================
  [11:14:26] Testing complete. Ran 4 tests: passed: 2, skipped: 2

Without the prior fix ("genirq: Retain depth for managed IRQs across CPU
hotplug"), we fail as follows:

  [11:18:55] =============== irq_test_cases (4 subtests) ================
  [11:18:55] [PASSED] irq_disable_depth_test
  [11:18:55] [PASSED] irq_free_disabled_test
  [11:18:55]     # irq_shutdown_depth_test: EXPECTATION FAILED at kernel/irq/irq_test.c:147
  [11:18:55]     Expected desc->depth == 1, but
  [11:18:55]         desc->depth == 0 (0x0)
  [11:18:55] ------------[ cut here ]------------
  [11:18:55] Unbalanced enable for IRQ 26
  [11:18:55] WARNING: CPU: 1 PID: 36 at kernel/irq/manage.c:792 __enable_irq+0x36/0x60
  ...
  [11:18:55] [FAILED] irq_shutdown_depth_test
  [11:18:55]  #1
  [11:18:55]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:202
  [11:18:55]     Expected irqd_is_activated(data) to be false, but is true
  [11:18:55]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:203
  [11:18:55]     Expected irqd_is_started(data) to be false, but is true
  [11:18:55]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:204
  [11:18:55]     Expected desc->depth == 1, but
  [11:18:55]         desc->depth == 0 (0x0)
  [11:18:55] ------------[ cut here ]------------
  [11:18:55] Unbalanced enable for IRQ 27
  [11:18:55] WARNING: CPU: 0 PID: 38 at kernel/irq/manage.c:792 __enable_irq+0x36/0x60
  ...
  [11:18:55] [FAILED] irq_cpuhotplug_test
  [11:18:55]     # module: irq_test
  [11:18:55] # irq_test_cases: pass:2 fail:2 skip:0 total:4
  [11:18:55] # Totals: pass:2 fail:2 skip:0 total:4
  [11:18:55] ================= [FAILED] irq_test_cases ==================
  [11:18:55] ============================================================
  [11:18:55] Testing complete. Ran 4 tests: passed: 2, failed: 2

Signed-off-by: Brian Norris <briannorris@chromium.org>
---

Changes in v2:
 * add request_irq()/disable_irq()/free_irq()/request_irq() test
   sequence
 * clean up more resources in tests
 * move tests to patch 2 (i.e., after bugs are fixed and tests pass)
 * adapt to irq_startup_managed() (new API)

 kernel/irq/Kconfig    |  11 ++
 kernel/irq/Makefile   |   1 +
 kernel/irq/irq_test.c | 229 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 241 insertions(+)
 create mode 100644 kernel/irq/irq_test.c

diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 3f02a0e45254..7429abe5011f 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -144,6 +144,17 @@ config GENERIC_IRQ_DEBUGFS
 config GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD
 	bool
 
+config IRQ_KUNIT_TEST
+	tristate "KUnit tests for IRQ management APIs" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	imply SMP
+	help
+	  This option enables KUnit tests for the IRQ subsystem API. These are
+	  only for development and testing, not for regular kernel use cases.
+
+	  If unsure, say N.
+
 endmenu
 
 config GENERIC_IRQ_MULTI_HANDLER
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index c0f44c06d69d..6ab3a4055667 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_GENERIC_IRQ_IPI_MUX) += ipi-mux.o
 obj-$(CONFIG_SMP) += affinity.o
 obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
 obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
+obj-$(CONFIG_IRQ_KUNIT_TEST) += irq_test.o
diff --git a/kernel/irq/irq_test.c b/kernel/irq/irq_test.c
new file mode 100644
index 000000000000..5161b56a12f9
--- /dev/null
+++ b/kernel/irq/irq_test.c
@@ -0,0 +1,229 @@
+// SPDX-License-Identifier: LGPL-2.1+
+
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
+#include <linux/irqdomain.h>
+#include <linux/nodemask.h>
+#include <kunit/test.h>
+
+#include "internals.h"
+
+static irqreturn_t noop_handler(int irq, void *data)
+{
+	return IRQ_HANDLED;
+}
+
+static void noop(struct irq_data *data) { }
+static unsigned int noop_ret(struct irq_data *data) { return 0; }
+
+static int noop_affinity(struct irq_data *data, const struct cpumask *dest,
+			 bool force)
+{
+	irq_data_update_effective_affinity(data, dest);
+
+	return 0;
+}
+
+static struct irq_chip fake_irq_chip = {
+	.name           = "fake",
+	.irq_startup    = noop_ret,
+	.irq_shutdown   = noop,
+	.irq_enable     = noop,
+	.irq_disable    = noop,
+	.irq_ack        = noop,
+	.irq_mask       = noop,
+	.irq_unmask     = noop,
+	.irq_set_affinity = noop_affinity,
+	.flags          = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static void irq_disable_depth_test(struct kunit *test)
+{
+	struct irq_desc *desc;
+	int virq, ret;
+
+	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, NULL);
+	KUNIT_ASSERT_GE(test, virq, 0);
+
+	irq_set_chip_and_handler(virq, &dummy_irq_chip, handle_simple_irq);
+
+	desc = irq_to_desc(virq);
+	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
+
+	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	disable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	enable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	free_irq(virq, NULL);
+}
+
+static void irq_free_disabled_test(struct kunit *test)
+{
+	struct irq_desc *desc;
+	int virq, ret;
+
+	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, NULL);
+	KUNIT_ASSERT_GE(test, virq, 0);
+
+	irq_set_chip_and_handler(virq, &dummy_irq_chip, handle_simple_irq);
+
+	desc = irq_to_desc(virq);
+	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
+
+	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	disable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	free_irq(virq, NULL);
+	KUNIT_EXPECT_GE(test, desc->depth, 1);
+
+	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	free_irq(virq, NULL);
+}
+
+static void irq_shutdown_depth_test(struct kunit *test)
+{
+	struct irq_desc *desc;
+	struct irq_data *data;
+	int virq, ret;
+	struct irq_affinity_desc affinity = {
+		.is_managed = 1,
+		.mask = CPU_MASK_ALL,
+	};
+
+	if (!IS_ENABLED(CONFIG_SMP))
+		kunit_skip(test, "requires CONFIG_SMP for managed shutdown");
+
+	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, &affinity);
+	KUNIT_ASSERT_GE(test, virq, 0);
+
+	irq_set_chip_and_handler(virq, &dummy_irq_chip, handle_simple_irq);
+
+	desc = irq_to_desc(virq);
+	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
+
+	data = irq_desc_get_irq_data(desc);
+	KUNIT_ASSERT_PTR_NE(test, data, NULL);
+
+	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+
+	KUNIT_EXPECT_TRUE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_TRUE(test, irqd_is_started(data));
+	KUNIT_EXPECT_TRUE(test, irqd_affinity_is_managed(data));
+
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	disable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	irq_shutdown_and_deactivate(desc);
+
+	KUNIT_EXPECT_FALSE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_FALSE(test, irqd_is_started(data));
+
+	KUNIT_EXPECT_EQ(test, irq_activate(desc), 0);
+#ifdef CONFIG_SMP
+	irq_startup_managed(desc);
+#endif
+
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	enable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	free_irq(virq, NULL);
+}
+
+static void irq_cpuhotplug_test(struct kunit *test)
+{
+	struct irq_desc *desc;
+	struct irq_data *data;
+	int virq, ret;
+	struct irq_affinity_desc affinity = {
+		.is_managed = 1,
+	};
+
+	if (!IS_ENABLED(CONFIG_SMP))
+		kunit_skip(test, "requires CONFIG_SMP for CPU hotplug");
+	if (!get_cpu_device(1))
+		kunit_skip(test, "requires more than 1 CPU for CPU hotplug");
+	if (!cpu_is_hotpluggable(1))
+		kunit_skip(test, "CPU 1 must be hotpluggable");
+
+	cpumask_copy(&affinity.mask, cpumask_of(1));
+
+	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, &affinity);
+	KUNIT_ASSERT_GE(test, virq, 0);
+
+	irq_set_chip_and_handler(virq, &fake_irq_chip, handle_simple_irq);
+
+	desc = irq_to_desc(virq);
+	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
+
+	data = irq_desc_get_irq_data(desc);
+	KUNIT_ASSERT_PTR_NE(test, data, NULL);
+
+	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+
+	KUNIT_EXPECT_TRUE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_TRUE(test, irqd_is_started(data));
+	KUNIT_EXPECT_TRUE(test, irqd_affinity_is_managed(data));
+
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	disable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	KUNIT_EXPECT_EQ(test, remove_cpu(1), 0);
+	KUNIT_EXPECT_FALSE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_FALSE(test, irqd_is_started(data));
+	KUNIT_EXPECT_GE(test, desc->depth, 1);
+	KUNIT_EXPECT_EQ(test, add_cpu(1), 0);
+
+	KUNIT_EXPECT_FALSE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_FALSE(test, irqd_is_started(data));
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	enable_irq(virq);
+	KUNIT_EXPECT_TRUE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_TRUE(test, irqd_is_started(data));
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	free_irq(virq, NULL);
+}
+
+static struct kunit_case irq_test_cases[] = {
+	KUNIT_CASE(irq_disable_depth_test),
+	KUNIT_CASE(irq_free_disabled_test),
+	KUNIT_CASE(irq_shutdown_depth_test),
+	KUNIT_CASE(irq_cpuhotplug_test),
+	{}
+};
+
+static struct kunit_suite irq_test_suite = {
+	.name = "irq_test_cases",
+	.test_cases = irq_test_cases,
+};
+
+kunit_test_suite(irq_test_suite);
+MODULE_DESCRIPTION("IRQ unit test suite");
+MODULE_LICENSE("GPL");
-- 
2.49.0.1045.g170613ef41-goog


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/2] genirq: Add kunit tests for depth counts
  2025-05-14 20:13 ` [PATCH v2 2/2] genirq: Add kunit tests for depth counts Brian Norris
@ 2025-05-15 14:01   ` kernel test robot
  2025-05-15 17:21     ` Brian Norris
  2025-05-15 14:51   ` [tip: irq/core] genirq: Add kunit tests for disable " tip-bot2 for Brian Norris
  1 sibling, 1 reply; 21+ messages in thread
From: kernel test robot @ 2025-05-15 14:01 UTC (permalink / raw)
  To: Brian Norris, Thomas Gleixner
  Cc: llvm, oe-kbuild-all, Tsai Sung-Fu, Douglas Anderson, linux-kernel,
	Brian Norris

Hi Brian,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/irq/core]
[also build test ERROR on linus/master v6.15-rc6 next-20250515]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Brian-Norris/genirq-Retain-depth-for-managed-IRQs-across-CPU-hotplug/20250515-041533
base:   tip/irq/core
patch link:    https://lore.kernel.org/r/20250514201353.3481400-3-briannorris%40chromium.org
patch subject: [PATCH v2 2/2] genirq: Add kunit tests for depth counts
config: i386-buildonly-randconfig-004-20250515 (https://download.01.org/0day-ci/archive/20250515/202505152136.y04AHovS-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250515/202505152136.y04AHovS-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505152136.y04AHovS-lkp@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

WARNING: modpost: missing MODULE_DESCRIPTION() in lib/ucs2_string.o
ERROR: modpost: "irq_domain_alloc_descs" [kernel/irq/irq_test.ko] undefined!
ERROR: modpost: "irq_to_desc" [kernel/irq/irq_test.ko] undefined!
ERROR: modpost: "irq_shutdown_and_deactivate" [kernel/irq/irq_test.ko] undefined!
>> ERROR: modpost: "irq_activate" [kernel/irq/irq_test.ko] undefined!
>> ERROR: modpost: "irq_startup_managed" [kernel/irq/irq_test.ko] undefined!

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/2] genirq: Add kunit tests for depth counts
  2025-05-15 14:01   ` kernel test robot
@ 2025-05-15 17:21     ` Brian Norris
  2025-05-15 22:24       ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Norris @ 2025-05-15 17:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Thomas Gleixner, llvm, oe-kbuild-all, Tsai Sung-Fu,
	Douglas Anderson, linux-kernel, kernel test robot

Hi Thomas,

On Thu, May 15, 2025 at 10:01:18PM +0800, kernel test robot wrote:
> patch link:    https://lore.kernel.org/r/20250514201353.3481400-3-briannorris%40chromium.org
> patch subject: [PATCH v2 2/2] genirq: Add kunit tests for depth counts

First of all, thanks for the help, and for applying patch 1. I see that:
1) this bot noticed a trivial problem with patch 2; and
2) I received notification that patch 2 was applied to tip/irq/core, but
3) I can't find it there any more.

I'm not sure if #3 is because you dropped it (e.g., due to #1's report)
or some other reason, so I'm not sure what to do next. Possibilities:

(a) send the trivial fix separately, as a fixup (against what tree?)
(b) resend an improved patch 2 on its own, against tip/irq/core
(c) just drop it, because you have deeper reasons to not want these
    tests.

I'm fine with anything you'd like, although I do think there's value in
providing unit tests for corner cases like this.

See below for the trivial fix, for the record. I can send it separately
if you'd like.

> config: i386-buildonly-randconfig-004-20250515 (https://download.01.org/0day-ci/archive/20250515/202505152136.y04AHovS-lkp@intel.com/config)
> compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250515/202505152136.y04AHovS-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202505152136.y04AHovS-lkp@intel.com/
> 
> All errors (new ones prefixed by >>, old ones prefixed by <<):
> 
> WARNING: modpost: missing MODULE_DESCRIPTION() in lib/ucs2_string.o
> ERROR: modpost: "irq_domain_alloc_descs" [kernel/irq/irq_test.ko] undefined!
> ERROR: modpost: "irq_to_desc" [kernel/irq/irq_test.ko] undefined!
> ERROR: modpost: "irq_shutdown_and_deactivate" [kernel/irq/irq_test.ko] undefined!
> >> ERROR: modpost: "irq_activate" [kernel/irq/irq_test.ko] undefined!
> >> ERROR: modpost: "irq_startup_managed" [kernel/irq/irq_test.ko] undefined!

The test Kconfig symbol should be bool, not tristate. Some of the
functions required for the test are non-modular.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202505152136.y04AHovS-lkp@intel.com/
Signed-off-by: Brian Norris <briannorris@chromium.org>

--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -145,7 +145,7 @@ config GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD
 	bool
 
 config IRQ_KUNIT_TEST
-	tristate "KUnit tests for IRQ management APIs" if !KUNIT_ALL_TESTS
+	bool "KUnit tests for IRQ management APIs" if !KUNIT_ALL_TESTS
 	depends on KUNIT
 	default KUNIT_ALL_TESTS
 	imply SMP

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 2/2] genirq: Add kunit tests for depth counts
  2025-05-15 17:21     ` Brian Norris
@ 2025-05-15 22:24       ` Thomas Gleixner
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Gleixner @ 2025-05-15 22:24 UTC (permalink / raw)
  To: Brian Norris
  Cc: llvm, oe-kbuild-all, Tsai Sung-Fu, Douglas Anderson, linux-kernel,
	kernel test robot

On Thu, May 15 2025 at 10:21, Brian Norris wrote:
> On Thu, May 15, 2025 at 10:01:18PM +0800, kernel test robot wrote:
>> patch link:    https://lore.kernel.org/r/20250514201353.3481400-3-briannorris%40chromium.org
>> patch subject: [PATCH v2 2/2] genirq: Add kunit tests for depth counts
>
> First of all, thanks for the help, and for applying patch 1. I see that:
> 1) this bot noticed a trivial problem with patch 2; and
> 2) I received notification that patch 2 was applied to tip/irq/core, but
> 3) I can't find it there any more.
>
> I'm not sure if #3 is because you dropped it (e.g., due to #1's report)
> or some other reason, so I'm not sure what to do next. Possibilities:

#3 because I dropped it.

> (a) send the trivial fix separately, as a fixup (against what tree?)
> (b) resend an improved patch 2 on its own, against tip/irq/core
> (c) just drop it, because you have deeper reasons to not want these
>     tests.

#b please

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [tip: irq/core] genirq: Add kunit tests for disable depth counts
  2025-05-14 20:13 ` [PATCH v2 2/2] genirq: Add kunit tests for depth counts Brian Norris
  2025-05-15 14:01   ` kernel test robot
@ 2025-05-15 14:51   ` tip-bot2 for Brian Norris
  1 sibling, 0 replies; 21+ messages in thread
From: tip-bot2 for Brian Norris @ 2025-05-15 14:51 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Brian Norris, Thomas Gleixner, x86, linux-kernel, maz

The following commit has been merged into the irq/core branch of tip:

Commit-ID:     334c36eeb5df010ea1d954a96b6eba42a15a1d4a
Gitweb:        https://git.kernel.org/tip/334c36eeb5df010ea1d954a96b6eba42a15a1d4a
Author:        Brian Norris <briannorris@chromium.org>
AuthorDate:    Wed, 14 May 2025 13:13:17 -07:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 15 May 2025 16:44:25 +02:00

genirq: Add kunit tests for disable depth counts

There have been a few bugs and/or misunderstandings about the reference
counting, and startup/shutdown behaviors in the interrupt core and the
related CPU hotplug code. These 4 test cases try to capture a few
interesting cases.

* irq_disable_depth_test: basic request/disable/enable sequence

* irq_free_disabled_test: request/disable/free/re-request sequence - this
  catches errors on previous revisions of the fix for the hotplug case.

* irq_cpuhotplug_test: exercises managed-affinity IRQ + CPU hotplug.

  This captures a problematic test case for a bug in that code path, which
  failed to retain the disable depth across the managed interrupt
  hotunplug/hotplug path.

  This test requires CONFIG_SMP and a hotpluggable CPU#1.

* irq_shutdown_depth_test: exercises similar behavior as

  irq_cpuhotplug_test, but directly uses the irq_*() APIs instead of going
  through CPU hotplug. This still requires CONFIG_SMP, because
  managed-affinity is stubbed out (and not all APIs are even present)
  without it.

Note the use of 'imply SMP': ARCH=um doesn't support SMP, and kunit is
often exercised there. Thus, 'imply' will force SMP on where possible
(such as ARCH=x86_64), but leave it off where it's not.

Behavior on various SMP and ARCH configurations:

  $ tools/testing/kunit/kunit.py run 'irq_test_cases*' --arch x86_64 --qemu_args '-smp 2'
  [...]
  [11:12:24] Testing complete. Ran 4 tests: passed: 4

  $ tools/testing/kunit/kunit.py run 'irq_test_cases*' --arch x86_64
  [...]
  [11:13:27] [SKIPPED] irq_cpuhotplug_test
  [11:13:27] ================= [PASSED] irq_test_cases ==================
  [11:13:27] ============================================================
  [11:13:27] Testing complete. Ran 4 tests: passed: 3, skipped: 1

  # default: ARCH=um
  $ tools/testing/kunit/kunit.py run 'irq_test_cases*'
  [11:14:26] [SKIPPED] irq_shutdown_depth_test
  [11:14:26] [SKIPPED] irq_cpuhotplug_test
  [11:14:26] ================= [PASSED] irq_test_cases ==================
  [11:14:26] ============================================================
  [11:14:26] Testing complete. Ran 4 tests: passed: 2, skipped: 2

Without the prior fix ("genirq: Retain disable depth for managed interrupts
across CPU hotplug"), this fails as follows:

  [11:18:55] =============== irq_test_cases (4 subtests) ================
  [11:18:55] [PASSED] irq_disable_depth_test
  [11:18:55] [PASSED] irq_free_disabled_test
  [11:18:55]     # irq_shutdown_depth_test: EXPECTATION FAILED at kernel/irq/irq_test.c:147
  [11:18:55]     Expected desc->depth == 1, but
  [11:18:55]         desc->depth == 0 (0x0)
  [11:18:55] ------------[ cut here ]------------
  [11:18:55] Unbalanced enable for IRQ 26
  [11:18:55] WARNING: CPU: 1 PID: 36 at kernel/irq/manage.c:792 __enable_irq+0x36/0x60
  ...
  [11:18:55] [FAILED] irq_shutdown_depth_test
  [11:18:55]  #1
  [11:18:55]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:202
  [11:18:55]     Expected irqd_is_activated(data) to be false, but is true
  [11:18:55]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:203
  [11:18:55]     Expected irqd_is_started(data) to be false, but is true
  [11:18:55]     # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:204
  [11:18:55]     Expected desc->depth == 1, but
  [11:18:55]         desc->depth == 0 (0x0)
  [11:18:55] ------------[ cut here ]------------
  [11:18:55] Unbalanced enable for IRQ 27
  [11:18:55] WARNING: CPU: 0 PID: 38 at kernel/irq/manage.c:792 __enable_irq+0x36/0x60
  ...
  [11:18:55] [FAILED] irq_cpuhotplug_test
  [11:18:55]     # module: irq_test
  [11:18:55] # irq_test_cases: pass:2 fail:2 skip:0 total:4
  [11:18:55] # Totals: pass:2 fail:2 skip:0 total:4
  [11:18:55] ================= [FAILED] irq_test_cases ==================
  [11:18:55] ============================================================
  [11:18:55] Testing complete. Ran 4 tests: passed: 2, failed: 2

Signed-off-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20250514201353.3481400-3-briannorris@chromium.org

---
 kernel/irq/Kconfig    |  11 ++-
 kernel/irq/Makefile   |   1 +-
 kernel/irq/irq_test.c | 229 +++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 241 insertions(+)
 create mode 100644 kernel/irq/irq_test.c

diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 3f02a0e..7429abe 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -144,6 +144,17 @@ config GENERIC_IRQ_DEBUGFS
 config GENERIC_IRQ_KEXEC_CLEAR_VM_FORWARD
 	bool
 
+config IRQ_KUNIT_TEST
+	tristate "KUnit tests for IRQ management APIs" if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	imply SMP
+	help
+	  This option enables KUnit tests for the IRQ subsystem API. These are
+	  only for development and testing, not for regular kernel use cases.
+
+	  If unsure, say N.
+
 endmenu
 
 config GENERIC_IRQ_MULTI_HANDLER
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index c0f44c0..6ab3a40 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -19,3 +19,4 @@ obj-$(CONFIG_GENERIC_IRQ_IPI_MUX) += ipi-mux.o
 obj-$(CONFIG_SMP) += affinity.o
 obj-$(CONFIG_GENERIC_IRQ_DEBUGFS) += debugfs.o
 obj-$(CONFIG_GENERIC_IRQ_MATRIX_ALLOCATOR) += matrix.o
+obj-$(CONFIG_IRQ_KUNIT_TEST) += irq_test.o
diff --git a/kernel/irq/irq_test.c b/kernel/irq/irq_test.c
new file mode 100644
index 0000000..5161b56
--- /dev/null
+++ b/kernel/irq/irq_test.c
@@ -0,0 +1,229 @@
+// SPDX-License-Identifier: LGPL-2.1+
+
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/irqdesc.h>
+#include <linux/irqdomain.h>
+#include <linux/nodemask.h>
+#include <kunit/test.h>
+
+#include "internals.h"
+
+static irqreturn_t noop_handler(int irq, void *data)
+{
+	return IRQ_HANDLED;
+}
+
+static void noop(struct irq_data *data) { }
+static unsigned int noop_ret(struct irq_data *data) { return 0; }
+
+static int noop_affinity(struct irq_data *data, const struct cpumask *dest,
+			 bool force)
+{
+	irq_data_update_effective_affinity(data, dest);
+
+	return 0;
+}
+
+static struct irq_chip fake_irq_chip = {
+	.name           = "fake",
+	.irq_startup    = noop_ret,
+	.irq_shutdown   = noop,
+	.irq_enable     = noop,
+	.irq_disable    = noop,
+	.irq_ack        = noop,
+	.irq_mask       = noop,
+	.irq_unmask     = noop,
+	.irq_set_affinity = noop_affinity,
+	.flags          = IRQCHIP_SKIP_SET_WAKE,
+};
+
+static void irq_disable_depth_test(struct kunit *test)
+{
+	struct irq_desc *desc;
+	int virq, ret;
+
+	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, NULL);
+	KUNIT_ASSERT_GE(test, virq, 0);
+
+	irq_set_chip_and_handler(virq, &dummy_irq_chip, handle_simple_irq);
+
+	desc = irq_to_desc(virq);
+	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
+
+	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	disable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	enable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	free_irq(virq, NULL);
+}
+
+static void irq_free_disabled_test(struct kunit *test)
+{
+	struct irq_desc *desc;
+	int virq, ret;
+
+	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, NULL);
+	KUNIT_ASSERT_GE(test, virq, 0);
+
+	irq_set_chip_and_handler(virq, &dummy_irq_chip, handle_simple_irq);
+
+	desc = irq_to_desc(virq);
+	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
+
+	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	disable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	free_irq(virq, NULL);
+	KUNIT_EXPECT_GE(test, desc->depth, 1);
+
+	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	free_irq(virq, NULL);
+}
+
+static void irq_shutdown_depth_test(struct kunit *test)
+{
+	struct irq_desc *desc;
+	struct irq_data *data;
+	int virq, ret;
+	struct irq_affinity_desc affinity = {
+		.is_managed = 1,
+		.mask = CPU_MASK_ALL,
+	};
+
+	if (!IS_ENABLED(CONFIG_SMP))
+		kunit_skip(test, "requires CONFIG_SMP for managed shutdown");
+
+	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, &affinity);
+	KUNIT_ASSERT_GE(test, virq, 0);
+
+	irq_set_chip_and_handler(virq, &dummy_irq_chip, handle_simple_irq);
+
+	desc = irq_to_desc(virq);
+	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
+
+	data = irq_desc_get_irq_data(desc);
+	KUNIT_ASSERT_PTR_NE(test, data, NULL);
+
+	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+
+	KUNIT_EXPECT_TRUE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_TRUE(test, irqd_is_started(data));
+	KUNIT_EXPECT_TRUE(test, irqd_affinity_is_managed(data));
+
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	disable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	irq_shutdown_and_deactivate(desc);
+
+	KUNIT_EXPECT_FALSE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_FALSE(test, irqd_is_started(data));
+
+	KUNIT_EXPECT_EQ(test, irq_activate(desc), 0);
+#ifdef CONFIG_SMP
+	irq_startup_managed(desc);
+#endif
+
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	enable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	free_irq(virq, NULL);
+}
+
+static void irq_cpuhotplug_test(struct kunit *test)
+{
+	struct irq_desc *desc;
+	struct irq_data *data;
+	int virq, ret;
+	struct irq_affinity_desc affinity = {
+		.is_managed = 1,
+	};
+
+	if (!IS_ENABLED(CONFIG_SMP))
+		kunit_skip(test, "requires CONFIG_SMP for CPU hotplug");
+	if (!get_cpu_device(1))
+		kunit_skip(test, "requires more than 1 CPU for CPU hotplug");
+	if (!cpu_is_hotpluggable(1))
+		kunit_skip(test, "CPU 1 must be hotpluggable");
+
+	cpumask_copy(&affinity.mask, cpumask_of(1));
+
+	virq = irq_domain_alloc_descs(-1, 1, 0, NUMA_NO_NODE, &affinity);
+	KUNIT_ASSERT_GE(test, virq, 0);
+
+	irq_set_chip_and_handler(virq, &fake_irq_chip, handle_simple_irq);
+
+	desc = irq_to_desc(virq);
+	KUNIT_ASSERT_PTR_NE(test, desc, NULL);
+
+	data = irq_desc_get_irq_data(desc);
+	KUNIT_ASSERT_PTR_NE(test, data, NULL);
+
+	ret = request_irq(virq, noop_handler, 0, "test_irq", NULL);
+	KUNIT_EXPECT_EQ(test, ret, 0);
+
+	KUNIT_EXPECT_TRUE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_TRUE(test, irqd_is_started(data));
+	KUNIT_EXPECT_TRUE(test, irqd_affinity_is_managed(data));
+
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	disable_irq(virq);
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	KUNIT_EXPECT_EQ(test, remove_cpu(1), 0);
+	KUNIT_EXPECT_FALSE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_FALSE(test, irqd_is_started(data));
+	KUNIT_EXPECT_GE(test, desc->depth, 1);
+	KUNIT_EXPECT_EQ(test, add_cpu(1), 0);
+
+	KUNIT_EXPECT_FALSE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_FALSE(test, irqd_is_started(data));
+	KUNIT_EXPECT_EQ(test, desc->depth, 1);
+
+	enable_irq(virq);
+	KUNIT_EXPECT_TRUE(test, irqd_is_activated(data));
+	KUNIT_EXPECT_TRUE(test, irqd_is_started(data));
+	KUNIT_EXPECT_EQ(test, desc->depth, 0);
+
+	free_irq(virq, NULL);
+}
+
+static struct kunit_case irq_test_cases[] = {
+	KUNIT_CASE(irq_disable_depth_test),
+	KUNIT_CASE(irq_free_disabled_test),
+	KUNIT_CASE(irq_shutdown_depth_test),
+	KUNIT_CASE(irq_cpuhotplug_test),
+	{}
+};
+
+static struct kunit_suite irq_test_suite = {
+	.name = "irq_test_cases",
+	.test_cases = irq_test_cases,
+};
+
+kunit_test_suite(irq_test_suite);
+MODULE_DESCRIPTION("IRQ unit test suite");
+MODULE_LICENSE("GPL");

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2025-06-19  8:32 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-14 20:13 [PATCH v2 0/2] genirq: Retain depth for managed IRQs across CPU hotplug Brian Norris
2025-05-14 20:13 ` [PATCH v2 1/2] " Brian Norris
2025-05-15 14:51   ` [tip: irq/core] genirq: Retain disable depth for managed interrupts " tip-bot2 for Brian Norris
2025-06-06 12:21   ` [PATCH v2 1/2] genirq: Retain depth for managed IRQs " Aleksandrs Vinarskis
2025-06-09 17:13     ` Brian Norris
2025-06-09 18:19       ` Aleksandrs Vinarskis
2025-06-10 20:07         ` Brian Norris
2025-06-11  6:50           ` Thomas Gleixner
2025-06-11  8:50             ` Thomas Gleixner
2025-06-11 18:51               ` Brian Norris
2025-06-11  6:56           ` Aleksandrs Vinarskis
2025-06-11 19:08             ` Brian Norris
2025-06-12 18:40               ` Brian Norris
2025-06-18 10:17                 ` Johan Hovold
2025-06-18 17:10                   ` Brian Norris
2025-06-19  8:32                     ` Johan Hovold
2025-05-14 20:13 ` [PATCH v2 2/2] genirq: Add kunit tests for depth counts Brian Norris
2025-05-15 14:01   ` kernel test robot
2025-05-15 17:21     ` Brian Norris
2025-05-15 22:24       ` Thomas Gleixner
2025-05-15 14:51   ` [tip: irq/core] genirq: Add kunit tests for disable " tip-bot2 for Brian Norris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).