* [RFC PATCH] genirq: Enforce monotonic increase contract in irq_get_next_irq()
From: Aaron Tomlin @ 2026-06-04  2:01 UTC
  To: tglx; +Cc: neelx, sean, steve, mproche, linux-kernel

When an IRQ descriptor is corrupted in memory (e.g., via an out-of-bounds
write by a rogue driver), the descriptor's internal IRQ number may be
zeroed out.

During iteration via for_each_active_irq(), irq_get_next_irq() relies on
irq_desc_get_irq(desc) to retrieve the next IRQ number. If a descriptor is
corrupted, this can result in returning an IRQ number (e.g., 0) that is
strictly less than the requested offset. This breaks the fundamental
forward-progress guarantee of the iterator.

This contract violation causes catastrophic unsigned integer underflows in
callers. For instance, show_all_irqs() in fs/proc/stat.c calculates
padding using (i - next). A corrupted descriptor returning 0 forces a
massive unsigned underflow, trapping the CPU in an extensive loop inside
show_irq_gap() and triggering a soft lockup watchdog.

While the underlying issue is a memory corruption bug, core iterators
should be resilient against returning values that violate their own
mathematical boundaries and induce lockups in other subsystems.

Introduce a lightweight boundary check in irq_get_next_irq() to verify
the returned IRQ is greater than or equal to the offset. If corruption
is detected, raise a WARN_ONCE() to pinpoint the invalid state and
return nr_irqs to safely abort the iteration.

Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
---
 kernel/irq/irqdesc.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 7173b8b634f2..47a9dedb36b3 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -927,7 +927,22 @@ EXPORT_SYMBOL_GPL(__irq_alloc_descs);
  */
 unsigned int irq_get_next_irq(unsigned int offset)
 {
-	return irq_find_at_or_after(offset);
+	unsigned int irq;
+	const unsigned int nr_irqs = irq_get_nr_irqs();
+
+	irq = irq_find_at_or_after(offset);
+
+	/*
+	 * Defend against corrupted IRQ descriptors violating the monotonic
+	 * iterator contract. Returning a value lower than the offset will
+	 * cause catastrophic unsigned underflows in callers.
+	 */
+	if (WARN_ONCE(irq < offset && irq < nr_irqs,
+		      "genirq: Corrupted IRQ descriptor detected: irq %u < offset %u\n",
+		      irq, offset))
+		return nr_irqs;
+
+	return irq;
 }
 
 struct irq_desc *__irq_get_desc_lock(unsigned int irq, unsigned long *flags, bool bus,
-- 
2.51.0



* Re: [RFC PATCH] genirq: Enforce monotonic increase contract in irq_get_next_irq()
From: Thomas Gleixner @ 2026-06-04  9:21 UTC
  To: Aaron Tomlin; +Cc: neelx, sean, steve, mproche, linux-kernel

On Wed, Jun 03 2026 at 22:01, Aaron Tomlin wrote:
> When an IRQ descriptor is corrupted in memory (e.g., via an out-of-bounds
> write by a rogue driver), the descriptor's internal IRQ number may be
> zeroed out.

Which means the system integrity is compromised.

> During iteration via for_each_active_irq(), irq_get_next_irq() relies on
> irq_desc_get_irq(desc) to retrieve the next IRQ number. If a descriptor is
> corrupted, this can result in returning an IRQ number (e.g., 0) that is
> strictly less than the requested offset. This breaks the fundamental
> forward-progress guarantee of the iterator.
>
> This contract violation causes catastrophic unsigned integer underflows in
> callers. For instance, show_all_irqs() in fs/proc/stat.c calculates
> padding using (i - next). A corrupted descriptor returning 0 forces a
> massive unsigned underflow, trapping the CPU in an extensive loop inside
> show_irq_gap() and triggering a soft lockup watchdog.
>
> While the underlying issue is a memory corruption bug, core iterators
> should be resilient against returning values that violate their own
> mathematical boundaries and induce lockups in other subsystems.

Seriously?

If memory is corrupted and corruption is detected, then the only
sensible thing is to panic the machine, not to paper over it in one
particular instance and hope that this is the only side effect.

Thanks,

        tglx


* Re: [RFC PATCH] genirq: Enforce monotonic increase contract in irq_get_next_irq()
From: Aaron Tomlin @ 2026-06-04 13:23 UTC
  To: Thomas Gleixner; +Cc: neelx, sean, steve, mproche, linux-kernel

On Thu, Jun 04, 2026 at 11:21:17AM +0200, Thomas Gleixner wrote:
> On Wed, Jun 03 2026 at 22:01, Aaron Tomlin wrote:
> > When an IRQ descriptor is corrupted in memory (e.g., via an out-of-bounds
> > write by a rogue driver), the descriptor's internal IRQ number may be
> > zeroed out.
> 
> Which means the system integrity is compromised.
> 
> > During iteration via for_each_active_irq(), irq_get_next_irq() relies on
> > irq_desc_get_irq(desc) to retrieve the next IRQ number. If a descriptor is
> > corrupted, this can result in returning an IRQ number (e.g., 0) that is
> > strictly less than the requested offset. This breaks the fundamental
> > forward-progress guarantee of the iterator.
> >
> > This contract violation causes catastrophic unsigned integer underflows in
> > callers. For instance, show_all_irqs() in fs/proc/stat.c calculates
> > padding using (i - next). A corrupted descriptor returning 0 forces a
> > massive unsigned underflow, trapping the CPU in an extensive loop inside
> > show_irq_gap() and triggering a soft lockup watchdog.
> >
> > While the underlying issue is a memory corruption bug, core iterators
> > should be resilient against returning values that violate their own
> > mathematical boundaries and induce lockups in other subsystems.
> 
> Seriously?
> 
> If memory is corrupted and corruption is detected, then the only
> sensible thing is to panic the machine, not to paper over it in one
> particular instance and hope that this is the only side effect.

Hi Thomas,

Fair point.

My initial intention was to gracefully handle the iterator boundary
violation, but I completely agree with your assessment. If the backing
memory for the IRQ descriptor is demonstrably corrupted, allowing the
system to continue with a WARN_ONCE() is inherently dangerous and risks
silent data corruption elsewhere.

I will drop this patch.

Thanks for taking the time to review.


Kind regards,
-- 
Aaron Tomlin
