linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
@ 2025-09-08 16:08 Lukas Wunner
  2025-09-17 14:56 ` Sebastian Andrzej Siewior
  2025-09-20 21:20 ` Thomas Gleixner
  0 siblings, 2 replies; 10+ messages in thread
From: Lukas Wunner @ 2025-09-08 16:08 UTC (permalink / raw)
  To: Thomas Gleixner, Sebastian Andrzej Siewior, Crystal Wood,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Clark Williams, Steven Rostedt
  Cc: Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel,
	Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran

Crystal reports the PCIe Advanced Error Reporting driver getting stuck
in an infinite loop on PREEMPT_RT:  Both the primary IRQ handler
aer_irq() as well as the secondary handler aer_isr() are forced into
threads with identical priority.

Crystal writes that on the ARM system in question, the primary IRQ
handler has to clear an error in the Root Error Status register...

   "before the next error happens, or else the hardware will set the
    Multiple ERR_COR Received bit.  If that bit is set, then aer_isr()
    can't rely on the Error Source Identification register, so it scans
    through all devices looking for errors -- and for some reason, on
    this system, accessing the AER registers (or any Config Space above
    0x400, even though there are capabilities located there) generates
    an Unsupported Request Error (but returns valid data).  Since this
    happens more than once, without aer_irq() preempting, it causes
    another multi error and we get stuck in a loop."

The issue does not show on non-PREEMPT_RT since the primary IRQ handler
runs in hardirq context and thus clears the Root Error Status register
before waking the secondary IRQ handler's thread.

Simulate the same behavior on PREEMPT_RT by assigning a lower default
priority to forced secondary IRQ handlers.

Reported-by: Crystal Wood <crwood@redhat.com>
Tested-by: Crystal Wood <crwood@redhat.com>
Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Crystal provided a Tested-by off-list.  I've used Crystal's suggestion
for the code comment, but if anyone prefers Sebastian's or some other
wording, I don't mind either respinning myself or a maintainer amending
this or any other aspect of the patch when applying.  Thanks!

 include/linux/sched.h   |  1 +
 kernel/irq/manage.c     |  5 ++++-
 kernel/sched/syscalls.c | 11 +++++++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2b27238..ceed23d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1894,6 +1894,7 @@ static inline int task_nice(const struct task_struct *p)
 extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
 extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
 extern void sched_set_fifo(struct task_struct *p);
+extern void sched_set_fifo_minus_one(struct task_struct *p);
 extern void sched_set_fifo_low(struct task_struct *p);
 extern void sched_set_normal(struct task_struct *p, int nice);
 extern int sched_setattr(struct task_struct *, const struct sched_attr *);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c948373..b09c18a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1239,7 +1239,10 @@ static int irq_thread(void *data)
 
 	irq_thread_set_ready(desc, action);
 
-	sched_set_fifo(current);
+	if (action->handler == irq_forced_secondary_handler)
+		sched_set_fifo_minus_one(current);
+	else
+		sched_set_fifo(current);
 
 	if (force_irqthreads() && test_bit(IRQTF_FORCED_THREAD,
 					   &action->thread_flags))
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 77ae87f..4be85aa 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -847,6 +847,17 @@ void sched_set_fifo(struct task_struct *p)
 EXPORT_SYMBOL_GPL(sched_set_fifo);
 
 /*
+ * For tasks that must be preemptible by a sched_set_fifo() task, such as
+ * to simulate the behavior of a non-PREEMPT_RT system where the
+ * sched_set_fifo() task is a hard interrupt.
+ */
+void sched_set_fifo_minus_one(struct task_struct *p)
+{
+	struct sched_param sp = { .sched_priority = MAX_RT_PRIO / 2 - 1 };
+	WARN_ON_ONCE(sched_setscheduler_nocheck(p, SCHED_FIFO, &sp) != 0);
+}
+
+/*
  * For when you don't much care about FIFO, but want to be above SCHED_NORMAL.
  */
 void sched_set_fifo_low(struct task_struct *p)
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-10-27  8:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-08 16:08 [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler Lukas Wunner
2025-09-17 14:56 ` Sebastian Andrzej Siewior
2025-09-20 21:20 ` Thomas Gleixner
2025-09-21 13:12   ` Lukas Wunner
2025-09-21 18:15     ` Thomas Gleixner
2025-10-03 18:25     ` Crystal Wood
2025-10-24 13:33       ` Sebastian Andrzej Siewior
2025-10-24 21:00         ` Crystal Wood
2025-10-27  6:40           ` Lukas Wunner
2025-10-27  8:18             ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).