* [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
@ 2025-09-08 16:08 Lukas Wunner
2025-09-17 14:56 ` Sebastian Andrzej Siewior
2025-09-20 21:20 ` Thomas Gleixner
0 siblings, 2 replies; 10+ messages in thread
From: Lukas Wunner @ 2025-09-08 16:08 UTC (permalink / raw)
To: Thomas Gleixner, Sebastian Andrzej Siewior, Crystal Wood,
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Clark Williams, Steven Rostedt
Cc: Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel,
Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran
Crystal reports the PCIe Advanced Error Reporting driver getting stuck
in an infinite loop on PREEMPT_RT: Both the primary IRQ handler
aer_irq() as well as the secondary handler aer_isr() are forced into
threads with identical priority.
Crystal writes that on the ARM system in question, the primary IRQ
handler has to clear an error in the Root Error Status register...
"before the next error happens, or else the hardware will set the
Multiple ERR_COR Received bit. If that bit is set, then aer_isr()
can't rely on the Error Source Identification register, so it scans
through all devices looking for errors -- and for some reason, on
this system, accessing the AER registers (or any Config Space above
0x400, even though there are capabilities located there) generates
an Unsupported Request Error (but returns valid data). Since this
happens more than once, without aer_irq() preempting, it causes
another multi error and we get stuck in a loop."
The issue does not show on non-PREEMPT_RT since the primary IRQ handler
runs in hardirq context and thus clears the Root Error Status register
before waking the secondary IRQ handler's thread.
Simulate the same behavior on PREEMPT_RT by assigning a lower default
priority to forced secondary IRQ handlers.
Reported-by: Crystal Wood <crwood@redhat.com>
Tested-by: Crystal Wood <crwood@redhat.com>
Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Crystal provided a Tested-by off-list. I've used Crystal's suggestion
for the code comment, but if anyone prefers Sebastian's or some other
wording, I don't mind either respinning myself or a maintainer amending
this or any other aspect of the patch when applying. Thanks!
include/linux/sched.h | 1 +
kernel/irq/manage.c | 5 ++++-
kernel/sched/syscalls.c | 11 +++++++++++
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2b27238..ceed23d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1894,6 +1894,7 @@ static inline int task_nice(const struct task_struct *p)
extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
extern void sched_set_fifo(struct task_struct *p);
+extern void sched_set_fifo_minus_one(struct task_struct *p);
extern void sched_set_fifo_low(struct task_struct *p);
extern void sched_set_normal(struct task_struct *p, int nice);
extern int sched_setattr(struct task_struct *, const struct sched_attr *);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c948373..b09c18a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1239,7 +1239,10 @@ static int irq_thread(void *data)
irq_thread_set_ready(desc, action);
- sched_set_fifo(current);
+ if (action->handler == irq_forced_secondary_handler)
+ sched_set_fifo_minus_one(current);
+ else
+ sched_set_fifo(current);
if (force_irqthreads() && test_bit(IRQTF_FORCED_THREAD,
&action->thread_flags))
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 77ae87f..4be85aa 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -847,6 +847,17 @@ void sched_set_fifo(struct task_struct *p)
EXPORT_SYMBOL_GPL(sched_set_fifo);
/*
+ * For tasks that must be preemptible by a sched_set_fifo() task, such as
+ * to simulate the behavior of a non-PREEMPT_RT system where the
+ * sched_set_fifo() task is a hard interrupt.
+ */
+void sched_set_fifo_minus_one(struct task_struct *p)
+{
+ struct sched_param sp = { .sched_priority = MAX_RT_PRIO / 2 - 1 };
+ WARN_ON_ONCE(sched_setscheduler_nocheck(p, SCHED_FIFO, &sp) != 0);
+}
+
+/*
* For when you don't much care about FIFO, but want to be above SCHED_NORMAL.
*/
void sched_set_fifo_low(struct task_struct *p)
--
2.50.1
^ permalink raw reply related [flat|nested] 10+ messages in thread* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler 2025-09-08 16:08 [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler Lukas Wunner @ 2025-09-17 14:56 ` Sebastian Andrzej Siewior 2025-09-20 21:20 ` Thomas Gleixner 1 sibling, 0 replies; 10+ messages in thread From: Sebastian Andrzej Siewior @ 2025-09-17 14:56 UTC (permalink / raw) To: Lukas Wunner Cc: Thomas Gleixner, Crystal Wood, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran On 2025-09-08 18:08:31 [+0200], Lukas Wunner wrote: > Crystal reports the PCIe Advanced Error Reporting driver getting stuck > in an infinite loop on PREEMPT_RT: Both the primary IRQ handler > aer_irq() as well as the secondary handler aer_isr() are forced into > threads with identical priority. > > Crystal writes that on the ARM system in question, the primary IRQ > handler has to clear an error in the Root Error Status register... > > "before the next error happens, or else the hardware will set the > Multiple ERR_COR Received bit. If that bit is set, then aer_isr() > can't rely on the Error Source Identification register, so it scans > through all devices looking for errors -- and for some reason, on > this system, accessing the AER registers (or any Config Space above > 0x400, even though there are capabilities located there) generates > an Unsupported Request Error (but returns valid data). Since this > happens more than once, without aer_irq() preempting, it causes > another multi error and we get stuck in a loop." > > The issue does not show on non-PREEMPT_RT since the primary IRQ handler > runs in hardirq context and thus clears the Root Error Status register > before waking the secondary IRQ handler's thread. > > Simulate the same behavior on PREEMPT_RT by assigning a lower default > priority to forced secondary IRQ handlers. > > Reported-by: Crystal Wood <crwood@redhat.com> > Tested-by: Crystal Wood <crwood@redhat.com> > Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/ > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > Crystal provided a Tested-by off-list. I've used Crystal's suggestion > for the code comment, but if anyone prefers Sebastian's or some other > wording, I don't mind either respinning myself or a maintainer amending > this or any other aspect of the patch when applying. Thanks! Crystal's wording ist fine. Sebastian ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler 2025-09-08 16:08 [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler Lukas Wunner 2025-09-17 14:56 ` Sebastian Andrzej Siewior @ 2025-09-20 21:20 ` Thomas Gleixner 2025-09-21 13:12 ` Lukas Wunner 1 sibling, 1 reply; 10+ messages in thread From: Thomas Gleixner @ 2025-09-20 21:20 UTC (permalink / raw) To: Lukas Wunner, Sebastian Andrzej Siewior, Crystal Wood, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt Cc: Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran On Mon, Sep 08 2025 at 18:08, Lukas Wunner wrote: > Crystal reports the PCIe Advanced Error Reporting driver getting stuck > in an infinite loop on PREEMPT_RT: Both the primary IRQ handler Can you please write out interrupt. This IRQ acronym is just not helpful at all. > aer_irq() as well as the secondary handler aer_isr() are forced into > threads with identical priority. > > Crystal writes that on the ARM system in question, the primary IRQ > handler has to clear an error in the Root Error Status register... > > "before the next error happens, or else the hardware will set the > Multiple ERR_COR Received bit. If that bit is set, then aer_isr() > can't rely on the Error Source Identification register, so it scans > through all devices looking for errors -- and for some reason, on > this system, accessing the AER registers (or any Config Space above > 0x400, even though there are capabilities located there) generates > an Unsupported Request Error (but returns valid data). Since this > happens more than once, without aer_irq() preempting, it causes > another multi error and we get stuck in a loop." > > The issue does not show on non-PREEMPT_RT since the primary IRQ handler > runs in hardirq context and thus clears the Root Error Status register > before waking the secondary IRQ handler's thread. That sentence does not make sense at all because on RT this is not any different. The primary handler clears the status register before waking the secondary handler, no? What's different on non-RT is that the primary handler can interrupt the threaded secondary handler, which is required to make progress. > Simulate the same behavior on PREEMPT_RT by assigning a lower default > priority to forced secondary IRQ handlers. I'm not really fond of this new minus one interface naming. It's a horrible misnomer as it suggests that it sets the FIFO priority to -1, which doesn't make any sense at all and is confusing at best. Can you please pick a name, which makes it obvious what this is about and as this is really a special case for forced secondary interrupt handlers, it should just reflect that in the name. I obviously understand that the proposed change squashs the whole class of similar (not yet detected) issues, but that made me look at that particular instance nevertheless. All aer_irq() does is reading two PCI config words, writing one and then sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So arguably this AER problem can be nicely solved by the below one-liner, no? And because that made me curious I looked at the first 40 instances of interrupt requests which have a primary and a secondary handler. 21 of them can simply set IRQF_NO_THREAD 3 of them are just silly. I really could not resist to create the diff and append it below for entertainment. Which means going through all the 231 instances is definitely a worthwhile exercise. The rest: 1) is not immediately obvious as I did not follow (indirect) call chains to figure out what's really behind them 2) sets IRQF_ONESHOT, which means the interrupt line is masked before the primary handler runs and stays masked until the secondary handler returns. #1 needs eyeballs #2 should actually not use the secondary thread mechanism at all when forced threaded. This really should do: hardirq() wake(primary_thread); primary_thread() if (primary_handler() == IRQ_WAKE_THREAD) { if (secondary_handler && ONESHOT) secondary_handler(); else wake(secondary_thread); That spares a boat load of wakeups and scheduling on RT and reduces the actual problem space significantly. This might need a new IRQF_xxx flag to avoid overloading ONESHOT, but that's an implementation detail. Hmm? Thanks, tglx --- --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1671,7 +1671,7 @@ static int aer_probe(struct pcie_device set_service_data(dev, rpc); status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr, - IRQF_SHARED, "aerdrv", dev); + IRQF_NO_THREAD | IRQF_SHARED, "aerdrv", dev); if (status) { pci_err(port, "request AER IRQ %d failed\n", dev->irq); return status; <ENTERTAINMENT> --- a/drivers/bluetooth/btintel_pcie.c +++ b/drivers/bluetooth/btintel_pcie.c @@ -1482,11 +1482,6 @@ static void btintel_pcie_msix_rx_handle( } } -static irqreturn_t btintel_pcie_msix_isr(int irq, void *data) -{ - return IRQ_WAKE_THREAD; -} - static inline bool btintel_pcie_is_rxq_empty(struct btintel_pcie_data *data) { return data->ia.cr_hia[BTINTEL_PCIE_RXQ_NUM] == data->ia.cr_tia[BTINTEL_PCIE_RXQ_NUM]; @@ -1585,13 +1580,9 @@ static int btintel_pcie_setup_irq(struct msix_entry = &data->msix_entries[i]; msix_entry->vector = pci_irq_vector(data->pdev, i); - err = devm_request_threaded_irq(&data->pdev->dev, - msix_entry->vector, - btintel_pcie_msix_isr, - btintel_pcie_irq_msix_handler, - IRQF_SHARED, - KBUILD_MODNAME, - msix_entry); + err = devm_request_threaded_irq(&data->pdev->dev, msix_entry->vector, + NULL, btintel_pcie_irq_msix_handler, + IRQF_SHARED, KBUILD_MODNAME, msix_entry); if (err) { pci_free_irq_vectors(data->pdev); data->alloc_vecs = 0; --- a/drivers/bus/fsl-mc/dprc-driver.c +++ b/drivers/bus/fsl-mc/dprc-driver.c @@ -381,17 +381,6 @@ int dprc_scan_container(struct fsl_mc_de EXPORT_SYMBOL_GPL(dprc_scan_container); /** - * dprc_irq0_handler - Regular ISR for DPRC interrupt 0 - * - * @irq_num: IRQ number of the interrupt being handled - * @arg: Pointer to device structure - */ -static irqreturn_t dprc_irq0_handler(int irq_num, void *arg) -{ - return IRQ_WAKE_THREAD; -} - -/** * dprc_irq0_handler_thread - Handler thread function for DPRC interrupt 0 * * @irq_num: IRQ number of the interrupt being handled @@ -525,10 +514,8 @@ static int register_dprc_irq_handler(str * NOTE: devm_request_threaded_irq() invokes the device-specific * function that programs the MSI physically in the device */ - error = devm_request_threaded_irq(&mc_dev->dev, - irq->virq, - dprc_irq0_handler, - dprc_irq0_handler_thread, + error = devm_request_threaded_irq(&mc_dev->dev, irq->virq, + NULL, dprc_irq0_handler_thread, IRQF_NO_SUSPEND | IRQF_ONESHOT, dev_name(&mc_dev->dev), &mc_dev->dev); --- a/drivers/gpu/drm/arm/malidp_hw.c +++ b/drivers/gpu/drm/arm/malidp_hw.c @@ -1356,11 +1356,6 @@ void malidp_se_irq_hw_init(struct malidp hwdev->hw->map.se_irq_map.irq_mask); } -static irqreturn_t malidp_se_irq_thread_handler(int irq, void *arg) -{ - return IRQ_HANDLED; -} - int malidp_se_irq_init(struct drm_device *drm, int irq) { struct malidp_drm *malidp = drm_to_malidp(drm); @@ -1371,8 +1366,7 @@ int malidp_se_irq_init(struct drm_device malidp_hw_disable_irq(hwdev, MALIDP_SE_BLOCK, 0xffffffff); malidp_hw_clear_irq(hwdev, MALIDP_SE_BLOCK, 0xffffffff); - ret = devm_request_threaded_irq(drm->dev, irq, malidp_se_irq, - malidp_se_irq_thread_handler, + ret = devm_request_threaded_irq(drm->dev, irq, malidp_se_irq, NULL, IRQF_SHARED, "malidp-se", drm); if (ret < 0) { DRM_ERROR("failed to install SE IRQ handler\n"); ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler 2025-09-20 21:20 ` Thomas Gleixner @ 2025-09-21 13:12 ` Lukas Wunner 2025-09-21 18:15 ` Thomas Gleixner 2025-10-03 18:25 ` Crystal Wood 0 siblings, 2 replies; 10+ messages in thread From: Lukas Wunner @ 2025-09-21 13:12 UTC (permalink / raw) To: Thomas Gleixner Cc: Sebastian Andrzej Siewior, Crystal Wood, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote: > I obviously understand that the proposed change squashs the whole class > of similar (not yet detected) issues, but that made me look at that > particular instance nevertheless. > > All aer_irq() does is reading two PCI config words, writing one and then > sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So > arguably this AER problem can be nicely solved by the below one-liner, > no? The one-liner (which sets IRQF_NO_THREAD) was what Crystal originally proposed: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/ I guess your point is that handling the few operations in aer_irq() in hard interrupt context is cheaper than waking a thread and deferring them to that thread? Intuitively I would assume that most threaded interrupt handlers are architected in this way: They only do minimal work in hard interrupt context and defer the actual work to the (secondary) thread. E.g. pciehp_isr() + pciehp_ist() is likewise designed to follow this principle. Your research that at first glance, at least 21 of 40 instances of request_threaded_irq() could just use IRQF_NO_THREAD, seems to support the notion that the majority of interrupt handlers only do minimal work in hard interrupt context. But if that is the case, and if you believe that deferring that small amount of work to a thread is nonsensical, then why is the primary handler forced into a thread by default in the first place, requiring drivers to explicitly opt out by setting IRQF_NO_THREAD? Shouldn't it rather be the other way round, i.e. by default the primary handler is *not* forced into a thread, but only if the driver explicitly opts in? (In cases where the primary handler does a sufficient amount of work that is justified to be deferred to a thread.) Thanks, Lukas ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler 2025-09-21 13:12 ` Lukas Wunner @ 2025-09-21 18:15 ` Thomas Gleixner 2025-10-03 18:25 ` Crystal Wood 1 sibling, 0 replies; 10+ messages in thread From: Thomas Gleixner @ 2025-09-21 18:15 UTC (permalink / raw) To: Lukas Wunner Cc: Sebastian Andrzej Siewior, Crystal Wood, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran On Sun, Sep 21 2025 at 15:12, Lukas Wunner wrote: > On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote: > Your research that at first glance, at least 21 of 40 instances of > request_threaded_irq() could just use IRQF_NO_THREAD, seems to > support the notion that the majority of interrupt handlers only > do minimal work in hard interrupt context. > > But if that is the case, and if you believe that deferring that > small amount of work to a thread is nonsensical, then why is the > primary handler forced into a thread by default in the first place, > requiring drivers to explicitly opt out by setting IRQF_NO_THREAD? Because there are primary handlers which are absolutely not RT safe. > Shouldn't it rather be the other way round, i.e. by default the > primary handler is *not* forced into a thread, but only if the > driver explicitly opts in? (In cases where the primary handler > does a sufficient amount of work that is justified to be deferred > to a thread.) We had to do it this way before RT got upstream as there was no way to play a whack a mole game with drivers constantly being added and changed. Thanks, tglx ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler 2025-09-21 13:12 ` Lukas Wunner 2025-09-21 18:15 ` Thomas Gleixner @ 2025-10-03 18:25 ` Crystal Wood 2025-10-24 13:33 ` Sebastian Andrzej Siewior 1 sibling, 1 reply; 10+ messages in thread From: Crystal Wood @ 2025-10-03 18:25 UTC (permalink / raw) To: Lukas Wunner, Thomas Gleixner Cc: Sebastian Andrzej Siewior, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran On Sun, 2025-09-21 at 15:12 +0200, Lukas Wunner wrote: > On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote: > > I obviously understand that the proposed change squashs the whole class > > of similar (not yet detected) issues, but that made me look at that > > particular instance nevertheless. > > > > All aer_irq() does is reading two PCI config words, writing one and then > > sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So > > arguably this AER problem can be nicely solved by the below one-liner, > > no? > > The one-liner (which sets IRQF_NO_THREAD) was what Crystal originally > proposed: > > https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/ So, is the plan to apply the original patch then? Thanks, Crystal ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler 2025-10-03 18:25 ` Crystal Wood @ 2025-10-24 13:33 ` Sebastian Andrzej Siewior 2025-10-24 21:00 ` Crystal Wood 0 siblings, 1 reply; 10+ messages in thread From: Sebastian Andrzej Siewior @ 2025-10-24 13:33 UTC (permalink / raw) To: Crystal Wood, Thomas Gleixner Cc: Lukas Wunner, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran On 2025-10-03 13:25:53 [-0500], Crystal Wood wrote: > On Sun, 2025-09-21 at 15:12 +0200, Lukas Wunner wrote: > > On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote: > > > I obviously understand that the proposed change squashs the whole class > > > of similar (not yet detected) issues, but that made me look at that > > > particular instance nevertheless. > > > > > > All aer_irq() does is reading two PCI config words, writing one and then > > > sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So > > > arguably this AER problem can be nicely solved by the below one-liner, > > > no? > > > > The one-liner (which sets IRQF_NO_THREAD) was what Crystal originally > > proposed: > > > > https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/ > > So, is the plan to apply the original patch then? Did we settle on something? I wasn't sure if you can mix IRQF_NO_THREAD with IRQF_ONESHOT for shared handlers. If that is a thing, we Crystal's original would do it. Then there is the question if we want to go the "class" problem to ensure that one handler can preempt the other. And maybe I should clean up few ones tglx pointed out that provide a primary handler for no reason… > Thanks, > Crystal Sebastian ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler 2025-10-24 13:33 ` Sebastian Andrzej Siewior @ 2025-10-24 21:00 ` Crystal Wood 2025-10-27 6:40 ` Lukas Wunner 0 siblings, 1 reply; 10+ messages in thread From: Crystal Wood @ 2025-10-24 21:00 UTC (permalink / raw) To: Sebastian Andrzej Siewior, Thomas Gleixner Cc: Lukas Wunner, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran On Fri, 2025-10-24 at 15:33 +0200, Sebastian Andrzej Siewior wrote: > On 2025-10-03 13:25:53 [-0500], Crystal Wood wrote: > > On Sun, 2025-09-21 at 15:12 +0200, Lukas Wunner wrote: > > > On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote: > > > > I obviously understand that the proposed change squashs the whole class > > > > of similar (not yet detected) issues, but that made me look at that > > > > particular instance nevertheless. > > > > > > > > All aer_irq() does is reading two PCI config words, writing one and then > > > > sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So > > > > arguably this AER problem can be nicely solved by the below one-liner, > > > > no? > > > > > > The one-liner (which sets IRQF_NO_THREAD) was what Crystal originally > > > proposed: > > > > > > https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/ > > > > So, is the plan to apply the original patch then? > > Did we settle on something? > I wasn't sure if you can mix IRQF_NO_THREAD with IRQF_ONESHOT for shared > handlers. If that is a thing, we Crystal's original would do it. Do you mean mixing IRQF_NO_THREAD on this irq (which should eliminate the forced IRQF_ONESHOT) with another shared irq that still has IRQF_ONESHOT? I suspect it was a non-issue because of IRQCHIP_ONESHOT_SAFE disabling the forced oneshot (the other irq was pciehp). Given that these are pcie-specific, do they ever get used without MSI (which sets IRQCHIP_ONESHOT_SAFE)[1]? The issue seems to be that the type of oneshot we want for forced threading (unmask after the first user-supplied handler) is different from what we want for explicit IRQF_ONESHOT (unmask after the last user-supplied handler). If we separated those, then the semantics would better match non-RT, and we'd only need to care about mixing when it comes to explicit IRQF_NOSHOT. > Then there is the question if we want to go the "class" problem to > ensure that one handler can preempt the other. And maybe I should > clean up few ones tglx pointed out that provide a primary handler for > no reason… Either way works for me, as long as we pick at least one :-) -Crystal [1] I realize that the answer to "has any hardware designer ever done this weird and bad thing?" is usually yes. :-P ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler 2025-10-24 21:00 ` Crystal Wood @ 2025-10-27 6:40 ` Lukas Wunner 2025-10-27 8:18 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 10+ messages in thread From: Lukas Wunner @ 2025-10-27 6:40 UTC (permalink / raw) To: Crystal Wood Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran On Fri, Oct 24, 2025 at 04:00:02PM -0500, Crystal Wood wrote: > On Fri, 2025-10-24 at 15:33 +0200, Sebastian Andrzej Siewior wrote: > > On 2025-10-03 13:25:53 [-0500], Crystal Wood wrote: > > > On Sun, 2025-09-21 at 15:12 +0200, Lukas Wunner wrote: > > > > On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote: > > > > > I obviously understand that the proposed change squashs the whole > > > > > class of similar (not yet detected) issues, but that made me look at > > > > > that particular instance nevertheless. > > > > > > > > > > All aer_irq() does is reading two PCI config words, writing one and > > > > > then sticking 64bytes into a KFIFO. All of that is hard interrupt > > > > > safe. So arguably this AER problem can be nicely solved by the below > > > > > one-liner, no? > > > > > > > > The one-liner (which sets IRQF_NO_THREAD) was what Crystal originally > > > > proposed: > > > > > > > > https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/ > > > > > > So, is the plan to apply the original patch then? > > > > Did we settle on something? > > I wasn't sure if you can mix IRQF_NO_THREAD with IRQF_ONESHOT for shared > > handlers. If that is a thing, we Crystal's original would do it. > > Do you mean mixing IRQF_NO_THREAD on this irq (which should eliminate > the forced IRQF_ONESHOT) with another shared irq that still has > IRQF_ONESHOT? > > I suspect it was a non-issue because of IRQCHIP_ONESHOT_SAFE disabling > the forced oneshot (the other irq was pciehp). Given that these are > pcie-specific, do they ever get used without MSI (which sets > IRQCHIP_ONESHOT_SAFE)[1]? It seems fragile to depend on IRQCHIP_ONESHOT_SAFE. What about irqchips which don't set that? What about PCIe ports which use legacy INTx instead of MSI? As Sebastian pointed out, setting IRQF_NO_THREAD for the AER driver but not for any of the other PCIe port services risks starving the AER primary handler if any of the other PCIe port services' (threaded) primary handlers take a long time: https://lore.kernel.org/r/20250904073024.YsLeZqK_@linutronix.de/ I went through all PCIe port services to see if IRQF_NO_THREAD could be set on all of them. Alas that's not possible: pciehp's primary handler acquires a spinlock because of runtime PM API calls (device->power.lock). It's not a raw_spin_lock_t, so it becomes a sleeping lock on RT which cannot be acquired in hardirq context and converting to a raw_spin_lock_t would be non-trivial and probably undesirable. pme's primary handler also acquires a spinlock, this one could be converted to a raw_spin_lock_t. FWIW, PCIe port services requesting a (potentially shared) interrupt are: drivers/pci/hotplug/pciehp_hpc.c drivers/pci/pcie/aer.c drivers/pci/pcie/dpc.c drivers/pci/pcie/pme.c drivers/pci/pcie/bwctrl.c Long story short, I'll respin the patch to reduce the forced secondary thread's priority, taking into account Thomas' feedback. (Apologies for not having done this earlier.) Thanks, Lukas ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler 2025-10-27 6:40 ` Lukas Wunner @ 2025-10-27 8:18 ` Sebastian Andrzej Siewior 0 siblings, 0 replies; 10+ messages in thread From: Sebastian Andrzej Siewior @ 2025-10-27 8:18 UTC (permalink / raw) To: Lukas Wunner Cc: Crystal Wood, Thomas Gleixner, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran On 2025-10-27 07:40:46 [+0100], Lukas Wunner wrote: > > I suspect it was a non-issue because of IRQCHIP_ONESHOT_SAFE disabling > > the forced oneshot (the other irq was pciehp). Given that these are > > pcie-specific, do they ever get used without MSI (which sets > > IRQCHIP_ONESHOT_SAFE)[1]? > > It seems fragile to depend on IRQCHIP_ONESHOT_SAFE. What about irqchips > which don't set that? What about PCIe ports which use legacy INTx > instead of MSI? exactly. > Long story short, I'll respin the patch to reduce the forced secondary > thread's priority, taking into account Thomas' feedback. > (Apologies for not having done this earlier.) no worries, thanks for the spin up. > Thanks, > > Lukas Sebastian ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-10-27 8:18 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-09-08 16:08 [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler Lukas Wunner 2025-09-17 14:56 ` Sebastian Andrzej Siewior 2025-09-20 21:20 ` Thomas Gleixner 2025-09-21 13:12 ` Lukas Wunner 2025-09-21 18:15 ` Thomas Gleixner 2025-10-03 18:25 ` Crystal Wood 2025-10-24 13:33 ` Sebastian Andrzej Siewior 2025-10-24 21:00 ` Crystal Wood 2025-10-27 6:40 ` Lukas Wunner 2025-10-27 8:18 ` Sebastian Andrzej Siewior
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).