linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
@ 2025-09-08 16:08 Lukas Wunner
  2025-09-17 14:56 ` Sebastian Andrzej Siewior
  2025-09-20 21:20 ` Thomas Gleixner
  0 siblings, 2 replies; 10+ messages in thread
From: Lukas Wunner @ 2025-09-08 16:08 UTC (permalink / raw)
  To: Thomas Gleixner, Sebastian Andrzej Siewior, Crystal Wood,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Clark Williams, Steven Rostedt
  Cc: Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel,
	Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran

Crystal reports the PCIe Advanced Error Reporting driver getting stuck
in an infinite loop on PREEMPT_RT:  Both the primary IRQ handler
aer_irq() as well as the secondary handler aer_isr() are forced into
threads with identical priority.

Crystal writes that on the ARM system in question, the primary IRQ
handler has to clear an error in the Root Error Status register...

   "before the next error happens, or else the hardware will set the
    Multiple ERR_COR Received bit.  If that bit is set, then aer_isr()
    can't rely on the Error Source Identification register, so it scans
    through all devices looking for errors -- and for some reason, on
    this system, accessing the AER registers (or any Config Space above
    0x400, even though there are capabilities located there) generates
    an Unsupported Request Error (but returns valid data).  Since this
    happens more than once, without aer_irq() preempting, it causes
    another multi error and we get stuck in a loop."

The issue does not show on non-PREEMPT_RT since the primary IRQ handler
runs in hardirq context and thus clears the Root Error Status register
before waking the secondary IRQ handler's thread.

Simulate the same behavior on PREEMPT_RT by assigning a lower default
priority to forced secondary IRQ handlers.

Reported-by: Crystal Wood <crwood@redhat.com>
Tested-by: Crystal Wood <crwood@redhat.com>
Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Crystal provided a Tested-by off-list.  I've used Crystal's suggestion
for the code comment, but if anyone prefers Sebastian's or some other
wording, I don't mind either respinning myself or a maintainer amending
this or any other aspect of the patch when applying.  Thanks!

 include/linux/sched.h   |  1 +
 kernel/irq/manage.c     |  5 ++++-
 kernel/sched/syscalls.c | 11 +++++++++++
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2b27238..ceed23d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1894,6 +1894,7 @@ static inline int task_nice(const struct task_struct *p)
 extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
 extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
 extern void sched_set_fifo(struct task_struct *p);
+extern void sched_set_fifo_minus_one(struct task_struct *p);
 extern void sched_set_fifo_low(struct task_struct *p);
 extern void sched_set_normal(struct task_struct *p, int nice);
 extern int sched_setattr(struct task_struct *, const struct sched_attr *);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c948373..b09c18a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1239,7 +1239,10 @@ static int irq_thread(void *data)
 
 	irq_thread_set_ready(desc, action);
 
-	sched_set_fifo(current);
+	if (action->handler == irq_forced_secondary_handler)
+		sched_set_fifo_minus_one(current);
+	else
+		sched_set_fifo(current);
 
 	if (force_irqthreads() && test_bit(IRQTF_FORCED_THREAD,
 					   &action->thread_flags))
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 77ae87f..4be85aa 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -847,6 +847,17 @@ void sched_set_fifo(struct task_struct *p)
 EXPORT_SYMBOL_GPL(sched_set_fifo);
 
 /*
+ * For tasks that must be preemptible by a sched_set_fifo() task, such as
+ * to simulate the behavior of a non-PREEMPT_RT system where the
+ * sched_set_fifo() task is a hard interrupt.
+ */
+void sched_set_fifo_minus_one(struct task_struct *p)
+{
+	struct sched_param sp = { .sched_priority = MAX_RT_PRIO / 2 - 1 };
+	WARN_ON_ONCE(sched_setscheduler_nocheck(p, SCHED_FIFO, &sp) != 0);
+}
+
+/*
  * For when you don't much care about FIFO, but want to be above SCHED_NORMAL.
  */
 void sched_set_fifo_low(struct task_struct *p)
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
  2025-09-08 16:08 [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler Lukas Wunner
@ 2025-09-17 14:56 ` Sebastian Andrzej Siewior
  2025-09-20 21:20 ` Thomas Gleixner
  1 sibling, 0 replies; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-17 14:56 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Thomas Gleixner, Crystal Wood, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel,
	Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran

On 2025-09-08 18:08:31 [+0200], Lukas Wunner wrote:
> Crystal reports the PCIe Advanced Error Reporting driver getting stuck
> in an infinite loop on PREEMPT_RT:  Both the primary IRQ handler
> aer_irq() as well as the secondary handler aer_isr() are forced into
> threads with identical priority.
> 
> Crystal writes that on the ARM system in question, the primary IRQ
> handler has to clear an error in the Root Error Status register...
> 
>    "before the next error happens, or else the hardware will set the
>     Multiple ERR_COR Received bit.  If that bit is set, then aer_isr()
>     can't rely on the Error Source Identification register, so it scans
>     through all devices looking for errors -- and for some reason, on
>     this system, accessing the AER registers (or any Config Space above
>     0x400, even though there are capabilities located there) generates
>     an Unsupported Request Error (but returns valid data).  Since this
>     happens more than once, without aer_irq() preempting, it causes
>     another multi error and we get stuck in a loop."
> 
> The issue does not show on non-PREEMPT_RT since the primary IRQ handler
> runs in hardirq context and thus clears the Root Error Status register
> before waking the secondary IRQ handler's thread.
> 
> Simulate the same behavior on PREEMPT_RT by assigning a lower default
> priority to forced secondary IRQ handlers.
> 
> Reported-by: Crystal Wood <crwood@redhat.com>
> Tested-by: Crystal Wood <crwood@redhat.com>
> Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

> Crystal provided a Tested-by off-list.  I've used Crystal's suggestion
> for the code comment, but if anyone prefers Sebastian's or some other
> wording, I don't mind either respinning myself or a maintainer amending
> this or any other aspect of the patch when applying.  Thanks!

Crystal's wording ist fine.

Sebastian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
  2025-09-08 16:08 [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler Lukas Wunner
  2025-09-17 14:56 ` Sebastian Andrzej Siewior
@ 2025-09-20 21:20 ` Thomas Gleixner
  2025-09-21 13:12   ` Lukas Wunner
  1 sibling, 1 reply; 10+ messages in thread
From: Thomas Gleixner @ 2025-09-20 21:20 UTC (permalink / raw)
  To: Lukas Wunner, Sebastian Andrzej Siewior, Crystal Wood,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Clark Williams, Steven Rostedt
  Cc: Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel,
	Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran

On Mon, Sep 08 2025 at 18:08, Lukas Wunner wrote:
> Crystal reports the PCIe Advanced Error Reporting driver getting stuck
> in an infinite loop on PREEMPT_RT:  Both the primary IRQ handler

Can you please write out interrupt. This IRQ acronym is just not helpful
at all.

> aer_irq() as well as the secondary handler aer_isr() are forced into
> threads with identical priority.
>
> Crystal writes that on the ARM system in question, the primary IRQ
> handler has to clear an error in the Root Error Status register...
>
>    "before the next error happens, or else the hardware will set the
>     Multiple ERR_COR Received bit.  If that bit is set, then aer_isr()
>     can't rely on the Error Source Identification register, so it scans
>     through all devices looking for errors -- and for some reason, on
>     this system, accessing the AER registers (or any Config Space above
>     0x400, even though there are capabilities located there) generates
>     an Unsupported Request Error (but returns valid data).  Since this
>     happens more than once, without aer_irq() preempting, it causes
>     another multi error and we get stuck in a loop."
>
> The issue does not show on non-PREEMPT_RT since the primary IRQ handler
> runs in hardirq context and thus clears the Root Error Status register
> before waking the secondary IRQ handler's thread.

That sentence does not make sense at all because on RT this is not any
different. The primary handler clears the status register before waking
the secondary handler, no?

What's different on non-RT is that the primary handler can interrupt the
threaded secondary handler, which is required to make progress.

> Simulate the same behavior on PREEMPT_RT by assigning a lower default
> priority to forced secondary IRQ handlers.

I'm not really fond of this new minus one interface naming. It's a
horrible misnomer as it suggests that it sets the FIFO priority to -1,
which doesn't make any sense at all and is confusing at best.

Can you please pick a name, which makes it obvious what this is about
and as this is really a special case for forced secondary interrupt
handlers, it should just reflect that in the name.

I obviously understand that the proposed change squashs the whole class
of similar (not yet detected) issues, but that made me look at that
particular instance nevertheless.

All aer_irq() does is reading two PCI config words, writing one and then
sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So
arguably this AER problem can be nicely solved by the below one-liner,
no?

And because that made me curious I looked at the first 40 instances of
interrupt requests which have a primary and a secondary handler.

   21 of them can simply set IRQF_NO_THREAD

    3 of them are just silly. I really could not resist to create the
      diff and append it below for entertainment.

Which means going through all the 231 instances is definitely a
worthwhile exercise.

The rest:

   1) is not immediately obvious as I did not follow (indirect)
      call chains to figure out what's really behind them

   2) sets IRQF_ONESHOT, which means the interrupt line is masked before
      the primary handler runs and stays masked until the secondary
      handler returns.

#1 needs eyeballs

#2 should actually not use the secondary thread mechanism at all when
   forced threaded. This really should do:

   hardirq()
        wake(primary_thread);

   primary_thread()
        if (primary_handler() == IRQ_WAKE_THREAD) {
              if (secondary_handler && ONESHOT)
                 secondary_handler();
              else
                 wake(secondary_thread);

   That spares a boat load of wakeups and scheduling on RT and reduces
   the actual problem space significantly. This might need a new
   IRQF_xxx flag to avoid overloading ONESHOT, but that's an
   implementation detail.

Hmm?

Thanks,

        tglx
---
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1671,7 +1671,7 @@ static int aer_probe(struct pcie_device
 	set_service_data(dev, rpc);
 
 	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
-					   IRQF_SHARED, "aerdrv", dev);
+					   IRQF_NO_THREAD | IRQF_SHARED, "aerdrv", dev);
 	if (status) {
 		pci_err(port, "request AER IRQ %d failed\n", dev->irq);
 		return status;

<ENTERTAINMENT>

--- a/drivers/bluetooth/btintel_pcie.c
+++ b/drivers/bluetooth/btintel_pcie.c
@@ -1482,11 +1482,6 @@ static void btintel_pcie_msix_rx_handle(
 	}
 }
 
-static irqreturn_t btintel_pcie_msix_isr(int irq, void *data)
-{
-	return IRQ_WAKE_THREAD;
-}
-
 static inline bool btintel_pcie_is_rxq_empty(struct btintel_pcie_data *data)
 {
 	return data->ia.cr_hia[BTINTEL_PCIE_RXQ_NUM] == data->ia.cr_tia[BTINTEL_PCIE_RXQ_NUM];
@@ -1585,13 +1580,9 @@ static int btintel_pcie_setup_irq(struct
 		msix_entry = &data->msix_entries[i];
 		msix_entry->vector = pci_irq_vector(data->pdev, i);
 
-		err = devm_request_threaded_irq(&data->pdev->dev,
-						msix_entry->vector,
-						btintel_pcie_msix_isr,
-						btintel_pcie_irq_msix_handler,
-						IRQF_SHARED,
-						KBUILD_MODNAME,
-						msix_entry);
+		err = devm_request_threaded_irq(&data->pdev->dev, msix_entry->vector,
+						NULL, btintel_pcie_irq_msix_handler,
+						IRQF_SHARED, KBUILD_MODNAME, msix_entry);
 		if (err) {
 			pci_free_irq_vectors(data->pdev);
 			data->alloc_vecs = 0;
--- a/drivers/bus/fsl-mc/dprc-driver.c
+++ b/drivers/bus/fsl-mc/dprc-driver.c
@@ -381,17 +381,6 @@ int dprc_scan_container(struct fsl_mc_de
 EXPORT_SYMBOL_GPL(dprc_scan_container);
 
 /**
- * dprc_irq0_handler - Regular ISR for DPRC interrupt 0
- *
- * @irq_num: IRQ number of the interrupt being handled
- * @arg: Pointer to device structure
- */
-static irqreturn_t dprc_irq0_handler(int irq_num, void *arg)
-{
-	return IRQ_WAKE_THREAD;
-}
-
-/**
  * dprc_irq0_handler_thread - Handler thread function for DPRC interrupt 0
  *
  * @irq_num: IRQ number of the interrupt being handled
@@ -525,10 +514,8 @@ static int register_dprc_irq_handler(str
 	 * NOTE: devm_request_threaded_irq() invokes the device-specific
 	 * function that programs the MSI physically in the device
 	 */
-	error = devm_request_threaded_irq(&mc_dev->dev,
-					  irq->virq,
-					  dprc_irq0_handler,
-					  dprc_irq0_handler_thread,
+	error = devm_request_threaded_irq(&mc_dev->dev, irq->virq,
+					  NULL, dprc_irq0_handler_thread,
 					  IRQF_NO_SUSPEND | IRQF_ONESHOT,
 					  dev_name(&mc_dev->dev),
 					  &mc_dev->dev);
--- a/drivers/gpu/drm/arm/malidp_hw.c
+++ b/drivers/gpu/drm/arm/malidp_hw.c
@@ -1356,11 +1356,6 @@ void malidp_se_irq_hw_init(struct malidp
 			     hwdev->hw->map.se_irq_map.irq_mask);
 }
 
-static irqreturn_t malidp_se_irq_thread_handler(int irq, void *arg)
-{
-	return IRQ_HANDLED;
-}
-
 int malidp_se_irq_init(struct drm_device *drm, int irq)
 {
 	struct malidp_drm *malidp = drm_to_malidp(drm);
@@ -1371,8 +1366,7 @@ int malidp_se_irq_init(struct drm_device
 	malidp_hw_disable_irq(hwdev, MALIDP_SE_BLOCK, 0xffffffff);
 	malidp_hw_clear_irq(hwdev, MALIDP_SE_BLOCK, 0xffffffff);
 
-	ret = devm_request_threaded_irq(drm->dev, irq, malidp_se_irq,
-					malidp_se_irq_thread_handler,
+	ret = devm_request_threaded_irq(drm->dev, irq, malidp_se_irq, NULL,
 					IRQF_SHARED, "malidp-se", drm);
 	if (ret < 0) {
 		DRM_ERROR("failed to install SE IRQ handler\n");





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
  2025-09-20 21:20 ` Thomas Gleixner
@ 2025-09-21 13:12   ` Lukas Wunner
  2025-09-21 18:15     ` Thomas Gleixner
  2025-10-03 18:25     ` Crystal Wood
  0 siblings, 2 replies; 10+ messages in thread
From: Lukas Wunner @ 2025-09-21 13:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, Crystal Wood, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams,
	Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci,
	linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar,
	Oliver OHalloran

On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote:
> I obviously understand that the proposed change squashs the whole class
> of similar (not yet detected) issues, but that made me look at that
> particular instance nevertheless.
> 
> All aer_irq() does is reading two PCI config words, writing one and then
> sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So
> arguably this AER problem can be nicely solved by the below one-liner,
> no?

The one-liner (which sets IRQF_NO_THREAD) was what Crystal originally
proposed:

https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/

I guess your point is that handling the few operations in aer_irq()
in hard interrupt context is cheaper than waking a thread and
deferring them to that thread?

Intuitively I would assume that most threaded interrupt handlers
are architected in this way:  They only do minimal work in hard
interrupt context and defer the actual work to the (secondary)
thread.  E.g. pciehp_isr() + pciehp_ist() is likewise designed
to follow this principle.

Your research that at first glance, at least 21 of 40 instances of
request_threaded_irq() could just use IRQF_NO_THREAD, seems to
support the notion that the majority of interrupt handlers only
do minimal work in hard interrupt context.

But if that is the case, and if you believe that deferring that
small amount of work to a thread is nonsensical, then why is the
primary handler forced into a thread by default in the first place,
requiring drivers to explicitly opt out by setting IRQF_NO_THREAD?

Shouldn't it rather be the other way round, i.e. by default the
primary handler is *not* forced into a thread, but only if the
driver explicitly opts in?  (In cases where the primary handler
does a sufficient amount of work that is justified to be deferred
to a thread.)

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
  2025-09-21 13:12   ` Lukas Wunner
@ 2025-09-21 18:15     ` Thomas Gleixner
  2025-10-03 18:25     ` Crystal Wood
  1 sibling, 0 replies; 10+ messages in thread
From: Thomas Gleixner @ 2025-09-21 18:15 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Sebastian Andrzej Siewior, Crystal Wood, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams,
	Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci,
	linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar,
	Oliver OHalloran

On Sun, Sep 21 2025 at 15:12, Lukas Wunner wrote:
> On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote:
> Your research that at first glance, at least 21 of 40 instances of
> request_threaded_irq() could just use IRQF_NO_THREAD, seems to
> support the notion that the majority of interrupt handlers only
> do minimal work in hard interrupt context.
>
> But if that is the case, and if you believe that deferring that
> small amount of work to a thread is nonsensical, then why is the
> primary handler forced into a thread by default in the first place,
> requiring drivers to explicitly opt out by setting IRQF_NO_THREAD?

Because there are primary handlers which are absolutely not RT safe.

> Shouldn't it rather be the other way round, i.e. by default the
> primary handler is *not* forced into a thread, but only if the
> driver explicitly opts in?  (In cases where the primary handler
> does a sufficient amount of work that is justified to be deferred
> to a thread.)

We had to do it this way before RT got upstream as there was no way to
play a whack a mole game with drivers constantly being added and
changed.

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
  2025-09-21 13:12   ` Lukas Wunner
  2025-09-21 18:15     ` Thomas Gleixner
@ 2025-10-03 18:25     ` Crystal Wood
  2025-10-24 13:33       ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 10+ messages in thread
From: Crystal Wood @ 2025-10-03 18:25 UTC (permalink / raw)
  To: Lukas Wunner, Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel,
	Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran

On Sun, 2025-09-21 at 15:12 +0200, Lukas Wunner wrote:
> On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote:
> > I obviously understand that the proposed change squashs the whole class
> > of similar (not yet detected) issues, but that made me look at that
> > particular instance nevertheless.
> > 
> > All aer_irq() does is reading two PCI config words, writing one and then
> > sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So
> > arguably this AER problem can be nicely solved by the below one-liner,
> > no?
> 
> The one-liner (which sets IRQF_NO_THREAD) was what Crystal originally
> proposed:
> 
> https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/

So, is the plan to apply the original patch then?


Thanks,
Crystal


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
  2025-10-03 18:25     ` Crystal Wood
@ 2025-10-24 13:33       ` Sebastian Andrzej Siewior
  2025-10-24 21:00         ` Crystal Wood
  0 siblings, 1 reply; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-10-24 13:33 UTC (permalink / raw)
  To: Crystal Wood, Thomas Gleixner
  Cc: Lukas Wunner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Clark Williams, Steven Rostedt, Dietmar Eggemann,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel,
	Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas,
	Mahesh J Salgaonkar, Oliver OHalloran

On 2025-10-03 13:25:53 [-0500], Crystal Wood wrote:
> On Sun, 2025-09-21 at 15:12 +0200, Lukas Wunner wrote:
> > On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote:
> > > I obviously understand that the proposed change squashs the whole class
> > > of similar (not yet detected) issues, but that made me look at that
> > > particular instance nevertheless.
> > > 
> > > All aer_irq() does is reading two PCI config words, writing one and then
> > > sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So
> > > arguably this AER problem can be nicely solved by the below one-liner,
> > > no?
> > 
> > The one-liner (which sets IRQF_NO_THREAD) was what Crystal originally
> > proposed:
> > 
> > https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
> 
> So, is the plan to apply the original patch then?

Did we settle on something?
I wasn't sure if you can mix IRQF_NO_THREAD with IRQF_ONESHOT for shared
handlers. If that is a thing, we Crystal's original would do it. Then
there is the question if we want to go the "class" problem to ensure
that one handler can preempt the other.
And maybe I should clean up few ones tglx pointed out that provide a
primary handler for no reason…

> Thanks,
> Crystal

Sebastian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
  2025-10-24 13:33       ` Sebastian Andrzej Siewior
@ 2025-10-24 21:00         ` Crystal Wood
  2025-10-27  6:40           ` Lukas Wunner
  0 siblings, 1 reply; 10+ messages in thread
From: Crystal Wood @ 2025-10-24 21:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Thomas Gleixner
  Cc: Lukas Wunner, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot, Clark Williams, Steven Rostedt, Dietmar Eggemann,
	Ben Segall, Mel Gorman, Valentin Schneider, linux-kernel,
	Attila Fazekas, linux-pci, linux-rt-devel, Bjorn Helgaas,
	Mahesh J Salgaonkar, Oliver OHalloran

On Fri, 2025-10-24 at 15:33 +0200, Sebastian Andrzej Siewior wrote:
> On 2025-10-03 13:25:53 [-0500], Crystal Wood wrote:
> > On Sun, 2025-09-21 at 15:12 +0200, Lukas Wunner wrote:
> > > On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote:
> > > > I obviously understand that the proposed change squashs the whole class
> > > > of similar (not yet detected) issues, but that made me look at that
> > > > particular instance nevertheless.
> > > > 
> > > > All aer_irq() does is reading two PCI config words, writing one and then
> > > > sticking 64bytes into a KFIFO. All of that is hard interrupt safe. So
> > > > arguably this AER problem can be nicely solved by the below one-liner,
> > > > no?
> > > 
> > > The one-liner (which sets IRQF_NO_THREAD) was what Crystal originally
> > > proposed:
> > > 
> > > https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
> > 
> > So, is the plan to apply the original patch then?
> 
> Did we settle on something?
> I wasn't sure if you can mix IRQF_NO_THREAD with IRQF_ONESHOT for shared
> handlers. If that is a thing, we Crystal's original would do it.

Do you mean mixing IRQF_NO_THREAD on this irq (which should eliminate
the forced IRQF_ONESHOT) with another shared irq that still has
IRQF_ONESHOT?

I suspect it was a non-issue because of IRQCHIP_ONESHOT_SAFE disabling
the forced oneshot (the other irq was pciehp).  Given that these are
pcie-specific, do they ever get used without MSI (which sets
IRQCHIP_ONESHOT_SAFE)[1]?

The issue seems to be that the type of oneshot we want for forced
threading (unmask after the first user-supplied handler) is different
from what we want for explicit IRQF_ONESHOT (unmask after the last
user-supplied handler).  If we separated those, then the semantics
would better match non-RT, and we'd only need to care about mixing
when it comes to explicit IRQF_NOSHOT.

> Then there is the question if we want to go the "class" problem to
> ensure that one handler can preempt the other.  And maybe I should
> clean up few ones tglx pointed out that provide a primary handler for
> no reason…

Either way works for me, as long as we pick at least one :-)

-Crystal

[1] I realize that the answer to "has any hardware designer ever
done this weird and bad thing?" is usually yes. :-P


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
  2025-10-24 21:00         ` Crystal Wood
@ 2025-10-27  6:40           ` Lukas Wunner
  2025-10-27  8:18             ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 10+ messages in thread
From: Lukas Wunner @ 2025-10-27  6:40 UTC (permalink / raw)
  To: Crystal Wood
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Clark Williams,
	Steven Rostedt, Dietmar Eggemann, Ben Segall, Mel Gorman,
	Valentin Schneider, linux-kernel, Attila Fazekas, linux-pci,
	linux-rt-devel, Bjorn Helgaas, Mahesh J Salgaonkar,
	Oliver OHalloran

On Fri, Oct 24, 2025 at 04:00:02PM -0500, Crystal Wood wrote:
> On Fri, 2025-10-24 at 15:33 +0200, Sebastian Andrzej Siewior wrote:
> > On 2025-10-03 13:25:53 [-0500], Crystal Wood wrote:
> > > On Sun, 2025-09-21 at 15:12 +0200, Lukas Wunner wrote:
> > > > On Sat, Sep 20, 2025 at 11:20:26PM +0200, Thomas Gleixner wrote:
> > > > > I obviously understand that the proposed change squashs the whole
> > > > > class of similar (not yet detected) issues, but that made me look at
> > > > > that particular instance nevertheless.
> > > > > 
> > > > > All aer_irq() does is reading two PCI config words, writing one and
> > > > > then sticking 64bytes into a KFIFO. All of that is hard interrupt
> > > > > safe. So arguably this AER problem can be nicely solved by the below
> > > > > one-liner, no?
> > > > 
> > > > The one-liner (which sets IRQF_NO_THREAD) was what Crystal originally
> > > > proposed:
> > > > 
> > > > https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
> > > 
> > > So, is the plan to apply the original patch then?
> > 
> > Did we settle on something?
> > I wasn't sure if you can mix IRQF_NO_THREAD with IRQF_ONESHOT for shared
> > handlers. If that is a thing, we Crystal's original would do it.
> 
> Do you mean mixing IRQF_NO_THREAD on this irq (which should eliminate
> the forced IRQF_ONESHOT) with another shared irq that still has
> IRQF_ONESHOT?
> 
> I suspect it was a non-issue because of IRQCHIP_ONESHOT_SAFE disabling
> the forced oneshot (the other irq was pciehp).  Given that these are
> pcie-specific, do they ever get used without MSI (which sets
> IRQCHIP_ONESHOT_SAFE)[1]?

It seems fragile to depend on IRQCHIP_ONESHOT_SAFE.  What about irqchips
which don't set that?  What about PCIe ports which use legacy INTx
instead of MSI?

As Sebastian pointed out, setting IRQF_NO_THREAD for the AER driver but
not for any of the other PCIe port services risks starving the AER
primary handler if any of the other PCIe port services' (threaded)
primary handlers take a long time:

https://lore.kernel.org/r/20250904073024.YsLeZqK_@linutronix.de/

I went through all PCIe port services to see if IRQF_NO_THREAD could be
set on all of them.  Alas that's not possible:

pciehp's primary handler acquires a spinlock because of runtime PM API
calls (device->power.lock).  It's not a raw_spin_lock_t, so it becomes
a sleeping lock on RT which cannot be acquired in hardirq context and
converting to a raw_spin_lock_t would be non-trivial and probably
undesirable.

pme's primary handler also acquires a spinlock, this one could be
converted to a raw_spin_lock_t.

FWIW, PCIe port services requesting a (potentially shared) interrupt are:
drivers/pci/hotplug/pciehp_hpc.c
drivers/pci/pcie/aer.c
drivers/pci/pcie/dpc.c
drivers/pci/pcie/pme.c
drivers/pci/pcie/bwctrl.c

Long story short, I'll respin the patch to reduce the forced secondary
thread's priority, taking into account Thomas' feedback.
(Apologies for not having done this earlier.)

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler
  2025-10-27  6:40           ` Lukas Wunner
@ 2025-10-27  8:18             ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 10+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-10-27  8:18 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Crystal Wood, Thomas Gleixner, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Clark Williams, Steven Rostedt,
	Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
	linux-kernel, Attila Fazekas, linux-pci, linux-rt-devel,
	Bjorn Helgaas, Mahesh J Salgaonkar, Oliver OHalloran

On 2025-10-27 07:40:46 [+0100], Lukas Wunner wrote:
> > I suspect it was a non-issue because of IRQCHIP_ONESHOT_SAFE disabling
> > the forced oneshot (the other irq was pciehp).  Given that these are
> > pcie-specific, do they ever get used without MSI (which sets
> > IRQCHIP_ONESHOT_SAFE)[1]?
> 
> It seems fragile to depend on IRQCHIP_ONESHOT_SAFE.  What about irqchips
> which don't set that?  What about PCIe ports which use legacy INTx
> instead of MSI?

exactly.

> Long story short, I'll respin the patch to reduce the forced secondary
> thread's priority, taking into account Thomas' feedback.
> (Apologies for not having done this earlier.)

no worries, thanks for the spin up.

> Thanks,
> 
> Lukas

Sebastian

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-10-27  8:18 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-08 16:08 [PATCH] genirq/manage: Reduce priority of forced secondary IRQ handler Lukas Wunner
2025-09-17 14:56 ` Sebastian Andrzej Siewior
2025-09-20 21:20 ` Thomas Gleixner
2025-09-21 13:12   ` Lukas Wunner
2025-09-21 18:15     ` Thomas Gleixner
2025-10-03 18:25     ` Crystal Wood
2025-10-24 13:33       ` Sebastian Andrzej Siewior
2025-10-24 21:00         ` Crystal Wood
2025-10-27  6:40           ` Lukas Wunner
2025-10-27  8:18             ` Sebastian Andrzej Siewior

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).