[PATCH] PCI/AER: Use IRQF_NO_THREAD on aer

linux-pci.vger.kernel.org archive mirror

* [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq
@ 2025-09-02 22:44 Crystal Wood
  2025-09-03  8:10 ` Lukas Wunner
  2025-09-04  7:30 ` Sebastian Andrzej Siewior
  0 siblings, 2 replies; 9+ messages in thread
From: Crystal Wood @ 2025-09-02 22:44 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Mahesh J Salgaonkar, Oliver O'Halloran,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Attila Fazekas, linux-pci, linux-rt-devel, Crystal Wood

On PREEMPT_RT, currently both aer_irq and aer_isr run in separate threads,
at the same FIFO priority.  This can lead to the aer_isr thread starving
the aer_irq thread, particularly if multi_error_valid causes a scan of
all devices, and multiple errors are raised during the scan.

On !PREEMPT_RT, or if aer_irq runs at a higher priority than aer_isr, these
errors can be queued as single-error events as they happen.  But if aer_irq
can't run until aer_isr finishes, by that time the multi event bit will be
set again, causing a new scan and an infinite loop.

Signed-off-by: Crystal Wood <crwood@redhat.com>
---
I'm seeing this on a particular ARM server when using /sys/bus/pci/rescan,
though the internal reporter sometimes saw it happen on boot as well.
On !PREEMPT_RT, or with this patch, a finite number of errors are emitted
and the scan completes.
---
 drivers/pci/pcie/aer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 15ed541d2fbe..6945a112a5cd 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -1671,7 +1671,8 @@ static int aer_probe(struct pcie_device *dev)
 	set_service_data(dev, rpc);
 
 	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
-					   IRQF_SHARED, "aerdrv", dev);
+					   IRQF_NO_THREAD | IRQF_SHARED,
+					   "aerdrv", dev);
 	if (status) {
 		pci_err(port, "request AER IRQ %d failed\n", dev->irq);
 		return status;
-- 
2.47.1



* Re: [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq
  2025-09-02 22:44 [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq Crystal Wood
@ 2025-09-03  8:10 ` Lukas Wunner
  2025-09-03 21:39   ` Crystal Wood
  2025-09-04  7:30 ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 9+ messages in thread
From: Lukas Wunner @ 2025-09-03  8:10 UTC (permalink / raw)
  To: Crystal Wood
  Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Attila Fazekas, linux-pci, linux-rt-devel

On Tue, Sep 02, 2025 at 05:44:41PM -0500, Crystal Wood wrote:
> On PREEMPT_RT, currently both aer_irq and aer_isr run in separate threads,

My understanding is that if request_threaded_irq() is passed both a
non-NULL handler and a non-NULL thread_fn, the former runs in hardirq
context and the latter in kthread context.  Even on PREEMPT_RT.

So how can aer_irq() and aer_isr() ever both run in kthread context?
Am I missing something?

> at the same FIFO priority.  This can lead to the aer_isr thread starving
> the aer_irq thread, particularly if multi_error_valid causes a scan of
> all devices, and multiple errors are raised during the scan.

I'm not seeing aer_isr() waiting on a spinlock, so how can it be starved?

Thanks,

Lukas


* Re: [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq
  2025-09-03  8:10 ` Lukas Wunner
@ 2025-09-03 21:39   ` Crystal Wood
  0 siblings, 0 replies; 9+ messages in thread
From: Crystal Wood @ 2025-09-03 21:39 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Attila Fazekas, linux-pci, linux-rt-devel

On Wed, 2025-09-03 at 10:10 +0200, Lukas Wunner wrote:
> On Tue, Sep 02, 2025 at 05:44:41PM -0500, Crystal Wood wrote:
> > On PREEMPT_RT, currently both aer_irq and aer_isr run in separate threads,
> 
> My understanding is that if request_threaded_irq() is passed both a
> non-NULL handler and a non-NULL thread_fn, the former runs in hardirq
> context and the latter in kthread context.  Even on PREEMPT_RT.
> 
> So how can aer_irq() and aer_isr() ever both run in kthread context?
> Am I missing something?

They are both threaded.  See commit 2a1d3ab8986d1b2 ("genirq: Handle
force threading of irqs with primary and thread handler")
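A minimal userspace model of what that forcing does at request time, loosely following irq_setup_forced_threading() in kernel/irq/manage.c. The M_IRQF_* values, struct irq_model and setup_forced_threading() are hypothetical stand-ins, not the kernel's definitions, and nested-thread handling is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only (not the real IRQF_*). */
enum {
	M_IRQF_SHARED    = 1u << 0,
	M_IRQF_ONESHOT   = 1u << 1,
	M_IRQF_NO_THREAD = 1u << 2,
	M_IRQF_PERCPU    = 1u << 3,
};

/* Where the parts of one requested interrupt end up running. */
struct irq_model {
	unsigned int flags;     /* flags after setup */
	bool primary_threaded;  /* primary handler runs in irq/N-name */
	bool has_secondary;     /* thread_fn moved to its own irq/N-s-name */
};

/*
 * Simplified forced threading (threadirqs or PREEMPT_RT).
 * has_handler:   the caller passed a real primary handler.
 * has_thread_fn: the caller passed a thread_fn.
 */
static struct irq_model setup_forced_threading(bool force_threads,
					       unsigned int flags,
					       bool has_handler,
					       bool has_thread_fn)
{
	struct irq_model m = { .flags = flags };

	if (!force_threads)
		return m;
	/* Opt-outs: these requests are left untouched. */
	if (flags & (M_IRQF_NO_THREAD | M_IRQF_PERCPU | M_IRQF_ONESHOT))
		return m;
	/* A pure threaded request (no real primary) needs no change. */
	if (!has_handler)
		return m;

	m.flags |= M_IRQF_ONESHOT;
	m.primary_threaded = true;
	/* Primary + thread_fn: the thread_fn gets a secondary thread. */
	m.has_secondary = has_thread_fn;
	return m;
}
```

Under this model, aerdrv before the patch gets two threads (irq/N-aerdrv running aer_irq and irq/N-s-aerdrv running aer_isr), like the irq/87-pciehp and irq/87-s-pciehp pair shown later in the thread; with IRQF_NO_THREAD, aer_irq stays in hardirq context and only aer_isr is threaded.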

> 
> > at the same FIFO priority.  This can lead to the aer_isr thread starving
> > the aer_irq thread, particularly if multi_error_valid causes a scan of
> > all devices, and multiple errors are raised during the scan.
> 
> I'm not seeing aer_isr() waiting on a spinlock, so how can it be starved?

It's not about locks... Maybe starvation is too strong of a word since
aer_irq does eventually get to run, just too late to avert yet another
multi error that starts the whole thing over again.

-Crystal



* Re: [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq
  2025-09-02 22:44 [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq Crystal Wood
  2025-09-03  8:10 ` Lukas Wunner
@ 2025-09-04  7:30 ` Sebastian Andrzej Siewior
  2025-09-04 12:48   ` Lukas Wunner
  1 sibling, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-04  7:30 UTC (permalink / raw)
  To: Crystal Wood
  Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Clark Williams, Steven Rostedt, Attila Fazekas, linux-pci,
	linux-rt-devel

On 2025-09-02 17:44:41 [-0500], Crystal Wood wrote:
> On PREEMPT_RT, currently both aer_irq and aer_isr run in separate threads,
> at the same FIFO priority.  This can lead to the aer_isr thread starving
> the aer_irq thread, particularly if multi_error_valid causes a scan of
> all devices, and multiple errors are raised during the scan.
> 
> On !PREEMPT_RT, or if aer_irq runs at a higher priority than aer_isr, these
> errors can be queued as single-error events as they happen.  But if aer_irq
> can't run until aer_isr finishes, by that time the multi event bit will be
> set again, causing a new scan and an infinite loop.

So if aer_irq is too slow, new "work" piles up? Is that because
there is a timing constraint on how long until the error needs to be
acknowledged?

Another way would be to let the secondary handler run at a slightly lower
priority than the primary handler. In this case making the primary
non-threaded should not cause any harm.

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

> Signed-off-by: Crystal Wood <crwood@redhat.com>
> ---
> I'm seeing this on a particular ARM server when using /sys/bus/pci/rescan,
> though the internal reporter sometimes saw it happen on boot as well.
> On !PREEMPT_RT, or with this patch, a finite number of errors are emitted
> and the scan completes.
> ---
>  drivers/pci/pcie/aer.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 15ed541d2fbe..6945a112a5cd 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1671,7 +1671,8 @@ static int aer_probe(struct pcie_device *dev)
>  	set_service_data(dev, rpc);
>  
>  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> -					   IRQF_SHARED, "aerdrv", dev);
> +					   IRQF_NO_THREAD | IRQF_SHARED,
> +					   "aerdrv", dev);

I'm not sure if this works with IRQF_SHARED. Your primary handler is
IRQF_SHARED + IRQF_NO_THREAD, and another shared handler which is
forced-threaded will have IRQF_SHARED + IRQF_ONESHOT.
If the core does not complain, all good. Worst case, the shared
ONESHOT handler might starve your primary handler. It would be nice if you could
check whether you have a shared handler here (I have no AER on the three boxes I
checked).

>  	if (status) {
>  		pci_err(port, "request AER IRQ %d failed\n", dev->irq);
>  		return status;

Sebastian


* Re: [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq
  2025-09-04  7:30 ` Sebastian Andrzej Siewior
@ 2025-09-04 12:48   ` Lukas Wunner
  2025-09-04 13:18     ` Lukas Wunner
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Lukas Wunner @ 2025-09-04 12:48 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Crystal Wood, Bjorn Helgaas, Mahesh J Salgaonkar,
	Oliver O'Halloran, Clark Williams, Steven Rostedt,
	Attila Fazekas, linux-pci, linux-rt-devel

On Thu, Sep 04, 2025 at 09:30:24AM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-09-02 17:44:41 [-0500], Crystal Wood wrote:
> > On PREEMPT_RT, currently both aer_irq and aer_isr run in separate threads,
> > at the same FIFO priority.  This can lead to the aer_isr thread starving
> > the aer_irq thread, particularly if multi_error_valid causes a scan of
> > all devices, and multiple errors are raised during the scan.
> > 
> > On !PREEMPT_RT, or if aer_irq runs at a higher priority than aer_isr, these
> > errors can be queued as single-error events as they happen.  But if aer_irq
> > can't run until aer_isr finishes, by that time the multi event bit will be
> > set again, causing a new scan and an infinite loop.
> 
> So if aer_irq is too slow, new "work" piles up? Is that because
> there is a timing constraint on how long until the error needs to be
> acknowledged?

Since v6.16, AER supports rate limiting.  It's unclear which
kernel version Crystal is using, but if it's older than v6.16,
it may be worth retrying with a newer release to see if that
solves the problem.

> Another way would be to let the secondary handler run at a slightly lower
> priority than the primary handler. In this case making the primary
> non-threaded should not cause any harm.

Why isn't the secondary handler always assigned a lower priority
by default?  I think a lot of drivers are built on the assumption
that the primary handler is scheduled sooner than the secondary
handler.

E.g. the native PCIe hotplug driver (drivers/pci/hotplug/pciehp_hpc.c)
uses the primary handler to pick up Command Completed interrupts
and will then wake the secondary handler, which is waiting in
pcie_wait_cmd().  The secondary handler uses a timeout of 1 sec
to ensure forward progress in case the hardware never signals
Command Completed (e.g. if the hotplug port itself was hot-removed).

In extreme cases, the primary handler may not run within 1 sec
to wake the secondary handler.  The secondary handler will then
run into the timeout and issue an error message (but should
otherwise react gracefully).

My point is that keeping both at the same priority by default
provokes such situations more easily, so assigning a higher
default priority to the primary handler would seem prudent.

> > +++ b/drivers/pci/pcie/aer.c
> > @@ -1671,7 +1671,8 @@ static int aer_probe(struct pcie_device *dev)
> >  	set_service_data(dev, rpc);
> >  
> >  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> > -					   IRQF_SHARED, "aerdrv", dev);
> > +					   IRQF_NO_THREAD | IRQF_SHARED,
> > +					   "aerdrv", dev);
> 
> I'm not sure if this works with IRQF_SHARED. Your primary handler is
> IRQF_SHARED + IRQF_NO_THREAD, and another shared handler which is
> forced-threaded will have IRQF_SHARED + IRQF_ONESHOT.
> If the core does not complain, all good. Worst case, the shared
> ONESHOT handler might starve your primary handler. It would be nice if you could
> check whether you have a shared handler here (I have no AER on the three boxes I
> checked).

Yes, interrupt sharing can happen if the Root Port uses legacy INTx
interrupts.  In that case other port services such as hotplug,
bandwidth control, PME or DPC may use the same interrupt.

Thanks,

Lukas


* Re: [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq
  2025-09-04 12:48   ` Lukas Wunner
@ 2025-09-04 13:18     ` Lukas Wunner
  2025-09-04 13:38       ` Sebastian Andrzej Siewior
  2025-09-04 13:31     ` Sebastian Andrzej Siewior
  2025-09-04 20:27     ` Crystal Wood
  2 siblings, 1 reply; 9+ messages in thread
From: Lukas Wunner @ 2025-09-04 13:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Crystal Wood, Bjorn Helgaas, Mahesh J Salgaonkar,
	Oliver O'Halloran, Clark Williams, Steven Rostedt,
	Attila Fazekas, linux-pci, linux-rt-devel

On Thu, Sep 04, 2025 at 02:48:21PM +0200, Lukas Wunner wrote:
> On Thu, Sep 04, 2025 at 09:30:24AM +0200, Sebastian Andrzej Siewior wrote:
> > On 2025-09-02 17:44:41 [-0500], Crystal Wood wrote:
> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -1671,7 +1671,8 @@ static int aer_probe(struct pcie_device *dev)
> > >  	set_service_data(dev, rpc);
> > >  
> > >  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> > > -					   IRQF_SHARED, "aerdrv", dev);
> > > +					   IRQF_NO_THREAD | IRQF_SHARED,
> > > +					   "aerdrv", dev);
> > 
> > I'm not sure if this works with IRQF_SHARED. Your primary handler is
> > IRQF_SHARED + IRQF_NO_THREAD, and another shared handler which is
> > forced-threaded will have IRQF_SHARED + IRQF_ONESHOT.
> > If the core does not complain, all good. Worst case, the shared
> > ONESHOT handler might starve your primary handler. It would be nice if you could
> > check whether you have a shared handler here (I have no AER on the three boxes I
> > checked).
> 
> Yes, interrupt sharing can happen if the Root Port uses legacy INTx
> interrupts.  In that case other port services such as hotplug,
> bandwidth control, PME or DPC may use the same interrupt.

I should add that none of these other port service drivers use
IRQF_ONESHOT.  They're all IRQF_SHARED only.

Thanks,

Lukas


* Re: [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq
  2025-09-04 12:48   ` Lukas Wunner
  2025-09-04 13:18     ` Lukas Wunner
@ 2025-09-04 13:31     ` Sebastian Andrzej Siewior
  2025-09-04 20:27     ` Crystal Wood
  2 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-04 13:31 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Crystal Wood, Bjorn Helgaas, Mahesh J Salgaonkar,
	Oliver O'Halloran, Clark Williams, Steven Rostedt,
	Attila Fazekas, linux-pci, linux-rt-devel

On 2025-09-04 14:48:21 [+0200], Lukas Wunner wrote:
> Since v6.16, AER supports rate limiting.  It's unclear which
> kernel version Crystal is using, but if it's older than v6.16,
> it may be worth retrying with a newer release to see if that
> solves the problem.

Where is this rate limiting coming from?

> > Another way would be to let the secondary handler run at a slightly lower
> > priority than the primary handler. In this case making the primary
> > non-threaded should not cause any harm.
> 
> Why isn't the secondary handler always assigned a lower priority
> by default?  I think a lot of drivers are built on the assumption
> that the primary handler is scheduled sooner than the secondary
> handler.

Well, this is the first time I've seen someone make that assumption.

> E.g. the native PCIe hotplug driver (drivers/pci/hotplug/pciehp_hpc.c)
> uses the primary handler to pick up Command Completed interrupts
> and will then wake the secondary handler, which is waiting in
> pcie_wait_cmd().  The secondary handler uses a timeout of 1 sec
> to ensure forward progress in case the hardware never signals
> Command Completed (e.g. if the hotplug port itself was hot-removed).

If it is waiting, then everything is good. It would only be problematic
if it busy-polls.
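A hypothetical single-CPU model of that distinction: the primary handler's thread is already runnable and only needs the CPU briefly to signal Command Completed, while the secondary waits for it. completion_seen_before_timeout() is illustrative only, and priorities are user-visible SCHED_FIFO values (higher wins).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: does the waiting secondary handler observe the
 * completion signalled by the primary before its own timeout expires?
 * Assumes both threads are bound to the same CPU.
 */
static bool completion_seen_before_timeout(bool busy_poll,
					   int primary_prio,
					   int secondary_prio)
{
	/* A sleeping waiter gives up the CPU, so the primary runs at once. */
	if (!busy_poll)
		return true;
	/*
	 * A spinning waiter keeps the CPU; under SCHED_FIFO only a strictly
	 * higher-priority task can preempt it.
	 */
	if (primary_prio > secondary_prio)
		return true;
	/* Otherwise the waiter spins until its timeout expires. */
	return false;
}
```

At equal priority only the busy-polling variant runs into the timeout, which is why a sleeping wait is fine even without a priority split.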

> In extreme cases, the primary handler may not run within 1 sec
> to wake the secondary handler.  The secondary handler will then
> run into the timeout and issue an error message (but should
> otherwise react gracefully).
>
> My point is that keeping both at the same priority by default
> provokes such situations more easily, so assigning a higher
> default priority to the primary handler would seem prudent.

Okay, but then it should be the secondary that is one less than the
primary. The primary runs at the middle priority, MAX_RT_PRIO / 2. It
should not be preferred over other forced-threaded handlers just because
it also has a secondary handler. The secondary should run after all
primary handlers are done. This would also mirror the !RT case.
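A small sketch of the proposed priority split, assuming a single CPU and user-visible SCHED_FIFO priorities where a higher number wins. MODEL_MAX_RT_PRIO mirrors the kernel's MAX_RT_PRIO (100); primary_prio(), secondary_prio() and fifo_preempts() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as the kernel's MAX_RT_PRIO; the helpers are hypothetical. */
#define MODEL_MAX_RT_PRIO 100

/* Priority every irq thread gets today: the middle RT priority. */
static int primary_prio(void)
{
	return MODEL_MAX_RT_PRIO / 2;
}

/* Secondary: equal to the primary today, one less under the proposal. */
static int secondary_prio(bool proposal)
{
	return proposal ? primary_prio() - 1 : primary_prio();
}

/*
 * SCHED_FIFO on one CPU: a newly woken task preempts the running one
 * only if its priority is strictly higher; at equal priority it waits.
 */
static bool fifo_preempts(int woken_prio, int running_prio)
{
	return woken_prio > running_prio;
}
```

One step is enough because SCHED_FIFO compares priorities strictly, and since every primary stays at MAX_RT_PRIO / 2, no primary is favoured over another forced-threaded primary.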

> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -1671,7 +1671,8 @@ static int aer_probe(struct pcie_device *dev)
> > >  	set_service_data(dev, rpc);
> > >  
> > >  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> > > -					   IRQF_SHARED, "aerdrv", dev);
> > > +					   IRQF_NO_THREAD | IRQF_SHARED,
> > > +					   "aerdrv", dev);
> > 
> > I'm not sure if this works with IRQF_SHARED. Your primary handler is
> > IRQF_SHARED + IRQF_NO_THREAD, and another shared handler which is
> > forced-threaded will have IRQF_SHARED + IRQF_ONESHOT.
> > If the core does not complain, all good. Worst case, the shared
> > ONESHOT handler might starve your primary handler. It would be nice if you could
> > check whether you have a shared handler here (I have no AER on the three boxes I
> > checked).
> 
> Yes, interrupt sharing can happen if the Root Port uses legacy INTx
> interrupts.  In that case other port services such as hotplug,
> bandwidth control, PME or DPC may use the same interrupt.

So this sounds like it is not going to work then, or is it?

> Thanks,
> 
> Lukas

Sebastian


* Re: [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq
  2025-09-04 13:18     ` Lukas Wunner
@ 2025-09-04 13:38       ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-09-04 13:38 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Crystal Wood, Bjorn Helgaas, Mahesh J Salgaonkar,
	Oliver O'Halloran, Clark Williams, Steven Rostedt,
	Attila Fazekas, linux-pci, linux-rt-devel

On 2025-09-04 15:18:50 [+0200], Lukas Wunner wrote:
> On Thu, Sep 04, 2025 at 02:48:21PM +0200, Lukas Wunner wrote:
> > On Thu, Sep 04, 2025 at 09:30:24AM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2025-09-02 17:44:41 [-0500], Crystal Wood wrote:
> > > > +++ b/drivers/pci/pcie/aer.c
> > > > @@ -1671,7 +1671,8 @@ static int aer_probe(struct pcie_device *dev)
> > > >  	set_service_data(dev, rpc);
> > > >  
> > > >  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> > > > -					   IRQF_SHARED, "aerdrv", dev);
> > > > +					   IRQF_NO_THREAD | IRQF_SHARED,
> > > > +					   "aerdrv", dev);
> > > 
> > > I'm not sure if this works with IRQF_SHARED. Your primary handler is
> > > IRQF_SHARED + IRQF_NO_THREAD, and another shared handler which is
> > > forced-threaded will have IRQF_SHARED + IRQF_ONESHOT.
> > > If the core does not complain, all good. Worst case, the shared
> > > ONESHOT handler might starve your primary handler. It would be nice if you could
> > > check whether you have a shared handler here (I have no AER on the three boxes I
> > > checked).
> > 
> > Yes, interrupt sharing can happen if the Root Port uses legacy INTx
> > interrupts.  In that case other port services such as hotplug,
> > bandwidth control, PME or DPC may use the same interrupt.
> 
> I should add that none of these other port service drivers use
> IRQF_ONESHOT.  They're all IRQF_SHARED only.

Yes. But if they get forced-threaded, then their hardirq handler is
irq_default_primary_handler(), and that one would be marked
IRQF_ONESHOT. The other primary handler would be aer_irq() in this
case. As long as all (or none) are forced-threaded, it is fine.
In this case it could be a mismatch; I'm not sure.

> Thanks,
> 
> Lukas

Sebastian


* Re: [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq
  2025-09-04 12:48   ` Lukas Wunner
  2025-09-04 13:18     ` Lukas Wunner
  2025-09-04 13:31     ` Sebastian Andrzej Siewior
@ 2025-09-04 20:27     ` Crystal Wood
  2 siblings, 0 replies; 9+ messages in thread
From: Crystal Wood @ 2025-09-04 20:27 UTC (permalink / raw)
  To: Lukas Wunner, Sebastian Andrzej Siewior
  Cc: Bjorn Helgaas, Mahesh J Salgaonkar, Oliver O'Halloran,
	Clark Williams, Steven Rostedt, Attila Fazekas, linux-pci,
	linux-rt-devel

On Thu, 2025-09-04 at 14:48 +0200, Lukas Wunner wrote:
> On Thu, Sep 04, 2025 at 09:30:24AM +0200, Sebastian Andrzej Siewior wrote:
> > On 2025-09-02 17:44:41 [-0500], Crystal Wood wrote:
> > > On PREEMPT_RT, currently both aer_irq and aer_isr run in separate threads,
> > > at the same FIFO priority.  This can lead to the aer_isr thread starving
> > > the aer_irq thread, particularly if multi_error_valid causes a scan of
> > > all devices, and multiple errors are raised during the scan.
> > > 
> > > On !PREEMPT_RT, or if aer_irq runs at a higher priority than aer_isr, these
> > > errors can be queued as single-error events as they happen.  But if aer_irq
> > > can't run until aer_isr finishes, by that time the multi event bit will be
> > > set again, causing a new scan and an infinite loop.
> > 
> > So if aer_irq is too slow, new "work" piles up? Is that because
> > there is a timing constraint on how long until the error needs to be
> > acknowledged?

The error needs to be cleared before the next error happens, or else
the hardware will set the "Multiple ERR_COR Received" bit.  If that bit
is set, then aer_isr can't rely on the error source ID register, so it
scans through all devices looking for errors -- and for some reason, on
this system, accessing the error registers (or any config space above
0x400, even though there are capabilities located there) generates an
Unsupported Request Error (but returns valid data).

Since this happens more than once per scan, and aer_irq cannot preempt
aer_isr to clear each error as it arrives, it causes another multi error
and we get stuck in a loop.
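A minimal, hypothetical model of the feedback loop described above. It assumes each full scan raises errs_per_scan correctable errors, that aer_irq either clears each one as it arrives (primary_preempts) or not at all until the scan ends, and that single-error records are handled without another full scan. count_full_scans() and its parameters are illustrative, not part of aer.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns the number of full scans before the loop stops, or max_scans
 * if the multi bit keeps getting set (treated as "infinite").
 */
static int count_full_scans(bool primary_preempts, int errs_per_scan,
			    int max_scans)
{
	bool multi = true;	/* the first scan is triggered by a multi error */
	int scans = 0;

	while (multi && scans < max_scans) {
		int uncleared = 0;

		scans++;
		multi = false;
		for (int i = 0; i < errs_per_scan; i++) {
			/* aer_irq clears it at once: queued as a single error */
			if (primary_preempts)
				continue;
			/* A second error before the first is cleared sets the multi bit */
			if (++uncleared > 1)
				multi = true;
		}
	}
	return scans;
}
```

With primary_preempts true the model does one scan and stops; with it false and two or more errors per scan, every scan re-arms the multi bit and the cap is reached.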

> Since v6.16, AER supports rate limiting.  It's unclear which
> kernel version Crystal is using, but if it's older than v6.16,
> it may be worth retrying with a newer release to see if that
> solves the problem.

The problem shows up in top-of-tree.  The messages are ratelimited, but
the problem isn't caused by the messages.  It still does the scan.

> > Another way would be to let the secondary handler run at a slightly lower
> > priority than the primary handler. In this case making the primary
> > non-threaded should not cause any harm.
> 
> Why isn't the secondary handler always assigned a lower priority
> by default?  I think a lot of drivers are built on the assumption
> that the primary handler is scheduled sooner than the secondary
> handler.

That also works, and I agree it's more intuitive.

> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -1671,7 +1671,8 @@ static int aer_probe(struct pcie_device *dev)
> > >  	set_service_data(dev, rpc);
> > >  
> > >  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> > > -					   IRQF_SHARED, "aerdrv", dev);
> > > +					   IRQF_NO_THREAD | IRQF_SHARED,
> > > +					   "aerdrv", dev);
> > 
> > I'm not sure if this works with IRQF_SHARED. Your primary handler is
> > IRQF_SHARED + IRQF_NO_THREAD, and another shared handler which is
> > forced-threaded will have IRQF_SHARED + IRQF_ONESHOT.
> > If the core does not complain, all good. Worst case, the shared
> > ONESHOT handler might starve your primary handler. It would be nice if you could
> > check whether you have a shared handler here (I have no AER on the three boxes I
> > checked).
> 
> Yes, interrupt sharing can happen if the Root Port uses legacy INTx
> interrupts.  In that case other port services such as hotplug,
> bandwidth control, PME or DPC may use the same interrupt.

It's shared, but with another explicitly threaded interrupt.
This is with the patch applied:
root         778  0.0  0.0      0     0 ?        S    Sep02   0:00 [irq/87-aerdrv]
root         779  0.0  0.0      0     0 ?        S    Sep02   0:00 [irq/87-pciehp]
root         780  0.0  0.0      0     0 ?        S    Sep02   0:00 [irq/87-s-pciehp]

If it were shared with a oneshot irq (forced or otherwise) wouldn't
that have already been a mismatch?

-Crystal


