* [PATCH 0/2] PCI/DPC: Fix EDR recovery path issues
@ 2026-02-12 19:18 Danielle Costantino
2026-02-12 19:18 ` [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link() Danielle Costantino
2026-02-12 19:18 ` [PATCH 2/2] PCI/EDR: Defer AER status clearing until after recovery Danielle Costantino
0 siblings, 2 replies; 15+ messages in thread
From: Danielle Costantino @ 2026-02-12 19:18 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Keith Busch, Kuppuswamy Sathyanarayanan, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci,
Danielle Costantino
Two fixes for the firmware-first EDR path in DPC error recovery.
Both were found investigating recurring DPC events on a platform using
firmware-first EDR, where each recovery left stale state in the DPC
Status register and cleared AER diagnostic data before driver callbacks
could inspect it.
Patch 1 fixes DPC Interrupt Status (bit 3) never being cleared in the
EDR path. In native DPC, dpc_irq() clears it in the top-half, but
EDR bypasses dpc_irq() entirely. After this fix, dpc_reset_link()
clears both Trigger Status and Interrupt Status together.
Patch 2 moves pci_aer_raw_clear_status() from before pcie_do_recovery()
to after it in edr_handle_event(), so AER registers remain readable
throughout the driver recovery callbacks.
Tested on a firmware-first EDR platform. A natural DPC event
confirmed:
- DPC Status properly cleared: 0x2000 (clean) vs 0x2008 (stale, unpatched)
- AER data available during recovery (GHES captured aer_uncor_status:
0x00100010 throughout recovery window)
- Full recovery successful in ~2.3s with link re-establishment
Danielle Costantino (2):
PCI/DPC: Clear Interrupt Status in dpc_reset_link()
PCI/EDR: Defer AER status clearing until after recovery
drivers/pci/pcie/dpc.c | 10 +++++++++-
drivers/pci/pcie/edr.c | 11 ++++++++++-
2 files changed, 19 insertions(+), 2 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-12 19:18 [PATCH 0/2] PCI/DPC: Fix EDR recovery path issues Danielle Costantino
@ 2026-02-12 19:18 ` Danielle Costantino
2026-02-12 19:50 ` Kuppuswamy Sathyanarayanan
2026-02-12 21:36 ` Lukas Wunner
2026-02-12 19:18 ` [PATCH 2/2] PCI/EDR: Defer AER status clearing until after recovery Danielle Costantino
1 sibling, 2 replies; 15+ messages in thread
From: Danielle Costantino @ 2026-02-12 19:18 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Keith Busch, Kuppuswamy Sathyanarayanan, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci,
Danielle Costantino
In the native DPC interrupt path, dpc_irq() clears
PCI_EXP_DPC_STATUS_INTERRUPT before scheduling the threaded handler
that eventually calls dpc_reset_link(). However, in the firmware-first
EDR path, dpc_irq() is never invoked -- firmware owns the DPC interrupt
and notifies the OS via an ACPI EDR notification. dpc_reset_link() is
then called directly from edr_handle_event() via pcie_do_recovery().
Because dpc_reset_link() only clears PCI_EXP_DPC_STATUS_TRIGGER, the
Interrupt Status bit (bit 3) is left set permanently after every EDR
event.
Clear PCI_EXP_DPC_STATUS_INTERRUPT alongside PCI_EXP_DPC_STATUS_TRIGGER
in dpc_reset_link(). Both bits are RW1C in the DPC Status register per
PCIe r6.1, sec 7.9.14.5, so writing them together is safe. The native
path is unaffected because dpc_irq() has already cleared the Interrupt
Status bit before dpc_reset_link() runs.
Fixes: aea47413e7ce ("PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR")
Signed-off-by: Danielle Costantino <dcostantino@meta.com>
---
drivers/pci/pcie/dpc.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index fc18349614d7..9baa2345e33e 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -171,8 +171,16 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
goto out;
}
+ /*
+ * Clear both DPC Trigger Status and DPC Interrupt Status. In the
+ * native DPC path, dpc_irq() already clears Interrupt Status before
+ * the threaded handler runs. But in the EDR (firmware-first) path,
+ * dpc_irq() is never called, so Interrupt Status must be cleared
+ * here to prevent it from remaining stale indefinitely.
+ */
pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
- PCI_EXP_DPC_STATUS_TRIGGER);
+ PCI_EXP_DPC_STATUS_TRIGGER |
+ PCI_EXP_DPC_STATUS_INTERRUPT);
if (pci_bridge_wait_for_secondary_bus(pdev, "DPC")) {
clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
--
2.47.3
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/2] PCI/EDR: Defer AER status clearing until after recovery
2026-02-12 19:18 [PATCH 0/2] PCI/DPC: Fix EDR recovery path issues Danielle Costantino
2026-02-12 19:18 ` [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link() Danielle Costantino
@ 2026-02-12 19:18 ` Danielle Costantino
1 sibling, 0 replies; 15+ messages in thread
From: Danielle Costantino @ 2026-02-12 19:18 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Keith Busch, Kuppuswamy Sathyanarayanan, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci,
Danielle Costantino
edr_handle_event() calls pci_aer_raw_clear_status() immediately after
dpc_process_error() but before pcie_do_recovery(). While
dpc_process_error() has already read and logged the AER uncorrectable
status by this point, clearing the registers before recovery means
that driver error_detected/slot_reset/resume callbacks invoked by
pcie_do_recovery() can no longer inspect the AER status to make their
own decisions.
Additionally, in the firmware-first EDR path, the normal AER clearing
inside pcie_do_recovery() is a no-op because pcie_aer_is_native()
returns false, so pci_aer_raw_clear_status() in edr_handle_event() is
the only place these registers get cleared. Moving it after recovery
completes keeps the diagnostic data available throughout the entire
recovery sequence.
Move pci_aer_raw_clear_status() to after pcie_do_recovery() returns so
AER registers remain readable during the full recovery sequence.
Fixes: ac1c8e35a326 ("PCI/DPC: Add Error Disconnect Recover (EDR) support")
Signed-off-by: Danielle Costantino <dcostantino@meta.com>
---
drivers/pci/pcie/edr.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pcie/edr.c b/drivers/pci/pcie/edr.c
index e86298dbbcff..94a0e5e58fea 100644
--- a/drivers/pci/pcie/edr.c
+++ b/drivers/pci/pcie/edr.c
@@ -191,7 +191,6 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data)
}
dpc_process_error(edev);
- pci_aer_raw_clear_status(edev);
/*
* Irrespective of whether the DPC event is triggered by ERR_FATAL
@@ -200,6 +199,16 @@ static void edr_handle_event(acpi_handle handle, u32 event, void *data)
*/
estate = pcie_do_recovery(edev, pci_channel_io_frozen, dpc_reset_link);
+ /*
+ * Clear AER status only after pcie_do_recovery() completes.
+ * dpc_process_error() has already read and logged the AER
+ * status above. Deferring the clear keeps diagnostic data
+ * available to driver callbacks invoked during recovery and
+ * avoids interfering with any AER status reads they may
+ * perform.
+ */
+ pci_aer_raw_clear_status(edev);
+
send_ost:
/*
--
2.47.3
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-12 19:18 ` [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link() Danielle Costantino
@ 2026-02-12 19:50 ` Kuppuswamy Sathyanarayanan
2026-02-12 21:23 ` Keith Busch
2026-02-12 21:36 ` Lukas Wunner
1 sibling, 1 reply; 15+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-02-12 19:50 UTC (permalink / raw)
To: Danielle Costantino, Bjorn Helgaas
Cc: Keith Busch, Lukas Wunner, Mahesh J Salgaonkar,
Oliver O'Halloran, linux-pci
On 2/12/2026 11:18 AM, Danielle Costantino wrote:
> In the native DPC interrupt path, dpc_irq() clears
> PCI_EXP_DPC_STATUS_INTERRUPT before scheduling the threaded handler
> that eventually calls dpc_reset_link(). However, in the firmware-first
> EDR path, dpc_irq() is never invoked -- firmware owns the DPC interrupt
> and notifies the OS via an ACPI EDR notification. dpc_reset_link() is
> then called directly from edr_handle_event() via pcie_do_recovery().
>
> Because dpc_reset_link() only clears PCI_EXP_DPC_STATUS_TRIGGER, the
> Interrupt Status bit (bit 3) is left set permanently after every EDR
> event.
>
> Clear PCI_EXP_DPC_STATUS_INTERRUPT alongside PCI_EXP_DPC_STATUS_TRIGGER
> in dpc_reset_link(). Both bits are RW1C in the DPC Status register per
> PCIe r6.1, sec 7.9.14.5, so writing them together is safe. The native
> path is unaffected because dpc_irq() has already cleared the Interrupt
> Status bit before dpc_reset_link() runs.
>
In native path (dpc_irq()), OS owns the interrupt delivery and handling.
So it clears it in dpc_irq() handler.
In case of EDR, firmware owns the interrupt handling. It just uses ACPI
method to request OS help with recovery. Since interrupt handling is
owned by firmware, I think firmware should clear the interrupt status.
> Fixes: aea47413e7ce ("PCI/DPC: Expose dpc_process_error(), dpc_reset_link() for use by EDR")
> Signed-off-by: Danielle Costantino <dcostantino@meta.com>
> ---
> drivers/pci/pcie/dpc.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index fc18349614d7..9baa2345e33e 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -171,8 +171,16 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
> goto out;
> }
>
> + /*
> + * Clear both DPC Trigger Status and DPC Interrupt Status. In the
> + * native DPC path, dpc_irq() already clears Interrupt Status before
> + * the threaded handler runs. But in the EDR (firmware-first) path,
> + * dpc_irq() is never called, so Interrupt Status must be cleared
> + * here to prevent it from remaining stale indefinitely.
> + */
> pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
> - PCI_EXP_DPC_STATUS_TRIGGER);
> + PCI_EXP_DPC_STATUS_TRIGGER |
> + PCI_EXP_DPC_STATUS_INTERRUPT);
>
> if (pci_bridge_wait_for_secondary_bus(pdev, "DPC")) {
> clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-12 19:50 ` Kuppuswamy Sathyanarayanan
@ 2026-02-12 21:23 ` Keith Busch
2026-02-12 21:49 ` Kuppuswamy Sathyanarayanan
0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2026-02-12 21:23 UTC (permalink / raw)
To: Kuppuswamy Sathyanarayanan
Cc: Danielle Costantino, Bjorn Helgaas, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
On Thu, Feb 12, 2026 at 11:50:54AM -0800, Kuppuswamy Sathyanarayanan wrote:
> In case of EDR, firmware owns the interrupt handling. It just uses ACPI
> method to request OS help with recovery. Since interrupt handling is
> owned by firmware, I think firmware should clear the interrupt status.
But the PCI Firmware Specication says the OS owns the status register
while it is handling EDR notification
"
the operating system is permitted to write the following:
* Device Status Register
* Uncorrectable Error Status Register
* Correctable Error Status Register
* Root Error Status Register
* RP PIO Status Register
in the Port that triggered DPC while processing an Error Disconnect Recover
notification from firmware
"
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-12 19:18 ` [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link() Danielle Costantino
2026-02-12 19:50 ` Kuppuswamy Sathyanarayanan
@ 2026-02-12 21:36 ` Lukas Wunner
2026-02-12 21:50 ` Keith Busch
1 sibling, 1 reply; 15+ messages in thread
From: Lukas Wunner @ 2026-02-12 21:36 UTC (permalink / raw)
To: Danielle Costantino
Cc: Bjorn Helgaas, Keith Busch, Kuppuswamy Sathyanarayanan,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
On Thu, Feb 12, 2026 at 11:18:17AM -0800, Danielle Costantino wrote:
> Clear PCI_EXP_DPC_STATUS_INTERRUPT alongside PCI_EXP_DPC_STATUS_TRIGGER
> in dpc_reset_link(). Both bits are RW1C in the DPC Status register per
> PCIe r6.1, sec 7.9.14.5, so writing them together is safe. The native
> path is unaffected because dpc_irq() has already cleared the Interrupt
> Status bit before dpc_reset_link() runs.
Hm, doesn't this create a risk that in the native case, an interrupt may
be lost (i.e. cleared without acting on it) if it occurs between clearing
the bit in dpc_irq() and clearing it in dpc_reset_link()?
Maybe I'm missing something and this isn't a problem, but in that case
an explanation in the commit message would be good *why* it's not a
problem.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-12 21:23 ` Keith Busch
@ 2026-02-12 21:49 ` Kuppuswamy Sathyanarayanan
2026-02-12 22:12 ` Keith Busch
0 siblings, 1 reply; 15+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-02-12 21:49 UTC (permalink / raw)
To: Keith Busch
Cc: Danielle Costantino, Bjorn Helgaas, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
Hi Keith,
On 2/12/2026 1:23 PM, Keith Busch wrote:
> On Thu, Feb 12, 2026 at 11:50:54AM -0800, Kuppuswamy Sathyanarayanan wrote:
>> In case of EDR, firmware owns the interrupt handling. It just uses ACPI
>> method to request OS help with recovery. Since interrupt handling is
>> owned by firmware, I think firmware should clear the interrupt status.
>
> But the PCI Firmware Specication says the OS owns the status register
> while it is handling EDR notification
>
> "
> the operating system is permitted to write the following:
>
> * Device Status Register
> * Uncorrectable Error Status Register
> * Correctable Error Status Register
> * Root Error Status Register
> * RP PIO Status Register
>
> in the Port that triggered DPC while processing an Error Disconnect Recover
> notification from firmware
> "
Please check the _OSC DPC control bit details in PCI firmware spec.
It specifically calls out OS is permitted to write to DPC trigger status
bit in DPC status register. It does not talk about about DPC interrupt
status bit.
Copied here for reference:
"If control of this feature was requested and denied, or was not requested,
then the operating system is permitted to write to the DPC Trigger Status bit
in the DPC Status Register in the Port that triggered containment"
I think since firmware registers interrupt handler for DPC, after OS helps
handle the recovery it should complete the loop and clear it.
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-12 21:36 ` Lukas Wunner
@ 2026-02-12 21:50 ` Keith Busch
0 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2026-02-12 21:50 UTC (permalink / raw)
To: Lukas Wunner
Cc: Danielle Costantino, Bjorn Helgaas, Kuppuswamy Sathyanarayanan,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
On Thu, Feb 12, 2026 at 10:36:12PM +0100, Lukas Wunner wrote:
> On Thu, Feb 12, 2026 at 11:18:17AM -0800, Danielle Costantino wrote:
> > Clear PCI_EXP_DPC_STATUS_INTERRUPT alongside PCI_EXP_DPC_STATUS_TRIGGER
> > in dpc_reset_link(). Both bits are RW1C in the DPC Status register per
> > PCIe r6.1, sec 7.9.14.5, so writing them together is safe. The native
> > path is unaffected because dpc_irq() has already cleared the Interrupt
> > Status bit before dpc_reset_link() runs.
>
> Hm, doesn't this create a risk that in the native case, an interrupt may
> be lost (i.e. cleared without acting on it) if it occurs between clearing
> the bit in dpc_irq() and clearing it in dpc_reset_link()?
>
> Maybe I'm missing something and this isn't a problem, but in that case
> an explanation in the commit message would be good *why* it's not a
> problem.
The Trigger Status remains set, so the containment is latched for entire
processing. You can't get a 2nd containment event while the port is
already in containment. It's just important to clear the interrupt
status bit in the top-half mainly for level-triggered interrupts. I
think it would be harmless if the OS deferred the clearing to the
bottom-half when using edge triggered interrupts, though.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-12 21:49 ` Kuppuswamy Sathyanarayanan
@ 2026-02-12 22:12 ` Keith Busch
2026-02-12 22:51 ` Kuppuswamy Sathyanarayanan
0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2026-02-12 22:12 UTC (permalink / raw)
To: Kuppuswamy Sathyanarayanan
Cc: Danielle Costantino, Bjorn Helgaas, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
On Thu, Feb 12, 2026 at 01:49:52PM -0800, Kuppuswamy Sathyanarayanan wrote:
> Hi Keith,
>
> On 2/12/2026 1:23 PM, Keith Busch wrote:
> > On Thu, Feb 12, 2026 at 11:50:54AM -0800, Kuppuswamy Sathyanarayanan wrote:
> >> In case of EDR, firmware owns the interrupt handling. It just uses ACPI
> >> method to request OS help with recovery. Since interrupt handling is
> >> owned by firmware, I think firmware should clear the interrupt status.
> >
> > But the PCI Firmware Specication says the OS owns the status register
> > while it is handling EDR notification
> >
> > "
> > the operating system is permitted to write the following:
> >
> > * Device Status Register
> > * Uncorrectable Error Status Register
> > * Correctable Error Status Register
> > * Root Error Status Register
> > * RP PIO Status Register
> >
> > in the Port that triggered DPC while processing an Error Disconnect Recover
> > notification from firmware
> > "
>
> Please check the _OSC DPC control bit details in PCI firmware spec.
>
> It specifically calls out OS is permitted to write to DPC trigger status
> bit in DPC status register. It does not talk about about DPC interrupt
> status bit.
>
> Copied here for reference:
>
> "If control of this feature was requested and denied, or was not requested,
> then the operating system is permitted to write to the DPC Trigger Status bit
> in the DPC Status Register in the Port that triggered containment"
>
> I think since firmware registers interrupt handler for DPC, after OS helps
> handle the recovery it should complete the loop and clear it.
But that just creates a window for when after the OS lets the link
retrain that the port will be unable to trigger a new containment event.
Sure, there's no explicit language in any spec I can find that the OS
must write 1 to bit 3 of the status register, but it doesn't say
firmware owns that bit either. The firmware handed control of the status
to the OS in this path. It would not make sense to return to firmware in
a state that makes it impossible to report new errors during that
transition window.
Besides, is firmware first even triggering off an interrupt? Pretty sure
it's triggering off the ERR_COR message, no? Why would it need to own
the Interrupt Status bit when it's not even relying on it?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-12 22:12 ` Keith Busch
@ 2026-02-12 22:51 ` Kuppuswamy Sathyanarayanan
2026-02-13 1:22 ` Keith Busch
0 siblings, 1 reply; 15+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-02-12 22:51 UTC (permalink / raw)
To: Keith Busch
Cc: Danielle Costantino, Bjorn Helgaas, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
On 2/12/2026 2:12 PM, Keith Busch wrote:
> On Thu, Feb 12, 2026 at 01:49:52PM -0800, Kuppuswamy Sathyanarayanan wrote:
>> Hi Keith,
>>
>> On 2/12/2026 1:23 PM, Keith Busch wrote:
>>> On Thu, Feb 12, 2026 at 11:50:54AM -0800, Kuppuswamy Sathyanarayanan wrote:
>>>> In case of EDR, firmware owns the interrupt handling. It just uses ACPI
>>>> method to request OS help with recovery. Since interrupt handling is
>>>> owned by firmware, I think firmware should clear the interrupt status.
>>>
>>> But the PCI Firmware Specication says the OS owns the status register
>>> while it is handling EDR notification
>>>
>>> "
>>> the operating system is permitted to write the following:
>>>
>>> * Device Status Register
>>> * Uncorrectable Error Status Register
>>> * Correctable Error Status Register
>>> * Root Error Status Register
>>> * RP PIO Status Register
>>>
>>> in the Port that triggered DPC while processing an Error Disconnect Recover
>>> notification from firmware
>>> "
>>
>> Please check the _OSC DPC control bit details in PCI firmware spec.
>>
>> It specifically calls out OS is permitted to write to DPC trigger status
>> bit in DPC status register. It does not talk about about DPC interrupt
>> status bit.
>>
>> Copied here for reference:
>>
>> "If control of this feature was requested and denied, or was not requested,
>> then the operating system is permitted to write to the DPC Trigger Status bit
>> in the DPC Status Register in the Port that triggered containment"
>>
>> I think since firmware registers interrupt handler for DPC, after OS helps
>> handle the recovery it should complete the loop and clear it.
>
> But that just creates a window for when after the OS lets the link
> retrain that the port will be unable to trigger a new containment event.
As per EDR flow, firmware waits for _OST reply from OS to complete the
current interrupt handling. After receiving _OST, firmware decides whether
recovery should continue or if the link should be disabled. When/how firmware
handles subsequent DPC events depends on firmware's implementation.
>
> Sure, there's no explicit language in any spec I can find that the OS
> must write 1 to bit 3 of the status register, but it doesn't say
> firmware owns that bit either. The firmware handed control of the status
> to the OS in this path. It would not make sense to return to firmware in
> a state that makes it impossible to report new errors during that
> transition window.
The spec explicitly lists what the OS can write during the EDR window. For
few registers it gives full contro; For DPC status, it explicitly states
which bit it can clear (DPC trigger status).
>
> Besides, is firmware first even triggering off an interrupt? Pretty sure
> it's triggering off the ERR_COR message, no? Why would it need to own
> the Interrupt Status bit when it's not even relying on it?
If it not triggered by interrupt, then interrupt status bit does not
matter (even for OS handler).
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-12 22:51 ` Kuppuswamy Sathyanarayanan
@ 2026-02-13 1:22 ` Keith Busch
2026-02-13 4:28 ` Sathyanarayanan Kuppuswamy
[not found] ` <4c0d0575-0da1-49ff-878e-65622b442e98@linux.intel.com>
0 siblings, 2 replies; 15+ messages in thread
From: Keith Busch @ 2026-02-13 1:22 UTC (permalink / raw)
To: Kuppuswamy Sathyanarayanan
Cc: Danielle Costantino, Bjorn Helgaas, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
On Thu, Feb 12, 2026 at 02:51:46PM -0800, Kuppuswamy Sathyanarayanan wrote:
> On 2/12/2026 2:12 PM, Keith Busch wrote:
>
> As per EDR flow, firmware waits for _OST reply from OS to complete the
> current interrupt handling. After receiving _OST, firmware decides whether
> recovery should continue or if the link should be disabled. When/how firmware
> handles subsequent DPC events depends on firmware's implementation.
You at least agree the OS controls the "Trigger Status". That controls
whether the link stays contained or not, but now you're saying the
firmware gets to yank the link after the OS returns _OST success?
There's no flow in the spec suggesting any such behavior.
> > Sure, there's no explicit language in any spec I can find that the OS
> > must write 1 to bit 3 of the status register, but it doesn't say
> > firmware owns that bit either. The firmware handed control of the status
> > to the OS in this path. It would not make sense to return to firmware in
> > a state that makes it impossible to report new errors during that
> > transition window.
>
> The spec explicitly lists what the OS can write during the EDR window. For
> few registers it gives full contro; For DPC status, it explicitly states
> which bit it can clear (DPC trigger status).
What harm do you think will come if we return the dpc status register to
firmware in a state that indicates the OS handled it to completion?
> > Besides, is firmware first even triggering off an interrupt? Pretty sure
> > it's triggering off the ERR_COR message, no? Why would it need to own
> > the Interrupt Status bit when it's not even relying on it?
>
> If it not triggered by interrupt, then interrupt status bit does not
> matter (even for OS handler).
Emperically speaking, yes, it does matter. Otherwise why would this
patch exist?
Or the other way, if you think it doesn't matter, then why oppose
improving Linux hardware interop?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-13 1:22 ` Keith Busch
@ 2026-02-13 4:28 ` Sathyanarayanan Kuppuswamy
2026-02-13 14:01 ` Keith Busch
[not found] ` <4c0d0575-0da1-49ff-878e-65622b442e98@linux.intel.com>
1 sibling, 1 reply; 15+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2026-02-13 4:28 UTC (permalink / raw)
To: Keith Busch
Cc: Danielle Costantino, Bjorn Helgaas, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
Hi Keith,
On 2/12/26 5:22 PM, Keith Busch wrote:
> On Thu, Feb 12, 2026 at 02:51:46PM -0800, Kuppuswamy Sathyanarayanan wrote:
>> On 2/12/2026 2:12 PM, Keith Busch wrote:
>>
>> As per EDR flow, firmware waits for _OST reply from OS to complete the
>> current interrupt handling. After receiving _OST, firmware decides whether
>> recovery should continue or if the link should be disabled. When/how firmware
>> handles subsequent DPC events depends on firmware's implementation.
> You at least agree the OS controls the "Trigger Status". That controls
> whether the link stays contained or not, but now you're saying the
> firmware gets to yank the link after the OS returns _OST success?
> There's no flow in the spec suggesting any such behavior.
I was referring to the flow chart and notes on page 85, in PCI firmware spec,
v3.3, which mention firmware can choose to disable the link in certain
cases (notes 3,5).
>
>>> Sure, there's no explicit language in any spec I can find that the OS
>>> must write 1 to bit 3 of the status register, but it doesn't say
>>> firmware owns that bit either. The firmware handed control of the status
>>> to the OS in this path. It would not make sense to return to firmware in
>>> a state that makes it impossible to report new errors during that
>>> transition window.
>> The spec explicitly lists what the OS can write during the EDR window. For
>> few registers it gives full contro; For DPC status, it explicitly states
>> which bit it can clear (DPC trigger status).
> What harm do you think will come if we return the dpc status register to
> firmware in a state that indicates the OS handled it to completion?
>
>>> Besides, is firmware first even triggering off an interrupt? Pretty sure
>>> it's triggering off the ERR_COR message, no? Why would it need to own
>>> the Interrupt Status bit when it's not even relying on it?
>> If it not triggered by interrupt, then interrupt status bit does not
>> matter (even for OS handler).
> Emperically speaking, yes, it does matter. Otherwise why would this
> patch exist?
What's the actual symptom being observed? Is the Interrupt Status bit
being left set causing a functional problem, or is this preventive cleanup?
>
> Or the other way, if you think it doesn't matter, then why oppose
> improving Linux hardware interop?
I'm not opposing the change. I want to make sure we align with the
spec and don't break existing firmware expectations.
If firmware is the final decision maker of the recovery flow (as shown in the
flow chart), does this change actually help with early re-handling of DPC
events? Also, if there currently exists firmware that assumes it owns the
decision on when to re-handle DPC events, this change might break those
assumptions.
If this change improves interoperability with existing platforms, we should
take it. As a follow up, we should work on clarifying the spec to explicitly
permit OS clearing of Interrupt Status during EDR.
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
[not found] ` <4c0d0575-0da1-49ff-878e-65622b442e98@linux.intel.com>
@ 2026-02-13 4:29 ` Sathyanarayanan Kuppuswamy
0 siblings, 0 replies; 15+ messages in thread
From: Sathyanarayanan Kuppuswamy @ 2026-02-13 4:29 UTC (permalink / raw)
To: Keith Busch
Cc: Danielle Costantino, Bjorn Helgaas, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
Hi All,
Please ignore my previous emails. For some reason, when I was typing the reply, it seems to have sent it.
On 2/12/26 8:22 PM, Sathyanarayanan Kuppuswamy wrote:
>
> Hi Keith,
>
> On 2/12/26 5:22 PM, Keith Busch wrote:
>> On Thu, Feb 12, 2026 at 02:51:46PM -0800, Kuppuswamy Sathyanarayanan wrote:
>>> On 2/12/2026 2:12 PM, Keith Busch wrote:
>>>
>>> As per EDR flow, firmware waits for _OST reply from OS to complete the
>>> current interrupt handling. After receiving _OST, firmware decides whether
>>> recovery should continue or if the link should be disabled. When/how firmware
>>> handles subsequent DPC events depends on firmware's implementation.
>> You at least agree the OS controls the "Trigger Status". That controls
>> whether the link stays contained or not, but now you're saying the
>> firmware gets to yank the link after the OS returns _OST success?
>> There's no flow in the spec suggesting any such behavior.
>
> I was referring to the flow chart and notes on page 85, in PCI firmware spec,
> v3.3, which mention firmware can choose to disable the link in certain
> cases (notes 3,5).
>
>>>> Sure, there's no explicit language in any spec I can find that the OS
>>>> must write 1 to bit 3 of the status register, but it doesn't say
>>>> firmware owns that bit either. The firmware handed control of the status
>>>> to the OS in this path. It would not make sense to return to firmware in
>>>> a state that makes it impossible to report new errors during that
>>>> transition window.
>>> The spec explicitly lists what the OS can write during the EDR window. For
>>> few registers it gives full contro; For DPC status, it explicitly states
>>> which bit it can clear (DPC trigger status).
>> What harm do you think will come if we return the dpc status register to
>> firmware in a state that indicates the OS handled it to completion?
>>
>>>> Besides, is firmware first even triggering off an interrupt? Pretty sure
>>>> it's triggering off the ERR_COR message, no? Why would it need to own
>>>> the Interrupt Status bit when it's not even relying on it?
>>> If it not triggered by interrupt, then interrupt status bit does not
>>> matter (even for OS handler).
>> Emperically speaking, yes, it does matter. Otherwise why would this
>> patch exist?
>
> What's the actual symptom being observed? Is the Interrupt Status bit
> being left set causing a functional problem, or is this preventive cleanup?
>
>> Or the other way, if you think it doesn't matter, then why oppose
>> improving Linux hardware interop?
>
> I'm not opposing the change. I want to make sure we align with the
> spec and not break any firmware expectations.
>
> If firmware is the final decision maker of the recovery flow (as shown in
> the flow chart), then does this change really help early re-handling of DPC
> events?
>
> Another point I am concerned, If there currently exits a firmware
> implementation, which assumes firmware own the decision on when to
> re-handle DPC events. we
> not want handle DPC again before
> completing the current event and makes a decision on whether to recover
> , then this change might break it.
>
> If you think it is better for OS to clear it, then I think we should attempt to
> clarify that part in spec as a follow up to make sure we set clear expectation
> to the firmware.
>
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-13 4:28 ` Sathyanarayanan Kuppuswamy
@ 2026-02-13 14:01 ` Keith Busch
2026-02-13 17:08 ` Kuppuswamy Sathyanarayanan
0 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2026-02-13 14:01 UTC (permalink / raw)
To: Sathyanarayanan Kuppuswamy
Cc: Danielle Costantino, Bjorn Helgaas, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
On Thu, Feb 12, 2026 at 08:28:15PM -0800, Sathyanarayanan Kuppuswamy wrote:
> Hi Keith,
>
> On 2/12/26 5:22 PM, Keith Busch wrote:
> > On Thu, Feb 12, 2026 at 02:51:46PM -0800, Kuppuswamy Sathyanarayanan wrote:
> > > On 2/12/2026 2:12 PM, Keith Busch wrote:
> > >
> > > As per EDR flow, firmware waits for _OST reply from OS to complete the
> > > current interrupt handling. After receiving _OST, firmware decides whether
> > > recovery should continue or if the link should be disabled. When/how firmware
> > > handles subsequent DPC events depends on firmware's implementation.
> > You at least agree the OS controls the "Trigger Status". That controls
> > whether the link stays contained or not, but now you're saying the
> > firmware gets to yank the link after the OS returns _OST success?
> > There's no flow in the spec suggesting any such behavior.
>
> I was referring to the flow chart and notes on page 85, in PCI firmware spec,
> v3.3, which mention firmware can choose to disable the link in certain
> cases (notes 3,5).
Err, what? No, there is no path for firmware to disable the link if OS
returns _OST 0x80 (success). The path says firmware is supposed to
"Clear ... Disable Link", not to disable the link.
If the firmware wants to disable the link permanently, it does so before
considering DPC and makes it a hotplug event instead.
But in all honesty, that diagram is a complete mess and very poorly
captures what an OS is supposed to do for a DPC. You don't unbind the
drivers when you're trying to recover the devices they're driving.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link()
2026-02-13 14:01 ` Keith Busch
@ 2026-02-13 17:08 ` Kuppuswamy Sathyanarayanan
0 siblings, 0 replies; 15+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2026-02-13 17:08 UTC (permalink / raw)
To: Keith Busch
Cc: Danielle Costantino, Bjorn Helgaas, Lukas Wunner,
Mahesh J Salgaonkar, Oliver O'Halloran, linux-pci
On 2/13/2026 6:01 AM, Keith Busch wrote:
> On Thu, Feb 12, 2026 at 08:28:15PM -0800, Sathyanarayanan Kuppuswamy wrote:
>> Hi Keith,
>>
>> On 2/12/26 5:22 PM, Keith Busch wrote:
>>> On Thu, Feb 12, 2026 at 02:51:46PM -0800, Kuppuswamy Sathyanarayanan wrote:
>>>> On 2/12/2026 2:12 PM, Keith Busch wrote:
>>>>
>>>> As per EDR flow, firmware waits for _OST reply from OS to complete the
>>>> current interrupt handling. After receiving _OST, firmware decides whether
>>>> recovery should continue or if the link should be disabled. When/how firmware
>>>> handles subsequent DPC events depends on firmware's implementation.
>>> You at least agree the OS controls the "Trigger Status". That controls
>>> whether the link stays contained or not, but now you're saying the
>>> firmware gets to yank the link after the OS returns _OST success?
>>> There's no flow in the spec suggesting any such behavior.
>>
>> I was referring to the flow chart and notes on page 85, in PCI firmware spec,
>> v3.3, which mention firmware can choose to disable the link in certain
>> cases (notes 3,5).
>
> Err, what? No, there is no path for firmware to disable the link if OS
> returns _OST 0x80 (success). The path says firmware is supposed to
> "Clear ... Disable Link", not to disable the link.
Let me list the firmware path after OS sends _OST(0xF, BDF<<16+0x80) indicating
success (PCI Firmware Spec v3.3, page 85):
1. "Capture error status. Optionally, validate if device should be recovered" (note 3)
2. Decision point: "Continue Recovery?"
If YES: Return from _OST (recovery complete)
If NO: Execute "Clear error status. DPC Software Trigger or Disable Link" (note 5)
Note 3 states: "FW/OS may rely on past device history, type of error, etc. to
determine if the recovery should be performed/continued."
Note 5 states: "If setting Disable Link and port supports DPC, DPC should be
disabled prior to setting Disable Link."
If firmware decides not to continue, it can disable link. Please correct me if
my interpretation is wrong.
>
> If the firmware wants to disable the link permanently, it does so before
> considering DPC and makes it a hotplug event instead.
Yes, it does it once before considering the DPC. And another time after
OS completes the recovery.
>
> But in all honesty, that diagram is a complete mess and very poorly
> captures what an OS is supposed to do for a DPC. You don't unbind the
> drivers when you're trying to recover the devices they're driving.
>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2026-02-13 17:08 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-12 19:18 [PATCH 0/2] PCI/DPC: Fix EDR recovery path issues Danielle Costantino
2026-02-12 19:18 ` [PATCH 1/2] PCI/DPC: Clear Interrupt Status in dpc_reset_link() Danielle Costantino
2026-02-12 19:50 ` Kuppuswamy Sathyanarayanan
2026-02-12 21:23 ` Keith Busch
2026-02-12 21:49 ` Kuppuswamy Sathyanarayanan
2026-02-12 22:12 ` Keith Busch
2026-02-12 22:51 ` Kuppuswamy Sathyanarayanan
2026-02-13 1:22 ` Keith Busch
2026-02-13 4:28 ` Sathyanarayanan Kuppuswamy
2026-02-13 14:01 ` Keith Busch
2026-02-13 17:08 ` Kuppuswamy Sathyanarayanan
[not found] ` <4c0d0575-0da1-49ff-878e-65622b442e98@linux.intel.com>
2026-02-13 4:29 ` Sathyanarayanan Kuppuswamy
2026-02-12 21:36 ` Lukas Wunner
2026-02-12 21:50 ` Keith Busch
2026-02-12 19:18 ` [PATCH 2/2] PCI/EDR: Defer AER status clearing until after recovery Danielle Costantino
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox