From: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy@linux.intel.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1] PCI: pciehp: Make sure DPC trigger status is reset in PDC handler
Date: Thu, 15 Jun 2023 16:03:54 -0700 [thread overview]
Message-ID: <713d71dc-c4a5-cd7b-2deb-343c244dd14d@linux.intel.com> (raw)
In-Reply-To: <20230615183550.GA9773@wunner.de>
Hi Lukas,
Thanks for the review.
On 6/15/23 11:35 AM, Lukas Wunner wrote:
> On Wed, Jun 14, 2023 at 11:25:59PM -0700, Kuppuswamy Sathyanarayanan wrote:
>> During the EDR-based DPC recovery process, for devices with persistent
>> issues, the firmware may choose not to handle the DPC error and leave
>> the port in DPC triggered state. In such scenarios, if the user
>> replaces the faulty device with a new one, the OS is expected to clear
>> the DPC trigger status in the hotplug error handler to enable the new
>> device enumeration.
>
> You're clearing the DPC trigger status upon a PDC event, yet are saying
> here the purpose is to reset port state for a future hotplugged device.
Sorry, it is a typo. I meant "hotplug interrupt handler".
The goal is to ensure that when a new device presence is detected, the
old DPC trigger status is cleared.
>
> A PDC event may be synthesized, e.g. to trigger slot bringup via
> sysfs, so using a PDC event to clear DPC trigger status feels wrong.
IMO, it is harmless. We just want to make sure the previous DPC trigger
status is cleared before enumerating a new device.
> pciehp_unconfigure_device() seems like a more appropriate place to me.
>
I initially thought to add it there. Spec also recommends clearing it
when removing the device. But I wasn't sure if pciehp_unconfigure_device()
would be called only during device removal.
Let me test this path and get back to you.
>
>> More details about this issue can be found in PCIe
>> firmware specification, r3.3, sec titled "DPC Event Handling"
>> Implementation note.
>
> That Implementation Note contains a lot of text and a fairly complex
> flow chart. If you could point to specific paragraphs or numbers in
> the Implementation Note that would make life easier for a reviewer
> to make the connection between your code and the spec.
It is the text at the end of the flowchart. Copied it here for reference.
For devices with persistent errors, a port may be kept in the DPC triggered
state (disabled) to keep those devices from continuing to generate errors. For
hot-plug slots, the errant device may be removed and replaced with a new device.
If the DPC trigger state is not cleared, then the port above the newly inserted
device will still be disabled and will be non-operational. Therefore, operating
systems may need to modify their hot-plug interrupt handling code to clear DPC
Trigger Status when a device is removed so that a subsequent insertion
will succeed.
>
>
>> Similar issue might also happen if the DPC or EDR recovery handler
>> exits before clearing the trigger status. To fix this issue, clear the
>> DPC trigger status in PDC interrupt handler.
>
> I was about to ask why the code is added to dpc.c, not edr.c,
> and why it's not constrained to CONFIG_PCIE_EDR, but I assume
> that's the reason? Because it "might" happen for OS-native DPC
> as well?
Yes. There are code paths in the DPC driver where error recover handler
can exit before clearing the DPC trigger status. So I think this fix is
applicable for native code as well.
>
>
>> +/**
>> + * pci_reset_trigger - Clear DPC trigger status
>> + * @pdev: PCI device
>> + *
>> + * It is called from the PCIe hotplug driver to clean the DPC
>> + * trigger status in the PDC interrupt handler.
>> + */
>> +void pci_dpc_reset_trigger(struct pci_dev *pdev)
>> +{
>> + if (!pdev->dpc_cap)
>> + return;
>> +
>> + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_STATUS,
>> + PCI_EXP_DPC_STATUS_TRIGGER);
>> +}
>
> This may run concurrently to dpc_reset_link(), so I'd expect that
> you need some kind of serialization. What happens if pciehp clears
> trigger status behind the DPC driver's back while it is handling an
> error?
Currently, we only call pci_dpc_reset_trigger() in PDC interrupt handler.
Do you think there would be a race between error handler and PDC handler?
>
> Thanks,
>
> Lukas
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
next prev parent reply other threads:[~2023-06-15 23:03 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-15 6:25 [PATCH v1] PCI: pciehp: Make sure DPC trigger status is reset in PDC handler Kuppuswamy Sathyanarayanan
2023-06-15 8:18 ` kernel test robot
2023-06-15 18:35 ` Lukas Wunner
2023-06-15 23:03 ` Sathyanarayanan Kuppuswamy [this message]
2023-06-16 9:06 ` Lukas Wunner
2023-06-16 23:27 ` Sathyanarayanan Kuppuswamy
2023-06-17 8:31 ` Lukas Wunner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=713d71dc-c4a5-cd7b-2deb-343c244dd14d@linux.intel.com \
--to=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=bhelgaas@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.