From: Bjorn Helgaas <helgaas@kernel.org>
To: Timothy Pearson <tpearson@raptorengineering.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
linux-kernel <linux-kernel@vger.kernel.org>,
linux-pci <linux-pci@vger.kernel.org>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
christophe leroy <christophe.leroy@csgroup.eu>,
Naveen N Rao <naveen@kernel.org>,
Bjorn Helgaas <bhelgaas@google.com>,
Shawn Anastasio <sanastasio@raptorengineering.com>
Subject: Re: [PATCH v2 5/6] pci/hotplug/pnv_php: Fix surprise plug detection and
Date: Wed, 18 Jun 2025 14:15:30 -0500
Message-ID: <20250618191530.GA1218109@bhelgaas>
In-Reply-To: <317515920.1310655.1750265903281.JavaMail.zimbra@raptorengineeringinc.com>
On Wed, Jun 18, 2025 at 11:58:23AM -0500, Timothy Pearson wrote:
> recovery
Same weird subject/commit wrapping.
> The existing PowerNV hotplug code did not handle suprise plug events
> correctly, leading to a complete failure of the hotplug system after
> device removal and a required reboot to detect new devices.
s/suprise/surprise/ (also below)
> This comes down to two issues:
> 1.) When a device is suprise removed, oftentimes the bridge upstream
> port will cause a PE freeze on the PHB. If this freeze is not
> cleared, the MSI interrupts from the bridge hotplug notification
> logic will not be received by the kernel, stalling all plug events
> on all slots associated with the PE.
I guess you mean the bridge *downstream* port that leads to the slot?
> 2.) When a device is removed from a slot, regardless of suprise or
> programmatic removal, the associated PHB/PE ls left frozen.
> If this freeze is not cleared via a fundamental reset, skiboot
> is unable to clear the freeze and cannot retrain / rescan the
> slot. This also requires a reboot to clear the freeze and redetect
> the device in the slot.
>
> Issue the appropriate unfreeze and rescan commands on hotplug events,
> and don't oops on hotplug if pci_bus_to_OF_node() returns NULL.
>
> Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
> ---
>  arch/powerpc/kernel/pci-hotplug.c |  3 ++
>  drivers/pci/hotplug/pnv_php.c     | 53 ++++++++++++++++++++++++++++++-
>  2 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> index 9ea74973d78d..6f444d0822d8 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -141,6 +141,9 @@ void pci_hp_add_devices(struct pci_bus *bus)
>  	struct pci_controller *phb;
>  	struct device_node *dn = pci_bus_to_OF_node(bus);
>
> +	if (!dn)
> +		return;
> +
>  	phb = pci_bus_to_host(bus);
>
>  	mode = PCI_PROBE_NORMAL;
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> index bac8af3df41a..0ceb4a2c3c79 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -10,6 +10,7 @@
>  #include <linux/libfdt.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> +#include <linux/delay.h>
>  #include <linux/pci_hotplug.h>
>  #include <linux/of_fdt.h>
>
> @@ -474,7 +475,7 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
>  	struct hotplug_slot *slot = &php_slot->slot;
>  	uint8_t presence = OPAL_PCI_SLOT_EMPTY;
>  	uint8_t power_status = OPAL_PCI_SLOT_POWER_ON;
> -	int ret;
> +	int ret, i;
>
>  	/* Check if the slot has been configured */
>  	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
> @@ -532,6 +533,27 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
>
>  	/* Power is off, turn it on and then scan the slot */
>  	ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
> +	if (ret) {
> +		SLOT_WARN(php_slot, "PCI slot activation failed with error code %d, possible frozen PHB", ret);
> +		SLOT_WARN(php_slot, "Attempting complete PHB reset before retrying slot activation\n");
> +		for (i = 0; i < 3; i++) {
> +			/* Slot activation failed, PHB may be fenced from a prior device failure
> +			 * Use the OPAL fundamental reset call to both try a device reset and clear
> +			 * any potentially active PHB fence / freeze
> +			 */
> +			SLOT_WARN(php_slot, "Try %d...\n", i + 1);
> +			pci_set_pcie_reset_state(php_slot->pdev, pcie_warm_reset);
> +			msleep(250);
What is the source of the 250 value? Is there a spec you can cite for
this? Maybe add a #define if it makes sense?
> +			pci_set_pcie_reset_state(php_slot->pdev, pcie_deassert_reset);
> +
> +			ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
Wrap the comment and non-printk lines to fit in 80 columns like the
rest of the file. Preserve the messages as-is so grep finds them
easily.
Usual multi-line comment style is:
	/*
	 * Text ...
	 */
Possibly factor this warn/reset code into a helper function to
unclutter pnv_php_enable()?
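A rough userspace sketch of how the retry could be pulled into a helper, assuming the retry count and the 250 ms delay become named constants. All names here are hypothetical: PNV_PHP_MAX_RESET_TRIES, PNV_PHP_RESET_SETTLE_MS, pnv_php_retry_activation() and the fake_*() stubs, which stand in for pnv_php_set_slot_power_state() and the pci_set_pcie_reset_state()/msleep() sequence so the control flow can be compiled and exercised outside the kernel. A real helper would keep the existing SLOT_WARN() messages unchanged.

```c
#include <errno.h>

/*
 * Hypothetical constants. The 250 ms value is copied from the patch and
 * still needs a cited source.
 */
#define PNV_PHP_MAX_RESET_TRIES	3
#define PNV_PHP_RESET_SETTLE_MS	250

/* Stub state: how many power-on attempts fail before one succeeds. */
static int fails_remaining;
static int resets_issued;

/* Stand-in for pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON). */
static int fake_power_on(void)
{
	if (fails_remaining > 0) {
		fails_remaining--;
		return -EIO;
	}
	return 0;
}

/*
 * Stand-in for pci_set_pcie_reset_state(pdev, pcie_warm_reset),
 * msleep(settle_ms), pci_set_pcie_reset_state(pdev, pcie_deassert_reset).
 * This stub only counts calls; it does not sleep.
 */
static void fake_reset_and_settle(unsigned int settle_ms)
{
	(void)settle_ms;
	resets_issued++;
}

/* Returns 0 once the slot powers on, otherwise the last error seen. */
static int pnv_php_retry_activation(void)
{
	int ret = fake_power_on();
	int i;

	if (!ret)
		return 0;

	for (i = 0; i < PNV_PHP_MAX_RESET_TRIES; i++) {
		fake_reset_and_settle(PNV_PHP_RESET_SETTLE_MS);
		ret = fake_power_on();
		if (!ret)
			return 0;
	}
	return ret;
}
```

Returning from inside the loop on success removes the need for the separate "if (i >= 3)" test afterwards, so the retry limit is spelled out in only one place.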
> +			if (!ret)
> +				break;
> +		}
> +
> +		if (i >= 3)
> +			SLOT_WARN(php_slot, "Failed to bring slot online, aborting!\n");
> +	}
>  	if (ret)
>  		return ret;
>
> @@ -841,12 +863,41 @@ static void pnv_php_event_handler(struct work_struct *work)
>  	struct pnv_php_event *event =
>  		container_of(work, struct pnv_php_event, work);
>  	struct pnv_php_slot *php_slot = event->php_slot;
> +	struct pci_dev *pdev = php_slot->pdev;
> +	struct eeh_dev *edev;
> +	struct eeh_pe *pe;
> +	int i, rc;
>
>  	if (event->added)
>  		pnv_php_enable_slot(&php_slot->slot);
>  	else
>  		pnv_php_disable_slot(&php_slot->slot);
>
> +	if (!event->added) {
> +		/* When a device is surprise removed from a downstream bridge slot, the upstream bridge port
> +		 * can still end up frozen due to related EEH events, which will in turn block the MSI interrupts
> +		 * for slot hotplug detection. Detect and thaw any frozen upstream PE after slot deactivation...
> +		 */
Restyle and wrap comment.
s/upstream bridge port/bridge downstream port/ to avoid confusion.
> +		edev = pci_dev_to_eeh_dev(pdev);
> +		pe = edev ? edev->pe : NULL;
> +		rc = eeh_pe_get_state(pe);
> +		if ((rc == -ENODEV) || (rc == -ENOENT)) {
> +			SLOT_WARN(php_slot, "Upstream bridge PE state unknown, hotplug detect may fail\n");
> +		}
> +		else {
> +			if (pe->state & EEH_PE_ISOLATED) {
> +				SLOT_WARN(php_slot, "Upstream bridge PE %02x frozen, thawing...\n", pe->addr);
> +				for (i = 0; i < 3; i++)
> +					if (!eeh_unfreeze_pe(pe))
> +						break;
> +				if (i >= 3)
> +					SLOT_WARN(php_slot, "Unable to thaw PE %02x, hotplug detect will fail!\n", pe->addr);
> +				else
> +					SLOT_WARN(php_slot, "PE %02x thawed successfully\n", pe->addr);
> +			}
> +		}
> +	}
Possibly factor this out, too. Then pnv_php_event_handler() could
look simpler:
	if (event->added) {
		pnv_php_enable_slot(&php_slot->slot);
	} else {
		pnv_php_disable_slot(&php_slot->slot);
		<new helper to check for surprise removal>
	}
	kfree(event);
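Similarly, a userspace mock of the thaw logic as a standalone helper, so the control flow can be checked outside the kernel. struct mock_pe, mock_pe_get_state(), mock_unfreeze_pe(), enum thaw_result, PNV_PHP_MAX_THAW_TRIES and pnv_php_thaw_frozen_pe() are hypothetical stand-ins for struct eeh_pe, eeh_pe_get_state(), eeh_unfreeze_pe() and the suggested helper. A kernel version would look up the PE via pci_dev_to_eeh_dev() as the patch does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define PNV_PHP_MAX_THAW_TRIES	3	/* hypothetical name for the "3" */

struct mock_pe {
	bool isolated;		/* models pe->state & EEH_PE_ISOLATED */
	int unfreeze_failures;	/* unfreeze calls that fail before one succeeds */
};

/* Stand-in for eeh_pe_get_state(): a missing PE reports -ENODEV. */
static int mock_pe_get_state(const struct mock_pe *pe)
{
	return pe ? 0 : -ENODEV;
}

/* Stand-in for eeh_unfreeze_pe(): returns 0 once the PE is thawed. */
static int mock_unfreeze_pe(struct mock_pe *pe)
{
	if (pe->unfreeze_failures > 0) {
		pe->unfreeze_failures--;
		return -EIO;
	}
	pe->isolated = false;
	return 0;
}

enum thaw_result { THAW_UNKNOWN, THAW_NOT_FROZEN, THAW_OK, THAW_FAILED };

static enum thaw_result pnv_php_thaw_frozen_pe(struct mock_pe *pe)
{
	int rc = mock_pe_get_state(pe);
	int i;

	/* Check rc before touching pe so a NULL PE is never dereferenced. */
	if (rc == -ENODEV || rc == -ENOENT)
		return THAW_UNKNOWN;
	if (!pe->isolated)
		return THAW_NOT_FROZEN;

	for (i = 0; i < PNV_PHP_MAX_THAW_TRIES; i++)
		if (!mock_unfreeze_pe(pe))
			return THAW_OK;

	return THAW_FAILED;
}
```

Returning a status rather than warning inside the helper is one option; it lets the caller emit the existing SLOT_WARN() strings verbatim.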
>  	kfree(event);
>  }
>
> --
> 2.39.5