linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Timothy Pearson <tpearson@raptorengineering.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-pci <linux-pci@vger.kernel.org>,
	Madhavan Srinivasan <maddy@linux.ibm.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	christophe leroy <christophe.leroy@csgroup.eu>,
	Naveen N Rao <naveen@kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Shawn Anastasio <sanastasio@raptorengineering.com>
Subject: Re: [PATCH v2 5/6] pci/hotplug/pnv_php: Fix surprise plug detection and
Date: Wed, 18 Jun 2025 14:15:30 -0500	[thread overview]
Message-ID: <20250618191530.GA1218109@bhelgaas> (raw)
In-Reply-To: <317515920.1310655.1750265903281.JavaMail.zimbra@raptorengineeringinc.com>

On Wed, Jun 18, 2025 at 11:58:23AM -0500, Timothy Pearson wrote:
>  recovery

Same weird subject/commit wrapping.

> The existing PowerNV hotplug code did not handle suprise plug events
> correctly, leading to a complete failure of the hotplug system after
> device removal and a required reboot to detect new devices.

s/suprise/surprise/ (also below)

> This comes down to two issues:
> 1.) When a device is suprise removed, oftentimes the bridge upstream
>     port will cause a PE freeze on the PHB.  If this freeze is not
>     cleared, the MSI interrupts from the bridge hotplug notification
>     logic will not be received by the kernel, stalling all plug events
>     on all slots associated with the PE.

I guess you mean the bridge *downstream* port that leads to the slot?

> 2.) When a device is removed from a slot, regardless of suprise or
>     programmatic removal, the associated PHB/PE ls left frozen.
>     If this freeze is not cleared via a fundamental reset, skiboot
>     is unable to clear the freeze and cannot retrain / rescan the
>     slot.  This also requires a reboot to clear the freeze and redetect
>     the device in the slot.
> 
> Issue the appropriate unfreeze and rescan commands on hotplug events,
> and don't oops on hotplug if pci_bus_to_OF_node() returns NULL.
> 
> Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
> ---
>  arch/powerpc/kernel/pci-hotplug.c |  3 ++
>  drivers/pci/hotplug/pnv_php.c     | 53 ++++++++++++++++++++++++++++++-
>  2 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
> index 9ea74973d78d..6f444d0822d8 100644
> --- a/arch/powerpc/kernel/pci-hotplug.c
> +++ b/arch/powerpc/kernel/pci-hotplug.c
> @@ -141,6 +141,9 @@ void pci_hp_add_devices(struct pci_bus *bus)
>  	struct pci_controller *phb;
>  	struct device_node *dn = pci_bus_to_OF_node(bus);
>  
> +	if (!dn)
> +		return;
> +
>  	phb = pci_bus_to_host(bus);
>  
>  	mode = PCI_PROBE_NORMAL;
> diff --git a/drivers/pci/hotplug/pnv_php.c b/drivers/pci/hotplug/pnv_php.c
> index bac8af3df41a..0ceb4a2c3c79 100644
> --- a/drivers/pci/hotplug/pnv_php.c
> +++ b/drivers/pci/hotplug/pnv_php.c
> @@ -10,6 +10,7 @@
>  #include <linux/libfdt.h>
>  #include <linux/module.h>
>  #include <linux/pci.h>
> +#include <linux/delay.h>
>  #include <linux/pci_hotplug.h>
>  #include <linux/of_fdt.h>
>  
> @@ -474,7 +475,7 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
>  	struct hotplug_slot *slot = &php_slot->slot;
>  	uint8_t presence = OPAL_PCI_SLOT_EMPTY;
>  	uint8_t power_status = OPAL_PCI_SLOT_POWER_ON;
> -	int ret;
> +	int ret, i;
>  
>  	/* Check if the slot has been configured */
>  	if (php_slot->state != PNV_PHP_STATE_REGISTERED)
> @@ -532,6 +533,27 @@ static int pnv_php_enable(struct pnv_php_slot *php_slot, bool rescan)
>  
>  	/* Power is off, turn it on and then scan the slot */
>  	ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
> +	if (ret) {
> +		SLOT_WARN(php_slot, "PCI slot activation failed with error code %d, possible frozen PHB", ret);
> +		SLOT_WARN(php_slot, "Attempting complete PHB reset before retrying slot activation\n");
> +		for (i = 0; i < 3; i++) {
> +			/* Slot activation failed, PHB may be fenced from a prior device failure
> +			 * Use the OPAL fundamental reset call to both try a device reset and clear
> +			 * any potentially active PHB fence / freeze
> +			 */
> +			SLOT_WARN(php_slot, "Try %d...\n", i + 1);
> +			pci_set_pcie_reset_state(php_slot->pdev, pcie_warm_reset);
> +			msleep(250);

What is the source of the 250 value?  Is there a spec you can cite for
this?  Maybe add a #define if it makes sense?

> +			pci_set_pcie_reset_state(php_slot->pdev, pcie_deassert_reset);
> +
> +			ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);

Wrap the comment and non-printk lines to fit in 80 columns like the
rest of the file.  Preserve the messages as-is so grep finds them
easily.

Usual multi-line comment style is:

  /*
   * Text ...
   */

Possibly factor this warn/reset code into a helper function to
unclutter pnv_php_enable()?

> +			if (!ret)
> +				break;
> +		}
> +
> +		if (i >= 3)
> +			SLOT_WARN(php_slot, "Failed to bring slot online, aborting!\n");
> +	}
>  	if (ret)
>  		return ret;
>  
> @@ -841,12 +863,41 @@ static void pnv_php_event_handler(struct work_struct *work)
>  	struct pnv_php_event *event =
>  		container_of(work, struct pnv_php_event, work);
>  	struct pnv_php_slot *php_slot = event->php_slot;
> +	struct pci_dev *pdev = php_slot->pdev;
> +	struct eeh_dev *edev;
> +	struct eeh_pe *pe;
> +	int i, rc;
>  
>  	if (event->added)
>  		pnv_php_enable_slot(&php_slot->slot);
>  	else
>  		pnv_php_disable_slot(&php_slot->slot);
>  
> +	if (!event->added) {
> +		/* When a device is surprise removed from a downstream bridge slot, the upstream bridge port
> +		 * can still end up frozen due to related EEH events, which will in turn block the MSI interrupts
> +		 * for slot hotplug detection.  Detect and thaw any frozen upstream PE after slot deactivation...
> +		 */

Restyle and wrap comment.

s/upstream bridge port/bridge downstream port/ to avoid confusion.

> +		edev = pci_dev_to_eeh_dev(pdev);
> +		pe = edev ? edev->pe : NULL;
> +		rc = eeh_pe_get_state(pe);
> +		if ((rc == -ENODEV) || (rc == -ENOENT)) {
> +			SLOT_WARN(php_slot, "Upstream bridge PE state unknown, hotplug detect may fail\n");
> +		}
> +		else {
> +			if (pe->state & EEH_PE_ISOLATED) {
> +				SLOT_WARN(php_slot, "Upstream bridge PE %02x frozen, thawing...\n", pe->addr);
> +				for (i = 0; i < 3; i++)
> +					if (!eeh_unfreeze_pe(pe))
> +						break;
> +				if (i >= 3)
> +					SLOT_WARN(php_slot, "Unable to thaw PE %02x, hotplug detect will fail!\n", pe->addr);
> +				else
> +					SLOT_WARN(php_slot, "PE %02x thawed successfully\n", pe->addr);
> +			}
> +		}
> +	}

Possibly factor this out, too.  Then pnv_php_event_handler() could
look simpler:

  if (event->added) {
    pnv_php_enable_slot(&php_slot->slot);
  } else {
    pnv_php_disable_slot(&php_slot->slot);
    <new helper to check for surprise removal>
  }

  kfree(event);

>  	kfree(event);
>  }
>  
> -- 
> 2.39.5


  reply	other threads:[~2025-06-18 19:15 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-18 16:54 [PATCH v2 0/6] PowerNV PCIe Hotplug Driver Fixes Timothy Pearson
2025-06-18 16:56 ` [PATCH v2 1/6] pci/hotplug/pnv_php: Properly clean up allocated IRQs on Timothy Pearson
2025-06-18 16:56 ` [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with broken Timothy Pearson
2025-06-18 19:44   ` Bjorn Helgaas
2025-06-18 19:50     ` Timothy Pearson
2025-06-18 20:17       ` Bjorn Helgaas
2025-06-19 19:29         ` Timothy Pearson
2025-06-20  7:52           ` Lukas Wunner
2025-06-20 16:45             ` Timothy Pearson
2025-06-25  8:45               ` Lukas Wunner
2025-06-18 16:57 ` [PATCH v2 3/6] powerpc/eeh: Export eeh_unfreeze_pe() Timothy Pearson
2025-06-18 16:57 ` [PATCH v2 4/6] powerpc/eeh: Make EEH driver device hotplug safe Timothy Pearson
2025-06-18 16:58 ` [PATCH v2 5/6] pci/hotplug/pnv_php: Fix surprise plug detection and Timothy Pearson
2025-06-18 19:15   ` Bjorn Helgaas [this message]
2025-06-19 19:22     ` Timothy Pearson
2025-06-18 16:58 ` [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator Timothy Pearson
2025-06-18 19:01   ` Bjorn Helgaas
2025-06-19  0:37     ` Timothy Pearson
2025-06-20  9:26       ` Krishna Kumar
2025-06-21  9:59         ` Lukas Wunner
2025-06-25  4:08           ` Krishna Kumar
2025-06-25  8:08             ` Lukas Wunner
2025-06-25 10:55               ` Krishna Kumar
2025-06-21 15:05         ` Timothy Pearson
2025-06-24  7:07           ` Krishna Kumar
2025-06-24 16:34             ` Timothy Pearson
2025-06-24 22:34       ` Bjorn Helgaas
2025-07-07  8:01         ` Krishna Kumar
2025-07-11 18:18           ` Timothy Pearson
2025-07-11 21:05             ` Bjorn Helgaas
2025-07-15 21:41               ` Timothy Pearson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250618191530.GA1218109@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=maddy@linux.ibm.com \
    --cc=mpe@ellerman.id.au \
    --cc=naveen@kernel.org \
    --cc=sanastasio@raptorengineering.com \
    --cc=tpearson@raptorengineering.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).