* Re: [PATCH v7 1/3] PCI: Add d3cold as general reset method
[not found] <20260603105853.326290-2-jtornosm@redhat.com>
@ 2026-06-10 18:42 ` Bjorn Helgaas
0 siblings, 0 replies; only message in thread
From: Bjorn Helgaas @ 2026-06-10 18:42 UTC (permalink / raw)
To: Jose Ignacio Tornos Martinez
Cc: bhelgaas, alex, jjohnson, mani, linux-pci, linux-wireless, ath11k,
ath12k, mhi, linux-kernel, linux-pm
[+cc linux-pm]
On Wed, Jun 03, 2026 at 12:58:51PM +0200, Jose Ignacio Tornos Martinez wrote:
> Add D3cold power cycle as a general PCI reset method for single-function
> devices on platforms with ACPI _PR3 power resources. This provides true
> power cycle reset capability when the platform can physically cut power
> to the device.
>
> The implementation strictly requires _PR3 to be present - the platform
> must be able to control device power. This ensures d3cold only attempts
> true power cycling, not falling back to D3hot transitions.
>
> D3cold reset is placed at the end of the reset hierarchy since it requires
> specific platform support and should be tried after standard methods.
>
> Reset hierarchy with this change:
> 1. device_specific
> 2. acpi
> 3. flr
> 4. af_flr
> 5. pm (D3hot via config space, checks NoSoftRst)
> 6. bus (SBR)
> 7. cxl_bus
> 8. d3cold (NEW - true power cycle, requires _PR3)
>
> This benefits:
> - Platforms with _PR3 support
> - Single-function devices needing true power cycle
> - VFIO passthrough scenarios where FLR/PM unavailable
I assume you have some specific device(s) in mind here; can we mention
any examples so this is more than just theoretically useful?
> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
> ---
> v7: code unchanged from v6
> v6: https://lore.kernel.org/all/20260602160024.1171949-2-jtornosm@redhat.com/
>
> drivers/pci/pci.c | 50 +++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pci.h | 2 +-
> 2 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8f7cfcc00090..096868f80cd4 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4491,6 +4491,55 @@ static int pci_pm_reset(struct pci_dev *dev, bool probe)
> return ret;
> }
>
> +/**
> + * pci_d3cold_reset - Put device into D3cold and back to D0 for reset
> + * @dev: PCI device to reset
> + * @probe: if true, check if D3cold reset is supported; if false, perform reset
> + *
> + * Reset the device by transitioning through D3cold (actual power removal via
> + * platform power control) and back to D0. This requires ACPI _PR3 power
> + * resources to be present - the platform must be able to physically cut power
> + * to the device.
> + *
> + * Only available for single-function devices to avoid affecting other
> + * functions in multi-function devices.
> + *
> + * Returns 0 if device can be/was reset this way, -ENOTTY if not supported,
> + * or other negative error code on failure.
> + */
> +static int pci_d3cold_reset(struct pci_dev *dev, bool probe)
> +{
> + int ret;
> +
> + if (dev->multifunction)
> + return -ENOTTY;
> +
> + if (!pci_pr3_present(dev))
> + return -ENOTTY;
platform_pci_set_power_state() is currently only implemented for MID
and ACPI, but we're starting to see DT systems with power control for
devices, so I assume it may someday be extended to support them.
Checking pci_pr3_present() seems wrong to me because it assumes ACPI.
On ACPI systems, it seems like pci_set_power_state(PCI_D3cold) should
return an error if the necessary ACPI methods are not present, and we
wouldn't need this check. Is that not the case?
Or is this _PR3 check here just to support the "probe"? I don't know
if there's a generic interface to tell us whether we can control the
power to a device.
> + if (probe)
> + return 0;
> +
> + if (dev->current_state != PCI_D0)
> + return -EINVAL;
> +
> + ret = pci_dev_reset_iommu_prepare(dev);
> + if (ret) {
> + pci_err(dev, "failed to stop IOMMU for a PCI reset: %d\n", ret);
> + return ret;
> + }
> +
> + ret = pci_set_power_state(dev, PCI_D3cold);
> + if (ret)
> + goto done;
> +
> + ret = pci_set_power_state(dev, PCI_D0);
> +
> +done:
> + pci_dev_reset_iommu_done(dev);
> + return ret;
> +}
> +
> /**
> * pcie_wait_for_link_status - Wait for link status change
> * @pdev: Device whose link to wait for.
> @@ -5065,6 +5114,7 @@ const struct pci_reset_fn_method pci_reset_fn_methods[] = {
> { pci_pm_reset, .name = "pm" },
> { pci_reset_bus_function, .name = "bus" },
> { cxl_reset_bus_function, .name = "cxl_bus" },
> + { pci_d3cold_reset, .name = "d3cold" },
> };
>
> /**
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 2c4454583c11..1ca7b880ead7 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -51,7 +51,7 @@
> PCI_STATUS_PARITY)
>
> /* Number of reset methods used in pci_reset_fn_methods array in pci.c */
> -#define PCI_NUM_RESET_METHODS 8
> +#define PCI_NUM_RESET_METHODS 9
>
> #define PCI_RESET_PROBE true
> #define PCI_RESET_DO_RESET false
> --
> 2.54.0
>
^ permalink raw reply [flat|nested] only message in thread