From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
To: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>,
Bjorn Helgaas <helgaas@kernel.org>,
"Rafael J. Wysocki" <rafael@kernel.org>
Cc: linux-pci@vger.kernel.org, linux-pm@vger.kernel.org,
Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
Oliver O'Halloran <oohall@gmail.com>
Subject: Re: [PATCH] PCI/AER: Block runtime suspend when handling errors
Date: Fri, 9 Feb 2024 07:45:05 -0800 [thread overview]
Message-ID: <64ad8d52-ba67-4156-8e36-7346605bdf48@linux.intel.com> (raw)
In-Reply-To: <20240209140841.1854711-1-stanislaw.gruszka@linux.intel.com>
On 2/9/24 6:08 AM, Stanislaw Gruszka wrote:
> PM runtime can be done simultaneously with AER error handling.
> Avoid that by using pm_runtime_get_sync() before and pm_runtime_put()
> after reset in pcie_do_recovery() for all recovering devices.
>
> pm_runtime_get_sync() will increase dev->power.usage_count counter
> to prevent any possible future request to runtime suspend a device,
> as well as resume device is was in D3hot state.
runtime suspend a device or resume a device that was in D3hot state.
>
> I tested with igc device by doing simultaneous aer_inject and
> rpm suspend/resume via /sys/bus/pci/devices/PCI_ID/power/control
> and can reproduce:
>
> igc 0000:02:00.0: not ready 65535ms after bus reset; giving up
> pcieport 0000:00:1c.2: AER: Root Port link has been reset (-25)
> pcieport 0000:00:1c.2: AER: subordinate device reset failed
> pcieport 0000:00:1c.2: AER: device recovery failed
> igc 0000:02:00.0: Unable to change power state from D3hot to D0, device inaccessible
>
> The problem disappears when applied this patch.
>
> Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
> ---
since it is a bug fix, may be Cc: stable@vger.kernel.org
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> RFC -> v1:
> add runtime callbacks to pcie_do_recovery(), this covers DPC case
> as well as case of recovering multiple devices under same port.
>
> drivers/pci/pcie/err.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 59c90d04a609..705893b5f7b0 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -13,6 +13,7 @@
> #define dev_fmt(fmt) "AER: " fmt
>
> #include <linux/pci.h>
> +#include <linux/pm_runtime.h>
> #include <linux/module.h>
> #include <linux/kernel.h>
> #include <linux/errno.h>
> @@ -85,6 +86,18 @@ static int report_error_detected(struct pci_dev *dev,
> return 0;
> }
>
> +static int pci_pm_runtime_get_sync(struct pci_dev *pdev, void *data)
> +{
> + pm_runtime_get_sync(&pdev->dev);
> + return 0;
> +}
> +
> +static int pci_pm_runtime_put(struct pci_dev *pdev, void *data)
> +{
> + pm_runtime_put(&pdev->dev);
> + return 0;
> +}
> +
> static int report_frozen_detected(struct pci_dev *dev, void *data)
> {
> return report_error_detected(dev, pci_channel_io_frozen, data);
> @@ -207,6 +220,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> else
> bridge = pci_upstream_bridge(dev);
>
> + pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
> +
> pci_dbg(bridge, "broadcast error_detected message\n");
> if (state == pci_channel_io_frozen) {
> pci_walk_bridge(bridge, report_frozen_detected, &status);
> @@ -251,10 +266,15 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> pcie_clear_device_status(dev);
> pci_aer_clear_nonfatal_status(dev);
> }
> +
> + pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
> +
> pci_info(bridge, "device recovery successful\n");
> return status;
>
> failed:
> + pci_walk_bridge(bridge, pci_pm_runtime_put, NULL);
> +
> pci_uevent_ers(bridge, PCI_ERS_RESULT_DISCONNECT);
>
> /* TODO: Should kernel panic here? */
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
next prev parent reply other threads:[~2024-02-09 15:45 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-09 14:08 [PATCH] PCI/AER: Block runtime suspend when handling errors Stanislaw Gruszka
2024-02-09 14:16 ` Rafael J. Wysocki
2024-02-09 15:45 ` Kuppuswamy Sathyanarayanan [this message]
2024-02-12 8:13 ` Stanislaw Gruszka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=64ad8d52-ba67-4156-8e36-7346605bdf48@linux.intel.com \
--to=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=helgaas@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mahesh@linux.ibm.com \
--cc=oohall@gmail.com \
--cc=rafael@kernel.org \
--cc=stanislaw.gruszka@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.