From: Bjorn Helgaas <helgaas@kernel.org>
To: Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: bhelgaas@google.com, mahesh@linux.ibm.com, oohall@gmail.com,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, bagasdotme@gmail.com,
regressions@lists.linux.dev, linux-nvme@lists.infradead.org,
kch@nvidia.com, hch@lst.de, gloriouseggroll@gmail.com,
kbusch@kernel.org, sagi@grimberg.me, hare@suse.de
Subject: Re: [PATCH v8 2/3] PCI/AER: Disable AER service on suspend
Date: Thu, 18 Apr 2024 15:35:31 -0500
Message-ID: <20240418203531.GA251408@bhelgaas>
In-Reply-To: <20240416043225.1462548-2-kai.heng.feng@canonical.com>

On Tue, Apr 16, 2024 at 12:32:24PM +0800, Kai-Heng Feng wrote:
> When the power rail gets cut off, the hardware can create some electric
> noise on the link that triggers AER. If IRQ is shared between AER with
> PME, such AER noise will cause a spurious wakeup on system suspend.
>
> When the power rail gets back, the firmware of the device resets itself
> and can create unexpected behavior like sending PTM messages. For this
> case, the driver will always be too late to toggle off features should
> be disabled.
>
> As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power
> Management", TLP and DLLP transmission are disabled for a Link in L2/L3
> Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if
> the power will be turned off during suspend process, disable AER service
> and re-enable it during the resume process. This should not affect the
> basic functionality.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090
> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

Thanks for reviving this series. I tried to follow the history of
this, but there are at least two very similar series and I can't
piece it all together.

> ---
> v8:
> - Add more bug reports.
>
> v7:
> - Wording
> - Disable AER completely (again) if power will be turned off
>
> v6:
> v5:
> - Wording.
>
> v4:
> v3:
> - No change.
>
> v2:
> - Only disable AER IRQ.
> - No more check on PME IRQ#.
> - Use helper.
>
> drivers/pci/pcie/aer.c | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index ac6293c24976..bea7818c2d1b 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -28,6 +28,7 @@
> #include <linux/delay.h>
> #include <linux/kfifo.h>
> #include <linux/slab.h>
> +#include <linux/suspend.h>
> #include <acpi/apei.h>
> #include <acpi/ghes.h>
> #include <ras/ras_event.h>
> @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev)
>  	return 0;
>  }
>
> +static int aer_suspend(struct pcie_device *dev)
> +{
> +	struct aer_rpc *rpc = get_service_data(dev);
> +	struct pci_dev *pdev = rpc->rpd;
> +
> +	if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware())
> +		aer_disable_rootport(rpc);

Why do we check pci_ancestor_pr3_present(pdev) and
pm_suspend_via_firmware()? I'm getting pretty convinced that we need
to disable AER interrupts on suspend in general. I think it would be
better to do that consistently on all platforms rather than
special-casing based on the details of how we suspend.

Also, why do we use aer_disable_rootport() instead of just
aer_disable_irq()? I think it's the interrupt that causes issues on
suspend. I see that there *were* some versions that used
aer_disable_irq(), but I can't find the reason it changed.

> +
> +	return 0;
> +}
> +
> +static int aer_resume(struct pcie_device *dev)
> +{
> +	struct aer_rpc *rpc = get_service_data(dev);
> +	struct pci_dev *pdev = rpc->rpd;
> +
> +	if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware())
> +		aer_enable_rootport(rpc);
> +
> +	return 0;
> +}
> +
> /**
> * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
> * @dev: pointer to Root Port, RCEC, or RCiEP
> @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = {
>  	.service	= PCIE_PORT_SERVICE_AER,
>
>  	.probe		= aer_probe,
> +	.suspend	= aer_suspend,
> +	.resume		= aer_resume,
>  	.remove		= aer_remove,
>  };
>
> --
> 2.34.1
>
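
To make the two review questions above concrete, here is a minimal
userspace sketch, not kernel code and not part of the posted patch. It
models the Root Command and Root Status registers as plain integers
and contrasts an IRQ-only disable with the heavier root-port disable,
with suspend/resume hooks that skip the platform checks. Everything
prefixed model_ is hypothetical. The sketch assumes that
aer_disable_rootport() additionally clears the Root Error Status
register; that is one reading of aer.c and should be checked against
the tree. The bit values mirror PCI_ERR_ROOT_CMD_* in pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirror PCI_ERR_ROOT_CMD_* in include/uapi/linux/pci_regs.h. */
#define ROOT_CMD_COR_EN		0x00000001u
#define ROOT_CMD_NONFATAL_EN	0x00000002u
#define ROOT_CMD_FATAL_EN	0x00000004u
#define ROOT_CMD_INTR_MASK \
	(ROOT_CMD_COR_EN | ROOT_CMD_NONFATAL_EN | ROOT_CMD_FATAL_EN)

/* Hypothetical stand-in for a Root Port's AER registers (not a kernel type). */
struct model_root_port {
	uint32_t root_command;	/* models PCI_ERR_ROOT_COMMAND */
	uint32_t root_status;	/* models PCI_ERR_ROOT_STATUS */
};

/* IRQ-only variant: mask error-message interrupts, leave status untouched. */
static void model_disable_irq(struct model_root_port *rp)
{
	assert(rp);
	rp->root_command &= ~ROOT_CMD_INTR_MASK;
}

static void model_enable_irq(struct model_root_port *rp)
{
	assert(rp);
	rp->root_command |= ROOT_CMD_INTR_MASK;
}

/* Heavier variant (assumed behavior): mask interrupts and clear logged status. */
static void model_disable_rootport(struct model_root_port *rp)
{
	model_disable_irq(rp);
	rp->root_status = 0;	/* stands in for writing back the RW1C status bits */
}

/* Suspend/resume without platform checks, as the review suggests. */
static int model_suspend(struct model_root_port *rp)
{
	model_disable_irq(rp);
	return 0;
}

static int model_resume(struct model_root_port *rp)
{
	model_enable_irq(rp);
	return 0;
}

static bool model_irq_enabled(const struct model_root_port *rp)
{
	return (rp->root_command & ROOT_CMD_INTR_MASK) != 0;
}
```

In this model the IRQ-only path leaves any logged error status in
place across suspend, while the root-port path discards it. If the
interrupt alone causes the spurious wakeup, the lighter variant would
seem sufficient.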