linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Limonciello, Mario" <mario.limonciello@amd.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Kai-Heng Feng <kai.heng.feng@canonical.com>,
	bhelgaas@google.com,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Vidya Sagar <vidyas@nvidia.com>,
	Michael Bottini <michael.a.bottini@linux.intel.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] PCI/ASPM: Enable ASPM on external PCIe devices
Date: Mon, 19 Jun 2023 17:09:55 -0500	[thread overview]
Message-ID: <eea91b5c-a61c-9a66-f084-df7495ae2c0c@amd.com> (raw)
In-Reply-To: <20230619213716.GA23372@bhelgaas>


On 6/19/2023 4:37 PM, Bjorn Helgaas wrote:
> On Mon, Jun 19, 2023 at 11:16:35AM -0500, Limonciello, Mario wrote:
>> On 6/15/2023 10:01 PM, Kai-Heng Feng wrote:
>>> On Fri, Jun 16, 2023 at 1:12 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>> On Thu, Jun 15, 2023 at 03:04:20PM +0800, Kai-Heng Feng wrote:
>>>>> When a PCIe device is hotplugged to a Thunderbolt port, ASPM is not
>>>>> enabled for that device. However, when the device is plugged preboot,
>>>>> ASPM is enabled by default.
>>>>>
>>>>> The disparity happens because BIOS doesn't have the ability to program
>>>>> ASPM on hotplugged devices.
>>>>>
>>>>> So enable ASPM by default for external connected PCIe devices so ASPM
>>>>> settings are consitent between preboot and hotplugged.
>>>>>
>>>>> On HP Thunderbolt Dock G4, enable ASPM can also fix BadDLLP error:
>>>>> pcieport 0000:00:1d.0: AER: Corrected error received: 0000:07:04.0
>>>>> pcieport 0000:07:04.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
>>>>> pcieport 0000:07:04.0:   device [8086:0b26] error status/mask=00000080/00002000
>>>>> pcieport 0000:07:04.0:    [ 7] BadDLLP
>>>>>
>>>>> The root cause is still unclear, but quite likely because the I225 on
>>>>> the dock supports PTM, where ASPM timing is precalculated for the PTM.
>>>>>
>>>>> Cc: Mario Limonciello <mario.limonciello@amd.com>
>>>>> Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
>>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217557
>>>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>>>>> ---
>>>>>    drivers/pci/pcie/aspm.c | 4 +++-
>>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>>> index 66d7514ca111..613b0754c9bb 100644
>>>>> --- a/drivers/pci/pcie/aspm.c
>>>>> +++ b/drivers/pci/pcie/aspm.c
>>>>> @@ -119,7 +119,9 @@ static int policy_to_aspm_state(struct pcie_link_state *link)
>>>>>                 /* Enable Everything */
>>>>>                 return ASPM_STATE_ALL;
>>>>>         case POLICY_DEFAULT:
>>>>> -             return link->aspm_default;
>>>>> +             return dev_is_removable(&link->downstream->dev) ?
>>>>> +                     link->aspm_capable :
>>>>> +                     link->aspm_default;
>>>> I'm a little hesitant because dev_is_removable() is a convenient
>>>> test that covers the current issue, but it doesn't seem tightly
>>>> connected from a PCIe architecture perspective.
>>>>
>>>> I think the current model of compile-time ASPM policy selection:
>>>>
>>>>     CONFIG_PCIEASPM_DEFAULT          /* BIOS default setting */
>>>>     CONFIG_PCIEASPM_PERFORMANCE      /* disable L0s and L1 */
>>>>     CONFIG_PCIEASPM_POWERSAVE        /* enable L0s and L1 */
>>>>     CONFIG_PCIEASPM_POWER_SUPERSAVE  /* enable L1 substates */
>>>>
>>>> is flawed.  As far as I know, there's no technical reason we
>>>> have to select this at kernel build-time.  I suspect the
>>>> original reason was risk avoidance, i.e., we were worried that
>>>> we might expose hardware defects if we enabled ASPM states that
>>>> BIOS hadn't already enabled.
>>>>
>>>> How do we get out of that model?  We do have sysfs knobs that
>>>> should cover all the functionality (set overall policy as above
>>>> via /sys/module/pcie_aspm/parameters/policy; set device-level
>>>> exceptions via /sys/bus/pci/devices/.../link/*_aspm).
>>> Agree. Build-time policy can be obsoleted by boot-time argument.
>> I agree as well.
> This isn't strictly relevant to the current problem, so let's put this
> on the back burner for now.


I think it could fold into it if we end up treating the i225
PCIe device as a quirk as mentioned below.

>
>>>> In my opinion, the cleanest solution would be to enable all ASPM
>>>> functionality whenever possible and let users disable it if they
>>>> need to for performance.  If there are device defects when
>>>> something is enabled, deal with it via quirks, as we do for
>>>> other PCI features.
>>>>
>>>> That feels a little risky, but let's have a conversation about
>>>> where we want to go in the long term.  It's good to avoid risk,
>>>> but too much avoidance leads to its own complexity and an
>>>> inability to change things.
>>> I think we should separate the situation into two cases:
>>> - When BIOS/system firmware has the ability to program ASPM, honor
>>> it.  This applies to most "internal" PCI devices.
>>> - When BIOS/system can't program ASPM, enable ASPM for whatever
>>> it's capable of. Most notable case is Intel VMD controller, and
>>> this patch for devices connected through TBT.
>>>
>>> Enabling all ASPM functionality regardless of what's being
>>> pre-programmed by BIOS is way too risky.  Disabling ASPM to
>>> workaround issues and defects are still quite common among
>>> hardware manufacturers.
> It sounds like you have actual experience with this :)  Do you have
> any concrete examples that we can use as "known breakage"?
A variety of Intel chipsets don't support lane width switching
or speed switching.  When ASPM has been enabled on a dGPU,
these features are utilized and breakage ensues.

There are various methods to try to mitigate the impact both in
firmware and driver code.

>
> This feels like a real problem to me.  There are existing mechanisms
> (ACPI_FADT_NO_ASPM and _OSC PCIe cap ownership) the platform can use
> to prevent the OS from using ASPM.
>
> If vendors assume that *in addition*, the OS will pay attention to
> whatever ASPM configuration BIOS did, that's a major disconnect.  We
> don't do anything like that for other PCI features, and I'm not aware
> of any restriction like that being documented.
With both of those policies in place, how did we get into
the situation of having configuration options and knobs?
>> I think the pragmatic way to approach it is to (essentially) apply
>> the policy as BIOS defaults and allow overrides from that.
> Do you mean that when enumerating a device (at boot-time or hot-add
> time), we would read the current ASPM config but not change it?  And
> users could use the sysfs knobs to enable/disable ASPM as desired?
Yes.
> That wouldn't solve the problem Kai-Heng is trying to solve.
Alone it wouldn't; but if you treated the i225 PCIe device
connected to the system as a "quirk" to apply ASPM policy
from the parent device to this child device it could.
> Or that we leave ASPM alone during boot-time enumeration, but enable
> ASPM when we enumerate hot-added devices?  It doesn't sound right that
> a device would be configured differently if present at boot vs
> hot-added.
Same policy for both boot and hot add but specifically if the
device is in a quirk list to enable it or disable it.

  reply	other threads:[~2023-06-19 22:10 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-15  7:04 [PATCH] PCI/ASPM: Enable ASPM on external PCIe devices Kai-Heng Feng
2023-06-15 14:02 ` Ilpo Järvinen
2023-06-15 17:07 ` Sathyanarayanan Kuppuswamy
2023-06-16  2:37   ` Kai-Heng Feng
2023-06-15 17:12 ` Bjorn Helgaas
2023-06-16  3:01   ` Kai-Heng Feng
2023-06-19 16:16     ` Limonciello, Mario
2023-06-19 21:37       ` Bjorn Helgaas
2023-06-19 22:09         ` Limonciello, Mario [this message]
2023-06-20 18:28           ` Bjorn Helgaas
2023-06-20 18:36             ` Limonciello, Mario
2023-06-22 23:06               ` Bjorn Helgaas
2023-06-27  8:35                 ` Kai-Heng Feng
2023-06-27 20:54                   ` Bjorn Helgaas
2023-06-28  5:09                     ` Kai-Heng Feng
2023-07-05 20:06                       ` [Intel-wired-lan] " Bjorn Helgaas
2023-07-06  4:07                         ` Mario Limonciello
2023-07-14  8:17                           ` Kai-Heng Feng
2023-07-14 16:37                             ` Mario Limonciello
2023-07-17  3:34                               ` Kai-Heng Feng
2023-07-17 16:51                                 ` Limonciello, Mario
2023-07-18 19:24                                   ` Bjorn Helgaas
2023-08-11  8:34                                     ` Kai-Heng Feng
2023-06-16 22:01 ` Bjorn Helgaas
2023-06-21  3:08   ` Kai-Heng Feng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eea91b5c-a61c-9a66-f084-df7495ae2c0c@amd.com \
    --to=mario.limonciello@amd.com \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=kai.heng.feng@canonical.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=michael.a.bottini@linux.intel.com \
    --cc=mika.westerberg@linux.intel.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=vidyas@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).