From: Mario Limonciello <mario.limonciello@amd.com>
To: Werner Sembach <wse@tuxedocomputers.com>,
Bjorn Helgaas <bhelgaas@google.com>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
Mika Westerberg <mika.westerberg@linux.intel.com>,
Richard Hughes <hughsient@gmail.com>
Cc: ggo@tuxedocomputers.com, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] PCI: Avoid putting some root ports into D3 on some Ryzen chips
Date: Thu, 12 Dec 2024 13:01:02 -0600 [thread overview]
Message-ID: <20cfa4ed-d25d-4881-81b9-9f1698efe9ff@amd.com> (raw)
In-Reply-To: <6a809349-016a-42bf-b139-544aeec543aa@tuxedocomputers.com>
+ Richard Hughes for some extra comments on the capsule issues.
It's tangential to your immediate problem, but I think if we can help
you to get it solved it will be healthier for your future products.
Here is the full thread for context.
https://lore.kernel.org/all/20241209193614.535940-1-wse@tuxedocomputers.com/
More comments below.
On 12/12/2024 12:47, Werner Sembach wrote:
>
> Am 11.12.24 um 22:24 schrieb Mario Limonciello:
>> On 12/11/2024 06:47, Werner Sembach wrote:
>>> Hi,
>>>
>>> Am 10.12.24 um 17:00 schrieb Mario Limonciello:
>>>> On 12/10/2024 09:24, Werner Sembach wrote:
>>>>> Hi,
>>>>>
>>>>> Am 09.12.24 um 20:45 schrieb Mario Limonciello:
>>>>>> On 12/9/2024 13:36, Werner Sembach wrote:
>>>>>>> From: Mario Limonciello <mario.limonciello@amd.com>
>>>>>>>
>>>>>>> commit 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>>>> sets the policy that all PCIe ports are allowed to use D3. When
>>>>>>> the system is suspended if the port is not power manageable by the
>>>>>>> platform and won't be used for wakeup via a PME this sets up the
>>>>>>> policy for these ports to go into D3hot.
>>>>>>>
>>>>>>> This policy generally makes sense from an OSPM perspective but it
>>>>>>> leads
>>>>>>> to problems with wakeup from suspend on the TUXEDO Sirius 16 Gen
>>>>>>> 1 with
>>>>>>> an unupdated BIOS.
>>>>>>>
>>>>>>> - On family 19h model 44h (PCI 0x14b9) this manifests as a
>>>>>>> missing wakeup
>>>>>>> interrupt.
>>>>>>> - On family 19h model 74h (PCI 0x14eb) this manifests as a system
>>>>>>> hang.
>>>>>>>
>>>>>>> On the affected Device + BIOS combination, add a quirk for the
>>>>>>> PCI device
>>>>>>> ID used by the problematic root port on both chips to ensure that
>>>>>>> these
>>>>>>> root ports are not put into D3hot at suspend.
>>>>>>>
>>>>>>> This patch is based on
>>>>>>> https://lore.kernel.org/linux-pci/20230708214457.1229-2-
>>>>>>> mario.limonciello@amd.com/
>>>>>>> but with the added condition both in the documentation and in the
>>>>>>> code to
>>>>>>> apply only to the TUXEDO Sirius 16 Gen 1 with the original
>>>>>>> unpatched BIOS.
>>>>>>>
>>>>>>> Co-developed-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>>>>> Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
>>>>>>> Co-developed-by: Werner Sembach <wse@tuxedocomputers.com>
>>>>>>> Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
>>>>>>> Cc: stable@vger.kernel.org # 6.1+
>>>>>>> Reported-by: Iain Lane <iain@orangesquash.org.uk>
>>>>>>> Closes: https://forums.lenovo.com/t5/Ubuntu/Z13-can-t-resume-
>>>>>>> from- suspend-with-external-USB-keyboard/m-p/5217121
>>>>>>> Fixes: 9d26d3a8f1b0 ("PCI: Put PCIe ports into D3 during suspend")
>>>>>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>>>>>> ---
>>>>>>> drivers/pci/quirks.c | 31 +++++++++++++++++++++++++++++++
>>>>>>> 1 file changed, 31 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>>>> index 76f4df75b08a1..2226dca56197d 100644
>>>>>>> --- a/drivers/pci/quirks.c
>>>>>>> +++ b/drivers/pci/quirks.c
>>>>>>> @@ -3908,6 +3908,37 @@ static void
>>>>>>> quirk_apple_poweroff_thunderbolt(struct pci_dev *dev)
>>>>>>> DECLARE_PCI_FIXUP_SUSPEND_LATE(PCI_VENDOR_ID_INTEL,
>>>>>>> PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C,
>>>>>>> quirk_apple_poweroff_thunderbolt);
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Putting PCIe root ports on Ryzen SoCs with USB4 controllers
>>>>>>> into D3hot
>>>>>>> + * may cause problems when the system attempts wake up from s2idle.
>>>>>>> + *
>>>>>>> + * On family 19h model 44h (PCI 0x14b9) this manifests as a
>>>>>>> missing wakeup
>>>>>>> + * interrupt.
>>>>>>> + * On family 19h model 74h (PCI 0x14eb) this manifests as a
>>>>>>> system hang.
>>>>>>> + *
>>>>>>> + * This fix is still required on the TUXEDO Sirius 16 Gen1 with
>>>>>>> the original
>>>>>>> + * unupdated BIOS.
>>>>>>> + */
>>>>>>> +static const struct dmi_system_id quirk_ryzen_rp_d3_dmi_table[] = {
>>>>>>> + {
>>>>>>> + .matches = {
>>>>>>> + DMI_MATCH(DMI_SYS_VENDOR, "TUXEDO"),
>>>>>>> + DMI_MATCH(DMI_BOARD_NAME, "APX958"),
>>>>>>> + DMI_MATCH(DMI_BIOS_VERSION, "V1.00A00"),
>>>>>>> + },
>>>>>>> + },
>>>>>>> + {}
>>>>>>> +};
>>>>>>> +
>>>>>>> +static void quirk_ryzen_rp_d3(struct pci_dev *pdev)
>>>>>>> +{
>>>>>>> + if (dmi_check_system(quirk_ryzen_rp_d3_dmi_table) &&
>>>>>>> + !acpi_pci_power_manageable(pdev))
>>>>>>> + pdev->dev_flags |= PCI_DEV_FLAGS_NO_D3;
>>>>>>> +}
>>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14b9,
>>>>>>> quirk_ryzen_rp_d3);
>>>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x14eb,
>>>>>>> quirk_ryzen_rp_d3);
>>>>>>> #endif
>>>>>>> /*
>>>>>>
>>>>>> Wait, what is wrong with:
>>>>>>
>>>>>> commit 7d08f21f8c630 ("x86/PCI: Avoid PME from D3hot/D3cold for
>>>>>> AMD Rembrandt and Phoenix USB4")
>>>>>>
>>>>>> Is that not activating on your system for some reason?
>>>>>
>>>>> Doesn't seem so, tested with the old BIOS and 6.13-rc2 and had
>>>>> blackscreen on wakeup.
>>>>
>>>> OK, I think we need to dig a layer deeper to see which root port is
>>>> causing problems to understand this.
>>>>
>>>>>
>>>>> With a newer BIOS for that device suspend and resume however works.
>>>>>
>>>>
>>>> Is there any reason that people would realistically be staying on
>>>> the old BIOS and instead we need to carry this quirk in the kernel
>>>> for eternity?
>>> Fear of device bricking or not knowing an update is available.
>>
>> The not knowing is solved by publishing firmware to LVFS.
>>
>> Most "popular" distributions include fwupd, regularly check for
>> updates and will notify users through the MOTD or a graphical
>> application that there is something available.
>>
>>>>
>>>> With the Linux ecosystem for BIOS updates through fwupd + LVFS it's
>>>> not a very big barrier to entry to do an update like it was 20 years
>>>> ago.
>>> Sadly fwupd/LVFS does not support executing arbitrary efi binaries/
>>> nsh scripts which still is the main form ODMs provide bios updates.
>>
>> It's tangential to this thread; but generally ODMs will provide you a
>> capsule if you ask for one.
>
> I already evaluated this a while back with the results:
>
Thanks for sharing.
> If the BIOS is an AMI one you can get a capsule, but this capsule might
> overwrite DMI strings or the vendor boot logo if not flashed with the
> AMI flasher and extra flags (sadly I was not able to find out what these
> flags do under the hood to give fwupd the same capabilities). Also these
> capsules often not include Intel ME and EC firmware updates and certain
> bios version might only work with certain EC firmwares.
>
It sounds like some bugs in the implementation of the capsule handler
for this system.
> Last I checked in with the ODM that uses Insyde BIOSes we where not able
> to get a capsule update. I don't know if that is an ODM or Insyde problem.
It's not an Insyde problem. I use Insyde capsules regularly myself from
fwupd. I also know several other OEMs that ship capsules to LVFS that
use Insyde.
>
> For the rest I will come back to this on Monday as I'm currently away
> from the testing device, thanks for all the feedback so far.
Ack.
>
>>
>> Anyhow if you are providing scripts and random EFI binaries in order
>> to get things updated, that explains why this is a large enough chunk
>> of people to justify a patch like this.
>>
>>>>
>>>>> Looking in the patch the device id's are different (0x162e, 0x162f,
>>>>> 0x1668, and 0x1669).
>>>>>
>>>>
>>>> TUXEDO Sirius 16 Gen1 is Phoenix based, right? So at a minimum you
>>>> shouldn't be including PCI IDs from Rembrandt (0x14b9)
>>> Thanks for the hint, I can delete that then.
>>>>
>>>> Here is the topology from a Phoenix system on my side:
>>>>
>>>> https://gist.github.com/superm1/85bf0c053008435458bdb39418e109d8
>>>
>>> Sorry for the noob question: How do I get that format? it's not lspci
>>> - tvnn on my system
>>
>> No worry.
>>
>> Having looked at a lot of s2idle bugs I've found it's generally
>> helpful to know what ACPI companion is assigned to devices and what
>> are parents.
>>
>> So it's actually created by this function in the s2idle issue triage
>> script, amd_s2idle.py.
>>
>> https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/
>> amd_s2idle.py#L1945
>>
>> And while on the topic, does your "broken" BIOS report this from that
>> script?
>>
>> '''
>> Platform may hang resuming. Upgrade your firmware or add
>> pcie_port_pm=off to kernel command line if you have problems
>> '''
>>
>>>
>>> But i think this contains the same information:
>>>
>>> $ lspci -Pnn
>>> 00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Root Complex [1022:14e8]
>>> 00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Phoenix
>>> IOMMU [1022:14e9]
>>> 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Dummy Host Bridge [1022:14ea]
>>> 00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>>> GPP Bridge [1022:14ed]
>>> 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Dummy Host Bridge [1022:14ea]
>>> 00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>>> GPP Bridge [1022:14ee]
>>> 00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>>> GPP Bridge [1022:14ee]
>>> 00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>>> GPP Bridge [1022:14ee]
>>> 00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>>> GPP Bridge [1022:14ee]
>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Dummy Host Bridge [1022:14ea]
>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family
>>> 19h USB4/Thunderbolt PCIe tunnel [1022:14ef]
>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Dummy Host Bridge [1022:14ea]
>>> 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Dummy Host Bridge [1022:14ea]
>>> 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>>> 00:08.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>>> 00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Phoenix
>>> Internal GPP Bridge to Bus [C:A] [1022:14eb]
>>> 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus
>>> Controller [1022:790b] (rev 71)
>>> 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC
>>> Bridge [1022:790e] (rev 51)
>>> 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Data Fabric; Function 0 [1022:14f0]
>>> 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Data Fabric; Function 1 [1022:14f1]
>>> 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Data Fabric; Function 2 [1022:14f2]
>>> 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Data Fabric; Function 3 [1022:14f3]
>>> 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Data Fabric; Function 4 [1022:14f4]
>>> 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Data Fabric; Function 5 [1022:14f5]
>>> 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Data Fabric; Function 6 [1022:14f6]
>>> 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD]
>>> Phoenix Data Fabric; Function 7 [1022:14f7]
>>> 00:01.1/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/
>>> ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev 12)
>>> 00:01.1/00.0/00.0 PCI bridge [0604]: Advanced Micro Devices, Inc.
>>> [AMD/ ATI] Navi 10 XL Downstream Port of PCI Express Switch
>>> [1002:1479] (rev 12)
>>> 00:01.1/00.0/00.0/00.0 VGA compatible controller [0300]: Advanced
>>> Micro Devices, Inc. [AMD/ATI] Navi 33 [Radeon RX 7600/7600 XT/7600M
>>> XT/7600S/7700S / PRO W7600] [1002:7480] (rev c7)
>>> 00:01.1/00.0/00.0/00.1 Audio device [0403]: Advanced Micro Devices,
>>> Inc. [AMD/ATI] Navi 31 HDMI/DP Audio [1002:ab30]
>>> 00:02.1/00.0 Ethernet controller [0200]: Realtek Semiconductor Co.,
>>> Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller
>>> [10ec:8168] (rev 15)
>>> 00:02.2/00.0 Network controller [0280]: Intel Corporation Wi-Fi
>>> 6E(802.11ax) AX210/AX1675* 2x2 [Typhoon Peak] [8086:2725] (rev 1a)
>>> 00:02.4/00.0 Non-Volatile memory controller [0108]: Samsung
>>> Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
>>> 00:08.1/00.0 VGA compatible controller [0300]: Advanced Micro
>>> Devices, Inc. [AMD/ATI] Phoenix1 [1002:15bf] (rev c1)
>>> 00:08.1/00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/
>>> ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
>>> 00:08.1/00.2 Encryption controller [1080]: Advanced Micro Devices,
>>> Inc. [AMD] Phoenix CCP/PSP 3.0 Device [1022:15c7]
>>> 00:08.1/00.3 USB controller [0c03]: Advanced Micro Devices, Inc.
>>> [AMD] Device [1022:15b9]
>>> 00:08.1/00.4 USB controller [0c03]: Advanced Micro Devices, Inc.
>>> [AMD] Device [1022:15ba]
>>> 00:08.1/00.5 Multimedia controller [0480]: Advanced Micro Devices,
>>> Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor [1022:15e2] (rev 63)
>>> 00:08.1/00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD]
>>> Family 17h/19h/1ah HD Audio Controller [1022:15e3]
>>> 00:08.2/00.0 Non-Essential Instrumentation [1300]: Advanced Micro
>>> Devices, Inc. [AMD] Phoenix Dummy Function [1022:14ec]
>>> 00:08.2/00.1 Signal processing controller [1180]: Advanced Micro
>>> Devices, Inc. [AMD] AMD IPU Device [1022:1502]
>>> 00:08.3/00.0 Non-Essential Instrumentation [1300]: Advanced Micro
>>> Devices, Inc. [AMD] Phoenix Dummy Function [1022:14ec]
>>> 00:08.3/00.3 USB controller [0c03]: Advanced Micro Devices, Inc.
>>> [AMD] Device [1022:15c0]
>>> 00:08.3/00.4 USB controller [0c03]: Advanced Micro Devices, Inc.
>>> [AMD] Device [1022:15c1]
>>> 00:08.3/00.5 USB controller [0c03]: Advanced Micro Devices, Inc.
>>> [AMD] Pink Sardine USB4/Thunderbolt NHI controller #1 [1022:1668]
>>>
>>>>
>>>> That's why 7d08f21f8c630 intentionally matches the NHI and then
>>>> changes the root port right above that instead of all the root ports
>>>> - because that is where the problem was.
>>> For some reason it doesn't seem to trigger on my system (added debug
>>> output) I will look further into it why that happens.
>>
>> You never see this message in your logs at suspend time (even on a
>> "fixed" BIOS)?
>>
>> "quirk: disabling D3cold for suspend"
>>
>> I'm /suspecting/ you do see it, but you're having problems with
>> another root port.
>>
>> I mentioned this in my previous iterations of patches that eventually
>> landed on that quirk, but Windows and Linux handle root ports
>> differently at suspend time and that could be why it's exposing your
>> BIOS bug.
>>
>> If you can please narrow down which root ports actually need the quirk
>> for your side (feel free to do a similar style to 7d08f21f8c630) I
>> think we could land on something more narrow and upstreamable.
>>
>> At a minimum what you're doing today is covering both Rembrandt and
>> Phoenix and it should only apply to Phoenix.
>>
next prev parent reply other threads:[~2024-12-12 19:01 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-09 19:36 [PATCH v2] PCI: Avoid putting some root ports into D3 on some Ryzen chips Werner Sembach
2024-12-09 19:45 ` Mario Limonciello
2024-12-10 15:24 ` Werner Sembach
2024-12-10 16:00 ` Mario Limonciello
2024-12-11 12:47 ` Werner Sembach
2024-12-11 21:24 ` Mario Limonciello
2024-12-12 18:47 ` Werner Sembach
2024-12-12 19:01 ` Mario Limonciello [this message]
2024-12-13 10:05 ` Richard Hughes
2024-12-16 23:37 ` Werner Sembach
2024-12-17 10:10 ` Richard Hughes
2024-12-17 11:58 ` Werner Sembach
2024-12-17 14:12 ` Mario Limonciello
2024-12-17 14:07 ` Werner Sembach
2024-12-17 14:11 ` Mario Limonciello
2024-12-17 15:58 ` Werner Sembach
2024-12-17 16:08 ` Werner Sembach
2024-12-17 16:18 ` Mario Limonciello
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20cfa4ed-d25d-4881-81b9-9f1698efe9ff@amd.com \
--to=mario.limonciello@amd.com \
--cc=bhelgaas@google.com \
--cc=ggo@tuxedocomputers.com \
--cc=hughsient@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=mika.westerberg@linux.intel.com \
--cc=rafael.j.wysocki@intel.com \
--cc=wse@tuxedocomputers.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox