linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Heiner Kallweit <hkallweit1@gmail.com>
To: George-Daniel Matei <danielgeorgem@chromium.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	nic_swsd@realtek.com, netdev@vger.kernel.org,
	Bjorn Helgaas <helgaas@kernel.org>
Subject: Re: [PATCH] PCI: r8169: add suspend/resume aspm quirk
Date: Thu, 11 Jul 2024 07:45:39 +0200	[thread overview]
Message-ID: <ad0d1201-1fe1-4170-8cfa-d23e74ef8bfd@gmail.com> (raw)
In-Reply-To: <CACfW=qpNmSeQVG_qSeYpEdk9pf_RTAEEKp+OiBYrRFd3d6HOXg@mail.gmail.com>

On 10.07.2024 17:09, George-Daniel Matei wrote:
> Hi,
> 
>>> Added aspm suspend/resume hooks that run
>>> before and after suspend and resume to change
>>> the ASPM states of the PCI bus in order to allow
>>> the system suspend while trying to prevent card hangs
>>
>> Why is this needed?  Is there a r8169 defect we're working around?
>> A BIOS defect?  Is there a problem report you can reference here?
>>
> 
> We encountered this issue while upgrading from kernel v6.1 to v6.6.
> The system would not suspend with 6.6. We tracked down the problem to
> the NIC of the device, mainly that the following code was removed in
> 6.6:
>> else if (tp->mac_version >= RTL_GIGA_MAC_VER_46)
>>         rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1_2);1

With this (older) 6.1 version everything is ok?
Would mean that L1.1 is active and the system suspends (STR?) properly
also with L1.1 being active.

Under 6.6 per default L1 (incl. sub-states) is disabled.
Then you manually enable L1 (incl. L1.1, but not L1.2?) via sysfs,
and now the system hangs on suspend?

Is this what you're saying? Would be strange because in both cases
L1.1 is active when suspending.


> For the listed devices, ASPM L1 is disabled entirely in 6.6. As for
> the reason, L1 was observed to cause some problems
> (https://bugzilla.kernel.org/show_bug.cgi?id=217814). We use a Raptor
> Lake soc and it won't change residency if the NIC doesn't have L1
> enabled. I saw in 6.1 the following comment:
>> Chips from RTL8168h partially have issues with L1.2, but seem
>> to work fine with L1 and L1.1.
> I was thinking that disabling/enabling L1.1 on the fly before/after
> suspend could help mitigate the risk associated with L1/L1.1 . I know
> that ASPM settings are exposed in sysfs and that this could be done
> from outside the kernel, that was my first approach, but it was
> suggested to me that this kind of workaround would be better suited
> for quirks. I did around 1000 suspend/resume cycles of 16-30 seconds
> each (correcting the resume dev->bus->self being configured twice
> mistake) and did not notice any problems. What do you think, is this a
> good approach ... ?
> 
>>> +             //configure device
>>> +             pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
>>> +                                                PCI_EXP_LNKCTL_ASPMC, 0);
>>> +
>>> +             pci_read_config_word(dev->bus->self,
>>> +                                  dev->bus->self->l1ss + PCI_L1SS_CTL1,
>>> +                                  &val);
>>> +             val = val & ~PCI_L1SS_CTL1_L1SS_MASK;
>>> +             pci_write_config_word(dev->bus->self,
>>> +                                   dev->bus->self->l1ss + PCI_L1SS_CTL1,
>>> +                                   val);
>> Updates the parent (dev->bus->self) twice; was the first one supposed
>> to update the device (dev)?
> Yes, it was supposed to update the device (dev). It's my first time
> sending a patch and I messed something up while doing some style
> changes, I will correct it. I'm sorry for that.
> 
>> This doesn't restore the state as it existed before suspend.  Does
>> this rely on other parts of restore to do that?
> It operates on the assumption that after driver initialization
> PCI_EXP_LNKCTL_ASPMC is 0 and that there are no states enabled in
> CTL1. I did a lspci -vvv dump on the affected devices before and after
> the quirks ran and saw no difference. This could be improved.
> 
>> What is the RTL8168 chip version used on these systems?
> It should be RTL8111H.
> 
>> What's the root cause of the issue?
>> A silicon bug on the host side?
> I think it's the ASPM implementation of the soc.
> 
>> ASPM L1 is disabled per default in r8169. So why is the patch needed
>> at all?
> Leaving it disabled all the time prevents the system from suspending.
> 
> Thank you,
> George-Daniel Matei
> 
> 
> 
> 
> 
> On Tue, Jul 9, 2024 at 12:15 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> On 08.07.2024 19:23, Bjorn Helgaas wrote:
>>> [+cc r8169 folks]
>>>
>>> On Mon, Jul 08, 2024 at 03:38:15PM +0000, George-Daniel Matei wrote:
>>>> Added aspm suspend/resume hooks that run
>>>> before and after suspend and resume to change
>>>> the ASPM states of the PCI bus in order to allow
>>>> the system suspend while trying to prevent card hangs
>>>
>>> Why is this needed?  Is there a r8169 defect we're working around?
>>> A BIOS defect?  Is there a problem report you can reference here?
>>>
>>
>> Basically the same question from my side. Apparently such a workaround
>> isn't needed on any other system. And Realtek NICs can be found on more
>> or less every consumer system. What's the root cause of the issue?
>> A silicon bug on the host side?
>>
>> What is the RTL8168 chip version used on these systems?
>>
>> ASPM L1 is disabled per default in r8169. So why is the patch needed
>> at all?
>>
>>> s/Added/Add/
>>>
>>> s/aspm/ASPM/ above
>>>
>>> s/PCI bus/device and parent/
>>>
>>> Add period at end of sentence.
>>>
>>> Rewrap to fill 75 columns.
>>>
>>>> Signed-off-by: George-Daniel Matei <danielgeorgem@chromium.org>
>>>> ---
>>>>  drivers/pci/quirks.c | 142 +++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 142 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>> index dc12d4a06e21..aa3dba2211d3 100644
>>>> --- a/drivers/pci/quirks.c
>>>> +++ b/drivers/pci/quirks.c
>>>> @@ -6189,6 +6189,148 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56b0, aspm_l1_acceptable_latency
>>>>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56b1, aspm_l1_acceptable_latency);
>>>>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c0, aspm_l1_acceptable_latency);
>>>>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c1, aspm_l1_acceptable_latency);
>>>> +
>>>> +static const struct dmi_system_id chromebox_match_table[] = {
>>>> +    {
>>>> +            .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Brask"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +    {
>>>> +            .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Aurash"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +            {
>>>> +            .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Bujia"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +    {
>>>> +            .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Gaelin"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +    {
>>>> +            .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Gladios"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +    {
>>>> +            .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Hahn"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +    {
>>>> +            .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Jeev"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +    {
>>>> +            .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Kinox"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +    {
>>>> +            .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Kuldax"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +    {
>>>> +            .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Lisbon"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +    {
>>>> +                    .matches = {
>>>> +                    DMI_MATCH(DMI_PRODUCT_NAME, "Moli"),
>>>> +                    DMI_MATCH(DMI_BIOS_VENDOR, "coreboot"),
>>>> +            }
>>>> +    },
>>>> +    { }
>>>> +};
>>>> +
>>>> +static void rtl8169_suspend_aspm_settings(struct pci_dev *dev)
>>>> +{
>>>> +    u16 val = 0;
>>>> +
>>>> +    if (dmi_check_system(chromebox_match_table)) {
>>>> +            //configure parent
>>>> +            pcie_capability_clear_and_set_word(dev->bus->self,
>>>> +                                               PCI_EXP_LNKCTL,
>>>> +                                               PCI_EXP_LNKCTL_ASPMC,
>>>> +                                               PCI_EXP_LNKCTL_ASPM_L1);
>>>> +
>>>> +            pci_read_config_word(dev->bus->self,
>>>> +                                 dev->bus->self->l1ss + PCI_L1SS_CTL1,
>>>> +                                 &val);
>>>> +            val = (val & ~PCI_L1SS_CTL1_L1SS_MASK) |
>>>> +                  PCI_L1SS_CTL1_PCIPM_L1_2 | PCI_L1SS_CTL1_PCIPM_L1_2 |
>>>> +                  PCI_L1SS_CTL1_ASPM_L1_1;
>>>> +            pci_write_config_word(dev->bus->self,
>>>> +                                  dev->bus->self->l1ss + PCI_L1SS_CTL1,
>>>> +                                  val);
>>>> +
>>>> +            //configure device
>>>> +            pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
>>>> +                                               PCI_EXP_LNKCTL_ASPMC,
>>>> +                                               PCI_EXP_LNKCTL_ASPM_L1);
>>>> +
>>>> +            pci_read_config_word(dev, dev->l1ss + PCI_L1SS_CTL1, &val);
>>>> +            val = (val & ~PCI_L1SS_CTL1_L1SS_MASK) |
>>>> +                  PCI_L1SS_CTL1_PCIPM_L1_2 | PCI_L1SS_CTL1_PCIPM_L1_2 |
>>>> +                  PCI_L1SS_CTL1_ASPM_L1_1;
>>>> +            pci_write_config_word(dev, dev->l1ss + PCI_L1SS_CTL1, val);
>>>> +    }
>>>> +}
>>>> +
>>>> +DECLARE_PCI_FIXUP_SUSPEND(PCI_VENDOR_ID_REALTEK, 0x8168,
>>>> +                      rtl8169_suspend_aspm_settings);
>>>> +
>>>> +static void rtl8169_resume_aspm_settings(struct pci_dev *dev)
>>>> +{
>>>> +    u16 val = 0;
>>>> +
>>>> +    if (dmi_check_system(chromebox_match_table)) {
>>>> +            //configure device
>>>> +            pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
>>>> +                                               PCI_EXP_LNKCTL_ASPMC, 0);
>>>> +
>>>> +            pci_read_config_word(dev->bus->self,
>>>> +                                 dev->bus->self->l1ss + PCI_L1SS_CTL1,
>>>> +                                 &val);
>>>> +            val = val & ~PCI_L1SS_CTL1_L1SS_MASK;
>>>> +            pci_write_config_word(dev->bus->self,
>>>> +                                  dev->bus->self->l1ss + PCI_L1SS_CTL1,
>>>> +                                  val);
>>>> +
>>>> +            //configure parent
>>>> +            pcie_capability_clear_and_set_word(dev->bus->self,
>>>> +                                               PCI_EXP_LNKCTL,
>>>> +                                               PCI_EXP_LNKCTL_ASPMC, 0);
>>>> +
>>>> +            pci_read_config_word(dev->bus->self,
>>>> +                                 dev->bus->self->l1ss + PCI_L1SS_CTL1,
>>>> +                                 &val);
>>>> +            val = val & ~PCI_L1SS_CTL1_L1SS_MASK;
>>>> +            pci_write_config_word(dev->bus->self,
>>>> +                                  dev->bus->self->l1ss + PCI_L1SS_CTL1,
>>>> +                                  val);
>>>
>>> Updates the parent (dev->bus->self) twice; was the first one supposed
>>> to update the device (dev)?
>>>
>>> This doesn't restore the state as it existed before suspend.  Does
>>> this rely on other parts of restore to do that?
>>>
>>>> +    }
>>>> +}
>>>> +
>>>> +DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_REALTEK, 0x8168,
>>>> +                     rtl8169_resume_aspm_settings);
>>>>  #endif
>>>>
>>>>  #ifdef CONFIG_PCIE_DPC
>>>> --
>>>> 2.45.2.803.g4e1b14247a-goog
>>>>
>>


  parent reply	other threads:[~2024-07-11  5:45 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-08 15:38 [PATCH] PCI: r8169: add suspend/resume aspm quirk George-Daniel Matei
2024-07-08 17:23 ` Bjorn Helgaas
2024-07-08 22:15   ` Heiner Kallweit
2024-07-10 15:09     ` George-Daniel Matei
2024-07-10 19:59       ` Heiner Kallweit
2024-07-17 17:12         ` George-Daniel Matei
2024-07-17 19:34           ` Heiner Kallweit
2024-07-10 21:38       ` Bjorn Helgaas
2024-07-25 12:56         ` George-Daniel Matei
2024-07-25 19:46           ` Heiner Kallweit
2024-07-11  5:45       ` Heiner Kallweit [this message]
2024-07-16 12:13         ` George-Daniel Matei
2024-07-16 19:25           ` Heiner Kallweit
2024-07-25 15:34             ` Bjorn Helgaas
2024-07-10 14:20 ` Ilpo Järvinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ad0d1201-1fe1-4170-8cfa-d23e74ef8bfd@gmail.com \
    --to=hkallweit1@gmail.com \
    --cc=bhelgaas@google.com \
    --cc=danielgeorgem@chromium.org \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=nic_swsd@realtek.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).