From: Bjorn Helgaas <helgaas@kernel.org>
To: George-Daniel Matei <danielgeorgem@chromium.org>
Cc: Heiner Kallweit <hkallweit1@gmail.com>,
Bjorn Helgaas <bhelgaas@google.com>,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
nic_swsd@realtek.com, netdev@vger.kernel.org
Subject: Re: [PATCH] PCI: r8169: add suspend/resume aspm quirk
Date: Wed, 10 Jul 2024 16:38:37 -0500 [thread overview]
Message-ID: <20240710213837.GA257340@bhelgaas> (raw)
In-Reply-To: <CACfW=qpNmSeQVG_qSeYpEdk9pf_RTAEEKp+OiBYrRFd3d6HOXg@mail.gmail.com>
On Wed, Jul 10, 2024 at 05:09:08PM +0200, George-Daniel Matei wrote:
> >> Added aspm suspend/resume hooks that run
> >> before and after suspend and resume to change
> >> the ASPM states of the PCI bus in order to allow
> >> the system suspend while trying to prevent card hangs
> >
> > Why is this needed? Is there a r8169 defect we're working around?
> > A BIOS defect? Is there a problem report you can reference here?
>
> We encountered this issue while upgrading from kernel v6.1 to v6.6.
> The system would not suspend with 6.6. We tracked down the problem to
> the NIC of the device, mainly that the following code was removed in
> 6.6:
>
> > else if (tp->mac_version >= RTL_GIGA_MAC_VER_46)
> > rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1_2);
>
> For the listed devices, ASPM L1 is disabled entirely in 6.6. As for
> the reason, L1 was observed to cause some problems
> (https://bugzilla.kernel.org/show_bug.cgi?id=217814). We use a Raptor
> Lake soc and it won't change residency if the NIC doesn't have L1
> enabled. I saw in 6.1 the following comment:
Can you verify that the problem still exists in a current kernel,
e.g., v6.9?
If this is a regression that's still present in v6.9, we need to
identify the commit that broke it. Maybe it's 90ca51e8c654 ("r8169:
fix ASPM-related issues on a number of systems with NIC version from
RTL8168h")?
> > Chips from RTL8168h partially have issues with L1.2, but seem
> > to work fine with L1 and L1.1.
>
> I was thinking that disabling/enabling L1.1 on the fly before/after
> suspend could help mitigate the risk associated with L1/L1.1 . I know
> that ASPM settings are exposed in sysfs and that this could be done
> from outside the kernel, that was my first approach, but it was
> suggested to me that this kind of workaround would be better suited
> for quirks. I did around 1000 suspend/resume cycles of 16-30 seconds
> each (correcting the resume dev->bus->self being configured twice
> mistake) and did not notice any problems. What do you think, is this a
> good approach ... ?
Whatever the problem is, it definitely should be fixed in the kernel,
and Ilpo is right that it *should* be done in the PCI core ASPM
support (aspm.c) or at least with interfaces it supplies.
Generally speaking, drivers should not need to touch ASPM at all
except to work around hardware defects in their device, but r8169 has
a long history of weird ASPM stuff. I dunno if that stuff is related
to hardware defects in the r8169 devices or if it is workarounds for
past or current defects in aspm.c.
> > This doesn't restore the state as it existed before suspend. Does
> > this rely on other parts of restore to do that?
>
> It operates on the assumption that after driver initialization
> PCI_EXP_LNKCTL_ASPMC is 0 and that there are no states enabled in
> CTL1. I did a lspci -vvv dump on the affected devices before and after
> the quirks ran and saw no difference. This could be improved.
Yep, we can't assume any of that because the PCI core owns ASPM
config, not the driver itself.
> > What's the root cause of the issue?
> > A silicon bug on the host side?
>
> I think it's the ASPM implementation of the soc.
As Heiner pointed out, if it's a SoC defect, it would potentially
affect all devices and a workaround would have to cover them all.
Side note: oops, quoting error below, see note about top-posting here:
https://people.kernel.org/tglx/notes-about-netiquette
> On Tue, Jul 9, 2024 at 12:15 AM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >
> > On 08.07.2024 19:23, Bjorn Helgaas wrote:
> > > [+cc r8169 folks]
> > >
> > > On Mon, Jul 08, 2024 at 03:38:15PM +0000, George-Daniel Matei wrote:
> > >> Added aspm suspend/resume hooks that run
> > >> before and after suspend and resume to change
> > >> the ASPM states of the PCI bus in order to allow
> > >> the system suspend while trying to prevent card hangs
> > >
> > > Why is this needed? Is there a r8169 defect we're working around?
> > > A BIOS defect? Is there a problem report you can reference here?
> ...
next prev parent reply other threads:[~2024-07-10 21:38 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20240708153815.2757367-1-danielgeorgem@chromium.org>
2024-07-08 17:23 ` [PATCH] PCI: r8169: add suspend/resume aspm quirk Bjorn Helgaas
2024-07-08 22:15 ` Heiner Kallweit
2024-07-10 15:09 ` George-Daniel Matei
2024-07-10 19:59 ` Heiner Kallweit
2024-07-17 17:12 ` George-Daniel Matei
2024-07-17 19:34 ` Heiner Kallweit
2024-07-10 21:38 ` Bjorn Helgaas [this message]
2024-07-25 12:56 ` George-Daniel Matei
2024-07-25 19:46 ` Heiner Kallweit
2024-07-11 5:45 ` Heiner Kallweit
2024-07-16 12:13 ` George-Daniel Matei
2024-07-16 19:25 ` Heiner Kallweit
2024-07-25 15:34 ` Bjorn Helgaas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240710213837.GA257340@bhelgaas \
--to=helgaas@kernel.org \
--cc=bhelgaas@google.com \
--cc=danielgeorgem@chromium.org \
--cc=hkallweit1@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=nic_swsd@realtek.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).