* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-14 18:49 ` Bjorn Helgaas
@ 2025-10-14 23:33 ` Dragan Simic
2025-10-15 6:22 ` Manivannan Sadhasivam
2025-10-15 6:26 ` Manivannan Sadhasivam
` (2 subsequent siblings)
3 siblings, 1 reply; 42+ messages in thread
From: Dragan Simic @ 2025-10-14 23:33 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: FUKAUMI Naoki, manivannan.sadhasivam, Bjorn Helgaas,
Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-rockchip, regressions
Hello all,
On Tuesday, October 14, 2025 20:49 CEST, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
> > I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
> > Rockchip RK3588(S) SoC.
> >
> > When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
> > freezes or fails to probe M.2 Wi-Fi modules. This happens with several
> > different modules I've tested, including the Realtek RTL8852BE, MediaTek
> > MT7921E, and Intel AX210.
> >
> > I've found that reverting the following commit (i.e., the patch I'm replying
> > to) resolves the problem:
> > commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
>
> Thanks for the report, and sorry for the regression.
>
> Since this affects several devices from different manufacturers and (I
> assume) different drivers, it seems likely that there's some issue
> with the Rockchip end, since ASPM probably works on these devices in
> other systems. So we should figure out if there's something wrong
> with the way we configure ASPM, which we could potentially fix, or if
> there's a hardware issue and we need some king of quirk to prevent
> usage of ASPM on the affected platforms.
>
> Can you collect a complete dmesg log when booting with
>
> ignore_loglevel pci=earlydump dyndbg="file drivers/pci/* +p"
>
> and the output of "sudo lspci -vv"?
>
> When the kernel freezes, can you give us any information about where,
> e.g., a log or screenshot?
>
> Do you know if any platforms other than Radxa ROCK 5A/5B have this
> problem?
After thinking quite a bit about it, I think we should revert this
patch and replace it with another patch that allows per-SoC, or
maybe even per-board, opting into the forced enablement of PCIe
ASPM. Let me explain, please.
When a new feature is introduced, it's expected that it may fail
on some hardware or with some specific setups, so quirking off such
instances, as time passes, is perfectly fine. Such a new feature
didn't work before it was implemented, so it's acceptable that it
fails in some instances after the introduction, and that it gets
quirked off as time passes and more testing is performed.
However, when some widespread feature, such as PCIe, has already
been in production for quite a while, introducing high-risk changes
to it in a blanket fashion, while intending to have the incompatible
or not-yet-ready platforms quirked off over time, simply isn't the
way to go. Breaking stuff intentionally to find out what actually
doesn't work is rarely a good option.
Thus, I'd suggest that this patch is replaced with nother patches,
which would introduce an additional ASPM opt-in switch to the PCI
binding, allowing SoCs or boards to have it enabled _after_ proper
testing is performed. The PCIe driver may emit a warning that ASPM
is to be enabled at some point in the future, to "bug" people about
the need to perform the testing, etc. With all that in place, we
could expect that in a year or two PCIe ASPM could eventually be
enabled everywhere. Getting everything tested is a massive endeavor,
but that's the only way not to break stuff.
Biting the bullet and hoping that it all goes well, I'd say, isn't
the right approach here.
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-14 23:33 ` Dragan Simic
@ 2025-10-15 6:22 ` Manivannan Sadhasivam
2025-10-15 11:23 ` Diederik de Haas
2025-10-23 18:57 ` Dragan Simic
0 siblings, 2 replies; 42+ messages in thread
From: Manivannan Sadhasivam @ 2025-10-15 6:22 UTC (permalink / raw)
To: Dragan Simic
Cc: Bjorn Helgaas, FUKAUMI Naoki, manivannan.sadhasivam,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-rockchip, regressions
On Wed, Oct 15, 2025 at 01:33:35AM +0200, Dragan Simic wrote:
> Hello all,
>
> On Tuesday, October 14, 2025 20:49 CEST, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
> > > I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
> > > Rockchip RK3588(S) SoC.
> > >
> > > When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
> > > freezes or fails to probe M.2 Wi-Fi modules. This happens with several
> > > different modules I've tested, including the Realtek RTL8852BE, MediaTek
> > > MT7921E, and Intel AX210.
> > >
> > > I've found that reverting the following commit (i.e., the patch I'm replying
> > > to) resolves the problem:
> > > commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
> >
> > Thanks for the report, and sorry for the regression.
> >
> > Since this affects several devices from different manufacturers and (I
> > assume) different drivers, it seems likely that there's some issue
> > with the Rockchip end, since ASPM probably works on these devices in
> > other systems. So we should figure out if there's something wrong
> > with the way we configure ASPM, which we could potentially fix, or if
> > there's a hardware issue and we need some king of quirk to prevent
> > usage of ASPM on the affected platforms.
> >
> > Can you collect a complete dmesg log when booting with
> >
> > ignore_loglevel pci=earlydump dyndbg="file drivers/pci/* +p"
> >
> > and the output of "sudo lspci -vv"?
> >
> > When the kernel freezes, can you give us any information about where,
> > e.g., a log or screenshot?
> >
> > Do you know if any platforms other than Radxa ROCK 5A/5B have this
> > problem?
>
> After thinking quite a bit about it, I think we should revert this
> patch and replace it with another patch that allows per-SoC, or
> maybe even per-board, opting into the forced enablement of PCIe
> ASPM. Let me explain, please.
>
ASPM is a PCIe device specific feature, nothing related to SoC/board. Even if
you limit it to certain platforms, there is no guarantee that it will be safe as
the users can connect a buggy device to the slot and it could lead to the same
issue.
> When a new feature is introduced, it's expected that it may fail
> on some hardware or with some specific setups, so quirking off such
> instances, as time passes, is perfectly fine. Such a new feature
> didn't work before it was implemented, so it's acceptable that it
> fails in some instances after the introduction, and that it gets
> quirked off as time passes and more testing is performed.
>
ASPM is not a new feature. It was introduced more than a decade before. But we
somehow procastinated the enablement for so long until we realized that if we
don't do it now, we wouldn't be able to do it anytime in the future.
> However, when some widespread feature, such as PCIe, has already
> been in production for quite a while, introducing high-risk changes
> to it in a blanket fashion, while intending to have the incompatible
> or not-yet-ready platforms quirked off over time, simply isn't the
> way to go. Breaking stuff intentionally to find out what actually
> doesn't work is rarely a good option.
>
The issue is due to devices exposing ASPM capability, but behaving erratically
when enabled. Until, we enable ASPM on these devices, we cannot know whether
they are working or not. To avoid mass chaos, we decided to enable it only for
devicetree platforms as a start.
> Thus, I'd suggest that this patch is replaced with nother patches,
> which would introduce an additional ASPM opt-in switch to the PCI
> binding, allowing SoCs or boards to have it enabled _after_ proper
> testing is performed. The PCIe driver may emit a warning that ASPM
> is to be enabled at some point in the future, to "bug" people about
> the need to perform the testing, etc.
Even if we emit a "YOUR DEVICE MAY BREAK" warning, nobody would care as long as
the device works for them. We didn't decide to enable this feature overnight to
trouble users. The fact that ASPM saves runtime power, which will benefit users
and ofc the environment as a whole, should not be kept disabled.
But does that mean, we wanted to have breakages, NO. We expected breakages as
not all devices will play nicely with ASPM, but there is only one way to find
out. And we do want to disable ASPM only for those devices.
> With all that in place, we
> could expect that in a year or two PCIe ASPM could eventually be
> enabled everywhere. Getting everything tested is a massive endeavor,
> but that's the only way not to break stuff.
>
> Biting the bullet and hoping that it all goes well, I'd say, isn't
> the right approach here.
>
Your two year phased approach would never work as that's what we have hoped for
more than a decade.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 6:22 ` Manivannan Sadhasivam
@ 2025-10-15 11:23 ` Diederik de Haas
2025-10-23 18:57 ` Dragan Simic
1 sibling, 0 replies; 42+ messages in thread
From: Diederik de Haas @ 2025-10-15 11:23 UTC (permalink / raw)
To: Manivannan Sadhasivam, Dragan Simic
Cc: Bjorn Helgaas, FUKAUMI Naoki, manivannan.sadhasivam,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-rockchip, regressions
On Wed Oct 15, 2025 at 8:22 AM CEST, Manivannan Sadhasivam wrote:
> On Wed, Oct 15, 2025 at 01:33:35AM +0200, Dragan Simic wrote:
>> On Tuesday, October 14, 2025 20:49 CEST, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
>> > > I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
>> > > Rockchip RK3588(S) SoC.
>> > >
>> > > When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
>> > > freezes or fails to probe M.2 Wi-Fi modules. This happens with several
>> > > different modules I've tested, including the Realtek RTL8852BE, MediaTek
>> > > MT7921E, and Intel AX210.
>> > >
>> > > I've found that reverting the following commit (i.e., the patch I'm replying
>> > > to) resolves the problem:
>> > > commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
>> >
>> > <snip>
>> >
>> > Do you know if any platforms other than Radxa ROCK 5A/5B have this
>> > problem?
>> >
>> After thinking quite a bit about it, I think we should revert this
>> patch and replace it with another patch that allows per-SoC, or
>> maybe even per-board, opting into the forced enablement of PCIe
>> ASPM. Let me explain, please.
>
> ASPM is a PCIe device specific feature, nothing related to SoC/board. Even if
> you limit it to certain platforms, there is no guarantee that it will be safe as
> the users can connect a buggy device to the slot and it could lead to the same
> issue.
>
>> When a new feature is introduced, it's expected that it may fail
>> on some hardware or with some specific setups, so quirking off such
>> instances, as time passes, is perfectly fine. Such a new feature
>> didn't work before it was implemented, so it's acceptable that it
>> fails in some instances after the introduction, and that it gets
>> quirked off as time passes and more testing is performed.
>
> ASPM is not a new feature. It was introduced more than a decade before. But we
> somehow procastinated the enablement for so long until we realized that if we
> don't do it now, we wouldn't be able to do it anytime in the future.
Do you mean literally *now* or more like "we need to do it sometime"?
>> However, when some widespread feature, such as PCIe, has already
>> been in production for quite a while, introducing high-risk changes
>> to it in a blanket fashion, while intending to have the incompatible
>> or not-yet-ready platforms quirked off over time, simply isn't the
>> way to go. Breaking stuff intentionally to find out what actually
>> doesn't work is rarely a good option.
>
> The issue is due to devices exposing ASPM capability, but behaving erratically
> when enabled. Until, we enable ASPM on these devices, we cannot know whether
> they are working or not. To avoid mass chaos, we decided to enable it only for
> devicetree platforms as a start.
>
>> Thus, I'd suggest that this patch is replaced with nother patches,
>> which would introduce an additional ASPM opt-in switch to the PCI
>> binding, allowing SoCs or boards to have it enabled _after_ proper
>> testing is performed. The PCIe driver may emit a warning that ASPM
>> is to be enabled at some point in the future, to "bug" people about
>> the need to perform the testing, etc.
>
> Even if we emit a "YOUR DEVICE MAY BREAK" warning, nobody would care as long as
> the device works for them. We didn't decide to enable this feature overnight to
> trouble users. The fact that ASPM saves runtime power, which will benefit users
> and ofc the environment as a whole, should not be kept disabled.
>
> But does that mean, we wanted to have breakages, NO. We expected breakages as
> not all devices will play nicely with ASPM, but there is only one way to find
> out. And we do want to disable ASPM only for those devices.
I understand this logic. And I'm very much in favor of changes that
reduce power usage.
I suspect that 6.18 will become a LTS kernel, so introducing a change
which may break many devices, sounds less then ideal for such a kernel.
Kernel 6.19 OTOH sounds perfect for that. Then there's plenty of time to
encounter and fix issues which may/will come up before there is another
LTS kernel, namely ~ a year.
My 0.02.
Cheers,
Diederik
PS: will send my bug/debug report separately
>> With all that in place, we could expect that in a year or two PCIe ASPM
>> could eventually be enabled everywhere. Getting everything tested is a
>> massive endeavor, but that's the only way not to break stuff.
>>
>> Biting the bullet and hoping that it all goes well, I'd say, isn't
>> the right approach here.
>
> Your two year phased approach would never work as that's what we have hoped for
> more than a decade.
>
> - Mani
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 6:22 ` Manivannan Sadhasivam
2025-10-15 11:23 ` Diederik de Haas
@ 2025-10-23 18:57 ` Dragan Simic
1 sibling, 0 replies; 42+ messages in thread
From: Dragan Simic @ 2025-10-23 18:57 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Bjorn Helgaas, FUKAUMI Naoki, manivannan.sadhasivam,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-rockchip, regressions, Shawn Lin, Diederik de Haas,
Heiko Stuebner
Hello Manivannan,
On Wednesday, October 15, 2025 08:22 CEST, Manivannan Sadhasivam <mani@kernel.org> wrote:
> On Wed, Oct 15, 2025 at 01:33:35AM +0200, Dragan Simic wrote:
> > On Tuesday, October 14, 2025 20:49 CEST, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
> > > > I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
> > > > Rockchip RK3588(S) SoC.
> > > >
> > > > When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
> > > > freezes or fails to probe M.2 Wi-Fi modules. This happens with several
> > > > different modules I've tested, including the Realtek RTL8852BE, MediaTek
> > > > MT7921E, and Intel AX210.
> > > >
> > > > I've found that reverting the following commit (i.e., the patch I'm replying
> > > > to) resolves the problem:
> > > > commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
> > >
> > > Thanks for the report, and sorry for the regression.
> > >
> > > Since this affects several devices from different manufacturers and (I
> > > assume) different drivers, it seems likely that there's some issue
> > > with the Rockchip end, since ASPM probably works on these devices in
> > > other systems. So we should figure out if there's something wrong
> > > with the way we configure ASPM, which we could potentially fix, or if
> > > there's a hardware issue and we need some king of quirk to prevent
> > > usage of ASPM on the affected platforms.
> > >
> > > Can you collect a complete dmesg log when booting with
> > >
> > > ignore_loglevel pci=earlydump dyndbg="file drivers/pci/* +p"
> > >
> > > and the output of "sudo lspci -vv"?
> > >
> > > When the kernel freezes, can you give us any information about where,
> > > e.g., a log or screenshot?
> > >
> > > Do you know if any platforms other than Radxa ROCK 5A/5B have this
> > > problem?
> >
> > After thinking quite a bit about it, I think we should revert this
> > patch and replace it with another patch that allows per-SoC, or
> > maybe even per-board, opting into the forced enablement of PCIe
> > ASPM. Let me explain, please.
>
> ASPM is a PCIe device specific feature, nothing related to SoC/board. Even if
> you limit it to certain platforms, there is no guarantee that it will be safe as
> the users can connect a buggy device to the slot and it could lead to the same
> issue.
I'm hoping that it's clear by now that the theory and practice
actually diverge in this case, requiring certain level of support
for different SoCs and boards, which makes ASPM more than just
a device feature that's simply expected to work by default.
Dealing with buggy PCIe devices is, of course, another part of
the entire endeavor.
> > When a new feature is introduced, it's expected that it may fail
> > on some hardware or with some specific setups, so quirking off such
> > instances, as time passes, is perfectly fine. Such a new feature
> > didn't work before it was implemented, so it's acceptable that it
> > fails in some instances after the introduction, and that it gets
> > quirked off as time passes and more testing is performed.
>
> ASPM is not a new feature. It was introduced more than a decade before. But we
> somehow procastinated the enablement for so long until we realized that if we
> don't do it now, we wouldn't be able to do it anytime in the future.
To clarify, I referred to PCIe as the already established feature,
which evidently became broken on some SoCs and boards with the
initial blanket enabling of ASPM.
> > However, when some widespread feature, such as PCIe, has already
> > been in production for quite a while, introducing high-risk changes
> > to it in a blanket fashion, while intending to have the incompatible
> > or not-yet-ready platforms quirked off over time, simply isn't the
> > way to go. Breaking stuff intentionally to find out what actually
> > doesn't work is rarely a good option.
>
> The issue is due to devices exposing ASPM capability, but behaving erratically
> when enabled. Until, we enable ASPM on these devices, we cannot know whether
> they are working or not. To avoid mass chaos, we decided to enable it only for
> devicetree platforms as a start.
As mentioned above, dealing with buggy PCIe devices is something
to be covered separately, and wasn't the subject of my complaint.
I referred to the SoC and board specifics.
> > Thus, I'd suggest that this patch is replaced with nother patches,
> > which would introduce an additional ASPM opt-in switch to the PCI
> > binding, allowing SoCs or boards to have it enabled _after_ proper
> > testing is performed. The PCIe driver may emit a warning that ASPM
> > is to be enabled at some point in the future, to "bug" people about
> > the need to perform the testing, etc.
>
> Even if we emit a "YOUR DEVICE MAY BREAK" warning, nobody would care as long as
> the device works for them. We didn't decide to enable this feature overnight to
> trouble users. The fact that ASPM saves runtime power, which will benefit users
> and ofc the environment as a whole, should not be kept disabled.
>
> But does that mean, we wanted to have breakages, NO. We expected breakages as
> not all devices will play nicely with ASPM, but there is only one way to find
> out. And we do want to disable ASPM only for those devices.
I see, and it has turned out rather well in the end. I totally
understand the frustration that may come out of people not caring
for something until it actually breaks, but as you can see, the
Rockchip community, including people from Rockchip itself, actually
cares quite a lot and is willing to help in different ways.
> > With all that in place, we
> > could expect that in a year or two PCIe ASPM could eventually be
> > enabled everywhere. Getting everything tested is a massive endeavor,
> > but that's the only way not to break stuff.
> >
> > Biting the bullet and hoping that it all goes well, I'd say, isn't
> > the right approach here.
>
> Your two year phased approach would never work as that's what
> we have hoped for more than a decade.
Totally understood, but again, perhaps you haven't knocked at the
right doors. The Rockchip community cares, for example. :)
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-14 18:49 ` Bjorn Helgaas
2025-10-14 23:33 ` Dragan Simic
@ 2025-10-15 6:26 ` Manivannan Sadhasivam
2025-10-15 7:13 ` FUKAUMI Naoki
2025-10-15 12:26 ` Diederik de Haas
2025-10-30 22:14 ` Bjorn Helgaas
3 siblings, 1 reply; 42+ messages in thread
From: Manivannan Sadhasivam @ 2025-10-15 6:26 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: FUKAUMI Naoki, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions
On Tue, Oct 14, 2025 at 01:49:05PM -0500, Bjorn Helgaas wrote:
> [+cc regressions]
>
> On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
> > Hi Manivannan Sadhasivam,
> >
> > I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
> > Rockchip RK3588(S) SoC.
> >
> > When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
> > freezes or fails to probe M.2 Wi-Fi modules. This happens with several
> > different modules I've tested, including the Realtek RTL8852BE, MediaTek
> > MT7921E, and Intel AX210.
> >
> > I've found that reverting the following commit (i.e., the patch I'm replying
> > to) resolves the problem:
> > commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
>
> Thanks for the report, and sorry for the regression.
>
> Since this affects several devices from different manufacturers and (I
> assume) different drivers, it seems likely that there's some issue
> with the Rockchip end, since ASPM probably works on these devices in
> other systems. So we should figure out if there's something wrong
> with the way we configure ASPM, which we could potentially fix, or if
> there's a hardware issue and we need some king of quirk to prevent
> usage of ASPM on the affected platforms.
>
I believe it is the latter. The Root Port is having trouble with ASPM.
FUKAUMI Naoki, could you please share the 'sudo lspci -vv' output so that we
know what kind of Root Port we are dealing with? You can revert the offending
patch and share the output.
Then I could supply a patch that you can test out.
- Mani
> Can you collect a complete dmesg log when booting with
>
> ignore_loglevel pci=earlydump dyndbg="file drivers/pci/* +p"
>
> and the output of "sudo lspci -vv"?
>
> When the kernel freezes, can you give us any information about where,
> e.g., a log or screenshot?
>
> Do you know if any platforms other than Radxa ROCK 5A/5B have this
> problem?
>
> #regzbot introduced: f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
> #regzbot dup-of: https://github.com/chzigotzky/kernels/issues/17
>
> > On 9/23/25 01:16, Manivannan Sadhasivam via B4 Relay wrote:
> > > From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> > >
> > > So far, the PCI subsystem has honored the ASPM and Clock PM states set by
> > > the BIOS (through LNKCTL) during device initialization, if it relies on the
> > > default state selected using:
> > >
> > > * Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
> > > * cmdline: "pcie_aspm=off", or
> > > * FADT: ACPI_FADT_NO_ASPM
> > >
> > > This was done conservatively to avoid issues with the buggy devices that
> > > advertise ASPM capabilities, but behave erratically if the ASPM states are
> > > enabled. So the PCI subsystem ended up trusting the BIOS to enable only the
> > > ASPM states that were known to work for the devices.
> > >
> > > But this turned out to be a problem for devicetree platforms, especially
> > > the ARM based devicetree platforms powering Embedded and *some* Compute
> > > devices as they tend to run without any standard BIOS. So the ASPM states
> > > on these platforms were left disabled during boot and the PCI subsystem
> > > never bothered to enable them, unless the user has forcefully enabled the
> > > ASPM states through Kconfig, cmdline, and sysfs or the device drivers
> > > themselves, enabling the ASPM states through pci_enable_link_state() APIs.
> > >
> > > This caused runtime power issues on those platforms. So a couple of
> > > approaches were tried to mitigate this BIOS dependency without user
> > > intervention by enabling the ASPM states in the PCI controller drivers
> > > after device enumeration, and overriding the ASPM/Clock PM states
> > > by the PCI controller drivers through an API before enumeration.
> > >
> > > But it has been concluded that none of these mitigations should really be
> > > required and the PCI subsystem should enable the ASPM states advertised by
> > > the devices without relying on BIOS or the PCI controller drivers. If any
> > > device is found to be misbehaving after enabling ASPM states that they
> > > advertised, then those devices should be quirked to disable the problematic
> > > ASPM/Clock PM states.
> > >
> > > In an effort to do so, start by overriding the ASPM and Clock PM states set
> > > by the BIOS for devicetree platforms first. Separate helper functions are
> > > introduced to override the BIOS set states by enabling all of them if
> > > of_have_populated_dt() returns true. To aid debugging, print the overridden
> > > ASPM and Clock PM states as well.
> > >
> > > In the future, these helpers could be extended to allow other platforms
> > > like VMD, newer ACPI systems with a cutoff year etc... to follow the path.
> > >
> > > Link: https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
> > > Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
> > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> > > Link: https://patch.msgid.link/20250916-pci-dt-aspm-v1-1-778fe907c9ad@oss.qualcomm.com
> > > ---
> > > drivers/pci/pcie/aspm.c | 42 ++++++++++++++++++++++++++++++++++++++++--
> > > 1 file changed, 40 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > index 919a05b9764791c3cc469c9ada62ba5b2c405118..cda31150aec1b67b6a48b60569222ea3d1c3d41f 100644
> > > --- a/drivers/pci/pcie/aspm.c
> > > +++ b/drivers/pci/pcie/aspm.c
> > > @@ -15,6 +15,7 @@
> > > #include <linux/math.h>
> > > #include <linux/module.h>
> > > #include <linux/moduleparam.h>
> > > +#include <linux/of.h>
> > > #include <linux/pci.h>
> > > #include <linux/pci_regs.h>
> > > #include <linux/errno.h>
> > > @@ -235,13 +236,15 @@ struct pcie_link_state {
> > > u32 aspm_support:7; /* Supported ASPM state */
> > > u32 aspm_enabled:7; /* Enabled ASPM state */
> > > u32 aspm_capable:7; /* Capable ASPM state with latency */
> > > - u32 aspm_default:7; /* Default ASPM state by BIOS */
> > > + u32 aspm_default:7; /* Default ASPM state by BIOS or
> > > + override */
> > > u32 aspm_disable:7; /* Disabled ASPM state */
> > > /* Clock PM state */
> > > u32 clkpm_capable:1; /* Clock PM capable? */
> > > u32 clkpm_enabled:1; /* Current Clock PM state */
> > > - u32 clkpm_default:1; /* Default Clock PM state by BIOS */
> > > + u32 clkpm_default:1; /* Default Clock PM state by BIOS or
> > > + override */
> > > u32 clkpm_disable:1; /* Clock PM disabled */
> > > };
> > > @@ -373,6 +376,18 @@ static void pcie_set_clkpm(struct pcie_link_state *link, int enable)
> > > pcie_set_clkpm_nocheck(link, enable);
> > > }
> > > +static void pcie_clkpm_override_default_link_state(struct pcie_link_state *link,
> > > + int enabled)
> > > +{
> > > + struct pci_dev *pdev = link->downstream;
> > > +
> > > + /* Override the BIOS disabled Clock PM state for devicetree platforms */
> > > + if (of_have_populated_dt() && !enabled) {
> > > + link->clkpm_default = 1;
> > > + pci_info(pdev, "Clock PM state overridden: ClockPM+\n");
> > > + }
> > > +}
> > > +
> > > static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
> > > {
> > > int capable = 1, enabled = 1;
> > > @@ -395,6 +410,7 @@ static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
> > > }
> > > link->clkpm_enabled = enabled;
> > > link->clkpm_default = enabled;
> > > + pcie_clkpm_override_default_link_state(link, enabled);
> > > link->clkpm_capable = capable;
> > > link->clkpm_disable = blacklist ? 1 : 0;
> > > }
> > > @@ -788,6 +804,26 @@ static void aspm_l1ss_init(struct pcie_link_state *link)
> > > aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
> > > }
> > > +static void pcie_aspm_override_default_link_state(struct pcie_link_state *link)
> > > +{
> > > + struct pci_dev *pdev = link->downstream;
> > > + u32 override;
> > > +
> > > + /* Override the BIOS disabled ASPM states for devicetree platforms */
> > > + if (of_have_populated_dt()) {
> > > + link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
> > > + override = link->aspm_default & ~link->aspm_enabled;
> > > + if (override)
> > > + pci_info(pdev, "ASPM states overridden: %s%s%s%s%s%s\n",
> > > + (override & PCIE_LINK_STATE_L0S) ? "L0s+, " : "",
> > > + (override & PCIE_LINK_STATE_L1) ? "L1+, " : "",
> > > + (override & PCIE_LINK_STATE_L1_1) ? "L1.1+, " : "",
> > > + (override & PCIE_LINK_STATE_L1_2) ? "L1.2+, " : "",
> > > + (override & PCIE_LINK_STATE_L1_1_PCIPM) ? "L1.1 PCI-PM+, " : "",
> > > + (override & PCIE_LINK_STATE_L1_2_PCIPM) ? "L1.2 PCI-PM+" : "");
> > > + }
> > > +}
> > > +
> > > static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
> > > {
> > > struct pci_dev *child = link->downstream, *parent = link->pdev;
> > > @@ -868,6 +904,8 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
> > > /* Save default state */
> > > link->aspm_default = link->aspm_enabled;
> > > + pcie_aspm_override_default_link_state(link);
> > > +
> > > /* Setup initial capable state. Will be updated later */
> > > link->aspm_capable = link->aspm_support;
> > >
> >
> >
> > _______________________________________________
> > Linux-rockchip mailing list
> > Linux-rockchip@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-rockchip
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 6:26 ` Manivannan Sadhasivam
@ 2025-10-15 7:13 ` FUKAUMI Naoki
2025-10-15 7:50 ` Manivannan Sadhasivam
0 siblings, 1 reply; 42+ messages in thread
From: FUKAUMI Naoki @ 2025-10-15 7:13 UTC (permalink / raw)
To: Manivannan Sadhasivam, Bjorn Helgaas
Cc: manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, Dragan Simic, linux-rockchip,
regressions
Hi,
On 10/15/25 15:26, Manivannan Sadhasivam wrote:
> On Tue, Oct 14, 2025 at 01:49:05PM -0500, Bjorn Helgaas wrote:
>> [+cc regressions]
>>
>> On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
>>> Hi Manivannan Sadhasivam,
>>>
>>> I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
>>> Rockchip RK3588(S) SoC.
>>>
>>> When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
>>> freezes or fails to probe M.2 Wi-Fi modules. This happens with several
>>> different modules I've tested, including the Realtek RTL8852BE, MediaTek
>>> MT7921E, and Intel AX210.
>>>
>>> I've found that reverting the following commit (i.e., the patch I'm replying
>>> to) resolves the problem:
>>> commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
>>
>> Thanks for the report, and sorry for the regression.
>>
>> Since this affects several devices from different manufacturers and (I
>> assume) different drivers, it seems likely that there's some issue
>> with the Rockchip end, since ASPM probably works on these devices in
>> other systems. So we should figure out if there's something wrong
>> with the way we configure ASPM, which we could potentially fix, or if
>> there's a hardware issue and we need some king of quirk to prevent
>> usage of ASPM on the affected platforms.
>>
>
> I believe it is the latter. The Root Port is having trouble with ASPM.
>
> FUKAUMI Naoki, could you please share the 'sudo lspci -vv' output so that we
> know what kind of Root Port we are dealing with? You can revert the offending
> patch and share the output.
Here is dmesg/lspci output on ROCK 5A(RK3588S):
https://gist.github.com/RadxaNaoki/1355a0b4278b6e51a61d89df7a535a5d
----
I've (likely) noticed another PCIe issue on the ROCK 5B (RK3588).
Reverting commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961 on top of
commit 331b2acbe6123dbbcb5c34698692959abbf5748b definitely gets the M.2
Wi-Fi working. However, reverting commit
f3ac2ff14834a0aa056ee3ae0e4b8c641c579961 on v6.18-rc1/next-20251014 does
not work. (It does work on the ROCK 5A.)
Commit 331b2acbe6123dbbcb5c34698692959abbf5748b landed between
next-20250923 and next-20250924. I'm currently in the process of bisecting.
Best regards,
--
FUKAUMI Naoki
Radxa Computer (Shenzhen) Co., Ltd.
> Then I could supply a patch that you can test out.
>
> - Mani
>
>> Can you collect a complete dmesg log when booting with
>>
>> ignore_loglevel pci=earlydump dyndbg="file drivers/pci/* +p"
>>
>> and the output of "sudo lspci -vv"?
>>
>> When the kernel freezes, can you give us any information about where,
>> e.g., a log or screenshot?
>>
>> Do you know if any platforms other than Radxa ROCK 5A/5B have this
>> problem?
>>
>> #regzbot introduced: f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
>> #regzbot dup-of: https://github.com/chzigotzky/kernels/issues/17
>>
>>> On 9/23/25 01:16, Manivannan Sadhasivam via B4 Relay wrote:
>>>> From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
>>>>
>>>> So far, the PCI subsystem has honored the ASPM and Clock PM states set by
>>>> the BIOS (through LNKCTL) during device initialization, if it relies on the
>>>> default state selected using:
>>>>
>>>> * Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
>>>> * cmdline: "pcie_aspm=off", or
>>>> * FADT: ACPI_FADT_NO_ASPM
>>>>
>>>> This was done conservatively to avoid issues with the buggy devices that
>>>> advertise ASPM capabilities, but behave erratically if the ASPM states are
>>>> enabled. So the PCI subsystem ended up trusting the BIOS to enable only the
>>>> ASPM states that were known to work for the devices.
>>>>
>>>> But this turned out to be a problem for devicetree platforms, especially
>>>> the ARM based devicetree platforms powering Embedded and *some* Compute
>>>> devices as they tend to run without any standard BIOS. So the ASPM states
>>>> on these platforms were left disabled during boot and the PCI subsystem
>>>> never bothered to enable them, unless the user has forcefully enabled the
>>>> ASPM states through Kconfig, cmdline, and sysfs or the device drivers
>>>> themselves, enabling the ASPM states through pci_enable_link_state() APIs.
>>>>
>>>> This caused runtime power issues on those platforms. So a couple of
>>>> approaches were tried to mitigate this BIOS dependency without user
>>>> intervention by enabling the ASPM states in the PCI controller drivers
>>>> after device enumeration, and overriding the ASPM/Clock PM states
>>>> by the PCI controller drivers through an API before enumeration.
>>>>
>>>> But it has been concluded that none of these mitigations should really be
>>>> required and the PCI subsystem should enable the ASPM states advertised by
>>>> the devices without relying on BIOS or the PCI controller drivers. If any
>>>> device is found to be misbehaving after enabling ASPM states that they
>>>> advertised, then those devices should be quirked to disable the problematic
>>>> ASPM/Clock PM states.
>>>>
>>>> In an effort to do so, start by overriding the ASPM and Clock PM states set
>>>> by the BIOS for devicetree platforms first. Separate helper functions are
>>>> introduced to override the BIOS set states by enabling all of them if
>>>> of_have_populated_dt() returns true. To aid debugging, print the overridden
>>>> ASPM and Clock PM states as well.
>>>>
>>>> In the future, these helpers could be extended to allow other platforms
>>>> like VMD, newer ACPI systems with a cutoff year etc... to follow the path.
>>>>
>>>> Link: https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
>>>> Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
>>>> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
>>>> Link: https://patch.msgid.link/20250916-pci-dt-aspm-v1-1-778fe907c9ad@oss.qualcomm.com
>>>> ---
>>>> drivers/pci/pcie/aspm.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>>>> 1 file changed, 40 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>>> index 919a05b9764791c3cc469c9ada62ba5b2c405118..cda31150aec1b67b6a48b60569222ea3d1c3d41f 100644
>>>> --- a/drivers/pci/pcie/aspm.c
>>>> +++ b/drivers/pci/pcie/aspm.c
>>>> @@ -15,6 +15,7 @@
>>>> #include <linux/math.h>
>>>> #include <linux/module.h>
>>>> #include <linux/moduleparam.h>
>>>> +#include <linux/of.h>
>>>> #include <linux/pci.h>
>>>> #include <linux/pci_regs.h>
>>>> #include <linux/errno.h>
>>>> @@ -235,13 +236,15 @@ struct pcie_link_state {
>>>> u32 aspm_support:7; /* Supported ASPM state */
>>>> u32 aspm_enabled:7; /* Enabled ASPM state */
>>>> u32 aspm_capable:7; /* Capable ASPM state with latency */
>>>> - u32 aspm_default:7; /* Default ASPM state by BIOS */
>>>> + u32 aspm_default:7; /* Default ASPM state by BIOS or
>>>> + override */
>>>> u32 aspm_disable:7; /* Disabled ASPM state */
>>>> /* Clock PM state */
>>>> u32 clkpm_capable:1; /* Clock PM capable? */
>>>> u32 clkpm_enabled:1; /* Current Clock PM state */
>>>> - u32 clkpm_default:1; /* Default Clock PM state by BIOS */
>>>> + u32 clkpm_default:1; /* Default Clock PM state by BIOS or
>>>> + override */
>>>> u32 clkpm_disable:1; /* Clock PM disabled */
>>>> };
>>>> @@ -373,6 +376,18 @@ static void pcie_set_clkpm(struct pcie_link_state *link, int enable)
>>>> pcie_set_clkpm_nocheck(link, enable);
>>>> }
>>>> +static void pcie_clkpm_override_default_link_state(struct pcie_link_state *link,
>>>> + int enabled)
>>>> +{
>>>> + struct pci_dev *pdev = link->downstream;
>>>> +
>>>> + /* Override the BIOS disabled Clock PM state for devicetree platforms */
>>>> + if (of_have_populated_dt() && !enabled) {
>>>> + link->clkpm_default = 1;
>>>> + pci_info(pdev, "Clock PM state overridden: ClockPM+\n");
>>>> + }
>>>> +}
>>>> +
>>>> static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
>>>> {
>>>> int capable = 1, enabled = 1;
>>>> @@ -395,6 +410,7 @@ static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
>>>> }
>>>> link->clkpm_enabled = enabled;
>>>> link->clkpm_default = enabled;
>>>> + pcie_clkpm_override_default_link_state(link, enabled);
>>>> link->clkpm_capable = capable;
>>>> link->clkpm_disable = blacklist ? 1 : 0;
>>>> }
>>>> @@ -788,6 +804,26 @@ static void aspm_l1ss_init(struct pcie_link_state *link)
>>>> aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
>>>> }
>>>> +static void pcie_aspm_override_default_link_state(struct pcie_link_state *link)
>>>> +{
>>>> + struct pci_dev *pdev = link->downstream;
>>>> + u32 override;
>>>> +
>>>> + /* Override the BIOS disabled ASPM states for devicetree platforms */
>>>> + if (of_have_populated_dt()) {
>>>> + link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
>>>> + override = link->aspm_default & ~link->aspm_enabled;
>>>> + if (override)
>>>> + pci_info(pdev, "ASPM states overridden: %s%s%s%s%s%s\n",
>>>> + (override & PCIE_LINK_STATE_L0S) ? "L0s+, " : "",
>>>> + (override & PCIE_LINK_STATE_L1) ? "L1+, " : "",
>>>> + (override & PCIE_LINK_STATE_L1_1) ? "L1.1+, " : "",
>>>> + (override & PCIE_LINK_STATE_L1_2) ? "L1.2+, " : "",
>>>> + (override & PCIE_LINK_STATE_L1_1_PCIPM) ? "L1.1 PCI-PM+, " : "",
>>>> + (override & PCIE_LINK_STATE_L1_2_PCIPM) ? "L1.2 PCI-PM+" : "");
>>>> + }
>>>> +}
>>>> +
>>>> static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
>>>> {
>>>> struct pci_dev *child = link->downstream, *parent = link->pdev;
>>>> @@ -868,6 +904,8 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
>>>> /* Save default state */
>>>> link->aspm_default = link->aspm_enabled;
>>>> + pcie_aspm_override_default_link_state(link);
>>>> +
>>>> /* Setup initial capable state. Will be updated later */
>>>> link->aspm_capable = link->aspm_support;
>>>>
>>>
>>>
>>> _______________________________________________
>>> Linux-rockchip mailing list
>>> Linux-rockchip@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-rockchip
>
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 7:13 ` FUKAUMI Naoki
@ 2025-10-15 7:50 ` Manivannan Sadhasivam
2025-10-15 9:11 ` Shawn Lin
0 siblings, 1 reply; 42+ messages in thread
From: Manivannan Sadhasivam @ 2025-10-15 7:50 UTC (permalink / raw)
To: FUKAUMI Naoki
Cc: Bjorn Helgaas, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions
On Wed, Oct 15, 2025 at 04:13:41PM +0900, FUKAUMI Naoki wrote:
> Hi,
>
> On 10/15/25 15:26, Manivannan Sadhasivam wrote:
> > On Tue, Oct 14, 2025 at 01:49:05PM -0500, Bjorn Helgaas wrote:
> > > [+cc regressions]
> > >
> > > On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
> > > > Hi Manivannan Sadhasivam,
> > > >
> > > > I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
> > > > Rockchip RK3588(S) SoC.
> > > >
> > > > When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
> > > > freezes or fails to probe M.2 Wi-Fi modules. This happens with several
> > > > different modules I've tested, including the Realtek RTL8852BE, MediaTek
> > > > MT7921E, and Intel AX210.
> > > >
> > > > I've found that reverting the following commit (i.e., the patch I'm replying
> > > > to) resolves the problem:
> > > > commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
> > >
> > > Thanks for the report, and sorry for the regression.
> > >
> > > Since this affects several devices from different manufacturers and (I
> > > assume) different drivers, it seems likely that there's some issue
> > > with the Rockchip end, since ASPM probably works on these devices in
> > > other systems. So we should figure out if there's something wrong
> > > with the way we configure ASPM, which we could potentially fix, or if
> > > there's a hardware issue and we need some king of quirk to prevent
> > > usage of ASPM on the affected platforms.
> > >
> >
> > I believe it is the latter. The Root Port is having trouble with ASPM.
> >
> > FUKAUMI Naoki, could you please share the 'sudo lspci -vv' output so that we
> > know what kind of Root Port we are dealing with? You can revert the offending
> > patch and share the output.
>
> Here is dmesg/lspci output on ROCK 5A(RK3588S):
> https://gist.github.com/RadxaNaoki/1355a0b4278b6e51a61d89df7a535a5d
>
Thanks! Could you please try the below diff with f3ac2ff14834 applied?
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 214ed060ca1b..0069d06c282d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2525,6 +2525,15 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
*/
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
+
+static void quirk_disable_aspm_all(struct pci_dev *dev)
+{
+ pci_info(dev, "Disabling ASPM\n");
+ pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
+}
+
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ROCKCHIP, 0x3588, quirk_disable_aspm_all);
+
/*
* Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
* Link bit cleared after starting the link retrain process to allow this
From your previous comment, I believe the Root Port is having the issues with
ASPM as you seem to have tried connecting different devices to the slot. So I
disabled ASPM for the Root Port with the above diff.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply related [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 7:50 ` Manivannan Sadhasivam
@ 2025-10-15 9:11 ` Shawn Lin
2025-10-15 9:43 ` Manivannan Sadhasivam
2025-10-15 9:46 ` Niklas Cassel
0 siblings, 2 replies; 42+ messages in thread
From: Shawn Lin @ 2025-10-15 9:11 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: shawn.lin, Bjorn Helgaas, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
Hi Mani
在 2025/10/15 星期三 15:50, Manivannan Sadhasivam 写道:
> On Wed, Oct 15, 2025 at 04:13:41PM +0900, FUKAUMI Naoki wrote:
>> Hi,
>>
>> On 10/15/25 15:26, Manivannan Sadhasivam wrote:
>>> On Tue, Oct 14, 2025 at 01:49:05PM -0500, Bjorn Helgaas wrote:
>>>> [+cc regressions]
>>>>
>>>> On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
>>>>> Hi Manivannan Sadhasivam,
>>>>>
>>>>> I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
>>>>> Rockchip RK3588(S) SoC.
>>>>>
>>>>> When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
>>>>> freezes or fails to probe M.2 Wi-Fi modules. This happens with several
>>>>> different modules I've tested, including the Realtek RTL8852BE, MediaTek
>>>>> MT7921E, and Intel AX210.
>>>>>
>>>>> I've found that reverting the following commit (i.e., the patch I'm replying
>>>>> to) resolves the problem:
>>>>> commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
>>>>
>>>> Thanks for the report, and sorry for the regression.
>>>>
>>>> Since this affects several devices from different manufacturers and (I
>>>> assume) different drivers, it seems likely that there's some issue
>>>> with the Rockchip end, since ASPM probably works on these devices in
>>>> other systems. So we should figure out if there's something wrong
>>>> with the way we configure ASPM, which we could potentially fix, or if
>>>> there's a hardware issue and we need some king of quirk to prevent
>>>> usage of ASPM on the affected platforms.
>>>>
>>>
>>> I believe it is the latter. The Root Port is having trouble with ASPM.
>>>
>>> FUKAUMI Naoki, could you please share the 'sudo lspci -vv' output so that we
>>> know what kind of Root Port we are dealing with? You can revert the offending
>>> patch and share the output.
>>
>> Here is dmesg/lspci output on ROCK 5A(RK3588S):
>> https://gist.github.com/RadxaNaoki/1355a0b4278b6e51a61d89df7a535a5d
>>
>
> Thanks! Could you please try the below diff with f3ac2ff14834 applied?
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 214ed060ca1b..0069d06c282d 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2525,6 +2525,15 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> */
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
>
> +
> +static void quirk_disable_aspm_all(struct pci_dev *dev)
> +{
> + pci_info(dev, "Disabling ASPM\n");
> + pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
> +}
> +
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ROCKCHIP, 0x3588, quirk_disable_aspm_all);
That's not true from my POV. Rockchip platform supports all ASPM policy
after mass production verification. I also verified current upstream
code this morning with RK3588-EVB and can check L0s/L1/L1ss work fine.
The log and lspci output could be found here:
https://pastebin.com/qizeYED7
Moreover, I disscussed this issue with FUKAUMI today off-list and his
board seems to work when only disable L1ss by patching:
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -813,7 +813,7 @@ static void
pcie_aspm_override_default_link_state(struct pcie_link_state *link)
/* For devicetree platforms, enable all ASPM states by default */
if (of_have_populated_dt()) {
- link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
+ link->aspm_default = PCIE_LINK_STATE_L0S |
PCIE_LINK_STATE_L1;
override = link->aspm_default & ~link->aspm_enabled;
if (override)
pci_info(pdev, "ASPM: DT platform,
So, is there a proper way to just disable this feature for spec boards
instead of this Soc?
> +
> /*
> * Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
> * Link bit cleared after starting the link retrain process to allow this
>
>
> From your previous comment, I believe the Root Port is having the issues with
> ASPM as you seem to have tried connecting different devices to the slot. So I
> disabled ASPM for the Root Port with the above diff.
>
> - Mani
>
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 9:11 ` Shawn Lin
@ 2025-10-15 9:43 ` Manivannan Sadhasivam
2025-10-15 9:46 ` Niklas Cassel
1 sibling, 0 replies; 42+ messages in thread
From: Manivannan Sadhasivam @ 2025-10-15 9:43 UTC (permalink / raw)
To: Shawn Lin
Cc: Bjorn Helgaas, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
On Wed, Oct 15, 2025 at 05:11:39PM +0800, Shawn Lin wrote:
> Hi Mani
>
> 在 2025/10/15 星期三 15:50, Manivannan Sadhasivam 写道:
> > On Wed, Oct 15, 2025 at 04:13:41PM +0900, FUKAUMI Naoki wrote:
> > > Hi,
> > >
> > > On 10/15/25 15:26, Manivannan Sadhasivam wrote:
> > > > On Tue, Oct 14, 2025 at 01:49:05PM -0500, Bjorn Helgaas wrote:
> > > > > [+cc regressions]
> > > > >
> > > > > On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
> > > > > > Hi Manivannan Sadhasivam,
> > > > > >
> > > > > > I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
> > > > > > Rockchip RK3588(S) SoC.
> > > > > >
> > > > > > When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
> > > > > > freezes or fails to probe M.2 Wi-Fi modules. This happens with several
> > > > > > different modules I've tested, including the Realtek RTL8852BE, MediaTek
> > > > > > MT7921E, and Intel AX210.
> > > > > >
> > > > > > I've found that reverting the following commit (i.e., the patch I'm replying
> > > > > > to) resolves the problem:
> > > > > > commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
> > > > >
> > > > > Thanks for the report, and sorry for the regression.
> > > > >
> > > > > Since this affects several devices from different manufacturers and (I
> > > > > assume) different drivers, it seems likely that there's some issue
> > > > > with the Rockchip end, since ASPM probably works on these devices in
> > > > > other systems. So we should figure out if there's something wrong
> > > > > with the way we configure ASPM, which we could potentially fix, or if
> > > > > there's a hardware issue and we need some king of quirk to prevent
> > > > > usage of ASPM on the affected platforms.
> > > > >
> > > >
> > > > I believe it is the latter. The Root Port is having trouble with ASPM.
> > > >
> > > > FUKAUMI Naoki, could you please share the 'sudo lspci -vv' output so that we
> > > > know what kind of Root Port we are dealing with? You can revert the offending
> > > > patch and share the output.
> > >
> > > Here is dmesg/lspci output on ROCK 5A(RK3588S):
> > > https://gist.github.com/RadxaNaoki/1355a0b4278b6e51a61d89df7a535a5d
> > >
> >
> > Thanks! Could you please try the below diff with f3ac2ff14834 applied?
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 214ed060ca1b..0069d06c282d 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -2525,6 +2525,15 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> > */
> > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
> >
> > +
> > +static void quirk_disable_aspm_all(struct pci_dev *dev)
> > +{
> > + pci_info(dev, "Disabling ASPM\n");
> > + pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
> > +}
> > +
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ROCKCHIP, 0x3588, quirk_disable_aspm_all);
>
> That's not true from my POV. Rockchip platform supports all ASPM policy
> after mass production verification. I also verified current upstream
> code this morning with RK3588-EVB and can check L0s/L1/L1ss work fine.
>
> The log and lspci output could be found here:
> https://pastebin.com/qizeYED7
>
> Moreover, I disscussed this issue with FUKAUMI today off-list and his
> board seems to work when only disable L1ss by patching:
>
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -813,7 +813,7 @@ static void pcie_aspm_override_default_link_state(struct
> pcie_link_state *link)
>
> /* For devicetree platforms, enable all ASPM states by default */
> if (of_have_populated_dt()) {
> - link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
> + link->aspm_default = PCIE_LINK_STATE_L0S |
> PCIE_LINK_STATE_L1;
> override = link->aspm_default & ~link->aspm_enabled;
> if (override)
> pci_info(pdev, "ASPM: DT platform,
>
>
Thanks a lot for debugging the issue. Now it is clear that the board routing is
on play and ASPM works fine on Rockchip Root Ports.
> So, is there a proper way to just disable this feature for spec boards
> instead of this Soc?
>
Below should work:
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 214ed060ca1b..9864b2c91399 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -29,6 +29,7 @@
#include <linux/ktime.h>
#include <linux/mm.h>
#include <linux/nvme.h>
+#include <linux/of.h>
#include <linux/platform_data/x86/apple.h>
#include <linux/pm_runtime.h>
#include <linux/sizes.h>
@@ -2525,6 +2526,19 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
*/
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
+
+static void quirk_disable_aspm_radxa(struct pci_dev *dev)
+{
+ if (of_machine_is_compatible("radxa,rock-5a") ||
+ (of_machine_is_compatible("radxa,rock-5b"))) {
+ pci_info(dev, "Disabling ASPM L1ss\n");
+ pci_disable_link_state(dev, PCIE_LINK_STATE_L1_1 |
+ PCIE_LINK_STATE_L1_2);
+ }
+}
+
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ROCKCHIP, 0x3588, quirk_disable_aspm_radxa);
+
/*
* Some Pericom PCIe-to-PCI bridges in reverse mode need the PCIe Retrain
* Link bit cleared after starting the link retrain process to allow this
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply related [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 9:11 ` Shawn Lin
2025-10-15 9:43 ` Manivannan Sadhasivam
@ 2025-10-15 9:46 ` Niklas Cassel
2025-10-15 10:33 ` Manivannan Sadhasivam
1 sibling, 1 reply; 42+ messages in thread
From: Niklas Cassel @ 2025-10-15 9:46 UTC (permalink / raw)
To: Shawn Lin
Cc: Manivannan Sadhasivam, Bjorn Helgaas, manivannan.sadhasivam,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
Hello Shawn,
On Wed, Oct 15, 2025 at 05:11:39PM +0800, Shawn Lin wrote:
> >
> > Thanks! Could you please try the below diff with f3ac2ff14834 applied?
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 214ed060ca1b..0069d06c282d 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -2525,6 +2525,15 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> > */
> > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
> >
> > +
> > +static void quirk_disable_aspm_all(struct pci_dev *dev)
> > +{
> > + pci_info(dev, "Disabling ASPM\n");
> > + pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
> > +}
> > +
> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ROCKCHIP, 0x3588, quirk_disable_aspm_all);
>
> That's not true from my POV. Rockchip platform supports all ASPM policy
> after mass production verification. I also verified current upstream
> code this morning with RK3588-EVB and can check L0s/L1/L1ss work fine.
>
> The log and lspci output could be found here:
> https://pastebin.com/qizeYED7
>
> Moreover, I disscussed this issue with FUKAUMI today off-list and his
> board seems to work when only disable L1ss by patching:
>
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -813,7 +813,7 @@ static void pcie_aspm_override_default_link_state(struct
> pcie_link_state *link)
>
> /* For devicetree platforms, enable all ASPM states by default */
> if (of_have_populated_dt()) {
> - link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
> + link->aspm_default = PCIE_LINK_STATE_L0S |
> PCIE_LINK_STATE_L1;
> override = link->aspm_default & ~link->aspm_enabled;
> if (override)
> pci_info(pdev, "ASPM: DT platform,
>
>
> So, is there a proper way to just disable this feature for spec boards
> instead of this Soc?
This fix seems do the trick, without needing to patch common code (aspm.c):
diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
index 3e2752c7dd09..f5e1aaa97719 100644
--- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
+++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
@@ -200,6 +200,19 @@ static bool rockchip_pcie_link_up(struct dw_pcie *pci)
return FIELD_GET(PCIE_LINKUP_MASK, val) == PCIE_LINKUP;
}
+static void rockchip_pcie_disable_l1sub(struct dw_pcie *pci)
+{
+ u32 cap, l1subcap;
+
+ cap = dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_ID_L1SS);
+ if (cap) {
+ l1subcap = dw_pcie_readl_dbi(pci, cap + PCI_L1SS_CAP);
+ l1subcap &= ~(PCI_L1SS_CAP_ASPM_L1_1 | PCI_L1SS_CAP_ASPM_L1_2 | PCI_L1SS_CAP_L1_PM_SS);
+ dw_pcie_writel_dbi(pci, cap + PCI_L1SS_CAP, l1subcap);
+ l1subcap = dw_pcie_readl_dbi(pci, cap + PCI_L1SS_CAP);
+ }
+}
+
static void rockchip_pcie_enable_l0s(struct dw_pcie *pci)
{
u32 cap, lnkcap;
@@ -264,6 +277,7 @@ static int rockchip_pcie_host_init(struct dw_pcie_rp *pp)
irq_set_chained_handler_and_data(irq, rockchip_pcie_intx_handler,
rockchip);
+ rockchip_pcie_disable_l1sub(pci);
rockchip_pcie_enable_l0s(pci);
return 0;
@@ -301,6 +315,7 @@ static void rockchip_pcie_ep_init(struct dw_pcie_ep *ep)
struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
enum pci_barno bar;
+ rockchip_pcie_disable_l1sub(pci);
rockchip_pcie_enable_l0s(pci);
rockchip_pcie_ep_hide_broken_ats_cap_rk3588(ep);
In reality, I think that pcie-dw-rockchip.c should check 'supports-clkreq',
and only if it doesn't support clkreq, it should disable L1 substates, similar
to how the Tegra driver does things:
https://github.com/torvalds/linux/blob/v6.18-rc1/drivers/pci/controller/dwc/pcie-tegra194.c#L934-L938
https://github.com/torvalds/linux/blob/v6.18-rc1/drivers/pci/controller/dwc/pcie-tegra194.c#L1164-L1165
In fact, that is also how the downstream rockchip drives does things:
https://github.com/rockchip-linux/kernel/blob/develop-6.6/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L200-L233
https://github.com/rockchip-linux/kernel/blob/develop-6.6/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L725
So I guess we either:
1) Add code to pcie-dw-rockchip.c to unconditionally disable L1 substates, or
2) We add code to:
- If have 'supports-clkreq' property, set PCIE_CLIENT_POWER_CON.app_clk_req_n=1
- If don't have 'supports-clkreq' property, disable L1 substates.
I think we need to do either 1) or 2), because a user can build the kernel with
CONFIG_PCIEASPM_POWER_SUPERSAVE=y
and that would break things even on older kernels, that don't have Mani's recent
commit.
Mani, perhaps common code (aspm.c) should enable L1 substates only if
'supports-clkreq' DT property exists?
Kind regards,
Niklas
^ permalink raw reply related [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 9:46 ` Niklas Cassel
@ 2025-10-15 10:33 ` Manivannan Sadhasivam
2025-10-15 12:17 ` Niklas Cassel
0 siblings, 1 reply; 42+ messages in thread
From: Manivannan Sadhasivam @ 2025-10-15 10:33 UTC (permalink / raw)
To: Niklas Cassel
Cc: Shawn Lin, Bjorn Helgaas, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
On Wed, Oct 15, 2025 at 11:46:02AM +0200, Niklas Cassel wrote:
> Hello Shawn,
>
> On Wed, Oct 15, 2025 at 05:11:39PM +0800, Shawn Lin wrote:
> > >
> > > Thanks! Could you please try the below diff with f3ac2ff14834 applied?
> > >
> > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > index 214ed060ca1b..0069d06c282d 100644
> > > --- a/drivers/pci/quirks.c
> > > +++ b/drivers/pci/quirks.c
> > > @@ -2525,6 +2525,15 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> > > */
> > > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
> > >
> > > +
> > > +static void quirk_disable_aspm_all(struct pci_dev *dev)
> > > +{
> > > + pci_info(dev, "Disabling ASPM\n");
> > > + pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
> > > +}
> > > +
> > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ROCKCHIP, 0x3588, quirk_disable_aspm_all);
> >
> > That's not true from my POV. Rockchip platform supports all ASPM policy
> > after mass production verification. I also verified current upstream
> > code this morning with RK3588-EVB and can check L0s/L1/L1ss work fine.
> >
> > The log and lspci output could be found here:
> > https://pastebin.com/qizeYED7
> >
> > Moreover, I disscussed this issue with FUKAUMI today off-list and his
> > board seems to work when only disable L1ss by patching:
> >
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -813,7 +813,7 @@ static void pcie_aspm_override_default_link_state(struct
> > pcie_link_state *link)
> >
> > /* For devicetree platforms, enable all ASPM states by default */
> > if (of_have_populated_dt()) {
> > - link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
> > + link->aspm_default = PCIE_LINK_STATE_L0S |
> > PCIE_LINK_STATE_L1;
> > override = link->aspm_default & ~link->aspm_enabled;
> > if (override)
> > pci_info(pdev, "ASPM: DT platform,
> >
> >
> > So, is there a proper way to just disable this feature for spec boards
> > instead of this Soc?
>
> This fix seems do the trick, without needing to patch common code (aspm.c):
>
> diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> index 3e2752c7dd09..f5e1aaa97719 100644
> --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> @@ -200,6 +200,19 @@ static bool rockchip_pcie_link_up(struct dw_pcie *pci)
> return FIELD_GET(PCIE_LINKUP_MASK, val) == PCIE_LINKUP;
> }
>
> +static void rockchip_pcie_disable_l1sub(struct dw_pcie *pci)
> +{
> + u32 cap, l1subcap;
> +
> + cap = dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_ID_L1SS);
> + if (cap) {
> + l1subcap = dw_pcie_readl_dbi(pci, cap + PCI_L1SS_CAP);
> + l1subcap &= ~(PCI_L1SS_CAP_ASPM_L1_1 | PCI_L1SS_CAP_ASPM_L1_2 | PCI_L1SS_CAP_L1_PM_SS);
> + dw_pcie_writel_dbi(pci, cap + PCI_L1SS_CAP, l1subcap);
> + l1subcap = dw_pcie_readl_dbi(pci, cap + PCI_L1SS_CAP);
> + }
> +}
> +
> static void rockchip_pcie_enable_l0s(struct dw_pcie *pci)
> {
> u32 cap, lnkcap;
> @@ -264,6 +277,7 @@ static int rockchip_pcie_host_init(struct dw_pcie_rp *pp)
> irq_set_chained_handler_and_data(irq, rockchip_pcie_intx_handler,
> rockchip);
>
> + rockchip_pcie_disable_l1sub(pci);
> rockchip_pcie_enable_l0s(pci);
>
> return 0;
> @@ -301,6 +315,7 @@ static void rockchip_pcie_ep_init(struct dw_pcie_ep *ep)
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> enum pci_barno bar;
>
> + rockchip_pcie_disable_l1sub(pci);
> rockchip_pcie_enable_l0s(pci);
> rockchip_pcie_ep_hide_broken_ats_cap_rk3588(ep);
>
But this patch removes the L1SS CAP for all boards, isn't it?
>
>
>
> In reality, I think that pcie-dw-rockchip.c should check 'supports-clkreq',
> and only if it doesn't support clkreq, it should disable L1 substates, similar
> to how the Tegra driver does things:
> https://github.com/torvalds/linux/blob/v6.18-rc1/drivers/pci/controller/dwc/pcie-tegra194.c#L934-L938
> https://github.com/torvalds/linux/blob/v6.18-rc1/drivers/pci/controller/dwc/pcie-tegra194.c#L1164-L1165
>
> In fact, that is also how the downstream rockchip drives does things:
> https://github.com/rockchip-linux/kernel/blob/develop-6.6/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L200-L233
> https://github.com/rockchip-linux/kernel/blob/develop-6.6/drivers/pci/controller/dwc/pcie-dw-rockchip.c#L725
>
> So I guess we either:
> 1) Add code to pcie-dw-rockchip.c to unconditionally disable L1 substates, or
> 2) We add code to:
> - If have 'supports-clkreq' property, set PCIE_CLIENT_POWER_CON.app_clk_req_n=1
> - If don't have 'supports-clkreq' property, disable L1 substates.
>
> I think we need to do either 1) or 2), because a user can build the kernel with
> CONFIG_PCIEASPM_POWER_SUPERSAVE=y
> and that would break things even on older kernels, that don't have Mani's recent
> commit.
>
>
>
> Mani, perhaps common code (aspm.c) should enable L1 substates only if
> 'supports-clkreq' DT property exists?
>
Unfortunately, not all DTs define this property even though the platforms
support CLKREQ#. Right now, only Nvidia defines this property in the binding,
but not in upstream DTS. But I would expect the platforms to support CLKREQ# if
the Root Port supports L1SS CAP.
So removing the L1SS CAP for Root Port in the controller driver makes sense to
me.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 10:33 ` Manivannan Sadhasivam
@ 2025-10-15 12:17 ` Niklas Cassel
2025-10-15 13:00 ` Shawn Lin
0 siblings, 1 reply; 42+ messages in thread
From: Niklas Cassel @ 2025-10-15 12:17 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Shawn Lin, Bjorn Helgaas, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
On Wed, Oct 15, 2025 at 04:03:53PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Oct 15, 2025 at 11:46:02AM +0200, Niklas Cassel wrote:
> > Hello Shawn,
> >
> > On Wed, Oct 15, 2025 at 05:11:39PM +0800, Shawn Lin wrote:
> > > >
> > > > Thanks! Could you please try the below diff with f3ac2ff14834 applied?
> > > >
> > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > index 214ed060ca1b..0069d06c282d 100644
> > > > --- a/drivers/pci/quirks.c
> > > > +++ b/drivers/pci/quirks.c
> > > > @@ -2525,6 +2525,15 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
> > > > */
> > > > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
> > > >
> > > > +
> > > > +static void quirk_disable_aspm_all(struct pci_dev *dev)
> > > > +{
> > > > + pci_info(dev, "Disabling ASPM\n");
> > > > + pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
> > > > +}
> > > > +
> > > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ROCKCHIP, 0x3588, quirk_disable_aspm_all);
> > >
> > > That's not true from my POV. Rockchip platform supports all ASPM policy
> > > after mass production verification. I also verified current upstream
> > > code this morning with RK3588-EVB and can check L0s/L1/L1ss work fine.
> > >
> > > The log and lspci output could be found here:
> > > https://pastebin.com/qizeYED7
> > >
> > > Moreover, I disscussed this issue with FUKAUMI today off-list and his
> > > board seems to work when only disable L1ss by patching:
> > >
> > > --- a/drivers/pci/pcie/aspm.c
> > > +++ b/drivers/pci/pcie/aspm.c
> > > @@ -813,7 +813,7 @@ static void pcie_aspm_override_default_link_state(struct
> > > pcie_link_state *link)
> > >
> > > /* For devicetree platforms, enable all ASPM states by default */
> > > if (of_have_populated_dt()) {
> > > - link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
> > > + link->aspm_default = PCIE_LINK_STATE_L0S |
> > > PCIE_LINK_STATE_L1;
> > > override = link->aspm_default & ~link->aspm_enabled;
> > > if (override)
> > > pci_info(pdev, "ASPM: DT platform,
> > >
> > >
> > > So, is there a proper way to just disable this feature for spec boards
> > > instead of this Soc?
> >
> > This fix seems do the trick, without needing to patch common code (aspm.c):
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > index 3e2752c7dd09..f5e1aaa97719 100644
> > --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
> > @@ -200,6 +200,19 @@ static bool rockchip_pcie_link_up(struct dw_pcie *pci)
> > return FIELD_GET(PCIE_LINKUP_MASK, val) == PCIE_LINKUP;
> > }
> >
> > +static void rockchip_pcie_disable_l1sub(struct dw_pcie *pci)
> > +{
> > + u32 cap, l1subcap;
> > +
> > + cap = dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_ID_L1SS);
> > + if (cap) {
> > + l1subcap = dw_pcie_readl_dbi(pci, cap + PCI_L1SS_CAP);
> > + l1subcap &= ~(PCI_L1SS_CAP_ASPM_L1_1 | PCI_L1SS_CAP_ASPM_L1_2 | PCI_L1SS_CAP_L1_PM_SS);
> > + dw_pcie_writel_dbi(pci, cap + PCI_L1SS_CAP, l1subcap);
> > + l1subcap = dw_pcie_readl_dbi(pci, cap + PCI_L1SS_CAP);
> > + }
> > +}
> > +
> > static void rockchip_pcie_enable_l0s(struct dw_pcie *pci)
> > {
> > u32 cap, lnkcap;
> > @@ -264,6 +277,7 @@ static int rockchip_pcie_host_init(struct dw_pcie_rp *pp)
> > irq_set_chained_handler_and_data(irq, rockchip_pcie_intx_handler,
> > rockchip);
> >
> > + rockchip_pcie_disable_l1sub(pci);
> > rockchip_pcie_enable_l0s(pci);
> >
> > return 0;
> > @@ -301,6 +315,7 @@ static void rockchip_pcie_ep_init(struct dw_pcie_ep *ep)
> > struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > enum pci_barno bar;
> >
> > + rockchip_pcie_disable_l1sub(pci);
> > rockchip_pcie_enable_l0s(pci);
> > rockchip_pcie_ep_hide_broken_ats_cap_rk3588(ep);
> >
>
> But this patch removes the L1SS CAP for all boards, isn't it?
Yes, all boards supported by pcie-dw-rockchip.c, which matches what their
downstream driver does.
(Their downstream driver disables L1 substates for all boards that have
not defined 'supports-clkreq', and a grep through their downstream tree,
for all their all their different branches, shows that not a since rockchip
DTS has this property specified.)
So, let me submit a real patch with the above.
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 12:17 ` Niklas Cassel
@ 2025-10-15 13:00 ` Shawn Lin
2025-10-15 15:23 ` Niklas Cassel
2025-10-15 23:30 ` Bjorn Helgaas
0 siblings, 2 replies; 42+ messages in thread
From: Shawn Lin @ 2025-10-15 13:00 UTC (permalink / raw)
To: Niklas Cassel, Manivannan Sadhasivam
Cc: shawn.lin, Bjorn Helgaas, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
在 2025/10/15 星期三 20:17, Niklas Cassel 写道:
> On Wed, Oct 15, 2025 at 04:03:53PM +0530, Manivannan Sadhasivam wrote:
>> On Wed, Oct 15, 2025 at 11:46:02AM +0200, Niklas Cassel wrote:
>>> Hello Shawn,
>>>
>>> On Wed, Oct 15, 2025 at 05:11:39PM +0800, Shawn Lin wrote:
>>>>>
>>>>> Thanks! Could you please try the below diff with f3ac2ff14834 applied?
>>>>>
>>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>>>> index 214ed060ca1b..0069d06c282d 100644
>>>>> --- a/drivers/pci/quirks.c
>>>>> +++ b/drivers/pci/quirks.c
>>>>> @@ -2525,6 +2525,15 @@ static void quirk_disable_aspm_l0s_l1(struct pci_dev *dev)
>>>>> */
>>>>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASMEDIA, 0x1080, quirk_disable_aspm_l0s_l1);
>>>>>
>>>>> +
>>>>> +static void quirk_disable_aspm_all(struct pci_dev *dev)
>>>>> +{
>>>>> + pci_info(dev, "Disabling ASPM\n");
>>>>> + pci_disable_link_state(dev, PCIE_LINK_STATE_ALL);
>>>>> +}
>>>>> +
>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ROCKCHIP, 0x3588, quirk_disable_aspm_all);
>>>>
>>>> That's not true from my POV. Rockchip platform supports all ASPM policy
>>>> after mass production verification. I also verified current upstream
>>>> code this morning with RK3588-EVB and can check L0s/L1/L1ss work fine.
>>>>
>>>> The log and lspci output could be found here:
>>>> https://pastebin.com/qizeYED7
>>>>
>>>> Moreover, I disscussed this issue with FUKAUMI today off-list and his
>>>> board seems to work when only disable L1ss by patching:
>>>>
>>>> --- a/drivers/pci/pcie/aspm.c
>>>> +++ b/drivers/pci/pcie/aspm.c
>>>> @@ -813,7 +813,7 @@ static void pcie_aspm_override_default_link_state(struct
>>>> pcie_link_state *link)
>>>>
>>>> /* For devicetree platforms, enable all ASPM states by default */
>>>> if (of_have_populated_dt()) {
>>>> - link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
>>>> + link->aspm_default = PCIE_LINK_STATE_L0S |
>>>> PCIE_LINK_STATE_L1;
>>>> override = link->aspm_default & ~link->aspm_enabled;
>>>> if (override)
>>>> pci_info(pdev, "ASPM: DT platform,
>>>>
>>>>
>>>> So, is there a proper way to just disable this feature for spec boards
>>>> instead of this Soc?
>>>
>>> This fix seems do the trick, without needing to patch common code (aspm.c):
>>>
>>> diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>> index 3e2752c7dd09..f5e1aaa97719 100644
>>> --- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>> +++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
>>> @@ -200,6 +200,19 @@ static bool rockchip_pcie_link_up(struct dw_pcie *pci)
>>> return FIELD_GET(PCIE_LINKUP_MASK, val) == PCIE_LINKUP;
>>> }
>>>
>>> +static void rockchip_pcie_disable_l1sub(struct dw_pcie *pci)
>>> +{
>>> + u32 cap, l1subcap;
>>> +
>>> + cap = dw_pcie_find_ext_capability(pci, PCI_EXT_CAP_ID_L1SS);
>>> + if (cap) {
>>> + l1subcap = dw_pcie_readl_dbi(pci, cap + PCI_L1SS_CAP);
>>> + l1subcap &= ~(PCI_L1SS_CAP_ASPM_L1_1 | PCI_L1SS_CAP_ASPM_L1_2 | PCI_L1SS_CAP_L1_PM_SS);
>>> + dw_pcie_writel_dbi(pci, cap + PCI_L1SS_CAP, l1subcap);
>>> + l1subcap = dw_pcie_readl_dbi(pci, cap + PCI_L1SS_CAP);
>>> + }
>>> +}
>>> +
>>> static void rockchip_pcie_enable_l0s(struct dw_pcie *pci)
>>> {
>>> u32 cap, lnkcap;
>>> @@ -264,6 +277,7 @@ static int rockchip_pcie_host_init(struct dw_pcie_rp *pp)
>>> irq_set_chained_handler_and_data(irq, rockchip_pcie_intx_handler,
>>> rockchip);
>>>
>>> + rockchip_pcie_disable_l1sub(pci);
For now, this is a acceptable option if default ASPM policy enable L1ss
w/o checking if the HW could supports it... But how about adding
supports-clkreq stuff to upstream host driver directly? That would help
folks enable L1ss if the HW is ready and they just need adding property
to the DT.
>>> rockchip_pcie_enable_l0s(pci);
>>>
>>> return 0;
>>> @@ -301,6 +315,7 @@ static void rockchip_pcie_ep_init(struct dw_pcie_ep *ep)
>>> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>> enum pci_barno bar;
>>>
>>> + rockchip_pcie_disable_l1sub(pci);
>>> rockchip_pcie_enable_l0s(pci);
>>> rockchip_pcie_ep_hide_broken_ats_cap_rk3588(ep);
>>>
>>
>> But this patch removes the L1SS CAP for all boards, isn't it?
>
> Yes, all boards supported by pcie-dw-rockchip.c, which matches what their
> downstream driver does.
>
> (Their downstream driver disables L1 substates for all boards that have
> not defined 'supports-clkreq', and a grep through their downstream tree,
> for all their all their different branches, shows that not a since rockchip
> DTS has this property specified.)
The L1ss support is quite strict and need several steps to check, so we
didn't add supports-clkreq for them unless the HW is ready to go...
For adding supports of L1ss,
[1] the HW should support CLKREQ#, expecially for PCIe3.0 case on
Rockchip SoCs , since both CLKREQ# of RC and EP should connect to the
100MHz crystal generator's enable pin, as L1.2 need to disable refclk as
well. If the enable pin is high active, the HW even need a invertor....
[2] define proper clkreq iomux to pinctrl of pcie node
[3] make sure the devices work fine with L1ss.(It's hard to check the
slot case with random devices in the wild )
[4] add supports-clkreq to the DT and enable
CONFIG_PCIEASPM_POWER_SUPERSAVE
>
> So, let me submit a real patch with the above.
>
>
> Kind regards,
> Niklas
>
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 13:00 ` Shawn Lin
@ 2025-10-15 15:23 ` Niklas Cassel
2025-10-15 23:30 ` Bjorn Helgaas
1 sibling, 0 replies; 42+ messages in thread
From: Niklas Cassel @ 2025-10-15 15:23 UTC (permalink / raw)
To: Shawn Lin
Cc: Manivannan Sadhasivam, Bjorn Helgaas, manivannan.sadhasivam,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
On Wed, Oct 15, 2025 at 09:00:41PM +0800, Shawn Lin wrote:
> For now, this is a acceptable option if default ASPM policy enable L1ss
> w/o checking if the HW could supports it... But how about adding
> supports-clkreq stuff to upstream host driver directly? That would help
> folks enable L1ss if the HW is ready and they just need adding property
> to the DT.
I like your idea, if you have time, please send a patch.
However, adding (working) support for L1 substates (via 'supports-clkreq')
is new code, and should thus be queued for next release instead of v6.18.
For now, pcie-dw-rockchip.c is broken for a lot of PCIe devices, so the
fix should be minimal and target v6.18, i.e. something like:
https://lore.kernel.org/linux-pci/20251015123142.392274-2-cassel@kernel.org/
Support for L1 substates via 'supports-clkreq' can be added on top of that
patch (while targeting v6.19).
Kind regards,
Niklas
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 13:00 ` Shawn Lin
2025-10-15 15:23 ` Niklas Cassel
@ 2025-10-15 23:30 ` Bjorn Helgaas
2025-10-16 6:46 ` Hongxing Zhu
2025-10-17 3:36 ` Manivannan Sadhasivam
1 sibling, 2 replies; 42+ messages in thread
From: Bjorn Helgaas @ 2025-10-15 23:30 UTC (permalink / raw)
To: Shawn Lin
Cc: Niklas Cassel, Manivannan Sadhasivam, manivannan.sadhasivam,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
On Wed, Oct 15, 2025 at 09:00:41PM +0800, Shawn Lin wrote:
> ...
> For now, this is a acceptable option if default ASPM policy enable
> L1ss w/o checking if the HW could supports it... But how about
> adding supports-clkreq stuff to upstream host driver directly? That
> would help folks enable L1ss if the HW is ready and they just need
> adding property to the DT.
> ...
> The L1ss support is quite strict and need several steps to check, so we
> didn't add supports-clkreq for them unless the HW is ready to go...
>
> For adding supports of L1ss,
> [1] the HW should support CLKREQ#, expecially for PCIe3.0 case on Rockchip
> SoCs , since both CLKREQ# of RC and EP should connect to the
> 100MHz crystal generator's enable pin, as L1.2 need to disable refclk as
> well. If the enable pin is high active, the HW even need a invertor....
>
> [2] define proper clkreq iomux to pinctrl of pcie node
> [3] make sure the devices work fine with L1ss.(It's hard to check the slot
> case with random devices in the wild )
> [4] add supports-clkreq to the DT and enable
> CONFIG_PCIEASPM_POWER_SUPERSAVE
I don't understand the details of the supports-clkreq issue.
If we need to add supports-clkreq to devicetree, I want to understand
why we need it there when we don't seem to need it for ACPI systems.
Generally the OS relies on what the hardware advertises, e.g., in Link
Capabilities and the L1 PM Substates Capability, and what is available
from firmware, e.g., the ACPI _DSM for Latency Tolerance Reporting.
On the ACPI side, I don't think we get any specific information about
CLKREQ#. Can somebody explain why we do need it on the devicetree
side?
Bjorn
^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 23:30 ` Bjorn Helgaas
@ 2025-10-16 6:46 ` Hongxing Zhu
2025-10-17 3:36 ` Manivannan Sadhasivam
1 sibling, 0 replies; 42+ messages in thread
From: Hongxing Zhu @ 2025-10-16 6:46 UTC (permalink / raw)
To: Bjorn Helgaas, Shawn Lin
Cc: Niklas Cassel, Manivannan Sadhasivam,
manivannan.sadhasivam@oss.qualcomm.com, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczy��ski,
Rob Herring, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org,
David E. Box, Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit,
Chia-Lin Kao, Dragan Simic, linux-rockchip@lists.infradead.org,
regressions@lists.linux.dev, FUKAUMI Naoki
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="gb2312", Size: 3374 bytes --]
> -----Original Message-----
> From: Bjorn Helgaas <helgaas@kernel.org>
> Sent: 2025Äê10ÔÂ16ÈÕ 7:31
> To: Shawn Lin <shawn.lin@rock-chips.com>
> Cc: Niklas Cassel <cassel@kernel.org>; Manivannan Sadhasivam
> <mani@kernel.org>; manivannan.sadhasivam@oss.qualcomm.com; Bjorn
> Helgaas <bhelgaas@google.com>; Lorenzo Pieralisi <lpieralisi@kernel.org>;
> Krzysztof Wilczy¨½ski <kwilczynski@kernel.org>; Rob Herring
> <robh@kernel.org>; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-arm-msm@vger.kernel.org; David E. Box <david.e.box@linux.intel.com>;
> Kai-Heng Feng <kai.heng.feng@canonical.com>; Rafael J. Wysocki
> <rafael@kernel.org>; Heiner Kallweit <hkallweit1@gmail.com>; Chia-Lin Kao
> <acelan.kao@canonical.com>; Dragan Simic <dsimic@manjaro.org>;
> linux-rockchip@lists.infradead.org; regressions@lists.linux.dev; FUKAUMI Naoki
> <naoki@radxa.com>
> Subject: Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states
> set by BIOS for devicetree platforms
>
> On Wed, Oct 15, 2025 at 09:00:41PM +0800, Shawn Lin wrote:
> > ...
>
> > For now, this is a acceptable option if default ASPM policy enable
> > L1ss w/o checking if the HW could supports it... But how about adding
> > supports-clkreq stuff to upstream host driver directly? That would
> > help folks enable L1ss if the HW is ready and they just need adding
> > property to the DT.
> > ...
>
> > The L1ss support is quite strict and need several steps to check, so
> > we didn't add supports-clkreq for them unless the HW is ready to go...
> >
> > For adding supports of L1ss,
> > [1] the HW should support CLKREQ#, expecially for PCIe3.0 case on
> > Rockchip SoCs , since both CLKREQ# of RC and EP should connect to the
> > 100MHz crystal generator's enable pin, as L1.2 need to disable refclk
> > as well. If the enable pin is high active, the HW even need a invertor....
> >
> > [2] define proper clkreq iomux to pinctrl of pcie node [3] make sure
> > the devices work fine with L1ss.(It's hard to check the slot case with
> > random devices in the wild ) [4] add supports-clkreq to the DT and
> > enable CONFIG_PCIEASPM_POWER_SUPERSAVE
>
> I don't understand the details of the supports-clkreq issue.
>
> If we need to add supports-clkreq to devicetree, I want to understand why we
> need it there when we don't seem to need it for ACPI systems.
>
> Generally the OS relies on what the hardware advertises, e.g., in Link
> Capabilities and the L1 PM Substates Capability, and what is available from
> firmware, e.g., the ACPI _DSM for Latency Tolerance Reporting.
Hi Bjorn:
Yes, it is. The L1 PM Substates support can be broadcasted by the according
Capabilities of PCIe controller.
But, this feature is still relied on the CLKREQ# signal connection on the
board/device hardware designs too.
Maybe the "supports-clkreq" property is used to guarantee that the hardware
designs of CLKREQ# on board/device meet the requirements of L1 PM Substates.
https://lore.kernel.org/imx/20251015030428.2980427-1-hongxing.zhu@nxp.com/
This is the jam I encountered, especially on the second slot of i.MX95
19x19 EVK board.
Best Regards
Richard Zhu
>
> On the ACPI side, I don't think we get any specific information about CLKREQ#.
> Can somebody explain why we do need it on the devicetree side?
>
> Bjorn
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 23:30 ` Bjorn Helgaas
2025-10-16 6:46 ` Hongxing Zhu
@ 2025-10-17 3:36 ` Manivannan Sadhasivam
2025-10-17 9:47 ` Shawn Lin
1 sibling, 1 reply; 42+ messages in thread
From: Manivannan Sadhasivam @ 2025-10-17 3:36 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Shawn Lin, Niklas Cassel, Manivannan Sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
On Wed, Oct 15, 2025 at 06:30:54PM -0500, Bjorn Helgaas wrote:
> On Wed, Oct 15, 2025 at 09:00:41PM +0800, Shawn Lin wrote:
> > ...
>
> > For now, this is a acceptable option if default ASPM policy enable
> > L1ss w/o checking if the HW could supports it... But how about
> > adding supports-clkreq stuff to upstream host driver directly? That
> > would help folks enable L1ss if the HW is ready and they just need
> > adding property to the DT.
> > ...
>
> > The L1ss support is quite strict and need several steps to check, so we
> > didn't add supports-clkreq for them unless the HW is ready to go...
> >
> > For adding supports of L1ss,
> > [1] the HW should support CLKREQ#, expecially for PCIe3.0 case on Rockchip
> > SoCs , since both CLKREQ# of RC and EP should connect to the
> > 100MHz crystal generator's enable pin, as L1.2 need to disable refclk as
> > well. If the enable pin is high active, the HW even need a invertor....
> >
> > [2] define proper clkreq iomux to pinctrl of pcie node
> > [3] make sure the devices work fine with L1ss.(It's hard to check the slot
> > case with random devices in the wild )
> > [4] add supports-clkreq to the DT and enable
> > CONFIG_PCIEASPM_POWER_SUPERSAVE
>
> I don't understand the details of the supports-clkreq issue.
>
> If we need to add supports-clkreq to devicetree, I want to understand
> why we need it there when we don't seem to need it for ACPI systems.
>
> Generally the OS relies on what the hardware advertises, e.g., in Link
> Capabilities and the L1 PM Substates Capability, and what is available
> from firmware, e.g., the ACPI _DSM for Latency Tolerance Reporting.
>
> On the ACPI side, I don't think we get any specific information about
> CLKREQ#. Can somebody explain why we do need it on the devicetree
> side?
>
I think there is a disconnect between enabling L1ss CAP and CLKREQ#
availability.. When L1ss CAP is enabled for the Root Port in the hardware, there
is no guarantee that CLKREQ# is also available. If CLKREQ# is not available,
then if L1ss is enabled by the OS, it is not possible to exit the L1ss states
(assuming that L1ss is entered due to CLKREQ# in deassert (default) state).
Yes, there seems to be no standard way to know CLKREQ# presence in ACPI, but
in devicetree, we have this 'supports-clkreq' property to tell the OS that
CLKREQ# is available in the platform. But unfortunately, this property is not
widely used by the devicetrees out there. So we cannot use it in generic
pci/aspm.c driver.
We can certainly rely on the BIOS to enable L1ss as the fw developers would
have the knowledge of the CLKREQ# availability. But BIOS is not a thing on
mobile and embedded platforms where L1ss would come handy.
What I would suggest is, the host controller drivers (mostly for devicetree
platforms) should enable L1ss CAP for the Root Port only if they know that
CLKREQ# is available. They can either rely on the 'supports-clkreq' property or
some platform specific knowledge (for instance, on all Qcom platforms, we
certainly know that CLKREQ# is available, but we don't set the DT property).
Then in the generic pci/aspm.c driver, we can just enable L1ss for all devices
if the CAP is set, which we do currently.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-17 3:36 ` Manivannan Sadhasivam
@ 2025-10-17 9:47 ` Shawn Lin
2025-10-17 10:04 ` Manivannan Sadhasivam
0 siblings, 1 reply; 42+ messages in thread
From: Shawn Lin @ 2025-10-17 9:47 UTC (permalink / raw)
To: Manivannan Sadhasivam, Bjorn Helgaas
Cc: shawn.lin, Niklas Cassel, Manivannan Sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
Hi Mani and Bjorn
在 2025/10/17 星期五 11:36, Manivannan Sadhasivam 写道:
> On Wed, Oct 15, 2025 at 06:30:54PM -0500, Bjorn Helgaas wrote:
>> On Wed, Oct 15, 2025 at 09:00:41PM +0800, Shawn Lin wrote:
>>> ...
>>
>>> For now, this is a acceptable option if default ASPM policy enable
>>> L1ss w/o checking if the HW could supports it... But how about
>>> adding supports-clkreq stuff to upstream host driver directly? That
>>> would help folks enable L1ss if the HW is ready and they just need
>>> adding property to the DT.
>>> ...
>>
>>> The L1ss support is quite strict and need several steps to check, so we
>>> didn't add supports-clkreq for them unless the HW is ready to go...
>>>
>>> For adding supports of L1ss,
>>> [1] the HW should support CLKREQ#, expecially for PCIe3.0 case on Rockchip
>>> SoCs , since both CLKREQ# of RC and EP should connect to the
>>> 100MHz crystal generator's enable pin, as L1.2 need to disable refclk as
>>> well. If the enable pin is high active, the HW even need a invertor....
>>>
>>> [2] define proper clkreq iomux to pinctrl of pcie node
>>> [3] make sure the devices work fine with L1ss.(It's hard to check the slot
>>> case with random devices in the wild )
>>> [4] add supports-clkreq to the DT and enable
>>> CONFIG_PCIEASPM_POWER_SUPERSAVE
>>
>> I don't understand the details of the supports-clkreq issue.
>>
>> If we need to add supports-clkreq to devicetree, I want to understand
>> why we need it there when we don't seem to need it for ACPI systems.
>>
>> Generally the OS relies on what the hardware advertises, e.g., in Link
>> Capabilities and the L1 PM Substates Capability, and what is available
>> from firmware, e.g., the ACPI _DSM for Latency Tolerance Reporting.
>>
>> On the ACPI side, I don't think we get any specific information about
>> CLKREQ#. Can somebody explain why we do need it on the devicetree
>> side?
>>
>
> I think there is a disconnect between enabling L1ss CAP and CLKREQ#
> availability.. When L1ss CAP is enabled for the Root Port in the hardware, there
> is no guarantee that CLKREQ# is also available. If CLKREQ# is not available,
> then if L1ss is enabled by the OS, it is not possible to exit the L1ss states
> (assuming that L1ss is entered due to CLKREQ# in deassert (default) state).
>
> Yes, there seems to be no standard way to know CLKREQ# presence in ACPI, but
> in devicetree, we have this 'supports-clkreq' property to tell the OS that
> CLKREQ# is available in the platform. But unfortunately, this property is not
> widely used by the devicetrees out there. So we cannot use it in generic
> pci/aspm.c driver.
>
> We can certainly rely on the BIOS to enable L1ss as the fw developers would
> have the knowledge of the CLKREQ# availability. But BIOS is not a thing on
> mobile and embedded platforms where L1ss would come handy.
>
> What I would suggest is, the host controller drivers (mostly for devicetree
> platforms) should enable L1ss CAP for the Root Port only if they know that
> CLKREQ# is available. They can either rely on the 'supports-clkreq' property or
> some platform specific knowledge (for instance, on all Qcom platforms, we
> certainly know that CLKREQ# is available, but we don't set the DT property).
While we're on the topic of ASPM, may I ask a silly question?
I saw the ASPM would only be configured once the function driver calling
pci_enable_device. So if the modular driver hasn't been insmoded, the
link will be in L0 even though there is no transcation on-going. What is
the intention behind it?
>
> Then in the generic pci/aspm.c driver, we can just enable L1ss for all devices
> if the CAP is set, which we do currently.
>
> - Mani
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-17 9:47 ` Shawn Lin
@ 2025-10-17 10:04 ` Manivannan Sadhasivam
2025-10-17 12:19 ` Shawn Lin
0 siblings, 1 reply; 42+ messages in thread
From: Manivannan Sadhasivam @ 2025-10-17 10:04 UTC (permalink / raw)
To: Shawn Lin
Cc: Manivannan Sadhasivam, Bjorn Helgaas, Niklas Cassel,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
On Fri, Oct 17, 2025 at 05:47:44PM +0800, Shawn Lin wrote:
> Hi Mani and Bjorn
>
> 在 2025/10/17 星期五 11:36, Manivannan Sadhasivam 写道:
> > On Wed, Oct 15, 2025 at 06:30:54PM -0500, Bjorn Helgaas wrote:
> > > On Wed, Oct 15, 2025 at 09:00:41PM +0800, Shawn Lin wrote:
> > > > ...
> > >
> > > > For now, this is a acceptable option if default ASPM policy enable
> > > > L1ss w/o checking if the HW could supports it... But how about
> > > > adding supports-clkreq stuff to upstream host driver directly? That
> > > > would help folks enable L1ss if the HW is ready and they just need
> > > > adding property to the DT.
> > > > ...
> > >
> > > > The L1ss support is quite strict and need several steps to check, so we
> > > > didn't add supports-clkreq for them unless the HW is ready to go...
> > > >
> > > > For adding supports of L1ss,
> > > > [1] the HW should support CLKREQ#, expecially for PCIe3.0 case on Rockchip
> > > > SoCs , since both CLKREQ# of RC and EP should connect to the
> > > > 100MHz crystal generator's enable pin, as L1.2 need to disable refclk as
> > > > well. If the enable pin is high active, the HW even need a invertor....
> > > >
> > > > [2] define proper clkreq iomux to pinctrl of pcie node
> > > > [3] make sure the devices work fine with L1ss.(It's hard to check the slot
> > > > case with random devices in the wild )
> > > > [4] add supports-clkreq to the DT and enable
> > > > CONFIG_PCIEASPM_POWER_SUPERSAVE
> > >
> > > I don't understand the details of the supports-clkreq issue.
> > >
> > > If we need to add supports-clkreq to devicetree, I want to understand
> > > why we need it there when we don't seem to need it for ACPI systems.
> > >
> > > Generally the OS relies on what the hardware advertises, e.g., in Link
> > > Capabilities and the L1 PM Substates Capability, and what is available
> > > from firmware, e.g., the ACPI _DSM for Latency Tolerance Reporting.
> > >
> > > On the ACPI side, I don't think we get any specific information about
> > > CLKREQ#. Can somebody explain why we do need it on the devicetree
> > > side?
> > >
> >
> > I think there is a disconnect between enabling L1ss CAP and CLKREQ#
> > availability.. When L1ss CAP is enabled for the Root Port in the hardware, there
> > is no guarantee that CLKREQ# is also available. If CLKREQ# is not available,
> > then if L1ss is enabled by the OS, it is not possible to exit the L1ss states
> > (assuming that L1ss is entered due to CLKREQ# in deassert (default) state).
> >
> > Yes, there seems to be no standard way to know CLKREQ# presence in ACPI, but
> > in devicetree, we have this 'supports-clkreq' property to tell the OS that
> > CLKREQ# is available in the platform. But unfortunately, this property is not
> > widely used by the devicetrees out there. So we cannot use it in generic
> > pci/aspm.c driver.
> >
> > We can certainly rely on the BIOS to enable L1ss as the fw developers would
> > have the knowledge of the CLKREQ# availability. But BIOS is not a thing on
> > mobile and embedded platforms where L1ss would come handy.
> >
> > What I would suggest is, the host controller drivers (mostly for devicetree
> > platforms) should enable L1ss CAP for the Root Port only if they know that
> > CLKREQ# is available. They can either rely on the 'supports-clkreq' property or
> > some platform specific knowledge (for instance, on all Qcom platforms, we
> > certainly know that CLKREQ# is available, but we don't set the DT property).
>
> While we're on the topic of ASPM, may I ask a silly question?
> I saw the ASPM would only be configured once the function driver calling
> pci_enable_device. So if the modular driver hasn't been insmoded, the
> link will be in L0 even though there is no transcation on-going. What is
> the intention behind it?
>
I don't see where ASPM is configured during pci_enable_device(). It is currently
configured for all devices during pci_scan_slot().
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-17 10:04 ` Manivannan Sadhasivam
@ 2025-10-17 12:19 ` Shawn Lin
2025-10-17 12:54 ` Manivannan Sadhasivam
0 siblings, 1 reply; 42+ messages in thread
From: Shawn Lin @ 2025-10-17 12:19 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: shawn.lin, Manivannan Sadhasivam, Bjorn Helgaas, Niklas Cassel,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
在 2025/10/17 星期五 18:04, Manivannan Sadhasivam 写道:
> On Fri, Oct 17, 2025 at 05:47:44PM +0800, Shawn Lin wrote:
>> Hi Mani and Bjorn
>>
>> 在 2025/10/17 星期五 11:36, Manivannan Sadhasivam 写道:
>>> On Wed, Oct 15, 2025 at 06:30:54PM -0500, Bjorn Helgaas wrote:
>>>> On Wed, Oct 15, 2025 at 09:00:41PM +0800, Shawn Lin wrote:
>>>>> ...
>>>>
>>>>> For now, this is a acceptable option if default ASPM policy enable
>>>>> L1ss w/o checking if the HW could supports it... But how about
>>>>> adding supports-clkreq stuff to upstream host driver directly? That
>>>>> would help folks enable L1ss if the HW is ready and they just need
>>>>> adding property to the DT.
>>>>> ...
>>>>
>>>>> The L1ss support is quite strict and need several steps to check, so we
>>>>> didn't add supports-clkreq for them unless the HW is ready to go...
>>>>>
>>>>> For adding supports of L1ss,
>>>>> [1] the HW should support CLKREQ#, expecially for PCIe3.0 case on Rockchip
>>>>> SoCs , since both CLKREQ# of RC and EP should connect to the
>>>>> 100MHz crystal generator's enable pin, as L1.2 need to disable refclk as
>>>>> well. If the enable pin is high active, the HW even need a invertor....
>>>>>
>>>>> [2] define proper clkreq iomux to pinctrl of pcie node
>>>>> [3] make sure the devices work fine with L1ss.(It's hard to check the slot
>>>>> case with random devices in the wild )
>>>>> [4] add supports-clkreq to the DT and enable
>>>>> CONFIG_PCIEASPM_POWER_SUPERSAVE
>>>>
>>>> I don't understand the details of the supports-clkreq issue.
>>>>
>>>> If we need to add supports-clkreq to devicetree, I want to understand
>>>> why we need it there when we don't seem to need it for ACPI systems.
>>>>
>>>> Generally the OS relies on what the hardware advertises, e.g., in Link
>>>> Capabilities and the L1 PM Substates Capability, and what is available
>>>> from firmware, e.g., the ACPI _DSM for Latency Tolerance Reporting.
>>>>
>>>> On the ACPI side, I don't think we get any specific information about
>>>> CLKREQ#. Can somebody explain why we do need it on the devicetree
>>>> side?
>>>>
>>>
>>> I think there is a disconnect between enabling L1ss CAP and CLKREQ#
>>> availability.. When L1ss CAP is enabled for the Root Port in the hardware, there
>>> is no guarantee that CLKREQ# is also available. If CLKREQ# is not available,
>>> then if L1ss is enabled by the OS, it is not possible to exit the L1ss states
>>> (assuming that L1ss is entered due to CLKREQ# in deassert (default) state).
>>>
>>> Yes, there seems to be no standard way to know CLKREQ# presence in ACPI, but
>>> in devicetree, we have this 'supports-clkreq' property to tell the OS that
>>> CLKREQ# is available in the platform. But unfortunately, this property is not
>>> widely used by the devicetrees out there. So we cannot use it in generic
>>> pci/aspm.c driver.
>>>
>>> We can certainly rely on the BIOS to enable L1ss as the fw developers would
>>> have the knowledge of the CLKREQ# availability. But BIOS is not a thing on
>>> mobile and embedded platforms where L1ss would come handy.
>>>
>>> What I would suggest is, the host controller drivers (mostly for devicetree
>>> platforms) should enable L1ss CAP for the Root Port only if they know that
>>> CLKREQ# is available. They can either rely on the 'supports-clkreq' property or
>>> some platform specific knowledge (for instance, on all Qcom platforms, we
>>> certainly know that CLKREQ# is available, but we don't set the DT property).
>>
>> While we're on the topic of ASPM, may I ask a silly question?
>> I saw the ASPM would only be configured once the function driver calling
>> pci_enable_device. So if the modular driver hasn't been insmoded, the
>> link will be in L0 even though there is no transcation on-going. What is
>> the intention behind it?
>>
>
> I don't see where ASPM is configured during pci_enable_device(). It is currently
> configured for all devices during pci_scan_slot().
This is the dump_stack() where I observed. If I compile NVMe driver as a
module and never insmod it, the link is always in L0, namely ASPM
Disabled.
[ 0.747508] pci 0000:01:00.0: ASPM: DT platform, enabling L0s-up
L0s-dw L1 ASPM-L1.1 ASPM-L1.2 PCI-PM-L1.1 PCI-PM-L1.2
[ 0.748509] pci 0000:01:00.0: ASPM: DT platform, enabling ClockPM
[ 0.749145] pci 0000:01:00.0: BAR 0 [mem 0xf0200000-0xf0203fff
64bit]: assigned
[ 0.750430] nvme nvme0: pci function 0000:01:00.0
[ 0.750850] CPU: 7 UID: 0 PID: 120 Comm: kworker/u32:7 Not tainted
6.18.0-rc1-00019-gb00f931c2b43-dirty #19 PREEMPT
[ 0.750854] Hardware name: Rockchip RK3588 EVB1 V10 Board (DT)
[ 0.750856] Workqueue: async async_run_entry_fn
[ 0.750863] Call trace:
[ 0.750864] show_stack+0x18/0x24 (C)
[ 0.750869] dump_stack_lvl+0x40/0x84
[ 0.750873] dump_stack+0x18/0x24
[ 0.750876] pcie_config_aspm_link+0x40/0x2dc
[ 0.750882] pcie_aspm_powersave_config_link+0x68/0x104
[ 0.750885] do_pci_enable_device+0x80/0x170
[ 0.750890] pci_enable_device_flags+0x1b8/0x220
[ 0.750894] pci_enable_device_mem+0x14/0x20
[ 0.750898] nvme_pci_enable+0x3c/0x66c
[ 0.750904] nvme_probe+0x5f8/0x7d4
[ 0.750908] pci_device_probe+0x1dc/0x2ac
[ 0.750912] really_probe+0x138/0x2d8
[ 0.750915] __driver_probe_device+0xa0/0x128
[ 0.750918] driver_probe_device+0x3c/0x1f8
[ 0.750921] __device_attach_driver+0x118/0x140
[ 0.750924] bus_for_each_drv+0xf4/0x14c
[ 0.750927] __device_attach_async_helper+0x78/0xd0
[ 0.750930] async_run_entry_fn+0x24/0xdc
[ 0.750933] process_scheduled_works+0x194/0x2c4
[ 0.750937] worker_thread+0x28c/0x394
[ 0.750939] kthread+0x1bc/0x200
[ 0.750943] ret_from_fork+0x10/0x20
[ 0.750947] pcie_config_aspm_link 989
[ 0.753436] ehci-platform fc800000.usb: USB 2.0 started, EHCI 1.00
[ 0.753620] pcie_config_aspm_link 997 state 0x7c
[ 0.754204] hub 4-0:1.0: USB hub found
[ 0.754279] pcie_config_aspm_link 1004 state 0x7c
[ 0.754281] pcie_config_aspm_link 1008
[ 0.754741] hub 4-0:1.0: 1 port detected
[ 0.755153] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[ 0.769741] hub 1-0:1.0: USB hub found
[ 0.770084] hub 1-0:1.0: 1 port detected
[ 0.801662] hub 3-0:1.0: USB hub found
[ 0.802007] hub 3-0:1.0: 1 port detected
[ 0.806393] nvme nvme0: D3 entry latency set to 10 seconds
[ 0.810188] nvme nvme0: 8/0/0 default/read/poll queues
>
> - Mani
>
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-17 12:19 ` Shawn Lin
@ 2025-10-17 12:54 ` Manivannan Sadhasivam
2025-10-17 13:45 ` Bjorn Helgaas
0 siblings, 1 reply; 42+ messages in thread
From: Manivannan Sadhasivam @ 2025-10-17 12:54 UTC (permalink / raw)
To: Shawn Lin
Cc: Manivannan Sadhasivam, Bjorn Helgaas, Niklas Cassel,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
On Fri, Oct 17, 2025 at 08:19:11PM +0800, Shawn Lin wrote:
> 在 2025/10/17 星期五 18:04, Manivannan Sadhasivam 写道:
> > On Fri, Oct 17, 2025 at 05:47:44PM +0800, Shawn Lin wrote:
> > > Hi Mani and Bjorn
> > >
> > > 在 2025/10/17 星期五 11:36, Manivannan Sadhasivam 写道:
> > > > On Wed, Oct 15, 2025 at 06:30:54PM -0500, Bjorn Helgaas wrote:
> > > > > On Wed, Oct 15, 2025 at 09:00:41PM +0800, Shawn Lin wrote:
> > > > > > ...
> > > > >
> > > > > > For now, this is a acceptable option if default ASPM policy enable
> > > > > > L1ss w/o checking if the HW could supports it... But how about
> > > > > > adding supports-clkreq stuff to upstream host driver directly? That
> > > > > > would help folks enable L1ss if the HW is ready and they just need
> > > > > > adding property to the DT.
> > > > > > ...
> > > > >
> > > > > > The L1ss support is quite strict and need several steps to check, so we
> > > > > > didn't add supports-clkreq for them unless the HW is ready to go...
> > > > > >
> > > > > > For adding supports of L1ss,
> > > > > > [1] the HW should support CLKREQ#, expecially for PCIe3.0 case on Rockchip
> > > > > > SoCs , since both CLKREQ# of RC and EP should connect to the
> > > > > > 100MHz crystal generator's enable pin, as L1.2 need to disable refclk as
> > > > > > well. If the enable pin is high active, the HW even need a invertor....
> > > > > >
> > > > > > [2] define proper clkreq iomux to pinctrl of pcie node
> > > > > > [3] make sure the devices work fine with L1ss.(It's hard to check the slot
> > > > > > case with random devices in the wild )
> > > > > > [4] add supports-clkreq to the DT and enable
> > > > > > CONFIG_PCIEASPM_POWER_SUPERSAVE
> > > > >
> > > > > I don't understand the details of the supports-clkreq issue.
> > > > >
> > > > > If we need to add supports-clkreq to devicetree, I want to understand
> > > > > why we need it there when we don't seem to need it for ACPI systems.
> > > > >
> > > > > Generally the OS relies on what the hardware advertises, e.g., in Link
> > > > > Capabilities and the L1 PM Substates Capability, and what is available
> > > > > from firmware, e.g., the ACPI _DSM for Latency Tolerance Reporting.
> > > > >
> > > > > On the ACPI side, I don't think we get any specific information about
> > > > > CLKREQ#. Can somebody explain why we do need it on the devicetree
> > > > > side?
> > > > >
> > > >
> > > > I think there is a disconnect between enabling L1ss CAP and CLKREQ#
> > > > availability.. When L1ss CAP is enabled for the Root Port in the hardware, there
> > > > is no guarantee that CLKREQ# is also available. If CLKREQ# is not available,
> > > > then if L1ss is enabled by the OS, it is not possible to exit the L1ss states
> > > > (assuming that L1ss is entered due to CLKREQ# in deassert (default) state).
> > > >
> > > > Yes, there seems to be no standard way to know CLKREQ# presence in ACPI, but
> > > > in devicetree, we have this 'supports-clkreq' property to tell the OS that
> > > > CLKREQ# is available in the platform. But unfortunately, this property is not
> > > > widely used by the devicetrees out there. So we cannot use it in generic
> > > > pci/aspm.c driver.
> > > >
> > > > We can certainly rely on the BIOS to enable L1ss as the fw developers would
> > > > have the knowledge of the CLKREQ# availability. But BIOS is not a thing on
> > > > mobile and embedded platforms where L1ss would come handy.
> > > >
> > > > What I would suggest is, the host controller drivers (mostly for devicetree
> > > > platforms) should enable L1ss CAP for the Root Port only if they know that
> > > > CLKREQ# is available. They can either rely on the 'supports-clkreq' property or
> > > > some platform specific knowledge (for instance, on all Qcom platforms, we
> > > > certainly know that CLKREQ# is available, but we don't set the DT property).
> > >
> > > While we're on the topic of ASPM, may I ask a silly question?
> > > I saw the ASPM would only be configured once the function driver calling
> > > pci_enable_device. So if the modular driver hasn't been insmoded, the
> > > link will be in L0 even though there is no transcation on-going. What is
> > > the intention behind it?
> > >
> >
> > I don't see where ASPM is configured during pci_enable_device(). It is currently
> > configured for all devices during pci_scan_slot().
>
> This is the dump_stack() where I observed. If I compile NVMe driver as a
> module and never insmod it, the link is always in L0, namely ASPM
> Disabled.
>
I guess this comment answers your question:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/aspm.c?h=v6.18-rc1#n1179
But with the recent ASPM change, the ASPM settings for DT platforms will be
applied before pci_enable_device(). Also, the comment is somewhat outdated as we
generally do not want PCI client drivers to enable/disable ASPM nowadays. They
can however do it under specific circumstances.
- Mani
> [ 0.747508] pci 0000:01:00.0: ASPM: DT platform, enabling L0s-up L0s-dw
> L1 ASPM-L1.1 ASPM-L1.2 PCI-PM-L1.1 PCI-PM-L1.2
> [ 0.748509] pci 0000:01:00.0: ASPM: DT platform, enabling ClockPM
> [ 0.749145] pci 0000:01:00.0: BAR 0 [mem 0xf0200000-0xf0203fff 64bit]:
> assigned
> [ 0.750430] nvme nvme0: pci function 0000:01:00.0
> [ 0.750850] CPU: 7 UID: 0 PID: 120 Comm: kworker/u32:7 Not tainted
> 6.18.0-rc1-00019-gb00f931c2b43-dirty #19 PREEMPT
> [ 0.750854] Hardware name: Rockchip RK3588 EVB1 V10 Board (DT)
> [ 0.750856] Workqueue: async async_run_entry_fn
> [ 0.750863] Call trace:
> [ 0.750864] show_stack+0x18/0x24 (C)
> [ 0.750869] dump_stack_lvl+0x40/0x84
> [ 0.750873] dump_stack+0x18/0x24
> [ 0.750876] pcie_config_aspm_link+0x40/0x2dc
> [ 0.750882] pcie_aspm_powersave_config_link+0x68/0x104
> [ 0.750885] do_pci_enable_device+0x80/0x170
> [ 0.750890] pci_enable_device_flags+0x1b8/0x220
> [ 0.750894] pci_enable_device_mem+0x14/0x20
> [ 0.750898] nvme_pci_enable+0x3c/0x66c
> [ 0.750904] nvme_probe+0x5f8/0x7d4
> [ 0.750908] pci_device_probe+0x1dc/0x2ac
> [ 0.750912] really_probe+0x138/0x2d8
> [ 0.750915] __driver_probe_device+0xa0/0x128
> [ 0.750918] driver_probe_device+0x3c/0x1f8
> [ 0.750921] __device_attach_driver+0x118/0x140
> [ 0.750924] bus_for_each_drv+0xf4/0x14c
> [ 0.750927] __device_attach_async_helper+0x78/0xd0
> [ 0.750930] async_run_entry_fn+0x24/0xdc
> [ 0.750933] process_scheduled_works+0x194/0x2c4
> [ 0.750937] worker_thread+0x28c/0x394
> [ 0.750939] kthread+0x1bc/0x200
> [ 0.750943] ret_from_fork+0x10/0x20
> [ 0.750947] pcie_config_aspm_link 989
> [ 0.753436] ehci-platform fc800000.usb: USB 2.0 started, EHCI 1.00
> [ 0.753620] pcie_config_aspm_link 997 state 0x7c
> [ 0.754204] hub 4-0:1.0: USB hub found
> [ 0.754279] pcie_config_aspm_link 1004 state 0x7c
> [ 0.754281] pcie_config_aspm_link 1008
> [ 0.754741] hub 4-0:1.0: 1 port detected
> [ 0.755153] nvme 0000:01:00.0: enabling device (0000 -> 0002)
> [ 0.769741] hub 1-0:1.0: USB hub found
> [ 0.770084] hub 1-0:1.0: 1 port detected
> [ 0.801662] hub 3-0:1.0: USB hub found
> [ 0.802007] hub 3-0:1.0: 1 port detected
> [ 0.806393] nvme nvme0: D3 entry latency set to 10 seconds
> [ 0.810188] nvme nvme0: 8/0/0 default/read/poll queues
>
>
> >
> > - Mani
> >
>
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-17 12:54 ` Manivannan Sadhasivam
@ 2025-10-17 13:45 ` Bjorn Helgaas
2025-10-31 6:21 ` Manivannan Sadhasivam
0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2025-10-17 13:45 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Shawn Lin, Manivannan Sadhasivam, Niklas Cassel, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
On Fri, Oct 17, 2025 at 06:24:26PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Oct 17, 2025 at 08:19:11PM +0800, Shawn Lin wrote:
> > 在 2025/10/17 星期五 18:04, Manivannan Sadhasivam 写道:
> > > On Fri, Oct 17, 2025 at 05:47:44PM +0800, Shawn Lin wrote:
> ...
> > > > While we're on the topic of ASPM, may I ask a silly question?
> > > > I saw the ASPM would only be configured once the function
> > > > driver calling pci_enable_device. So if the modular driver
> > > > hasn't been insmoded, the link will be in L0 even though there
> > > > is no transcation on-going. What is the intention behind it?
> > >
> > > I don't see where ASPM is configured during pci_enable_device().
> > > It is currently configured for all devices during
> > > pci_scan_slot().
> >
> > This is the dump_stack() where I observed. If I compile NVMe
> > driver as a module and never insmod it, the link is always in L0,
> > namely ASPM Disabled.
>
> I guess this comment answers your question:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/aspm.c?h=v6.18-rc1#n1179
The comment is:
* At this stage drivers haven't had an opportunity to change the
* link policy setting. Enabling ASPM on broken hardware can cripple
* it even before the driver has had a chance to disable ASPM, so
* default to a safe level right now. If we're enabling ASPM beyond
* the BIOS's expectation, we'll do so once pci_enable_device() is
* called.
I don't think relying on a driver to disable ASPM to avoid broken
hardware is the right answer. If the driver is never loaded, we waste
power. And if the user enables ASPM via sysfs, apparently the device
may be crippled.
I think it would be better to have an enumeration-time quirk to keep
us from enabling ASPM. We might trip over some of this broken
hardware, but I don't think there are very many drivers that fiddle
with ASPM, so we should be able to be proactive about it.
> But with the recent ASPM change, the ASPM settings for DT platforms
> will be applied before pci_enable_device(). Also, the comment is
> somewhat outdated as we generally do not want PCI client drivers to
> enable/disable ASPM nowadays. They can however do it under specific
> circumstances.
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-17 13:45 ` Bjorn Helgaas
@ 2025-10-31 6:21 ` Manivannan Sadhasivam
0 siblings, 0 replies; 42+ messages in thread
From: Manivannan Sadhasivam @ 2025-10-31 6:21 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Shawn Lin, Manivannan Sadhasivam, Niklas Cassel, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions, FUKAUMI Naoki
On Fri, Oct 17, 2025 at 08:45:54AM -0500, Bjorn Helgaas wrote:
> On Fri, Oct 17, 2025 at 06:24:26PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Oct 17, 2025 at 08:19:11PM +0800, Shawn Lin wrote:
> > > 在 2025/10/17 星期五 18:04, Manivannan Sadhasivam 写道:
> > > > On Fri, Oct 17, 2025 at 05:47:44PM +0800, Shawn Lin wrote:
> > ...
>
> > > > > While we're on the topic of ASPM, may I ask a silly question?
> > > > > I saw the ASPM would only be configured once the function
> > > > > driver calling pci_enable_device. So if the modular driver
> > > > > hasn't been insmoded, the link will be in L0 even though there
> > > > > is no transcation on-going. What is the intention behind it?
> > > >
> > > > I don't see where ASPM is configured during pci_enable_device().
> > > > It is currently configured for all devices during
> > > > pci_scan_slot().
> > >
> > > This is the dump_stack() where I observed. If I compile NVMe
> > > driver as a module and never insmod it, the link is always in L0,
> > > namely ASPM Disabled.
> >
> > I guess this comment answers your question:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/aspm.c?h=v6.18-rc1#n1179
>
> The comment is:
>
> * At this stage drivers haven't had an opportunity to change the
> * link policy setting. Enabling ASPM on broken hardware can cripple
> * it even before the driver has had a chance to disable ASPM, so
> * default to a safe level right now. If we're enabling ASPM beyond
> * the BIOS's expectation, we'll do so once pci_enable_device() is
> * called.
>
> I don't think relying on a driver to disable ASPM to avoid broken
> hardware is the right answer. If the driver is never loaded, we waste
> power. And if the user enables ASPM via sysfs, apparently the device
> may be crippled.
>
> I think it would be better to have an enumeration-time quirk to keep
> us from enabling ASPM. We might trip over some of this broken
> hardware, but I don't think there are very many drivers that fiddle
> with ASPM, so we should be able to be proactive about it.
>
There are quite a bit of drivers fiddling with ASPM states:
git grep -l PCI_EXP_LNKCTL_ASPMC drivers/ | wc -l
16
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-14 18:49 ` Bjorn Helgaas
2025-10-14 23:33 ` Dragan Simic
2025-10-15 6:26 ` Manivannan Sadhasivam
@ 2025-10-15 12:26 ` Diederik de Haas
2025-10-15 22:50 ` Bjorn Helgaas
2025-10-30 22:14 ` Bjorn Helgaas
3 siblings, 1 reply; 42+ messages in thread
From: Diederik de Haas @ 2025-10-15 12:26 UTC (permalink / raw)
To: Bjorn Helgaas, FUKAUMI Naoki
Cc: manivannan.sadhasivam, Bjorn Helgaas, Manivannan Sadhasivam,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions
On Tue Oct 14, 2025 at 8:49 PM CEST, Bjorn Helgaas wrote:
> On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
>> I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
>> Rockchip RK3588(S) SoC.
>>
>> When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
>> freezes or fails to probe M.2 Wi-Fi modules. This happens with several
>> different modules I've tested, including the Realtek RTL8852BE, MediaTek
>> MT7921E, and Intel AX210.
>>
>> I've found that reverting the following commit (i.e., the patch I'm replying
>> to) resolves the problem:
>> commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
>
> Thanks for the report, and sorry for the regression.
>
> Since this affects several devices from different manufacturers and (I
> assume) different drivers, it seems likely that there's some issue
> with the Rockchip end, since ASPM probably works on these devices in
> other systems. So we should figure out if there's something wrong
> with the way we configure ASPM, which we could potentially fix, or if
> there's a hardware issue and we need some king of quirk to prevent
> usage of ASPM on the affected platforms.
>
> Can you collect a complete dmesg log when booting with
>
> ignore_loglevel pci=earlydump dyndbg="file drivers/pci/* +p"
>
> and the output of "sudo lspci -vv"?
I have a Rock 5B as well, but I don't have a Wi-Fi module, but I do have
a NVMe drive connected. That boots fine with 6.17, but I end up in a
rescue shell with 6.18-rc1. I haven't verified that it's caused by the
same commit, but it does sound plausible.
On this device, the NVMe isn't strictly needed (I used it to compile my
kernels on), so I added 'noauto' to the NVMe line in /etc/fstab ... and
that made it boot successfully into 6.18-rc1. Then running the 'mount'
command wrt that NVMe drive failed with this message:
EXT4-fs (nvme0n1p1): unable to read superblock
The log of my attempts can be found here:
https://paste.sr.ht/~diederik/f435eb258dca60676f7ac5154c00ddfdc24ac0b7
> When the kernel freezes, can you give us any information about where,
> e.g., a log or screenshot?
For me, there is no kernel freeze. I ended up in a rescue shell as it
couldn't mount the NVMe drive. As described above, when not letting it
auto-mount that drive, the boot completed normally.
> Do you know if any platforms other than Radxa ROCK 5A/5B have this
> problem?
I do have a NanoPi R5S (rk3568) which also has a NVMe drive attached,
but that one is mounted on /var ...
> #regzbot introduced: f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
> #regzbot dup-of: https://github.com/chzigotzky/kernels/issues/17
>
>> On 9/23/25 01:16, Manivannan Sadhasivam via B4 Relay wrote:
>> > From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
>> >
>> > So far, the PCI subsystem has honored the ASPM and Clock PM states set by
>> > the BIOS (through LNKCTL) during device initialization, if it relies on the
>> > default state selected using:
>> >
>> > * Kconfig: CONFIG_PCIEASPM_DEFAULT=y, or
>> > * cmdline: "pcie_aspm=off", or
>> > * FADT: ACPI_FADT_NO_ASPM
>> >
>> > This was done conservatively to avoid issues with the buggy devices that
>> > advertise ASPM capabilities, but behave erratically if the ASPM states are
>> > enabled. So the PCI subsystem ended up trusting the BIOS to enable only the
>> > ASPM states that were known to work for the devices.
>> >
>> > But this turned out to be a problem for devicetree platforms, especially
>> > the ARM based devicetree platforms powering Embedded and *some* Compute
>> > devices as they tend to run without any standard BIOS. So the ASPM states
I noticed in Naoki's log that it had lines mentioning 'BIOS', but I do
not. Naoki's log is wrt Rock 5A, while mine is wrt Rock 5B. I don't
suspect that's very relevant though.
Cheers,
Diederik
>> > on these platforms were left disabled during boot and the PCI subsystem
>> > never bothered to enable them, unless the user has forcefully enabled the
>> > ASPM states through Kconfig, cmdline, and sysfs or the device drivers
>> > themselves, enabling the ASPM states through pci_enable_link_state() APIs.
>> >
>> > This caused runtime power issues on those platforms. So a couple of
>> > approaches were tried to mitigate this BIOS dependency without user
>> > intervention by enabling the ASPM states in the PCI controller drivers
>> > after device enumeration, and overriding the ASPM/Clock PM states
>> > by the PCI controller drivers through an API before enumeration.
>> >
>> > But it has been concluded that none of these mitigations should really be
>> > required and the PCI subsystem should enable the ASPM states advertised by
>> > the devices without relying on BIOS or the PCI controller drivers. If any
>> > device is found to be misbehaving after enabling ASPM states that they
>> > advertised, then those devices should be quirked to disable the problematic
>> > ASPM/Clock PM states.
>> >
>> > In an effort to do so, start by overriding the ASPM and Clock PM states set
>> > by the BIOS for devicetree platforms first. Separate helper functions are
>> > introduced to override the BIOS set states by enabling all of them if
>> > of_have_populated_dt() returns true. To aid debugging, print the overridden
>> > ASPM and Clock PM states as well.
>> >
>> > In the future, these helpers could be extended to allow other platforms
>> > like VMD, newer ACPI systems with a cutoff year etc... to follow the path.
>> >
>> > Link: https://lore.kernel.org/linux-pci/20250828204345.GA958461@bhelgaas
>> > Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
>> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
>> > Link: https://patch.msgid.link/20250916-pci-dt-aspm-v1-1-778fe907c9ad@oss.qualcomm.com
>> > ---
>> > drivers/pci/pcie/aspm.c | 42 ++++++++++++++++++++++++++++++++++++++++--
>> > 1 file changed, 40 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> > index 919a05b9764791c3cc469c9ada62ba5b2c405118..cda31150aec1b67b6a48b60569222ea3d1c3d41f 100644
>> > --- a/drivers/pci/pcie/aspm.c
>> > +++ b/drivers/pci/pcie/aspm.c
>> > @@ -15,6 +15,7 @@
>> > #include <linux/math.h>
>> > #include <linux/module.h>
>> > #include <linux/moduleparam.h>
>> > +#include <linux/of.h>
>> > #include <linux/pci.h>
>> > #include <linux/pci_regs.h>
>> > #include <linux/errno.h>
>> > @@ -235,13 +236,15 @@ struct pcie_link_state {
>> > u32 aspm_support:7; /* Supported ASPM state */
>> > u32 aspm_enabled:7; /* Enabled ASPM state */
>> > u32 aspm_capable:7; /* Capable ASPM state with latency */
>> > - u32 aspm_default:7; /* Default ASPM state by BIOS */
>> > + u32 aspm_default:7; /* Default ASPM state by BIOS or
>> > + override */
>> > u32 aspm_disable:7; /* Disabled ASPM state */
>> > /* Clock PM state */
>> > u32 clkpm_capable:1; /* Clock PM capable? */
>> > u32 clkpm_enabled:1; /* Current Clock PM state */
>> > - u32 clkpm_default:1; /* Default Clock PM state by BIOS */
>> > + u32 clkpm_default:1; /* Default Clock PM state by BIOS or
>> > + override */
>> > u32 clkpm_disable:1; /* Clock PM disabled */
>> > };
>> > @@ -373,6 +376,18 @@ static void pcie_set_clkpm(struct pcie_link_state *link, int enable)
>> > pcie_set_clkpm_nocheck(link, enable);
>> > }
>> > +static void pcie_clkpm_override_default_link_state(struct pcie_link_state *link,
>> > + int enabled)
>> > +{
>> > + struct pci_dev *pdev = link->downstream;
>> > +
>> > + /* Override the BIOS disabled Clock PM state for devicetree platforms */
>> > + if (of_have_populated_dt() && !enabled) {
>> > + link->clkpm_default = 1;
>> > + pci_info(pdev, "Clock PM state overridden: ClockPM+\n");
>> > + }
>> > +}
>> > +
>> > static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
>> > {
>> > int capable = 1, enabled = 1;
>> > @@ -395,6 +410,7 @@ static void pcie_clkpm_cap_init(struct pcie_link_state *link, int blacklist)
>> > }
>> > link->clkpm_enabled = enabled;
>> > link->clkpm_default = enabled;
>> > + pcie_clkpm_override_default_link_state(link, enabled);
>> > link->clkpm_capable = capable;
>> > link->clkpm_disable = blacklist ? 1 : 0;
>> > }
>> > @@ -788,6 +804,26 @@ static void aspm_l1ss_init(struct pcie_link_state *link)
>> > aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap);
>> > }
>> > +static void pcie_aspm_override_default_link_state(struct pcie_link_state *link)
>> > +{
>> > + struct pci_dev *pdev = link->downstream;
>> > + u32 override;
>> > +
>> > + /* Override the BIOS disabled ASPM states for devicetree platforms */
>> > + if (of_have_populated_dt()) {
>> > + link->aspm_default = PCIE_LINK_STATE_ASPM_ALL;
>> > + override = link->aspm_default & ~link->aspm_enabled;
>> > + if (override)
>> > + pci_info(pdev, "ASPM states overridden: %s%s%s%s%s%s\n",
>> > + (override & PCIE_LINK_STATE_L0S) ? "L0s+, " : "",
>> > + (override & PCIE_LINK_STATE_L1) ? "L1+, " : "",
>> > + (override & PCIE_LINK_STATE_L1_1) ? "L1.1+, " : "",
>> > + (override & PCIE_LINK_STATE_L1_2) ? "L1.2+, " : "",
>> > + (override & PCIE_LINK_STATE_L1_1_PCIPM) ? "L1.1 PCI-PM+, " : "",
>> > + (override & PCIE_LINK_STATE_L1_2_PCIPM) ? "L1.2 PCI-PM+" : "");
>> > + }
>> > +}
>> > +
>> > static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
>> > {
>> > struct pci_dev *child = link->downstream, *parent = link->pdev;
>> > @@ -868,6 +904,8 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
>> > /* Save default state */
>> > link->aspm_default = link->aspm_enabled;
>> > + pcie_aspm_override_default_link_state(link);
>> > +
>> > /* Setup initial capable state. Will be updated later */
>> > link->aspm_capable = link->aspm_support;
>> >
>>
>>
>> _______________________________________________
>> Linux-rockchip mailing list
>> Linux-rockchip@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-rockchip
>
> _______________________________________________
> Linux-rockchip mailing list
> Linux-rockchip@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-rockchip
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 12:26 ` Diederik de Haas
@ 2025-10-15 22:50 ` Bjorn Helgaas
2025-10-16 17:38 ` Diederik de Haas
0 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2025-10-15 22:50 UTC (permalink / raw)
To: Diederik de Haas
Cc: FUKAUMI Naoki, manivannan.sadhasivam, Bjorn Helgaas,
Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, Dragan Simic, linux-rockchip,
regressions
On Wed, Oct 15, 2025 at 02:26:30PM +0200, Diederik de Haas wrote:
> On Tue Oct 14, 2025 at 8:49 PM CEST, Bjorn Helgaas wrote:
> > On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
> >> I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
> >> Rockchip RK3588(S) SoC.
> >>
> >> When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
> >> freezes or fails to probe M.2 Wi-Fi modules. This happens with several
> >> different modules I've tested, including the Realtek RTL8852BE, MediaTek
> >> MT7921E, and Intel AX210.
> >>
> >> I've found that reverting the following commit (i.e., the patch I'm replying
> >> to) resolves the problem:
> >> commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
> >
> > Thanks for the report, and sorry for the regression.
> >
> > Since this affects several devices from different manufacturers and (I
> > assume) different drivers, it seems likely that there's some issue
> > with the Rockchip end, since ASPM probably works on these devices in
> > other systems. So we should figure out if there's something wrong
> > with the way we configure ASPM, which we could potentially fix, or if
> > there's a hardware issue and we need some king of quirk to prevent
> > usage of ASPM on the affected platforms.
> >
> > Can you collect a complete dmesg log when booting with
> >
> > ignore_loglevel pci=earlydump dyndbg="file drivers/pci/* +p"
> >
> > and the output of "sudo lspci -vv"?
>
> I have a Rock 5B as well, but I don't have a Wi-Fi module, but I do have
> a NVMe drive connected. That boots fine with 6.17, but I end up in a
> rescue shell with 6.18-rc1. I haven't verified that it's caused by the
> same commit, but it does sound plausible.
FWIW, my expectation is that booting with "pcie_aspm=off" should
effectively avoid the ASPM enabling and behave similarly to reverting
f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for
devicetree platforms"). My hope was that we could boot that way and
incrementally enable ASPM via sysfs a device at a time for testing.
If hardware implements ASPM correctly, enabling it should have no
functional impact at all, so we might be tripping over some kind of
hardware bug or maybe a generic Linux issue (ASPM has to be enabled in
a very specific order, and it's conceivable we messed that up).
> On this device, the NVMe isn't strictly needed (I used it to compile my
> kernels on), so I added 'noauto' to the NVMe line in /etc/fstab ... and
> that made it boot successfully into 6.18-rc1. Then running the 'mount'
> command wrt that NVMe drive failed with this message:
>
> EXT4-fs (nvme0n1p1): unable to read superblock
>
> The log of my attempts can be found here:
> https://paste.sr.ht/~diederik/f435eb258dca60676f7ac5154c00ddfdc24ac0b7
>
> > When the kernel freezes, can you give us any information about where,
> > e.g., a log or screenshot?
>
> For me, there is no kernel freeze. I ended up in a rescue shell as it
> couldn't mount the NVMe drive. As described above, when not letting it
> auto-mount that drive, the boot completed normally.
Thanks for the log, it's very useful. This is pieced together from
the serial console log and the "dmesg --level" output, but I think
it's all the same boot:
[ 2.872094] rockchip-dw-pcie a40000000.pcie: PCI host bridge to bus 0000:00
[ 2.885904] pci 0000:00:00.0: [1d87:3588] type 01 class 0x060400 PCIe Root Port
[ 2.888237] pci 0000:00:00.0: PCI bridge to [bus 01-ff]
[ 3.143823] pci 0000:01:00.0: [144d:a80a] type 00 class 0x010802 PCIe Endpoint
[ 3.144646] pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
[ 3.162748] pci 0000:01:00.0: BAR 0 [mem 0xf0200000-0xf0203fff 64bit]: assigned
[ 3.298198] nvme nvme0: pci function 0000:01:00.0
[ 3.298901] nvme 0000:01:00.0: enabling device (0000 -> 0002)
[ 3.316695] nvme nvme0: D3 entry latency set to 10 seconds
...
[ 18.921811] rockchip-pm-domain fd8d8000.power-management:power-controller: sync_state() pending due to fdad0000.npu
[ 18.922737] rockchip-pm-domain fd8d8000.power-management:power-controller: sync_state() pending due to fdb50000.video-codec
...
[ 39.971050] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS read failed (134)
[ 39.971945] nvme nvme0: Does your device have a faulty power saving mode enabled?
[ 39.972609] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
[ 42.357637] nvme0n1: I/O Cmd(0x2) @ LBA 0, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 42.358644] I/O error, dev nvme0n1, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[ 42.391612] nvme 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 42.443644] nvme nvme0: Disabling device after reset failure: -19
[ 42.459544] Buffer I/O error on dev nvme0n1, logical block 0, async page read
[ 42.607749] EXT4-fs (nvme0n1p1): unable to read superblock
The earlydump info shows the 00:00.0 Root Port had I/O+ Mem+
BusMaster+ (0x0107) and the 01:00.0 NVMe initially had I/O- Mem-
BusMaster- (0x0000). We were able to enumerate the NVMe device and
assign its BAR, and the nvme driver turned on Mem+ (0x002).
nvme_timeout
csts = readl(dev->bar + NVME_REG_CSTS)
if (nvme_should_reset(csts))
nvme_warn_reset(csts)
result = pci_read_config_word(PCI_STATUS)
"controller is down; will reset: CSTS=0xffffffff, ... failed (134)"
nvme_dev_disable
But I think the NVMe device was powered down to D3cold somewhere
before 39.971050. I don't know if the power-controller messages at
18.921811 have any connection, and I don't know why ASPM would be
related.
In any event, the NVME_REG_CSTS mem read returned ~0, probably because
the device didn't respond and the RC fabricated ~0. The PCI_STATUS
config read failed with 134 (PCIBIOS_DEVICE_NOT_FOUND). The config
read should be this path, which probably failed because the link was
down, which would happen if NVMe is in D3cold:
pci_read_config_word
if (pci_dev_is_disconnected())
return PCIBIOS_DEVICE_NOT_FOUND
pci_bus_read_config_word
ret = bus->ops->read
dw_pcie_rd_other_conf
pci_generic_config_read
addr = bus->ops->map_bus
dw_pcie_other_conf_map_bus
if (!dw_pcie_link_up())
return pci->ops->link_up
rockchip_pcie_link_up # .link_up
return NULL # link was down
if (!addr) # .map_bus() failed b/c link down
return PCIBIOS_DEVICE_NOT_FOUND
Your lspci shows no response, i.e., config reads to the device
returned ~0:
0000:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd SSD 980 PRO
!!! Unknown header type 7f
Interrupt: pin ? routed to IRQ 94
The Root Port shows a Completion Timeout error, which might be a
consequence of NVMe being powered off:
0000:00:00.0 PCI bridge: Rockchip Electronics Co., Ltd RK3588 (rev 01) (prog-if 00 [Normal decode])
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
Bottom line, I don't think I can get any further with this particular
issue until we confirm that f3ac2ff14834 ("PCI/ASPM: Enable all
ClockPM and ASPM states for devicetree platforms") is the cause.
Bjorn
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-15 22:50 ` Bjorn Helgaas
@ 2025-10-16 17:38 ` Diederik de Haas
0 siblings, 0 replies; 42+ messages in thread
From: Diederik de Haas @ 2025-10-16 17:38 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: FUKAUMI Naoki, manivannan.sadhasivam, Bjorn Helgaas,
Manivannan Sadhasivam, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, Dragan Simic, linux-rockchip,
regressions, Ulf Hansson
On Thu Oct 16, 2025 at 12:50 AM CEST, Bjorn Helgaas wrote:
> On Wed, Oct 15, 2025 at 02:26:30PM +0200, Diederik de Haas wrote:
>> On Tue Oct 14, 2025 at 8:49 PM CEST, Bjorn Helgaas wrote:
>> > On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
>> >> I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
>> >> Rockchip RK3588(S) SoC.
>> >>
>> >> When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
>> >> freezes or fails to probe M.2 Wi-Fi modules. This happens with several
>> >> different modules I've tested, including the Realtek RTL8852BE, MediaTek
>> >> MT7921E, and Intel AX210.
>> >>
>> >> I've found that reverting the following commit (i.e., the patch I'm replying
>> >> to) resolves the problem:
>> >> commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
>> >
>> > Can you collect a complete dmesg log when booting with
>> >
>> > ignore_loglevel pci=earlydump dyndbg="file drivers/pci/* +p"
>> >
>> > and the output of "sudo lspci -vv"?
>>
>> I have a Rock 5B as well, but I don't have a Wi-Fi module, but I do have
>> a NVMe drive connected. That boots fine with 6.17, but I end up in a
>> rescue shell with 6.18-rc1. I haven't verified that it's caused by the
>> same commit, but it does sound plausible.
>
> FWIW, my expectation is that booting with "pcie_aspm=off" should
> effectively avoid the ASPM enabling and behave similarly to reverting
> f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for
> devicetree platforms"). My hope was that we could boot that way and
> incrementally enable ASPM via sysfs a device at a time for testing.
>
> [Moved last lines up here]
> Bottom line, I don't think I can get any further with this particular
> issue until we confirm that f3ac2ff14834 ("PCI/ASPM: Enable all
> ClockPM and ASPM states for devicetree platforms") is the cause.
I built a 6.18-rc1 kernel with that commit reverted and when booted up,
I could mount my NVMe drive. Next I removed the 'noauto' from /etc/fstab
and rebooted and that succeeded as well.
So I think we can conclude that commit f3ac2ff14834 is the cause.
>> On this device, the NVMe isn't strictly needed (I used it to compile my
>> kernels on), so I added 'noauto' to the NVMe line in /etc/fstab ... and
>> that made it boot successfully into 6.18-rc1. Then running the 'mount'
>> command wrt that NVMe drive failed with this message:
>>
>> EXT4-fs (nvme0n1p1): unable to read superblock
>>
>> The log of my attempts can be found here:
>> https://paste.sr.ht/~diederik/f435eb258dca60676f7ac5154c00ddfdc24ac0b7
>
> Thanks for the log, it's very useful. This is pieced together from
> the serial console log and the "dmesg --level" output, but I think
> it's all the same boot:
Correct.
> ...
> [ 18.921811] rockchip-pm-domain fd8d8000.power-management:power-controller: sync_state() pending due to fdad0000.npu
> [ 18.922737] rockchip-pm-domain fd8d8000.power-management:power-controller: sync_state() pending due to fdb50000.video-codec
> ...
>
> The earlydump info shows the 00:00.0 Root Port had I/O+ Mem+
> BusMaster+ (0x0107) and the 01:00.0 NVMe initially had I/O- Mem-
> BusMaster- (0x0000). We were able to enumerate the NVMe device and
> assign its BAR, and the nvme driver turned on Mem+ (0x002).
>
> nvme_timeout
> csts = readl(dev->bar + NVME_REG_CSTS)
> if (nvme_should_reset(csts))
> nvme_warn_reset(csts)
> result = pci_read_config_word(PCI_STATUS)
> "controller is down; will reset: CSTS=0xffffffff, ... failed (134)"
> nvme_dev_disable
>
> But I think the NVMe device was powered down to D3cold somewhere
> before 39.971050. I don't know if the power-controller messages at
> 18.921811 have any connection, and I don't know why ASPM would be
> related.
I highly doubt they're connected. These threads are relevant:
https://lore.kernel.org/all/20250701114733.636510-1-ulf.hansson@linaro.org/
https://lore.kernel.org/all/20250909111130.132976-1-ulf.hansson@linaro.org/
https://lore.kernel.org/all/20251007094312.590819-1-ulf.hansson@linaro.org/
TL;DR: Those warnings will (likely) be downgraded to 'info'.
Cheers,
Diederik
^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-14 18:49 ` Bjorn Helgaas
` (2 preceding siblings ...)
2025-10-15 12:26 ` Diederik de Haas
@ 2025-10-30 22:14 ` Bjorn Helgaas
2025-10-30 22:16 ` Bjorn Helgaas
3 siblings, 1 reply; 42+ messages in thread
From: Bjorn Helgaas @ 2025-10-30 22:14 UTC (permalink / raw)
To: FUKAUMI Naoki
Cc: manivannan.sadhasivam, Bjorn Helgaas, Manivannan Sadhasivam,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions
On Tue, Oct 14, 2025 at 01:49:05PM -0500, Bjorn Helgaas wrote:
> On Wed, Oct 15, 2025 at 01:30:16AM +0900, FUKAUMI Naoki wrote:
> > Hi Manivannan Sadhasivam,
> >
> > I've noticed an issue on Radxa ROCK 5A/5B boards, which are based on the
> > Rockchip RK3588(S) SoC.
> >
> > When running Linux v6.18-rc1 or linux-next since 20250924, the kernel either
> > freezes or fails to probe M.2 Wi-Fi modules. This happens with several
> > different modules I've tested, including the Realtek RTL8852BE, MediaTek
> > MT7921E, and Intel AX210.
> >
> > I've found that reverting the following commit (i.e., the patch I'm replying
> > to) resolves the problem:
> > commit f3ac2ff14834a0aa056ee3ae0e4b8c641c579961
>
> Thanks for the report, and sorry for the regression.
>
> Since this affects several devices from different manufacturers and (I
> assume) different drivers, it seems likely that there's some issue
> with the Rockchip end, since ASPM probably works on these devices in
> other systems. So we should figure out if there's something wrong
> with the way we configure ASPM, which we could potentially fix, or if
> there's a hardware issue and we need some king of quirk to prevent
> usage of ASPM on the affected platforms.
>
> Can you collect a complete dmesg log when booting with
>
> ignore_loglevel pci=earlydump dyndbg="file drivers/pci/* +p"
>
> and the output of "sudo lspci -vv"?
>
> When the kernel freezes, can you give us any information about where,
> e.g., a log or screenshot?
>
> Do you know if any platforms other than Radxa ROCK 5A/5B have this
> problem?
>
> #regzbot introduced: f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
> #regzbot dup-of: https://github.com/chzigotzky/kernels/issues/17
#regzbot report: https://lore.kernel.org/r/22594781424C5C98+22cb5d61-19b1-4353-9818-3bb2b311da0b@radxa.com
^ permalink raw reply [flat|nested] 42+ messages in thread* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2025-10-30 22:14 ` Bjorn Helgaas
@ 2025-10-30 22:16 ` Bjorn Helgaas
0 siblings, 0 replies; 42+ messages in thread
From: Bjorn Helgaas @ 2025-10-30 22:16 UTC (permalink / raw)
To: FUKAUMI Naoki
Cc: manivannan.sadhasivam, Bjorn Helgaas, Manivannan Sadhasivam,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
Dragan Simic, linux-rockchip, regressions
On Thu, Oct 30, 2025 at 05:14:05PM -0500, Bjorn Helgaas wrote:
> On Tue, Oct 14, 2025 at 01:49:05PM -0500, Bjorn Helgaas wrote:
> > #regzbot introduced: f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
> > #regzbot dup-of: https://github.com/chzigotzky/kernels/issues/17
>
> #regzbot report: https://lore.kernel.org/r/22594781424C5C98+22cb5d61-19b1-4353-9818-3bb2b311da0b@radxa.com
Sorry for the noise; I'm trying to figure out how to use regzbot and
making lots of mistakes.
#regzbot fix: df5192d9bb0e
df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
^ permalink raw reply [flat|nested] 42+ messages in thread