From: Jon Hunter <jonathanh@nvidia.com>
To: Manivannan Sadhasivam <mani@kernel.org>
Cc: "Bjorn Helgaas" <helgaas@kernel.org>,
manivannan.sadhasivam@oss.qualcomm.com,
"Thierry Reding" <treding@nvidia.com>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
"Rob Herring" <robh@kernel.org>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-msm@vger.kernel.org,
"David E. Box" <david.e.box@linux.intel.com>,
"Kai-Heng Feng" <kai.heng.feng@canonical.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
"Heiner Kallweit" <hkallweit1@gmail.com>,
"Chia-Lin Kao" <acelan.kao@canonical.com>,
"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>,
"Keith Busch" <kbusch@kernel.org>, "Jens Axboe" <axboe@kernel.dk>,
"Christoph Hellwig" <hch@lst.de>,
"Sagi Grimberg" <sagi@grimberg.me>,
linux-nvme@lists.infradead.org
Subject: Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
Date: Tue, 12 May 2026 10:07:50 +0100
Message-ID: <daa93cc4-090a-4eb0-91c3-029e0b037b71@nvidia.com>
In-Reply-To: <fb6uzh3jfes3hky6fblpsh2vvg3daij5ogecydiuhmytxbglcb@tdqjcoxuymsk>
On 11/05/2026 06:18, Manivannan Sadhasivam wrote:
> On Thu, May 07, 2026 at 11:25:23AM +0100, Jon Hunter wrote:
>> Hi Bjorn, Mani,
>>
>> On 22/01/2026 15:29, Bjorn Helgaas wrote:
>>> [+cc NVMe folks]
>>>
>>> On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
>>>> ...
>>>
>>>> Since this commit was added in Linux v6.18, I have been observing suspend
>>>> test failures on some of our boards. The suspend test suspends the device
>>>> for 20 secs; before this change the board would resume in about 27 secs
>>>> (including the 20 sec sleep). After this change the board takes over 80
>>>> secs to resume, which triggers a failure.
>>>>
>>>> Looking at the logs, I can see it is the NVMe device on the board that is
>>>> having an issue, and I see the reset failing ...
>>>>
>>>> [ 945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full - flow control rx/tx
>>>> [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID 0 timeout, reset controller
>>>> [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
>>>> [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
>>>> [ 1003.050481] OOM killer enabled.
>>>> [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
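As a rough cross-check of where the time goes, the gap can be measured directly from the log timestamps. The following is a minimal userspace sketch, not a kernel tool; the helper names `log_timestamp()` and `largest_log_gap()` are hypothetical. It parses the `[seconds.micros]` prefix of each line, skips lines without one, and reports the largest jump between consecutive timestamps.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Parse the "[ seconds.micros]" prefix of a kernel log line.
 * Returns 1 and stores the value in *ts on success, 0 otherwise.
 */
static int log_timestamp(const char *line, double *ts)
{
	return sscanf(line, " [%lf", ts) == 1;
}

/*
 * Return the largest gap (in seconds) between consecutive timestamped
 * lines. *after is set to the index of the line that ends that gap, or
 * -1 if fewer than two timestamped lines were found.
 */
static double largest_log_gap(const char *lines[], size_t n, long *after)
{
	double prev = 0.0, best = 0.0, ts;
	int have_prev = 0;

	*after = -1;
	for (size_t i = 0; i < n; i++) {
		if (!log_timestamp(lines[i], &ts))
			continue;	/* no timestamp, e.g. a wrapped continuation */
		if (have_prev && (*after < 0 || ts - prev > best)) {
			best = ts - prev;
			*after = (long)i;
		}
		prev = ts;
		have_prev = 1;
	}
	return best;
}
```

Fed the six log lines above, it reports a gap of roughly 56.7 s, ending at the NVMe admin command timeout and starting at the r8169 link-up message, which is consistent with the delay coming from the NVMe. It only locates the gap; it says nothing about why the command timed out.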
>>>>
>>>> From the above timestamps, the delay is coming from the NVMe. I see this
>>>> issue on several boards with different NVMe devices, and I can work around
>>>> it by disabling ASPM L0s/L1 for these devices ...
>>>>
>>>> DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
>>>> DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
>>>> DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
>>>> DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
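To illustrate what those fixup lines do, here is a simplified userspace model. It assumes the quirk works by dropping the L0s and L1 bits from the device's advertised Link Capabilities so that the ASPM core never enables them. The real PCI core collects DECLARE_PCI_FIXUP_*() entries in a linker section and pci_fixup_device() walks them for each fixup pass; `struct model_dev`, `struct model_fixup` and the `model_*` functions below are hypothetical stand-ins that ignore class matching, fixup passes and the upstream end of the link.

```c
#include <stddef.h>
#include <stdint.h>

/* Link Capabilities ASPM support bits, as in include/uapi/linux/pci_regs.h. */
#define LNKCAP_ASPM_L0S 0x00000400u
#define LNKCAP_ASPM_L1  0x00000800u
#define ANY_ID          0xffffu	/* wildcard, modelled on PCI_ANY_ID */

/* Hypothetical, much-simplified stand-in for struct pci_dev. */
struct model_dev {
	uint16_t vendor, device;
	uint32_t lnkcap;	/* advertised Link Capabilities */
};

/* Hypothetical stand-in for one header fixup table entry. */
struct model_fixup {
	uint16_t vendor, device;
	void (*hook)(struct model_dev *dev);
};

/* Model of the quirk: stop advertising ASPM L0s and L1 for this device. */
static void model_disable_aspm_l0s_l1(struct model_dev *dev)
{
	dev->lnkcap &= ~(LNKCAP_ASPM_L0S | LNKCAP_ASPM_L1);
}

/* The four NVMe devices from the DECLARE_PCI_FIXUP_HEADER() lines above. */
static const struct model_fixup header_fixups[] = {
	{ 0x15b7, 0x5011, model_disable_aspm_l0s_l1 },
	{ 0x15b7, 0x5036, model_disable_aspm_l0s_l1 },
	{ 0x1b4b, 0x1322, model_disable_aspm_l0s_l1 },
	{ 0xc0a9, 0x540a, model_disable_aspm_l0s_l1 },
};

/* Run every entry whose vendor/device ID matches (or is a wildcard). */
static void model_apply_fixups(struct model_dev *dev)
{
	size_t n = sizeof(header_fixups) / sizeof(header_fixups[0]);

	for (size_t i = 0; i < n; i++) {
		const struct model_fixup *f = &header_fixups[i];

		if ((f->vendor == dev->vendor || f->vendor == ANY_ID) &&
		    (f->device == dev->device || f->device == ANY_ID))
			f->hook(dev);
	}
}
```

A table keyed on vendor/device ID keeps the workaround narrowly scoped to the listed NVMe parts, which is also its weakness: every further drive that shows the same behaviour needs its own entry.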
>>>>
>>>> Have you seen any similar issues?
>>>>
>>>> Other PCIe devices (such as the Realtek r8169) seem to be OK; only
>>>> the NVMe is having issues. So I am trying to figure out the best way
>>>> to resolve this.
>>>
>>> For context, "this commit" refers to f3ac2ff14834, modified by
>>> df5192d9bb0e:
>>>
>>> f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
>>> df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
>>>
>>> The fact that this suspend issue only affects NVMe reminds me of the
>>> code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
>>> enabled because of some NVMe expectation:
>>>
>>>     dw_pcie_suspend_noirq()
>>>     {
>>>         ...
>>>         /*
>>>          * If L1SS is supported, then do not put the link into L2 as some
>>>          * devices such as NVMe expect low resume latency.
>>>          */
>>>         if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
>>>             return 0;
>>>         ...
>>>     }
>>>
>>> That suggests there's some NVMe/ASPM interaction that the PCI core
>>> doesn't understand yet.
>>>
>>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-designware-host.c?id=v6.18#n1146
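The quoted check can be modelled in a few lines of userspace C to show how a default ASPM policy would flip it. This is a hypothetical, illustrative sketch, not the DesignWare driver: `model_default_lnkctl()` assumes a policy that simply enables whichever of L0s and L1 the link advertises (the real ASPM core also considers both ends of the link and exit latencies), and `model_skips_l2()` mirrors only the early return. Note that although the comment mentions L1SS, the condition actually tests the ASPM L1 enable bit in Link Control. Under these assumptions, a link that supports L1 and previously had ASPM left disabled by firmware would now take the early return and stay out of L2 across suspend.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASPM bits, with values as in include/uapi/linux/pci_regs.h. */
#define LNKCAP_ASPM_L0S 0x00000400u	/* Link Capabilities: L0s supported */
#define LNKCAP_ASPM_L1  0x00000800u	/* Link Capabilities: L1 supported */
#define LNKCTL_ASPM_L0S 0x0001u		/* Link Control: L0s enabled */
#define LNKCTL_ASPM_L1  0x0002u		/* Link Control: L1 enabled */

/*
 * Hypothetical "enable L0s and L1 by default" policy: turn on whichever of
 * the two the link advertises. The real ASPM core also looks at both ends
 * of the link and at exit latencies; this model does not.
 */
static uint16_t model_default_lnkctl(uint32_t lnkcap)
{
	uint16_t lnkctl = 0;

	if (lnkcap & LNKCAP_ASPM_L0S)
		lnkctl |= LNKCTL_ASPM_L0S;
	if (lnkcap & LNKCAP_ASPM_L1)
		lnkctl |= LNKCTL_ASPM_L1;
	return lnkctl;
}

/*
 * Mirrors only the early return quoted from dw_pcie_suspend_noirq(): if
 * ASPM L1 is enabled in Link Control, suspend returns without moving the
 * link to L2.
 */
static bool model_skips_l2(uint16_t lnkctl)
{
	return (lnkctl & LNKCTL_ASPM_L1) != 0;
}
```

In this model, removing the L1 capability (as the quirks above do for the endpoint) is enough to send the link back down the L2 path.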
>>
>>
>> I want to revisit this issue. From my perspective, low-power suspend has
>> been broken on some of our Tegra platforms (those with NVMe devices) since
>> v6.19, and so far there is no resolution. The patch that was proposed to
>> fix this [0] has been rejected by Qualcomm, and although it does work
>> around the issue, my confidence that it is the right fix is now low.
>>
>
> The referenced patch is now merged into arm-soc for v7.2:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=7602c0ec0bbfd3985d49f4f0cad281c1414008c9
>
> I hope this takes care of the issue you are dealing with.

Well, yes, this patch does fix the issue for us. However, I am still a bit
confused about this whole thing, given that this patch does not work for
all Qualcomm platforms. Anyway, we have not seen any other issues with the
above so far, so I guess we can consider this closed for now.

Thanks
Jon
--
nvpublic