From: Hans Zhang <hans.zhang@cixtech.com>
To: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
kbusch@kernel.org, axboe@kernel.dk, hch@lst.de, sagi@grimberg.me,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org
Subject: Re: [PATCH] nvme-pci: Fix system hang when ASPM L1 is enabled during suspend
Date: Sat, 3 May 2025 00:20:52 +0800
Message-ID: <58e343d9-adf3-4853-9dec-df7c1892d6b2@cixtech.com>
In-Reply-To: <433f2678-86c1-4ff6-88d1-7ed485cf44b7@cixtech.com>
On 2025/5/3 00:07, Hans Zhang wrote:
>
>
> On 2025/5/2 23:58, Manivannan Sadhasivam wrote:
>> On Fri, May 02, 2025 at 11:49:07PM +0800, Hans Zhang wrote:
>>>
>>>
>>> On 2025/5/2 23:00, Bjorn Helgaas wrote:
>>>> On Fri, May 02, 2025 at 11:20:51AM +0800, hans.zhang@cixtech.com wrote:
>>>>> From: Hans Zhang <hans.zhang@cixtech.com>
>>>>>
>>>>> When PCIe ASPM L1 is enabled (CONFIG_PCIEASPM_POWERSAVE=y), certain
>>>>
>>>> CONFIG_PCIEASPM_POWERSAVE=y only sets the default. L1 can be enabled
>>>> dynamically regardless of the config.
>>>>
>>>
>>> Dear Bjorn,
>>>
>>> Thank you very much for your reply.
>>>
>>> Yes. To reduce the power consumption of the SOC system, we have
>>> enabled ASPM L1 by default.
>>>
>>>>> NVMe controllers fail to release LPI MSI-X interrupts during system
>>>>> suspend, leading to a system hang. This occurs because the driver's
>>>>> existing power management path does not fully disable the device
>>>>> when ASPM is active.
>>>>
>>>> I have no idea what this has to do with ASPM L1. I do see that
>>>> nvme_suspend() tests pcie_aspm_enabled(pdev) (which seems kind of
>>>> janky and racy). But this doesn't explain anything about what would
>>>> cause a system hang.
>>>
>>> [ 92.411265] [pid:322,cpu11,kworker/u24:6]nvme 0000:91:00.0: PM: calling pci_pm_suspend_noirq+0x0/0x2c0 @ 322, parent: 0000:90:00.0
>>> [ 92.423028] [pid:322,cpu11,kworker/u24:6]nvme 0000:91:00.0: PM: pci_pm_suspend_noirq+0x0/0x2c0 returned 0 after 1 usecs
>>> [ 92.433894] [pid:324,cpu10,kworker/u24:7]pcieport 0000:90:00.0: PM: calling pci_pm_suspend_noirq+0x0/0x2c0 @ 324, parent: pci0000:90
>>> [ 92.445880] [pid:324,cpu10,kworker/u24:7]pcieport 0000:90:00.0: PM: pci_pm_suspend_noirq+0x0/0x2c0 returned 0 after 39 usecs
>>> [ 92.457227] [pid:916,cpu7,bash]sky1-pcie a070000.pcie: PM: calling sky1_pcie_suspend_noirq+0x0/0x174 @ 916, parent: soc@0
>>> [ 92.479315] [pid:916,cpu7,bash]cix-pcie-phy a080000.pcie_phy: pcie_phy_common_exit end
>>> [ 92.487389] [pid:916,cpu7,bash]sky1-pcie a070000.pcie: sky1_pcie_suspend_noirq
>>> [ 92.494604] [pid:916,cpu7,bash]sky1-pcie a070000.pcie: PM: sky1_pcie_suspend_noirq+0x0/0x174 returned 0 after 26379 usecs
>>> [ 92.505619] [pid:916,cpu7,bash]sky1-audss-clk 7110000.system-controller:clock-controller: PM: calling genpd_suspend_noirq+0x0/0x80 @ 916, parent: 7110000.system-controller
>>> [ 92.520919] [pid:916,cpu7,bash]sky1-audss-clk 7110000.system-controller:clock-controller: PM: genpd_suspend_noirq+0x0/0x80 returned 0 after 1 usecs
>>> [ 92.534214] [pid:916,cpu7,bash]Disabling non-boot CPUs ...
>>>
>>>
>>> Hans: Before I added the printk for debugging, it hung here.
>>>
>>>
>>> I added the log output after debugging printk.
>>>
>>> Sky1 SoC Root Port driver's suspend function: sky1_pcie_suspend_noirq
>>> Our hardware uses STR (suspend to RAM), and the controller and PHY will
>>> lose power.
>>>
>>> So in sky1_pcie_suspend_noirq, the AXI and APB clocks, etc. of the PCIe
>>> controller are turned off. In sky1_pcie_resume_noirq, the PCIe controller
>>> and PHY are reinitialized. If suspend did not disable the AXI and APB
>>> clocks while resume enabled them again, the enable reference counts held
>>> by the kernel clock API would keep accumulating.
>>>
>>
>> So this is the actual issue (controller losing power during system suspend)
>> and everything else (ASPM, MSIX write) are all side effects of it.
>>

Dear Mani,

There is one thing I still don't understand. Why doesn't the NVMe driver
release its MSI/MSI-X interrupts during suspend when ASPM is enabled, while
it does release them when ASPM is not enabled?

Best regards,
Hans

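
For reference, the behaviour asked about above appears to come from a single
branch in nvme_suspend() (drivers/nvme/host/pci.c). The sketch below is a
simplified, user-space model of that decision as I understand it, not the
driver code itself: struct suspend_inputs, enum suspend_path and
choose_suspend_path() are hypothetical names made up for illustration, and
the exact set of conditions may differ between kernel versions.

```c
#include <stdbool.h>

/* Hypothetical model of the inputs nvme_suspend() is assumed to look at. */
struct suspend_inputs {
    bool suspend_via_firmware;  /* platform firmware handles the sleep state */
    unsigned int npss;          /* number of NVMe power states the controller reports */
    bool aspm_enabled;          /* what pcie_aspm_enabled(pdev) would return */
    bool simple_suspend_quirk;  /* device quirked to always take the shutdown path */
};

enum suspend_path {
    SUSPEND_FULL_SHUTDOWN,      /* controller disabled; queues and IRQs torn down */
    SUSPEND_NVME_POWER_STATE    /* controller stays live in a low NVMe power state */
};

/*
 * Any one of these conditions selects the full shutdown path. Only when
 * none of them holds does the device stay initialised, which is the case
 * where its MSI/MSI-X vectors remain allocated across suspend.
 */
static enum suspend_path choose_suspend_path(const struct suspend_inputs *in)
{
    if (in->suspend_via_firmware || in->npss == 0 ||
        !in->aspm_enabled || in->simple_suspend_quirk)
        return SUSPEND_FULL_SHUTDOWN;

    return SUSPEND_NVME_POWER_STATE;
}
```

Under this model, a controller with ASPM enabled, a non-zero power state
count, no quirk and no firmware-managed suspend takes the power-state path
and keeps its interrupts. Clearing aspm_enabled alone is enough to switch it
to the full shutdown path, which matches the difference described above.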
>> Yes, this issue is more common with several vendors and we need to come up
>> with a generic solution instead of hacking up the client drivers. I'm
>> planning to work on it in the coming days. Will keep you in the loop.
>>
>>
>
> Dear Mani,
>
> Thank you very much for your reply and for helping to solve this problem.
> If possible, I'd be very glad to help with testing.
>
> Best regards,
> Hans
>
>
>