From: Hans Zhang <hans.zhang@cixtech.com>
To: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
kbusch@kernel.org, axboe@kernel.dk, hch@lst.de, sagi@grimberg.me,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org
Subject: Re: [PATCH] nvme-pci: Fix system hang when ASPM L1 is enabled during suspend
Date: Sat, 3 May 2025 22:36:30 +0800
Message-ID: <8c590b78-6f54-4ae2-9263-3553b5e27527@cixtech.com>
In-Reply-To: <onw47gzc6mda2unsew36b2cmp2et3ijrjqlmgpueeko5vucgph@wrkaiqlbo2fp>
On 2025/5/3 02:05, Manivannan Sadhasivam wrote:
> EXTERNAL EMAIL
>
> On Sat, May 03, 2025 at 12:20:52AM +0800, Hans Zhang wrote:
>>
>>
>> On 2025/5/3 00:07, Hans Zhang wrote:
>>>
>>>
>>> On 2025/5/2 23:58, Manivannan Sadhasivam wrote:
>>>> On Fri, May 02, 2025 at 11:49:07PM +0800, Hans Zhang wrote:
>>>>>
>>>>>
>>>>> On 2025/5/2 23:00, Bjorn Helgaas wrote:
>>>>>> On Fri, May 02, 2025 at 11:20:51AM +0800, hans.zhang@cixtech.com wrote:
>>>>>>> From: Hans Zhang <hans.zhang@cixtech.com>
>>>>>>>
>>>>>>> When PCIe ASPM L1 is enabled (CONFIG_PCIEASPM_POWERSAVE=y), certain
>>>>>>
>>>>>> CONFIG_PCIEASPM_POWERSAVE=y only sets the default. L1 can be enabled
>>>>>> dynamically regardless of the config.
>>>>>>
>>>>>
>>>>> Dear Bjorn,
>>>>>
>>>>> Thank you very much for your reply.
>>>>>
>>>>> Yes. To reduce the power consumption of the SOC system, we have enabled
>>>>> ASPM L1 by default.
>>>>>
>>>>>>> NVMe controllers fail to release LPI MSI-X interrupts during system
>>>>>>> suspend, leading to a system hang. This occurs because the driver's
>>>>>>> existing power management path does not fully disable the device
>>>>>>> when ASPM is active.
>>>>>>
>>>>>> I have no idea what this has to do with ASPM L1. I do see that
>>>>>> nvme_suspend() tests pcie_aspm_enabled(pdev) (which seems kind of
>>>>>> janky and racy). But this doesn't explain anything about what would
>>>>>> cause a system hang.
>>>>>
>>>>> [ 92.411265] [pid:322,cpu11,kworker/u24:6]nvme 0000:91:00.0: PM: calling pci_pm_suspend_noirq+0x0/0x2c0 @ 322, parent: 0000:90:00.0
>>>>> [ 92.423028] [pid:322,cpu11,kworker/u24:6]nvme 0000:91:00.0: PM: pci_pm_suspend_noirq+0x0/0x2c0 returned 0 after 1 usecs
>>>>> [ 92.433894] [pid:324,cpu10,kworker/u24:7]pcieport 0000:90:00.0: PM: calling pci_pm_suspend_noirq+0x0/0x2c0 @ 324, parent: pci0000:90
>>>>> [ 92.445880] [pid:324,cpu10,kworker/u24:7]pcieport 0000:90:00.0: PM: pci_pm_suspend_noirq+0x0/0x2c0 returned 0 after 39 usecs
>>>>> [ 92.457227] [pid:916,cpu7,bash]sky1-pcie a070000.pcie: PM: calling sky1_pcie_suspend_noirq+0x0/0x174 @ 916, parent: soc@0
>>>>> [ 92.479315] [pid:916,cpu7,bash]cix-pcie-phy a080000.pcie_phy: pcie_phy_common_exit end
>>>>> [ 92.487389] [pid:916,cpu7,bash]sky1-pcie a070000.pcie: sky1_pcie_suspend_noirq
>>>>> [ 92.494604] [pid:916,cpu7,bash]sky1-pcie a070000.pcie: PM: sky1_pcie_suspend_noirq+0x0/0x174 returned 0 after 26379 usecs
>>>>> [ 92.505619] [pid:916,cpu7,bash]sky1-audss-clk 7110000.system-controller:clock-controller: PM: calling genpd_suspend_noirq+0x0/0x80 @ 916, parent: 7110000.system-controller
>>>>> [ 92.520919] [pid:916,cpu7,bash]sky1-audss-clk 7110000.system-controller:clock-controller: PM: genpd_suspend_noirq+0x0/0x80 returned 0 after 1 usecs
>>>>> [ 92.534214] [pid:916,cpu7,bash]Disabling non-boot CPUs ...
>>>>>
>>>>>
>>>>> Hans: Before I added the debug printk, the system hung here.
>>>>>
>>>>>
>>>>> The log above was captured after adding the debug printk.
>>>>>
>>>>> The Sky1 SoC Root Port driver's suspend function is
>>>>> sky1_pcie_suspend_noirq. Our hardware uses STR (suspend to RAM), in which
>>>>> the controller and PHY lose power.
>>>>>
>>>>> So sky1_pcie_suspend_noirq turns off the AXI and APB clocks, etc. of the
>>>>> PCIe controller, and sky1_pcie_resume_noirq reinitializes the PCIe
>>>>> controller and PHY. If suspend did not disable the AXI and APB clocks
>>>>> while resume enabled them again, the enable reference count kept by the
>>>>> kernel clock API would keep accumulating with every suspend/resume cycle.
>>>>>
>>>>
>>>> So this is the actual issue (the controller losing power during system
>>>> suspend), and everything else (ASPM, the MSI-X write) is a side effect of it.
>>>>
>>
>> Dear Mani,
>>
>> There are some things I don't understand here. Why doesn't the NVMe SSD
>> driver release the MSI/MSI-X interrupts when ASPM is enabled, while it does
>> release them when ASPM is not enabled?
>>
>
> You mean by calling pci_free_irq_vectors()? If so, the reason is that if ASPM is
> unavailable, the NVMe device cannot be put into a low-power APST state during
> suspend. So shutting it down is the only sane option to save power, at the
> cost of increased resume latency. But if ASPM is available, the driver
> doesn't shut down the NVMe device, as it relies on APST to keep the NVMe
> controller/memory in low-power mode.
>
Dear Mani,
Thank you for your explanation.
Best regards,
Hans
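
For illustration, the suspend decision Mani describes above can be sketched as
a small standalone C model. This is not the nvme-pci driver code: the struct,
enum, and function names below are hypothetical, the model covers only the
ASPM condition discussed in this thread, and the real nvme_suspend() may check
further conditions. The second helper flags the combination identified earlier
in the thread as the actual issue: the device is left in a low-power state
while the host PCIe controller loses power.

```c
#include <stdbool.h>

/* Hypothetical, simplified inputs to the suspend decision (not kernel types). */
struct suspend_inputs {
    bool aspm_enabled;          /* ASPM is enabled on the link to the NVMe device */
    bool host_ctrl_loses_power; /* host PCIe controller/PHY is powered off in suspend */
};

enum suspend_action {
    SUSPEND_SHUTDOWN,  /* full shutdown: IRQ vectors freed, higher resume latency */
    SUSPEND_LOW_POWER, /* device kept initialized, relying on APST to save power */
};

/*
 * Model of the choice described in the thread: without ASPM the device
 * cannot reach a low-power APST state, so a full shutdown is the only
 * way to save power; with ASPM the device is left initialized.
 */
static enum suspend_action choose_suspend_action(const struct suspend_inputs *in)
{
    if (!in->aspm_enabled)
        return SUSPEND_SHUTDOWN;
    return SUSPEND_LOW_POWER;
}

/*
 * Hypothetical check for the problematic combination: the device is kept
 * initialized (interrupts still allocated) while the host controller it
 * sits behind loses power.
 */
static bool suspend_is_unsafe(const struct suspend_inputs *in)
{
    return choose_suspend_action(in) == SUSPEND_LOW_POWER &&
           in->host_ctrl_loses_power;
}
```

With aspm_enabled and host_ctrl_loses_power both set, as on the Sky1 platform
described above, suspend_is_unsafe() returns true; clearing either field makes
it return false.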