* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
[not found] <7306256a-b380-489b-8248-b774e6d3d80e@nvidia.com>
@ 2026-01-22 15:29 ` Bjorn Helgaas
2026-01-22 17:01 ` Manivannan Sadhasivam
0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 15:29 UTC (permalink / raw)
To: Jon Hunter
Cc: manivannan.sadhasivam, Bjorn Helgaas, Manivannan Sadhasivam,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-tegra@vger.kernel.org, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg, linux-nvme
[+cc NVMe folks]
On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
> ...
> Since this commit was added in Linux v6.18, I have been observing suspend
> test failures on some of our boards. The suspend test suspends the devices
> for 20 secs, and before this change the board would resume in ~27 secs
> (including the 20 sec sleep). After this change the board would take over 80
> secs to resume, and this triggered a failure.
>
> Looking at the logs, I can see it is the NVMe device on the board that is
> having an issue, and I see the reset failing ...
>
> [ 945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
> flow control rx/tx
> [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
> 0 timeout, reset controller
> [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
> [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
> [ 1003.050481] OOM killer enabled.
> [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
>
> From the above timestamps the delay is coming from the NVMe. I see this
> issue on several boards with different NVMe devices, and I can work around
> it by disabling ASPM L0s/L1 for these devices ...
>
> DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
> DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
> DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
> DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
>
> I am curious if you have seen any similar issues?
>
> Other PCIe devices seem to be OK (like the Realtek r8169); it is just
> the NVMe that is having issues. So I am trying to figure out the best way
> to resolve this.
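The four fixups above only match on a (vendor ID, device ID) pair and then disable ASPM L0s/L1 for that device. A standalone model of the matching (plain C, not kernel code; the table and helper names are made up for illustration) can be sketched as:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Standalone model of the workaround's effect: each
 * DECLARE_PCI_FIXUP_HEADER() entry above pairs one (vendor, device) ID
 * with the quirk that disables ASPM L0s/L1. Names here are invented.
 */
struct pci_id { uint16_t vendor, device; };

static const struct pci_id aspm_quirk_ids[] = {
	{ 0x15b7, 0x5011 },
	{ 0x15b7, 0x5036 },
	{ 0x1b4b, 0x1322 },
	{ 0xc0a9, 0x540a },
};

/* true when the workaround would disable ASPM for this device */
static bool aspm_quirk_matches(uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < sizeof(aspm_quirk_ids) / sizeof(aspm_quirk_ids[0]); i++)
		if (aspm_quirk_ids[i].vendor == vendor &&
		    aspm_quirk_ids[i].device == device)
			return true;
	return false;
}
```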
For context, "this commit" refers to f3ac2ff14834, modified by
df5192d9bb0e:
f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
The fact that this suspend issue only affects NVMe reminds me of the
code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
enabled because of some NVMe expectation:
dw_pcie_suspend_noirq()
{
...
/*
* If L1SS is supported, then do not put the link into L2 as some
* devices such as NVMe expect low resume latency.
*/
if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
return 0;
...
That suggests there's some NVMe/ASPM interaction that the PCI core
doesn't understand yet.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/dwc/pcie-designware-host.c?id=v6.18#n1146
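That bail-out reduces to a single bit test on the Link Control register: in PCIe, the ASPM Control field occupies bits 1:0, and bit 1 (the value of PCI_EXP_LNKCTL_ASPM_L1, 0x2) set means L1 entry is enabled. A standalone model of the test (plain C, not the DWC driver itself):

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors PCI_EXP_LNKCTL_ASPM_L1 (bit 1 of the ASPM Control field). */
#define LNKCTL_ASPM_L1 0x0002u

/*
 * Returns true when dw_pcie_suspend_noirq() would return early for
 * this Link Control value, leaving the link out of L2.
 */
static bool suspend_skips_l2(uint16_t lnkctl)
{
	return (lnkctl & LNKCTL_ASPM_L1) != 0;
}
```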
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-01-22 15:29 ` [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms Bjorn Helgaas
@ 2026-01-22 17:01 ` Manivannan Sadhasivam
2026-01-22 19:14 ` Jon Hunter
2026-02-16 17:19 ` Claudiu Beznea
0 siblings, 2 replies; 21+ messages in thread
From: Manivannan Sadhasivam @ 2026-01-22 17:01 UTC (permalink / raw)
To: Bjorn Helgaas, Jon Hunter
Cc: manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On Thu, Jan 22, 2026 at 09:29:03AM -0600, Bjorn Helgaas wrote:
> [+cc NVMe folks]
>
> On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
> > ...
>
> > Since this commit was added in Linux v6.18, I have been observing suspend
> > test failures on some of our boards. The suspend test suspends the devices
> > for 20 secs, and before this change the board would resume in ~27 secs
> > (including the 20 sec sleep). After this change the board would take over 80
> > secs to resume, and this triggered a failure.
> >
> > Looking at the logs, I can see it is the NVMe device on the board that is
> > having an issue, and I see the reset failing ...
> >
> > [ 945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
> > flow control rx/tx
> > [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
> > 0 timeout, reset controller
> > [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
> > [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
> > [ 1003.050481] OOM killer enabled.
> > [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
> >
> > From the above timestamps the delay is coming from the NVMe. I see this
> > issue on several boards with different NVMe devices, and I can work around
> > it by disabling ASPM L0s/L1 for these devices ...
> >
> > DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
> > DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
> > DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
> > DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
> >
> > I am curious if you have seen any similar issues?
> >
> > Other PCIe devices seem to be OK (like the Realtek r8169); it is just
> > the NVMe that is having issues. So I am trying to figure out the best way
> > to resolve this.
>
> For context, "this commit" refers to f3ac2ff14834, modified by
> df5192d9bb0e:
>
> f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
> df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
>
> The fact that this suspend issue only affects NVMe reminds me of the
> code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
> enabled because of some NVMe expectation:
>
> dw_pcie_suspend_noirq()
> {
> ...
> /*
> * If L1SS is supported, then do not put the link into L2 as some
> * devices such as NVMe expect low resume latency.
> */
> if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
> return 0;
> ...
>
> That suggests there's some NVMe/ASPM interaction that the PCI core
> doesn't understand yet.
>
We have this check in place because the NVMe driver keeps the device in D0 and
expects the link to be in L1ss on platforms that do not pass the below checks:
if (pm_suspend_via_firmware() || !ctrl->npss ||
!pcie_aspm_enabled(pdev) ||
(ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
Since the majority of the DWC platforms do not pass the above checks, we don't
transition the device to D3Cold or the link to L2/L3 in dw_pcie_suspend_noirq()
if the link is in L1ss. Though I think we should be checking for the D0 state
instead of L1ss here.
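Read as plain boolean logic, the quoted condition decides between a full shutdown and the keep-in-D0/APST path. A standalone model (plain C, not kernel code; the struct and its field names are simplifications invented here to stand in for pm_suspend_via_firmware(), ctrl->npss, pcie_aspm_enabled(pdev) and NVME_QUIRK_SIMPLE_SUSPEND):

```c
#include <stdbool.h>

/* Simplified stand-in for the inputs to the quoted nvme_suspend() check. */
struct nvme_suspend_model {
	bool suspend_via_firmware;	/* pm_suspend_via_firmware() */
	unsigned int npss;		/* number of power states supported */
	bool aspm_enabled;		/* pcie_aspm_enabled(pdev) */
	bool quirk_simple_suspend;	/* NVME_QUIRK_SIMPLE_SUSPEND */
};

/*
 * true  -> full shutdown path (nvme_disable_prepare_reset());
 * false -> keep the device in D0 and rely on APST across suspend.
 */
static bool nvme_wants_full_shutdown(const struct nvme_suspend_model *m)
{
	return m->suspend_via_firmware || !m->npss ||
	       !m->aspm_enabled || m->quirk_simple_suspend;
}
```

With ASPM newly enabled by default, the `!m->aspm_enabled` leg stops firing, which is exactly the behavior change the thread is debugging.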
I think what is going on here is that before commits f3ac2ff14834 and
df5192d9bb0e, the !pcie_aspm_enabled() check passed because ASPM was not
enabled for the device (and upstream port). After those commits, that check no
longer passes, so the NVMe driver does not shut down the controller and
expects the link to stay in L0/L1ss. But the Tegra controller driver initiates
the L2/L3 transition and also turns off the device. So all the NVMe context is
lost during suspend, and while resuming, the NVMe driver gets confused due to
the lost context.
Jon, could you please try the below hack and see if it fixes the issue?
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 0e4caeab739c..4b8d261117f5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
* state (which may not be possible if the link is up).
*/
if (pm_suspend_via_firmware() || !ctrl->npss ||
- !pcie_aspm_enabled(pdev) ||
+ pcie_aspm_enabled(pdev) ||
(ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
return nvme_disable_prepare_reset(ndev, true);
This will confirm whether the issue is due to the Tegra controller driver
breaking the NVMe driver's assumption.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-01-22 17:01 ` Manivannan Sadhasivam
@ 2026-01-22 19:14 ` Jon Hunter
2026-01-23 10:55 ` Jon Hunter
2026-02-16 17:19 ` Claudiu Beznea
1 sibling, 1 reply; 21+ messages in thread
From: Jon Hunter @ 2026-01-22 19:14 UTC (permalink / raw)
To: Manivannan Sadhasivam, Bjorn Helgaas
Cc: manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On 22/01/2026 17:01, Manivannan Sadhasivam wrote:
> On Thu, Jan 22, 2026 at 09:29:03AM -0600, Bjorn Helgaas wrote:
>> [+cc NVMe folks]
>>
>> On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
>>> ...
>>
>>> Since this commit was added in Linux v6.18, I have been observing suspend
>>> test failures on some of our boards. The suspend test suspends the devices
>>> for 20 secs, and before this change the board would resume in ~27 secs
>>> (including the 20 sec sleep). After this change the board would take over 80
>>> secs to resume, and this triggered a failure.
>>>
>>> Looking at the logs, I can see it is the NVMe device on the board that is
>>> having an issue, and I see the reset failing ...
>>>
>>> [ 945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
>>> flow control rx/tx
>>> [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
>>> 0 timeout, reset controller
>>> [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
>>> [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
>>> [ 1003.050481] OOM killer enabled.
>>> [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
>>>
>>> From the above timestamps the delay is coming from the NVMe. I see this
>>> issue on several boards with different NVMe devices, and I can work around
>>> it by disabling ASPM L0s/L1 for these devices ...
>>>
>>> DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
>>> DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
>>> DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
>>> DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
>>>
>>> I am curious if you have seen any similar issues?
>>>
>>> Other PCIe devices seem to be OK (like the Realtek r8169); it is just
>>> the NVMe that is having issues. So I am trying to figure out the best way
>>> to resolve this.
>>
>> For context, "this commit" refers to f3ac2ff14834, modified by
>> df5192d9bb0e:
>>
>> f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
>> df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
>>
>> The fact that this suspend issue only affects NVMe reminds me of the
>> code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
>> enabled because of some NVMe expectation:
>>
>> dw_pcie_suspend_noirq()
>> {
>> ...
>> /*
>> * If L1SS is supported, then do not put the link into L2 as some
>> * devices such as NVMe expect low resume latency.
>> */
>> if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
>> return 0;
>> ...
>>
>> That suggests there's some NVMe/ASPM interaction that the PCI core
>> doesn't understand yet.
>>
>
> We have this check in place because the NVMe driver keeps the device in D0 and
> expects the link to be in L1ss on platforms that do not pass the below checks:
>
> if (pm_suspend_via_firmware() || !ctrl->npss ||
> !pcie_aspm_enabled(pdev) ||
> (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
>
> Since the majority of the DWC platforms do not pass the above checks, we don't
> transition the device to D3Cold or the link to L2/L3 in dw_pcie_suspend_noirq()
> if the link is in L1ss. Though I think we should be checking for the D0 state
> instead of L1ss here.
>
> I think what is going on here is that before commits f3ac2ff14834 and
> df5192d9bb0e, the !pcie_aspm_enabled() check passed because ASPM was not
> enabled for the device (and upstream port). After those commits, that check no
> longer passes, so the NVMe driver does not shut down the controller and
> expects the link to stay in L0/L1ss. But the Tegra controller driver initiates
> the L2/L3 transition and also turns off the device. So all the NVMe context is
> lost during suspend, and while resuming, the NVMe driver gets confused due to
> the lost context.
>
> Jon, could you please try the below hack and see if it fixes the issue?
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 0e4caeab739c..4b8d261117f5 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
> * state (which may not be possible if the link is up).
> */
> if (pm_suspend_via_firmware() || !ctrl->npss ||
> - !pcie_aspm_enabled(pdev) ||
> + pcie_aspm_enabled(pdev) ||
> (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
> return nvme_disable_prepare_reset(ndev, true);
>
> This will confirm whether the issue is due to the Tegra controller driver
> breaking the NVMe driver's assumption.
Yes that appears to be working! I will test some more boards to confirm.
Cheers
Jon
--
nvpublic
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-01-22 19:14 ` Jon Hunter
@ 2026-01-23 10:55 ` Jon Hunter
2026-01-23 13:56 ` Manivannan Sadhasivam
0 siblings, 1 reply; 21+ messages in thread
From: Jon Hunter @ 2026-01-23 10:55 UTC (permalink / raw)
To: Manivannan Sadhasivam, Bjorn Helgaas
Cc: manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On 22/01/2026 19:14, Jon Hunter wrote:
...
>> I think what is going on here is that before commits f3ac2ff14834 and
>> df5192d9bb0e, the !pcie_aspm_enabled() check passed because ASPM was not
>> enabled for the device (and upstream port). After those commits, that check
>> no longer passes, so the NVMe driver does not shut down the controller and
>> expects the link to stay in L0/L1ss. But the Tegra controller driver
>> initiates the L2/L3 transition and also turns off the device. So all the
>> NVMe context is lost during suspend, and while resuming, the NVMe driver
>> gets confused due to the lost context.
>>
>> Jon, could you please try the below hack and see if it fixes the issue?
>>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 0e4caeab739c..4b8d261117f5 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
>> * state (which may not be possible if the link is up).
>> */
>> if (pm_suspend_via_firmware() || !ctrl->npss ||
>> - !pcie_aspm_enabled(pdev) ||
>> + pcie_aspm_enabled(pdev) ||
>> (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
>> return nvme_disable_prepare_reset(ndev, true);
>> This will confirm whether the issue is due to the Tegra controller driver
>> breaking the NVMe driver's assumption.
>
> Yes that appears to be working! I will test some more boards to confirm.
So yes, with the above, all boards appear to be working fine.
How is this usually coordinated between the NVMe driver and the host
controller driver? It is not clear to me exactly where the problem is,
and if the NVMe is not shutting down, what should be preventing the
host controller from shutting down?
Thanks
Jon
--
nvpublic
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-01-23 10:55 ` Jon Hunter
@ 2026-01-23 13:56 ` Manivannan Sadhasivam
2026-01-23 14:39 ` Jon Hunter
2026-02-16 14:03 ` Jon Hunter
0 siblings, 2 replies; 21+ messages in thread
From: Manivannan Sadhasivam @ 2026-01-23 13:56 UTC (permalink / raw)
To: Jon Hunter
Cc: Bjorn Helgaas, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-tegra@vger.kernel.org, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg, linux-nvme, krishna.chundru
+ Krishna
On Fri, Jan 23, 2026 at 10:55:28AM +0000, Jon Hunter wrote:
>
> On 22/01/2026 19:14, Jon Hunter wrote:
>
> ...
>
> > > I think what is going on here is that before commits f3ac2ff14834 and
> > > df5192d9bb0e, the !pcie_aspm_enabled() check passed because ASPM was not
> > > enabled for the device (and upstream port). After those commits, that
> > > check no longer passes, so the NVMe driver does not shut down the
> > > controller and expects the link to stay in L0/L1ss. But the Tegra
> > > controller driver initiates the L2/L3 transition and also turns off the
> > > device. So all the NVMe context is lost during suspend, and while
> > > resuming, the NVMe driver gets confused due to the lost context.
> > >
> > > Jon, could you please try the below hack and see if it fixes the issue?
> > >
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > index 0e4caeab739c..4b8d261117f5 100644
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
> > > * state (which may not be possible if the link is up).
> > > */
> > > if (pm_suspend_via_firmware() || !ctrl->npss ||
> > > - !pcie_aspm_enabled(pdev) ||
> > > + pcie_aspm_enabled(pdev) ||
> > > (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
> > > return nvme_disable_prepare_reset(ndev, true);
> > > This will confirm whether the issue is due to the Tegra controller
> > > driver breaking the NVMe driver's assumption.
> >
> > Yes that appears to be working! I will test some more boards to confirm.
>
> So yes, with the above, all boards appear to be working fine.
>
> How is this usually coordinated between the NVMe driver and the host
> controller driver? It is not clear to me exactly where the problem is,
> and if the NVMe is not shutting down, what should be preventing the
> host controller from shutting down?
>
Well, if the NVMe driver is not shutting down the device, then it expects the
device to stay in an APST state (an NVMe low power state, if supported) and to
retain all its context across the suspend/resume cycle.
But if the host controller powers down the device, then during resume the
device will start afresh, having lost all its context (like queue info, etc.).
So when the NVMe driver resumes, it expects the device to have retained its
context and tries to use the device as such. But that won't work, as the
device will be in an unconfigured state, and you'll see failures like the ones
you reported.
Apparently, most host controller drivers never cared about this because either
they were not tested with NVMe or they had not enabled ASPM before, so the
NVMe driver ended up shutting down the controller during suspend. But since we
started enabling ASPM by default in v6.18, this issue is being uncovered.
So to properly fix it, we need the controller drivers to perform the below
checks for all devices under the root bus(es) before initiating D3Cold:
1. Check whether the device is in D3Hot. If it is not, the client driver
expects the device to stay in its current D-state, so D3Cold should not be
initiated.
2. Check whether the device is wakeup capable. If it is, then check whether it
can support wakeup from D3Cold (with WAKE#).
Only if both conditions are satisfied for all the devices under the root buses
should the host controller driver initiate the D3Cold sequence.
Krishna is going to post a patch that performs the above checks for the
pcie-designware-host driver. But since the above checks are platform agnostic,
we should introduce a helper and reuse it across other controllers as well.
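The two checks above can be sketched as a standalone model (plain C, not the actual helper from Krishna's series; a real implementation would walk every device under the root bus(es) with the PCI core's iterators, while here a device is reduced to the three facts that matter):

```c
#include <stdbool.h>

/* Minimal stand-in for the per-device PM facts the checks need. */
struct d3cold_dev {
	int d_state;		/* 0 = D0, ..., 3 = D3hot */
	bool wakeup_capable;	/* device may wake the system */
	bool wake_from_d3cold;	/* WAKE# usable from D3cold */
};

#define D3HOT 3

static bool dev_allows_d3cold(const struct d3cold_dev *d)
{
	/* Check 1: the client driver must have left the device in D3hot;
	 * any other D-state means it is meant to stay there. */
	if (d->d_state != D3HOT)
		return false;
	/* Check 2: a wakeup-capable device must support wake from D3cold. */
	if (d->wakeup_capable && !d->wake_from_d3cold)
		return false;
	return true;
}

/* D3cold may be initiated only if every device passes both checks. */
static bool bus_can_enter_d3cold(const struct d3cold_dev *devs, int n)
{
	for (int i = 0; i < n; i++)
		if (!dev_allows_d3cold(&devs[i]))
			return false;
	return true;
}
```

Under this model an NVMe device kept in D0 (as in this thread) fails check 1, so the host controller driver would not power the slot down behind the driver's back.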
Hope this clarifies.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-01-23 13:56 ` Manivannan Sadhasivam
@ 2026-01-23 14:39 ` Jon Hunter
2026-02-16 14:03 ` Jon Hunter
1 sibling, 0 replies; 21+ messages in thread
From: Jon Hunter @ 2026-01-23 14:39 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Bjorn Helgaas, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-tegra@vger.kernel.org, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg, linux-nvme, krishna.chundru
On 23/01/2026 13:56, Manivannan Sadhasivam wrote:
> + Krishna
>
> On Fri, Jan 23, 2026 at 10:55:28AM +0000, Jon Hunter wrote:
>>
>> On 22/01/2026 19:14, Jon Hunter wrote:
>>
>> ...
>>
>>>> I think what is going on here is that before commits f3ac2ff14834 and
>>>> df5192d9bb0e, the !pcie_aspm_enabled() check passed because ASPM was not
>>>> enabled for the device (and upstream port). After those commits, that
>>>> check no longer passes, so the NVMe driver does not shut down the
>>>> controller and expects the link to stay in L0/L1ss. But the Tegra
>>>> controller driver initiates the L2/L3 transition and also turns off the
>>>> device. So all the NVMe context is lost during suspend, and while
>>>> resuming, the NVMe driver gets confused due to the lost context.
>>>>
>>>> Jon, could you please try the below hack and see if it fixes the issue?
>>>>
>>>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>>>> index 0e4caeab739c..4b8d261117f5 100644
>>>> --- a/drivers/nvme/host/pci.c
>>>> +++ b/drivers/nvme/host/pci.c
>>>> @@ -3723,7 +3723,7 @@ static int nvme_suspend(struct device *dev)
>>>> * state (which may not be possible if the link is up).
>>>> */
>>>> if (pm_suspend_via_firmware() || !ctrl->npss ||
>>>> - !pcie_aspm_enabled(pdev) ||
>>>> + pcie_aspm_enabled(pdev) ||
>>>> (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
>>>> return nvme_disable_prepare_reset(ndev, true);
>>>> This will confirm whether the issue is due to the Tegra controller
>>>> driver breaking the NVMe driver's assumption.
>>>
>>> Yes that appears to be working! I will test some more boards to confirm.
>>
>> So yes with the above all boards appear to be working fine.
>>
>> How is this usually coordinated between the NVMe driver and Host controller
>> driver? It is not clear to me exactly where the problem is and if the NVMe
>> is not shutting down, then what should be preventing the Host controller
>> from shutting down.
>>
>
> Well, if the NVMe driver is not shutting down the device, then it expects the
> device to stay in an APST state (an NVMe low power state, if supported) and to
> retain all its context across the suspend/resume cycle.
>
> But if the host controller powers down the device, then during resume the
> device will start afresh, having lost all its context (like queue info, etc.).
> So when the NVMe driver resumes, it expects the device to have retained its
> context and tries to use the device as such. But that won't work, as the
> device will be in an unconfigured state, and you'll see failures like the ones
> you reported.
>
> Apparently, most host controller drivers never cared about this because either
> they were not tested with NVMe or they had not enabled ASPM before, so the
> NVMe driver ended up shutting down the controller during suspend. But since we
> started enabling ASPM by default in v6.18, this issue is being uncovered.
>
> So to properly fix it, we need the controller drivers to perform the below
> checks for all devices under the root bus(es) before initiating D3Cold:
>
> 1. Check whether the device is in D3Hot. If it is not, the client driver
> expects the device to stay in its current D-state, so D3Cold should not be
> initiated.
>
> 2. Check whether the device is wakeup capable. If it is, then check whether it
> can support wakeup from D3Cold (with WAKE#).
>
> Only if both conditions are satisfied for all the devices under the root buses
> should the host controller driver initiate the D3Cold sequence.
>
> Krishna is going to post a patch that performs the above checks for the
> pcie-designware-host driver. But since the above checks are platform agnostic,
> we should introduce a helper and reuse it across other controllers as well.
>
> Hope this clarifies.
Yes it does. I am happy to test any patches for this.
Jon
--
nvpublic
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-01-23 13:56 ` Manivannan Sadhasivam
2026-01-23 14:39 ` Jon Hunter
@ 2026-02-16 14:03 ` Jon Hunter
2026-02-16 14:18 ` Manivannan Sadhasivam
1 sibling, 1 reply; 21+ messages in thread
From: Jon Hunter @ 2026-02-16 14:03 UTC (permalink / raw)
To: Manivannan Sadhasivam, krishna.chundru
Cc: Bjorn Helgaas, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-tegra@vger.kernel.org, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg, linux-nvme
Hi Mani, Krishna,
On 23/01/2026 13:56, Manivannan Sadhasivam wrote:
...
> So to properly fix it, we need the controller drivers to perform the below
> checks for all devices under the root bus(es) before initiating D3Cold:
>
> 1. Check whether the device is in D3Hot. If it is not, the client driver
> expects the device to stay in its current D-state, so D3Cold should not be
> initiated.
>
> 2. Check whether the device is wakeup capable. If it is, then check whether it
> can support wakeup from D3Cold (with WAKE#).
>
> Only if both conditions are satisfied for all the devices under the root buses
> should the host controller driver initiate the D3Cold sequence.
>
> Krishna is going to post a patch that performs the above checks for the
> pcie-designware-host driver. But since the above checks are platform agnostic,
> we should introduce a helper and reuse it across other controllers as well.
Do you have a rough idea of when you will be posting patches for this?
Thanks
Jon
--
nvpublic
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-16 14:03 ` Jon Hunter
@ 2026-02-16 14:18 ` Manivannan Sadhasivam
2026-02-16 14:35 ` Jon Hunter
0 siblings, 1 reply; 21+ messages in thread
From: Manivannan Sadhasivam @ 2026-02-16 14:18 UTC (permalink / raw)
To: Jon Hunter
Cc: krishna.chundru, Bjorn Helgaas, manivannan.sadhasivam,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-tegra@vger.kernel.org, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg, linux-nvme
On Mon, Feb 16, 2026 at 02:03:41PM +0000, Jon Hunter wrote:
> Hi Mani, Krishna,
>
> On 23/01/2026 13:56, Manivannan Sadhasivam wrote:
>
> ...
>
> > So to properly fix it, we need the controller drivers to perform the below
> > checks for all devices under the root bus(es) before initiating D3Cold:
> >
> > 1. Check whether the device is in D3Hot. If it is not, the client driver
> > expects the device to stay in its current D-state, so D3Cold should not be
> > initiated.
> >
> > 2. Check whether the device is wakeup capable. If it is, then check whether
> > it can support wakeup from D3Cold (with WAKE#).
> >
> > Only if both conditions are satisfied for all the devices under the root
> > buses should the host controller driver initiate the D3Cold sequence.
> >
> > Krishna is going to post a patch that performs the above checks for the
> > pcie-designware-host driver. But since the above checks are platform
> > agnostic, we should introduce a helper and reuse it across other
> > controllers as well.
>
>
> Do you have a rough idea of when you will be posting patches for this?
>
Krishna posted the series a couple of weeks ago but forgot to CC you:
https://lore.kernel.org/linux-pci/20260128-d3cold-v1-0-dd8f3f0ce824@oss.qualcomm.com/
You are expected to use the helper pci_host_common_can_enter_d3cold() in the
suspend path.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-16 14:18 ` Manivannan Sadhasivam
@ 2026-02-16 14:35 ` Jon Hunter
2026-02-19 17:42 ` Jon Hunter
0 siblings, 1 reply; 21+ messages in thread
From: Jon Hunter @ 2026-02-16 14:35 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: krishna.chundru, Bjorn Helgaas, manivannan.sadhasivam,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-tegra@vger.kernel.org, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg, linux-nvme
On 16/02/2026 14:18, Manivannan Sadhasivam wrote:
> On Mon, Feb 16, 2026 at 02:03:41PM +0000, Jon Hunter wrote:
>> Hi Mani, Krishna,
>>
>> On 23/01/2026 13:56, Manivannan Sadhasivam wrote:
>>
>> ...
>>
>>> So to properly fix it, we need the controller drivers to perform the below
>>> checks for all devices under the root bus(es) before initiating D3Cold:
>>>
>>> 1. Check whether the device is in D3Hot. If it is not, the client driver
>>> expects the device to stay in its current D-state, so D3Cold should not be
>>> initiated.
>>>
>>> 2. Check whether the device is wakeup capable. If it is, then check whether
>>> it can support wakeup from D3Cold (with WAKE#).
>>>
>>> Only if both conditions are satisfied for all the devices under the root
>>> buses should the host controller driver initiate the D3Cold sequence.
>>>
>>> Krishna is going to post a patch that performs the above checks for the
>>> pcie-designware-host driver. But since the above checks are platform
>>> agnostic, we should introduce a helper and reuse it across other
>>> controllers as well.
>>
>>
>> Do you have a rough idea of when you will be posting patches for this?
>>
>
> Krishna posted the series a couple of weeks ago but forgot to CC you:
> https://lore.kernel.org/linux-pci/20260128-d3cold-v1-0-dd8f3f0ce824@oss.qualcomm.com/
>
> You are expected to use the helper pci_host_common_can_enter_d3cold() in the
> suspend path.
Thanks! I will take a look.
Jon
--
nvpublic
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-01-22 17:01 ` Manivannan Sadhasivam
2026-01-22 19:14 ` Jon Hunter
@ 2026-02-16 17:19 ` Claudiu Beznea
2026-02-18 13:56 ` Manivannan Sadhasivam
1 sibling, 1 reply; 21+ messages in thread
From: Claudiu Beznea @ 2026-02-16 17:19 UTC (permalink / raw)
To: Manivannan Sadhasivam, Bjorn Helgaas, Jon Hunter
Cc: manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme, John Madieu
Hi,
On 1/22/26 19:01, Manivannan Sadhasivam wrote:
> On Thu, Jan 22, 2026 at 09:29:03AM -0600, Bjorn Helgaas wrote:
>> [+cc NVMe folks]
>>
>> On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
>>> ...
>>
>>> Since this commit was added in Linux v6.18, I have been observing suspend
>>> test failures on some of our boards. The suspend test suspends the devices
>>> for 20 secs and before this change the board would resume in about ~27 secs
>>> (including the 20 sec sleep). After this change the board would take over 80
>>> secs to resume and this triggered a failure.
>>>
>>> Looking at the logs, I can see it is the NVMe device on the board that is
>>> having an issue, and I see the reset failing ...
>>>
>>> [ 945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
>>> flow control rx/tx
>>> [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
>>> 0 timeout, reset controller
>>> [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
>>> [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
>>> [ 1003.050481] OOM killer enabled.
>>> [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
>>>
>>> From the above timestamps the delay is coming from the NVMe. I see this
>>> issue on several boards with different NVMe devices and I can workaround
>>> this by disabling ASPM L0/L1 for these devices ...
>>>
>>> DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
>>> DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
>>> DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
>>> DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
>>>
>>> I am curious if you have seen any similar issues?
>>>
>>> Other PCIe devices seem to be OK (like the realtek r8169) but just
>>> the NVMe is having issues. So I am trying to figure out the best way
>>> to resolve this?
>>
>> For context, "this commit" refers to f3ac2ff14834, modified by
>> df5192d9bb0e:
>>
>> f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
>> df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
>>
>> The fact that this suspend issue only affects NVMe reminds me of the
>> code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
>> enabled because of some NVMe expectation:
>>
>> dw_pcie_suspend_noirq()
>> {
>> ...
>> /*
>> * If L1SS is supported, then do not put the link into L2 as some
>> * devices such as NVMe expect low resume latency.
>> */
>> if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
>> return 0;
>> ...
>>
>> That suggests there's some NVMe/ASPM interaction that the PCI core
>> doesn't understand yet.
>>
>
> We have this check in place since the NVMe driver keeps the device in D0 and expects
> the link to be in L1SS on platforms not passing the below checks:
>
> if (pm_suspend_via_firmware() || !ctrl->npss ||
> !pcie_aspm_enabled(pdev) ||
> (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
>
We noticed a similar issue with the Renesas RZ/G3S host driver and NVMe devices.
We currently have 2 SoCs where we identified this problem (RZ/G3S and RZ/G3E),
both present on SoM modules, and the SoM modules can be connected to the same
carrier board where the PCIe signals are routed and connectors exist. On the
carrier board we have 2 connectors where we can attach NVMe devices, one M.2 Key
B and one PCIe x4 connector
(https://www.amphenol-cs.com/product/10061913111plf.html).
The issue described in this thread is reproducible for us only after suspend and
only for the NVMe device connected to the PCIe x4 connector. The device is
working correctly just after boot. On suspend, power to most SoC components
(including PCIe) is lost but the endpoints remain powered.
The issue is not reproducible if the following command is executed before
suspend: echo performance > /sys/module/pcie_aspm/parameters/policy
The difference we identified in terms of signals connected from the SoC to the
on-board connectors lies in CLKREQ#. This signal is only connected to the
PCIe x4 slot.
On RZ/G3E the CLKREQ# is configured as an individual GPIO pin. On RZ/G3S it is
muxed by the pin controller with the PCIe function. We tried on RZ/G3E to not
configure the CLKREQ# pin at all, and with this the NVMe connected on the PCIe x4
slot started to work even after suspend. We cannot reproduce the same behavior
on RZ/G3S.
Initially, we considered we might have to update the existing code to do
specific configuration for the boards where CLKREQ# is not connected (through
the supports-clkreq DT property that some controllers are using).
Currently, the manual is unclear on how to control CLKREQ#.
Apart from the suggestions mentioned in [1], could you please let me know if you
have any others?
Thank you,
Claudiu
[1]
https://lore.kernel.org/all/unc5zefwndgcv7wufaezz3gkg3qtaymkjlmymhyqdqwzn3wybl@ow2rhbyt772h/
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-16 17:19 ` Claudiu Beznea
@ 2026-02-18 13:56 ` Manivannan Sadhasivam
0 siblings, 0 replies; 21+ messages in thread
From: Manivannan Sadhasivam @ 2026-02-18 13:56 UTC (permalink / raw)
To: Claudiu Beznea
Cc: Bjorn Helgaas, Jon Hunter, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-tegra@vger.kernel.org, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg, linux-nvme, John Madieu
On Mon, Feb 16, 2026 at 07:19:46PM +0200, Claudiu Beznea wrote:
> Hi,
>
> On 1/22/26 19:01, Manivannan Sadhasivam wrote:
> > On Thu, Jan 22, 2026 at 09:29:03AM -0600, Bjorn Helgaas wrote:
> > > [+cc NVMe folks]
> > >
> > > On Thu, Jan 22, 2026 at 12:12:42PM +0000, Jon Hunter wrote:
> > > > ...
> > >
> > > > Since this commit was added in Linux v6.18, I have been observing suspend
> > > > test failures on some of our boards. The suspend test suspends the devices
> > > > for 20 secs and before this change the board would resume in about ~27 secs
> > > > (including the 20 sec sleep). After this change the board would take over 80
> > > > secs to resume and this triggered a failure.
> > > >
> > > > Looking at the logs, I can see it is the NVMe device on the board that is
> > > > having an issue, and I see the reset failing ...
> > > >
> > > > [ 945.754939] r8169 0007:01:00.0 enP7p1s0: Link is Up - 1Gbps/Full -
> > > > flow control rx/tx
> > > > [ 1002.467432] nvme nvme0: I/O tag 12 (400c) opcode 0x9 (Admin Cmd) QID
> > > > 0 timeout, reset controller
> > > > [ 1002.493713] nvme nvme0: 12/0/0 default/read/poll queues
> > > > [ 1003.050448] nvme nvme0: ctrl state 1 is not RESETTING
> > > > [ 1003.050481] OOM killer enabled.
> > > > [ 1003.054035] nvme nvme0: Disabling device after reset failure: -19
> > > >
> > > > From the above timestamps the delay is coming from the NVMe. I see this
> > > > issue on several boards with different NVMe devices and I can workaround
> > > > this by disabling ASPM L0/L1 for these devices ...
> > > >
> > > > DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5011, quirk_disable_aspm_l0s_l1);
> > > > DECLARE_PCI_FIXUP_HEADER(0x15b7, 0x5036, quirk_disable_aspm_l0s_l1);
> > > > DECLARE_PCI_FIXUP_HEADER(0x1b4b, 0x1322, quirk_disable_aspm_l0s_l1);
> > > > DECLARE_PCI_FIXUP_HEADER(0xc0a9, 0x540a, quirk_disable_aspm_l0s_l1);
> > > >
> > > > I am curious if you have seen any similar issues?
> > > >
> > > > Other PCIe devices seem to be OK (like the realtek r8169) but just
> > > > the NVMe is having issues. So I am trying to figure out the best way
> > > > to resolve this?
> > >
> > > For context, "this commit" refers to f3ac2ff14834, modified by
> > > df5192d9bb0e:
> > >
> > > f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for devicetree platforms")
> > > df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
> > >
> > > The fact that this suspend issue only affects NVMe reminds me of the
> > > code in dw_pcie_suspend_noirq() [1] that bails out early if L1 is
> > > enabled because of some NVMe expectation:
> > >
> > > dw_pcie_suspend_noirq()
> > > {
> > > ...
> > > /*
> > > * If L1SS is supported, then do not put the link into L2 as some
> > > * devices such as NVMe expect low resume latency.
> > > */
> > > if (dw_pcie_readw_dbi(pci, offset + PCI_EXP_LNKCTL) & PCI_EXP_LNKCTL_ASPM_L1)
> > > return 0;
> > > ...
> > >
> > > That suggests there's some NVMe/ASPM interaction that the PCI core
> > > doesn't understand yet.
> > >
> >
> > We have this check in place since NVMe driver keeps the device in D0 and expects
> > the link to be in L1ss on platforms not passing below checks:
> >
> > if (pm_suspend_via_firmware() || !ctrl->npss ||
> > !pcie_aspm_enabled(pdev) ||
> > (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
> >
>
> We noticed a similar issue with the Renesas RZ/G3S host driver and NVMe
> devices. We currently have 2 SoCs where we identified this problem (RZ/G3S
> and RZ/G3E), both present on SoM modules, and the SoM modules can be
> connected to the same carrier board where the PCIe signals are routed and
> connectors exist. On the carrier board we have 2 connectors where we can
> attach NVMe devices, one M.2 Key B and one PCIe x4 connector
> (https://www.amphenol-cs.com/product/10061913111plf.html).
>
> The issue described in this thread is reproducible for us only after suspend
> and only for the NVMe device connected to the PCIe x4 connector. The
> device is working correctly just after boot. On suspend, power to most
> SoC components (including PCIe) is lost but the endpoints remain powered.
>
> The issue is not reproducible if the following command is executed before
> suspend: echo performance > /sys/module/pcie_aspm/parameters/policy
>
> The difference we identified in terms of signals connected from the SoC to
> the on-board connectors lies in CLKREQ#. This signal is only connected
> to the PCIe x4 slot.
>
> On RZ/G3E the CLKREQ# is configured as an individual GPIO pin. On RZ/G3S it
> is muxed by the pin controller with the PCIe function. We tried on RZ/G3E to not
> configure the CLKREQ# pin at all, and with this the NVMe connected on the
> PCIe x4 slot started to work even after suspend. We cannot reproduce the
> same behavior on RZ/G3S.
>
> Initially, we considered we might have to update the existing code to do
> specific configuration for the boards where CLKREQ# is not connected (through
> the supports-clkreq DT property that some controllers are using).
>
> Currently, the manual is unclear on how to control CLKREQ#.
>
> Apart from the suggestions mentioned in [1], could you please let me know if
> you have any others?
>
If you do not know how to control CLKREQ# and it is broken, then disable L1 PM
Substates in the Root Port L1 PM Substates Capabilities register during
controller driver probe.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-16 14:35 ` Jon Hunter
@ 2026-02-19 17:42 ` Jon Hunter
2026-02-26 10:34 ` Jon Hunter
2026-02-26 11:16 ` Manivannan Sadhasivam
0 siblings, 2 replies; 21+ messages in thread
From: Jon Hunter @ 2026-02-19 17:42 UTC (permalink / raw)
To: Manivannan Sadhasivam, Manikanta Maddireddy
Cc: krishna.chundru, Bjorn Helgaas, manivannan.sadhasivam,
Bjorn Helgaas, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-tegra@vger.kernel.org, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg, linux-nvme
Hi Mani,
On 16/02/2026 14:35, Jon Hunter wrote:
...
>> Krishna posted the series a couple of weeks ago but forgot to CC you:
>> https://lore.kernel.org/linux-pci/20260128-d3cold-v1-0-
>> dd8f3f0ce824@oss.qualcomm.com/
>>
>> You are expected to use the helper pci_host_common_can_enter_d3cold()
>> in the
>> suspend path.
I have been playing around with this, but so far I have not got anything
to work. Right now I have just made the following change (note that this
is based upon Manikanta's fixes series [0]) ...
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
index 9883d14f7f97..9f88e4c1db08 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -2311,6 +2311,7 @@ static int tegra_pcie_dw_suspend_late(struct device *dev)
static int tegra_pcie_dw_suspend_noirq(struct device *dev)
{
struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
+ struct dw_pcie *pci = &pcie->pci;
if (pcie->of_data->mode == DW_PCIE_EP_TYPE)
return 0;
@@ -2318,6 +2319,9 @@ static int tegra_pcie_dw_suspend_noirq(struct device *dev)
if (!pcie->link_state)
return 0;
+ if (!pci_host_common_can_enter_d3cold(pci->pp.bridge))
+ return 0;
+
tegra_pcie_dw_pme_turnoff(pcie);
tegra_pcie_unconfig_controller(pcie);
At first I was thinking that if we are not actually suspending the
controller we can skip the configuration of the controller in the
resume. However, if we skip configuring the controller in the resume
then the device does not resume at all. So right now I have the
above, but clearly this is not sufficient. The device resumes but
the NVMe is not working ...
nvme nvme0: ctrl state 1 is not RESETTING
nvme nvme0: Disabling device after reset failure: -19
nvme nvme0: Ignoring bogus Namespace Identifiers
Aborting journal on device nvme0n1p1-8.
nvme0n1: detected capacity change from 0 to 976773168
EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1613: inode #18622533: comm (t-helper): reading directory lblock 0
Buffer I/O error on dev nvme0n1p1, logical block 60850176, lost sync page write
Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
JBD2: I/O error when updating journal superblock for nvme0n1p1-8.
EXT4-fs (nvme0n1p1): I/O error while writing superblock
EXT4-fs error (device nvme0n1p1): ext4_journal_check_start:86: comm rs:main Q:Reg: Detected aborted journal
Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
EXT4-fs (nvme0n1p1): I/O error while writing superblock
EXT4-fs (nvme0n1p1): Remounting filesystem read-only
EXT4-fs (nvme0n1p1): shut down requested (2)
Is the above what you were thinking? Anything else I am missing?
Jon
[0] https://lore.kernel.org/linux-tegra/20260208180746.2024338-1-mmaddireddy@nvidia.com/
--
nvpublic
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-19 17:42 ` Jon Hunter
@ 2026-02-26 10:34 ` Jon Hunter
2026-02-26 11:08 ` Manivannan Sadhasivam
2026-02-26 11:16 ` Manivannan Sadhasivam
1 sibling, 1 reply; 21+ messages in thread
From: Jon Hunter @ 2026-02-26 10:34 UTC (permalink / raw)
To: Manivannan Sadhasivam, Manikanta Maddireddy, Bjorn Helgaas
Cc: krishna.chundru, manivannan.sadhasivam, Bjorn Helgaas,
Lorenzo Pieralisi, Krzysztof Wilczyński, Rob Herring,
linux-pci, linux-kernel, linux-arm-msm, David E. Box,
Kai-Heng Feng, Rafael J. Wysocki, Heiner Kallweit, Chia-Lin Kao,
linux-tegra@vger.kernel.org, Keith Busch, Jens Axboe,
Christoph Hellwig, Sagi Grimberg, linux-nvme
Hi Mani, Bjorn,
On 19/02/2026 17:42, Jon Hunter wrote:
> Hi Mani,
>
> On 16/02/2026 14:35, Jon Hunter wrote:
>
> ...
>
>>> Krishna posted the series a couple of weeks ago but forgot to CC you:
>>> https://lore.kernel.org/linux-pci/20260128-d3cold-v1-0-
>>> dd8f3f0ce824@oss.qualcomm.com/
>>>
>>> You are expected to use the helper pci_host_common_can_enter_d3cold()
>>> in the
>>> suspend path.
>
>
> I have been playing around with this, but so far I have not got anything
> to work. Right now I have just made the following change (note that this
> is based upon Manikanta's fixes series [0]) ...
>
> diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/
> controller/dwc/pcie-tegra194.c
> index 9883d14f7f97..9f88e4c1db08 100644
> --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> @@ -2311,6 +2311,7 @@ static int tegra_pcie_dw_suspend_late(struct
> device *dev)
> static int tegra_pcie_dw_suspend_noirq(struct device *dev)
> {
> struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> + struct dw_pcie *pci = &pcie->pci;
>
> if (pcie->of_data->mode == DW_PCIE_EP_TYPE)
> return 0;
> @@ -2318,6 +2319,9 @@ static int tegra_pcie_dw_suspend_noirq(struct
> device *dev)
> if (!pcie->link_state)
> return 0;
>
> + if (!pci_host_common_can_enter_d3cold(pci->pp.bridge))
> + return 0;
> +
> tegra_pcie_dw_pme_turnoff(pcie);
> tegra_pcie_unconfig_controller(pcie);
>
>
> At first I was thinking that if we are not actually suspending the
> controller we can skip the configuration of the controller in the
> resume. However, if we skip configuring the controller in the resume
> then the device does not resume at all. So right now I have the
> above, but clearly this is not sufficient. The device resumes but
> the NVMe is not working ...
>
> nvme nvme0: ctrl state 1 is not RESETTING
> nvme nvme0: Disabling device after reset failure: -19
> nvme nvme0: Ignoring bogus Namespace Identifiers
> Aborting journal on device nvme0n1p1-8.
> nvme0n1: detected capacity change from 0 to 976773168
> EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1613: inode
> #18622533: comm (t-helper): reading directory lblock 0
> Buffer I/O error on dev nvme0n1p1, logical block 60850176, lost sync
> page write
> Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
> JBD2: I/O error when updating journal superblock for nvme0n1p1-8.
> EXT4-fs (nvme0n1p1): I/O error while writing superblock
> EXT4-fs error (device nvme0n1p1): ext4_journal_check_start:86: comm
> rs:main Q:Reg: Detected aborted journal
> Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
> EXT4-fs (nvme0n1p1): I/O error while writing superblock
> EXT4-fs (nvme0n1p1): Remounting filesystem read-only
> EXT4-fs (nvme0n1p1): shut down requested (2)
>
> Is the above what you were thinking? Anything else I am missing?
So NVMe is still broken for us and, I admit, I don't fully understand the
issue. However, it seems to me that this change is not working for all
device-tree platforms as intended. So for now, would it be acceptable to
add a callback function for drivers such as the Tegra194 PCIe driver to
opt out of this? This would at least allow NVMe to work as it was before.
Thanks
Jon
--
nvpublic
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-26 10:34 ` Jon Hunter
@ 2026-02-26 11:08 ` Manivannan Sadhasivam
2026-02-26 16:55 ` Jon Hunter
0 siblings, 1 reply; 21+ messages in thread
From: Manivannan Sadhasivam @ 2026-02-26 11:08 UTC (permalink / raw)
To: Jon Hunter
Cc: Manikanta Maddireddy, Bjorn Helgaas, krishna.chundru,
manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On Thu, Feb 26, 2026 at 10:34:18AM +0000, Jon Hunter wrote:
> Hi Mani, Bjorn,
>
> On 19/02/2026 17:42, Jon Hunter wrote:
> > Hi Mani,
> >
> > On 16/02/2026 14:35, Jon Hunter wrote:
> >
> > ...
> >
> > > > Krishna posted the series a couple of weeks ago but forgot to CC you:
> > > > https://lore.kernel.org/linux-pci/20260128-d3cold-v1-0-
> > > > dd8f3f0ce824@oss.qualcomm.com/
> > > >
> > > > You are expected to use the helper
> > > > pci_host_common_can_enter_d3cold() in the
> > > > suspend path.
> >
> >
> > I have been playing around with this, but so far I have not got anything
> > to work. Right now I have just made the following change (note that this
> > is based upon Manikanta's fixes series [0]) ...
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/
> > controller/dwc/pcie-tegra194.c
> > index 9883d14f7f97..9f88e4c1db08 100644
> > --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> > +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> > @@ -2311,6 +2311,7 @@ static int tegra_pcie_dw_suspend_late(struct
> > device *dev)
> > static int tegra_pcie_dw_suspend_noirq(struct device *dev)
> > {
> > struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> > + struct dw_pcie *pci = &pcie->pci;
> >
> > if (pcie->of_data->mode == DW_PCIE_EP_TYPE)
> > return 0;
> > @@ -2318,6 +2319,9 @@ static int tegra_pcie_dw_suspend_noirq(struct
> > device *dev)
> > if (!pcie->link_state)
> > return 0;
> >
> > + if (!pci_host_common_can_enter_d3cold(pci->pp.bridge))
> > + return 0;
> > +
> > tegra_pcie_dw_pme_turnoff(pcie);
> > tegra_pcie_unconfig_controller(pcie);
> >
> >
> > At first I was thinking that if we are not actually suspending the
> > controller we can skip the configuration of the controller in the
> > resume. However, if we skip configuring the controller in the resume
> > then the device does not resume at all. So right now I have the
> > above, but clearly this is not sufficient. The device resumes but
> > the NVMe is not working ...
> >
> > nvme nvme0: ctrl state 1 is not RESETTING
> > nvme nvme0: Disabling device after reset failure: -19
> > nvme nvme0: Ignoring bogus Namespace Identifiers
> > Aborting journal on device nvme0n1p1-8.
> > nvme0n1: detected capacity change from 0 to 976773168
> > EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1613: inode
> > #18622533: comm (t-helper): reading directory lblock 0
> > Buffer I/O error on dev nvme0n1p1, logical block 60850176, lost sync
> > page write
> > Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
> > JBD2: I/O error when updating journal superblock for nvme0n1p1-8.
> > EXT4-fs (nvme0n1p1): I/O error while writing superblock
> > EXT4-fs error (device nvme0n1p1): ext4_journal_check_start:86: comm
> > rs:main Q:Reg: Detected aborted journal
> > Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
> > EXT4-fs (nvme0n1p1): I/O error while writing superblock
> > EXT4-fs (nvme0n1p1): Remounting filesystem read-only
> > EXT4-fs (nvme0n1p1): shut down requested (2)
> >
> > Is the above what you were thinking? Anything else I am missing?
>
> So NVMe is still broken for us and, I admit, I don't fully understand the
> issue. However, it seems to me that this change is not working for all
> device-tree platforms as intended. So for now, would it be acceptable to add
> a callback function for drivers such as the Tegra194 PCIe driver to opt out
> of this? This would at least allow NVMe to work as it was before.
>
Since we know that ASPM is the issue on your platform and the failure also
confirms that ASPM was never enabled before, I'd suggest disabling ASPM for the
Root Port as a workaround:
```
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
index 06571d806ab3..f504b4ffbcb6 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -2499,6 +2499,13 @@ module_platform_driver(tegra_pcie_dw_driver);
MODULE_DEVICE_TABLE(of, tegra_pcie_dw_of_match);
+static void tegra_pcie_quirk_disable_aspm(struct pci_dev *dev)
+{
+ pcie_aspm_remove_cap(dev, PCI_EXP_LNKCAP_ASPM_L1 |
+ PCI_EXP_LNKCAP_ASPM_L0S);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, tegra_pcie_quirk_disable_aspm);
+
MODULE_AUTHOR("Vidya Sagar <vidyas@nvidia.com>");
MODULE_DESCRIPTION("NVIDIA PCIe host controller driver");
MODULE_LICENSE("GPL v2");
```
You can use specific Root Port IDs or PCI_ANY_ID depending on the impact. We can
also work on fixing the actual issue in parallel.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-19 17:42 ` Jon Hunter
2026-02-26 10:34 ` Jon Hunter
@ 2026-02-26 11:16 ` Manivannan Sadhasivam
2026-02-26 16:52 ` Jon Hunter
1 sibling, 1 reply; 21+ messages in thread
From: Manivannan Sadhasivam @ 2026-02-26 11:16 UTC (permalink / raw)
To: Jon Hunter
Cc: Manikanta Maddireddy, krishna.chundru, Bjorn Helgaas,
manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On Thu, Feb 19, 2026 at 05:42:37PM +0000, Jon Hunter wrote:
> Hi Mani,
>
> On 16/02/2026 14:35, Jon Hunter wrote:
>
> ...
>
> > > Krishna posted the series a couple of weeks ago but forgot to CC you:
> > > https://lore.kernel.org/linux-pci/20260128-d3cold-v1-0-
> > > dd8f3f0ce824@oss.qualcomm.com/
> > >
> > > You are expected to use the helper
> > > pci_host_common_can_enter_d3cold() in the
> > > suspend path.
>
>
> I have been playing around with this, but so far I have not got anything
> to work. Right now I have just made the following change (note that this
> is based upon Manikanta's fixes series [0]) ...
>
> diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> index 9883d14f7f97..9f88e4c1db08 100644
> --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> @@ -2311,6 +2311,7 @@ static int tegra_pcie_dw_suspend_late(struct device *dev)
> static int tegra_pcie_dw_suspend_noirq(struct device *dev)
> {
> struct tegra_pcie_dw *pcie = dev_get_drvdata(dev);
> + struct dw_pcie *pci = &pcie->pci;
> if (pcie->of_data->mode == DW_PCIE_EP_TYPE)
> return 0;
> @@ -2318,6 +2319,9 @@ static int tegra_pcie_dw_suspend_noirq(struct device *dev)
> if (!pcie->link_state)
> return 0;
> + if (!pci_host_common_can_enter_d3cold(pci->pp.bridge))
> + return 0;
> +
> tegra_pcie_dw_pme_turnoff(pcie);
> tegra_pcie_unconfig_controller(pcie);
>
>
> At first I was thinking that if we are not actually suspending the
> controller we can skip the configuration of the controller in the
> resume. However, if we skip configuring the controller in the resume
> then the device does not resume at all.
Does 'device' mean the host here?
> So right now I have the
> above, but clearly this is not sufficient. The device resumes but
> the NVMe is not working ...
>
> nvme nvme0: ctrl state 1 is not RESETTING
> nvme nvme0: Disabling device after reset failure: -19
> nvme nvme0: Ignoring bogus Namespace Identifiers
> Aborting journal on device nvme0n1p1-8.
> nvme0n1: detected capacity change from 0 to 976773168
> EXT4-fs error (device nvme0n1p1): __ext4_find_entry:1613: inode #18622533: comm (t-helper): reading directory lblock 0
> Buffer I/O error on dev nvme0n1p1, logical block 60850176, lost sync page write
> Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
> JBD2: I/O error when updating journal superblock for nvme0n1p1-8.
> EXT4-fs (nvme0n1p1): I/O error while writing superblock
> EXT4-fs error (device nvme0n1p1): ext4_journal_check_start:86: comm rs:main Q:Reg: Detected aborted journal
> Buffer I/O error on dev nvme0n1p1, logical block 0, lost sync page write
> EXT4-fs (nvme0n1p1): I/O error while writing superblock
> EXT4-fs (nvme0n1p1): Remounting filesystem read-only
> EXT4-fs (nvme0n1p1): shut down requested (2)
>
> Is the above what you were thinking? Anything else I am missing?
>
I can't say for certain what is going wrong. If the controller driver suspend is
skipped, then ideally the controller and the NVMe device should stay powered ON
during suspend. But if the platform pulls the plug at the end of suspend
(firmware, gdsc or some other entity), then all the context would be lost and
that might explain the failure because both the controller driver and NVMe
driver would expect the RC and NVMe to be active.
You can try commenting out the PM callbacks entirely:
// .pm = &tegra_pcie_dw_pm_ops
If the host itself doesn't resume, then it confirms that some other entity is
pulling the plug (which is common in ARM platforms). In that case, we have to
let the NVMe driver know about it so that it can shut down the controller.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-26 11:16 ` Manivannan Sadhasivam
@ 2026-02-26 16:52 ` Jon Hunter
2026-03-03 16:17 ` Manivannan Sadhasivam
0 siblings, 1 reply; 21+ messages in thread
From: Jon Hunter @ 2026-02-26 16:52 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Manikanta Maddireddy, krishna.chundru, Bjorn Helgaas,
manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On 26/02/2026 11:16, Manivannan Sadhasivam wrote:
...
> I can't say for certain what is going wrong. If the controller driver suspend is
> skipped, then ideally the controller and the NVMe device should stay powered ON
> during suspend. But if the platform pulls the plug at the end of suspend
> (firmware, gdsc or some other entity), then all the context would be lost and
> that might explain the failure because both the controller driver and NVMe
> driver would expect the RC and NVMe to be active.
>
> You can try commenting out the whole PM callbacks:
> // .pm = &tegra_pcie_dw_pm_ops
>
> If the host itself doesn't resume, then it confirms that some other entity is
> pulling the plug (which is common in ARM platforms). In that case, we have to
> let the NVMe driver know about it so that it can shut down the controller.
For Tegra, we enter a deep low-power state known as SC7 on suspend, which
does involve firmware. Nonetheless I tried for fun, but this breaks
suspend completely.
Jon
--
nvpublic
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-26 11:08 ` Manivannan Sadhasivam
@ 2026-02-26 16:55 ` Jon Hunter
2026-03-03 16:27 ` Manivannan Sadhasivam
0 siblings, 1 reply; 21+ messages in thread
From: Jon Hunter @ 2026-02-26 16:55 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Manikanta Maddireddy, Bjorn Helgaas, krishna.chundru,
manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On 26/02/2026 11:08, Manivannan Sadhasivam wrote:
...
> Since we know that ASPM is the issue on your platform and the failure also
> confirms that ASPM was never enabled before, I'd suggest disabling ASPM for the
> Root Port as a workaround:
>
> ```
> diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> index 06571d806ab3..f504b4ffbcb6 100644
> --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> @@ -2499,6 +2499,13 @@ module_platform_driver(tegra_pcie_dw_driver);
>
> MODULE_DEVICE_TABLE(of, tegra_pcie_dw_of_match);
>
> +static void tegra_pcie_quirk_disable_aspm(struct pci_dev *dev)
> +{
> + pcie_aspm_remove_cap(dev, PCI_EXP_LNKCAP_ASPM_L1 |
> + PCI_EXP_LNKCAP_ASPM_L0S);
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, tegra_pcie_quirk_disable_aspm);
> +
> MODULE_AUTHOR("Vidya Sagar <vidyas@nvidia.com>");
> MODULE_DESCRIPTION("NVIDIA PCIe host controller driver");
> MODULE_LICENSE("GPL v2");
> ```
>
> You can use specific Root Port IDs or PCI_ANY_ID depending on the impact. We can
> also work on fixing the actual issue parallelly.
Thanks. By default we build the PCIe driver for Tegra as a module, so I am
not sure we can use DECLARE_PCI_FIXUP_EARLY(), right?
I was just thinking that in pcie_aspm_override_default_link_state() we would
need a callback to specify the default ASPM override state?
Cheers
Jon
--
nvpublic
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-26 16:52 ` Jon Hunter
@ 2026-03-03 16:17 ` Manivannan Sadhasivam
2026-03-06 16:03 ` Jon Hunter
0 siblings, 1 reply; 21+ messages in thread
From: Manivannan Sadhasivam @ 2026-03-03 16:17 UTC (permalink / raw)
To: Jon Hunter
Cc: Manikanta Maddireddy, krishna.chundru, Bjorn Helgaas,
manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On Thu, Feb 26, 2026 at 04:52:57PM +0000, Jon Hunter wrote:
>
> On 26/02/2026 11:16, Manivannan Sadhasivam wrote:
>
> ...
>
> > I can't certainly know what is going wrong. If controller driver suspend is
> > skipped, then ideally the controller and the NVMe device should stay powered ON
> > during suspend. But if the platform pulls the plug at the end of suspend
> > (firmware, gdsc or some other entity), then all the context would be lost and
> > that might explain the failure because both the controller driver and NVMe
> > driver would expect the RC and NVMe to be active.
> >
> > You can try commenting out the whole PM callbacks:
> > // .pm = &tegra_pcie_dw_pm_ops
> >
> > If the host itself doesn't resume, then it confirms that some other entity is
> > pulling the plug (which is common in ARM platforms). In that case, we have to
> > let the NVMe driver know about it so that it can shutdown the controller.
>
> For Tegra, we enter a deep low power state known as SC7 on suspend which
> does involve firmware. Nonetheless I tried for fun, but this breaks suspend
> completely.
>
Ah, this explains the problem. We have a similar problem on our Qcom Auto
boards, where the firmware completely shuts down the SoC and puts the DRAM
into self-refresh mode, so the NVMe driver never resumes properly. We tried
multiple ways to address this issue in the NVMe driver, but the NVMe
maintainers rejected every single one of them, asking instead for an API in
the PCI or PM core to tell the NVMe driver when to shut down the device
during suspend. That turned out to be not so trivial.
Another way to work around this issue would be to call
pm_set_suspend_via_firmware() from the driver that controls the entity doing
the power management of the SoC (the firmware). In your case, that would be
drivers/soc/tegra/pmc.c?
In that case, you can use this patch as a reference:
https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com
When pm_set_suspend_via_firmware() has been called, the NVMe driver assumes
that the firmware might pull the plug during suspend, so it shuts down the
controller completely and brings it back from reset during resume.
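For illustration, the platform-side hook could look roughly like this. This
is only a sketch: the function name tegra_pmc_suspend_begin() and its
placement are assumptions for illustration, not the actual Tegra PMC code.

```c
/*
 * Sketch: mark the upcoming suspend as firmware-managed, so that
 * drivers such as NVMe take the full controller-shutdown path.
 * The hook name below is illustrative, not real Tegra code.
 */
#include <linux/suspend.h>

static int tegra_pmc_suspend_begin(suspend_state_t state)
{
	/*
	 * SC7 entry is handled by firmware, which may power off the
	 * PCIe controller and endpoints behind the kernel's back,
	 * losing all device context.
	 */
	pm_set_suspend_via_firmware();
	return 0;
}
```

The flag is cleared again by the PM core, so the driver only needs to set it
at the start of each suspend cycle.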
- Mani
--
மணிவண்ணன் சதாசிவம்
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-02-26 16:55 ` Jon Hunter
@ 2026-03-03 16:27 ` Manivannan Sadhasivam
0 siblings, 0 replies; 21+ messages in thread
From: Manivannan Sadhasivam @ 2026-03-03 16:27 UTC (permalink / raw)
To: Jon Hunter
Cc: Manikanta Maddireddy, Bjorn Helgaas, krishna.chundru,
manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On Thu, Feb 26, 2026 at 04:55:34PM +0000, Jon Hunter wrote:
>
> On 26/02/2026 11:08, Manivannan Sadhasivam wrote:
>
> ...
>
> > Since we know that ASPM is the issue on your platform and the failure also
> > confirms that ASPM was never enabled before, I'd suggest disabling ASPM for the
> > Root Port as a workaround:
> >
> > ```
> > diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c
> > index 06571d806ab3..f504b4ffbcb6 100644
> > --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> > +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> > @@ -2499,6 +2499,13 @@ module_platform_driver(tegra_pcie_dw_driver);
> > MODULE_DEVICE_TABLE(of, tegra_pcie_dw_of_match);
> > +static void tegra_pcie_quirk_disable_aspm(struct pci_dev *dev)
> > +{
> > + pcie_aspm_remove_cap(dev, PCI_EXP_LNKCAP_ASPM_L1 |
> > + PCI_EXP_LNKCAP_ASPM_L0S);
> > +}
> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, tegra_pcie_quirk_disable_aspm);
> > +
> > MODULE_AUTHOR("Vidya Sagar <vidyas@nvidia.com>");
> > MODULE_DESCRIPTION("NVIDIA PCIe host controller driver");
> > MODULE_LICENSE("GPL v2");
> > ```
> >
> > You can use specific Root Port IDs or PCI_ANY_ID depending on the impact. We can
> > also work on fixing the actual issue parallelly.
>
> Thanks. By default we are building the PCIe driver for Tegra as a module and
> so I am not sure we can use DECLARE_PCI_FIXUP_EARLY() right?
>
Ah, yes. We cannot use any of the DECLARE_PCI_FIXUP*() macros from a module anyway :/
> I was just thinking that in pcie_aspm_override_default_link_state() we just
> need a callback to specify the default ASPM override state?
>
That looks like a dirty hack. Moreover, your platform works perfectly fine
with ASPM, so you should not be turning it off just because one driver
behaves erratically.
As I mentioned in [1], you should try to advertise the
PM_SUSPEND_FLAG_FW_SUSPEND flag for the platform as per [2].
FYI, this is a known issue that has been plaguing us for a long time.
PM_SUSPEND_FLAG_FW_SUSPEND is the cleanest solution we have come up with,
but unfortunately we cannot use it across all of our Qcom SoCs because the
firmware does not advertise S2RAM on some of them.
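For context, the NVMe driver's suspend path already keys off this flag.
Roughly (a paraphrased sketch of the decision in nvme_suspend() in
drivers/nvme/host/pci.c, not a verbatim copy):

```c
/*
 * Paraphrased sketch: if the firmware may cut power during suspend
 * (pm_suspend_via_firmware()), or the device cannot do a power-state
 * based suspend, fall back to a full controller shutdown so resume
 * goes through a clean reset instead of assuming retained context.
 */
if (pm_suspend_via_firmware() || !ctrl->npss ||
    !pcie_aspm_enabled(pdev) ||
    (ndev->ctrl.quirks & NVME_QUIRK_SIMPLE_SUSPEND))
	return nvme_disable_prepare_reset(ndev, true);
```

So once the platform advertises firmware-managed suspend, no NVMe-side
change should be needed.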
- Mani
[1] https://lore.kernel.org/linux-pci/kkly3z4durpagtenadvmzdpojlctachgfgi2fdapt6zthdl2gx@n2qhmlud2zb7/
[2] https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com/
--
மணிவண்ணன் சதாசிவம்
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-03-03 16:17 ` Manivannan Sadhasivam
@ 2026-03-06 16:03 ` Jon Hunter
2026-03-09 8:00 ` Manivannan Sadhasivam
0 siblings, 1 reply; 21+ messages in thread
From: Jon Hunter @ 2026-03-06 16:03 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Manikanta Maddireddy, krishna.chundru, Bjorn Helgaas,
manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On 03/03/2026 16:17, Manivannan Sadhasivam wrote:
...
>> For Tegra, we enter a deep low power state known as SC7 on suspend which
>> does involve firmware. Nonetheless I tried for fun, but this breaks suspend
>> completely.
>>
>
> Ah, this explains the problem. We also have a similar problem on our Qcom Auto
> boards where the firmware completely shuts down the SoC and puts the DRAM in
> self refresh mode. So NVMe driver never resumes properly. We tried multiple ways
> to address this issue in the NVMe driver, but the NVMe maintainers rejected
> every single one of them and asking for some API in the PCI or PM core to tell
> the NVMe driver when to shutdown the device during suspend. But this turned out
> to be not so trivial.
>
> Another way to workaround this issue would be by calling
> pm_set_suspend_via_firmware() from the driver that controls the entity doing
> power management of the SoC (firmware). In your case, it is
> drivers/soc/tegra/pmc.c?
Actually, for newer devices it is PSCI, and so ...
> In that case, you can use this patch as a reference:
> https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com
This change, as implemented, fixes the problem. What is the status of the
above? Any plans to get this merged?
Jon
--
nvpublic
* Re: [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms
2026-03-06 16:03 ` Jon Hunter
@ 2026-03-09 8:00 ` Manivannan Sadhasivam
0 siblings, 0 replies; 21+ messages in thread
From: Manivannan Sadhasivam @ 2026-03-09 8:00 UTC (permalink / raw)
To: Jon Hunter
Cc: Manikanta Maddireddy, krishna.chundru, Bjorn Helgaas,
manivannan.sadhasivam, Bjorn Helgaas, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, linux-pci, linux-kernel,
linux-arm-msm, David E. Box, Kai-Heng Feng, Rafael J. Wysocki,
Heiner Kallweit, Chia-Lin Kao, linux-tegra@vger.kernel.org,
Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
linux-nvme
On Fri, Mar 06, 2026 at 04:03:35PM +0000, Jon Hunter wrote:
>
> On 03/03/2026 16:17, Manivannan Sadhasivam wrote:
>
> ...
>
> > > For Tegra, we enter a deep low power state known as SC7 on suspend which
> > > does involve firmware. Nonetheless I tried for fun, but this breaks suspend
> > > completely.
> > >
> >
> > Ah, this explains the problem. We also have a similar problem on our Qcom Auto
> > boards where the firmware completely shuts down the SoC and puts the DRAM in
> > self refresh mode. So NVMe driver never resumes properly. We tried multiple ways
> > to address this issue in the NVMe driver, but the NVMe maintainers rejected
> > every single one of them and asking for some API in the PCI or PM core to tell
> > the NVMe driver when to shutdown the device during suspend. But this turned out
> > to be not so trivial.
> >
> > Another way to workaround this issue would be by calling
> > pm_set_suspend_via_firmware() from the driver that controls the entity doing
> > power management of the SoC (firmware). In your case, it is
> > drivers/soc/tegra/pmc.c?
>
> Actually for newer devices it is PSCI and so ...
>
> > In that case, you can use this patch as a reference:
> > https://lore.kernel.org/all/20251231162126.7728-1-manivannan.sadhasivam@oss.qualcomm.com
>
> This change fixes the problem as implemented. What is the status of the
> above? Any plans to get this merged?
>
It hasn't gotten any love so far. It'd help if you could add your tags to
that patch.
- Mani
--
மணிவண்ணன் சதாசிவம்
end of thread, other threads:[~2026-03-09 8:00 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
[not found] <7306256a-b380-489b-8248-b774e6d3d80e@nvidia.com>
2026-01-22 15:29 ` [PATCH v2 1/2] PCI/ASPM: Override the ASPM and Clock PM states set by BIOS for devicetree platforms Bjorn Helgaas
2026-01-22 17:01 ` Manivannan Sadhasivam
2026-01-22 19:14 ` Jon Hunter
2026-01-23 10:55 ` Jon Hunter
2026-01-23 13:56 ` Manivannan Sadhasivam
2026-01-23 14:39 ` Jon Hunter
2026-02-16 14:03 ` Jon Hunter
2026-02-16 14:18 ` Manivannan Sadhasivam
2026-02-16 14:35 ` Jon Hunter
2026-02-19 17:42 ` Jon Hunter
2026-02-26 10:34 ` Jon Hunter
2026-02-26 11:08 ` Manivannan Sadhasivam
2026-02-26 16:55 ` Jon Hunter
2026-03-03 16:27 ` Manivannan Sadhasivam
2026-02-26 11:16 ` Manivannan Sadhasivam
2026-02-26 16:52 ` Jon Hunter
2026-03-03 16:17 ` Manivannan Sadhasivam
2026-03-06 16:03 ` Jon Hunter
2026-03-09 8:00 ` Manivannan Sadhasivam
2026-02-16 17:19 ` Claudiu Beznea
2026-02-18 13:56 ` Manivannan Sadhasivam