Linux-NVME Archive on lore.kernel.org

* NVMe suspend failure
@ 2024-02-04  1:01 Bart Van Assche
  2024-02-04  1:18 ` Keith Busch
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2024-02-04  1:01 UTC (permalink / raw)
  To: linux-nvme@lists.infradead.org

Hi,

Even after having added nvme_core.default_ps_max_latency_us=0
pcie_aspm=off to the kernel command line, the following still
appears in dmesg -w output upon suspend of an x86_64 workstation:

[ 2451.640676] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[ 2451.640690] nvme nvme0: Does your device have a faulty power saving mode enabled?
[ 2451.640694] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug

Hence this email. From the nvme id-ctrl output:

mn        : Samsung SSD 970 EVO Plus 500GB
fr        : 2B2QEXM7

Please let me know if you need more information.

Thanks,

Bart.

* Re: NVMe suspend failure
  2024-02-04  1:01 NVMe suspend failure Bart Van Assche
@ 2024-02-04  1:18 ` Keith Busch
  2024-02-04  1:44   ` Bart Van Assche
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2024-02-04  1:18 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-nvme@lists.infradead.org

On Sat, Feb 03, 2024 at 05:01:41PM -0800, Bart Van Assche wrote:
> Hi,
> 
> Even after having added nvme_core.default_ps_max_latency_us=0
> pcie_aspm=off to the kernel command line, the following still
> appears in dmesg -w output upon suspend of an x86_64 workstation:
> 
> [ 2451.640676] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
> [ 2451.640690] nvme nvme0: Does your device have a faulty power saving mode enabled?
> [ 2451.640694] nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
> 
> Hence this email. From the nvme id-ctrl output:
> 
> mn        : Samsung SSD 970 EVO Plus 500GB
> fr        : 2B2QEXM7
> 
> Please let me know if you need more information.

And this is happening during a suspend? What kind of suspend? Like an
S3/S4, or idle suspend?
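
One way to tell, assuming a reasonably recent kernel: check which sleep
states the platform offers and which variant of "mem" is in use, then
look for the "PM: suspend entry" line in dmesg, which names the state
actually entered:

$ cat /sys/power/state
$ cat /sys/power/mem_sleep
$ dmesg | grep 'PM: suspend entry'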



* Re: NVMe suspend failure
  2024-02-04  1:18 ` Keith Busch
@ 2024-02-04  1:44   ` Bart Van Assche
  2024-02-05 17:58     ` Keith Busch
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Van Assche @ 2024-02-04  1:44 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme@lists.infradead.org

On 2/3/24 17:18, Keith Busch wrote:
> And this is happening during a suspend? What kind of suspend? Like an
> S3/S4, or idle suspend?

I'm not sure how to determine this. This is what I found in the logs:

Feb 03 09:02:20 asus systemd-logind[1208]: The system will suspend now!
Feb 03 09:02:20 asus systemd-logind[1208]: Unit suspend.target is masked, refusing operation.

Bart.

* Re: NVMe suspend failure
  2024-02-04  1:44   ` Bart Van Assche
@ 2024-02-05 17:58     ` Keith Busch
  2024-02-05 18:44       ` Bart Van Assche
  0 siblings, 1 reply; 5+ messages in thread
From: Keith Busch @ 2024-02-05 17:58 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: linux-nvme@lists.infradead.org

On Sat, Feb 03, 2024 at 05:44:03PM -0800, Bart Van Assche wrote:
> On 2/3/24 17:18, Keith Busch wrote:
> > And this is happening during a suspend? What kind of suspend? Like an
> > S3/S4, or idle suspend?
> 
> I'm not sure how to determine this. This is what I found in the logs:
> 
> Feb 03 09:02:20 asus systemd-logind[1208]: The system will suspend now!
> Feb 03 09:02:20 asus systemd-logind[1208]: Unit suspend.target is masked, refusing operation.

I am not sure what the "suspend now!" message means to the driver. Your
initial report with "CSTS=0xffffffff" comes from the nvme request
timeout handler, so I'd want to confirm which command is timing out and
which path dispatched it: was the command dispatched from the
nvme_suspend() path, or is it coming from somewhere else? If somewhere
else, was it dispatched before or after nvme_suspend()?
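
If it helps, one way to capture that, assuming the kernel was built
with tracing support, is to enable the nvme trace events across the
suspend/resume cycle and read the buffer back afterwards:

# cd /sys/kernel/debug/tracing
# echo 1 > events/nvme/enable
# (suspend and resume here)
# cat trace

The nvme_setup_cmd entries record the qid and opcode of each command as
it is dispatched, which should identify what was in flight when the
timeout fired.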



* Re: NVMe suspend failure
  2024-02-05 17:58     ` Keith Busch
@ 2024-02-05 18:44       ` Bart Van Assche
  0 siblings, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2024-02-05 18:44 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-nvme@lists.infradead.org

On 2/5/24 09:58, Keith Busch wrote:
> On Sat, Feb 03, 2024 at 05:44:03PM -0800, Bart Van Assche wrote:
>> On 2/3/24 17:18, Keith Busch wrote:
>>> And this is happening during a suspend? What kind of suspend? Like an
>>> S3/S4, or idle suspend?
>>
>> I'm not sure how to determine this. This is what I found in the logs:
>>
>> Feb 03 09:02:20 asus systemd-logind[1208]: The system will suspend now!
>> Feb 03 09:02:20 asus systemd-logind[1208]: Unit suspend.target is masked, refusing operation.
> 
> I am not sure what the "suspend now!" message means to the driver. Your
> initial report with "CSTS=0xffffffff" comes from the nvme request
> timeout handler, so I'd want to confirm which command is timing out and
> which path dispatched it: was the command dispatched from the
> nvme_suspend() path, or is it coming from somewhere else? If somewhere
> else, was it dispatched before or after nvme_suspend()?

Hi Keith,

I think the requests that timed out were submitted after resume. User space
software is frozen before nvme_suspend() is called, and nvme_suspend() waits
for pending requests to complete. Hence, the requests that timed out must
have been submitted after user space processes were resumed. An additional
indication is that, after resume, I saw crashes in user space software that
writes periodically to block storage devices (an email client and a web
browser).
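
That ordering should also be visible from the kernel log timestamps; a
grep along these lines (pattern adjusted as needed) shows whether the
nvme timeout lands before or after the "PM: suspend exit" message:

$ journalctl -k -b | grep -E 'PM: suspend|nvme nvme0'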

Thanks,

Bart.