From: Keith Busch <kbusch@kernel.org>
To: Marcos Scriven <marcos@scriven.org>
Cc: linux-nvme@lists.infradead.org
Subject: Re: My Western Digital SN850 appears to suffer from deep power state issues - considering submitting quirk patch.
Date: Thu, 19 May 2022 14:02:03 -0600
Message-ID: <YoaiO2yze7UaYZK6@kbusch-mbp.dhcp.thefacebook.com>
In-Reply-To: <4486f6fd-0fa8-4cee-9cbe-35f017ea6ceb@www.fastmail.com>

On Thu, May 19, 2022 at 11:14:41AM +0100, Marcos Scriven wrote:
>
> I have now tried both of these options (separately and together), and the issue still occurs. I confirmed settings in command line:
>
> cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-5.13.19-6-pve root=/dev/mapper/pve-root ro quiet video=efifb:off acpi_enforce_resources=lax nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
>
> Does this mean it's definitively not a power state issue?

If you're still seeing this same all f's failure even with these settings, I
think it rules out the autonomous power settings provided by nvme and pcie.
It doesn't necessarily rule out platform-specific power capabilities, but
that'd be well outside my view of the nvme driver stack, and I don't have any
ideas off the top of my head on what to even check.

> The slightly positive news is I now have a fairly dependable way of replicating the issue. I've described it over in the Proxmox forums (https://forum.proxmox.com/threads/what-processes-resources-are-used-while-doing-a-vm-backup-in-stop-mode.109779/) but in short, just backing up a container (from the affected drive, to an unaffected one) has about a 30% chance of causing the drive to go offline. I suppose the fact this happens is another indicator it's not to do with lowering power states (autonomous or otherwise) if it's happening right when the disk is being read.
>
> I tried running strace on the process to see if I could see anything obvious about what the process is doing while it fails, vs what happens when it completes without error. I can't see anything obvious.
>
> Another thing I tried was a raw dd read from the affected disk to /dev/null, to see if it was something about intensive reading that causes this, and it did not happen. During that, the controller temp maxes out at 63C. nvme smart-log doesn't show any critical warnings.

63C, not great, not terrible.

'dd' is a nice tool, but you may be able to push the drive further with 'fio',
assuming an intense read workload is pushing the drive to failure. Just a quick
example:

  # fio --name=global --filename=/dev/nvme0n1 --bs=64k --direct=1 --ioengine=libaio --rw=randread --iodepth=32 --numjobs=8 --name=test

> I'm wondering if there's anything that would help me identify if it's a hardware issue (PSU, SSD, motherboard) in terms of low-level debugging or BIOS settings.

It does sound hardware related, but I'm not aware of any reasonable tools or
methods to debug it. Right now, I can only recommend verifying you've got the
latest vendor provided firmware installed.

> As a complete aside on mailgroup protocol, my understanding is text only, and 'inline' style. Responses come both from the group and directly from the people replying - should I "reply all" or just the group? For now, I've gone for the latter.

The mailing list only accepts plain text. Top posting is generally frowned
upon. A reply-all is fine. Wrapping columns at 80 characters helps readability.
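
For reference, the fio one-liner above could be wrapped in a small script along
these lines. This is only a sketch under assumptions: DEV, RUNTIME and RUN are
illustrative names that do not come from the thread, and --readonly,
--time_based, --runtime and --group_reporting are additions to the original
command meant to bound the run and guard against accidental writes. By default
it only prints the command it would run.

```shell
#!/bin/sh
# Sketch only. DEV, RUNTIME and RUN are illustrative knobs, not from the mail.
DEV="${DEV:-/dev/nvme0n1}"
RUNTIME="${RUNTIME:-300}"   # seconds

# Print the fio command line: the mail's parameters plus a read-only
# safety check, a fixed run time and aggregated reporting.
fio_cmdline() {
    echo fio --name=global --filename="$DEV" --bs=64k --direct=1 \
        --ioengine=libaio --rw=randread --iodepth=32 --numjobs=8 \
        --readonly --time_based --runtime="$RUNTIME" --group_reporting \
        --name=test
}

if [ "${RUN:-0}" = "1" ]; then
    # Needs root and fio installed. Word splitting is intentional here,
    # so DEV must not contain spaces.
    $(fio_cmdline)
else
    fio_cmdline   # dry run: only show what would be executed
fi
```

Setting RUN=1 (as root) actually starts the job. Keeping 'dmesg -w' and a
periodic 'nvme smart-log' open in another terminal while it runs may help
catch the moment the controller drops off.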
Thread overview: 6+ messages
2022-05-15 16:00 My Western Digital SN850 appears to suffer from deep power state issues - considering submitting quirk patch Marcos Scriven
2022-05-15 19:44 ` Keith Busch
2022-05-15 20:13 ` Keith Busch
2022-05-16 17:58 ` Christoph Hellwig
2022-05-19 10:14 ` Marcos Scriven
2022-05-19 20:02 ` Keith Busch [this message]