From: Wol <antlists@youngman.org.uk>
To: Justin Piszcz <jpiszcz@lucidpixels.com>,
LKML <linux-kernel@vger.kernel.org>,
linux-nvme@lists.infradead.org, linux-raid@vger.kernel.org,
Btrfs BTRFS <linux-btrfs@vger.kernel.org>
Subject: Re: WD Red SN700 4000GB, F/W: 11C120WD (Device not ready; aborting reset, CSTS=0x1)
Date: Tue, 25 Nov 2025 18:25:41 +0000 [thread overview]
Message-ID: <07500979-eca8-4159-b2a5-3052e9958c84@youngman.org.uk> (raw)
In-Reply-To: <CAO9zADxCYgQVOD9A1WYoS4JcLgvsNtGGr4xEZm9CMFHXsTV8ww@mail.gmail.com>
Probably not the problem, but how old are the drives? Around 2020, WD
started shingling the Red line (you had to move to Red Pro to get
conventional CMR drives). Shingled (SMR) drives are bad news for Linux
RAID, but the fact that your drives tend to drop out when idle makes it
unlikely this is the problem.
Cheers,
Wol
On 25/11/2025 14:42, Justin Piszcz wrote:
> Hello,
>
> Issue/Summary:
> 1. Usually once a month, a random WD Red SN700 4TB NVMe drive will
> drop out of the NAS array; after power cycling the device, it rebuilds
> successfully.
>
> Details:
> 0. I use an NVMe NAS (FS6712X) with WD Red SN700 4TB drives (WDS400T1R0C).
> 1. Ever since I installed the drives, a random drive drops offline
> every month or so, almost always when the system is idle.
> 2. I have troubleshot this with both Asustor and WD/SanDisk.
> 3. Asustor noted that other users with the same configuration have run
> into this problem.
> 4. When troubleshooting with WD/SanDisk, it was noted that my main
> option is to replace the drive, even though the issue occurs across
> nearly all of the drives.
> 5. The drives' firmware is currently up to date according to the WD
> Dashboard (checked by removing them and testing them on another system).
> 6. As for the device/filesystem, the FS6712X is configured as an
> MD-RAID6 device with Btrfs on top of it.
> 7. The "workaround" is to power cycle the FS6712X; when it boots back
> up, the MD-RAID6 re-syncs to a healthy state.
>
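Not from the original report, but possibly a gentler alternative to a full power cycle: once a controller is wedged like this, a PCI remove/rescan followed by an md re-add can sometimes bring the drive back without rebooting. This is only a sketch, not a tested procedure for the FS6712X; the device and array names are taken from the logs in this thread as examples, and the PCI address is a placeholder you would look up under /sys/block/<dev>/device. The sketch prints the commands rather than executing them:

```shell
# Dry-run sketch: prints the recovery commands instead of executing them.
# The PCI address, partition, and array below are illustrative placeholders.
recover_nvme() {
  pci_addr="$1"; part="$2"; array="$3"
  echo "echo 1 > /sys/bus/pci/devices/${pci_addr}/remove"  # drop the dead controller
  echo "echo 1 > /sys/bus/pci/rescan"                      # re-enumerate the bus
  echo "mdadm ${array} --re-add ${part}"                   # re-add to the degraded array
}
cmds=$(recover_nvme 0000:01:00.0 /dev/nvme2n1p4 /dev/md1)
printf '%s\n' "$cmds"
```

If the controller re-enumerates cleanly, `cat /proc/mdstat` should then show the resync; if it does not come back at all, the power cycle remains the only option.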
> I am using the latest Asustor ADM OS, which uses the 6.6.x kernel:
> 1. Linux FS6712X-EB92 6.6.x #1 SMP PREEMPT_DYNAMIC Tue Nov 4 00:53:39
> CST 2025 x86_64 GNU/Linux
>
> Questions:
> 1. Have others experienced this failure scenario?
> 2. Are there identified workarounds for this issue outside of power
> cycling the device when this happens?
> 3. Are there any debug options that can be enabled that could help to
> pinpoint the root cause?
> 4. Within the BIOS settings (shown starting at 2:18 in the video
> below), there are some advanced settings; could there be a power-saving
> feature or other setting that could be modified to address this issue?
> 4a. https://www.youtube.com/watch?v=YytWFtgqVy0
>
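One thing worth ruling out, given that the dropouts almost always happen at idle (my suggestion, not something Asustor or WD proposed): NVMe APST can move an idle drive into a deep power state, and some drive/PCIe-switch combinations fail to wake from it. APST can be disabled for a test via a kernel boot parameter; the GRUB file below is the usual Debian-style location and may well differ under ADM, and /dev/nvme0 is just an example device:

```shell
# Inspect the drive's APST table first (nvme-cli; /dev/nvme0 is an example):
#   nvme get-feature -f 0x0c -H /dev/nvme0
#
# To test with APST (and, optionally, PCIe ASPM) disabled, add to the
# kernel command line, e.g. in /etc/default/grub:
GRUB_CMDLINE_LINUX="nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"
# then regenerate the grub config (update-grub) and reboot.
```

If the dropouts stop with APST off, that points at a power-state handshake problem between the SN700 firmware and the ASM2806 switch rather than at md or btrfs.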
> [1] The last failures have been at random times on the following days:
> 1. August 27, 2025
> 2. September 19, 2025
> 3. September 29, 2025
> 4. October 28, 2025
> 5. November 24, 2025
>
> Chipset being used:
> 1. ASMedia Technology Inc. ASM2806 4-Port PCIe x2 Gen3 Packet Switch
>
> Details:
>
> 1. August 27, 2025
> [1156824.598513] nvme nvme2: I/O 5 QID 0 timeout, reset controller
> [1156896.035355] nvme nvme2: Device not ready; aborting reset, CSTS=0x1
> [1156906.057936] nvme nvme2: Device not ready; aborting reset, CSTS=0x1
> [1158185.737571] md/raid:md1: Disk failure on nvme2n1p4, disabling device.
> [1158185.744188] md/raid:md1: Operation continuing on 11 devices.
>
> 2. September 19th, 2025
> [2001664.727044] nvme nvme9: I/O 26 QID 0 timeout, reset controller
> [2001736.159123] nvme nvme9: Device not ready; aborting reset, CSTS=0x1
> [2001746.180813] nvme nvme9: Device not ready; aborting reset, CSTS=0x1
> [2002368.631788] md/raid:md1: Disk failure on nvme9n1p4, disabling device.
> [2002368.638414] md/raid:md1: Operation continuing on 11 devices.
> [2003213.517965] md/raid1:md0: Disk failure on nvme9n1p2, disabling device.
> [2003213.517965] md/raid1:md0: Operation continuing on 11 devices.
>
> 3. September 29th, 2025
> [858305.408049] nvme nvme3: I/O 8 QID 0 timeout, reset controller
> [858376.843140] nvme nvme3: Device not ready; aborting reset, CSTS=0x1
> [858386.865240] nvme nvme3: Device not ready; aborting reset, CSTS=0x1
> [858386.883053] md/raid:md1: Disk failure on nvme3n1p4, disabling device.
> [858386.889586] md/raid:md1: Operation continuing on 11 devices.
>
> 4. October 28th, 2025
> [502963.821407] nvme nvme4: I/O 0 QID 0 timeout, reset controller
> [503035.257391] nvme nvme4: Device not ready; aborting reset, CSTS=0x1
> [503045.282923] nvme nvme4: Device not ready; aborting reset, CSTS=0x1
> [503142.226962] md/raid:md1: Disk failure on nvme4n1p4, disabling device.
> [503142.233496] md/raid:md1: Operation continuing on 11 devices.
>
> 5. November 24th, 2025
> [1658454.034633] nvme nvme2: I/O 24 QID 0 timeout, reset controller
> [1658525.470287] nvme nvme2: Device not ready; aborting reset, CSTS=0x1
> [1658535.491803] nvme nvme2: Device not ready; aborting reset, CSTS=0x1
> [1658535.517638] md/raid1:md0: Disk failure on nvme2n1p2, disabling device.
> [1658535.517638] md/raid1:md0: Operation continuing on 11 devices.
> [1659258.368386] md/raid:md1: Disk failure on nvme2n1p4, disabling device.
> [1659258.375012] md/raid:md1: Operation continuing on 11 devices.
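The signature is identical in all five events above: an admin-queue (QID 0) timeout, two failed resets roughly 10 s apart, then md kicking the device. On question 3, a small awk sketch over `dmesg` output can pull out which controller failed and when, e.g. to alert before the array degrades. The sample lines embedded below are copied from the November 24th report; in practice you would pipe `dmesg` in directly:

```shell
# Sample dmesg lines (from the report above); in practice pipe `dmesg` in.
dmesg_sample='[1658454.034633] nvme nvme2: I/O 24 QID 0 timeout, reset controller
[1658525.470287] nvme nvme2: Device not ready; aborting reset, CSTS=0x1
[1658535.491803] nvme nvme2: Device not ready; aborting reset, CSTS=0x1'

# Print "<controller> <kernel timestamp>" for each failed reset attempt.
result=$(printf '%s\n' "$dmesg_sample" |
  awk -F'[][]' '/aborting reset/ {
    split($3, f, ":")            # " nvme nvme2" is the part before the colon
    sub(/^ nvme /, "", f[1])     # strip the driver prefix, keep controller name
    print f[1], $2               # controller, kernel timestamp
  }')
printf '%s\n' "$result"
```

A periodic job diffing this output could flag the event within minutes, rather than discovering it later via a degraded array.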
>
>
> Justin
>