Default IDENTIFY timeout is 5000ms which is too short for enterprise disks

* Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
@ 2026-04-09 10:21 AlanCui4080
  2026-04-09 11:55 ` Damien Le Moal
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: AlanCui4080 @ 2026-04-09 10:21 UTC (permalink / raw)
  To: linux-ide, dlemoal

Hi,

I have two ST4000NM000A-2HZ100 disks in my computer, which are from Seagate's
enterprise line. When I resume from suspend, the kernel complains as shown
below and the zpool kicks the disks out:

```
ata2: found unknown device (class 0)
ata4: found unknown device (class 0)
ata2: found unknown device (class 0)
ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4: found unknown device (class 0)
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4.00: qc timeout after 5000 msecs (cmd 0xec)
ata4.00: qc timeout after 5000 msecs (cmd 0xec)
ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata4.00: revalidation failed (errno=-5)
ata2.00: qc timeout after 5000 msecs (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata2.00: configured for UDMA/133
ata4.00: configured for UDMA/133
```
I think this is caused by my disks spinning up too slowly.
After making libata wait longer, the warning disappeared.

```
# cat /proc/cmdline
libata.ata_probe_timeout=10
```

```
ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
sd 1:0:0:0: [sda] Starting disk
ata2.00: configured for UDMA/133
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
sd 3:0:0:0: [sdb] Starting disk
ata4.00: configured for UDMA/133
```

Meanwhile, SeaChest reports that the startup time from standby is 9 seconds,
which is longer than the default ATA IDENTIFY timeout.

```
/dev/sg0 - ST4000NM000A-2HZ100 - **** - TN04 - ATA

Standby Z : Recovery Time : 90 (in 100msecs)
```

```
static const unsigned int ata_eh_identify_timeouts[] = {
         5000,  /* covers > 99% of successes and not too boring on failures */
        10000,  /* combined time till here is enough even for media access */
        30000,  /* for true idiots */
        UINT_MAX,
};
```
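
As a rough check of the comparison above, here is a minimal C sketch. It
assumes the SeaChest value is in units of 100 ms, as printed, and copies the
three finite timeouts from the table. The helper names are hypothetical and
are not part of libata.

```c
#include <assert.h>
#include <stddef.h>

/* Finite per-attempt IDENTIFY timeouts from the table above, in ms. */
static const unsigned int identify_timeouts_ms[] = { 5000, 10000, 30000 };

/* SeaChest prints the recovery time in units of 100 ms. */
static unsigned int recovery_units_to_ms(unsigned int units_100ms)
{
	return units_100ms * 100;
}

/*
 * Hypothetical helper: index of the first attempt whose timeout alone is
 * at least as long as the recovery time, or -1 if no attempt is.
 */
static int first_attempt_covering(unsigned int recovery_ms)
{
	size_t n = sizeof(identify_timeouts_ms) / sizeof(identify_timeouts_ms[0]);

	for (size_t i = 0; i < n; i++)
		if (identify_timeouts_ms[i] >= recovery_ms)
			return (int)i;
	return -1;
}
```

With the reported value of 90, `recovery_units_to_ms(90)` gives 9000 ms, which
the first 5000 ms attempt does not cover but the second 10000 ms attempt does.
This only models the arithmetic. It says nothing about whether the drive
should answer IDENTIFY before it has spun up.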

I tested the hard drive: as long as it is never put into STANDBY_Z (disk
stops spinning, requiring 9 seconds to recover) and is kept in IDLE_C (platter
slows down, requiring 3.2 seconds to recover), this error never occurs.

I have seen many users complaining about this elsewhere. Should we add a quirk
for those "heavy" disks, or print a warning explaining how to work around this
problem?

Alan


* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-09 10:21 Default IDENTIFY timeout is 5000ms which is too short for enterprise disks AlanCui4080
@ 2026-04-09 11:55 ` Damien Le Moal
  2026-04-09 12:01 ` Damien Le Moal
       [not found] ` <14062658.dW097sEU6C@alanarchdesktop>
  2 siblings, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2026-04-09 11:55 UTC (permalink / raw)
  To: AlanCui4080, linux-ide

On 2026/04/09 12:21, AlanCui4080 wrote:
> Hi,
> 
> I have two ST4000NM000A-2HZ100 on my computer which is of seagate enterprise 
> line.  But when i recovery from suspend, the kernel complains about that and 
> the zpool kicks the disk off:
> 
> ```
> ata2: found unknown device (class 0)
> ata4: found unknown device (class 0)
> ata2: found unknown device (class 0)
> ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata4: found unknown device (class 0)
> ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata4.00: qc timeout after 5000 msecs (cmd 0xec)
> ata4.00: qc timeout after 5000 msecs (cmd 0xec)
> ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata4.00: revalidation failed (errno=-5)
> ata2.00: qc timeout after 5000 msecs (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2.00: revalidation failed (errno=-5)
> ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata2.00: configured for UDMA/133
> ata4.00: configured for UDMA/133
> ```
> I think that's cause by the too slow spinup for my disk.
> After make libata to wait longer, the warning disappeared.
> 
> ```
> # cat /proc/cmdline
> libata.ata_probe_timeout=10
> ```
> 
> ```
> ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> sd 1:0:0:0: [sda] Starting disk
> ata2.00: configured for UDMA/133
> ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> sd 3:0:0:0: [sdb] Starting disk
> ata4.00: configured for UDMA/133
> ```
> 
> 
> Meanwhile, the seachest reports that the startup time from standby is 9sec, 
> which is longer that the default ATA IDENTIFY timeout.
> 
> ```
> /dev/sg0 - ST4000NM000A-2HZ100 - **** - TN04 - ATA
> 
> Standby Z : Recovery Time : 90 (in 100msecs)
> ```
> 
> ```
> static const unsigned int ata_eh_identify_timeouts[] = {
>          5000,  /* covers > 99% of successes and not too boring on failures */
>         10000,  /* combined time till here is enough even for media access */
>         30000,  /* for true idiots */
>         UINT_MAX,
> };
> ```
> 
> I tested the hard drive, and as long as it's never set to STANDBY_Z (disk 
> stops spinning, requiring 9 seconds to recover) and kept in IDLE_C (platter 
> slows down, requiring 3.2 seconds to recover), this error never occurs.
> 
> It's been seen many users complaining about this elsewhere, should we quirk 
> for those "heavy" disk? Or print some warnings about how to relax this 
> problem.

Elsewhere ? I have not seen any complaints/problem reports on the linux-ide list
recently. So I do not know where "elsewhere" is.

And no, we should not quirk the disk but rather improve resume from suspend to
issue identify with increasing timeouts, like regular probe does, or to issue
identify only once we see the drive ready, which is a check that exists for
spundown startups. That should solve the issue.




-- 
Damien Le Moal
Western Digital Research


* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-09 10:21 Default IDENTIFY timeout is 5000ms which is too short for enterprise disks AlanCui4080
  2026-04-09 11:55 ` Damien Le Moal
@ 2026-04-09 12:01 ` Damien Le Moal
  2026-04-15 12:40   ` Niklas Cassel
       [not found] ` <14062658.dW097sEU6C@alanarchdesktop>
  2 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2026-04-09 12:01 UTC (permalink / raw)
  To: AlanCui4080, linux-ide, Niklas Cassel

On 2026/04/09 12:21, AlanCui4080 wrote:
> Hi,
> 
> I have two ST4000NM000A-2HZ100 on my computer which is of seagate enterprise 
> line.  But when i recovery from suspend, the kernel complains about that and 
> the zpool kicks the disk off:

We do not deal with out of tree code. So mentioning something that ZFS does is
not helping. Please check with an upstream file system. E.g. XFS, ext4 or BTRFS.

> 
> ```
> ata2: found unknown device (class 0)
> ata4: found unknown device (class 0)
> ata2: found unknown device (class 0)
> ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata4: found unknown device (class 0)
> ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata4.00: qc timeout after 5000 msecs (cmd 0xec)
> ata4.00: qc timeout after 5000 msecs (cmd 0xec)
> ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata4.00: revalidation failed (errno=-5)
> ata2.00: qc timeout after 5000 msecs (cmd 0xec)
> ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> ata2.00: revalidation failed (errno=-5)
> ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> ata2.00: configured for UDMA/133
> ata4.00: configured for UDMA/133
> ```
> I think that's cause by the too slow spinup for my disk.
> After make libata to wait longer, the warning disappeared.

What kernel version is this ? Did you test with the latest mainline (7.0-rc7) ?

> 
> ```
> # cat /proc/cmdline
> libata.ata_probe_timeout=10
> ```
> 
> ```
> ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> sd 1:0:0:0: [sda] Starting disk
> ata2.00: configured for UDMA/133
> ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> sd 3:0:0:0: [sdb] Starting disk
> ata4.00: configured for UDMA/133
> ```
> 
> 
> Meanwhile, the seachest reports that the startup time from standby is 9sec, 
> which is longer that the default ATA IDENTIFY timeout.

Your drive is very slow/old. Most modern drives can reply to identify even when
they are not fully spun up.

> 
> ```
> /dev/sg0 - ST4000NM000A-2HZ100 - **** - TN04 - ATA
> 
> Standby Z : Recovery Time : 90 (in 100msecs)
> ```
> 
> ```
> static const unsigned int ata_eh_identify_timeouts[] = {
>          5000,  /* covers > 99% of successes and not too boring on failures */
>         10000,  /* combined time till here is enough even for media access */
>         30000,  /* for true idiots */
>         UINT_MAX,
> };
> ```
> 
> I tested the hard drive, and as long as it's never set to STANDBY_Z (disk 
> stops spinning, requiring 9 seconds to recover) and kept in IDLE_C (platter 
> slows down, requiring 3.2 seconds to recover), this error never occurs.
> 
> It's been seen many users complaining about this elsewhere, should we quirk 
> for those "heavy" disk? Or print some warnings about how to relax this 
> problem.

Elsewhere ? That certainly was not on this list as we have seen no problem
reports recently.

And no, we should not introduce a quirk for this. Rather, we should do the same
3-steps timeout for revalidation after a resume from suspend in the same manner
as a regular probe does. Or add a check/wait for "drive ready" when resuming,
similar to the PUIS handling (power up in standby).


-- 
Damien Le Moal
Western Digital Research


* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-09 12:01 ` Damien Le Moal
@ 2026-04-15 12:40   ` Niklas Cassel
  2026-04-16 12:59     ` AlanCui4080
  0 siblings, 1 reply; 13+ messages in thread
From: Niklas Cassel @ 2026-04-15 12:40 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: AlanCui4080, linux-ide

Hello Alan, Damien,

On Thu, Apr 09, 2026 at 02:01:03PM +0200, Damien Le Moal wrote:
> On 2026/04/09 12:21, AlanCui4080 wrote:

[...]

> And no, we should not introduce a quirk for this. Rather, we should do the same
> 3-steps timeout for revalidation after a resume from suspend in the same manner
> as a regular probe does. Or add a check/wait for "drive ready" when resuming,
> similar to the PUIS handling (power up in standby).

Just like regular ata_dev_read_id() (called during probe),
ata_dev_reread_id() (called during revalidate) already does increase the
timeout with each retry:

# echo +10 > /sys/class/rtc/rtc0/wakealarm
# echo mem > /sys/power/state

[   22.709542] PM: suspend entry (deep)
[   22.734353] Filesystems sync: 0.024 seconds
[   22.749431] Freezing user space processes
[   22.750703] Freezing user space processes completed (elapsed 0.000 seconds)
[   22.751533] OOM killer disabled.
[   22.751939] Freezing remaining freezable tasks
[   22.753553] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[   22.763375] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[   22.764396] ata1.00: Entering standby power mode
[   22.775472] PM: suspend devices took 0.021 seconds

...
...
...

[   28.826052] PM: suspend exit
[   29.063513] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

QEMU: cmd_identify: IDENTIFY count: 4, intentionally not sending completion

[   29.064800] ata2: SATA link down (SStatus 0 SControl 300)
[   29.071055] ata3: SATA link down (SStatus 0 SControl 300)
[   29.072053] ata6: SATA link down (SStatus 0 SControl 300)
[   29.073009] ata5: SATA link down (SStatus 0 SControl 300)
[   29.074038] ata4: SATA link down (SStatus 0 SControl 300)
[   34.168702] ata1.00: qc timeout after 5000 msecs (cmd 0xec)
[   34.169820] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[   34.170754] ata1.00: revalidation failed (errno=-5)
[   34.481679] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

QEMU: cmd_identify: IDENTIFY count: 5, intentionally not sending completion

[   44.920692] ata1.00: qc timeout after 10000 msecs (cmd 0xec)
[   44.921814] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[   44.922647] ata1.00: revalidation failed (errno=-5)
[   45.232872] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

QEMU: cmd_identify: IDENTIFY count: 6, intentionally not sending completion

[   75.640934] ata1.00: qc timeout after 30000 msecs (cmd 0xec)
[   75.642127] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[   75.643091] ata1.00: revalidation failed (errno=-5)
[   75.643893] ata1.00: disable device
[   75.950433] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)



Alan,
could you please provide a full dmesg (dmesg that has not been cut)
when reproducing your problem on kernel v7.0.

And please explain your problem as detailed as you can, including which
drive/port (ataX.YY) that you are having a problem with.

ZFS is not a filesystem in the kernel, so we don't really care about it.

Are you saying that you see something like:
[   75.643893] ata1.00: disable device

instead of something like:
[   75.068077] sd 0:0:0:0: [sda] Starting disk
[   75.069628] ata1.00: configured for UDMA/100



Note that if you specify an explicit probe timeout value, e.g.
libata.ata_probe_timeout=10

Then that timeout value will be used for each retry:
https://github.com/torvalds/linux/blob/v7.0/drivers/ata/libata-core.c#L1612-L1617

I.e. if you specify an explicit probe timeout value, you will not
automatically get a larger timeout for each retry.
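
To illustrate the difference, here is a simplified user-space model in C. It
is a sketch and not the kernel function: `identify_timeout_ms()` and its
parameters are hypothetical names, the table values are the three finite
defaults quoted earlier in the thread, and clamping to the last entry is an
assumption of this sketch.

```c
#include <assert.h>

/* Default escalation for IDENTIFY, in ms (finite values quoted in the thread). */
static const unsigned int identify_timeouts_ms[] = { 5000, 10000, 30000 };
#define NR_IDENTIFY_TIMEOUTS 3

/*
 * Simplified model: pick the timeout for a given retry (0-based).
 * probe_timeout_s mirrors libata.ata_probe_timeout (seconds, 0 = not set).
 */
static unsigned int identify_timeout_ms(unsigned int probe_timeout_s,
					unsigned int retry)
{
	if (probe_timeout_s)
		return probe_timeout_s * 1000;	/* same value on every retry */

	if (retry >= NR_IDENTIFY_TIMEOUTS)
		retry = NR_IDENTIFY_TIMEOUTS - 1;	/* assumption: clamp */
	return identify_timeouts_ms[retry];
}
```

In this model, `libata.ata_probe_timeout=10` yields 10000 ms for retry 0, 1
and 2, while leaving the parameter unset yields 5000, 10000 and 30000 ms.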


Kind regards,
Niklas


* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-15 12:40   ` Niklas Cassel
@ 2026-04-16 12:59     ` AlanCui4080
  2026-04-20 16:27       ` Niklas Cassel
  0 siblings, 1 reply; 13+ messages in thread
From: AlanCui4080 @ 2026-04-16 12:59 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-ide

Hi

On Wednesday, 15 April 2026 20:40, you wrote:

> Alan,
> could you please provide a full dmesg (dmesg that has not been cut)
> when reproducing your problem on kernel v7.0.

Here is a full dmesg, from before entering suspend through resuming from it:

$ sudo systemctl suspend

[12951.183287] r8169 0000:07:00.0 enp7s0: Link is Down
[12951.209606] PM: suspend entry (deep)
[12951.253352] Filesystems sync: 0.043 seconds
[12955.016893] Freezing user space processes
[12955.018682] Freezing user space processes completed (elapsed 0.001 seconds)
[12955.018686] OOM killer disabled.
[12955.018688] Freezing remaining freezable tasks
[12955.019625] Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
[12955.019644] printk: Suspending console(s) (use no_console_suspend to debug)
[12955.027700] serial 00:04: disabled
[12955.036574] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[12955.037560] sd 3:0:0:0: [sdb] Synchronizing SCSI cache
[12955.194975] ACPI: PM: Preparing to enter system sleep state S3
[12955.701415] ACPI: PM: Saving platform NVS memory
[12955.701483] Disabling non-boot CPUs ...
[12955.703600] smpboot: CPU 15 is now offline
[12955.706609] smpboot: CPU 14 is now offline
[12955.709581] smpboot: CPU 13 is now offline
[12955.712508] smpboot: CPU 12 is now offline
[12955.715410] smpboot: CPU 11 is now offline
[12955.718240] smpboot: CPU 10 is now offline
[12955.721183] smpboot: CPU 9 is now offline
[12955.724271] smpboot: CPU 8 is now offline
[12955.725278] Spectre V2 : Update user space SMT mitigation: STIBP off
[12955.727089] smpboot: CPU 7 is now offline
[12955.729733] smpboot: CPU 6 is now offline
[12955.732059] smpboot: CPU 5 is now offline
[12955.734550] smpboot: CPU 4 is now offline
[12955.737036] smpboot: CPU 3 is now offline
[12955.739410] smpboot: CPU 2 is now offline
[12955.741822] smpboot: CPU 1 is now offline
[12955.743174] ACPI: PM: Low-level resume complete
[12955.743192] ACPI: PM: Restoring platform NVS memory
[12955.743331] LVT offset 0 assigned for vector 0x400
[12955.743893] Enabling non-boot CPUs ...
[12955.743930] smpboot: Booting Node 0 Processor 1 APIC 0x2
[12955.747020] CPU1 is up
[12955.747038] smpboot: Booting Node 0 Processor 2 APIC 0x4
[12955.750123] CPU2 is up
[12955.750147] smpboot: Booting Node 0 Processor 3 APIC 0x6
[12955.753787] CPU3 is up
[12955.753808] smpboot: Booting Node 0 Processor 4 APIC 0x8
[12955.757092] CPU4 is up
[12955.757112] smpboot: Booting Node 0 Processor 5 APIC 0xa
[12955.760262] CPU5 is up
[12955.760295] smpboot: Booting Node 0 Processor 6 APIC 0xc
[12955.763945] CPU6 is up
[12955.763969] smpboot: Booting Node 0 Processor 7 APIC 0xe
[12955.767190] CPU7 is up
[12955.767212] smpboot: Booting Node 0 Processor 8 APIC 0x1
[12955.770965] Spectre V2 : Update user space SMT mitigation: STIBP always-on
[12955.770995] CPU8 is up
[12955.771014] smpboot: Booting Node 0 Processor 9 APIC 0x3
[12955.774292] CPU9 is up
[12955.774314] smpboot: Booting Node 0 Processor 10 APIC 0x5
[12955.778002] CPU10 is up
[12955.778025] smpboot: Booting Node 0 Processor 11 APIC 0x7
[12955.781355] CPU11 is up
[12955.781376] smpboot: Booting Node 0 Processor 12 APIC 0x9
[12955.785077] CPU12 is up
[12955.785106] smpboot: Booting Node 0 Processor 13 APIC 0xb
[12955.789075] CPU13 is up
[12955.789106] smpboot: Booting Node 0 Processor 14 APIC 0xd
[12955.792527] CPU14 is up
[12955.792550] smpboot: Booting Node 0 Processor 15 APIC 0xf
[12955.796288] CPU15 is up
[12955.797605] ACPI: PM: Waking up from system sleep state S3
[12955.800715] xhci_hcd 0000:02:00.0: xHC error in resume, USBSTS 0x401, Reinit
[12955.800718] usb usb1: root hub lost power or was reset
[12955.800720] usb usb2: root hub lost power or was reset
[12955.801742] serial 00:04: activated
[12955.858626] nvme nvme0: D3 entry latency set to 8 seconds
[12955.865177] nvme nvme1: 8/0/0 default/read/poll queues
[12955.874829] nvme nvme0: 16/0/0 default/read/poll queues
[12956.110891] ata3: SATA link down (SStatus 0 SControl 300)
[12956.110924] ata1: SATA link down (SStatus 0 SControl 300)
[12956.110955] ata5: SATA link down (SStatus 0 SControl 330)
[12956.260598] usb 1-9: reset low-speed USB device number 7 using xhci_hcd
[12956.617639] usb 1-6: WARN: invalid context state for evaluate context command.
[12956.790599] usb 1-6: reset full-speed USB device number 3 using xhci_hcd
[12956.841562] ata6: failed to resume link (SControl 0)
[12956.841577] ata6: SATA link down (SStatus 0 SControl 0)
[12957.058571] usb 1-1: WARN: invalid context state for evaluate context command.
[12957.231590] usb 1-1: reset full-speed USB device number 2 using xhci_hcd
[12957.498554] usb 1-8: WARN: invalid context state for evaluate context command.
[12957.671531] usb 1-8: reset full-speed USB device number 5 using xhci_hcd
[12958.111579] usb 1-7: reset high-speed USB device number 4 using xhci_hcd
[12958.576614] usb 1-7.3: WARN: invalid context state for evaluate context command.
[12958.648571] usb 1-7.3: reset full-speed USB device number 6 using xhci_hcd
[12958.876493] OOM killer enabled.
[12958.876496] Restarting tasks: Starting
[12958.879178] Bluetooth: hci0: CSR: Setting up dongle with HCI ver=6 rev=22bb
[12958.879183] Bluetooth: hci0: LMP ver=6 subver=22bb; manufacturer=10
[12958.879299] Restarting tasks: Done
[12958.879306] efivarfs: resyncing variable state
[12958.886795] efivarfs: finished resyncing variable state
[12958.886811] random: crng reseeded on system resumption
[12959.047801] Bluetooth: MGMT ver 1.23
[12959.427062] NVRM: _kgspIsHeartbeatTimedOut: Heartbeat timed out, currentTimeMs 2367851435 heartbeat 0 heartbeatWithOffsetMs 0 diff 2367851435 timeout 5200
[12959.427065] NVRM: _kgspRpcRecvPoll: GSP RM heartbeat timed out
[12960.627839] PM: suspend exit
[12960.661209] Realtek Internal NBASE-T PHY r8169-0-700:00: attached PHY driver (mii_bus:phy_addr=r8169-0-700:00, irq=MAC)
[12960.820346] r8169 0000:07:00.0 enp7s0: Link is Down
[12961.113196] ata2: link is slow to respond, please be patient (ready=0)
[12961.114200] ata4: link is slow to respond, please be patient (ready=0)
[12963.428372] r8169 0000:07:00.0 enp7s0: Link is Up - 1Gbps/Full - flow control rx/tx
[12965.816162] ata2: found unknown device (class 0)
[12965.816180] ata4: found unknown device (class 0)
[12965.969160] ata2: found unknown device (class 0)
[12965.969171] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[12965.970159] ata4: found unknown device (class 0)
[12965.970167] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[12981.369196] ata2.00: qc timeout after 15000 msecs (cmd 0xec)
[12981.369207] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[12981.369210] ata2.00: revalidation failed (errno=-5)
[12981.369226] ata4.00: qc timeout after 15000 msecs (cmd 0xec)
[12981.369236] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[12981.369238] ata4.00: revalidation failed (errno=-5)
[12981.833047] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[12981.833134] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[12981.869506] ata2.00: configured for UDMA/133
[12981.879537] ata4.00: configured for UDMA/133

> And please explain your problem as detailed as you can, including which
> drive/port (ataX.YY) that you are having a problem with.

There is a software RAID on ata2.00 and ata4.00. When I resume from suspend,
the ATA port fails revalidation, and "some software" sees that as a failure
and kicks the disk out of the RAID. So I started trying to reproduce it.
(Those two are enterprise-grade disks, which start up really slowly.)

This problem only happens on resume from suspend or when I put the disk into
"Standby_Z" mode. The datasheet of this disk says that recovery from standby
takes 9 seconds typically and 23 seconds at most, so my guess is that it is
caused by the spin-up time being too long for those two disks. When I lengthen
the timeout, the problem is "relaxed": out of 10 resumes, revalidation failed
only 2 times. Damien said that a disk that properly implements the ATA
specification should respond to ATA commands even while it is spinning up.
What is strange is that, as you can see, the second revalidation succeeds
immediately, and the total time (6s) is still shorter than 9 seconds. The
difference between a failed resume and a successful one is that, in the
successful one, ata does not report "found unknown device", like:
[  332.991862] ata4: found unknown device (class 0)

I attached more disks as a test (just regular consumer-grade ones), and they
did not fail. Strangest of all, attaching the three new disks also relaxed
the problem without any timeout adjustment (roughly 50% of resumes fail and
50% succeed).

So, at least for now, I have no idea why this happens.

> Then that timeout value will be used for each retry:
> https://github.com/torvalds/linux/blob/v7.0/drivers/ata/libata-core.c#L1612-L1617

> I.e. if you specify an explicit probe timeout value, you will not
> automatically get a larger timeout timeout for each retry.

But as far as I have seen, the qc timeout freezes the port and then resets it.
I just don't want the port to fail explicitly, or at least not with a port reset.

Alan


* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-16 12:59     ` AlanCui4080
@ 2026-04-20 16:27       ` Niklas Cassel
  2026-04-23  9:18         ` AlanCui4080
  0 siblings, 1 reply; 13+ messages in thread
From: Niklas Cassel @ 2026-04-20 16:27 UTC (permalink / raw)
  To: AlanCui4080; +Cc: linux-ide

On Thu, Apr 16, 2026 at 08:59:30PM +0800, AlanCui4080 wrote:

[...]

> [12961.113196] ata2: link is slow to respond, please be patient (ready=0)
> [12961.114200] ata4: link is slow to respond, please be patient (ready=0)
> [12963.428372] r8169 0000:07:00.0 enp7s0: Link is Up - 1Gbps/Full - flow control rx/tx
> [12965.816162] ata2: found unknown device (class 0)
> [12965.816180] ata4: found unknown device (class 0)
> [12965.969160] ata2: found unknown device (class 0)
> [12965.969171] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [12965.970159] ata4: found unknown device (class 0)
> [12965.970167] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [12981.369196] ata2.00: qc timeout after 15000 msecs (cmd 0xec)
> [12981.369207] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [12981.369210] ata2.00: revalidation failed (errno=-5)
> [12981.369226] ata4.00: qc timeout after 15000 msecs (cmd 0xec)
> [12981.369236] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> [12981.369238] ata4.00: revalidation failed (errno=-5)
> [12981.833047] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [12981.833134] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [12981.869506] ata2.00: configured for UDMA/133
> [12981.879537] ata4.00: configured for UDMA/133

From this it seems that it is simply the first IDENTIFY that times out.
On the second try, it seems that the IDENTIFY passes, otherwise we would
have seen more "revalidation failed (errno=-5)" prints for the same drive.

So, from this log alone, I don't see any problem. We will try to do IDENTIFY
up to three times, so just a single IDENTIFY failing should not be a problem.
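
For anyone who wants to check a longer dmesg for this pattern, here is a small
C sketch that picks out the "qc timeout" lines. `parse_qc_timeout()` is a
hypothetical helper, not an existing tool, and it assumes that the first
occurrence of "ata" on the line is the port name, which holds for the lines
quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse a kernel log line of the form
 *   "[ts] ataP.DD: qc timeout after N msecs (cmd 0xCC)"
 * Returns 1 and fills the outputs on a match, 0 otherwise.
 */
static int parse_qc_timeout(const char *line, unsigned int *port,
			    unsigned int *dev, unsigned int *msecs,
			    unsigned int *cmd)
{
	/* Skip the optional "[timestamp] " prefix by locating the port name. */
	const char *p = strstr(line, "ata");

	if (!p)
		return 0;
	return sscanf(p, "ata%u.%u: qc timeout after %u msecs (cmd 0x%x)",
		      port, dev, msecs, cmd) == 4;
}
```

Counting the matches per port/device pair shows how many IDENTIFY attempts
timed out for each drive. In the log above that count is one per drive.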

So I think the question is, at this point, can you read from the drive?

E.g.:
# dd if=/dev/sda of=/dev/null iflag=direct bs=4K count=1

replace /dev/sda with the drive connected to ata2.00 or ata4.00.

If you can read from the device, then this seems like a problem with zpool
kicking the device off the RAID array (perhaps because it is taking longer
than some zpool defined timeout value?), rather than a libata problem.


Kind regards,
Niklas


* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-20 16:27       ` Niklas Cassel
@ 2026-04-23  9:18         ` AlanCui4080
  2026-04-23 11:15           ` Niklas Cassel
  0 siblings, 1 reply; 13+ messages in thread
From: AlanCui4080 @ 2026-04-23  9:18 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-ide

On Tuesday, 21 April 2026 00:27, you wrote:
> From this it seems that it is simply the first IDENTIFY that times out.
> On the second try, it seems that the IDENTIFY passes, otherwise we would
> have seen more "revalidation failed (errno=-5)" prints for the same drive.
> 
> So, from this log alone, I don't see any problem. We will try to do IDENTIFY
> up to three times, so just a single IDENTIFY failing should not be a problem.

So in your opinion, the error is caused by a hardware failure and not by the
kernel, so we should not add any quirk to relax or solve the problem, is that
correct? (I just want to confirm how the kernel will deal with this error.)

> So I think the question is, at this point, can you read from the drive?
> 
> E.g.:
> # dd if=/dev/sda of=/dev/null iflag=direct bs=4K count=1

I am blocked out of the shell for 5 seconds unless the IDENTIFY succeeds.

> 
> If you can read from the device, then this seem like a problem with zpool
> kicking the device off the RAID array (perhaps because it is taking longer
> than some zpool defined timeout value?), rather than a libata problem.

But after the link is re-established, the drive works normally.

Alan




* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-23  9:18         ` AlanCui4080
@ 2026-04-23 11:15           ` Niklas Cassel
  2026-04-23 14:26             ` AlanCui4080
  0 siblings, 1 reply; 13+ messages in thread
From: Niklas Cassel @ 2026-04-23 11:15 UTC (permalink / raw)
  To: AlanCui4080; +Cc: linux-ide, dlemoal

Hello Alan,

On Thu, Apr 23, 2026 at 05:18:24PM +0800, AlanCui4080 wrote:
> On Tuesday, 21 April 2026 00:27，you wrote：
> > From this it seems that it is simply the first IDENTIFY that times out.
> > On the second try, it seems that the IDENTIFY passes, otherwise we would
> > have seen more "revalidation failed (errno=-5)" prints for the same drive.
> > 
> > So, from this log alone, I don't see any problem. We will try to do IDENTIFY
> > up to three times, so just a single IDENTIFY failing should not be a problem.
> 
> So at your opinion, the error is caused by a hardware failure but not kernel, 
> so we should not add any quirk to relax or solve the problem, is that correct?
> (I just want to confirm that how kernel will deal with this error)

Like Damien said, the IDENTIFY DEVICE command is one of the few commands which
a device is required to execute without leaving the Standby state or requiring
a spin-up. A device is allowed to reply to IDENTIFY with the 'incomplete' bit
set:

37C8h - Device requires SET FEATURES subcommand to spin-up after power-up and
IDENTIFY DEVICE data is incomplete (see 4.19).
738Ch - Device requires SET FEATURES subcommand to spin-up after power-up and
IDENTIFY DEVICE data is complete (see 4.19).

8C73h - Device does not require SET FEATURES subcommand to spin-up after
power-up and IDENTIFY DEVICE data is incomplete (see 4.19).
C837h - Device does not require SET FEATURES subcommand to spin-up after
power-up and IDENTIFY DEVICE data is complete (see 4.19).
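
As a sketch of how these four values can be decoded, assuming (to the best of
my knowledge) that they are read from word 2 of the IDENTIFY DEVICE data. The
struct and function names below are hypothetical and this is not the libata
code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct specific_config {
	bool known;			/* one of the four values listed above */
	bool needs_set_features_spinup;	/* SET FEATURES needed to spin up */
	bool data_complete;		/* IDENTIFY DEVICE data is complete */
};

/* Decode the "specific configuration" value into its two properties. */
static struct specific_config decode_specific_config(uint16_t word)
{
	struct specific_config c = { false, false, false };

	switch (word) {
	case 0x37C8:
		c.known = true;
		c.needs_set_features_spinup = true;
		break;
	case 0x738C:
		c.known = true;
		c.needs_set_features_spinup = true;
		c.data_complete = true;
		break;
	case 0x8C73:
		c.known = true;
		break;
	case 0xC837:
		c.known = true;
		c.data_complete = true;
		break;
	default:
		break;	/* any other value: leave everything false */
	}
	return c;
}
```

Only the two values with `needs_set_features_spinup` set require an explicit
spin-up before the drive can provide media access.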

libata looks like it already handles this:
https://github.com/torvalds/linux/blob/v7.0/drivers/ata/libata-core.c#L1903-L1922

However, in your case you get a timeout, which means that the device does
not reply at all.

Before a system suspend, libata will send a spin-down/STANDBY IMMEDIATE
command to all drives.

After a system resume, libata will send a COMRESET to all devices, before
it sends the IDENTIFY, and after that it will send SET ACTIVE to spin-up
the drive.

It seems that occasionally, some of your drives hang in a weird state after
STANDBY + COMRESET + IDENTIFY. When we get a timeout, we will do another
COMRESET + IDENTIFY, and this time your drive does not hang.

My best guess is that it is a HDD firmware bug where the drive sometimes
hangs after a STANDBY + COMRESET + IDENTIFY. Or claims to be ready before
it is actually ready.

It could of course also be a bug in e.g. ata_wait_ready(), and we are sending
the IDENTIFY command too quickly after the COMRESET, but if that was the case,
I think we would have seen way more bug reports from different vendors by now.

Anyway, from a user space perspective, we never remove the device (we only
do that if IDENTIFY fails three times), so the retries themselves should not
be visible to user space applications.

So if you disregard the error in the log, from a user space application
perspective, the only difference should be that it takes a few extra seconds
for the device to reply to commands after a system resume.

> 
> > So I think the question is, at this point, can you read from the drive?
> > 
> > E.g.:
> > # dd if=/dev/sda of=/dev/null iflag=direct bs=4K count=1
> 
> I will be blocked out of the shell for 5 secs unless the IDENTIFY succeed.

But as soon as you get a shell after a system resume, the above command
succeeds, right?

> 
> > 
> > If you can read from the device, then this seem like a problem with zpool
> > kicking the device off the RAID array (perhaps because it is taking longer
> > than some zpool defined timeout value?), rather than a libata problem.
> 
> But after the link re-established, the drive works normally.

My suggestion is to look at the zpool code to see how long it waits to find
all devices after a system resume before it kicks devices off the RAID array.

My initial feeling is that if your device is ready 5 seconds after a
system resume, then the timeout value for zpool to kick off a device must be
very low.

Kind regards,
Niklas


* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-23 11:15           ` Niklas Cassel
@ 2026-04-23 14:26             ` AlanCui4080
  2026-04-23 16:17               ` Niklas Cassel
  0 siblings, 1 reply; 13+ messages in thread
From: AlanCui4080 @ 2026-04-23 14:26 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-ide

Hi,

Thank you for your really detailed response. I was surprised by your reply;
hopefully it didn't cost you too much time.

On Thursday, 23 April 2026 19:15, you wrote:
> My best guess is that it is a HDD firmware bug where the drive sometimes
> hangs after a STANDBY + COMRESET + IDENTIFY. Or claims to be ready before
> it is actually ready.
> 
> It could of course also be a bug in e.g. ata_wait_ready(), and we are sending
> the IDENTIFY command too quickly after the COMRESET, but if that was the case,
> I think we would have seen way more bug reports from different vendors by now.
> 

That matches my guess: the drive's firmware doesn't implement ATA correctly,
and it either ignores the command or simply doesn't reply. The vendor really
doesn't seem to care about the specification; they even broke APM and changed
the meaning of standard SMART values.

I'm not a native speaker, so to be clear, the actual question I want to ask is:
will the kernel add a quirk for these drives so that revalidation doesn't fail,
avoiding the extra 5 seconds and the annoying errors on the console?

> > 
> > > So I think the question is, at this point, can you read from the drive?
> > > 
> > > E.g.:
> > > # dd if=/dev/sda of=/dev/null iflag=direct bs=4K count=1
> > 
> > I will be blocked out of the shell for 5 secs unless the IDENTIFY succeed.
> 
> But as soon as you get a shell after a system resume, the above command
> succeeds, right?

Yes, that's correct.

> 
> My suggestion is to look at the zpool code to see how long it waits to finds
> all devices after a system resume before it kicks devices off the RAID array.
> 
> My initial feeling is that if your device is ready after 5 seconds after a
> system resume, then the timeout value for zpool to kick off a device must be
> very low.

I do believe it is about zpool. I have migrated to btrfs RAID now, and it works
very well. And if you want to know, the TXG commit timeout is also 5000ms.

Alan


* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-23 14:26             ` AlanCui4080
@ 2026-04-23 16:17               ` Niklas Cassel
  2026-05-08 20:48                 ` AlanCui4080
  0 siblings, 1 reply; 13+ messages in thread
From: Niklas Cassel @ 2026-04-23 16:17 UTC (permalink / raw)
  To: AlanCui4080; +Cc: linux-ide

Hello Alan,

On Thu, Apr 23, 2026 at 10:26:55PM +0800, AlanCui4080 wrote:
> Thank you for your really really detailed response. And i was surprised by your
> reply, hopefully that doesn't cost your time too much.

My pleasure :)


> I'm not a native speaker so the actually question that i want to ask is that
> will kernel do quirk for those drives so let it just don't fail to avoid costing
> extra 5 seconds and producing annoying erros on console?

Well, considering that the controller does not send a reply when it is
supposed to, I think that the error in the log is justified.


You also wrote that:
> I enlong the timeout, then this problem is being "relaxed", during 10
> recoveries, only 2 times the revalidation failed.

So as far as I can tell, even increasing the timeout does not solve the
problem in all cases.

So, right now, I can't even think of a way to quirk the device so that it
works reliably.


> > But as soon as you get a shell after a system resume, the above command
> > succeeds, right?
> 
> Yes, that's correct.

Great!


> > My suggestion is to look at the zpool code to see how long it waits to finds
> > all devices after a system resume before it kicks devices off the RAID array.
> > 
> > My initial feeling is that if your device is ready after 5 seconds after a
> > system resume, then the timeout value for zpool to kick off a device must be
> > very low.
> 
> I do believe that's about zpool, i migrated it into btrfs RAID now, it works very
> well. And if you want to know, the timeout of TXG to commit is 5000ms also.

Nice to hear that you are not seeing the same issue with btrfs.


Kind regards,
Niklas


* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-23 16:17               ` Niklas Cassel
@ 2026-05-08 20:48                 ` AlanCui4080
  0 siblings, 0 replies; 13+ messages in thread
From: AlanCui4080 @ 2026-05-08 20:48 UTC (permalink / raw)
  To: Niklas Cassel; +Cc: linux-ide

Hi Cassel,

On Friday, 24 April 2026 00:17, you wrote:

> So as far as I can tell, even increasing the timeout does not solve the
> problem in all cases.
> 
> So, right now, I can't even think of a way to quirk the device so that it
> works reliably.

FYI, after I replaced the motherboard (B550 to X570), this problem is
completely gone. So it seems to be a corner case involving this drive and my
southbridge.

The only message produced by the kernel ata driver is:

May 09 04:42:37 AlanArchDesktop kernel: ata6: link is slow to respond, please be patient (ready=0)
May 09 04:42:37 AlanArchDesktop kernel: ata2: link is slow to respond, please be patient (ready=0)

Looks pretty good to me :)

Since the B550 motherboard has been removed from my computer, I'm going
to do more tests on the ata driver once I get my new CPU.

Alan.




[parent not found: <14062658.dW097sEU6C@alanarchdesktop>]

[parent not found: <4482b737-1454-48cb-a941-165aa84fb2eb@kernel.org>]

* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
       [not found]   ` <4482b737-1454-48cb-a941-165aa84fb2eb@kernel.org>
@ 2026-04-10 11:24     ` AlanCui4080
  2026-04-10 12:14       ` AlanCui4080
  0 siblings, 1 reply; 13+ messages in thread
From: AlanCui4080 @ 2026-04-10 11:24 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-ide

On Friday, 10 April 2026 12:19, you wrote:
> I need to check the code again, but no, that's not it. Since on resume we
> revalidate the device, it is ata_dev_reread_id() that needs to be a bit more lax
> on timeouts and repeatedly call ata_dev_read_id() with an increasing timeout as
> defined by ata_eh_identify_timeouts(). That should fix the IDENTIFY issue for drives
> that are slow to respond to that command on resume/while spinning up.
> 
> >> Or add a check/wait for "drive ready"
> >> when resuming, similar to the PUIS handling (power up in standby).
> > 
> > There is tried_spinup in ata_dev_read_id(), but seems required the device to 
> > response at least incomplete IDENTIFY, with a device will never response 
> > during spining up, is that possible to implement it?
> 
> Ah, yes, forgot about that one. So it is not an option.
> 

Hi, I've tried this (plus an extra WARN ONCE on ata_port_is_frozen):

---

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 374993031895..0ac0daae33f9 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3902,7 +3902,15 @@ int ata_dev_reread_id(struct ata_device *dev, unsigned int readid_flags)
        int rc;
 
        /* read ID data */
-       rc = ata_dev_read_id(dev, &class, readid_flags, id);
+       int retry_read_id = 3;
+       do {
+               rc = ata_dev_read_id(dev, &class, readid_flags, id);
+               if (rc) {
+                       ata_dev_warn(dev, "retrying ata_dev_read_id(), %d times remainng",
+                               retry_read_id);
+               }
+               retry_read_id--;
+       } while (rc && retry_read_id > 0);
        if (rc)
                return rc;

--

But it reports:

```
[  119.260621] ata2: found unknown device (class 0)
[  119.264620] ata4: found unknown device (class 0)
[  119.415623] ata2: found unknown device (class 0)
[  119.415634] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  119.422627] ata4: found unknown device (class 0)
[  119.422636] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  124.646636] ata4.00: qc timeout after 5000 msecs (cmd 0xec)
[  124.646646] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  124.646648] ata4.00: retrying ata_dev_read_id(), 3 times remainng
[  124.646657] ------------[ cut here ]------------
[  124.646659] ata_port_is_frozen(ap)
[  124.646660] WARNING: drivers/ata/libata-core.c:1549 at ata_exec_internal+0x4e4/0x590, CPU#0: scsi_eh_3/155
...
[  124.646793] Call Trace:
[  124.646795]  <TASK>
[  124.646799]  ata_dev_read_id+0x3b2/0x560
[  124.646805]  ata_dev_reread_id+0x50/0xf0
[  124.646808]  ata_dev_revalidate+0x64/0xd0
[  124.646811]  ata_eh_recover+0xa76/0xf90
[  124.646815]  ? update_load_avg+0x7b/0x740
[  124.646819]  ? __dequeue_entity+0x4f4/0x5d0
[  124.646823]  sata_pmp_error_handler+0x387/0x660
[  124.646827]  ? __flush_work+0x2b1/0x360
[  124.646832]  ahci_error_handler+0x42/0x80
[  124.646836]  ata_scsi_port_error_handler+0x71a/0x950
[  124.646840]  ata_scsi_error+0x95/0xd0
[  124.646843]  scsi_error_handler+0xd1/0x530
[  124.646848]  ? __pfx_scsi_error_handler+0x10/0x10
[  124.646851]  kthread+0xfc/0x240
[  124.646855]  ? __pfx_kthread+0x10/0x10
[  124.646858]  ret_from_fork+0x243/0x280
[  124.646862]  ? __pfx_kthread+0x10/0x10
[  124.646865]  ret_from_fork_asm+0x1a/0x30
[  124.646873]  </TASK>
[  124.646875] ---[ end trace 0000000000000000 ]---
[  124.646877] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[  124.646879] ata4.00: retrying ata_dev_read_id(), 2 times remainng
[  124.646886] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[  124.646888] ata4.00: retrying ata_dev_read_id(), 1 times remainng
[  124.646889] ata4.00: revalidation failed (errno=-5)
[  124.646919] ata2.00: qc timeout after 5000 msecs (cmd 0xec)
[  124.646927] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  124.646929] ata2.00: retrying ata_dev_read_id(), 3 times remainng
[  124.646937] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[  124.646939] ata2.00: retrying ata_dev_read_id(), 2 times remainng
[  124.646945] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[  124.646947] ata2.00: retrying ata_dev_read_id(), 1 times remainng
[  124.646948] ata2.00: revalidation failed (errno=-5)
[  125.110629] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  125.110649] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  125.146916] ata2.00: configured for UDMA/133
[  125.163102] ata4.00: configured for UDMA/133

```

And, yes, libata freezes the port when the qc times out:

```
if (qc->flags & ATA_QCFLAG_ACTIVE) {
	qc->err_mask |= AC_ERR_TIMEOUT;
	ata_port_freeze(ap);
	ata_dev_warn(dev, "qc timeout after %u msecs (cmd 0x%x)\n",
		     timeout, command);
}
```

So, should the retry happen in ata_exec_internal()? No: ata_exec_internal() has
no path to cancel a command that has already been issued; it can only freeze
and reset the port. All we could do is keep waiting, increasing the timeout up
to 3 times before letting the port reset. I don't think that is a good idea.

Alan.




* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
  2026-04-10 11:24     ` AlanCui4080
@ 2026-04-10 12:14       ` AlanCui4080
  0 siblings, 0 replies; 13+ messages in thread
From: AlanCui4080 @ 2026-04-10 12:14 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-ide

Hi,

As further information, I found that increasing the timeout only mitigates
the problem: across multiple wake-ups from S3, IDENTIFY still failed about 10%
of the time. Interestingly, after the failure, the port immediately regained the
link and then successfully configured the hard drive.

```
[  322.975526] ACPI: PM: Waking up from system sleep state S3
...
[  332.991862] ata4: found unknown device (class 0)
[  332.992863] ata2: found unknown device (class 0)
[  333.147890] ata2: found unknown device (class 0)
[  333.147899] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  333.147911] ata4: found unknown device (class 0)
[  333.147920] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  348.198232] ata4.00: qc timeout after 15000 msecs (cmd 0xec)
[  348.198242] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  348.198245] ata4.00: revalidation failed (errno=-5)
[  348.198259] ata2.00: qc timeout after 15000 msecs (cmd 0xec)
[  348.198269] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  348.198272] ata2.00: revalidation failed (errno=-5)
[  348.662584] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  348.662610] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  348.699354] ata4.00: configured for UDMA/133
[  348.719825] ata2.00: configured for UDMA/133
```

And the difference between a failed recovery and a successful one is that in
the successful case ata does not report "found unknown device". Then I attached
new consumer-grade WD and Seagate drives and, as I expected, they spin up much
faster than those Exos drives and are never reported as revalidation failed:

2.5 inch WD Blue drive, 8 secs faster:
```
[ 1047.409533] ACPI: PM: Waking up from system sleep state S3
...
[ 1047.724415] ata5: SATA link down (SStatus 0 SControl 330)
[ 1047.724451] ata3: SATA link down (SStatus 0 SControl 300)
[ 1048.452451] ata6: SATA link down (SStatus 0 SControl 0)
[ 1049.204451] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1049.257864] sd 0:0:0:0: [sdc] Starting disk
...
[ 1051.916495] PM: suspend exit
[ 1052.728394] ata4: link is slow to respond, please be patient (ready=0)
[ 1052.733355] ata2: link is slow to respond, please be patient (ready=0)
[ 1054.840880] r8169 0000:07:00.0 enp7s0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1057.076309] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1057.116416] sd 1:0:0:0: [sda] Starting disk
[ 1057.134584] ata2.00: configured for UDMA/133
[ 1057.532325] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1057.576679] sd 3:0:0:0: [sdb] Starting disk
[ 1057.594743] ata4.00: configured for UDMA/13
```

3.5 inch Seagate BarraCuda drive, 6 secs faster:
```
[ 1484.056163] ACPI: PM: Waking up from system sleep state S3
[ 1484.371881] ata5: SATA link down (SStatus 0 SControl 330)
[ 1484.371917] ata3: SATA link down (SStatus 0 SControl 300)
[ 1485.099799] ata6: SATA link down (SStatus 0 SControl 0)
...
[ 1488.620192] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1488.621446] sd 0:0:0:0: [sdc] Starting disk
[ 1488.622941] ata1.00: configured for UDMA/133
...
[ 1488.633805] PM: suspend exit
[ 1489.374930] ata2: link is slow to respond, please be patient (ready=0)
[ 1489.374939] ata4: link is slow to respond, please be patient (ready=0)
[ 1491.563828] r8169 0000:07:00.0 enp7s0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1493.666523] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1493.713096] sd 1:0:0:0: [sda] Starting disk
[ 1493.731018] ata2.00: configured for UDMA/133
[ 1494.026490] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1494.083273] sd 3:0:0:0: [sdb] Starting disk
[ 1494.101513] ata4.00: configured for UDMA/133
```
Furthermore, I discovered that adding an extra hard drive to the system
mitigates the revalidation failure issue. That may show that the hard drive has
not actually restored the link when the kernel believes it has (since the kernel
says it does not know the device on the link). The slight delay caused by the
extra hard drive then allows the command to be truly accepted by the hard drive,
thus avoiding this problem.

At the same time, I'd like to point out that the AMD B550 southbridge only has
two native SATA ports, so these six ports must be behind a port multiplier.
Could this cause issues? I've seen many B550 users report that the ASMedia IP
cores used for the southbridge SATA ports are not reliable enough.

Alan.


end of thread

Thread overview: 13+ messages
2026-04-09 10:21 Default IDENTIFY timeout is 5000ms which is too short for enterprise disks AlanCui4080
2026-04-09 11:55 ` Damien Le Moal
2026-04-09 12:01 ` Damien Le Moal
2026-04-15 12:40   ` Niklas Cassel
2026-04-16 12:59     ` AlanCui4080
2026-04-20 16:27       ` Niklas Cassel
2026-04-23  9:18         ` AlanCui4080
2026-04-23 11:15           ` Niklas Cassel
2026-04-23 14:26             ` AlanCui4080
2026-04-23 16:17               ` Niklas Cassel
2026-05-08 20:48                 ` AlanCui4080
     [not found] ` <14062658.dW097sEU6C@alanarchdesktop>
     [not found]   ` <4482b737-1454-48cb-a941-165aa84fb2eb@kernel.org>
2026-04-10 11:24     ` AlanCui4080
2026-04-10 12:14       ` AlanCui4080
