* Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
From: AlanCui4080 @ 2026-04-09 10:21 UTC
To: linux-ide, dlemoal
Hi,
I have two ST4000NM000A-2HZ100 drives in my computer, which are from Seagate's
enterprise line. But when I resume from suspend, the kernel complains and
the zpool kicks the disks out:
```
ata2: found unknown device (class 0)
ata4: found unknown device (class 0)
ata2: found unknown device (class 0)
ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4: found unknown device (class 0)
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4.00: qc timeout after 5000 msecs (cmd 0xec)
ata4.00: qc timeout after 5000 msecs (cmd 0xec)
ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata4.00: revalidation failed (errno=-5)
ata2.00: qc timeout after 5000 msecs (cmd 0xec)
ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata2.00: revalidation failed (errno=-5)
ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
ata2.00: configured for UDMA/133
ata4.00: configured for UDMA/133
```
I think this is caused by my disks spinning up too slowly.
After making libata wait longer, the warning disappeared.
```
# cat /proc/cmdline
libata.ata_probe_timeout=10
```
```
ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
sd 1:0:0:0: [sda] Starting disk
ata2.00: configured for UDMA/133
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
sd 3:0:0:0: [sdb] Starting disk
ata4.00: configured for UDMA/133
```
Meanwhile, SeaChest reports that the startup time from standby is 9 seconds,
which is longer than the default ATA IDENTIFY timeout.
```
/dev/sg0 - ST4000NM000A-2HZ100 - **** - TN04 - ATA
Standby Z : Recovery Time : 90 (in 100msecs)
```
For reference, these are the IDENTIFY timeouts libata currently uses:
```
static const unsigned int ata_eh_identify_timeouts[] = {
5000, /* covers > 99% of successes and not too boring on failures */
10000, /* combined time till here is enough even for media access */
30000, /* for true idiots */
UINT_MAX,
};
```
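As a rough illustration of how this table compares with the 9-second recovery time, here is a small user-space sketch. It is not kernel code: `identify_timeouts_ms`, `recovery_time_ms()`, `effective_timeout_ms()` and `first_sufficient_attempt()` are hypothetical names made up for this example. It assumes that `UINT_MAX` only marks the end of the table, that a non-zero `libata.ata_probe_timeout` (in seconds) replaces the table value, and that each attempt has to cover the whole recovery time on its own.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Same values as the table quoted above, in milliseconds. */
static const unsigned int identify_timeouts_ms[] = {
	5000, 10000, 30000, UINT_MAX,
};

/* SeaChest reports the recovery time in units of 100 ms. */
static unsigned int recovery_time_ms(unsigned int seachest_units)
{
	assert(seachest_units <= UINT_MAX / 100);
	return seachest_units * 100;
}

/*
 * Hypothetical helper: timeout used for a given attempt.
 * Assumption: a non-zero probe timeout (seconds) overrides the table.
 */
static unsigned int effective_timeout_ms(unsigned int probe_timeout_s,
					 size_t attempt)
{
	assert(attempt < 3); /* index 3 is the UINT_MAX end marker */
	if (probe_timeout_s)
		return probe_timeout_s * 1000;
	return identify_timeouts_ms[attempt];
}

/*
 * Hypothetical helper: index of the first table entry long enough for a
 * drive that needs ready_ms before it answers, or -1 if no entry is.
 */
static int first_sufficient_attempt(unsigned int ready_ms)
{
	size_t i;

	for (i = 0; identify_timeouts_ms[i] != UINT_MAX; i++) {
		if (ready_ms <= identify_timeouts_ms[i])
			return (int)i;
	}
	return -1;
}
```

Under these assumptions, `recovery_time_ms(90)` gives 9000 ms, which the first entry (5000 ms) cannot cover but the second (10000 ms) can, while the 3.2-second IDLE_C recovery already fits within the first attempt.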
I tested the hard drive, and as long as it is never put into STANDBY_Z (disk
stops spinning, requiring 9 seconds to recover) and is kept in IDLE_C (platters
slow down, requiring 3.2 seconds to recover), this error never occurs.
I have seen many users complaining about this elsewhere. Should we add a quirk
for those "heavy" disks? Or print a warning explaining how to work around this
problem?
Alan
* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
From: Damien Le Moal @ 2026-04-09 11:55 UTC
To: AlanCui4080, linux-ide
On 2026/04/09 12:21, AlanCui4080 wrote:
> I have seen many users complaining about this elsewhere. Should we add a quirk
> for those "heavy" disks? Or print a warning explaining how to work around this
> problem?
Elsewhere? I have not seen any complaints/problem reports on the linux-ide
list recently. So I do not know where "elsewhere" is.
And no, we should not quirk the disk but rather improve resume from suspend to
issue IDENTIFY with increasing timeouts, like regular probe does, or to issue
IDENTIFY only once we see the drive ready, which is a check that exists for
spun-down startups. That should solve the issue.
--
Damien Le Moal
Western Digital Research
* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
From: Damien Le Moal @ 2026-04-09 12:01 UTC
To: AlanCui4080, linux-ide, Niklas Cassel
On 2026/04/09 12:21, AlanCui4080 wrote:
> I have two ST4000NM000A-2HZ100 drives in my computer, which are from Seagate's
> enterprise line. But when I resume from suspend, the kernel complains and
> the zpool kicks the disks out:
We do not deal with out-of-tree code, so mentioning something that ZFS does
is not helping. Please check with an upstream file system, e.g. XFS, ext4 or
BTRFS.
> I think this is caused by my disks spinning up too slowly.
> After making libata wait longer, the warning disappeared.
What kernel version is this? Did you test with the latest mainline (7.0-rc7)?
> Meanwhile, SeaChest reports that the startup time from standby is 9 seconds,
> which is longer than the default ATA IDENTIFY timeout.
Your drive is very slow/old. Most modern drives can reply to IDENTIFY even when
they are not fully spun up.
> I have seen many users complaining about this elsewhere. Should we add a quirk
> for those "heavy" disks? Or print a warning explaining how to work around this
> problem?
Elsewhere? That certainly was not on this list, as we have seen no problem
reports recently.
And no, we should not introduce a quirk for this. Rather, we should use the
same 3-step timeout for revalidation after a resume from suspend, in the same
manner as a regular probe does. Or add a check/wait for "drive ready"
when resuming, similar to the PUIS handling (power up in standby).
--
Damien Le Moal
Western Digital Research
[parent not found: <14062658.dW097sEU6C@alanarchdesktop>]
[parent not found: <4482b737-1454-48cb-a941-165aa84fb2eb@kernel.org>]
* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
From: AlanCui4080 @ 2026-04-10 11:24 UTC
To: Damien Le Moal; +Cc: linux-ide
On Friday, 10 April 2026 12:19, you wrote:
> I need to check the code again, but no, that's not that. Since on resume we
> revalidate the device, it is ata_dev_reread_id() that needs to be a bit more lax
> on timeouts and repeatedly call ata_dev_read_id() with an increasing timeout as
> defined by ata_eh_identify_timeouts(). That should solve the IDENTIFY issue for
> drives that are slow to respond to that command on resume/while spinning up.
>
> >> Or add a check/wait for "drive ready"
> >> when resuming, similar to the PUIS handling (power up in standby).
> >
> > There is tried_spinup in ata_dev_read_id(), but it seems to require the
> > device to return at least an incomplete IDENTIFY response. With a device
> > that never responds while spinning up, is it possible to implement that?
>
> Ah, yes, forgot about that one. So it is not an option.
>
Hi,
I've tried the following (plus an extra WARN ONCE at ata_port_is_frozen):
---
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 374993031895..0ac0daae33f9 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3902,7 +3902,15 @@ int ata_dev_reread_id(struct ata_device *dev, unsigned int readid_flags)
 	int rc;
 
 	/* read ID data */
-	rc = ata_dev_read_id(dev, &class, readid_flags, id);
+	int retry_read_id = 3;
+	do {
+		rc = ata_dev_read_id(dev, &class, readid_flags, id);
+		if (rc) {
+			ata_dev_warn(dev, "retrying ata_dev_read_id(), %d times remainng",
+				     retry_read_id);
+		}
+		retry_read_id--;
+	} while (rc && retry_read_id > 0);
 	if (rc)
 		return rc;
--
But it reports:
```
[ 119.260621] ata2: found unknown device (class 0)
[ 119.264620] ata4: found unknown device (class 0)
[ 119.415623] ata2: found unknown device (class 0)
[ 119.415634] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 119.422627] ata4: found unknown device (class 0)
[ 119.422636] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 124.646636] ata4.00: qc timeout after 5000 msecs (cmd 0xec)
[ 124.646646] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 124.646648] ata4.00: retrying ata_dev_read_id(), 3 times remainng
[ 124.646657] ------------[ cut here ]------------
[ 124.646659] ata_port_is_frozen(ap)
[ 124.646660] WARNING: drivers/ata/libata-core.c:1549 at ata_exec_internal+0x4e4/0x590, CPU#0: scsi_eh_3/155
...
[ 124.646793] Call Trace:
[ 124.646795] <TASK>
[ 124.646799] ata_dev_read_id+0x3b2/0x560
[ 124.646805] ata_dev_reread_id+0x50/0xf0
[ 124.646808] ata_dev_revalidate+0x64/0xd0
[ 124.646811] ata_eh_recover+0xa76/0xf90
[ 124.646815] ? update_load_avg+0x7b/0x740
[ 124.646819] ? __dequeue_entity+0x4f4/0x5d0
[ 124.646823] sata_pmp_error_handler+0x387/0x660
[ 124.646827] ? __flush_work+0x2b1/0x360
[ 124.646832] ahci_error_handler+0x42/0x80
[ 124.646836] ata_scsi_port_error_handler+0x71a/0x950
[ 124.646840] ata_scsi_error+0x95/0xd0
[ 124.646843] scsi_error_handler+0xd1/0x530
[ 124.646848] ? __pfx_scsi_error_handler+0x10/0x10
[ 124.646851] kthread+0xfc/0x240
[ 124.646855] ? __pfx_kthread+0x10/0x10
[ 124.646858] ret_from_fork+0x243/0x280
[ 124.646862] ? __pfx_kthread+0x10/0x10
[ 124.646865] ret_from_fork_asm+0x1a/0x30
[ 124.646873] </TASK>
[ 124.646875] ---[ end trace 0000000000000000 ]---
[ 124.646877] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[ 124.646879] ata4.00: retrying ata_dev_read_id(), 2 times remainng
[ 124.646886] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[ 124.646888] ata4.00: retrying ata_dev_read_id(), 1 times remainng
[ 124.646889] ata4.00: revalidation failed (errno=-5)
[ 124.646919] ata2.00: qc timeout after 5000 msecs (cmd 0xec)
[ 124.646927] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 124.646929] ata2.00: retrying ata_dev_read_id(), 3 times remainng
[ 124.646937] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[ 124.646939] ata2.00: retrying ata_dev_read_id(), 2 times remainng
[ 124.646945] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[ 124.646947] ata2.00: retrying ata_dev_read_id(), 1 times remainng
[ 124.646948] ata2.00: revalidation failed (errno=-5)
[ 125.110629] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 125.110649] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 125.146916] ata2.00: configured for UDMA/133
[ 125.163102] ata4.00: configured for UDMA/133
```
And, yes, libata freezes the port when the qc times out:
```
if (qc->flags & ATA_QCFLAG_ACTIVE) {
	qc->err_mask |= AC_ERR_TIMEOUT;
	ata_port_freeze(ap);
	ata_dev_warn(dev, "qc timeout after %u msecs (cmd 0x%x)\n",
		     timeout, command);
}
```
So, should the retry happen in ata_exec_internal()?
No: ata_exec_internal() has no path to cancel a command that has already been
issued; it can only freeze and reset the port. All we could do is keep waiting
and increase the timeout three times before letting the port reset. I don't
think that is a good idea.
Alan.
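To make the failure mode in the log above easier to follow, here is a toy user-space model of it. This is only a sketch, not libata code: `struct toy_port`, `toy_exec_internal()` and `toy_retry_identify()` are hypothetical names. It assumes that a timed-out command freezes the port and that a frozen port rejects further internal commands immediately, with the two error values chosen to match the `err_mask=0x4` and `err_mask=0x40` lines in the log.

```c
#include <assert.h>
#include <stdbool.h>

#define AC_ERR_TIMEOUT 0x04 /* matches err_mask=0x4 in the log */
#define AC_ERR_SYSTEM  0x40 /* matches err_mask=0x40 in the log */

/* Toy port state: only the "frozen" flag matters for this model. */
struct toy_port {
	bool frozen;
};

/*
 * Toy stand-in for issuing an internal command such as IDENTIFY.
 * Assumption: the drive answers only after ready_ms; a timeout freezes
 * the port, and a frozen port fails every command without waiting.
 */
static unsigned int toy_exec_internal(struct toy_port *ap,
				      unsigned int timeout_ms,
				      unsigned int ready_ms)
{
	if (ap->frozen)
		return AC_ERR_SYSTEM;
	if (ready_ms > timeout_ms) {
		ap->frozen = true;
		return AC_ERR_TIMEOUT;
	}
	return 0;
}

/*
 * Naive retry loop in the spirit of the patch above: records the error
 * mask of every attempt and returns the number of attempts made.
 */
static int toy_retry_identify(struct toy_port *ap, unsigned int timeout_ms,
			      unsigned int ready_ms, unsigned int masks[],
			      int max_tries)
{
	int tries = 0;

	assert(max_tries > 0);
	while (tries < max_tries) {
		masks[tries] = toy_exec_internal(ap, timeout_ms, ready_ms);
		if (masks[tries++] == 0)
			break;
	}
	return tries;
}
```

With a 5000 ms timeout and a drive that needs 9000 ms, this model yields one timeout followed by two immediate failures, which is the same pattern as the three back-to-back "failed to IDENTIFY" lines at 124.6468 s: the retries cost no time and give the drive no extra chance to spin up.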
* Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
From: AlanCui4080 @ 2026-04-10 12:14 UTC
To: Damien Le Moal; +Cc: linux-ide
Hi,
As further information, I found that increasing the timeout only mitigates
the problem. Across multiple wakeups from S3, IDENTIFY still failed about 10%
of the time. Interestingly, after the failure, the port immediately regained
the link and then successfully configured the hard drive.
```
[ 322.975526] ACPI: PM: Waking up from system sleep state S3
...
[ 332.991862] ata4: found unknown device (class 0)
[ 332.992863] ata2: found unknown device (class 0)
[ 333.147890] ata2: found unknown device (class 0)
[ 333.147899] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 333.147911] ata4: found unknown device (class 0)
[ 333.147920] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 348.198232] ata4.00: qc timeout after 15000 msecs (cmd 0xec)
[ 348.198242] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 348.198245] ata4.00: revalidation failed (errno=-5)
[ 348.198259] ata2.00: qc timeout after 15000 msecs (cmd 0xec)
[ 348.198269] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 348.198272] ata2.00: revalidation failed (errno=-5)
[ 348.662584] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 348.662610] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 348.699354] ata4.00: configured for UDMA/133
[ 348.719825] ata2.00: configured for UDMA/133
```
The difference between a failed resume and a successful one is that in the
successful case ata does not report "found unknown device".
Then I attached new consumer-grade WD and Seagate drives and, as I expected,
they spin up much faster than the Exos drives and are never reported as
failing revalidation:
```
// 2.5 inch WD Blue drive, 8 secs faster
[ 1047.409533] ACPI: PM: Waking up from system sleep state S3
...
[ 1047.724415] ata5: SATA link down (SStatus 0 SControl 330)
[ 1047.724451] ata3: SATA link down (SStatus 0 SControl 300)
[ 1048.452451] ata6: SATA link down (SStatus 0 SControl 0)
[ 1049.204451] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1049.257864] sd 0:0:0:0: [sdc] Starting disk
...
[ 1051.916495] PM: suspend exit
[ 1052.728394] ata4: link is slow to respond, please be patient (ready=0)
[ 1052.733355] ata2: link is slow to respond, please be patient (ready=0)
[ 1054.840880] r8169 0000:07:00.0 enp7s0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1057.076309] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1057.116416] sd 1:0:0:0: [sda] Starting disk
[ 1057.134584] ata2.00: configured for UDMA/133
[ 1057.532325] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1057.576679] sd 3:0:0:0: [sdb] Starting disk
[ 1057.594743] ata4.00: configured for UDMA/13
```
```
// 3.5 inch Seagate BarraCuda drive, 6 secs faster
[ 1484.056163] ACPI: PM: Waking up from system sleep state S3
[ 1484.371881] ata5: SATA link down (SStatus 0 SControl 330)
[ 1484.371917] ata3: SATA link down (SStatus 0 SControl 300)
[ 1485.099799] ata6: SATA link down (SStatus 0 SControl 0)
...
[ 1488.620192] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1488.621446] sd 0:0:0:0: [sdc] Starting disk
[ 1488.622941] ata1.00: configured for UDMA/133
...
[ 1488.633805] PM: suspend exit
[ 1489.374930] ata2: link is slow to respond, please be patient (ready=0)
[ 1489.374939] ata4: link is slow to respond, please be patient (ready=0)
[ 1491.563828] r8169 0000:07:00.0 enp7s0: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1493.666523] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1493.713096] sd 1:0:0:0: [sda] Starting disk
[ 1493.731018] ata2.00: configured for UDMA/133
[ 1494.026490] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1494.083273] sd 3:0:0:0: [sdb] Starting disk
[ 1494.101513] ata4.00: configured for UDMA/133
```
Furthermore, I discovered that adding an extra hard drive to the system
mitigates the revalidation failure. This may indicate that the hard drive has
not actually restored the link when the kernel believes it has (because the
kernel says it does not know the device on the link), and that the slight
delay caused by the extra hard drive gives the drive enough time to actually
accept the command, thus avoiding the problem.
At the same time, I'd like to point out that the AMD B550 southbridge only has
two native SATA ports, so these six ports must be behind a port multiplier.
Could this cause issues? I've seen many B550 users report that the ASMedia IP
cores used for the southbridge SATA ports are not reliable enough.
Alan.