From: AlanCui4080 <me@alancui.cc>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: linux-ide@vger.kernel.org
Subject: Re: Default IDENTIFY timeout is 5000ms which is too short for enterprise disks
Date: Fri, 10 Apr 2026 19:24:29 +0800	[thread overview]
Message-ID: <12870147.O9o76ZdvQC@alanarchdesktop> (raw)
In-Reply-To: <4482b737-1454-48cb-a941-165aa84fb2eb@kernel.org>

On Friday, 10 April 2026 12:19, you wrote:
> I need to check the code again, but no, that's not it. Since on resume we
> revalidate the device, it is ata_dev_reread_id() that needs to be a bit more lax
> on timeouts and repeatedly call ata_dev_read_id() with an increasing timeout as
> defined by ata_eh_identify_timeouts(). That should fix the IDENTIFY issue for drives
> that are slow to respond to that command on resume/while spinning up.
> 
> >> Or add a check/wait for "drive ready"
> >> when resuming, similar to the PUIS handling (power up in standby).
> > 
> > There is tried_spinup in ata_dev_read_id(), but it seems to require the
> > device to respond with at least an incomplete IDENTIFY. For a device that
> > never responds while spinning up, is it possible to implement that?
> 
> Ah, yes, forgot about that one. So it is not an option.
> 

Hi, I've tried the following (plus an extra one-shot WARN on ata_port_is_frozen(ap) in ata_exec_internal()):

---

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 374993031895..0ac0daae33f9 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3902,7 +3902,15 @@ int ata_dev_reread_id(struct ata_device *dev, unsigned int readid_flags)
        int rc;
 
        /* read ID data */
-       rc = ata_dev_read_id(dev, &class, readid_flags, id);
+       int retry_read_id = 3;
+       do {
+               rc = ata_dev_read_id(dev, &class, readid_flags, id);
+               if (rc) {
+                       ata_dev_warn(dev, "retrying ata_dev_read_id(), %d times remainng",
+                               retry_read_id);
+               }
+               retry_read_id--;
+       } while (rc && retry_read_id > 0);
        if (rc)
                return rc;

--

But it reports:

```
[  119.260621] ata2: found unknown device (class 0)
[  119.264620] ata4: found unknown device (class 0)
[  119.415623] ata2: found unknown device (class 0)
[  119.415634] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  119.422627] ata4: found unknown device (class 0)
[  119.422636] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  124.646636] ata4.00: qc timeout after 5000 msecs (cmd 0xec)
[  124.646646] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  124.646648] ata4.00: retrying ata_dev_read_id(), 3 times remainng
[  124.646657] ------------[ cut here ]------------
[  124.646659] ata_port_is_frozen(ap)
[  124.646660] WARNING: drivers/ata/libata-core.c:1549 at ata_exec_internal+0x4e4/0x590, CPU#0: scsi_eh_3/155
...
[  124.646793] Call Trace:
[  124.646795]  <TASK>
[  124.646799]  ata_dev_read_id+0x3b2/0x560
[  124.646805]  ata_dev_reread_id+0x50/0xf0
[  124.646808]  ata_dev_revalidate+0x64/0xd0
[  124.646811]  ata_eh_recover+0xa76/0xf90
[  124.646815]  ? update_load_avg+0x7b/0x740
[  124.646819]  ? __dequeue_entity+0x4f4/0x5d0
[  124.646823]  sata_pmp_error_handler+0x387/0x660
[  124.646827]  ? __flush_work+0x2b1/0x360
[  124.646832]  ahci_error_handler+0x42/0x80
[  124.646836]  ata_scsi_port_error_handler+0x71a/0x950
[  124.646840]  ata_scsi_error+0x95/0xd0
[  124.646843]  scsi_error_handler+0xd1/0x530
[  124.646848]  ? __pfx_scsi_error_handler+0x10/0x10
[  124.646851]  kthread+0xfc/0x240
[  124.646855]  ? __pfx_kthread+0x10/0x10
[  124.646858]  ret_from_fork+0x243/0x280
[  124.646862]  ? __pfx_kthread+0x10/0x10
[  124.646865]  ret_from_fork_asm+0x1a/0x30
[  124.646873]  </TASK>
[  124.646875] ---[ end trace 0000000000000000 ]---
[  124.646877] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[  124.646879] ata4.00: retrying ata_dev_read_id(), 2 times remainng
[  124.646886] ata4.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[  124.646888] ata4.00: retrying ata_dev_read_id(), 1 times remainng
[  124.646889] ata4.00: revalidation failed (errno=-5)
[  124.646919] ata2.00: qc timeout after 5000 msecs (cmd 0xec)
[  124.646927] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[  124.646929] ata2.00: retrying ata_dev_read_id(), 3 times remainng
[  124.646937] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[  124.646939] ata2.00: retrying ata_dev_read_id(), 2 times remainng
[  124.646945] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[  124.646947] ata2.00: retrying ata_dev_read_id(), 1 times remainng
[  124.646948] ata2.00: revalidation failed (errno=-5)
[  125.110629] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  125.110649] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[  125.146916] ata2.00: configured for UDMA/133
[  125.163102] ata4.00: configured for UDMA/133

```

And indeed, libata freezes the port when the internal qc times out (in ata_exec_internal()):

```
if (qc->flags & ATA_QCFLAG_ACTIVE) {
	qc->err_mask |= AC_ERR_TIMEOUT;
	ata_port_freeze(ap);
	ata_dev_warn(dev, "qc timeout after %u msecs (cmd 0x%x)\n",
		     timeout, command);
}
```

So, should the retry happen in ata_exec_internal() instead? No: ata_exec_internal()
has no path to cancel a command that has already been issued; all it can do is
freeze and reset the port. The only option would be to keep waiting, increasing
the timeout up to 3 times, before letting the port be reset. I don't think that
is a good idea.

Alan.


