public inbox for linux-ide@vger.kernel.org
* ata timeout exceptions
@ 2025-11-03  4:13 Eyal Lebedinsky
  2025-11-09 20:40 ` Niklas Cassel
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Eyal Lebedinsky @ 2025-11-03  4:13 UTC
  To: list linux-ide

I have a sata disk that is probably on its last legs.
It is a plain disk (no RAID or such). If it matters, it is an old 8TB Seagate SMR disk.
It sees very little activity.

Every two hours a small rsync job copies a directory onto this disk. A few hundred files are copied each time, a few tens of GB in total.

For the last few weeks it started to log timeout errors (not always) like this:

   kernel: ata2.00: exception Emask 0x0 SAct 0x2020 SErr 0x0 action 0x6 frozen
   kernel: ata2.00: failed command: WRITE FPDMA QUEUED
   kernel: ata2.00: cmd 61/80:28:a0:10:df/00:00:d1:01:00/40 tag 5 ncq dma 65536 out
                    res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
   kernel: ata2.00: status: { DRDY }
   kernel: ata2.00: failed command: WRITE FPDMA QUEUED
   kernel: ata2.00: cmd 61/00:68:18:15:30/20:00:20:01:00/40 tag 13 ncq dma 4194304 out
                    res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
   kernel: ata2.00: status: { DRDY }
   kernel: ata2: hard resetting link
   kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
   kernel: ata2.00: configured for UDMA/133
   kernel: ata2: EH complete
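
For readers following the log above: the SAct value in the exception line is a 32-bit mask with one bit per NCQ tag that was outstanding when the port was frozen. A minimal sketch of decoding it (the helper name sact_tags is only illustrative, not a kernel or tool API):

```python
def sact_tags(sact: int) -> list[int]:
    """Return the NCQ tag numbers whose bit is set in a SAct mask."""
    # NCQ allows up to 32 outstanding commands, one bit per tag (0..31).
    return [tag for tag in range(32) if sact & (1 << tag)]


if __name__ == "__main__":
    # SAct value taken from the exception line in the log above.
    print(sact_tags(0x2020))
```

For SAct 0x2020 this gives tags 5 and 13, matching the two failed commands reported. A full 32-command burst would correspond to SAct 0xffffffff.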

Looking at the smart log I see that one more command_timeout was counted and no other attribute is incremented.

However, later on, this error was followed by 31 more failures, probably because the full command queue was aborted.
The messages mention 'tag 0 ncq dma' through 'tag 31 ncq dma'.
Again, in the smart log, the whole burst counted as one extra command_timeout.

After this had been going on for a few days, a repeated burst of errors led to:
   kernel: ata2.00: NCQ disabled due to excessive errors

From now on, only one exception is logged:
   kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
   kernel: ata2.00: failed command: WRITE DMA EXT
   kernel: ata2.00: cmd 35/00:00:98:a3:4c/00:20:86:01:00/e0 tag 6 dma 4194304 out
                    res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
   kernel: ata2.00: status: { DRDY }
   kernel: ata2: hard resetting link
   kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
   kernel: ata2.00: configured for UDMA/133
   kernel: ata2: EH complete
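
The "cmd" lines in these excerpts can be decoded into LBA, length and tag. The sketch below assumes the field order libata prints (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and a 512-byte logical sector; the function name decode_cmd and the returned keys are made up for illustration. For NCQ commands the sector count sits in the feature fields and the tag in the upper bits of nsect; for WRITE DMA EXT the count is in the nsect fields.

```python
import re

NCQ_OPCODES = {0x60, 0x61}  # READ FPDMA QUEUED, WRITE FPDMA QUEUED


def decode_cmd(line: str, sector_size: int = 512) -> dict:
    """Decode the taskfile bytes of a libata 'cmd ...' log line."""
    m = re.search(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})", line)
    if m is None:
        raise ValueError("no 'cmd' taskfile found in line")
    fields = [int(x, 16) for x in re.split(r"[/:]", m.group(1))]
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, _device) = fields

    # 48-bit LBA: low three bytes first, then the "hob" (high order) bytes.
    lba = int.from_bytes(
        bytes([lbal, lbam, lbah, hob_lbal, hob_lbam, hob_lbah]), "little")

    if command in NCQ_OPCODES:
        count = (hob_feature << 8) | feature  # NCQ: count in feature fields
        tag = nsect >> 3                      # NCQ: tag in bits 7:3 of nsect
    else:
        count = (hob_nsect << 8) | nsect      # e.g. WRITE DMA EXT (0x35)
        tag = None
    count = count or 65536  # a count of 0 means 65536 sectors

    return {"opcode": command, "lba": lba, "sectors": count,
            "bytes": count * sector_size, "tag": tag}


if __name__ == "__main__":
    print(decode_cmd("cmd 61/80:28:a0:10:df/00:00:d1:01:00/40 tag 5 ncq dma 65536 out"))
    print(decode_cmd("cmd 35/00:00:98:a3:4c/00:20:86:01:00/e0 tag 6 dma 4194304 out"))
```

Under those assumptions the first NCQ write decodes to 128 sectors (65536 bytes) with tag 5, and the WRITE DMA EXT to 8192 sectors (4194304 bytes), which agrees with the "dma" byte counts the kernel printed.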

Furthermore, the smart log shows no change. This has been going on for the last two days,
over a dozen times.

I want to understand what is going on:

1) Why do I not see an I/O error and the writes to the disk (rsync) seem to complete?
    Which layer absorbs the errors, hiding them from the application?

2) Why do I get only one command_timeout counted (originally, with ncq active) and none when ncq is disabled?

Naturally, I already copied the disk to a replacement which I will install after this disk fails completely.

--
Eyal at Home (eyal@eyal.emu.id.au)



Thread overview: 28+ messages
2025-11-03  4:13 ata timeout exceptions Eyal Lebedinsky
2025-11-09 20:40 ` Niklas Cassel
2025-11-09 22:41   ` Eyal Lebedinsky
2025-11-10 13:11     ` Niklas Cassel
2025-11-14  4:32 ` Eyal Lebedinsky
2025-11-18 15:17   ` Niklas Cassel
2025-11-18 23:05     ` Eyal Lebedinsky
2025-11-19  5:41       ` Damien Le Moal
2025-11-19 13:37         ` Eyal Lebedinsky
2025-11-20  3:34           ` Damien Le Moal
2025-11-20 11:38             ` Eyal Lebedinsky
2025-11-20 12:18               ` Damien Le Moal
2025-11-20 23:53                 ` Eyal Lebedinsky
2025-12-16 23:39 ` Eyal Lebedinsky
2025-12-17  1:35   ` Damien Le Moal
2025-12-17 11:56     ` Eyal Lebedinsky
2025-12-17 12:02       ` Niklas Cassel
2025-12-20  4:03         ` Eyal Lebedinsky
2025-12-21  8:34           ` Damien Le Moal
2025-12-21 12:12             ` Eyal Lebedinsky
2025-12-21 22:43               ` Eyal Lebedinsky
2025-12-21 23:14                 ` Damien Le Moal
2025-12-22  2:10                   ` Eyal Lebedinsky
2025-12-22  3:43                     ` Damien Le Moal
2025-12-22  5:57                       ` Eyal Lebedinsky
2025-12-30 22:43                         ` Eyal Lebedinsky
2026-01-02  1:21                           ` Damien Le Moal
2026-01-02  6:30                             ` Eyal Lebedinsky
