Seagate SATA disk flush cache timeout issue

* Seagate SATA disk flush cache timeout issue
From: 田志仲 @ 2012-04-16  2:31 UTC
  To: linux-ide

Hi,

I'm working on an embedded Linux DVR product whose kernel is based on 2.6.24. During recent testing I saw several SATA disk I/O errors after reading and writing the disks for a long time, e.g. about 24 hours.

I see the problem on three Seagate SATA disk models:
ST2000DL003 (Barracuda Green / 2TB   / 5900rpm / 64MB cache / 4KB per sector)
ST500DM002  (Barracuda Green / 500GB / 7200rpm / 16MB cache / 4KB per sector)
ST1000526SV (SV35 series     / 1TB   / 7200rpm / 32MB cache / 512B per sector)

The kernel output looks like this:
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: device not ready (errno=-16), forcing hardreset
ata4: hard resetting link
ata4: port is slow to respond, please be patient (Status 0xff)
ata4: COMRESET failed (errno=-16)
ata4: hard resetting link
ata4: port is slow to respond, please be patient (Status 0xff)
ata4: COMRESET failed (errno=-16)
ata4: hard resetting link
ata4: port is slow to respond, please be patient (Status 0xff)
ata4: COMRESET failed (errno=-16)
ata4: hard resetting link
ata4: COMRESET failed (errno=-16)
ata4: reset failed, giving up
ata4.00: disabled
ata4: EH complete
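
As a reading aid for the log above, the status bytes it prints (0x40, 0xd0, 0xff) can be split into individual ATA status register bits. This is only a minimal sketch: the bit names follow the constants in the kernel's include/linux/ata.h, and the helper name decode_status is made up for illustration.

```python
# Minimal sketch: split an ATA status byte into named bits.
# Bit names follow include/linux/ata.h; decode_status is a hypothetical helper.
STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (legacy meaning of bit 4)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error
]


def decode_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]


for value in (0x40, 0xD0, 0xFF):
    print(hex(value), decode_status(value))
```

Read this way, 0xd0 has BSY set, so the port still reports busy, and 0xff has every bit set, which usually means the status register could not be read meaningfully at all.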

From the kernel output, the cause is a timeout of the ATA_CMD_FLUSH_EXT command.
I tried raising the SCSI flush cache command timeout to 120 seconds and retrying up to 5 times when the command times out, but the problem still occurred.
I also tried raising the ATA_CMD_FLUSH_EXT timeout to 120 seconds, following the ATA8 specification, but the problem still occurred.

One very strange detail: the last command before the failed ATA_CMD_FLUSH_EXT (cmd ea) is always ATA_CMD_VERIFY (cmd 40).
In most kernel outputs, the LBAs accessed by ATA_CMD_VERIFY fall in a very narrow range (0xC24F00 to 0xC24F09), even across different disk models such as ST2000DL003 and ST1000526SV.
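
The "cmd" and "res" lines in the log can be decoded to see which LBA they carry. The sketch below assumes the libata error-report layout, where the twelve bytes are command (or status) / feature (or error) : nsect : lbal : lbam : lbah / the five matching HOB bytes / device. The function name parse_taskfile and the dictionary keys are made up for illustration.

```python
import re

# Twelve two-digit hex bytes separated by '/' or ':' after "cmd" or "res".
TF_RE = re.compile(r"(cmd|res)\s+((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def parse_taskfile(line):
    """Decode one libata 'cmd' or 'res' line (hypothetical helper)."""
    match = TF_RE.search(line)
    if match is None:
        raise ValueError("no taskfile found in line")
    fields = [int(x, 16) for x in re.split(r"[/:]", match.group(2))]
    (first, second, nsect, lbal, lbam, lbah,
     _hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = fields
    # 48-bit LBA: HOB bytes are the upper three bytes, lbal/lbam/lbah the lower.
    lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) \
        | (lbah << 16) | (lbam << 8) | lbal
    return {
        "kind": match.group(1),
        "command_or_status": first,   # opcode for cmd, status for res
        "feature_or_error": second,   # feature for cmd, error for res
        "nsect": nsect,
        "lba": lba,
        "device": device,
    }


cmd = parse_taskfile("ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0")
res = parse_taskfile("res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)")
print(hex(cmd["command_or_status"]), hex(cmd["lba"]))  # 0xea 0x0
print(hex(res["command_or_status"]), hex(res["lba"]))  # 0x40 0xc24f09
```

Under that layout, the result taskfile of the timed-out flush carries LBA 0xC24F09, which is inside the VERIFY range reported above. One possible reading, not confirmed anywhere in this thread, is that the "res" registers still hold values left over from the preceding VERIFY command.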

I also found the same symptom in the Debian bug tracker: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625922

Can you give me some suggestions on this issue?

Thanks.

Tony Tian

2012-04-16 

tzz@bstar.com.cn

* Re: Seagate SATA disk flush cache timeout issue
From: Robert Hancock @ 2012-04-22  6:19 UTC
  To: 田志仲; +Cc: linux-ide

On 04/15/2012 08:31 PM, 田志仲 wrote:
> Hi,
>
> I'm working on an embedded Linux DVR product whose kernel is based on 2.6.24. During recent testing I saw several SATA disk I/O errors after reading and writing the disks for a long time, e.g. about 24 hours.
>
> I see the problem on three Seagate SATA disk models:
> ST2000DL003 (Barracuda Green / 2TB   / 5900rpm / 64MB cache / 4KB per sector)
> ST500DM002  (Barracuda Green / 500GB / 7200rpm / 16MB cache / 4KB per sector)
> ST1000526SV (SV35 series     / 1TB   / 7200rpm / 32MB cache / 512B per sector)
>
> The kernel output looks like this:
> ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>           res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata4.00: status: { DRDY }
> ata4: port is slow to respond, please be patient (Status 0xd0)
> ata4: device not ready (errno=-16), forcing hardreset
> ata4: hard resetting link
> ata4: port is slow to respond, please be patient (Status 0xff)
> ata4: COMRESET failed (errno=-16)
> ata4: hard resetting link
> ata4: port is slow to respond, please be patient (Status 0xff)
> ata4: COMRESET failed (errno=-16)
> ata4: hard resetting link
> ata4: port is slow to respond, please be patient (Status 0xff)
> ata4: COMRESET failed (errno=-16)
> ata4: hard resetting link
> ata4: COMRESET failed (errno=-16)
> ata4: reset failed, giving up
> ata4.00: disabled
> ata4: EH complete
>
> From the kernel output, the cause is a timeout of the ATA_CMD_FLUSH_EXT command.
> I tried raising the SCSI flush cache command timeout to 120 seconds and retrying up to 5 times when the command times out, but the problem still occurred.
> I also tried raising the ATA_CMD_FLUSH_EXT timeout to 120 seconds, following the ATA8 specification, but the problem still occurred.
>
> One very strange detail: the last command before the failed ATA_CMD_FLUSH_EXT (cmd ea) is always ATA_CMD_VERIFY (cmd 40).

Do you know what is issuing the verify commands?

It seems like the drive just stops responding after that command is
issued. It could be a drive firmware problem or something similar, but
you could try a newer kernel version and see if the behavior changes.

> In most kernel outputs, the LBAs accessed by ATA_CMD_VERIFY fall in a very narrow range (0xC24F00 to 0xC24F09), even across different disk models such as ST2000DL003 and ST1000526SV.
>
> I also found the same symptom in the Debian bug tracker: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625922
>
> Can you give me some suggestions on this issue?
>
> Thanks.
>
> Tony Tian
>
> 2012-04-16
>
>
> tzz@bstar.com.cn
>