* Seagate SATA disk flush cache timeout issue
From: 田志仲 @ 2012-04-16 2:31 UTC
To: linux-ide

Hi,

I'm working on an embedded Linux DVR product whose kernel is based on 2.6.24. During recent testing I saw several SATA disk I/O errors after reading and writing the disks for a long time, e.g. about 24 hours.

Three Seagate SATA disk models show the problem:
ST2000DL003 (Barracuda Green / 2TB / 5900rpm / 64M cache / 4KB per sector)
ST500DM002 (Barracuda Green / 500G / 7200rpm / 16M cache / 4KB per sector)
ST1000526SV (SV35 series / 1TB / 7200rpm / 32M cache / 512B per sector)

The kernel output looks like this:
ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata4.00: status: { DRDY }
ata4: port is slow to respond, please be patient (Status 0xd0)
ata4: device not ready (errno=-16), forcing hardreset
ata4: hard resetting link
ata4: port is slow to respond, please be patient (Status 0xff)
ata4: COMRESET failed (errno=-16)
ata4: hard resetting link
ata4: port is slow to respond, please be patient (Status 0xff)
ata4: COMRESET failed (errno=-16)
ata4: hard resetting link
ata4: port is slow to respond, please be patient (Status 0xff)
ata4: COMRESET failed (errno=-16)
ata4: hard resetting link
ata4: COMRESET failed (errno=-16)
ata4: reset failed, giving up
ata4.00: disabled
ata4: EH complete

I analyzed the kernel output and found that the cause is an ATA_CMD_FLUSH_EXT command timeout.
I tried raising the SCSI flush cache command timeout to 120 seconds and retrying 5 times when the command timed out, but the symptom still occurred.
I also tried raising the ATA_CMD_FLUSH_EXT timeout to 120 seconds, following the ATA8 specification, but the symptom still occurred.

One very strange symptom: the last command before the failed ATA_CMD_FLUSH_EXT (cmd ea) is always ATA_CMD_VERIFY (cmd 40).
In most kernel outputs, the sector LBAs that ATA_CMD_VERIFY accessed fall in a very narrow range (from 0xC24F00 to 0xC24F09), even across different disk models such as ST2000DL003 and ST1000526SV.

I also found the same symptom in the Debian bug list: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625922

Can you give me some suggestions on this issue?

Thanks.

Tony Tian
2012-04-16
tzz@bstar.com.cn
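For readers decoding the "cmd" and "res" lines in the report above, the following is a minimal sketch, not part of the original message. It assumes the usual libata field order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); the names parse_taskfile and lba48 are hypothetical helpers introduced only for illustration.

```python
import re

# Assumed field order of a libata taskfile dump. For a "res" line the first
# field holds the status register and the second the error register.
FIELDS = (
    "command", "feature", "nsect", "lbal", "lbam", "lbah",
    "hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah",
    "device",
)


def parse_taskfile(tf):
    """Split a dump such as 'ea/00:00:00:00:00/00:00:00:00:00/a0' into named bytes."""
    values = [int(byte, 16) for byte in re.split(r"[/:]", tf.strip())]
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def lba48(tf):
    """Combine the six LBA bytes of a parsed taskfile into one 48-bit address."""
    return (
        tf["hob_lbah"] << 40 | tf["hob_lbam"] << 32 | tf["hob_lbal"] << 24
        | tf["lbah"] << 16 | tf["lbam"] << 8 | tf["lbal"]
    )


cmd = parse_taskfile("ea/00:00:00:00:00/00:00:00:00:00/a0")
res = parse_taskfile("40/00:01:09:4f:c2/00:00:00:00:00/00")

print(hex(cmd["command"]))  # 0xea, the FLUSH CACHE EXT opcode
print(hex(lba48(res)))      # 0xc24f09
```

Under that assumption, the LBA bytes left in the result taskfile decode to 0xC24F09, which lies inside the 0xC24F00 to 0xC24F09 range reported for the preceding ATA_CMD_VERIFY commands.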
* Re: Seagate SATA disk flush cache timeout issue
From: Robert Hancock @ 2012-04-22 6:19 UTC
To: 田志仲; +Cc: linux-ide

On 04/15/2012 08:31 PM, 田志仲 wrote:
> Hi,
>
> I'm working on an embedded Linux DVR product whose kernel is based on 2.6.24. During recent testing I saw several SATA disk I/O errors after reading and writing the disks for a long time, e.g. about 24 hours.
>
> Three Seagate SATA disk models show the problem:
> ST2000DL003 (Barracuda Green / 2TB / 5900rpm / 64M cache / 4KB per sector)
> ST500DM002 (Barracuda Green / 500G / 7200rpm / 16M cache / 4KB per sector)
> ST1000526SV (SV35 series / 1TB / 7200rpm / 32M cache / 512B per sector)
>
> The kernel output looks like this:
> ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
>          res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
> ata4.00: status: { DRDY }
> ata4: port is slow to respond, please be patient (Status 0xd0)
> ata4: device not ready (errno=-16), forcing hardreset
> ata4: hard resetting link
> ata4: port is slow to respond, please be patient (Status 0xff)
> ata4: COMRESET failed (errno=-16)
> ata4: hard resetting link
> ata4: port is slow to respond, please be patient (Status 0xff)
> ata4: COMRESET failed (errno=-16)
> ata4: hard resetting link
> ata4: port is slow to respond, please be patient (Status 0xff)
> ata4: COMRESET failed (errno=-16)
> ata4: hard resetting link
> ata4: COMRESET failed (errno=-16)
> ata4: reset failed, giving up
> ata4.00: disabled
> ata4: EH complete
>
> I analyzed the kernel output and found that the cause is an ATA_CMD_FLUSH_EXT command timeout.
> I tried raising the SCSI flush cache command timeout to 120 seconds and retrying 5 times when the command timed out, but the symptom still occurred.
> I also tried raising the ATA_CMD_FLUSH_EXT timeout to 120 seconds, following the ATA8 specification, but the symptom still occurred.
>
> One very strange symptom: the last command before the failed ATA_CMD_FLUSH_EXT (cmd ea) is always ATA_CMD_VERIFY (cmd 40).

Do you know what is issuing the verify commands? It seems like the drive simply stops responding after that command is issued. It could be a drive firmware problem or something similar, but you could try a newer kernel version and see if the behavior changes.

> In most kernel outputs, the sector LBAs that ATA_CMD_VERIFY accessed fall in a very narrow range (from 0xC24F00 to 0xC24F09), even across different disk models such as ST2000DL003 and ST1000526SV.
>
> I also found the same symptom in the Debian bug list: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625922
>
> Can you give me some suggestions on this issue?
>
> Thanks.
>
> Tony Tian
> 2012-04-16
> tzz@bstar.com.cn
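The observation that the drive stops responding can be cross-checked against the status bytes in the quoted log (0x40, 0xd0, 0xff). The following is a minimal sketch, not from either poster. The bit names follow the common ATA status register layout; bits 4, 2 and 1 have carried different meanings across ATA revisions, so those three names are an assumption, and decode_status is a hypothetical helper.

```python
# Common ATA status register bit names, most significant bit first.
# DSC, CORR and IDX are legacy names and are assumptions here.
STATUS_BITS = (
    (0x80, "BSY"),
    (0x40, "DRDY"),
    (0x20, "DF"),
    (0x10, "DSC"),
    (0x08, "DRQ"),
    (0x04, "CORR"),
    (0x02, "IDX"),
    (0x01, "ERR"),
)


def decode_status(status):
    """Return the names of the bits set in an 8-bit ATA status value."""
    return [name for mask, name in STATUS_BITS if status & mask]


for value in (0x40, 0xD0, 0xFF):
    print(hex(value), decode_status(value))
```

Read this way, 0x40 is DRDY alone, which matches the "status: { DRDY }" line, while 0xd0 has BSY set, meaning the device still reports itself busy. A value of 0xff sets every bit, which often indicates that the register read returned all ones rather than a meaningful device status.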