Fwd: Question / Request about timeouts of SATA harddisks [was:] devices get kicked from RAID about once a month

linux-raid.vger.kernel.org archive mirror

From: Stefan /*St0fF*/ Hübner @ 2010-06-04  6:35 UTC
  To: Linux RAID, Dan Christensen, Neil Brown, Bill Davidsen

Below is the message I posted to linux-scsi about the timeouts.
My approach is probably a bit different from Bill's, but my experience
at work (customers keep sending back drives as "broken" that turn out
to be 100% OK) is that the drives' internal error correction makes them
drop out of RAIDs too often.  That happens because read commands time
out too early.

Stefan

-------- Original Message --------
Subject: Question / Request about timeouts of SATA harddisks
Date: Thu, 03 Jun 2010 08:32:45 +0200
From: Stefan /*St0fF*/ Hübner <stefan.huebner@stud.tu-ilmenau.de>
Reply-To: st0ff@npl.de
To: linux-scsi@vger.kernel.org

Dear list,

Concerning RAIDs built from desktop-class drives, it would be good to
know whether there is a kernel timeout value that determines how long a
disk drive may take to process a command.  With many disks I have seen
in my logs that, when a drive does not respond in time, the SCSI error
handler (scsi_eh) becomes active and resets the bus.  So such a timeout
must exist somewhere.

The question is: can this timeout value be found somewhere in sysfs?
If yes, where?  If not, could it be exported?
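A minimal sketch of how the per-device command timeout could be inspected and raised from user space. It assumes the `timeout` attribute that the SCSI disk layer exposes at `/sys/block/<dev>/device/timeout` (value in seconds); libata disks are attached as SCSI devices, so they should have it as well. The helper names `read_cmd_timeout` and `set_cmd_timeout` are hypothetical, and `sysfs_root` is only a parameter so the code can be tried against a fake directory tree.

```python
from pathlib import Path


def _timeout_path(dev: str, sysfs_root: str) -> Path:
    # Assumed location, e.g. /sys/block/sda/device/timeout for dev="sda".
    return Path(sysfs_root) / "block" / dev / "device" / "timeout"


def read_cmd_timeout(dev: str, sysfs_root: str = "/sys") -> int:
    """Return the command timeout in seconds for a block device such as 'sda'."""
    return int(_timeout_path(dev, sysfs_root).read_text().strip())


def set_cmd_timeout(dev: str, seconds: int, sysfs_root: str = "/sys") -> None:
    """Write a new command timeout in seconds.

    On a real system this needs root, and the value is not persistent:
    it has to be set again after every reboot or re-attach of the disk.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    _timeout_path(dev, sysfs_root).write_text(f"{seconds}\n")
```

For drives whose internal error correction can take more than 2 minutes, something like `set_cmd_timeout("sda", 180)` would keep the kernel from giving up on a read before the drive does.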

Suggestions for the maximum of this timeout value:
I have read in several places that the internal error correction of
desktop-class drives can take more than 2 minutes to complete.  But a
much larger value can be taken directly from the
IDENTIFY_DEVICE_STRUCTURE, word 89, which states approximately how long
a SECURITY_ERASE_UNIT command takes (in minutes; see ATA8-ACS, table 32,
for the exact meaning of the value).  In some cases an even longer time
can be found in word 90 (time estimate for the enhanced security erase
unit, table 33).
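A minimal sketch of reading and decoding these two words, assuming the ATA8-ACS encoding as I understand it: bits 7:0 hold the estimate, where 0 means "not reported", 1..254 means value × 2 minutes and 255 means "more than 508 minutes". Later ACS revisions add an extended format signalled by bit 15, which this sketch ignores. The function names are hypothetical, and the 512-byte IDENTIFY DEVICE buffer is assumed to have been obtained elsewhere (for example through an ATA pass-through command).

```python
import struct
from typing import Optional, Tuple


def identify_word(identify: bytes, index: int) -> int:
    """Return 16-bit word `index` of a 512-byte IDENTIFY DEVICE buffer.

    ATA transfers the data as little-endian 16-bit words.
    """
    if len(identify) != 512:
        raise ValueError("IDENTIFY DEVICE data must be 512 bytes")
    (word,) = struct.unpack_from("<H", identify, index * 2)
    return word


def erase_time_minutes(word: int) -> Optional[Tuple[int, bool]]:
    """Decode word 89 (normal erase) or word 90 (enhanced erase).

    Returns None if the drive reports no estimate, otherwise
    (minutes, more_than) where more_than=True means "longer than this".
    """
    value = word & 0x00FF  # bits 7:0; bits 15:8 treated as reserved here
    if value == 0:
        return None            # estimate not reported
    if value == 255:
        return (508, True)     # more than 508 minutes
    return (value * 2, False)  # 1..254 -> value * 2 minutes
```

As a worked example of the arithmetic: a value of 90 in word 89 decodes to 180 minutes, i.e. about 3 hours, which as an upper bound for a command timeout would be 180 × 60 = 10800 seconds.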

I know for sure that this command should not be issued from Linux, at
least not with libata in its current state.  I have done it twice, on
two different Samsung HD103UJ drives, to "securely erase customer data".
While the command was executing, libata failed the pass-through command
within a minute and, according to syslog, unsuccessfully tried to reset
the drive.  The results (after letting the drives run for about the
indicated 3 hours): one drive broke completely and was never again
recognized by any computer; the other drive was still recognized, but
no longer operated properly.

Any comments and hints are very welcome.  All the best,
Stefan