linux-raid.vger.kernel.org archive mirror
From: "Mark Klarzynski" <mark.k@computer-design.co.uk>
To: linux-raid@vger.kernel.org
Subject: good drive / bad drive  (maxtor topic)
Date: Thu, 25 Nov 2004 11:46:06 -0000
Message-ID: <005301c4d2e4$5a0bb670$4500a8c0@MarkK>


In the world of hardware RAID we decide that a drive has failed based on
various criteria, one of which is the obvious 'has the drive responded
within a set time'.  This set time varies depending on the drive, the
application, the load etc.  This 'timeout' value is realistically
between 6 and 10 seconds.  There is no real formula, just lots of
experience.  Set it too short and drives will look failed too often; set
it too long and you risk allowing a suspect drive to continue.

Once we detect a timeout we have to decide what to do with it.  In SCSI
we issue a SCSI bus reset (a hardware reset on the bus).  The reason we
(and all hardware RAID manufacturers) do this is that life is just that
way: drives do lock up.  We issue up to 3 resets, and then fail the
drive.  This is extremely effective and does exactly what it is supposed
to do.  Often the drive will never cause an issue again.  If it is
faulty then it will escalate and fail.
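
As a rough illustration of the policy described above (timeout, bus
reset, up to 3 resets, then fail), here is a minimal sketch.  It is not
any vendor's firmware logic: send_command and bus_reset are hypothetical
callbacks, the 8 second default is simply an assumed value inside the
6 to 10 second range, and counting resets per command rather than over
the drive's lifetime is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable

TIMEOUT_SECONDS = 8.0  # assumed: somewhere in the 6-10 second range
MAX_RESETS = 3         # "up to 3 resets, and then fail"


@dataclass
class DriveState:
    failed: bool = False  # once set, the drive stays out of the array


def handle_command(
    state: DriveState,
    send_command: Callable[[float], bool],
    bus_reset: Callable[[], None],
    timeout: float = TIMEOUT_SECONDS,
    max_resets: int = MAX_RESETS,
) -> bool:
    """Return True if the command completed, False if the drive is failed.

    send_command(timeout) is a hypothetical callback that returns True
    when the drive responded within `timeout` seconds.  bus_reset() is a
    hypothetical callback that performs the hardware reset on the bus.
    """
    if state.failed:
        return False

    resets_issued = 0  # assumption: counted per command
    while True:
        if send_command(timeout):
            return True  # drive responded in time, nothing more to do
        if resets_issued >= max_resets:
            state.failed = True  # still timing out after the last reset
            return False
        bus_reset()
        resets_issued += 1
```

A drive that locks up once and recovers after a reset carries on as
normal; a drive that keeps timing out after the third reset is marked
failed and every later command is refused.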

We have utilised countless SATA drives, and timeouts are by far the most
significant failure we see on SATA (although it's hard to tell much
else on SATA).  It is therefore imperative that the timeout values
are correct for the drive and the application.

But the point is that we do not see anywhere near the failure rates on
the Maxtors that you guys are mentioning.  Also, if we trial SATA drives
on different hardware RAIDs we see differing failure rates (i.e. ICP
come in higher than 3ware, which are higher than the host-independent
RAIDs we have tested, and so on).

So I am wondering if it is worth thinking about the timeout values?  And
what do you do once the drive has timed out?

I am seeing some tremendous work going on in this group and without a
doubt this community is going to propel MD to enterprise-level RAID one
day.  So this is honestly meant as constructive and is based on way too
many years designing RAID solutions, i.e. I'm not looking to start an
argument, simply offering some information.






Thread overview: 4+ messages
2004-11-25 11:46 Mark Klarzynski [this message]
2004-11-27 18:06 ` good drive / bad drive (maxtor topic) Guy
2004-11-28  6:16   ` Brad Campbell
2004-11-28  6:56     ` Guy
