From: Tim Bock <jtbock@daylight.com>
To: Goswin von Brederlow <goswin-v-b@web.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: Question about raid robustness when disk fails
Date: Mon, 25 Jan 2010 09:22:27 -0700
Message-ID: <1264436547.3015.24.camel@kije>
In-Reply-To: <87hbqeyua9.fsf@frosties.localdomain>
Thank you for the response. Through the smartctl tests, I noticed that
the "seek error rate" value for the misbehaving disk was at 42, with the
threshold at 30. For other disks in the same array, the "seek error
rate" values were up around 75 (same threshold of 30). Since the values
appear to count down toward the threshold, I took that as a further sign
that the disk was in trouble and replaced it. Is there any likely
correlation between the described problem and the "seek error rate"
value?
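The comparison described above (normalized value against threshold, per
disk) could be scripted roughly as follows. This is only a sketch: it
assumes the usual column layout of the `smartctl -A` attribute table
(ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, ...), the helper name
`attribute_margins` is hypothetical, and the sample rows are constructed
from the numbers quoted in this message, with made-up raw values.
Normalized values are vendor-defined, so a small margin is a hint rather
than proof of a failing disk.

```python
import re

# Hypothetical rows shaped like `smartctl -A` output. VALUE and THRESH come
# from the numbers in this message; WORST and RAW_VALUE are made up.
SUSPECT_DISK = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  7 Seek_Error_Rate         0x000f   042   042   030    Pre-fail  Always       -       0
"""
HEALTHY_DISK = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  7 Seek_Error_Rate         0x000f   075   075   030    Pre-fail  Always       -       0
"""

# Attribute rows: id, name, hex flag, then VALUE, WORST, THRESH as decimals.
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s")


def attribute_margins(text):
    """Map attribute name -> (normalized value - threshold)."""
    margins = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name = match.group(2)
            value = int(match.group(3))
            thresh = int(match.group(5))
            margins[name] = value - thresh
    return margins


if __name__ == "__main__":
    for label, text in (("suspect", SUSPECT_DISK), ("healthy", HEALTHY_DISK)):
        for name, margin in attribute_margins(text).items():
            print(f"{label}: {name} is {margin} above its threshold")
```

In practice the text would come from running `smartctl -A` against each
array member; the disk with the smallest margin is the one to watch.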
Is there a way to post-mortem the drive/logs/other traces to gain
insight into what the lower layer problem was? I would like to be able
to definitively pinpoint (or at least have a reasonable level of
confidence about) the cause of the problem. The ultimate goal, of
course, is to prevent any recurrence.
Thanks again,
Tim
On Fri, 2010-01-22 at 17:32 +0100, Goswin von Brederlow wrote:
> Tim Bock <jtbock@daylight.com> writes:
>
> > Hello,
> >
> > I built a raid-1 + lvm setup on a Dell 2950 in December 2008. The OS
> > disk (ubuntu server 8.04) is not part of the raid. Raid is 4 disks + 1
> > hot spare (all raid disks are sata, 1TB Seagates).
> >
> > Worked like a charm for ten months, and then had some kind of disk
> > problem in October which drove the load average to 13. Initially tried
> > a reboot, but system would not come all of the way back up. Had to boot
> > single-user and comment out the RAID entry. System came up, I manually
> > failed/removed the offending disk, added the RAID entry back to fstab,
> > rebooted, and things proceeded as I would expect. Replaced offending
> > drive.
>
> If a drive goes crazy without actually dying then Linux can spend a
> long time trying to get something from the drive. The driver chip can
> go crazy or the driver itself can have a bug and lock up. All those
> things are below the raid level and if they halt your system then raid
> cannot do anything about it.
>
> Only when a drive goes bad and the lower layers report an error to the
> raid level can raid cope with the situation, remove the drive and keep
> running. Unfortunately there seems to be a loose correlation between
> cost of the controller (chip) and the likelihood of a failing disk
> locking up the system. I.e. the cheap onboard SATA chips on desktop
> systems do that more often than expensive server controllers. But that
> is just a loose relationship.
>
> MfG
> Goswin
>
> PS: I've seen hardware raid boxes lock up too so this isn't a drawback
> of software raid.