From: Ross Boylan <ross@biostat.ucsf.edu>
To: Chris Murphy <lists@colorremedies.com>
Cc: ross@biostat.ucsf.edu,
"linux-raid@vger.kernel.org Raid" <linux-raid@vger.kernel.org>
Subject: Re: How do I tell which disk failed?
Date: Mon, 07 Jan 2013 23:49:46 -0800
Message-ID: <1357631387.16366.131.camel@corn.betterworld.us>
In-Reply-To: <02B6762C-3755-4CE3-9AB1-A48D3384CACB@colorremedies.com>
On Tue, 2013-01-08 at 00:17 -0700, Chris Murphy wrote:
> On Jan 7, 2013, at 11:59 PM, Ross Boylan <ross@biostat.ucsf.edu> wrote:
> >>
> > Isn't it possible there's a hardware problem, e.g., leading to a
> > failure/retry cycle?
>
> smartctl -a /dev/sda
> smartctl -a /dev/sdb
> smartctl -a /dev/sdc
>
> Compare them. If there was a write failure reported by the drive, md would have marked the device faulty.
SMART seems to think they are all OK, though my understanding of it is
limited (e.g., the logs showed SMART reporting Temperature_Celsius of
110, but I think that's a normalized value for a raw of 42, meaning the
temp is 42 degrees Celsius). Do I need to manually run a test before
the report reflects current conditions? At any rate, I did (just a
short one), and the drives passed.
The raw value (last column) for one of the parameters seems to be
changing extremely rapidly, and perhaps is overflowing:
# date; smartctl -a /dev/sda | grep 195
Mon Jan 7 23:11:03 PST 2013
195 Hardware_ECC_Recovered 0x001a 059 024 000 Old_age Always - 241377818
# date; smartctl -a /dev/sda | grep 195
Mon Jan 7 23:12:26 PST 2013
195 Hardware_ECC_Recovered 0x001a 056 024 000 Old_age Always - 3600778
Perhaps someone on this list can interpret that better than I.
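For sampling that attribute repeatedly, note that `grep 195` matches any
line containing "195" anywhere, including inside another attribute's raw
value. A minimal sketch of a stricter filter follows, assuming the
ten-column attribute table layout shown in the output above; `smart_raw`
is a hypothetical helper name, not part of smartmontools.

```shell
# Hypothetical helper: print the raw value (10th column) of one SMART
# attribute, selected by its ID in the first column.
# Reads `smartctl -A` style attribute-table text on stdin.
smart_raw() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Possible use on a live system (needs root and smartmontools):
#   date; smartctl -A /dev/sda | smart_raw 195
```

Matching on the ID column rather than on the whole line keeps the sample
unambiguous if another row happens to contain the same digits.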
My thought was disk failure (not necessarily complete failure) -> system
lockup. Continued disk flakiness leads to continued slowness after
restart as, e.g., the disk keeps retrying operations that fail.
I infer you have a different scenario in mind: the system freaks out for
a reason unrelated to the disk. The resulting shutdown (which was a
manual power off) leaves the arrays and their components in a funky
state. When the system comes back, it fixes things up.
Even if this did happen, in RAID 1 wouldn't some of the components
(partitions in my case) be deemed good and others bad, with the latter
resynced to match the former? And if that is happening, why can't I
tell which partition(s) are master (considered good) and which are not
(being overwritten with contents of the master)?
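One place to look while a sync is still running is /proc/mdstat, which
names the arrays that are resyncing or recovering. The sketch below only
identifies those arrays; it does not say which member is the source. It
assumes the usual mdstat layout (an "mdN : ..." line followed by indented
status lines), and `md_syncing` is a hypothetical helper name.

```shell
# Hypothetical helper: list md arrays with a resync or recovery under way.
# Reads /proc/mdstat-style text on stdin; prints "<array> <action>".
md_syncing() {
    awk '
        # Remember which array the following indented lines belong to.
        /^md[^ ]* *:/ { name = $1 }
        # A status line such as "resync = 12.6% ..." or "recovery = ...".
        match($0, /(resync|recovery) *=/) {
            action = substr($0, RSTART, RLENGTH)
            sub(/ *=$/, "", action)
            print name, action
        }
    '
}

# Possible use on a live system:
#   md_syncing < /proc/mdstat
```

Running `mdadm --detail` on an array listed this way then reports a state
for each member device.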
The sync just completed, so I can no longer poke around while the
rebuild is in process. Bad for learning and diagnosis, but good for
almost every other purpose.
Ross