From: Thomas Fjellstrom <thomas@fjellstrom.ca>
To: Phil Turmel <philip@turmel.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: raid/device failure
Date: Sun, 10 Feb 2013 19:55:18 -0700
Message-ID: <201302101955.19121.thomas@fjellstrom.ca>
In-Reply-To: <511852EF.9070201@turmel.org>
On February 10, 2013, Phil Turmel wrote:
> On 02/10/2013 08:27 PM, Thomas Fjellstrom wrote:
> > I've re-configured my NAS box (still haven't put it into "production") to
> > be a raid5 over seven 2TB consumer Seagate Barracuda drives, and with some
> > tweaking, performance was looking stellar.
>
> > Unfortunately I started seeing some messages in dmesg that worried me:
> [trim /]
>
> The MD subsystem keeps a count of read errors on each device, corrected
> or not, and kicks the drive out when the count reaches twenty (20).
> Every hour, the accumulated count is cut in half to allow for general
> URE "maintenance" in regular scrubs. This behavior and the count are
> hardcoded in the kernel source.
>
Interesting. That's good to know.
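To make the policy Phil describes concrete, here is a minimal Python model of it: a per-device counter that is halved once for every full hour that passes and that fails the device when it reaches 20. This is a sketch of the behavior as described in this thread, not the kernel's code; the class and its names are hypothetical, and the real md logic may differ between RAID personalities and kernel versions.

```python
from dataclasses import dataclass

# Values as described in the thread; treated as model parameters here,
# not read from any kernel.
MAX_READ_ERRORS = 20      # device is failed when its count reaches this
DECAY_INTERVAL_S = 3600   # count is halved once per full hour


@dataclass
class ReadErrorCounter:
    """Toy model of a per-device read-error counter with hourly halving."""
    count: int = 0
    last_decay_s: float = 0.0  # time (seconds) up to which decay has been applied
    failed: bool = False

    def _decay(self, now_s: float) -> None:
        # Halve the count once for each full hour elapsed since the last decay.
        hours = int((now_s - self.last_decay_s) // DECAY_INTERVAL_S)
        if hours > 0:
            self.count >>= hours
            self.last_decay_s += hours * DECAY_INTERVAL_S

    def record_read_error(self, now_s: float) -> bool:
        """Count one read error at time now_s; return True once the device is failed."""
        self._decay(now_s)
        self.count += 1
        if self.count >= MAX_READ_ERRORS:
            self.failed = True
        return self.failed


if __name__ == "__main__":
    dev = ReadErrorCounter()
    for _ in range(19):              # a burst of 19 errors: still in the array
        dev.record_read_error(0.0)
    print(dev.count, dev.failed)     # 19 False
    dev.record_read_error(3600.0)    # an hour later: 19 -> 9, then +1
    print(dev.count, dev.failed)     # 10 False
```

In this model a burst of 19 errors leaves the device in the array, and an hour later the count has dropped to 9, so another 11 errors would be needed before the device is kicked.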
> > I've run full S.M.A.R.T. tests (except the conveyance test; I'll probably
> > run that tonight and see what happens) on all drives in the array, and
> > there are no obvious warnings or errors in the S.M.A.R.T. results at all,
> > including reallocated (pending or not) sectors.
>
> MD fixed most of these errors, so I wouldn't expect to see them in SMART
> unless the fix triggered a relocation. But some weren't corrected, so I
> would be concerned that MD and SMART don't agree.
That is what I was wondering. I thought an uncorrected read error meant it
wrote the data back out, and then a re-read of that data was still wrong.
> Have these drives ever been scrubbed? (I vaguely recall you mentioning
> new drives...) If they are new and already had a URE, I'd be concerned
> about mishandling during shipping. If they aren't new, I'd
> destructively exercise them and retest.
They are new in that they haven't been used very much at all yet, and I
haven't done a full scrub over every sector. I have run some lengthy tests
using iozone over 32GB or more of space (individually, and as part of a
raid6), but a number of parameters have changed since my last setup (raid5 vs
raid6, xfs inode32 vs inode64), and xfs/md may have allocated the test files
from different areas of the device, so I can't be sure that the same general
areas of the disks were being accessed.
I did think a full destructive write test might be in order, just to make
sure. I've seen a drive throw errors at me, refuse to reallocate a sector
until it was written over manually, and then work fine afterwards.
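For reference, a full-array scrub can be started through md's sysfs interface by writing `check` to `/sys/block/<array>/md/sync_action`; md then reads every block of the array, so an unreadable sector is found (and, where redundancy allows, rewritten), and parity inconsistencies are reported in `mismatch_cnt`. The Python helpers below are a minimal sketch around those two attributes; the function names and the `md0` array name are hypothetical, and writing to `sync_action` requires root. A scrub exercises the array, not every sector of each bare drive, so it complements rather than replaces a destructive write test.

```python
from pathlib import Path
from typing import Tuple


def md_attr(array: str, name: str, sysfs_root: str = "/sys/block") -> Path:
    """Path of an md sysfs attribute, e.g. /sys/block/md0/md/sync_action."""
    return Path(sysfs_root) / array / "md" / name


def start_check(array: str, sysfs_root: str = "/sys/block") -> None:
    """Request a read-and-compare scrub ("check") of the whole array (needs root)."""
    md_attr(array, "sync_action", sysfs_root).write_text("check\n")


def scrub_status(array: str, sysfs_root: str = "/sys/block") -> Tuple[str, int]:
    """Return (current sync_action, mismatch_cnt) for the array."""
    action = md_attr(array, "sync_action", sysfs_root).read_text().strip()
    mismatches = int(md_attr(array, "mismatch_cnt", sysfs_root).read_text().strip())
    return action, mismatches
```

Typical use would be `start_check("md0")`, then polling `scrub_status("md0")` until the action returns to `idle`; `/proc/mdstat` shows the progress in the meantime.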
> > I've seen references while searching for possible causes, where people
> > had this error occur with faulty cables, or SAS backplanes. Is this a
> > likely scenario? The cables are brand new, but anything is possible.
> >
> > The card is an IBM M1015 8 port HBA flashed with the LSI 9211-8i IT
> > firmware, and no BIOS.
>
> It might not hurt to recheck your power supply rating vs. load. If you
> can't find anything else, a data-logging voltmeter with min/max capture
> would be my tool of choice.
>
> http://www.fluke.com/fluke/usen/digital-multimeters/fluke-287.htm?PID=56058
The PSU is over-specced if anything, but that doesn't mean it isn't faulty in
some way. It's a Seasonic G series 450W 80+ Gold PSU. The system at full load
should come in at just over half of that (Core i3-2120, Intel S1200KP mini-ITX
board, HBA, 7 HDDs, 2 SSDs, 2 x 8GB DDR3 1333MHz ECC RAM).
I have an Agilent U1253B ( http://goo.gl/kl1aC ) which should be adequate to
test with.
The NAS is on a 1000VA (600W?) UPS, so incoming power should be reasonably
clean and stable (assuming the UPS isn't bad).
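On the "(600W?)" question: a UPS's watt rating is its VA rating multiplied by its power factor, so a 1000VA unit is a 600W unit only if its power factor is 0.6. That power factor is an assumption here (the actual watt rating is printed on the unit), and the helper names below are hypothetical; this is just a small Python sketch of the arithmetic.

```python
def ups_real_power_w(va_rating: float, power_factor: float) -> float:
    """Real power (W) a UPS can deliver: apparent power (VA) x power factor."""
    return va_rating * power_factor


def ups_load_fraction(load_w: float, va_rating: float, power_factor: float) -> float:
    """Fraction of the UPS's real-power capacity used by a given wall load."""
    return load_w / ups_real_power_w(va_rating, power_factor)


# Hypothetical example: 1000VA unit, assumed power factor of 0.6,
# and an assumed 240W draw at the wall.
capacity = ups_real_power_w(1000, 0.6)
fraction = ups_load_fraction(240, 1000, 0.6)
print(f"capacity ~{capacity:.0f} W, load ~{fraction:.0%} of capacity")
```

Note that the load to compare is the draw at the wall, which is somewhat higher than the PSU's DC output because of conversion losses.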
> Phil
--
Thomas Fjellstrom
thomas@fjellstrom.ca