From: Maarten <maarten@ultratux.net>
To: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: woes with... mdadm ?
Date: Wed, 27 Jan 2010 19:30:29 +0100
Message-ID: <4B608645.8030102@ultratux.net>
In-Reply-To: <4877c76c1001262014w6219132ew552b092dab9aba62@mail.gmail.com>
Hi Michael, thanks for your reply.
Michael Evans wrote:
> Let's validate some basics first:
>
> 1, 2) Have you stress-tested your CPU and ram?
Depending on your definition of such a test, yes. For starters, I
installed Gentoo with it and on it. I reckon no bad CPU and/or RAM would
ever survive compiling gcc, glibc and the kernel along with some 100
other packages. However, because DIMMs have been swapped since then, I
ran a 10-hour memtest86 today to be doubly sure: no errors.
> 3: the CRC is off only on two nibbles (between bits 4 and 11); and
> nowhere else. That usually doesn't happen with CRCs.
Okay, but I'm not sure what exactly that points to.
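To make Michael's remark concrete, here is a small Python sketch (not from the thread) that XORs an expected and an actual 32-bit checksum and reports which bits and nibbles differ. The values are hypothetical placeholders; the real checksums are not quoted in this message. A checksum recomputed over changed data would usually differ in bits scattered across the whole word, so a difference confined to bits 4 to 11 (two adjacent nibbles) is the unusual pattern he is pointing at.

```python
MASK32 = 0xFFFFFFFF


def differing_bits(expected, actual):
    """Bit positions (0 = least significant) where two 32-bit values differ."""
    diff = (expected ^ actual) & MASK32
    return [bit for bit in range(32) if (diff >> bit) & 1]


def differing_nibbles(expected, actual):
    """Nibble indices (0 = bits 0-3, 1 = bits 4-7, ...) containing any differing bit."""
    diff = (expected ^ actual) & MASK32
    return [n for n in range(8) if (diff >> (4 * n)) & 0xF]


def confined_to_bits(expected, actual, low, high):
    """True if the values differ and every differing bit lies in [low, high]."""
    bits = differing_bits(expected, actual)
    return bool(bits) and all(low <= b <= high for b in bits)


# Hypothetical example values, not taken from the thread.
expected, actual = 0x1234ABCD, 0x1234A5ED
print(differing_bits(expected, actual))           # [5, 9, 10, 11]
print(differing_nibbles(expected, actual))        # [1, 2]
print(confined_to_bits(expected, actual, 4, 11))  # True
```

Feeding in the actual stored and recomputed checksums from the failing superblocks would show whether every mismatch falls in the same narrow field.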
> 3) In the past I had some similar SATA controllers become corrupted
> by some test-debugging code in an older version of the kernel. Even
> if the device's firmware is up to date, TRY REFLASHING/'updating' THEM.
I'm not saying that's a bad idea, but just to clarify things: two of
those 5 controllers plus the port replicator were bought new only
last week, so I'd say there is no chance of corruption there. I'll move
the disks to cards I haven't used before and rerun some tests.
> 4) Have you run S.M.A.R.T. self tests
Not yet, but of the 6 disks used, 4 are brand-new 1 TB drives.
The two used for the RAID1 test were older 320 GB drives.
In any case, I have a large stockpile of SATA cards from 5+ different
makers, and many (15+) smaller disks (<250 GB) from previously used
arrays. So I can easily repeat this with any combination of devices,
and they can't all be bad. But so far I have reproduced it with only
two setups, yes. I'll change the setup to get more reliable results.
> 5) If possible badblocks as well; once you've verified everything else.
I appreciate that you want to eliminate all possible sources of error,
but can I just say this does not look like a disk reliability problem?
Not that I consider myself an expert, but in the 12 years I've been
using md RAID I have never had failures this weird. And the chances of
it happening on 3 separate drives, all in exactly the same manner, are
fairly slim.
> Those are all possible and easy to test for causes of data-corruption.
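For points 4 and 5, the checks Michael suggests can be run with smartctl (from smartmontools) and badblocks. The Python sketch below only builds the command lines for a list of devices; the device names are hypothetical placeholders and the helper names are illustrative. A long self-test runs inside the drive and can take hours, so its log is only worth reading after the completion time that `smartctl -t long` reports. badblocks is left in its default read-only mode, which does not modify data; the destructive `-w` write test would wipe array members.

```python
import shlex


def smart_long_test_cmd(device):
    # Start an extended (long) S.M.A.R.T. self-test; it runs inside the drive.
    return ["smartctl", "-t", "long", device]


def smart_results_cmd(device):
    # Print the drive's self-test log; run this after the long test has finished.
    return ["smartctl", "-l", "selftest", device]


def badblocks_readonly_cmd(device):
    # Default badblocks mode is a read-only scan; -s shows progress, -v is verbose.
    # The destructive -w mode is deliberately not used.
    return ["badblocks", "-s", "-v", device]


def plan(devices):
    """Return shell-quoted commands, in order, for each (hypothetical) device."""
    cmds = []
    for dev in devices:
        cmds += [
            smart_long_test_cmd(dev),
            smart_results_cmd(dev),
            badblocks_readonly_cmd(dev),
        ]
    return [shlex.join(c) for c in cmds]


for line in plan(["/dev/sdb", "/dev/sdc"]):
    print(line)
```

For a single disk, `plan(["/dev/sdb"])` yields the three commands for that drive, to be run as root one at a time, waiting for the self-test to finish before reading its log.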