Debugging a strange array corruption

From: Brad Campbell <brad@wasp.net.au>
To: RAID Linux <linux-raid@vger.kernel.org>
Subject: Debugging a strange array corruption
Date: Tue, 14 Dec 2010 16:10:07 +0800
Message-ID: <4D07265F.8060109@wasp.net.au>

G'day all,

I have a 10 x 1TB drive RAID-6 here. It's been great for ages, but recently I've seen nasty random 
corruption across the entire array that I cannot pin down.

The machine also has a number of RAID-1 arrays and a RAID-5, all of which are behaving perfectly.

The machine has 16GB of RAM, so all my read tests are done with dd bs=1G count=20 to make sure I'm 
actually hitting the disk somewhere.

The array is partitioned into three approximately equal partitions.

If I do something like -

for i in `seq 3` ; do dd if=/dev/md0p1 bs=1G count=20 | md5sum ; done

- I get three completely different checksums.
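
The repeated-read test above can be wrapped in a small helper that compares the passes automatically. This is only a sketch: the function name read_is_stable and its arguments are made up for illustration, and the defaults (1M blocks, 20480 of them) simply mirror the 20GB read used above so that each pass exceeds the 16GB of RAM.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read the same region
# of a device several times and report whether the checksums agree.
# Arguments: device, number of passes, dd block size, dd block count.
read_is_stable() {
    dev=$1; passes=${2:-3}; bs=${3:-1M}; count=${4:-20480}
    first=""
    status=0
    i=1
    while [ "$i" -le "$passes" ]; do
        # Checksum one full pass over the region.
        sum=$(dd if="$dev" bs="$bs" count="$count" 2>/dev/null | md5sum | cut -d' ' -f1)
        echo "pass $i: $sum"
        if [ -z "$first" ]; then
            first=$sum
        elif [ "$sum" != "$first" ]; then
            # Any pass that disagrees with the first marks the device unstable.
            status=1
        fi
        i=$((i + 1))
    done
    return "$status"
}
```

For example, `read_is_stable /dev/md0p1 3 || echo "reads differ"` would repeat the test on the first partition. The region has to stay larger than RAM, otherwise later passes may be served from the page cache rather than the disks.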

The filesystems are unmounted and the array is idle.

I've run the same test individually on all 10 disks in the array and they all appear to give 
consistent data. Reading anything from the array gives me mostly correct data with intermittent garbage.
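
Since the data is mostly correct with intermittent garbage, one possible next step is to find out which parts of a read differ between passes. The sketch below checksums a region chunk by chunk in two full passes and prints the indices of chunks whose checksum changed. The function names chunk_sums and unstable_chunks, and the chunk size in the usage note, are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original post).

# Print "index md5" for each of N consecutive chunks of a device.
# Arguments: device, chunk size (dd bs= syntax), number of chunks.
chunk_sums() {
    dev=$1; bs=$2; n=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        sum=$(dd if="$dev" bs="$bs" skip="$i" count=1 2>/dev/null | md5sum | cut -d' ' -f1)
        printf '%s %s\n' "$i" "$sum"
        i=$((i + 1))
    done
}

# Run two complete passes and print the index of every chunk whose
# checksum differs between them. Takes the same arguments as chunk_sums.
unstable_chunks() {
    a=$(mktemp); b=$(mktemp)
    chunk_sums "$@" > "$a"
    chunk_sums "$@" > "$b"
    # Each pasted line reads: index sum1 index sum2
    paste -d' ' "$a" "$b" | awk '$2 != $4 { print $1 }'
    rm -f "$a" "$b"
}
```

Something like `unstable_chunks /dev/md0p1 64M 320` would cover the same 20GB region in 64MB chunks. The two passes are run back to back over the whole region, rather than reading each chunk twice in a row, so that the second read of a chunk is less likely to come from the page cache. Chunk indices multiplied by the chunk size give byte offsets, which could then be compared against the array's chunk and stripe layout.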

I've tried both a 2.6.36.[12] kernel, and I'm currently running 2.6.37-rc5-git3 with the same odd 
results.

All the disks pass long SMART tests. They all checksum correctly from end to end with repeated 
sequential runs.

No libata errors in the logs.

The drives are all on separate channels. 8 are on a pair of Marvell 88SX7042 controllers and 2 are 
on a SIL3132. This has been happening since I upgraded the mainboard (and the kernel at the same 
time - nothing like throwing more variables in the mix). The effects were subtle enough that I 
missed them until all of my good backups had been rotated out and replaced with broken data. 
Lesson learned.

I'm stumped and I don't even know where to begin. I've never seen something like this happen without 
a bad disk, controller or cable, and those are easy to diagnose.

Regards,
-- 
Dolphins are so intelligent that within a few weeks they can
train Americans to stand at the edge of the pool and throw them
fish.
