Re: data corruption after rebuild

From: NeilBrown <neilb@suse.de>
To: Pavel Herrmann <morpheus.ibis@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: data corruption after rebuild
Date: Wed, 20 Jul 2011 16:24:31 +1000
Message-ID: <20110720162431.5df90f32@notabene.brown>
In-Reply-To: <2073005.NH8LALaxuD@bloomfield>

On Tue, 19 Jul 2011 15:55:35 +0200 Pavel Herrmann <morpheus.ibis@gmail.com>
wrote:

> Hi,
> 
> I have a big problem with mdadm, I removed a drive from my raid6, after 
> replacing it the raid started an online resync. I accidentally pushed the 
> computer and it shut down (power cord moved), and after booting it again the 
> online resync continued.
> 
> the problem is that the rebuilt array is corrupted. most of the data is fine, 
> but every several MB there is an error (which doesn't look like being caused 
> by a crash), effectively invalidating all data on the drive (about 7TB, mainly 
> HD video)
> 
> I do monthly scans, so the redundancy syndromes should have been up-to-date, 
> the array is made of 8 disks, the setup is ext4 on lvm on mdraid
> 
> is there anything to solve this? or at least ideas what happened?

My suggestion would be to remove the drive you recently added and then see if
the data is still corrupted.  It may not help but is probably worth a try.

There was a bug prior to 2.6.32 where RAID6 could sometimes write the wrong
data when recovering to a spare.  It would only happen if you were accessing
that data at the same time as it was being recovered, and if you were unlucky.

However you are running a newer kernel so that shouldn't affect you, but you
never know.

BTW the monthly scans that you do are primarily for finding sleeping bad
blocks - blocks that you cannot read.  They do check for inconsistencies in
the parity, but they only report them; they don't correct them.  This is
because automatically correcting can cause more problems than it solves.

When the monthly check reported inconsistencies you "should" have confirmed
that all the drives seem to be functioning correctly and then run a 'repair'
pass to fix the parity blocks up.
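
For reference, the check / inspect / repair sequence can be sketched against
md's sysfs files.  This is only a minimal sketch, not a definitive tool: it
assumes the array exposes the usual `sync_action` and `mismatch_cnt` files
under `/sys/block/<array>/md/`, and the helper names and the `sysfs_root`
parameter are made up for illustration.  Writing to `sync_action` needs root.

```python
from pathlib import Path

# Actions this sketch allows writing to sync_action.
ALLOWED_ACTIONS = ("check", "repair", "idle")


def md_dir(array, sysfs_root="/sys/block"):
    """Return the md sysfs directory for an array name such as 'md0'."""
    return Path(sysfs_root) / array / "md"


def read_mismatch_cnt(array, sysfs_root="/sys/block"):
    """Return the mismatch count left behind by the last check/repair pass."""
    return int((md_dir(array, sysfs_root) / "mismatch_cnt").read_text().strip())


def current_action(array, sysfs_root="/sys/block"):
    """Return the current sync action, e.g. 'idle', 'check' or 'repair'."""
    return (md_dir(array, sysfs_root) / "sync_action").read_text().strip()


def request_action(array, action, sysfs_root="/sys/block"):
    """Ask md to start a 'check' or 'repair' pass, or to go 'idle'."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported sync action: {action!r}")
    (md_dir(array, sysfs_root) / "sync_action").write_text(action + "\n")
```

The intended use is: request "check", wait for `current_action()` to return
"idle", look at `read_mismatch_cnt()`, and only after confirming the drives
are healthy request "repair".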

As you didn't, that bad parity would have created bad data when you recovered.
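
To illustrate why stale parity turns into bad data during recovery, here is a
toy sketch.  It is deliberately simplified: it uses a single XOR parity block
(the RAID5-style P block) rather than md's actual RAID6 code, and the block
contents and function names are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving, parity):
    """Reconstruct the one missing data block from the survivors and parity."""
    return xor_blocks(list(surviving) + [parity])


# One stripe with three data blocks (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]

# Parity that matches the data currently on disk.
good_parity = xor_blocks(data)

# Stale parity: computed when the second block still held b"BxBB" and never
# updated afterwards.  A 'check' pass would report this as a mismatch.
stale_parity = xor_blocks([b"AAAA", b"BxBB", b"CCCC"])

# The drive holding data[0] is replaced and rebuilt from the remaining blocks.
rebuilt_ok = rebuild_missing(data[1:], good_parity)    # equals data[0]
rebuilt_bad = rebuild_missing(data[1:], stale_parity)  # differs from data[0]
```

With consistent parity the rebuilt block equals the original.  With stale
parity the rebuilt block silently differs in exactly the bytes where the
parity was out of date, which matches the "an error every several MB" pattern
described above.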

NeilBrown

Thread overview: 11+ messages
2011-07-19 13:55 data corruption after rebuild Pavel Herrmann
2011-07-19 15:12 ` Roman Mamedov
2011-07-19 16:18   ` Pavel Herrmann
2011-07-19 17:38     ` Roman Mamedov
2011-07-19 17:44       ` Pavel Herrmann
2011-07-19 16:35   ` Pavel Herrmann
2011-07-19 16:48     ` Roman Mamedov
2011-07-19 17:05       ` Pavel Herrmann
2011-07-19 18:12         ` Roman Mamedov
2011-07-20  6:24 ` NeilBrown [this message]
2011-07-20  8:20   ` Pavel Herrmann