* RAID 6 corruption : please help!
From: Trevor Cordes @ 2005-10-05 15:26 UTC
To: linux-raid
Hi all,
I'm in a dire situation and need some advice. If you are knowledgeable,
please help. If you are knowledgeable but busy, I can arrange to pay for
your help.
I was rebuilding my file server to switch some disks around and add new
ones. I have a 2TB RAID6 array. I was removing 2 components of the array
and adding 2 new ones. I'm using an FC3 2.6 kernel and mdadm for all
operations.
I took out the 2 decommissioned drives and put in the 2 new ones.
I hot-added the 2 new ones like: mdadm -a /dev/md3 /dev/hd[qs]2 (mistake
#1?)
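To be precise about the sequence (reconstructed from memory, so treat the
exact commands as approximate; device names as above):

  mdadm --detail /dev/md3                    # confirm which slots were empty after pulling the old drives
  mdadm /dev/md3 --add /dev/hdq2 /dev/hds2   # the hot-add (same as the one-liner above)
  cat /proc/mdstat                           # md3's rebuild showed up here as delayed behind md0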
md0-2 were still being rebuilt from other work I was doing (they are root,
boot and swap) so mdstat showed md3 as being "delayed" for rebuild. md3
shares 2 disks with the RAID 1 root/boot/swap.
I mounted md3 rw and was able to access/write to it no problems (mistake
#2?).
I had to reboot to change the NIC (it's a long story), and since md0 was
still being rebuilt and md3 had NOT started rebuilding, I thought it would
be ok (mistake #3?).
Rebooted and md0 started rebuilding again. md3 still said it was waiting
before rebuilding.
On boot-up I got some very weird behaviour from md3. The logs showed md3 as
operational with 9 of 10 disks (fd:1), including one of the new ones (q)
that had not been synced yet! It also said hds2 was a spare and hdq2 was
operational?! I tried to mount ro and it failed with the usual
filesystem-corrupted errors you get when you're majorly screwed.
If I look at a hexdump of hdq2 and hds2 I can see that some data was
written to these yet-to-be-rebuilt drives... probably in the places where
I was writing when it was mounted rw?
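The inspection itself was nothing fancy, roughly this (standard tools; the
1MB read size is arbitrary):

  mdadm --examine /dev/hdq2                          # what the superblock claims each component is
  mdadm --examine /dev/hds2
  dd if=/dev/hdq2 bs=1M count=1 | hexdump -C | less  # eyeball the start of the partition for unexpected data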
The important thing in my mind is that I know for sure no rebuild was ever
started on md3, because md0 was always rebuilding. I had run mdadm --stop
on md3 before md0 ever finished, and I have not added it back in without a
rebuild on md0 restarting again.
I'm trying to figure out what exactly occurred so I can try to undo it.
I'm very good at data recovery and hex editors and such, having saved many
a RAID5 dual-disk failure scenario, even with corrupted partition tables.
I just can't understand what RAID6 was doing.
I know all the data is just sitting there, there's just some wackiness to
the way it's been spread across the disks.
Please help! Please email, I can provide a phone # if you think that will
help you help me. Thanks!
* Re: RAID 6 corruption : please help!
From: Trevor Cordes @ 2005-10-05 17:14 UTC
To: linux-raid
Follow-up: Mr. Anvin, the author of the RAID6 bits, pointed out to me that
a major bug in RAID6 degraded mode existed pre-2.6.10 or so. Since I had
just freshly installed FC3, I was running the stock 2.6.9 kernel from the
FC3 DVD. That caused major corruption on writes while the array was degraded.
I'm still playing around to see if I can recover at the filesystem
level, since I didn't do much writing.
Thanks to all who helped. If anyone has any other ideas, please shout.
* Re: RAID 6 corruption : please help!
From: Tyler @ 2005-10-06 0:17 UTC
To: Trevor Cordes; +Cc: linux-raid
Trevor Cordes wrote:
>Followup: Mr Anvin, the author of the RAID6 bits, pointed out to me that
>a major bug in RAID6 in degraded mode existed pre 2.6.10 or so. Since I
>had just freshly installed FC3, I was using the stock FC3 DVD 2.6.9.
>That caused major corruption on write when degraded.
>
>I'm still playing around to see if I can recover at the filesystem
>level, since I didn't do much writing.
>
>Thanks to all who helped. If anyone has any other ideas, please shout.
>
What was the bug, and is it maybe something that is reversible.. ?
Tyler.
* Re: RAID 6 corruption : please help!
From: Trevor Cordes @ 2005-10-07 18:16 UTC
To: linux-raid
> What was the bug, and is it maybe something that is reversible.. ?
That's what I had thought until I knew the details of the bug.
Mr. Anvin says:
"No, it's "random"."
"The error was: when a write happened to a stripe that needs
read-modify-write, it wouldn't properly schedule the reads, and would
blindly write out whatever crap happened to be in the stripe cache."
">Do you know where in the code the bug was? If I can only discover
>exactly what it did I could write a program to try to clean it up?"
"No, it's timing-dependent and, in either case, involve writing non-data
to the disks."
On 6 Oct, Molle Bestefich wrote:
> What's stopping you from just pulling out the two new disks, mounting
> the array using the old, almost OK disks, and fsck'ing your way out of
> the couple of files that were corrupted when you were in rw mode?
That's kind of what I thought, but I had written to the disks, and for
each write a lot of the stripe (in many cases the entire stripe) would get
wiped out with random data.
In the end, I ran fsck -y on it and crossed my fingers. That recovered
nearly 8/10ths of the data before it hit some fsck bug (it died on signal
11). For the rest of the data I had 1-month-old backups, so it actually
turned out pretty well. I'm certainly going to increase my backup
frequency to weekly or twice weekly from now on -- even on a RAID6 setup
that I was *really* trusting to protect my 2TB.
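For the record, the command was basically just this (device name as above;
a read-only pass first would have been the smarter order):

  fsck -n /dev/md3   # dry run: report problems, fix nothing -- what I should have run first
  fsck -y /dev/md3   # answer yes to everything -- what I actually ran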
Moral of the story: NEVER mount your RAID array until you have updated to
AT LEAST the same kernel version you were running previously!
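In concrete terms, the pre-mount checklist I'll be using from now on is
something like this (nothing exotic; /mnt/check is just an example mount
point):

  uname -r                          # confirm the running kernel is at least what the array last ran under
  cat /proc/mdstat                  # confirm no rebuild is pending or in progress
  mount -o ro /dev/md3 /mnt/check   # mount read-only first; only go rw once it looks sane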
* Re: RAID 6 corruption : please help!
From: Bill Davidsen @ 2005-10-15 4:18 UTC
To: Trevor Cordes; +Cc: linux-raid
Trevor Cordes wrote:
>In the end, I ran fsck -y on it and crossed my fingers. That recovered
>nearly 8/10ths of the data before it hit some fsck bug (dies on signal
>11). The rest of the data I had 1 month old backups, so it actually
>turned out pretty good. I'm certainly going to increase my backup
>frequency to weekly or twice weekly from now on -- even on a RAID6 setup
>that I was *really* trusting to protect my 2TB.
>
You want to google for "signal 11"; it (usually?) means there are
hardware problems with the system, frequently intermittent memory or bus
failures. Unfortunately you may not be out of the woods yet.
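A cheap first check (generic commands, nothing specific to your setup):

  dmesg | grep -i -e "machine check" -e ecc   # any hardware complaints already in the kernel log?

and an overnight run of memtest86 if that turns up nothing.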
On the other hand, you did a lot of inadvisable things to get to this
point...
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
* Re: RAID 6 corruption : please help!
From: Molle Bestefich @ 2005-10-06 15:33 UTC
To: Trevor Cordes; +Cc: linux-raid
Trevor Cordes wrote:
> I know all the data is just sitting there, there's just some wackiness to
> the way it's been spread across the disks.
What's stopping you from just pulling out the two new disks, mounting
the array using the old, almost OK disks, and fsck'ing your way out of
the couple of files that were corrupted when you were in rw mode?
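Something along these lines, I mean (the old-disk names below are
placeholders, substitute your real ones):

  mdadm --stop /dev/md3
  mdadm --assemble --force --run /dev/md3 /dev/hda2 /dev/hdb2 ...   # old, in-sync components only; leave hdq2/hds2 out
  fsck -n /dev/md3                                                  # read-only check to see how bad it really is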