* Possible data corruption after rebuild
@ 2012-07-06 16:09 Alex
2012-07-06 16:11 ` Alex
2012-07-09 0:16 ` NeilBrown
0 siblings, 2 replies; 6+ messages in thread
From: Alex @ 2012-07-06 16:09 UTC (permalink / raw)
To: linux-raid
Hi,
I had a situation where, after rebooting, all three drives of a RAID5
array were marked as spares. I rebuilt the array using "mdadm -C
/dev/md1 -e 1.1 --level 5 -n 3 --chunk 512 --assume-clean /dev/sda2
/dev/sdb2 /dev/sdc2" and mdstat showed it was assembled again. The
partition types on /dev/sdb were all "Linux" instead of "Linux raid
autodetect", so I changed them back.
/dev/md2 also has a problem, and I have no idea what to do there either.
When I tried to fsck it to be sure it was intact, it prompted me that
there was a problem with the superblock, and I answered Yes to "Fix?".
After a number of further errors, I quit fsck and am here for help.
Did I perhaps assemble the array in the wrong disk order? Is there
another superblock that may be useful here and how would I find it?
I'm really concerned that I've lost the data and really hope someone
has some ideas.
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
[raid4] [raid10]
md1 : active raid5 sda2[0] sdc2[2] sdb2[1]
51196928 blocks super 1.1 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
md2 : active raid5 sdc3[0] sdb3[2] sda3[1]
1890300928 blocks super 1.1 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
md0 : active raid1 sdc1[0] sdb1[1]
255988 blocks super 1.0 [3/2] [U_U]
unused devices: <none>
# mdadm -E /dev/md1
mdadm: No md superblock detected on /dev/md1.
# mdadm --detail /dev/md1
/dev/md1:
Version : 1.1
Creation Time : Fri Jul 6 13:41:54 2012
Raid Level : raid5
Array Size : 51196928 (48.83 GiB 52.43 GB)
Used Dev Size : 25598464 (24.41 GiB 26.21 GB)
Raid Devices : 3
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Fri Jul 6 16:01:18 2012
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Name : sysresccd:1 (local to host sysresccd)
UUID : 4ce6925e:b6cbd20e:7f3efbfc:668295fe
Events : 2
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 8 18 1 active sync /dev/sdb2
2 8 34 2 active sync /dev/sdc2
Thanks for any ideas,
Alex
* Re: Possible data corruption after rebuild
2012-07-06 16:09 Possible data corruption after rebuild Alex
@ 2012-07-06 16:11 ` Alex
2012-07-06 16:15 ` Alex
2012-07-09 0:16 ` NeilBrown
1 sibling, 1 reply; 6+ messages in thread
From: Alex @ 2012-07-06 16:11 UTC (permalink / raw)
To: linux-raid
Hi,
> /dev/md2 also has a problem, and I have no idea what to do there either.
I should add that when trying to run fsck on /dev/md2, I receive the following:
# fsck /dev/md2
fsck from util-linux 2.20.1
e2fsck 1.42.1 (17-Feb-2012)
Superblock has an invalid journal (inode 8).
Clear<y>?
I've aborted this in hopes there's some way to fix the filesystem properly.
Thanks,
Alex
* Re: Possible data corruption after rebuild
2012-07-06 16:11 ` Alex
@ 2012-07-06 16:15 ` Alex
0 siblings, 0 replies; 6+ messages in thread
From: Alex @ 2012-07-06 16:15 UTC (permalink / raw)
To: linux-raid
Hi,
I'm sorry for spamming, but I was in the process of canceling the
fsck, and it looks like it ran anyway:
#
*** ext3 journal has been deleted - filesystem is now ext2 only ***
^[[D^CTruncating orphaned inode 94375075 (uid=0, gid=504, mode=064, size=5339)
/dev/md2: ***** FILE SYSTEM WAS MODIFIED *****
When I tried to mount the filesystem now:
# mount -o ro -t ext2 /dev/md2 /tmp/dave
mount: wrong fs type, bad option, bad superblock on /dev/md2,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
I really hope someone can help.
Thanks,
Alex
* Re: Possible data corruption after rebuild
2012-07-06 16:09 Possible data corruption after rebuild Alex
2012-07-06 16:11 ` Alex
@ 2012-07-09 0:16 ` NeilBrown
2012-07-10 17:53 ` Alex
1 sibling, 1 reply; 6+ messages in thread
From: NeilBrown @ 2012-07-09 0:16 UTC (permalink / raw)
To: Alex; +Cc: linux-raid
On Fri, 6 Jul 2012 12:09:43 -0400 Alex <mysqlstudent@gmail.com> wrote:
> Hi,
>
> I had a situation where, after rebooting, all three drives of a RAID5
> array were marked as spares. I rebuilt the array using "mdadm -C
> /dev/md1 -e 1.1 --level 5 -n 3 --chunk 512 --assume-clean /dev/sda2
> /dev/sdb2 /dev/sdc2" and mdstat showed it was assembled again. The
> partition types on /dev/sdb were all "Linux" instead of "Linux raid
> autodetect", so I changed them back.
You've been bitten by http://neil.brown.name/blog/20120615073245
So md1 is all happy again, is it?
>
> /dev/md2 also has a problem, and I have no idea what to do there either.
>
> When I tried to fsck it to be sure it was intact, it prompted me that
> there was a problem with the superblock, and I answered Yes to "Fix?".
Always use "fsck -n" to check if something is intact!!
>
> After a number of further errors, I quit fsck and am here for help.
>
> Did I perhaps assemble the array in the wrong disk order? Is there
> another superblock that may be useful here and how would I find it?
Certainly possible. With only 3 devices there aren't many different orders to
test, so you could try them all.
As fsck thought it recognised a filesystem, it is very likely that the first
device is correct, so just try swapping the other two and issuing a new
--create command. Then "fsck -n".
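Trying every ordering by hand is error-prone, so a sketch that only
*prints* the command pairs to try, one ordering at a time, may help.
The mdadm options mirror Alex's earlier md1 --create and are an
assumption for md2; nothing below executes anything destructive:

```shell
#!/bin/sh
# Print an "mdadm -C" / "fsck -n" pair for each ordering of the three
# members, to be reviewed and run by hand one at a time.
gen_orders() {
    for a in $DEVS; do for b in $DEVS; do for c in $DEVS; do
        # skip orderings that reuse a device
        [ "$a" = "$b" ] && continue
        [ "$a" = "$c" ] && continue
        [ "$b" = "$c" ] && continue
        echo "mdadm -C /dev/md2 -e 1.1 --level 5 -n 3 --chunk 512 --assume-clean $a $b $c"
        echo "fsck -n /dev/md2"
    done; done; done
}

DEVS="/dev/sda3 /dev/sdb3 /dev/sdc3"
gen_orders
```

Six orderings, so twelve printed commands; stop at the first ordering
whose read-only "fsck -n" comes back clean.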
NeilBrown
>
> I'm really concerned that I've lost the data and really hope someone
> has some ideas.
>
> # cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
> [raid4] [raid10]
> md1 : active raid5 sda2[0] sdc2[2] sdb2[1]
> 51196928 blocks super 1.1 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
>
> md2 : active raid5 sdc3[0] sdb3[2] sda3[1]
> 1890300928 blocks super 1.1 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
>
> md0 : active raid1 sdc1[0] sdb1[1]
> 255988 blocks super 1.0 [3/2] [U_U]
>
> unused devices: <none>
>
> # mdadm -E /dev/md1
> mdadm: No md superblock detected on /dev/md1.
>
> # mdadm --detail /dev/md1
> /dev/md1:
> Version : 1.1
> Creation Time : Fri Jul 6 13:41:54 2012
> Raid Level : raid5
> Array Size : 51196928 (48.83 GiB 52.43 GB)
> Used Dev Size : 25598464 (24.41 GiB 26.21 GB)
> Raid Devices : 3
> Total Devices : 3
> Persistence : Superblock is persistent
>
> Update Time : Fri Jul 6 16:01:18 2012
> State : clean
> Active Devices : 3
> Working Devices : 3
> Failed Devices : 0
> Spare Devices : 0
>
> Layout : left-symmetric
> Chunk Size : 512K
>
> Name : sysresccd:1 (local to host sysresccd)
> UUID : 4ce6925e:b6cbd20e:7f3efbfc:668295fe
> Events : 2
>
> Number Major Minor RaidDevice State
> 0 8 2 0 active sync /dev/sda2
> 1 8 18 1 active sync /dev/sdb2
> 2 8 34 2 active sync /dev/sdc2
>
> Thanks for any ideas,
> Alex
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: Possible data corruption after rebuild
2012-07-09 0:16 ` NeilBrown
@ 2012-07-10 17:53 ` Alex
2012-07-11 0:27 ` NeilBrown
0 siblings, 1 reply; 6+ messages in thread
From: Alex @ 2012-07-10 17:53 UTC (permalink / raw)
To: NeilBrown, linux-raid
Hi,
>> I had a situation where, after rebooting, all three drives of a RAID5
>> array were marked as spares. I rebuilt the array using "mdadm -C
>> /dev/md1 -e 1.1 --level 5 -n 3 --chunk 512 --assume-clean /dev/sda2
>> /dev/sdb2 /dev/sdc2" and mdstat showed it was assembled again. The
>> partition types on /dev/sdb were all "Linux" instead of "Linux raid
>> autodetect", so I changed them back.
>
> You've been bitten by http://neil.brown.name/blog/20120615073245
Ugh, that sucks. I actually performed much of what you described
before hearing from you, but didn't realize the device order was so
important and that the kernel wouldn't be able to determine it on its own.
If it weren't a production system that I had to get back online before
Monday morning, I would have been less hasty and waited a bit longer
for guidance.
> So md1 is all happy again is it?
I actually broke that array previously and turned sda1 into an ext4
filesystem, because I couldn't get fc15 to boot reliably from RAID1
with grub.
>> When I tried to fsck it to be sure it was intact, it prompted me that
>> there was a problem with the superblock, and I answered Yes to "Fix?".
>
> Always use "fsck -n" to check if something is intact!!
As I think I mentioned in my post, I had previously experienced
something similar to this, and you helped me through it, but it was a
much easier situation: the filesystem was intact after only rebuilding
the array. This time, once the array was reassembled, I didn't know I
had any option other than proceeding with the fsck to attempt to fix
the filesystem anyway.
The last thing I suspected was a kernel bug, and my exhaustive
googling still left me without anything useful.
> As fsck thought it recognised a filesystem, it is very likely that the first
> device is correct, so just try swapping the other two and issuing a new
> --create command. Then "fsck -n".
I still have an image of all three disks on three new identical disks.
I'm pretty sure I tried all permutations of the three devices, and
fsck complained about each of them. I suspect I screwed it up along
the way with all the fscking.
If I deleted the journal with fsck, and it started complaining about
the root inode missing, is there anything else that could possibly
recover the data, or is it surely gone now?
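For the archive: ext2/3 keeps backup copies of the superblock at the
start of certain block groups, and "e2fsck -n -b <block>" can check
against one read-only — whether that helps after the journal was
cleared, I can't say. The candidate offsets below follow the usual
sparse_super layout (group 1, then powers of 3, 5, 7); that layout is
an assumption worth cross-checking against the real locations printed
by "mke2fs -n" on the device:

```shell
#!/bin/sh
# Print candidate backup-superblock block numbers, for use with
# "e2fsck -n -b <block> /dev/md2" (read-only check).
backup_sb() {
    bpg=$1        # blocks per group: 8192 for 1k blocks, 32768 for 4k
    off=${2:-0}   # pass 1 for 1k-block filesystems (block 0 is the boot block)
    for g in 1 3 5 7 9 25 27; do
        echo $((g * bpg + off))
    done
}

backup_sb 32768    # common 4k-block layout: 32768, 98304, ...
```

The "-n" flags throughout keep everything read-only, so this can be
tried against the untouched disk images rather than the fsck-damaged
array.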
Thanks,
Alex
* Re: Possible data corruption after rebuild
2012-07-10 17:53 ` Alex
@ 2012-07-11 0:27 ` NeilBrown
0 siblings, 0 replies; 6+ messages in thread
From: NeilBrown @ 2012-07-11 0:27 UTC (permalink / raw)
To: Alex; +Cc: linux-raid
On Tue, 10 Jul 2012 13:53:01 -0400 Alex <mysqlstudent@gmail.com> wrote:
> Hi,
>
> >> I had a situation where, after rebooting, all three drives of a RAID5
> >> array were marked as spares. I rebuilt the array using "mdadm -C
> >> /dev/md1 -e 1.1 --level 5 -n 3 --chunk 512 --assume-clean /dev/sda2
> >> /dev/sdb2 /dev/sdc2" and mdstat showed it was assembled again. The
> >> partition types on /dev/sdb were all "Linux" instead of "Linux raid
> >> autodetect", so I changed them back.
> >
> > You've been bitten by http://neil.brown.name/blog/20120615073245
>
> Ugh, that sucks. I actually performed much of what you described
> before hearing from you, but didn't realize the device order was so
> important and that the kernel wouldn't be able to determine it on its own.
>
> If it weren't a production system that I had to get back online before
> Monday morning, I would have been less hasty and waited a bit longer
> for guidance.
>
> > So md1 is all happy again is it?
>
> I actually broke that array previously and turned sda1 into an ext4
> filesystem, because I couldn't get fc15 to boot reliably from RAID1
> with grub.
>
> >> When I tried to fsck it to be sure it was intact, it prompted me that
> >> there was a problem with the superblock, and I answered Yes to "Fix?".
> >
> > Always use "fsck -n" to check if something is intact!!
>
> As I think I mentioned in my post, I had previously experienced
> something similar to this, and you helped me through it, but it was a
> much easier situation: the filesystem was intact after only rebuilding
> the array. This time, once the array was reassembled, I didn't know I
> had any option other than proceeding with the fsck to attempt to fix
> the filesystem anyway.
>
> The last thing I suspected was a kernel bug, and my exhaustive
> googling still left me without anything useful.
>
> > As fsck thought it recognised a filesystem, it is very likely that the first
> > device is correct, so just try swapping the other two and issuing a new
> > --create command. Then "fsck -n".
>
> I still have an image of all three disks on three new identical disks.
>
> I'm pretty sure I tried all permutations of the three devices, and
> fsck complained about each of them. I suspect I screwed it up along
> the way with all the fscking.
>
> If I deleted the journal with fsck, and it started complaining about
> the root inode missing, is there anything else that could possibly
> recover the data, or is it surely gone now?
My knowledge of ext3/4 and related tools isn't good enough to be able to
answer that, but it doesn't sound good.
When did you take the image onto the three new disks? Before or after you
tried to "fsck" ?
Do you have the "mdadm --examine" output from before the disaster?
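For the archive: the "--examine" output Neil asks about is cheap to
keep around, and a periodic dump makes this kind of reconstruction
much less of a guessing game. A rough sketch — the backup directory
and the partition glob are placeholders, not values from this system:

```shell
#!/bin/sh
# Snapshot md superblocks so a forced --create can later be redone
# with the correct device order. Paths here are placeholders.
BACKUP_DIR=${BACKUP_DIR:-/tmp/md-meta}
mkdir -p "$BACKUP_DIR"
stamp=$(date +%Y%m%d)

for dev in /dev/sd[abc][123]; do
    [ -e "$dev" ] || continue          # skip if the glob didn't match
    mdadm -E "$dev" > "$BACKUP_DIR/$(basename "$dev").$stamp" 2>&1 || true
done

# Array-level summary as well, if mdadm is available on this box
if command -v mdadm >/dev/null 2>&1; then
    mdadm --detail --scan > "$BACKUP_DIR/scan.$stamp" 2>&1 || true
fi
```

Dropping something like this into cron means that after the next
"all members became spares" event, the old superblocks — device
order, offsets, UUIDs — are still on record.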
NeilBrown