linux-raid.vger.kernel.org archive mirror
* RAID5 Recovery
       [not found] <1161479672.14505.10.camel@localhost>
@ 2006-10-22  1:14 ` Neil Cavan
  2006-10-23  1:29 ` Neil Brown
  1 sibling, 0 replies; 12+ messages in thread
From: Neil Cavan @ 2006-10-22  1:14 UTC (permalink / raw)
  To: linux-raid

Hi,

I had a run-in with the Ubuntu Server installer, and in trying to get
the new system to recognize the clean 5-disk raid5 array left behind by
the previous Ubuntu system, I think I inadvertently instructed it to
create a new raid array using those same partitions.

What I know for sure is that now, I get this:

user@host:~$ sudo mdadm --examine /dev/hda1
mdadm: No super block found on /dev/hda1 (Expected magic a92b4efc, got
00000000)
user@host:~$ sudo mdadm --examine /dev/hdc1
mdadm: No super block found on /dev/hdc1 (Expected magic a92b4efc, got
00000000)
user@host:~$ sudo mdadm --examine /dev/hde1
mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got
00000000)
user@host:~$ sudo mdadm --examine /dev/hdg1
mdadm: No super block found on /dev/hdg1 (Expected magic a92b4efc, got
00000000)
user@host:~$ sudo mdadm --examine /dev/hdi1
mdadm: No super block found on /dev/hdi1 (Expected magic a92b4efc, got
00000000)

I didn't format the partitions or write any data to the disk, so I think
the array's data should be intact. Is there a way to recreate the
superblocks, or am I hosed?

Thanks,
Neil



* Re: RAID5 Recovery
       [not found] <1161479672.14505.10.camel@localhost>
  2006-10-22  1:14 ` Neil Cavan
@ 2006-10-23  1:29 ` Neil Brown
       [not found]   ` <1161571953.4871.8.camel@localhost>
  1 sibling, 1 reply; 12+ messages in thread
From: Neil Brown @ 2006-10-23  1:29 UTC (permalink / raw)
  To: nrcavan; +Cc: linux-raid

On Saturday October 21, nrcavan@engmail.uwaterloo.ca wrote:
> Hi,
> 
> I had a run-in with the Ubuntu Server installer, and in trying to get
> the new system to recognize the clean 5-disk raid5 array left behind by
> the previous Ubuntu system, I think I inadvertently instructed it to
> create a new raid array using those same partitions.
> 
> What I know for sure is that now, I get this:
> 
> user@host:~$ sudo mdadm --examine /dev/hda1
> mdadm: No super block found on /dev/hda1 (Expected magic a92b4efc, got
> 00000000)
> user@host:~$ sudo mdadm --examine /dev/hdc1
> mdadm: No super block found on /dev/hdc1 (Expected magic a92b4efc, got
> 00000000)
> user@host:~$ sudo mdadm --examine /dev/hde1
> mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got
> 00000000)
> user@host:~$ sudo mdadm --examine /dev/hdg1
> mdadm: No super block found on /dev/hdg1 (Expected magic a92b4efc, got
> 00000000)
> user@host:~$ sudo mdadm --examine /dev/hdi1
> mdadm: No super block found on /dev/hdi1 (Expected magic a92b4efc, got
> 00000000)
> 
> I didn't format the partitions or write any data to the disk, so I think
> the array's data should be intact. Is there a way to recreate the
> superblocks, or am I hosed?

Weird....  Could the drives have been repartitioned in the process,
with the partitions ending up slightly different sizes or at slightly
different offsets?  That might explain the disappearing superblocks,
and remaking the original partitions might fix it.

Or you can just re-create the array.  Doing so won't destroy any data
that happens to be there.
To be on the safe side, create it with --assume-clean.  This will avoid
a resync so you can be sure that no data blocks will be written at
all.
Then 'fsck -n' or mount read-only and see if your data is safe.
Once you are happy that you have the data safe you can trigger the
resync with
   mdadm --assemble --update=resync .....
or 
   echo resync > /sys/block/md0/md/sync_action

(assuming it is 'md0').
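
To make that concrete, a minimal sketch using the five devices from
your original post, assuming the default chunk size and layout, the
original device order (all of which must match the old array), and an
mdadm new enough to accept --assume-clean with --create:

   mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=5 \
         /dev/hda1 /dev/hdc1 /dev/hde1 /dev/hdg1 /dev/hdi1
   fsck -n /dev/md0        # read-only check; or: mount -o ro /dev/md0 /mnt
   # only once the data looks intact, trigger the resync as above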

Good luck.

NeilBrown


* Re: RAID5 Recovery
       [not found]   ` <1161571953.4871.8.camel@localhost>
@ 2006-10-23  2:52     ` Neil Cavan
  2006-10-23  4:01     ` Neil Brown
  1 sibling, 0 replies; 12+ messages in thread
From: Neil Cavan @ 2006-10-23  2:52 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

The drives have not been repartitioned.

I think what happened is that I created a new raid5 array over the old
one, but never synced or initialized it.

I'm leery of re-creating the array as you suggest, because I think
re-creating an array "over top" of my existing array is what got me into
trouble in the first place.

Also, from mdadm man page (using v 1.12.0):

--assume-clean
    Tell mdadm that the array pre-existed and is known to be clean.
    This is only really useful for Building RAID1  array.   Only
    use this if you really know what you are doing.  This is currently
    only supported for --build.

This suggests to me that I can only use this to build a legacy array
without superblocks - which I don't want - and that since my array was
RAID5, it's not "really useful", whatever that means. Oh, and also,
I don't really know what I'm doing. ;)

If I do re-create the array to regenerate the superblocks, isn't it
important that I know the exact parameters of the pre-existing array, to
get the data to match up? chunk size, parity method, etc?

I just don't want to rush in and mess things up. Did that once
already. ;)

Thanks,
Neil

On Mon, 2006-23-10 at 11:29 +1000, Neil Brown wrote:
> On Saturday October 21, nrcavan@engmail.uwaterloo.ca wrote:
> > Hi,
> > 
> > I had a run-in with the Ubuntu Server installer, and in trying to get
> > the new system to recognize the clean 5-disk raid5 array left behind by
> > the previous Ubuntu system, I think I inadvertently instructed it to
> > create a new raid array using those same partitions.
> > 
> > What I know for sure is that now, I get this:
> > 
> > user@host:~$ sudo mdadm --examine /dev/hda1
> > mdadm: No super block found on /dev/hda1 (Expected magic a92b4efc, got
> > 00000000)
> > user@host:~$ sudo mdadm --examine /dev/hdc1
> > mdadm: No super block found on /dev/hdc1 (Expected magic a92b4efc, got
> > 00000000)
> > user@host:~$ sudo mdadm --examine /dev/hde1
> > mdadm: No super block found on /dev/hde1 (Expected magic a92b4efc, got
> > 00000000)
> > user@host:~$ sudo mdadm --examine /dev/hdg1
> > mdadm: No super block found on /dev/hdg1 (Expected magic a92b4efc, got
> > 00000000)
> > user@host:~$ sudo mdadm --examine /dev/hdi1
> > mdadm: No super block found on /dev/hdi1 (Expected magic a92b4efc, got
> > 00000000)
> > 
> > I didn't format the partitions or write any data to the disk, so I think
> > the array's data should be intact. Is there a way to recreate the
> > superblocks, or am I hosed?
> 
> Weird....  Could the drives have been repartitioned in the process,
> with the partitions ending up slightly different sizes or at slightly
> different offsets?  That might explain the disappearing superblocks,
> and remaking the original partitions might fix it.
> 
> Or you can just re-create the array.  Doing so won't destroy any data
> that happens to be there.
> To be on the safe side, create it with --assume-clean.  This will avoid
> a resync so you can be sure that no data blocks will be written at
> all.
> Then 'fsck -n' or mount read-only and see if your data is safe.
> Once you are happy that you have the data safe you can trigger the
> resync with
>    mdadm --assemble --update=resync .....
> or 
>    echo resync > /sys/block/md0/md/sync_action
> 
> (assuming it is 'md0').
> 
> Good luck.
> 
> NeilBrown



* Re: RAID5 Recovery
       [not found]   ` <1161571953.4871.8.camel@localhost>
  2006-10-23  2:52     ` Neil Cavan
@ 2006-10-23  4:01     ` Neil Brown
  1 sibling, 0 replies; 12+ messages in thread
From: Neil Brown @ 2006-10-23  4:01 UTC (permalink / raw)
  To: nrcavan; +Cc: linux-raid

On Sunday October 22, nrcavan@engmail.uwaterloo.ca wrote:
> The drives have not been repartitioned.
> 
> I think what happened is that I created a new raid5 array over the old
> one, but never synced or initialized it.

If you had created an array - whether it synced or not - the
superblocks would have been written and --examine would have found
them.  So something else must have happened.  Hard to know what.

> 
> I'm leery of re-creating the array as you suggest, because I think
> re-creating an array "over top" of my existing array is what got me into
> trouble in the first place.
> 
> Also, from mdadm man page (using v 1.12.0):
> 
> --assume-clean
>     Tell mdadm that the array pre-existed and is known to be clean.
>     This is only really useful for Building RAID1  array.   Only
>     use this if you really know what you are doing.  This is currently
>     only supported for --build.
> 
> This suggests to me that I can only use this to build a legacy array
> without superblocks - which I don't want - and that since my array was
> RAID5, it's not "really useful", whatever that means. Oh, and also,
> I don't really know what I'm doing. ;)

--assume-clean was extended to --create in mdadm-2.2.

> 
> If I do re-create the array to regenerate the superblocks, isn't it
> important that I know the exact parameters of the pre-existing array, to
> get the data to match up? chunk size, parity method, etc?

Yes, but I would assume you just used the defaults.  If not, you
presumably know why you changed the defaults and can do it again???

In any case, creating the array with --assume-clean does not modify
any data.  It only overwrites the superblocks.  As you currently don't
have any superblocks, you have nothing to lose.
After you create the array you can try 'fsck' or other tools to see if
the data is intact.  If it is - good.  If not, stop the array and try
creating it with different parameters.
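
As a sketch of that retry (the 128k chunk here is purely illustrative;
try the values the original array might plausibly have used):

   mdadm --stop /dev/md0
   mdadm --create /dev/md0 --assume-clean --level=5 --raid-devices=5 \
         --chunk=128 /dev/hda1 /dev/hdc1 /dev/hde1 /dev/hdg1 /dev/hdi1
   fsck -n /dev/md0     # repeat with other chunk sizes/layouts/device orders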

> 
> I just don't want to rush in and mess things up. Did that once
> already. ;)

Very sensible.  
Assuming the partitions really are the same as they were before (can't
hurt to triple-check) then I really think '--create --assume-clean' is
your best bet.  Maybe download and compile the latest mdadm
 http://www.kernel.org/pub/linux/utils/raid/mdadm/
to make sure you have a working --assume-clean.
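
Roughly (the version in the filename is a placeholder for whatever is
newest at that URL):

   wget http://www.kernel.org/pub/linux/utils/raid/mdadm/mdadm-X.Y.tar.gz
   tar xzf mdadm-X.Y.tar.gz
   cd mdadm-X.Y
   make
   make install       # or just run ./mdadm from the build directory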

NeilBrown


* RAID5 Recovery
@ 2007-11-14  1:05 Neil Cavan
  2007-11-14 10:58 ` David Greaves
  0 siblings, 1 reply; 12+ messages in thread
From: Neil Cavan @ 2007-11-14  1:05 UTC (permalink / raw)
  To: linux-raid

Hello,

I have a 5-disk RAID5 array that has gone belly-up. It consists of
two pairs of disks on Promise PCI controllers, and one disk on the
motherboard controller.

This array has been running for a couple of years, and every so often
(randomly - sometimes every couple of weeks, sometimes not for months)
it drops a drive. It's not a drive failure per se; it's something
controller-related, since the failures tend to happen in pairs and
SMART gives the drives a clean bill of health. If it's only one drive,
I can hot-add it with no problem. If it's two drives my heart leaps
into my mouth, but after a reboot only one of the drives comes up as
failed, and I can hot-add it with no problem. The two-drive case has
happened a dozen times and the array has never been any the worse for
wear.
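
For reference, the hot-remove/hot-add cycle I mean is roughly this,
with hdc1 purely as an example member:

   mdadm /dev/md0 --fail /dev/hdc1      # if md hasn't already failed it
   mdadm /dev/md0 --remove /dev/hdc1
   mdadm /dev/md0 --add /dev/hdc1       # re-add; md then resyncs that member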

This morning, I woke up to find the array had kicked two disks. This
time, though, /proc/mdstat showed one of the failed disks (U_U_U, one
of the "_"s) had been marked as a spare - weird, since there are no
spare drives in this array. I rebooted, and the array came back in the
same state: one failed, one spare. I hot-removed and hot-added the
spare drive, which put the array back to where I thought it should be
(still U_U_U, but with both "_"s marked as failed). Then I rebooted,
and the array began rebuilding on its own. Usually I have to hot-add
manually, so that struck me as a little odd, but I gave it no mind and
went to work. Without checking the contents of the filesystem. Which
turned out not to have been mounted on reboot. Because apparently
things went horribly wrong.

The rebuild process ran its course. I now have an array that mdadm
insists is peachy:
-------------------------------------------------------------------------------------------------------
md0 : active raid5 hda1[0] hdc1[1] hdi1[4] hdg1[3] hde1[2]
      468872704 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]

unused devices: <none>
-------------------------------------------------------------------------------------------------------

But there is no filesystem on /dev/md0:

-------------------------------------------------------------------------------------------------------
sudo mount -t reiserfs /dev/md0 /storage/
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or other error
-------------------------------------------------------------------------------------------------------

Do I have any hope of recovering this data? Could rebuilding the
reiserfs superblock help if the rebuild managed to corrupt the
superblock but not the data?

Any help is appreciated. Below is the failure event from
/var/log/messages, followed by the output of cat /var/log/messages |
grep md.

Thanks,
Neil Cavan

Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr:
status=0x51 { DriveReady SeekComplete Error }
Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr:
error=0x40 { UncorrectableError }, LBAsect=11736, sector=11719
Nov 13 02:01:03 localhost kernel: [17805772.424000] ide: failed opcode
was: unknown
Nov 13 02:01:03 localhost kernel: [17805772.424000] end_request: I/O
error, dev hdc, sector 11719
Nov 13 02:01:03 localhost kernel: [17805772.424000] R5: read error not
correctable.
Nov 13 02:01:03 localhost kernel: [17805772.464000] lost page write
due to I/O error on md0
Nov 13 02:01:05 localhost kernel: [17805773.776000] hdc: dma_intr:
status=0x51 { DriveReady SeekComplete Error }
Nov 13 02:01:05 localhost kernel: [17805773.776000] hdc: dma_intr:
error=0x40 { UncorrectableError }, LBAsect=11736, sector=11727
Nov 13 02:01:05 localhost kernel: [17805773.776000] ide: failed opcode
was: unknown
Nov 13 02:01:05 localhost kernel: [17805773.776000] end_request: I/O
error, dev hdc, sector 11727
Nov 13 02:01:05 localhost kernel: [17805773.776000] R5: read error not
correctable.
Nov 13 02:01:05 localhost kernel: [17805773.776000] lost page write
due to I/O error on md0
Nov 13 02:01:06 localhost kernel: [17805775.156000] hdc: dma_intr:
status=0x51 { DriveReady SeekComplete Error }
Nov 13 02:01:06 localhost kernel: [17805775.156000] hdc: dma_intr:
error=0x40 { UncorrectableError }, LBAsect=11736, sector=11735
Nov 13 02:01:06 localhost kernel: [17805775.156000] ide: failed opcode
was: unknown
Nov 13 02:01:06 localhost kernel: [17805775.156000] end_request: I/O
error, dev hdc, sector 11735
Nov 13 02:01:06 localhost kernel: [17805775.156000] R5: read error not
correctable.
Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write
due to I/O error on md0
Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout:
Nov 13 02:01:06 localhost kernel: [17805775.196000]  --- rd:5 wd:3 fd:2
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 0, o:1, dev:hda1
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 1, o:0, dev:hdc1
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 2, o:1, dev:hde1
Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 4, o:1, dev:hdi1
Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 conf printout:
Nov 13 02:01:06 localhost kernel: [17805775.212000]  --- rd:5 wd:3 fd:2
Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 0, o:1, dev:hda1
Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 2, o:1, dev:hde1
Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 4, o:1, dev:hdi1
Nov 13 02:01:06 localhost kernel: [17805775.212000] lost page write
due to I/O error on md0
Nov 13 02:01:06 localhost last message repeated 4 times

cat /var/log/messages | grep md:
-------------------------------------------------------------------------------------------------------
Nov 13 02:01:03 localhost kernel: [17805772.464000] lost page write
due to I/O error on md0
Nov 13 02:01:05 localhost kernel: [17805773.776000] lost page write
due to I/O error on md0
Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write
due to I/O error on md0
Nov 13 02:01:06 localhost kernel: [17805775.212000] lost page write
due to I/O error on md0
Nov 13 07:21:07 localhost kernel: [17179583.968000] md: md driver
0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
Nov 13 07:21:07 localhost kernel: [17179583.968000] md: bitmap version 4.39
Nov 13 07:21:07 localhost kernel: [17179583.972000] md: raid5
personality registered as nr 4
Nov 13 07:21:07 localhost kernel: [17179584.712000] md: md0 stopped.
Nov 13 07:21:07 localhost kernel: [17179584.876000] md: bind<hdc1>
Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hde1>
Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdg1>
Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdi1>
Nov 13 07:21:07 localhost kernel: [17179584.892000] md: bind<hda1>
Nov 13 07:21:07 localhost kernel: [17179584.892000] md: kicking
non-fresh hdg1 from array!
Nov 13 07:21:07 localhost kernel: [17179584.892000] md: unbind<hdg1>
Nov 13 07:21:07 localhost kernel: [17179584.892000] md: export_rdev(hdg1)
Nov 13 07:21:07 localhost kernel: [17179584.896000] raid5: allocated
5245kB for md0
Nov 13 07:21:07 localhost kernel: [17179665.524000] ReiserFS: md0:
found reiserfs format "3.6" with standard journal
Nov 13 07:21:07 localhost kernel: [17179676.136000] ReiserFS: md0:
using ordered data mode
Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0:
journal params: device md0, size 8192, journal first block 18, max
trans len 1024, max batch 900, max commit age 30, max trans age 30
Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0:
checking transaction log (md0)
Nov 13 07:21:07 localhost kernel: [17179676.828000] ReiserFS: md0:
replayed 7 transactions in 1 seconds
Nov 13 07:21:07 localhost kernel: [17179677.012000] ReiserFS: md0:
Using r5 hash to sort names
Nov 13 07:21:09 localhost kernel: [17179682.064000] lost page write
due to I/O error on md0
Nov 13 07:25:39 localhost kernel: [17179584.824000] md: md driver
0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
Nov 13 07:25:39 localhost kernel: [17179584.824000] md: bitmap version 4.39
Nov 13 07:25:39 localhost kernel: [17179584.828000] md: raid5
personality registered as nr 4
Nov 13 07:25:39 localhost kernel: [17179585.532000] md: md0 stopped.
Nov 13 07:25:39 localhost kernel: [17179585.696000] md: bind<hdc1>
Nov 13 07:25:39 localhost kernel: [17179585.696000] md: bind<hde1>
Nov 13 07:25:39 localhost kernel: [17179585.700000] md: bind<hdg1>
Nov 13 07:25:39 localhost kernel: [17179585.700000] md: bind<hdi1>
Nov 13 07:25:39 localhost kernel: [17179585.708000] md: bind<hda1>
Nov 13 07:25:39 localhost kernel: [17179585.708000] md: kicking
non-fresh hdg1 from array!
Nov 13 07:25:39 localhost kernel: [17179585.708000] md: unbind<hdg1>
Nov 13 07:25:39 localhost kernel: [17179585.708000] md: export_rdev(hdg1)
Nov 13 07:25:39 localhost kernel: [17179585.712000] raid5: allocated
5245kB for md0
Nov 13 07:25:40 localhost kernel: [17179666.064000] ReiserFS: md0:
found reiserfs format "3.6" with standard journal
Nov 13 07:25:40 localhost kernel: [17179676.904000] ReiserFS: md0:
using ordered data mode
Nov 13 07:25:40 localhost kernel: [17179676.928000] ReiserFS: md0:
journal params: device md0, size 8192, journal first block 18, max
trans len 1024, max batch 900, max commit age 30, max trans age 30
Nov 13 07:25:40 localhost kernel: [17179676.932000] ReiserFS: md0:
checking transaction log (md0)
Nov 13 07:25:40 localhost kernel: [17179677.080000] ReiserFS: md0:
Using r5 hash to sort names
Nov 13 07:25:42 localhost kernel: [17179683.128000] lost page write
due to I/O error on md0
Nov 13 07:26:57 localhost kernel: [17179757.524000] md: unbind<hdc1>
Nov 13 07:26:57 localhost kernel: [17179757.524000] md: export_rdev(hdc1)
Nov 13 07:27:03 localhost kernel: [17179763.700000] md: bind<hdc1>
Nov 13 07:30:24 localhost kernel: [17179584.180000] md: md driver
0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
Nov 13 07:30:24 localhost kernel: [17179584.180000] md: bitmap version 4.39
Nov 13 07:30:24 localhost kernel: [17179584.184000] md: raid5
personality registered as nr 4
Nov 13 07:30:24 localhost kernel: [17179584.912000] md: md0 stopped.
Nov 13 07:30:24 localhost kernel: [17179585.060000] md: bind<hde1>
Nov 13 07:30:24 localhost kernel: [17179585.064000] md: bind<hdg1>
Nov 13 07:30:24 localhost kernel: [17179585.064000] md: bind<hdi1>
Nov 13 07:30:24 localhost kernel: [17179585.064000] md: bind<hdc1>
Nov 13 07:30:24 localhost kernel: [17179585.068000] md: bind<hda1>
Nov 13 07:30:24 localhost kernel: [17179585.068000] raid5: allocated
5245kB for md0
Nov 13 07:30:24 localhost kernel: [17179585.068000] md: syncing RAID array md0
Nov 13 07:30:24 localhost kernel: [17179585.068000] md: minimum
_guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Nov 13 07:30:24 localhost kernel: [17179585.068000] md: using maximum
available idle IO bandwidth (but not more than 200000 KB/sec) for
reconstruction.
Nov 13 07:30:24 localhost kernel: [17179585.068000] md: using 128k
window, over a total of 117218176 blocks.
Nov 13 07:30:24 localhost kernel: [17179684.160000] ReiserFS: md0:
warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md0
Nov 13 08:57:11 localhost kernel: [17184895.816000] md: md0: sync done.
Nov 13 18:17:10 localhost kernel: [17218493.012000] ReiserFS: md0:
warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md0
Nov 13 18:36:03 localhost kernel: [17219625.456000] ReiserFS: md0:
warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md0


* Re: RAID5 Recovery
  2007-11-14  1:05 RAID5 Recovery Neil Cavan
@ 2007-11-14 10:58 ` David Greaves
  0 siblings, 0 replies; 12+ messages in thread
From: David Greaves @ 2007-11-14 10:58 UTC (permalink / raw)
  To: Neil Cavan; +Cc: linux-raid

Neil Cavan wrote:
> Hello,
Hi Neil

What kernel version?
What mdadm version?

> This morning, I woke up to find the array had kicked two disks. This
> time, though, /proc/mdstat showed one of the failed disks (U_U_U, one
> of the "_"s) had been marked as a spare - weird, since there are no
> spare drives in this array. I rebooted, and the array came back in the
> same state: one failed, one spare. I hot-removed and hot-added the
> spare drive, which put the array back to where I thought it should be
> (still U_U_U, but with both "_"s marked as failed). Then I rebooted,
> and the array began rebuilding on its own. Usually I have to hot-add
> manually, so that struck me as a little odd, but I gave it no mind and
> went to work. Without checking the contents of the filesystem. Which
> turned out not to have been mounted on reboot.
OK

> Because apparently things went horribly wrong.
Yep :(

> Do I have any hope of recovering this data? Could rebuilding the
> reiserfs superblock help if the rebuild managed to corrupt the
> superblock but not the data?
See below



> Nov 13 02:01:03 localhost kernel: [17805772.424000] hdc: dma_intr:
> status=0x51 { DriveReady SeekComplete Error }
<snip>
> Nov 13 02:01:06 localhost kernel: [17805775.156000] lost page write
> due to I/O error on md0
hdc1 fails


> Nov 13 02:01:06 localhost kernel: [17805775.196000] RAID5 conf printout:
> Nov 13 02:01:06 localhost kernel: [17805775.196000]  --- rd:5 wd:3 fd:2
> Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 0, o:1, dev:hda1
> Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 1, o:0, dev:hdc1
> Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 2, o:1, dev:hde1
> Nov 13 02:01:06 localhost kernel: [17805775.196000]  disk 4, o:1, dev:hdi1

hdg1 is already missing?

> Nov 13 02:01:06 localhost kernel: [17805775.212000] RAID5 conf printout:
> Nov 13 02:01:06 localhost kernel: [17805775.212000]  --- rd:5 wd:3 fd:2
> Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 0, o:1, dev:hda1
> Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 2, o:1, dev:hde1
> Nov 13 02:01:06 localhost kernel: [17805775.212000]  disk 4, o:1, dev:hdi1

so now the array is bad.

a reboot happens and:
> Nov 13 07:21:07 localhost kernel: [17179584.712000] md: md0 stopped.
> Nov 13 07:21:07 localhost kernel: [17179584.876000] md: bind<hdc1>
> Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hde1>
> Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdg1>
> Nov 13 07:21:07 localhost kernel: [17179584.884000] md: bind<hdi1>
> Nov 13 07:21:07 localhost kernel: [17179584.892000] md: bind<hda1>
> Nov 13 07:21:07 localhost kernel: [17179584.892000] md: kicking
> non-fresh hdg1 from array!
> Nov 13 07:21:07 localhost kernel: [17179584.892000] md: unbind<hdg1>
> Nov 13 07:21:07 localhost kernel: [17179584.892000] md: export_rdev(hdg1)
> Nov 13 07:21:07 localhost kernel: [17179584.896000] raid5: allocated
> 5245kB for md0
... apparently hdc1 is OK? Hmmm.

> Nov 13 07:21:07 localhost kernel: [17179665.524000] ReiserFS: md0:
> found reiserfs format "3.6" with standard journal
> Nov 13 07:21:07 localhost kernel: [17179676.136000] ReiserFS: md0:
> using ordered data mode
> Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0:
> journal params: device md0, size 8192, journal first block 18, max
> trans len 1024, max batch 900, max commit age 30, max trans age 30
> Nov 13 07:21:07 localhost kernel: [17179676.164000] ReiserFS: md0:
> checking transaction log (md0)
> Nov 13 07:21:07 localhost kernel: [17179676.828000] ReiserFS: md0:
> replayed 7 transactions in 1 seconds
> Nov 13 07:21:07 localhost kernel: [17179677.012000] ReiserFS: md0:
> Using r5 hash to sort names
> Nov 13 07:21:09 localhost kernel: [17179682.064000] lost page write
> due to I/O error on md0
Reiser tries to mount/replay itself relying on hdc1 (which is partly bad)

> Nov 13 07:25:39 localhost kernel: [17179584.828000] md: raid5
> personality registered as nr 4
> Nov 13 07:25:39 localhost kernel: [17179585.708000] md: kicking
> non-fresh hdg1 from array!
Another reboot...

> Nov 13 07:25:40 localhost kernel: [17179666.064000] ReiserFS: md0:
> found reiserfs format "3.6" with standard journal
> Nov 13 07:25:40 localhost kernel: [17179676.904000] ReiserFS: md0:
> using ordered data mode
> Nov 13 07:25:40 localhost kernel: [17179676.928000] ReiserFS: md0:
> journal params: device md0, size 8192, journal first block 18, max
> trans len 1024, max batch 900, max commit age 30, max trans age 30
> Nov 13 07:25:40 localhost kernel: [17179676.932000] ReiserFS: md0:
> checking transaction log (md0)
> Nov 13 07:25:40 localhost kernel: [17179677.080000] ReiserFS: md0:
> Using r5 hash to sort names
> Nov 13 07:25:42 localhost kernel: [17179683.128000] lost page write
> due to I/O error on md0
Reiser tries again...

> Nov 13 07:26:57 localhost kernel: [17179757.524000] md: unbind<hdc1>
> Nov 13 07:26:57 localhost kernel: [17179757.524000] md: export_rdev(hdc1)
> Nov 13 07:27:03 localhost kernel: [17179763.700000] md: bind<hdc1>
> Nov 13 07:30:24 localhost kernel: [17179584.180000] md: md driver
hdc is kicked too (again)

> Nov 13 07:30:24 localhost kernel: [17179584.184000] md: raid5
> personality registered as nr 4
Another reboot...

> Nov 13 07:30:24 localhost kernel: [17179585.068000] md: syncing RAID array md0
Now (I guess) hdg is being restored using hdc data:

> Nov 13 07:30:24 localhost kernel: [17179684.160000] ReiserFS: md0:
> warning: sh-2021: reiserfs_fill_super: can not find reiserfs on md0
But Reiser is confused.

> Nov 13 08:57:11 localhost kernel: [17184895.816000] md: md0: sync done.
hdg is back up to speed:


So hdc looks faulty.
Your only hope (IMO) is to use reiserfs recovery tools.
You may want to replace hdc to avoid an hdc failure interrupting any rebuild.
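
A cautious sketch of that, for example with reiserfsck - ideally run
against a copy of the md device rather than the only remaining copy of
the data, escalating only as far as each step's report demands:

   reiserfsck --check /dev/md0          # diagnose only, changes nothing
   reiserfsck --rebuild-sb /dev/md0     # only if the superblock itself is gone
   reiserfsck --rebuild-tree /dev/md0   # last resort: rewrites the whole tree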

I think what happened is that hdg failed prior to 2am and you didn't notice
(mdadm --monitor is your friend). Then hdc had a real failure - at that point
you had data loss (not enough good disks). I don't know why md rebuilt using hdc
- I would expect it to have found hdc and hdg stale. If this is a newish kernel
then maybe Neil should take a look...
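
For the monitoring side, something like this (the mail address is a
placeholder), or set MAILADDR in /etc/mdadm/mdadm.conf and let the
distro's mdadm init script start the monitor:

   mdadm --monitor --scan --daemonise --mail=you@example.com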

David



* RAID5 recovery
@ 2013-06-05 12:02 Philipp Frauenfelder
  2013-06-05 13:20 ` Drew
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Philipp Frauenfelder @ 2013-06-05 12:02 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1292 bytes --]

Hi

I am new to this list and RAID problems (but not to Linux).

My own RAIDs run well, but one of my colleague's, in a Synology, has
failed. It was a 4-disk RAID5, and the hot-spare disk and one of the
used disks (the middle one) failed. Unfortunately, he returned the
failed disks to the vendor to get new ones.

So we ended up with 2 out of 4 disks and are now trying to get the
data off them. My colleague copied the disks and we tried to rebuild
the RAID5 on the copies, on a PC running a fairly recent Knoppix:

root@Microknoppix:~# mdadm --create --assume-clean --level=5 
--raid-devices=4 --size=1948662272 /dev/md2 missing /dev/sda3 missing 
/dev/sdb3
mdadm: /dev/sda3 appears to be part of a raid array:
     level=raid5 devices=4 ctime=Thu May  9 22:47:08 2013
mdadm: /dev/sdb3 appears to be part of a raid array:
     level=raid5 devices=4 ctime=Thu May  9 22:47:08 2013
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: RUN_ARRAY failed: Input/output error

Apparently, this did not work. :-(

The RAID wiki says one should ask here before trying anything that
might destroy too much, so that's why I am asking here....

Btw, attached is the raid.status and the dmesg output.

Is there a hint what we need to do?

Thanks in advance,
Philipp

[-- Attachment #2: output.txt --]
[-- Type: text/plain, Size: 11948 bytes --]

[    8.601246] grow_buffers: requested out-of-range block 18446744072372647623 for device sdb3
[    8.601249] UDF-fs: error (device sdb3): udf_read_tagged: read failed, block=3198020433, location=-1096946863
[    8.601252] grow_buffers: requested out-of-range block 18446744072372647934 for device sdb3
[    8.601254] UDF-fs: error (device sdb3): udf_read_tagged: read failed, block=3198020688, location=-1096946608
[    8.601257] grow_buffers: requested out-of-range block 18446744072372647622 for device sdb3
[    8.601260] UDF-fs: error (device sdb3): udf_read_tagged: read failed, block=3198020432, location=-1096946864
[    8.601262] grow_buffers: requested out-of-range block 18446744072372647624 for device sdb3
[    8.601265] UDF-fs: error (device sdb3): udf_read_tagged: read failed, block=3198020434, location=-1096946862
[    8.601268] grow_buffers: requested out-of-range block 18446744072372647933 for device sdb3
[    8.601270] UDF-fs: error (device sdb3): udf_read_tagged: read failed, block=3198020687, location=-1096946609
[    8.601273] grow_buffers: requested out-of-range block 18446744072372647621 for device sdb3
[    8.601276] UDF-fs: error (device sdb3): udf_read_tagged: read failed, block=3198020431, location=-1096946865
[    8.601278] grow_buffers: requested out-of-range block 18446744072372647750 for device sdb3
[    8.601281] UDF-fs: error (device sdb3): udf_read_tagged: read failed, block=3198020539, location=-1096946757
[    8.601284] grow_buffers: requested out-of-range block 18446744072372647438 for device sdb3
[    8.601286] UDF-fs: error (device sdb3): udf_read_tagged: read failed, block=3198020283, location=-1096947013
[    8.601289] grow_buffers: requested out-of-range block 18446744072372647748 for device sdb3
[    8.601291] UDF-fs: error (device sdb3): udf_read_tagged: read failed, block=3198020537, location=-1096946759
[    8.601294] grow_buffers: requested out-of-range block 18446744072372647436 for device sdb3
[    8.601297] UDF-fs: error (device sdb3): udf_read_tagged: read failed, block=3198020281, location=-1096947015
[    8.633283] UDF-fs: warning (device sdb3): udf_fill_super: No partition found (1)
[    8.671669] FAT-fs (sdc1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
[    8.827286] cloop: Can't open device read-write in mode 0x1f
[    8.841807] cloop: losetup_file: 15549 blocks, 131072 bytes/block, largest block is 131098 bytes.
[    8.861463] ISO 9660 Extensions: RRIP_1991A
[   18.316899] aufs test_add:264:busybox[2042]: uid/gid/perm /KNOPPIX 0/0/0755, 0/0/01777
[   19.568037] [drm] radeon kernel modesetting enabled.
[   19.568191] [drm] initializing kernel modesetting (CEDAR 0x1002:0x68F9 0x1043:0x03CA).
[   19.568206] [drm] register mmio base: 0xF7DC0000
[   19.568207] [drm] register mmio size: 131072
[   19.568397] ATOM BIOS: 68F9.12.20.0.60.AS01
[   19.568470] radeon 0000:01:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[   19.568473] radeon 0000:01:00.0: GTT: 512M 0x0000000040000000 - 0x000000005FFFFFFF
[   19.573103] [drm] Detected VRAM RAM=1024M, BAR=256M
[   19.573107] [drm] RAM width 64bits DDR
[   19.573202] [TTM] Zone  kernel: Available graphics memory: 435976 kiB
[   19.573203] [TTM] Zone highmem: Available graphics memory: 1685004 kiB
[   19.573204] [TTM] Initializing pool allocator
[   19.573234] [drm] radeon: 1024M of VRAM memory ready
[   19.573236] [drm] radeon: 512M of GTT memory ready.
[   19.573255] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[   19.573256] [drm] Driver supports precise vblank timestamp query.
[   19.573301] radeon 0000:01:00.0: irq 44 for MSI/MSI-X
[   19.573310] radeon 0000:01:00.0: radeon: using MSI.
[   19.573344] [drm] radeon: irq initialized.
[   19.573348] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   19.573636] [drm] probing gen 2 caps for device 8086:29c1 = 1/0
[   19.573674] [drm] Loading CEDAR Microcode
[   19.752047] ACPI Warning: 0x00000400-0x0000041f SystemIO conflicts with Region \SMRG 1 (20120711/utaddress-251)
[   19.752055] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   19.771647] ACPI Warning: 0x00000480-0x000004bf SystemIO conflicts with Region \GPS0 1 (20120711/utaddress-251)
[   19.771653] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   19.771655] lpc_ich: Resource conflict(s) found affecting gpio_ich
[   19.794779] sky2: driver version 1.30
[   19.794818] sky2 0000:02:00.0: Yukon-2 EC Ultra chip revision 3
[   19.794900] sky2 0000:02:00.0: irq 45 for MSI/MSI-X
[   19.795088] sky2 0000:02:00.0: eth0: addr 00:1e:8c:9b:c2:a8
[   19.795973] snd_hda_intel 0000:00:1b.0: irq 46 for MSI/MSI-X
[   20.317373] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
[   20.317477] radeon 0000:01:00.0: WB enabled
[   20.317482] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xff924c00
[   20.333784] [drm] ring test on 0 succeeded in 0 usecs
[   20.333930] [drm] ib test on ring 0 succeeded in 0 usecs
[   20.334383] [drm] Radeon Display Connectors
[   20.334385] [drm] Connector 0:
[   20.334386] [drm]   HDMI-A-1
[   20.334388] [drm]   HPD1
[   20.334390] [drm]   DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c
[   20.334392] [drm]   Encoders:
[   20.334393] [drm]     DFP1: INTERNAL_UNIPHY1
[   20.334395] [drm] Connector 1:
[   20.334397] [drm]   DVI-I-1
[   20.334398] [drm]   HPD4
[   20.334400] [drm]   DDC: 0x6450 0x6450 0x6454 0x6454 0x6458 0x6458 0x645c 0x645c
[   20.334402] [drm]   Encoders:
[   20.334403] [drm]     DFP2: INTERNAL_UNIPHY
[   20.334405] [drm]     CRT2: INTERNAL_KLDSCP_DAC2
[   20.334407] [drm] Connector 2:
[   20.334408] [drm]   VGA-1
[   20.334411] [drm]   DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c
[   20.334412] [drm]   Encoders:
[   20.334413] [drm]     CRT1: INTERNAL_KLDSCP_DAC1
[   20.334453] [drm] Internal thermal controller without fan control
[   20.334495] [drm] radeon: power management initialized
[   20.409843] [drm] fb mappable at 0xD0142000
[   20.409846] [drm] vram apper at 0xD0000000
[   20.409848] [drm] size 7299072
[   20.409849] [drm] fb depth is 24
[   20.409851] [drm]    pitch is 6912
[   20.409894] fbcon: radeondrmfb (fb0) is primary device
[   20.634495] Console: switching to colour frame buffer device 210x65
[   20.642330] fb0: radeondrmfb frame buffer device
[   20.642332] drm: registered panic notifier
[   20.642335] [drm] Initialized radeon 2.24.0 20080528 for 0000:01:00.0 on minor 0
[   20.642447] snd_hda_intel 0000:01:00.1: irq 47 for MSI/MSI-X
[   23.204085] Floppy drive(s): fd0 is 1.44M
[   23.222067] FDC 0 is a post-1991 82077
[   23.255387] logitech-djreceiver 0003:046D:C52B.0004: hiddev0,hidraw1: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:1d.0-1/input2
[   23.259694] input: Logitech Unifying Device. Wireless PID:4002 as /devices/pci0000:00/0000:00:1d.0/usb6/6-1/6-1:1.2/0003:046D:C52B.0004/input/input3
[   23.259764] logitech-djdevice 0003:046D:C52B.0005: input,hidraw2: USB HID v1.11 Keyboard [Logitech Unifying Device. Wireless PID:4002] on usb-0000:00:1d.0-1:1
[   23.317599] Linux media interface: v0.10
[   23.329515] Linux video capture interface: v2.00
[   23.342380] gspca_main: v2.14.0 registered
[   23.354400] gspca_main: ALi m5602-2.14.0 probing 0402:5602
[   23.447385] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[   23.467999] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[   23.468324] gspca_m5602: Sensor reported 0x00
[   23.491436] 00:0a: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[   23.507888] gspca_m5602: Detected a s5k83a sensor
[   23.563589] usbcore: registered new interface driver ALi m5602
[   23.985843] Adding 2527504k swap on /dev/zram0.  Priority:0 extents:1 across:2527504k SS
[   24.362914] sky2 0000:02:00.0: eth0: enabling interface
[   26.808806] sky2 0000:02:00.0: eth0: Link is up at 1000 Mbps, full duplex, flow control both
[   26.989818] NET: Registered protocol family 10
[   45.775875] lp: driver loaded but no devices found
[   45.779193] ppdev: user-space parallel port driver
[  108.873580] md: md2 stopped.
[  108.874281] md: bind<sda3>
[  251.127903] md: md2 stopped.
[  251.127911] md: unbind<sda3>
[  251.153329] md: export_rdev(sda3)
[  262.889300] md: md2 stopped.
[  262.890686] md: bind<sdb3>
[  262.891379] md: bind<sda3>
[  365.539083] async_tx: api initialized (async)
[  365.540515] xor: automatically using best checksumming function:
[  365.573320]    pIII_sse  : 10740.000 MB/sec
[  365.639975] raid6: mmxx1     3941 MB/s
[  365.696652] raid6: mmxx2     3113 MB/s
[  365.753338] raid6: sse1x1    1817 MB/s
[  365.809996] raid6: sse1x2    2225 MB/s
[  365.866645] raid6: sse2x1    3366 MB/s
[  365.923309] raid6: sse2x2    4144 MB/s
[  365.923312] raid6: using algorithm sse2x2 (4144 MB/s)
[  365.923314] raid6: using ssse3x1 recovery algorithm
[  365.927355] md: raid6 personality registered for level 6
[  365.927358] md: raid5 personality registered for level 5
[  365.927359] md: raid4 personality registered for level 4
[  365.927525] bio: create slab <bio-1> at 1
[  365.927545] md/raid:md2: device sda3 operational as raid disk 1
[  365.927547] md/raid:md2: device sdb3 operational as raid disk 3
[  365.927898] md/raid:md2: allocated 4250kB
[  365.927933] md/raid:md2: not enough operational devices (2/4 failed)
[  365.927944] RAID conf printout:
[  365.927946]  --- level:5 rd:4 wd:2
[  365.927948]  disk 1, o:1, dev:sda3
[  365.927950]  disk 3, o:1, dev:sdb3
[  365.928222] md/raid:md2: failed to run raid set.
[  365.928224] md: pers->run() failed ...
[  764.898143] md: md2 stopped.
[  764.898151] md: unbind<sda3>
[  764.926609] md: export_rdev(sda3)
[  764.926629] md: unbind<sdb3>
[  764.943268] md: export_rdev(sdb3)
[  787.874649] md: md2 stopped.
[  787.875397] md: bind<sdb3>
[  787.875581] md: bind<sda3>
[ 1848.916785] md: md2 stopped.
[ 1848.916793] md: unbind<sda3>
[ 1848.939838] md: export_rdev(sda3)
[ 1848.939862] md: unbind<sdb3>
[ 1848.969834] md: export_rdev(sdb3)
[ 1943.094851] md: bind<sda3>
[ 1943.096685] md: bind<sdb3>
[ 1943.098368] bio: create slab <bio-1> at 1
[ 1943.098389] md/raid:md2: device sdb3 operational as raid disk 3
[ 1943.098392] md/raid:md2: device sda3 operational as raid disk 1
[ 1943.098746] md/raid:md2: allocated 4250kB
[ 1943.098775] md/raid:md2: not enough operational devices (2/4 failed)
[ 1943.098786] RAID conf printout:
[ 1943.098788]  --- level:5 rd:4 wd:2
[ 1943.098790]  disk 1, o:1, dev:sda3
[ 1943.098792]  disk 3, o:1, dev:sdb3
[ 1943.099068] md/raid:md2: failed to run raid set.
[ 1943.099069] md: pers->run() failed ...
[ 1943.099093] md: md2 stopped.
[ 1943.099098] md: unbind<sdb3>
[ 1943.109816] md: export_rdev(sdb3)
[ 1943.109821] md: unbind<sda3>
[ 1943.126481] md: export_rdev(sda3)
root@Microknoppix:~# ls /dev/md2
/dev/md2
root@Microknoppix:~# cat /proc/partitions
major minor  #blocks  name

 240        0    1990272 cloop0
   8        0 2930266584 sda
   8        1    2498076 sda1
   8        2    2097152 sda2
   8        3 1948793856 sda3
   8       16 2930266584 sdb
   8       17    2498076 sdb1
   8       18    2097152 sdb2
   8       19 1948793856 sdb3
 251        0    2527508 zram0
   8       32    7876607 sdc
   8       33    7875583 sdc1
root@Microknoppix:~# mdadm -S /dev/md2
mdadm: stopped /dev/md2
root@Microknoppix:~# mdadm --create --assume-clean --level=5 --raid-devices=4 --size=1948662272 /dev/md2 missing /dev/sda3 missing /dev/sdb3
mdadm: /dev/sda3 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Thu May  9 22:47:08 2013
mdadm: /dev/sdb3 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Thu May  9 22:47:08 2013
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: RUN_ARRAY failed: Input/output error
root@Microknoppix:~#

[-- Attachment #3: raid.status --]
[-- Type: text/plain, Size: 1613 bytes --]

/dev/sda3:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : d6f749fd:c8facea3:67a12527:d26839ce
           Name : CubeStation_CH:2
  Creation Time : Fri May 14 19:21:32 2010
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3897586617 (1858.51 GiB 1995.56 GB)
     Array Size : 5846379840 (5575.54 GiB 5986.69 GB)
  Used Dev Size : 3897586560 (1858.51 GiB 1995.56 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : 17580b66:1315c925:7cd68c49:ab0fd414

    Update Time : Thu Dec 13 22:17:50 2012
       Checksum : 2a65b28 - correct
         Events : 539041

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 1
   Array State : .A.A ('A' == active, '.' == missing)
/dev/sdb3:
          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x0
     Array UUID : d6f749fd:c8facea3:67a12527:d26839ce
           Name : CubeStation_CH:2
  Creation Time : Fri May 14 19:21:32 2010
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 3897586617 (1858.51 GiB 1995.56 GB)
     Array Size : 5846379840 (5575.54 GiB 5986.69 GB)
  Used Dev Size : 3897586560 (1858.51 GiB 1995.56 GB)
    Data Offset : 264 sectors
   Super Offset : 0 sectors
          State : clean
    Device UUID : afc651a8:70326fd6:c6e261b2:4aac6c8e

    Update Time : Thu Dec 13 22:17:50 2012
       Checksum : d800900a - correct
         Events : 539041

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 3
   Array State : .A.A ('A' == active, '.' == missing)


* Re: RAID5 recovery
  2013-06-05 12:02 RAID5 recovery Philipp Frauenfelder
@ 2013-06-05 13:20 ` Drew
  2013-06-05 13:29 ` John Robinson
  2013-06-06 12:28 ` Sam Bingner
  2 siblings, 0 replies; 12+ messages in thread
From: Drew @ 2013-06-05 13:20 UTC (permalink / raw)
  To: Philipp Frauenfelder; +Cc: linux-raid

Hi Philipp,

The experts will no doubt chime in quite quickly but I'm going to jump
in myself. Just glancing it over, I hope your friend has backups
because this one is messy at best.

> My own RAIDs run well, but one of my colleague's, in a Synology, has failed.
> It was a 4-disk RAID5, and the hot-spare disk and one of the used disks (the
> middle one) failed. Unfortunately, he returned the failed disks to the vendor
> to get new ones.

Looking at the raid.status, this is a 4-disk RAID-5; there is no "hot
spare". RAID5 distributes the parity data across all disks, so that
fourth disk is part of the array.

> So we ended up with 2 out of 4 disks and are now trying to get the data off
> them. My colleague copied the disks and we tried to rebuild the RAID5 on the
> copies, on a PC running a fairly recent Knoppix:

RAID-5 can only withstand a single disk failure. Losing two disks
together means the array is IMHO hosed. The first disk lost can be
recreated as the parity is distributed across the other disks. Once
you lose the second disk, there's no way to recreate the data from
parity and the stripes are essentially gone.

> root@Microknoppix:~# mdadm --create --assume-clean --level=5
> --raid-devices=4 --size=1948662272 /dev/md2 missing /dev/sda3 missing
> /dev/sdb3
> mdadm: /dev/sda3 appears to be part of a raid array:
>     level=raid5 devices=4 ctime=Thu May  9 22:47:08 2013
> mdadm: /dev/sdb3 appears to be part of a raid array:
>     level=raid5 devices=4 ctime=Thu May  9 22:47:08 2013
> Continue creating array? y
> mdadm: Defaulting to version 1.2 metadata
> mdadm: RUN_ARRAY failed: Input/output error
>
> Apparently, this did not work. :-(

For one, you can't run a 4-disk RAID5 with two missing disks. Even if
a third disk had been available, that command is extremely dangerous,
as it overwrites the RAID superblocks. In this case it overwrote a
1.1-metadata array with 1.2 metadata, which most likely toasted any
partitions or filesystem superblocks on the existing array.

> The RAID wiki says one should ask here before trying anything that might
> destroy too much, so that's why I am asking here....

You probably read the info at:
https://raid.wiki.kernel.org/index.php/RAID_Recovery The --create
statement is very dangerous as it does overwrite existing data. You
did however capture the previous RAID info so if one of the experts
can help, you've got the info they'll need.

The better choice for recovery (from the wiki) is:
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID

> Btw, attached is the raid.status and the dmesg output.
>
> Is there a hint what we need to do?

Pull out those backups. Hopefully one of the experts knows some magic
for recovering from a two-disk failure of RAID5, but I don't.


-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

"This started out as a hobby and spun horribly out of control."
-Unknown


* Re: RAID5 recovery
  2013-06-05 12:02 RAID5 recovery Philipp Frauenfelder
  2013-06-05 13:20 ` Drew
@ 2013-06-05 13:29 ` John Robinson
  2013-06-05 15:10   ` Drew
  2013-06-06 12:28 ` Sam Bingner
  2 siblings, 1 reply; 12+ messages in thread
From: John Robinson @ 2013-06-05 13:29 UTC (permalink / raw)
  To: Philipp Frauenfelder; +Cc: linux-raid

On 05/06/2013 13:02, Philipp Frauenfelder wrote:
> Hi
>
> I am new to this list and RAID problems (but not to Linux).
>
> My own RAIDs run well, but one of my colleague's, in a Synology, has
> failed. It was a 4-disk RAID5, and the hot-spare disk and one of the
> used disks (the middle one) failed. Unfortunately, he returned the
> failed disks to the vendor to get new ones.
>
> So we ended up with 2 out of 4 disks and are now trying to get the
> data off them. My colleague copied the disks and we tried to rebuild
> the RAID5 on the copies, on a PC running a fairly recent Knoppix:
>
> root@Microknoppix:~# mdadm --create --assume-clean --level=5
> --raid-devices=4 --size=1948662272 /dev/md2 missing /dev/sda3 missing
> /dev/sdb3
> mdadm: /dev/sda3 appears to be part of a raid array:
>      level=raid5 devices=4 ctime=Thu May  9 22:47:08 2013
> mdadm: /dev/sdb3 appears to be part of a raid array:
>      level=raid5 devices=4 ctime=Thu May  9 22:47:08 2013
> Continue creating array? y
> mdadm: Defaulting to version 1.2 metadata
> mdadm: RUN_ARRAY failed: Input/output error
>
> Apparently, this did not work. :-(
>
> The RAID wiki says one should ask here before trying anything that
> might destroy too much, so that's why I am asking here....
>
> Btw, attached is the raid.status and the dmesg output.
>
> Is there a hint what we need to do?

Unfortunately you've almost certainly already lost or destroyed too much.

Firstly, you say a hot-spare and one of the used discs failed. That 
would be a 3-drive array with a spare, not a 4-drive array.

You then attempted to create a 4-drive array with two drives missing. By 
doing that, and saying "y" when you ran the create command, you 
destroyed any remaining info about the original array.

Somewhat helpfully, the create command told you a little about the 
previous metadata. It says it was a 4-drive array. Was this the first 
time you'd tried to run the create command?

If the original array was indeed a 4-drive array - rather than 3 plus a 
spare - then you have lost your data because a RAID-5 can only tolerate 
losing one drive.

If the original array was in fact a 3-drive array with a spare, someone 
here might be able to do something. Re-image from the original drives, 
then post `fdisk -lu` from each disc and `mdadm -Ev` from each disc and 
all their partitions.
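
A sketch of gathering all of that in one pass (assuming the imaged
discs appear as sda and sdb, as in your dmesg output):

   for d in /dev/sda /dev/sdb; do
       fdisk -lu $d                # partition table, in sectors
       mdadm -Ev $d ${d}[0-9]*     # examine the disc and each partition
   done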

Cheers,

John.



* Re: RAID5 recovery
  2013-06-05 13:29 ` John Robinson
@ 2013-06-05 15:10   ` Drew
  2013-06-05 20:12     ` Philipp Frauenfelder
  0 siblings, 1 reply; 12+ messages in thread
From: Drew @ 2013-06-05 15:10 UTC (permalink / raw)
  To: John Robinson; +Cc: Philipp Frauenfelder, linux-raid

> Firstly, you say a hot-spare and one of the used discs failed. That would be
> a 3-drive array with a spare, not a 4-drive array.

The raid.status attachment he posted was apparently pre-recovery. It
shows the system was a 4-disk array, so there was no "hot spare."


-- 
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

"This started out as a hobby and spun horribly out of control."
-Unknown


* Re: RAID5 recovery
  2013-06-05 15:10   ` Drew
@ 2013-06-05 20:12     ` Philipp Frauenfelder
  0 siblings, 0 replies; 12+ messages in thread
From: Philipp Frauenfelder @ 2013-06-05 20:12 UTC (permalink / raw)
  To: linux-raid

Hi Drew

2013/6/5 Drew <drew.kay@gmail.com>:
>> Firstly, you say a hot-spare and one of the used discs failed. That would be
>> a 3-drive array with a spare, not a 4-drive array.
>
> The raid.status attachment he posted was apparently pre-recovery. It
> shows the system was a 4 disk array, so there was no "hot spare."

I tried to find hints on the web and
http://forum.synology.com/wiki/index.php/How_to_retrieve_data_from_RAID_Volumes_on_Linux
seems to say the same as you. :-(

Thanks to all for the help!

Philipp


* Re: RAID5 recovery
  2013-06-05 12:02 RAID5 recovery Philipp Frauenfelder
  2013-06-05 13:20 ` Drew
  2013-06-05 13:29 ` John Robinson
@ 2013-06-06 12:28 ` Sam Bingner
  2 siblings, 0 replies; 12+ messages in thread
From: Sam Bingner @ 2013-06-06 12:28 UTC (permalink / raw)
  To: Philipp Frauenfelder; +Cc: <linux-raid@vger.kernel.org>

On Jun 5, 2013, at 2:02 AM, Philipp Frauenfelder <philipp.frauenfelder@gmail.com> wrote:

> Hi
> 
> I am new to this list and RAID problems (but not to Linux).
> 
> My own RAIDs run well, but one of my colleague's, in a Synology, has failed. It was a 4-disk RAID5, and the hot-spare disk and one of the used disks (the middle one) failed. Unfortunately, he returned the failed disks to the vendor to get new ones.
> 
> So we ended up with 2 out of 4 disks and are now trying to get the data off them. My colleague copied the disks and we tried to rebuild the RAID5 on the copies, on a PC running a fairly recent Knoppix:
> 
> root@Microknoppix:~# mdadm --create --assume-clean --level=5 --raid-devices=4 --size=1948662272 /dev/md2 missing /dev/sda3 missing /dev/sdb3
> mdadm: /dev/sda3 appears to be part of a raid array:
>    level=raid5 devices=4 ctime=Thu May  9 22:47:08 2013
> mdadm: /dev/sdb3 appears to be part of a raid array:
>    level=raid5 devices=4 ctime=Thu May  9 22:47:08 2013
> Continue creating array? y
> mdadm: Defaulting to version 1.2 metadata
> mdadm: RUN_ARRAY failed: Input/output error
> 
> Apparently, this did not work. :-(
> 
> The RAID wiki says one should ask here before trying anything that might destroy too much, so that's why I am asking here....
> 
> Btw, attached is the raid.status and the dmesg output.
> 
> Is there a hint what we need to do?
> 
> Thanks in advance,
> Philipp
> <output.txt><raid.status>

Yup - as everybody else said, it's toasty - hopefully you have a backup.

If not, I will offer you another suggestion - contact the vendor and see if they can return your failed drives to you. If you can get them back, there is a high likelihood that you can recover a majority, if not all, of your data.

Sam

