From: Nicolas Jungers
Subject: Re: RAID5 disk failure during rebuild of spare, any chance of recovery when one of the failed devices is suspected to be intact?
Date: Mon, 16 Aug 2010 18:37:56 +0200
Message-ID: <4C696964.7030205@jungers.net>
References: <4C68CCC9.2050604@jungers.net> <4C68D6D3.6070906@jungers.net> <4C68FA1D.7040105@seoss.co.uk>
Sender: linux-raid-owner@vger.kernel.org
To: Tor Arne Vestbø
Cc: linux-raid
List-Id: linux-raid.ids

On 08/16/2010 06:27 PM, Tor Arne Vestbø wrote:
> On Mon, Aug 16, 2010 at 10:43 AM, Tim Small wrote:
>> On 16/08/10 07:12, Nicolas Jungers wrote:
>>>
>>> On 08/16/2010 07:54 AM, Tor Arne Vestbø wrote:
>>>>
>>>> You mean you sdc and sde plus either sdb or sdd, depending on which
>>>> one I think is more sane at this point?
>>>
>>> I'd try both. Do a ddrescue of the failing one and try that (with copy of
>>> the others) and check what's coming out.
>>
>> As an alternative to using ddrescue, you could quickly prototype various
>> arrangements (without writing anything to the drives) using a device-mapper
>> copy-on-write mapping - I posted some details to the list a while back when
>> I was trying to use this to reconstruct a hw raid array... Check the list
>> archives for details.
>
> Cool, here's what I tried:
>
> Created sparse files for each of the devices
>
> dd if=/dev/zero of=sdb_cow bs=1 count=0 seek=2GB
>
> Mapped that to a loop device
>
> losetup /dev/loop1 sdb_cow
>
> Then ran the following for each device:
>
> cow_size=`blockdev --getsize /dev/sdb1`
> chunk_size=64
> echo "0 $cow_size snapshot /dev/sdb1 /dev/loop1 p $chunk_size" |
> dmsetup create sdb1_cow
>
> After these were created I tried the following:
>
> # mdadm -v -C /dev/md0 -l5 -n4 /dev/mapper/sdb1_cow
> /dev/mapper/sdc1_cow missing /dev/mapper/sde1_cow
> mdadm: layout defaults to left-symmetric
> mdadm: chunk size defaults to 64K
> mdadm: /dev/mapper/sdb1_cow appears to be part of a raid array:
>     level=raid5 devices=4 ctime=Sun Mar 2 22:52:53 2008
> mdadm: /dev/mapper/sdc1_cow appears to be part of a raid array:
>     level=raid5 devices=4 ctime=Sun Mar 2 22:52:53 2008
> mdadm: /dev/mapper/sde1_cow appears to be part of a raid array:
>     level=raid5 devices=4 ctime=Sun Mar 2 22:52:53 2008
> mdadm: size set to 732571904K
> Continue creating array? Y
> mdadm: array /dev/md0 started.
>
> # mdadm --detail /dev/md0
> /dev/md0:
>         Version : 00.90
>   Creation Time : Mon Aug 16 18:20:06 2010
>      Raid Level : raid5
>      Array Size : 2197715712 (2095.91 GiB 2250.46 GB)
>   Used Dev Size : 732571904 (698.64 GiB 750.15 GB)
>    Raid Devices : 4
>   Total Devices : 3
> Preferred Minor : 0
>     Persistence : Superblock is persistent
>
>     Update Time : Mon Aug 16 18:20:06 2010
>           State : clean, degraded
>  Active Devices : 3
> Working Devices : 3
>  Failed Devices : 0
>   Spare Devices : 0
>
>          Layout : left-symmetric
>      Chunk Size : 64K
>
>            UUID : 916ceaa2:b877a3cc:3973abef:31f2d600 (local to host monstre)
>          Events : 0.1
>
>     Number   Major   Minor   RaidDevice State
>        0     251        9        0      active sync   /dev/block/251:9
>        1     251       10        1      active sync   /dev/block/251:10
>        2       0        0        2      removed
>        3     251       12        3      active sync   /dev/block/251:12
>
> And I can now mount /dev/mapper/raid-home !
>
> The question now is, what next? Should I start copying things off to a
> backup, or run fsck first or something else to try to repair errors?
> Or perhaps are the 2GB sparse files too small for anything like that?

For me: first, copy everything. You have an unreliable disk in the
middle of your data.

N.
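
For reference, a minimal sketch of the per-device overlay setup quoted
above, generalised so the same three steps can be generated for each
member. The function name overlay_commands, the COW_SIZE and CHUNK
variables, the ".cow" file naming and the use of truncate and
"dmsetup create --table" (instead of dd and a pipe) are my assumptions,
not what was actually run in the thread. It only prints the commands, so
they can be reviewed before anything touches a device.

```shell
#!/bin/sh
# Sketch only: print (do not run) the commands that put a copy-on-write
# snapshot in front of one RAID member, so the original is never written.
# COW_SIZE, CHUNK and overlay_commands are illustrative names.

COW_SIZE=${COW_SIZE:-2G}   # apparent size of the sparse COW file
CHUNK=${CHUNK:-64}         # snapshot chunk size, in 512-byte sectors

# overlay_commands NAME LOOPDEV SECTORS
#   NAME     partition name without /dev/, e.g. sdb1
#   LOOPDEV  free loop device, e.g. /dev/loop1
#   SECTORS  size of /dev/NAME in 512-byte sectors (blockdev --getsz)
overlay_commands() {
    name=$1
    loopdev=$2
    sectors=$3
    # 1. sparse file that will hold only the changed chunks
    echo "truncate -s $COW_SIZE ${name}.cow"
    # 2. expose the file as a block device
    echo "losetup $loopdev ${name}.cow"
    # 3. persistent ("p") snapshot: reads come from the origin,
    #    writes land in the COW file
    echo "dmsetup create ${name}_cow --table \"0 $sectors snapshot /dev/$name $loopdev p $CHUNK\""
}
```

Usage would look something like this (as root, once per member, with a
different loop device each time):

    overlay_commands sdb1 /dev/loop1 "$(blockdev --getsz /dev/sdb1)"

On the 2GB question: as far as I know only writes consume space in the
COW file, so a read-only mount followed by a copy should need very
little of it, whereas an fsck that repairs things could use much more.
"dmsetup status sdb1_cow" should report how much of the COW store is
allocated; a snapshot that fills up becomes invalid.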