RAID5 recovering

Linux RAID subsystem development

* RAID5 recovering
@ 2013-04-15 13:47 Pierre Martineau
  2013-04-15 15:19 ` Robin Hill
  0 siblings, 1 reply; 6+ messages in thread
From: Pierre Martineau @ 2013-04-15 13:47 UTC (permalink / raw)
  To: linux-raid


Dear Raid experts,

I have a RAID5 volume that recently crashed and I need your advice
before taking any irreversible action.

Let me first summarize the past and current state.

1) I had a nicely running RAID5 volume with 3 x 1 TB disks (LVM on top,
with several logical volumes in ext3 and ext4), but the volume had become
a bit too small and I decided to add a new 1 TB disk.

2) I added a new disk and did not do anything for a couple of days (Raid 
still running with 3 disks)

3) One of the old disks failed and was ejected from the RAID.

4) The ejected disk was not even present as /dev/sdX. I thus tested the
connections and the disk came back.

5) I resynced the ejected disk and was back to my original 3-disk array.

6) I waited 2-3 days and everything was fine. I then added the new disk
and resynced.

7) I now had a running 4-disk RAID5 array. I created a new volume and
started copying data to it.

8) During the weekend, 2 disks were ejected from the array: the newly
installed one and the same one as before (step 3).

9) Again the 2 disks were not present in /dev/sdX. I checked the
connections again and the problem was a molex connector. The two ejected
disks were on the same molex, which explains why both were detected as
faulty.

Now, my list of errors as a newbie.

4) I did not save all the information before proceeding (mdadm
--examine, /etc/mdadm/mdadm.conf, syslog, ...)

5) I tried to assemble the disks with
mdadm --assemble --scan
with no result

6) I then tried the following, and I think this was my big mistake:
mdadm --assemble /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

I forgot to put /dev/md0 after --assemble in this command, so mdadm
took /dev/sdb1 to be the array device. Because of this the /dev/sdb1
superblock was removed, and now "mdadm --examine /dev/sdb1" returns
"No md superblock detected on /dev/sdb1".

I would now like to be more cautious. If some kind expert on the list
would tell me whether the method proposed below is the right approach,
I will be grateful for the rest of my life :-)

7) I read the RAID wiki and the list.

8) I saved
mdadm --examine /dev/sd[bcde]1
dmesg
syslog
/etc/mdadm/mdadm.conf
fdisk -lu /dev/sd[bcde]

I put the contents of these files at the end of this message (except
dmesg and syslog, because they are very long).

9) /dev/sdd is the new disk. This is clear in the fdisk listing since it 
is a 4K sector disk.
The normal order of the raid is thus (see mdadm --examine /dev/sd[de]1)
sdb1 sdc1 sde1 sdd1

10) Events are
/dev/sdb1: no md superblock (see 6)
/dev/sdc1: Events : 112358
/dev/sdd1: Events : 112333
/dev/sde1: Events : 112358

It seems that sdd was the first disk removed.
Presumably sdb1 is in sync since it was running with sdc1 when the sdd1 
and sde1 were ejected from the array (see 8) but I can't be sure since I 
stupidly erased its superblock!
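
The comparison in step 10 can be scripted. The following is a minimal sketch, assuming the output of "mdadm --examine" has been saved to a text file in the format shown in this message. The function names event_counts and event_lag are illustrative only, and the sample data is copied from the Events lines above.

```shell
#!/bin/sh
# Print "device events" pairs from saved `mdadm --examine` output.
# Assumes each member section starts with a "/dev/...:" line and
# contains a line of the form "Events : N".
event_counts() {
    awk '
        /^\/dev\/.*:$/ { dev = $1; sub(/:$/, "", dev) }
        $1 == "Events" { print dev, $NF }
    ' "$1"
}

# Report how far each member lags behind the highest event count.
event_lag() {
    event_counts "$1" | awk '
        { ev[$1] = $2; if ($2 + 0 > max) max = $2 + 0 }
        END { for (d in ev) print d, "behind by", max - ev[d] }
    ' | sort
}

# Sample data taken from the Events lines reported in this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/sdc1:
Events : 112358
/dev/sdd1:
Events : 112333
/dev/sde1:
Events : 112358
EOF

event_lag "$sample"
rm -f "$sample"
```

A member with no superblock (here /dev/sdb1) does not appear in the result at all, so its state still has to be judged separately.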

11) I propose to re-create the array with the --assume-clean option, 
then check everything using "fsck -n" and "mount -o ro"
the command would be:

mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 \
--chunk=64 --size=976759936 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdd1

However, since sdd1 is not really in sync (its event count is a bit
lower), I could also just try

mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 \
--chunk=64 --size=976759936 /dev/sdb1 /dev/sdc1 /dev/sde1 missing

However, I'm not completely sure about sdb1 since it no longer has a
superblock, so I could also try

mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 \
--chunk=64 --size=976759936 missing /dev/sdc1 /dev/sde1 /dev/sdd1

Would you use the 4 disks as in the first command, or do you think that
the 25-event difference is a big problem?
If it works, what is the best way to test that everything is OK?
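
One way to organise the read-only checks proposed in step 11 is sketched below. It only prints the commands (a dry run) unless DRY_RUN=0 is set. The volume group and logical volume names (vg0, data) and the mount point are hypothetical, since they are not given in this message, and the noload mount option is an addition that assumes an ext3/ext4 filesystem.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Read-only inspection of a (re)assembled array with LVM on top.
# $1 = md device, $2 = logical volume path (hypothetical), $3 = mount point
check_readonly() {
    run mdadm --readonly "$1"             # mark the array read-only
    run vgchange -ay                      # activate the volume groups on it
    run fsck -n "$2"                      # -n: report problems, change nothing
    run mount -o ro,noload "$2" "$3"      # noload: skip ext3/ext4 journal replay
}

check_readonly /dev/md0 /dev/vg0/data /mnt/check
```

Printing the plan first makes it easy to review every step before anything touches the disks.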

Thanks a lot for your help.

------------------------------- /etc/mdadm/mdadm.conf -------------------------------
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE /dev/sd*1 /dev/sdf1 /dev/sdc1 /dev/sdd1

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md0 level=raid5 num-devices=3 UUID=760291c6:73cd6884:c91d1289:ceb97d9c

# This file was auto-generated on Wed, 04 Mar 2009 17:10:18 +0100
# by mkconf $Id$

--------------------------------- mdadm --examine /dev/sd[bcde]1 ---------------------------------
#mdadm --examine /dev/sd[bcde]1

mdadm: No md superblock detected on /dev/sdb1.

/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : 760291c6:73cd6884:c91d1289:ceb97d9c (local to host backup)
Creation Time : Wed Mar 4 17:13:19 2009
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Update Time : Thu Apr 11 03:03:18 2013
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 2
Spare Devices : 0
Checksum : 2329d0 - correct
Events : 112358

Layout : left-symmetric
Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8       17        0      active sync
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       0        0        2      active sync
   3     3       0        0        3      faulty removed

/dev/sdd1:
Magic : a92b4efc
Version : 00.90.00
UUID : 760291c6:73cd6884:c91d1289:ceb97d9c (local to host backup)
Creation Time : Wed Mar 4 17:13:19 2009
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Update Time : Wed Apr 10 23:52:35 2013
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 21461c - correct
Events : 112333

Layout : left-symmetric
Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       49        3      active sync   /dev/sdd1

   0     0       8       17        0      active sync
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       49        3      active sync   /dev/sdd1

/dev/sde1:
Magic : a92b4efc
Version : 00.90.00
UUID : 760291c6:73cd6884:c91d1289:ceb97d9c (local to host backup)
Creation Time : Wed Mar 4 17:13:19 2009
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 2930279808 (2794.53 GiB 3000.61 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0

Update Time : Wed Apr 10 23:52:35 2013
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 214643 - correct
Events : 112358

Layout : left-symmetric
Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       65        2      active sync   /dev/sde1

   0     0       8       17        0      active sync
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       65        2      active sync   /dev/sde1
   3     3       8       49        3      active sync   /dev/sdd1

--------------------------------- fdisk -lu /dev/sd[bcde] ---------------------------------
Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00025fce

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63  1953520064   976760001   fd  Linux raid autodetect

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000a177b

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1              63  1953520064   976760001   fd  Linux raid autodetect

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
16 heads, 29 sectors/track, 4210183 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x9ab6c5c9

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048  1953522049   976760001   fd  Linux raid autodetect

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000291d5

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1              63  1953520064   976760001   fd  Linux raid autodetect



-- 
Pierre MARTINEAU

Institut de Recherche en Cancérologie de Montpellier
Inserm U896 – Université Montpellier 1 – CRLC Val d’Aurelle
Campus Val d’Aurelle
208 Rue des Apothicaires
F-34298 Montpellier Cedex 5, France

Tel: +33 (0)4 67 61 37 43
Fax: +33 (0)4 67 61 37 87
E-mail: pierre.martineau@inserm.fr
E-mail: pierre.martineau@montpellier.unicancer.fr
Site internet: http://www.ircm.fr


[-- Attachment #2: pierre_martineau.vcf --]
[-- Type: text/x-vcard, Size: 329 bytes --]

begin:vcard
fn:Pierre MARTINEAU
n:MARTINEAU;Pierre
org:INSERM U896;IRCM
adr:208 rue des Apothicaires;;CRLC Val d'Aurelle-Paul Lamarque;Montpellier;;34298;France
email;internet:pierre.martineau@inserm.fr
tel;work:+33 (0)4 67 61 37 43
tel;fax:+33 (0)4 67 61 37 87
x-mozilla-html:FALSE
url:http://www.ircm.fr
version:2.1
end:vcard



* Re: RAID5 recovering
  2013-04-15 13:47 RAID5 recovering Pierre Martineau
@ 2013-04-15 15:19 ` Robin Hill
  2013-04-15 15:49   ` Oliver Schinagl
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Robin Hill @ 2013-04-15 15:19 UTC (permalink / raw)
  To: Pierre Martineau; +Cc: linux-raid


On Mon Apr 15, 2013 at 03:47:39PM +0200, Pierre Martineau wrote:

> Dear Raid experts,
> 
> I have a RAID5 volume that recently crashed and I need your advice
> before taking any irreversible action.
> 
> Let me first summarize the past and current state.
> 
> 1) I had a nicely running RAID5 volume with 3 x 1 TB disks (LVM on top,
> with several logical volumes in ext3 and ext4), but the volume had become
> a bit too small and I decided to add a new 1 TB disk.
> 
Given the rebuild time for a 1 TB disk, I'd be wary of running RAID5 - if
you have the space, adding another disk and going to RAID6 will be much
safer.

> 11) I propose to re-create the array with the --assume-clean option, 
> then check everything using "fsck -n" and "mount -o ro"
> the command would be:
> 
> mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 \
> --chunk=64 --size=976759936 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdd1
> 
<-- snip -->

Have you tried to force assemble the array first? Recreating the array
is a risky option, so should be avoided if possible. First try doing:
  mdadm -Af /dev/md0 /dev/sd[cde]1

If that works then you'll need to re-add (and rebuild) /dev/sdb1. If it
doesn't work, try rerunning (after making sure the array is stopped) and
adding "-vvv" for extra verbosity, then send through the output from
that and anything relevant from dmesg.
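
After a forced assembly it is worth confirming how many members are actually active before re-adding anything. The following is a minimal sketch, assuming the usual /proc/mdstat layout in which the status line after the array header carries a [total/active] pair. The sample text is illustrative and not taken from this thread, and the function name md_members is hypothetical.

```shell
#!/bin/sh
# Print "total active" member counts for one array from mdstat-style text.
# $1 = array name (e.g. md0), $2 = file to read (default /proc/mdstat)
md_members() {
    awk -v md="$1" '
        $1 == md && $2 == ":" { found = 1; next }   # array header line
        found && /^[ \t]*$/   { exit }              # end of this array block
        found && match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            print n[1], n[2]
            exit
        }
    ' "${2:-/proc/mdstat}"
}

# Illustrative sample: a 4-member RAID5 running on 3 members.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[1] sdd1[3] sde1[2]
      2930279808 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]

unused devices: <none>
EOF

set -- $(md_members md0 "$sample")
if [ "$2" -lt "$1" ]; then
    echo "md0 is degraded: $2 of $1 members active"
    # The missing member would then be added back and rebuilt, e.g.:
    #   mdadm /dev/md0 --add /dev/sdb1
fi
rm -f "$sample"
```

Called without a second argument, md_members reads the live /proc/mdstat instead of a saved copy.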

HTH,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |



* Re: RAID5 recovering
  2013-04-15 15:19 ` Robin Hill
@ 2013-04-15 15:49   ` Oliver Schinagl
  2013-04-15 15:58   ` Pierre Martineau
  2013-04-16  8:30   ` Roman Mamedov
  2 siblings, 0 replies; 6+ messages in thread
From: Oliver Schinagl @ 2013-04-15 15:49 UTC (permalink / raw)
  To: Pierre Martineau, linux-raid

On 15-04-13 17:19, Robin Hill wrote:
> On Mon Apr 15, 2013 at 03:47:39PM +0200, Pierre Martineau wrote:
>
> Given the rebuild time for a 1 TB disk, I'd be wary of running RAID5 - if
> you have the space, adding another disk and going to RAID6 will be much
> safer.
+1
RAID5 is great, it really is, but RAID6 is so much better.
>> 11) I propose to re-create the array with the --assume-clean option,
>> then check everything using "fsck -n" and "mount -o ro"
>> the command would be:
>>
>> mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --raid-devices=4 \
>> --chunk=64 --size=976759936 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdd1
>>
> <-- snip -->
>
> Have you tried to force assemble the array first? Recreating the array
> is a risky option, so should be avoided if possible. First try doing:
>    mdadm -Af /dev/md0 /dev/sd[cde]1
I don't know if this would have been the best first course of action.
You forcibly started the array with a member whose event count was out
of date. You got lucky this time and only had minor corruption; it could
have been much, much worse.

You could have examined the superblock first with
  hexdump -C /dev/sdb1 | less

to see whether it is actually all zeros, or whether only some fields are,
in which case it could hopefully be recreated by examining the other disks.

I personally would have trusted the recreation method more. Dump all
superblocks first (as a backup, with dd, so you can always write them
back), recreate the array using sd[bce]1 (sdd1 wasn't fully in sync) and
run fsck -n (a read-only test). If that is okay, mount read-only (I would
even mark the array as read-only). If all that works, you have a correct
3/4 array and can re-add sdd1.

If you dump the superblock via dd (some hexdumping juju should give you
the start of the ext/LVM data, and everything up to that point should be
dumped, about 4 MiB I guess), you should have a perfectly acceptable way
to get your superblocks back into their original state (if needed).
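
One point worth noting when backing up superblocks: with the 0.90 metadata used by this array, the superblock is stored near the end of each member rather than at the start, so a dump of the first few MiB would not contain it. Below is a minimal sketch of the offset calculation, assuming the standard 0.90 layout (a 64 KiB reserved block, aligned to 64 KiB). The function name sb090_offset and the backup path are illustrative; the partition numbers come from the fdisk listing earlier in the thread.

```shell
#!/bin/sh
# Offset (in 512-byte sectors) of a 0.90 md superblock on a member of the
# given size: round the size down to a 64 KiB boundary (128 sectors),
# then step back one more 64 KiB block.
sb090_offset() {   # $1 = member size in 512-byte sectors
    echo $(( $1 / 128 * 128 - 128 ))
}

# Member size from the fdisk listing for /dev/sdb1:
# end - start + 1 = 1953520064 - 63 + 1 sectors.
size=$(( 1953520064 - 63 + 1 ))
off=$(sb090_offset "$size")
echo "superblock at sector $off ($(( off / 2 )) KiB into the partition)"

# A backup of that 64 KiB region could then be taken with something like
# (not executed here; the output path is hypothetical):
#   dd if=/dev/sdb1 of=/root/sdb1.sb bs=512 skip=$off count=128
```

The result in KiB equals the "Used Dev Size" of 976759936 reported by mdadm --examine, which is a useful cross-check on the calculation.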

Also, I recall having read on this list that RAID5 disk 'order' didn't
matter? Apparently it only mattered with RAID6.

Anyway, you got it all back, so lucky you :)



* Re: RAID5 recovering
  2013-04-15 15:19 ` Robin Hill
  2013-04-15 15:49   ` Oliver Schinagl
@ 2013-04-15 15:58   ` Pierre Martineau
  2013-04-16  8:30   ` Roman Mamedov
  2 siblings, 0 replies; 6+ messages in thread
From: Pierre Martineau @ 2013-04-15 15:58 UTC (permalink / raw)
  To: linux-raid


Thanks a lot!
The array seems to start with only minor problems

mdadm: forcing event count in /dev/sdd1(3) from 112333 upto 112358
mdadm: clearing FAULTY flag for device 1 in /dev/md0 for /dev/sdd1
mdadm: /dev/md0 has been started with 3 drives (out of 4).

File systems are corrupted but not too seriously.
I will look into RAID6 in the future.

Thanks again,
Pierre




* Re: RAID5 recovering
  2013-04-15 15:19 ` Robin Hill
  2013-04-15 15:49   ` Oliver Schinagl
  2013-04-15 15:58   ` Pierre Martineau
@ 2013-04-16  8:30   ` Roman Mamedov
  2013-04-16 16:41     ` Roy Sigurd Karlsbakk
  2 siblings, 1 reply; 6+ messages in thread
From: Roman Mamedov @ 2013-04-16  8:30 UTC (permalink / raw)
  To: Robin Hill; +Cc: Pierre Martineau, linux-raid


On Mon, 15 Apr 2013 16:19:39 +0100
Robin Hill <robin@robinhill.me.uk> wrote:

> Given the rebuild time for a 1 TB disk, I'd be wary of running RAID5 - if
> you have the space, adding another disk and going to RAID6 will be much
> safer.

As I see it, 3 disks is about the only configuration where RAID5 still does
make sense.

4 disks is a tricky spot, RAID5 already feels a bit too dangerous, but RAID6
is still not space-efficient enough.

5-disk (and more) RAID6 is the way to go, but changing from 3 disks to 5 or
more is not always justified requirements/cost/space-wise.
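
The space trade-off described above can be put into numbers. The following is a minimal sketch, assuming equal-sized disks, one disk's worth of parity for RAID5 and two for RAID6; the function name usable_pct is illustrative, and the minimum disk counts are the conventional ones.

```shell
#!/bin/sh
# Usable share of raw capacity for n equal-sized disks.
# RAID5 spends one disk's worth of space on parity, RAID6 two.
usable_pct() {   # $1 = RAID level (5 or 6), $2 = number of disks
    case $1 in
        5) parity=1; min=3 ;;
        6) parity=2; min=4 ;;
        *) echo "unsupported level: $1" >&2; return 1 ;;
    esac
    if [ "$2" -lt "$min" ]; then
        echo "n/a"
    else
        echo "$(( ($2 - parity) * 100 / $2 ))%"   # integer percent, rounded down
    fi
}

for n in 3 4 5; do
    echo "$n disks: RAID5 $(usable_pct 5 "$n")  RAID6 $(usable_pct 6 "$n")"
done
```

At 4 disks RAID6 leaves half the raw capacity usable, against three quarters for RAID5, which is the "tricky spot" in question.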

-- 
With respect,
Roman



* Re: RAID5 recovering
  2013-04-16  8:30   ` Roman Mamedov
@ 2013-04-16 16:41     ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 6+ messages in thread
From: Roy Sigurd Karlsbakk @ 2013-04-16 16:41 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Pierre Martineau, linux-raid, Robin Hill

> > Given the rebuild time for a 1 TB disk, I'd be wary of running RAID5
> > - if you have the space, adding another disk and going to RAID6 will
> > be much safer.
> 
> As I see it, 3 disks is about the only configuration where RAID5 still
> does make sense.
> 
> 4 disks is a tricky spot, RAID5 already feels a bit too dangerous, but
> RAID6 is still not space-efficient enough.

Given that all data demands grow, the space-efficiency concern is temporary, so I wouldn't worry too much about it. I helped a friend set up a RAID-5, and after her first drive failure occurred I advised her to buy a new drive in addition to the RMAed one. She now has 4 drives in RAID-6, half a year after the initial failure, and it is rock solid. If she gets a double drive failure, well, the chances are good that her data survives it.

So yes, use RAID-6 if you can. Perhaps even with a spare if you've got another drive and you don't need the extra space.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy it is essential that the curriculum is presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms with xenotypic etymology. In most cases, adequate and relevant synonyms exist in Norwegian.
--

