Thanks a lot!
The array seems to start with only minor problems

mdadm: forcing event count in /dev/sdd1(3) from 112333 upto 112358
mdadm: clearing FAULTY flag for device 1 in /dev/md0 for /dev/sdd1
mdadm: /dev/md0 has been started with 3 drives (out of 4).

File systems are corrupted but not too seriously.
I will have a look for RAID6 in the future.

Thanks again,
Pierre

Pierre MARTINEAU

Institut de Recherche en Cancérologie de Montpellier
Inserm U896 – Université Montpellier 1 – CRLC Val d’Aurelle
Campus Val d’Aurelle
208 Rue des Apothicaires
F-34298 Montpellier Cedex 5, France

Tel: +33 (0)4 67 61 37 43
Fax: +33 (0)4 67 61 37 87
E-mail: pierre.martineau@inserm.fr
E-mail: pierre.martineau@montpellier.unicancer.fr
Site internet: http://www.ircm.fr

Le 15/04/2013 17:19, Robin Hill a écrit :
> On Mon Apr 15, 2013 at 03:47:39PM +0200, Pierre Martineau wrote:
>
>> Dear Raid experts,
>>
>> I have a Raid5 volume that recently crashed and I need you advices
>> before doing some irreversible action.
>>
>> Let me first summarize the past and current state.
>>
>> 1) I had a nicely running RAID5 volume with 3 x 1 To disks (LVM on top
>> and several LVM volumes in ext3 and axt4) but volume was now a bit too
>> small and I decided to add a new 1 To disk.
>>
> Given the rebuild time for a 1To disk, I'd be wary of running RAID5 - if
> you have the space, adding another disk and going to RAID6 will be much
> safer.
>
>> 2) I added a new disk and did not do anything for a couple of days (Raid
>> still running with 3 disks)
>>
>> 3) One of the old disk failed and was ejected from the RAID.
>>
>> 4) The ejected disk was not even present as /dev/sdX. I thus tested the
>> connections and the disk came back.
>>
>> 5) I resync the ejected disk and I was back with my original 3 disk array.
>>
>> 6) I waited 2-3 days and everything was fine. I then added the new disk
>> and resync.
>>
>> 7) I had now a running 4 disk RAID5 array, I created a new volume and
>> started copying on it.
>>
>> 8) During the week-end, 2 disks were ejected from the array, the new
>> installed one and the same than previously (step 3)
>>
>> 9) Again the 2 disks were not present in /dev/sdX. I thus checked again
>> the connections and the problem was a molex connector. The two ejected
>> disks were on the same molex and this explains why both were detected as
>> faulty.
>>
>> Now, my list of errors as a newbie.
>>
>> 4) I did not save all the informations before proceeding (mdadm
>> --examine, /etc/mdadm/mdadm.conf, syslog, ...)
>>
>> 5) I tried to assemble the disks with
>> mdadm --assemble --scan
>> with no result
>>
>> 6) I thus tried and this is my big error I think !!!
>> mdadm --assemble /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
>>
>> I forgot in this command /dev/md0 after assemble.
>> Because of this /dev/sdb1 suberblock was removed and now mdadm--examine
>> /dev/sdb1 returns "No md superblock detected on /dev/sdb1"
>>
>> I would like now to be more cautious. If some nice expert from the list
>> would be nice enough to tell me if the proposed method described below
>> is the right approach I will be grateful for the rest of my life :-)
>>
>> 7) I read the RAID wiki and the list.
>>
>> 8) I saved
>> mdadm --examine /dev/sd[bcde]1
>> dmesg
>> syslog
>> /etc/mdadm/mdadm.conf
>> fdisk -lu /dev/sd[bcde]
>>
>> I put the content of this files at the end of this message (except dmesg
>> and syslog because they are very long).
>>
>> 9) /dev/sdd is the new disk. This is clear in the fdisk listing since it
>> is a 4K sector disk.
>> The normal order of the raid is thus (see mdadm --examine /dev/sd[de]1)
>> sdb1 sdc1 sde1 sdd1
>>
>> 10) Events are
>> /dev/sdb1: no md superblock (see 6)
>> /dev/sdc1: Events : 112358
>> /dev/sdd1: Events : 112333
>> /dev/sde1: Events : 112358
>>
>> It seems that sdd was the first disk removed.
>> Presumably sdb1 is in sync since it was running with sdc1 when the sdd1
>> and sde1 were ejected from the array (see 8) but I can't be sure since I
>> stupidly erased its superblock!
>>
>> 11) I propose to re-create the array with the --assume-clean option,
>> then check everything using "fsck -n" and "mount -o ro"
>> the command would be:
>>
>> mdadm --create /dev/md0 -e 0.90 --assume-clean --level=5 --n=4 \
>> --chunk=64 --size=976759936 /dev/sdb1 /dev/sdc1 /dev/sde1 /dev/sdd1
>>
> <-- snip -->
>
> Have you tried to force assemble the array first? Recreating the array
> is a risky option, so should be avoided if possible. First try doing:
>    mdadm -Af /dev/md0 /dev/sd[cde]1
>
> If that works then you'll need to re-add (and rebuild) /dev/sdb1. If it
> doesn't work, try rerunning (after making sure the array is stopped) and
> adding "-vvv" for extra verbosity, then send through the output from
> that and anything relevant from dmesg.
>
> HTH,
>      Robin