From: Phil Turmel
Subject: Re: md_raid5 recovering failed need help
Date: Sat, 27 Dec 2014 10:24:22 -0500
Message-ID: <549ECF26.2040104@turmel.org>
Sender: linux-raid-owner@vger.kernel.org
To: Stephan Hafiz, linux-raid
List-Id: linux-raid.ids

Good morning David, { or Stephan ? }

On 12/25/2014 09:24 AM, Stephan Hafiz wrote:
> Hi! I’m from germany and my raid and me needs help.
> My english isn’t very good, but i think it’s sufficient. And i think, this mailinglist is my last hope ☺

This is the right place for problems with linux raid arrays.

> So on, …. Here ist my problem.
> The raid5 has lost 2 of 5 disks. First one disk and then the second one.

Ok.  Not uncommon.

[trim /]

> !SMART Status
> for i in a b c d e f; do echo Device sd$i; smartctl -H /dev/sd$i | egrep overall; echo; done;
> Device sda
> SMART overall-health self-assessment test result: PASSED
>
> Device sdb
> SMART overall-health self-assessment test result: PASSED
>
> Device sdc
> SMART overall-health self-assessment test result: PASSED
>
> Device sdd
> SMART overall-health self-assessment test result: PASSED
>
> Device sde
> SMART overall-health self-assessment test result: PASSED
>
> Device sdf
> SMART overall-health self-assessment test result: PASSED

It is extremely common to have an overall result of "PASSED" when you
aren't safe at all.  Please redo this without trimming, like so:

    for x in /dev/sd[b-f] ; do echo $x ; smartctl -x $x ; done

Paste the result at the end of your next mail--no need to attach nor need
for pastebin services.

Also, if you still have any syslogs from the time of the failure, it
would be good to see the kernel messages that triggered the drive
ejections from the raid.

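If it helps, here is a minimal sketch of gathering both pieces (the full
SMART reports and the relevant kernel messages) into files you can paste
from.  The function names and the SMARTCTL variable are made up for
illustration, and the grep pattern is only a guess at what is relevant,
so widen it if it drops lines you think matter.

```shell
#!/bin/sh
# Sketch only: SMARTCTL and both function names are illustrative.
SMARTCTL="${SMARTCTL:-smartctl}"

# Print the full (untrimmed) SMART report for each device given.
smart_report() {
    for dev in "$@"; do
        echo "=== $dev ==="
        "$SMARTCTL" -x "$dev"
        echo
    done
}

# Keep only kernel log lines that mention md, ATA ports or sd devices.
# The pattern is an assumption about what is relevant, not a complete filter.
raid_kernel_lines() {
    grep -E 'md/raid|md[0-9]+|ata[0-9]+|sd[a-z]'
}

# Usage (as root):
#   smart_report /dev/sd[b-f] > smart-report.txt
#   dmesg | raid_kernel_lines > kernel-raid.txt
```

dmesg only covers the current boot, so for messages from the time of the
failure you would feed the saved syslog file through raid_kernel_lines
instead.  Where that file lives depends on your distro.
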
> !mdadm version
> mdadm - v3.2.5 - 18th May 2012
> I have read about recent versions 3.3.x @ raid.wiki.kernel.org, i haven’t tested this version.

It may be necessary.  You haven't reported your distro nor your kernel
version.

> !superblock informations
> Only the Events from sdb1 are off

[trim /]

Very good report!  You've saved all the superblocks and you haven't
tried to do any --create operations.

[trim /]

> !reassemble force
> mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 --force
> mdadm: ignoring /dev/sdd1 as it reports /dev/sdc1 as failed
> mdadm: ignoring /dev/sde1 as it reports /dev/sdc1 as failed
> mdadm: ignoring /dev/sdf1 as it reports /dev/sdc1 as failed
> mdadm: /dev/md0 assembled from 1 drive - not enough to start the array.

This should have worked.  Hmmm.

> i hope i don’t get the award „paint onself in to the corner“ ……

Probably not. :-)

The simplest way forward would probably be to boot a rescue CD (I
generally use the one from sysrescuecd.org) that has a recent kernel and
mdadm combination.  Such CDs will probably attempt to assemble your
array during boot to /dev/md127 instead of /dev/md0, but it will fail.

So, within the rescue environment, do:

    mdadm --stop /dev/md127    {or whatever shows in /proc/mdstat}
    mdadm --assemble --force --verbose /dev/md0 /dev/sd[b-f]1

If that doesn't work, show us the verbose output, along with the
matching part of the dmesg.  If it does work, just do a clean shutdown
and reboot back into your regular OS.

> merry christmas … David

And Merry Christmas to you!

When you are done celebrating the revival of your array, you will need
to find out why it broke in the first place.  The most common cause seen
on this list is the use of consumer-grade drives without dealing with
the timeout mismatch problem.
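
A rough sketch of how the two settings involved are usually inspected,
assuming the usual description of the mismatch: the drive's own error
recovery time (SCT ERC, where the drive supports it) versus the kernel's
per-device command timeout in sysfs.  The function names and the
SYSFS_ROOT and SMARTCTL variables are invented for illustration.

```shell
#!/bin/sh
# Sketch only: SYSFS_ROOT, SMARTCTL and the function names are illustrative
# and exist so the script can be dry-run without touching real devices.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
SMARTCTL="${SMARTCTL:-smartctl}"

# Print the kernel's command timeout (in seconds) for one disk, e.g. "sdb".
driver_timeout() {
    cat "$SYSFS_ROOT/block/$1/device/timeout"
}

# Print the drive's SCT Error Recovery Control setting, if it reports one.
drive_erc() {
    "$SMARTCTL" -l scterc "/dev/$1"
}

# Show both values side by side for every disk named on the command line.
check_timeouts() {
    for disk in "$@"; do
        echo "== $disk =="
        echo "kernel command timeout: $(driver_timeout "$disk") s"
        drive_erc "$disk"
    done
}

# Usage (as root): check_timeouts sdb sdc sdd sde sdf
```

If a drive reports error recovery as disabled or unsupported while the
kernel timeout is still at its default, that is the combination to look
into.  The remedies commonly suggested are enabling a short recovery time
on the drive (smartctl -l scterc,70,70 for 7 seconds) or raising the
kernel timeout, but check them against your own drives before applying
either.
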
You might want to review this old thread:

http://marc.info/?l=linux-raid&m=135811522817345&w=1

Phil