From: "Hákon Gíslason" <hakon.gislason@gmail.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Failed drive while converting raid5 to raid6, then a hard reboot
Date: Tue, 8 May 2012 23:03:29 +0000
Message-ID: <CAFTzWnq8z5VYbisFRa-rLkNBn--_ee6-bG4x2CB5NruBPXxAWQ@mail.gmail.com>
In-Reply-To: <CAFTzWnrGf5Fs4wRekjnOHfutANArJiT8j-Y6Mbd+nxPXXXrnSA@mail.gmail.com>
Forgot this: http://pastebin.ubuntu.com/976915/
--
Hákon G.
On 8 May 2012 22:19, Hákon Gíslason <hakon.gislason@gmail.com> wrote:
> Thank you for the reply, Neil
> I was using mdadm from the package manager in Debian stable first
> (v3.1.4), but after the constant drive failures I upgraded to the
> latest one (3.2.3).
> I've come to the conclusion that the drives are either failing because
> they are "green" drives, and might have power-saving features that are
> causing them to be "disconnected", or that the cables that came with
> the motherboard aren't good enough. I'm not 100% sure about either,
> but at the moment these seem the likely causes. It could also be
> incompatible hardware or the kernel that I'm using (proxmox debian
> kernel: 2.6.32-11-pve).
>
> I got the array assembled (thank you), but what about the raid5 to
> raid6 conversion? Do I have to complete it for this to work, or will
> mdadm know what to do? Can I cancel (revert) the conversion and get
> the array back to raid5?
>
> /proc/mdstat contains:
>
> root@axiom:~# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active (read-only) raid6 sdc[6] sdb[5] sda[4] sdd[7]
> 5860540224 blocks super 1.2 level 6, 32k chunk, algorithm 18 [5/3] [_UUU_]
>
> unused devices: <none>
>
> If I try to mount the volume group on the array the kernel panics, and
> the system hangs. Is that related to the incomplete conversion?
>
> Thanks,
> --
> Hákon G.
>
>
>
> On 8 May 2012 20:48, NeilBrown <neilb@suse.de> wrote:
>>
>> On Mon, 30 Apr 2012 13:59:56 +0000 Hákon Gíslason
>> <hakon.gislason@gmail.com> wrote:
>>
>> > Hello,
>> > I've been having frequent drive "failures", as in, they are reported
>> > failed/bad and mdadm sends me an email telling me things went wrong,
>> > etc... but after a reboot or two, they are perfectly fine again. I'm
>> > not sure what it is, but this server is quite new and I think there
>> > might be more behind it, bad memory or the motherboard (I've been
>> > having other issues as well). I've had 4 drive "failures" in this
>> > month, all different drives except for one, which "failed" twice, and
>> > all have been fixed with a reboot or rebuild (all drives reported bad
>> > by mdadm passed an extensive SMART test).
>> > Due to this, I decided to convert my raid5 array to a raid6 array
>> > while I find the root cause of the problem.
>> >
>> > I started the conversion right after a drive failure & rebuild, but when
>> > it had converted/reshaped approx. 4% (if I remember correctly; it
>> > was going really slowly, ~7500 minutes to completion), it reported
>> > another drive bad, and the conversion to raid6 stopped (it said
>> > "rebuilding", but the speed was 0K/sec and the time left was a few
>> > million minutes).
>> > After that happened, I tried to stop the array and reboot the server,
>> > as I had done previously to get the reportedly "bad" drive working
>> > again, but it wouldn't stop the array or reboot, and I couldn't
>> > unmount it either; it just hung whenever I tried to do something with
>> > /dev/md0. After trying to reboot a few times, I just killed the power
>> > and re-started it. Admittedly this was probably not the best thing I
>> > could have done at that point.
>> >
>> > I have a backup of ca. 80% of the data on there; it's been a month since
>> > the last complete backup (because I ran out of backup disk space).
>> >
>> > So, the big question, can the array be activated, and can it complete
>> > the conversion to raid6? And will I get my data back?
>> > I hope the data can be rescued, and any help I can get would be much
>> > appreciated!
>> >
>> > I'm fairly new to raid in general, and have been using mdadm for about
>> > a month now.
>> > Here's some data:
>> >
>> > root@axiom:~# mdadm --examine --scan
>> > ARRAY /dev/md/0 metadata=1.2 UUID=cfedbfc1:feaee982:4e92ccf4:45e08ed1 name=axiom.is:0
>> >
>> >
>> > root@axiom:~# cat /proc/mdstat
>> > Personalities : [raid6] [raid5] [raid4]
>> > md0 : inactive sdc[6] sde[7] sdb[5] sda[4]
>> > 7814054240 blocks super 1.2
>> >
>> > root@axiom:~# mdadm --assemble --scan --force --run /dev/md0
>> > mdadm: /dev/md0 is already in use.
>> >
>> > root@axiom:~# mdadm --stop /dev/md0
>> > mdadm: stopped /dev/md0
>> >
>> > root@axiom:~# mdadm --assemble --scan --force --run /dev/md0
>> > mdadm: Failed to restore critical section for reshape, sorry.
>> > Possibly you needed to specify the --backup-file
>> >
>> > root@axiom:~# mdadm --assemble --scan --force --run /dev/md0 --backup-file=/root/mdadm-backup-file
>> > mdadm: Failed to restore critical section for reshape, sorry.
>>
>> What version of mdadm are you using?
>>
>> I suggest getting a newer one (I'm about to release 3.2.4, but 3.2.3 should
>> be fine) and if just that doesn't help, add the "--invalid-backup" option.
>>
>> However I very strongly suggest you try to resolve the problem which is
>> causing your drives to fail. Until you resolve that it will keep happening,
>> and having it happen repeatedly during the (slow) reshape process would not
>> be good.
>>
>> Maybe plug the drives into another computer, or another controller, while
>> the reshape runs?
>>
>> NeilBrown
>>
>>
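
The "[5/3] [_UUU_]" field in the /proc/mdstat output quoted above can be read mechanically: the first bracket gives the number of member slots the array expects and how many are currently active, and the second marks each slot as up (U) or missing (_). Below is a minimal sketch in Python, assuming that field layout; the function name parse_md_status and the returned dictionary keys are made up for illustration and are not part of mdadm or the kernel.

```python
import re

def parse_md_status(line):
    """Extract the '[expected/active] [U_...]' field from a /proc/mdstat line.

    Hypothetical helper for illustration only; not an mdadm or kernel API.
    """
    match = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
    if match is None:
        raise ValueError("no member-status field found")
    flags = match.group(3)
    return {
        "expected": int(match.group(1)),   # slots the array should have
        "active": int(match.group(2)),     # slots currently in sync
        # zero-based positions of slots shown as '_' (missing or not in sync)
        "missing_slots": [i for i, flag in enumerate(flags) if flag == "_"],
    }

# Status line copied from the /proc/mdstat output quoted in this message.
line = "5860540224 blocks super 1.2 level 6, 32k chunk, algorithm 18 [5/3] [_UUU_]"
status = parse_md_status(line)
print(status)
```

For the line above this reports two missing slots out of five. Since raid6 tolerates at most two missing members, an array in that state has no redundancy left until the missing slots are rebuilt.
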
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html