From: Paul Boven <boven@jive.nl>
To: Phil Turmel <philip@turmel.org>, linux-raid@vger.kernel.org
Subject: Re: Raid 5: all devices marked spare, cannot assemble
Date: Thu, 12 Mar 2015 15:28:52 +0100
Message-ID: <5501A2A4.7060900@jive.nl>
In-Reply-To: <55019940.4030104@turmel.org>
Hi Phil,
Good morning and thanks for your quick reply.
On 03/12/2015 02:48 PM, Phil Turmel wrote:
>> I have a rather curious issue with one of our storage machines. The
>> machine has 36x 4TB disks (SuperMicro 847 chassis) which are divided
>> over 4 dual SAS-HBAs and the on-board SAS. These disks are in RAID5
>> configurations, 6 raids of 6 disks each. Recently the machine ran out of
>> memory (it has 32GB, and no swapspace as it boots from SATA-DOM) and the
>> last entries in the syslog are from the OOM-killer. The machine is
>> running Ubuntu 14.04.02 LTS, mdadm 3.2.5-5ubuntu4.1.
>
> {BTW, I think raid5 is *insane* for this size array.}
It's 6 raid5s, not a single big one. This is only a temporary holding
space for data to be processed. In its original incarnation the machine
had 36 distinct file-systems that we would read from in a software
stripe, just to get enough IO performance. So this is a trade-off
between IO-speed and lost capacity versus convenience when a drive
inevitably fails.
I guess you would recommend raid6? I would have liked a global hot
spare, maybe 7 arrays of 5 disks, but then we lose 8 disks in total
instead of the current 6.
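
The capacity comparison above can be checked with a few lines of arithmetic. This is only a sketch: the helper name `usable_disks` is hypothetical, and the RAID6 row is added purely for comparison, assuming the same 6x6 layout with two parity disks per array.

```python
def usable_disks(arrays, disks_per_array, parity_per_array, hot_spares=0):
    """Return how many disks' worth of capacity remain for data."""
    total = arrays * disks_per_array + hot_spares
    overhead = arrays * parity_per_array + hot_spares
    return total - overhead


# Current layout: 6 RAID5 arrays of 6 disks each, no spare.
current = usable_disks(6, 6, 1)
# Alternative from the text: 7 RAID5 arrays of 5 disks plus 1 global hot spare.
alternative = usable_disks(7, 5, 1, hot_spares=1)
# Assumption for comparison only: the same 6x6 layout as RAID6.
raid6 = usable_disks(6, 6, 2)

print(current, alternative, raid6)
```

All three layouts use 36 disks; they differ only in how many of those end up holding parity or sitting idle as a spare.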
> Wrong syntax. It's already assembled. Just try "mdadm --run /dev/md15"
Trying to 'run' md15 gives me the same errors as before:
md/raid:md15: not clean -- starting background reconstruction
md/raid:md15: device sdad1 operational as raid disk 0
md/raid:md15: device sdy1 operational as raid disk 3
md/raid:md15: device sdv1 operational as raid disk 4
md/raid:md15: device sdm1 operational as raid disk 2
md/raid:md15: device sdq1 operational as raid disk 1
md/raid:md15: allocated 0kB
md/raid:md15: cannot start dirty degraded array.
RAID conf printout:
--- level:5 rd:6 wd:5
disk 0, o:1, dev:sdad1
disk 1, o:1, dev:sdq1
disk 2, o:1, dev:sdm1
disk 3, o:1, dev:sdy1
disk 4, o:1, dev:sdv1
md/raid:md15: failed to run raid set.
md: pers->run() failed ...
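
In the printout above, `rd:6 wd:5` says the array expects six raid disks but only five are working, and the array is also marked not clean, which is why the kernel refuses to start it. A minimal sketch for pulling the missing slot out of such a printout follows. The function name `missing_slots` is hypothetical, and the regular expressions assume nothing beyond the line shapes visible in this log.

```python
import re

# Copied from the kernel log above.
PRINTOUT = """\
--- level:5 rd:6 wd:5
disk 0, o:1, dev:sdad1
disk 1, o:1, dev:sdq1
disk 2, o:1, dev:sdm1
disk 3, o:1, dev:sdy1
disk 4, o:1, dev:sdv1
"""


def missing_slots(printout):
    """Return the raid slots that have no operational device listed."""
    header = re.search(r"rd:(\d+) wd:(\d+)", printout)
    if header is None:
        raise ValueError("no 'rd:N wd:M' header found")
    raid_disks = int(header.group(1))
    # Collect every slot that appears with o:1 (operational).
    present = {int(m.group(1))
               for m in re.finditer(r"disk (\d+), o:1, dev:\S+", printout)}
    return [slot for slot in range(raid_disks) if slot not in present]


print(missing_slots(PRINTOUT))
```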
> If the simple --run doesn't work, stop the array and force assemble the
> good drives:
>
> mdadm --stop /dev/md15
> mdadm --assemble --force --verbose /dev/md15 /dev/sd{ad,q,m,y,v}1
That worked!
mdadm: looking for devices for /dev/md15
mdadm: /dev/sdad1 is identified as a member of /dev/md15, slot 0.
mdadm: /dev/sdq1 is identified as a member of /dev/md15, slot 1.
mdadm: /dev/sdm1 is identified as a member of /dev/md15, slot 2.
mdadm: /dev/sdy1 is identified as a member of /dev/md15, slot 3.
mdadm: /dev/sdv1 is identified as a member of /dev/md15, slot 4.
mdadm: Marking array /dev/md15 as 'clean'
mdadm: added /dev/sdq1 to /dev/md15 as 1
mdadm: added /dev/sdm1 to /dev/md15 as 2
mdadm: added /dev/sdy1 to /dev/md15 as 3
mdadm: added /dev/sdv1 to /dev/md15 as 4
mdadm: no uptodate device for slot 5 of /dev/md15
mdadm: added /dev/sdad1 to /dev/md15 as 0
mdadm: /dev/md15 has been started with 5 drives (out of 6).
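
The `/dev/sd{ad,q,m,y,v}1` argument in the quoted command relies on shell brace expansion, so mdadm actually receives five separate device paths. A small sketch of what that expands to, with a hypothetical helper `expand_members`; the command is only built as a list here and never executed:

```python
def expand_members(prefix, suffixes, partition):
    """Mimic the brace expansion prefix{a,b,...}partition, keeping order."""
    return [f"{prefix}{suffix}{partition}" for suffix in suffixes]


members = expand_members("/dev/sd", ["ad", "q", "m", "y", "v"], "1")

# The argument vector the shell would hand to mdadm (not run here).
command = ["mdadm", "--assemble", "--force", "--verbose", "/dev/md15", *members]

print(" ".join(command))
```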
I've checked that the filesystem is in good shape and added /dev/sdd1
back in; the array is now resyncing. 680 minutes to go, but there are a
few tricks I can do to speed that up a bit.
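
As a rough sanity check on the 680-minute estimate, the sketch below works out the rebuild rate it implies. It assumes the member partition spans roughly the whole 4 TB disk (4e12 bytes), which the message does not state, and the function name is hypothetical. Which tricks are meant is not said; the md resync rate limits under /proc/sys/dev/raid/ (speed_limit_min and speed_limit_max) are one commonly adjusted knob.

```python
def implied_resync_speed_mb_s(member_bytes, minutes_remaining):
    """Average rate in MB/s needed to rebuild one member in the given time."""
    return member_bytes / (minutes_remaining * 60) / 1e6


# Assumption: about 4 TB to rebuild, 680 minutes reported remaining.
speed = implied_resync_speed_mb_s(4e12, 680)

print(f"{speed:.0f} MB/s")
```

That works out to a little under 100 MB/s.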
> In other words, unclean shutdowns should have manual intervention,
> unless the array in question contains the root filesystem, in which case
> the risky "start_dirty_degraded" may be appropriate. In that case, you
> probably would want your initramfs to have a special mdadm.conf,
> deferring assembly of bulk arrays to normal userspace.
I'm perfectly happy with doing the recovery in userspace; these drives
are not critical for booting. Except that Ubuntu, Plymouth and a few
other things conspire against booting a machine with any disk problems,
but that's a different rant for a different place.
Thank you very much for your very helpful reply, things look a lot
better now.
Regards, Paul Boven.
--
Paul Boven <boven@jive.nl> +31 (0)521-596547
Unix/Linux/Networking specialist
Joint Institute for VLBI in Europe - www.jive.nl
VLBI - It's a fringe science