From: Adam Thompson <athompso@athompso.net>
To: NeilBrown <neilb@suse.de>, Phil Turmel <philip@turmel.org>
Cc: linux-raid@vger.kernel.org, "Cordes, Trevor" <trevor@tecnopolis.ca>
Subject: Re: dead RAID6 array on CentOS6.6 / kernel 3.19
Date: Wed, 11 Feb 2015 12:21:49 -0600
Message-ID: <54DB9DBD.2040202@athompso.net>
In-Reply-To: <20150211152605.0c1bf94e@notabene.brown>
On 2015-02-10 10:26 PM, NeilBrown wrote:
>>> Also, kernel 3.19, which I mentioned we're running, pretty much *is* my
>>> definition of an up-to-date kernel... how much newer do you want me to
>>> try, and where would you recommend I find such a thing in a bootable image?
>> You're right, 3.19 should be fine. I'm stumped. Looks like a bug.
>> Adding Neil ....
> I think it is an mdadm bug. I don't see a mention of mdadm version number
> (but I didn't look very hard).
> If you are using 3.3, update to at least 3.3.1
>
> (just
> cd /tmp
> git clone git://neil.brown.name/mdadm
> cd mdadm
> make
> ./mdadm --assemble --force /dev/md127 .....
> )
>
> NeilBrown
So, I'm already running mdadm v3.3 from CentOS 6.6 (the precise package
version# is in the original message).
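For anyone following along, here is a minimal sketch of checking an mdadm version against the 3.3.1 floor Neil mentions. The helper names mdadm_version and version_at_least are made up for illustration, and the comparison assumes GNU sort with -V support.

```shell
#!/bin/sh
# Hypothetical helpers, not part of mdadm.

# Read a banner of the form "mdadm - v3.3 - <date>" on stdin and
# print just the numeric version ("3.3").
mdadm_version() {
    sed -n 's/^mdadm - v\([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 is at least version $2 (relies on GNU sort -V).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage (2>&1 captures the banner whichever stream it is printed on):
#   v=$(mdadm --version 2>&1 | mdadm_version)
#   version_at_least "$v" 3.3.1 || echo "mdadm $v is older than 3.3.1"
```

A plain string comparison would not do here, since "3.3" has to sort below "3.3.1"; that is the reason for leaning on sort -V.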
I've tried building the latest-and-greatest from git, but the build fails
on the RUN_DIR check. It looks like that check can be disabled with no
downside... yup, it compiles with no errors now.
Yay! mdadm from git was able to reassemble the array:
(I find it interesting that it bumped the event count up to 26307...
*again*. The old v3.3 mdadm had already claimed to do exactly that.)
> [root@muug mdadm]# ./mdadm --verbose --assemble --force /dev/md127
> /dev/sd[a-l]
> mdadm: looking for devices for /dev/md127
> mdadm: failed to get exclusive lock on mapfile - continue anyway...
> mdadm: /dev/sda is identified as a member of /dev/md127, slot 11.
> mdadm: /dev/sdb is identified as a member of /dev/md127, slot 2.
> mdadm: /dev/sdc is identified as a member of /dev/md127, slot 1.
> mdadm: /dev/sdd is identified as a member of /dev/md127, slot 3.
> mdadm: /dev/sde is identified as a member of /dev/md127, slot 5.
> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 6.
> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 7.
> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 4.
> mdadm: /dev/sdi is identified as a member of /dev/md127, slot 8.
> mdadm: /dev/sdj is identified as a member of /dev/md127, slot 9.
> mdadm: /dev/sdk is identified as a member of /dev/md127, slot 10.
> mdadm: /dev/sdl is identified as a member of /dev/md127, slot 0.
> mdadm: forcing event count in /dev/sdf(6) from 26263 upto 26307
> mdadm: forcing event count in /dev/sdg(7) from 26263 upto 26307
> mdadm: forcing event count in /dev/sda(11) from 26263 upto 26307
> mdadm: clearing FAULTY flag for device 5 in /dev/md127 for /dev/sdf
> mdadm: clearing FAULTY flag for device 6 in /dev/md127 for /dev/sdg
> mdadm: clearing FAULTY flag for device 0 in /dev/md127 for /dev/sda
> mdadm: Marking array /dev/md127 as 'clean'
> mdadm: added /dev/sdc to /dev/md127 as 1
> mdadm: added /dev/sdb to /dev/md127 as 2
> mdadm: added /dev/sdd to /dev/md127 as 3
> mdadm: added /dev/sdh to /dev/md127 as 4
> mdadm: added /dev/sde to /dev/md127 as 5
> mdadm: added /dev/sdf to /dev/md127 as 6
> mdadm: added /dev/sdg to /dev/md127 as 7
> mdadm: added /dev/sdi to /dev/md127 as 8
> mdadm: added /dev/sdj to /dev/md127 as 9
> mdadm: added /dev/sdk to /dev/md127 as 10
> mdadm: added /dev/sda to /dev/md127 as 11
> mdadm: added /dev/sdl to /dev/md127 as 0
> mdadm: /dev/md127 has been started with 12 drives.
> [root@muug mdadm]# cat /proc/mdstat
> Personalities : [raid1] [raid6] [raid5] [raid4] [raid10]
> md127 : active raid6 sdl[12] sda[13] sdk[10] sdj[9] sdi[8] sdg[7]
> sdf[6] sde[5] sdh[4] sdd[3] sdb[2] sdc[1]
> 39068875120 blocks super 1.2 level 6, 4k chunk, algorithm 2
> [12/12] [UUUUUUUUUUUU]
> bitmap: 0/30 pages [0KB], 65536KB chunk
>
> md0 : active raid1 sdm1[0] sdn1[1]
> 1048512 blocks super 1.0 [2/2] [UU]
> bitmap: 0/1 pages [0KB], 65536KB chunk
>
> unused devices: <none>
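The "forcing event count" lines above come from the per-device superblocks, which can be inspected directly. As a rough sketch, assuming the usual "Events : N" line in `mdadm --examine` output, something like this lists the event count of each member so that stale devices stand out. The function names events_of and event_counts are mine, not mdadm's.

```shell
#!/bin/sh
# Hypothetical helpers for comparing superblock event counts.

# Print the Events value from `mdadm --examine` output read on stdin.
events_of() {
    awk -F' : ' '/^ *Events :/ { print $2; exit }'
}

# For each device given, print "<events> <device>", lowest count first.
event_counts() {
    for dev in "$@"; do
        printf '%s %s\n' "$(mdadm --examine "$dev" | events_of)" "$dev"
    done | sort -n
}

# Usage (as root, against the member disks of the array):
#   event_counts /dev/sd[a-l]
```

Members whose count lags behind the rest are the ones that `--assemble --force` has to bump.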
Kernel messages accompanying this:
> Feb 11 11:53:46 muug kernel: md: md127 stopped.
> Feb 11 11:53:47 muug kernel: md: bind<sdc>
> Feb 11 11:53:47 muug kernel: md: bind<sdb>
> Feb 11 11:53:47 muug kernel: md: bind<sdd>
> Feb 11 11:53:47 muug kernel: md: bind<sdh>
> Feb 11 11:53:47 muug kernel: md: bind<sde>
> Feb 11 11:53:47 muug kernel: md: bind<sdf>
> Feb 11 11:53:47 muug kernel: md: bind<sdg>
> Feb 11 11:53:47 muug kernel: md: bind<sdi>
> Feb 11 11:53:47 muug kernel: md: bind<sdj>
> Feb 11 11:53:47 muug kernel: md: bind<sdk>
> Feb 11 11:53:47 muug kernel: md: bind<sda>
> Feb 11 11:53:47 muug kernel: md: bind<sdl>
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sdl operational as
> raid disk 0
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sda operational as
> raid disk 11
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sdk operational as
> raid disk 10
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sdj operational as
> raid disk 9
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sdi operational as
> raid disk 8
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sdg operational as
> raid disk 7
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sdf operational as
> raid disk 6
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sde operational as
> raid disk 5
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sdh operational as
> raid disk 4
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sdd operational as
> raid disk 3
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sdb operational as
> raid disk 2
> Feb 11 11:53:47 muug kernel: md/raid:md127: device sdc operational as
> raid disk 1
> Feb 11 11:53:47 muug kernel: md/raid:md127: allocated 0kB
> Feb 11 11:53:47 muug kernel: md/raid:md127: raid level 6 active with
> 12 out of 12 devices, algorithm 2
> Feb 11 11:53:47 muug kernel: created bitmap (30 pages) for device md127
> Feb 11 11:53:47 muug kernel: md127: bitmap initialized from disk: read
> 2 pages, set 280 of 59615 bits
> Feb 11 11:53:48 muug kernel: md127: detected capacity change from 0 to
> 40006528122880
> Feb 11 11:53:48 muug kernel: md127: unknown partition table
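As a sanity check, the sizes in the output above agree with each other. The sketch below redoes the arithmetic; the function names are illustrative only, and the bitmap part assumes the write-intent bitmap covers the per-device space (array size divided by the 10 data disks of a 12-disk RAID6), which is what the numbers suggest.

```shell
#!/bin/sh
# Hypothetical helpers; all figures are taken from the mdstat and kernel output.

# Convert the 1 KiB block count from /proc/mdstat to bytes.
array_bytes() {
    echo $(( $1 * 1024 ))
}

# Bitmap bits needed to cover $1 KiB per device with $2 KiB bitmap chunks
# (rounded up).
bitmap_bits() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Convert bytes to TiB with two decimals.
to_tib() {
    awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024 * 1024) }'
}

array_bytes 39068875120                       # 40006528122880, the "capacity change" value
bitmap_bits $(( 39068875120 / 10 )) 65536     # 59615, the "set 280 of 59615 bits" total
to_tib 40006528122880                         # 36.39, the PV size that pvscan reports
```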
Then, since it's an LVM PV:
> [root@muug ~]# pvscan
> PV /dev/sdm2 VG vg00 lvm2 [110.79 GiB / 0 free]
> PV /dev/sdn2 VG vg00 lvm2 [110.79 GiB / 24.00 MiB free]
> PV /dev/md127 VG vg00 lvm2 [36.39 TiB / 0 free]
> Total: 3 [36.60 TiB] / in use: 3 [36.60 TiB] / in no VG: 0 [0 ]
> [root@muug ~]# vgscan
> Reading all physical volumes. This may take a while...
> Found volume group "vg00" using metadata type lvm2
> [root@muug ~]# lvscan
> ACTIVE '/dev/vg00/root' [64.00 GiB] inherit
> ACTIVE '/dev/vg00/swap' [32.00 GiB] inherit
> inactive '/dev/vg00/ARRAY' [36.39 TiB] inherit
> inactive '/dev/vg00/cache' [30.71 GiB] inherit
> [root@muug ~]# lvchange -a y /dev/vg00/ARRAY
> Feb 11 12:04:15 muug kernel: md/raid1:mdX: active with 2 out of 2 mirrors
> Feb 11 12:04:15 muug kernel: created bitmap (31 pages) for device mdX
> Feb 11 12:04:15 muug kernel: mdX: bitmap initialized from disk: read 2
> pages, set 636 of 62904 bits
> Feb 11 12:04:15 muug kernel: md/raid1:mdX: active with 2 out of 2 mirrors
> Feb 11 12:04:15 muug kernel: created bitmap (1 pages) for device mdX
> Feb 11 12:04:15 muug kernel: mdX: bitmap initialized from disk: read 1
> pages, set 1 of 64 bits
> Feb 11 12:04:15 muug kernel: device-mapper: cache-policy-mq: version
> 1.3.0 loaded
> Feb 11 12:04:16 muug lvm[1418]: Monitoring RAID device
> vg00-cache_cdata for events.
> Feb 11 12:04:16 muug lvm[1418]: Monitoring RAID device
> vg00-cache_cmeta for events.
> [root@muug ~]# lvs
> LV    VG   Attr       LSize  Pool  Origin        Data% Meta% Move Log Cpy%Sync Convert
> ARRAY vg00 Cwi-a-C--- 36.39t cache [ARRAY_corig]
> cache vg00 Cwi---C--- 30.71g
> root  vg00 rwi-aor--- 64.00g                                          100.00
> swap  vg00 -wi-ao---- 32.00g
> [root@muug ~]# mount -oro /dev/vg00/ARRAY /ARRAY
> Feb 11 12:04:37 muug kernel: XFS (dm-17): Mounting V4 Filesystem
> Feb 11 12:04:38 muug kernel: XFS (dm-17): Ending clean mount
> [root@muug ~]# umount /ARRAY
> [root@muug ~]# mount /ARRAY
> Feb 11 12:04:45 muug kernel: XFS (dm-17): Mounting V4 Filesystem
> Feb 11 12:04:45 muug kernel: XFS (dm-17): Ending clean mount
> [root@muug ~]# df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/vg00-root
> 63G 22G 39G 36% /
> tmpfs 16G 0 16G 0% /dev/shm
> /dev/md0 1008M 278M 680M 29% /boot
> /dev/mapper/vg00-ARRAY
> 37T 16T 21T 43% /ARRAY
Wow... xfs_check (xfs_db, actually) needed ~40GB of RAM to check the
filesystem... but it thinks everything's OK.
The big question I have now:
If it's a bug in:
mdadm v3.3 and/or
CentOS 6.6 rc scripts and/or
kernel 3.19,
what should I do to prevent future recurrences of the same
problem? I don't want to have to keep buying new underwear... ;-)
--
-Adam Thompson
athompso@athompso.net
+1 (204) 291-7950 - cell
+1 (204) 489-6515 - fax