* md raid6 not working
From: Vanhorn, Mike @ 2012-08-20 19:06 UTC
To: linux-raid@vger.kernel.org

I have/had an 8-disk md raid 6, /dev/md0. At some point over the weekend,
two of the disks suddenly became marked as "spare" and a third has
disappeared completely (at least as far as mdadm is concerned).

All eight disks seem to be just fine, so I think the data is okay, and if
I could just convince it to start the array with all 8 disks, I actually
think everything would be fine. However, everything I've tried has come to
nothing, and now I think I am stuck.

Is there some way to just "force" it to change the two spare disks from
"spare" to "active", and then let it go?

Here's what I think are relevant details:

The RAID is/was composed of /dev/sd[bcdefghi]1.

/proc/mdstat says:

# cat /proc/mdstat
Personalities : [raid6]
md0 : inactive sdc1[1] sdd1[10] sdi1[8] sdg1[5] sdf1[4] sde1[3] sdh1[2]
      13674583552 blocks

unused devices: <none>
#

So, here, sdb is the only one missing.
However, if I try to start the array

# mdadm --assemble /dev/md0
mdadm: /dev/sdi1 has no superblock - assembly aborted
#

So, I check /dev/sdi1:

# mdadm --examine /dev/sdi1
/dev/sdi1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 6b8b4567:327b23c6:643c9869:66334873
  Creation Time : Mon Jun 28 10:46:51 2010
     Raid Level : raid6
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 11721071616 (11178.09 GiB 12002.38 GB)
   Raid Devices : 8
  Total Devices : 6
Preferred Minor : 0

    Update Time : Mon Aug 20 12:10:18 2012
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 2
  Spare Devices : 1
       Checksum : 297da62d - correct
         Events : 59235337

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     8       8      129        8      spare   /dev/sdi1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      active sync   /dev/sdf1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       0        0        7      faulty removed
   8     8       8      129        8      spare   /dev/sdi1
#

The fact that that command worked on /dev/sdi1 indicates that there is, in
fact, a superblock, doesn't it?

At any rate, going from the output of --examine on sdi1, it would seem
that /dev/sdd1 is also not working.
So,

# mdadm --examine /dev/sdd1
/dev/sdd1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 6b8b4567:327b23c6:643c9869:66334873
  Creation Time : Mon Jun 28 10:46:51 2010
     Raid Level : raid6
  Used Dev Size : 1953511936 (1863.01 GiB 2000.40 GB)
     Array Size : 11721071616 (11178.09 GiB 12002.38 GB)
   Raid Devices : 8
  Total Devices : 5
Preferred Minor : 0

    Update Time : Mon Aug 20 12:10:21 2012
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 297da583 - correct
         Events : 59235338

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this    10       8       49       -1      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8      113        2      active sync   /dev/sdh1
   3     3       8       65        3      active sync   /dev/sde1
   4     4       8       81        4      active sync   /dev/sdf1
   5     5       8       97        5      active sync   /dev/sdg1
   6     6       0        0        6      faulty removed
   7     7       0        0        7      faulty removed
#

Which would seem to indicate that sdd1 is fine, too. So, then, what about
sdb1?

# mdadm --examine /dev/sdb1
mdadm: No md superblock detected on /dev/sdb1.
#

Okay, fine, maybe something actually has happened to sdb1. However, since
it's a RAID6, having that one bad disk should be survivable. If I could
just get the other two disks (sdi1 and sdd1) to not be spares.

---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn@wright.edu
http://www.cecs.wright.edu/~mvanhorn/
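The per-device checks in the message above can be condensed into one line per member, which makes it easier to compare event counts and recorded roles side by side. The following is only a sketch: `summarize_examine` is a made-up helper name, and the parsing assumes the 0.90-superblock text layout of `mdadm --examine` quoted in this thread (other metadata versions print different fields).

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --examine` output on stdin and print
# one summary line per device. Assumes the 0.90 text layout shown above.
summarize_examine() {
    awk '
        /^\/dev\// { sub(/:$/, "", $1); dev = $1 }    # header such as "/dev/sdi1:"
        $1 == "Events" { events = $3 }                  # "Events : 59235337"
        $1 == "this" {                                  # the role this device records for itself
            state = $6
            for (i = 7; i < NF; i++) state = state " " $i   # joins "active sync"
            printf "%s events=%s slot=%s state=%s\n", dev, events, $5, state
        }
    '
}

# Example use (needs root and the real devices):
#   mdadm --examine /dev/sd[cdefghi]1 | summarize_examine
```

Fed the two superblocks quoted above, this would report sdi1 with events=59235337 in slot 8 and sdd1 with events=59235338 in slot -1, both with state "spare". Members whose event counts are far apart would normally deserve a closer look before any forced assembly.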
* Re: md raid6 not working
From: NeilBrown @ 2012-08-20 22:34 UTC
To: Vanhorn, Mike; +Cc: linux-raid@vger.kernel.org

On Mon, 20 Aug 2012 19:06:17 +0000 "Vanhorn, Mike"
<michael.vanhorn@wright.edu> wrote:

>
> I have/had an 8-disk md raid 6, /dev/md0. At some point over the weekend,
> two of the disks suddenly became marked as "spare" and a third has
> disappeared completely (at least as far as mdadm is concerned).
>
> All eight disks seem to be just fine, so I think the data is okay, and if
> I could just convince it to start the array with all 8 disks, I actually
> think everything would be fine. However, everything I've tried has come to
> nothing, and now I think I am stuck.
>
> Is there some way to just "force" it to change the two spare disks from
> "spare" to "active", and then let it go?
>
> Here's what I think are relevant details:
>
> The RAID is/was composed of /dev/sd[bcdefghi]1.
>
> /proc/mdstat says:
>
> # cat /proc/mdstat
> Personalities : [raid6]
> md0 : inactive sdc1[1] sdd1[10] sdi1[8] sdg1[5] sdf1[4] sde1[3] sdh1[2]
>       13674583552 blocks
>
> unused devices: <none>
> #
>
> So, here, sdb is the only one missing. However, if I try to start the array
>
> # mdadm --assemble /dev/md0
> mdadm: /dev/sdi1 has no superblock - assembly aborted
> #

What is the result of:

   mdadm -S /dev/md0
   mdadm -Avvv /dev/md0

??

NeilBrown
* RE: md raid6 not working
From: Vanhorn, Mike @ 2012-08-21 11:17 UTC
To: NeilBrown; +Cc: linux-raid@vger.kernel.org

Thank you for your response. Here is the output you requested.

>What is the result of:
>
>   mdadm -S /dev/md0

# mdadm -S /dev/md0
mdadm: stopped /dev/md0
#

>   mdadm -Avvv /dev/md0

# mdadm -Avvv /dev/md0
mdadm: looking for devices for /dev/md0
mdadm: cannot open device /dev/sdi1: Device or resource busy
mdadm: /dev/sdi1 has no superblock - assembly aborted
#

---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn@wright.edu
http://www.cecs.wright.edu/~mvanhorn/
* Re: md raid6 not working
From: NeilBrown @ 2012-08-21 22:38 UTC
To: Vanhorn, Mike; +Cc: linux-raid@vger.kernel.org

On Tue, 21 Aug 2012 11:17:57 +0000 "Vanhorn, Mike"
<michael.vanhorn@wright.edu> wrote:

> Thank you for your response. Here is the output you requested.
>
> >What is the result of:
> >
> >   mdadm -S /dev/md0
>
> # mdadm -S /dev/md0
> mdadm: stopped /dev/md0
> #
>
> >   mdadm -Avvv /dev/md0
>
> # mdadm -Avvv /dev/md0
> mdadm: looking for devices for /dev/md0
> mdadm: cannot open device /dev/sdi1: Device or resource busy
> mdadm: /dev/sdi1 has no superblock - assembly aborted
> #

So /dev/sdi1 is busy. You need to find out why.
(the "no superblock" message is a bit misleading... I might have fixed that
in newer mdadm, I'm not sure).

The "/proc/mdstat" that you showed in the original email had sdi1 as a member
of md0, so it clearly wasn't being used by anything else then.
"mdadm -S /dev/md0" would have removed it from md0 so it shouldn't have been
busy.
The fact that it is busy is very odd.

A device can be busy if:
 - it is mounted as a filesystem
 - it is active as swap
 - it is part of an md array
 - it is part of a dm device
 - probably something else, but those are the main ones.

NeilBrown
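The four named causes in the list above can each be checked from /proc and sysfs. The sketch below is one way to do that, not something taken from the thread: `why_busy` is a hypothetical helper name, and it assumes a Linux system that exposes /proc/mounts, /proc/swaps, /proc/mdstat and /sys/class/block. It only reads files and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: report what might be holding a partition open,
# one check per cause named in the message above.
# Pass the kernel name without /dev/, e.g. "sdi1".
why_busy() {
    part=$1

    echo "== mounted as a filesystem? (/proc/mounts) =="
    grep "^/dev/$part[[:space:]]" /proc/mounts 2>/dev/null || echo "  (no match)"

    echo "== active as swap? (/proc/swaps) =="
    grep "^/dev/$part[[:space:]]" /proc/swaps 2>/dev/null || echo "  (no match)"

    echo "== member of an md array? (/proc/mdstat) =="
    grep -w "$part" /proc/mdstat 2>/dev/null || echo "  (no match)"

    echo "== held by an md or dm device? (sysfs holders) =="
    ls "/sys/class/block/$part/holders" 2>/dev/null | grep . || echo "  (no holders listed)"
}

# Example use:
#   why_busy sdi1
```

The simple pattern match can miss a filesystem or swap area that was activated through a different path to the same device (for example a by-uuid symlink), so an empty result is a hint rather than proof that nothing holds the partition.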
* Re: md raid6 not working
From: Vanhorn, Mike @ 2012-08-22 12:57 UTC
To: NeilBrown; +Cc: linux-raid@vger.kernel.org

On 8/21/12 6:38 PM, "NeilBrown" <neilb@suse.de> wrote:

>># mdadm -Avvv /dev/md0
>>mdadm: looking for devices for /dev/md0
>>mdadm: cannot open device /dev/sdi1: Device or resource busy
>>mdadm: /dev/sdi1 has no superblock - assembly aborted
>>#
>
>So /dev/sdi1 is busy. You need to find out why. (the "no superblock"
>message is a bit misleading... I might have fixed that in newer mdadm, I'm
>not sure).
>
>The "/proc/mdstat" that you showed in the original email had sdi1 as a
>member of md0, so it clearly wasn't being used by anything else then.
>"mdadm -S /dev/md0" would have removed it from md0 so it shouldn't have
>been busy.
>The fact that it is busy is very odd.
>
>A device can be busy if:
>- it is mounted as a filesystem
>- it is active as swap
>- it is part of an md array
>- it is part of a dm device
>- probably something else, but those are the main ones.
>
>

Okay, I have gone to investigate what was using /dev/sdi1 yesterday
morning when I tried to assemble the array. I couldn't find anything at
all that would have been doing something with that disk, so I simply tried
the assemble again, and this time it worked (well, sort of):

# mdadm -Avvv /dev/md0
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is not one of /dev/sdc1,/dev/sdd1,/dev/sde1,/dev/sdf1,/dev/sdg1,/dev/sdh1,/dev/sdi1
mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 8.
mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot -1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
mdadm: no uptodate device for slot 0 of /dev/md0
mdadm: added /dev/sdh1 to /dev/md0 as 2
mdadm: added /dev/sde1 to /dev/md0 as 3
mdadm: added /dev/sdf1 to /dev/md0 as 4
mdadm: added /dev/sdg1 to /dev/md0 as 5
mdadm: no uptodate device for slot 6 of /dev/md0
mdadm: no uptodate device for slot 7 of /dev/md0
mdadm: added /dev/sdi1 to /dev/md0 as 8
mdadm: added /dev/sdd1 to /dev/md0 as -1
mdadm: added /dev/sdc1 to /dev/md0 as 1
mdadm: /dev/md0 assembled from 5 drives and 2 spares - not enough to start the array.
#

So, sdi1 seems to be just fine. However, since two of the disks are
getting marked as spares, it can't start the array.

I don't ever recall setting the two disks as spares, and even if I had,
would one of the spares have kicked in when sdb1 went bad? Or, am I not
understanding the concept of a spare as it applies to a level 6 raid?

At this point, I'm thinking that sdd1 and sdi1 really should be in either
slot 0, 6 or 7, but I'm not sure which ones. Is there a way to use
trial-and-error to assemble the array with, for example, sdd1 as slot 0,
and see if it works ("working" meaning that I could then mount the xfs
file system) and, if it doesn't, stop the array, and then try it in slot 6?
I am, I guess, making the assumption that it being marked a spare is
incorrect, and that it does, in fact, have data on it.

---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn@wright.edu
http://www.cecs.wright.edu/~mvanhorn/
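The trial-and-error idea in the message above has a small search space: three unknown slots (0, 6, 7) and two candidate devices plus one absent member give six orderings. The sketch below only prints candidate command lines and runs nothing. It is not advice given in this thread. It assumes the commonly described last-resort approach of re-creating the array with `mdadm --create --assume-clean` and `missing` in place of the absent member, with slots 1 to 5 taken from the -Avvv output and the geometry (0.90 metadata, 64K chunk, left-symmetric) taken from the --examine output. `candidate_orders` and `member_path` are made-up helper names. Re-creating rewrites the superblocks, so anything like this would normally be tried only on copies or overlays of the disks, and it rests on the poster's own unverified assumption that sdd1 and sdi1 really hold array data.

```shell
#!/bin/sh
# Hypothetical helpers: PRINT (never run) the six ways to place sdd1, sdi1
# and one "missing" member into the unknown slots 0, 6 and 7.
member_path() {
    case $1 in
        missing) echo missing ;;       # mdadm keyword for an absent member
        *)       echo "/dev/$1" ;;
    esac
}

candidate_orders() {
    for order in "sdd1 sdi1 missing" "sdd1 missing sdi1" \
                 "sdi1 sdd1 missing" "sdi1 missing sdd1" \
                 "missing sdd1 sdi1" "missing sdi1 sdd1"; do
        set -- $order                  # $1 = slot 0, $2 = slot 6, $3 = slot 7
        echo "mdadm --create /dev/md0 --assume-clean --metadata=0.90" \
             "--level=6 --raid-devices=8 --chunk=64 --layout=left-symmetric" \
             "$(member_path "$1") /dev/sdc1 /dev/sdh1 /dev/sde1 /dev/sdf1 /dev/sdg1" \
             "$(member_path "$2") $(member_path "$3")"
    done
}

# Example use (prints six lines for review):
#   candidate_orders
```

Each candidate would then be judged by a check that does not write to the array, for instance a read-only mount of the xfs file system, before stopping the array and moving on to the next ordering.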