* Raid 5 Problem
@ 2008-12-14 13:41 nterry
2008-12-14 15:34 ` Michal Soltys
0 siblings, 1 reply; 14+ messages in thread
From: nterry @ 2008-12-14 13:41 UTC (permalink / raw)
To: linux-raid
Hi. I hope someone can tell me what I have done wrong. I have a 4-disk
RAID 5 array running on Fedora 9. I've run this array for 2.5 years with
no issues. I recently rebooted after upgrading to kernel 2.6.27.7.
When I did this I found that only three of my disks were in the array. When
I examine the three active members of the array (/dev/sdd1, /dev/sde1,
/dev/sdc1), they all show that the array has three drives and one missing.
When I examine the missing drive, it shows that all members of the array
are present, which I don't understand! When I try to add the missing
drive back, it says the device is busy. Please see below and let me know
what I need to do to get this working again. Thanks, Nigel
==================================================================
[root@homepc ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[0] sdc1[3] sde1[1]
735334656 blocks level 5, 128k chunk, algorithm 2 [4/3] [UU_U]
md_d0 : inactive sdb[2](S)
245117312 blocks
unused devices: <none>
[root@homepc ~]#
==================================================================
[root@homepc ~]# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
# DEVICE partitions
MAILADDR nigel@nigelterry.net
ARRAY /dev/md0 level=raid5 num-devices=4
devices=/dev/sdb1,/dev/sdc1,/dev/sdd1,/dev/sde1
[root@homepc ~]#
==================================================================
[root@homepc ~]# mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Tue Apr 18 17:44:34 2006
Raid Level : raid5
Array Size : 735334656 (701.27 GiB 752.98 GB)
Used Dev Size : 245111552 (233.76 GiB 250.99 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Dec 12 17:48:12 2008
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 128K
UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
Events : 0.6485812
Number Major Minor RaidDevice State
0 8 49 0 active sync /dev/sdd1
1 8 65 1 active sync /dev/sde1
2 0 0 2 removed
3 8 33 3 active sync /dev/sdc1
[root@homepc ~]# mdadm --examine /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.03
UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
Creation Time : Tue Apr 18 17:44:34 2006
Raid Level : raid5
Used Dev Size : 245111552 (233.76 GiB 250.99 GB)
Array Size : 735334656 (701.27 GiB 752.98 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Dec 12 17:48:22 2008
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Checksum : 2ec4d801 - correct
Events : 6485814
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 0 8 49 0 active sync /dev/sdd1
0 0 8 49 0 active sync /dev/sdd1
1 1 8 65 1 active sync /dev/sde1
2 2 0 0 2 faulty removed
3 3 8 33 3 active sync /dev/sdc1
[root@homepc ~]# mdadm --examine /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 0.90.03
UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
Creation Time : Tue Apr 18 17:44:34 2006
Raid Level : raid5
Used Dev Size : 245111552 (233.76 GiB 250.99 GB)
Array Size : 735334656 (701.27 GiB 752.98 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Dec 12 17:48:22 2008
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Checksum : 2ec4d813 - correct
Events : 6485814
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 1 8 65 1 active sync /dev/sde1
0 0 8 49 0 active sync /dev/sdd1
1 1 8 65 1 active sync /dev/sde1
2 2 0 0 2 faulty removed
3 3 8 33 3 active sync /dev/sdc1
[root@homepc ~]# mdadm --examine /dev/sdc1
/dev/sdc1:
Magic : a92b4efc
Version : 0.90.03
UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
Creation Time : Tue Apr 18 17:44:34 2006
Raid Level : raid5
Used Dev Size : 245111552 (233.76 GiB 250.99 GB)
Array Size : 735334656 (701.27 GiB 752.98 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Update Time : Fri Dec 12 17:48:22 2008
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Checksum : 2ec4d7f7 - correct
Events : 6485814
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 3 8 33 3 active sync /dev/sdc1
0 0 8 49 0 active sync /dev/sdd1
1 1 8 65 1 active sync /dev/sde1
2 2 0 0 2 faulty removed
3 3 8 33 3 active sync /dev/sdc1
[root@homepc ~]# mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.03
UUID : 50e3173e:b5d2bdb6:7db3576b:644409bb
Creation Time : Tue Apr 18 17:44:34 2006
Raid Level : raid5
Used Dev Size : 245111552 (233.76 GiB 250.99 GB)
Array Size : 735334656 (701.27 GiB 752.98 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Fri Dec 12 17:29:15 2008
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Checksum : 2ec4d1d6 - correct
Events : 6485600
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 2 8 17 2 active sync /dev/sdb1
0 0 8 49 0 active sync /dev/sdd1
1 1 8 65 1 active sync /dev/sde1
2 2 8 17 2 active sync /dev/sdb1
3 3 8 33 3 active sync /dev/sdc1
[root@homepc ~]# mdadm /dev/md0 --add /dev/sdb1
mdadm: Cannot open /dev/sdb1: Device or resource busy
[root@homepc ~]#
^ permalink raw reply [flat|nested] 14+ messages in thread

* Re: Raid 5 Problem
  2008-12-14 13:41 Raid 5 Problem nterry
@ 2008-12-14 15:34 ` Michal Soltys
  2008-12-14 20:41 ` nterry
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Soltys @ 2008-12-14 15:34 UTC (permalink / raw)
To: nterry; +Cc: linux-raid

nterry wrote:
> Hi. I hope someone can tell me what I have done wrong. I have a 4-disk
> RAID 5 array running on Fedora 9. [...] When I try to add the missing
> drive back, it says the device is busy.
>
> ==================================================================
> [root@homepc ~]# cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid5 sdd1[0] sdc1[3] sde1[1]
> 735334656 blocks level 5, 128k chunk, algorithm 2 [4/3] [UU_U]
> md_d0 : inactive sdb[2](S)
> 245117312 blocks
> unused devices: <none>
> [root@homepc ~]#

For some reason, it looks like you have two RAID arrays visible - md0 and
md_d0. The latter took sdb (not sdb1) as its component.

sd{c,d,e}1 are in the assembled array (with appropriately updated
superblocks), so the mdadm --examine calls show one device as removed.
/dev/sdb, however, is part of another, inactive array whose superblock is
untouched and still shows the "old" situation. Note that a 0.90 superblock
is stored near the end of the device (see md(4) for details), so its
position can be valid for both sdb and sdb1.

This might be an effect of --incremental assembly mode. It's hard to tell
more without seeing the startup scripts, mdadm.conf, udev rules, partition
layout... Did the upgrade involve anything more besides the kernel?

Stop both arrays, check mdadm.conf, assemble md0 manually
(mdadm -A /dev/md0 /dev/sd{c,d,e}1), and verify the situation with
mdadm -D. If everything looks sane, add /dev/sdb1 to the array. Still,
without checking out the startup stuff, it might happen again after a
reboot. Adding DEVICE /dev/sd[bcde]1 to mdadm.conf might help, though.

Wait a bit for other suggestions as well.

^ permalink raw reply [flat|nested] 14+ messages in thread
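A minimal command sketch of the recovery sequence suggested above, assuming the member partitions are exactly those shown in the /proc/mdstat output and that any filesystem on /dev/md0 has been unmounted first (an illustration, not commands taken from the original mails):

  # stop the stray whole-disk array and the degraded array
  mdadm --stop /dev/md_d0
  mdadm --stop /dev/md0

  # re-assemble md0 from the three good members and inspect it
  mdadm -A /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1
  mdadm -D /dev/md0

  # if it reports "clean, degraded" with 3 of 4 devices, re-add the missing member
  mdadm /dev/md0 --add /dev/sdb1
  cat /proc/mdstat    # watch the rebuild progress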
* Re: Raid 5 Problem
  2008-12-14 15:34 ` Michal Soltys
@ 2008-12-14 20:41 ` nterry
  2008-12-14 20:53 ` Justin Piszcz
  2008-12-14 21:14 ` Michal Soltys
  0 siblings, 2 replies; 14+ messages in thread
From: nterry @ 2008-12-14 20:41 UTC (permalink / raw)
To: Michal Soltys, linux-raid

Michal Soltys wrote:
> nterry wrote:
>> Hi. I hope someone can tell me what I have done wrong. [...]
>
> For some reason, it looks like you have two RAID arrays visible - md0 and
> md_d0. The latter took sdb (not sdb1) as its component.
> [...]
> Stop both arrays, check mdadm.conf, assemble md0 manually
> (mdadm -A /dev/md0 /dev/sd{c,d,e}1), and verify the situation with
> mdadm -D. If everything looks sane, add /dev/sdb1 to the array. Still,
> without checking out the startup stuff, it might happen again after a
> reboot. Adding DEVICE /dev/sd[bcde]1 to mdadm.conf might help, though.
>
> Wait a bit for other suggestions as well.

I don't think the kernel upgrade actually caused the problem. I tried
booting an older (2.6.27.5) kernel and that made no difference. I checked
the logs for anything else that might have made a difference, but couldn't
see anything that made any sense to me. I did note that on an earlier
update mdadm was upgraded:

Nov 26 17:08:32 Updated: mdadm-2.6.7.1-1.fc9.x86_64

and I did not reboot after that upgrade.

I included my mdadm.conf with the last email, and it includes:

ARRAY /dev/md0 level=raid5 num-devices=4
   devices=/dev/sdb1,/dev/sdc1,/dev/sdd1,/dev/sde1

My configuration is just vanilla Fedora 9 with the mdadm.conf I sent.

I've never had a /dev/md_d0 array, so that must have been automatically
created. I may have had other devices and partitions in /dev/md0, as I know
I had several attempts at getting it working 2.5 years ago, and I had other
issues when Fedora changed device naming, I think at FC7. There is only one
partition on /dev/sdb, see below:

(parted) select /dev/sdb
Using /dev/sdb
(parted) print
Model: ATA Maxtor 6L250R0 (scsi)
Disk /dev/sdb: 251GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number Start End Size Type File system Flags
1 32.3kB 251GB 251GB primary boot, raid

So it looks like something is creating /dev/md_d0 and adding /dev/sdb to it
before /dev/md0 gets started.

So I tried:

[root@homepc ~]# mdadm --stop /dev/md_d0
mdadm: stopped /dev/md_d0
[root@homepc ~]# mdadm --add /dev/md0 /dev/sdb1
mdadm: re-added /dev/sdb1
[root@homepc ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[4] sdd1[0] sdc1[3] sde1[1]
735334656 blocks level 5, 128k chunk, algorithm 2 [4/3] [UU_U]
[>....................] recovery = 0.1% (299936/245111552) finish=81.6min speed=49989K/sec
unused devices: <none>
[root@homepc ~]#

Great - all working. Then I rebooted and was back to square one, with only
three drives in /dev/md0 and /dev/sdb in /dev/md_d0. So I am still not
understanding where /dev/md_d0 is coming from, and although I know how to
get things working after a reboot, clearly this is not a long-term
solution...

^ permalink raw reply [flat|nested] 14+ messages in thread
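To narrow down what is grabbing the whole disk at boot, a short diagnostic checklist like the following could help; this is a hypothetical sketch (the rc.sysinit path is an assumption about a stock Fedora 9 layout), not something run in the thread:

  # does the whole disk carry its own (stale) superblock, distinct from the partition's?
  mdadm --examine /dev/sdb
  mdadm --examine /dev/sdb1

  # which boot-time hooks invoke mdadm automatically?
  grep -l mdadm /etc/udev/rules.d/*
  grep mdadm /etc/rc.d/rc.sysinit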
* Re: Raid 5 Problem
  2008-12-14 20:41 ` nterry
@ 2008-12-14 20:53 ` Justin Piszcz
  2008-12-14 20:58 ` nterry
  2008-12-14 21:14 ` Michal Soltys
  1 sibling, 1 reply; 14+ messages in thread
From: Justin Piszcz @ 2008-12-14 20:53 UTC (permalink / raw)
To: nterry; +Cc: Michal Soltys, linux-raid

On Sun, 14 Dec 2008, nterry wrote:
> Michal Soltys wrote:
>> [...]
> [...]
> Great - all working. Then I rebooted and was back to square one, with only
> three drives in /dev/md0 and /dev/sdb in /dev/md_d0. So I am still not
> understanding where /dev/md_d0 is coming from, and although I know how to
> get things working after a reboot, clearly this is not a long-term
> solution...

What does:

  mdadm --examine --scan

Say?

Are you using a kernel with an initrd+modules or is everything compiled in?

Justin.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Raid 5 Problem
  2008-12-14 20:53 ` Justin Piszcz
@ 2008-12-14 20:58 ` nterry
  2008-12-14 21:03 ` Justin Piszcz
  0 siblings, 1 reply; 14+ messages in thread
From: nterry @ 2008-12-14 20:58 UTC (permalink / raw)
To: Justin Piszcz, linux-raid; +Cc: Michal Soltys

Justin Piszcz wrote:
> On Sun, 14 Dec 2008, nterry wrote:
>> [...]
>
> What does:
>
>   mdadm --examine --scan
>
> Say?
>
> Are you using a kernel with an initrd+modules or is everything compiled in?
>
> Justin.

[root@homepc ~]# mdadm --examine --scan
ARRAY /dev/md0 level=raid5 num-devices=2 UUID=c57d50aa:1b3bcabd:ab04d342:6049b3f1
   spares=1
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=50e3173e:b5d2bdb6:7db3576b:644409bb
   spares=1
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=50e3173e:b5d2bdb6:7db3576b:644409bb
   spares=1
[root@homepc ~]#

I'm not sure I really know the answer to your second question. I'm using a
regular Fedora 9 kernel, so I think that is initrd+modules:

[root@homepc ~]# uname -a
Linux homepc.nigelterry.net 2.6.27.7-53.fc9.x86_64 #1 SMP Thu Nov 27 02:05:02 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
[root@homepc ~]#

^ permalink raw reply [flat|nested] 14+ messages in thread
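To answer the initrd/modules question more definitively, one could check whether the RAID personalities are built as modules and whether mdadm is embedded in the initrd. A rough sketch, with paths assumed from a stock Fedora 9 layout (the initrd being a gzipped cpio archive is also an assumption):

  # =m means raid456 is a module (loaded from the initrd), =y means built in
  grep -E 'CONFIG_MD_RAID456|CONFIG_BLK_DEV_MD' /boot/config-$(uname -r)

  # list the initrd contents and look for mdadm / mdadm.conf copies
  zcat /boot/initrd-$(uname -r).img | cpio -t | grep -i mdadm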
* Re: Raid 5 Problem
  2008-12-14 20:58 ` nterry
@ 2008-12-14 21:03 ` Justin Piszcz
  2008-12-14 21:08 ` Nigel J. Terry
  2008-12-14 22:55 ` Michal Soltys
  0 siblings, 2 replies; 14+ messages in thread
From: Justin Piszcz @ 2008-12-14 21:03 UTC (permalink / raw)
To: nterry; +Cc: linux-raid, Michal Soltys

On Sun, 14 Dec 2008, nterry wrote:
> [...]
> [root@homepc ~]# mdadm --examine --scan
> ARRAY /dev/md0 level=raid5 num-devices=2 UUID=c57d50aa:1b3bcabd:ab04d342:6049b3f1
>    spares=1
> ARRAY /dev/md0 level=raid5 num-devices=4 UUID=50e3173e:b5d2bdb6:7db3576b:644409bb
>    spares=1
> ARRAY /dev/md0 level=raid5 num-devices=4 UUID=50e3173e:b5d2bdb6:7db3576b:644409bb
>    spares=1
> [root@homepc ~]#

I saw Debian do something like this to one of my RAIDs once, and it was
because /etc/mdadm/mdadm.conf had been changed through an upgrade or some
such to use md0_X; I changed it back to /dev/md0 and the problem went away.

You have another issue here, though: it looks like your "few" attempts have
led to multiple RAID superblocks. I have always wondered how one can clean
this up without dd if=/dev/zero of=/dev/dsk & (for each disk, wipe it) to
get rid of them all. You should only have one /dev/md0 for your RAID 5, not
three.

Neil?

Justin.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Raid 5 Problem
  2008-12-14 21:03 ` Justin Piszcz
@ 2008-12-14 21:08 ` Nigel J. Terry
  0 siblings, 0 replies; 14+ messages in thread
From: Nigel J. Terry @ 2008-12-14 21:08 UTC (permalink / raw)
To: Justin Piszcz, linux-raid; +Cc: Michal Soltys

Justin Piszcz wrote:
> I saw Debian do something like this to one of my RAIDs once, and it was
> because /etc/mdadm/mdadm.conf had been changed through an upgrade or some
> such to use md0_X; I changed it back to /dev/md0 and the problem went away.
> [...]

The difference in my case is that I don't have /dev/md_d0 in
/etc/mdadm.conf and have never had that. It seems that something is
automatically creating it at boot, and that has changed in the last few
days. Wait for Neil, I guess...

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Raid 5 Problem
  2008-12-14 21:03 ` Justin Piszcz
  2008-12-14 21:08 ` Nigel J. Terry
@ 2008-12-14 22:55 ` Michal Soltys
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Soltys @ 2008-12-14 22:55 UTC (permalink / raw)
To: Justin Piszcz; +Cc: nterry, linux-raid

Justin Piszcz wrote:
> You have another issue here, though: it looks like your "few" attempts
> have led to multiple RAID superblocks. I have always wondered how one can
> clean this up without dd if=/dev/zero of=/dev/dsk & (for each disk, wipe
> it) to get rid of them all. You should only have one /dev/md0 for your
> RAID 5, not three.

Well - the 0.90 superblock is a 4K block placed on a 64K boundary, at least
64K but less than 128K from the end of the device (man 4 md). If a
filesystem on some partition occupying that space *didn't* overwrite it
with its [meta]data (check with debugfs, xfs_db, etc.), you can just clean
it up *carefully* with dd or a hex editor.

^ permalink raw reply [flat|nested] 14+ messages in thread
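To make that arithmetic concrete, here is a hypothetical inspection sketch. The offset formula follows the 0.90 placement described above, and the destructive step is deliberately left commented out - verify with mdadm --examine and a hex dump before writing anything to a disk:

  DEV=/dev/sdb
  SIZE=$(blockdev --getsize64 $DEV)
  # round the size down to a 64 KiB boundary, then step back one 64 KiB chunk
  OFFSET=$(( SIZE / 65536 * 65536 - 65536 ))

  # look first: dump the region where the stale superblock would live
  dd if=$DEV bs=64k skip=$(( OFFSET / 65536 )) count=1 2>/dev/null | hexdump -C | head

  # only once you are certain it is the stale superblock, zero that 4 KiB
  # dd if=/dev/zero of=$DEV bs=4k seek=$(( OFFSET / 4096 )) count=1 conv=notrunc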
* Re: Raid 5 Problem
  2008-12-14 20:41 ` nterry
  2008-12-14 20:53 ` Justin Piszcz
@ 2008-12-14 21:14 ` Michal Soltys
  2008-12-14 21:34 ` nterry
  1 sibling, 1 reply; 14+ messages in thread
From: Michal Soltys @ 2008-12-14 21:14 UTC (permalink / raw)
To: nterry; +Cc: linux-raid

nterry wrote:
> Great - all working. Then I rebooted and was back to square one, with only
> three drives in /dev/md0 and /dev/sdb in /dev/md_d0. So I am still not
> understanding where /dev/md_d0 is coming from, and although I know how to
> get things working after a reboot, clearly this is not a long-term
> solution...

My blind shot is that your distro's udev rules are doing mdadm --incremental
assembly and picking up sdb as part of a non-existent array from long ago
(a leftover from old experimentation?). Or something else is doing so.

What does mdadm -Esvv /dev/sdb show?

Add

  DEVICE /dev/sd[bcde]1

at the top of your mdadm.conf - it should stop --incremental from picking
up sdb, assuming that's the cause of the problem.

Also note that FC9 might be trying to assemble the array during the
initramfs stage (assuming it uses one) and having problems there. I've
never used Fedora, so it's hard for me to tell - but definitely peek there,
particularly at the udev and mdadm parts of things.

^ permalink raw reply [flat|nested] 14+ messages in thread
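Applied to the mdadm.conf posted earlier in the thread, the suggested change would leave the file looking roughly like this (a sketch; only the DEVICE line is new, the rest is unchanged from the original file):

  DEVICE /dev/sd[bcde]1
  MAILADDR nigel@nigelterry.net
  ARRAY /dev/md0 level=raid5 num-devices=4
     devices=/dev/sdb1,/dev/sdc1,/dev/sdd1,/dev/sde1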
* Re: Raid 5 Problem
  2008-12-14 21:14 ` Michal Soltys
@ 2008-12-14 21:34 ` nterry
  2008-12-14 22:02 ` Michal Soltys
  2008-12-15 21:50 ` Neil Brown
  0 siblings, 2 replies; 14+ messages in thread
From: nterry @ 2008-12-14 21:34 UTC (permalink / raw)
To: Michal Soltys; +Cc: linux-raid

Michal Soltys wrote:
> What does mdadm -Esvv /dev/sdb show?
>
> Add
>
>   DEVICE /dev/sd[bcde]1
>
> at the top of your mdadm.conf - it should stop --incremental from picking
> up sdb, assuming that's the cause of the problem.
> [...]

[root@homepc ~]# mdadm -Esvv /dev/sdb
/dev/sdb:
Magic : a92b4efc
Version : 0.90.02
UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
Creation Time : Thu Dec 15 15:29:36 2005
Raid Level : raid5
Used Dev Size : 245111552 (233.76 GiB 250.99 GB)
Array Size : 245111552 (233.76 GiB 250.99 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 0
Update Time : Wed Apr 5 13:43:20 2006
State : clean
Active Devices : 2
Working Devices : 3
Failed Devices : 0
Spare Devices : 1
Checksum : 2bd59790 - correct
Events : 1530654
Layout : left-symmetric
Chunk Size : 128K
Number Major Minor RaidDevice State
this 2 22 0 2 spare
0 0 8 1 0 active sync /dev/sda1
1 1 8 17 1 active sync /dev/sdb1
2 2 22 0 2 spare
[root@homepc ~]#

I added the DEVICE /dev/sd[bcde]1 line to mdadm.conf and that appears to
have fixed the problem. Two reboots and it worked both times.

I also note now that:

[root@homepc ~]# mdadm --examine --scan
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=50e3173e:b5d2bdb6:7db3576b:644409bb
   spares=1
[root@homepc ~]#

Frankly, I don't know enough about the workings of udev and the boot
process to be able to get into that. However, these two files might mean
something to you:

[root@homepc ~]# cat /etc/udev/rules.d/64-md-raid.rules
# do not edit this file, it will be overwritten on update
SUBSYSTEM!="block", GOTO="md_end"
ACTION!="add|change", GOTO="md_end"
# import data from a raid member and activate it
#ENV{ID_FS_TYPE}=="linux_raid_member", IMPORT{program}="/sbin/mdadm --examine --export $tempnode", RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
# import data from a raid set
KERNEL!="md*", GOTO="md_end"
ATTR{md/array_state}=="|clear|inactive", GOTO="md_end"
IMPORT{program}="/sbin/mdadm --detail --export $tempnode"
ENV{MD_NAME}=="?*", SYMLINK+="disk/by-id/md-name-$env{MD_NAME}"
ENV{MD_UUID}=="?*", SYMLINK+="disk/by-id/md-uuid-$env{MD_UUID}"
IMPORT{program}="vol_id --export $tempnode"
OPTIONS="link_priority=100"
ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}"
LABEL="md_end"
[root@homepc ~]#

AND...

[root@homepc ~]# cat /etc/udev/rules.d/70-mdadm.rules
# This file causes block devices with Linux RAID (mdadm) signatures to
# automatically cause mdadm to be run.
# See udev(8) for syntax
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
	RUN+="/sbin/mdadm -I --auto=yes $root/%k"
[root@homepc ~]#

Thanks for getting me working,

Nigel

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Raid 5 Problem
  2008-12-14 21:34 ` nterry
@ 2008-12-14 22:02 ` Michal Soltys
  2008-12-15 21:50 ` Neil Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Soltys @ 2008-12-14 22:02 UTC (permalink / raw)
To: nterry; +Cc: linux-raid

nterry wrote:
> [root@homepc ~]# mdadm -Esvv /dev/sdb
> /dev/sdb:
> Magic : a92b4efc
> Version : 0.90.02
> UUID : c57d50aa:1b3bcabd:ab04d342:6049b3f1
> Creation Time : Thu Dec 15 15:29:36 2005
> Raid Level : raid5
> [...]

Yup - that's a leftover superblock from one of your earlier arrays/attempts.
It just happens to sit in an unused area of /dev/sdb1, but is picked up as a
valid superblock for /dev/sdb.

> I added the DEVICE /dev/sd[bcde]1 line to mdadm.conf and that appears to
> have fixed the problem. Two reboots and it worked both times.
>
> I also note now that:
>
> [root@homepc ~]# mdadm --examine --scan
> ARRAY /dev/md0 level=raid5 num-devices=4 UUID=50e3173e:b5d2bdb6:7db3576b:644409bb
>    spares=1
> [root@homepc ~]#

That DEVICE line limits mdadm to looking only at sd[bcde]1 as potential
array members, so examine now shows only the current array. If you plan to
add more RAID arrays later, be sure to change it. Also read below.

> Frankly, I don't know enough about the workings of udev and the boot
> process to be able to get into that. However, these two files might mean
> something to you:
>
> [...]
>
> [root@homepc ~]# cat /etc/udev/rules.d/70-mdadm.rules
> # This file causes block devices with Linux RAID (mdadm) signatures to
> # automatically cause mdadm to be run.
> # See udev(8) for syntax
> SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
> 	RUN+="/sbin/mdadm -I --auto=yes $root/%k"
> [root@homepc ~]#

That's part of the problem. In human language it means that during a udev
event related to a device add or change, any device detected as being part
of a Linux RAID will be incrementally assembled. ID_FS_TYPE is exported in
one of the earlier rules by the vol_id helper tool.

So during the initial udev run /dev/sdb is picked up, which later forces
mdadm to drop /dev/sdb1 from the array during every boot. You can comment
that rule out safely (or just remove the whole 70-mdadm.rules file). Then
you can remove the DEVICE line from mdadm.conf.

> Thanks for getting me working

np :)

^ permalink raw reply [flat|nested] 14+ messages in thread
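One hypothetical way to carry that out on the Fedora 9 paths shown above, keeping a copy of the rule file rather than deleting it outright:

  # move the auto-assembly rule out of udev's rules directory
  mv /etc/udev/rules.d/70-mdadm.rules /root/70-mdadm.rules.disabled

  # the DEVICE workaround in /etc/mdadm.conf can then be dropped again;
  # if the initrd also carries a copy of the rule, rebuild it so the change
  # sticks across boots (an assumption - check the initrd contents first)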
* Re: Raid 5 Problem
  2008-12-14 21:34 ` nterry
  2008-12-14 22:02 ` Michal Soltys
@ 2008-12-15 21:50 ` Neil Brown
  2008-12-15 23:07 ` nterry
  1 sibling, 1 reply; 14+ messages in thread
From: Neil Brown @ 2008-12-15 21:50 UTC (permalink / raw)
To: nterry; +Cc: Michal Soltys, linux-raid

On Sunday December 14, nigel@nigelterry.net wrote:
> I added the DEVICE /dev/sd[bcde]1 line to mdadm.conf and that appears to
> have fixed the problem. Two reboots and it worked both times.

An alternate fix in this case would be

  mdadm --zero-superblock /dev/sdb

to remove the old superblock that is confusing things.

NeilBrown

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Raid 5 Problem
  2008-12-15 21:50 ` Neil Brown
@ 2008-12-15 23:07 ` nterry
  2008-12-16 20:39 ` nterry
  0 siblings, 1 reply; 14+ messages in thread
From: nterry @ 2008-12-15 23:07 UTC (permalink / raw)
To: Neil Brown, linux-raid; +Cc: Michal Soltys

Neil Brown wrote:
> An alternate fix in this case would be
>
>   mdadm --zero-superblock /dev/sdb
>
> to remove the old superblock that is confusing things.
>
> NeilBrown

That fails as:

[root@homepc ~]# mdadm --zero-superblock /dev/sdb
mdadm: Couldn't open /dev/sdb for write - not zeroing
[root@homepc ~]#

I also discovered that /dev/sdc appears to have a superblock, which maybe
explains why mdadm --examine --scan throws up three arrays. Trying to zero
the superblock on /dev/sdc gives the same error.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Raid 5 Problem
  2008-12-15 23:07 ` nterry
@ 2008-12-16 20:39 ` nterry
  0 siblings, 0 replies; 14+ messages in thread
From: nterry @ 2008-12-16 20:39 UTC (permalink / raw)
To: Neil Brown, linux-raid; +Cc: Michal Soltys

nterry wrote:
> Neil Brown wrote:
>> An alternate fix in this case would be
>>
>>   mdadm --zero-superblock /dev/sdb
>>
>> to remove the old superblock that is confusing things.
>
> That fails as:
>
> [root@homepc ~]# mdadm --zero-superblock /dev/sdb
> mdadm: Couldn't open /dev/sdb for write - not zeroing
> [root@homepc ~]#
>
> I also discovered that /dev/sdc appears to have a superblock, which maybe
> explains why mdadm --examine --scan throws up three arrays. Trying to zero
> the superblock on /dev/sdc gives the same error.

OK, I solved it, but not in a clean manner. I had to remove /dev/sdb1 from
the array before I could zero the superblock on /dev/sdb, as below:

[root@homepc ~]# mdadm /dev/md0 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
[root@homepc ~]# mdadm /dev/md0 --remove /dev/sdb1
mdadm: hot removed /dev/sdb1
[root@homepc ~]# mdadm --zero-superblock --verbose /dev/sdb
[root@homepc ~]# mdadm /dev/md0 --re-add /dev/sdb1
mdadm: re-added /dev/sdb1
[root@homepc ~]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[4] sdd1[0] sdc1[3] sde1[1]
735334656 blocks level 5, 128k chunk, algorithm 2 [4/3] [UU_U]
[>....................] recovery = 0.0% (204800/245111552) finish=79.6min speed=51200K/sec
unused devices: <none>

I did this for both sdb and sdc, and now I only have the one array when I
run mdadm --examine --scan --verbose. However, is there a better way to do
this that doesn't involve a full recovery? I thought --re-add would handle
that?

^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2008-12-16 20:39 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-14 13:41 Raid 5 Problem nterry
2008-12-14 15:34 ` Michal Soltys
2008-12-14 20:41 ` nterry
2008-12-14 20:53 ` Justin Piszcz
2008-12-14 20:58 ` nterry
2008-12-14 21:03 ` Justin Piszcz
2008-12-14 21:08 ` Nigel J. Terry
2008-12-14 22:55 ` Michal Soltys
2008-12-14 21:14 ` Michal Soltys
2008-12-14 21:34 ` nterry
2008-12-14 22:02 ` Michal Soltys
2008-12-15 21:50 ` Neil Brown
2008-12-15 23:07 ` nterry
2008-12-16 20:39 ` nterry