* md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
@ 2010-08-08  1:27 fibreraid
  2010-08-08  8:58 ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: fibreraid @ 2010-08-08 1:27 UTC (permalink / raw)
  To: linux-raid

Hi all,

I am facing a serious issue with md's on my Ubuntu 10.04 64-bit
server. I am using mdadm 3.1.2. The system has 40 drives in it, and
there are 10 md devices, which are a combination of RAID 0, 1, 5, 6,
and 10 levels. The drives are connected via LSI SAS adapters in
external SAS JBODs.

When I boot the system, about 50% of the time, the md's will not come
up correctly. Instead of md0-md9 being active, some or all will be
inactive and there will be new md's like md127, md126, md125, etc.

Here is the output of /proc/mdstat when all md's come up correctly:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdj1[6] sdk1[7] sdf1[2] sdb1[10] sdg1[3] sdl1[8](S) sdh1[4] sdm1[9] sde1[1] sdi1[12](S) sdc1[11] sdd1[0]
      1146967040 blocks super 1.2 level 6, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

md9 : active raid0 sdao1[1] sdan1[0]
      976765440 blocks super 1.2 256k chunks

md8 : active raid0 sdam1[1] sdal1[0]
      976765440 blocks super 1.2 256k chunks

md7 : active raid0 sdak1[1] sdaj1[0]
      976765888 blocks super 1.2 4k chunks

md6 : active raid0 sdai1[1] sdah1[0]
      976765696 blocks super 1.2 128k chunks

md5 : active raid0 sdag1[1] sdaf1[0]
      976765440 blocks super 1.2 256k chunks

md4 : active raid0 sdae1[1] sdad1[0]
      976765888 blocks super 1.2 32k chunks

md3 : active raid1 sdac1[1] sdab1[0]
      195357272 blocks super 1.2 [2/2] [UU]

md2 : active raid0 sdaa1[0] sdz1[1]
      62490672 blocks super 1.2 4k chunks

md1 : active raid5 sdy1[10] sdx1[9] sdw1[8] sdv1[7] sdu1[6] sdt1[5] sds1[4] sdr1[3] sdq1[2] sdp1[11](S) sdo1[1] sdn1[0]
      2929601120 blocks super 1.2 level 5, 16k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]

unused devices: <none>

--------------------------------------------------------------------------------------------------------------------------

Here are several examples of when they do not come up correctly.
Again, I am not making any configuration changes; I just reboot the
system and check /proc/mdstat several minutes after it is fully
booted.
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md124 : inactive sdam1[1](S)
      488382944 blocks super 1.2

md125 : inactive sdag1[1](S)
      488382944 blocks super 1.2

md7 : active raid0 sdaj1[0] sdak1[1]
      976765888 blocks super 1.2 4k chunks

md126 : inactive sdw1[8](S) sdn1[0](S) sdo1[1](S) sdu1[6](S) sdq1[2](S) sdx1[9](S)
      1757761512 blocks super 1.2

md9 : active raid0 sdan1[0] sdao1[1]
      976765440 blocks super 1.2 256k chunks

md6 : inactive sdah1[0](S)
      488382944 blocks super 1.2

md4 : inactive sdae1[1](S)
      488382944 blocks super 1.2

md8 : inactive sdal1[0](S)
      488382944 blocks super 1.2

md127 : inactive sdg1[3](S) sdl1[8](S) sdc1[11](S) sdi1[12](S) sdf1[2](S) sdb1[10](S)
      860226027 blocks super 1.2

md5 : inactive sdaf1[0](S)
      488382944 blocks super 1.2

md1 : inactive sdr1[3](S) sdp1[11](S) sdt1[5](S) sds1[4](S) sdy1[10](S) sdv1[7](S)
      1757761512 blocks super 1.2

md0 : inactive sde1[1](S) sdh1[4](S) sdm1[9](S) sdj1[6](S) sdd1[0](S) sdk1[7](S)
      860226027 blocks super 1.2

md3 : inactive sdab1[0](S)
      195357344 blocks super 1.2

md2 : active raid0 sdaa1[0] sdz1[1]
      62490672 blocks super 1.2 4k chunks

unused devices: <none>

---------------------------------------------------------------------------------------------------------------------------

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md126 : inactive sdaf1[0](S)
      488382944 blocks super 1.2

md127 : inactive sdae1[1](S)
      488382944 blocks super 1.2

md9 : active raid0 sdan1[0] sdao1[1]
      976765440 blocks super 1.2 256k chunks

md7 : active raid0 sdaj1[0] sdak1[1]
      976765888 blocks super 1.2 4k chunks

md4 : inactive sdad1[0](S)
      488382944 blocks super 1.2

md6 : active raid0 sdah1[0] sdai1[1]
      976765696 blocks super 1.2 128k chunks

md8 : active raid0 sdam1[1] sdal1[0]
      976765440 blocks super 1.2 256k chunks

md5 : inactive sdag1[1](S)
      488382944 blocks super 1.2

md0 : active raid6 sdc1[11] sdd1[0] sdh1[4] sdf1[2] sdm1[9] sde1[1] sdb1[10] sdg1[3] sdl1[8](S) sdj1[6] sdk1[7] sdi1[12](S)
      1146967040 blocks super 1.2 level 6, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

md1 : active raid5 sdq1[2] sdy1[10] sdv1[7] sdn1[0] sdt1[5] sdw1[8] sdp1[11](S) sdr1[3] sdu1[6] sdx1[9] sdo1[1] sds1[4]
      2929601120 blocks super 1.2 level 5, 16k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]

md3 : active raid1 sdac1[1] sdab1[0]
      195357272 blocks super 1.2 [2/2] [UU]

md2 : active raid0 sdz1[1] sdaa1[0]
      62490672 blocks super 1.2 4k chunks

unused devices: <none>

--------------------------------------------------------------------------------------------------------------------------

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdab1[0](S)
      195357344 blocks super 1.2

md4 : active raid0 sdad1[0] sdae1[1]
      976765888 blocks super 1.2 32k chunks

md7 : active raid0 sdak1[1] sdaj1[0]
      976765888 blocks super 1.2 4k chunks

md8 : active raid0 sdam1[1] sdal1[0]
      976765440 blocks super 1.2 256k chunks

md6 : active raid0 sdah1[0] sdai1[1]
      976765696 blocks super 1.2 128k chunks

md9 : active raid0 sdao1[1] sdan1[0]
      976765440 blocks super 1.2 256k chunks

md5 : active raid0 sdaf1[0] sdag1[1]
      976765440 blocks super 1.2 256k chunks

md1 : active raid5 sdy1[10] sdv1[7] sdu1[6] sds1[4] sdq1[2] sdp1[11](S) sdt1[5] sdo1[1] sdx1[9] sdr1[3] sdw1[8] sdn1[0]
      2929601120 blocks super 1.2 level 5, 16k chunk, algorithm 2 [11/11] [UUUUUUUUUUU]

md0 : active raid6 sdl1[8](S) sdd1[0] sdc1[11] sdg1[3] sdk1[7] sde1[1] sdm1[9] sdb1[10] sdi1[12](S) sdh1[4] sdf1[2] sdj1[6]
      1146967040 blocks super 1.2 level 6, 128k chunk, algorithm 2 [10/10] [UUUUUUUUUU]

md3 : inactive sdac1[1](S)
      195357344 blocks super 1.2

md2 : active raid0 sdz1[1] sdaa1[0]
      62490672 blocks super 1.2 4k chunks

unused devices: <none>

My mdadm.conf file is as follows:

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays

# This file was auto-generated on Sun, 13 Jul 2008 20:42:57 -0500
# by mkconf $Id$

Any insight would be greatly appreciated. This is a big problem as it
is now. Thank you very much in advance!

Best,
-Tommy
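A side note on the configuration quoted above: the auto-generated mdadm.conf
contains no ARRAY definitions, so every boot relies purely on superblock
scanning and HOMEHOST matching. Whether explicit ARRAY lines would have
avoided the md127-style names here is not established in this thread, but a
commonly used way to capture them on Debian/Ubuntu systems is sketched below
(the /etc/mdadm/mdadm.conf path and initramfs step are assumptions about a
stock Ubuntu layout, not something stated in the thread):

    # Append ARRAY lines describing the arrays as currently assembled
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf
    # Rebuild the initramfs so early boot uses the same definitions
    update-initramfs -u

The appended lines record each array's name and UUID, which at least makes
the intended /dev/mdN numbering explicit.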
* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-08  1:27 md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04 fibreraid
@ 2010-08-08  8:58 ` Neil Brown
  2010-08-08 14:26 ` fibreraid
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2010-08-08 8:58 UTC (permalink / raw)
  To: fibreraid@gmail.com; +Cc: linux-raid

On Sat, 7 Aug 2010 18:27:58 -0700
"fibreraid@gmail.com" <fibreraid@gmail.com> wrote:

> Hi all,
>
> I am facing a serious issue with md's on my Ubuntu 10.04 64-bit
> server. I am using mdadm 3.1.2. The system has 40 drives in it, and
> there are 10 md devices, which are a combination of RAID 0, 1, 5, 6,
> and 10 levels. The drives are connected via LSI SAS adapters in
> external SAS JBODs.
>
> When I boot the system, about 50% of the time, the md's will not come
> up correctly. Instead of md0-md9 being active, some or all will be
> inactive and there will be new md's like md127, md126, md125, etc.

Sounds like a locking problem - udev is calling "mdadm -I" on each device and
might call some in parallel. mdadm needs to serialise things to ensure this
sort of confusion doesn't happen.

It is possible that this is fixed in the just-released mdadm-3.1.3. If you
could test and see if it makes a difference, that would help a lot.

Thanks,
NeilBrown
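Neil's diagnosis above is about concurrency: udev may fire several
"mdadm -I" instances at once. Independent of any fix inside mdadm itself, a
generic way to serialise the udev-triggered calls is to wrap them in flock.
The rule below is only an illustrative sketch; the rule file name and lock
path are assumptions, not anything proposed in this thread:

    # /etc/udev/rules.d/85-mdadm-serialised.rules (hypothetical)
    # flock takes an exclusive lock on the named file, so only one
    # mdadm --incremental invocation runs at a time.
    SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
        RUN+="/usr/bin/flock /var/lock/mdadm-incremental /sbin/mdadm --incremental $env{DEVNAME}"

As the rest of the thread shows, upgrading to mdadm 3.1.3 made this kind of
external workaround unnecessary.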
* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-08  8:58 ` Neil Brown
@ 2010-08-08 14:26 ` fibreraid
  2010-08-09  9:00 ` fibreraid
  2010-08-09 11:00 ` Neil Brown
  0 siblings, 2 replies; 11+ messages in thread
From: fibreraid @ 2010-08-08 14:26 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Thank you Neil for the reply and the heads-up on 3.1.3. I will test that
immediately and report back my findings.

One potential issue I noticed is that Ubuntu Lucid's default kernel
configuration has CONFIG_MD_AUTODETECT enabled. I thought this feature
might conflict with udev, so I have attempted to disable it by adding a
kernel parameter via grub2: raid=noautodetect. But I am not sure whether
this is effective. Do you think this kernel setting could also be a
source of the problem?

Another method I was contemplating to avoid a potential locking issue
is to have udev's mdadm -I command run under watershed, which should in
theory serialize it. What do you think?

SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
        RUN+="watershed -i mdadm /sbin/mdadm --incremental $env{DEVNAME}"

Finally, in your view, is it essential that the underlying partitions
used in the md's be of the "Linux raid autodetect" type? My partitions
at the moment are just plain "Linux".

Anyway, I will test mdadm 3.1.3 right now, but I wanted to ask for your
insight/comments on the above. Thanks!

Best,
Tommy


On Sun, Aug 8, 2010 at 1:58 AM, Neil Brown <neilb@suse.de> wrote:
> Sounds like a locking problem - udev is calling "mdadm -I" on each device and
> might call some in parallel. mdadm needs to serialise things to ensure this
> sort of confusion doesn't happen.
>
> It is possible that this is fixed in the just-released mdadm-3.1.3. If you
> could test and see if it makes a difference, that would help a lot.
>
> Thanks,
> NeilBrown
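On the raid=noautodetect question raised above: whether the parameter
actually reached the kernel can be checked from the running system. These
are generic checks, not something suggested in the thread, and as Neil
notes in his reply below, the setting only matters if the partitions are of
the "Linux raid autodetect" type:

    # Confirm the parameter is present on the kernel command line
    cat /proc/cmdline
    # Look for in-kernel autodetect activity in the boot log
    dmesg | grep -i 'md: Autodetecting'

If the second command prints nothing, the kernel did not attempt its own
array autodetection on that boot.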
* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-08 14:26 ` fibreraid
@ 2010-08-09  9:00 ` fibreraid
  2010-08-09 10:51 ` Neil Brown
  2010-08-09 11:00 ` Neil Brown
  1 sibling, 1 reply; 11+ messages in thread
From: fibreraid @ 2010-08-09 9:00 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Hi Neil,

I tested out mdadm 3.1.3 on my configuration, and great news: problem
solved. After 30 reboots, all md's have come up correctly each and
every time. I did not have to use watershed for the mdadm -I command
either. Thanks for your recommendation!

Sincerely,
Tommy


On Sun, Aug 8, 2010 at 7:26 AM, fibreraid@gmail.com <fibreraid@gmail.com> wrote:
> Thank you Neil for the reply and the heads-up on 3.1.3. I will test that
> immediately and report back my findings.
* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-09  9:00 ` fibreraid
@ 2010-08-09 10:51 ` Neil Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Brown @ 2010-08-09 10:51 UTC (permalink / raw)
  To: fibreraid@gmail.com; +Cc: linux-raid

On Mon, 9 Aug 2010 02:00:05 -0700
"fibreraid@gmail.com" <fibreraid@gmail.com> wrote:

> Hi Neil,
>
> I tested out mdadm 3.1.3 on my configuration, and great news: problem
> solved. After 30 reboots, all md's have come up correctly each and
> every time. I did not have to use watershed for the mdadm -I command
> either. Thanks for your recommendation!

Thanks for the confirmation - I hoped that 3.1.3 would fix it but wasn't
completely confident.

Thanks,
NeilBrown
* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-08 14:26 ` fibreraid
  2010-08-09  9:00 ` fibreraid
@ 2010-08-09 11:00 ` Neil Brown
  2010-08-09 11:58 ` fibreraid
  1 sibling, 1 reply; 11+ messages in thread
From: Neil Brown @ 2010-08-09 11:00 UTC (permalink / raw)
  To: fibreraid@gmail.com; +Cc: linux-raid

On Sun, 8 Aug 2010 07:26:59 -0700
"fibreraid@gmail.com" <fibreraid@gmail.com> wrote:

> Thank you Neil for the reply and the heads-up on 3.1.3. I will test that
> immediately and report back my findings.
>
> One potential issue I noticed is that Ubuntu Lucid's default kernel
> configuration has CONFIG_MD_AUTODETECT enabled. I thought this feature
> might conflict with udev, so I have attempted to disable it by adding a
> kernel parameter via grub2: raid=noautodetect. But I am not sure whether
> this is effective. Do you think this kernel setting could also be a
> source of the problem?

If you don't use "Linux raid autodetect" partition types (which you say below
that you don't) this CONFIG setting will have no effect at all.

> Another method I was contemplating to avoid a potential locking issue
> is to have udev's mdadm -I command run under watershed, which should in
> theory serialize it. What do you think?
>
> SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
>         RUN+="watershed -i mdadm /sbin/mdadm --incremental $env{DEVNAME}"

I haven't come across watershed before. I couldn't easily find out much
about it on the web, so I cannot say what effect it would have. My guess
from what little I have read is 'none'.

> Finally, in your view, is it essential that the underlying partitions
> used in the md's be of the "Linux raid autodetect" type? My partitions
> at the moment are just plain "Linux".

I actually recommend "Non-FS data" (0xDA), as 'Linux' might make some tools
think there is a filesystem there even though there isn't. But 'Linux' is
mostly fine. I avoid "Linux raid autodetect" as it enables the MD_AUTODETECT
functionality, which I don't like.

NeilBrown
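Following up on the partition-type recommendation above: with MBR-style
partition tables, the type byte can be changed without touching the data in
the partition. The sketch below is illustrative only; the device and
partition number are placeholders, and the exact sfdisk option name has
varied between util-linux versions (--change-id was the older spelling,
--part-type the newer one):

    # Interactive: run fdisk /dev/sdX, press 't', pick the partition,
    # then enter type code 'da' (Non-FS data) and write with 'w'.
    # Non-interactive sketch:
    sfdisk --change-id /dev/sdX 1 da

After changing the type, the kernel needs to re-read the partition table
(for example via a reboot or partprobe) before tools see the new code.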
* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-09 11:00 ` Neil Brown
@ 2010-08-09 11:58 ` fibreraid
  2010-08-11  5:17 ` Dan Williams
  0 siblings, 1 reply; 11+ messages in thread
From: fibreraid @ 2010-08-09 11:58 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Hi Neil,

I may have spoken a bit too soon. It seems that while the md's themselves
are coming up successfully, on occasion the hot spares are not coming up
associated with their proper md's. As a result, what was a RAID 5 md with
one hot spare will occasionally come up as a RAID 5 md with no hot spare.

Any ideas on this one?

-Tommy


On Mon, Aug 9, 2010 at 4:00 AM, Neil Brown <neilb@suse.de> wrote:
> I actually recommend "Non-FS data" (0xDA), as 'Linux' might make some tools
> think there is a filesystem there even though there isn't. But 'Linux' is
> mostly fine. I avoid "Linux raid autodetect" as it enables the MD_AUTODETECT
> functionality, which I don't like.
>
> NeilBrown
* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-09 11:58           ` fibreraid
@ 2010-08-11  5:17             ` Dan Williams
  2010-08-12  1:43               ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Dan Williams @ 2010-08-11 5:17 UTC (permalink / raw)
To: fibreraid@gmail.com; +Cc: Neil Brown, linux-raid

On Mon, Aug 9, 2010 at 4:58 AM, fibreraid@gmail.com <fibreraid@gmail.com> wrote:
> Hi Neil,
>
> I may have spoken a bit too soon. It seems that while the md's are
> coming up successfully, on occasion, hot-spares are not coming up
> associated with their proper md's. As a result, what was a RAID 5 md
> with one hot-spare will on occasion come up as a RAID 5 md with no
> hot-spare.
>
> Any ideas on this one?

Is this new behavior only seen with 3.1.3, i.e. when it worked with
3.1.2 did the hot spares always arrive correctly? I suspect this is a
result of the new behavior of -I to not add devices to a running array
without the -R parameter, but you don't want to make this the default
for udev, otherwise your arrays will always come up degraded.

We could allow disks to be added to active non-degraded arrays, but
that still has the possibility of letting a stale device take the
place of a fresh hot spare (the whole point of changing the behavior
in the first place). So as far as I can see we need to query the
other disks in the active array and permit the disk to be re-added to
an active array when it is demonstrably a hot spare (or -R is
specified).

--
Dan

^ permalink raw reply	[flat|nested] 11+ messages in thread
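While the incremental-assembly behaviour is being sorted out, a spare that was left out at boot can be checked and re-attached by hand on the running system. The following is only a sketch of that workaround, using device names from the earlier /proc/mdstat listings (md1 with sdp1 as its spare); it is not the fix that follows in the thread:

  # check how the device describes itself in its v1.2 superblock;
  # a genuine spare should report a spare role rather than an active slot
  mdadm --examine /dev/sdp1

  # if it really is this array's spare, add it back to the running array
  mdadm /dev/md1 --add /dev/sdp1

  # confirm it now appears as a spare
  mdadm --detail /dev/md1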
* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-11  5:17             ` Dan Williams
@ 2010-08-12  1:43               ` Neil Brown
  2010-08-14 16:57                 ` fibreraid
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2010-08-12 1:43 UTC (permalink / raw)
To: Dan Williams; +Cc: fibreraid@gmail.com, linux-raid

On Tue, 10 Aug 2010 22:17:19 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> On Mon, Aug 9, 2010 at 4:58 AM, fibreraid@gmail.com <fibreraid@gmail.com> wrote:
> > Hi Neil,
> >
> > I may have spoken a bit too soon. It seems that while the md's are
> > coming up successfully, on occasion, hot-spares are not coming up
> > associated with their proper md's. As a result, what was a RAID 5 md
> > with one hot-spare will on occasion come up as a RAID 5 md with no
> > hot-spare.
> >
> > Any ideas on this one?
>
> Is this new behavior only seen with 3.1.3, i.e when it worked with
> 3.1.2 did the hot spares always arrive correctly? I suspect this is a
> result of the new behavior of -I to not add devices to a running array
> without the -R parameter, but you don't want to make this the default
> for udev otherwise your arrays will always come up degraded.
>
> We could allow disks to be added to active non-degraded arrays, but
> that still has the possibility of letting a stale device take the
> place of a fresh hot spare (the whole point of changing the behavior
> in the first place). So as far as I can see we need to query the
> other disks in the active array and permit the disk to be re-added to
> an active array when it is demonstrably a hot spare (or -R is
> specified).
>
> --
> Dan

Arg... another regression.

Thanks for the report and the analysis.

Here is the fix.

NeilBrown

From ef83fe7cba7355d3da330325e416747b0696baef Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 12 Aug 2010 11:41:41 +1000
Subject: [PATCH] Allow --incremental to add spares to an array.

Commit 3a6ec29ad56 stopped us from adding apparently-working devices
to an active array with --incremental as there is a good chance that they
are actually old/failed devices.

Unfortunately it also stopped spares from being added to an active
array, which is wrong. This patch refines the test to be more
careful.

Reported-by: <fibreraid@gmail.com>
Analysed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

diff --git a/Incremental.c b/Incremental.c
index e4b6196..4d3d181 100644
--- a/Incremental.c
+++ b/Incremental.c
@@ -370,14 +370,15 @@ int Incremental(char *devname, int verbose, int runstop,
 	else
 		strcpy(chosen_name, devnum2devname(mp->devnum));
 
-	/* It is generally not OK to add drives to a running array
-	 * as they are probably missing because they failed.
-	 * However if runstop is 1, then the array was possibly
-	 * started early and our best be is to add this anyway.
-	 * It would probably be good to allow explicit policy
-	 * statement about this.
+	/* It is generally not OK to add non-spare drives to a
+	 * running array as they are probably missing because
+	 * they failed.  However if runstop is 1, then the
+	 * array was possibly started early and our best be is
+	 * to add this anyway.  It would probably be good to
+	 * allow explicit policy statement about this.
 	 */
-	if (runstop < 1) {
+	if ((info.disk.state & (1<<MD_DISK_SYNC)) != 0
+	    && runstop < 1) {
 		int active = 0;
 
 		if (st->ss->external) {

^ permalink raw reply related	[flat|nested] 11+ messages in thread
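For anyone wanting to try the fix before it appears in a release, the patch above is in git format-patch form and can be applied to an mdadm source tree and rebuilt. A rough sketch; the saved-mail filename and the use of make install are assumptions for illustration, not instructions from the thread:

  # inside a checkout of the mdadm source (or an unpacked 3.1.3 tarball)
  cd mdadm

  # apply the patch: "git am" keeps the changelog on a git tree,
  # "patch -p1" works on a plain source tree
  git am /path/to/fix-incremental-spares.mbox
  # or: patch -p1 < /path/to/fix-incremental-spares.mbox

  make
  make install   # replaces the mdadm binary invoked by the udev incremental rules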
* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-12  1:43               ` Neil Brown
@ 2010-08-14 16:57                 ` fibreraid
  2010-08-16  4:45                   ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: fibreraid @ 2010-08-14 16:57 UTC (permalink / raw)
To: Neil Brown; +Cc: Dan Williams, linux-raid

Hi Neil and Dan,

This patch does seem to have fixed the issue for me.

Thanks!
-Tommy

^ permalink raw reply	[flat|nested] 11+ messages in thread
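A quick way to confirm this kind of fix after rebooting with the patched mdadm is to check that the spares are attached to their arrays again rather than left out. A sketch using md0 and md1 from the earlier /proc/mdstat listings; the device names are specific to that system:

  # spares show up with an (S) suffix next to their member entry
  grep -A1 '^md[01] ' /proc/mdstat

  # or ask mdadm directly; with the spare re-attached by the
  # incremental assembly path, "Spare Devices" should be non-zero
  mdadm --detail /dev/md1 | grep -E 'State|Spare Devices'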
* Re: md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04
  2010-08-14 16:57                 ` fibreraid
@ 2010-08-16  4:45                   ` Neil Brown
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Brown @ 2010-08-16 4:45 UTC (permalink / raw)
To: fibreraid@gmail.com; +Cc: Dan Williams, linux-raid

On Sat, 14 Aug 2010 09:57:01 -0700
"fibreraid@gmail.com" <fibreraid@gmail.com> wrote:

> Hi Neil and Dan,
>
> This patch does seem to have fixed the issue for me.

Thanks for the confirmation.

NeilBrown

^ permalink raw reply	[flat|nested] 11+ messages in thread
end of thread, other threads:[~2010-08-16  4:45 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-08  1:27 md's fail to assemble correctly consistently at system startup - mdadm 3.1.2 and Ubuntu 10.04 fibreraid
2010-08-08  8:58 ` Neil Brown
2010-08-08 14:26   ` fibreraid
2010-08-09  9:00     ` fibreraid
2010-08-09 10:51       ` Neil Brown
2010-08-09 11:00         ` Neil Brown
2010-08-09 11:58           ` fibreraid
2010-08-11  5:17             ` Dan Williams
2010-08-12  1:43               ` Neil Brown
2010-08-14 16:57                 ` fibreraid
2010-08-16  4:45                   ` Neil Brown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).