* Can't get md array to shut down cleanly
  From: Christian Pernegger @ 2006-07-06 13:48 UTC
  To: linux-raid

Still more problems ... :(

My md raid5 still does not always shut down cleanly. The last few
lines of the shutdown sequence are always as follows:

[...]
Will now halt.
md: stopping all md devices.
md: md0 still in use.
Synchronizing SCSI cache for disk /dev/sdd:
Synchronizing SCSI cache for disk /dev/sdc:
Synchronizing SCSI cache for disk /dev/sdb:
Synchronizing SCSI cache for disk /dev/sda:
Shutdown: hde
System halted.

Most of the time the md array comes up clean on the next boot, but often
enough it does not. Having the array rebuild after every other reboot is
not my idea of fun, because the only reason to take it down is to
exchange a failing disk.

Again, help appreciated - I don't dare put the system into "production"
like that.

Regards

C.
* Re: Can't get md array to shut down cleanly
  From: Niccolo Rigacci @ 2006-07-06 15:46 UTC
  To: Christian Pernegger; +Cc: linux-raid

> My md raid5 still does not always shut down cleanly. The last few
> lines of the shutdown sequence are always as follows:
>
> [...]
> Will now halt.
> md: stopping all md devices.
> md: md0 still in use.
> Synchronizing SCSI cache for disk /dev/sdd:
> Synchronizing SCSI cache for disk /dev/sdc:
> Synchronizing SCSI cache for disk /dev/sdb:
> Synchronizing SCSI cache for disk /dev/sda:
> Shutdown: hde
> System halted.

Maybe your shutdown script is doing "halt -h"? Could halting the disks
immediately, without letting the RAID settle to a clean state, be the
cause?

I see that my Debian avoids the -h option if running RAID, from
/etc/init.d/halt:

# Don't shut down drives if we're using RAID.
hddown="-h"
if grep -qs '^md.*active' /proc/mdstat
then
        hddown=""
fi

# If INIT_HALT=HALT don't poweroff.
poweroff="-p"
if [ "$INIT_HALT" = "HALT" ]
then
        poweroff=""
fi

log_action_msg "Will now halt"
sleep 1
halt -d -f -i $poweroff $hddown

--
Niccolo Rigacci
Firenze - Italy

Iraq, peace mission: 38839 dead - www.iraqbodycount.net
* Re: Can't get md array to shut down cleanly
  From: Christian Pernegger @ 2006-07-06 17:18 UTC
  To: linux-raid

> Maybe your shutdown script is doing "halt -h"? Could halting the disks
> immediately, without letting the RAID settle to a clean state, be the
> cause?

I'm using Debian as well and my halt script has the fragment you posted.
Besides, shouldn't the array already be marked clean at this point:

> md: stopping all md devices.

Apparently it isn't ... :

> md: md0 still in use.

If someone thinks it might make a difference I could remove everything
evms and create a "pure" md array with mdadm. (Directly on the disks
or on partitions? Which partition type?)

How does a "normal" shutdown look?

Will try 2.6.16 and 2.6.15 now ... the boring part is that I have to
wait for the resync to complete before the next test ...

Thank you,

C.
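(For reference, a "pure" md array without a volume manager in between is
normally created with mdadm alone; both whole disks and partitions work,
and when partitions are used the conventional type is 0xfd, "Linux raid
autodetect". A minimal sketch only -- the device names are taken from
later in this thread and are not a recommendation:

  # create a 4-disk RAID5 directly on the member devices
  mdadm --create /dev/md0 --level=5 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde
)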
* Re: Can't get md array to shut down cleanly
  From: thunder7 @ 2006-07-06 19:29 UTC
  To: linux-raid

From: Christian Pernegger <pernegger@gmail.com>
Date: Thu, Jul 06, 2006 at 07:18:06PM +0200
> > Maybe your shutdown script is doing "halt -h"? Could halting the disks
> > immediately, without letting the RAID settle to a clean state, be the
> > cause?
>
> I'm using Debian as well and my halt script has the fragment you posted.
> Besides, shouldn't the array already be marked clean at this point:
>
> > md: stopping all md devices.
>
> Apparently it isn't ... :
>
> > md: md0 still in use.
>
> If someone thinks it might make a difference I could remove everything
> evms and create a "pure" md array with mdadm. (Directly on the disks
> or on partitions? Which partition type?)
>
> How does a "normal" shutdown look?
>
> Will try 2.6.16 and 2.6.15 now ... the boring part is that I have to
> wait for the resync to complete before the next test ...
>
I get these messages too on Debian Unstable, but since enabling the
bitmaps on my devices, resyncing is so fast that I don't even notice it
on booting. Waiting for a resync is not happening here. I'm seeing it on
my raid-1 root partition.

Good luck,
Jurriaan
--
Debian (Unstable) GNU/Linux 2.6.17-rc4-mm3 2815 bogomips load 2.02
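(A write-intent bitmap can usually be added to an existing array with
mdadm's --grow mode, so that after an unclean shutdown only the regions
marked dirty in the bitmap are resynced. A sketch, assuming the md0 from
this thread:

  # add an internal write-intent bitmap to a running array
  mdadm --grow /dev/md0 --bitmap=internal
)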
* Re: Can't get md array to shut down cleanly
  From: Christian Pernegger @ 2006-07-06 20:24 UTC
  To: linux-raid

> I get these messages too on Debian Unstable, but since enabling the
> bitmaps on my devices, resyncing is so fast that I don't even notice it
> on booting.

Bitmaps are great, but the speed of the rebuild is not the problem. The
box doesn't have hotswap bays, so I have to shut it down to replace a
failed disk. If the array decides that it wasn't clean after the
exchange, I'm suddenly looking at a dead array. Yes, forcing assembly
_should_ work, but I'd rather have it shut down cleanly in the first
place.

Regards,

C.
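(Forced assembly of an array that was marked unclean would typically look
like the following -- a sketch only, listing the surviving member disks;
this is exactly the step the poster would rather not have to rely on:

  # assemble a degraded / unclean array anyway
  mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd
)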
* Re: Can't get md array to shut down cleanly
  From: Neil Brown @ 2006-07-06 22:13 UTC
  To: Christian Pernegger; +Cc: linux-raid

On Thursday July 6, pernegger@gmail.com wrote:
> Still more problems ... :(
>
> My md raid5 still does not always shut down cleanly. The last few
> lines of the shutdown sequence are always as follows:
>
> [...]
> Will now halt.
> md: stopping all md devices.
> md: md0 still in use.
> Synchronizing SCSI cache for disk /dev/sdd:
> Synchronizing SCSI cache for disk /dev/sdc:
> Synchronizing SCSI cache for disk /dev/sdb:
> Synchronizing SCSI cache for disk /dev/sda:
> Shutdown: hde
> System halted.
>
> Most of the time the md array comes up clean on the next boot, but often
> enough it does not. Having the array rebuild after every other reboot is
> not my idea of fun, because the only reason to take it down is to
> exchange a failing disk.
>

How are you shutting down the machine? Is something sending SIGKILL
to all processes? If so, then md really should shut down cleanly
every time....

That said, I do see some room for improvement in the md shutdown
sequence - it shouldn't give up at that point just because the device
seems to be in use.... I'll look into that.

You could try the following patch. I think it should be safe.

NeilBrown

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-07-07 08:11:43.000000000 +1000
+++ ./drivers/md/md.c	2006-07-07 08:12:15.000000000 +1000
@@ -3217,7 +3217,7 @@ static int do_md_stop(mddev_t * mddev, i
 	struct gendisk *disk = mddev->gendisk;

 	if (mddev->pers) {
-		if (atomic_read(&mddev->active)>2) {
+		if (mode != 1 && atomic_read(&mddev->active)>2) {
 			printk("md: %s still in use.\n",mdname(mddev));
 			return -EBUSY;
 		}
* Re: Can't get md array to shut down cleanly
  From: Christian Pernegger @ 2006-07-07 1:07 UTC
  To: linux-raid

> How are you shutting down the machine? Is something sending SIGKILL
> to all processes?

First SIGTERM, then SIGKILL, yes.

> You could try the following patch. I think it should be safe.

Hmm, it said the chunk failed, so I replaced the line by hand. That
didn't want to compile because "mode" supposedly wasn't defined ... was
that supposed to be "mddev->safemode"? Closest thing to a mode I could
find ...

Anyway, this is much better: (lines with * are new)

Done unmounting local file systems.
*md: md0 stopped
*md: unbind <sdf>
*md: export_rdev<sdf>
*[last two lines for each disk.]
*Stopping RAID arrays ... done (1 array(s) stopped).
Mounting root filesystem read-only ... done
Will now halt.
md: stopping all md devices
* md: md0 switched to read-only mode
Synchronizing SCSI cache for disk /dev/sdf:
[...]

As you can see the error message is gone now. Much more interesting
are the lines before the "Will now halt." line. Those were not there
before -- apparently this first attempt by whatever to shut down the
array failed silently.

Not sure if this actually fixes the resync problem (I sure hope so;
after the last of these no fs could be found anymore on the device),
but it's 5 past 3 already, will try tomorrow.

Thanks,

C.
* Re: Can't get md array to shut down cleanly
  From: Neil Brown @ 2006-07-07 7:53 UTC
  To: Christian Pernegger; +Cc: linux-raid

On Friday July 7, pernegger@gmail.com wrote:
> > How are you shutting down the machine? Is something sending SIGKILL
> > to all processes?
>
> First SIGTERM, then SIGKILL, yes.
>

That really should cause the array to be clean. Once the md thread
gets SIGKILL (it ignores SIGTERM) it will mark the array as 'clean'
the moment there are no pending writes.

> > You could try the following patch. I think it should be safe.
>
> Hmm, it said the chunk failed, so I replaced the line by hand. That
> didn't want to compile because "mode" supposedly wasn't defined ... was
> that supposed to be "mddev->safemode"? Closest thing to a mode I could
> find ...

That patch was against latest -mm.... For earlier kernels you want to
test 'ro'.

	if (!ro && atomic_read(&mddev->active)>2) {
		printk("md: %s still in use.\n" ....

> Anyway, this is much better: (lines with * are new)
>
> Done unmounting local file systems.
> *md: md0 stopped
> *md: unbind <sdf>
> *md: export_rdev<sdf>
> *[last two lines for each disk.]
> *Stopping RAID arrays ... done (1 array(s) stopped).
> Mounting root filesystem read-only ... done

That isn't good. You've stopped the array before the filesystem is
read-only. Switching to read-only could cause a write, which won't work
as the array doesn't exist any more...

NeilBrown

> Will now halt.
> md: stopping all md devices
> * md: md0 switched to read-only mode
> Synchronizing SCSI cache for disk /dev/sdf:
> [...]
>
> As you can see the error message is gone now. Much more interesting
> are the lines before the "Will now halt." line. Those were not there
> before -- apparently this first attempt by whatever to shut down the
> array failed silently.
>
> Not sure if this actually fixes the resync problem (I sure hope so;
> after the last of these no fs could be found anymore on the device),
> but it's 5 past 3 already, will try tomorrow.
>
> Thanks,
>
> C.
* Re: Can't get md array to shut down cleanly
  From: Christian Pernegger @ 2006-07-07 9:25 UTC
  To: Neil Brown; +Cc: linux-raid

Good morning!

> That patch was against latest -mm.... For earlier kernels you want to
> test 'ro'.

Ok. Was using stock 2.6.17.

> > Done unmounting local file systems.
> > *md: md0 stopped
> > *md: unbind <sdf>
> > *md: export_rdev<sdf>
> > *[last two lines for each disk.]
> > *Stopping RAID arrays ... done (1 array(s) stopped).
> > Mounting root filesystem read-only ... done
>
> That isn't good. You've stopped the array before the filesystem is
> read-only. Switching to read-only could cause a write, which won't work
> as the array doesn't exist any more...

I don't have root on the md, just a regular fs, which is unmounted just
before that first line above.

> That really should cause the array to be clean. Once the md thread
> gets SIGKILL (it ignores SIGTERM) it will mark the array as 'clean'
> the moment there are no pending writes.

After digging a little deeper it seems that the md thread(s) might not
get their SIGKILL after all. The relevant portion from S20sendsigs is as
follows:

do_stop () {
        # Kill all processes.
        log_action_begin_msg "Sending all processes the TERM signal"
        killall5 -15
        log_action_end_msg 0
        sleep 5
        log_action_begin_msg "Sending all processes the KILL signal"
        killall5 -9
        log_action_end_msg 0
}

Apparently killall5 excludes kernel threads. I tried regular killall,
but that kills the shutdown script as well :) What do other distros use?
I could file a bug, but I highly doubt it would be seen as such.

S40umountfs unmounts non-root filesystems.
S50mdadm-raid tries to stop arrays (and maybe succeeds, with the patch)
via mdadm --stop.
S90halt halts the machine.

I'd really feel better if I didn't have to rely on userspace at all to
shut down my arrays, though. At least for people with root-on-RAID the
shutdown just before halt / reboot will have to work anyway.

Any idea what could keep mddev->active above 2?

Happy to help with bug hunting -- I can't use the box properly anyway
until I can be sure this is solved.

Thanks,

C.
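(For illustration, the userspace stop that S50mdadm-raid attempts boils
down to something like this -- a sketch, not the exact Debian script of
that era; an array still held open is what produces the -EBUSY / "still
in use" path in do_md_stop():

  # stop every array mdadm can find
  mdadm --stop --scan

  # or stop a single array by name
  mdadm --stop /dev/md0
)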
* Re: Can't get md array to shut down cleanly
  From: Christian Pernegger @ 2006-07-07 20:06 UTC
  To: linux-raid

It seems like it really isn't an md issue -- when I remove everything to
do with evms (userspace tools + initrd hooks) everything works fine.

I took your patch back out and put a few printks in there ...

Without evms the "active" counter is 1 in an "idle" state, i. e. after
the box has finished booting. With evms the counter is 2 in an "idle"
state, and always one higher. Directly before any attempt to shut down
the array the counter is 3 with evms (thus the error) but only 2 without
it.

I don't know if evms is buggy and fails to put back a reference, or if
the +1 increase in the active counter is legit and md.c needs a better
check than just "active needs to be below 3".

Longish dmesg excerpt follows, maybe someone can pinpoint the cause and
decide what needs to be done.

md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
md: linear personality registered for level -1
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
md: raid10 personality registered for level 10
raid5: automatically using best checksumming function: generic_sse
   generic_sse: 4566.000 MB/sec
raid5: using function: generic_sse (4566.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid6: int64x1   1331 MB/s
raid6: int64x2   1650 MB/s
raid6: int64x4   2018 MB/s
raid6: int64x8   1671 MB/s
raid6: sse2x1    2208 MB/s
raid6: sse2x2    3104 MB/s
raid6: sse2x4    2806 MB/s
raid6: using algorithm sse2x2 (3104 MB/s)
md: raid6 personality registered for level 6
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: bind<sdb>
md: REF DOWN: 1
md: REF UP: 2
md: bind<sdc>
md: REF DOWN: 1
md: REF UP: 2
md: bind<sdd>
md: REF DOWN: 1
md: REF UP: 2
md: bind<sde>
md: REF DOWN: 1
md: REF UP: 2
md: REF UP: 3
md: REF DOWN: 2
raid5: device sdd operational as raid disk 2
raid5: device sdc operational as raid disk 1
raid5: device sdb operational as raid disk 0
raid5: allocated 4262kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:3 fd:1
 disk 0, o:1, dev:sdb
 disk 1, o:1, dev:sdc
 disk 2, o:1, dev:sdd
md0: bitmap initialized from disk: read 15/15 pages, set 0 bits, status: 0
created bitmap (233 pages) for device md0
RAID5 conf printout:
md: REF DOWN: 1
 --- rd:4 wd:3 fd:1
 disk 0, o:1, dev:sdb
 disk 1, o:1, dev:sdc
 disk 2, o:1, dev:sdd
 disk 3, o:1, dev:sde
md: REF UP: 2
md: REF UP: 3
md: REF DOWN: 2
md: REF DOWN: 1
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 488386432 blocks.
md: REF UP: 2
md: REF DOWN: 1

*** [up to here everything is fine, but the counter never again drops to 1 afterwards] ***

md: REF UP: 2
Attempting manual resume
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
md: REF UP: 3
hw_random: RNG not detected
md: REF DOWN: 2
Adding 4000176k swap on /dev/evms/sda2.  Priority:-1 extents:1 across:4000176k
EXT3 FS on dm-0, internal journal
md: REF UP: 3
md: REF DOWN: 2

*** [last two lines repeated fairly often, but more like excessive polling than an infinite error loop] ***

Regards,

C.
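(The "md: REF UP" / "md: REF DOWN" lines are not standard kernel output,
and the mail doesn't show where the printks were added. A plausible
reconstruction -- an assumption, not the actual diff -- is instrumentation
of the reference-count helpers in drivers/md/md.c:

static inline mddev_t *mddev_get(mddev_t *mddev)
{
	atomic_inc(&mddev->active);
	/* trace every reference taken on the array */
	printk("md: REF UP: %d\n", atomic_read(&mddev->active));
	return mddev;
}

static void mddev_put(mddev_t *mddev)
{
	/* trace every reference dropped, logging the post-decrement value */
	printk("md: REF DOWN: %d\n", atomic_read(&mddev->active) - 1);
	if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
		return;
	/* ... original cleanup: free the mddev once it is truly unused ... */
	spin_unlock(&all_mddevs_lock);
}
)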
* Re: Can't get md array to shut down cleanly
  From: Christian Pernegger @ 2006-07-10 15:27 UTC
  To: linux-raid; +Cc: neilb

Nope, EVMS is not the culprit.

I installed the test system from scratch, EVMS nowhere in sight -- it
now boots successfully from a partitionable md array, courtesy of a
yaird-generated initrd I adapted for the purpose. Yay!

Or not. I get the "md: md_d0 still in use." error again :(

This is with Debian's 2.6.15, i. e. without the above patch of course.

The only thing that this configuration and my earlier EVMS one have in
common is that they start the array via an initrd. It wasn't even a
similar initrd, first initramfs-tools and now yaird ...

Could this be the source of the problem, Neil? Something breaks due to
the root-fs switching initrds do?

C.