linux-raid.vger.kernel.org archive mirror
* Can't get md array to shut down cleanly
@ 2006-07-06 13:48 Christian Pernegger
  2006-07-06 15:46 ` Niccolo Rigacci
  2006-07-06 22:13 ` Neil Brown
  0 siblings, 2 replies; 11+ messages in thread
From: Christian Pernegger @ 2006-07-06 13:48 UTC (permalink / raw)
  To: linux-raid

Still more problems ... :(

My md raid5 still does not always shut down cleanly. The last few
lines of the shutdown sequence are always as follows:

[...]
Will now halt.
md: stopping all md devices.
md: md0 still in use.
Synchronizing SCSI cache for disk /dev/sdd:
Synchronizing SCSI cache for disk /dev/sdc:
Synchronizing SCSI cache for disk /dev/sdb:
Synchronizing SCSI cache for disk /dev/sda:
Shutdown: hde
System halted.

Most of the time the md array comes up clean on the next boot, but
often enough it does not. Having the array rebuild after every other reboot
is not my idea of fun, because the only reason to take it down is to
exchange a failing disk.

Again, help appreciated - I don't dare put the system into
"production" like that.

Regards

C.


* Re: Can't get md array to shut down cleanly
  2006-07-06 13:48 Can't get md array to shut down cleanly Christian Pernegger
@ 2006-07-06 15:46 ` Niccolo Rigacci
  2006-07-06 17:18   ` Christian Pernegger
  2006-07-06 22:13 ` Neil Brown
  1 sibling, 1 reply; 11+ messages in thread
From: Niccolo Rigacci @ 2006-07-06 15:46 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-raid

> My md raid5 still does not always shut down cleanly. The last few
> lines of the shutdown sequence are always as follows:
> 
> [...]
> Will now halt.
> md: stopping all md devices.
> md: md0 still in use.
> Synchronizing SCSI cache for disk /dev/sdd:
> Synchronizing SCSI cache for disk /dev/sdc:
> Synchronizing SCSI cache for disk /dev/sdb:
> Synchronizing SCSI cache for disk /dev/sda:
> Shutdown: hde
> System halted.


Maybe your shutdown script is doing "halt -h"? Could halting the disks
immediately, without letting the RAID settle to a clean state, be the
cause?

I see that my Debian avoids the -h option when RAID is running; this is
from /etc/init.d/halt:


        # Don't shut down drives if we're using RAID.
        hddown="-h"
        if grep -qs '^md.*active' /proc/mdstat
        then
                hddown=""
        fi

        # If INIT_HALT=HALT don't poweroff.
        poweroff="-p"
        if [ "$INIT_HALT" = "HALT" ]
        then
                poweroff=""
        fi

        log_action_msg "Will now halt"
        sleep 1
        halt -d -f -i $poweroff $hddown
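
If you want to see which branch your own box would take, you can run
the same test by hand (this is just the grep from the fragment above):

        if grep -qs '^md.*active' /proc/mdstat; then
                echo "RAID active: halt will be called without -h"
        else
                echo "no active RAID: halt will be called with -h"
        fi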


-- 
Niccolo Rigacci
Firenze - Italy

Iraq, peace mission: 38839 dead - www.iraqbodycount.net


* Re: Can't get md array to shut down cleanly
  2006-07-06 15:46 ` Niccolo Rigacci
@ 2006-07-06 17:18   ` Christian Pernegger
  2006-07-06 19:29     ` thunder7
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Pernegger @ 2006-07-06 17:18 UTC (permalink / raw)
  To: linux-raid

> Maybe your shutdown script is doing "halt -h"? Could halting the disks
> immediately, without letting the RAID settle to a clean state, be the
> cause?

I'm using Debian as well and my halt script has the fragment you posted.
Besides, shouldn't the array be marked clean at this point:

> md: stopping all md devices.

Apparently it isn't ... :

> md: md0 still in use.

If someone thinks it might make a difference I could remove everything
evms-related and create a "pure" md array with mdadm. (Directly on the
disks or on partitions? Which partition type?)
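
The plain-mdadm setup I have in mind would be something like the
following; member names are just placeholders, and for partitions I'd
presumably use type 0xfd ("Linux raid autodetect"):

        mdadm --create /dev/md0 --level=5 --raid-devices=4 \
                /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1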

How does a "normal" shutdown look?

Will try 2.6.16 and 2.6.15 now ... the boring part is that I have to
wait for the resync to complete before the next test ...

Thank you,

C.


* Re: Can't get md array to shut down cleanly
  2006-07-06 17:18   ` Christian Pernegger
@ 2006-07-06 19:29     ` thunder7
  2006-07-06 20:24       ` Christian Pernegger
  0 siblings, 1 reply; 11+ messages in thread
From: thunder7 @ 2006-07-06 19:29 UTC (permalink / raw)
  To: linux-raid

From: Christian Pernegger <pernegger@gmail.com>
Date: Thu, Jul 06, 2006 at 07:18:06PM +0200
> >Maybe your shutdown script is doing "halt -h"? Could halting the disks
> >immediately, without letting the RAID settle to a clean state, be the
> >cause?
> 
> I'm using Debian as well and my halt script has the fragment you posted.
> Besides, shouldn't the array be marked clean at this point:
> 
> >md: stopping all md devices.
> 
> Apparently it isn't ... :
> 
> >md: md0 still in use.
> 
> If someone thinks it might make a difference I could remove everything
> evms-related and create a "pure" md array with mdadm. (Directly on the
> disks or on partitions? Which partition type?)
> 
> How does a "normal" shutdown look?
> 
> Will try 2.6.16 and 2.6.15 now ... the boring part is that I have to
> wait for the resync to complete before the next test ...
> 
I get these messages too on Debian Unstable, but since enabling the
bitmaps on my devices, resyncing is so fast that I don't even notice it
on booting. There is no waiting for resync here. I'm seeing it on
my raid-1 root partition.
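
If I remember right, adding an internal write-intent bitmap to an
existing array is a one-liner (adjust the device name, of course):

    mdadm --grow --bitmap=internal /dev/md0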

Good luck,
Jurriaan
-- 
Debian (Unstable) GNU/Linux 2.6.17-rc4-mm3 2815 bogomips load 2.02


* Re: Can't get md array to shut down cleanly
  2006-07-06 19:29     ` thunder7
@ 2006-07-06 20:24       ` Christian Pernegger
  0 siblings, 0 replies; 11+ messages in thread
From: Christian Pernegger @ 2006-07-06 20:24 UTC (permalink / raw)
  To: linux-raid

> I get these messages too on Debian Unstable, but since enabling the
> bitmaps on my devices, resyncing is so fast that I don't even notice it
> on booting.

Bitmaps are great, but the speed of the rebuild is not the problem.
The box doesn't have hotswap bays, so I have to shut it down to
replace a failed disk. If the array decides that it wasn't clean after
the exchange, I'm suddenly looking at a dead array. Yes, forcing
assembly _should_ work, but I'd rather have it shut down cleanly in the
first place.
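
(By forcing assembly I mean something along the lines of

    mdadm --assemble --force /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde

with the member devices substituted as appropriate.)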

Regards,

C.


* Re: Can't get md array to shut down cleanly
  2006-07-06 13:48 Can't get md array to shut down cleanly Christian Pernegger
  2006-07-06 15:46 ` Niccolo Rigacci
@ 2006-07-06 22:13 ` Neil Brown
  2006-07-07  1:07   ` Christian Pernegger
  1 sibling, 1 reply; 11+ messages in thread
From: Neil Brown @ 2006-07-06 22:13 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-raid

On Thursday July 6, pernegger@gmail.com wrote:
> Still more problems ... :(
> 
> My md raid5 still does not always shut down cleanly. The last few
> lines of the shutdown sequence are always as follows:
> 
> [...]
> Will now halt.
> md: stopping all md devices.
> md: md0 still in use.
> Synchronizing SCSI cache for disk /dev/sdd:
> Synchronizing SCSI cache for disk /dev/sdc:
> Synchronizing SCSI cache for disk /dev/sdb:
> Synchronizing SCSI cache for disk /dev/sda:
> Shutdown: hde
> System halted.
> 
> Most of the time the md array comes up clean on the next boot, but
> often enough it does not. Having the array rebuild after every other reboot
> is not my idea of fun, because the only reason to take it down is to
> exchange a failing disk.
> 

How are you shutting down the machine?  Is something sending SIGKILL
to all processes?  If it is, then md really should shut down cleanly
every time....

That said, I do see some room for improvement in the md shutdown
sequence - it shouldn't give up at that point just because the device
seems to be in use....  I'll look into that.
You could try the following patch.  I think it should be safe.

NeilBrown

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-07-07 08:11:43.000000000 +1000
+++ ./drivers/md/md.c	2006-07-07 08:12:15.000000000 +1000
@@ -3217,7 +3217,7 @@ static int do_md_stop(mddev_t * mddev, i
 	struct gendisk *disk = mddev->gendisk;
 
 	if (mddev->pers) {
-		if (atomic_read(&mddev->active)>2) {
+		if (mode != 1 && atomic_read(&mddev->active)>2) {
 			printk("md: %s still in use.\n",mdname(mddev));
 			return -EBUSY;
 		}


* Re: Can't get md array to shut down cleanly
  2006-07-06 22:13 ` Neil Brown
@ 2006-07-07  1:07   ` Christian Pernegger
  2006-07-07  7:53     ` Neil Brown
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Pernegger @ 2006-07-07  1:07 UTC (permalink / raw)
  To: linux-raid

> How are you shutting down the machine?  Is something sending SIGKILL
> to all processes?

First SIGTERM, then SIGKILL, yes.

> You could try the following patch.  I think it should be safe.

Hmm, it said the hunk failed, so I replaced the line by hand. That didn't
want to compile because "mode" supposedly wasn't defined ... was that
supposed to be "mddev->safemode"? Closest thing to a mode I could find
...

Anyway, this is much better: (lines with * are new)

Done unmounting local file systems.
*md: md0 stopped
*md: unbind <sdf>
*md: export_rdev<sdf>
*[last two lines for each disk.]
*Stopping RAID arrays ... done (1 array(s) stopped).
Mounting root filesystem read-only ... done
Will now halt.
md: stopping all md devices
* md: md0 switched to read-only mode
Synchronizing SCSI cache for disk /dev/sdf:
[...]

As you can see, the error message is gone now. Much more interesting
are the lines before the "Will now halt." line. Those were not there
before -- apparently this first attempt (by whatever makes it) to shut
down the array used to fail silently.

Not sure if this actually fixes the resync problem (I sure hope so,
after the last of these no fs could be found anymore on the device)
but it's 5 past 3 already, will try tomorrow.

Thanks,

C.


* Re: Can't get md array to shut down cleanly
  2006-07-07  1:07   ` Christian Pernegger
@ 2006-07-07  7:53     ` Neil Brown
  2006-07-07  9:25       ` Christian Pernegger
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Brown @ 2006-07-07  7:53 UTC (permalink / raw)
  To: Christian Pernegger; +Cc: linux-raid

On Friday July 7, pernegger@gmail.com wrote:
> > How are you shutting down the machine?  Is something sending SIGKILL
> > to all processes?
> 
> First SIGTERM, then SIGKILL, yes.
> 

That really should cause the array to be clean.  Once the md thread
gets SIGKILL (it ignores SIGTERM) it will mark the array as 'clean'
the moment there are no pending writes.

> > You could try the following patch.  I think it should be safe.
> 
> Hmm, it said the hunk failed, so I replaced the line by hand. That didn't
> want to compile because "mode" supposedly wasn't defined ... was that
> supposed to be "mddev->safemode"? Closest thing to a mode I could find
> ...

That patch was against latest -mm.... For earlier kernels you want to
test 'ro'.

   if (!ro && atomic_read(&mddev->active)>2) {
          printk("md: %s still in use.\n" ....


> Anyway, this is much better: (lines with * are new)
> 
> Done unmounting local file systems.
> *md: md0 stopped
> *md: unbind <sdf>
> *md: export_rdev<sdf>
> *[last two lines for each disk.]
> *Stopping RAID arrays ... done (1 array(s) stopped).
> Mounting root filesystem read-only ... done

That isn't good. You've stopped the array before the filesystem is
readonly.  Switching to readonly could cause a write which won't work
as the array doesn't exist any more...

NeilBrown


> Will now halt.
> md: stopping all md devices
> * md: md0 switched to read-only mode
> Synchronizing SCSI cache for disk /dev/sdf:
> [...]
> 
> As you can see, the error message is gone now. Much more interesting
> are the lines before the "Will now halt." line. Those were not there
> before -- apparently this first attempt (by whatever makes it) to shut
> down the array used to fail silently.
> 
> Not sure if this actually fixes the resync problem (I sure hope so,
> after the last of these no fs could be found anymore on the device)
> but it's 5 past 3 already, will try tomorrow.
> 
> Thanks,
> 
> C.


* Re: Can't get md array to shut down cleanly
  2006-07-07  7:53     ` Neil Brown
@ 2006-07-07  9:25       ` Christian Pernegger
  2006-07-07 20:06         ` Christian Pernegger
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Pernegger @ 2006-07-07  9:25 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

Good morning!

> That patch was against latest -mm.... For earlier kernels you want to
> test 'ro'.

Ok. Was using stock 2.6.17.

> > Done unmounting local file systems.
> > *md: md0 stopped
> > *md: unbind <sdf>
> > *md: export_rdev<sdf>
> > *[last two lines for each disk.]
> > *Stopping RAID arrays ... done (1 array(s) stopped).
> > Mounting root filesystem read-only ... done
>
> That isn't good. You've stopped the array before the filesystem is
> readonly.  Switching to readonly could cause a write which won't work
> as the array doesn't exist any more...

I don't have root on the md, just a regular fs, which is unmounted
just before that first line above.

> That really should cause the array to be clean.  Once the md thread
> gets SIGKILL (it ignores SIGTERM) it will mark the array as 'clean'
> the moment there are no pending writes.

After digging a little deeper it seems that the md thread(s) might not
get their SIGKILL after all. The relevant portion from S20sendsigs is
as follows:

do_stop () {
        # Kill all processes.
        log_action_begin_msg "Sending all processes the TERM signal"
        killall5 -15
        log_action_end_msg 0
        sleep 5
        log_action_begin_msg "Sending all processes the KILL signal"
        killall5 -9
        log_action_end_msg 0
}

Apparently killall5 excludes kernel threads. I tried regular killall
but that kills the shutdown script as well :) What do other distros
use? I could file a bug but I highly doubt it would be seen as such.

S40umountfs unmounts non-root filesystems.
S50mdadm-raid tries to stop the arrays via mdadm --stop (and maybe
succeeds, with the patch); see the sketch below.
S90halt halts the machine.
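
As far as I can tell the stop step boils down to something like this
(I haven't traced the Debian script in detail, so take it as a sketch):

    mdadm --stop --scan        # or, per array: mdadm --stop /dev/md0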

I'd really feel better if I didn't have to rely on userspace at all to
shut down my arrays, though. At least for people with root-on-RAID the
shutdown just before halt / reboot will have to work, anyway.

Any idea what could keep the mddev->active above 2?
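
If it helps I can also check from userspace what still holds the device
open, e.g. with something like this (assuming sysfs exposes a holders
directory for md0 on this kernel):

    ls /sys/block/md0/holders/    # block devices stacked on top of md0
    grep md0 /proc/mounts         # filesystems still mounted from it
    dmsetup ls                    # device-mapper (evms) maps, if installed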

Happy to help with bug hunting -- I can't use the box properly anyway
until I can be sure this is solved.

Thanks,

C.


* Re: Can't get md array to shut down cleanly
  2006-07-07  9:25       ` Christian Pernegger
@ 2006-07-07 20:06         ` Christian Pernegger
  2006-07-10 15:27           ` Christian Pernegger
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Pernegger @ 2006-07-07 20:06 UTC (permalink / raw)
  To: linux-raid

It seems like it really isn't an md issue -- when I remove everything
to do with evms (userspace tools + initrd hooks) everything works
fine.

I took your patch back out and put a few printks in there ...
Without evms the "active" counter is 1 in an "idle" state, i.e. after
the box has finished booting. With evms the counter is 2 in the same
state; in general it is always one higher.

Directly before any attempt to shut down the array the counter is 3
with evms (thus the error) but only 2 without it.

I don't know if evms is buggy and fails to put back a reference, or if
the +1 increase in the active counter is legitimate and md.c needs a
better check than just "active needs to be below 3".

A longish dmesg excerpt follows; maybe someone can pinpoint the cause
and decide what needs to be done.

md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
md: linear personality registered for level -1
md: raid0 personality registered for level 0
md: raid1 personality registered for level 1
md: raid10 personality registered for level 10
raid5: automatically using best checksumming function: generic_sse
  generic_sse:  4566.000 MB/sec
raid5: using function: generic_sse (4566.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
raid6: int64x1   1331 MB/s
raid6: int64x2   1650 MB/s
raid6: int64x4   2018 MB/s
raid6: int64x8   1671 MB/s
raid6: sse2x1    2208 MB/s
raid6: sse2x2    3104 MB/s
raid6: sse2x4    2806 MB/s
raid6: using algorithm sse2x2 (3104 MB/s)
md: raid6 personality registered for level 6
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: REF DOWN: 1
md: REF UP: 2
md: bind<sdb>
md: REF DOWN: 1
md: REF UP: 2
md: bind<sdc>
md: REF DOWN: 1
md: REF UP: 2
md: bind<sdd>
md: REF DOWN: 1
md: REF UP: 2
md: bind<sde>
md: REF DOWN: 1
md: REF UP: 2
md: REF UP: 3
md: REF DOWN: 2
raid5: device sdd operational as raid disk 2
raid5: device sdc operational as raid disk 1
raid5: device sdb operational as raid disk 0
raid5: allocated 4262kB for md0
raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
RAID5 conf printout:
 --- rd:4 wd:3 fd:1
 disk 0, o:1, dev:sdb
 disk 1, o:1, dev:sdc
 disk 2, o:1, dev:sdd
md0: bitmap initialized from disk: read 15/15 pages, set 0 bits, status: 0
created bitmap (233 pages) for device md0
RAID5 conf printout:
md: REF DOWN: 1
 --- rd:4 wd:3 fd:1
 disk 0, o:1, dev:sdb
 disk 1, o:1, dev:sdc
 disk 2, o:1, dev:sdd
 disk 3, o:1, dev:sde
md: REF UP: 2
md: REF UP: 3
md: REF DOWN: 2
md: REF DOWN: 1
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 488386432 blocks.
md: REF UP: 2
md: REF DOWN: 1

*** [up to here everything is fine, but the counter never again drops
to 1 afterwards] ***

md: REF UP: 2
Attempting manual resume
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
md: REF UP: 3
hw_random: RNG not detected
md: REF DOWN: 2
Adding 4000176k swap on /dev/evms/sda2.  Priority:-1 extents:1 across:4000176k
EXT3 FS on dm-0, internal journal
md: REF UP: 3
md: REF DOWN: 2
*** [last two lines repeated fairly often, but more like excessive
polling than an infinite error loop] ***

Regards,

C.


* Re: Can't get md array to shut down cleanly
  2006-07-07 20:06         ` Christian Pernegger
@ 2006-07-10 15:27           ` Christian Pernegger
  0 siblings, 0 replies; 11+ messages in thread
From: Christian Pernegger @ 2006-07-10 15:27 UTC (permalink / raw)
  To: linux-raid; +Cc: neilb

Nope, EVMS is not the culprit.

I installed the test system from scratch, EVMS nowhere in sight -- it
now boots successfully from a partitionable md array, courtesy of a
yaird-generated initrd I adapted for the purpose. Yay!

Or not. I get the "md: md_d0 still in use." error again :(

This is with Debian's 2.6.15, i.e. without the above patch, of course.

The only thing that this configuration and my earlier EVMS one have in
common is that they start the array via initrd. It wasn't even a
similar initrd, first initramfs-tools and now yaird ...
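
In both cases the initrd essentially just assembles the array before
the root fs is mounted; the relevant line is roughly the following,
with the real member devices filled in:

    mdadm --assemble --auto=part /dev/md_d0 \
        /dev/sda /dev/sdb /dev/sdc /dev/sdd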

Could this be the source of the problem, Neil? Does something break due
to the root-fs switching that initrds do?

C.

