* How can I ensure that my swraid saves checkpoints with sysrq reboot?
From: Marc MERLIN @ 2012-09-12 16:52 UTC
To: linux-raid
I don't have a lot of data on this because each time I get it wrong, I have
to endure 2 days of resync (2TB drives/raid5).
I'm using kernel 3.5.3.
When my system is having issues and I need to reboot via sysrq, I do:
- sync (sysrq-s)
- umount (sysrq-u, which remounts filesystems read-only)
- (re)boot (sysrq-b)
Last time I did this, all I saw was:
[33415.717023] SysRq : Resetting
[33415.721143] ACPI MEMORY or I/O RESET_REG.
and sure enough, my raid came back unclean.
Next time, I tried 'o' instead of 'b', and got the output at the end of this
message, which seems to sync up my raid checkpoints before shutdown.
Is sysrq-reboot also supposed to sync checkpoints, but just fails to do so
when I'm rebooting due to stuck controller issues anyway?
(Note that in this case the stuck controller is not the one the drives are on.)
Or does 'off' sync raid checkpoints while 'reboot' does not?
[ 581.511867] SysRq : Power Off
[ 581.526466] md: md5: resync done.
[ 581.538448] md: checkpointing resync of md5.
[ 581.544403] md: delaying resync of md0 until md3 has finished (they share one or more physical units)
[ 581.583046] md: md3: resync done.
[ 581.669506] md: checkpointing resync of md3.
[ 581.675550] md: resync of RAID array md0
[ 581.681256] md: minimum _guaranteed_ speed: 40000 KB/sec/disk.
[ 581.688938] md: using maximum available idle IO bandwidth (but not more than 81920 KB/sec) for resync.
[ 581.699385] md: using 128k window, over a total of 1048512k.
[ 581.748192] md: md0: resync done.
[ 581.957972] md: checkpointing resync of md0.
[ 582.984647] kvm: exiting hardware virtualization
[ 583.192431] sd 15:0:0:0: [sdq] Synchronizing SCSI cache
[ 583.219673] sd 15:0:0:0: [sdq] Stopping disk
[ 583.678907] sd 14:0:0:0: [sdp] Synchronizing SCSI cache
[ 583.706855] sd 14:0:0:0: [sdp] Stopping disk
[ 584.166092] sd 13:0:0:0: [sdo] Synchronizing SCSI cache
[ 584.194037] sd 13:0:0:0: [sdo] Stopping disk
[ 584.653278] sd 12:0:0:0: [sdn] Synchronizing SCSI cache
[ 584.681225] sd 12:0:0:0: [sdn] Stopping disk
[ 585.140459] sd 10:0:0:0: [sdm] Synchronizing SCSI cache
[ 585.164369] sd 10:0:0:0: [sdm] Stopping disk
[ 585.616919] sd 8:4:0:0: [sdl] Synchronizing SCSI cache
[ 585.623818] sd 8:4:0:0: [sdl] Stopping disk
[ 586.083546] sd 8:3:0:0: [sdk] Synchronizing SCSI cache
[ 586.090333] sd 8:3:0:0: [sdk] Stopping disk
[ 586.098364] sd 8:2:0:0: [sdj] Synchronizing SCSI cache
[ 586.104642] sd 8:2:0:0: [sdj] Stopping disk
[ 586.562706] sd 8:1:0:0: [sdi] Synchronizing SCSI cache
[ 586.569474] sd 8:1:0:0: [sdi] Stopping disk
[ 586.577436] sd 8:0:0:0: [sdh] Synchronizing SCSI cache
[ 586.583685] sd 8:0:0:0: [sdh] Stopping disk
[ 586.596304] sd 4:0:1:0: [sdg] Synchronizing SCSI cache
[ 586.602869] sd 4:0:1:0: [sdg] Stopping disk
[ 586.759081] sd 4:0:0:0: [sdf] Synchronizing SCSI cache
[ 586.765676] sd 4:0:0:0: [sdf] Stopping disk
[ 586.771551] sd 3:0:0:0: [sde] Synchronizing SCSI cache
[ 586.778175] sd 3:0:0:0: [sde] Stopping disk
[ 587.219806] sd 2:0:0:0: [sdd] Synchronizing SCSI cache
[ 587.263591] sd 2:0:0:0: [sdd] Stopping disk
[ 588.024999] sd 1:0:1:0: [sdc] Synchronizing SCSI cache
[ 588.064693] sd 1:0:1:0: [sdc] Stopping disk
[ 588.826957] sd 0:0:1:0: [sdb] Synchronizing SCSI cache
[ 588.833460] sd 0:0:1:0: [sdb] Stopping disk
[ 589.258918] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 589.265434] sd 0:0:0:0: [sda] Stopping disk
[ 589.690313] r8169 0000:05:00.0: wake-up capability enabled by ACPI
[ 589.713372] pcieport 0000:00:01.0: wake-up capability enabled by ACPI
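To confirm after the fact which arrays actually got a checkpoint, the log can
be scanned for the md message shown above. This is only a minimal sketch: it
assumes the message text stays exactly as in this 3.5.3 log, and the function
name checkpointed_arrays is made up for illustration.

```python
import re

# Matches the md message seen in the log above, e.g.
# "[ 581.538448] md: checkpointing resync of md5."
# Assumption: the wording is the one printed by this 3.5.3 kernel.
CHECKPOINT_RE = re.compile(r"md: checkpointing resync of (md\d+)\.")


def checkpointed_arrays(log_text):
    """Return the md arrays that logged a resync checkpoint, in log order."""
    return CHECKPOINT_RE.findall(log_text)


# A few lines copied from the power-off log above.
sample = """\
[ 581.511867] SysRq : Power Off
[ 581.526466] md: md5: resync done.
[ 581.538448] md: checkpointing resync of md5.
[ 581.583046] md: md3: resync done.
[ 581.669506] md: checkpointing resync of md3.
[ 581.957972] md: checkpointing resync of md0.
"""

print(checkpointed_arrays(sample))  # ['md5', 'md3', 'md0']
```

Run against the sysrq-b log quoted earlier (just "SysRq : Resetting"), it
returns an empty list, which matches the array coming back unclean.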
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
* Re: How can I ensure that my swraid saves checkpoints with sysrq reboot?
From: NeilBrown @ 2012-09-12 21:29 UTC
To: Marc MERLIN; +Cc: linux-raid
Hi Marc,
md registers a reboot notifier. When that is called it tries to checkpoint
everything.
All varieties of the 'reboot' system call seem to call the reboot notifiers.
alt-sysrq-b doesn't use the same path. It calls machine_emergency_restart,
bypassing all the reboot handling.
Once upon a time I had the idea that killing the md threads would lead to
proper checkpointing, so alt-sysrq-I would do-the-right-thing.
I'm not sure if it does though.
But alt-sysrq-o (power off) seems to use the normal reboot handling and so
works - as you noticed. So that should always be safe and seems to be the
only safe approach.
NeilBrown
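If userspace still responds well enough to run a command, the point above
about the 'reboot' system call suggests a restart that does go through the
reboot notifiers. The following is a rough sketch only, not a tested recipe:
it assumes a Linux system with glibc, root privileges, and that the reboot
notifiers behave as described in this message. The function name
reboot_via_syscall and its dry_run flag are invented for illustration.

```python
import ctypes
import os

# RB_AUTOBOOT from <sys/reboot.h>; same value as LINUX_REBOOT_CMD_RESTART.
RB_AUTOBOOT = 0x01234567


def reboot_via_syscall(dry_run=True):
    """Sync, then restart through reboot(2) instead of sysrq-b.

    Returns the list of planned steps. With dry_run=True (the default)
    nothing is executed, so the function is safe to call for inspection.
    """
    steps = ["sync", "reboot(RB_AUTOBOOT)"]
    if dry_run:
        return steps
    os.sync()  # flush dirty filesystem data first
    libc = ctypes.CDLL(None, use_errno=True)
    # Needs CAP_SYS_BOOT (normally root); does not return on success.
    if libc.reboot(RB_AUTOBOOT) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return steps


print(reboot_via_syscall(dry_run=True))  # ['sync', 'reboot(RB_AUTOBOOT)']
```

This does not help when the machine is wedged so badly that only the sysrq
keys work; in that case alt-sysrq-o remains the path discussed here.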
* Re: How can I ensure that my swraid saves checkpoints with sysrq reboot?
From: Marc MERLIN @ 2012-09-12 21:52 UTC
To: NeilBrown; +Cc: linux-raid
Thank you for confirming what I thought I saw.
For my specific case, I'll be ok because I have a controllable power supply,
but obviously for the hosted server case, this sucks since sending a
poweroff makes sure the machine won't come back.
Is there a chance you can contact whoever is responsible for the sysrq-b
codepath to have it call the reboot notifiers, so that md gets a chance to
checkpoint the arrays, or should admins just know to use sysrq-i and then
sysrq-r?
Thanks,
Marc