From: Benjamin ESTRABAUD <be@mpstor.com>
To: linux-raid@vger.kernel.org
Subject: MD: "sync_action" issues: pausing resync/recovery automatically restarts.
Date: Thu, 11 Feb 2010 12:02:56 +0000
Message-ID: <4B73F1F0.1030800@mpstor.com>
Hi everybody,
I am seeing a strange issue when writing values to
"/sys/block/mdX/md/sync_action".
For instance, I would like to pause a resync and/or a recovery while it
is in progress.
I create a RAID 5 array as follows:
mdadm --create -vvv --force --run --metadata=1.2 /dev/md/d0 --level=5 \
    --size=9429760 --chunk=64 --name=1056856 -n5 --bitmap=internal \
    --bitmap-chunk=4096 --layout=ls /dev/sde2 /dev/sdb2 /dev/sdc2 /dev/sdf2 \
    /dev/sdd2
The RAID is resyncing:
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md_d0 : active raid5 sdd2[4] sdf2[3] sdc2[2] sdb2[1] sde2[0]
37719040 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5]
[UUUUU]
[====>................] resync = 22.2% (2101824/9429760)
finish=2.6min speed=46186K/sec
bitmap: 1/1 pages [64KB], 4096KB chunk
unused devices: <none>
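To compare the progress figure before and after an attempted pause, the
percentage can be pulled out of /proc/mdstat. This is only a sketch:
"mdstat_progress" is a made-up helper name, not part of mdadm, and it
assumes the progress line has the "resync = NN.N%" (or "recovery = NN.N%")
shape shown above.

```shell
# Hypothetical helper: print the sync type and percentage found in an
# mdstat-style file. Usage: mdstat_progress [FILE]  (default /proc/mdstat)
mdstat_progress() {
    awk '
        {
            # Look for "<resync|recovery> = <percent>" as three adjacent fields.
            for (i = 1; i < NF; i++) {
                if (($i == "resync" || $i == "recovery") && $(i + 1) == "=") {
                    print $i, $(i + 2)
                }
            }
        }
    ' "${1:-/proc/mdstat}"
}
```

Running it once before and once after the write to sync_action would show
whether the percentage kept advancing.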
I then decide to pause its resync:
# echo idle > /sys/block/md_d0/md/sync_action
The resync should have paused by now. Let's check the sysfs attribute:
# cat /sys/block/md_d0/md/sync_action
resync
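A single read may miss a brief transition through "idle", so it can help
to sample the attribute a few times right after the write. A minimal
sketch, where "sample_sync_action" is a hypothetical helper name and the
path, sample count and delay are passed in as arguments:

```shell
# Hypothetical helper (not part of mdadm or the kernel tools).
# Usage: sample_sync_action FILE [COUNT] [DELAY_SECONDS]
# Example (assumed array name md_d0):
#   echo idle > /sys/block/md_d0/md/sync_action
#   sample_sync_action /sys/block/md_d0/md/sync_action 10 1
sample_sync_action() {
    file=$1
    count=${2:-5}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Print the current state, e.g. "resync", "recover" or "idle".
        cat "$file"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
    done
}
```

If every sample prints "resync", the pause either never took effect or
was undone faster than the sampling interval.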
The resync appears either not to have stopped, or to have stopped and
restarted. Let's check dmesg:
[157287.049715] raid5: raid level 5 set md_d0 active with 5 out of 5
devices, algorithm 2
[157287.057601] RAID5 conf printout:
[157287.060909] --- rd:5 wd:5
[157287.063700] disk 0, o:1, dev:sde2
[157287.067182] disk 1, o:1, dev:sdb2
[157287.070664] disk 2, o:1, dev:sdc2
[157287.074147] disk 3, o:1, dev:sdf2
[157287.077628] disk 4, o:1, dev:sdd2
[157287.086813] md_d0: bitmap initialized from disk: read 1/1 pages, set
2303 bits
[157287.094134] created bitmap (1 pages) for device md_d0
[157287.113475] md: resync of RAID array md_d0
[157287.117650] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[157287.123555] md: using maximum available idle IO bandwidth (but not
more than 200000 KB/sec) for resync.
[157287.133011] md: using 2048k window, over a total of 9429760 blocks.
[157345.158535] md: md_do_sync() got signal ... exiting
[157345.166057] md: checkpointing resync of md_d0.
[157345.179819] md: resync of RAID array md_d0
[157345.183993] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[157345.189899] md: using maximum available idle IO bandwidth (but not
more than 200000 KB/sec) for resync.
[157345.199353] md: using 2048k window, over a total of 9429760 blocks.
The resync does seem to stop at some point, since:
[157345.158535] md: md_do_sync() got signal ... exiting
But it seems to be restarting right after this:
[157345.179819] md: resync of RAID array md_d0
I read in the md.txt documentation that pausing a resync may not work
if an event or trigger causes it to restart automatically. However, I
don't think I have any trigger that would cause it to restart.
The array then finishes building without any problem.
I now want to check whether the same issue occurs during a recovery.
After all, pausing a recovery is what I really need; I don't really
need to pause/restart resyncs.
Let's say I pull a disk from the bay, then fail it and remove it as
follows:
# mdadm --fail /dev/md/d0 /dev/sde2
mdadm: set /dev/sde2 faulty in /dev/md/d0
# mdadm --remove /dev/md/d0 /dev/sde2
mdadm: hot removed /dev/sde2
Now let's add a spare:
# /opt/soma/bin/mdadm/mdadm --add /dev/md/d0 /dev/sda2
raid manager: added /dev/sda2
The RAID is now recovering:
# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md_d0 : active raid5 sda2[5] sdd2[4] sdf2[3] sdc2[2] sdb2[1]
37719040 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4]
[_UUUU]
[>....................] recovery = 1.7% (169792/9429760)
finish=0.9min speed=169792K/sec
bitmap: 0/1 pages [0KB], 4096KB chunk
unused devices: <none>
# cat /sys/block/md_d0/md/sync_action
recover
Let's try and stop this recovery:
# echo idle > /sys/block/md_d0/md/sync_action
[157641.618291] disk 3, o:1, dev:sdf2
[157641.621774] disk 4, o:1, dev:sdd2
[157641.632057] md: recovery of RAID array md_d0
[157641.636413] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[157641.642314] md: using maximum available idle IO bandwidth (but not
more than 200000 KB/sec) for recovery.
[157641.651940] md: using 2048k window, over a total of 9429760 blocks.
[157657.120722] md: md_do_sync() got signal ... exiting
[157657.267055] RAID5 conf printout:
[157657.270381] --- rd:5 wd:4
[157657.273171] disk 0, o:1, dev:sda2
[157657.276650] disk 1, o:1, dev:sdb2
[157657.280129] disk 2, o:1, dev:sdc2
[157657.283605] disk 3, o:1, dev:sdf2
[157657.287087] disk 4, o:1, dev:sdd2
[157657.290568] RAID5 conf printout:
[157657.293876] --- rd:5 wd:4
[157657.296660] disk 0, o:1, dev:sda2
[157657.300139] disk 1, o:1, dev:sdb2
[157657.303615] disk 2, o:1, dev:sdc2
[157657.307096] disk 3, o:1, dev:sdf2
[157657.310579] disk 4, o:1, dev:sdd2
[157657.320835] md: recovery of RAID array md_d0
[157657.325194] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[157657.331091] md: using maximum available idle IO bandwidth (but not
more than 200000 KB/sec) for recovery.
[157657.340713] md: using 2048k window, over a total of 9429760 blocks.
[157657.347047] md: resuming recovery of md_d0 from checkpoint.
I get the same issue: the recovery stops, but restarts 200
milliseconds later.
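The 200 ms figure comes from the dmesg timestamps (157657.320835 minus
157657.120722). A small sketch that does the same subtraction for every
abort/restart pair, assuming dmesg-style lines with a leading
"[seconds.microseconds]" timestamp on stdin; "restart_gaps" is a
hypothetical name, and the matched message strings are the ones that
appear in the logs above:

```shell
# Hypothetical helper: for each "md_do_sync() got signal" line, print the
# number of seconds until the next "md: resync/recovery of RAID array" line.
# Usage: dmesg | restart_gaps
restart_gaps() {
    awk '
        # Extract the numeric timestamp from a leading "[12345.678901]".
        function ts(line) {
            sub(/^\[ */, "", line)
            sub(/\].*$/, "", line)
            return line + 0
        }
        # Remember when the sync thread was interrupted.
        /md_do_sync\(\) got signal/ { stop = ts($0); have = 1; next }
        # The first start message after an interruption closes the pair.
        have && /md: (resync|recovery) of RAID array/ {
            printf "%.6f\n", ts($0) - stop
            have = 0
        }
    '
}
```

Lines without a matching message are ignored, so the whole dmesg output
can be piped in unfiltered.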
This suggests that some sort of trigger is automatically restarting
the resync and recovery, but I have no idea what it could be.
Has anyone here had a similar experience when trying to stop resyncs?
Is there a "magic" variable that would enable or disable automatic
restart of resync/recoveries?
Would anyone know of a standard event or trigger that would cause a
resync or recovery to automatically restart?
Thank you very much in advance for your help.
My kernel version is:
2.6.26.3
Ben.