* RAID scrubbing
From: Justin Maggard @ 2010-04-10 1:28 UTC
To: linux-raid

Hi all,

I've got a system using two RAID5 arrays that share some physical
devices, combined using LVM.  Oddly, when I "echo repair >
/sys/block/md0/md/sync_action", once it finishes, it automatically
starts a repair on md1 also, even though I haven't requested it.
Also, if I try to stop it using "echo idle >
/sys/block/md0/md/sync_action", a repair starts on md1 within a few
seconds.  If I stop that md1 repair immediately, sometimes it will
respawn and start doing the repair again on md1.  What should I be
expecting here?  If I start a repair on one array, is it supposed to
automatically go through and do it on all arrays sharing that
personality?

Thanks!
-Justin
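
[Note: for readers unfamiliar with md scrubbing, the sysfs interface Justin
is using works roughly as sketched below.  md0 is only an example device
name; "check" reports mismatches without rewriting, while "repair" also
corrects them.]

    # Start a scrub pass (use "check" for a report-only pass):
    echo repair > /sys/block/md0/md/sync_action

    # Watch progress and the accumulated mismatch count:
    cat /proc/mdstat
    cat /sys/block/md0/md/mismatch_cnt

    # Abort the running pass:
    echo idle > /sys/block/md0/md/sync_action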

* Re: RAID scrubbing
From: Michael Evans @ 2010-04-10 1:41 UTC
To: Justin Maggard; +Cc: linux-raid

On Fri, Apr 9, 2010 at 6:28 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
> I've got a system using two RAID5 arrays that share some physical
> devices, combined using LVM.  Oddly, when I "echo repair >
> /sys/block/md0/md/sync_action", once it finishes, it automatically
> starts a repair on md1 also, even though I haven't requested it.
> [...]

Is md1 degraded with an active spare?  It might be delaying resync on
it until the other devices are idle.
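
[Note: a quick way to test Michael's theory is to check whether either array
is degraded or has spares in play; the device names below match the original
question and are only illustrative.]

    # Overall array health at a glance:
    cat /proc/mdstat

    # Per-array device counts and state:
    mdadm --detail /dev/md0 /dev/md1 | grep -E 'State :|Active Devices|Spare Devices'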

* Re: RAID scrubbing
From: Michael Evans @ 2010-04-10 2:01 UTC
To: Justin Maggard, linux-raid

On Fri, Apr 9, 2010 at 6:46 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
> On Fri, Apr 9, 2010 at 6:41 PM, Michael Evans <mjevans1983@gmail.com> wrote:
>> Is md1 degraded with an active spare?  It might be delaying resync on
>> it until the other devices are idle.
>
> No, both arrays are redundant.  I'm just trying to do scrubbing
> (repair) on md0; no resync is going on anywhere.
>
> -Justin

First: Reply to all.

Second, if you insist that things are not as I suspect:

cat /proc/mdstat

mdadm -Dvvs

mdadm -Evvs
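
[Note: the three diagnostics Michael asks for can be captured in a single
pass for posting; the output path is just an example.]

    # Collect array status plus assembled and on-disk superblock details:
    { cat /proc/mdstat; mdadm -Dvvs; mdadm -Evvs; } > /tmp/md-diag.txt 2>&1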

* Re: RAID scrubbing
From: Justin Maggard @ 2010-04-15 0:51 UTC
To: Michael Evans; +Cc: linux-raid

On Fri, Apr 9, 2010 at 7:01 PM, Michael Evans <mjevans1983@gmail.com> wrote:
> First: Reply to all.
>
> Second, if you insist that things are not as I suspect:
>
> cat /proc/mdstat
>
> mdadm -Dvvs
>
> mdadm -Evvs

I insist it's something different. :)  Just ran into it again on
another system.  Here's the requested output:

JMAGGARD:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sde6[0] sdc6[1]
      976750832 blocks super 1.2 level 5, 16k chunk, algorithm 2 [2/2] [UU]

md2 : active raid5 sda5[0] sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1]
      4860206480 blocks super 1.2 level 5, 16k chunk, algorithm 2 [6/6] [UUUUUU]

unused devices: <none>
JMAGGARD:~# ls /sys/block/dm-0/slaves/
md2  md3
JMAGGARD:~# cat /sys/block/dm-0/slaves/md?/md/sync_action
idle
idle
JMAGGARD:~# echo repair > /sys/block/md2/md/sync_action
JMAGGARD:~# dmesg -c
md: requested-resync of RAID array md2
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
md: using 128k window, over a total of 972041296 blocks.
JMAGGARD:~#
JMAGGARD:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sde6[0] sdc6[1]
      976750832 blocks super 1.2 level 5, 16k chunk, algorithm 2 [2/2] [UU]

md2 : active raid5 sda5[0] sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1]
      4860206480 blocks super 1.2 level 5, 16k chunk, algorithm 2 [6/6] [UUUUUU]
      [>....................]  resync =  0.1% (1409104/972041296) finish=195.1min speed=82888K/sec

unused devices: <none>
JMAGGARD:~# echo idle > /sys/block/md2/md/sync_action
JMAGGARD:~# dmesg -c
md: md_do_sync() got signal ... exiting
JMAGGARD:~# dmesg -c
JMAGGARD:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sde6[0] sdc6[1]
      976750832 blocks super 1.2 level 5, 16k chunk, algorithm 2 [2/2] [UU]

md2 : active raid5 sda5[0] sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1]
      4860206480 blocks super 1.2 level 5, 16k chunk, algorithm 2 [6/6] [UUUUUU]

unused devices: <none>
JMAGGARD:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sde6[0] sdc6[1]
      976750832 blocks super 1.2 level 5, 16k chunk, algorithm 2 [2/2] [UU]
      [>....................]  resync =  0.1% (1213304/976750832) finish=227.8min speed=71370K/sec

md2 : active raid5 sda5[0] sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1]
      4860206480 blocks super 1.2 level 5, 16k chunk, algorithm 2 [6/6] [UUUUUU]

unused devices: <none>
JMAGGARD:~# dmesg -c
md: requested-resync of RAID array md3
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
md: using 128k window, over a total of 976750832 blocks.
JMAGGARD:~# tail -10 /var/log/kern.log
Apr 14 16:36:31 JMAGGARD kernel: usb 1-2: new high speed USB device using ehci_hcd and address 2
Apr 14 17:32:23 JMAGGARD kernel: md: requested-resync of RAID array md2
Apr 14 17:32:23 JMAGGARD kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Apr 14 17:32:23 JMAGGARD kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
Apr 14 17:32:23 JMAGGARD kernel: md: using 128k window, over a total of 972041296 blocks.
Apr 14 17:32:51 JMAGGARD kernel: md: md_do_sync() got signal ... exiting
Apr 14 17:33:35 JMAGGARD kernel: md: requested-resync of RAID array md3
Apr 14 17:33:35 JMAGGARD kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Apr 14 17:33:35 JMAGGARD kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
Apr 14 17:33:35 JMAGGARD kernel: md: using 128k window, over a total of 976750832 blocks.
JMAGGARD:~#
JMAGGARD:~# mdadm -Dvvs
/dev/md3:
        Version : 1.2
  Creation Time : Wed Apr 14 10:30:07 2010
     Raid Level : raid5
     Array Size : 976750832 (931.50 GiB 1000.19 GB)
  Used Dev Size : 976750832 (931.50 GiB 1000.19 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Wed Apr 14 16:11:08 2010
          State : active, resyncing
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 16K

 Rebuild Status : 0% complete

           Name : 001AD408C964:3
           UUID : 34522369:e16f6b97:c9ba035d:392c01ea
         Events : 27

    Number   Major   Minor   RaidDevice State
       0       8       70        0      active sync   /dev/sde6
       1       8       38        1      active sync   /dev/sdc6
JMAGGARD:~# mdadm -Evvs
mdadm: No md superblock detected on /dev/md3.
mdadm: No md superblock detected on /dev/c/c.
mdadm: No md superblock detected on /dev/md2.
/dev/sdf5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
           Name : 001AD408C964:2
  Creation Time : Tue Apr 13 19:31:40 2010
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
     Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
  Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : c190fc75:fbf482b0:6e7ec0f1:bbc3f1f4

    Update Time : Wed Apr 14 17:32:51 2010
       Checksum : 418c175c - correct
         Events : 71

         Layout : left-symmetric
     Chunk Size : 16K

    Device Role : Active device 5
    Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sde6:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 34522369:e16f6b97:c9ba035d:392c01ea
           Name : 001AD408C964:3
  Creation Time : Wed Apr 14 10:30:07 2010
     Raid Level : raid5
   Raid Devices : 2

 Avail Dev Size : 1953501952 (931.50 GiB 1000.19 GB)
     Array Size : 1953501664 (931.50 GiB 1000.19 GB)
  Used Dev Size : 1953501664 (931.50 GiB 1000.19 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : 56016c35:ebb5b3b1:732a20a9:2e03e8e0

    Update Time : Wed Apr 14 16:11:08 2010
       Checksum : 8fec23c5 - correct
         Events : 27

         Layout : left-symmetric
     Chunk Size : 16K

    Device Role : Active device 0
    Array State : AA ('A' == active, '.' == missing)
/dev/sde5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
           Name : 001AD408C964:2
  Creation Time : Tue Apr 13 19:31:40 2010
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
     Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
  Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 4604bf53:fa8b8f98:29ac1273:ddd7d318

    Update Time : Wed Apr 14 17:32:51 2010
       Checksum : ac8f63ba - correct
         Events : 71

         Layout : left-symmetric
     Chunk Size : 16K

    Device Role : Active device 4
    Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdd5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
           Name : 001AD408C964:2
  Creation Time : Tue Apr 13 19:31:40 2010
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
     Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
  Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 162d3da0:9e796a82:e9811a93:5e21fc47

    Update Time : Wed Apr 14 17:32:51 2010
       Checksum : 3218996f - correct
         Events : 71

         Layout : left-symmetric
     Chunk Size : 16K

    Device Role : Active device 3
    Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdc6:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 34522369:e16f6b97:c9ba035d:392c01ea
           Name : 001AD408C964:3
  Creation Time : Wed Apr 14 10:30:07 2010
     Raid Level : raid5
   Raid Devices : 2

 Avail Dev Size : 1953501952 (931.50 GiB 1000.19 GB)
     Array Size : 1953501664 (931.50 GiB 1000.19 GB)
  Used Dev Size : 1953501664 (931.50 GiB 1000.19 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : fc30ef80:1341367e:e611bc15:ae905745

    Update Time : Wed Apr 14 16:11:08 2010
       Checksum : 78fd5386 - correct
         Events : 27

         Layout : left-symmetric
     Chunk Size : 16K

    Device Role : Active device 1
    Array State : AA ('A' == active, '.' == missing)
/dev/sdc5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
           Name : 001AD408C964:2
  Creation Time : Tue Apr 13 19:31:40 2010
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
     Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
  Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 45e19226:873f8207:49543089:a8d14f46

    Update Time : Wed Apr 14 17:32:51 2010
       Checksum : 31ef962f - correct
         Events : 71

         Layout : left-symmetric
     Chunk Size : 16K

    Device Role : Active device 2
    Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdb5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
           Name : 001AD408C964:2
  Creation Time : Tue Apr 13 19:31:40 2010
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
     Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
  Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 5d9c342e:569c096d:63efac3b:28912736

    Update Time : Wed Apr 14 17:32:51 2010
       Checksum : 416d08af - correct
         Events : 71

         Layout : left-symmetric
     Chunk Size : 16K

    Device Role : Active device 1
    Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sda5:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
           Name : 001AD408C964:2
  Creation Time : Tue Apr 13 19:31:40 2010
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
     Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
  Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 50327e35:a1b602d2:da883a40:e0314bed

    Update Time : Wed Apr 14 17:32:51 2010
       Checksum : 6960f31c - correct
         Events : 71

         Layout : left-symmetric
     Chunk Size : 16K

    Device Role : Active device 0
    Array State : AAAAAA ('A' == active, '.' == missing)
JMAGGARD:~#
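
[Note: one detail worth pulling out of the -E dumps above: the md2 members all
show Update Time Wed Apr 14 17:32:51 and Events 71, matching the moment the
repair was aborted, while the md3 members still show 16:11:08 and Events 27.
A quick way to extract just those fields for comparison (the partition glob is
illustrative):]

    mdadm -E /dev/sd[a-f][56] 2>/dev/null | grep -E '^/dev/|Update Time|Events'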

* Re: RAID scrubbing
From: Neil Brown @ 2010-04-15 1:22 UTC
To: Justin Maggard; +Cc: Michael Evans, linux-raid

On Wed, 14 Apr 2010 17:51:11 -0700 Justin Maggard <jmaggard10@gmail.com> wrote:
> I insist it's something different. :)  Just ran into it again on
> another system.  Here's the requested output:

Thanks.  Very thorough!

> Apr 14 17:32:23 JMAGGARD kernel: md: requested-resync of RAID array md2
> Apr 14 17:32:23 JMAGGARD kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> Apr 14 17:32:23 JMAGGARD kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
> Apr 14 17:32:23 JMAGGARD kernel: md: using 128k window, over a total of 972041296 blocks.
> Apr 14 17:32:51 JMAGGARD kernel: md: md_do_sync() got signal ... exiting
> Apr 14 17:33:35 JMAGGARD kernel: md: requested-resync of RAID array md3

So we see the requested-resync (repair) of md2 started as you requested,
then finished at 17:32:51 when you write 'idle' to 'sync_action'.

Then 44 seconds later a similar repair started on md3.  44 seconds is too
long for it to be a direct consequence of the md2 repair stopping.
Something *must* have written to md3/md/sync_action.  But what?

Maybe you have "mdadm --monitor" running and it notices when repair on one
array finished and has been told to run a script (--program or PROGRAM in
mdadm.conf) which would then start a repair on the next array???

Seems a bit far-fetched, but I'm quite confident that some program must be
writing to md3/md/sync_action while you're not watching.

NeilBrown
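
[Note: mdadm --monitor invokes the configured PROGRAM with the event name and
the affected array as arguments (e.g. "RebuildFinished /dev/md2").  A
hypothetical hook of the kind Neil describes, which would produce exactly the
behaviour seen above, might look like the sketch below; it is illustrative
only, not Justin's actual script.]

    #!/bin/sh
    # Hypothetical mdadm --monitor PROGRAM hook (illustrative only).
    EVENT="$1"      # e.g. RebuildStarted, RebuildFinished, Fail, ...
    ARRAY="$2"      # e.g. /dev/md2

    if [ "$EVENT" = "RebuildFinished" ]; then
        # Buggy "scrub everything" logic: when one array finishes,
        # kick off a repair on every other array -- which looks exactly
        # like the unrequested repair on md3 seen in the logs above.
        for action in /sys/block/md*/md/sync_action; do
            [ "$action" = "/sys/block/$(basename "$ARRAY")/md/sync_action" ] && continue
            echo repair > "$action"
        done
    fi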

* Re: RAID scrubbing
From: Justin Maggard @ 2010-04-17 0:03 UTC
To: Neil Brown; +Cc: linux-raid

On Wed, Apr 14, 2010 at 6:22 PM, Neil Brown <neilb@suse.de> wrote:
> Maybe you have "mdadm --monitor" running and it notices when repair on one
> array finished and has been told to run a script (--program or PROGRAM in
> mdadm.conf) which would then start a repair on the next array???
>
> Seems a bit far-fetched, but I'm quite confident that some program must be
> writing to md3/md/sync_action while you're not watching.
>
> NeilBrown

Well, this is embarrassing.  You're exactly right. :)  Looks like it
was a bug in the script run by mdadm --monitor.  Thanks for the
insight!

-Justin

* Re: RAID scrubbing
From: Berkey B Walker @ 2010-04-17 0:19 UTC
To: Justin Maggard; +Cc: Neil Brown, linux-raid

Justin Maggard wrote:
> Well, this is embarrassing.  You're exactly right. :)  Looks like it
> was a bug in the script run by mdadm --monitor.  Thanks for the
> insight!

This, I think, is a nice (and polite) ending.  Best wishes to all players.

b-

Thread overview: 7+ messages
2010-04-10 1:28 RAID scrubbing Justin Maggard
2010-04-10 1:41 ` Michael Evans
[not found] ` <s2y150c16851004091846t94347cf8u9ffd65133061d16b@mail.gmail.com>
2010-04-10 2:01 ` Michael Evans
2010-04-15 0:51 ` Justin Maggard
2010-04-15 1:22 ` Neil Brown
2010-04-17 0:03 ` Justin Maggard
2010-04-17 0:19 ` Berkey B Walker