* RAID scrubbing
From: Justin Maggard @ 2010-04-10 1:28 UTC
To: linux-raid
Hi all,
I've got a system using two RAID5 arrays that share some physical
devices, combined using LVM. Oddly, when I "echo repair >
/sys/block/md0/md/sync_action", once it finishes, it automatically
starts a repair on md1 also, even though I haven't requested it.
Also, if I try to stop it using "echo idle >
/sys/block/md0/md/sync_action", a repair starts on md1 within a few
seconds. If I stop that md1 repair immediately, sometimes it will
respawn and start doing the repair again on md1. What should I be
expecting here? If I start a repair on one array, is it supposed to
automatically go through and do it on all arrays sharing that
personality?
Thanks!
-Justin
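For context: md scrubbing is controlled per array through that array's own sysfs node, so a repair requested on md0 should not, by itself, touch md1. A minimal sketch of the per-array interface (using the md0/md1 names from the message above):

  # Each md device has its own sync_action; "check" only counts mismatches,
  # "repair" also rewrites them, and "idle" aborts a running scrub.
  cat /sys/block/md0/md/sync_action              # expect "idle" before starting
  echo repair > /sys/block/md0/md/sync_action    # scrub (and fix) md0 only
  cat /proc/mdstat                               # watch progress
  echo idle > /sys/block/md0/md/sync_action      # abort if needed

  # md1 is controlled independently and should stay idle throughout:
  cat /sys/block/md1/md/sync_action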
* Re: RAID scrubbing
From: Michael Evans @ 2010-04-10 1:41 UTC
To: Justin Maggard; +Cc: linux-raid
On Fri, Apr 9, 2010 at 6:28 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
> Hi all,
>
> I've got a system using two RAID5 arrays that share some physical
> devices, combined using LVM. Oddly, when I "echo repair >
> /sys/block/md0/md/sync_action", once it finishes, it automatically
> starts a repair on md1 also, even though I haven't requested it.
> Also, if I try to stop it using "echo idle >
> /sys/block/md0/md/sync_action", a repair starts on md1 within a few
> seconds. If I stop that md1 repair immediately, sometimes it will
> respawn and start doing the repair again on md1. What should I be
> expecting here? If I start a repair on one array, is it supposed to
> automatically go through and do it on all arrays sharing that
> personality?
>
> Thanks!
> -Justin
Is md1 degraded with an active spare? It might be delaying resync on
it until the other devices are idle.
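A quick way to check for that condition (a sketch only; the md1 name comes from the original message and the sysfs paths are the standard md ones):

  cat /proc/mdstat                       # a degraded RAID5 shows e.g. [2/1] [U_]
  mdadm --detail /dev/md1 | grep -E 'State|Devices'
  cat /sys/block/md1/md/degraded         # 0 means no missing members
  cat /sys/block/md1/md/sync_action      # "idle" if no resync/repair is queued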
* Re: RAID scrubbing
From: Michael Evans @ 2010-04-10 2:01 UTC
To: Justin Maggard, linux-raid
On Fri, Apr 9, 2010 at 6:46 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
> On Fri, Apr 9, 2010 at 6:41 PM, Michael Evans <mjevans1983@gmail.com> wrote:
>> On Fri, Apr 9, 2010 at 6:28 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
>>> Hi all,
>>>
>>> I've got a system using two RAID5 arrays that share some physical
>>> devices, combined using LVM. Oddly, when I "echo repair >
>>> /sys/block/md0/md/sync_action", once it finishes, it automatically
>>> starts a repair on md1 also, even though I haven't requested it.
>>> Also, if I try to stop it using "echo idle >
>>> /sys/block/md0/md/sync_action", a repair starts on md1 within a few
>>> seconds. If I stop that md1 repair immediately, sometimes it will
>>> respawn and start doing the repair again on md1. What should I be
>>> expecting here? If I start a repair on one array, is it supposed to
>>> automatically go through and do it on all arrays sharing that
>>> personality?
>>>
>>> Thanks!
>>> -Justin
>>>
>>
>> Is md1 degraded with an active spare? It might be delaying resync on
>> it until the other devices are idle.
>
> No, both arrays are redundant. I'm just trying to do scrubbing
> (repair) on md0; no resync is going on anywhere.
>
> -Justin
>
First: reply to all.
Second, if you insist that things are not as I suspect, post the output of:
cat /proc/mdstat
mdadm -Dvvs
mdadm -Evvs
* Re: RAID scrubbing
From: Justin Maggard @ 2010-04-15 0:51 UTC
To: Michael Evans; +Cc: linux-raid
On Fri, Apr 9, 2010 at 7:01 PM, Michael Evans <mjevans1983@gmail.com> wrote:
> On Fri, Apr 9, 2010 at 6:46 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
>> On Fri, Apr 9, 2010 at 6:41 PM, Michael Evans <mjevans1983@gmail.com> wrote:
>>> On Fri, Apr 9, 2010 at 6:28 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
>>>> Hi all,
>>>>
>>>> I've got a system using two RAID5 arrays that share some physical
>>>> devices, combined using LVM. Oddly, when I "echo repair >
>>>> /sys/block/md0/md/sync_action", once it finishes, it automatically
>>>> starts a repair on md1 also, even though I haven't requested it.
>>>> Also, if I try to stop it using "echo idle >
>>>> /sys/block/md0/md/sync_action", a repair starts on md1 within a few
>>>> seconds. If I stop that md1 repair immediately, sometimes it will
>>>> respawn and start doing the repair again on md1. What should I be
>>>> expecting here? If I start a repair on one array, is it supposed to
>>>> automatically go through and do it on all arrays sharing that
>>>> personality?
>>>>
>>>> Thanks!
>>>> -Justin
>>>>
>>>
>>> Is md1 degraded with an active spare? It might be delaying resync on
>>> it until the other devices are idle.
>>
>> No, both arrays are redundant. I'm just trying to do scrubbing
>> (repair) on md0; no resync is going on anywhere.
>>
>> -Justin
>>
>
> First: Reply to all.
>
> Second, if you insist that things are not as I suspect:
>
> cat /proc/mdstat
>
> mdadm -Dvvs
>
> mdadm -Evvs
>
I insist it's something different. :) Just ran into it again on
another system. Here's the requested output:
JMAGGARD:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sde6[0] sdc6[1]
976750832 blocks super 1.2 level 5, 16k chunk, algorithm 2 [2/2] [UU]
md2 : active raid5 sda5[0] sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1]
4860206480 blocks super 1.2 level 5, 16k chunk, algorithm 2 [6/6] [UUUUUU]
unused devices: <none>
JMAGGARD:~# ls /sys/block/dm-0/slaves/
md2 md3
JMAGGARD:~# cat /sys/block/dm-0/slaves/md?/md/sync_action
idle
idle
JMAGGARD:~# echo repair > /sys/block/md2/md/sync_action
JMAGGARD:~# dmesg -c
md: requested-resync of RAID array md2
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for requested-resync.
md: using 128k window, over a total of 972041296 blocks.
JMAGGARD:~#
JMAGGARD:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sde6[0] sdc6[1]
976750832 blocks super 1.2 level 5, 16k chunk, algorithm 2 [2/2] [UU]
md2 : active raid5 sda5[0] sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1]
4860206480 blocks super 1.2 level 5, 16k chunk, algorithm 2 [6/6] [UUUUUU]
[>....................] resync = 0.1% (1409104/972041296)
finish=195.1min speed=82888K/sec
unused devices: <none>
JMAGGARD:~# echo idle > /sys/block/md2/md/sync_action
JMAGGARD:~# dmesg -c
md: md_do_sync() got signal ... exiting
JMAGGARD:~# dmesg -c
JMAGGARD:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sde6[0] sdc6[1]
976750832 blocks super 1.2 level 5, 16k chunk, algorithm 2 [2/2] [UU]
md2 : active raid5 sda5[0] sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1]
4860206480 blocks super 1.2 level 5, 16k chunk, algorithm 2 [6/6] [UUUUUU]
unused devices: <none>
JMAGGARD:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md3 : active raid5 sde6[0] sdc6[1]
976750832 blocks super 1.2 level 5, 16k chunk, algorithm 2 [2/2] [UU]
[>....................] resync = 0.1% (1213304/976750832)
finish=227.8min speed=71370K/sec
md2 : active raid5 sda5[0] sdf5[5] sde5[4] sdd5[3] sdc5[2] sdb5[1]
4860206480 blocks super 1.2 level 5, 16k chunk, algorithm 2 [6/6] [UUUUUU]
unused devices: <none>
JMAGGARD:~# dmesg -c
md: requested-resync of RAID array md3
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for requested-resync.
md: using 128k window, over a total of 976750832 blocks.
JMAGGARD:~# tail -10 /var/log/kern.log
Apr 14 16:36:31 JMAGGARD kernel: usb 1-2: new high speed USB device
using ehci_hcd and address 2
Apr 14 17:32:23 JMAGGARD kernel: md: requested-resync of RAID array md2
Apr 14 17:32:23 JMAGGARD kernel: md: minimum _guaranteed_ speed: 1000
KB/sec/disk.
Apr 14 17:32:23 JMAGGARD kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for requested-resync.
Apr 14 17:32:23 JMAGGARD kernel: md: using 128k window, over a total
of 972041296 blocks.
Apr 14 17:32:51 JMAGGARD kernel: md: md_do_sync() got signal ... exiting
Apr 14 17:33:35 JMAGGARD kernel: md: requested-resync of RAID array md3
Apr 14 17:33:35 JMAGGARD kernel: md: minimum _guaranteed_ speed: 1000
KB/sec/disk.
Apr 14 17:33:35 JMAGGARD kernel: md: using maximum available idle IO
bandwidth (but not more than 200000 KB/sec) for requested-resync.
Apr 14 17:33:35 JMAGGARD kernel: md: using 128k window, over a total
of 976750832 blocks.
JMAGGARD:~#
JMAGGARD:~# mdadm -Dvvs
/dev/md3:
Version : 1.2
Creation Time : Wed Apr 14 10:30:07 2010
Raid Level : raid5
Array Size : 976750832 (931.50 GiB 1000.19 GB)
Used Dev Size : 976750832 (931.50 GiB 1000.19 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Wed Apr 14 16:11:08 2010
State : active, resyncing
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 16K
Rebuild Status : 0% complete
Name : 001AD408C964:3
UUID : 34522369:e16f6b97:c9ba035d:392c01ea
Events : 27
Number Major Minor RaidDevice State
0 8 70 0 active sync /dev/sde6
1 8 38 1 active sync /dev/sdc6
JMAGGARD:~# mdadm -Evvs
mdadm: No md superblock detected on /dev/md3.
mdadm: No md superblock detected on /dev/c/c.
mdadm: No md superblock detected on /dev/md2.
/dev/sdf5:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
Name : 001AD408C964:2
Creation Time : Tue Apr 13 19:31:40 2010
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : c190fc75:fbf482b0:6e7ec0f1:bbc3f1f4
Update Time : Wed Apr 14 17:32:51 2010
Checksum : 418c175c - correct
Events : 71
Layout : left-symmetric
Chunk Size : 16K
Device Role : Active device 5
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sde6:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 34522369:e16f6b97:c9ba035d:392c01ea
Name : 001AD408C964:3
Creation Time : Wed Apr 14 10:30:07 2010
Raid Level : raid5
Raid Devices : 2
Avail Dev Size : 1953501952 (931.50 GiB 1000.19 GB)
Array Size : 1953501664 (931.50 GiB 1000.19 GB)
Used Dev Size : 1953501664 (931.50 GiB 1000.19 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 56016c35:ebb5b3b1:732a20a9:2e03e8e0
Update Time : Wed Apr 14 16:11:08 2010
Checksum : 8fec23c5 - correct
Events : 27
Layout : left-symmetric
Chunk Size : 16K
Device Role : Active device 0
Array State : AA ('A' == active, '.' == missing)
/dev/sde5:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
Name : 001AD408C964:2
Creation Time : Tue Apr 13 19:31:40 2010
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 4604bf53:fa8b8f98:29ac1273:ddd7d318
Update Time : Wed Apr 14 17:32:51 2010
Checksum : ac8f63ba - correct
Events : 71
Layout : left-symmetric
Chunk Size : 16K
Device Role : Active device 4
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdd5:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
Name : 001AD408C964:2
Creation Time : Tue Apr 13 19:31:40 2010
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 162d3da0:9e796a82:e9811a93:5e21fc47
Update Time : Wed Apr 14 17:32:51 2010
Checksum : 3218996f - correct
Events : 71
Layout : left-symmetric
Chunk Size : 16K
Device Role : Active device 3
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdc6:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 34522369:e16f6b97:c9ba035d:392c01ea
Name : 001AD408C964:3
Creation Time : Wed Apr 14 10:30:07 2010
Raid Level : raid5
Raid Devices : 2
Avail Dev Size : 1953501952 (931.50 GiB 1000.19 GB)
Array Size : 1953501664 (931.50 GiB 1000.19 GB)
Used Dev Size : 1953501664 (931.50 GiB 1000.19 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : fc30ef80:1341367e:e611bc15:ae905745
Update Time : Wed Apr 14 16:11:08 2010
Checksum : 78fd5386 - correct
Events : 27
Layout : left-symmetric
Chunk Size : 16K
Device Role : Active device 1
Array State : AA ('A' == active, '.' == missing)
/dev/sdc5:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
Name : 001AD408C964:2
Creation Time : Tue Apr 13 19:31:40 2010
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 45e19226:873f8207:49543089:a8d14f46
Update Time : Wed Apr 14 17:32:51 2010
Checksum : 31ef962f - correct
Events : 71
Layout : left-symmetric
Chunk Size : 16K
Device Role : Active device 2
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sdb5:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
Name : 001AD408C964:2
Creation Time : Tue Apr 13 19:31:40 2010
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 5d9c342e:569c096d:63efac3b:28912736
Update Time : Wed Apr 14 17:32:51 2010
Checksum : 416d08af - correct
Events : 71
Layout : left-symmetric
Chunk Size : 16K
Device Role : Active device 1
Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sda5:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 002a8919:5d4f3b2d:99a502c5:6ad57a52
Name : 001AD408C964:2
Creation Time : Tue Apr 13 19:31:40 2010
Raid Level : raid5
Raid Devices : 6
Avail Dev Size : 1944082604 (927.01 GiB 995.37 GB)
Array Size : 9720412960 (4635.05 GiB 4976.85 GB)
Used Dev Size : 1944082592 (927.01 GiB 995.37 GB)
Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 50327e35:a1b602d2:da883a40:e0314bed
Update Time : Wed Apr 14 17:32:51 2010
Checksum : 6960f31c - correct
Events : 71
Layout : left-symmetric
Chunk Size : 16K
Device Role : Active device 0
Array State : AAAAAA ('A' == active, '.' == missing)
JMAGGARD:~#
* Re: RAID scrubbing
From: Neil Brown @ 2010-04-15 1:22 UTC
To: Justin Maggard; +Cc: Michael Evans, linux-raid
On Wed, 14 Apr 2010 17:51:11 -0700
Justin Maggard <jmaggard10@gmail.com> wrote:
> On Fri, Apr 9, 2010 at 7:01 PM, Michael Evans <mjevans1983@gmail.com> wrote:
> > On Fri, Apr 9, 2010 at 6:46 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
> >> On Fri, Apr 9, 2010 at 6:41 PM, Michael Evans <mjevans1983@gmail.com> wrote:
> >>> On Fri, Apr 9, 2010 at 6:28 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
> >>>> Hi all,
> >>>>
> >>>> I've got a system using two RAID5 arrays that share some physical
> >>>> devices, combined using LVM. Oddly, when I "echo repair >
> >>>> /sys/block/md0/md/sync_action", once it finishes, it automatically
> >>>> starts a repair on md1 also, even though I haven't requested it.
> >>>> Also, if I try to stop it using "echo idle >
> >>>> /sys/block/md0/md/sync_action", a repair starts on md1 within a few
> >>>> seconds. If I stop that md1 repair immediately, sometimes it will
> >>>> respawn and start doing the repair again on md1. What should I be
> >>>> expecting here? If I start a repair on one array, is it supposed to
> >>>> automatically go through and do it on all arrays sharing that
> >>>> personality?
> >>>>
> >>>> Thanks!
> >>>> -Justin
> >>>>
> >>>
> >>> Is md1 degraded with an active spare? It might be delaying resync on
> >>> it until the other devices are idle.
> >>
> >> No, both arrays are redundant. I'm just trying to do scrubbing
> >> (repair) on md0; no resync is going on anywhere.
> >>
> >> -Justin
> >>
> >
> > First: Reply to all.
> >
> > Second, if you insist that things are not as I suspect:
> >
> > cat /proc/mdstat
> >
> > mdadm -Dvvs
> >
> > mdadm -Evvs
> >
>
> I insist it's something different. :) Just ran into it again on
> another system. Here's the requested output:
Thanks. Very thorough!
> Apr 14 17:32:23 JMAGGARD kernel: md: requested-resync of RAID array md2
> Apr 14 17:32:23 JMAGGARD kernel: md: minimum _guaranteed_ speed: 1000
> KB/sec/disk.
> Apr 14 17:32:23 JMAGGARD kernel: md: using maximum available idle IO
> bandwidth (but not more than 200000 KB/sec) for requested-resync.
> Apr 14 17:32:23 JMAGGARD kernel: md: using 128k window, over a total
> of 972041296 blocks.
> Apr 14 17:32:51 JMAGGARD kernel: md: md_do_sync() got signal ... exiting
> Apr 14 17:33:35 JMAGGARD kernel: md: requested-resync of RAID array md3
So we see the requested-resync (repair) of md2 started as you requested,
then finished at 17:32:51 when you wrote 'idle' to 'sync_action'.
Then 44 seconds later a similar repair started on md3.
44 seconds is too long for it to be a direct consequence of the md2 repair
stopping. Something *must* have written to md3/md/sync_action. But what?
Maybe you have "mdadm --monitor" running, it notices when the repair on one
array finishes, and it has been told to run a script (--program, or PROGRAM in
mdadm.conf) which then starts a repair on the next array?
Seems a bit far-fetched, but I'm quite confident that some program must be
writing to md3/md/sync_action while you're not watching.
NeilBrown
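The kind of hook being described would look roughly like the sketch below; the PROGRAM path, the script name, and the md2 -> md3 pairing are purely illustrative, since nothing in the thread shows the actual configuration:

  # /etc/mdadm.conf (hypothetical): tell mdadm --monitor to run a hook script
  #   PROGRAM /usr/local/sbin/md-event-handler
  #
  # /usr/local/sbin/md-event-handler (hypothetical sketch):
  #!/bin/sh
  # mdadm --monitor invokes PROGRAM as: <event> <md-device> [<component-device>]
  EVENT="$1"
  ARRAY="$2"
  if [ "$EVENT" = "RebuildFinished" ] && [ "$ARRAY" = "/dev/md2" ]; then
      # A rolling-scrub hook like this would produce exactly the behaviour seen
      # above: as soon as md2 goes idle, a repair is kicked off on md3.
      echo repair > /sys/block/md3/md/sync_action
  fi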
* Re: RAID scrubbing
From: Justin Maggard @ 2010-04-17 0:03 UTC
To: Neil Brown; +Cc: linux-raid
On Wed, Apr 14, 2010 at 6:22 PM, Neil Brown <neilb@suse.de> wrote:
> On Wed, 14 Apr 2010 17:51:11 -0700
> Justin Maggard <jmaggard10@gmail.com> wrote:
>
>> On Fri, Apr 9, 2010 at 7:01 PM, Michael Evans <mjevans1983@gmail.com> wrote:
>> > On Fri, Apr 9, 2010 at 6:46 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
>> >> On Fri, Apr 9, 2010 at 6:41 PM, Michael Evans <mjevans1983@gmail.com> wrote:
>> >>> On Fri, Apr 9, 2010 at 6:28 PM, Justin Maggard <jmaggard10@gmail.com> wrote:
>> >>>> Hi all,
>> >>>>
>> >>>> I've got a system using two RAID5 arrays that share some physical
>> >>>> devices, combined using LVM. Oddly, when I "echo repair >
>> >>>> /sys/block/md0/md/sync_action", once it finishes, it automatically
>> >>>> starts a repair on md1 also, even though I haven't requested it.
>> >>>> Also, if I try to stop it using "echo idle >
>> >>>> /sys/block/md0/md/sync_action", a repair starts on md1 within a few
>> >>>> seconds. If I stop that md1 repair immediately, sometimes it will
>> >>>> respawn and start doing the repair again on md1. What should I be
>> >>>> expecting here? If I start a repair on one array, is it supposed to
>> >>>> automatically go through and do it on all arrays sharing that
>> >>>> personality?
>> >>>>
>> >>>> Thanks!
>> >>>> -Justin
>> >>>>
>> >>>
>> >>> Is md1 degraded with an active spare? It might be delaying resync on
>> >>> it until the other devices are idle.
>> >>
>> >> No, both arrays are redundant. I'm just trying to do scrubbing
>> >> (repair) on md0; no resync is going on anywhere.
>> >>
>> >> -Justin
>> >>
>> >
>> > First: Reply to all.
>> >
>> > Second, if you insist that things are not as I suspect:
>> >
>> > cat /proc/mdstat
>> >
>> > mdadm -Dvvs
>> >
>> > mdadm -Evvs
>> >
>>
>> I insist it's something different. :) Just ran into it again on
>> another system. Here's the requested output:
>
> Thanks. Very thorough!
>
>
>> Apr 14 17:32:23 JMAGGARD kernel: md: requested-resync of RAID array md2
>> Apr 14 17:32:23 JMAGGARD kernel: md: minimum _guaranteed_ speed: 1000
>> KB/sec/disk.
>> Apr 14 17:32:23 JMAGGARD kernel: md: using maximum available idle IO
>> bandwidth (but not more than 200000 KB/sec) for requested-resync.
>> Apr 14 17:32:23 JMAGGARD kernel: md: using 128k window, over a total
>> of 972041296 blocks.
>> Apr 14 17:32:51 JMAGGARD kernel: md: md_do_sync() got signal ... exiting
>> Apr 14 17:33:35 JMAGGARD kernel: md: requested-resync of RAID array md3
>
> So we see the requested-resync (repair) of md2 started as you requested,
> then finished at 17:32:51 when you write 'idle' to 'sync_action'.
>
> Then 44 seconds later a similar repair started on md3.
> 44 seconds is too long for it to be a direct consequence of the md2 repair
> stopping. Something *must* have written to md3/md/sync_action. But what?
>
> Maybe you have "mdadm --monitor" running and it notices when repair on one
> array finished and has been told to run a script (--program or PROGRAM in
> mdadm.conf) which would then start a repair on the next array???
>
> Seems a bit far-fetched, but I'm quite confident that some program must be
> writing to md3/md/sync_action while you're not watching.
>
> NeilBrown
Well, this is embarrassing. You're exactly right. :) Looks like it
was a bug in the script run by mdadm --monitor. Thanks for the
insight!
-Justin
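For anyone running a similar rolling-scrub hook, a guard along these lines avoids this kind of unintended re-triggering (again a hypothetical sketch; the actual script from this system is not shown):

  #!/bin/sh
  # Hypothetical rolling-scrub trigger, run from cron rather than from
  # mdadm --monitor events, so that our own "echo repair"/"echo idle"
  # writes can never re-trigger it.
  for action in /sys/block/md*/md/sync_action; do
      # Only touch arrays that are completely idle right now.
      [ "$(cat "$action")" = "idle" ] || continue
      echo check > "$action"   # "check" counts mismatches without rewriting
  done
  # md itself should delay overlapping scrubs on arrays sharing physical disks.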
* Re: RAID scrubbing
From: Berkey B Walker @ 2010-04-17 0:19 UTC
To: Justin Maggard; +Cc: Neil Brown, linux-raid
Justin Maggard wrote:
> On Wed, Apr 14, 2010 at 6:22 PM, Neil Brown<neilb@suse.de> wrote:
>
>> On Wed, 14 Apr 2010 17:51:11 -0700
>> Justin Maggard<jmaggard10@gmail.com> wrote:
>>
>>
>>> On Fri, Apr 9, 2010 at 7:01 PM, Michael Evans<mjevans1983@gmail.com> wrote:
>>>
>>>> On Fri, Apr 9, 2010 at 6:46 PM, Justin Maggard<jmaggard10@gmail.com> wrote:
>>>>
>>>>> On Fri, Apr 9, 2010 at 6:41 PM, Michael Evans<mjevans1983@gmail.com> wrote:
>>>>>
>>>>>> On Fri, Apr 9, 2010 at 6:28 PM, Justin Maggard<jmaggard10@gmail.com> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I've got a system using two RAID5 arrays that share some physical
>>>>>>> devices, combined using LVM. Oddly, when I "echo repair>
>>>>>>> /sys/block/md0/md/sync_action", once it finishes, it automatically
>>>>>>> starts a repair on md1 also, even though I haven't requested it.
>>>>>>> Also, if I try to stop it using "echo idle>
>>>>>>> /sys/block/md0/md/sync_action", a repair starts on md1 within a few
>>>>>>> seconds. If I stop that md1 repair immediately, sometimes it will
>>>>>>> respawn and start doing the repair again on md1. What should I be
>>>>>>> expecting here? If I start a repair on one array, is it supposed to
>>>>>>> automatically go through and do it on all arrays sharing that
>>>>>>> personality?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> -Justin
>>>>>>>
>>>>>>>
>>>>>> Is md1 degraded with an active spare? It might be delaying resync on
>>>>>> it until the other devices are idle.
>>>>>>
>>>>> No, both arrays are redundant. I'm just trying to do scrubbing
>>>>> (repair) on md0; no resync is going on anywhere.
>>>>>
>>>>> -Justin
>>>>>
>>>>>
>>>> First: Reply to all.
>>>>
>>>> Second, if you insist that things are not as I suspect:
>>>>
>>>> cat /proc/mdstat
>>>>
>>>> mdadm -Dvvs
>>>>
>>>> mdadm -Evvs
>>>>
>>>>
>>> I insist it's something different. :) Just ran into it again on
>>> another system. Here's the requested output:
>>>
>> Thanks. Very thorough!
>>
>>
>>
>>> Apr 14 17:32:23 JMAGGARD kernel: md: requested-resync of RAID array md2
>>> Apr 14 17:32:23 JMAGGARD kernel: md: minimum _guaranteed_ speed: 1000
>>> KB/sec/disk.
>>> Apr 14 17:32:23 JMAGGARD kernel: md: using maximum available idle IO
>>> bandwidth (but not more than 200000 KB/sec) for requested-resync.
>>> Apr 14 17:32:23 JMAGGARD kernel: md: using 128k window, over a total
>>> of 972041296 blocks.
>>> Apr 14 17:32:51 JMAGGARD kernel: md: md_do_sync() got signal ... exiting
>>> Apr 14 17:33:35 JMAGGARD kernel: md: requested-resync of RAID array md3
>>>
>> So we see the requested-resync (repair) of md2 started as you requested,
>> then finished at 17:32:51 when you write 'idle' to 'sync_action'.
>>
>> Then 44 seconds later a similar repair started on md3.
>> 44 seconds is too long for it to be a direct consequence of the md2 repair
>> stopping. Something *must* have written to md3/md/sync_action. But what?
>>
>> Maybe you have "mdadm --monitor" running and it notices when repair on one
>> array finished and has been told to run a script (--program or PROGRAM in
>> mdadm.conf) which would then start a repair on the next array???
>>
>> Seems a bit far-fetched, but I'm quite confident that some program must be
>> writing to md3/md/sync_action while you're not watching.
>>
>> NeilBrown
>>
> Well, this is embarrassing. You're exactly right. :) Looks like it
> was a bug in the script run by mdadm --monitor. Thanks for the
> insight!
>
> -Justin
This, I think, is a nice (and polite) ending. Best wishes to all players.
b-