From: Berkey B Walker
Subject: Re: RAID scrubbing
Date: Fri, 16 Apr 2010 20:19:24 -0400
Message-ID: <4BC8FE8C.7060600@panix.com>
References: <20100415112206.6fcd3d3f@notabene.brown>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-raid-owner@vger.kernel.org
To: Justin Maggard
Cc: Neil Brown, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Justin Maggard wrote:
> On Wed, Apr 14, 2010 at 6:22 PM, Neil Brown wrote:
>
>> On Wed, 14 Apr 2010 17:51:11 -0700 Justin Maggard wrote:
>>
>>> On Fri, Apr 9, 2010 at 7:01 PM, Michael Evans wrote:
>>>
>>>> On Fri, Apr 9, 2010 at 6:46 PM, Justin Maggard wrote:
>>>>
>>>>> On Fri, Apr 9, 2010 at 6:41 PM, Michael Evans wrote:
>>>>>
>>>>>> On Fri, Apr 9, 2010 at 6:28 PM, Justin Maggard wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I've got a system using two RAID5 arrays that share some physical
>>>>>>> devices, combined using LVM. Oddly, when I "echo repair >
>>>>>>> /sys/block/md0/md/sync_action", once it finishes, it automatically
>>>>>>> starts a repair on md1 as well, even though I haven't requested it.
>>>>>>> Also, if I try to stop it using "echo idle >
>>>>>>> /sys/block/md0/md/sync_action", a repair starts on md1 within a few
>>>>>>> seconds. If I stop that md1 repair immediately, sometimes it will
>>>>>>> respawn and start the repair on md1 again. What should I be
>>>>>>> expecting here? If I start a repair on one array, is it supposed to
>>>>>>> go through and do it automatically on all arrays sharing that
>>>>>>> personality?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> -Justin
>>>>>>
>>>>>> Is md1 degraded with an active spare? It might be delaying resync on
>>>>>> it until the other devices are idle.
>>>>>
>>>>> No, both arrays are redundant. I'm just trying to do scrubbing
>>>>> (repair) on md0; no resync is going on anywhere.
>>>>>
>>>>> -Justin
>>>>
>>>> First: reply to all.
>>>>
>>>> Second, if you insist that things are not as I suspect:
>>>>
>>>>   cat /proc/mdstat
>>>>   mdadm -Dvvs
>>>>   mdadm -Evvs
>>>
>>> I insist it's something different. :) I just ran into it again on
>>> another system. Here's the requested output:
>>
>> Thanks. Very thorough!
>>
>>> Apr 14 17:32:23 JMAGGARD kernel: md: requested-resync of RAID array md2
>>> Apr 14 17:32:23 JMAGGARD kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
>>> Apr 14 17:32:23 JMAGGARD kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
>>> Apr 14 17:32:23 JMAGGARD kernel: md: using 128k window, over a total of 972041296 blocks.
>>> Apr 14 17:32:51 JMAGGARD kernel: md: md_do_sync() got signal ... exiting
>>> Apr 14 17:33:35 JMAGGARD kernel: md: requested-resync of RAID array md3
>>
>> So we see the requested-resync (repair) of md2 started as you requested,
>> then finished at 17:32:51 when you wrote 'idle' to 'sync_action'.
>>
>> Then, 44 seconds later, a similar repair started on md3. 44 seconds is
>> too long for it to be a direct consequence of the md2 repair stopping.
>> Something *must* have written to md3/md/sync_action. But what?
>>
>> Maybe you have "mdadm --monitor" running, and it notices when the repair
>> on one array finishes and has been told to run a script (--program, or
>> PROGRAM in mdadm.conf) which then starts a repair on the next array???
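>>
>> For example (this is purely hypothetical -- the script path and its
>> logic are invented here to illustrate the idea), an mdadm.conf entry
>> like
>>
>>   PROGRAM /usr/local/sbin/md-event-handler
>>
>> plus a handler that chains arrays together would produce exactly the
>> behaviour you describe. Note that RebuildFinished is reported both when
>> a resync/repair completes and when it is aborted, and that mdadm
>> --monitor only polls the arrays (every 60 seconds by default), which
>> would also account for the ~44-second lag:
>>
>>   #!/bin/sh
>>   # mdadm --monitor runs PROGRAM with the event name and the md device
>>   # (and sometimes a component device) as its arguments.
>>   event="$1"
>>   array="$2"
>>
>>   # When a scrub of md2 ends -- even one aborted by writing 'idle' --
>>   # start the next array. Chaining like this is exactly the kind of
>>   # script bug that would look like md doing it by itself.
>>   if [ "$event" = "RebuildFinished" ] && [ "$array" = "/dev/md2" ]; then
>>       echo repair > /sys/block/md3/md/sync_action
>>   fi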
>>
>> Seems a bit far-fetched, but I'm quite confident that some program must
>> be writing to md3/md/sync_action while you're not watching.
>>
>> NeilBrown
>
> Well, this is embarrassing. You're exactly right. :) It looks like it
> was a bug in the script run by mdadm --monitor. Thanks for the
> insight!
>
> -Justin

This, I think, is a nice (and polite) ending. Best wishes to all players.

b-
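For anyone who finds this thread later: the interface discussed above is
md's standard sysfs scrubbing control. A minimal by-hand session looks
roughly like this (a sketch only; md0 is just the array name used above):

  # kick off a scrub that rewrites any parity/mirror mismatches it finds
  echo repair > /sys/block/md0/md/sync_action

  # a read-only scrub, which only counts mismatches, is:
  #   echo check > /sys/block/md0/md/sync_action

  # watch progress
  cat /proc/mdstat
  cat /sys/block/md0/md/sync_completed

  # abort the scrub early
  echo idle > /sys/block/md0/md/sync_action

  # sectors found out of sync (check) or rewritten (repair)
  cat /sys/block/md0/md/mismatch_cnt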