From mboxrd@z Thu Jan 1 00:00:00 1970
From: Piergiorgio Sartor
Subject: Re: md road-map: 2011
Date: Wed, 16 Feb 2011 23:53:17 +0100
Message-ID: <20110216225317.GA3306@lazy.lzy>
References: <20110216212751.51a294aa@notabene.brown> <20110216202939.GA2756@lazy.lzy> <20110217084826.77f4dbf1@notabene.brown>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To: <20110217084826.77f4dbf1@notabene.brown>
Sender: linux-raid-owner@vger.kernel.org
To: NeilBrown
Cc: Piergiorgio Sartor, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

> > > when the rebuild of the secondary completes. Commonly this would be
> > > ideal, but if the secondary experienced any write errors (that were
> > > recorded in the bad block log) then it would be best to leave both in
> > > place until the sysadmin resolves the situation. So in the first
> > > implementation this failing should not be automatic.
> >
> > Maybe putting the primary as "spare", i.e. not failed nor
> > working, unless the "migration" was not successful. In that
> > case the secondary device should be failed.
>
> Maybe ... but what if both primary and secondary have bad blocks on them?
> What do I do then?

IMHO this means the migration was not successful, so you return
to the original state, with the primary disk up and running.
This assumes you realize the secondary has bad blocks; otherwise
I do not think there are any options.

> > My use case here is disk "rotation" :-). That is, for example, a
> > RAID-5/6 with n disks + 1 spare. Each X months/weeks/days/hours
> > one disk is pulled out of the array and the spare one takes over.
> > The pulled out disk will be the new spare (and powered down, possibly).
> > The idea here is to have n disks which will have, after some time,
> > different (increasing) power on hours, so to minimize the possibility
> > of multiple failures.
>
> Interesting idea. This could be managed with some user-space tool that
> initiates the 'hot-replace' and 'fail' from time to time and keeps track of
> ages.

Exactly, my idea was to have a daemon which, from time to time,
maybe reading the power-on hours from the SMART information,
will remove the oldest disk, replacing it with the youngest.
There could be other policies, of course.

> > > Better reporting of inconsistencies.
> > > ------------------------------------
> > >
> > > When a 'check' finds a data inconsistency it would be useful if it
> > > was reported. That would allow a sysadmin to try to understand the
> > > cause and possibly fix it.
> >
> > Could you, please, consider to add, for RAID-6, the
> > capability to report also which device, potentially,
> > has the problem? Thanks!
>
> I would rather leave that to user-space. If I report where the problem is, a
> tool could directly read all the blocks in that stripe and perform any fancy
> calculations you like. I may even write that tool (but no promises).

I guess you already have the tool, don't you remember? :-)

bye,

-- 

piergiorgio
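
P.S. To make the rotation policy a bit more concrete, here is a rough
sketch of the decision step only: retire the active disk with the most
power-on hours and bring in the spare with the fewest. It assumes the
hours have already been read from SMART by some other means, and it does
not touch md at all. The function name, the device paths and the hour
values are all made up for illustration.

```python
from typing import Dict, Optional, Tuple


def pick_rotation(active_hours: Dict[str, int],
                  spare_hours: Dict[str, int]) -> Optional[Tuple[str, str]]:
    """Return (disk_to_retire, spare_to_use), or None if nothing to do.

    Hypothetical policy: retire the active disk with the most power-on
    hours and replace it with the spare that has the fewest.
    """
    if not active_hours or not spare_hours:
        return None
    oldest = max(active_hours, key=active_hours.get)
    youngest = min(spare_hours, key=spare_hours.get)
    # Swapping only makes sense if the spare is actually younger.
    if spare_hours[youngest] >= active_hours[oldest]:
        return None
    return oldest, youngest


if __name__ == "__main__":
    # Made-up values, as if taken from the SMART power-on hours attribute.
    active = {"/dev/sda": 12000, "/dev/sdb": 9000, "/dev/sdc": 15000}
    spare = {"/dev/sdd": 3000}
    print(pick_rotation(active, spare))
```

A real daemon would then have to ask md to do the replacement and wait
for it to complete before powering the old disk down; other policies
(fixed intervals, for example) would only change this one function.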