From mboxrd@z Thu Jan 1 00:00:00 1970
From: Piergiorgio Sartor
Subject: Re: md road-map: 2011
Date: Thu, 17 Feb 2011 20:56:21 +0100
Message-ID: <20110217195621.GA3296@lazy.lzy>
References: <20110216212751.51a294aa@notabene.brown>
 <20110216202939.GA2756@lazy.lzy>
 <20110217084826.77f4dbf1@notabene.brown>
 <4D5C6AAF.1040600@turmel.org>
 <20110217115257.28a8d174@notabene.brown>
 <4D5C768A.1010502@turmel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To: <4D5C768A.1010502@turmel.org>
Sender: linux-raid-owner@vger.kernel.org
To: Phil Turmel
Cc: NeilBrown, Piergiorgio Sartor, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

> > So when you do the computation on all of the bytes in all of the blocks you
> > get a block full of answers.
> > If the answers are all the same - that tells you something fairly strong.
> > If they are "all different" then that is also a fairly strong statement.
> > But what if most are the same, but a few are different? How do you interpret
> > that?
>
> Actually, I was thinking about that. (You suckered me into reading that PDF
> some weeks ago.) I would be inclined to allow the kernel to make corrections
> where "all the same" covers individual sectors, per the sector size reported
> by the underlying device.

I do agree with Neil on this.
User space should collect the data, perform statistics
and give suggestions.
After that there should be a mechanism, at this point in
kernel space, I guess, capable of correcting one single
chunk of one device.

> Also, the comparison would have to ignore "neutral bytes", where P & Q
> happened to be correct for that byte position.

Have a look at the patch I submitted to "restripe.c";
it should cover the interesting cases, even if more
statistics could be applied.

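To make the "block full of answers" idea concrete, here is a rough
user-space sketch. It is not the "restripe.c" patch, only an
illustration. It assumes the usual Linux RAID-6 arithmetic (GF(2^8),
polynomial 0x11d, generator 2); all function names are made up for
this example. For every byte position it computes the P and Q
syndromes, skips the neutral bytes, names the single device that would
explain the mismatch, and then counts the answers.

```python
from collections import Counter

# GF(2^8) log/antilog tables; polynomial 0x11d and generator 2 are
# assumed here, matching the usual Linux RAID-6 arithmetic.
GF_EXP = [0] * 510
GF_LOG = [0] * 256
_x = 1
for _i in range(255):
    GF_EXP[_i] = _x
    GF_LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    GF_EXP[_i] = GF_EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two elements of GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def make_parity(data):
    """Compute the P and Q blocks for a list of equally sized data blocks."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        for pos in range(size):
            p[pos] ^= block[pos]
            q[pos] ^= gf_mul(GF_EXP[i], block[pos])
    return p, q


def classify_byte(data, p, q, pos):
    """Return the suspect for one byte position.

    "ok" is a neutral byte (P and Q both match), "P" or "Q" means only
    that parity byte is off, an int is the index of the one data device
    that would explain both syndromes, "unknown" means no single device
    explains them.
    """
    sp, sq = p[pos], q[pos]
    for i, block in enumerate(data):
        sp ^= block[pos]
        sq ^= gf_mul(GF_EXP[i], block[pos])
    if sp == 0 and sq == 0:
        return "ok"
    if sq == 0:
        return "P"
    if sp == 0:
        return "Q"
    # Single data error E on device z gives sp = E and sq = g^z * E.
    z = (GF_LOG[sq] - GF_LOG[sp]) % 255
    return z if z < len(data) else "unknown"


def stripe_statistics(data, p, q):
    """Count the per-byte answers over the whole stripe."""
    return Counter(classify_byte(data, p, q, pos) for pos in range(len(p)))


def suggest(stats):
    """None if clean, the single suspect if all answers agree, else "ambiguous"."""
    suspects = [k for k in stats if k != "ok"]
    if not suspects:
        return None
    return suspects[0] if len(suspects) == 1 else "ambiguous"
```

With four data blocks and a couple of flipped bytes on device 2,
stripe_statistics() reports device 2 for exactly those positions and
"ok" everywhere else, so suggest() names device 2. As soon as two
devices show up, the answer is "ambiguous" and the decision is left
to the user.
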
> Given that the hardware is going to do error correction and checking at a
> sector size granularity, and the kernel would in fact rewrite that sector using
> this calculation if the hardware made a "fairly strong" statement that it can't
> be trusted, I'd argue that rewriting the sector is appropriate.

The problem could be in the interface (it happened to me)
and not in the disk.
So, there will be no error correction, at this point,
from the device.

> Any corrective action that isn't consistent at the sector level should be punted.
> I'm very curious what percentage that would be in production environments.

Yeah, me too.

bye,

--

piergiorgio