From: Phil Turmel
Subject: Re: md road-map: 2011
Date: Thu, 17 Feb 2011 13:46:58 -0500
Message-ID: <4D5D6D22.2010406@turmel.org>
References: <20110216212751.51a294aa@notabene.brown> <20110216202939.GA2756@lazy.lzy> <20110217084826.77f4dbf1@notabene.brown> <4D5C6AAF.1040600@turmel.org> <20110217115257.28a8d174@notabene.brown> <4D5C768A.1010502@turmel.org> <20110217141017.01e30eab@notabene.brown>
In-Reply-To: <20110217141017.01e30eab@notabene.brown>
To: NeilBrown
Cc: Piergiorgio Sartor, linux-raid@vger.kernel.org

On 02/16/2011 10:10 PM, NeilBrown wrote:
> On Wed, 16 Feb 2011 20:14:50 -0500 Phil Turmel wrote:
>
>> On 02/16/2011 07:52 PM, NeilBrown wrote:
>
>>> So when you do the computation on all of the bytes in all of the blocks you
>>> get a block full of answers.
>>> If the answers are all the same - that tells you something fairly strong.
>>> If they are "all different" then that is also a fairly strong statement.
>>> But what if most are the same, but a few are different? How do you interpret
>>> that?
>>
>> Actually, I was thinking about that. (You suckered me into reading that PDF
>> some weeks ago.) I would be inclined to allow the kernel to make corrections
>> where "all the same" covers individual sectors, per the sector size reported
>> by the underlying device.
>
> To see why I am strongly against having the kernel make automatic
> corrections like this, see
>
> http://neil.brown.name/blog/20100211050355

I read it, and slept on it, and my gut wants to argue. But I have no data to
back me up. I think I'll take a stab at reporting inconsistencies via a simple
printk with a sysfs on/off switch.

>> Also, the comparison would have to ignore "neutral bytes", where P & Q
>> happened to be correct for that byte position.
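For illustration only, the per-sector rule above (every non-neutral byte in a sector must point at the same device) can be sketched in user-space Python. Assumptions: the GF(2^8) field used by the Linux md RAID6 code (polynomial 0x11D, generator 2); Q taken as the XOR of g^i * D_i, where i is the position in the syndrome and not necessarily the md role number; and a fixed sector size standing in for the one the device reports. All function names are hypothetical. Each byte column yields a verdict (neutral, P, Q, a data-disk index, or "no single-device fit"), and a sector gets a single suspect only when all of its non-neutral bytes agree.

```python
# Hypothetical sketch, not md code. Assumes GF(2^8) with polynomial 0x11D
# and generator g = 2; disk index i is the position in the Q syndrome.

def _build_tables():
    exp, log = [0] * 512, [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x100:
            x ^= 0x11D
    for i in range(255, 512):
        exp[i] = exp[i - 255]  # duplicate so EXP[a + b] needs no modulo
    return exp, log


EXP, LOG = _build_tables()


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(column):
    """P = XOR of D_i, Q = XOR of g^i * D_i over one byte column."""
    p = q = 0
    for i, d in enumerate(column):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q


def make_parity(data_blocks):
    """Compute P and Q blocks for equal-length data blocks."""
    p = bytearray(len(data_blocks[0]))
    q = bytearray(len(data_blocks[0]))
    for off in range(len(p)):
        p[off], q[off] = syndromes([blk[off] for blk in data_blocks])
    return p, q


def locate_byte(column, p_stored, q_stored):
    """None if consistent (neutral); 'P' or 'Q' if only that syndrome is
    off; data-disk index z if one data byte explains both; '?' otherwise."""
    p_calc, q_calc = syndromes(column)
    dp, dq = p_stored ^ p_calc, q_stored ^ q_calc
    if dp == 0 and dq == 0:
        return None
    if dq == 0:
        return "P"
    if dp == 0:
        return "Q"
    z = (LOG[dq] - LOG[dp]) % 255  # dq / dp == g^z for a single bad data byte
    return z if z < len(column) else "?"


def sector_verdicts(data_blocks, p_block, q_block, sector_size=512):
    """One verdict per sector: 'clean', a single suspect, or 'ambiguous'."""
    verdicts = []
    for start in range(0, len(p_block), sector_size):
        suspects = set()
        for off in range(start, min(start + sector_size, len(p_block))):
            v = locate_byte([blk[off] for blk in data_blocks],
                            p_block[off], q_block[off])
            if v is not None:  # skip neutral bytes
                suspects.add(v)
        if not suspects:
            verdicts.append("clean")
        elif len(suspects) == 1 and "?" not in suspects:
            verdicts.append(suspects.pop())
        else:
            verdicts.append("ambiguous")
    return verdicts
```

The sketch only classifies; it never rewrites anything. A sector whose non-neutral bytes disagree, or implicate no single device, comes back as "ambiguous", which is exactly the "most are the same, but a few are different" case that would still need a human (or a policy) to interpret.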
>>
>>> The point I'm trying to get to is that the result of this RAID6 calculation
>>> isn't a simple "that device is bad". It is a block of data that needs to be
>>> interpreted.
>>>
>>> I'd rather have user-space do that interpretation, so it may as well do the
>>> calculation too.
>>>
>>> If you wanted to do it in the kernel, you would need to be very clear about
>>> what information you provide, what it means exactly, and why it is sufficient.
>>
>> Given that the hardware is going to do error correction and checking at a
>> sector-size granularity, and the kernel would in fact rewrite that sector using
>> this calculation if the hardware made a "fairly strong" statement that it can't
>> be trusted, I'd argue that rewriting the sector is appropriate.
>
> All the RAID6 calculation tells you is that something cannot be trusted. It
> doesn't tell you what. It could be the controller, the cable, the drive
> logic, or the rust on the media. Without that knowledge, correction can be
> dangerous.

True, but inconsistent data is also dangerous, as traffic on this list shows.
The question is, "When is it safer to correct than to leave alone?" I don't
think there's enough data to answer that, unless you have some pointers to
studies that address it.

Either way, a reporting method is needed, and might give us some numbers to
work with.

Phil