From: Dieter Stueken
Subject: Re: Bad blocks are killing us!
Date: Fri, 19 Nov 2004 19:47:30 +0100
Message-ID: <419E3FC2.6080205@conterra.de>
References: <200411180147.iAI1l5N02116@www.watkins-home.com>
In-Reply-To: <200411180147.iAI1l5N02116@www.watkins-home.com>
To: linux-raid@vger.kernel.org

Guy Watkins wrote:
> "but the md-level
> approach might be better. But I'm not sure I see the point of
> it---unless you have raid 6 with multiple parity blocks, if a disk
> actually has the wrong information recorded on it I don't think you
> can detect which drive is bad, just that one of them is."
>
> If there is a parity block that does not match the data, true you do not
> know which device has the wrong data. However, if you do not "correct" the
> parity, when a device fails, it will be constructed differently than it was
> before it failed. This will just cause more corrupt data. The parity must
> be made consistent with whatever data is on the data blocks to prevent this
> corrosion of data. With RAID6 it should be possible to determine which
> block is wrong. It would be a pain in the @$$, but I think it would be
> doable. I will explain my theory if someone asks.

This is exactly the same conflict a single drive has with an unreadable
sector. It notices that the sector is bad and cannot fulfill any read
request until the data is rewritten or erased. The single drive cannot
(and should never try to!) silently replace the bad sector with a spare
sector, as it cannot recover the content.

The RAID system likewise cannot solve this problem automagically, and
never should, as the former content can no longer be deduced. But notice
that we have two very different problems to examine. The above problem
arises if all disks of the RAID system claim to read correct data,
whereas the parity information tells us that one of them must be wrong.
Unless we have RAID6 to locate such single-block errors, the data is
LOST and cannot be recovered.

This is very different from the situation where one of the disks DOES
report an internal CRC error. In that case your data CAN be recovered
reliably from the parity information, and in most cases even
successfully written back to the disk.
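Guy's RAID6 theory need not stay a mystery; the arithmetic fits in a few
lines. Below is a rough stand-alone Python sketch. It is my own
illustration, not anything taken from the md driver, although the
GF(2^8) field with polynomial 0x11d and generator 2 is the same one
Linux RAID6 computes its Q syndrome in. It shows both cases described
above: a silent corruption, which plain XOR parity can detect but not
locate while the second (Q) syndrome can, and a reported CRC error,
where the bad index is known and XOR parity alone rebuilds the block.

    # Throwaway demo: one byte stands in for each "disk". A real stripe
    # applies the same arithmetic independently to every byte of every
    # block.

    def gf_mul(a, b, poly=0x11d):
        """Multiply in GF(2^8), the field Linux RAID6 uses for Q."""
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            if a & 0x100:
                a ^= poly
            b >>= 1
        return r

    # log/exp tables for the generator g = 2
    EXP = [0] * 255
    LOG = [0] * 256
    x = 1
    for i in range(255):
        EXP[i] = x
        LOG[x] = i
        x = gf_mul(x, 2)

    def syndromes(blocks):
        """P = xor of all blocks, Q = sum over i of g^i * blocks[i]."""
        p = q = 0
        for i, d in enumerate(blocks):
            p ^= d
            q ^= gf_mul(EXP[i], d)
        return p, q

    data = [0x11, 0x22, 0x33, 0x44]   # four data "disks"
    p, q = syndromes(data)            # parity as written to the P/Q disks

    # Case 1: silent corruption -- every disk still claims a good read.
    data[2] ^= 0x5a
    p2, q2 = syndromes(data)
    dp, dq = p ^ p2, q ^ q2

    # RAID5 view: dp != 0 says the stripe is inconsistent, nothing more.
    assert dp != 0

    # RAID6 view: dq = g^z * dp, so the bad index is z = log(dq) - log(dp).
    z = (LOG[dq] - LOG[dp]) % 255
    assert z == 2
    data[z] ^= dp                     # repair by xoring the delta back in
    assert data == [0x11, 0x22, 0x33, 0x44]

    # Case 2: a disk *reports* its CRC error, so the bad index k is known
    # up front, and plain XOR parity suffices to rebuild the block.
    k = 1
    rebuilt = p
    for i, d in enumerate(data):
        if i != k:
            rebuilt ^= d
    assert rebuilt == data[k]

Note the limit: dq/dp = g^z identifies exactly one silently bad block.
With two or more, even RAID6 can only detect the damage, not locate it,
which is why the scan discussed below still has to tag such stripes as
invalid.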
But there is also a difference between the problem for RAID and the
problem inside the disk: whereas the disk always reads all CRC data for
a sector to verify its integrity, the RAID system does not normally
check the validity of the parity information (this is why the idea of
data scans came up in the first place). So if a scan discovers bad
parity information, the only action that can (and must!) be taken is to
tag this piece of data as invalid. And it is very important not only to
log that information somewhere; it is even more important to prevent
further reads of this piece of lost data. Otherwise this definitely
invalid data may be read without any notice, may get written back again
and thus turn into valid data, even though it has become garbage.

People often argue for some kind of spare sector management, which would
solve all problems. I think this is an illusion. Spare sectors can only
be useful if you fail WRITING data, not when reading failed or data was
already lost. This is already realized within the single disks in a
sufficient way (I think). If your disk gives write errors, you either
have a very old one without internal spare sector management, or your
disk has run out of spare sectors already. Read errors are far more
frequent than write errors and thus a much more important issue.

Dieter Stüken.

-- 
Dieter Stüken, con terra GmbH, Münster
stueken@conterra.de    http://www.conterra.de/    (0)251-7474-501