From mboxrd@z Thu Jan 1 00:00:00 1970
From: Roy Sigurd Karlsbakk
Subject: Re: Checksumming RAID?
Date: Tue, 27 Nov 2012 12:39:34 +0100 (CET)
Message-ID: <22100889.14.1354016374529.JavaMail.root@zimbra>
References: <50B4A215.4000203@hesbynett.no>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
In-Reply-To: <50B4A215.4000203@hesbynett.no>
Sender: linux-raid-owner@vger.kernel.org
To: David Brown
Cc: Linux Raid, Bernd Schubert
List-Id: linux-raid.ids

> I can certainly sympathise with you, but I am not sure that data
> checksumming would help here. If your hardware raid sends out nonsense,
> then it is going to be very difficult to get anything trustworthy. The
> obvious answer here is to throw out the broken hardware raid and use a
> system that works - but it is equally obvious that that is easier said
> than done! But I would find it hard to believe that this is a common
> issue with hardware raid systems - it goes against the whole point of
> data storage.
>
> There is always a chance of undetected read errors - the question is if
> the chances of such read errors, and the consequences of them, justify
> the costs of extra checking. And if they /do/ justify extra checking,
> are data checksums the right way?

The chance of silent corruption is rather small with your average 3TB
home storage. On the other hand, if you had a petabyte or five, the
chances of silent corruption would be very high indeed (see the CERN
study done in 2007). In my last job, I worked with ZFS on ~350TiB of
storage, and we saw errors rather frequently there. But since ZFS
checksums data and uses the checksums to deal with errors, we never saw
any data loss. That is, except on an older machine, running ZFS on a
storage unit controlled by hardware RAID (NexSAN SATABeast).
We had data corruption on that one as well, after a disk failure, and
had to resort to restoring from tape, since ZFS couldn't control the
RAID.

> I agree with Neil's post that
> end-to-end checksums (such as CRCs in a gzip file, or GPG integrity
> checks) are the best check when they are possible, but they are not
> always possible because they are not transparent.

The problem with end-to-end checksums at the application level is that
they can only detect the error, not fix it, similar to the issue I
mentioned above.

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
roy@karlsbakk.net
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
In all pedagogy, it is essential that the curriculum is presented
intelligibly. It is an elementary imperative for all pedagogues to avoid
excessive use of idioms with xenotypic etymology. In most cases,
adequate and relevant synonyms exist in Norwegian.