From: Ric Wheeler
Subject: Re: XFS corruption during power-blackout
Date: Fri, 01 Jul 2005 09:57:48 -0400
Message-ID: <42C54BDC.6000206@emc.com>
In-Reply-To: <20050701131950.GA15180@ime.usp.br>
To: Rogério Brito
Cc: linux-kernel@vger.kernel.org, Brett Russ, linux-fsdevel@vger.kernel.org

Rogério Brito wrote:

> On Jul 01 2005, Jens Axboe wrote:
>> On Fri, Jul 01 2005, David Masover wrote:
>>> Not always possible. Some disks lie and leave caching on anyway.
>>
>> And the same (and others) disks will not honor a flush anyways.
>> Moral of that story - avoid bad hardware.
>
> But how does the end-user know what hardware is "good hardware"? Which
> vendors don't lie (or, at least, lie less than others) regarding HDs?
>
> Thanks, Rogério Brito.

The only real way is to test the drive (and retest when you get a new
version of firmware) and the whole fsync -> write barrier code path.

We use a bus analyzer to make sure that when you fsync() a file, you
will see a cache flush command coming across the bus. Of course, that is
the easy step ;-)

The second step is to test your system across power failures. We have a
"wbtest" program that we have used to catch bugs.
The basic idea is to write a file to a disk with the cache turned off,
write the same file to the disk with the write barrier (and working
cache flush command) and then randomly drop power to the box. It is
important to really drop power to the whole box since a "reset button"
push often does not drop power to the drives and will give you false
passes.

Our wbtest used to be good at finding holes in the write barrier code
using 2.4 kernels and PATA drives, but we have had no luck yet in
catching known bugs with this test on 2.6 with S-ATA drives.

Ideas on how to get a more effective test are welcome - it is a very
small window that you need to hit to catch a misbehaving drive (i.e.,
once your write cache flush command has returned, you want to drop power
and, on reboot, validate that the platter contains that last IO
correctly). If you had enough NVRAM in a test system, you might be able
to substitute an NVRAM-backed file system for the write-cache-disabled
drive and get closer to catching the window.

The alternative is to either run with the write cache disabled (again,
you will need to validate that the drive really disabled the cache) or
to buy a mid-range or better storage array that provides a non-volatile
(battery backed) write cache.
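
A minimal sketch of the kind of power-fail check described above. This is
not the actual wbtest code, which is not shown in this message; the record
layout, the function names (`append_durably`, `last_valid_seq`, `writer`,
`check`) and the two-path setup are all assumptions made for illustration.
The idea is to append a numbered record to the file on the drive under test,
fsync() it, and only then log the same number to a reference location that
is assumed trustworthy (for example, a disk with its write cache really
disabled). After a power cut, the file under test must contain every record
the reference acknowledges.

```python
import os
import struct
import zlib

# Hypothetical record layout: 8-byte sequence number + 4-byte CRC32.
RECORD = struct.Struct("<QI")


def pack_record(seq: int) -> bytes:
    """Build one fixed-size record: sequence number plus its CRC32."""
    return RECORD.pack(seq, zlib.crc32(struct.pack("<Q", seq)))


def append_durably(path: str, seq: int) -> None:
    """Append one record and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, pack_record(seq))
        # With working barriers this should show up as a cache flush
        # command on the bus; the sketch cannot verify that by itself.
        os.fsync(fd)
    finally:
        os.close(fd)


def last_valid_seq(path: str) -> int:
    """Return the highest sequence number in the intact prefix, or -1."""
    last = -1
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return last
    for off in range(0, len(data) - RECORD.size + 1, RECORD.size):
        seq, crc = RECORD.unpack_from(data, off)
        if seq != last + 1 or crc != zlib.crc32(struct.pack("<Q", seq)):
            break  # torn, corrupt, or out-of-order record: stop here
        last = seq
    return last


def writer(reference_path: str, test_path: str, count: int) -> None:
    """Write to the drive under test first, then acknowledge on the reference."""
    for seq in range(count):
        append_durably(test_path, seq)       # barrier / cache-flush path
        append_durably(reference_path, seq)  # assumed-trustworthy log


def check(reference_path: str, test_path: str) -> bool:
    """After reboot: did the test file keep every acknowledged record?"""
    return last_valid_seq(test_path) >= last_valid_seq(reference_path)
```

In a real run, `writer` would loop until power is cut to the whole box (not
a reset-button push), and `check` would run after reboot. The sketch does
not fsync the containing directory, so it assumes both files already exist
before the test starts.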