Date: Thu, 27 Aug 2009 10:00:51 +0100
From: Jamie Lokier
To: Christoph Hellwig
Cc: rusty@rustcorp.com.au, qemu-devel@nongnu.org, kvm@vger.kernel.org, Javier Guerra
Subject: Re: [Qemu-devel] Re: Notes on block I/O data integrity
Message-ID: <20090827090051.GD22631@shareable.org>
In-Reply-To: <20090826221722.GA1962@lst.de>
References: <20090825181120.GA4863@lst.de> <90eb1dc70908251233m4b90ddfuabb4d26bccd62c63@mail.gmail.com> <20090825193621.GA19778@lst.de> <20090826185755.GF25726@shareable.org> <20090826221722.GA1962@lst.de>

Christoph Hellwig wrote:
> On Wed, Aug 26, 2009 at 07:57:55PM +0100, Jamie Lokier wrote:
> > Christoph Hellwig wrote:
> > > > what about LVM? iv'e read somewhere that it used to just eat barriers
> > > > used by XFS, making it less safe than simple partitions.
> > >
> > > Oh, any additional layers open another by cans of worms. On Linux until
> > > very recently using LVM or software raid means only disabled
> > > write caches are safe.
> >
> > I believe that's still true except if there's more than one backing
> > drive, so software RAID still isn't safe.  Did that change?
>
> Yes, it did change.
> I will recommend to keep doing what people caring for their data
> have been doing since these volatile write caches came up: turn them
> off.

Unfortunately I tried that on a batch of 1000 or so embedded thingies
with ext3, and the write performance plummeted.  They are the same
thingies where I observed a lack of barriers resulting in filesystem
corruption after power failure.

We really need barriers with ATA disks to get decent write
performance.

It's a good recommendation, though.

> That being said with the amount of bugs in filesystems related to
> write barriers my expectation for the RAID and device mapper code is
> not too high.

Turning off the volatile write cache does not provide commit integrity
with RAID.  RAID needs barriers to plug, drain and unplug the queues
across all backing devices in a coordinated manner, quite apart from
the volatile write cache.  And then there's still that pesky problem
of writes which reach one disk but not its parity disk.

Unfortunately, turning off the volatile write caches could actually
make the timing window for failure worse in the case of a system
crash without power failure.

--
Jamie
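
For context, the ordering at stake in this thread is the classic
journal-commit pattern: write the data blocks, make them durable, then
write the commit record and make that durable too.  Below is a minimal
userspace sketch of that pattern, assuming a Linux/POSIX host; the file
name and payloads are hypothetical.  Without working barrier/flush
support through every layer (LVM, software RAID, the drive's volatile
write cache), the fdatasync() calls can return before the blocks have
actually reached stable storage, which is exactly the failure window
discussed above.

```c
/* Sketch of the write-then-commit ordering that barriers/flushes are
 * meant to preserve.  Assumes Linux/POSIX; "journal.img" and the
 * payloads are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const char data[]   = "payload";
    const char commit[] = "COMMIT";

    int fd = open("journal.img", O_WRONLY | O_CREAT, 0600);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    /* Step 1: write the data blocks. */
    if (write(fd, data, sizeof data) != (ssize_t)sizeof data) {
        perror("write data"); return EXIT_FAILURE;
    }

    /* Step 2: make the data durable before the commit record goes out.
     * If the stack below silently drops barriers/flushes, this ordering
     * guarantee is what gets lost. */
    if (fdatasync(fd) != 0) { perror("fdatasync"); return EXIT_FAILURE; }

    /* Step 3: write the commit record, then make it durable as well. */
    if (write(fd, commit, sizeof commit) != (ssize_t)sizeof commit) {
        perror("write commit"); return EXIT_FAILURE;
    }
    if (fdatasync(fd) != 0) { perror("fdatasync"); return EXIT_FAILURE; }

    close(fd);
    return EXIT_SUCCESS;
}
```

Disabling the drive's volatile write cache, as Christoph recommends,
makes the flush step unnecessary at the cost of the write-performance
hit described in the reply; with the cache left on, the pattern is only
safe if every layer in the stack passes barriers or cache flushes
through to the disk.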