Date: Tue, 11 May 2010 19:13:22 +0100
From: Jamie Lokier
To: Anthony Liguori
Cc: Kevin Wolf, Alexander Graf, armbru@redhat.com, qemu-devel@nongnu.org,
 Paul Brook, Christoph Hellwig
Subject: Re: [Qemu-devel] Re: [PATCH 2/2] Add flush=off parameter to -drive
Message-ID: <20100511181322.GA30446@shareable.org>
In-Reply-To: <4BE990CE.40505@codemonkey.ws>

Anthony Liguori wrote:
> qemu-img create -f raw foo.img 10G
> mkfs.ext3 foo.img
> mount -oloop,rw,barrier=1 -t ext3 foo.img mnt
>
> Works perfectly fine.

Hmm, interesting.  Didn't know loop propagated barriers.

So you're suggesting to use qemu with a loop device, and ext2 (a bit
faster than ext3) and barrier=0 (well, that's implied if you use ext2),
and a raw image file on the ext2/3 filesystem, to provide the effect of
flush=off, because the loop device caches block writes on the host
except for explicit barrier requests from the fs, which are turned off?

That wasn't obvious the first time :-)

Does the loop device cache fs writes instead of propagating them
immediately to the underlying fs?  I guess it probably does.

Does the loop device allow the backing file to grow sparsely, to get
behaviour like qcow2?  That's ugly, but it might just work.
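If I understand the suggestion, the whole setup would look something
like the following.  The names, sizes and the cache= choice are made
up, and I haven't tested that the loop device really behaves this way,
so treat it as a sketch only:

    # Container filesystem whose writes the loop device should keep in
    # the host page cache.  ext2 has no journal and issues no barriers,
    # so barrier=0 is implied:
    qemu-img create -f raw container.img 20G
    mkfs.ext2 -F container.img
    mkdir -p /mnt/scratch
    mount -o loop,rw -t ext2 container.img /mnt/scratch

    # Sparse raw image for the guest, stored inside the loop-mounted fs:
    qemu-img create -f raw /mnt/scratch/guest.img 10G

    # Run the guest against it; guest cache flushes should stop at the
    # loop device, because ext2 never sends barriers down:
    qemu -drive file=/mnt/scratch/guest.img,format=raw,cache=writeback

    # Check whether the loop backing file grows sparsely (qcow2-like):
    ls -ls container.img    # allocated blocks vs. apparent size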
> >2. barrier=0 does _not_ provide the cache=off behaviour.  It only
> >disables barriers; it does not prevent writing to the disk hardware.
>
> The proposal has nothing to do with cache=off.

Sorry, I meant flush=off (the proposal).  Mounting the host filesystem
(i.e. not using a loop device anywhere) with barrier=0 doesn't have
even close to the same effect.

> >>The problem with options added for developers is that those options are
> >>very often accidentally used for production.
> >>
> >We already have risky cache= options.  Also, do we call fdatasync
> >(with barrier) on _every_ write for guests which disable the
> >emulated disk cache?
>
> None of our cache= options should result in data corruption on power
> loss.  If they do, it's a bug.

(I might have the details below a bit off.)

If cache=none uses O_DIRECT without calling fdatasync for guest
barriers, then it will get data corruption on power loss.

If cache=none does call fdatasync for guest barriers, then it might
still get corruption on power loss; I am not sure whether recent Linux
host behaviour of O_DIRECT+fdatasync (with no buffered writes to
commit) issues the necessary barriers.  I am quite sure that older
kernels did not.

cache=writethrough will get data corruption on power loss with older
Linux host kernels: O_DSYNC did not issue barriers, and I'm not sure
whether the recently changed O_DSYNC implementation now issues a
barrier after every write.

Provided all the cache= options call fdatasync/fsync when the guest
issues a cache flush, and call fdatasync/fsync following _every_ write
when the guest has disabled the emulated write cache, that should be
as good as Qemu can reasonably do.  It's up to the host from there.

-- 
Jamie
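P.S. One way to check what a given cache= mode actually does is to
watch the flush-related syscalls Qemu issues while the guest writes.
The image name and cache mode below are just for illustration:

    # Trace cache-flush syscalls from qemu and all of its threads:
    strace -f -e trace=fsync,fdatasync,sync_file_range \
        qemu -drive file=guest.img,format=raw,cache=writethrough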