From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anthony Liguori <anthony@codemonkey.ws>
Subject: Re: [Qemu-devel] qemu and qemu.git -> Migration + disk stress introduces
 qcow2 corruptions
Date: Tue, 15 Nov 2011 07:56:28 -0600
Message-ID: <4EC26F8C.10102@codemonkey.ws>
References: <4EBAAA68.10801@redhat.com> <4EBAACAF.4080407@codemonkey.ws>	<4EBAB236.2060409@redhat.com> <4EBAB9FA.3070601@codemonkey.ws>	<4EBB919B.7040605@redhat.com> <4EBC1792.3030004@codemonkey.ws>	<4EBC4260.1090405@codemonkey.ws> <4EBCF5DA.1000605@redhat.com>	<4EBE499E.4030100@redhat.com> <20111114101610.GA32392@redhat.com>	<4EC1238B.2030906@codemonkey.ws> <874ny5iktv.fsf@trasno.mitica>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Kevin Wolf <kwolf@redhat.com>,
	Lucas Meneghel Rodrigues <lmr@redhat.com>,
	KVM mailing list <kvm@vger.kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"libvir-list@redhat.com" <libvir-list@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	QEMU devel <qemu-devel@nongnu.org>, Avi Kivity <avi@redhat.com>
To: quintela@redhat.com
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-iy0-f174.google.com ([209.85.210.174]:44901 "EHLO
	mail-iy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753894Ab1KON4c (ORCPT <rfc822;kvm@vger.kernel.org>);
	Tue, 15 Nov 2011 08:56:32 -0500
Received: by iage36 with SMTP id e36so8452310iag.19
        for <kvm@vger.kernel.org>; Tue, 15 Nov 2011 05:56:32 -0800 (PST)
In-Reply-To: <874ny5iktv.fsf@trasno.mitica>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 11/15/2011 07:20 AM, Juan Quintela wrote:
>> Again, I think defaulting DAS to cache=none|directsync is what makes
>> the most sense here.
>
> I think it is the only sane solution.  Otherwise, we need to write the
> equivalent of a lock manager, to know _who_ has the storage, and
> distributed lock managers are a mess :-(
>
>> We can even add a migration blocker for DAS with cache=on.  If we can
>> do dynamic toggling of the cache setting, then that's pretty friendly
>> at the end of the day.
>
> That could fix the problem also.  At the moment that we start migration,
> we do an fsync() + switch to O_DIRECT for all filesystems.
>
> As you said, time for implementing fcntl(O_DIRECT).

Yeah, I think this ends up being a very elegant solution.

We always open block devices O_DIRECT to start with.  That ensures reads go 
directly to disk if it's DAS or result in NFS protocol reads.

As long as we fsync on the source (and we do), then we're okay.

For cache=write{back,through}, we would then just fcntl() away O_DIRECT as soon 
as we start the guest.  Then we can start doing reads through the page cache.
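
To make the fcntl() step concrete, here is a rough sketch of what toggling 
O_DIRECT on an already-open descriptor could look like on Linux.  The helper 
names set_direct_io() and direct_io_enabled() are made up for illustration 
and are not existing QEMU functions.  It assumes Linux/glibc, where F_SETFL 
is allowed to change O_DIRECT.

```c
#define _GNU_SOURCE   /* needed for O_DIRECT with glibc */
#include <assert.h>
#include <fcntl.h>

/* Hypothetical helper (not a QEMU API): set or clear O_DIRECT on an open fd.
 * Returns 0 on success, -1 on failure with errno set by fcntl(). */
static int set_direct_io(int fd, int enable)
{
    int flags;

    assert(enable == 0 || enable == 1);

    /* Read the current file status flags so the others are preserved. */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1) {
        return -1;
    }

    if (enable) {
        flags |= O_DIRECT;    /* bypass the page cache */
    } else {
        flags &= ~O_DIRECT;   /* go through the page cache again */
    }

    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}

/* Hypothetical helper: 1 if O_DIRECT is set on fd, 0 if not, -1 on error. */
static int direct_io_enabled(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    return flags == -1 ? -1 : (flags & O_DIRECT) != 0;
}
```

With something like this, the cache=write{back,through} case would open with 
O_DIRECT, then call set_direct_io(fd, 0) once the guest is running.  Going the 
other way (turning O_DIRECT on) can fail on filesystems that don't support 
direct I/O, and any I/O issued while O_DIRECT is set still has to meet its 
alignment requirements, so the caller needs to handle both.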

Regards,

Anthony Liguori

> Later, Juan.
>