From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1NHdhR-0002Ou-KN
	for qemu-devel@nongnu.org; Mon, 07 Dec 2009 08:32:26 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1NHdhM-0002H1-FJ
	for qemu-devel@nongnu.org; Mon, 07 Dec 2009 08:32:24 -0500
Received: from [199.232.76.173] (port=50460 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1NHdhM-0002Gm-8y
	for qemu-devel@nongnu.org; Mon, 07 Dec 2009 08:32:20 -0500
Received: from mail-iw0-f197.google.com ([209.85.223.197]:50088)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <anthony@codemonkey.ws>) id 1NHdhM-0005d5-5m
	for qemu-devel@nongnu.org; Mon, 07 Dec 2009 08:32:20 -0500
Received: by iwn35 with SMTP id 35so3035616iwn.4
	for <qemu-devel@nongnu.org>; Mon, 07 Dec 2009 05:32:19 -0800 (PST)
Message-ID: <4B1D03E0.5080006@codemonkey.ws>
Date: Mon, 07 Dec 2009 07:32:16 -0600
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH] Disk image shared and exclusive locks.
References: <20091204165301.GA4167@amd.home.annexia.org>
	<20091207103908.GI2271@arachsys.com>
In-Reply-To: <20091207103908.GI2271@arachsys.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Chris Webb <chris@arachsys.com>
Cc: "Richard W.M. Jones" <rjones@redhat.com>, qemu-devel@nongnu.org

Chris Webb wrote:
> Hi. There's a connected discussion on the sheepdog list about locking, and I
> have a patch there which could complement this one quite well.
>
> Sheepdog is a distributed, replicated block store being developed
> (primarily) for Qemu. Images have a mandatory exclusive locking requirement,
> enforced by the cluster manager. Without this, the replication scheme
> breaks down and you can end up with inconsistent copies of the block
> image.
>
> The initial release of sheepdog took these locks in the block driver
> bdrv_open() and bdrv_close() hooks. They also added a bdrv_closeall() and
> ensured it was called in all the usual qemu exit paths to avoid stray locks.
> (The rarer case of crashing hosts or crashing qemus will have to be handled
> externally, and is 'to do'.)
>
> The problem was that this prevented live migration, because both ends wanted
> to open the image at once, even though only one would be using it at a time.
>   
Yeah, I think this is the bigger problem.  Strictly speaking, when NFS
is the backing filesystem, we should not open the image on the
destination before we close it on the source, so that the caches stay
fully coherent.

I've resisted this because I'm concerned that if we delay opening the
file on the destination, the open could fail.  That would be a very late
failure, and accepting that risk just as a workaround for NFS makes me
uncomfortable.

But given this locking situation, I now think it is not a bad idea.
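
To make the ordering concrete, here is a minimal sketch of the two handover
orders, assuming a local file and flock() as a stand-in for the exclusive
lock. It is not qemu or sheepdog code: open_locked(), close_locked(),
migrate() and demo() are hypothetical names, and flock() behaves differently
on NFS, so this only illustrates the sequencing.

```python
import fcntl
import os
import tempfile


def open_locked(path):
    """Open path and take a non-blocking exclusive flock.

    Raises OSError (BlockingIOError) if another open already holds the lock.
    Hypothetical helper standing in for a locking bdrv_open().
    """
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd


def close_locked(fd):
    """Release the lock and close the descriptor (stand-in for bdrv_close())."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def migrate(path, source_fd, open_early):
    """Hand the image from the source to the destination; return the new fd.

    open_early=True  : destination opens while the source still holds the lock.
    open_early=False : source closes first, destination opens afterwards.
    """
    if open_early:
        dest_fd = open_locked(path)  # conflicts with the source's lock
        close_locked(source_fd)
        return dest_fd
    close_locked(source_fd)
    # Late open: if this fails, the source has already given up the image.
    return open_locked(path)


def demo():
    """Return (early_open_blocked, late_open_succeeded) for a temp image."""
    with tempfile.NamedTemporaryFile() as image:
        source_fd = open_locked(image.name)
        try:
            migrate(image.name, source_fd, open_early=True)
            early_open_blocked = False
        except OSError:
            early_open_blocked = True  # source_fd is still open and locked
        dest_fd = migrate(image.name, source_fd, open_early=False)
        late_open_succeeded = dest_fd >= 0
        close_locked(dest_fd)
        return early_open_blocked, late_open_succeeded
```

The second order works with exclusive locks, but the open in migrate() now
happens after the source has let go, which is exactly the late failure
described above.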

Regards,

Anthony Liguori