From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <496E2F1D.9060809@codemonkey.ws>
Date: Wed, 14 Jan 2009 12:29:49 -0600
From: Anthony Liguori <anthony@codemonkey.ws>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH] Stop VM on ENOSPC error
References: <20090114120358.GS3267@redhat.com>	<20090114121147.GI24995@redhat.com>	<20090114164617.GB6431@shareable.org>
	<20090114173044.GS24995@redhat.com>
In-Reply-To: <20090114173044.GS24995@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: "Daniel P. Berrange" <berrange@redhat.com>, qemu-devel@nongnu.org

Daniel P. Berrange wrote:
> On Wed, Jan 14, 2009 at 04:46:17PM +0000, Jamie Lokier wrote:
>   
>> Daniel P. Berrange wrote:
>>     
>>> Thus I'd suggest we need an async notification of this event,
>>> and only enable this behaviour if the app controlling QEMU has
>>> explicitly enabled this notification / feature.
>>>       
>> I think the behaviour should always be enabled (unless explicitly
>> disabled, but I'm not sure why you'd want to do that).
>>
>> A corrupt VM with data loss sounds much worse than a stopped VM to me.
>>     
>
> You're not corrupting data in current code - you're just unable to finish
> new writes, because an IO failure is propagated back to the guest. If the
> guest is properly checking for & handling I/O failures, it should be pretty
> much OK once the host space problem is resolved - perhaps a reboot + journal
> recovery. 
>   

Not at all.  When the guest gets an IO error, it's going to try to mark 
the sector bad and move on.  If it does do a journal recovery on reboot, 
you're even more screwed because writes will randomly fail.  Writes to 
pre-allocated storage will succeed but writes to unallocated storage 
will fail.
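
To make the "randomly fail" point concrete, here is a minimal sketch of 
a sparse image on a full host.  It is a toy model, not QEMU code:  the 
struct, the function name, and the cluster count are all made up for 
illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CLUSTERS 8

/* Toy model of a sparse image; every name here is hypothetical. */
struct sparse_image {
    bool allocated[NUM_CLUSTERS]; /* does this cluster have host backing? */
    size_t host_free;             /* clusters the host can still allocate */
};

/* Returns 0 on success, -EINVAL for a bad cluster index, or -ENOSPC
 * when the write needs a new cluster and the host has none left. */
static int sparse_write(struct sparse_image *img, size_t cluster)
{
    if (cluster >= NUM_CLUSTERS) {
        return -EINVAL;
    }
    if (img->allocated[cluster]) {
        return 0; /* overwrite in place: needs no new host space */
    }
    if (img->host_free == 0) {
        return -ENOSPC; /* host is full: guest only sees a failed write */
    }
    img->host_free--;
    img->allocated[cluster] = true;
    return 0;
}
```

With host_free at 0, a write to an already allocated cluster returns 0 
while a write to any other cluster returns -ENOSPC, so from inside the 
guest the failures look arbitrary.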

The guest has no visibility into this error scenario, so there's nothing 
that it can reasonably do to recover.

> Older QEMU certainly had catastrophic data loss on ENOSPC due to not sending
> any I/O errors back to the guest, so it thought its write had succeeded when
> in fact it had been thrown away. Current QEMU is more careful about error
> propagation now.
>   

But the error propagation in the event of ENOSPC is totally wrong.  Try 
it out and your guest will corrupt itself.  It's even more catastrophic 
with qcow2 but that shouldn't be surprising at this point.
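
As a rough sketch of the alternative this thread is about (stop the VM 
on ENOSPC instead of handing the guest an IO error), the decision could 
look like the following.  The enum and function names are hypothetical 
and are not taken from the patch or from QEMU's block layer.

```c
#include <errno.h>

/* Hypothetical names, for illustration only. */
typedef enum {
    WRITE_ERR_REPORT_TO_GUEST, /* propagate an IO error to the guest */
    WRITE_ERR_STOP_VM          /* pause the VM until the host is fixed */
} write_err_action;

/* Decide what to do when a host-side write fails with host_errno
 * (a positive errno value). */
static write_err_action write_error_action(int host_errno)
{
    /* ENOSPC is a host condition the guest cannot see or repair, so
     * stopping gives an administrator the chance to free space first. */
    if (host_errno == ENOSPC) {
        return WRITE_ERR_STOP_VM;
    }
    /* Anything else is reported to the guest as an ordinary IO error. */
    return WRITE_ERR_REPORT_TO_GUEST;
}
```

This only shows the policy split.  Whether the failed request is 
retried after the VM resumes is a separate question that the sketch 
does not address.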

Regards,

Anthony Liguori
