Message-ID: <496E2F1D.9060809@codemonkey.ws>
Date: Wed, 14 Jan 2009 12:29:49 -0600
From: Anthony Liguori
Subject: Re: [Qemu-devel] [PATCH] Stop VM on ENOSPC error
References: <20090114120358.GS3267@redhat.com> <20090114121147.GI24995@redhat.com> <20090114164617.GB6431@shareable.org> <20090114173044.GS24995@redhat.com>
In-Reply-To: <20090114173044.GS24995@redhat.com>
Reply-To: qemu-devel@nongnu.org
To: "Daniel P. Berrange", qemu-devel@nongnu.org

Daniel P. Berrange wrote:
> On Wed, Jan 14, 2009 at 04:46:17PM +0000, Jamie Lokier wrote:
>> Daniel P. Berrange wrote:
>>> Thus I'd suggest we need an async notification of this event,
>>> and only enable this behaviour if the app controlling QEMU has
>>> explicitly enabled this notification / feature.
>>
>> I think the behaviour should always be enabled (unless explicitly
>> disabled, but I'm not sure why you'd want to do that).
>>
>> A corrupt VM with data loss sounds much worse than a stopped VM to me.
>
> You're not corrupting data in current code - you're just unable to finish
> new writes, because an IO failure is propagated back to the guest. If the
> guest is properly checking for & handling I/O failures, it should be pretty
> much OK once the host space problem is resolved - perhaps a reboot + journal
> recovery.

Not at all. When the guest gets an IO error, it's going to try to mark the
sector bad and move on. If it does do a journal recovery on reboot, you're
even more screwed because writes will randomly fail. Writes to pre-allocated
storage will succeed but writes to unallocated storage will fail. The guest
has no visibility into this error scenario, so there's nothing that it can
reasonably do to recover.

> Older QEMU certainly had catastrophic data loss on ENOSPC due to not sending
> any I/O errors back to the guest, so it thought its write had succeeded when
> in fact it had been thrown away. Current QEMU is more careful about error
> propagation now.

But the error propagation in the event of ENOSPC is totally wrong. Try it out
and your guest will corrupt itself. It's even more catastrophic with qcow2, but
that shouldn't be surprising at this point.

Regards,

Anthony Liguori

> Daniel
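
To make the two behaviours argued about above concrete, here is a minimal toy
sketch in Python. It is not QEMU code: SparseImage, VM, guest_write, resume and
the CLUSTER size are all hypothetical names invented for illustration. It
models an image where writes to already-allocated clusters succeed while writes
that need a new cluster fail with ENOSPC, and contrasts reporting the error to
the guest with stopping the VM and retrying the write once space is available.

```python
import errno
from dataclasses import dataclass, field

CLUSTER = 4  # hypothetical cluster size, in sectors


@dataclass
class SparseImage:
    """Toy sparse image: a cluster is allocated on first write."""
    allocated: set = field(default_factory=set)
    host_free: int = 0  # clusters the host can still allocate

    def write(self, sector):
        cluster = sector // CLUSTER
        if cluster in self.allocated:
            return 0  # pre-allocated storage: write succeeds
        if self.host_free == 0:
            return -errno.ENOSPC  # unallocated storage, host is full
        self.host_free -= 1
        self.allocated.add(cluster)
        return 0


@dataclass
class VM:
    image: SparseImage
    running: bool = True
    pending: list = field(default_factory=list)       # writes to retry
    guest_errors: list = field(default_factory=list)  # errors the guest saw

    def guest_write(self, sector, stop_on_enospc):
        ret = self.image.write(sector)
        if ret == 0:
            return "ok"
        if ret == -errno.ENOSPC and stop_on_enospc:
            # Stop the VM and keep the request; the guest never sees an error.
            self.running = False
            self.pending.append(sector)
            return "stopped"
        # Otherwise the failure is propagated to the guest as an I/O error.
        self.guest_errors.append(sector)
        return "io-error"

    def resume(self):
        """Restart the VM and retry the writes that were held back."""
        retry, self.pending = self.pending, []
        self.running = True
        return [self.guest_write(s, stop_on_enospc=True) for s in retry]
```

With propagation, a guest writing to sector 1 (allocated cluster) succeeds
while a write to sector 9 (unallocated cluster) fails, which is the "writes
will randomly fail" pattern from the guest's point of view. With the stop
policy, the same write to sector 9 pauses the VM, and calling resume() after
the host has free space again completes it without the guest ever seeing an
error.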