Date: Wed, 6 Aug 2008 13:22:43 +0100
From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH] report read/write errors to IDE guest driver as ECC errors
Message-ID: <20080806122241.GA14937@shareable.org>
References: <20080805115506.GR4478@implementation.uk.xensource.com> <48990BC6.1050503@codemonkey.ws> <20080806092822.GC9055@redhat.com>
In-Reply-To: <20080806092822.GC9055@redhat.com>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: "Daniel P. Berrange", qemu-devel@nongnu.org

Daniel P. Berrange wrote:
> If you have a journalling filesystem, the worst that'll happen in the
> ENOSPC scenario is that you'll lose data from the open application files
> that aren't flushed to disk - no different to pulling the power plug.
> The filesystem itself will not corrupt itself - it'll happily recover
> the journal & carry on after rebooting.

So the filesystem's ok but the application's files are corrupt - doesn't
sound too good :-)

Journalling filesystems are supposed to be robust against sudden
reboot/power failure (despite this basic expectation, Linux ext3 is not
robust against power failure by default).

Journalled filesystems should also be robust against I/O errors, but in
fact that would require an IOP sequence like WRITE, BARRIER, WRITE to
abort the second WRITE if the first one fails with an I/O error. Linux
does not abort the second WRITE - and therefore an isolated write I/O
error can result in filesystem corruption on all its journalled
filesystems.

When TCQ/NCQ are used, all commands may be in flight concurrently, so I'm
not sure if it's even possible to auto-abort the second WRITE when the
first errors, in any guest.

(There are also weaknesses in Linux's handling of I/O errors in the VM,
discussed recently with a "sweep it under the carpet, handling I/O errors
properly in the VM is too hard" conclusion.)

I wouldn't be surprised if other guests have similar weaknesses. Solaris
ZFS may be an exception, as they claim to have thoroughly tested it with
simulated I/O errors.

Therefore, at least, when QEMU reports a write I/O error due to ENOSPC
(and perhaps due to EIO), it should set a sticky flag so that all
subsequent writes fail without trying to write.

> Unless someone wants to implement the ENOSPC handling right now, I'd
> like to see this patch just committed as is, so we at least get
> incremental benefit over current behaviour, which definitely *does*
> corrupt guest filesystems by silently pretending the write succeeded.
> Special ENOSPC handling can be added on top.

I suggest adding a sticky flag: once a write hits ENOSPC (due to
extending qcow2), all further writes should fail even if they don't need
to extend the file. This will prevent some kinds of guest journalled
filesystem corruption.

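To make the sticky flag concrete, here is a rough sketch in C. It is not
QEMU code: SketchDisk, backend_write() and sketch_disk_write() are names
made up for illustration, and the "capacity" counter only stands in for
the host running out of space when the image file has to grow.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-disk state; this is NOT a real QEMU structure. */
typedef struct {
    bool   write_failed;  /* sticky: set on the first ENOSPC, never cleared here */
    int    sticky_errno;  /* errno value reported for every later write */
    size_t capacity;      /* stand-in for the space available on the host */
    size_t used;          /* stand-in for the space the image already occupies */
} SketchDisk;

/*
 * Stand-in for the real backend write. Only writes that extend the image
 * consume host space; overwrites of already-allocated data always succeed.
 */
int backend_write(SketchDisk *d, size_t len, bool extends)
{
    if (extends) {
        if (d->used + len > d->capacity) {
            return -ENOSPC;
        }
        d->used += len;
    }
    return 0;
}

/*
 * Write path with the sticky flag: after the first ENOSPC, every later
 * write is refused without touching the backend, even one that would
 * not need to extend the file.
 */
int sketch_disk_write(SketchDisk *d, size_t len, bool extends)
{
    if (d->write_failed) {
        return -d->sticky_errno;
    }

    int ret = backend_write(d, len, extends);
    if (ret == -ENOSPC) {
        d->write_failed = true;
        d->sticky_errno = ENOSPC;
    }
    return ret;
}
```

The point of checking the flag before calling the backend is that a
later non-extending write (say, a journal commit record) cannot land on
disk after an earlier write in the same sequence was lost. Whether EIO
should set the flag too, and what should clear it, are left open here.
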
> I agree that pausing the guest is probably best option in that scenario,
> the interesting question being how to inform management tools/API that
> the VM has just paused itself. In libvirt we handle pause/resume by doing
> 'stop'/'cont' in the QEMU monitor, and since we're triggering it ourselves
> we can track the state change from running to paused. If the VM pauses
> itself though we need to figure out a way to detect this state change.
> The monitor doesn't have any asynchronous notification capability as it
> stands.

It does have the log file, I suppose, or it could poll the CPU state
every so often. Not the prettiest mechanisms.

-- Jamie