From: Jamie Lokier
Date: Wed, 25 Feb 2009 18:36:05 +0000
Subject: Re: [Qemu-devel] [6388] Stop VM on ENOSPC error.
Message-ID: <20090225183605.GA16453@shareable.org>
References: <49A577FD.60701@codemonkey.ws> <20090225170422.GD8810@redhat.com> <20090225173429.GV24969@redhat.com>
In-Reply-To: <20090225173429.GV24969@redhat.com>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: "Daniel P. Berrange", qemu-devel@nongnu.org

Daniel P. Berrange wrote:
> > Or even to stop. What can the guest do with other errors anyway?
>
> The idea is that if the guest at least sees the I/O error, then it won't
> continue writing as if everything were OK. It may not be able to continue
> normal operation, but it can at least mark the FS read-only and avoid
> ongoing damage. So you have a reasonable likelihood of shutting down the
> guest, fixing the ENOSPC problem on the host, and starting the guests
> again & them recovering their journal. 'ignore' is guaranteed data loss,
> 'report' gives you a good fighting chance. 'stop'/'enospc' are best, if
> the management app is able to detect that the VM is being paused & thus
> report it to the user.

Even Linux guests don't respond to an I/O error quite as above.

In Linux, the I/O queue (elevator) may have additional pending write
commands which are executed despite an earlier "journal" write
failing. This can result in an inconsistent filesystem if one write
fails and later ones succeed. It's equivalent to losing power and
barriers not being honoured.

If QEMU were to have a "sticky" error flag which turns all writes into
I/O errors after a failed one, even if the host ENOSPC is transient,
that would be better for guest data consistency. (Not perfect for
transactions spanning multiple disks (etc.), but good for a single
journalled filesystem.)

-- 
Jamie
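
To make the "sticky" error idea concrete, here is a rough sketch in Python. It is not QEMU code and does not use any QEMU API. The class StickyErrorDisk, its capacity_blocks field, and the clear_error() method are all names invented for this illustration. It assumes that the first failing write reports ENOSPC, that every later write reports EIO until someone clears the flag, and that host disk space can be modelled as a simple block count.

```python
import errno


class StickyErrorDisk:
    """Hypothetical disk model; an illustration only, not QEMU's block layer."""

    def __init__(self, capacity_blocks):
        self.capacity_blocks = capacity_blocks  # simulated free space on the host
        self.blocks = {}                        # block number -> data written
        self.sticky_errno = None                # set on the first failed write

    def write(self, block_no, data):
        # Once any write has failed, refuse all later writes, so the guest
        # never sees "journal write failed, later writes succeeded".
        if self.sticky_errno is not None:
            return -errno.EIO
        is_new_block = block_no not in self.blocks
        if is_new_block and len(self.blocks) >= self.capacity_blocks:
            self.sticky_errno = errno.ENOSPC    # latch the failure
            return -errno.ENOSPC
        self.blocks[block_no] = data
        return 0

    def clear_error(self):
        # Assumed to be called by an administrator or management tool
        # after the host problem is fixed and the guest has been handled.
        self.sticky_errno = None


disk = StickyErrorDisk(capacity_blocks=1)
first = disk.write(0, b"data")       # succeeds: space is available
second = disk.write(1, b"journal")   # fails with -ENOSPC and latches the flag
disk.capacity_blocks = 10            # the host ENOSPC turns out to be transient
third = disk.write(2, b"later")      # still fails with -EIO; nothing is stored
```

The point of the latch is that the write queued behind the failed journal write cannot reach the disk, even though the host would have accepted it by then. Whether and when the flag should be cleared is a policy question this sketch leaves open.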