From: "Michael S. Tsirkin"
Date: Thu, 27 Mar 2014 10:10:40 +0200
Subject: Re: [Qemu-devel] Massive read only kvm guests when backing file was missing
To: Markus Armbruster
Cc: kvm@vger.kernel.org, ghammer@redhat.com, Stefan Hajnoczi, Jason Wang, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, Alejandro Comisario
Message-ID: <20140327081040.GA21756@redhat.com>
In-Reply-To: <87y4zwt7mu.fsf@blackfin.pond.sub.org>
References: <20140327064158.GA17563@redhat.com> <87y4zwt7mu.fsf@blackfin.pond.sub.org>

On Thu, Mar 27, 2014 at 08:36:57AM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin" writes:
>
> > On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> >> Hi List!
> >> Hope someone can help me. We had a big issue in our cloud the other
> >> day: a couple of our openstack regions ( +2000 kvm guests with qcow2 )
> >> went to a read-only filesystem on the guest side because the backing
> >> files directory (the openstack _base directory) was compromised and
> >> the data was lost. When we realized the data was lost, it took us 5
> >> mins to restore the backup of the backing files, but by that time all
> >> the kvm guests had received some kind of IO error from the hypervisor
> >> layer and gone read-only on the root filesystem.
> >>
> >> My question would be, is there a way to hold the IO operations against
> >> the backing files ( I thought that would be 99% READ operations ) for
> >> a little longer ( I'm asking this because I don't quite understand what
> >> the process is and when it raises the error ) in a case where the
> >> backing files are missing (no IO possible) but recoverable within
> >> minutes?
> >>
> >> Any tip on how to achieve this if possible, or information about how
> >> backing files work on kvm, will be amazing.
> >> Waiting for feedback!
> >>
> >> kindest regards.
> >> Alejandro Comisario
> >
> >
> > I'm guessing this is what happened: guests timed out meanwhile.
> > You can increase the timeout within the guest:
> > echo 600 > /sys/block/sda/device/timeout
> > to time out after 10 minutes.
> >
> > If you have installed qemu guest agent on your system, you can do this
> > from the host. Unfortunately by default its memory can be pushed out
> > to swap, and then on disk error access there might fail :(
> > Maybe we should consider mlock on all its memory at least as an option.
> >
> > You could pause your guests, restart them after the issue is resolved,
> > and we could I guess add functionality to pause VM on disk errors
> > automatically.
> > Stefan?
>
> Would -drive rerror=stop do?

I think it will.
It's a pity it doesn't appear in --help output - would make it easier
to find.

--
MST
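
For reference, the guest-side timeout change quoted above can be applied
to every SCSI disk in one go. This is only a minimal sketch: the function
name set_disk_timeouts and its optional sysfs-root argument are made up
for illustration, and the image path in the trailing comment is a
placeholder. Values written to sysfs do not survive a reboot, so this
would need to be rerun at boot.

```shell
#!/bin/sh
# Sketch: raise the SCSI command timeout for every sd* disk in a guest.
# set_disk_timeouts is a hypothetical helper, not part of qemu or the
# guest agent. The second argument (sysfs root) exists only so the
# function can be tried against a scratch directory.
set_disk_timeouts() {
    timeout_secs="$1"
    sysfs_root="${2:-/sys}"
    for f in "$sysfs_root"/block/sd*/device/timeout; do
        # Skip the unexpanded glob when no sd* disks are present.
        [ -e "$f" ] || continue
        echo "$timeout_secs" > "$f"
    done
}

# Inside the guest, as root (10 minutes, as suggested in the thread):
#   set_disk_timeouts 600
#
# On the host, the -drive option discussed above pauses the VM on I/O
# errors instead of reporting them to the guest (image path is a
# placeholder):
#   qemu-system-x86_64 ... -drive file=/path/to/guest.qcow2,rerror=stop,werror=stop
# Once the backing files are restored, "cont" in the monitor resumes
# the guest.
```

The two approaches are independent: the timeout gives the guest more
patience, while rerror=stop/werror=stop keeps the error from reaching
the guest at all.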