From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60268)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1WT5Oe-0004rL-LG
	for qemu-devel@nongnu.org; Thu, 27 Mar 2014 04:10:50 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1WT5OX-0004pL-2N
	for qemu-devel@nongnu.org; Thu, 27 Mar 2014 04:10:44 -0400
Received: from mx1.redhat.com ([209.132.183.28]:3682)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1WT5OW-0004oz-PU
	for qemu-devel@nongnu.org; Thu, 27 Mar 2014 04:10:37 -0400
Date: Thu, 27 Mar 2014 10:10:40 +0200
From: "Michael S. Tsirkin" <mst@redhat.com>
Message-ID: <20140327081040.GA21756@redhat.com>
References: <CAMrG31z=oy-53Lfya4svhNniD_7Q1YETuHeZsotHj8U5xJNYmw@mail.gmail.com>
	<20140327064158.GA17563@redhat.com>
	<87y4zwt7mu.fsf@blackfin.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <87y4zwt7mu.fsf@blackfin.pond.sub.org>
Subject: Re: [Qemu-devel] Massive read only kvm guests when backing file was
 missing
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Markus Armbruster <armbru@redhat.com>
Cc: kvm@vger.kernel.org, ghammer@redhat.com, Stefan Hajnoczi <stefanha@gmail.com>, Jason Wang <jasowang@redhat.com>, linux-kernel@vger.kernel.org, qemu-devel@nongnu.org, Alejandro Comisario <alejandro.comisario@mercadolibre.com>

On Thu, Mar 27, 2014 at 08:36:57AM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Wed, Mar 26, 2014 at 11:08:03PM -0300, Alejandro Comisario wrote:
> >> Hi List!
> >> Hope someone can help me, we had a big issue in our cloud the other
> >> day, a couple of our openstack regions ( +2000 kvm guests with qcow2 )
> >> went read only filesystem from the guest side because the backing
> >> files directory (the openstack _base directory) was compromised and
> >> the data was lost, when we realized the data was lost, it took us 5
> >> mins to restore the backup of the backing files, but by that time all
> >> the kvm guests received some kind of IO error from the hypervisor
> >> layer, and went read only on root filesystem.
> >> 
> >> My question would be, is there a way to hold the IO operations against
> >> the backing files ( I thought that would be 99% READ operations ) for
> >> a little longer ( I'm asking this because I don't quite understand what
> >> the process is and when it raises the error ) in a case where the backing
> >> files are missing (no IO possible) but are recoverable within minutes ?
> >> 
> >> Any tip on how to achieve this if possible, or information about how
> >> backing files works on kvm, will be amazing.
> >> Waiting for feedback!
> >> 
> >> kindest regards.
> >> Alejandro Comisario
> >
> >
> > I'm guessing this is what happened: guests timed out meanwhile.
> > You can increase the timeout within the guest:
> > echo 600 > /sys/block/sda/device/timeout
> > to time out after 10 minutes.
> >
> > If you have installed qemu guest agent on your system, you can do this
> > from the host. Unfortunately by default its memory can be pushed out to swap
> > and then, on a disk error, access to it might fail :(
> > Maybe we should consider mlock on all its memory at least as an option.
> >
> > You could pause your guests, restart them after the issue is resolved,
> > and we could I guess add functionality to pause VM on disk errors
> > automatically.
> > Stefan?
> 
> Would -drive rerror=stop do?

I think it will. It's a pity rerror doesn't appear in the --help output;
listing it there would make it easier to find.
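
For reference, a rough sketch of how the option could be passed. The
drive_opt helper, the image path, and the if=virtio/format=qcow2 settings
are assumptions for illustration only and are not taken from this thread;
rerror=stop and werror=stop are the existing -drive error-policy settings.

```shell
#!/bin/sh
# Build a -drive option string that asks QEMU to pause the guest on
# read errors (rerror=stop) and write errors (werror=stop) instead of
# reporting them to the guest.
# drive_opt is a hypothetical helper; if=virtio and format=qcow2 are assumed.
drive_opt() {
    # $1: path to the guest's qcow2 overlay image
    # (a literal comma in the path would need to be doubled for QEMU)
    printf 'file=%s,if=virtio,format=qcow2,rerror=stop,werror=stop\n' "$1"
}

# Example invocation (not executed here; the image path is a placeholder):
#   qemu-system-x86_64 -enable-kvm -m 2048 \
#       -drive "$(drive_opt /path/to/guest.qcow2)"
```

With this policy the guest should be paused when a read from the image
fails, rather than seeing the I/O error and remounting read-only. Once the
backing files are restored, "cont" in the QEMU monitor resumes it.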

-- 
MST