From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1LckNU-0008Dy-1t
	for qemu-devel@nongnu.org; Thu, 26 Feb 2009 12:50:32 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1LckNT-0008DB-3H
	for qemu-devel@nongnu.org; Thu, 26 Feb 2009 12:50:31 -0500
Received: from [199.232.76.173] (port=47270 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1LckNS-0008Cx-UD
	for qemu-devel@nongnu.org; Thu, 26 Feb 2009 12:50:30 -0500
Received: from mail2.shareable.org ([80.68.89.115]:59839)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from <jamie@shareable.org>) id 1LckNS-0000Nh-Ex
	for qemu-devel@nongnu.org; Thu, 26 Feb 2009 12:50:30 -0500
Received: from jamie by mail2.shareable.org with local (Exim 4.63)
	(envelope-from <jamie@shareable.org>) id 1LckNN-0002ku-Ep
	for qemu-devel@nongnu.org; Thu, 26 Feb 2009 17:50:25 +0000
Date: Thu, 26 Feb 2009 17:50:25 +0000
From: Jamie Lokier <jamie@shareable.org>
Subject: Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
Message-ID: <20090226175025.GA10284@shareable.org>
References: <20090225233718.GA15750@amd.home.annexia.org>
	<20090226105106.GD22494@redhat.com>
	<1235658682.5894.152.camel@ecrins.fosdick.home.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1235658682.5894.152.camel@ecrins.fosdick.home.net>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

Steve Fosdick wrote:
> Perhaps we could have a second timer such that if, on asking the guest
> to shut down via ACPI, the guest does not respond within a certain time
> limit with an ACPI request to turn the power off we go for one of the
> other options below?

Good idea.  ACPI is notoriously flaky, especially on a guest which has
already crashed its kernel...
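
As a rough sketch of that second timer (all names here are hypothetical,
not existing QEMU functions): ask the guest politely, poll for the
power-off, and fall back to a forced action once the grace period runs
out.  The clock and sleep functions are parameters only so the logic can
be exercised without real waiting.

```python
import time


def shutdown_with_fallback(request_acpi_shutdown, guest_is_off, force_action,
                           grace_s, poll_s=0.1,
                           clock=time.monotonic, sleep=time.sleep):
    """Request a clean shutdown, escalating if the guest does not comply.

    All callables are hypothetical hooks supplied by the caller:
      request_acpi_shutdown() -- send the polite shutdown request
      guest_is_off()          -- True once the guest has powered off
      force_action()          -- the fallback (reset, poweroff, pause, ...)

    Returns "clean" if the guest powered off within grace_s seconds,
    otherwise runs force_action() and returns "forced".
    """
    request_acpi_shutdown()
    deadline = clock() + grace_s
    while clock() < deadline:
        if guest_is_off():
            return "clean"
        sleep(poll_s)
    # Grace period expired without a power-off: escalate.
    force_action()
    return "forced"
```

A guest with a crashed kernel never reaches the "clean" branch, so the
grace period is the only thing that bounds how long it keeps its
resources.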

> 1. Ensure continuity of service.  When a guest OS gets stuck for some
> reason make sure it is re-started.  This is probably the only use case
> on a real physical machine.

For real continuity of service you'd also want QEMU itself to have a
watchdog.  Either a software watchdog internally (SIGALRM => kill/exec
self, or child process expecting regular pings over a pipe), or by
QEMU itself becoming a client of the host watchdog.

I say this because I've experienced KVM lock up several times.
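
To illustrate the "regular pings over a pipe" variant, here is a minimal
sketch of the supervising side, assuming a POSIX host where select()
works on pipes.  The function names and the on_hang callback are made up
for the example; the monitored process is assumed to write a byte to the
other end of the pipe at least once per timeout.

```python
import os
import select


def wait_for_ping(read_fd, timeout_s):
    """Return True if at least one ping byte arrives within timeout_s."""
    ready, _, _ = select.select([read_fd], [], [], timeout_s)
    if not ready:
        return False                 # no ping in time: presumed hung
    data = os.read(read_fd, 4096)    # drain any queued pings
    return len(data) > 0             # empty read: the writer closed the pipe


def supervise(read_fd, timeout_s, on_hang):
    """Keep waiting for pings; call on_hang() once when one is missed."""
    while wait_for_ping(read_fd, timeout_s):
        pass
    # Either the timeout expired or the writer went away.  What on_hang
    # does (kill and re-exec, alert an operator, ...) is up to the caller.
    on_hang()
```

Treating a closed pipe the same as a missed ping means a monitored
process that dies outright is caught as well as one that merely hangs.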

> 2. Limit the resource consumption of a crashed guest when the host
> serves other guests.  This is probably only of concern for virtual
> machines, i.e. it is specific to the emulated watchdog and its
> interaction with qemu rather than being part of how a physical watchdog
> works.

Related to this is "omg the database guest has crashed - and frankly
we don't trust the automatic recovery process - stop it for now and
we'll inspect for damage manually before starting it again".

> Do we want to offer the guest the option of a clean shutdown if it can
> still manage that and then reboot, i.e. the shutdown option but for use
> case 1?
> 
> If so we need to be able to turn the ACPI power off request into a reset
> instead.  We already have the -no-reboot option to turn a reboot into a
> power off - this is the opposite.

Interesting idea.

> In fact, some people may find that option useful anyway even without the
> watchdog.  In an environment where someone has privileged access to a
> guest but no direct access to the host OS he could shut down a guest
> accidentally when intending to reboot (or logoff).  It may be useful to
> trap that and turn the shutdown into a reboot.

I've done that a few times.  It's only a minor annoyance, in that you
lose the VNC connection and have to log in and restart the VM.

Side notes: It would be nice to be able to change the
"shutdown-when-asked-to-reboot" (et al) option from the monitor.  It
would also be nice to "pause-when-asked-to-shutdown/reboot", which is
useful during automatic OS installs - the host script changes the
media and/or hardware at each reboot.
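
One way to picture all of these options together is a small table
mapping what the guest asked for to what the host actually does, which a
monitor command could then rewrite at run time.  This is only a sketch:
the class, the request names and the action names are invented for the
example and are not QEMU options or monitor commands.

```python
# Hypothetical names throughout; none of these exist in QEMU.
GUEST_REQUESTS = ("reboot", "shutdown")
HOST_ACTIONS = ("reset", "poweroff", "pause")


class RequestPolicy:
    """Maps a guest request to the action the host actually performs."""

    def __init__(self):
        # Default: do exactly what the guest asked for.
        self._map = {"reboot": "reset", "shutdown": "poweroff"}

    def set(self, request, action):
        """Change the mapping, as a monitor command might."""
        if request not in GUEST_REQUESTS:
            raise ValueError("unknown guest request: %r" % (request,))
        if action not in HOST_ACTIONS:
            raise ValueError("unknown host action: %r" % (action,))
        self._map[request] = action

    def resolve(self, request):
        """Return the host action for a guest request."""
        return self._map[request]
```

For example, set("reboot", "poweroff") gives the existing -no-reboot
behaviour, set("shutdown", "reset") is the opposite that Steve
describes, and set("reboot", "pause") is the one an install script would
use to swap media between reboots.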

-- Jamie