From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1LczJo-0002B5-Bd
	for qemu-devel@nongnu.org; Fri, 27 Feb 2009 04:47:44 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1LczJn-0002AK-1u
	for qemu-devel@nongnu.org; Fri, 27 Feb 2009 04:47:43 -0500
Received: from [199.232.76.173] (port=55633 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1LczJm-0002A2-Ff
	for qemu-devel@nongnu.org; Fri, 27 Feb 2009 04:47:42 -0500
Received: from pelvoux.gotadsl.co.uk ([81.6.248.91]:36818)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from <lists@pelvoux.nildram.co.uk>)
	id 1LczJl-00069E-PZ
	for qemu-devel@nongnu.org; Fri, 27 Feb 2009 04:47:42 -0500
Received: from fozzy by ecrins.fosdick.home.net with local (Exim 4.69)
	(envelope-from <lists@pelvoux.nildram.co.uk>) id 1LczMP-0007hd-BY
	for qemu-devel@nongnu.org; Fri, 27 Feb 2009 09:50:25 +0000
Subject: Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
From: Steve Fosdick <lists@pelvoux.nildram.co.uk>
In-Reply-To: <20090226175025.GA10284@shareable.org>
References: <20090225233718.GA15750@amd.home.annexia.org>
	<20090226105106.GD22494@redhat.com>
	<1235658682.5894.152.camel@ecrins.fosdick.home.net>
	<20090226175025.GA10284@shareable.org>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Date: Fri, 27 Feb 2009 09:50:24 +0000
Message-Id: <1235728224.5894.176.camel@ecrins.fosdick.home.net>
Mime-Version: 1.0
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

On Thu, 2009-02-26 at 17:50 +0000, Jamie Lokier wrote:

> For real continuity of service you'd also want QEMU itself to have a
> watchdog.  Either a software watchdog internally (SIGALRM => kill/exec
> self, or child process expecting regular pings over a pipe), or by
> QEMU itself becoming a client of the host watchdog.

So many possibilities - one, two, or three watchdogs?

If we want to cater for a host running more than one guest, we would
not want a single watchdog with a communication path from one of the
guests to a hardware watchdog on the host.  That would cause the host
to reboot if any one of the guests failed, thus rebooting guests that
were still working.

A two-watchdog solution could work though.

The host would be protected with a normal hardware watchdog, and a
normal user-space process, rather than QEMU, would tickle that
watchdog.

For the guest, QEMU provides a software watchdog that looks to the
guest like a hardware watchdog, so it can use the already written
driver and a normal user-space process on the guest.

The timer part of the QEMU software watchdog is implemented as a second
user-space process on the host which communicates with the main QEMU
process.  When the guest tickles the watchdog, the tickle is forwarded
to the separate watchdog process, which resets the timer.  When the
timer goes off, the watchdog process sends a message back to QEMU to
perform the configured action, and QEMU must confirm that this is
happening with a further message back to the watchdog process.  If QEMU
does not respond, the watchdog process uses host OS-level facilities to
kill and re-start it.
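
A minimal Python sketch of the timer side of that exchange may make the
message flow easier to follow.  Everything named here (GuestWatchdog,
tickle, confirm, poll, the two timeouts and the two callbacks) is
hypothetical and not taken from any existing patch; the transport
between QEMU and the watchdog process is deliberately left out and is
represented only by the callbacks.

```python
import time


class GuestWatchdog:
    """Timer half of the two-process scheme (illustrative only)."""

    def __init__(self, timeout, confirm_timeout, send_action,
                 kill_and_restart, clock=time.monotonic):
        self.timeout = timeout                    # seconds between tickles
        self.confirm_timeout = confirm_timeout    # grace period for QEMU
        self.send_action = send_action            # hypothetical: message to QEMU
        self.kill_and_restart = kill_and_restart  # hypothetical: host OS action
        self.clock = clock
        self.deadline = clock() + timeout
        self.confirm_deadline = None  # set while waiting for QEMU to confirm

    def tickle(self):
        """The guest tickled the device and QEMU forwarded it."""
        self.deadline = self.clock() + self.timeout

    def confirm(self):
        """QEMU confirmed that the configured action is happening."""
        self.confirm_deadline = None
        self.deadline = self.clock() + self.timeout

    def poll(self):
        """Call periodically from the watchdog process's main loop."""
        now = self.clock()
        if self.confirm_deadline is not None:
            # The timer already fired; we are waiting for QEMU to confirm.
            if now >= self.confirm_deadline:
                self.confirm_deadline = None
                self.deadline = now + self.timeout
                self.kill_and_restart()
        elif now >= self.deadline:
            # The guest stopped tickling: ask QEMU to act, then wait.
            self.send_action()
            self.confirm_deadline = now + self.confirm_timeout
```

The clock is injected only so that the logic can be exercised without
real delays; a real watchdog process would simply use the default.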

The three-watchdog solution uses the host hardware watchdog, a software
watchdog within QEMU as implemented by Richard, and another watchdog
that sits in a separate process which can 'ping' QEMU and, if it does
not respond, uses host OS-level facilities to kill and re-start it.
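
The third watchdog could be sketched roughly as below, in Python.  This
is only an illustration: the Supervisor class is hypothetical, and the
ping callable stands in for whatever mechanism would actually be used
to ask QEMU whether it is alive, which is not defined here.  Only the
kill and re-start part uses real host facilities (the standard
subprocess module).

```python
import subprocess


class Supervisor:
    """Pings a child process and re-starts it when it stops answering."""

    def __init__(self, argv, ping):
        self.argv = argv
        self.ping = ping  # hypothetical: returns True if the child answered
        self.proc = subprocess.Popen(argv)
        self.restarts = 0

    def check(self):
        """Run one ping cycle; return True if the child was healthy."""
        if self.proc.poll() is None and self.ping(self.proc):
            return True
        # No answer, or the child has already exited: kill and re-start.
        if self.proc.poll() is None:
            self.proc.kill()
        self.proc.wait()
        self.proc = subprocess.Popen(self.argv)
        self.restarts += 1
        return False

    def stop(self):
        """Terminate the child without re-starting it."""
        if self.proc.poll() is None:
            self.proc.kill()
        self.proc.wait()
```

A real supervisor would call check() on a timer and would probably
allow a few missed pings before re-starting anything.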

> > In fact, some people may find that option useful anyway even without the
> > watchdog.  In an environment where someone has privileged access to a
> > guest but no direct access to the host OS he could shut down a guest
> > accidentally when intending to reboot (or logoff).  It may be useful to
> > trap that and turn the shutdown into a reboot.
> 
> I've done that a few times.  It's only minorly annoying in that you
> lose the VNC connection and have to login and restart the VM.

I suspect most of the people using QEMU have access to the host OS too,
which means, as you say, that it is no big deal.

In some corporate environments accidentally shutting down a guest means
submitting a job to another team and then waiting while they get round
to restarting it.  A watchdog is one feature that takes QEMU in the
direction of being suitable for a business production environment and
this feature seemed like it would also be useful in such an environment.

> Side notes: It would be nice to be able to change the
> "shutdown-when-asked-to-reboot" (et al) option from the monitor.  It
> would also be nice to "pause-when-asked-to-shutdown/reboot", which is
> useful during automatic OS installs - the host script changes the
> media and/or hardware at each reboot.

Seems like a useful feature.

This is beginning to look like a complete matrix rather than a simple
option, i.e. QEMU could be configured to perform any of the actions
(stop, reset, poweroff, etc.) for any of the requests from the guest.
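
One way to picture such a matrix is as a table from guest request to
configured action, sketched here in Python.  The names are
hypothetical and do not correspond to any existing QEMU option, and
the default row for watchdog expiry in particular is just an assumption
made for the example.

```python
from enum import Enum


class Request(Enum):
    """Things the guest can ask for (illustrative names)."""
    SHUTDOWN = "shutdown"
    REBOOT = "reboot"
    WATCHDOG = "watchdog"  # watchdog timer expired


class Action(Enum):
    """Things QEMU could be configured to do in response."""
    STOP = "stop"
    RESET = "reset"
    POWEROFF = "poweroff"


# Assumed defaults: each request does what the guest asked for.
DEFAULT_POLICY = {
    Request.SHUTDOWN: Action.POWEROFF,
    Request.REBOOT: Action.RESET,
    Request.WATCHDOG: Action.RESET,
}


def resolve(request, overrides=None):
    """Return the action for a request, applying any per-VM overrides."""
    policy = dict(DEFAULT_POLICY)
    policy.update(overrides or {})
    return policy[request]
```

Turning an accidental shutdown into a reboot would then be a single
override, resolve(Request.SHUTDOWN, {Request.SHUTDOWN: Action.RESET}),
and stopping at each reboot during an automatic install would map
Request.REBOOT to Action.STOP.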

Regards,
Steve.