Subject: Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
From: Steve Fosdick
To: qemu-devel@nongnu.org
Reply-To: qemu-devel@nongnu.org
Date: Fri, 27 Feb 2009 09:50:24 +0000
Message-Id: <1235728224.5894.176.camel@ecrins.fosdick.home.net>
In-Reply-To: <20090226175025.GA10284@shareable.org>
References: <20090225233718.GA15750@amd.home.annexia.org> <20090226105106.GD22494@redhat.com> <1235658682.5894.152.camel@ecrins.fosdick.home.net> <20090226175025.GA10284@shareable.org>
List-Id: qemu-devel.nongnu.org

On Thu, 2009-02-26 at 17:50 +0000, Jamie Lokier wrote:
> For real continuity of service you'd also want QEMU itself to have a
> watchdog.  Either a software watchdog internally (SIGALRM => kill/exec
> self, or child process expecting regular pings over a pipe), or by
> QEMU itself becoming a client of the host watchdog.

So many possibilities - one, two, or three watchdogs?

If we want to cater for the situation where one host is running more
than one guest, we would not want a single watchdog with a
communication path from one of the guests to a hardware watchdog on
the host.  That would cause the host to reboot if any one of the
guests failed, thus rebooting guests that were still working.

A two-watchdog solution could work though.  The host would be
protected with a normal hardware watchdog, and a normal user-space
process, rather than QEMU, would tickle that watchdog.  For QEMU there
would be a software watchdog that looks to the guest like a hardware
watchdog and therefore uses the already written driver and a normal
user-space process on the guest.  The timer part of the QEMU software
watchdog would be implemented as a second user-space process on the
host which communicates with the main QEMU process.  When the guest
tickles the watchdog, the tickle is forwarded to the separate watchdog
process, which resets the timer.  When the timer goes off, the watchdog
process sends a message back to QEMU to perform the configured action,
and QEMU must confirm that this is happening with a further message
back to the watchdog process.  If QEMU does not respond, the watchdog
process uses host OS-level facilities to kill and re-start it.

The three-watchdog solution uses the host hardware watchdog, a
software watchdog within QEMU as implemented by Richard, and another
watchdog that sits in a separate process which can 'ping' QEMU and, if
it does not respond, uses host OS-level facilities to kill and
re-start it.

> > In fact, some people may find that option useful anyway even without the
> > watchdog.  In an environment where someone has privileged access to a
> > guest but no direct access to the host OS he could shut down a guest
> > accidentally when intending to reboot (or logoff).  It may be useful to
> > trap that and turn the shutdown into a reboot.
>
> I've done that a few times.
> It's only minorly annoying in that you
> lose the VNC connection and have to login and restart the VM.

I suspect most of the people using QEMU have access to the host OS
too, which means, as you say, that it is no big deal.  In some
corporate environments, though, accidentally shutting down a guest
means submitting a job to another team and then waiting until they get
round to restarting it.  A watchdog is one feature that takes QEMU in
the direction of being suitable for a business production environment,
and this feature seemed like it would also be useful in such an
environment.

> Side notes: It would be nice to be able to change the
> "shutdown-when-asked-to-reboot" (et al) option from the monitor.  It
> would also be nice to "pause-when-asked-to-shutdown/reboot", which is
> useful during automatic OS installs - the host script changes the
> media and/or hardware at each reboot.

Seems like a useful feature.  This is beginning to look like a complete
matrix rather than a simple option, i.e. QEMU could be configured to
perform any of the stop/reset/poweroff etc. actions for any of the
requests from the guest.

Regards,
Steve.
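
P.S. To make the two-watchdog idea a bit more concrete, here is a rough
sketch of how the timer half (the separate watchdog process) might
behave.  It is only an illustration under stated assumptions, not
proposed QEMU code: the names WatchdogProcess, Event, tickle, confirm
and poll are all invented here, the timeouts are arbitrary, and the
pipe to QEMU and the actual kill/re-start are left out so that only
the state machine is shown.

```python
from enum import Enum, auto


class Event(Enum):
    """What the watchdog process should do after a poll (names are hypothetical)."""
    NONE = auto()              # nothing to do yet
    REQUEST_ACTION = auto()    # ask QEMU to perform the configured action
    KILL_AND_RESTART = auto()  # QEMU never confirmed; fall back to host OS facilities


class WatchdogProcess:
    """Timer half of the software watchdog, living outside the main QEMU process."""

    def __init__(self, timeout, confirm_timeout):
        self.timeout = timeout                  # seconds the guest may go without tickling
        self.confirm_timeout = confirm_timeout  # seconds QEMU has to confirm the action
        self.deadline = None                    # unarmed until the first tickle
        self.confirm_deadline = None            # set while waiting for QEMU's confirmation

    def tickle(self, now):
        """The guest tickled the emulated device and QEMU forwarded it: reset the timer."""
        self.deadline = now + self.timeout

    def confirm(self, now):
        """QEMU confirmed that the configured action is happening: disarm."""
        self.confirm_deadline = None
        self.deadline = None

    def poll(self, now):
        """Called periodically by the process's main loop; returns the next Event."""
        if self.confirm_deadline is not None:
            # Already asked QEMU to act; only the confirmation timer matters now.
            if now >= self.confirm_deadline:
                self.confirm_deadline = None
                self.deadline = None
                return Event.KILL_AND_RESTART
            return Event.NONE
        if self.deadline is not None and now >= self.deadline:
            # Guest stopped tickling: ask QEMU to act and start the confirmation timer.
            self.confirm_deadline = now + self.confirm_timeout
            return Event.REQUEST_ACTION
        return Event.NONE
```

In a real process, poll() would presumably be driven from a select loop
on the pipe to QEMU using a monotonic clock, with REQUEST_ACTION
written to the pipe and KILL_AND_RESTART handled by whatever the host
OS provides for killing and re-starting a process.  Passing the time in
as an argument just keeps the sketch easy to exercise by hand.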