From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1LchEG-0003Lm-4A
	for qemu-devel@nongnu.org; Thu, 26 Feb 2009 09:28:48 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1LchEE-0003L3-RB
	for qemu-devel@nongnu.org; Thu, 26 Feb 2009 09:28:47 -0500
Received: from [199.232.76.173] (port=47004 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1LchEE-0003Ky-Kj
	for qemu-devel@nongnu.org; Thu, 26 Feb 2009 09:28:46 -0500
Received: from pelvoux.gotadsl.co.uk ([81.6.248.91]:46424)
	by monty-python.gnu.org with esmtps
	(TLS-1.0:RSA_AES_256_CBC_SHA1:32) (Exim 4.60) id 1LchED-0000aZ-NU
	for qemu-devel@nongnu.org; Thu, 26 Feb 2009 09:28:46 -0500
Received: from fozzy by ecrins.fosdick.home.net with local (Exim 4.69)
	id 1LchGm-0005dV-AF
	for qemu-devel@nongnu.org; Thu, 26 Feb 2009 14:31:24 +0000
Subject: Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
From: Steve Fosdick
In-Reply-To: <20090226105106.GD22494@redhat.com>
References: <20090225233718.GA15750@amd.home.annexia.org>
	<20090226105106.GD22494@redhat.com>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Date: Thu, 26 Feb 2009 14:31:22 +0000
Message-Id: <1235658682.5894.152.camel@ecrins.fosdick.home.net>
Mime-Version: 1.0
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org

On Thu, 2009-02-26 at 10:51 +0000, Daniel P. Berrange wrote:
> I think we can only support the following options
>
>  - shutdown - graceful shutdown of guest via ACPI event via
>               qemu_system_powerdown_request()

I wonder how many times the guest will be healthy enough to respond to
this and how many times it will have crashed badly enough that this
does no good.
Perhaps we could have a second timer, such that if the guest, on being
asked to shut down via ACPI, does not respond within a certain time
limit with an ACPI request to turn the power off, we go for one of the
other options below?

>  - poweroff - hard immediate power off of guest machine via
>               qemu_system_shutdown_request()
>  - reset    - hard reset of the guest machine via
>               qemu_system_reset_request()
>  - pause    - stop the guest CPU(s)

Thinking a little more on this, I can see two use cases for a watchdog:

1. Ensure continuity of service.  When a guest OS gets stuck for some
   reason, make sure it is restarted.  This is probably the only use
   case on a real physical machine.

2. Limit the resource consumption of a crashed guest when the host
   serves other guests.  This is probably only of concern for virtual
   machines, i.e. it is specific to the emulated watchdog and its
   interaction with qemu rather than being part of how a physical
   watchdog works.

Looking at the actions proposed by Daniel: shutdown, poweroff and pause
support the second use case, whereas reset supports the first.

Do we want to offer the guest the option of a clean shutdown, if it can
still manage that, followed by a reboot, i.e. the shutdown option but
for use case 1?  If so, we need to be able to turn the ACPI power off
request into a reset instead.  We already have the -no-reboot option to
turn a reboot into a power off; this is the opposite.

In fact, some people may find that option useful anyway, even without
the watchdog.  In an environment where someone has privileged access to
a guest but no direct access to the host OS, he could shut down a guest
accidentally when intending to reboot (or log off).  It may be useful
to trap that and turn the shutdown into a reboot.

Regards,
Steve.
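
P.S. To make the second-timer idea and the "power off becomes reset"
conversion a little more concrete, here is a toy model in Python.  It is
only a sketch of the decision logic, not qemu code: ToyGuest,
on_watchdog_expiry, the fallback parameter and the shutdown_to_reset
flag are all names made up for illustration, and the grace timer is
reduced to a yes/no "did the guest respond in time".

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    SHUTDOWN = "shutdown"   # ACPI request; needs a cooperating guest
    POWEROFF = "poweroff"   # hard immediate power off
    RESET = "reset"         # hard reset
    PAUSE = "pause"         # stop the guest CPU(s)


# Guest state that results from each hard action.
RESULTING_STATE = {
    Action.POWEROFF: "off",
    Action.RESET: "running",
    Action.PAUSE: "paused",
}


@dataclass
class ToyGuest:
    """Hypothetical stand-in for a guest; not a qemu type."""
    responds_to_acpi: bool          # True if it answers within the grace period
    state: str = "running"
    log: list = field(default_factory=list)


def on_watchdog_expiry(guest, action, fallback=Action.RESET,
                       shutdown_to_reset=False):
    """Apply a watchdog action and return the guest's final state.

    For SHUTDOWN, the guest is first asked politely.  If it does not
    respond before the (modelled) second timer runs out, the hard
    `fallback` action is taken instead.  With `shutdown_to_reset`, a
    guest-initiated power off is converted into a reset, i.e. the
    opposite of -no-reboot.
    """
    if fallback is Action.SHUTDOWN:
        raise ValueError("fallback must be a hard action")

    if action is Action.SHUTDOWN:
        guest.log.append("acpi-powerdown-request")
        if guest.responds_to_acpi:
            guest.log.append("guest-requested-poweroff")
            final = Action.RESET if shutdown_to_reset else Action.POWEROFF
        else:
            guest.log.append("grace-timer-expired")
            final = fallback
    else:
        final = action

    guest.log.append(final.value)
    guest.state = RESULTING_STATE[final]
    return guest.state


# Use case 1 (continuity of service): try a clean shutdown, but make
# sure the guest ends up running again either way.
hung = ToyGuest(responds_to_acpi=False)
on_watchdog_expiry(hung, Action.SHUTDOWN, fallback=Action.RESET,
                   shutdown_to_reset=True)
```

For use case 2 one would pass fallback=Action.POWEROFF (or PAUSE) and
leave shutdown_to_reset off, so a crashed guest stops consuming host
resources whether or not it manages the clean shutdown.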