* [Qemu-devel] Hardware watchdogs (patch for discussion only)
@ 2009-02-25 23:37 Richard W.M. Jones
  2009-02-26 10:51 ` Daniel P. Berrange
  0 siblings, 1 reply; 13+ messages in thread
From: Richard W.M. Jones @ 2009-02-25 23:37 UTC
  To: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1581 bytes --]

Hi:

I want to share an unfinished patch with the list just to make sure
that I'm heading in the right direction.

This patch aims to implement a virtual hardware watchdog device.  A
hardware watchdog in a real machine is a device that must be tickled
periodically by a userspace process to prove that the machine's
userspace is (in some sense) "alive".  If the device isn't tickled
then after some timeout, it reboots the machine.
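
To make the idea concrete, here is a minimal sketch of the countdown
behaviour described above.  It is not code from the patch: the
wdt_sim names are hypothetical, and time is just an abstract tick
counter supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a watchdog countdown; not code from the patch. */
struct wdt_sim {
    bool enabled;       /* The watchdog has been armed. */
    int64_t timeout;    /* Ticks allowed between tickles. */
    int64_t deadline;   /* Absolute tick at which the watchdog fires. */
};

/* Arm the watchdog at time `now` with the given timeout (must be > 0). */
void wdt_sim_enable(struct wdt_sim *w, int64_t now, int64_t timeout)
{
    assert(timeout > 0);
    w->enabled = true;
    w->timeout = timeout;
    w->deadline = now + timeout;
}

/* Tickle: push the deadline out by one full timeout from `now`. */
void wdt_sim_tickle(struct wdt_sim *w, int64_t now)
{
    if (w->enabled)
        w->deadline = now + w->timeout;
}

/* True once the deadline has passed without a tickle; this is the
 * point at which real hardware would reboot the machine. */
bool wdt_sim_expired(const struct wdt_sim *w, int64_t now)
{
    return w->enabled && now >= w->deadline;
}
```

A watchdog armed at tick 0 with a timeout of 30 and tickled at tick 20
would, in this model, fire at tick 50 rather than tick 30.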

These devices are generally very simple.  I picked two devices to
emulate.  The first is the iBASE 700, which is almost trivial: one ISA
port enables the watchdog and sets the timeout, and another ISA port
disables it.  The second is the Intel 6300ESB, a PCI device which sits
in the mid to high end of the feature range and is very well
documented.  Both have clean Linux device drivers:

  http://lxr.linux.no/linux+v2.6.28.5/drivers/watchdog/ib700wdt.c
  http://lxr.linux.no/linux+v2.6.28.5/drivers/watchdog/i6300esb.c

(Both also come with Windows drivers, which I haven't tested.)
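
As a rough illustration of how little there is to the IB700, the
two-port interface could be modelled as below.  This is only a sketch
under assumptions: the port numbers are the ones I understand the Linux
ib700wdt driver to use (0x443 to start, 0x441 to stop), the timeout is
kept as the raw 4-bit code rather than converted to seconds, and the
ib700_sim names are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed port numbers, taken from the Linux ib700wdt driver. */
#define IB700_PORT_STOP  0x441
#define IB700_PORT_START 0x443

/* Hypothetical device model; not code from the patch. */
struct ib700_sim {
    bool running;           /* Watchdog is counting down. */
    uint8_t timeout_code;   /* Raw 4-bit value last written to the start port. */
};

/* Handle a one-byte guest write to an ISA port.  Returns true if the
 * port belongs to the device, false otherwise. */
bool ib700_sim_write(struct ib700_sim *s, uint16_t port, uint8_t data)
{
    assert(s != NULL);
    switch (port) {
    case IB700_PORT_START:
        /* One write both selects the timeout and (re)starts the timer. */
        s->timeout_code = data & 0x0f;
        s->running = true;
        return true;
    case IB700_PORT_STOP:
        /* Any write to the stop port disables the watchdog. */
        s->running = false;
        return true;
    default:
        return false;
    }
}
```

A real emulation would also start or cancel a timer in the two cases;
that part is left out here.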

The attached patch contains a working emulation of the Intel 6300ESB.
The IB700 is simpler, but I haven't got around to implementing it yet.
Save/restore/migration is also on the to-do list.
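
For reference, the timeout arithmetic used by the 6300ESB emulation
can be written as a standalone function.  The sketch below follows
i6300esb_restart_timer in the attached patch (20-bit preload, shifted
left by 15 or 5 depending on the clock scale, counted against a 33 MHz
PCI clock), but the function name and parameters are hypothetical, and
the division is split in two so that the intermediate product stays
inside int64_t.

```c
#include <assert.h>
#include <stdint.h>

#define ESB_PCI_CLOCK_HZ 33000000   /* Same constant the patch divides by. */

/* Convert a 20-bit preload value into host timer ticks.
 * scale_1mhz selects the "1 MHz" clock scale (shift by 5); otherwise
 * the "1 kHz" scale is used (shift by 15).  Hypothetical helper, not
 * part of the patch. */
int64_t esb_timeout_ticks(uint32_t preload, int scale_1mhz,
                          int64_t ticks_per_sec)
{
    int64_t cycles;

    assert(ticks_per_sec > 0);
    cycles = (int64_t)(preload & 0xfffff) << (scale_1mhz ? 5 : 15);

    /* floor(cycles * ticks_per_sec / 33 MHz), computed as whole
     * seconds plus the remainder, to avoid a very large product. */
    return (cycles / ESB_PCI_CLOCK_HZ) * ticks_per_sec
         + (cycles % ESB_PCI_CLOCK_HZ) * ticks_per_sec / ESB_PCI_CLOCK_HZ;
}
```

With one tick per nanosecond, the default preload of 0xfffff on the
slower scale comes out at roughly 1041 seconds per stage.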

I have tested my emulated Intel device using the Linux watchdog
program, and it appears to work.

  http://sourceforge.net/projects/watchdog/
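
For anyone who wants to try this without the full watchdog daemon, the
guest side can be approximated with a few lines of C.  This is a sketch
that assumes the guest driver exposes the usual Linux /dev/watchdog
character device, where any write counts as a keepalive; the function
names are hypothetical.

```c
#include <assert.h>
#include <unistd.h>

/* Send one keepalive by writing a single byte to an open watchdog
 * file descriptor.  Returns 0 on success, -1 on error. */
int wdt_keepalive(int fd)
{
    assert(fd >= 0);
    return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/* Send `count` keepalives, sleeping `interval` seconds between them.
 * Returns 0 if every write succeeded, -1 on the first failure. */
int wdt_keepalive_loop(int fd, int count, unsigned interval)
{
    for (int i = 0; i < count; i++) {
        if (wdt_keepalive(fd) != 0)
            return -1;
        if (i + 1 < count)
            sleep(interval);
    }
    return 0;
}
```

In a guest, fd would come from open("/dev/watchdog", O_WRONLY) and the
interval would be chosen comfortably shorter than the configured
timeout.  Stopping the loop without disabling the device should then
trigger the watchdog.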

Any comments?

Rich.

-- 
Richard Jones, Emerging Technologies, Red Hat  http://et.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

[-- Attachment #2: qemu-watchdog-1.patch --]
[-- Type: text/plain, Size: 32429 bytes --]

Index: Makefile.target
===================================================================
--- Makefile.target	(revision 6643)
+++ Makefile.target	(working copy)
@@ -584,6 +584,7 @@
 OBJS+= cirrus_vga.o apic.o parallel.o acpi.o piix_pci.o
 OBJS+= usb-uhci.o vmmouse.o vmport.o vmware_vga.o hpet.o
 OBJS += device-hotplug.o pci-hotplug.o
+OBJS+= wdt_ib700.o wdt_i6300esb.o
 CPPFLAGS += -DHAS_AUDIO -DHAS_AUDIO_CHOICE
 endif
 ifeq ($(TARGET_BASE_ARCH), ppc)
Index: vl.c
===================================================================
--- vl.c	(revision 6643)
+++ vl.c	(working copy)
@@ -42,6 +42,7 @@
 #include "migration.h"
 #include "kvm.h"
 #include "balloon.h"
+#include "watchdog.h"
 
 #include <unistd.h>
 #include <fcntl.h>
@@ -4072,6 +4073,8 @@
            "-old-param      old param mode\n"
 #endif
            "-tb-size n      set TB size\n"
+	   "-watchdog ib700|i6300esb[,action=reboot|rebootnice|shutdown|shutdownnice|pause|exit]\n"
+	   "                enable virtual hardware watchdog [default=none]\n"
            "-incoming p     prepare for incoming migration, listen on port p\n"
            "\n"
            "During emulation, the following keys are useful:\n"
@@ -4183,6 +4186,7 @@
     QEMU_OPTION_startdate,
     QEMU_OPTION_icount,
     QEMU_OPTION_echr,
+    QEMU_OPTION_watchdog,
     QEMU_OPTION_virtiocon,
     QEMU_OPTION_show_cursor,
     QEMU_OPTION_semihosting,
@@ -4308,6 +4312,7 @@
     { "startdate", HAS_ARG, QEMU_OPTION_startdate },
     { "icount", HAS_ARG, QEMU_OPTION_icount },
     { "echr", HAS_ARG, QEMU_OPTION_echr },
+    { "watchdog", HAS_ARG, QEMU_OPTION_watchdog },
     { "virtioconsole", HAS_ARG, QEMU_OPTION_virtiocon },
     { "show-cursor", 0, QEMU_OPTION_show_cursor },
 #if defined(TARGET_ARM) || defined(TARGET_M68K)
@@ -5082,6 +5087,14 @@
                 serial_devices[serial_device_index] = optarg;
                 serial_device_index++;
                 break;
+	    case QEMU_OPTION_watchdog:
+		if (wdt_option) {
+		    fprintf (stderr,
+			     "qemu: only one watchdog option may be given\n");
+		    exit (1);
+		}
+		wdt_option = optarg;
+	        break;
             case QEMU_OPTION_virtiocon:
                 if (virtio_console_index >= MAX_VIRTIO_CONSOLES) {
                     fprintf(stderr, "qemu: too many virtio consoles\n");
Index: Makefile
===================================================================
--- Makefile	(revision 6643)
+++ Makefile	(working copy)
@@ -85,6 +85,7 @@
 OBJS+=bt.o bt-host.o bt-vhci.o bt-l2cap.o bt-sdp.o bt-hci.o bt-hid.o usb-bt.o
 OBJS+=buffered_file.o migration.o migration-tcp.o net.o qemu-sockets.o
 OBJS+=qemu-char.o aio.o net-checksum.o savevm.o cache-utils.o
+OBJS+=watchdog.o
 
 ifdef CONFIG_BRLAPI
 OBJS+= baum.o
Index: watchdog.c
===================================================================
--- watchdog.c	(revision 0)
+++ watchdog.c	(revision 0)
@@ -0,0 +1,191 @@
+/*
+ * Virtual hardware watchdog.
+ *
+ * Copyright (C) 2009 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * By Richard W.M. Jones (rjones@redhat.com).
+ */
+
+#include "qemu-common.h"
+#include "sysemu.h"
+#include "hw/wdt_ib700.h"
+#include "hw/wdt_i6300esb.h"
+#include "watchdog.h"
+
+/* Possible values for action parameter. */
+#define WDT_REBOOT       1
+#define WDT_REBOOTNICE   2
+#define WDT_SHUTDOWN     3
+#define WDT_SHUTDOWNNICE 4
+#define WDT_PAUSE        5
+#define WDT_EXIT         6
+
+static const char *string_of_action (int action);
+static void parse_option (const char *arg, int *action_r);
+static void parse_rest (const char *rest, int *action_r);
+
+/* Linked list of models - virtual watchdog devices add themselves
+ * to this list.
+ */
+struct wdt_model *wdt_models = NULL;
+
+/* The raw -watchdog option specified on the command line. */
+const char *wdt_option = NULL;
+
+/* Currently enabled watchdog device.  Only one may be enabled from
+ * the command line.
+ */
+static struct wdt_model *wdt = NULL;
+
+/* Called from the PC code to parse the option and finally configure
+ * the device.
+ */
+void
+watchdog_pc_init (PCIBus *pci_bus)
+{
+    int action;
+
+    if (!wdt_option) return;
+    parse_option (wdt_option, &action);
+    if (!wdt) return;		/* No watchdog configured. */
+    wdt->wdt_methods->wdt_pc_init (pci_bus, action);
+}
+
+/* This actually performs the "action" once a watchdog has expired,
+ * ie. reboot, shutdown, exit, etc.
+ */
+void
+watchdog_perform_action (int action)
+{
+    fprintf (stderr, "qemu: watchdog %s!\n", string_of_action (action));
+
+    switch (action) {
+    case WDT_REBOOT:
+	qemu_system_reset ();
+	break;
+
+    case WDT_REBOOTNICE:
+	qemu_system_reset_request ();
+	break;
+
+    case WDT_SHUTDOWN:
+	qemu_system_powerdown ();
+	break;
+
+    case WDT_SHUTDOWNNICE:
+	qemu_system_powerdown_request ();
+	break;
+
+    case WDT_PAUSE:
+	vm_stop (0);
+	break;
+
+    case WDT_EXIT:
+	exit (0);
+    }
+}
+
+static const char *
+string_of_action (int action)
+{
+    switch (action) {
+    case WDT_REBOOT: return "reboot";
+    case WDT_REBOOTNICE: return "reboot (nice)";
+    case WDT_SHUTDOWN: return "shutdown";
+    case WDT_SHUTDOWNNICE: return "shutdown (nice)";
+    case WDT_PAUSE: return "pause";
+    case WDT_EXIT: return "exit";
+    default: return "UNKNOWN";
+    }
+}
+
+/* This function parses the command line parameter of the form:
+ *   <<model>>[,action=<<action>>]
+ */
+static void
+parse_option (const char *arg, int *action_r)
+{
+    struct wdt_model *p;
+    int len;
+
+    if (wdt) {
+	fprintf (stderr, "Only one -watchdog device can be enabled\n");
+	exit (1);
+    }
+
+    /* -watchdog ? lists available devices and exits cleanly. */
+    if (strcmp (arg, "?") == 0) {
+	for (p = wdt_models; p; p = p->wdt_next) {
+	    fprintf (stderr, "\t%s\t%s\n",
+		     p->wdt_name, p->wdt_description);
+	}
+	exit (0);
+    }
+
+    for (p = wdt_models; p; p = p->wdt_next) {
+	len = strlen (p->wdt_name);
+	if (strncasecmp (arg, p->wdt_name, len) == 0 &&
+	    (arg[len] == '\0' || arg[len] == ',')) {
+	    parse_rest (&arg[len], action_r);
+	    if (p->wdt_methods->wdt_option)
+		p->wdt_methods->wdt_option (&arg[len]);
+	    wdt = p;
+	    return;
+	}
+    }
+
+    fprintf (stderr, "Unknown -watchdog device.  Supported devices are:\n");
+    for (p = wdt_models; p; p = p->wdt_next) {
+	fprintf (stderr, "\t%s\t%s\n",
+		 p->wdt_name, p->wdt_description);
+    }
+    exit (1);
+}
+
+/* Parse the remainder of the command line parameter, either:
+ *   ,action=<<action>>
+ * or empty string.  For forwards compatibility, ignore other
+ * parameters which may appear in the string.
+ */
+static void
+parse_rest (const char *arg, int *action_r)
+{
+    char buf[64];
+
+    *action_r = WDT_REBOOT;	/* Default action. */
+    if (*arg == '\0') return;
+    if (*arg == ',') arg++;
+
+    if (get_param_value (buf, sizeof (buf), "action", arg)) {
+	if (strcasecmp (buf, "reboot") == 0)
+	    *action_r = WDT_REBOOT;
+	else if (strcasecmp (buf, "rebootnice") == 0)
+	    *action_r = WDT_REBOOTNICE;
+	else if (strcasecmp (buf, "shutdown") == 0)
+	    *action_r = WDT_SHUTDOWN;
+	else if (strcasecmp (buf, "shutdownnice") == 0)
+	    *action_r = WDT_SHUTDOWNNICE;
+	else if (strcasecmp (buf, "pause") == 0)
+	    *action_r = WDT_PAUSE;
+	else if (strcasecmp (buf, "exit") == 0)
+	    *action_r = WDT_EXIT;
+	else {
+	    fprintf (stderr, "Unknown -watchdog action parameter: %s\n", buf);
+	    exit (1);
+	}
+    }
+}
Index: qemu-doc.texi
===================================================================
--- qemu-doc.texi	(revision 6643)
+++ qemu-doc.texi	(working copy)
@@ -1158,8 +1158,43 @@
 @item -echr 20
 @end table
 
+@item -watchdog @var{model}[,action=@var{action}]
+Create a virtual hardware watchdog device.  Once enabled (by a guest
+action), the watchdog must be periodically polled by an agent inside
+the guest or else the guest will be restarted.
+
+The @var{model} is the model of hardware watchdog to emulate.  Choices
+for model are: @code{ib700} (iBASE 700) which is a very simple ISA
+watchdog with a single timer, or @code{i6300esb} (Intel 6300ESB I/O
+controller hub) which is a much more featureful PCI-based dual-timer
+watchdog.  Choose a model for which your guest has drivers.
+
+The @var{action} controls what QEMU will do when the timer expires.
+The default is
+@code{reboot} (forcefully reboot the guest).
+Other possible actions are:
+@code{rebootnice} (attempt to gracefully reboot the guest),
+@code{shutdown} (forcefully shut down the guest),
+@code{shutdownnice} (attempt to gracefully shut down the guest),
+@code{pause} (pause the guest), or
+@code{exit} (immediately exit the QEMU process).
+
+Note that the @code{rebootnice} and @code{shutdownnice} actions
+require that the guest responds to ACPI signals, which it may not be
+able to do in the sort of situations where the watchdog would have
+expired, and thus these actions are not recommended for production
+use.
+
+Use @code{-watchdog ?} to list available hardware models.  Only one
+watchdog can be enabled for a guest.
+
+@table @code
+@item -watchdog ib700
+@item -watchdog i6300esb,action=exit
 @end table
 
+@end table
+
 @c man end
 
 @node pcsys_keys
Index: watchdog.h
===================================================================
--- watchdog.h	(revision 0)
+++ watchdog.h	(revision 0)
@@ -0,0 +1,54 @@
+/*
+ * Virtual hardware watchdog.
+ *
+ * Copyright (C) 2008 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * By Richard W.M. Jones (rjones@redhat.com).
+ */
+
+#ifndef _QEMU_WATCHDOG_H
+#define _QEMU_WATCHDOG_H
+
+extern const char *wdt_option;
+
+extern void watchdog_pc_init (PCIBus *pci_bus);
+extern void watchdog_perform_action (int action);
+
+struct wdt_model {
+    struct wdt_model *wdt_next;
+
+    /* Short name of the device - used to select it on the command line. */
+    const char *wdt_name;
+    /* Longer description (eg. manufacturer and full model number). */
+    const char *wdt_description;
+    /* Callbacks for this device. */
+    struct wdt_methods *wdt_methods;
+};
+
+struct wdt_methods {
+    /* If further option parsing is needed, do it here. */
+    void (*wdt_option) (const char *arg);
+    /* This callback should create/register the device.  It is called
+     * indirectly from hw/pc.c when the virtual PC is being set up.
+     */
+    void (*wdt_pc_init) (PCIBus *pci_bus, int action);
+};
+
+/* Watchdog virtual devices add themselves to this linked list. */
+extern struct wdt_model *wdt_models;
+
+#endif /* _QEMU_WATCHDOG_H */
Index: hw/wdt_i6300esb.c
===================================================================
--- hw/wdt_i6300esb.c	(revision 0)
+++ hw/wdt_i6300esb.c	(revision 0)
@@ -0,0 +1,498 @@
+/*
+ * Virtual hardware watchdog.
+ *
+ * Copyright (C) 2008 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * By Richard W.M. Jones (rjones@redhat.com).
+ */
+
+#include <inttypes.h>
+
+#include "qemu-common.h"
+#include "qemu-timer.h"
+#include "watchdog.h"
+#include "wdt_i6300esb.h"
+#include "hw.h"
+#include "isa.h"
+#include "pc.h"
+#include "pci.h"
+
+/*#define I6300ESB_DEBUG 1*/
+
+#ifndef PCI_DEVICE_ID_INTEL_ESB_9
+#define PCI_DEVICE_ID_INTEL_ESB_9 0x25ab
+#endif
+
+/* PCI configuration registers */
+#define ESB_CONFIG_REG  0x60            /* Config register                   */
+#define ESB_LOCK_REG    0x68            /* WDT lock register                 */
+
+/* Memory mapped registers (offset from base address) */
+#define ESB_TIMER1_REG  0x00 /* Timer1 value after each reset     */
+#define ESB_TIMER2_REG  0x04 /* Timer2 value after each reset     */
+#define ESB_GINTSR_REG  0x08 /* General Interrupt Status Register */
+#define ESB_RELOAD_REG  0x0c /* Reload register                   */
+
+/* Lock register bits */
+#define ESB_WDT_FUNC    (0x01 << 2)   /* Watchdog functionality            */
+#define ESB_WDT_ENABLE  (0x01 << 1)   /* Enable WDT                        */
+#define ESB_WDT_LOCK    (0x01 << 0)   /* Lock (nowayout)                   */
+
+/* Config register bits */
+#define ESB_WDT_REBOOT  (0x01 << 5)   /* Enable reboot on timeout          */
+#define ESB_WDT_FREQ    (0x01 << 2)   /* Decrement frequency               */
+#define ESB_WDT_INTTYPE (0x03 << 0)   /* Interrupt type on timer1 timeout  */
+
+/* Reload register bits */
+#define ESB_WDT_RELOAD  (0x01 << 8)    /* prevent timeout                   */
+
+/* Magic constants */
+#define ESB_UNLOCK1     0x80            /* Step 1 to unlock reset registers  */
+#define ESB_UNLOCK2     0x86            /* Step 2 to unlock reset registers  */
+
+struct state;
+static void i6300esb_pc_init (PCIBus *pci_bus, int action);
+static void i6300esb_map (PCIDevice *dev, int region_num, uint32_t addr, uint32_t size, int type);
+static void i6300esb_config_write (PCIDevice *dev, uint32_t addr, uint32_t data, int len);
+static uint32_t i6300esb_config_read (PCIDevice *dev, uint32_t addr, int len);
+static uint32_t i6300esb_mem_readb (void *vp, target_phys_addr_t addr);
+static uint32_t i6300esb_mem_readw (void *vp, target_phys_addr_t addr);
+static uint32_t i6300esb_mem_readl (void *vp, target_phys_addr_t addr);
+static void i6300esb_mem_writeb (void *vp, target_phys_addr_t addr, uint32_t val);
+static void i6300esb_mem_writew (void *vp, target_phys_addr_t addr, uint32_t val);
+static void i6300esb_mem_writel (void *vp, target_phys_addr_t addr, uint32_t val);
+static void i6300esb_restart_timer (struct state *, int stage);
+static void i6300esb_disable_timer (struct state *);
+static void i6300esb_timer_expired (void *vp);
+static int i6300esb_save_live (QEMUFile *f, int stage, void *vp);
+static void i6300esb_save (QEMUFile *f, void *vp);
+static int i6300esb_load (QEMUFile *f, void *vp, int version);
+
+static struct wdt_methods i6300esb_wdt = {
+    .wdt_pc_init = i6300esb_pc_init,
+};
+
+static struct wdt_model model = {
+    .wdt_name = "i6300esb",
+    .wdt_description = "Intel 6300ESB",
+    .wdt_methods = &i6300esb_wdt,
+};
+
+/* Device state. */
+struct state {
+    PCIDevice dev;		/* PCI device state, must be first field. */
+
+    int action;			/* Action on expiry. */
+
+    int reboot_enabled;		/* "Reboot" on timer expiry.  The real action
+				 * performed depends on the action=* param
+				 * passed on QEMU command line.
+				 */
+    int clock_scale;		/* Clock scale. */
+#define CLOCK_SCALE_1KHZ 0
+#define CLOCK_SCALE_1MHZ 1
+
+    int int_type;		/* Interrupt type generated. */
+#define INT_TYPE_IRQ 0		/* APIC 1, INT 10 */
+#define INT_TYPE_SMI 2
+#define INT_TYPE_DISABLED 3
+
+    int free_run;               /* If true, reload timer on expiry. */
+    int locked;			/* If true, enabled field cannot be changed. */
+    int enabled;		/* If true, watchdog is enabled. */
+
+    QEMUTimer *timer;		/* The actual watchdog timer. */
+
+    uint32_t timer1_preload;	/* Values preloaded into timer1, timer2. */
+    uint32_t timer2_preload;
+    int stage;			/* Stage (1 or 2). */
+
+    int unlock_state;		/* Guest writes 0x80, 0x86 to unlock the
+				 * registers, and we transition through
+				 * states 0 -> 1 -> 2 when this happens.
+				 */
+
+    int previous_reboot_flag;	/* If the watchdog caused the previous
+				 * reboot, this flag will be set.
+				 */
+};
+
+void
+wdt_i6300esb_init (void)
+{
+    model.wdt_next = wdt_models;
+    wdt_models = &model;
+}
+
+/* Create and initialize a virtual Intel 6300ESB during PC creation. */
+static void
+i6300esb_pc_init (PCIBus *pci_bus, int action)
+{
+    struct state *d;
+    uint8_t *pci_conf;
+
+    if (!pci_bus) {
+	fprintf (stderr, "wdt_i6300esb: no PCI bus in this machine\n");
+	return;
+    }
+
+    d = (struct state *)
+	pci_register_device (pci_bus, "i6300esb_wdt", sizeof (struct state),
+			     -1,
+			     i6300esb_config_read, i6300esb_config_write);
+
+    d->action = action;
+    d->reboot_enabled = 1;
+    d->clock_scale = CLOCK_SCALE_1KHZ;
+    d->int_type = INT_TYPE_IRQ;
+    d->free_run = 0;
+    d->locked = 0;
+    d->enabled = 0;
+    d->timer = qemu_new_timer (vm_clock, i6300esb_timer_expired, d);
+    d->timer1_preload = 0xfffff;
+    d->timer2_preload = 0xfffff;
+    d->stage = 1;
+    d->unlock_state = 0;
+    d->previous_reboot_flag = 0;
+
+    pci_conf = d->dev.config;
+    pci_config_set_vendor_id (pci_conf, PCI_VENDOR_ID_INTEL);
+    pci_config_set_device_id (pci_conf, PCI_DEVICE_ID_INTEL_ESB_9);
+    pci_config_set_class (pci_conf, PCI_CLASS_SYSTEM_OTHER);
+    pci_conf[0x0e] = 0x00;
+
+    pci_register_io_region (&d->dev, 0, 0x10,
+			    PCI_ADDRESS_SPACE_MEM, i6300esb_map);
+
+    register_savevm_live ("i6300esb_wdt", 0, sizeof (struct state),
+			  i6300esb_save_live, i6300esb_save, i6300esb_load,
+			  NULL);
+}
+
+static void
+i6300esb_map (PCIDevice *dev, int region_num,
+	      uint32_t addr, uint32_t size, int type)
+{
+    static CPUReadMemoryFunc *mem_read[3] = {
+	i6300esb_mem_readb,
+	i6300esb_mem_readw,
+	i6300esb_mem_readl,
+    };
+    static CPUWriteMemoryFunc *mem_write[3] = {
+	i6300esb_mem_writeb,
+	i6300esb_mem_writew,
+	i6300esb_mem_writel,
+    };
+    struct state *d = (struct state *) dev;
+    int io_mem;
+
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_map: addr = %x, size = %x, type = %d\n",
+	     addr, size, type);
+#endif
+
+    io_mem = cpu_register_io_memory (0, mem_read, mem_write, d);
+    cpu_register_physical_memory (addr, 0x10, io_mem);
+    /* qemu_register_coalesced_mmio (addr, 0x10); ? */
+}
+
+static void
+i6300esb_config_write (PCIDevice *dev, uint32_t addr, uint32_t data, int len)
+{
+    struct state *d = (struct state *) dev;
+    int old;
+
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_config_write: addr = %x, data = %x, len = %d\n",
+	     addr, data, len);
+#endif
+
+    if (addr == ESB_CONFIG_REG && len == 2) {
+	d->reboot_enabled = (data & ESB_WDT_REBOOT) == 0;
+	d->clock_scale =
+	    (data & ESB_WDT_FREQ) != 0 ? CLOCK_SCALE_1MHZ : CLOCK_SCALE_1KHZ;
+	d->int_type = (data & ESB_WDT_INTTYPE);
+    } else if (addr == ESB_LOCK_REG && len == 1) {
+	if (!d->locked) {
+	    d->locked = (data & ESB_WDT_LOCK) != 0;
+	    d->free_run = (data & ESB_WDT_FUNC) != 0;
+	    old = d->enabled;
+	    d->enabled = (data & ESB_WDT_ENABLE) != 0;
+	    if (!old && d->enabled) /* Enabled transitioned from 0 -> 1 */
+		i6300esb_restart_timer (d, 1);
+	    else if (!d->enabled)
+		i6300esb_disable_timer (d);
+	}
+    } else {
+	pci_default_write_config (dev, addr, data, len);
+    }
+}
+
+static uint32_t
+i6300esb_config_read (PCIDevice *dev, uint32_t addr, int len)
+{
+    struct state *d = (struct state *) dev;
+    uint32_t data;
+
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_config_read: addr = %x, len = %d\n",
+	     addr, len);
+#endif
+
+    if (addr == ESB_CONFIG_REG && len == 2) {
+	data =
+	    (d->reboot_enabled ? 0 : ESB_WDT_REBOOT) |
+	    (d->clock_scale == CLOCK_SCALE_1MHZ ? ESB_WDT_FREQ : 0) |
+	    d->int_type;
+	return data;
+    } else if (addr == ESB_LOCK_REG && len == 1) {
+	data =
+	    (d->free_run ? ESB_WDT_FUNC : 0) |
+	    (d->locked ? ESB_WDT_LOCK : 0) |
+	    (d->enabled ? ESB_WDT_ENABLE : 0);
+	return data;
+    } else {
+	return pci_default_read_config (dev, addr, len);
+    }
+}
+
+static uint32_t
+i6300esb_mem_readb (void *vp, target_phys_addr_t addr)
+{
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_mem_readb: addr = %x\n",
+	     (int) addr);
+#endif
+
+    return 0;
+}
+
+static uint32_t
+i6300esb_mem_readw (void *vp, target_phys_addr_t addr)
+{
+    uint32_t data = 0;
+    struct state *d = (struct state *) vp;
+
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_mem_readw: addr = %x\n",
+	     (int) addr);
+#endif
+
+    if (addr == 0xc) {
+	/* The previous reboot flag is really bit 9, but there is
+	 * a bug in the Linux driver where it thinks it's bit 12.
+	 * Set both.
+	 */
+	data = d->previous_reboot_flag ? 0x1200 : 0;
+    }
+
+    return data;
+}
+
+static uint32_t
+i6300esb_mem_readl (void *vp, target_phys_addr_t addr)
+{
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_mem_readl: addr = %x\n",
+	     (int) addr);
+#endif
+
+    return 0;
+}
+
+static void
+i6300esb_mem_writeb (void *vp, target_phys_addr_t addr, uint32_t val)
+{
+    struct state *d = (struct state *) vp;
+
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_mem_writeb: addr = %x, val = %x\n",
+	     (int) addr, val);
+#endif
+
+    if (addr == 0xc && val == 0x80)
+	d->unlock_state = 1;
+    else if (addr == 0xc && val == 0x86 && d->unlock_state == 1)
+	d->unlock_state = 2;
+}
+
+static void
+i6300esb_mem_writew (void *vp, target_phys_addr_t addr, uint32_t val)
+{
+    struct state *d = (struct state *) vp;
+
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_mem_writew: addr = %x, val = %x\n",
+	     (int) addr, val);
+#endif
+
+    if (addr == 0xc && val == 0x80)
+	d->unlock_state = 1;
+    else if (addr == 0xc && val == 0x86 && d->unlock_state == 1)
+	d->unlock_state = 2;
+    else {
+	if (d->unlock_state == 2) {
+	    if (addr == 0xc) {
+		if ((val & 0x100) != 0)
+		    /* This is the "ping" from the userspace watchdog in
+		     * the guest ...
+		     */
+		    i6300esb_restart_timer (d, 1);
+
+		/* Setting bit 9 resets the previous reboot flag.
+		 * There's a bug in the Linux driver where it sets
+		 * bit 12 instead.
+		 */
+		if ((val & 0x200) != 0 || (val & 0x1000) != 0) {
+		    d->previous_reboot_flag = 0;
+		}
+	    }
+
+	    d->unlock_state = 0;
+	}
+    }
+}
+
+static void
+i6300esb_mem_writel (void *vp, target_phys_addr_t addr, uint32_t val)
+{
+    struct state *d = (struct state *) vp;
+
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_mem_writel: addr = %x, val = %x\n",
+	     (int) addr, val);
+#endif
+
+    if (addr == 0xc && val == 0x80)
+	d->unlock_state = 1;
+    else if (addr == 0xc && val == 0x86 && d->unlock_state == 1)
+	d->unlock_state = 2;
+    else {
+	if (d->unlock_state == 2) {
+	    if (addr == 0)
+		d->timer1_preload = val & 0xfffff;
+	    else if (addr == 4)
+		d->timer2_preload = val & 0xfffff;
+
+	    d->unlock_state = 0;
+	}
+    }
+}
+
+/* This function is called when the watchdog has either been enabled
+ * (hence it starts counting down) or has received a keepalive.
+ */
+static void
+i6300esb_restart_timer (struct state *d, int stage)
+{
+    int64_t timeout;
+
+    if (!d->enabled) return;
+
+    d->stage = stage;
+
+    if (d->stage <= 1)
+	timeout = d->timer1_preload;
+    else
+	timeout = d->timer2_preload;
+
+    if (d->clock_scale == CLOCK_SCALE_1KHZ)
+	timeout <<= 15;
+    else
+	timeout <<= 5;
+
+    /* Get the timeout in units of ticks_per_sec; split to avoid int64 overflow. */
+    timeout = (timeout / 33000000) * ticks_per_sec + (timeout % 33000000) * ticks_per_sec / 33000000;
+
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_restart_timer: stage %d, timeout %" PRIi64 "\n",
+	     d->stage, timeout);
+#endif
+
+    qemu_mod_timer (d->timer, qemu_get_clock (vm_clock) + timeout);
+}
+
+/* This is called when the guest disables the watchdog. */
+static void
+i6300esb_disable_timer (struct state *d)
+{
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_disable_timer: timer disabled\n");
+#endif
+
+    qemu_del_timer (d->timer);
+}
+
+/* This function is called when the watchdog expires.  Note that
+ * the hardware has two timers, and so expiry happens in two stages.
+ * If d->stage == 1 then we perform the first stage action (usually,
+ * sending an interrupt) and then restart the timer again for the
+ * second stage.  If the second stage expires then the watchdog
+ * really has run out.
+ */
+static void
+i6300esb_timer_expired (void *vp)
+{
+    struct state *d = (struct state *) vp;
+
+#ifdef I6300ESB_DEBUG
+    fprintf (stderr, "i6300esb_timer_expired: stage %d\n", d->stage);
+#endif
+
+    if (d->stage == 1) {
+	/* What to do at the end of stage 1? */
+	switch (d->int_type) {
+	case INT_TYPE_IRQ:
+	    fprintf (stderr, "i6300esb_timer_expired: I would send APIC 1 INT 10 here (XXX)\n");
+	    break;
+	case INT_TYPE_SMI:
+	    fprintf (stderr, "i6300esb_timer_expired: I would send SMI here (XXX)\n");
+	    break;
+	}
+
+	/* Start the second stage. */
+	i6300esb_restart_timer (d, 2);
+    } else {
+	/* Second stage expired, reboot for real. */
+	if (d->reboot_enabled) {
+	    d->previous_reboot_flag = 1;
+	    watchdog_perform_action (d->action); /* This reboots, exits, etc */
+	}
+
+	/* In "free running mode" we start stage 1 again. */
+	if (d->free_run)
+	    i6300esb_restart_timer (d, 1);
+    }
+}
+
+static int
+i6300esb_save_live (QEMUFile *f, int stage, void *vp)
+{
+    abort ();
+}
+
+static void
+i6300esb_save (QEMUFile *f, void *vp)
+{
+    abort ();
+}
+
+static int
+i6300esb_load (QEMUFile *f, void *vp, int version)
+{
+    abort ();
+}
Index: hw/wdt_i6300esb.h
===================================================================
--- hw/wdt_i6300esb.h	(revision 0)
+++ hw/wdt_i6300esb.h	(revision 0)
@@ -0,0 +1,28 @@
+/*
+ * Virtual hardware watchdog.
+ *
+ * Copyright (C) 2008 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * By Richard W.M. Jones (rjones@redhat.com).
+ */
+
+#ifndef _QEMU_WDT_I6300ESB_H
+#define _QEMU_WDT_I6300ESB_H
+
+extern void wdt_i6300esb_init (void);
+
+#endif /* _QEMU_WDT_I6300ESB_H */
Index: hw/wdt_ib700.c
===================================================================
--- hw/wdt_ib700.c	(revision 0)
+++ hw/wdt_ib700.c	(revision 0)
@@ -0,0 +1,101 @@
+/*
+ * Virtual hardware watchdog.
+ *
+ * Copyright (C) 2008 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * By Richard W.M. Jones (rjones@redhat.com).
+ */
+
+#include "qemu-common.h"
+#include "watchdog.h"
+#include "wdt_ib700.h"
+#include "hw.h"
+#include "isa.h"
+#include "pc.h"
+
+/*#define IB700_DEBUG 1*/
+
+static void ib700_pc_init (PCIBus *unused, int action);
+static void ib700_write_enable_reg (void *vp, uint32_t addr, uint32_t data);
+static void ib700_write_disable_reg (void *vp, uint32_t addr, uint32_t data);
+static int ib700_save_live (QEMUFile *f, int stage, void *vp);
+static void ib700_save (QEMUFile *f, void *vp);
+static int ib700_load (QEMUFile *f, void *vp, int version);
+
+static struct wdt_methods ib700_wdt = {
+    .wdt_pc_init = ib700_pc_init,
+};
+
+static struct wdt_model model = {
+  .wdt_name = "ib700",
+  .wdt_description = "iBASE 700",
+  .wdt_methods = &ib700_wdt,
+};
+
+void
+wdt_ib700_init (void)
+{
+    model.wdt_next = wdt_models;
+    wdt_models = &model;
+}
+
+/* Create and initialize a virtual IB700 during PC creation. */
+static void
+ib700_pc_init (PCIBus *unused, int action)
+{
+    register_savevm_live ("ib700_wdt", 0, 0,
+			  ib700_save_live, ib700_save, ib700_load, NULL);
+
+    register_ioport_write (0x441, 2, 1, ib700_write_disable_reg, NULL);
+    register_ioport_write (0x443, 2, 1, ib700_write_enable_reg, NULL);
+}
+
+static void
+ib700_write_enable_reg (void *vp, uint32_t addr, uint32_t data)
+{
+#ifdef IB700_DEBUG
+    fprintf (stderr, "ib700_write_enable_reg: addr = %x, data = %x\n",
+	     addr, data);
+#endif
+}
+
+static void
+ib700_write_disable_reg (void *vp, uint32_t addr, uint32_t data)
+{
+#ifdef IB700_DEBUG
+    fprintf (stderr, "ib700_write_disable_reg: addr = %x, data = %x\n",
+	     addr, data);
+#endif
+}
+
+static int
+ib700_save_live (QEMUFile *f, int stage, void *vp)
+{
+    abort ();
+}
+
+static void
+ib700_save (QEMUFile *f, void *vp)
+{
+    abort ();
+}
+
+static int
+ib700_load (QEMUFile *f, void *vp, int version)
+{
+    abort ();
+}
Index: hw/wdt_ib700.h
===================================================================
--- hw/wdt_ib700.h	(revision 0)
+++ hw/wdt_ib700.h	(revision 0)
@@ -0,0 +1,28 @@
+/*
+ * Virtual hardware watchdog.
+ *
+ * Copyright (C) 2008 Red Hat Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ *
+ * By Richard W.M. Jones (rjones@redhat.com).
+ */
+
+#ifndef _QEMU_WDT_IB700_H
+#define _QEMU_WDT_IB700_H
+
+extern void wdt_ib700_init (void);
+
+#endif /* _QEMU_WDT_IB700_H */
Index: hw/pc.c
===================================================================
--- hw/pc.c	(revision 6643)
+++ hw/pc.c	(working copy)
@@ -37,6 +37,9 @@
 #include "virtio-balloon.h"
 #include "virtio-console.h"
 #include "hpet_emul.h"
+#include "wdt_ib700.h"
+#include "wdt_i6300esb.h"
+#include "watchdog.h"
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -1001,6 +1004,10 @@
         }
     }
 
+    wdt_ib700_init ();
+    wdt_i6300esb_init ();
+    watchdog_pc_init (pci_bus);
+
     for(i = 0; i < nb_nics; i++) {
         NICInfo *nd = &nd_table[i];
 


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-25 23:37 [Qemu-devel] Hardware watchdogs (patch for discussion only) Richard W.M. Jones
@ 2009-02-26 10:51 ` Daniel P. Berrange
  2009-02-26 13:55   ` Richard W.M. Jones
  2009-02-26 14:31   ` Steve Fosdick
  0 siblings, 2 replies; 13+ messages in thread
From: Daniel P. Berrange @ 2009-02-26 10:51 UTC (permalink / raw)
  To: qemu-devel

On Wed, Feb 25, 2009 at 11:37:18PM +0000, Richard W.M. Jones wrote:
> Hi:
> 
> I want to share an unfinished patch with the list just to make sure
> that I'm heading in the right direction.
> 
> This patch aims to implement a virtual hardware watchdog device.  A
> hardware watchdog in a real machine is a device that must be tickled
> periodically by a userspace process to prove that the machine's
> userspace is (in some sense) "alive".  If the device isn't tickled
> then after some timeout, it reboots the machine.
> 
> These devices are generally very simple.  I picked two devices to
> emulate: the IBase 700, which is almost trivial, an ISA port to enable
> and set the timeout, and another ISA port to disable.  And the Intel
> 6300ESB, a PCI device which represents a mid-high end range of
> features and is very well documented.  Both have clean Linux device
> drivers:
> 
>   http://lxr.linux.no/linux+v2.6.28.5/drivers/watchdog/ib700wdt.c
>   http://lxr.linux.no/linux+v2.6.28.5/drivers/watchdog/i6300esb.c
> 
> (Both also come with Windows drivers which I haven't tested)
> 

A quick note WRT the possible actions

> @@ -4072,6 +4073,8 @@
>             "-old-param      old param mode\n"
>  #endif
>             "-tb-size n      set TB size\n"
> +	   "-watchdog ib700|i6300esb[,action=reboot|shutdown|pause|exit]\n"
> +	   "                enable virtual hardware watchdog [default=none]\n"
>             "-incoming p     prepare for incoming migration, listen on port p\n"
>             "\n"
>             "During emulation, the following keys are useful:\n"


> +/* This actually performs the "action" once a watchdog has expired,
> + * ie. reboot, shutdown, exit, etc.
> + */
> +void
> +watchdog_perform_action (int action)
> +{
> +    fprintf (stderr, "qemu: watchdog %s!\n", string_of_action (action));
> +
> +    switch (action) {
> +    case WDT_REBOOT:
> +	qemu_system_reset ();
> +	break;

This one shouldn't be called directly. qemu_system_reset_request()
sets a flag to interrupt the CPU, and the main loop, on seeing the
flag set, then calls qemu_system_reset().

It also doesn't actually do a reboot. It hard resets the CPU and
devices. The guest OS doesn't do a controlled shutdown.

> +    case WDT_REBOOTNICE:
> +	qemu_system_reset_request ();
> +	break;
> +
> +    case WDT_SHUTDOWN:
> +	qemu_system_powerdown ();
> +	break;

Likewise here - this is called indirectly by the main loop after
qemu_system_powerdown_request() sets the powerdown flag.

> +    case WDT_SHUTDOWNNICE:
> +	qemu_system_powerdown_request ();
> +	break;
> +
> +    case WDT_PAUSE:
> +	vm_stop (0);
> +	break;
> +
> +    case WDT_EXIT:
> +	exit (0);
> +    }
> +}


> +
> +The @var{action} controls what QEMU will do when the timer expires.
> +The default is
> +@code{reboot} (forcefully reboot the guest).
> +Other possible actions are:
> +@code{rebootnice} (attempt to gracefully reboot the guest).
> +@code{shutdown} (forcefully shutdown the guest),
> +@code{shutdownnice} (attempt to gracefully shutdown the guest),
> +@code{pause} (pause the guest), or
> +@code{exit} (immediately exit the QEMU process).

I think we can only support the following options

 - shutdown - graceful shutdown of guest via ACPI event via
              qemu_system_powerdown_request()
 - poweroff - hard immediate power off of guest machine via
              qemu_system_shutdown_request()
 - reset    - hard reset of the guest machine via
              qemu_system_reset_request()
 - pause    - stop the guest CPU(s)

I don't think we need an 'exit' event, because I believe 'poweroff'
should cause the emulator to exit.
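
To illustrate the point made earlier about calling the *_request()
functions rather than the direct ones, here is a toy model of the
pattern: the request function only records a flag, and the main loop
performs the work on its next iteration.  This is a sketch, not QEMU
code; every toy_* name is hypothetical and invented for illustration.

```c
#include <stdbool.h>

/* Toy model of the request/flag pattern.  All toy_* names are
 * hypothetical and do not exist in QEMU. */
static bool toy_reset_requested;
static int toy_resets_performed;

/* Called from device context (e.g. a watchdog timer callback).
 * It only records the request; it does not reset anything itself. */
void toy_reset_request(void)
{
    toy_reset_requested = true;
}

/* One iteration of the main loop: perform the reset if one was
 * requested, then clear the flag.  Returns true if a reset happened. */
bool toy_main_loop_iteration(void)
{
    if (!toy_reset_requested)
        return false;
    toy_reset_requested = false;
    toy_resets_performed++;     /* stand-in for the real reset work */
    return true;
}

/* Number of resets the main loop has actually carried out. */
int toy_reset_count(void)
{
    return toy_resets_performed;
}
```

Several requests made before the main loop runs collapse into a single
reset, and nothing is reset from inside the device callback itself.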

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-26 10:51 ` Daniel P. Berrange
@ 2009-02-26 13:55   ` Richard W.M. Jones
  2009-02-26 19:30     ` Blue Swirl
  2009-02-26 14:31   ` Steve Fosdick
  1 sibling, 1 reply; 13+ messages in thread
From: Richard W.M. Jones @ 2009-02-26 13:55 UTC (permalink / raw)
  To: Daniel P. Berrange, qemu-devel

On Thu, Feb 26, 2009 at 10:51:06AM +0000, Daniel P. Berrange wrote:
> I think we can only support the following options
> 
>  - shutdown - graceful shutdown of guest via ACPI event via
>               qemu_system_powerdown_request()
>  - poweroff - hard immediate power off of guest machine via
>               qemu_system_shutdown_request()
>  - reset    - hard reset of the guest machine via
>               qemu_system_reset_request()
>  - pause    - stop the guest CPU(s)

Thanks.

My experiments show that qemu_system_powerdown_request does nothing
visible, even for a recent (F-10) guest running acpid.
"system_powerdown" in the monitor also does nothing for the same
guest.  Maybe this is a recent regression in QEMU?

Anyway, I had a look at what monitor.c is doing and came up with this
list of actions / functions:

void
watchdog_perform_action (int action)
{
    fprintf (stderr, "qemu: watchdog %s!\n", string_of_action (action));

    switch (action) {
    case WDT_RESET:		/* same as 'system_reset' in monitor */
        qemu_system_reset_request ();
        break;

    case WDT_SHUTDOWN:		/* same as 'system_powerdown' in monitor */
        qemu_system_powerdown_request ();
	break;

    case WDT_POWEROFF:		/* same as 'quit' command in monitor */
        exit (0);
	break;

    case WDT_PAUSE:		/* same as 'stop' command in monitor */
        vm_stop (0);
        break;
    }
}

I have tested all four actions, and they all work correctly
except for the 'shutdown' option.

Rich.

-- 
Richard Jones, Emerging Technologies, Red Hat  http://et.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-26 10:51 ` Daniel P. Berrange
  2009-02-26 13:55   ` Richard W.M. Jones
@ 2009-02-26 14:31   ` Steve Fosdick
  2009-02-26 14:45     ` Richard W.M. Jones
  2009-02-26 17:50     ` Jamie Lokier
  1 sibling, 2 replies; 13+ messages in thread
From: Steve Fosdick @ 2009-02-26 14:31 UTC (permalink / raw)
  To: qemu-devel

On Thu, 2009-02-26 at 10:51 +0000, Daniel P. Berrange wrote:

> I think we can only support the following options
> 
>  - shutdown - graceful shutdown of guest via ACPI event via
>               qemu_system_powerdown_request()

I wonder how many times the guest will be healthy enough to respond to
this and how many times it will have crashed badly enough that this does
no good.

Perhaps we could have a second timer such that if, on asking the guest
to shut down via ACPI, the guest does not respond within a certain time
limit with an ACPI request to turn the power off we go for one of the
other options below?
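
A minimal sketch of that second timer, written as a small state
machine that is independent of QEMU.  All names (esc_*, ESC_*) are
hypothetical, and the time units are whatever the caller uses for
'now'.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this models the idea only. */
enum esc_state { ESC_IDLE, ESC_WAIT_GUEST, ESC_FORCED, ESC_GUEST_DONE };

struct escalation {
    enum esc_state state;
    uint64_t deadline;  /* when we stop waiting for the guest */
    uint64_t grace;     /* how long the guest gets to shut down */
};

void esc_init(struct escalation *e, uint64_t grace)
{
    e->state = ESC_IDLE;
    e->deadline = 0;
    e->grace = grace;
}

/* The watchdog expired: the caller sends the ACPI shutdown request,
 * and we start the second timer. */
void esc_watchdog_expired(struct escalation *e, uint64_t now)
{
    if (e->state != ESC_IDLE)
        return;
    e->state = ESC_WAIT_GUEST;
    e->deadline = now + e->grace;
}

/* The guest answered with its own power-off request in time. */
void esc_guest_powered_off(struct escalation *e)
{
    if (e->state == ESC_WAIT_GUEST)
        e->state = ESC_GUEST_DONE;
}

/* Periodic check.  Returns true exactly once, at the point where the
 * caller should fall back to a hard action (poweroff, reset, pause). */
bool esc_tick(struct escalation *e, uint64_t now)
{
    if (e->state == ESC_WAIT_GUEST && now >= e->deadline) {
        e->state = ESC_FORCED;
        return true;
    }
    return false;
}
```

Which hard action to fall back to is left to the caller, so the same
logic would serve whichever of the options below is configured.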

>  - poweroff - hard immediate power off of guest machine via
>               qemu_system_shutdown_request()
>  - reset    - hard reset of the guest machine via
>               qemu_system_reset_request()
>  - pause    - stop the guest CPU(s)

Thinking a little more on this I can see two use cases for a watchdog:

1. Ensure continuity of service.  When a guest OS gets stuck for some
reason make sure it is re-started.  This is probably the only use case
on a real physical machine.

2. Limit the resource consumption of a crashed guest when the host
serves other guests.  This is probably only of concern for virtual
machines, i.e. it is specific to the emulated watchdog and its
interaction with qemu rather than being part of how a physical watchdog
works.

Looking at the actions proposed by Daniel: shutdown, poweroff and pause
support the second use case, whereas reset supports the first.

Do we want to offer the guest the option of a clean shutdown if it can
still manage that and then reboot, i.e. the shutdown option but for use
case 1?

If so, we need to be able to turn the ACPI power off request into a reset
instead.  We already have the -no-reboot option to turn a reboot into a
power off - this is the opposite.

In fact, some people may find that option useful anyway even without the
watchdog.  In an environment where someone has privileged access to a
guest but no direct access to the host OS he could shut down a guest
accidentally when intending to reboot (or logoff).  It may be useful to
trap that and turn the shutdown into a reboot.

Regards,
Steve.


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-26 14:31   ` Steve Fosdick
@ 2009-02-26 14:45     ` Richard W.M. Jones
  2009-02-27  9:55       ` Steve Fosdick
  2009-02-26 17:50     ` Jamie Lokier
  1 sibling, 1 reply; 13+ messages in thread
From: Richard W.M. Jones @ 2009-02-26 14:45 UTC (permalink / raw)
  To: qemu-devel

On Thu, Feb 26, 2009 at 02:31:22PM +0000, Steve Fosdick wrote:
> On Thu, 2009-02-26 at 10:51 +0000, Daniel P. Berrange wrote:
> 
> > I think we can only support the following options
> > 
> >  - shutdown - graceful shutdown of guest via ACPI event via
> >               qemu_system_powerdown_request()
> 
> I wonder how many times the guest will be healthy enough to respond to
> this and how many times it will have crashed badly enough that this does
> no good.
[...]
> Do we want to offer the guest the option of a clean shutdown if it can
> still manage that and then reboot, i.e. the shutdown option but for use
> case 1?

The Intel 6300ESB in fact offers this possibility already.  As with
other high-end watchdog hardware, it can be configured to send a
"pretimer interrupt".  For example, if the watchdog is set to expire
after 60 seconds, you can get an interrupt N seconds before this,
which you can use to try a graceful shutdown.

Having said that, Linux watchdog software doesn't support this feature ...
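
For anyone unfamiliar with the idea, the timing of a pretimer can be
sketched like this.  It is only a model of "warn N seconds before
expiry": it does not describe the 6300ESB registers, and the pre_wdt_*
and WD_* names are made up for this example.

```c
#include <stdint.h>

enum wd_event { WD_NONE, WD_PRETIMEOUT, WD_EXPIRED };

struct pre_wdt {
    uint64_t last_tickle;  /* time of the most recent tickle */
    uint64_t timeout;      /* e.g. 60 seconds */
    uint64_t pretimeout;   /* N: warn this long before expiry; 0 = off */
    int pre_fired;         /* pretimeout already reported this period */
};

/* The guest tickled the watchdog: restart the countdown. */
void pre_wdt_tickle(struct pre_wdt *w, uint64_t now)
{
    w->last_tickle = now;
    w->pre_fired = 0;
}

/* Check the timer.  'now' is assumed never to go backwards.  The
 * pretimeout is reported once per countdown, N units before expiry. */
enum wd_event pre_wdt_poll(struct pre_wdt *w, uint64_t now)
{
    uint64_t elapsed = now - w->last_tickle;

    if (elapsed >= w->timeout)
        return WD_EXPIRED;
    if (w->pretimeout != 0 && w->pretimeout < w->timeout &&
        !w->pre_fired && elapsed >= w->timeout - w->pretimeout) {
        w->pre_fired = 1;
        return WD_PRETIMEOUT;
    }
    return WD_NONE;
}
```

With timeout = 60 and pretimeout = 10, the warning is reported 50
seconds after the last tickle and the expiry at 60.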

Rich.

-- 
Richard Jones, Emerging Technologies, Red Hat  http://et.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-26 14:31   ` Steve Fosdick
  2009-02-26 14:45     ` Richard W.M. Jones
@ 2009-02-26 17:50     ` Jamie Lokier
  2009-02-27  9:50       ` Steve Fosdick
  1 sibling, 1 reply; 13+ messages in thread
From: Jamie Lokier @ 2009-02-26 17:50 UTC (permalink / raw)
  To: qemu-devel

Steve Fosdick wrote:
> Perhaps we could have a second timer such that if, on asking the guest
> to shut down via ACPI, the guest does not respond within a certain time
> limit with an ACPI request to turn the power off we go for one of the
> other options below?

Good idea.  ACPI is notoriously flaky, especially on a guest which has
already crashed its kernel...

> 1. Ensure continuity of service.  When a guest OS gets stuck for some
> reason make sure it is re-started.  This is probably the only use case
> on a real physical machine.

For real continuity of service you'd also want QEMU itself to have a
watchdog.  Either a software watchdog internally (SIGALRM => kill/exec
self, or child process expecting regular pings over a pipe), or by
QEMU itself becoming a client of the host watchdog.

I say this because I've experienced KVM lock up several times.
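
The internal variant (SIGALRM => kill/exec self) could look roughly
like the sketch below.  It uses only the standard signal(), alarm()
and _exit() calls; the selfwd_* names and the exit status are
assumptions for illustration, and a real version would probably
re-exec the process rather than just exit.

```c
#include <signal.h>
#include <unistd.h>

/* SIGALRM handler: nobody pinged us in time, so treat the process as
 * hung.  Only async-signal-safe calls are allowed here; this sketch
 * simply exits with an arbitrary status. */
static void selfwd_on_alarm(int sig)
{
    (void)sig;
    _exit(111);
}

/* Arm the watchdog.  Returns 0 on success, -1 on failure. */
int selfwd_arm(unsigned seconds)
{
    if (signal(SIGALRM, selfwd_on_alarm) == SIG_ERR)
        return -1;
    alarm(seconds);
    return 0;
}

/* Ping: restart the countdown.  Returns the number of seconds that
 * were left on the previous countdown. */
unsigned selfwd_ping(unsigned seconds)
{
    return alarm(seconds);
}

/* Disarm: cancel any pending alarm.  Returns the seconds that were
 * left, or 0 if no alarm was pending. */
unsigned selfwd_disarm(void)
{
    return alarm(0);
}
```

The main loop would call selfwd_ping() once per iteration.  Note that
this only catches a main loop that stops running; it cannot tell
whether the guest itself is making progress.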

> 2. Limit the resource consumption of a crashed guest when the host
> serves other guests.  This is probably only of concern for virtual
> machines, i.e. it is specific to the emulated watchdog and its
> interaction with qemu rather than being part of how a physical watchdog
> works.

Related to this is "omg the database guest has crashed - and frankly
we don't trust the automatic recovery process - stop it for now and
we'll inspect for damage manually before starting it again".

> Do we want to offer the guest the option of a clean shutdown if it can
> still manage that and then reboot, i.e. the shutdown option but for use
> case 1?
> 
> If so, we need to be able to turn the ACPI power off request into a reset
> instead.  We already have the -no-reboot option to turn a reboot into a
> power off - this is the opposite.

Interesting idea.

> In fact, some people may find that option useful anyway even without the
> watchdog.  In an environment where someone has privileged access to a
> guest but no direct access to the host OS he could shut down a guest
> accidentally when intending to reboot (or logoff).  It may be useful to
> trap that and turn the shutdown into a reboot.

I've done that a few times.  It's only minorly annoying in that you
lose the VNC connection and have to login and restart the VM.

Side notes: It would be nice to be able to change the
"shutdown-when-asked-to-reboot" (et al) option from the monitor.  It
would also be nice to "pause-when-asked-to-shutdown/reboot", which is
useful during automatic OS installs - the host script changes the
media and/or hardware at each reboot.

-- Jamie


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-26 13:55   ` Richard W.M. Jones
@ 2009-02-26 19:30     ` Blue Swirl
  0 siblings, 0 replies; 13+ messages in thread
From: Blue Swirl @ 2009-02-26 19:30 UTC (permalink / raw)
  To: qemu-devel

On 2/26/09, Richard W.M. Jones <rjones@redhat.com> wrote:
> On Thu, Feb 26, 2009 at 10:51:06AM +0000, Daniel P. Berrange wrote:
>  > I think we can only support the following options
>  >
>  >  - shutdown - graceful shutdown of guest via ACPI event via
>  >               qemu_system_powerdown_request()
>  >  - poweroff - hard immediate power off of guest machine via
>  >               qemu_system_shutdown_request()
>  >  - reset    - hard reset of the guest machine via
>  >               qemu_system_reset_request()
>  >  - pause    - stop the guest CPU(s)
>
>
> Thanks.
>
>  My experiments show that qemu_system_powerdown_request does nothing
>  visible, even for a recent (F-10) guest running acpid.
>  "system_powerdown" in the monitor also does nothing for the same
>  guest.  Maybe this is a recent regression in QEMU?

It's only implemented on Sparc32, and even there Linux does not have a
driver for the power fail interrupt. The idea was that the power supply,
a power button press or a UPS would notify the OS that power will fail
and the system should be shut down.

On Sparc32 the power fail interrupt doesn't actually indicate exactly
what I thought it did: there are two power supplies, and one of them
may fail while the system keeps running for a long time.


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-26 17:50     ` Jamie Lokier
@ 2009-02-27  9:50       ` Steve Fosdick
  2009-02-27 12:19         ` Paul Brook
  0 siblings, 1 reply; 13+ messages in thread
From: Steve Fosdick @ 2009-02-27  9:50 UTC (permalink / raw)
  To: qemu-devel

On Thu, 2009-02-26 at 17:50 +0000, Jamie Lokier wrote:

> For real continuity of service you'd also want QEMU itself to have a
> watchdog.  Either a software watchdog internally (SIGALRM => kill/exec
> self, or child process expecting regular pings over a pipe), or by
> QEMU itself becoming a client of the host watchdog.

So many possibilities - one, two, or three watchdogs?

If we want to cater for the situation where one host is running more
than one guest, we would not want a single watchdog with a communication
path from one of the guests to a hardware watchdog on the host, because
this would cause the host to reboot if any one of the guests failed, thus
rebooting guests that were still working.

A two-watchdog solution could work though.

The host would be protected with a normal hardware watchdog and this
would use a normal user-space process to tickle that watchdog rather
than QEMU.

For QEMU there is a software watchdog that looks to the guest like a
hardware watchdog and therefore uses the already written driver and a
normal user space process on the guest.

The timer part of the QEMU software watchdog is implemented as a second
userspace process on the host which communicates with the main QEMU
process.  When the guest tickles the watchdog the tickle is forwarded to
the separate watchdog process which resets the timer.  When the timer
goes off the watchdog sends a message back to QEMU to perform the
configured action which QEMU must confirm is happening with a further
message back to the watchdog process.  If QEMU does not respond the
watchdog process uses host OS-level facilities to kill and re-start it.
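
The message flow in the previous paragraph, seen from the separate
watchdog process, can be sketched as a small state machine.  The
wdp_* names, the message set and the confirmation window are all
assumptions made for this example; how the messages travel (pipe,
socket, ...) is left open.

```c
#include <stdint.h>

/* Messages from QEMU to the watchdog process. */
enum wdp_msg { WDP_TICKLE, WDP_CONFIRM };
/* What the watchdog process should do next. */
enum wdp_out { WDP_OUT_NONE, WDP_OUT_PERFORM_ACTION, WDP_OUT_KILL_RESTART };
enum wdp_state { WDP_COUNTING, WDP_AWAIT_CONFIRM, WDP_DONE };

struct wdp {
    enum wdp_state state;
    uint64_t deadline;        /* next point at which something happens */
    uint64_t timeout;         /* watchdog period */
    uint64_t confirm_window;  /* how long QEMU has to confirm */
};

void wdp_start(struct wdp *p, uint64_t now,
               uint64_t timeout, uint64_t confirm_window)
{
    p->state = WDP_COUNTING;
    p->timeout = timeout;
    p->confirm_window = confirm_window;
    p->deadline = now + timeout;
}

/* Handle a message forwarded by QEMU. */
void wdp_receive(struct wdp *p, enum wdp_msg m, uint64_t now)
{
    if (m == WDP_TICKLE && p->state == WDP_COUNTING)
        p->deadline = now + p->timeout;      /* guest is alive */
    else if (m == WDP_CONFIRM && p->state == WDP_AWAIT_CONFIRM)
        p->state = WDP_DONE;                 /* QEMU is acting */
}

/* Periodic check: first ask QEMU to perform the configured action,
 * then, if QEMU never confirms, ask for it to be killed/restarted. */
enum wdp_out wdp_poll(struct wdp *p, uint64_t now)
{
    if (now < p->deadline)
        return WDP_OUT_NONE;
    switch (p->state) {
    case WDP_COUNTING:
        p->state = WDP_AWAIT_CONFIRM;
        p->deadline = now + p->confirm_window;
        return WDP_OUT_PERFORM_ACTION;
    case WDP_AWAIT_CONFIRM:
        p->state = WDP_DONE;
        return WDP_OUT_KILL_RESTART;
    default:
        return WDP_OUT_NONE;
    }
}
```

After either outcome the process would call wdp_start() again once
the guest (or QEMU) is back up.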

The three-watchdog solution uses the host hardware watchdog, a software
watchdog within QEMU as implemented by Richard, and another watchdog that
sits in a separate process which can 'ping' QEMU and, if it does not
respond, uses host OS-level facilities to kill and re-start it.

> > In fact, some people may find that option useful anyway even without the
> > watchdog.  In an environment where someone has privileged access to a
> > guest but no direct access to the host OS he could shut down a guest
> > accidentally when intending to reboot (or logoff).  It may be useful to
> > trap that and turn the shutdown into a reboot.
> 
> I've done that a few times.  It's only minorly annoying in that you
> lose the VNC connection and have to login and restart the VM.

I suspect most of the people using QEMU have access to the host OS too,
which means, as you say, that it is no big deal.

In some corporate environments accidentally shutting down a guest means
submitting a job to another team and then waiting while they get round
to restarting it.  A watchdog is one feature that takes QEMU in the
direction of being suitable for a business production environment and
this feature seemed like it would also be useful in such an environment.

> Side notes: It would be nice to be able to change the
> "shutdown-when-asked-to-reboot" (et al) option from the monitor.  It
> would also be nice to "pause-when-asked-to-shutdown/reboot", which is
> useful during automatic OS installs - the host script changes the
> media and/or hardware at each reboot.

Seems like a useful feature.

This is beginning to look like a complete matrix rather than a simple
option, i.e. QEMU could be configured to do any of the stop/reset/poweroff
etc. options for any of the requests from the guest.
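
As a rough sketch of what such a matrix might look like as a data
structure: one configurable action per guest event.  The names and
the default mapping are assumptions for illustration (nothing here
exists in QEMU), and the watchdog row is added only to show that
further events would fit the same table.

```c
/* Events raised by the guest.  EV_COUNT must stay last. */
enum guest_event { EV_REBOOT, EV_SHUTDOWN, EV_WATCHDOG_EXPIRED, EV_COUNT };
/* What the host side does in response. */
enum host_action { ACT_STOP, ACT_RESET, ACT_POWEROFF };

struct event_policy {
    enum host_action action[EV_COUNT];
};

/* Assumed defaults: each event gets its "natural" action. */
void policy_defaults(struct event_policy *p)
{
    p->action[EV_REBOOT] = ACT_RESET;
    p->action[EV_SHUTDOWN] = ACT_POWEROFF;
    p->action[EV_WATCHDOG_EXPIRED] = ACT_RESET;
}

/* Change one cell of the matrix.  Returns 0 on success, -1 if the
 * event number is out of range. */
int policy_set(struct event_policy *p, int event, enum host_action act)
{
    if (event < 0 || event >= EV_COUNT)
        return -1;
    p->action[event] = act;
    return 0;
}

/* Look up the configured action; 'event' must be a valid event. */
enum host_action policy_lookup(const struct event_policy *p,
                               enum guest_event event)
{
    return p->action[event];
}
```

In these terms the existing -no-reboot option is the single cell
policy_set(&p, EV_REBOOT, ACT_POWEROFF), and turning a shutdown into a
reboot is policy_set(&p, EV_SHUTDOWN, ACT_RESET).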

Regards,
Steve.


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-26 14:45     ` Richard W.M. Jones
@ 2009-02-27  9:55       ` Steve Fosdick
  0 siblings, 0 replies; 13+ messages in thread
From: Steve Fosdick @ 2009-02-27  9:55 UTC (permalink / raw)
  To: qemu-devel

On Thu, 2009-02-26 at 14:45 +0000, Richard W.M. Jones wrote:

> The Intel 6300ESB in fact offers this possibility already.  As with
> other high-end watchdog hardware, it can be configured to send a
> "pretimer interrupt".  For example, if the watchdog is set to expire
> after 60 seconds, you can get an interrupt N seconds before this,
> which you can use to try a graceful shutdown.
> 
> Having said that, Linux watchdog software doesn't support this feature ...

Am I right in thinking that the kernel will not itself respond to an
ACPI event that indicates a power button has been pressed, but will
forward it to userspace, where something (acpid) should respond?

If so there isn't much point in making the pre-timer interrupt do that
as we are assuming userspace is dead.  Perhaps instead it could use the
Alt-SysRq mechanism to do an emergency sync,reboot?

Regards,
Steve.


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-27  9:50       ` Steve Fosdick
@ 2009-02-27 12:19         ` Paul Brook
  2009-02-28 21:34           ` Jamie Lokier
  2009-02-28 22:11           ` Richard W.M. Jones
  0 siblings, 2 replies; 13+ messages in thread
From: Paul Brook @ 2009-02-27 12:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: Steve Fosdick

On Friday 27 February 2009, Steve Fosdick wrote:
> On Thu, 2009-02-26 at 17:50 +0000, Jamie Lokier wrote:
> > For real continuity of service you'd also want QEMU itself to have a
> > watchdog.  Either a software watchdog internally (SIGALRM => kill/exec
> > self, or child process expecting regular pings over a pipe), or by
> > QEMU itself becoming a client of the host watchdog.
>
> So many possibilities - one, two, or three watchdogs?

IMHO external watchdogs (i.e. ones that monitor qemu itself) are out of scope.
We should restrict this to internal watchdog devices within a VM.

There already exist several solutions for external fencing. Traditionally
these are used in a clustered environment, where physical machines are
connected to remote power switches (or equivalent management cards). Making
these systems kill/restart virtual machines seems like it should be a very
minor tweak.

Paul


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-27 12:19         ` Paul Brook
@ 2009-02-28 21:34           ` Jamie Lokier
  2009-02-28 22:00             ` Andreas Färber
  2009-02-28 22:11           ` Richard W.M. Jones
  1 sibling, 1 reply; 13+ messages in thread
From: Jamie Lokier @ 2009-02-28 21:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: Steve Fosdick

Paul Brook wrote:
> IMHO external watchdogs (i.e. ones that monitor qemu itself) are out
> of scope.  We should restrict this to internal watchdog devices
> within a VM.

I agree.  The thing of interest is whether the guest is running, not
whether QEMU is running.

It would be good for the internal watchdog device to have the option
of no effect except for its status to be available in the QEMU
monitor, so management apps can see when the guest has crashed.

Steve's idea of a matrix (Guest event -> QEMU action) is a good one,
and could include "watchdog not pinged" as a guest event.

-- Jamie


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-28 21:34           ` Jamie Lokier
@ 2009-02-28 22:00             ` Andreas Färber
  0 siblings, 0 replies; 13+ messages in thread
From: Andreas Färber @ 2009-02-28 22:00 UTC (permalink / raw)
  To: qemu-devel


Am 28.02.2009 um 22:34 schrieb Jamie Lokier:

> Paul Brook wrote:
>> IMHO external watchdogs (i.e. ones that monitor qemu itself) are out
>> of scope.  We should restrict this to internal watchdog devices
>> within a VM.
>
> I agree.  The thing of interest is whether the guest is running, not
> whether QEMU is running.
>
> It would be good for the internal watchdog device to have the option
> of no effect except for its status to be available in the QEMU
> monitor, so management apps can see when the guest has crashed.

...or when the guest runs into ENOSPC. ;)


* Re: [Qemu-devel] Hardware watchdogs (patch for discussion only)
  2009-02-27 12:19         ` Paul Brook
  2009-02-28 21:34           ` Jamie Lokier
@ 2009-02-28 22:11           ` Richard W.M. Jones
  1 sibling, 0 replies; 13+ messages in thread
From: Richard W.M. Jones @ 2009-02-28 22:11 UTC (permalink / raw)
  To: qemu-devel

On Fri, Feb 27, 2009 at 12:19:53PM +0000, Paul Brook wrote:
> IMHO external watchdogs (i.e. ones that monitor qemu itself) are out
> of scope.  We should restrict this to internal watchdog devices
> within a VM.

Yes -- this patch just tries to emulate existing hardware devices.  As
you say, checking if QEMU is up is a separate matter.

BTW does anyone want to review the patch?

http://lists.gnu.org/archive/html/qemu-devel/2009-02/msg01434.html

Rich.

-- 
Richard Jones, Emerging Technologies, Red Hat  http://et.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top
