* [RFC] x86/watchdog: Always disable watchdog before console_force_unlock()
From: Andrew Cooper @ 2013-08-09 21:17 UTC
  To: Xen-devel; +Cc: Andrew Cooper, Keir Fraser, Jan Beulich, Tim Deegan

Depending on the state of the conring and serial_tx_buffer,
console_force_unlock() can be a long-running operation, usually because of
serial_start_sync().

XenServer testing has found a reliable case where console_force_unlock() on
one PCPU takes long enough for another PCPU to time out due to the watchdog
(such as while waiting for a TLB flush callin).

The watchdog timeout causes the second PCPU to repeat the
console_force_unlock(), at which point the first PCPU typically fails an
assertion in spin_unlock_irqrestore(&port->tx_lock) (because the tx_lock has
been unlocked behind its back).

console_force_unlock() is only called on emergency paths, so one way or
another the host is going down.  Disable the watchdog before forcing the
console lock, to help prevent PCPUs from competing with each other to bring
the host down.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
---
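Illustration only, not part of the patch: a minimal single-threaded sketch
of the interleaving described above, under stated assumptions.  Every name
in it (toy_lock, toy_force_unlock(), toy_watchdog_disable(), simulate(), ...)
is hypothetical and stands in for the real Xen code; it models the ordering
of events, not the actual spinlock or watchdog implementation.  It also
assumes that disabling the watchdog on the first PCPU suppresses the timeout
on the second.

```c
#include <stdbool.h>

/* Hypothetical stand-in for port->tx_lock, with owner tracking. */
struct toy_lock {
    bool held;
    int owner;              /* -1 when nobody holds the lock */
};

/* Hypothetical stand-in for the watchdog state (assumed global). */
static bool toy_watchdog_enabled = true;

static void toy_watchdog_disable(void)
{
    toy_watchdog_enabled = false;
}

static void toy_lock_acquire(struct toy_lock *l, int cpu)
{
    l->held = true;
    l->owner = cpu;
}

/* Models console_force_unlock(): drops the lock regardless of owner. */
static void toy_force_unlock(struct toy_lock *l)
{
    l->held = false;
    l->owner = -1;
}

/*
 * Models the unlock on the first PCPU.  Returns false where the real
 * code would fail its assertion (lock not held by the caller).
 */
static bool toy_unlock(struct toy_lock *l, int cpu)
{
    if ( !l->held || l->owner != cpu )
        return false;
    l->held = false;
    l->owner = -1;
    return true;
}

/*
 * Replays the sequence from the commit message.  Returns true if
 * PCPU0's final unlock is consistent, false if the lock was dropped
 * behind its back.
 */
static bool simulate(bool disable_watchdog_first)
{
    struct toy_lock tx_lock = { false, -1 };

    toy_watchdog_enabled = true;

    /* PCPU0 enters an emergency path. */
    if ( disable_watchdog_first )
        toy_watchdog_disable();
    toy_force_unlock(&tx_lock);

    /* PCPU0 takes tx_lock and starts a long synchronous flush. */
    toy_lock_acquire(&tx_lock, 0);

    /* PCPU1 is stuck waiting; if the watchdog still runs, it fires
     * and PCPU1 forces the console locks open as well. */
    if ( toy_watchdog_enabled )
        toy_force_unlock(&tx_lock);

    /* PCPU0 finishes and releases tx_lock. */
    return toy_unlock(&tx_lock, 0);
}
```

In this model simulate(false) returns false (the case where the assertion
would fail), while simulate(true) returns true because the second forced
unlock never happens.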
 xen/arch/x86/cpu/mcheck/mce.c |    1 +
 xen/arch/x86/nmi.c            |    1 +
 xen/arch/x86/traps.c          |    3 +++
 3 files changed, 5 insertions(+)

diff --git a/xen/arch/x86/cpu/mcheck/mce.c b/xen/arch/x86/cpu/mcheck/mce.c
index 93d7ae1..4c679f3 100644
--- a/xen/arch/x86/cpu/mcheck/mce.c
+++ b/xen/arch/x86/cpu/mcheck/mce.c
@@ -1537,6 +1537,7 @@ static void mc_panic_dump(void)
 void mc_panic(char *s)
 {
     is_mc_panic = 1;
+    watchdog_disable();
     console_force_unlock();
 
     printk("Fatal machine check: %s\n", s);
diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index c93812f..091e520 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -439,6 +439,7 @@ void nmi_watchdog_tick(struct cpu_user_regs * regs)
         this_cpu(alert_counter)++;
         if ( this_cpu(alert_counter) == opt_watchdog_timeout*nmi_hz )
         {
+            watchdog_disable();
             console_force_unlock();
             printk("Watchdog timer detects that CPU%d is stuck!\n",
                    smp_processor_id());
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 57dbd0c..b12869e 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -3163,6 +3163,7 @@ static void pci_serr_error(struct cpu_user_regs *regs)
         raise_softirq(PCI_SERR_SOFTIRQ);
         break;
     default:  /* 'fatal' */
+        watchdog_disable();
         console_force_unlock();
         printk("\n\nNMI - PCI system error (SERR)\n");
         fatal_trap(TRAP_nmi, regs);
@@ -3178,6 +3179,7 @@ static void io_check_error(struct cpu_user_regs *regs)
     case 'i': /* 'ignore' */
         break;
     default:  /* 'fatal' */
+        watchdog_disable();
         console_force_unlock();
         printk("\n\nNMI - I/O ERROR\n");
         fatal_trap(TRAP_nmi, regs);
@@ -3197,6 +3199,7 @@ static void unknown_nmi_error(struct cpu_user_regs *regs, unsigned char reason)
     case 'i': /* 'ignore' */
         break;
     default:  /* 'fatal' */
+        watchdog_disable();
         console_force_unlock();
         printk("Uhhuh. NMI received for unknown reason %02x.\n", reason);
         printk("Do you have a strange power saving mode enabled?\n");
-- 
1.7.10.4
