linux-arch.vger.kernel.org archive mirror
* [RFC] Common API for bring the system to a crash_stop state
@ 2006-10-13 13:23 Keith Owens
  2006-10-13 13:24 ` [RFC] crash_stop: crash_stop_headers Keith Owens
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Keith Owens @ 2006-10-13 13:23 UTC (permalink / raw)
  To: linux-arch
  Cc: kdb, fastboot, lkcd-devel.lists.sourceforge.net, info,
	Jan Beulich, James.Bottomley, linux-visws-devel

This initial mail is going to a wide distribution because several
different people and groups are working on kernel debug style tools.
These tools include debuggers such as kdb, kgdb and nlkd, as well as
kernel dump tools such as netdump, lkcd, crash, kexec/kdump and others.
To cut down the cross-list noise, I have arbitrarily designated
linux-arch@vger.kernel.org as the only list to receive and discuss the
patches that follow this initial mail.

Reply-To is set to linux-arch@vger.kernel.org, please honour it and
trim the rest of the cc: list.

-------------------------------------------------------------------------------------

All the kernel debug style tools (kdb, kgdb, nlkd, netdump, lkcd,
crash, kdump etc.) have a common requirement: they need to bring the
system to a crash stop.  This means stopping all the cpus, even if some
of them are spinning with interrupts disabled.  In addition, each cpu
has to save enough state to start diagnosis of the problem.

* Each debug style tool has written its own code for interrupting the
  other cpus and for saving cpu state.

* Some tools try a normal IPI first, then send a non-maskable
  interrupt (NMI) after a delay.

* Some tools always send an NMI first, which can result in incomplete
  machine state if it arrives at the wrong time.

* Most of the tools do not know how to cope with the IA64
  architecture-defined rendezvous algorithm, which interferes with an
  OS-driven rendezvous.

* Needless to say, every single patch set conflicts with all the
  others, which makes it very difficult to install more than one of the
  tools at a time.

The solution is to define a common crash_stop API that can be used by
_all_ of the debug style tools, without reinventing the wheel each
time.

The following crash_stop patches will only appear on linux-arch.

crash_stop_headers         The common and i386 crash_stop.h files

crash_stop_i386_handler    Add the crash_stop i386 interrupt handlers.
                           This patch changes existing i386 files.  It
                           needs testing on visw and updating for
                           voyager.

crash_stop_i386            I386 specific crash_stop code.

crash_stop_common          Architecture independent crash_stop code.

crash_stop_common_Kconfig  Kconfig change to activate crash_stop.

crash_stop_demo            A demo module to test crash_stop().

This is a work in progress; it does most of the job on i386.  x86_64
will be easy once i386 is working.  I have an incomplete patch for
ia64, but coexisting with the MCA/INIT rendezvous algorithm is
non-trivial.  At the moment, I am more interested in feedback on the
design of the API, to ensure that it suits everybody's requirements.

Most of the design documentation is in the crash_stop_common patch.
Please read that before replying.
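
To make the intended usage concrete, here is a rough sketch (not part
of the patch set; the names my_debugger_callback and my_debugger_entry
are purely illustrative) of how a debugger might drive the API that the
patches below define.  The crash_stop_demo patch gives a complete,
runnable example.

	/* Callback invoked on every cpu in crash_stop state: first on each
	 * responding slave, then (after a short delay) on the monarch.
	 */
	static void my_debugger_callback(int monarch, void *data)
	{
		if (monarch) {
			/* do the real debugging/dump work here, using the
			 * per cpu state that crash_stop has saved
			 */
		} else {
			/* slaves normally just spin here until the monarch
			 * callback has finished and releases them
			 */
		}
	}

	/* Debugger entry point.  crash_stop() must be called with
	 * interrupts disabled; passing NULL for the print routine makes
	 * crash_stop fall back to its internal cs_printk buffer.
	 */
	static void my_debugger_entry(struct pt_regs *regs)
	{
		crash_stop(my_debugger_callback, NULL, regs, NULL,
			   "my_debugger");
	}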



* [RFC] crash_stop: crash_stop_headers
  2006-10-13 13:23 [RFC] Common API for bring the system to a crash_stop state Keith Owens
@ 2006-10-13 13:24 ` Keith Owens
  2006-10-13 13:25 ` [RFC] crash_stop: crash_stop_i386_handler Keith Owens
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Keith Owens @ 2006-10-13 13:24 UTC (permalink / raw)
  To: linux-arch

crash_stop() headers.  The common header defines the API.  The
architecture headers define the arch-dependent state that is saved on
each cpu; this is typically the data that is required to get a decent
backtrace.
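
To illustrate how small the per-arch part is, a hypothetical x86_64
header (not part of this series; x86_64 support is still to be written,
so treat this purely as a sketch) would only need the 64-bit
equivalents of the same two fields:

	/* Sketch of a possible include/asm-x86_64/crash_stop.h */
	#ifndef _ASM_CRASH_STOP_H
	#define _ASM_CRASH_STOP_H

	/* As on i386, the interrupted registers can live on a different
	 * stack from the one the crash_stop code runs on, so save the
	 * current stack and instruction pointers for backtrace purposes.
	 */
	struct crash_stop_running_process_arch
	{
		unsigned long rsp;
		unsigned long rip;
	};

	#endif	/* _ASM_CRASH_STOP_H */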

---
 include/asm-i386/crash_stop.h |   14 ++++++++++
 include/linux/crash_stop.h    |   57 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+)

Index: linux/include/asm-i386/crash_stop.h
===================================================================
--- /dev/null
+++ linux/include/asm-i386/crash_stop.h
@@ -0,0 +1,14 @@
+#ifndef _ASM_CRASH_STOP_H
+#define _ASM_CRASH_STOP_H
+
+/* CONFIG_4KSTACKS means that the registers (including eip) at the time of the
+ * interrupt can be on one stack while the crash_stop code is running on
+ * another stack.  We have to save the current esp and eip.
+ */
+struct crash_stop_running_process_arch
+{
+	unsigned long esp;
+	unsigned long eip;
+};
+
+#endif	/* _ASM_CRASH_STOP_H */
Index: linux/include/linux/crash_stop.h
===================================================================
--- /dev/null
+++ linux/include/linux/crash_stop.h
@@ -0,0 +1,57 @@
+#ifndef _LINUX_CRASH_STOP_H
+#define _LINUX_CRASH_STOP_H
+
+#ifdef	CONFIG_CRASH_STOP_SUPPORTED
+
+#include <linux/cpumask.h>
+#include <linux/ptrace.h>
+#include <asm/crash_stop.h>
+
+typedef asmlinkage int (*printk_t)(const char * fmt, ...)
+	__attribute__ ((format (printf, 1, 2)));
+
+/* These five entries are the only ones used by code outside crash_stop itself.
+ * Anything starting with 'crash_stop' is part of the external ABI, anything
+ * starting with 'cs_' is only to be used by internal crash_stop code.
+ */
+extern int crash_stop(void (*callback)(int monarch, void *data),
+		      void *data, struct pt_regs *regs, printk_t print,
+		      const char *text);
+extern int crash_stop_recovered(void);
+extern void crash_stop_cpu(int monarch, struct pt_regs *regs);
+extern int crash_stop_sent_nmi(void);
+struct crash_stop_running_process {
+	struct task_struct *p;
+	struct pt_regs *regs;
+	struct crash_stop_running_process *prev;
+	struct crash_stop_running_process_arch arch;
+};
+
+extern void cs_common_ipi(struct pt_regs *regs);
+extern void cs_arch_send_ipi(int);
+extern void cs_arch_send_nmi(int);
+
+extern void cs_arch_cpu(int, struct crash_stop_running_process *);
+extern void cs_common_cpu(int);
+
+extern spinlock_t cs_lock;
+extern int cs_monarch;
+extern int cs_notify_chain;
+
+struct cs_global {
+	void (*callback)(int monarch, void *data);
+	void *data;
+	printk_t print;
+};
+extern struct cs_global cs_global;
+
+#else	/* !CONFIG_CRASH_STOP_SUPPORTED */
+
+#define crash_stop(callback, data, regs, print, text) ({ (void)(data); (void)(regs); -ENOSYS; })
+#define crash_stop_recovered() (0)
+#define crash_stop_cpu(monarch, regs) do { (void)(monarch); (void)(regs); } while (0)
+#define crash_stop_sent_nmi() 0
+
+#endif	/* CONFIG_CRASH_STOP_SUPPORTED */
+
+#endif	/* _LINUX_CRASH_STOP_H */



* [RFC] crash_stop: crash_stop_i386_handler
  2006-10-13 13:23 [RFC] Common API for bring the system to a crash_stop state Keith Owens
  2006-10-13 13:24 ` [RFC] crash_stop: crash_stop_headers Keith Owens
@ 2006-10-13 13:25 ` Keith Owens
  2006-10-13 13:55   ` James Bottomley
  2006-10-13 13:26 ` [RFC] crash_stop: crash_stop_i386 Keith Owens
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Keith Owens @ 2006-10-13 13:25 UTC (permalink / raw)
  To: linux-arch

Define the i386 crash_stop() interrupt handler and associated routines.

Yes, I know that CRASH_STOP_VECTOR conflicts with the lkcd vector.
This is deliberate; one aim of the crash_stop() API is to remove all
the interrupt code from the various kernel debug patches.

Note 1: I have patched visw but I cannot test it; no hardware.

Note 2: This patch does not cover i386 voyager.  I do not understand
        the voyager interrupt mechanism well enough to define its
        interrupt handlers (hint, hint).

---
 arch/i386/kernel/smp.c                      |   21 +++++++++++++++++++++
 arch/i386/kernel/smpboot.c                  |    4 ++++
 include/asm-i386/hw_irq.h                   |    1 +
 include/asm-i386/mach-default/entry_arch.h  |    3 +++
 include/asm-i386/mach-default/irq_vectors.h |    1 +
 include/asm-i386/mach-visws/entry_arch.h    |    3 +++
 6 files changed, 33 insertions(+)

Index: linux/arch/i386/kernel/smp.c
===================================================================
--- linux.orig/arch/i386/kernel/smp.c
+++ linux/arch/i386/kernel/smp.c
@@ -20,6 +20,7 @@
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
 #include <linux/module.h>
+#include <linux/crash_stop.h>
 
 #include <asm/mtrr.h>
 #include <asm/tlbflush.h>
@@ -727,3 +728,23 @@ int safe_smp_processor_id(void)
 
 	return cpuid >= 0 ? cpuid : 0;
 }
+
+#ifdef	CONFIG_CRASH_STOP_SUPPORTED
+void
+cs_arch_send_ipi(int cpu)
+{
+	send_IPI_mask(cpumask_of_cpu(cpu), CRASH_STOP_VECTOR);
+}
+
+void
+cs_arch_send_nmi(int cpu)
+{
+	send_IPI_mask(cpumask_of_cpu(cpu), NMI_VECTOR);
+}
+
+fastcall void smp_crash_stop_interrupt(struct pt_regs *regs)
+{
+	ack_APIC_irq();
+	cs_common_ipi(regs);
+}
+#endif	/* CONFIG_CRASH_STOP_SUPPORTED */
Index: linux/arch/i386/kernel/smpboot.c
===================================================================
--- linux.orig/arch/i386/kernel/smpboot.c
+++ linux/arch/i386/kernel/smpboot.c
@@ -1497,6 +1497,10 @@ void __init smp_intr_init(void)
 
 	/* IPI for generic function call */
 	set_intr_gate(CALL_FUNCTION_VECTOR, call_function_interrupt);
+
+#ifdef	CONFIG_CRASH_STOP_SUPPORTED
+	set_intr_gate(CRASH_STOP_VECTOR, crash_stop_interrupt);
+#endif	/* CONFIG_CRASH_STOP_SUPPORTED */
 }
 
 /*
Index: linux/include/asm-i386/hw_irq.h
===================================================================
--- linux.orig/include/asm-i386/hw_irq.h
+++ linux/include/asm-i386/hw_irq.h
@@ -35,6 +35,7 @@ extern void (*interrupt[NR_IRQS])(void);
 fastcall void reschedule_interrupt(void);
 fastcall void invalidate_interrupt(void);
 fastcall void call_function_interrupt(void);
+fastcall void crash_stop_interrupt(void);
 #endif
 
 #ifdef CONFIG_X86_LOCAL_APIC
Index: linux/include/asm-i386/mach-default/entry_arch.h
===================================================================
--- linux.orig/include/asm-i386/mach-default/entry_arch.h
+++ linux/include/asm-i386/mach-default/entry_arch.h
@@ -13,6 +13,9 @@
 BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
 BUILD_INTERRUPT(invalidate_interrupt,INVALIDATE_TLB_VECTOR)
 BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
+#ifdef	CONFIG_CRASH_STOP_SUPPORTED
+BUILD_INTERRUPT(crash_stop_interrupt,CRASH_STOP_VECTOR)
+#endif	/* CONFIG_CRASH_STOP_SUPPORTED */
 #endif
 
 /*
Index: linux/include/asm-i386/mach-default/irq_vectors.h
===================================================================
--- linux.orig/include/asm-i386/mach-default/irq_vectors.h
+++ linux/include/asm-i386/mach-default/irq_vectors.h
@@ -48,6 +48,7 @@
 #define INVALIDATE_TLB_VECTOR	0xfd
 #define RESCHEDULE_VECTOR	0xfc
 #define CALL_FUNCTION_VECTOR	0xfb
+#define CRASH_STOP_VECTOR	0xfa
 
 #define THERMAL_APIC_VECTOR	0xf0
 /*
Index: linux/include/asm-i386/mach-visws/entry_arch.h
===================================================================
--- linux.orig/include/asm-i386/mach-visws/entry_arch.h
+++ linux/include/asm-i386/mach-visws/entry_arch.h
@@ -7,6 +7,9 @@
 BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
 BUILD_INTERRUPT(invalidate_interrupt,INVALIDATE_TLB_VECTOR)
 BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
+#ifdef	CONFIG_CRASH_STOP_SUPPORTED
+BUILD_INTERRUPT(crash_stop_interrupt,CRASH_STOP_VECTOR)
+#endif	/* CONFIG_CRASH_STOP_SUPPORTED */
 #endif
 
 /*



* [RFC] crash_stop: crash_stop_i386
  2006-10-13 13:23 [RFC] Common API for bring the system to a crash_stop state Keith Owens
  2006-10-13 13:24 ` [RFC] crash_stop: crash_stop_headers Keith Owens
  2006-10-13 13:25 ` [RFC] crash_stop: crash_stop_i386_handler Keith Owens
@ 2006-10-13 13:26 ` Keith Owens
  2006-10-13 13:27 ` [RFC] crash_stop: crash_stop_common Keith Owens
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Keith Owens @ 2006-10-13 13:26 UTC (permalink / raw)
  To: linux-arch

Add the i386 specific crash_stop code.  This contains routines that are
called from the common crash_stop code and from the i386 notify_die
chain.

Note 1: Not tested on visw or voyager; no hardware.

Note 2: It is missing the code to handle a notify_chain (see scenario 4
        in kernel/crash_stop.c in a later patch).  I want to make sure
        that the API design is right before adding that code.

---
 arch/i386/kernel/Makefile     |    1 
 arch/i386/kernel/crash_stop.c |   75 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+)

Index: linux/arch/i386/kernel/Makefile
===================================================================
--- linux.orig/arch/i386/kernel/Makefile
+++ linux/arch/i386/kernel/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_VM86)		+= vm86.o
 obj-$(CONFIG_EARLY_PRINTK)	+= early_printk.o
 obj-$(CONFIG_HPET_TIMER) 	+= hpet.o
 obj-$(CONFIG_K8_NB)		+= k8.o
+obj-$(CONFIG_CRASH_STOP_SUPPORTED)	+= crash_stop.o
 
 EXTRA_AFLAGS   := -traditional
 
Index: linux/arch/i386/kernel/crash_stop.c
===================================================================
--- /dev/null
+++ linux/arch/i386/kernel/crash_stop.c
@@ -0,0 +1,75 @@
+/*
+ * linux/arch/i386/kernel/crash_stop.c
+ *
+ * Copyright (C) 2006 Keith Owens <kaos@sgi.com>
+ *
+ * Most of the i386 specific bits of the crash_stop code.  There is a little
+ * bit of crash_stop code in arch/i386/kernel/{smp,smpboot}.c to handle
+ * CRASH_STOP_VECTOR, everything else is in this file.
+ */
+
+#include <linux/crash_stop.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/ptrace.h>
+#include <asm/kdebug.h>
+
+/* The starting point for a backtrace on a running process is set to
+ * cs_arch_cpu().  It would be nice to go up a couple of levels to start the
+ * backtrace where crash_stop() or cs_common_ipi() were invoked, but that is more work
+ * than I am willing to do in this function.  Starting the backtrace at
+ * cs_arch_cpu() is simple and reliable, which is exactly what we want when the
+ * machine is already failing.
+ */
+void
+cs_arch_cpu(int monarch, struct crash_stop_running_process *r)
+{
+	r->arch.esp = current_stack_pointer;
+	r->arch.eip = (unsigned long)current_text_addr();
+	/* separate any stack changes from current_stack_pointer above */
+	barrier();
+	cs_common_cpu(monarch);
+}
+
+/* Pick up any NMI IPIs that were sent by crash_stop. */
+static int
+crash_stop_arch_notify(struct notifier_block *self,
+		       unsigned long val, void *data)
+{
+	struct die_args *args = (struct die_args *)data;
+	switch(val) {
+	case DIE_NMI_IPI:
+		if (crash_stop_sent_nmi()) {
+			crash_stop_cpu(0, args->regs);
+			return NOTIFY_STOP;
+		}
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block crash_stop_arch_nb = {
+	.notifier_call = crash_stop_arch_notify,
+	.priority = INT_MAX,	/* see crash_stop NMI IPIs before other handlers */
+};
+
+static int __init
+crash_stop_arch_init(void)
+{
+	int err = register_die_notifier(&crash_stop_arch_nb);
+	if (err) {
+		printk(KERN_ERR "Failed to register crash_stop_arch_nb\n");
+		return err;
+	}
+	return 0;
+}
+
+static void __exit
+crash_stop_arch_exit(void)
+{
+	unregister_die_notifier(&crash_stop_arch_nb);
+	return;
+}
+
+module_init(crash_stop_arch_init);
+module_exit(crash_stop_arch_exit);



* [RFC] crash_stop: crash_stop_common
  2006-10-13 13:23 [RFC] Common API for bring the system to a crash_stop state Keith Owens
                   ` (2 preceding siblings ...)
  2006-10-13 13:26 ` [RFC] crash_stop: crash_stop_i386 Keith Owens
@ 2006-10-13 13:27 ` Keith Owens
  2006-10-13 13:28 ` [RFC] crash_stop: crash_stop_common_Kconfig Keith Owens
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Keith Owens @ 2006-10-13 13:27 UTC (permalink / raw)
  To: linux-arch

The common crash_stop code, including the design level documentation.

This patch is missing the code to handle a notify_chain (see scenario 4
in kernel/crash_stop.c).  I want to make sure that the API design is
right before adding that code.
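
For reference, the crash_stop_recovered() path described in the
documentation below would be used roughly like this on an architecture
with its own rendezvous (a sketch only; mca_event_is_recoverable() and
dump_callback are made-up names standing in for whatever recovery test
and callback the architecture/tool provides):

	/* Monarch side of an arch-defined rendezvous (e.g. IA64 MCA).
	 * The slaves have already been stopped by the architecture code.
	 */
	if (mca_event_is_recoverable(regs))
		crash_stop_recovered();	/* release slaves, no callbacks */
	else
		crash_stop(dump_callback, NULL, regs, NULL, "MCA");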

---
 kernel/Makefile     |    1 
 kernel/crash_stop.c |  563 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 564 insertions(+)

Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_UTS_NS) += utsname.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_CRASH_STOP_SUPPORTED) += crash_stop.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux/kernel/crash_stop.c
===================================================================
--- /dev/null
+++ linux/kernel/crash_stop.c
@@ -0,0 +1,563 @@
+/*
+ * linux/kernel/crash_stop.c
+ *
+ * Copyright (C) 2006 Keith Owens <kaos@sgi.com>
+ *
+ * Bring the system to a crash stop for debugging by stopping all the online
+ * cpus apart from the current cpu.  To interrupt the other cpus, first send a
+ * normal IPI; if any cpus have not responded after a few seconds then send a
+ * non-maskable interrupt.  Only used if CONFIG_SMP=y.
+ *
+ * These routines can be used by any debug style code that needs to stop the
+ * other cpus in the system, including those cpus that are not responding to
+ * normal interrupts.  Debug style code includes debuggers such as kdb, kgdb,
+ * nlkd as well as dump tools such as netdump, lkcd, kdump.  All these tools
+ * have the same basic synchronization requirements, the need to stop all the
+ * cpus, save the complete state of the tasks that were running then do some
+ * work on the current cpu.
+ *
+ * For each invocation of crash_stop, one cpu is the monarch, the other cpus
+ * are slaves.  There is no external guarantee of ordering between monarch and
+ * slave events.  The most common case is when the monarch is invoked via
+ * crash_stop(), it then drives the debugger's callback on the slave cpus,
+ * followed by the callback on the monarch cpu.
+ *
+ * Some architectures (IA64 in particular) define their own global machine
+ * synchronization events where a global event can drive the slave cpus either
+ * before or after the monarch.  See INIT in Documentation/ia64/mca.txt.
+ *
+ * To hide the external monarch/slave races from the users of crash_stop, this
+ * code enforces a standard order on the events.  The debugger's callback
+ * routine is invoked on all the slaves "at the same time", followed 10 ms
+ * later by the callback on the monarch cpu.  Typically the callback will spin
+ * on the slave cpus until the monarch callback has done its work and released
+ * the slave cpus.
+ *
+ * There is no guarantee that all online cpus will be in crash_stop state when
+ * the monarch is entered.  If a cpu or chipset is so badly hung that it will
+ * not even respond to NMI then there will be no state for that cpu in
+ * crash_stop_running_process.
+ *
+ * A live locked system can result in a slave cpu processing the crash_stop IPI
+ * _after_ the monarch cpu has done its processing and left crash_stop status.
+ * The slave will not service the normal IPI fast enough (it is live locked
+ * with interrupts disabled) so it will be interrupted by NMI.  The monarch
+ * does its work and leaves crash_stop.  Later the slave gets out of the live
+ * lock and services the crash_stop IPI, but now there is no monarch to do
+ * anything.  To catch this delayed event, a crash_stop IPI is ignored if there
+ * is no current monarch.
+ *
+ * For some events, we cannot tell straight away if we want to debug the event
+ * or not.  For example, an IA64 MCA is architecturally defined to stop all the
+ * slaves before entering the monarch.  Only when the monarch is entered do we
+ * get any data on the event; it is only on the monarch that we can tell if the
+ * MCA is recoverable or not.  In this case, the monarch must call
+ * crash_stop_recovered() instead of crash_stop().  crash_stop_recovered()
+ * releases all the slaves.  Neither the slaves nor the monarch will use the
+ * callback routine.
+ *
+ * All routines must be entered with interrupts disabled.  If necessary,
+ * the caller must disable interrupts before calling crash_stop.
+ */
+
+
+/* There are several possible scenarios for using crash_stop:
+ *
+ * (1) An explicit call to crash_stop from debugging code.  For example, a
+ *     direct entry into a debugger or an explicit request to dump via sysrq.
+ *     The debugging code calls crash_stop() which stops the slaves.
+ *
+ * (2) A nested call to crash_stop on the same cpu.  For example, a user is
+ *     debugging and they decide to take a kernel dump from inside the
+ *     debugger.  The debugger has already brought the system to crash_stop
+ *     state so the dump callback will be called on the current cpu (the
+ *     monarch) but not on the slaves.  The dump code uses the data that is
+ *     already in crash_stop_running_process[].
+ *
+ * (3) Concurrent calls to crash_stop on separate cpus.  One cpu will become
+ *     the monarch for one of the events and interrupt all the others,
+ *     including any cpus that are also trying to enter crash_stop.  When the
+ *     current monarch finishes, the other cpus will race for the crash_stop
+ *     lock and one will become the new monarch (assuming the system is still
+ *     usable).
+ *
+ * (4) A system error occurs and drives the notify_die callback chain, this one
+ *     can be tricky.  It is not known which entries on the notify_die chain
+ *     will do any work, but all of them need to see the same system state.  An
+ *     arch dependent crash_stop callback is called at the start and end of the
+ *     notify_die chain.  At the start it brings the system into crash_stop
+ *     state, using its own callbacks on the slave cpus.  Then it holds the
+ *     slave cpus and releases the monarch cpu.  This allows the rest of the
+ *     entries on the notify_die chain to run, each of them can call crash_stop
+ *     and run their callback on the current cpu and the slaves.  At the end of
+ *     the notify_die chain, the main crash_stop code releases the slave cpus.
+ *     This gives a consistent view of the system to all the entries on the
+ *     notify_die chain.
+ *
+ * The various states are a little complicated, because the code has to cope
+ * with normal calls, nested calls, concurrent calls on separate cpus plus
+ * keeping a consistent view for the life of a notify chain.  A few rules :-
+ *
+ *   Variables cs_lock_cpu, cs_monarch and cs_notify_chain hold a cpu number,
+ *   -1 is 'not set'.  These variables are only updated on the monarch cpu and
+ *   are all protected by cs_lock.
+ *
+ *   Entering a nested call only affects the monarch cpu.  The slave cpus will
+ *   continue to spin in the callback for the first crash_stop() event.
+ *
+ *   Returning from a nested call does not clear cs_monarch nor release the
+ *   slaves.
+ *
+ *   If a monarch gets the lock and cs_notify_chain is not the current cpu then
+ *   another cpu is already running a notify chain.  This monarch must back off
+ *   and wait for the other cpu to finish running its notify chain.
+ *
+ *   Returning from a notify_chain call clears cs_monarch but does not release
+ *   the slaves.  Instead the slaves loop inside this code, in the expectation
+ *   that another notify chain driven routine will call crash_stop and will
+ *   need the slaves.  Unlike a nested call, the slaves will use the supplied
+ *   callback for each entry on the notify chain that calls crash_stop().
+ *
+ * Why the difference between nested calls and a notify chain?  Mainly because
+ * the entries on a notify chain are defined to be separate, also crash_stop
+ * can easily detect the start and end of running the chain.  With a nested
+ * call, there is no way to tell if the first callback will use crash_stop() a
+ * second time.  Nested calls can result from explicit calls to other debug
+ * style code or from an oops in the current callback.  On a nested call, the
+ * monarch callback owns and controls the slaves, they are out of crash_stop()
+ * control.  Only the monarch callback can release the slaves by leaving
+ * crash_stop() state, at which point the sconed call to crash_stop is not a
+ * nested call.
+ */
+
+#include <linux/cpumask.h>
+#include <linux/delay.h>
+#include <linux/crash_stop.h>
+#include <linux/kernel.h>
+#include <linux/ptrace.h>
+#include <linux/nmi.h>
+#include <linux/spinlock.h>
+#include <linux/threads.h>
+
+DEFINE_SPINLOCK(cs_lock);
+int cs_lock_cpu = -1;
+int cs_monarch = -1;
+int cs_notify_chain = -1;
+
+static int cs_recovered;
+
+static cpumask_t cs_cpu_mask, cs_sent_ipi, cs_sent_nmi;
+
+static struct crash_stop_running_process crash_stop_running_process[NR_CPUS];
+
+struct cs_global cs_global;
+
+/* Use a local version of mdelay because RedHat patch the kernel to give a
+ * warning when mdelay is used with interrupts disabled.  Why do RedHat do
+ * these silly things, have they never heard of debugging?
+ */
+static void
+cs_mdelay(int ms)
+{
+	while (ms > 0) {
+		touch_nmi_watchdog();
+		udelay(1000);
+		--ms;
+	}
+}
+
+static void
+cpu_relax_watchdog(void)
+{
+	touch_nmi_watchdog();
+	cpu_relax();
+}
+
+/* If we cannot safely use an external print routine then save any messages in
+ * a local buffer.  This code is not performance sensitive so we take the time
+ * to left justify the entire buffer instead of using ring pointers; this
+ * removes the need for users to cope with wrapped cs_msg text when analysing a
+ * crash_stopped kernel.
+ */
+
+static char cs_msg[4096];
+
+static asmlinkage int
+cs_printk(const char * fmt, ...)
+{
+	int l, ret, shift;
+	va_list ap;
+	static DEFINE_SPINLOCK(cs_msg_lock);
+	spin_lock(&cs_msg_lock);
+	l = strlen(cs_msg);
+	while (1) {
+		va_start(ap, fmt);
+		ret = vsnprintf(cs_msg+l, sizeof(cs_msg)-l, fmt, ap);
+		va_end(ap);
+		if (l == 0 || ret < sizeof(cs_msg)-l)
+			break;
+		shift = sizeof(cs_msg) / 10 + 1;
+		shift = max(shift, ret);
+		shift = min(shift, l);
+		l -= shift;
+		memcpy(cs_msg, cs_msg+shift, l);
+		memset(cs_msg+l, 0, sizeof(cs_msg)-l);
+	}
+	spin_unlock(&cs_msg_lock);
+	return 0;
+}
+
+static void
+cs_online_cpu_status(const char *text)
+{
+	int slaves = num_online_cpus() - 1, count = 0, cpu, unknown;
+	if (!slaves)
+		return;
+	for_each_online_cpu(cpu) {
+		if (cpu_isset(cpu, cs_cpu_mask) &&
+		    cpu != smp_processor_id())
+			++count;
+	}
+	unknown = slaves - count;
+	if (unknown == 0)
+		cs_global.print(
+			"All cpus are in crash_stop for %s\n",
+			text);
+	else
+		cs_global.print(
+			"%d cpu%s not in crash_stop for %s, %s state is unknown\n",
+			unknown,
+			unknown == 1 ? " is" : "s are",
+			text,
+			unknown == 1 ? "its" : "their");
+}
+
+/* Should only be called by the arch interrupt handlers, when they receive the
+ * crash_stop specific IPI.
+ */
+void
+cs_common_ipi(struct pt_regs *regs)
+{
+	if (!cs_global.callback) {
+		printk(KERN_DEBUG "Ignoring late crash_stop IPI on cpu %d\n",
+		       smp_processor_id());
+		return;
+	}
+	crash_stop_cpu(0, regs);
+}
+
+/* Should only be called by the arch specific NMI handlers, to see if this NMI
+ * is for crash_stop or for something else.  On most architectures, an NMI
+ * signal carries no state so we have to deduce why it was sent.
+ */
+int
+crash_stop_sent_nmi(void)
+{
+	return cpu_isset(smp_processor_id(), cs_sent_nmi);
+}
+
+/* Should only be called by the arch specific crash_stop code, after they have
+ * saved any arch specific state.  The call chain is :-
+ *
+ * crash_stop() [monarch] or cs_common_ipi() [slave] ->
+ *   crash_stop_cpu() [common front end code] ->
+ *     cs_arch_cpu() [arch dependent code] ->
+ *       cs_common_cpu() [common back end code] ->
+ *         callback
+ *
+ * When cs_common_cpu() is entered for a slave cpu, it must spin while
+ * cs_monarch < 0.  That enforces the order of slave callbacks first, then
+ * monarch callback.
+ *
+ * When handling a notify chain, park the slave cpus in this holding routine
+ * while the monarch cpu runs down the notify chain.  If any entry on the
+ * notify chain calls crash_stop_cpu() then release the slaves to the
+ * corresponding crash_stop callback.  On return from the callback, put them
+ * back in a holding loop.  The state of the slave cpus is not significantly
+ * changed by this process and each caller of crash_stop_cpu() gets the same
+ * data in crash_stop_running_process.  IOW, all entries on the notify chain
+ * see the state that was saved by the first crash_stop entry on the chain, not
+ * some state that changes as we run the notify chain.
+ */
+void
+cs_common_cpu(int monarch)
+{
+	do {
+		/* wait until the monarch enters */
+		while (cs_monarch < 0)
+			cpu_relax_watchdog();
+		if (!cs_recovered)
+			cs_global.callback(monarch, cs_global.data);
+		if (monarch)
+			return;
+		/* wait until the monarch leaves */
+		while (cs_monarch >= 0)
+			cpu_relax_watchdog();
+	} while (cs_notify_chain >= 0);
+}
+
+/* Wait for at least 3 seconds, but allow an extra 100 ms per online cpu to
+ * cope with live lock on systems with large cpu counts.  These are arbitrary
+ * numbers; it might be worth exposing them as /sys values so sites can tune
+ * their debugging.  Review this after we have more experience with this
+ * code - KAO.
+ */
+static void
+cs_wait_for_cpus(void)
+{
+	int count, prev_count = 0, sent_nmi = 0, t, wait_secs, slaves, cpu;
+	slaves = num_online_cpus() - 1;
+	wait_secs = min(3, (slaves * 100) / 1000);
+	cs_mdelay(100);
+	for (t = 0; t < wait_secs; ++t) {
+		count = 0;
+		slaves = num_online_cpus() - 1;
+		for_each_online_cpu(cpu) {
+			if (cpu_isset(cpu, cs_cpu_mask))
+				++count;
+		}
+		if (count == slaves)
+			break;
+		if (prev_count != count) {
+			cs_global.print(
+				"  %d out of %d cpus in crash_stop, "
+				"waiting for the rest, timeout in %d "
+				"second(s)\n",
+				count, slaves, wait_secs-t);
+			prev_count = count;
+		}
+		cs_mdelay(1000);
+		if (!sent_nmi && t == min(wait_secs / 2, 5)) {
+			for_each_online_cpu(cpu) {
+				if (cpu_isset(cpu, cs_cpu_mask) ||
+				    cpu_isset(cpu, cs_sent_nmi) ||
+				    cpu == smp_processor_id())
+					continue;
+				if (!sent_nmi) {
+					cs_global.print(" sending NMI ");
+					sent_nmi = 1;
+				}
+				cpu_set(cpu, cs_sent_nmi);
+				wmb();
+				cs_arch_send_nmi(cpu);
+			}
+		}
+		if (t % 4 == 0)
+			cs_global.print(".");
+	}
+}
+
+static void
+cs_send_ipi(void)
+{
+	int sent_ipi = 0, cpu;
+	for_each_online_cpu(cpu) {
+		if (cpu_isset(cpu, cs_cpu_mask) ||
+		    cpu_isset(cpu, cs_sent_ipi) ||
+		    cpu == smp_processor_id())
+			continue;
+		cpu_set(cpu, cs_sent_ipi);
+		cs_arch_send_ipi(cpu);
+		sent_ipi = 1;
+	}
+	if (sent_ipi)
+		cs_wait_for_cpus();
+}
+
+/**
+ * crash_stop_cpu: - Put the current cpu into crash_stop state.
+ * @monarch: 0 for a slave cpu, 1 for the monarch cpu.
+ * @regs: pt_regs for the current interrupting event.
+ *
+ * Invoked on every cpu that is being stopped, with no externally defined order
+ * between monarch and slaves.  The arch independent running state is saved
+ * here, then cs_arch_cpu() saves any arch specific state, followed by
+ * invocation of cs_common_cpu() which drives the callback routine.
+ */
+void
+crash_stop_cpu(int monarch, struct pt_regs *regs)
+{
+	struct crash_stop_running_process *r, prev;
+	int cpu = smp_processor_id();
+	cpu_set(cpu, cs_cpu_mask);
+	r = crash_stop_running_process + cpu;
+	prev = *r;
+	r->p = current;
+	r->regs = regs;
+	r->prev = &prev;
+	if (monarch && cs_monarch < 0) {
+		cs_monarch = cpu;
+		wmb();
+		cs_mdelay(10);	/* give the slaves a chance to get going */
+	}
+	cs_arch_cpu(monarch, r);
+	*r = prev;
+	if (r->p == NULL) {
+		cpu_clear(cpu, cs_sent_ipi);
+		cpu_clear(cpu, cs_sent_nmi);
+		smp_mb__before_clear_bit();
+		cpu_clear(cpu, cs_cpu_mask);
+		if (monarch)
+			cs_monarch = -1;
+	}
+}
+
+/**
+ * crash_stop: - Bring the system to a crash stop for debugging.
+ * @callback: After each cpu has been interrupted, the callback is invoked on
+ * that cpu, with the monarch flag set to 0.  After all cpus have responded or
+ * the timeout has been reached then the callback is invoked on the current cpu
+ * with the monarch flag set to 1.
+ * @data: Callback specific data, crash_stop does not use this data.
+ * @print: Optionally, the name of a debugger specific print routine.  If this
+ * is NULL then crash_stop will default to using cs_printk(), messages will be
+ * left justified in cs_msg[].
+ *
+ * Unlike stop_machine(), crash_stop() does not ask if the other cpus are
+ * ready to be stopped and will use non-maskable interrupts to stop cpus that
+ * do not respond after a few seconds.
+ *
+ * crash_stop() must be entered with interrupts disabled, it can even be
+ * entered from an NMI event.  It is the caller's responsibility to ensure that
+ * their print routine (if any) is safe in the current context.
+ *
+ * If the system has already entered a globally stopped state then sending IPI
+ * or NMI is pointless and may even be unsafe.  This particularly applies to
+ * MCA or global INIT on IA64, these events are already defined to stop the
+ * entire machine and they also prevent crash_stop() from sending any IPI or
+ * NMI events.  Only send IPI/NMI to cpus that are not yet in crash_stop state.
+ *
+ * The global structure crash_stop_running_process is updated with information
+ * about the tasks that are running on each cpu.  The debugger can use this
+ * information to start the analysis of the running tasks.
+ *
+ * Returns: 0 normal
+ *
+ *          -ENOSYS crash_stop is not supported on this architecture.
+ */
+
+int
+crash_stop(void (*callback)(int monarch, void *data), void *data,
+	   struct pt_regs *regs, printk_t print,
+	   const char *text)
+{
+	int cpu;
+	struct cs_global csg_save, csg = {
+		.callback = callback,
+		.data = data,
+		.print = print ? print : cs_printk,
+	};
+
+	WARN_ON(!irqs_disabled());
+retry:
+	if (!spin_trylock(&cs_lock)) {
+		if (cs_lock_cpu == smp_processor_id()) {
+			/* nested call on the same cpu */
+			csg_save = cs_global;
+			cs_global = csg;
+			wmb();
+			cs_online_cpu_status(text);
+			crash_stop_cpu(1, regs);
+			cs_global = csg_save;
+			wmb();
+			return 0;
+		}
+		/* concurrent call on another cpu */
+		while (cs_lock_cpu != -1)
+			cpu_relax_watchdog();
+		goto retry;
+	}
+
+	if (cs_notify_chain >= 0 &&
+	    cs_notify_chain != smp_processor_id()) {
+		/* another cpu is running a notify chain, back off */
+		spin_unlock(&cs_lock);
+		cs_mdelay(1);
+		goto retry;
+	}
+
+	cs_lock_cpu = smp_processor_id();
+	csg_save = cs_global;
+	cs_global = csg;
+	wmb();
+	cs_send_ipi();
+	cs_online_cpu_status(text);
+	crash_stop_cpu(1, regs);
+	if (cs_monarch < 0 && cs_notify_chain < 0) {
+		/* leaving a normal call, wait for the slaves to exit */
+		for_each_online_cpu(cpu) {
+			while (cpu_isset(cpu, cs_cpu_mask))
+				cpu_relax_watchdog();
+		}
+	}
+	cs_global = csg_save;
+	cs_lock_cpu = -1;
+	spin_unlock(&cs_lock);
+	return 0;
+}
+
+/**
+ * crash_stop_recovered: - Release any slaves in crash_stop state.
+ *
+ * On architectures that define their own global synchronization methods, the
+ * slave cpus may enter crash_stop state before the monarch.  If the monarch
+ * decides that the event is recoverable then the slaves need to be released
+ * from crash_stop, without invoking any callbacks.
+ *
+ * For recovered events, we do not always force the other cpus into slave
+ * state.  The assumption is that crash_stop_recovered() is only required on
+ * architectures that define their own global synchronization methods (e.g.
+ * IA64 MCA), in which case the architecture has already taken care of the
+ * slaves.  If no slave cpu is in crash_stop() state then do nothing, otherwise
+ * wait until all the slaves are in crash_stop().
+ *
+ * Note: this routine does not check for a nested call to crash_stop, nor does
+ * it handle notify chains.  It makes no sense to recover an error except at
+ * the top level.
+ */
+int
+crash_stop_recovered(void)
+{
+	int cpu, any_slaves = 0;
+
+	WARN_ON(!irqs_disabled());
+retry:
+	spin_lock(&cs_lock);
+	if (cs_notify_chain >= 0 &&
+	    cs_notify_chain != smp_processor_id()) {
+		/* another cpu is running a notify chain, back off */
+		spin_unlock(&cs_lock);
+		cs_mdelay(1);
+		goto retry;
+	}
+	BUG_ON(cs_notify_chain >= 0);
+	for_each_online_cpu(cpu) {
+		if (cpu_isset(cpu, cs_cpu_mask) &&
+		    cpu != smp_processor_id()) {
+			any_slaves = 1;
+			break;
+		}
+	}
+	if (any_slaves) {
+		/* give cs_send_ipi/cs_wait_for_cpus a safe print routine */
+		struct cs_global csg_save, csg = {
+			.print = cs_printk,
+		};
+		csg_save = cs_global;
+		cs_global = csg;
+		wmb();
+		cs_send_ipi();
+		cs_global = csg_save;
+	}
+	cs_recovered = 1;
+	wmb();
+	cs_monarch = smp_processor_id();
+	for_each_online_cpu(cpu) {
+		while (cpu_isset(cpu, cs_cpu_mask))
+			cpu_relax_watchdog();
+	}
+	cs_recovered = 0;
+	cs_monarch = -1;
+	spin_unlock(&cs_lock);
+	return 0;
+}



* [RFC] crash_stop: crash_stop_common_Kconfig
  2006-10-13 13:23 [RFC] Common API for bring the system to a crash_stop state Keith Owens
                   ` (3 preceding siblings ...)
  2006-10-13 13:27 ` [RFC] crash_stop: crash_stop_common Keith Owens
@ 2006-10-13 13:28 ` Keith Owens
  2006-10-13 13:28 ` [RFC] crash_stop: crash_stop_demo Keith Owens
  2006-10-13 13:45 ` [RFC] Common API for bring the system to a crash_stop state Keith Owens
  6 siblings, 0 replies; 10+ messages in thread
From: Keith Owens @ 2006-10-13 13:28 UTC (permalink / raw)
  To: linux-arch

Add CONFIG_CRASH_STOP and CONFIG_CRASH_STOP_SUPPORTED.  Debug code will
select CRASH_STOP.  That in turn will select CRASH_STOP_SUPPORTED if
the config supports crash_stop().  If the supported architecture list
changes then only the CRASH_STOP entry needs to be updated; none of the
tools that select CRASH_STOP have to worry about arch-dependent
details.
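
For example, a debug tool would only need something like this in its
own Kconfig entry (KDB is used purely as an illustration here):

	config KDB
		bool "Built-in kernel debugger"
		select CRASH_STOP

The tool's code can then call crash_stop() unconditionally; on configs
where CRASH_STOP_SUPPORTED does not get set, the header stubs turn the
call into -ENOSYS.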

Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -398,3 +398,10 @@ config LKDTM
 
 	Documentation on how to use the module can be found in
 	drivers/misc/lkdtm.c
+
+config CRASH_STOP
+	bool
+	select CRASH_STOP_SUPPORTED if SMP && ( (X86 && !X86_VOYAGER) )
+
+config CRASH_STOP_SUPPORTED
+	bool



* [RFC] crash_stop: crash_stop_demo
  2006-10-13 13:23 [RFC] Common API for bring the system to a crash_stop state Keith Owens
                   ` (4 preceding siblings ...)
  2006-10-13 13:28 ` [RFC] crash_stop: crash_stop_common_Kconfig Keith Owens
@ 2006-10-13 13:28 ` Keith Owens
  2006-10-13 13:45 ` [RFC] Common API for bring the system to a crash_stop state Keith Owens
  6 siblings, 0 replies; 10+ messages in thread
From: Keith Owens @ 2006-10-13 13:28 UTC (permalink / raw)
  To: linux-arch

A quick and dirty crash_stop() demo program.  Most of the code is to
get the machine into a suitable state for testing both the normal IPI
and NMI code.  The interesting crash_stop bits are cs_demo_callback*()
and simulate_crash_stop_event().

The base patch does not export any crash_stop() symbols.  They can be
added later if any debug style code needs to be built as a module.
Since the demo is best used as a module, this patch temporarily
exports some symbols.

---
 kernel/Makefile          |    1 
 kernel/crash_stop.c      |    5 +
 kernel/crash_stop_demo.c |  160 +++++++++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug        |   12 +++
 4 files changed, 177 insertions(+), 1 deletion(-)

Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_UTS_NS) += utsname.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_CRASH_STOP_SUPPORTED) += crash_stop.o
+obj-$(CONFIG_CRASH_STOP_DEMO) += crash_stop_demo.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux/kernel/crash_stop.c
===================================================================
--- linux.orig/kernel/crash_stop.c
+++ linux/kernel/crash_stop.c
@@ -134,6 +134,7 @@
 #include <linux/delay.h>
 #include <linux/crash_stop.h>
 #include <linux/kernel.h>
+#include <linux/module.h>		/* used by crash_stop_demo */
 #include <linux/ptrace.h>
 #include <linux/nmi.h>
 #include <linux/spinlock.h>
@@ -255,6 +256,7 @@ crash_stop_sent_nmi(void)
 {
 	return cpu_isset(smp_processor_id(), cs_sent_nmi);
 }
+EXPORT_SYMBOL(crash_stop_sent_nmi);	/* used by crash_stop_demo */
 
 /* Should only be called by the arch specific crash_stop code, after they have
  * saved any arch specific state.  The call chain is :-
@@ -307,7 +309,7 @@ cs_wait_for_cpus(void)
 {
 	int count, prev_count = 0, sent_nmi = 0, t, wait_secs, slaves, cpu;
 	slaves = num_online_cpus() - 1;
-	wait_secs = min(3, (slaves * 100) / 1000);
+	wait_secs = 3 + (slaves * 100) / 1000;
 	cs_mdelay(100);
 	for (t = 0; t < wait_secs; ++t) {
 		count = 0;
@@ -495,6 +497,7 @@ retry:
 	spin_unlock(&cs_lock);
 	return 0;
 }
+EXPORT_SYMBOL(crash_stop);		/* used by crash_stop_demo */
 
 /**
  * crash_stop_recovered: - Release any slaves in crash_stop state.
Index: linux/kernel/crash_stop_demo.c
===================================================================
--- /dev/null
+++ linux/kernel/crash_stop_demo.c
@@ -0,0 +1,160 @@
+/*
+ * linux/kernel/crash_stop_demo.c
+ *
+ * Copyright (C) 2006 Keith Owens <kaos@sgi.com>
+ *
+ * Demonstrate the use of crash_stop().  This module requires at least 2 slave
+ * cpus, plus the monarch cpu.  One of the slaves is put into a disabled spin
+ * loop; the other slaves are left alone.  The monarch calls crash_stop().
+ * Most of the slaves will respond to the normal IPI, the disabled cpu will
+ * only respond to NMI.
+ */
+
+#include <linux/cpumask.h>
+#include <linux/crash_stop.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/nmi.h>
+
+MODULE_LICENSE("GPL");
+
+/* The callback function passed to crash_stop() is invoked on each cpu that is
+ * in crash_stop state.  The main crash_stop() code will ensure that slaves are
+ * entered first, followed by the monarch after a short delay.  The debug
+ * specific callback then does its own work.
+ */
+
+static int cs_demo_monarch_entered, cs_demo_monarch_exited;
+
+static void
+cs_demo_callback_monarch(void *data) {
+	printk(KERN_ALERT "%s: entering monarch cpu %d\n",
+	       __FUNCTION__, smp_processor_id());
+	cs_demo_monarch_entered = 1;
+	wmb();
+	/* Monarch callback processing using data and struct
+	 * crash_stop_running_process goes here.
+	 */
+	cs_demo_monarch_exited = 1;
+	wmb();
+}
+
+static void
+cs_demo_callback_slave(void *data) {
+	printk(KERN_ALERT "%s: entering slave cpu %d via %s\n",
+	       __FUNCTION__, smp_processor_id(),
+	       crash_stop_sent_nmi() ? "NMI" : "IPI");
+	while (!cs_demo_monarch_entered) {
+		touch_nmi_watchdog();
+		cpu_relax();
+	}
+	/* Slave callback processing using data goes here.   In most cases the
+	 * slaves will just spin until the monarch releases them.  The main
+	 * crash_stop() code saves the state for each slave cpu before entering
+	 * the callback.  The monarch can use that saved state without the
+	 * slave callback doing any more work.
+	 */
+	while (!cs_demo_monarch_exited) {
+		touch_nmi_watchdog();
+		cpu_relax();
+	}
+}
+
+static void
+cs_demo_callback(int monarch, void *data)
+{
+	if (monarch)
+		cs_demo_callback_monarch(data);
+	else
+		cs_demo_callback_slave(data);
+	printk(KERN_ALERT "%s: leaving cpu %d\n",
+	       __FUNCTION__, smp_processor_id());
+}
+
+/* crash_stop() is usually called from an error state where pt_regs are
+ * available and interrupts are already disabled.  For the demo, use a NULL
+ * pt_regs and disable interrupts by hand.  Use printk as the demo I/O routine,
+ * even though that is not always a good choice (not NMI safe).
+ */
+static void
+simulate_crash_stop_event(void)
+{
+	printk(KERN_ALERT "%s: before crash_stop()\n", __FUNCTION__);
+	local_irq_disable();
+	crash_stop(cs_demo_callback, NULL, NULL, printk, "cs_demo");
+	local_irq_enable();
+	printk(KERN_ALERT "%s: after crash_stop()\n", __FUNCTION__);
+}
+
+static int cs_demo_do_spin = 1, cs_demo_spinning;
+static DECLARE_COMPLETION(cs_demo_done);
+
+/* spin disabled on one cpu until the crash_stop test has finished */
+static int
+cs_demo_spin(void *vdata)
+{
+	local_irq_disable();
+	cs_demo_spinning = 1;
+	wmb();
+	while (cs_demo_do_spin) {
+		touch_nmi_watchdog();
+		cpu_relax();
+	}
+	local_irq_enable();
+	complete(&cs_demo_done);
+	do_exit(0);
+}
+
+/* Ignore most of this routine, the complexity comes from getting the various
+ * cpus into a suitable state for testing crash_stop(), including NMI
+ * processing.  In real life, the system would already be dying before
+ * crash_stop() was invoked.
+ */
+static int __init
+cs_demo_init(void)
+{
+	struct task_struct *p;
+	int c, disabled = 0, this_cpu = get_cpu(), slaves = 0;
+
+	printk(KERN_ALERT "%s: monarch is cpu %d\n",
+	       __FUNCTION__, this_cpu);
+	set_cpus_allowed(current, cpumask_of_cpu(this_cpu));
+	put_cpu();
+	for_each_online_cpu(c) {
+		if (c != this_cpu) {
+			++slaves;
+			disabled = c;
+		}
+	}
+	if (slaves < 2) {
+		printk(KERN_ERR "%s needs at least two slave cpus\n",
+		       __FUNCTION__);
+		return -EINVAL;
+	}
+
+	init_completion(&cs_demo_done);
+	p = kthread_create(cs_demo_spin, NULL, "kcrash_stop_demo");
+	if (IS_ERR(p))
+		return PTR_ERR(p);
+	kthread_bind(p, disabled);
+	wake_up_process(p);
+	while (!cs_demo_spinning)
+		cpu_relax();
+	printk(KERN_ALERT "%s: cpu %d is spinning disabled\n",
+	       __FUNCTION__, disabled);
+
+	simulate_crash_stop_event();
+
+	cs_demo_do_spin = 0;
+	wmb();
+	wait_for_completion(&cs_demo_done);
+	return 0;
+}
+
+static void __exit
+cs_demo_exit(void)
+{
+}
+
+module_init(cs_demo_init)
+module_exit(cs_demo_exit)
Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -405,3 +405,15 @@ config CRASH_STOP
 
 config CRASH_STOP_SUPPORTED
 	bool
+
+config CRASH_STOP_DEMO
+	tristate "Demonstrate the use of crash_stop"
+	default m
+	depends on SMP
+	select CRASH_STOP
+	help
+          Code to demonstrate the use of crash_stop.  Build it as a
+          module and load it.  It will make one cpu spin disabled then
+          call crash_stop.  All slave cpus bar one will get a normal
+          IPI, the spinning cpu will get NMI.  You need at least 3 cpus
+          to run crash_stop_demo.



* Re: [RFC] Common API for bring the system to a crash_stop state
  2006-10-13 13:23 [RFC] Common API for bring the system to a crash_stop state Keith Owens
                   ` (5 preceding siblings ...)
  2006-10-13 13:28 ` [RFC] crash_stop: crash_stop_demo Keith Owens
@ 2006-10-13 13:45 ` Keith Owens
  6 siblings, 0 replies; 10+ messages in thread
From: Keith Owens @ 2006-10-13 13:45 UTC (permalink / raw)
  To: linux-arch

Keith Owens (on Fri, 13 Oct 2006 23:23:35 +1000) wrote:
>The solution is to define a common crash_stop API that can be used by
>_all_ of the debug style tools, without reinventing the wheel each
>time.
>
>The following crash_stop patches will only appear on linux-arch.

Forgot to mention that these patches are against 2.6.19-rc1.  They will
have to change for 2.6.19-rc2 due to the pt_regs clean up.



* Re: [RFC] crash_stop: crash_stop_i386_handler
  2006-10-13 13:25 ` [RFC] crash_stop: crash_stop_i386_handler Keith Owens
@ 2006-10-13 13:55   ` James Bottomley
  2006-10-13 14:00     ` Keith Owens
  0 siblings, 1 reply; 10+ messages in thread
From: James Bottomley @ 2006-10-13 13:55 UTC (permalink / raw)
  To: Keith Owens; +Cc: linux-arch

On Fri, 2006-10-13 at 23:25 +1000, Keith Owens wrote:
> Note 2: This patch does not cover i386 voyager.  I do not understand
>         the voyager interrupt mechanism well enough to define its
>         interrupt handlers (hint, hint).

Voyager actually doesn't need a software based crash dump at all.

The system itself has a type of GSP that runs SUS (sort of a bios)
continuously.  SUS will dump a complete memory and state image to a
nominated dump slice (usually swap) if the OS becomes unresponsive or
requests it.  So all I really have to do is come up with a user space
tool that converts from this raw image format to your crash tool format.

James




* Re: [RFC] crash_stop: crash_stop_i386_handler
  2006-10-13 13:55   ` James Bottomley
@ 2006-10-13 14:00     ` Keith Owens
  0 siblings, 0 replies; 10+ messages in thread
From: Keith Owens @ 2006-10-13 14:00 UTC (permalink / raw)
  To: James Bottomley; +Cc: linux-arch

James Bottomley (on Fri, 13 Oct 2006 08:55:35 -0500) wrote:
>On Fri, 2006-10-13 at 23:25 +1000, Keith Owens wrote:
>> Note 2: This patch does not cover i386 voyager.  I do not understand
>>         the voyager interrupt mechanism well enough to define its
>>         interrupt handlers (hint, hint).
>
>Voyager actually doesn't need a software based crash dump at all.

This API is designed for any kernel debug tool, not just crash dump.
This includes run time debuggers such as kdb, kgdb and nlkd.

>The system itself has a type of GSP that runs SUS (sort of a bios)
>continuously.  SUS will dump a complete memory and state image to a
>nominated dump slice (usually swap) if the OS becomes unresponsive or
>requests it.  So all I really have to do is come up with a user space
>tool that converts from this raw image format to your crash tool format.

Not my crash format; other people maintain the crash dump tools.


