* [patch 2.6.19-rc5 0/12] crash_stop: Summary
From: Keith Owens @ 2006-11-09 4:04 UTC
To: linux-arch; +Cc: Keith Owens
All the kernel debug style tools (kdb, kgdb, nlkd, netdump, lkcd,
crash, kdump, etc.) have a common requirement: they need to do a crash
stop of the system. This means stopping all the cpus, even cpus that
are spinning with interrupts disabled. In addition, each cpu has to
save enough state to start the diagnosis of the problem.
* Each debug style tool has written its own code for interrupting the
other cpus and for saving cpu state.
* Some tools try a normal IPI first, then send a non-maskable interrupt
after a delay.
* Some tools always send an NMI first, which can result in incomplete or
wrong machine state if the NMI arrives at the wrong time.
* Most of the tools do not know how to cope with the IA64 architecture
defined rendezvous algorithm, which interferes with an OS driven
rendezvous.
* Needless to say, every single patch set conflicts with all the
others, which makes it very difficult to install more than one of the
tools at a time.
The solution is to define a common crash_stop API that can be used by
_all_ of the debug style tools, without reinventing the wheel each
time. The following crash_stop patches implement this API for i386,
x86_64 and ia64. The implementation correctly handles the complicated
ia64 algorithm for MCA and INIT, unlike almost every current debug
style tool. Adding other architectures is a fairly simple matter:
define the IPI and NMI routines (the crash_stop_$(ARCH)_handlers
patch), intercept the events that indicate that the system is dying
(the crash_stop_$(ARCH) patch), and update the Kconfig entry for
CRASH_STOP to add the new $(ARCH).
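As a rough illustration, a port boils down to something like the
following. This is a hypothetical sketch for an imaginary
architecture: the routine and struct names are the real crash_stop
hooks, the bodies are placeholders.

    /* include/asm-newarch/crash_stop.h -- hypothetical sketch */
    struct crash_stop_running_process_arch {
            unsigned long sp;       /* stack pointer at crash stop */
            unsigned long pc;       /* instruction pointer at crash stop */
    };

    /* arch/newarch/kernel/crash_stop.c -- hypothetical sketch */
    #include <linux/crash_stop.h>

    void cs_arch_send_ipi(int cpu)
    {
            /* send this arch's maskable cross-cpu interrupt to cpu */
    }

    void cs_arch_send_nmi(int cpu)
    {
            /* send this arch's non-maskable interrupt to cpu */
    }

    void cs_arch_cpu(int monarch, struct crash_stop_running_process *r)
    {
            /* save any arch specific state in r->arch, then hand over
             * to the common back end which drives the callback.
             */
            cs_common_cpu(monarch);
    }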
Most of the design documentation is in the crash_stop_common patch.
Please read that before replying.
crash_stop_header              The architecture independent header.
crash_stop_common              The architecture independent code.
crash_stop_i386_handlers       i386 specific code to send and respond
                               to the crash_stop IPI and NMI.
crash_stop_i386                i386 specific code to intercept events
                               that indicate that the system is dying.
crash_stop_x86_64_nmiwatchdog  i386 raises a notify_die event for the
                               NMI watchdog but x86_64 does not; add
                               DIE_NMIWATCHDOG to x86_64.
crash_stop_x86_64_handlers     x86_64 specific code to send and
                               respond to the crash_stop IPI and NMI.
crash_stop_x86_64              x86_64 specific code to intercept
                               events that indicate that the system
                               is dying.
crash_stop_ia64_handlers       ia64 specific code to send and respond
                               to the crash_stop IPI and NMI.
crash_stop_ia64                ia64 specific code to intercept events
                               that indicate that the system is dying.
crash_stop_common_Kconfig      Add crash_stop to the config system.
                               Only for i386, x86_64 and ia64 at the
                               moment; extend as new architectures
                               are added.
crash_stop_demo                A demonstration of using crash_stop in
                               a debug style tool (a minimal sketch
                               follows below). Not for inclusion in
                               the kernel.
crash_stop_test                Test the crash_stop code. Not for
                               inclusion in the kernel.
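For a taste of what crash_stop_demo does, here is a minimal,
hypothetical use of the API by a debug style tool. The callback holds
the slaves until the monarch has finished, matching the ordering the
common code enforces (slave callbacks first, monarch about 10 ms
later); all mydbg_* names are invented for illustration.

    #include <linux/crash_stop.h>
    #include <linux/irqflags.h>
    #include <asm/atomic.h>

    static atomic_t mydbg_release;      /* hypothetical release flag */

    /* Invoked on every cpu: slaves first, then the monarch. */
    static void mydbg_callback(int monarch, void *data)
    {
            if (!monarch) {
                    /* slave: spin until the monarch releases us */
                    while (!atomic_read(&mydbg_release))
                            cpu_relax();
                    return;
            }
            /* monarch: the running tasks are now stopped, analyse or
             * dump them here, then release the slaves.
             */
            atomic_set(&mydbg_release, 1);
    }

    static void mydbg_enter(struct pt_regs *regs)
    {
            atomic_set(&mydbg_release, 0);
            local_irq_disable();    /* crash_stop requires irqs off */
            crash_stop(mydbg_callback, NULL, NULL, regs, "mydbg");
    }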
* [patch 2.6.19-rc5 1/12] crash_stop: common header
From: Keith Owens @ 2006-11-09 4:04 UTC
To: linux-arch; +Cc: Keith Owens
crash_stop() common header.
Signed-off-by: Keith Owens <kaos@sgi.com>
---
include/linux/crash_stop.h | 71 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 71 insertions(+)
Index: linux/include/linux/crash_stop.h
===================================================================
--- /dev/null
+++ linux/include/linux/crash_stop.h
@@ -0,0 +1,71 @@
+#ifndef _LINUX_CRASH_STOP_H
+#define _LINUX_CRASH_STOP_H
+
+#include <linux/cpumask.h>
+#include <linux/errno.h>
+#include <linux/ptrace.h>
+
+typedef asmlinkage int (*printk_t)(const char * fmt, ...)
+ __attribute__ ((format (printf, 1, 2)));
+
+#ifdef CONFIG_CRASH_STOP_SUPPORTED
+
+#include <asm/crash_stop.h>
+
+/* These six entries are the only ones used by code outside crash_stop itself.
+ * Anything starting with 'crash_stop' is part of the external ABI; anything
+ * starting with 'cs_' is only to be used by internal crash_stop code.
+ */
+extern int crash_stop(void (*callback)(int monarch, void *data),
+ void *data, printk_t print,
+ struct pt_regs *regs, const char *text);
+extern void crash_stop_recovered(void);
+#ifdef CONFIG_SMP
+extern void crash_stop_slave(void);
+extern int crash_stop_sent_nmi(void);
+#else /* !CONFIG_SMP */
+#define crash_stop_slave() do {} while(0)
+#define crash_stop_sent_nmi() 0
+#endif /* CONFIG_SMP */
+extern int crash_stop_slaves(void);
+struct crash_stop_running_process {
+ struct task_struct *p;
+ struct pt_regs *regs;
+ struct crash_stop_running_process *prev;
+ struct crash_stop_running_process_arch arch;
+};
+
+extern void cs_common_ipi(void);
+extern void cs_arch_send_ipi(int);
+extern void cs_arch_send_nmi(int);
+
+extern void cs_arch_cpu(int, struct crash_stop_running_process *);
+extern void cs_common_cpu(int);
+
+extern void cs_notify_chain_start(struct pt_regs *);
+extern void cs_notify_chain_end(void);
+
+struct cs_global {
+ void (*callback)(int monarch, void *data);
+ void *data;
+ printk_t print;
+};
+extern struct cs_global cs_global;
+
+#else /* !CONFIG_CRASH_STOP_SUPPORTED */
+
+static inline
+int crash_stop(void (*callback)(int monarch, void *data),
+ void *data, printk_t print,
+ struct pt_regs *regs, const char *text)
+{
+ return -ENOSYS;
+}
+#define crash_stop_recovered() do {} while(0)
+#define crash_stop_slave() do {} while(0)
+#define crash_stop_sent_nmi() 0
+#define crash_stop_slaves() 0
+
+#endif /* CONFIG_CRASH_STOP_SUPPORTED */
+
+#endif /* _LINUX_CRASH_STOP_H */
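The stubs above mean a debug style tool can call the API
unconditionally; a hypothetical fragment (mydbg_callback as in the
summary's sketch):

    /* Works whether or not the architecture supports crash_stop,
     * thanks to the -ENOSYS stub above.
     */
    if (crash_stop(mydbg_callback, NULL, NULL, regs, "mydbg") == -ENOSYS)
            printk(KERN_WARNING "mydbg: crash_stop not supported here\n");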
* [patch 2.6.19-rc5 2/12] crash_stop: common code
From: Keith Owens @ 2006-11-09 4:04 UTC
To: linux-arch; +Cc: Keith Owens
The common crash_stop code, including the design level documentation.
Signed-off-by: Keith Owens <kaos@sgi.com>
---
kernel/Makefile | 1
kernel/crash_stop.c | 843 ++++++++++++++++++++++++++++++++++++++++++++++++++++
kernel/sys.c | 5
3 files changed, 848 insertions(+), 1 deletion(-)
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -51,6 +51,7 @@ obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_UTS_NS) += utsname.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
+obj-$(CONFIG_CRASH_STOP_SUPPORTED) += crash_stop.o
ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux/kernel/crash_stop.c
===================================================================
--- /dev/null
+++ linux/kernel/crash_stop.c
@@ -0,0 +1,843 @@
+/*
+ * linux/kernel/crash_stop.c
+ *
+ * Copyright (C) 2006 Keith Owens <kaos@sgi.com>
+ *
+ * Bring the system to a crash stop for debugging by stopping all the online
+ * cpus apart from the current cpu. To interrupt the other cpus, first send a
+ * normal IPI, if any cpus have not responded after a few seconds then send a
+ * non-maskable interrupt.
+ *
+ * Most of this code disappears with CONFIG_SMP=n. It devolves to running the
+ * callback routine on this cpu as the monarch and setting up the saved state
+ * for this cpu. That gives a common interface for debug style tools, even on
+ * UP.
+ *
+ * These routines can be used by any debug style code that needs to stop the
+ * other cpus in the system, including those cpus that are not responding to
+ * normal interrupts. Debug style code includes debuggers such as kdb, kgdb,
+ * nlkd as well as dump tools such as netdump, lkcd, kdump. All these tools
+ * have the same basic synchronization requirements: they need to stop all the
+ * cpus, save the complete state of the tasks that were running, then do some
+ * work on the current cpu.
+ *
+ * For each invocation of crash_stop, one cpu is the monarch, the other cpus
+ * are slaves. There is no external guarantee of ordering between monarch and
+ * slave events. The most common case is when the monarch is invoked via
+ * crash_stop(), it then drives the debugger's callback on the slave cpus,
+ * followed by the callback on the monarch cpu.
+ *
+ * Some architectures (IA64 in particular) define their own global machine
+ * synchronization events where a global event can drive the slave cpus either
+ * before or after the monarch. See INIT in Documentation/ia64/mca.txt.
+ *
+ * To hide the external monarch/slave races from the users of crash_stop, this
+ * code enforces a standard order on the events. The debugger's callback
+ * routine is invoked on all the slaves "at the same time", followed 10 ms
+ * later by the callback on the monarch cpu. Typically the callback will spin
+ * on the slave cpus until the monarch callback has done its work and released
+ * the slave cpus.
+ *
+ * There is no guarantee that all online cpus will be in crash_stop state when
+ * the monarch is entered. If a cpu or chipset is so badly hung that it will
+ * not even respond to NMI then there will be no state for that cpu in
+ * crash_stop_running_process.
+ *
+ * A live locked system can result in a slave cpu processing the crash_stop IPI
+ * _after_ the monarch cpu has done its processing and left crash_stop status.
+ * The slave will not service the normal IPI fast enough (it is live locked
+ * with interrupts disabled) so it will be interrupted by NMI. The monarch
+ * does its work and leaves crash_stop. Later the slave gets out of the live
+ * lock and services the crash_stop IPI, but now there is no monarch to do
+ * anything. To catch this delayed event, a crash_stop IPI is ignored if there
+ * is no current monarch.
+ *
+ * For some events, we cannot tell straight away if we want to debug the event
+ * or not. For example, an IA64 MCA is architecturally defined to stop all the
+ * slaves before entering the monarch. Only when the monarch is entered do we
+ * get any data on the event, it is only on the monarch that we can tell if the
+ * MCA is recoverable or not. In this case, the monarch must call
+ * crash_stop_recovered() instead of crash_stop(). crash_stop_recovered()
+ * releases all the slaves. Neither the slaves nor the monarch will use the
+ * callback routine.
+ *
+ * All routines must be entered with interrupts disabled; if necessary, the
+ * caller must disable interrupts before calling crash_stop.
+ */
+
+
+/* There are several possible scenarios for using crash_stop:
+ *
+ * (1) An explicit call to crash_stop from debugging code. For example, a
+ * direct entry into a debugger or an explicit request to dump via sysrq.
+ * The debugging code calls crash_stop() which stops the slaves.
+ *
+ * (2) A nested call to crash_stop on the same cpu. For example, a user is
+ * debugging and they decide to take a kernel dump from inside the
+ * debugger. The debugger has already brought the system to crash_stop
+ * state so the dump callback will be called on the current cpu (the
+ * monarch) but not on the slaves. The dump code uses the data that is
+ * already in crash_stop_running_process[].
+ *
+ * (3) Concurrent calls to crash_stop on separate cpus. One cpu will become
+ * the monarch for one of the events and interrupt all the others,
+ * including any cpus that are also trying to enter crash_stop. When the
+ * current monarch finishes, the other cpus will race for the crash_stop
+ * lock and one will become the new monarch (assuming the system is still
+ * usable).
+ *
+ * (4) A system error occurs and drives the notify_die callback chain, this one
+ * can be tricky. It is not known which entries on the notify_die chain
+ * will do any work, but all of them need to see the same system state. An
+ * arch dependent crash_stop callback is called at the start and end of the
+ * notify_die chain. At the start it brings the system into crash_stop
+ * state, using its own callbacks on the slave cpus. Then it holds the
+ * slave cpus and releases the monarch cpu. This allows the rest of the
+ * entries on the notify_die chain to run, each of them can call crash_stop
+ * and run their callback on the current cpu and the slaves. At the end of
+ * the notify_die chain, the main crash_stop code releases the slave cpus.
+ * This gives a consistent view of the system to all the entries on the
+ * notify_die chain.
+ *
+ * To make things more interesting, crash_stop() can be entered for one
+ * reason then a software interrupt or NMI can come over the top. That
+ * will result in a notify_chain being run while the system is already in
+ * crash_stop state. Which means that any calls from the notify_chain to
+ * crash_stop() must be treated as nested calls.
+ *
+ * Finally it is just possible to have multiple levels of notify_chain
+ * running at the same time. For example, an oops occurs and drives the
+ * notify_chain. At the start of that chain, the slaves are put into
+ * crash_stop state and the monarch is allowed to run the chain. A
+ * callback on the chain breaks or loops and a second scan of the
+ * notify_chain is done for this nested failure. For this case, crash_stop
+ * must ignore the use of the second notify_chain and treat any calls as
+ * nested ones.
+ *
+ * The various states are a little complicated, because the code has to cope
+ * with normal calls, nested calls, concurrent calls on separate cpus,
+ * keeping a consistent view for the life of a notify_chain plus nested events
+ * that involve notify_chains. And do it all without deadlocks, particularly
+ * on non-maskable interrupts. A few rules :-
+ *
+ * Variables cs_lock_owner, cs_monarch and cs_notify_chain_owner hold a cpu
+ * number, -1 is 'not set'. cs_notify_chain_depth is a counter. These
+ * variables are only updated on the monarch cpu. The variables are
+ * protected by cs_lock or by the fact that the current cpu is handling a
+ * nested monarch event.
+ *
+ * Entering a nested call only affects the monarch cpu. The slave cpus will
+ * continue to spin in the callback for the first crash_stop() event. Nested
+ * calls cannot take cs_lock, they would deadlock.
+ *
+ * Returning from a nested call does not clear cs_monarch nor release the
+ * slaves.
+ *
+ * If a monarch gets the lock and cs_notify_chain_owner is not the current
+ * cpu then another cpu is already running a notify_chain. This monarch must
+ * back off and wait for the other cpu to finish running its notify_chain.
+ *
+ * Returning from a notify_chain call clears cs_monarch but does not release
+ * the slaves. Instead the slaves loop inside this code, in the expectation
+ * that another notify_chain driven routine will call crash_stop and will
+ * need the slaves. Unlike a nested call, the slaves will use the supplied
+ * callback for each entry on the notify_chain that calls crash_stop().
+ *
+ * If cs_notify_chain_owner is already set to the monarch cpu on entry to a
+ * notify_chain then ignore the use of the chain. Any calls to crash_stop()
+ * from entries on the chain will be treated as nested calls.
+ *
+ * Why the difference between nested calls and a notify_chain? Mainly because
+ * the entries on a notify_chain are defined to be separate, also crash_stop
+ * can easily detect the start and end of running the chain. With a nested
+ * call, there is no way to tell if the first callback will use crash_stop() a
+ * second time. Nested calls can result from explicit calls to other debug
+ * style code or from an error in the current callback. On a nested call, the
+ * monarch callback owns and controls the slaves, they are out of crash_stop()
+ * control. Only the monarch callback can release the slaves by leaving
+ * crash_stop() state, at which point the second call to crash_stop is not a
+ * nested call.
+ */
+
+/* FIXME (maybe): There is a possible deadlock scenario:
+ *
+ * A monarch cpu calls crash_stop().
+ * All the slave cpus are put into crash_stop() state.
+ * One of the slaves gets a non-maskable interrupt - where from?
+ * The slave calls crash_stop() and spins waiting for cs_lock.
+ * The monarch exits and waits for all the slaves to exit.
+ * The slave that took NMI will not exit until cs_lock is free.
+ * The monarch will not free cs_lock until all the slaves exit.
+ *
+ * This deadlock can only occur if some external hardware generates an NMI and
+ * that NMI is sent to slave cpus instead of the monarch. Until that situation
+ * can be demonstrated (and any workaround can be tested), I am going to ignore
+ * this scenario - KAO.
+ */
+
+/* Danger - Here there be races and compiler/hardware reordering gotchas.
+ *
+ * This code relies on variables that must be set on one cpu and seen on other
+ * cpus in the right order. Both the compiler and the hardware can reorder
+ * operations, so use memory barriers when necessary.
+ *
+ * The biggest problem is that the compiler does not know about the other cpus,
+ * so the compiler may incorrectly think that an operation on this cpu has no
+ * side effects and may move the operation or even optimize it away. To be on
+ * the safe side and to document the ordering requirements, barriers have been
+ * used wherever there is even the remote possibility of a current or future
+ * compiler being too smart for its own good. Look for 'barrier:' comments.
+ *
+ * Obviously calls to spin_lock/spin_unlock are already barriers. Only the
+ * additional barrier operations are commented.
+ */
+
+#include <linux/cpumask.h>
+#include <linux/delay.h>
+#include <linux/crash_stop.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/ptrace.h>
+#include <linux/nmi.h>
+#include <linux/spinlock.h>
+#include <linux/threads.h>
+
+static DEFINE_SPINLOCK(cs_lock);
+static int cs_lock_owner = -1;
+static int cs_monarch = -1;
+static int cs_monarch_depth;
+static int cs_notify_chain_owner = -1;
+static int cs_notify_chain_depth;
+static int cs_notify_chain_ended;
+static int cs_leaving;
+static atomic_t cs_common_cpu_slaves;
+
+static int cs_recovered;
+
+static cpumask_t cs_cpu_mask, cs_sent_ipi, cs_sent_nmi;
+
+static struct crash_stop_running_process crash_stop_running_process[NR_CPUS];
+
+struct cs_global cs_global;
+
+/* Use a local version of mdelay because RedHat patch their kernel to give a
+ * warning when mdelay is used with interrupts disabled. Why do RedHat do
+ * these silly things, have they never heard of debugging?
+ */
+static void
+cs_mdelay(int ms)
+{
+ while (ms > 0) {
+ touch_nmi_watchdog();
+ udelay(1000); /* barrier: udelay -> cpu_relax -> barrier */
+ --ms;
+ }
+}
+
+static void
+cs_cpu_relax_watchdog(void)
+{
+ touch_nmi_watchdog();
+ cpu_relax(); /* barrier: cpu_relax -> barrier */
+}
+
+/* barrier: On some architectures, set_mb() uses xchg so it only works on 1, 2,
+ * 4 or 8 byte quantities. This violates Documentation/memory-barriers.txt
+ * which implies that set_mb can be used on any data type. The only
+ * requirement is that their be a memory barrier after assigning the value, so
+ * define our own version that uses generic operations.
+ */
+#define cs_set_mb(var, value) do { (var) = (value); mb(); } while (0)
+
+/* If we cannot safely use an external print routine then save any messages in
+ * a local buffer, allowing 256 bytes of messages per cpu. This code is not
+ * performance sensitive so we take the time to left justify the entire buffer
+ * instead of using ring pointers, this removes the need for users to cope with
+ * wrapped cs_msg text when analysing a crash_stopped kernel.
+ */
+
+static char cs_msg[256*NR_CPUS];
+static DEFINE_SPINLOCK(cs_msg_lock);
+static int cs_msg_lock_owner = -1;
+
+static asmlinkage int
+cs_printk(const char * fmt, ...)
+{
+ int l, ret, shift;
+ va_list ap;
+ /* If we get NMI'd during this code then discard any messages for the
+ * nested event. Either way we lose some messages and it is far easier
+ * (and safer) to discard the nested messages.
+ */
+ if (cs_msg_lock_owner == smp_processor_id())
+ return 0;
+ spin_lock(&cs_msg_lock);
+ /* barrier: setting cs_msg_lock_owner must not move down */
+ set_mb(cs_msg_lock_owner, smp_processor_id());
+ l = strlen(cs_msg);
+ while (1) {
+ va_start(ap, fmt);
+ ret = vsnprintf(cs_msg+l, sizeof(cs_msg)-l, fmt, ap);
+ va_end(ap);
+ if (l == 0 || ret < sizeof(cs_msg)-l)
+ break;
+ shift = sizeof(cs_msg) / 10;
+ shift = max(shift, ret);
+ shift = min(shift, l);
+ l -= shift;
+ memcpy(cs_msg, cs_msg+shift, l);
+ memset(cs_msg+l, 0, sizeof(cs_msg)-l);
+ }
+ /* barrier: clearing cs_msg_lock_owner must not move up */
+ barrier();
+ cs_msg_lock_owner = -1;
+ spin_unlock(&cs_msg_lock);
+ return ret;
+}
+
+/* At the start of a notify_chain, all cpus are driven into this routine, via
+ * cs_common_cpu(). It is a dummy callback, cs_common_cpu() takes care of
+ * holding the slave cpus until the end of the notify_chain.
+ */
+static void
+cs_notify_callback(int monarch, void *data)
+{
+}
+
+/* Called by the arch specific crash_stop code, when they see a notify_chain()
+ * event that debug style code might care about.
+ */
+void
+cs_notify_chain_start(struct pt_regs *regs)
+{
+ int cpu = smp_processor_id();
+ WARN_ON(!irqs_disabled());
+ while (cs_leaving)
+ cs_cpu_relax_watchdog();
+ if (cs_lock_owner == cpu || cs_notify_chain_owner == cpu) {
+ /* This cpu is already the crash_stop monarch, so the slaves
+ * are already stopped. Ignore the fact that we are being
+ * called from a notify_chain, instead any calls to crash_stop
+ * from the chain will be treated as nested calls.
+ */
+ ++cs_notify_chain_depth;
+ return;
+ }
+retry:
+ spin_lock(&cs_lock);
+ if (cs_notify_chain_owner >= 0) {
+ /* another monarch is running a notify_chain, back off */
+ spin_unlock(&cs_lock);
+ cs_mdelay(1);
+ goto retry;
+ }
+ set_mb(cs_lock_owner, cpu);
+ set_mb(cs_notify_chain_owner, cpu);
+ set_mb(cs_lock_owner, -1);
+ spin_unlock(&cs_lock);
+ crash_stop(cs_notify_callback, NULL, NULL, regs, __FUNCTION__);
+}
+
+/* Called by the arch specific crash_stop code, when they reach the end of a
+ * notify_chain() event that debug style code might care about. It is also
+ * called by notifier_call_chain() when it does an early termination of the
+ * chain, that call is required because the arch code will now not be called
+ * for the end of the chain.  For the latter case, do not assume that interrupts
+ * are disabled. Which in turn means using raw_smp_processor_id() to check if
+ * this cpu is running a notify_chain or not.
+ */
+void
+cs_notify_chain_end(void)
+{
+ int cpu = raw_smp_processor_id();
+ while (cs_leaving)
+ cs_cpu_relax_watchdog();
+ if (cs_lock_owner == cpu || cs_notify_chain_owner == cpu) {
+ WARN_ON(!irqs_disabled());
+ if (cs_notify_chain_depth) {
+ /* end of a nested chain */
+ --cs_notify_chain_depth;
+ return;
+ }
+ spin_lock(&cs_lock);
+ set_mb(cs_lock_owner, cpu);
+ /* barrier: setting cs_notify_chain_ended must not move down */
+ set_mb(cs_notify_chain_ended, 1);
+ while (atomic_read(&cs_common_cpu_slaves))
+ cs_cpu_relax_watchdog();
+ set_mb(cs_notify_chain_ended, 0);
+ set_mb(cs_notify_chain_owner, -1);
+ set_mb(cs_lock_owner, -1);
+ spin_unlock(&cs_lock);
+ }
+}
+
+static void
+cs_online_cpu_status(const char *text)
+{
+#ifdef CONFIG_SMP
+ int slaves = num_online_cpus() - 1, count = 0, cpu, unknown;
+ if (!slaves)
+ return;
+ for_each_online_cpu(cpu) {
+ if (cpu_isset(cpu, cs_cpu_mask) &&
+ cpu != smp_processor_id())
+ ++count;
+ }
+ unknown = slaves - count;
+ if (unknown == 0)
+ cs_global.print(
+ "All cpus are in crash_stop for %s\n", text);
+ else {
+ int first_print = 1, start = -1, stop = -1;
+ cs_global.print("%d cpu%s ",
+ unknown,
+ unknown == 1 ? "" : "s");
+ for (cpu = 0; cpu <= NR_CPUS; ++cpu) {
+ if (cpu == NR_CPUS ||
+ !cpu_online(cpu) ||
+ cpu_isset(cpu, cs_cpu_mask) ||
+ cpu == smp_processor_id())
+ stop = cpu;
+ if (stop >= 0 && start >= 0) {
+ if (first_print) {
+ cs_global.print("(");
+ first_print = 0;
+ } else {
+ cs_global.print(", ");
+ }
+ cs_global.print("%d", start);
+ if (stop - 1 > start)
+ cs_global.print("-%d", stop - 1);
+ stop = -1;
+ start = -1;
+ }
+ if (cpu < NR_CPUS &&
+ cpu_online(cpu) &&
+ !cpu_isset(cpu, cs_cpu_mask) &&
+ cpu != smp_processor_id() &&
+ start < 0)
+ start = cpu;
+ }
+ cs_global.print(") %s not in crash_stop for %s, %s state is unknown\n",
+ unknown == 1 ? "is" : "are",
+ text,
+ unknown == 1 ? "its" : "their");
+ }
+#else /* !CONFIG_SMP */
+ cs_global.print(
+ "All cpus are in crash_stop for %s\n", text);
+#endif /* CONFIG_SMP */
+}
+
+#ifdef CONFIG_SMP
+/* Should only be called by the arch interrupt handlers, when the slave cpus
+ * receive the crash_stop specific IPI.
+ */
+void
+cs_common_ipi(void)
+{
+ while (cs_leaving)
+ cs_cpu_relax_watchdog();
+ if (!cs_global.callback) {
+ printk(KERN_DEBUG "Ignoring late cs_ipi on cpu %d\n",
+ smp_processor_id());
+ return;
+ }
+ crash_stop_slave();
+}
+
+/* Should only be called by the arch specific NMI handlers, to see if this NMI
+ * is for crash_stop or for something else. On most architectures, an NMI
+ * signal carries no state so we have to maintain an external state to indicate
+ * why it was sent.
+ *
+ * Note: this function is only valid when a slave is entering crash_stop()
+ * state. Due to races between the time that the monarch releases a slave and
+ * the slave actually exiting, it is not safe to call this routine while a
+ * slave is leaving. It is up to the calling code to save the state of
+ * crash_stop_sent_nmi() on entry if they need to test it on exit.
+ */
+int
+crash_stop_sent_nmi(void)
+{
+ return cpu_isset(smp_processor_id(), cs_sent_nmi);
+}
+#endif /* CONFIG_SMP */
+
+/* Should only be called by the arch specific crash_stop code, after they have
+ * saved any arch specific state. The call chain is :-
+ *
+ * crash_stop() [monarch] or cs_common_ipi() [slave] ->
+ * crash_stop_slave() [common front end code] ->
+ * cs_arch_cpu() [arch dependent code] ->
+ * cs_common_cpu() [common back end code] ->
+ * external crash_stop callback
+ *
+ * When cs_common_cpu() is entered for a slave cpu, it must spin while
+ * cs_monarch < 0. That enforces the order of slave callbacks first, then
+ * monarch callback.
+ *
+ * When handling a notify_chain, park the slave cpus in this holding routine
+ * while the monarch cpu runs down the notify_chain. If any entry on the
+ * notify_chain calls crash_stop_slave() then release the slaves to the
+ * corresponding crash_stop callback. On return from the callback, put them
+ * back in a holding loop. The state of the slave cpus is not significantly
+ * changed by this process and each caller of crash_stop_slave() gets the same
+ * data in crash_stop_running_process. IOW, all entries on the notify_chain
+ * see the state that was saved by the first crash_stop entry on the chain, not
+ * some state that changes as the monarch runs the notify_chain.
+ */
+void
+cs_common_cpu(int monarch)
+{
+ if (monarch) {
+ if (!cs_recovered) {
+ ++cs_monarch_depth;
+ cs_global.callback(1, cs_global.data);
+ --cs_monarch_depth;
+ }
+ return;
+ }
+ atomic_inc(&cs_common_cpu_slaves);
+ do {
+ /* slaves wait until the monarch enters */
+ while (cs_monarch < 0 && !cs_notify_chain_ended)
+ cs_cpu_relax_watchdog();
+ if (cs_notify_chain_ended)
+ break;
+ if (!cs_recovered)
+ cs_global.callback(0, cs_global.data);
+ /* slaves wait until the monarch leaves */
+ while (cs_monarch >= 0)
+ cs_cpu_relax_watchdog();
+ } while (cs_notify_chain_owner >= 0);
+ atomic_dec(&cs_common_cpu_slaves);
+}
+
+#ifdef CONFIG_SMP
+/* The monarch has to wait for the slaves to enter crash_stop state. Wait for
+ * up to 3 seconds plus an extra 100 ms per online cpu to cope with live lock
+ * on systems with large cpu counts. These are arbitrary numbers, it might be
+ * worth exposing them as /sys values so sites can tune their debugging.
+ * Review this after we have more experience with this code - KAO.
+ */
+static void
+cs_wait_for_cpus(void)
+{
+ int count, prev_count = 0, sent_nmi = 0, t, wait_secs, slaves, cpu;
+ slaves = num_online_cpus() - 1;
+ wait_secs = 3 + (slaves * 100) / 1000;
+ cs_mdelay(100);
+ for (t = 0; t < wait_secs; ++t) {
+ count = 0;
+ slaves = num_online_cpus() - 1;
+ for_each_online_cpu(cpu) {
+ if (cpu_isset(cpu, cs_cpu_mask))
+ ++count;
+ }
+ if (count == slaves)
+ break;
+ if (prev_count != count) {
+ cs_global.print(
+ " %d out of %d cpus in crash_stop, "
+ "waiting for the rest, timeout in %d "
+ "second(s)\n",
+ count+1, slaves+1, wait_secs-t);
+ prev_count = count;
+ }
+ cs_mdelay(1000);
+ if (!sent_nmi && t == min(wait_secs / 2, 5)) {
+ for_each_online_cpu(cpu) {
+ if (cpu_isset(cpu, cs_cpu_mask) ||
+ cpu_isset(cpu, cs_sent_nmi) ||
+ cpu == smp_processor_id())
+ continue;
+ if (!sent_nmi) {
+ cs_global.print(" sending NMI ");
+ sent_nmi = 1;
+ }
+ cpu_set(cpu, cs_sent_nmi);
+ smp_wmb();
+ cs_arch_send_nmi(cpu);
+ }
+ }
+ if (t % 4 == 0)
+ cs_global.print(".");
+ }
+}
+#endif /* CONFIG_SMP */
+
+static void
+cs_stop_the_slaves(void)
+{
+#ifdef CONFIG_SMP
+ int sent_ipi = 0, cpu;
+ for_each_online_cpu(cpu) {
+ if (cpu_isset(cpu, cs_cpu_mask) ||
+ cpu_isset(cpu, cs_sent_ipi) ||
+ cpu == smp_processor_id())
+ continue;
+ cpu_set(cpu, cs_sent_ipi);
+ cs_arch_send_ipi(cpu);
+ sent_ipi = 1;
+ }
+ if (sent_ipi)
+ cs_wait_for_cpus();
+#endif /* CONFIG_SMP */
+}
+
+/**
+ * cs_cpu: - Put the current cpu into crash_stop state.
+ * @monarch: 0 for a slave cpu, 1 for the monarch cpu.
+ *
+ * Invoked on every cpu that is being stopped, with no externally defined order
+ * between monarch and slaves. The arch independent running state is saved
+ * here, then cs_arch_cpu() saves any arch specific state, followed by
+ * invocation of cs_common_cpu() which drives the callback routine.
+ */
+static void
+cs_cpu(int monarch)
+{
+ struct crash_stop_running_process *r, prev;
+ int cpu = smp_processor_id();
+ cpu_set(cpu, cs_cpu_mask);
+ r = crash_stop_running_process + cpu;
+ prev = *r;
+ r->p = current;
+ r->regs = get_irq_regs();
+ r->prev = &prev;
+ if (!prev.p) {
+ if (monarch) {
+ /* Top level call to crash_stop(). Delay 10 ms to give
+ * the slave callbacks (see cs_common_cpu()) a chance
+ * to get started before running the callback on the
+ * monarch.
+ */
+ set_mb(cs_monarch, cpu);
+ cs_mdelay(10);
+ }
+ }
+ cs_arch_cpu(monarch, r);
+ *r = prev;
+ if (!prev.p) {
+ if (monarch) {
+ set_mb(cs_leaving, 1);
+ set_mb(cs_monarch, -1);
+ }
+ cpu_clear(cpu, cs_sent_ipi);
+ cpu_clear(cpu, cs_sent_nmi);
+ /* barrier: cs_cpu_mask functions as the main filter
+ * for the state of the cpus, flush preceding updates
+ * to memory before clearing cs_cpu_mask.
+ */
+ smp_mb__before_clear_bit();
+ cpu_clear(cpu, cs_cpu_mask);
+ }
+}
+
+#ifdef CONFIG_SMP
+/* crash_stop_slave: - Put the current slave cpu into crash_stop state. */
+void
+crash_stop_slave(void)
+{
+ while (cs_leaving)
+ cs_cpu_relax_watchdog();
+ cs_cpu(0);
+}
+#endif /* CONFIG_SMP */
+
+/**
+ * crash_stop: - Bring the system to a crash stop for debugging.
+ * @callback: After each cpu has been interrupted, the callback is invoked on
+ * that cpu, with the monarch flag set to 0. After all cpus have responded or
+ * the timeout has been reached then the callback is invoked on the current cpu
+ * with the monarch flag set to 1.
+ * @data: Callback specific data, crash_stop does not use this data.
+ * @print: Optionally, the name of a debugger specific print routine. If this
+ * is NULL then crash_stop will default to using cs_printk(), messages will be
+ * left justified in cs_msg[].
+ *
+ * Unlike stop_machine(), crash_stop() does not ask if the other cpus are
+ * ready to be stopped and will use non-maskable interrupts to stop cpus that
+ * do not respond after a few seconds.
+ *
+ * crash_stop() must be entered with interrupts disabled, it can even be
+ * entered from an NMI event. It is the caller's responsibility to ensure that
+ * their print routine (if any) is safe in the current context.
+ *
+ * If the system has already entered a globally stopped state then sending IPI
+ * or NMI is pointless and may even be unsafe. This particularly applies to
+ * MCA or global INIT on IA64, these events are already defined to stop the
+ * entire machine and they also prevent crash_stop() from sending any IPI or
+ * NMI events. Only send IPI/NMI to cpus that are not yet in crash_stop state.
+ *
+ * The global structure crash_stop_running_process is updated with information
+ * about the tasks that are running on each cpu. The debugger can use this
+ * information to start the analysis of the running tasks.
+ *
+ * This function cannot assume that the caller has already saved the pt_regs,
+ * so it does it anyway. Some callers (e.g. oops) will have called
+ * set_irq_regs(), others (e.g. NMI watchdog) will not.
+ *
+ * Returns: 0 normal
+ * -ENOSYS crash_stop is not supported on this architecture.
+ */
+
+int
+crash_stop(void (*callback)(int monarch, void *data),
+ void *data, printk_t print,
+ struct pt_regs *regs, const char *text)
+{
+ int cpu;
+ struct cs_global csg_save, csg = {
+ .callback = callback,
+ .data = data,
+ .print = print ? print : cs_printk,
+ };
+ struct pt_regs *old_regs;
+
+ WARN_ON(!irqs_disabled());
+retry:
+ if (!spin_trylock(&cs_lock)) {
+ if (cs_lock_owner == smp_processor_id()) {
+ /* nested call on the same cpu */
+ csg_save = cs_global;
+ cs_set_mb(cs_global, csg);
+ cs_online_cpu_status(text);
+ cs_cpu(1);
+ cs_set_mb(cs_global, csg_save);
+ return 0;
+ }
+ /* concurrent call on another cpu */
+ while (cs_lock_owner != -1)
+ cs_cpu_relax_watchdog();
+ goto retry;
+ }
+
+ if (cs_leaving) {
+ /* previous crash stop has not quite completed, back off */
+ spin_unlock(&cs_lock);
+ cs_mdelay(1);
+ goto retry;
+ }
+
+ if (cs_notify_chain_owner >= 0 &&
+ cs_notify_chain_owner != smp_processor_id()) {
+ /* another cpu is running a notify_chain, back off */
+ spin_unlock(&cs_lock);
+ cs_mdelay(1);
+ goto retry;
+ }
+
+ set_mb(cs_lock_owner, smp_processor_id());
+ old_regs = set_irq_regs(regs);
+ csg_save = cs_global;
+ cs_set_mb(cs_global, csg);
+ cs_stop_the_slaves();
+ cs_online_cpu_status(text);
+ cs_cpu(1);
+ set_mb(cs_leaving, 1);
+ if (cs_monarch < 0 && cs_notify_chain_owner < 0) {
+ /* leaving a normal call, wait for the slaves to exit */
+ for_each_online_cpu(cpu) {
+ while (cpu_isset(cpu, cs_cpu_mask))
+ cs_cpu_relax_watchdog();
+ }
+ }
+ cs_set_mb(cs_global, csg_save);
+ set_mb(cs_lock_owner, -1);
+ set_irq_regs(old_regs);
+ spin_unlock(&cs_lock);
+ set_mb(cs_leaving, 0);
+ return 0;
+}
+
+/**
+ * crash_stop_recovered: - Release any slaves in crash_stop state.
+ *
+ * On architectures that define their own global synchronization methods, the
+ * slave cpus may enter crash_stop state before the monarch. If the monarch
+ * decides that the event is recoverable then the slaves need to be released
+ * from crash_stop, without invoking any callbacks.
+ *
+ * For recovered events, we do not always force the other cpus into slave
+ * state. The assumption is that crash_stop_recovered() is only required on
+ * architectures that define their own global synchronization methods (e.g.
+ * IA64 MCA), in which case the architecture has already taken care of the
+ * slaves. If no slave cpu is in crash_stop() state then do nothing, otherwise
+ * wait until all the slaves are in crash_stop().
+ *
+ * If the code that calls crash_stop_recovered() is in a notify_chain then the
+ * caller must call cs_notify_chain_end() before crash_stop_recovered().
+ * Calling this function when this cpu is the notify_chain owner is assumed to
+ * be a nested call and it is silently ignored. IOW it is a recovery from a
+ * nested event and we want to hold the slaves until we exit from the top level
+ * of crash_stop code.
+ */
+void
+crash_stop_recovered(void)
+{
+ int cpu, any_slaves = 0;
+
+ WARN_ON(!irqs_disabled());
+ while (cs_leaving)
+ cs_cpu_relax_watchdog();
+ if (cs_notify_chain_owner >= 0 &&
+ cs_notify_chain_owner == smp_processor_id())
+ return;
+retry:
+ spin_lock(&cs_lock);
+ if (cs_notify_chain_owner >= 0) {
+ /* another cpu is running a notify_chain, back off */
+ spin_unlock(&cs_lock);
+ cs_mdelay(1);
+ goto retry;
+ }
+ set_mb(cs_lock_owner, smp_processor_id());
+ for_each_online_cpu(cpu) {
+ if (cpu_isset(cpu, cs_cpu_mask) &&
+ cpu != smp_processor_id()) {
+ any_slaves = 1;
+ break;
+ }
+ }
+ if (any_slaves) {
+ /* give cs_stop_the_slaves/cs_wait_for_cpus a safe print
+ * routine.
+ */
+ struct cs_global csg_save, csg = {
+ .print = cs_printk,
+ };
+ csg_save = cs_global;
+ cs_set_mb(cs_global, csg);
+ cs_stop_the_slaves();
+ cs_set_mb(cs_global, csg_save);
+ }
+ set_mb(cs_recovered, 1);
+ set_mb(cs_monarch, smp_processor_id());
+ for_each_online_cpu(cpu) {
+ while (cpu_isset(cpu, cs_cpu_mask))
+ cs_cpu_relax_watchdog();
+ }
+ set_mb(cs_recovered, 0);
+ set_mb(cs_monarch, -1);
+ set_mb(cs_lock_owner, -1);
+ spin_unlock(&cs_lock);
+ return;
+}
+
+/**
+ * crash_stop_slaves: - Return the number of slave cpus that the user will see.
+ *
+ * For a non-nested call, the user will see all the cpus that are in crash_stop
+ * state. For a nested call, the user will not see any slave cpus.
+ */
+int
+crash_stop_slaves(void)
+{
+ if (cs_monarch_depth == 1)
+ return atomic_read(&cs_common_cpu_slaves);
+ else
+ return 0;
+}
Index: linux/kernel/sys.c
===================================================================
--- linux.orig/kernel/sys.c
+++ linux/kernel/sys.c
@@ -29,6 +29,7 @@
#include <linux/signal.h>
#include <linux/cn_proc.h>
#include <linux/getcpu.h>
+#include <linux/crash_stop.h>
#include <linux/compat.h>
#include <linux/syscalls.h>
@@ -144,8 +145,10 @@ static int __kprobes notifier_call_chain
while (nb) {
next_nb = rcu_dereference(nb->next);
ret = nb->notifier_call(nb, val, v);
- if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
+ if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK) {
+ cs_notify_chain_end();
break;
+ }
nb = next_nb;
}
return ret;
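To make scenario (4) above concrete, here is a hypothetical notify_die
entry in a debug style tool. Because the arch crash_stop hooks bracket
the chain, the slaves are already parked when this entry runs; its
call to crash_stop() simply drives the tool's callback on the monarch
and on the parked slaves (mydbg_* names are invented for
illustration).

    /* Hypothetical notify_die entry in a debug style tool. */
    static int mydbg_die_notify(struct notifier_block *self,
                                unsigned long val, void *data)
    {
            struct die_args *args = data;

            if (val == DIE_OOPS)
                    /* Treated by crash_stop as part of the current
                     * notify_chain: the supplied callback runs on the
                     * monarch and the already-stopped slaves.
                     */
                    crash_stop(mydbg_callback, NULL, NULL,
                               args->regs, "mydbg oops");
            return NOTIFY_OK;
    }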
* [patch 2.6.19-rc5 3/12] crash_stop: i386 interrupt handlers
From: Keith Owens @ 2006-11-09 4:04 UTC
To: linux-arch; +Cc: Keith Owens
Define the i386 crash_stop() interrupt handler and associated routines.
CRASH_STOP_VECTOR conflicts with the lkcd vector. This is deliberate;
one aim of the crash_stop() API is to remove all the interrupt code
from the various kernel debug patches and use the common crash_stop
code instead.
Note 1: I have patched visws but I cannot test it (no hardware).
Note 2: This patch does not cover i386 voyager. I do not understand
the voyager interrupt mechanism well enough to define its
interrupt handlers. Somebody with voyager equipment needs to
define the cs_arch_send_ipi() and cs_arch_send_nmi() routines,
plus the gate for the voyager CRASH_STOP_VECTOR.
Signed-off-by: Keith Owens <kaos@sgi.com>
---
arch/i386/kernel/smp.c | 23 +++++++++++++++++++++++
arch/i386/kernel/smpboot.c | 4 ++++
include/asm-i386/hw_irq.h | 1 +
include/asm-i386/mach-default/entry_arch.h | 3 +++
include/asm-i386/mach-default/irq_vectors.h | 1 +
include/asm-i386/mach-visws/entry_arch.h | 3 +++
6 files changed, 35 insertions(+)
Index: linux/arch/i386/kernel/smp.c
===================================================================
--- linux.orig/arch/i386/kernel/smp.c
+++ linux/arch/i386/kernel/smp.c
@@ -20,6 +20,7 @@
#include <linux/interrupt.h>
#include <linux/cpu.h>
#include <linux/module.h>
+#include <linux/crash_stop.h>
#include <asm/mtrr.h>
#include <asm/tlbflush.h>
@@ -733,3 +734,25 @@ int safe_smp_processor_id(void)
return cpuid >= 0 ? cpuid : 0;
}
+
+#ifdef CONFIG_CRASH_STOP_SUPPORTED
+void
+cs_arch_send_ipi(int cpu)
+{
+ send_IPI_mask(cpumask_of_cpu(cpu), CRASH_STOP_VECTOR);
+}
+
+void
+cs_arch_send_nmi(int cpu)
+{
+ send_IPI_mask(cpumask_of_cpu(cpu), NMI_VECTOR);
+}
+
+fastcall void smp_crash_stop_interrupt(struct pt_regs *regs)
+{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+ ack_APIC_irq();
+ cs_common_ipi();
+ set_irq_regs(old_regs);
+}
+#endif /* CONFIG_CRASH_STOP_SUPPORTED */
Index: linux/arch/i386/kernel/smpboot.c
===================================================================
--- linux.orig/arch/i386/kernel/smpboot.c
+++ linux/arch/i386/kernel/smpboot.c
@@ -1497,6 +1497,10 @@ void __init smp_intr_init(void)
/* IPI for generic function call */
set_intr_gate(CALL_FUNCTION_VECTOR, call_function_interrupt);
+
+#ifdef CONFIG_CRASH_STOP_SUPPORTED
+ set_intr_gate(CRASH_STOP_VECTOR, crash_stop_interrupt);
+#endif /* CONFIG_CRASH_STOP_SUPPORTED */
}
/*
Index: linux/include/asm-i386/hw_irq.h
===================================================================
--- linux.orig/include/asm-i386/hw_irq.h
+++ linux/include/asm-i386/hw_irq.h
@@ -32,6 +32,7 @@ extern void (*interrupt[NR_IRQS])(void);
fastcall void reschedule_interrupt(void);
fastcall void invalidate_interrupt(void);
fastcall void call_function_interrupt(void);
+fastcall void crash_stop_interrupt(void);
#endif
#ifdef CONFIG_X86_LOCAL_APIC
Index: linux/include/asm-i386/mach-default/entry_arch.h
===================================================================
--- linux.orig/include/asm-i386/mach-default/entry_arch.h
+++ linux/include/asm-i386/mach-default/entry_arch.h
@@ -13,6 +13,9 @@
BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
BUILD_INTERRUPT(invalidate_interrupt,INVALIDATE_TLB_VECTOR)
BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
+#ifdef CONFIG_CRASH_STOP_SUPPORTED
+BUILD_INTERRUPT(crash_stop_interrupt,CRASH_STOP_VECTOR)
+#endif /* CONFIG_CRASH_STOP_SUPPORTED */
#endif
/*
Index: linux/include/asm-i386/mach-default/irq_vectors.h
===================================================================
--- linux.orig/include/asm-i386/mach-default/irq_vectors.h
+++ linux/include/asm-i386/mach-default/irq_vectors.h
@@ -48,6 +48,7 @@
#define INVALIDATE_TLB_VECTOR 0xfd
#define RESCHEDULE_VECTOR 0xfc
#define CALL_FUNCTION_VECTOR 0xfb
+#define CRASH_STOP_VECTOR 0xfa
#define THERMAL_APIC_VECTOR 0xf0
/*
Index: linux/include/asm-i386/mach-visws/entry_arch.h
===================================================================
--- linux.orig/include/asm-i386/mach-visws/entry_arch.h
+++ linux/include/asm-i386/mach-visws/entry_arch.h
@@ -7,6 +7,9 @@
BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
BUILD_INTERRUPT(invalidate_interrupt,INVALIDATE_TLB_VECTOR)
BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
+#ifdef CONFIG_CRASH_STOP_SUPPORTED
+BUILD_INTERRUPT(crash_stop_interrupt,CRASH_STOP_VECTOR)
+#endif /* CONFIG_CRASH_STOP_SUPPORTED */
#endif
/*
* [patch 2.6.19-rc5 4/12] crash_stop: i386 specific code
From: Keith Owens @ 2006-11-09 4:04 UTC
To: linux-arch; +Cc: Keith Owens
Add the i386 specific crash_stop code. This contains routines that are
called from the common crash_stop code and from the i386 notify_die
chain.
Note: Not tested on visws or voyager (no hardware).
Signed-off-by: Keith Owens <kaos@sgi.com>
---
arch/i386/kernel/Makefile | 1
arch/i386/kernel/crash_stop.c | 127 ++++++++++++++++++++++++++++++++++++++++++
include/asm-i386/crash_stop.h | 14 ++++
3 files changed, 142 insertions(+)
Index: linux/arch/i386/kernel/Makefile
===================================================================
--- linux.orig/arch/i386/kernel/Makefile
+++ linux/arch/i386/kernel/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_VM86) += vm86.o
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
obj-$(CONFIG_HPET_TIMER) += hpet.o
obj-$(CONFIG_K8_NB) += k8.o
+obj-$(CONFIG_CRASH_STOP_SUPPORTED) += crash_stop.o
EXTRA_AFLAGS := -traditional
Index: linux/arch/i386/kernel/crash_stop.c
===================================================================
--- /dev/null
+++ linux/arch/i386/kernel/crash_stop.c
@@ -0,0 +1,127 @@
+/*
+ * linux/arch/i386/kernel/crash_stop.c
+ *
+ * Copyright (C) 2006 Keith Owens <kaos@sgi.com>
+ *
+ * The i386 specific bits of the crash_stop code. There is a little bit of
+ * crash_stop code in arch/i386/kernel/{smp,smpboot}.c to handle
+ * CRASH_STOP_VECTOR, everything else is in this file.
+ */
+
+#include <linux/crash_stop.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/ptrace.h>
+#include <asm/kdebug.h>
+
+/* The starting point for a backtrace on a running process is set to
+ * cs_arch_cpu(). It would be nice to go up a couple of levels to start the
+ * backtrace where crash_stop() or cs_common_ipi() were invoked, but that is
+ * more work than I am willing to do in this function. Starting the backtrace
+ * at cs_arch_cpu() is simple and reliable, which is exactly what we want when
+ * the machine is already failing.
+ */
+void
+cs_arch_cpu(int monarch, struct crash_stop_running_process *r)
+{
+ r->arch.esp = current_stack_pointer;
+ r->arch.eip = (unsigned long)current_text_addr();
+ /* separate any stack changes from current_stack_pointer above */
+ barrier();
+ cs_common_cpu(monarch);
+}
+
+/* Called at the start of a notify_chain. */
+static int
+cs_arch_notify_start(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = data;
+ switch(val) {
+ case DIE_OOPS:
+ case DIE_NMIWATCHDOG:
+ cs_notify_chain_start(args->regs);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+/* Called at the end of a notify_chain. */
+static int
+cs_arch_notify_end(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ switch(val) {
+ case DIE_OOPS:
+ case DIE_NMIWATCHDOG:
+ cs_notify_chain_end();
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+/* Pick up any NMI IPIs that were sent by crash_stop. */
+static int
+cs_arch_notify_nmi(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ switch(val) {
+ case DIE_NMI_IPI:
+ if (crash_stop_sent_nmi()) {
+ crash_stop_slave();
+ return NOTIFY_STOP;
+ }
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block cs_arch_nb_start = {
+ .notifier_call = cs_arch_notify_start,
+ .priority = ~0U >> 1,
+};
+
+static struct notifier_block cs_arch_nb_end = {
+ .notifier_call = cs_arch_notify_end,
+ .priority = 1,
+};
+
+static struct notifier_block cs_arch_nb_nmi = {
+ .notifier_call = cs_arch_notify_nmi,
+ .priority = 10,
+};
+
+static int __init
+cs_arch_init(void)
+{
+ int err;
+ const char *nb_name;
+ nb_name = "cs_arch_nb_start";
+ if ((err = register_die_notifier(&cs_arch_nb_start)))
+ goto error;
+ nb_name = "cs_arch_nb_end";
+ if ((err = register_die_notifier(&cs_arch_nb_end)))
+ goto error;
+ nb_name = "cs_arch_nb_nmi";
+ if ((err = register_die_notifier(&cs_arch_nb_nmi)))
+ goto error;
+ return 0;
+error:
+ printk(KERN_ERR "Failed to register %s\n", nb_name);
+ unregister_die_notifier(&cs_arch_nb_start);
+ unregister_die_notifier(&cs_arch_nb_end);
+ unregister_die_notifier(&cs_arch_nb_nmi);
+ return err;
+}
+
+static void __exit
+cs_arch_exit(void)
+{
+ unregister_die_notifier(&cs_arch_nb_nmi);
+ unregister_die_notifier(&cs_arch_nb_start);
+ unregister_die_notifier(&cs_arch_nb_end);
+ return;
+}
+
+module_init(cs_arch_init);
+module_exit(cs_arch_exit);
Index: linux/include/asm-i386/crash_stop.h
===================================================================
--- /dev/null
+++ linux/include/asm-i386/crash_stop.h
@@ -0,0 +1,14 @@
+#ifndef _ASM_CRASH_STOP_H
+#define _ASM_CRASH_STOP_H
+
+/* CONFIG_4KSTACKS means that the registers (including eip) at the time of the
+ * interrupt can be on one stack while the crash_stop code is running on
+ * another stack. We have to save the current esp and eip.
+ */
+struct crash_stop_running_process_arch
+{
+ unsigned long esp;
+ unsigned long eip;
+};
+
+#endif /* _ASM_CRASH_STOP_H */
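A debugger consuming this state would seed its backtrace from the
saved pair, roughly as below. This is a hypothetical sketch: it
assumes the tool can reach the per-cpu crash_stop_running_process
entries (in this posting the array itself is static to
kernel/crash_stop.c, so a real tool would need an accessor or its own
copy saved from the callback).

    /* Hypothetical debugger-side use of the saved i386 state: start
     * the stack walk where crash_stop interrupted this cpu.
     */
    static void mydbg_backtrace(struct crash_stop_running_process *r)
    {
            unsigned long sp = r->arch.esp; /* stack at cs_arch_cpu() */
            unsigned long ip = r->arch.eip; /* text at cs_arch_cpu() */

            /* walk the stack from (ip, sp) with the tool's unwinder;
             * r->regs holds the interrupted context and r->p the task
             * that was running on this cpu.
             */
            printk(KERN_DEBUG "task %p eip %08lx esp %08lx\n",
                   r->p, ip, sp);
    }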
* [patch 2.6.19-rc5 5/12] crash_stop: add DIE_NMIWATCHDOG to x86_64
From: Keith Owens @ 2006-11-09 4:04 UTC
To: linux-arch; +Cc: Keith Owens
x86_64 defines DIE_NMIWATCHDOG but does not use it. Add an x86_64 call to
notify_die when the NMI watchdog is about to go off, as i386 does.
Signed-off-by: Keith Owens <kaos@sgi.com>
---
arch/x86_64/kernel/traps.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
Index: linux/arch/x86_64/kernel/traps.c
===================================================================
--- linux.orig/arch/x86_64/kernel/traps.c
+++ linux/arch/x86_64/kernel/traps.c
@@ -613,7 +613,11 @@ void die(const char * str, struct pt_reg
void __kprobes die_nmi(char *str, struct pt_regs *regs, int do_panic)
{
- unsigned long flags = oops_begin();
+ unsigned long flags;
+ if (notify_die(DIE_NMIWATCHDOG, str, regs, 0, 2, SIGINT) ==
+ NOTIFY_STOP)
+ return;
+ flags = oops_begin();
/*
* We are in trouble anyway, lets at least try
* [patch 2.6.19-rc5 6/12] crash_stop: x86_64 interrupt handlers
From: Keith Owens @ 2006-11-09 4:04 UTC
To: linux-arch; +Cc: Keith Owens
Define the x86_64 crash_stop() interrupt handler and associated routines.
CRASH_STOP_VECTOR conflicts with the KDB vector. This is deliberate;
one aim of the crash_stop() API is to remove all the interrupt code
from the various kernel debug patches and use the common crash_stop
code instead.
Signed-off-by: Keith Owens <kaos@sgi.com>
---
arch/x86_64/kernel/entry.S | 6 ++++++
arch/x86_64/kernel/i8259.c | 4 ++++
arch/x86_64/kernel/smp.c | 23 +++++++++++++++++++++++
include/asm-x86_64/hw_irq.h | 3 +--
4 files changed, 34 insertions(+), 2 deletions(-)
Index: linux/arch/x86_64/kernel/entry.S
===================================================================
--- linux.orig/arch/x86_64/kernel/entry.S
+++ linux/arch/x86_64/kernel/entry.S
@@ -699,6 +699,12 @@ END(invalidate_interrupt\num)
ENTRY(call_function_interrupt)
apicinterrupt CALL_FUNCTION_VECTOR,smp_call_function_interrupt
END(call_function_interrupt)
+
+#ifdef CONFIG_CRASH_STOP_SUPPORTED
+ENTRY(crash_stop_interrupt)
+ apicinterrupt CRASH_STOP_VECTOR,smp_crash_stop_interrupt
+END(crash_stop_interrupt)
+#endif /* CONFIG_CRASH_STOP_SUPPORTED */
#endif
ENTRY(apic_timer_interrupt)
Index: linux/arch/x86_64/kernel/i8259.c
===================================================================
--- linux.orig/arch/x86_64/kernel/i8259.c
+++ linux/arch/x86_64/kernel/i8259.c
@@ -458,6 +458,7 @@ void invalidate_interrupt6(void);
void invalidate_interrupt7(void);
void thermal_interrupt(void);
void threshold_interrupt(void);
+void crash_stop_interrupt(void);
void i8254_timer_resume(void);
static void setup_timer_hardware(void)
@@ -541,6 +542,9 @@ void __init init_IRQ(void)
/* IPI for generic function call */
set_intr_gate(CALL_FUNCTION_VECTOR, call_function_interrupt);
+#ifdef CONFIG_CRASH_STOP_SUPPORTED
+ set_intr_gate(CRASH_STOP_VECTOR, crash_stop_interrupt);
+#endif /* CONFIG_CRASH_STOP_SUPPORTED */
#endif
set_intr_gate(THERMAL_APIC_VECTOR, thermal_interrupt);
set_intr_gate(THRESHOLD_APIC_VECTOR, threshold_interrupt);
Index: linux/arch/x86_64/kernel/smp.c
===================================================================
--- linux.orig/arch/x86_64/kernel/smp.c
+++ linux/arch/x86_64/kernel/smp.c
@@ -19,6 +19,7 @@
#include <linux/kernel_stat.h>
#include <linux/mc146818rtc.h>
#include <linux/interrupt.h>
+#include <linux/crash_stop.h>
#include <asm/mtrr.h>
#include <asm/pgalloc.h>
@@ -522,3 +523,25 @@ asmlinkage void smp_call_function_interr
}
}
+#ifdef CONFIG_CRASH_STOP_SUPPORTED
+void
+cs_arch_send_ipi(int cpu)
+{
+ send_IPI_mask(cpumask_of_cpu(cpu), CRASH_STOP_VECTOR);
+}
+
+void
+cs_arch_send_nmi(int cpu)
+{
+ send_IPI_mask(cpumask_of_cpu(cpu), NMI_VECTOR);
+}
+
+asmlinkage void
+smp_crash_stop_interrupt(struct pt_regs *regs)
+{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+ ack_APIC_irq();
+ cs_common_ipi();
+ set_irq_regs(old_regs);
+}
+#endif /* CONFIG_CRASH_STOP_SUPPORTED */
Index: linux/include/asm-x86_64/hw_irq.h
===================================================================
--- linux.orig/include/asm-x86_64/hw_irq.h
+++ linux/include/asm-x86_64/hw_irq.h
@@ -47,8 +47,7 @@
#define ERROR_APIC_VECTOR 0xfe
#define RESCHEDULE_VECTOR 0xfd
#define CALL_FUNCTION_VECTOR 0xfc
-/* fb free - please don't readd KDB here because it's useless
- (hint - think what a NMI bit does to a vector) */
+#define CRASH_STOP_VECTOR 0xfb
#define THERMAL_APIC_VECTOR 0xfa
#define THRESHOLD_APIC_VECTOR 0xf9
/* f8 free */
* [patch 2.6.19-rc5 7/12] crash_stop: x86_64 specific code
From: Keith Owens @ 2006-11-09 4:04 UTC
To: linux-arch; +Cc: Keith Owens
Add the x86_64 specific crash_stop code. This contains routines that are
called from the common crash_stop code and from the x86_64 notify_die
chain.
Signed-off-by: Keith Owens <kaos@sgi.com>
---
arch/x86_64/kernel/Makefile | 1
arch/x86_64/kernel/crash_stop.c | 128 ++++++++++++++++++++++++++++++++++++++++
include/asm-x86_64/crash_stop.h | 14 ++++
3 files changed, 143 insertions(+)
Index: linux/arch/x86_64/kernel/Makefile
===================================================================
--- linux.orig/arch/x86_64/kernel/Makefile
+++ linux/arch/x86_64/kernel/Makefile
@@ -37,6 +37,7 @@ obj-$(CONFIG_X86_PM_TIMER) += pmtimer.o
obj-$(CONFIG_X86_VSMP) += vsmp.o
obj-$(CONFIG_K8_NB) += k8.o
obj-$(CONFIG_AUDIT) += audit.o
+obj-$(CONFIG_CRASH_STOP_SUPPORTED) += crash_stop.o
obj-$(CONFIG_MODULES) += module.o
obj-$(CONFIG_PCI) += early-quirks.o
Index: linux/arch/x86_64/kernel/crash_stop.c
===================================================================
--- /dev/null
+++ linux/arch/x86_64/kernel/crash_stop.c
@@ -0,0 +1,128 @@
+/*
+ * linux/arch/x86_64/kernel/crash_stop.c
+ *
+ * Copyright (C) 2006 Keith Owens <kaos@sgi.com>
+ *
+ * The x86_64 specific bits of the crash_stop code. There is a little bit of
+ * crash_stop code in arch/x86_64/kernel/{smp,i8259}.c to handle
+ * CRASH_STOP_VECTOR, everything else is in this file.
+ */
+
+#include <linux/crash_stop.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/ptrace.h>
+#include <asm/kdebug.h>
+
+/* The starting point for a backtrace on a running process is set to
+ * cs_arch_cpu(). It would be nice to go up a couple of levels to start the
+ * backtrace where crash_stop() or cs_common_ipi() were invoked, but that is
+ * more work than I am willing to do in this function. Starting the backtrace
+ * at cs_arch_cpu() is simple and reliable, which is exactly what we want when
+ * the machine is already failing.
+ */
+void
+cs_arch_cpu(int monarch, struct crash_stop_running_process *r)
+{
+ register unsigned long current_stack_pointer asm("rsp");
+ r->arch.rsp = current_stack_pointer;
+ r->arch.rip = (unsigned long)current_text_addr();
+ /* separate any stack changes from current_stack_pointer above */
+ barrier();
+ cs_common_cpu(monarch);
+}
+
+/* Called at the start of a notify_chain. */
+static int
+cs_arch_notify_start(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = data;
+ switch(val) {
+ case DIE_OOPS:
+ case DIE_NMIWATCHDOG:
+ cs_notify_chain_start(args->regs);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+/* Called at the end of a notify_chain. */
+static int
+cs_arch_notify_end(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ switch(val) {
+ case DIE_OOPS:
+ case DIE_NMIWATCHDOG:
+ cs_notify_chain_end();
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+/* Pick up any NMI IPIs that were sent by crash_stop. */
+static int
+cs_arch_notify_nmi(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ switch(val) {
+ case DIE_NMI_IPI:
+ if (crash_stop_sent_nmi()) {
+ crash_stop_slave();
+ return NOTIFY_STOP;
+ }
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block cs_arch_nb_start = {
+ .notifier_call = cs_arch_notify_start,
+ .priority = ~0U >> 1,
+};
+
+static struct notifier_block cs_arch_nb_end = {
+ .notifier_call = cs_arch_notify_end,
+ .priority = 1,
+};
+
+static struct notifier_block cs_arch_nb_nmi = {
+ .notifier_call = cs_arch_notify_nmi,
+ .priority = 10,
+};
+
+static int __init
+cs_arch_init(void)
+{
+ int err;
+ const char *nb_name;
+ nb_name = "cs_arch_nb_start";
+ if ((err = register_die_notifier(&cs_arch_nb_start)))
+ goto error;
+ nb_name = "cs_arch_nb_end";
+ if ((err = register_die_notifier(&cs_arch_nb_end)))
+ goto error;
+ nb_name = "cs_arch_nb_nmi";
+ if ((err = register_die_notifier(&cs_arch_nb_nmi)))
+ goto error;
+ return 0;
+error:
+ printk(KERN_ERR "Failed to register %s\n", nb_name);
+ unregister_die_notifier(&cs_arch_nb_start);
+ unregister_die_notifier(&cs_arch_nb_end);
+ unregister_die_notifier(&cs_arch_nb_nmi);
+ return err;
+}
+
+static void __exit
+cs_arch_exit(void)
+{
+ unregister_die_notifier(&cs_arch_nb_nmi);
+ unregister_die_notifier(&cs_arch_nb_start);
+ unregister_die_notifier(&cs_arch_nb_end);
+ return;
+}
+
+module_init(cs_arch_init);
+module_exit(cs_arch_exit);
Index: linux/include/asm-x86_64/crash_stop.h
===================================================================
--- /dev/null
+++ linux/include/asm-x86_64/crash_stop.h
@@ -0,0 +1,14 @@
+#ifndef _ASM_CRASH_STOP_H
+#define _ASM_CRASH_STOP_H
+
+/* x86_64 uses multiple stacks so the registers (including rip) at the time of
+ * the interrupt can be on one stack while the crash_stop code is running on
+ * another stack. We have to save the current rsp and rip.
+ */
+struct crash_stop_running_process_arch
+{
+ unsigned long rsp;
+ unsigned long rip;
+};
+
+#endif /* _ASM_CRASH_STOP_H */
* [patch 2.6.19-rc5 8/12] crash_stop: ia64 interrupt handlers
2006-11-09 4:04 [patch 2.6.19-rc5 0/12] crash_stop: Summary Keith Owens
` (6 preceding siblings ...)
2006-11-09 4:04 ` [patch 2.6.19-rc5 7/12] crash_stop: x86_64 specific code Keith Owens
@ 2006-11-09 4:05 ` Keith Owens
2006-11-09 4:05 ` [patch 2.6.19-rc5 9/12] crash_stop: ia64 specific code Keith Owens
` (4 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Keith Owens @ 2006-11-09 4:05 UTC (permalink / raw)
To: linux-arch; +Cc: Keith Owens
Define the ia64 crash_stop() interrupt handler and associated routines.
IPI_CRASH_STOP conflicts with the KDB vector. This is deliberate: one
aim of the crash_stop() API is to remove all the interrupt code from
the various kernel debug patches and use the common crash_stop code
instead.
Signed-off-by: Keith Owens <kaos@sgi.com>
---
arch/ia64/kernel/smp.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
Index: linux/arch/ia64/kernel/smp.c
===================================================================
--- linux.orig/arch/ia64/kernel/smp.c
+++ linux/arch/ia64/kernel/smp.c
@@ -30,6 +30,7 @@
#include <linux/delay.h>
#include <linux/efi.h>
#include <linux/bitops.h>
+#include <linux/crash_stop.h>
#include <asm/atomic.h>
#include <asm/current.h>
@@ -66,6 +67,7 @@ static volatile struct call_data_struct
#define IPI_CALL_FUNC 0
#define IPI_CPU_STOP 1
+#define IPI_CRASH_STOP 2
/* This needs to be cacheline aligned because it is written to by *other* CPUs. */
static DEFINE_PER_CPU(u64, ipi_operation) ____cacheline_aligned;
@@ -156,6 +158,16 @@ handle_IPI (int irq, void *dev_id)
stop_this_cpu();
break;
+#ifdef CONFIG_CRASH_STOP_SUPPORTED
+ case IPI_CRASH_STOP:
+ {
+ extern void smp_crash_stop_interrupt(
+ struct pt_regs *regs);
+ smp_crash_stop_interrupt(get_irq_regs());
+ }
+ break;
+#endif /* CONFIG_CRASH_STOP_SUPPORTED */
+
default:
printk(KERN_CRIT "Unknown IPI on CPU %d: %lu\n", this_cpu, which);
break;
@@ -379,3 +391,26 @@ setup_profiling_timer (unsigned int mult
{
return -EINVAL;
}
+
+#ifdef CONFIG_CRASH_STOP_SUPPORTED
+void
+cs_arch_send_ipi(int cpu)
+{
+ send_IPI_single(cpu, IPI_CRASH_STOP);
+}
+
+void
+cs_arch_send_nmi(int cpu)
+{
+ set_mb(cs_arch_monarch_cpu, smp_processor_id());
+ platform_send_ipi(cpu, 0, IA64_IPI_DM_INIT, 0);
+}
+
+void
+smp_crash_stop_interrupt(struct pt_regs *regs)
+{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+ cs_common_ipi();
+ set_irq_regs(old_regs);
+}
+#endif /* CONFIG_CRASH_STOP_SUPPORTED */
^ permalink raw reply [flat|nested] 15+ messages in thread
* [patch 2.6.19-rc5 9/12] crash_stop: ia64 specific code
2006-11-09 4:04 [patch 2.6.19-rc5 0/12] crash_stop: Summary Keith Owens
` (7 preceding siblings ...)
2006-11-09 4:05 ` [patch 2.6.19-rc5 8/12] crash_stop: ia64 interrupt handlers Keith Owens
@ 2006-11-09 4:05 ` Keith Owens
2006-11-09 4:05 ` [patch 2.6.19-rc5 10/12] crash_stop: add to config system Keith Owens
` (3 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Keith Owens @ 2006-11-09 4:05 UTC (permalink / raw)
To: linux-arch; +Cc: Keith Owens
Add the ia64 specific crash_stop code. This contains routines that are
called from the common crash_stop code and from the ia64 notify_die
chain.
Signed-off-by: Keith Owens <kaos@sgi.com>
---
arch/ia64/kernel/Makefile | 1
arch/ia64/kernel/crash_stop.c | 237 ++++++++++++++++++++++++++++++++++++++++++
include/asm-ia64/crash_stop.h | 11 +
3 files changed, 249 insertions(+)
Index: linux/arch/ia64/kernel/Makefile
===================================================================
--- linux.orig/arch/ia64/kernel/Makefile
+++ linux/arch/ia64/kernel/Makefile
@@ -31,6 +31,7 @@ obj-$(CONFIG_KPROBES) += kprobes.o jpro
obj-$(CONFIG_IA64_UNCACHED_ALLOCATOR) += uncached.o
obj-$(CONFIG_AUDIT) += audit.o
obj-$(CONFIG_PCI_MSI) += msi_ia64.o
+obj-$(CONFIG_CRASH_STOP_SUPPORTED) += crash_stop.o
mca_recovery-y += mca_drv.o mca_drv_asm.o
obj-$(CONFIG_IA64_ESI) += esi.o
Index: linux/arch/ia64/kernel/crash_stop.c
===================================================================
--- /dev/null
+++ linux/arch/ia64/kernel/crash_stop.c
@@ -0,0 +1,237 @@
+/*
+ * linux/arch/ia64/crash_stop.c
+ *
+ * Copyright (C) 2006 Keith Owens <kaos@sgi.com>
+ *
+ * Most of the IA64 specific bits of the crash_stop code. There is a little
+ * bit of crash_stop code in arch/ia64/kernel/smp.c to handle IPI_CRASH_STOP,
+ * everything else is in this file.
+ *
+ * IA64 is more complicated than the other architectures (isn't it always?).
+ * An MCA will force the entire machine into an MCA rendezvous state, using
+ * normal interrupts and/or selective INIT. The NMI command/button will send
+ * INIT to all cpus at the "same" time. For both these cases, one cpu is the
+ * monarch and all others are slaves[1]. An INIT can also be generated by
+ * crash_stop, to interrupt any cpus that are spinning disabled. In the latter
+ * case, mca.c does not know about the monarch that called crash_stop().
+ *
+ * The code in arch/ia64/kernel/mca.c only handles the first two cases, i.e.
+ * the ones that are defined by the SAL specification. To handle the Linux
+ * specific crash_stop case, we have to fool mca.c into thinking that the
+ * monarch cpu has already been defined, then clear the monarch cpu to allow
+ * the INIT slaves to resume. The notify_die callbacks are passed a data
+ * pointer, for MCA/INIT events, data->err is a pointer to a struct
+ * ia64_mca_notify_die. The data in that structure lets crash_stop decide
+ * which cpu is the monarch and which is the slave, as well as override the
+ * 'wait for slaves' logic in mca.c.
+ *
+ * [1] Ignoring broken proms which assign the wrong values to the monarch flag.
+ *
+ * Another IA64 complication is that struct pt_regs only contains part of the
+ * system state. IA64 also needs a struct switch_stack in order to give the
+ * unwinder all the state information. Tasks that have been scheduled off a
+ * cpu already have a switch_stack, but the running tasks do not. Create a
+ * switch_stack for each running task and store the address of that structure
+ * in the arch specific area of crash_stop_running_process.
+ */
+
+#include <linux/crash_stop.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/ptrace.h>
+#include <asm/kdebug.h>
+#include <asm/mca.h>
+
+int cs_arch_monarch_cpu;
+
+/* cs_arch_cpu() -> unw_init_running() -> cs_ca_switch_stack(). Save
+ * the address of the switch_stack created by unw_init_running() in the arch
+ * specific area of crash_stop_running_process then cs_common_cpu() to
+ * do the rest of the per cpu setup for a crash stop.
+ */
+
+struct cs_ca_data {
+ int monarch;
+ struct crash_stop_running_process *r;
+};
+
+static
+void cs_ca_switch_stack(struct unw_frame_info *info, void *vdata)
+{
+ struct cs_ca_data *data = vdata;
+ int monarch = data->monarch;
+ struct crash_stop_running_process *r = data->r;
+ struct switch_stack *sw;
+ sw = (struct switch_stack *)(info+1);
+ /* padding from unw_init_running */
+ sw = (struct switch_stack *)(((unsigned long)sw + 15) & ~15);
+ r->arch.sw = sw;
+ cs_common_cpu(monarch);
+}
+
+void
+cs_arch_cpu(int monarch, struct crash_stop_running_process *r)
+{
+ struct cs_ca_data data = {
+ .monarch = monarch,
+ .r = r,
+ };
+ unw_init_running(cs_ca_switch_stack, &data);
+}
+
+/* Called at the start of a notify_chain. */
+static int
+cs_arch_notify_start(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = data;
+ switch(val) {
+ case DIE_OOPS:
+ case DIE_MCA_MONARCH_ENTER:
+ case DIE_INIT_MONARCH_ENTER:
+ cs_notify_chain_start(args->regs);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+/* Called at the end of a notify_chain. */
+static int
+cs_arch_notify_end(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = data;
+ switch(val) {
+ case DIE_OOPS:
+ case DIE_INIT_MONARCH_LEAVE:
+ cs_notify_chain_end();
+ break;
+ case DIE_MCA_MONARCH_LEAVE:
+ cs_notify_chain_end();
+ /* mca.c passes the recover flag as signr */
+ if (args->signr)
+ crash_stop_recovered();
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static int
+cs_arch_notify_nmi(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct ia64_mca_notify_die *nd;
+ struct pt_regs *old_regs;
+ struct die_args *args = data;
+ static cpumask_t cs_INIT;
+ int cpu = smp_processor_id();
+ nd = (struct ia64_mca_notify_die *)(args->err);
+
+ switch(val) {
+ /* FIXME: if the MCA rendezvous timeout is increased (case
+ * DIE_MCA_NEW_TIMEOUT), does crash_stop care about the new limit? It
+ * might affect wait_secs in crash_stop() - KAO.
+ */
+ case DIE_MCA_RENDZVOUS_PROCESS:
+ /* The MCA monarch event has woken up the slaves that were
+ * suspended via the MCA rendezvous interrupt. Tell
+ * crash_stop() that this slave cpu is ready and waiting to be
+ * debugged.
+ */
+ old_regs = set_irq_regs(args->regs);
+ crash_stop_slave();
+ set_irq_regs(old_regs);
+ break;
+ case DIE_INIT_ENTER:
+ /* INIT that is sent to all cpus is correctly handled by mca.c.
+ * If cs_arch_send_nmi() was invoked on IA64 because a cpu was
+ * spinning disabled then we get a lone INIT event with no
+ * monarch, or at least not a monarch that mca.c knows about.
+ * Tell mca.c that we already have a monarch. Also clear the
+ * sos->monarch flag, some broken proms incorrectly mark
+ * individual INIT events as a monarch event.
+ */
+ oops_in_progress = 1;
+ if (crash_stop_sent_nmi()) {
+ cpu_set(cpu, cs_INIT);
+ *(nd->monarch_cpu) = cs_arch_monarch_cpu;
+ nd->sos->monarch = 0;
+ }
+ break;
+ case DIE_INIT_SLAVE_ENTER:
+ /* This slave INIT event could have come from crash_stop(), it
+ * could also have come from a global INIT event. In either
+ * case, drop into the crash_stop() slave processing.
+ */
+ old_regs = set_irq_regs(args->regs);
+ crash_stop_slave();
+ set_irq_regs(old_regs);
+ break;
+ case DIE_INIT_SLAVE_PROCESS:
+ /* Reverse the processing for DIE_INIT_ENTER. Normal mca.c
+ * processing waits for the MCA (monarch) to release any INIT
+ * slaves, but we may not have an MCA monarch. Pretend that
+ * each slave that was hit with INIT by crash_stop() is a
+ * monarch, to avoid complicating mca.c any more than it
+ * already is.
+ */
+ if (cpu_isset(cpu, cs_INIT)) {
+ cpu_clear(cpu, cs_INIT);
+ *(nd->monarch_cpu) = -1;
+ nd->sos->monarch = 1;
+ }
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+
+static struct notifier_block cs_arch_nb_start = {
+ .notifier_call = cs_arch_notify_start,
+ .priority = ~0U >> 1,
+};
+
+static struct notifier_block cs_arch_nb_end = {
+ .notifier_call = cs_arch_notify_end,
+ .priority = 1,
+};
+
+static struct notifier_block cs_arch_nb_nmi = {
+ .notifier_call = cs_arch_notify_nmi,
+ .priority = 10,
+};
+
+static int __init
+cs_arch_init(void)
+{
+ int err;
+ const char *nb_name;
+ nb_name = "cs_arch_nb_start";
+ if ((err = register_die_notifier(&cs_arch_nb_start)))
+ goto error;
+ nb_name = "cs_arch_nb_end";
+ if ((err = register_die_notifier(&cs_arch_nb_end)))
+ goto error;
+ nb_name = "cs_arch_nb_nmi";
+ if ((err = register_die_notifier(&cs_arch_nb_nmi)))
+ goto error;
+ return 0;
+error:
+ printk(KERN_ERR "Failed to register %s\n", nb_name);
+ unregister_die_notifier(&cs_arch_nb_start);
+ unregister_die_notifier(&cs_arch_nb_end);
+ unregister_die_notifier(&cs_arch_nb_nmi);
+ return err;
+}
+
+static void __exit
+cs_arch_exit(void)
+{
+ unregister_die_notifier(&cs_arch_nb_nmi);
+ unregister_die_notifier(&cs_arch_nb_start);
+ unregister_die_notifier(&cs_arch_nb_end);
+ return;
+}
+
+module_init(cs_arch_init);
+module_exit(cs_arch_exit);
Index: linux/include/asm-ia64/crash_stop.h
===================================================================
--- /dev/null
+++ linux/include/asm-ia64/crash_stop.h
@@ -0,0 +1,11 @@
+#ifndef _ASM_CRASH_STOP_H
+#define _ASM_CRASH_STOP_H
+
+struct crash_stop_running_process_arch
+{
+ struct switch_stack *sw;
+};
+
+extern int cs_arch_monarch_cpu;
+
+#endif /* _ASM_CRASH_STOP_H */
* [patch 2.6.19-rc5 10/12] crash_stop: add to config system
2006-11-09 4:04 [patch 2.6.19-rc5 0/12] crash_stop: Summary Keith Owens
` (8 preceding siblings ...)
2006-11-09 4:05 ` [patch 2.6.19-rc5 9/12] crash_stop: ia64 specific code Keith Owens
@ 2006-11-09 4:05 ` Keith Owens
2006-11-09 4:05 ` [patch 2.6.19-rc5 11/12] crash_stop: demonstration code Keith Owens
` (2 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Keith Owens @ 2006-11-09 4:05 UTC (permalink / raw)
To: linux-arch; +Cc: Keith Owens
Add CONFIG_CRASH_STOP and CONFIG_CRASH_STOP_SUPPORTED. Debug code will
select CRASH_STOP. That in turn will select CRASH_STOP_SUPPORTED if
the configuration supports crash_stop(). If the supported architecture
list changes then only the CRASH_STOP entry needs to be updated; none
of the tools that select CRASH_STOP have to worry about arch-dependent
details.
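As an illustration only (MY_DEBUG_TOOL is a made-up name, it is not
part of these patches), a debug style tool's Kconfig entry reduces to
a single select, with no arch knowledge of its own:

config MY_DEBUG_TOOL
	tristate "Some hypothetical debug style tool"
	select CRASH_STOP
	help
	  Selecting CRASH_STOP is all the tool does; whether
	  CRASH_STOP_SUPPORTED ends up set is decided entirely by
	  the arch list in the CRASH_STOP entry.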
Signed-off-by: Keith Owens <kaos@sgi.com>
---
lib/Kconfig.debug | 7 +++++++
1 file changed, 7 insertions(+)
Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -412,3 +412,10 @@ config LKDTM
Documentation on how to use the module can be found in
drivers/misc/lkdtm.c
+
+config CRASH_STOP
+ bool
+ select CRASH_STOP_SUPPORTED if ( IA64 || (X86 && !X86_VOYAGER) )
+
+config CRASH_STOP_SUPPORTED
+ bool
* [patch 2.6.19-rc5 11/12] crash_stop: demonstration code
2006-11-09 4:04 [patch 2.6.19-rc5 0/12] crash_stop: Summary Keith Owens
` (9 preceding siblings ...)
2006-11-09 4:05 ` [patch 2.6.19-rc5 10/12] crash_stop: add to config system Keith Owens
@ 2006-11-09 4:05 ` Keith Owens
2006-11-09 4:05 ` [patch 2.6.19-rc5 12/12] crash_stop: test code Keith Owens
2006-11-11 1:45 ` [patch 2.6.19-rc5 0/12] crash_stop: Summary Vivek Goyal
12 siblings, 0 replies; 15+ messages in thread
From: Keith Owens @ 2006-11-09 4:05 UTC (permalink / raw)
To: linux-arch; +Cc: Keith Owens
A quick and dirty crash_stop() demo program.
Notice that this demo is almost completely architecture independent.
It has no code to do per architecture IPI/NMI, nor does it need to save
any per cpu state - all of that is handled by the common code. IOW, a
debug style tool can concentrate on its own requirements, leaving all
the complicated code to the crash_stop API.
The base patch does not export any crash_stop() symbols. They can be
added later if any debug style code can be built as a module.
Since the demo is best used as a module, this patch temporarily
exports some symbols.
No signed-off-by; this code is not going into the kernel.
---
kernel/Makefile | 1
kernel/crash_stop.c | 6 +-
kernel/crash_stop_demo.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 11 +++
4 files changed, 151 insertions(+), 1 deletion(-)
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -52,6 +52,7 @@ obj-$(CONFIG_UTS_NS) += utsname.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
obj-$(CONFIG_CRASH_STOP_SUPPORTED) += crash_stop.o
+obj-$(CONFIG_CRASH_STOP_DEMO) += crash_stop_demo.o
ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux/kernel/crash_stop.c
===================================================================
--- linux.orig/kernel/crash_stop.c
+++ linux/kernel/crash_stop.c
@@ -197,6 +197,7 @@
#include <linux/crash_stop.h>
#include <linux/interrupt.h>
#include <linux/kernel.h>
+#include <linux/module.h> /* used by crash_stop_demo */
#include <linux/ptrace.h>
#include <linux/nmi.h>
#include <linux/spinlock.h>
@@ -335,7 +336,7 @@ retry:
set_mb(cs_notify_chain_owner, cpu);
set_mb(cs_lock_owner, -1);
spin_unlock(&cs_lock);
- crash_stop(cs_notify_callback, NULL, NULL, regs, __FUNCTION__);
+ crash_stop(cs_notify_callback, NULL, printk, regs, __FUNCTION__);
}
/* Called by the arch specific crash_stop code, when they reach the end of a
@@ -463,6 +464,7 @@ crash_stop_sent_nmi(void)
{
return cpu_isset(smp_processor_id(), cs_sent_nmi);
}
+EXPORT_SYMBOL(crash_stop_sent_nmi); /* used by crash_stop_demo */
#endif /* CONFIG_SMP */
/* Should only be called by the arch specific crash_stop code, after they have
@@ -752,6 +754,7 @@ retry:
set_mb(cs_leaving, 0);
return 0;
}
+EXPORT_SYMBOL(crash_stop); /* used by crash_stop_demo */
/**
* crash_stop_recovered: - Release any slaves in crash_stop state.
@@ -841,3 +844,4 @@ crash_stop_slaves(void)
else
return 0;
}
+EXPORT_SYMBOL(crash_stop_slaves); /* used by crash_stop_demo */
Index: linux/kernel/crash_stop_demo.c
===================================================================
--- /dev/null
+++ linux/kernel/crash_stop_demo.c
@@ -0,0 +1,134 @@
+/*
+ * linux/kernel/crash_stop_demo.c
+ *
+ * Copyright (C) 2006 Keith Owens <kaos@sgi.com>
+ *
+ * Demonstrate the use of crash_stop() by debug style code.
+ */
+
+#include <linux/crash_stop.h>
+#include <linux/module.h>
+#include <linux/nmi.h>
+#include <asm/kdebug.h>
+
+MODULE_LICENSE("GPL");
+
+/* The callback function passed to crash_stop() is invoked on each cpu that is
+ * in crash_stop state. The main crash_stop() code will ensure that slaves are
+ * entered first, followed by the monarch after a short delay. The debug
+ * specific callback then does its own work.
+ */
+
+static int cs_demo_monarch_entered, cs_demo_monarch_exited;
+static atomic_t slave_count;
+
+static void
+cs_demo_callback_monarch(void *data) {
+ printk("%s: entering monarch cpu %d\n",
+ __FUNCTION__, smp_processor_id());
+ if (cs_demo_monarch_entered) {
+ printk("%s: recursive call detected\n", __FUNCTION__);
+ return;
+ }
+ set_mb(cs_demo_monarch_entered, 1);
+ /* wait for all the slaves to enter */
+ while (atomic_read(&slave_count) != crash_stop_slaves()) {
+ touch_nmi_watchdog();
+ cpu_relax();
+ }
+
+ /* Monarch callback processing using data and struct
+ * crash_stop_running_process goes here.
+ */
+
+ set_mb(cs_demo_monarch_exited, 1);
+ /* wait for all the slaves to leave */
+ while (atomic_read(&slave_count)) {
+ touch_nmi_watchdog();
+ cpu_relax();
+ }
+ /* reset state for next entry */
+ set_mb(cs_demo_monarch_entered, 0);
+ set_mb(cs_demo_monarch_exited, 0);
+}
+
+static void
+cs_demo_callback_slave(void *data) {
+ printk("%s: entering slave cpu %d via %s\n",
+ __FUNCTION__, smp_processor_id(),
+ crash_stop_sent_nmi() ? "NMI" : "IPI");
+ atomic_inc(&slave_count);
+ while (!cs_demo_monarch_entered) {
+ touch_nmi_watchdog();
+ cpu_relax();
+ }
+ /* Slave callback processing using data goes here. In most cases the
+ * slaves will just spin until the monarch releases them. The main
+ * crash_stop() code saves the state for each slave cpu before entering
+ * the callback. The monarch can use that saved state without the
+ * callback on the slave cpu doing any more work.
+ */
+ while (!cs_demo_monarch_exited) {
+ touch_nmi_watchdog();
+ cpu_relax();
+ }
+ atomic_dec(&slave_count);
+}
+
+static void
+cs_demo_callback(int monarch, void *data)
+{
+ if (monarch)
+ cs_demo_callback_monarch(data);
+ else
+ cs_demo_callback_slave(data);
+ printk("%s: leaving cpu %d\n",
+ __FUNCTION__, smp_processor_id());
+}
+
+/* Handle various kernel error conditions. */
+static int
+cs_demo_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct die_args *args = data;
+ switch(val) {
+#ifdef CONFIG_X86
+ case DIE_NMIWATCHDOG:
+#endif
+#ifdef CONFIG_IA64
+ case DIE_MCA_MONARCH_LEAVE:
+ case DIE_INIT_MONARCH_LEAVE:
+#endif
+ case DIE_OOPS:
+ crash_stop(cs_demo_callback, NULL, printk, args->regs, __FUNCTION__);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block cs_demo_nb = {
+ .notifier_call = cs_demo_notify,
+ .priority = 100,
+};
+
+static int __init
+cs_demo_init(void)
+{
+ int err;
+ if ((err = register_die_notifier(&cs_demo_nb))) {
+ printk(KERN_ERR "%s: failed to register cs_demo_nb\n",
+ __FUNCTION__);
+ return err;
+ }
+ return 0;
+}
+
+static void __exit
+cs_demo_exit(void)
+{
+ unregister_die_notifier(&cs_demo_nb);
+}
+
+module_init(cs_demo_init);
+module_exit(cs_demo_exit);
Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -419,3 +419,14 @@ config CRASH_STOP
config CRASH_STOP_SUPPORTED
bool
+
+config CRASH_STOP_DEMO
+ tristate "Demonstrate the use of crash_stop"
+ default m
+ select CRASH_STOP
+ help
+ Code to demonstrate the use of crash_stop. Build it as a
+ module and load it. It will make one cpu spin disabled then
+ call crash_stop. All slave cpus bar one will get a normal
+ IPI; the spinning cpu will get NMI. You need at least 3 cpus
+ to run crash_stop_demo.
* [patch 2.6.19-rc5 12/12] crash_stop: test code
2006-11-09 4:04 [patch 2.6.19-rc5 0/12] crash_stop: Summary Keith Owens
` (10 preceding siblings ...)
2006-11-09 4:05 ` [patch 2.6.19-rc5 11/12] crash_stop: demonstration code Keith Owens
@ 2006-11-09 4:05 ` Keith Owens
2006-11-11 1:45 ` [patch 2.6.19-rc5 0/12] crash_stop: Summary Vivek Goyal
12 siblings, 0 replies; 15+ messages in thread
From: Keith Owens @ 2006-11-09 4:05 UTC (permalink / raw)
To: linux-arch; +Cc: Keith Owens
A quick and dirty crash_stop() test program. Most of the code is to
get the machine into a suitable state for testing both the normal IPI
and NMI code. The interesting crash_stop bits are cs_test_callback*()
and simulate_crash_stop_event().
No signed-off-by; this code is not going into the kernel.
---
kernel/Makefile | 1
kernel/crash_stop_test.c | 177 +++++++++++++++++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 11 ++
3 files changed, 189 insertions(+)
Index: linux/kernel/Makefile
===================================================================
--- linux.orig/kernel/Makefile
+++ linux/kernel/Makefile
@@ -53,6 +53,7 @@ obj-$(CONFIG_TASK_DELAY_ACCT) += delayac
obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
obj-$(CONFIG_CRASH_STOP_SUPPORTED) += crash_stop.o
obj-$(CONFIG_CRASH_STOP_DEMO) += crash_stop_demo.o
+obj-$(CONFIG_CRASH_STOP_TEST) += crash_stop_test.o
ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
Index: linux/kernel/crash_stop_test.c
===================================================================
--- /dev/null
+++ linux/kernel/crash_stop_test.c
@@ -0,0 +1,177 @@
+/*
+ * linux/kernel/crash_stop_test.c
+ *
+ * Copyright (C) 2006 Keith Owens <kaos@sgi.com>
+ *
+ * Test crash_stop(). This module requires at least 2 slave cpus, plus the
+ * monarch cpu. One of the slaves is put into a disabled spin loop; the other
+ * slaves are left alone. The monarch calls crash_stop(). Most of the slaves
+ * will respond to the normal IPI, the disabled cpu will only respond to NMI.
+ *
+ * If test_watchdog is non-zero, the monarch exercises the crash_stop code
+ * that handles the NMI watchdog, but only on i386 or x86_64. After putting
+ * one of the other cpus into a disabled spin, the monarch itself spins
+ * disabled. When the nmi_watchdog trips (boot with nmi_watchdog=1 or
+ * nmi_watchdog=2), the kernel drives the notify_die chain with
+ * DIE_NMIWATCHDOG.
+ *
+ * If test_oops is non-zero, the monarch generates an oops.
+ *
+ * For both test_watchdog=1 and test_oops=1, you will first need to load a
+ * debug style tool that uses crash_stop and intercepts DIE_NMIWATCHDOG and
+ * DIE_OOPS. modprobe crash_stop_demo will work, or you can load and test your
+ * own tool.
+ */
+
+#include <linux/cpumask.h>
+#include <linux/crash_stop.h>
+#include <linux/delay.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/nmi.h>
+#include <asm/kdebug.h>
+
+MODULE_LICENSE("GPL");
+
+static int test_watchdog;
+static int test_oops;
+
+module_param(test_watchdog, int, 0444);
+module_param(test_oops, int, 0444);
+
+static int cs_test_do_spin, cs_test_spinning;
+static DECLARE_COMPLETION(cs_test_done);
+
+#ifdef CONFIG_X86
+static int
+cs_test_notify(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ switch(val) {
+ case DIE_NMIWATCHDOG:
+ test_watchdog = 0;
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block cs_test_nb = {
+ .notifier_call = cs_test_notify,
+ .priority = 20,
+};
+#endif /* CONFIG_X86 */
+
+static void
+cs_test_callback(int monarch, void *data)
+{
+ printk("%s: cpu %d monarch %d\n",
+ __FUNCTION__, smp_processor_id(), monarch);
+ set_mb(cs_test_do_spin, 0);
+ set_mb(test_watchdog, 0);
+}
+
+static void
+simulate_crash_stop_event(void)
+{
+ oops_in_progress = 1;
+ if (test_oops)
+ BUG();
+ printk("%s: cpu %d starting\n", __FUNCTION__, smp_processor_id());
+ local_irq_disable();
+ while (test_watchdog)
+ cpu_relax();
+ /* crash_stop() is usually called from an error state where pt_regs are
+ * available and interrupts are already disabled. For the test, use a
+ * NULL pt_regs and disable interrupts by hand. Use printk as the test
+ * I/O routine, even though that is not always a good choice (not NMI
+ * safe).
+ */
+ crash_stop(cs_test_callback, NULL, printk, NULL, "cs_test");
+ local_irq_enable();
+ printk("%s: cpu %d leaving\n", __FUNCTION__, smp_processor_id());
+}
+
+/* spin disabled on one cpu until the crash_stop test has finished */
+static int
+cs_test_spin(void *vdata)
+{
+ set_mb(cs_test_spinning, 1);
+ if (test_watchdog)
+ mdelay(2000);
+ local_irq_disable();
+ while (cs_test_do_spin) {
+ if (!test_watchdog)
+ touch_nmi_watchdog();
+ cpu_relax();
+ mb();
+ }
+ printk("%s: cpu %d leaving\n", __FUNCTION__, smp_processor_id());
+ local_irq_enable();
+ complete(&cs_test_done);
+ do_exit(0);
+}
+
+/* Get the various cpus into a suitable state for testing crash_stop(),
+ * including NMI processing. In real life, the system would already be dying
+ * before crash_stop() was invoked.
+ */
+static int __init
+cs_test_init(void)
+{
+ struct task_struct *p;
+ int c, disabled = 0, this_cpu = get_cpu(), slaves = 0;
+ oops_in_progress = 1;
+
+ printk("%s: monarch is cpu %d\n",
+ __FUNCTION__, this_cpu);
+ set_cpus_allowed(current, cpumask_of_cpu(this_cpu));
+ put_cpu();
+ for_each_online_cpu(c) {
+ if (c != this_cpu) {
+ ++slaves;
+ disabled = c;
+ }
+ }
+ if (slaves < 2) {
+ printk(KERN_ERR "%s needs at least two slave cpus\n",
+ __FUNCTION__);
+ return -EINVAL;
+ }
+
+#ifdef CONFIG_X86
+ if ((c = register_die_notifier(&cs_test_nb))) {
+ printk(KERN_ERR "%s: failed to register cs_test_nb\n",
+ __FUNCTION__);
+ return c;
+ }
+#endif /* CONFIG_X86 */
+
+ init_completion(&cs_test_done);
+ set_mb(cs_test_do_spin, 1);
+ p = kthread_create(cs_test_spin, NULL, "kcrash_stop_test");
+ if (IS_ERR(p))
+ return PTR_ERR(p);
+ kthread_bind(p, disabled);
+ wake_up_process(p);
+ while (!cs_test_spinning)
+ cpu_relax();
+ printk("%s: cpu %d is spinning disabled\n",
+ __FUNCTION__, disabled);
+
+ simulate_crash_stop_event();
+
+ set_mb(cs_test_do_spin, 0);
+ wait_for_completion(&cs_test_done);
+ return 0;
+}
+
+static void __exit
+cs_test_exit(void)
+{
+#ifdef CONFIG_X86
+ unregister_die_notifier(&cs_test_nb);
+#endif /* CONFIG_X86 */
+}
+
+module_init(cs_test_init);
+module_exit(cs_test_exit);
Index: linux/lib/Kconfig.debug
===================================================================
--- linux.orig/lib/Kconfig.debug
+++ linux/lib/Kconfig.debug
@@ -430,3 +430,14 @@ config CRASH_STOP_DEMO
call crash_stop. All slave cpus bar one will get a normal
IPI, the spinning cpu will get NMI. You need at least 3 cpus
to run crash_stop_demo.
+
+config CRASH_STOP_TEST
+ tristate "Test crash_stop"
+ default m
+ help
+ Code to test the use of crash_stop. Build it as a module and
+ load it. It will make one cpu spin disabled then generate an
+ oops or NMI. All slave cpus bar one will get a normal IPI;
+ the spinning cpu will get NMI. You need at least 3 cpus to
+ run crash_stop_test. You can also test the NMI watchdog and
+ oops handling of crash_stop; see kernel/crash_stop_test.c.
* Re: [patch 2.6.19-rc5 0/12] crash_stop: Summary
2006-11-09 4:04 [patch 2.6.19-rc5 0/12] crash_stop: Summary Keith Owens
` (11 preceding siblings ...)
2006-11-09 4:05 ` [patch 2.6.19-rc5 12/12] crash_stop: test code Keith Owens
@ 2006-11-11 1:45 ` Vivek Goyal
2006-11-13 2:08 ` Keith Owens
12 siblings, 1 reply; 15+ messages in thread
From: Vivek Goyal @ 2006-11-11 1:45 UTC (permalink / raw)
To: Keith Owens; +Cc: linux-arch
On Thu, Nov 09, 2006 at 03:04:18PM +1100, Keith Owens wrote:
Hi Keith,
> All the kernel debug style tools (kdb, kgdb, nlkd, netdump, lkcd,
> crash, kdump etc.) have a common requirement, they need to do a crash
> stop of the systems. This means stopping all the cpus, even if some of
> the cpus are spinning disabled. In addition, each cpu has to save
> enough state to start the diagnosis of the problem.
>
> * Each debug style tool has written its own code for interrupting the
> other cpus and for saving cpu state.
>
> * Some tools try a normal IPI first then send a non-maskable interrupt
> after a delay.
>
> * Some tools always send a NMI first, which can result in incomplete or
> wrong machine state if NMI arrives at the wrong time.
>
What kind of problems can one run into if NMI is sent directly instead
of trying a normal IPI first?
On a general note, I am not sure how well suited this infrastructure
is for crash dump needs. We are trying to follow one theme, and that
is to run a bare minimum of code after a system crash, to increase
reliability. Avoid taking locks, avoid relying on the crashed task's
stack, etc. (if possible). Of course that is an ideal situation and we
have not achieved that state, but roughly that seems to be the long
term goal. Looking at the patches, it looks like this introduces lots
of code to be run after a crash, and it also uses smp_processor_id(),
which introduces a dependency on the stack. This poses a problem in
stack overflow cases. Fernando from valinux introduced the
safe_smp_processor_id() call to read the APIC id from the LAPIC
instead of relying on the thread's stack. As of today, the crash path
is not safe from stack overflow, but we hope that someday it will be.
So I think crash dump will be a slightly special case, and we need to
be careful when introducing extra code that the new code will be able
to run even if the system is in a very bad state (for example, stack
overflow).
Thanks
Vivek
* Re: [patch 2.6.19-rc5 0/12] crash_stop: Summary
2006-11-11 1:45 ` [patch 2.6.19-rc5 0/12] crash_stop: Summary Vivek Goyal
@ 2006-11-13 2:08 ` Keith Owens
0 siblings, 0 replies; 15+ messages in thread
From: Keith Owens @ 2006-11-13 2:08 UTC (permalink / raw)
To: vgoyal; +Cc: linux-arch
Vivek Goyal (on Fri, 10 Nov 2006 20:45:05 -0500) wrote:
>On Thu, Nov 09, 2006 at 03:04:18PM +1100, Keith Owens wrote:
>Hi Keith,
>
>> All the kernel debug style tools (kdb, kgdb, nlkd, netdump, lkcd,
>> crash, kdump etc.) have a common requirement, they need to do a crash
>> stop of the systems. This means stopping all the cpus, even if some of
>> the cpus are spinning disabled. In addition, each cpu has to save
>> enough state to start the diagnosis of the problem.
>>
>> * Each debug style tool has written its own code for interrupting the
>> other cpus and for saving cpu state.
>>
>> * Some tools try a normal IPI first then send a non-maskable interrupt
>> after a delay.
>>
>> * Some tools always send a NMI first, which can result in incomplete or
>> wrong machine state if NMI arrives at the wrong time.
>>
>
>What kind of problem one can run into if NMI is sent directly instead
>of trying an normal IPI first?
Incomplete cpu state on the cpus that are hit with NMI. By definition,
NMI can be delivered at any time, with the cpu in any state. It can be
in the middle of saving the state for a previous interrupt when NMI is
delivered. On IA64, a cpu can even be in physical mode instead of
virtual mode when INIT is delivered. All of which means that you have
incomplete or misleading information in your dump.
Sending a normal interrupt first, waiting a short while, then sending
an NMI later maximises the chance that we get good state for every cpu.
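As a rough sketch of that ordering, built from the cs_arch_send_ipi()
and cs_arch_send_nmi() hooks in the arch patches (the helper name and
the timeout are illustrative, this is not the actual common code):

/* Escalate politely: maskable IPI first, NMI only as a last resort. */
static void cs_stop_one_cpu(int cpu)
{
	int i;

	cs_arch_send_ipi(cpu);			/* normal interrupt */
	for (i = 0; i < 1000; i++) {		/* illustrative timeout */
		if (cpu_entered_crash_stop(cpu)) /* hypothetical test */
			return;			/* clean state was saved */
		udelay(100);
	}
	cs_arch_send_nmi(cpu);		/* it must be spinning disabled */
}

A cpu that takes the maskable IPI saves its state from a well defined
interrupt frame; only the cpus that never respond are hit with NMI.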
>On a general note, I am not sure how well suited this infrastructure
>is for crash dump needs. We are trying to follow one theme, and that
>is to run a bare minimum of code after a system crash, to increase
>reliability.
Agreed. But you still have to stop all the existing cpus _and_ capture
their state before switching to the second kernel. There is no point
in switching to a new kernel if you cannot get information about all
the cpus from the failing kernel.
>Avoid taking locks, avoid relying on the crashed task's stack, etc.
>(if possible). Of course that is an ideal situation and we have not
>achieved that state, but roughly that seems to be the long term goal.
>Looking at the patches, it looks like this introduces lots of code to
>be run after a crash, and it also uses smp_processor_id(), which
>introduces a dependency on the stack. This poses a problem in stack
>overflow cases. Fernando from valinux introduced the
>safe_smp_processor_id() call to read the APIC id from the LAPIC
>instead of relying on the thread's stack. As of today, the crash path
>is not safe from stack overflow, but we hope that someday it will be.
Good point. I will look at converting crash_stop to use
safe_smp_processor_id.
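For anyone following along, the idea behind safe_smp_processor_id() is
roughly the following sketch (not the actual i386/x86_64
implementation): derive the cpu number from the local APIC instead of
from thread_info on the possibly overflowed stack.

/* Sketch: map the hardware APIC id back to a logical cpu number. */
static int sketch_safe_smp_processor_id(void)
{
	int apicid = hard_smp_processor_id();	/* read from the LAPIC */
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (x86_cpu_to_apicid[cpu] == apicid)
			return cpu;
	return 0;			/* fall back to the boot cpu */
}

Since crash_stop runs when the system is already in trouble, that kind
of stack-free lookup is clearly the safer choice.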