public inbox for linux-ia64@vger.kernel.org
* [PATCH] make INIT# handler call panic
From: Cliff Larsen @ 2004-11-05 13:55 UTC
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 579 bytes --]

This is a small patch to enable a change of behavior when the INIT#
interrupt is received. When the command line argument 'ia64initpanic'
is given, init_handler_platform() calls panic(). This is useful because
panic() then runs the panic notifier call chain and can restart the
machine. The call chain may invoke a netdump or diskdump routine. I
think this would be useful for 2.6 as well.

Without the cmdline arg, the existing behavior is left unchanged
(printing task backtraces to the console and spinning forever).

The patch is off 2.4.27 
-- 
Cliff Larsen <clarsen@egenera.com>

[-- Attachment #2: init_handler_platform-2.4.27.patch --]
[-- Type: text/plain, Size: 1644 bytes --]

--- linux-2.4.27.orig/Documentation/kernel-parameters.txt	2004-08-07 19:26:04.000000000 -0400
+++ linux-2.4.27/Documentation/kernel-parameters.txt	2004-11-04 15:45:46.000000000 -0500
@@ -251,6 +251,10 @@
 
 	i810=		[HW,DRM]
 
+	ia64initpanic	[IA-64,KNL] Causes INIT# handler to call panic()
+			which connects to a notifier list and
+			machine_restart rather than spinning forever.
+
 	ibmmcascsi=	[HW,MCA,SCSI] IBM MicroChannel SCSI adapter.
 
 	icn=		[HW,ISDN]
--- linux-2.4.27.orig/arch/ia64/kernel/mca.c	2004-04-14 09:05:26.000000000 -0400
+++ linux-2.4.27/arch/ia64/kernel/mca.c	2004-11-04 15:45:46.000000000 -0500
@@ -425,6 +425,15 @@
 	PUT_NAT_BIT(sw->caller_unat, &pt->r30);	PUT_NAT_BIT(sw->caller_unat, &pt->r31);
 }
 
+static int ia64initpanic = 0;
+static int __init ia64initpanic_setup(char *str)
+{
+	printk(KERN_INFO "ia64: panic on INIT# interrupt\n");
+	ia64initpanic = 1;
+	return 1;
+}
+__setup("ia64initpanic", ia64initpanic_setup);
+
 static void
 init_handler_platform (pal_min_state_area_t *ms,
 		       struct pt_regs *pt, struct switch_stack *sw)
@@ -434,6 +443,13 @@
 	/* if a kernel debugger is available call it here else just dump the registers */
 
 	/*
+	 * if ia64initpanic is present on the cmdline,
+	 * panic so that we get to notifier_call_chain and restart
+	 */
+	if (ia64initpanic)
+		panic("INIT# received by processor %d", smp_processor_id());
+
+	/*
 	 * Wait for a bit.  On some machines (e.g., HP's zx2000 and zx6000, INIT can be
 	 * generated via the BMC's command-line interface, but since the console is on the
 	 * same serial line, the user will need some time to switch out of the BMC before

* Re: [PATCH] make INIT# handler call panic
From: Bjorn Helgaas @ 2004-11-05 16:26 UTC
  To: linux-ia64

On Friday 05 November 2004 6:55 am, Cliff Larsen wrote:
> The patch is off 2.4.27 

There's not much happening on 2.4 these days.  And there's
plenty of room for improvement in the 2.6 INIT handler,
hint, hint ;-)

* Re: [PATCH] make INIT# handler call panic
From: Cliff Larsen @ 2004-11-05 21:04 UTC
  To: linux-ia64

On Fri, 2004-11-05 at 11:26, Bjorn Helgaas wrote:
> On Friday 05 November 2004 6:55 am, Cliff Larsen wrote:
> > The patch is off 2.4.27 
> 
> There's not much happening on 2.4 these days.  And there's
> plenty of room for improvement in the 2.6 INIT handler,
> hint, hint ;-)

I've been working with 2.4, so I thought it would be appropriate
to submit the patch against its latest version. I've not gotten to
2.6 yet. I have looked at 2.6 sources and essentially the same 
patch would apply. What do you think of the concept of the patch
and its utility in 2.6?



* Re: [PATCH] make INIT# handler call panic
From: Bjorn Helgaas @ 2004-11-05 22:04 UTC
  To: linux-ia64

On Friday 05 November 2004 2:04 pm, Cliff Larsen wrote:
> I've been working with 2.4, so I thought it would be appropriate
> to submit the patch against its latest version. I've not gotten to
> 2.6 yet. I have looked at 2.6 sources and essentially the same 
> patch would apply. What do you think of the concept of the patch
> and its utility in 2.6?

Yeah, I'm sure it would apply easily to 2.6.  Sorry, I guess I
was just being lazy because I haven't paid much attention to
the MCA/INIT path recently.  Some of the folks who have will
probably jump in.

My $0.02 is that it *is* annoying that we just hang after printing
the INIT register state and backtraces.  However, I wonder if we
could just leverage the existing panic_timeout (set by "panic=")
so we don't need a new parameter.

I don't have an opinion about whether calling panic from
init_handler_platform() is the right thing to do or not.
Certainly it is a good place for some sort of hook for a
debugger and/or crashdump.

My personal preference would be something like this:
   1) dump register state (for all CPUs, not just the INIT monarch)
      on the console
   2) print backtraces (maybe just for currently-running tasks;
      currently we do the task on the INIT monarch plus all other
      non-running tasks, which is definitely non-optimal)
   3) optional debugger/crashdump hook
   4) call panic (maybe)
   5) optional timeout, then reboot (if not calling panic)

Part 5 would be trivial and probably not *too* controversial.
Part 1 is harder but extremely useful, and I think someone (Zoltan?)
posted a start.  Part 2 should be simple given part 1.
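
A rough sketch of how steps 3 to 5 might fit together follows. It is a userspace model, not kernel code: `init_dump_hook`, `init_calls_panic`, `enum init_action` and `init_handler_decide()` are hypothetical names, and steps 1 and 2 are assumed to have already run. `panic_timeout` stands in for the existing variable set by "panic=", with 0 keeping today's hang-forever behavior.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the variable set by "panic="; 0 means never reboot. */
int panic_timeout = 0;

/* Hypothetical policy flag for step 4 (call panic, maybe). */
bool init_calls_panic = false;

/* Hypothetical optional debugger/crashdump hook for step 3. */
void (*init_dump_hook)(int cpu) = NULL;

/* What the handler should do once register state and backtraces
 * (steps 1 and 2) have been printed. */
enum init_action {
	INIT_SPIN_FOREVER,         /* current behavior: hang */
	INIT_CALL_PANIC,           /* step 4: hand off to panic() */
	INIT_REBOOT_AFTER_TIMEOUT  /* step 5: wait panic_timeout seconds, reboot */
};

/* Model of steps 3-5 for the CPU handling INIT. */
enum init_action init_handler_decide(int cpu)
{
	if (init_dump_hook)       /* step 3: optional hook runs first */
		init_dump_hook(cpu);
	if (init_calls_panic)     /* step 4 */
		return INIT_CALL_PANIC;
	if (panic_timeout > 0)    /* step 5: reuse "panic=", no new parameter */
		return INIT_REBOOT_AFTER_TIMEOUT;
	return INIT_SPIN_FOREVER;
}
```

Returning a decision instead of acting on it keeps the policy easy to test; the caller would then spin, call panic(), or delay and restart.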

* Re: [PATCH] make INIT# handler call panic
From: Cliff Larsen @ 2004-11-05 22:57 UTC
  To: linux-ia64

On Fri, 2004-11-05 at 17:04, Bjorn Helgaas wrote:
> My $0.02 is that it *is* annoying that we just hang after printing
> the INIT register state and backtraces.  However, I wonder if we
> could just leverage the existing panic_timeout (set by "panic=")
> so we don't need a new parameter.

I'm fine with that.

> I don't have an opinion about whether calling panic from
> init_handler_platform() is the right thing to do or not.
> Certainly it is a good place for some sort of hook for a
> debugger and/or crashdump.

My major motivation was to get to a crashdump hook and get 
to restart, and panic does both, so I chose it.

> My personal preference would be something like this:
>    1) dump register state (for all CPUs, not just the INIT monarch)
>       on the console
>    2) print backtraces (maybe just for currently-running tasks;
>       currently we do the task on the INIT monarch plus all other
>       non-running tasks, which is definitely non-optimal)
>    3) optional debugger/crashdump hook
>    4) call panic (maybe)
>    5) optional timeout, then reboot (if not calling panic)
> 
> Part 5 would be trivial and probably not *too* controversial.
> Part 1 is harder but extremely useful, and I think someone (Zoltan?)
> posted a start.  Part 2 should be simple given part 1.

I'll see what I can do about most of these. Part 1 would be
difficult since the hardware/firmware we've currently got 
available makes both processors the monarch on INIT.

Thanks for your feedback,
   Cliff


* Re: [PATCH] make INIT# handler call panic
From: Russ Anderson @ 2004-11-05 23:04 UTC
  To: linux-ia64

Bjorn Helgaas wrote:
> 
> My personal preference would be something like this:
>    1) dump register state (for all CPUs, not just the INIT monarch)
>       on the console
>    2) print backtraces (maybe just for currently-running tasks;
>       currently we do the task on the INIT monarch plus all other
>       non-running tasks, which is definitely non-optimal)
>    3) optional debugger/crashdump hook
>    4) call panic (maybe)
>    5) optional timeout, then reboot (if not calling panic)
> 
> Part 5 would be trivial and probably not *too* controversial.
> Part 1 is harder but extremely useful, and I think someone (Zoltan?)
> posted a start.  Part 2 should be simple given part 1.

I agree.  I am working on part 1 (per cpu MCA/INIT save areas).

For example, the following sample patch:
  1) Reserves ar.k3 for a pointer to this cpu's mca info save area.
  2) Defines the struct layout of the save area.
  3) Allocates the memory for the save area (at boot time).

The part that I'm debugging is tying this into mca_asm.S.

-----------------------------------------------------------
Index: sles9-sgidev/linux/include/asm-ia64/kregs.h
===================================================================
--- sles9-sgidev.orig/linux/include/asm-ia64/kregs.h	2004-02-23 22:44:17.000000000 -0600
+++ sles9-sgidev/linux/include/asm-ia64/kregs.h	2004-11-04 11:12:06.000000000 -0600
@@ -14,6 +14,7 @@
  */
 #define IA64_KR_IO_BASE		0	/* ar.k0: legacy I/O base address */
 #define IA64_KR_TSSD		1	/* ar.k1: IVE uses this as the TSSD */
+#define IA64_KR_MCA_INFO	3	/* ar.k3: phys addr of this cpu's mca_info struct */
 #define IA64_KR_CURRENT_STACK	4	/* ar.k4: what's mapped in IA64_TR_CURRENT_STACK */
 #define IA64_KR_FPU_OWNER	5	/* ar.k5: fpu-owner (UP only, at the moment) */
 #define IA64_KR_CURRENT		6	/* ar.k6: "current" task pointer */
Index: sles9-sgidev/linux/include/asm-ia64/mca.h
===================================================================
--- sles9-sgidev.orig/linux/include/asm-ia64/mca.h	2004-02-23 23:57:45.000000000 -0600
+++ sles9-sgidev/linux/include/asm-ia64/mca.h	2004-11-04 12:38:23.000000000 -0600
@@ -107,6 +107,15 @@
 						 */
 } ia64_mca_os_to_sal_state_t;
 
+typedef struct ia64_mca_cpu_s {
+	u64		ia64_mca_proc_state_dump[512];
+	u64		ia64_mca_stack[1024] __attribute__((aligned(16)));
+	u64		ia64_mca_stackframe[32];
+	u64		ia64_mca_bspstore[1024];
+	u64		ia64_init_stack[KERNEL_STACK_SIZE/8] __attribute__((aligned(16)));
+	struct ia64_mca_tlb_info ia64_mca_cpu_tlb;
+} ia64_mca_cpu_t;
+
 extern void ia64_mca_init(void);
 extern void ia64_os_mca_dispatch(void);
 extern void ia64_os_mca_dispatch_end(void);
Index: sles9-sgidev/linux/arch/ia64/mm/discontig.c
===================================================================
--- sles9-sgidev.orig/linux/arch/ia64/mm/discontig.c	2004-09-24 08:43:54.000000000 -0500
+++ sles9-sgidev/linux/arch/ia64/mm/discontig.c	2004-11-04 14:36:23.000000000 -0600
@@ -4,6 +4,10 @@
  * Copyright (c) 2001 Tony Luck <tony.luck@intel.com>
  * Copyright (c) 2002 NEC Corp.
  * Copyright (c) 2002 Kimio Suganuma <k-suganuma@da.jp.nec.com>
+ * Copyright (c) 2003-2004 Silicon Graphics, Inc
+ *      Russ Anderson <rja@sgi.com>
+ *      Jesse Barnes <jbarnes@sgi.com>
+ *      Jack Steiner <steiner@sgi.com>
  */
 
 /*
@@ -21,6 +25,7 @@
 #include <asm/meminit.h>
 #include <asm/numa.h>
 #include <asm/sections.h>
+#include <asm/mca.h>
 
 /*
  * Track per-node information needed to setup the boot memory allocator, the
@@ -203,12 +208,33 @@
 }
 
 /**
+ * early_nr_phys_cpus_node - return number of physical cpus on a given node
+ * @node: node to check
+ *
+ * Count the number of physical cpus on @node.  These are cpus that actually
+ * exist.  We can't use nr_cpus_node() yet because
+ * acpi_boot_init() (which builds the node_to_cpu_mask array) hasn't been
+ * called yet.
+ */
+static int early_nr_phys_cpus_node(int node)
+{
+	int cpu, n = 0;
+
+	for (cpu = 0; cpu < NR_CPUS; cpu++)
+		if (node == node_cpuid[cpu].nid)
+			if ((cpu == 0) || node_cpuid[cpu].phys_id)
+				n++;
+
+	return n;
+}
+
+/**
  * early_nr_cpus_node - return number of cpus on a given node
  * @node: node to check
  *
  * Count the number of cpus on @node.  We can't use nr_cpus_node() yet because
  * acpi_boot_init() (which builds the node_to_cpu_mask array) hasn't been
- * called yet.
+ * called yet.  Note that node 0 will also count all non-existent cpus.
  */
 static int early_nr_cpus_node(int node)
 {
@@ -235,12 +261,15 @@
  *   |                        |
  *   |~~~~~~~~~~~~~~~~~~~~~~~~| <-- NODEDATA_ALIGN(start, node) for the first
  *   |    PERCPU_PAGE_SIZE *  |     start and length big enough
- *   |        NR_CPUS         |
+ *   |    cpus_on_this_node   | Node 0 will also have entries for all non-existent cpus.
  *   |------------------------|
  *   |   local pg_data_t *    |
  *   |------------------------|
  *   |  local ia64_node_data  |
  *   |------------------------|
+ *   |    MCA/INIT data *     |
+ *   |    cpus_on_this_node   |
+ *   |------------------------|
  *   |          ???           |
  *   |________________________|
  *
@@ -252,9 +281,9 @@
 static int __init find_pernode_space(unsigned long start, unsigned long len,
 				     int node)
 {
-	unsigned long epfn, cpu, cpus;
+	unsigned long epfn, cpu, cpus, phys_cpus;
 	unsigned long pernodesize = 0, pernode, pages, mapsize;
-	void *cpu_data;
+	void *cpu_data, *mca_data_phys;
 	struct bootmem_data *bdp = &mem_data[node].bootmem_data;
 
 	epfn = (start + len) >> PAGE_SHIFT;
@@ -278,9 +307,11 @@
 	 * for good alignment and alias prevention.
 	 */
 	cpus = early_nr_cpus_node(node);
+	phys_cpus = early_nr_phys_cpus_node(node);
 	pernodesize += PERCPU_PAGE_SIZE * cpus;
 	pernodesize += L1_CACHE_ALIGN(sizeof(pg_data_t));
 	pernodesize += L1_CACHE_ALIGN(sizeof(struct ia64_node_data));
+	pernodesize += L1_CACHE_ALIGN(sizeof(ia64_mca_cpu_t)) * phys_cpus;
 	pernodesize = PAGE_ALIGN(pernodesize);
 	pernode = NODEDATA_ALIGN(start, node);
 
@@ -299,6 +330,9 @@
 		mem_data[node].node_data = __va(pernode);
 		pernode += L1_CACHE_ALIGN(sizeof(struct ia64_node_data));
 
+		mca_data_phys = (void *)pernode;
+		pernode += L1_CACHE_ALIGN(sizeof(ia64_mca_cpu_t)) * phys_cpus;
+
 		mem_data[node].pgdat->bdata = bdp;
 		pernode += L1_CACHE_ALIGN(sizeof(pg_data_t));
 
@@ -311,6 +345,14 @@
 			if (node == node_cpuid[cpu].nid) {
 				memcpy(__va(cpu_data), __phys_per_cpu_start,
 				       __per_cpu_end - __per_cpu_start);
+				if ((cpu == 0) || (node_cpuid[cpu].phys_id > 0)) {
+					ia64_set_kr(IA64_KR_MCA_INFO, __pa(mca_data_phys));
+					mca_data_phys += L1_CACHE_ALIGN(sizeof(ia64_mca_cpu_t));
+				}
 				__per_cpu_offset[cpu] = (char*)__va(cpu_data) -
 					__per_cpu_start;
 				cpu_data += PERCPU_PAGE_SIZE;
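
The sizing and carving done in find_pernode_space() can be modeled on its own. This userspace sketch assumes placeholder values for L1_CACHE_BYTES, PAGE_SIZE and PERCPU_PAGE_SIZE, and uses a simplified, hypothetical `struct cpu_entry` in place of node_cpuid[]. It counts physical CPUs with the same rule as early_nr_phys_cpus_node() (CPU 0, or any CPU with a nonzero phys_id), adds one cache-aligned MCA/INIT area per physical CPU to the per-node size, and hands out consecutive area addresses. It does not model the ar.k3 setup.

```c
#include <stdint.h>

/* Placeholder sizes for illustration only; the real values come from
 * the ia64 headers and kernel configuration. */
#define L1_CACHE_BYTES   128UL
#define PAGE_SIZE        16384UL
#define PERCPU_PAGE_SIZE 65536UL
#define NR_CPUS          8

#define L1_CACHE_ALIGN(x) (((x) + L1_CACHE_BYTES - 1) & ~(L1_CACHE_BYTES - 1))
#define PAGE_ALIGN(x)     (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Hypothetical, simplified stand-in for a node_cpuid[] entry. */
struct cpu_entry {
	uint16_t phys_id;  /* 0 for non-existent CPUs (CPU 0 may also be 0) */
	int      nid;      /* node of the CPU; non-existent CPUs land on 0 */
};

/* Same rule as early_nr_phys_cpus_node(): CPU 0 always counts,
 * any other CPU counts only if it has a nonzero phys_id. */
int count_phys_cpus(const struct cpu_entry *tab, int node)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (node == tab[cpu].nid && (cpu == 0 || tab[cpu].phys_id))
			n++;
	return n;
}

/* Per-node reservation as in the patched find_pernode_space():
 * per-cpu pages for every counted cpu, then pg_data_t, node data,
 * and one cache-aligned MCA/INIT area per physical cpu. */
unsigned long pernode_bytes(int cpus, int phys_cpus, unsigned long pgdat_size,
			    unsigned long nodedata_size, unsigned long mca_size)
{
	unsigned long size = 0;

	size += PERCPU_PAGE_SIZE * (unsigned long)cpus;
	size += L1_CACHE_ALIGN(pgdat_size);
	size += L1_CACHE_ALIGN(nodedata_size);
	size += L1_CACHE_ALIGN(mca_size) * (unsigned long)phys_cpus;
	return PAGE_ALIGN(size);
}

/* Hand out consecutive MCA/INIT areas starting at @base, one per
 * physical cpu on @node; store each start in @out, return the count. */
int carve_mca_areas(const struct cpu_entry *tab, int node, unsigned long base,
		    unsigned long mca_size, unsigned long *out)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (node != tab[cpu].nid)
			continue;
		if (cpu == 0 || tab[cpu].phys_id) {
			out[n++] = base;
			base += L1_CACHE_ALIGN(mca_size);
		}
	}
	return n;
}
```

Because non-existent CPUs default to node 0, node 0 reserves per-cpu pages for them but, with the physical-CPU count, no MCA/INIT areas.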


-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com

* Re: [PATCH] make INIT# handler call panic
From: Takao Indoh @ 2004-11-08 12:14 UTC
  To: linux-ia64

Hi,

On Fri, 05 Nov 2004 17:57:29 -0500, Cliff Larsen wrote:

>> I don't have an opinion about whether calling panic from
>> init_handler_platform() is the right thing to do or not.
>> Certainly it is a good place for some sort of hook for a
>> debugger and/or crashdump.
>
>My major motivation was to get to a crashdump hook and get 
>to restart, and panic does both, so I chose it.

IIRC, LKCD is invoked through panic_notifier_list in panic(), so
LKCD may work correctly. But diskdump/netdump may not, because they
are called via BUG(). For example, netdump is called from the
following BUG().

NORET_TYPE void panic(const char * fmt, ...)
{
(snipped)
	bust_spinlocks(1);
	va_start(args, fmt);
	vsprintf(buf, fmt, args);
	va_end(args);
	printk(KERN_EMERG "Kernel panic: %s\n",buf);
	if (netdump_func)
		BUG();

Normally BUG() invokes the exception handler, which calls the dump
function. But I am not sure the exception handler is invoked correctly
from the INIT context.
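
To make the distinction concrete: LKCD-style dumpers hang off the notifier list that panic() walks, while netdump in this tree is reached through BUG(). The sketch below models only the notifier side, as a priority-ordered singly linked list of callbacks. The types and functions are simplified, hypothetical stand-ins for the kernel's notifier_block machinery, and this model ignores the callbacks' return values.

```c
#include <stddef.h>

/* Simplified, hypothetical model of a kernel-style notifier block. */
struct notifier {
	int (*call)(struct notifier *self, unsigned long event, void *data);
	struct notifier *next;
	int priority;          /* higher priority runs first */
};

/* Insert @n keeping the chain sorted by descending priority;
 * equal priorities keep registration order. */
void chain_register(struct notifier **head, struct notifier *n)
{
	while (*head && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Walk the chain the way panic() walks its notifier list;
 * returns how many callbacks ran. */
int chain_call(struct notifier *head, unsigned long event, void *data)
{
	int ran = 0;

	for (; head; head = head->next) {
		head->call(head, event, data);
		ran++;
	}
	return ran;
}

/* Example callback standing in for a dump routine:
 * counts its invocations in *(int *)data. */
int count_call(struct notifier *self, unsigned long event, void *data)
{
	(void)self;
	(void)event;
	(*(int *)data)++;
	return 0;
}
```

A dumper registered this way runs whenever panic() is reached, which is why calling panic() from the INIT handler is enough for LKCD but not for a BUG()-driven netdump.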


>> My personal preference would be something like this:
>>    1) dump register state (for all CPUs, not just the INIT monarch)
>>       on the console
>>    2) print backtraces (maybe just for currently-running tasks;
>>       currently we do the task on the INIT monarch plus all other
>>       non-running tasks, which is definitely non-optimal)
>>    3) optional debugger/crashdump hook
>>    4) call panic (maybe)
>>    5) optional timeout, then reboot (if not calling panic)
>> 
>> Part 5 would be trivial and probably not *too* controversial.
>> Part 1 is harder but extremely useful, and I think someone (Zoltan?)
>> posted a start.  Part 2 should be simple given part 1.
>
>I'll see what I can do about most of these. Part 1 would be
>difficult since the hardware/firmware we've currently got 
>available makes both processors the monarch on INIT.

Even if a crashdump hook is added to the INIT handler, the dump does
not work correctly because there is only a single INIT stack. Therefore
Russ Anderson's patch, which gives each CPU its own INIT stack, is also
indispensable.

Regards,
Takao Indoh

* Re: [PATCH] make INIT# handler call panic
From: Philip R Auld @ 2004-11-10 15:53 UTC
  To: linux-ia64

Hi,

Rumor has it that on Mon, Nov 08, 2004 at 09:14:21PM +0900 Takao Indoh said:
> Hi,
> 
> On Fri, 05 Nov 2004 17:57:29 -0500, Cliff Larsen wrote:
> 
> >> I don't have an opinion about whether calling panic from
> >> init_handler_platform() is the right thing to do or not.
> >> Certainly it is a good place for some sort of hook for a
> >> debugger and/or crashdump.
> >
> >My major motivation was to get to a crashdump hook and get 
> >to restart, and panic does both, so I chose it.
> 
> IIRC, LKCD is invoked through panic_notifier_list in panic(), so
> LKCD may work correctly. But diskdump/netdump may not, because they
> are called via BUG(). For example, netdump is called from the
> following BUG().
> 

Calling BUG would also work, assuming the hooks are in the
BUG path. I'm not seeing that in 2.6.8 anyway.


> Normally BUG() invokes the exception handler, which calls the dump
> function. But I am not sure the exception handler is invoked correctly
> from the INIT context.

This doesn't currently do much in ia64 as far as I can tell. It ends up 
in die via die_if_kernel, but that doesn't look like it will ever get to a 
machine restart, much less a crash dump or even a for(;;) loop. I may be 
missing something though. I'm pretty new to Itanium. 

In i386 there is panic_on_oops in die which can at least get to the 
panic call chain (as there used to be in ia64).
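
Roughly, the i386 logic being described escalates an oops to panic() either when it happens in interrupt context or when panic_on_oops is set, and otherwise kills the current task. A userspace model of just that decision follows; `enum oops_outcome` and `die_outcome()` are hypothetical names, and the real die() also prints the oops and does other work not shown here.

```c
#include <stdbool.h>

/* Hypothetical outcomes of an oops, modeled loosely on i386 die(). */
enum oops_outcome {
	OOPS_KILL_TASK,  /* default: kill the current task and carry on */
	OOPS_PANIC       /* escalate to panic(): notifier chain, then restart */
};

/* Mirrors the i386 knob of the same name. */
bool panic_on_oops = false;

/* Decide how to finish an oops; @in_interrupt models in_interrupt(). */
enum oops_outcome die_outcome(bool in_interrupt)
{
	if (in_interrupt)    /* no task context that can safely be killed */
		return OOPS_PANIC;
	if (panic_on_oops)   /* administrator asked for panic on any oops */
		return OOPS_PANIC;
	return OOPS_KILL_TASK;
}
```

With such a switch, a BUG() raised from the INIT handler would at least reach the panic call chain.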

None of the dump stuff is in the stock kernels yet, is it?

> 
> 
> >> My personal preference would be something like this:
> >>    1) dump register state (for all CPUs, not just the INIT monarch)
> >>       on the console
> >>    2) print backtraces (maybe just for currently-running tasks;
> >>       currently we do the task on the INIT monarch plus all other
> >>       non-running tasks, which is definitely non-optimal)
> >>    3) optional debugger/crashdump hook
> >>    4) call panic (maybe)
> >>    5) optional timeout, then reboot (if not calling panic)
> >> 
> >> Part 5 would be trivial and probably not *too* controversial.
> >> Part 1 is harder but extremely useful, and I think someone (Zoltan?)
> >> posted a start.  Part 2 should be simple given part 1.
> >
> >I'll see what I can do about most of these. Part 1 would be
> >difficult since the hardware/firmware we've currently got 
> >available makes both processors the monarch on INIT.
> 
> Even if a crashdump hook is added to the INIT handler, the dump does
> not work correctly because there is only a single INIT stack. Therefore
> Russ Anderson's patch, which gives each CPU its own INIT stack, is also
> indispensable.
> 

We are still mostly working with 2.4 (RHEL3, which has netdump_func hooks),
and this all worked fine. A crashdump hook, a call to panic, or a call
to BUG each worked.

It doesn't look like anything but the crashdump hook can work in stock
2.6.8, since there are no dump routine calls in the panic or die paths.

Cheers,

Phil

> Regards,
> Takao Indoh

-- 
Philip R. Auld, Ph.D.  	        	       Egenera, Inc.    
Software Architect                            165 Forest St.
(508) 858-2628                            Marlboro, MA 01752

* Re: [PATCH] make INIT# handler call panic
From: Takao Indoh @ 2004-11-11  0:55 UTC
  To: linux-ia64

On Wed, 10 Nov 2004 10:53:38 -0500, Philip R Auld wrote:

>> Normally BUG() invokes the exception handler, which calls the dump
>> function. But I am not sure the exception handler is invoked correctly
>> from the INIT context.
>
>This doesn't currently do much in ia64 as far as I can tell. It ends up 
>in die via die_if_kernel, but that doesn't look like it will ever get to a 
>machine restart, much less a crash dump or even a for(;;) loop. I may be 
>missing something though. I'm pretty new to Itanium. 
>
>In i386 there is panic_on_oops in die which can at least get to the 
>panic call chain (as there used to be in ia64).
>
>None of the dump stuff is in the stock kernels yet, is it?

No, there is no dump stuff.


>> >> My personal preference would be something like this:
>> >>    1) dump register state (for all CPUs, not just the INIT monarch)
>> >>       on the console
>> >>    2) print backtraces (maybe just for currently-running tasks;
>> >>       currently we do the task on the INIT monarch plus all other
>> >>       non-running tasks, which is definitely non-optimal)
>> >>    3) optional debugger/crashdump hook
>> >>    4) call panic (maybe)
>> >>    5) optional timeout, then reboot (if not calling panic)
>> >> 
>> >> Part 5 would be trivial and probably not *too* controversial.
>> >> Part 1 is harder but extremely useful, and I think someone (Zoltan?)
>> >> posted a start.  Part 2 should be simple given part 1.
>> >
>> >I'll see what I can do about most of these. Part 1 would be
>> >difficult since the hardware/firmware we've currently got 
>> >available makes both processors the monarch on INIT.
>> 
>> Even if a crashdump hook is added to the INIT handler, the dump does
>> not work correctly because there is only a single INIT stack. Therefore
>> Russ Anderson's patch, which gives each CPU its own INIT stack, is also
>> indispensable.
>> 
>
>We are still mostly working with 2.4 (RHEL3, which has netdump_func hooks),
>and this all worked fine. A crashdump hook, a call to panic, or a call
>to BUG each worked.

The crash dump itself succeeds, but isn't there a problem analyzing
the dump? The backtrace of "current" on each CPU seems not to work,
because switch_stack is not saved correctly.


Regards,
Takao Indoh

* RE: [PATCH] make INIT# handler call panic
From: Luck, Tony @ 2004-11-11  1:14 UTC
  To: linux-ia64

>>    1) dump register state (for all CPUs, not just the INIT monarch)
>>       on the console

>I'll see what I can do about most of these. Part 1 would be
>difficult since the hardware/firmware we've currently got 
>available makes both processors the monarch on INIT.

You could change the call to ia64_sal_set_vectors in ia64_mca_init
to point all cpus to just one routine (pass pointer to the same
routine for monarch/slave) ... and then have the OS init code
handle the serialization.  That would work on both correct and
buggy SAL implementations.
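
One way to do that serialization is to have every CPU race to claim the monarch role with a single atomic compare-and-swap. A minimal userspace sketch using C11 atomics follows; `monarch_cpu` and the `init_*` functions are hypothetical, and a real handler would also need the losing CPUs to rendezvous and wait for the monarch, which is not shown.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* -1 means no monarch has been chosen for the current INIT event. */
static atomic_int monarch_cpu = -1;

/* Called by every CPU on entry to the single shared INIT routine.
 * Returns true if this CPU won and should act as monarch. */
bool init_elect_monarch(int cpu)
{
	int expected = -1;

	/* Only the first CPU to arrive swaps -1 for its own id;
	 * later arrivals see the winner's id and fail. */
	return atomic_compare_exchange_strong(&monarch_cpu, &expected, cpu);
}

/* Which CPU currently holds the monarch role (-1 if none). */
int init_current_monarch(void)
{
	return atomic_load(&monarch_cpu);
}

/* Reset once the INIT event has been fully handled. */
void init_reset_monarch(void)
{
	atomic_store(&monarch_cpu, -1);
}
```

Since the election ignores which SAL entry point was used, it gives the same result whether the firmware marks one CPU or every CPU as monarch.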

-Tony

* Re: [PATCH] make INIT# handler call panic
From: Cliff Larsen @ 2004-11-11 17:12 UTC
  To: linux-ia64

On Wed, 2004-11-10 at 19:55, Takao Indoh wrote:
> On Wed, 10 Nov 2004 10:53:38 -0500, Philip R Auld wrote:
> >
> >We are still mostly working with 2.4 (RHEL3, which has netdump_func hooks),
> >and this all worked fine. A crashdump hook, a call to panic, or a call
> >to BUG each worked.
> 
> The crash dump itself succeeds, but isn't there a problem analyzing
> the dump? The backtrace of "current" on each CPU seems not to work,
> because switch_stack is not saved correctly.

We are seeing the same behavior with our 2.4: we can backtrace all
processes except the active ones.

-- 
Cliff Larsen <clarsen@egenera.com>



* RE: [PATCH] make INIT# handler call panic
From: Cliff Larsen @ 2004-11-11 17:18 UTC
  To: linux-ia64

On Wed, 2004-11-10 at 20:14, Luck, Tony wrote:
> You could change the call to ia64_sal_set_vectors in ia64_mca_init
> to point all cpus to just one routine (pass pointer to the same
> routine for monarch/slave) ... and then have the OS init code
> handle the serialization.  That would work on both correct and
> buggy SAL implementations.
> 
> -Tony

Certainly true. Do you have any sense of how widespread the problem
is? Being relatively new to Itanium and having just an SR870BH2
to work with, I'm wondering whether such a workaround would be
generally useful.

-- 
Cliff Larsen <clarsen@egenera.com>


* RE: [PATCH] make INIT# handler call panic
From: Luck, Tony @ 2004-11-11 17:33 UTC
  To: linux-ia64

>> You could change the call to ia64_sal_set_vectors in ia64_mca_init
>> to point all cpus to just one routine (pass pointer to the same
>> routine for monarch/slave) ... and then have the OS init code
>> handle the serialization.  That would work on both correct and
>> buggy SAL implementations.
>> 
>> -Tony
>
>Certainly true. Do you have any sense of how widespread the problem
>is? Being relatively new to Itanium and having just an SR870BH2
>to work with, I'm wondering whether such a workaround would be
>generally useful.

The only platform that I _know_ has this SAL bug is ... shuffles feet
in embarrassment ... the Intel Tiger.  But it is possible that others
have copied this bug.

-Tony
