Re: [Fastboot] Ia64 kdump patch

public inbox for linux-ia64@vger.kernel.org

* Re: [Fastboot] Ia64 kdump patch
@ 2006-06-08  8:35 Horms
  2006-06-08 22:47 ` Zou Nan hai
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: Horms @ 2006-06-08  8:35 UTC
  To: linux-ia64

On Thu, Jun 08, 2006 at 06:48:23AM +0800, Zou Nan hai wrote:
> The ia64 kdump patch is in 2 parts.
> 
> the kexec-kdump-ia64-2.6.16.patch should apply on top of the previous 
> kexec patch by Khalid in Tony's test tree.
> 
> the kexec-tools-kdump-ia64.patch should apply to kexec-tools-1.101
> with kexec-tools-1.101-kdump.patch
> 
> 
> To test it:
> Build the first SMP kernel with KEXEC and KDUMP enabled.
> 
> Boot it with the kernel parameter "crashkernel=XXX@YYY",
> which means reserve XXX starting at YYY for crash dumping.
> Build a UP kernel with KEXEC, KDUMP and VMCORE enabled.
> Load this kernel as the crash-dump kernel:
> kexec -p vmlinux.gz --initrd=initrd --append="...."
> 
> trigger a crash,
> maybe "echo c > /proc/sysrq-trigger"
> after the crash kernel boots,
> cp /proc/vmcore core
> 
> gdb first_kernel_vmlinux core
> 
> please test and review.
> 
> Signed-off-by: Khalid Aziz <khalid_aziz@hp.com>
> Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>

Hi,

I'm very excited to be able to play with the new version of this patch,
but the version you posted seems to include all of the kexec patch
that went into Tony Luck's tree. Here is a rediff relative to the
existing kexec patch (no other changes).

The code does seem to be working for me. The main difficulty so far
seems to have been finding an appropriate place and size for
the reserved area. 128M@256M seems to work for me, offering enough
memory and not lying on a resource boundary.
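
To make the crashkernel=size@addr syntax concrete, here is a rough
userspace sketch of how such an option can be parsed. It is only an
illustration: parse_size() and parse_crashkernel() are hypothetical names
standing in for the memparse()-based code the patch adds to setup.c, and
they are not taken from the kernel.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's memparse():
 * parse a number with an optional K/M/G suffix and leave *end
 * pointing just past what was consumed. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/* Extract size and base from "crashkernel=size@addr" in a command line.
 * Returns 0 on success, -1 if the option is absent or lacks "@addr". */
static int parse_crashkernel(const char *cmdline,
			     unsigned long long *size,
			     unsigned long long *base)
{
	const char *from = strstr(cmdline, "crashkernel=");
	char *cur;

	if (!from)
		return -1;
	*size = parse_size(from + 12, &cur);	/* 12 == strlen("crashkernel=") */
	if (*cur != '@')
		return -1;
	*base = parse_size(cur + 1, &cur);
	return 0;
}
```

With crashkernel=128M@256M this gives a size of 0x8000000 and a base of
0x10000000, so the reserved region would run from 0x10000000 to
0x17ffffff (base + size - 1, as in crashk_res).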

Lastly, is it possible for you to comment on what areas of concern
you have with regard to kdump/kexec on ia64? I am looking to port this
code to xen, as my colleague Magnus Damm and I have already done for i386
(complete) and x86_64 (almost complete).

http://lists.xensource.com/archives/html/xen-devel/2006-05/msg01272.html

Signed-Off-By: Horms <horms@verge.net.au>

 arch/ia64/Kconfig                  |    8 ++
 arch/ia64/kernel/crash.c           |  113 +++++++++++++++++++++++++++++++++++-
 arch/ia64/kernel/efi.c             |   17 ++++-
 arch/ia64/kernel/machine_kexec.c   |   43 ++-----------
 arch/ia64/kernel/relocate_kernel.S |   38 +++++-------
 arch/ia64/kernel/setup.c           |   38 ++++++++++++
 include/asm-ia64/kexec.h           |    4 -
 include/asm-ia64/meminit.h         |    3 
 include/linux/irq.h                |    1 
 kernel/irq/manage.c                |   19 ++++++
 10 files changed, 218 insertions(+), 66 deletions(-)

34b91b63c4f4144a4e67c67dca97b424ed8fc26c
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 4bab62d..1202c6e 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -413,6 +413,8 @@ config IA64_PALINFO
 config SGI_SN
 	def_bool y if (IA64_SGI_SN2 || IA64_GENERIC)
 
+source "drivers/sn/Kconfig"
+
 config KEXEC
 	bool "kexec system call (EXPERIMENTAL)"
 	depends on EXPERIMENTAL
@@ -430,7 +432,11 @@ config KEXEC
 	  support.  As of this writing the exact hardware interface is
 	  strongly in flux, so no good recommendation can be made.
 
-source "drivers/sn/Kconfig"
+config CRASH_DUMP
+	  bool "kernel crash dumps (EXPERIMENTAL)"
+	  depends on EXPERIMENTAL
+	  help
+	    Generate crash dump after being started by kexec.
 
 source "drivers/firmware/Kconfig"
 
diff --git a/arch/ia64/kernel/crash.c b/arch/ia64/kernel/crash.c
index a0e49b7..03c3118 100644
--- a/arch/ia64/kernel/crash.c
+++ b/arch/ia64/kernel/crash.c
@@ -4,8 +4,8 @@
  * Architecture specific (ia64) functions for kexec based crash dumps.
  *
  * Created by: Khalid Aziz <khalid.aziz@hp.com>
- *
  * Copyright (C) 2005 Hewlett-Packard Development Company, L.P.
+ * Copyright (C) 2005 Intel Corp	Zou Nan hai <nanhai.zou@intel.com>
  *
  */
 #include <linux/init.h>
@@ -13,6 +13,7 @@ #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/smp.h>
 #include <linux/irq.h>
+#include <linux/pci.h>
 #include <linux/reboot.h>
 #include <linux/kexec.h>
 #include <linux/irq.h>
@@ -20,6 +21,111 @@ #include <linux/delay.h>
 #include <linux/elf.h>
 #include <linux/elfcore.h>
 #include <linux/device.h>
+#include <asm/uaccess.h>
+
+size_t copy_oldmem_page(unsigned long pfn, char *buf,
+                               size_t csize, unsigned long offset, int userbuf)
+{
+        void  *vaddr;
+
+        if (!csize)
+                return 0;
+        vaddr = page_address(pfn_to_page(pfn));
+
+        if (userbuf) {
+                if (copy_to_user(buf, (vaddr + offset), csize)) {
+                        return -EFAULT;
+                }
+        } else
+                memcpy(buf, (vaddr + offset), csize);
+        return csize;
+}
+
+static void device_shootdown(void)
+{
+       struct pci_dev *dev;
+       irq_desc_t *desc;
+       u16 pci_command;
+
+       list_for_each_entry(dev, &pci_devices, global_list) {
+               desc = irq_descp(dev->irq);
+               if (!desc->action)
+                       continue;
+               pci_read_config_word(dev, PCI_COMMAND, &pci_command);
+               if (pci_command & PCI_COMMAND_MASTER) {
+                       pci_command &= ~PCI_COMMAND_MASTER;
+                       pci_write_config_word(dev, PCI_COMMAND, pci_command);
+               }
+               disable_irq_nosync(dev->irq);
+               desc->handler->end(dev->irq);
+       }
+}
+
+static Elf64_Word
+*append_elf_note(Elf64_Word *buf, char *name, unsigned type, void *data,
+		size_t data_len)
+{
+	struct elf_note *note = (struct elf_note *)buf;
+	note->n_namesz = strlen(name) + 1;
+	note->n_descsz = data_len;
+	note->n_type   = type;
+	buf += (sizeof(*note) + 3)/4;
+	memcpy(buf, name, note->n_namesz);
+	buf += (note->n_namesz + 3)/4;
+	memcpy(buf, data, data_len);
+	buf += (data_len + 3)/4;
+	return buf;
+}
+
+static void
+final_note(void *buf)
+{
+	memset(buf, 0, sizeof(struct elf_note));
+}
+
+static void
+crash_save_this_cpu(void)
+{
+	void *buf;
+	struct elf_prstatus prstatus;
+	int cpu = smp_processor_id();
+	elf_greg_t *dst = (elf_greg_t *)&prstatus.pr_reg;
+
+	memset(&prstatus, 0, sizeof(prstatus));
+	prstatus.pr_pid = current->pid;
+
+    	dst[1] = ia64_getreg(_IA64_REG_GP);
+    	dst[12] = ia64_getreg(_IA64_REG_SP);
+    	dst[13] = ia64_getreg(_IA64_REG_TP);
+
+    	dst[42] = ia64_getreg(_IA64_REG_IP);
+    	dst[45] = ia64_getreg(_IA64_REG_AR_RSC);
+
+	ia64_setreg(_IA64_REG_AR_RSC, 0);
+	ia64_srlz_i();
+
+    	dst[46] = ia64_getreg(_IA64_REG_AR_BSP);
+    	dst[47] = ia64_getreg(_IA64_REG_AR_BSPSTORE);
+
+    	dst[48] = ia64_getreg(_IA64_REG_AR_RNAT);
+    	dst[49] = ia64_getreg(_IA64_REG_AR_CCV);
+    	dst[50] = ia64_getreg(_IA64_REG_AR_UNAT);
+
+    	dst[51] = ia64_getreg(_IA64_REG_AR_FPSR);
+    	dst[52] = ia64_getreg(_IA64_REG_AR_PFS);
+    	dst[53] = ia64_getreg(_IA64_REG_AR_LC);
+
+    	dst[54] = ia64_getreg(_IA64_REG_AR_LC);
+    	dst[55] = ia64_getreg(_IA64_REG_AR_CSD);
+    	dst[56] = ia64_getreg(_IA64_REG_AR_SSD);
+
+        buf = (u64 *) per_cpu_ptr(crash_notes, cpu);
+	if (!buf)
+		return;
+	buf = append_elf_note(buf, "CORE", NT_PRSTATUS, &prstatus,
+		sizeof(prstatus));
+	final_note(buf);
+}
 
 void
 machine_crash_shutdown(struct pt_regs *pt)
@@ -32,8 +138,11 @@ machine_crash_shutdown(struct pt_regs *p
 	 * In practice this means shooting down the other cpus in
 	 * an SMP system.
 	 */
-	if (in_interrupt())
+	if (in_interrupt()) {
 		ia64_eoi();
+	}
+	crash_save_this_cpu();
+	device_shootdown();
 #ifdef CONFIG_SMP
 	smp_send_stop();
 #endif
diff --git a/arch/ia64/kernel/efi.c b/arch/ia64/kernel/efi.c
index c33d0ba..5c657e6 100644
--- a/arch/ia64/kernel/efi.c
+++ b/arch/ia64/kernel/efi.c
@@ -27,6 +27,7 @@ #include <linux/init.h>
 #include <linux/types.h>
 #include <linux/time.h>
 #include <linux/efi.h>
+#include <linux/kexec.h>
 
 #include <asm/io.h>
 #include <asm/kregs.h>
@@ -42,7 +43,7 @@ extern efi_status_t efi_call_phys (void 
 struct efi efi;
 EXPORT_SYMBOL(efi);
 static efi_runtime_services_t *runtime;
-static unsigned long mem_limit = ~0UL, max_addr = ~0UL;
+static unsigned long mem_limit = ~0UL, max_addr = ~0UL, min_addr = 0UL;
 
 #define efi_call_virt(f, args...)	(*(f))(args)
 
@@ -422,6 +423,8 @@ efi_init (void)
 			mem_limit = memparse(cp + 4, &cp);
 		} else if (memcmp(cp, "max_addr=", 9) == 0) {
 			max_addr = GRANULEROUNDDOWN(memparse(cp + 9, &cp));
+		} else if (memcmp(cp, "min_addr=", 9) == 0) {
+			min_addr = GRANULEROUNDDOWN(memparse(cp + 9, &cp));
 		} else {
 			while (*cp != ' ' && *cp)
 				++cp;
@@ -429,6 +432,8 @@ efi_init (void)
 				++cp;
 		}
 	}
+	if (min_addr != 0UL)
+		printk(KERN_INFO "Ignoring memory below %luMB\n", min_addr >> 20);
 	if (max_addr != ~0UL)
 		printk(KERN_INFO "Ignoring memory above %luMB\n", max_addr >> 20);
 
@@ -895,7 +900,8 @@ find_memmap_space (void)
 		as = max(contig_low, md->phys_addr);
 		ae = min(contig_high, efi_md_end(md));
 
-		/* keep within max_addr= command line arg */
+		/* keep within max_addr= and min_addr= command line arg */
+		as = max(as, min_addr);
 		ae = min(ae, max_addr);
 		if (ae <= as)
 			continue;
@@ -1005,7 +1011,8 @@ efi_memmap_init(unsigned long *s, unsign
 		} else
 			ae = efi_md_end(md);
 
-		/* keep within max_addr= command line arg */
+		/* keep within max_addr= and min_addr= command line arg */
+		as = max(as, min_addr);
 		ae = min(ae, max_addr);
 		if (ae <= as)
 			continue;
@@ -1117,6 +1124,10 @@ efi_initialize_iomem_resources(struct re
 			 */
 			insert_resource(res, code_resource);
 			insert_resource(res, data_resource);
+#ifdef CONFIG_KEXEC
+			if (crashk_res.end > crashk_res.start)
+				insert_resource(res, &crashk_res);
+#endif
 		}
 	}
 }
diff --git a/arch/ia64/kernel/machine_kexec.c b/arch/ia64/kernel/machine_kexec.c
index 3b46143..73fbb26 100644
--- a/arch/ia64/kernel/machine_kexec.c
+++ b/arch/ia64/kernel/machine_kexec.c
@@ -1,5 +1,5 @@
 /*
- * arch/ia64/kernel/machine_kexec.c
+ * arch/ia64/kernel/machine_kexec.c 
  *
  * Handle transition of Linux booting another kernel
  * Copyright (C) 2005 Hewlett-Packard Development Comapny, L.P.
@@ -25,9 +25,7 @@ #include <asm/tlbflush.h>
 #include <asm/delay.h>
 #include <asm/meminit.h>
 
-extern unsigned long ia64_iobase;
-
-typedef void (*relocate_new_kernel_t)( unsigned long, unsigned long,
+typedef void (*relocate_new_kernel_t)(unsigned long, unsigned long,
 		struct ia64_boot_param *, unsigned long);
 
 /*
@@ -43,9 +41,9 @@ int machine_kexec_prepare(struct kimage 
 	func = (unsigned long *)&relocate_new_kernel;
 	/* Pre-load control code buffer to minimize work in kexec path */
 	control_code_buffer = page_address(image->control_code_page);
-	memcpy((void *)control_code_buffer, (const void *)func[0],
+	memcpy((void *)control_code_buffer, (const void *)func[0], 
 			relocate_new_kernel_size);
-	flush_icache_range((unsigned long)control_code_buffer,
+	flush_icache_range((unsigned long)control_code_buffer, 
 			(unsigned long)control_code_buffer + relocate_new_kernel_size);
 
 	return 0;
@@ -61,7 +59,6 @@ #ifdef CONFIG_PCI
 	struct pci_dev *dev = NULL;
 	irq_desc_t *idesc;
 	cpumask_t mask = CPU_MASK_NONE;
-
 	/* Disable all PCI devices */
 	while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
 		if (!(dev->is_enabled))
@@ -91,7 +88,6 @@ #elif defined(CONFIG_SMP)
 	smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0);
 #endif
 
-	ia64_set_itv(1<<16);
 
 #ifdef CONFIG_IA64_HP_ZX1
 	ioc_iova_disable();
@@ -100,41 +96,20 @@ #endif
 
 /*
  * Do not allocate memory (or fail in any way) in machine_kexec().
- * We are past the point of no return, committed to rebooting now.
+ * We are past the point of no return, committed to rebooting now. 
  */
+extern void *efi_get_pal_addr(void);
 void machine_kexec(struct kimage *image)
 {
-	unsigned long indirection_page;
 	relocate_new_kernel_t rnk;
-	unsigned long pta, impl_va_bits;
 	void *pal_addr = efi_get_pal_addr();
 	unsigned long code_addr = (unsigned long)page_address(image->control_code_page);
-
 	/* Interrupts aren't acceptable while we reboot */
+	ia64_set_itv(1<<16);
 	local_irq_disable();
-
-	/* Disable VHPT */
-	impl_va_bits = ffz(~(local_cpu_data->unimpl_va_mask | (7UL << 61)));
-	pta = POW2(61) - POW2(vmlpt_bits);
-	ia64_set_pta(pta | (0 << 8) | (vmlpt_bits << 2) | 0);
-
-	/* now execute the control code.
-	 * We will start by executing the control code linked into the
-	 * kernel as opposed to the code we copied in control code buffer		 * page. When this code switches to physical mode, we will start
-	 * executing the code in control code buffer page. Reason for
-	 * doing this is we start code execution in virtual address space.
-	 * If we were to try to execute the newly copied code in virtual
-	 * address space, we will need to make an ITLB entry to avoid ITLB
-	 * miss. By executing the code linked into kernel, we take advantage
-	 * of the ITLB entry already in place for kernel and avoid making
-	 * a new entry.
-	 */
-	indirection_page = image->head & PAGE_MASK;
-
 	rnk = (relocate_new_kernel_t)&code_addr;
-	(*rnk)(indirection_page, image->start, ia64_boot_param,
+	(*rnk)(image->head, image->start, ia64_boot_param,
 		     GRANULEROUNDDOWN((unsigned long) pal_addr));
 	BUG();
-	for (;;)
-		;
+	for (;;);
 }
diff --git a/arch/ia64/kernel/relocate_kernel.S b/arch/ia64/kernel/relocate_kernel.S
index d3e20ad..09bd041 100644
--- a/arch/ia64/kernel/relocate_kernel.S
+++ b/arch/ia64/kernel/relocate_kernel.S
@@ -1,5 +1,5 @@
 /*
- * arch/ia64/kernel/relocate_kernel.S
+ * arch/ia64/kernel/relocate_kernel.S 
  *
  * Relocate kexec'able kernel and start it
  *
@@ -17,9 +17,7 @@ #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/mca_asm.h>
 
-       /* Must be relocatable PIC code callable as a C function, that once
-        * it starts can not use the previous processes stack.
-        *
+       /* Must be relocatable PIC code callable as a C function
         */
 GLOBAL_ENTRY(relocate_new_kernel)
 	.prologue
@@ -36,22 +34,16 @@ GLOBAL_ENTRY(relocate_new_kernel)
         srlz.i
 }
 	;;
-
+	dep r2=0,r2,61,3		//to physical address
+	;;
 	//first switch to physical mode
 	add r3=1f-.reloc_entry, r2
-	movl r16 = IA64_PSR_AC|IA64_PSR_BN|IA64_PSR_IC|IA64_PSR_MFL
+	movl r16 = IA64_PSR_AC|IA64_PSR_BN|IA64_PSR_IC
 	mov ar.rsc=0	          	// put RSE in enforced lazy mode
 	;;
-	add r2=(memory_stack-.reloc_entry), r2
-	;;
-	add sp=(memory_stack_end - .reloc_entry),r2
+	add sp=(memory_stack_end - 16 - .reloc_entry),r2
 	add r8=(register_stack - .reloc_entry),r2
 	;;
-	tpa sp=sp
-	tpa r3=r3
-	;;
-	loadrs
-	;;
 	mov r18=ar.rnat
 	mov ar.bspstore=r8
 	;;
@@ -66,7 +58,7 @@ GLOBAL_ENTRY(relocate_new_kernel)
 1:
 	//physical mode code begin
 	mov b6=in1
-	tpa r28=in2			// tpa must before TLB purge
+	dep r28=0,in2,61,3	//to physical address
 
 	// purge all TC entries
 #define O(member)       IA64_CPUINFO_##member##_OFFSET
@@ -145,10 +137,10 @@ (p7)    br.cond.dpnt.few 4f
         srlz.i
 	;;
 
-	// copy kexec kernel segments
+	//copy segments
 	movl r16=PAGE_MASK
-	ld8  r30=[in0],8;;			// in0 is page_list
-	br.sptk.few .dest_page
+        mov  r30=in0                    // in0 is page_list
+        br.sptk.few .dest_page
 	;;
 .loop:
 	ld8  r30=[in0], 8;;
@@ -188,6 +180,8 @@ (p6)	br.cond.sptk.few .loop
 	srlz.d
 	;;
 	br.call.sptk.many b0=b6;;
+
+.align  32
 memory_stack:
 	.fill           8192, 1, 0
 memory_stack_end:
@@ -310,7 +304,7 @@ check_irr0:
 	cmp.eq	p6,p0=0,r8
 (p6)	br.cond.sptk.few	check_irr0
 	br.few	call_start
-
+	
 check_irr1:
 	mov	r8=cr.irr1
 	;;
@@ -319,7 +313,7 @@ check_irr1:
 	cmp.eq	p6,p0=0,r8
 (p6)	br.cond.sptk.few	check_irr1
 	br.few	call_start
-
+	
 check_irr2:
 	mov	r8=cr.irr2
 	;;
@@ -328,7 +322,7 @@ check_irr2:
 	cmp.eq	p6,p0=0,r8
 (p6)	br.cond.sptk.few	check_irr2
 	br.few	call_start
-
+	
 check_irr3:
 	mov	r8=cr.irr3
 	;;
@@ -337,7 +331,7 @@ check_irr3:
 	cmp.eq	p6,p0=0,r8
 (p6)	br.cond.sptk.few	check_irr3
 	br.few	call_start
-
+	
 call_start:
 	mov	cr.eoi=r0
 	;;
diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index e4dfda1..7665d4d 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -44,6 +44,8 @@ #include <linux/efi.h>
 #include <linux/initrd.h>
 #include <linux/pm.h>
 #include <linux/cpufreq.h>
+#include <linux/kexec.h>
+#include <linux/crash_dump.h>
 
 #include <asm/ia32.h>
 #include <asm/machvec.h>
@@ -251,6 +253,32 @@ #ifdef CONFIG_BLK_DEV_INITRD
 	}
 #endif
 
+#ifdef CONFIG_KEXEC
+	/* crashkernel=size@addr specifies the location to reserve for
+	 * a crash kernel.  By reserving this memory we guarantee
+	 * that linux never set's it up as a DMA target.
+	 * Useful for holding code to do something appropriate
+	 * after a kernel panic.
+	 */
+	{
+		char *from = strstr(saved_command_line, "crashkernel=");
+		if (from) {
+			unsigned long size, base;
+			size = memparse(from + 12, &from);
+			if (*from == '@') {
+				base = memparse(from + 1, &from);
+				rsvd_region[n].start =
+					(unsigned long)__va(base);
+				rsvd_region[n].end =
+					(unsigned long)__va(base + size);
+				crashk_res.start = base;
+				crashk_res.end = base + size - 1;
+				n++;
+			}
+		}
+	}
+#endif
+
 	efi_memmap_init(&rsvd_region[n].start, &rsvd_region[n].end);
 	n++;
 
@@ -484,6 +512,16 @@ #endif
 	if (!nomca)
 		ia64_mca_init();
 
+#ifdef CONFIG_CRASH_DUMP
+	{
+		char *from = strstr(saved_command_line, "elfcorehdr=");
+
+		if (from)
+			elfcorehdr_addr = memparse(from+11, &from);
+		saved_max_pfn = (unsigned long) -1;
+	}
+#endif
+
 	platform_setup(cmdline_p);
 	paging_init();
 }
diff --git a/include/asm-ia64/kexec.h b/include/asm-ia64/kexec.h
index e9f0b7a..d45c03f 100644
--- a/include/asm-ia64/kexec.h
+++ b/include/asm-ia64/kexec.h
@@ -21,14 +21,12 @@ #define vmlpt_bits	(impl_va_bits - PAGE_
 #define POW2(n)		(1ULL << (n))
 
 DECLARE_PER_CPU(u64, ia64_mca_pal_base);
-
 const extern unsigned int relocate_new_kernel_size;
 volatile extern long kexec_rendez;
-extern void relocate_new_kernel(unsigned long, unsigned long,
+extern void relocate_new_kernel(unsigned long, unsigned long, 
 		struct ia64_boot_param *, unsigned long);
 extern void kexec_fake_sal_rendez(void *start, unsigned long wake_up,
 		unsigned long pal_base);
-
 static inline void
 crash_setup_regs(struct pt_regs *newregs, struct pt_regs *oldregs)
 {
diff --git a/include/asm-ia64/meminit.h b/include/asm-ia64/meminit.h
index 46501b0..7a0e216 100644
--- a/include/asm-ia64/meminit.h
+++ b/include/asm-ia64/meminit.h
@@ -16,11 +16,12 @@ #include <linux/config.h>
  * 	- initrd (optional)
  * 	- command line string
  * 	- kernel code & data
+ * 	- crash dumping code reserved region
  * 	- Kernel memory map built from EFI memory map
  *
  * More could be added if necessary
  */
-#define IA64_MAX_RSVD_REGIONS 6
+#define IA64_MAX_RSVD_REGIONS 7
 
 struct rsvd_region {
 	unsigned long start;	/* virtual address of beginning of element */
diff --git a/include/linux/irq.h b/include/linux/irq.h
index ee2a82a..a4e89f4 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -94,6 +94,7 @@ irq_descp (int irq)
 #include <asm/hw_irq.h> /* the arch dependent stuff */
 
 extern int setup_irq(unsigned int irq, struct irqaction * new);
+extern void terminate_irqs(void);
 
 #ifdef CONFIG_GENERIC_HARDIRQS
 extern cpumask_t irq_affinity[NR_IRQS];
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 1279e34..7d855d2 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -393,3 +393,22 @@ int request_irq(unsigned int irq,
 
 EXPORT_SYMBOL(request_irq);
 
+/*
+ * Terminate any outstanding interrupts
+ */
+void terminate_irqs(void)
+{
+	struct irqaction * action;
+	irq_desc_t *idesc;
+	int i;
+
+	for (i=0; i < NR_IRQS; i++) {
+		idesc = irq_descp(i);
+		action = idesc->action;
+		if (!action)
+			continue;
+		if (idesc->handler->end)
+			idesc->handler->end(i);
+	}
+}
+


* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
@ 2006-06-08 22:47 ` Zou Nan hai
  2006-06-12  0:16 ` Zou Nan hai
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Zou Nan hai @ 2006-06-08 22:47 UTC
  To: linux-ia64

On Thu, 2006-06-08 at 16:35, Horms wrote:
> On Thu, Jun 08, 2006 at 06:48:23AM +0800, Zou Nan hai wrote:
> > The ia64 kdump patch is in 2 parts.
> > 
> > the kexec-kdump-ia64-2.6.16.patch should apply on top of the previous 
> > kexec patch by Khalid in Tony's test tree.
> > 
> > the kexec-tools-kdump-ia64.patch should apply to kexec-tools-1.101
> > with kexec-tools-1.101-kdump.patch
> > 
> > 
> > To test it:
> > Build the first SMP kernel with KEXEC and KDUMP enabled.
> > 
> > Boot it with the kernel parameter "crashkernel=XXX@YYY",
> > which means reserve XXX starting at YYY for crash dumping.
> > Build a UP kernel with KEXEC, KDUMP and VMCORE enabled.
> > Load this kernel as the crash-dump kernel:
> > kexec -p vmlinux.gz --initrd=initrd --append="...."
> > 
> > trigger a crash,
> > maybe "echo c > /proc/sysrq-trigger"
> > after the crash kernel boots,
> > cp /proc/vmcore core
> > 
> > gdb first_kernel_vmlinux core
> > 
> > please test and review.
> > 
> > Signed-off-by: Khalid Aziz <khalid_aziz@hp.com>
> > Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
> 
> Hi,
> 
> I'm very excited to be able to play with the new version of this patch,
> but the version you posted seems to include all of the kexec patch
> that went into Tony Luck's tree. Here is a rediff relative to the
> existing kexec patch (no other changes).
> 
> The code does seem to be working for me. The main difficulty so far
> seems to have been finding an appropriate place and size for
> the reserved area. 128M@256M seems to work for me, offering enough
> memory and not lying on a resource boundary.
> 
> Lastly, is it possible for you to comment on what areas of concern
> you have with regard to kdump/kexec on ia64? I am looking to port this
> code to xen, as my colleague Magnus Damm and I have already done for i386
> (complete) and x86_64 (almost complete).
> 
> http://lists.xensource.com/archives/html/xen-devel/2006-05/msg01272.html
> 
> Signed-Off-By: Horms <horms@verge.net.au>
> 

 Thanks for testing and reviewing.
 
 There is still a lot of work to do before ia64 kdump is a useful
and robust feature.

 Major issues.
 1. Full per-cpu dumping on INIT.
    You may notice that I only send an IPI to the other CPUs and dump
part of the registers for the crashing CPU. The other CPUs are just
stopped; their state is not dumped. This is only a temporary hack.
 On other platforms this is done with an NMI; on IA64 we should use INIT
to notify the other CPUs. I also know that on some platforms there is a
trigger on the panel that can raise INIT. We could use that to dump at
the time of a deadlock. But INIT is currently used by MCA, so we need to
find a way to coordinate with MCA on INIT.

 2. The unwind section is missing from vmcore.
    When you run readelf on vmcore, you may notice there are no unwind
sections. We should add per-cpu stack unwind sections to help dump
filter tools analyze the core dump.

 3. kdump path at crash time.
    Currently I still have to do an irq->end on each level-triggered irq;
without that the MPT fusion driver cannot restart. We should fix this,
or at least do it in a way that does not touch any memory in the previous
kernel.

 4. Other than this, we need to port the dump filter to IA64.
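
For anyone looking at the vmcore from the dump side, the per-cpu notes
that the patch writes with append_elf_note() and ends with final_note()
can be walked roughly as follows. This is a userspace sketch, assuming
the usual three-word ELF note header and 4-byte padding between fields;
struct note_hdr, note_align() and count_notes() are names made up for
illustration and do not appear in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same three 32-bit words as the ELF note header; redefined here only
 * to keep the sketch self-contained. */
struct note_hdr {
	uint32_t n_namesz;	/* length of the name, including its NUL */
	uint32_t n_descsz;	/* length of the descriptor payload */
	uint32_t n_type;	/* note type, e.g. NT_PRSTATUS */
};

/* Round len up to the 4-byte alignment used between note fields. */
static size_t note_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Count the notes in buf. Stops at an all-zero header (the terminator
 * written by final_note()) or when a note would run past len. */
static size_t count_notes(const unsigned char *buf, size_t len)
{
	size_t off = 0, count = 0;
	struct note_hdr hdr;

	while (off + sizeof(hdr) <= len) {
		size_t next;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (!hdr.n_namesz && !hdr.n_descsz && !hdr.n_type)
			break;
		next = off + sizeof(hdr) + note_align(hdr.n_namesz) +
		       note_align(hdr.n_descsz);
		if (next > len)
			break;
		off = next;
		count++;
	}
	return count;
}
```

A buffer holding one "CORE" note followed by the zeroed terminator
should give a count of 1; the same loop is the natural place to pick out
each NT_PRSTATUS descriptor.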

There are still some minor issues, e.g.:
  When I get a crash while X is active, the new kernel starts up with a
blank screen (the network still works). I do a brute-force VGA reset in
the purgatory code, but that seems only to shut the VGA down, not
reinitialize it, if X is running.

  Currently kexec cannot run on a kexec'd kernel, because the memory
region of the EFI memmap is not reserved in /proc/iomem. I will send
a patch to reserve that region later.
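
As a rough sketch of the kind of check this affects: parse one
/proc/iomem line of the form "start-end : name" and test whether a
candidate load range overlaps it. The function names are hypothetical,
the 64-byte name buffer is an arbitrary choice, and this is not meant to
mirror how kexec-tools is actually structured.

```c
#include <stdio.h>

/* Parse one /proc/iomem line such as "  10000000-17ffffff : some name".
 * Addresses are hexadecimal without a 0x prefix. name must have room
 * for 64 bytes. Returns 0 on success, -1 if the line does not match. */
static int parse_iomem_line(const char *line,
			    unsigned long long *start,
			    unsigned long long *end,
			    char name[64])
{
	if (sscanf(line, " %llx-%llx : %63[^\n]", start, end, name) != 3)
		return -1;
	return 0;
}

/* Both ranges are inclusive, as in /proc/iomem. */
static int ranges_overlap(unsigned long long a_start,
			  unsigned long long a_end,
			  unsigned long long b_start,
			  unsigned long long b_end)
{
	return a_start <= b_end && b_start <= a_end;
}
```

A region that is missing from /proc/iomem never shows up in such a scan,
so a loader relying on it has no way to know it must avoid that memory.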

There are probably other issues and gaps still to be found.

Thanks
Zou Nan hai


* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
  2006-06-08 22:47 ` Zou Nan hai
@ 2006-06-12  0:16 ` Zou Nan hai
  2006-06-12  1:50 ` Takao Indoh
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Zou Nan hai @ 2006-06-12  0:16 UTC
  To: linux-ia64

On Mon, 2006-06-12 at 09:50, Takao Indoh wrote:
> On 09 Jun 2006 06:47:59 +0800, Zou Nan hai wrote:
> 
> > Thanks for testing and reviewing.
> > 
> > There is still a lot of work to do before ia64 kdump is a useful
> >and robust feature.
> >
> > Major issues.
> > 1. Full per-cpu dumping on INIT.
> >    You may notice that I only send an IPI to the other CPUs and dump
> >part of the registers for the crashing CPU. The other CPUs are just
> >stopped; their state is not dumped. This is only a temporary hack.
> > On other platforms this is done with an NMI; on IA64 we should use INIT
> >to notify the other CPUs. I also know that on some platforms there is a
> >trigger on the panel that can raise INIT. We could use that to dump at
> >the time of a deadlock. But INIT is currently used by MCA, so we need to
> >find a way to coordinate with MCA on INIT.
> >
> > 2. The unwind section is missing from vmcore.
> >    When you run readelf on vmcore, you may notice there are no unwind
> >sections. We should add per-cpu stack unwind sections to help dump
> >filter tools analyze the core dump.
> >
> > 3. kdump path at crash time.
> >    Currently I still have to do an irq->end on each level-triggered irq;
> >without that the MPT fusion driver cannot restart. We should fix this,
> >or at least do it in a way that does not touch any memory in the previous
> >kernel.
> >
> > 4. Other than this, we need to port the dump filter to IA64.
> >
> >There are still some minor issues, e.g.:
> >  When I get a crash while X is active, the new kernel starts up with a
> >blank screen (the network still works). I do a brute-force VGA reset in
> >the purgatory code, but that seems only to shut the VGA down, not
> >reinitialize it, if X is running.
> >
> >  Currently kexec cannot run on a kexec'd kernel, because the memory
> >region of the EFI memmap is not reserved in /proc/iomem. I will send
> >a patch to reserve that region later.
> >
> >There are probably other issues and gaps still to be found.
> >
> >Thanks
> >Zou Nan hai
> >-
> >To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> Hi, 
> 
> I tried to use kdump on ia64, and it seemed to work, but
> I was not able to get a back trace for the panicking process using
> the crash tool.
> 
> I think the reason is that the 1st kernel does not save the switch stack
> before the 2nd kernel boots.
> 
> Do you have a plan to improve kdump to save the switch stack?
> Or is there another method to trace the panicking process?
> 
  Yes, I am going to dump per-cpu unwind sections to the core file.

  Thanks
  Zou Nan hai
> Regards,
> Takao Indoh


* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
  2006-06-08 22:47 ` Zou Nan hai
  2006-06-12  0:16 ` Zou Nan hai
@ 2006-06-12  1:50 ` Takao Indoh
  2006-06-14 23:30 ` Luck, Tony
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Takao Indoh @ 2006-06-12  1:50 UTC
  To: linux-ia64

On 09 Jun 2006 06:47:59 +0800, Zou Nan hai wrote:

> Thanks for testing and reviewing.
> 
> There is still a lot of work to do before ia64 kdump is a useful
>and robust feature.
>
> Major issues.
> 1. Full per-cpu dumping on INIT.
>    You may notice that I only send an IPI to the other CPUs and dump
>part of the registers for the crashing CPU. The other CPUs are just
>stopped; their state is not dumped. This is only a temporary hack.
> On other platforms this is done with an NMI; on IA64 we should use INIT
>to notify the other CPUs. I also know that on some platforms there is a
>trigger on the panel that can raise INIT. We could use that to dump at
>the time of a deadlock. But INIT is currently used by MCA, so we need to
>find a way to coordinate with MCA on INIT.
>
> 2. The unwind section is missing from vmcore.
>    When you run readelf on vmcore, you may notice there are no unwind
>sections. We should add per-cpu stack unwind sections to help dump
>filter tools analyze the core dump.
>
> 3. kdump path at crash time.
>    Currently I still have to do an irq->end on each level-triggered irq;
>without that the MPT fusion driver cannot restart. We should fix this,
>or at least do it in a way that does not touch any memory in the previous
>kernel.
>
> 4. Other than this, we need to port the dump filter to IA64.
>
>There are still some minor issues, e.g.:
>  When I get a crash while X is active, the new kernel starts up with a
>blank screen (the network still works). I do a brute-force VGA reset in
>the purgatory code, but that seems only to shut the VGA down, not
>reinitialize it, if X is running.
>
>  Currently kexec cannot run on a kexec'd kernel, because the memory
>region of the EFI memmap is not reserved in /proc/iomem. I will send
>a patch to reserve that region later.
>
>There are probably other issues and gaps still to be found.
>
>Thanks
>Zou Nan hai

Hi, 

I tried to use kdump on ia64, and it seemed to work, but
I was not able to get a back trace for the panicking process using
the crash tool.

I think the reason is that the 1st kernel does not save the switch stack
before the 2nd kernel boots.

Do you have a plan to improve kdump to save the switch stack?
Or is there another method to trace the panicking process?

Regards,
Takao Indoh



* RE: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (2 preceding siblings ...)
  2006-06-12  1:50 ` Takao Indoh
@ 2006-06-14 23:30 ` Luck, Tony
  2006-06-26  7:47 ` Horms
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Luck, Tony @ 2006-06-14 23:30 UTC
  To: linux-ia64

> I'm very excited to be able to play with the new version of this patch,
> but the version you posted seems to include all of the kexec patch
> that went into Tony Luck's tree. Here is a rediff relative to the
> existing kexec patch (no other changes).

I applied this on top of the existing patch in my test tree.  The
first few kexec's went just fine, but when I tried to switch from
a kernel based on arch/ia64/configs/tiger_defconfig to a kernel
based on arch/ia64/defconfig, the system failed to boot (probably
because of problems with the mpt/fusion driver).  Last few messages
on the console were:

Fusion MPT base driver 3.03.09
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SPI Host driver 3.03.09
GSI 28 (level, low) -> CPU 1 (0xc018) vector 49
ACPI: PCI Interrupt 0000:06:02.0[A] -> GSI 28 (level, low) -> IRQ 49
mptbase: Initiating ioc0 bringup
mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000000!
mptbase: ioc0: ERROR - Doorbell ACK timeout (count=4999), IntStatus=80000000!
mptbase: ioc0: ERROR - Diagnostic reset FAILED! (142h)
mptbase: ioc0 NOT READY WARNING!
mptbase: WARNING - ioc0 did not initialize properly! (-1)
mptspi: probe of 0000:06:02.0 failed with error -1
GSI 29 (level, low) -> CPU 2 (0xc218) vector 50
ACPI: PCI Interrupt 0000:06:02.1[B] -> GSI 29 (level, low) -> IRQ 50
mptbase: Initiating ioc1 bringup

I reset the system, and the generic kernel booted just fine. kexec
from the generic kernel to the tiger_defconfig kernel also worked OK.

-Tony

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (3 preceding siblings ...)
  2006-06-14 23:30 ` Luck, Tony
@ 2006-06-26  7:47 ` Horms
  2006-06-26  8:10 ` Zou, Nanhai
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Horms @ 2006-06-26  7:47 UTC (permalink / raw)
  To: linux-ia64

On Fri, Jun 09, 2006 at 06:47:59AM +0800, Zou Nan hai wrote:
> On Thu, 2006-06-08 at 16:35, Horms wrote:
> > On Thu, Jun 08, 2006 at 06:48:23AM +0800, Zou Nan hai wrote:
> > > The ia64 kdump patch is in 2 parts.
> > > 
> > > the kexec-kdump-ia64-2.6.16.patch should apply on top of the previous 
> > > kexec patch by Khalid in Tony's test tree.
> > > 
> > > the kexec-tools-kdump-ia64.patch should apply to kexec-tools-1.101
> > > with kexec-tools-1.101-kdump.patch
> > > 
> > > 
> > > To test it.
> > > Build first SMP kernel with KEXEC and KDUMP enabled.
> > > 
> > > Boot it with kernel parameter "crashkernel=XXX@YYY"
> > > means reserver XXX from YYY for crashdumping.
> > > Build an UP kernel with KEXEC KDUMP VMCORE enabled.
> > > load this kernel as a crashdumping kernel
> > > kexec -p vmlinux.gz --initrd=initrd --append="...."
> > > 
> > > trigger a crash,
> > > maybe "echo c > /proc/sysrq-trigger"
> > > after the crash kernel boots,
> > > cp /proc/vmcore core
> > > 
> > > gdb first_kernel_vmlinux core
> > > 
> > > please test and review.
> > > 
> > > Signed-off-by: Khalid Aziz <khalid_aziz@hp.com>
> > > Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
> > 
> > Hi,
> > 
> > I'm very excited to be able to play with the new version of this patch,
> > but the version you posted seems to included include all the kexec patch
> > that went into Tony Luck's tree. Here is a rediff relative to the
> > existing kexec patch (no other changes).
> > 
> > The code does seem to be working for me. The main difficulty so far
> > seems to have been finding an appropriate place and size and place for
> > the reserved area. 128M@256M seems to work for me, offering enough
> > memory and not lie on a resource boundry for me.
> > 
> > Lastly, is it possible for you to comment on what areas of concern
> > you have with regards to kdump/kexec on ia64. I am looking to port this
> > code to xen, as my colleague Magnus Damm and I have already done so for i386
> > (complete) and x86_64 (almost complete).
> > 
> > http://lists.xensource.com/archives/html/xen-devel/2006-05/msg01272.html
> > 
> > Signed-Off-By: Horms <horms@verge.net.au>
> > 
> 
>  Thanks for testing and review.
>  
>  There is still a lot of work to do for ia64 Kdump to be a very useful
> and robust feature.
> 
>  Major issues.
>  1. Full percpu dumping on INIT. 
>     You may notice I only send an IPI to user CPUs and dump part of the
> registers for the crashing CPU. I just stop the other CPUs, without
> dumping their status. This is only a temp hack.
>  On other platforms they do this with an NMI; on IA64 we should use INIT
> to notify the other CPUs. And I know on some platforms there is a
> trigger on the panel that can trigger INIT. We could use that to dump at
> the time of a deadlock. But currently INIT is used by MCA, so we need to
> find a way to coordinate with MCA on INIT.
> 
>  2. unwind section is missing in vmcore.
>     When you do a readelf on vmcore, you may notice there are no unwind
> sections. We should add these percpu stack unwind sections to help dump
> filter tools analyze the core dump.
> 
>  3. kdump path at crash time. 
>     Currently I still have to do an irq->end on each level triggered irq;
> without that the MPT fusion driver cannot restart. We should fix this,
> or at least do it in a way that does not touch any memory in the
> previous kernel.
> 
>  4. Other than this, we need to port the dump filter to IA64.
> 
> There are still some minor issues.
> e.g.
>   When I get a crash while X is active, the new kernel will start up with
> a blank screen (network is still working). I do indeed do a brute force
> VGA reset in the purgatory code. But that seems to only shut down the VGA
> and not reinit it if X is running.
> 
>   Currently kexec can't run on a kexec'd kernel. That is because the
> memory region of the EFI memmap is not reserved in /proc/iomem. I will
> send a patch to reserve that region later.
> 
> There are probably other issues and gaps still to be found.

Thanks for that list, it is very useful to me. I hope that I can
find some time to help with some of those problems.

One thing that I am puzzling over is why you shut down the PCI devices
as part of machine_crash_shutdown(). As I am trying to port your code
to xen this is quite a problem for me, as I'm not sure that Xen
actually knows enough about PCI to do this. Is it a problem relating
to bringing the devices back online after a reboot? Is it the MPT fusion
problem you mention above?

-- 
Horms                                           
H: http://www.vergenet.net/~horms/          W: http://www.valinux.co.jp/en/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (4 preceding siblings ...)
  2006-06-26  7:47 ` Horms
@ 2006-06-26  8:10 ` Zou, Nanhai
  2006-06-26  8:37 ` Horms
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Zou, Nanhai @ 2006-06-26  8:10 UTC (permalink / raw)
  To: linux-ia64

> -----Original Message-----
> From: Horms [mailto:horms@verge.net.au]
> Sent: June 26, 2006 15:47
> To: Zou, Nanhai
> Cc: Linux-IA64; khalid_aziz@hp.com; fastboot@lists.osdl.org
> Subject: Re: [Fastboot] Ia64 kdump patch
> 
> Thanks for that list, it is very useful to me. I hope that I can
> find some time to help with some of those problems.
> 
> One thing that I am puzzling over is why you shutdown the PCI devices
> as part of machine_crash_shutdown(). As I am trying to port your code
> to xen this is quite a problem for me, as I'm not sure that Xen
> actually knows enough about PCI to do this. Its it a problem relating
> to bringing the devices back online after a reboot? Is it the MPT fusion
> problem you mention above?
> 
  The list is a bit wrong. I notice that we don't need to dump the unwind segment to the core file for stack unwinding to work. I am working on full register dumping and on fixing the stack unwind issue.

 The PCI device shutdown code was there to un-master all the PCI devices so that no DMA transactions will be issued by the devices. However, I think maybe we can remove this code, because the new kernel's memory space is invisible to the first kernel.

There is another problem: I call irq->end for each device, but it is not safe to touch any pointer belonging to the previous kernel at crash time.
But without this code, the MPT fusion driver is very likely to be unable to restart. It sometimes fails to restart even with the irq->end code. This is an open issue that needs to be fixed.

Thanks
Zou Nan hai
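
For readers unfamiliar with the term, "un-mastering" a PCI device is usually understood as clearing the bus-master enable bit (bit 2, value 0x4) in the device's PCI command register, which stops the device from initiating DMA. The following is only a sketch of that bit arithmetic in shell; the function name clear_bus_master is hypothetical, nothing here touches hardware, and it is not the code the patch uses in machine_crash_shutdown().

```shell
#!/bin/sh
# Illustrative arithmetic only; this does not read or write any device.

# Bus-master enable bit (bit 2) of the PCI command register.
PCI_COMMAND_MASTER=0x4

# clear_bus_master VALUE
# Print VALUE (a 16-bit command-register value) with the bus-master
# bit cleared, formatted as four hex digits.
clear_bus_master() {
    printf '0x%04x\n' $(( $1 & ~$PCI_COMMAND_MASTER ))
}
```

For example, clear_bus_master 0x0147 clears bit 2 and leaves the I/O and memory decode bits alone.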

> Horms
> H: http://www.vergenet.net/~horms/          W: http://www.valinux.co.jp/en/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (5 preceding siblings ...)
  2006-06-26  8:10 ` Zou, Nanhai
@ 2006-06-26  8:37 ` Horms
  2006-06-26  8:49 ` Zou, Nanhai
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Horms @ 2006-06-26  8:37 UTC (permalink / raw)
  To: linux-ia64

On Mon, Jun 26, 2006 at 04:10:44PM +0800, Zou, Nanhai wrote:
>   The list is a bit wrong.., I notice that we don't need to dump
>   unwind segment to core file for stack unwind to work... I am working
>   on full register dumping and fixing the stack unwind issue.

Awesome, if you have anything that needs testing I am more than
happy to help. I am a bit confused about which registers are
saved and which aren't in the current code, but I figure that mainly
relates to the code being incomplete.

>  The PCI device shutdown code was to un-master all the PCI devices so that no DMA transaction will be issued by Device. However I think maybe we can remove this code because the new kernel memory space is invisible to first kernel.

Ok, that makes a lot of sense. I was really puzzling over
why ia64 does this but x86_32 doesn't. I also suspect it
can be safely removed. Would you like me to test that out
and send a patch if it works?

> There is another problem that I call irq->end for each devices, it is
> not safe to touch any pointer belong to previous kernel at the crash
> time.  But without this code, MPT fusion driver is very likely unable
> to restart. It sometimes failed to restart even with the irq->end
> code. This is an open issue need to be fixed.

Right, that does sound quite nasty, especially as that device is quite
common, right?

-- 
Horms                                           
H: http://www.vergenet.net/~horms/          W: http://www.valinux.co.jp/en/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (6 preceding siblings ...)
  2006-06-26  8:37 ` Horms
@ 2006-06-26  8:49 ` Zou, Nanhai
  2006-07-27 21:23 ` Zou Nan hai
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Zou, Nanhai @ 2006-06-26  8:49 UTC (permalink / raw)
  To: linux-ia64



> -----Original Message-----
> From: Horms [mailto:horms@verge.net.au]
> Sent: June 26, 2006 16:37
> To: Zou, Nanhai
> Cc: Linux-IA64; khalid_aziz@hp.com; fastboot@lists.osdl.org
> Subject: Re: [Fastboot] Ia64 kdump patch
> 
> On Mon, Jun 26, 2006 at 04:10:44PM +0800, Zou, Nanhai wrote:
> >   The list is a bit wrong.., I notice that we don't need to dump
> >   unwind segment to core file for stack unwind to work... I am working
> >   on full register dumping and fixing the stack unwind issue.
> 
> Awsome, if you have anything that needs testing I am more than
> happy to help. I am a bit confused about which registers are
> saved and which aren't in the current code, but I figure that mainly
> relates to the code being incomplete.
> 
> >  The PCI device shutdown code was to un-master all the PCI devices so that
> > no DMA transaction will be issued by Device. However I think maybe we can remove
> > this code because the new kernel memory space is invisible to first kernel.
> 
> Ok, that makes a lot of sense. I was really puzzling over
> why ia64 does this but x86_32 doesn't. I also suspect it
> can be safely removed. Would you like me to test that out
> and send a patch if it works?
> 
  Please.
> > There is another problem that I call irq->end for each devices, it is
> > not safe to touch any pointer belong to previous kernel at the crash
> > time.  But without this code, MPT fusion driver is very likely unable
> > to restart. It sometimes failed to restart even with the irq->end
> > code. This is an open issue need to be fixed.
> 
> Right, that does sound quite nicely, especially as that device is quite
> common, right?
> 
  That is a standard device for some server platforms like Tiger.

Thanks 
Zou Nan hai
> --
> Horms
> H: http://www.vergenet.net/~horms/          W: http://www.valinux.co.jp/en/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (7 preceding siblings ...)
  2006-06-26  8:49 ` Zou, Nanhai
@ 2006-07-27 21:23 ` Zou Nan hai
  2006-07-27 21:41 ` Jay Lan
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Zou Nan hai @ 2006-07-27 21:23 UTC (permalink / raw)
  To: linux-ia64

On Fri, 2006-07-28 at 05:41, Jay Lan wrote:
> Hi,
> 
> I applied the patch to 2.6.18-rc2. However, compilation failed
> at machine_shutdown() of arch/ia64/kernel/machine_kexec.c on
> an sn2 machine.
> 
> It was easy to figure out irq_descp() is gone and idesc->handle
> is replaced with idesc->chip. But this code in machine_shutdown()
> caused an error:
> 
> ...
> if (cpu != smp_processor_id())
> cpu_down(cpu);
> }
> }
> #elif defined(CONFIG_SMP)
> smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0); <=
> #endif
> 
> 'image' is undefined in the code. Was it a global? Where was it
> declared?
> 
> Thanks,
> - jay
> 
  Hi, can you try if it works with CONFIG_HOTPLUG_CPU enabled?
  Thanks
  Zou Nan hai

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (8 preceding siblings ...)
  2006-07-27 21:23 ` Zou Nan hai
@ 2006-07-27 21:41 ` Jay Lan
  2006-08-04  1:47 ` Jay Lan
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Jay Lan @ 2006-07-27 21:41 UTC (permalink / raw)
  To: linux-ia64

Hi,

I applied the patch to 2.6.18-rc2. However, compilation failed
at machine_shutdown() of arch/ia64/kernel/machine_kexec.c on
an sn2 machine.

It was easy to figure out irq_descp() is gone and idesc->handle
is replaced with idesc->chip. But this code in machine_shutdown()
caused an error:

...
if (cpu != smp_processor_id())
cpu_down(cpu);
}
}
#elif defined(CONFIG_SMP)
smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0); <=
#endif

'image' is undefined in the code. Was it a global? Where was it
declared?

Thanks,
- jay



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (9 preceding siblings ...)
  2006-07-27 21:41 ` Jay Lan
@ 2006-08-04  1:47 ` Jay Lan
  2006-08-04  2:06 ` Zou, Nanhai
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Jay Lan @ 2006-08-04  1:47 UTC (permalink / raw)
  To: linux-ia64

Zou Nan hai wrote:
> On Fri, 2006-07-28 at 05:41, Jay Lan wrote:
> 
>>Hi,
>>
>>I applied the patch to 2.6.18-rc2. However, compilation failed
>>at machine_shutdown() of arch/ia64/kernel/machine_kexec.c on
>>an sn2 machine.
>>
>>It was easy to figure out irq_descp() is gone and idesc->handle
>>is replaced with idesc->chip. But this code in machine_shutdown()
>>caused an error:
>>
>>...
>>if (cpu != smp_processor_id())
>>cpu_down(cpu);
>>}
>>}
>>#elif defined(CONFIG_SMP)
>>smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0); <=
>>#endif
>>
>>'image' is undefined in the code. Was it a global? Where was it
>>declared?
>>
>>Thanks,
>>- jay
>>
> 
>   Hi, can you try if it works with CONFIG_HOTPLUG_CPU enabled?
>   Thanks
>   Zou Nan hai

Sorry, I have been pulled away on something else...
With CONFIG_HOTPLUG_CPU enabled, the problem code does not get
compiled, so I was able to create a kernel rpm. Unfortunately
I failed to boot the sn2 kernel...

So I switched to my reference box, an ia64 HP zx box. It booted
up OK (tiger defconfig). Then I got stuck on a stupid question:
I was not able to run the 'kexec' command to load the kernel
specified in the '-p' parameter. What kernel was I supposed
to specify there? The one I built? Or a regular kernel that
is known to work?

Thanks,
  - jay

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (10 preceding siblings ...)
  2006-08-04  1:47 ` Jay Lan
@ 2006-08-04  2:06 ` Zou, Nanhai
  2006-08-04  2:08 ` Zou, Nanhai
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Zou, Nanhai @ 2006-08-04  2:06 UTC (permalink / raw)
  To: linux-ia64



> -----Original Message-----
> From: Jay Lan [mailto:jlan@sgi.com]
> Sent: August 4, 2006 9:47
> To: Zou, Nanhai
> Cc: Jay Lan; fastboot@lists.osdl.org; Horms; Linux-IA64; khalid_aziz@hp.com
> Subject: Re: [Fastboot] Ia64 kdump patch
> 
> Zou Nan hai wrote:
> > On Fri, 2006-07-28 at 05:41, Jay Lan wrote:
> >
> >>Hi,
> >>
> >>I applied the patch to 2.6.18-rc2. However, compilation failed
> >>at machine_shutdown() of arch/ia64/kernel/machine_kexec.c on
> >>an sn2 machine.
> >>
> >>It was easy to figure out irq_descp() is gone and idesc->handle
> >>is replaced with idesc->chip. But this code in machine_shutdown()
> >>caused an error:
> >>
> >>...
> >>if (cpu != smp_processor_id())
> >>cpu_down(cpu);
> >>}
> >>}
> >>#elif defined(CONFIG_SMP)
> >>smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0); <==
> >>#endif
> >>
> >>'image' is undefined in the code. Was it a global? Where was it
> >>declared?
> >>
> >>Thanks,
> >>- jay
> >>
> >
> >   Hi, can you try if it works with CONFIG_HOTPLUG_CPU enabled?
> >   Thanks
> >   Zou Nan hai
> 
> Sorry i have been pulled away on something else...
> With CONFIG_HOTPLUG_CPU enabled, the problem code would not get
> compiled, thus i was able to create a kernel rpm. Unfortunately
> i failed to boot the sn2 kernel...
> 
> So i switched to my reference box, an ia64 HP zx box. It booted
> up OK (tiger defconfig). Then i got stuck on a stupid question:
> I was not able to run the 'kexec' command to load the kernel
> specified in the '-p' parameter. What kernel was i supposed
> to specify there? The one i built? Or, a regular kernel that
> proves to work?
> 
 It is possible to use the same kernel as the crash dumping kernel.
 You should reserve a certain amount of RAM for it by passing a crashkernel=xxxM@yyyM kernel parameter, which means "reserve xxxM starting at yyyM for the crash dumping kernel".
 Here you should align yyy to 64M.
 Then you will see the "Crash kernel" region in /proc/iomem after the first kernel boots.
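
 The two checks above could be scripted roughly as follows. This is only a sketch: it assumes the value is written in whole megabytes with an M suffix, and the function names and the optional file argument are made up for illustration (they are not part of kexec-tools).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Succeeds if the start offset (yyy) of an "xxxM@yyyM" value is a multiple of 64M.
# Returns 2 if the value is not in the assumed xxxM@yyyM form.
crashkernel_aligned() {
    case $1 in
        *@*) ;;
        *) return 2 ;;
    esac
    offset=${1#*@}        # text after '@', e.g. "256M"
    offset=${offset%M}    # drop the trailing M
    case $offset in
        ''|*[!0-9]*) return 2 ;;   # not a plain number of megabytes
    esac
    [ $((offset % 64)) -eq 0 ]
}

# Prints the "Crash kernel" line(s) from an iomem-style file (default: /proc/iomem).
show_crash_region() {
    grep -i 'crash kernel' "${1:-/proc/iomem}"
}
```

 For example, "crashkernel_aligned 128M@256M" succeeds, and running "show_crash_region" on the first kernel should print the reserved range if the reservation worked.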

Thanks
Zou Nan hai


* RE: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (11 preceding siblings ...)
  2006-08-04  2:06 ` Zou, Nanhai
@ 2006-08-04  2:08 ` Zou, Nanhai
  2006-08-10 19:28 ` Jay Lan
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Zou, Nanhai @ 2006-08-04  2:08 UTC (permalink / raw)
  To: linux-ia64



> -----Original Message-----
> From: Jay Lan [mailto:jlan@sgi.com]
> Sent: August 4, 2006 9:47
> To: Zou, Nanhai
> Cc: Jay Lan; fastboot@lists.osdl.org; Horms; Linux-IA64; khalid_aziz@hp.com
> Subject: Re: [Fastboot] Ia64 kdump patch
> 
> Zou Nan hai wrote:
> > On Fri, 2006-07-28 at 05:41, Jay Lan wrote:
> >
> >>Hi,
> >>
> >>I applied the patch to 2.6.18-rc2. However, compilation failed
> >>at machine_shutdown() of arch/ia64/kernel/machine_kexec.c on
> >>an sn2 machine.
> >>
> >>It was easy to figure out irq_descp() is gone and idesc->handle
> >>is replaced with idesc->chip. But this code in machine_shutdown()
> >>caused an error:
> >>
> >>...
> >>if (cpu != smp_processor_id())
> >>cpu_down(cpu);
> >>}
> >>}
> >>#elif defined(CONFIG_SMP)
> >>smp_call_function(kexec_stop_this_cpu, (void *)image->start, 0, 0); <==
> >>#endif
> >>
> >>'image' is undefined in the code. Was it a global? Where was it
> >>declared?
> >>
> >>Thanks,
> >>- jay
> >>
> >
> >   Hi, can you try if it works with CONFIG_HOTPLUG_CPU enabled?
> >   Thanks
> >   Zou Nan hai
> 
> Sorry i have been pulled away on something else...
> With CONFIG_HOTPLUG_CPU enabled, the problem code would not get
> compiled, thus i was able to create a kernel rpm. Unfortunately
> i failed to boot the sn2 kernel...
> 
> So i switched to my reference box, an ia64 HP zx box. It booted
> up OK (tiger defconfig). Then i got stuck on a stupid question:
> I was not able to run the 'kexec' command to load the kernel
> specified in the '-p' parameter. What kernel was i supposed
> to specify there? The one i built? Or, a regular kernel that
> proves to work?
> 
> Thanks,
>   - jay
 Forgot to mention:
 If you are using the same kernel as both the first and the crash kernel, you'd better pass an additional
 "maxcpus=1" kernel parameter to the second kernel.

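 As a rough sketch of the maxcpus=1 advice, a small helper can add the parameter to the crash kernel's command line when it is not already there. The function name is hypothetical, and VMLINUX and INITRD in the usage comment are placeholders rather than real paths.

```shell
#!/bin/sh
# Hypothetical helper: append maxcpus=1 to a kernel command line
# unless that exact parameter is already present.
with_maxcpus1() {
    case " $1 " in
        *" maxcpus=1 "*) printf '%s\n' "$1" ;;
        *)               printf '%s\n' "$1 maxcpus=1" ;;
    esac
}

# Possible use when loading the crash kernel (placeholders in capitals):
#   kexec -p VMLINUX --initrd=INITRD \
#       --append="$(with_maxcpus1 "root=LABEL=/ irqpoll")"
```

 The helper only looks for the literal string maxcpus=1, so a different existing value such as maxcpus=4 is left in place and maxcpus=1 is added after it.
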
Thanks
Zou Nan hai


* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (12 preceding siblings ...)
  2006-08-04  2:08 ` Zou, Nanhai
@ 2006-08-10 19:28 ` Jay Lan
  2006-08-10 19:58 ` Jay Lan
  2006-08-10 20:11 ` Jay Lan
  15 siblings, 0 replies; 17+ messages in thread
From: Jay Lan @ 2006-08-10 19:28 UTC (permalink / raw)
  To: linux-ia64

Hi Nanhai and horms,

Zou, Nanhai wrote:
> 
[snip]
>  Forgot to mention:
>  If you are using the same kernel as both the first and the crash kernel, you'd better pass an additional
>  "maxcpus=1" kernel parameter to the second kernel.

The "maxcpus=1" did help. Without it, the kernel went into a double panic.
However, with "maxcpus=1" i saw a crash dump and Call Trace...

holism.engr.sgi.com login: SysRq : Trigger a crashdump
kernel BUG at kernel/irq/migration.c:39!
bash[3213]: bugcheck! 0 [1]
Modules linked in: radeon drm agpgart nfs lockd sunrpc binfmt_misc dm_mirror dmi

Pid: 3213, CPU 1, comm:                 bash
psr : 00001010085a2010 ifs : 800000000000038b ip  : [<a0000001000e1c00>]    Not tainted
ip is at move_native_irq+0x1a0/0x360
unat: 0000000000000000 pfs : 000000000000038b rsc : 0000000000000000
rnat: d6304e388f73eebc bsps: 9d40fde11cbbf11b pr  : 0000000000596599
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001000e1c00 b6  : a000000100036060 b7  : a000000100010150
f6  : 1003e0000006c1c8a11a7 f7  : 1003e0000000000000514
f8  : 1003e0000006c1c8a0c93 f9  : 1003e0000000000000001
f10 : 0fffd9999999996900000 f11 : 1003e0000000000000000
r1  : a000000100b66020 r2  : 000000000000048d r3  : a0000001009665b8
r8  : 000000000000002c r9  : 0000000000003446 r10 : a00000010097d1f8
r11 : 0000000000000000 r12 : e00000001a62f7e0 r13 : e00000001a628000
r14 : 0000000000004000 r15 : a00000010097d200 r16 : 0000000000000000
r17 : 0000000000004000 r18 : 0000000000000060 r19 : 0000000000004000
r20 : a00000010097c430 r21 : 0000000000000000 r22 : a00000010096a4b8
r23 : a00000010097d210 r24 : a00000010097d1f0 r25 : a000000100910c88
r26 : a000000100910c88 r27 : a000000100966270 r28 : 0000000000000034
r29 : 0000000000000034 r30 : 0000000000000000 r31 : a00000010097d1cc

Call Trace:
 [<a000000100013880>] show_stack+0x40/0xa0
                                sp=e00000001a62f370 bsp=e00000001a6294b8
 [<a0000001000144e0>] show_regs+0x840/0x880
                                sp=e00000001a62f540 bsp=e00000001a629460
 [<a0000001000363a0>] die+0x1c0/0x2e0
                                sp=e00000001a62f540 bsp=e00000001a629418
 [<a000000100036510>] die_if_kernel+0x50/0x80
                                sp=e00000001a62f560 bsp=e00000001a6293e0
 [<a000000100037c10>] ia64_bad_break+0x270/0x4a0
                                sp=e00000001a62f560 bsp=e00000001a6293b8
 [<a00000010000c320>] ia64_leave_kernel+0x0/0x280
                                sp=e00000001a62f610 bsp=e00000001a6293b8
 [<a0000001000e1c00>] move_native_irq+0x1a0/0x360
                                sp=e00000001a62f7e0 bsp=e00000001a629360
 [<a00000010004ecb0>] iosapic_end_level_irq+0x30/0xe0
                                sp=e00000001a62f7e0 bsp=e00000001a629340
 [<a00000010005d620>] machine_crash_shutdown+0x5c0/0x680
                                sp=e00000001a62f7e0 bsp=e00000001a629300
 [<a0000001000d4fd0>] crash_kexec+0x70/0xc0
                                sp=e00000001a62fc60 bsp=e00000001a6292e0
 [<a00000010045dfc0>] sysrq_handle_crashdump+0x20/0x40
                                sp=e00000001a62fe20 bsp=e00000001a6292b8
 [<a00000010045e1a0>] __handle_sysrq+0x160/0x300
                                sp=e00000001a62fe20 bsp=e00000001a629268
 [<a0000001001b9770>] write_sysrq_trigger+0xb0/0xe0
                                sp=e00000001a62fe20 bsp=e00000001a629238
 [<a00000010012f1d0>] vfs_write+0x1b0/0x340
                                sp=e00000001a62fe20 bsp=e00000001a6291e0
 [<a00000010012fe50>] sys_write+0x70/0xe0
                                sp=e00000001a62fe20 bsp=e00000001a629168
 [<a00000010000c180>] ia64_ret_from_syscall+0x0/0x20
                                sp=e00000001a62fe30 bsp=e00000001a629168
 [<a000000000010620>] __kernel_syscall_via_break+0x0/0x20
                                sp=e00000001a630000 bsp=e00000001a629168

Fedora Core release 5 (Bordeaux)
Kernel 2.6.18-rc2 on an ia64

holism.engr.sgi.com login:

However, the second kernel was not booted and system remained
alive.

The /proc/{sysrq-trigger,iomem} after the trigger were as below:
(holism,9) ls -l /proc/{sysrq-trigger,iomem}
-r--r--r-- 1 root root 0 Aug  9 08:10 /proc/iomem
--w------- 1 root root 0 Aug  9 08:10 /proc/sysrq-trigger
(holism,10)

I built the kernel based on 2.6.18-rc2 with Nanhai's
kexec-kdump-ia64-2.6.16.patch and my fixes to
   - replace  irq_descp(dev->irq) with irq_desc + dev->irq
   - replace  desc->handle with desc->chip
for compilation.
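
For illustration only, the two substitutions could be applied mechanically with a sed filter like the one below. It is a sketch rather than the exact edit i made by hand: the function name is invented, the added parentheses around the pointer arithmetic are an extra precaution, and the patterns assume irq_descp() is called with a simple argument that has no nested parentheses.

```shell
#!/bin/sh
# Hypothetical filter: reads C source on stdin, writes the rewritten source on stdout.
#   irq_descp(X)   ->  (irq_desc + X)
#   desc->handle   ->  desc->chip   (only when "handle" is the whole member name,
#                                    so names like desc->handle_irq are left alone)
fix_irq_api() {
    sed -e 's/irq_descp(\([^)]*\))/(irq_desc + \1)/g' \
        -e 's/desc->handle\([^_[:alnum:]]\)/desc->chip\1/g' \
        -e 's/desc->handle$/desc->chip/'
}
```

It could be run as "fix_irq_api < machine_kexec.c > machine_kexec.c.new", with the result reviewed by hand before building.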

As to kexec-tools, i use kdump9 with Nanhai's
kexec-tools-kdump9-ia64-zou.patch.

What did i miss here? Do i miss any patch(es)?
My machine is a HP zx6000 loaded with FC5.

Thanks!
  - jay

> 
> Thanks
> Zou Nan hai


* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (13 preceding siblings ...)
  2006-08-10 19:28 ` Jay Lan
@ 2006-08-10 19:58 ` Jay Lan
  2006-08-10 20:11 ` Jay Lan
  15 siblings, 0 replies; 17+ messages in thread
From: Jay Lan @ 2006-08-10 19:58 UTC (permalink / raw)
  To: linux-ia64

Jay Lan wrote:
> Hi Nanhai and horms,
> 

[snip]

> However, the second kernel was not booted and system remained
> alive.
> 
> The /proc/{sysrq-trigger,iomem} after the trigger were as below:
> (holism,9) ls -l /proc/{sysrq-trigger,iomem}
> -r--r--r-- 1 root root 0 Aug  9 08:10 /proc/iomem
> --w------- 1 root root 0 Aug  9 08:10 /proc/sysrq-trigger
> (holism,10)
> 
> I built the kernel based on 2.6.18-rc2 with Nanhai's
> kexec-kdump-ia64-2.6.16.patch and my fixes to
>   - replace  irq_descp(dev->irq) with irq_desc + dev->irq
>   - replace  desc->handle with desc->chip
> for compilation.
> 
> As to kexec-tools, i use kdump9 with Nanhai's
> kexec-tools-kdump9-ia64-zou.patch.

Hi,

I forgot to mention that i ran kexec with these parameters:
[root@holism redhat]# kexec -p <dump-capture-kernel> \
  --initrd=<initrd-for-dump-capture-kernel> \
  --append="root=LABEL=/ init 1 irqpoll console=ttyS0,38400n8 maxcpus=1"
[root@holism redhat]#

The command was executed without an error or warning.

Thanks,
  - jay


> What did i miss here? Do i miss any patch(es)?
> My machine is a HP zx6000 loaded with FC5.
> 
> Thanks!
>  - jay
> 
>>
>> Thanks
>> Zou Nan hai



* Re: [Fastboot] Ia64 kdump patch
  2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
                   ` (14 preceding siblings ...)
  2006-08-10 19:58 ` Jay Lan
@ 2006-08-10 20:11 ` Jay Lan
  15 siblings, 0 replies; 17+ messages in thread
From: Jay Lan @ 2006-08-10 20:11 UTC (permalink / raw)
  To: linux-ia64

Jay Lan wrote:
> Jay Lan wrote:
> 
>> Hi Nanhai and horms,
>>
> 
> [snip]
> 
>> However, the second kernel was not booted and system remained
>> alive.
>>
>> The /proc/{sysrq-trigger,iomem} after the trigger were as below:
>> (holism,9) ls -l /proc/{sysrq-trigger,iomem}
>> -r--r--r-- 1 root root 0 Aug  9 08:10 /proc/iomem
>> --w------- 1 root root 0 Aug  9 08:10 /proc/sysrq-trigger
>> (holism,10)
>>
>> I built the kernel based on 2.6.18-rc2 with Nanhai's
>> kexec-kdump-ia64-2.6.16.patch and my fixes to
>>   - replace  irq_descp(dev->irq) with irq_desc + dev->irq
>>   - replace  desc->handle with desc->chip
>> for compilation.
>>
>> As to kexec-tools, i use kdump9 with Nanhai's
>> kexec-tools-kdump9-ia64-zou.patch.
> 
> 
> Hi,
> 
> I forgot to mention that i ran kexec with these parameters:
> [root@holism redhat]# kexec -p <dump-capture-kernel> \
>  --initrd=<initrd-for-dump-capture-kernel> \
>  --append="root=LABEL=/ init 1 irqpoll console=ttyS0,38400n8 maxcpus=1"
> [root@holism redhat]#
> 
> The command was executed without an error or warning.

And, the first kernel was booted with this append:
append="rhgb quiet root=LABEL=/ console=ttyS0,38400n8 \
  crashkernel=128M@256M"

I should have provided all this info in my first email :(

Regards,
  - jay

> 
> Thanks,
>  - jay
> 
> 
>> What did i miss here? Do i miss any patch(es)?
>> My machine is a HP zx6000 loaded with FC5.
>>
>> Thanks!
>>  - jay
>>
>>>
>>> Thanks
>>> Zou Nan hai
> 
> 



end of thread, other threads:[~2006-08-10 20:11 UTC | newest]

Thread overview: 17+ messages
2006-06-08  8:35 [Fastboot] Ia64 kdump patch Horms
2006-06-08 22:47 ` Zou Nan hai
2006-06-12  0:16 ` Zou Nan hai
2006-06-12  1:50 ` Takao Indoh
2006-06-14 23:30 ` Luck, Tony
2006-06-26  7:47 ` Horms
2006-06-26  8:10 ` Zou, Nanhai
2006-06-26  8:37 ` Horms
2006-06-26  8:49 ` Zou, Nanhai
2006-07-27 21:23 ` Zou Nan hai
2006-07-27 21:41 ` Jay Lan
2006-08-04  1:47 ` Jay Lan
2006-08-04  2:06 ` Zou, Nanhai
2006-08-04  2:08 ` Zou, Nanhai
2006-08-10 19:28 ` Jay Lan
2006-08-10 19:58 ` Jay Lan
2006-08-10 20:11 ` Jay Lan
